TensorFlow Crucial Points - Updated Continuously

TensorFlow has proved to be a prevalent and effective deep learning framework, so I decided to write a series of posts about the aspects of it that are easy to confuse or difficult to use. (This article may look unordered because I update it whenever a topic comes up, but I will try to keep updating it frequently.)


  • Session.run() and Tensor.eval()

In TensorFlow, what is the difference between Session.run() and Tensor.eval()?

I often see people use Session.run() and Tensor.eval() interchangeably, which confused me a lot.
First, we need to be clear about what a Tensor is.

After the graph has been launched in a session, the value of a Tensor can be computed by passing it to tf.Session.run. t.eval() is a shortcut: if you have a Tensor t, calling t.eval() is equivalent to calling tf.get_default_session().run(t).

import tensorflow as tf  # TensorFlow 1.x API

# Make a session the default as follows
t = tf.constant(42.0)
sess = tf.Session()
with sess.as_default():   # or `with sess:` to close on exit
    assert sess is tf.get_default_session()
    assert t.eval() == sess.run(t)

"""
The most important difference is that you can use sess.run()
to fetch the values of many tensors in the same step
"""
t = tf.constant(42.0)
u = tf.constant(37.0)
tu = tf.multiply(t, u)  # tf.mul was renamed tf.multiply in TensorFlow 1.0
ut = tf.multiply(u, t)
with sess.as_default():
    tu.eval()  # runs one step
    ut.eval()  # runs one step
    sess.run([tu, ut])  # evaluates both tensors in a single step

Note that each call to eval() or run() executes the part of the graph needed for the requested tensors from scratch; results are not cached between calls. To keep the result of a computation around, assign it to a tf.Variable.

Another example of an eval() call

# Z3 comes from the forward pass defined elsewhere, which takes the input placeholder X:
#     Z3 = forward_propagation(X, parameters)
# Returns the truth value of (x == y) element-wise.
correct_prediction = tf.equal(tf.argmax(Z3), tf.argmax(Y))
# Casts the boolean tensor to float so that its mean is the accuracy.
accuracy = tf.reduce_mean(tf.cast(correct_prediction, 'float'))
# Z3 depends on X and the comparison depends on Y, so both placeholders must be
# fed through feed_dict. Run this inside `with tf.Session() as sess:` so that
# eval() has a default session.
print("Train Accuracy:", accuracy.eval({X: X_train, Y: Y_train}))
print("Test Accuracy:", accuracy.eval({X: X_test, Y: Y_test}))
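The accuracy graph above boils down to simple arithmetic. Below is a plain-Python sketch that does not use TensorFlow. It assumes Z3 and Y are laid out as (classes, examples), so the argmax runs over axis 0, which is what tf.argmax does when no axis is given. `argmax_axis0` and `accuracy` are hypothetical helpers, and the numbers are made up.

```python
from typing import List

Matrix = List[List[float]]  # rows = classes, columns = examples


def argmax_axis0(m: Matrix) -> List[int]:
    """Row index of the largest entry in each column (like tf.argmax(m, axis=0))."""
    n_rows, n_cols = len(m), len(m[0])
    return [max(range(n_rows), key=lambda r: m[r][c]) for c in range(n_cols)]


def accuracy(z3: Matrix, y: Matrix) -> float:
    """Fraction of examples whose predicted class matches the one-hot label."""
    # tf.equal(tf.argmax(Z3), tf.argmax(Y)) -> element-wise booleans
    correct = [p == t for p, t in zip(argmax_axis0(z3), argmax_axis0(y))]
    # tf.cast(..., 'float') followed by tf.reduce_mean(...)
    return sum(float(c) for c in correct) / len(correct)


# 3 classes x 4 examples (made-up scores and one-hot labels)
z3 = [[2.0, 0.1, 0.3, 0.0],
      [0.5, 1.5, 0.2, 0.1],
      [0.1, 0.2, 0.9, 3.0]]
y = [[1, 0, 0, 1],
     [0, 1, 0, 0],
     [0, 0, 1, 0]]
print(accuracy(z3, y))  # 0.75: the last example is misclassified
```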

  • tf.cast (used as tf.cast(...) in the code snippet above)

tf.cast(
    x,
    dtype,
    name=None
)

Defined in tensorflow/python/ops/math_ops.py.
See the guide: Tensor Transformations > Casting
Casts a tensor to a new type.
The operation casts x (in case of Tensor) or x.values (in case of SparseTensor) to dtype.

Args:
x: A Tensor or SparseTensor.
dtype: The destination type.
name: A name for the operation (optional).

Returns: A Tensor or SparseTensor with the same shape as x.

x = tf.constant([1.8, 2.2], dtype=tf.float32)
tf.cast(x, tf.int32)  # [1, 2], dtype=tf.int32

Generally, tf.set_random_seed(seed) makes random values reproducible: the same random sequence is generated each time the program runs.

tf.set_random_seed(seed)

Defined in tensorflow/python/framework/random_seed.py.

Sets the graph-level random seed.

Operations that rely on a random seed actually derive it from two seeds: the graph-level and operation-level seeds. This sets the graph-level seed.

Its interactions with operation-level seeds are as follows:

  • If neither the graph-level nor the operation seed is set: A random seed is used for this op.
  • If the graph-level seed is set, but the operation seed is not: The system deterministically picks an operation seed in conjunction with the graph-level seed so that it gets a unique random sequence.
  • If the graph-level seed is not set, but the operation seed is set: A default graph-level seed and the specified operation seed are used to determine the random sequence.
  • If both the graph-level and the operation seed are set: Both seeds are used in conjunction to determine the random sequence.
    • Set on neither graph level nor op level
    a = tf.random_uniform([1])
    b = tf.random_normal([1])
    print("Session 1")
    with tf.Session() as sess1:
        print(sess1.run(a))  # generates 'A1'
        print(sess1.run(a))  # generates 'A2'
        print(sess1.run(b))  # generates 'B1'
        print(sess1.run(b))  # generates 'B2'
    print("Session 2")
    with tf.Session() as sess2:
        print(sess2.run(a))  # generates 'A3'
        print(sess2.run(a))  # generates 'A4'
        print(sess2.run(b))  # generates 'B3'
        print(sess2.run(b))  # generates 'B4'
    
    • Set on op level
    a = tf.random_uniform([1], seed=1)
    b = tf.random_normal([1])
    # Repeatedly running this block with the same graph will generate the same
    # sequence of values for 'a', but different sequences of values for 'b'.
    print("Session 1")
    with tf.Session() as sess1:
        print(sess1.run(a))  # generates 'A1'
        print(sess1.run(a))  # generates 'A2'
        print(sess1.run(b))  # generates 'B1'
        print(sess1.run(b))  # generates 'B2'
    print("Session 2")
    with tf.Session() as sess2:
        print(sess2.run(a))  # generates 'A1'
        print(sess2.run(a))  # generates 'A2'
        print(sess2.run(b))  # generates 'B3'
        print(sess2.run(b))  # generates 'B4'
    
    • Set on graph level
    tf.set_random_seed(1234)
    a = tf.random_uniform([1])
    b = tf.random_normal([1])
    # Repeatedly running this block with the same graph will generate the same
    # sequences of 'a' and 'b'.
    print("Session 1")
    with tf.Session() as sess1:
        print(sess1.run(a))  # generates 'A1'
        print(sess1.run(a))  # generates 'A2'
        print(sess1.run(b))  # generates 'B1'
        print(sess1.run(b))  # generates 'B2'
    print("Session 2")
    with tf.Session() as sess2:
        print(sess2.run(a))  # generates 'A1'
        print(sess2.run(a))  # generates 'A2'
        print(sess2.run(b))  # generates 'B1'
        print(sess2.run(b))  # generates 'B2'
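The four seed rules above amount to a small decision table. The plain-Python sketch below is a simplified model of that table, not TensorFlow's source. `resolve_seeds`, `sample`, the `op_counter` argument and `DEFAULT_GRAPH_SEED` are hypothetical stand-ins. The constant is only a placeholder for the built-in default graph-level seed, and Python's `random.Random` plays the role of a TensorFlow random op.

```python
import random
from typing import List, Optional, Tuple

# Hypothetical placeholder for TensorFlow's built-in default graph-level seed.
DEFAULT_GRAPH_SEED = 87654321


def resolve_seeds(graph_seed: Optional[int], op_seed: Optional[int],
                  op_counter: int) -> Tuple[Optional[int], Optional[int]]:
    """Simplified model of how an op picks its (graph-level, op-level) seeds."""
    if graph_seed is not None:
        if op_seed is None:
            # Derive an op seed deterministically (here: from a per-graph op
            # counter) so that every op gets its own reproducible sequence.
            op_seed = op_counter
        return graph_seed, op_seed
    if op_seed is not None:
        # Only the op seed is set: combine it with the default graph seed.
        return DEFAULT_GRAPH_SEED, op_seed
    # Neither is set: nothing is fixed, so every run differs.
    return None, None


def sample(graph_seed: Optional[int], op_seed: Optional[int],
           op_counter: int, n: int = 2) -> List[float]:
    """Draw n values for one 'op' using the resolved seed pair."""
    seeds = resolve_seeds(graph_seed, op_seed, op_counter)
    # random.Random(None) seeds itself from the OS, i.e. non-reproducible.
    rng = random.Random(None if seeds == (None, None) else hash(seeds))
    return [rng.random() for _ in range(n)]


# Graph-level seed only: the same op repeats its values, different ops differ.
print(sample(1234, None, op_counter=1) == sample(1234, None, op_counter=1))  # True
print(sample(1234, None, op_counter=1) == sample(1234, None, op_counter=2))  # False
```

In this model, `tf.set_random_seed(1234)` in the last example corresponds to `graph_seed=1234`, and `seed=1` on an op corresponds to `op_seed=1`.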
    

tf.Graph

A TensorFlow computation, represented as a dataflow graph.

A Graph contains a set of tf.Operation objects, which represent units of computation; and tf.Tensor objects, which represent the units of data that flow between operations.

A default Graph is always registered, and accessible by calling tf.get_default_graph. To add an operation to the default graph, simply call one of the functions that defines a new Operation:

c = tf.constant(4.0)
assert c.graph is tf.get_default_graph()

Another typical usage involves the tf.Graph.as_default context manager, which overrides the current default graph for the lifetime of the context:

g = tf.Graph()
with g.as_default():
  # Define operations and tensors in `g`.
  c = tf.constant(30.0)
  assert c.graph is g

Important note: This class is not thread-safe for graph construction. All operations should be created from a single thread, or external synchronization must be provided. Unless otherwise specified, all methods are not thread-safe.

A Graph instance supports an arbitrary number of "collections" that are identified by name. For convenience when building a large graph, collections can store groups of related objects: for example, the tf.Variable uses a collection (named tf.GraphKeys.GLOBAL_VARIABLES) for all variables that are created during the construction of a graph. The caller may define additional collections by specifying a new name.


Relevant TensorFlow functions for CNN forward propagation

  • tf.nn.conv2d(X, W1, strides = [1,s,s,1], padding = 'SAME'): given an input X and a group of filters W1, this function convolves W1's filters on X. The third argument ([1,s,s,1]) gives the strides for each dimension of the input (m, n_H_prev, n_W_prev, n_C_prev).

  • tf.nn.max_pool(A, ksize = [1,f,f,1], strides = [1,s,s,1], padding = 'SAME'): given an input A, this function uses a window of size (f, f) and strides of size (s, s) to carry out max pooling over each window.

  • tf.nn.relu(Z1): computes the elementwise ReLU of Z1 (which can be any shape).

  • tf.contrib.layers.flatten(P): given an input P, this function flattens each example into a 1D vector while maintaining the batch size. It returns a flattened tensor with shape [batch_size, k].

  • tf.contrib.layers.fully_connected(F, num_outputs): given the flattened input F, it returns the output computed using a fully connected layer.

In the last function above (tf.contrib.layers.fully_connected), the fully connected layer automatically initializes its weights in the graph and keeps training them as you train the model. Hence, you do not need to initialize those weights when initializing the parameters.
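These functions mainly change tensor shapes, so it helps to trace the shapes by hand. The plain-Python sketch below uses made-up sizes: a batch of 10 inputs of 64x64x3, 8 filters with stride 1, an 8x8 max-pooling window with stride 8, and 6 output units. The helper names are hypothetical. It assumes 'SAME' padding, under which each spatial size becomes ceil(n / stride) regardless of the filter size. The ReLU step is omitted because it never changes the shape.

```python
import math
from typing import Tuple

Shape = Tuple[int, int, int, int]  # (m, n_H, n_W, n_C), the NHWC layout used above


def conv2d_same(shape: Shape, n_filters: int, s: int) -> Shape:
    m, h, w, _ = shape
    # 'SAME': spatial size ceil(n / s); channels become the number of filters.
    return (m, math.ceil(h / s), math.ceil(w / s), n_filters)


def max_pool_same(shape: Shape, s: int) -> Shape:
    m, h, w, c = shape
    return (m, math.ceil(h / s), math.ceil(w / s), c)  # channels unchanged


def flatten(shape: Shape) -> Tuple[int, int]:
    m, h, w, c = shape
    return (m, h * w * c)  # keeps the batch dimension, like [batch_size, k]


x = (10, 64, 64, 3)
z1 = conv2d_same(x, n_filters=8, s=1)  # (10, 64, 64, 8)
p1 = max_pool_same(z1, s=8)            # (10, 8, 8, 8)
f = flatten(p1)                        # (10, 512)
z3 = (f[0], 6)                         # fully_connected(F, 6) -> (10, 6)
print(z1, p1, f, z3)
```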


Difference between "SAME" and "VALID" padding in tf.nn.max_pool, tf.nn.conv2d...

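SAME means that the input is zero-padded so that, with stride 1, the output feature map has the same spatial dimensions as the input feature map. In general, each output size is ceil(n / s) for input size n and stride s. The padding is split as evenly as possible between the two sides of the input map. When the total is odd, TensorFlow puts the extra row or column at the bottom/right.
VALID means no padding: only windows that fit entirely inside the input are used, so each output size is ceil((n - f + 1) / s) for window size f. Leftover rows or columns at the right/bottom are dropped.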

[Figure: padding - VALID (no padding; the rightmost columns that do not fit a full window are dropped)]
[Figure: padding - SAME (zero padding, added as evenly as possible on both sides)]
  • To sum up: in practice, "SAME" padding is used much more often than "VALID" padding because it preserves the spatial dimensions of the input. This makes it possible to train deeper networks without the feature maps shrinking away, which helps prevent underfitting.
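The size arithmetic behind the two modes can be written out directly. This plain-Python sketch follows the output-size and padding rules TensorFlow uses for 'SAME' and 'VALID', including placing an odd extra pad at the bottom/right under 'SAME'. `padding_1d` is a hypothetical helper that handles one spatial dimension.

```python
import math
from typing import Tuple


def padding_1d(n: int, f: int, s: int, mode: str) -> Tuple[int, int, int]:
    """Return (output_size, pad_before, pad_after) for input size n,
    window/filter size f and stride s along one spatial dimension."""
    if mode == "VALID":
        # No padding: only windows lying entirely inside the input count.
        return math.ceil((n - f + 1) / s), 0, 0
    if mode == "SAME":
        out = math.ceil(n / s)
        pad_total = max((out - 1) * s + f - n, 0)
        pad_before = pad_total // 2          # the smaller half goes first
        return out, pad_before, pad_total - pad_before
    raise ValueError(f"unknown padding mode: {mode}")


# Input width 13, window 6, stride 5
print(padding_1d(13, 6, 5, "VALID"))  # (2, 0, 0): the last two columns are dropped
print(padding_1d(13, 6, 5, "SAME"))   # (3, 1, 2): one zero on the left, two on the right
```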

Difference between tf.nn.softmax_cross_entropy_with_logits and tf.nn.softmax_cross_entropy_with_logits_v2

Both functions compute the same loss. The difference is that tf.nn.softmax_cross_entropy_with_logits_v2 backpropagates into both logits and labels, while the original version does not let gradients flow into the labels. This is easy to overlook, because in supervised learning one doesn't need to backpropagate to labels: they are considered fixed ground truth, and only the weights need to be adjusted to match them.

But in some cases, the labels themselves may come from a differentiable source, such as another network. One example might be adversarial learning. In this case, both networks might benefit from the error signal. That's the reason why tf.nn.softmax_cross_entropy_with_logits_v2 was introduced. To block the gradient into the labels with v2, pass them through tf.stop_gradient first. Note that when the labels are placeholders (which is also typical), it makes no difference whether the gradient flows through or not, because there are no variables to apply the gradient to.
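Numerically, the loss both functions compute is -sum(labels * log_softmax(logits)). The plain-Python sketch below (no TensorFlow, hypothetical helper names) computes that value with the usual max-shift for numerical stability. It also computes the gradient with respect to the labels, which the _v2 variant lets flow back and the original version blocks.

```python
import math
from typing import List


def log_softmax(logits: List[float]) -> List[float]:
    # Subtract the max before exponentiating to avoid overflow (log-sum-exp).
    mx = max(logits)
    lse = mx + math.log(sum(math.exp(z - mx) for z in logits))
    return [z - lse for z in logits]


def softmax_cross_entropy(labels: List[float], logits: List[float]) -> float:
    """Forward value shared by both TensorFlow functions."""
    return -sum(y * lp for y, lp in zip(labels, log_softmax(logits)))


def grad_wrt_labels(logits: List[float]) -> List[float]:
    # d(loss)/d(labels[i]) = -log_softmax(logits)[i]. The _v2 op lets this
    # gradient flow; it only matters if the labels are produced by something
    # trainable, such as another network.
    return [-lp for lp in log_softmax(logits)]


logits = [2.0, 1.0, 0.1]
labels = [1.0, 0.0, 0.0]
print(softmax_cross_entropy(labels, logits))
print(grad_wrt_labels(logits))
```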


tf.reduce_mean

tf.reduce_mean(
    input_tensor,
    axis=None,
    keepdims=None,
    name=None,
    reduction_indices=None,
    keep_dims=None
)

Defined in tensorflow/python/ops/math_ops.py.

See the guide: Math > Reduction

Computes the mean of elements across dimensions of a tensor. (deprecated arguments)

SOME ARGUMENTS ARE DEPRECATED. They will be removed in a future version. Instructions for updating: keep_dims is deprecated, use keepdims instead

Reduces input_tensor along the dimensions given in axis. Unless keepdims is true, the rank of the tensor is reduced by 1 for each entry in axis. If keepdims is true, the reduced dimensions are retained with length 1.

If axis has no entries, all dimensions are reduced, and a tensor with a single element is returned.
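For a 2-D input, the axis and keepdims rules can be sketched in plain Python. `reduce_mean_2d` is a hypothetical helper. It assumes a non-empty, rectangular list of floats and supports only axis None, 0 or 1.

```python
from typing import List, Optional, Union

Matrix = List[List[float]]


def reduce_mean_2d(x: Matrix, axis: Optional[int] = None,
                   keepdims: bool = False) -> Union[float, List[float], Matrix]:
    if axis is None:  # reduce every dimension -> a single value
        flat = [v for row in x for v in row]
        mean = sum(flat) / len(flat)
        return [[mean]] if keepdims else mean
    if axis == 0:     # average down each column; rank drops by 1
        cols = [sum(col) / len(col) for col in zip(*x)]
        return [cols] if keepdims else cols
    if axis == 1:     # average across each row; rank drops by 1
        rows = [sum(row) / len(row) for row in x]
        return [[r] for r in rows] if keepdims else rows
    raise ValueError("axis must be None, 0 or 1")


x = [[1.0, 1.0], [2.0, 2.0]]
print(reduce_mean_2d(x))                         # 1.5
print(reduce_mean_2d(x, axis=0))                 # [1.5, 1.5]
print(reduce_mean_2d(x, axis=1))                 # [1.0, 2.0]
print(reduce_mean_2d(x, axis=1, keepdims=True))  # [[1.0], [2.0]]
```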


tf.train.AdamOptimizer

Class AdamOptimizer

Inherits From: Optimizer

Defined in tensorflow/python/training/adam.py.

See the guide: Training > Optimizers

Optimizer that implements the Adam algorithm.

See Kingma et al., 2014.

[Figure: Adam optimization]

Notice that Adam combines gradient descent with momentum and RMSprop.

  1. learning_rate: A Tensor or a floating point value. The learning rate.
  2. beta1: A float value or a constant float tensor. The exponential decay rate for the 1st moment estimates.
  3. beta2: A float value or a constant float tensor. The exponential decay rate for the 2nd moment estimates.
  4. epsilon: A small constant for numerical stability. This epsilon is "epsilon hat" in the Kingma and Ba paper (in the formula just before Section 2.1), not the epsilon in Algorithm 1 of the paper.
  5. use_locking: If True use locks for update operations.
  6. name: Optional name for the operations created when applying gradients. Defaults to "Adam".
# Common usage: build a training op that minimizes `cost`
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

minimize(
    loss,
    global_step=None,
    var_list=None,
    gate_gradients=GATE_OP,
    aggregation_method=None,
    colocate_gradients_with_ops=False,
    name=None,
    grad_loss=None
)
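To make the parameters listed above concrete, here is a plain-Python sketch of the Adam update for a single scalar. It uses the form described in the TensorFlow 1.x documentation, which folds the bias correction into the step size. That form is why epsilon is the "epsilon hat" mentioned in the parameter list. `adam_step` is a hypothetical helper, and f(x) = x^2 is just a toy objective.

```python
import math
from typing import Tuple


def adam_step(x: float, g: float, m: float, v: float, t: int,
              lr: float = 0.001, beta1: float = 0.9, beta2: float = 0.999,
              epsilon: float = 1e-8) -> Tuple[float, float, float]:
    """One Adam update at step t (counting from 1); returns the new (x, m, v)."""
    # Bias correction of both moment estimates, folded into the step size.
    lr_t = lr * math.sqrt(1 - beta2 ** t) / (1 - beta1 ** t)
    m = beta1 * m + (1 - beta1) * g        # 1st moment: momentum-style average
    v = beta2 * v + (1 - beta2) * g * g    # 2nd moment: RMSprop-style average
    x = x - lr_t * m / (math.sqrt(v) + epsilon)
    return x, m, v


# Minimize f(x) = x^2 (gradient 2x) starting from x = 5.
x, m, v = 5.0, 0.0, 0.0
for t in range(1, 2001):
    x, m, v = adam_step(x, 2 * x, m, v, t, lr=0.1)
print(x)
```

Because of the bias correction, the very first step moves x by almost exactly lr in the downhill direction, whatever the gradient's magnitude.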


tf.global_variables vs tf.local_variables

tf.clip_by_value

tf.clip_by_value(
    t,
    clip_value_min,
    clip_value_max,
    name=None
)
Clips tensor values to a specified min and max.

  • Args:
    t: A Tensor.
    clip_value_min: A 0-D (scalar) Tensor, or a Tensor with the same
    shape as t. The minimum value to clip by.
    clip_value_max: A 0-D (scalar) Tensor, or a Tensor with the
    same shape as t. The maximum value to clip by.
    name: A name for the operation (optional).
  • Returns:
    A clipped Tensor.
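# y_ holds the true labels and y the predicted probabilities. Clipping to
# [1e-10, 1.0] keeps tf.log from ever receiving 0 (log(0) = -inf).
cross_entropy = -tf.reduce_mean(
    y_ * tf.log(tf.clip_by_value(y, 1e-10, 1.0))
    + (1 - y_) * tf.log(tf.clip_by_value(1 - y, 1e-10, 1.0))
)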
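Clipping matters because log(0) is -inf, so a single prediction of exactly 0 or 1 would turn the loss into inf or nan. This plain-Python sketch applies the same clip before the log. The helpers `clip` and `binary_cross_entropy` are hypothetical, and it assumes, as in the snippet, that y_ holds the labels and y holds the predicted probabilities.

```python
import math
from typing import List


def clip(v: float, lo: float, hi: float) -> float:
    # Scalar version of what tf.clip_by_value does element-wise.
    return min(max(v, lo), hi)


def binary_cross_entropy(y_true: List[float], y_pred: List[float]) -> float:
    terms = [t * math.log(clip(p, 1e-10, 1.0))
             + (1 - t) * math.log(clip(1 - p, 1e-10, 1.0))
             for t, p in zip(y_true, y_pred)]
    return -sum(terms) / len(terms)  # -tf.reduce_mean(...)


# A confident, wrong prediction (p = 0 for a positive label) stays finite:
print(binary_cross_entropy([1.0], [0.0]))  # -log(1e-10), about 23.03
```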


Reposted from blog.csdn.net/weixin_34273046/article/details/88157307