TensorFlow has proved to be a prevalent and effective deep learning framework, so I decided to write a series of posts on the aspects that are easy to confuse or difficult to use. (This article may read as unordered, since I update it irregularly, though I try to do so frequently.)
-
Session.run() and Tensor.eval()
In TensorFlow, what is the difference between Session.run() and Tensor.eval()?
I often see people use Session.run() and Tensor.eval() interchangeably, which confused me a great deal.
First, we need to be clear about what a Tensor is (check the link).
After the graph has been launched in a session, the value of the Tensor can be computed by passing it to tf.Session.run. t.eval() is a shortcut for calling tf.get_default_session().run(t).
In other words, if you have a Tensor t, calling t.eval() is equivalent to calling tf.get_default_session().run(t).
import tensorflow as tf

# Make a session the default as follows
t = tf.constant(42.0)
sess = tf.Session()
with sess.as_default():  # or `with sess:` to close on exit
    assert sess is tf.get_default_session()
    assert t.eval() == sess.run(t)
"""
The most important difference is that you can use sess.run()
to fetch the values of many tensors in the same step
"""
t = tf.constant(42.0)
u = tf.constant(37.0)
tu = tf.multiply(t, u)  # tf.mul was renamed to tf.multiply in TensorFlow 1.0
ut = tf.multiply(u, t)
with sess.as_default():
    tu.eval()  # runs one step
    ut.eval()  # runs one step
    sess.run([tu, ut])  # evaluates both tensors in a single step
Note that each call to eval and run will execute the whole graph from scratch. To cache the result of a computation, assign it to a tf.Variable.
Another example of an eval call:
# Returns the truth value of (x == y) element-wise.
correct_prediction = tf.equal(tf.argmax(Z3), tf.argmax(Y))
# Casts a tensor to a new type.
accuracy = tf.reduce_mean(tf.cast(correct_prediction, 'float'))
# Notice that `Z3` needs the input X, which comes from another code snippet:
#   # forward propagation, built in the TensorFlow graph
#   Z3 = forward_propagation(X, parameters)
# Hence eval needs X (and Y) supplied through its `feed_dict` argument.
# eval also needs a default session, e.g. inside `with tf.Session() as sess:`.
print("Train Accuracy:", accuracy.eval({X: X_train, Y: Y_train}))
print("Test Accuracy:", accuracy.eval({X: X_test, Y: Y_test}))
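To make the equal, cast, reduce_mean chain concrete, here is a minimal pure-Python sketch of the same arithmetic. It is only an illustration, not TensorFlow code: the helper names (argmax, accuracy) and the toy scores are assumptions, and each example is assumed to be one row of class scores.

```python
def argmax(row):
    """Index of the largest value in a list (the first one wins on ties)."""
    return max(range(len(row)), key=lambda i: row[i])


def accuracy(logits, labels):
    """Fraction of examples whose predicted class matches the label."""
    # Same role as tf.equal(tf.argmax(...), tf.argmax(...)): a list of booleans.
    correct = [argmax(z) == argmax(y) for z, y in zip(logits, labels)]
    # Same role as tf.cast(..., 'float'): True -> 1.0, False -> 0.0.
    as_float = [float(c) for c in correct]
    # Same role as tf.reduce_mean: average of the 0/1 flags.
    return sum(as_float) / len(as_float)


# Hypothetical toy data: 4 examples, 3 classes, one-hot labels.
toy_logits = [[2.0, 0.1, 0.3], [0.2, 1.5, 0.1], [0.9, 0.8, 0.1], [0.1, 0.2, 3.0]]
toy_labels = [[1, 0, 0], [0, 1, 0], [0, 1, 0], [0, 0, 1]]
print(accuracy(toy_logits, toy_labels))
```

The cast is what turns the boolean comparison into numbers that can be averaged; without it there is nothing for the mean to operate on.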
-
tf.cast
See tf.cast(...) in the code snippet above.
tf.cast(
    x,
    dtype,
    name=None
)
Defined in tensorflow/python/ops/math_ops.py.
See the guide: Tensor Transformations > Casting
Casts a tensor to a new type.
The operation casts x (in case of Tensor) or x.values (in case of SparseTensor) to dtype.
x: A Tensor or SparseTensor.
dtype: The destination type.
name: A name for the operation (optional).
A Tensor or SparseTensor with same shape as x will be returned.
x = tf.constant([1.8, 2.2], dtype=tf.float32)
tf.cast(x, tf.int32) # [1, 2], dtype=tf.int32
Generally, tf.set_random_seed(x) makes the random values the same each time the program runs.
tf.set_random_seed(seed)
Defined in tensorflow/python/framework/random_seed.py.
Sets the graph-level random seed.
Operations that rely on a random seed actually derive it from two seeds: the graph-level and operation-level seeds. This sets the graph-level seed.
Its interactions with operation-level seeds are as follows:
- If neither the graph-level nor the operation seed is set: A random seed is used for this op.
- If the graph-level seed is set, but the operation seed is not: The system deterministically picks an operation seed in conjunction with the graph-level seed so that it gets a unique random sequence.
- If the graph-level seed is not set, but the operation seed is set: A default graph-level seed and the specified operation seed are used to determine the random sequence.
- If both the graph-level and the operation seed are set: Both seeds are used in conjunction to determine the random sequence.
- Set on neither graph level nor op level
a = tf.random_uniform([1])
b = tf.random_normal([1])

print("Session 1")
with tf.Session() as sess1:
    print(sess1.run(a))  # generates 'A1'
    print(sess1.run(a))  # generates 'A2'
    print(sess1.run(b))  # generates 'B1'
    print(sess1.run(b))  # generates 'B2'

print("Session 2")
with tf.Session() as sess2:
    print(sess2.run(a))  # generates 'A3'
    print(sess2.run(a))  # generates 'A4'
    print(sess2.run(b))  # generates 'B3'
    print(sess2.run(b))  # generates 'B4'
- Set on op level
a = tf.random_uniform([1], seed=1)
b = tf.random_normal([1])

# Repeatedly running this block with the same graph will generate the same
# sequence of values for 'a', but different sequences of values for 'b'.
print("Session 1")
with tf.Session() as sess1:
    print(sess1.run(a))  # generates 'A1'
    print(sess1.run(a))  # generates 'A2'
    print(sess1.run(b))  # generates 'B1'
    print(sess1.run(b))  # generates 'B2'

print("Session 2")
with tf.Session() as sess2:
    print(sess2.run(a))  # generates 'A1'
    print(sess2.run(a))  # generates 'A2'
    print(sess2.run(b))  # generates 'B3'
    print(sess2.run(b))  # generates 'B4'
- Set on graph level
tf.set_random_seed(1234)
a = tf.random_uniform([1])
b = tf.random_normal([1])

# Repeatedly running this block with the same graph will generate the same
# sequences of 'a' and 'b'.
print("Session 1")
with tf.Session() as sess1:
    print(sess1.run(a))  # generates 'A1'
    print(sess1.run(a))  # generates 'A2'
    print(sess1.run(b))  # generates 'B1'
    print(sess1.run(b))  # generates 'B2'

print("Session 2")
with tf.Session() as sess2:
    print(sess2.run(a))  # generates 'A1'
    print(sess2.run(a))  # generates 'A2'
    print(sess2.run(b))  # generates 'B1'
    print(sess2.run(b))  # generates 'B2'
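The A1/A2 pattern in the snippets above can be mimicked in plain Python. This is only an analogy under stated assumptions, not how TensorFlow derives its seeds: random.Random stands in for a random op, and run_session is a hypothetical helper that, like opening a new tf.Session, restarts the op's generator from its seed.

```python
import random


def run_session(op_seed, n_runs=2):
    """Mimic a fresh session: the op's generator restarts from its seed.

    op_seed=None stands for "no seed set", so a nondeterministic seed is used.
    """
    rng = random.Random(op_seed)
    # Each "run" advances the generator, so successive values differ (A1, A2).
    return [rng.random() for _ in range(n_runs)]


# Seed set: both "sessions" replay the same sequence.
seeded_first = run_session(op_seed=1)
seeded_second = run_session(op_seed=1)
print(seeded_first == seeded_second)

# No seed: the two "sessions" are expected to produce different sequences.
unseeded_first = run_session(op_seed=None)
unseeded_second = run_session(op_seed=None)
print(unseeded_first == unseeded_second)
```

The point the analogy captures is that a seed fixes the whole sequence, not a single value: running the same op twice in one session still yields two different numbers.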
-
tf.Graph
Defined in tensorflow/python/framework/ops.py.
A TensorFlow computation, represented as a dataflow graph.
A Graph contains a set of tf.Operation objects, which represent units of computation; and tf.Tensor objects, which represent the units of data that flow between operations.
A default Graph is always registered, and accessible by calling tf.get_default_graph. To add an operation to the default graph, simply call one of the functions that defines a new Operation:
c = tf.constant(4.0)
assert c.graph is tf.get_default_graph()
Another typical usage involves the tf.Graph.as_default context manager, which overrides the current default graph for the lifetime of the context:
g = tf.Graph()
with g.as_default():
    # Define operations and tensors in `g`.
    c = tf.constant(30.0)
    assert c.graph is g
Important note: This class is not thread-safe for graph construction. All operations should be created from a single thread, or external synchronization must be provided. Unless otherwise specified, all methods are not thread-safe.
A Graph instance supports an arbitrary number of "collections" that are identified by name. For convenience when building a large graph, collections can store groups of related objects: for example, the tf.Variable uses a collection (named tf.GraphKeys.GLOBAL_VARIABLES) for all variables that are created during the construction of a graph. The caller may define additional collections by specifying a new name.
Relevant tf functions for CNN forward propagation
- tf.nn.conv2d(X, W1, strides = [1,s,s,1], padding = 'SAME'): given an input X and a group of filters W1, this function convolves W1's filters on X. The third input ([1,s,s,1]) represents the strides for each dimension of the input (m, n_H_prev, n_W_prev, n_C_prev). See the full documentation for details.
- tf.nn.max_pool(A, ksize = [1,f,f,1], strides = [1,s,s,1], padding = 'SAME'): given an input A, this function uses a window of size (f, f) and strides of size (s, s) to carry out max pooling over each window. See the full documentation for details.
- tf.nn.relu(Z1): computes the elementwise ReLU of Z1 (which can be any shape). See the full documentation for details.
- tf.contrib.layers.flatten(P): given an input P, this function flattens each example into a 1D vector while maintaining the batch size. It returns a flattened tensor with shape [batch_size, k]. See the full documentation for details.
- tf.contrib.layers.fully_connected(F, num_outputs): given the flattened input F, it returns the output computed using a fully connected layer. See the full documentation for details.
In the last function above (tf.contrib.layers.fully_connected), the fully connected layer automatically initializes weights in the graph and keeps training them as you train the model. Hence, you do not need to initialize those weights when initializing the parameters.
Difference between "SAME" and "VALID" padding in tf.nn.max_pool, tf.nn.conv2d...
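SAME means that, with a stride of 1, the output feature map has the same spatial dimensions as the input feature map (for a stride s, each spatial dimension becomes ceil(input / s)). Zero padding is introduced to make the shapes match as needed, split as evenly as possible between the two sides of the input map; when the total is odd, the extra padding goes on the bottom and right.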
VALID means no padding.
- To sum up: in practice, "SAME" padding is used more often than "VALID" padding because it maintains the spatial dimensions of the original input. The feature maps then do not shrink at every layer, so a deeper network can be trained, which helps against underfitting.
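The effect of the two padding modes on shapes can be checked with a small calculator. This is a sketch that follows the output-size rules given in TensorFlow's convolution documentation for one spatial dimension; the function names conv_output_size and same_padding are hypothetical, chosen here for illustration.

```python
import math


def conv_output_size(n, f, s, padding):
    """Output length along one spatial dimension.

    n: input size, f: filter/window size, s: stride.
    """
    if padding == "SAME":
        return math.ceil(n / s)
    if padding == "VALID":
        return math.ceil((n - f + 1) / s)
    raise ValueError("padding must be 'SAME' or 'VALID'")


def same_padding(n, f, s):
    """Zero padding that SAME adds along one dimension, as (before, after)."""
    out = math.ceil(n / s)
    total = max((out - 1) * s + f - n, 0)
    before = total // 2
    return before, total - before  # any odd leftover goes on the "after" side


print(conv_output_size(64, 5, 1, "SAME"), conv_output_size(64, 5, 1, "VALID"))
print(same_padding(64, 5, 1), same_padding(64, 3, 2))
```

With a 5x5 filter and stride 1, VALID loses 4 pixels per dimension at every layer, which is why a stack of VALID convolutions runs out of spatial extent much sooner than a stack of SAME ones.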
Difference between tf.nn.softmax_cross_entropy_with_logits and tf.nn.softmax_cross_entropy_with_logits_v2
The two are easy to confuse, because in supervised learning one doesn't need to backpropagate into the labels. They are considered fixed ground truth and only the weights need to be adjusted to match them.
But in some cases, the labels themselves may come from a differentiable source, such as another network. One example might be adversarial learning. In this case, both networks might benefit from the error signal. That's the reason why tf.nn.softmax_cross_entropy_with_logits_v2 was introduced: it lets gradients flow into the labels as well as the logits, while the original version only backpropagates into the logits. Note that when the labels are placeholders (which is also typical), it makes no difference whether the gradient flows through them or not, because there are no variables to apply the gradient to.
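To see why labels could receive an error signal at all, here is a minimal pure-Python sketch of the softmax cross-entropy for a single example, together with its derivative with respect to the labels. It illustrates the math only, not TensorFlow's implementation; the function names are assumptions.

```python
import math


def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_cross_entropy(labels, logits):
    """Loss for one example: -sum(labels * log(softmax(logits)))."""
    probs = softmax(logits)
    return -sum(y * math.log(p) for y, p in zip(labels, probs))


def grad_wrt_labels(logits):
    """d(loss)/d(labels[i]) = -log(softmax(logits)[i]).

    This is nonzero in general, so labels produced by another network
    would receive a gradient if it is allowed to flow.
    """
    return [-math.log(p) for p in softmax(logits)]


example_logits = [1.0, 2.0, 3.0]
example_labels = [0.0, 1.0, 0.0]
print(softmax_cross_entropy(example_labels, example_logits))
print(grad_wrt_labels(example_logits))
```

When the labels are fixed data, this second gradient is simply never used, which matches the remark above that the choice only matters when the labels come from something trainable.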
tf.reduce_mean
tf.reduce_mean(
    input_tensor,
    axis=None,
    keepdims=None,
    name=None,
    reduction_indices=None,
    keep_dims=None
)
Defined in tensorflow/python/ops/math_ops.py.
See the guide: Math > Reduction
Computes the mean of elements across dimensions of a tensor. (deprecated arguments)
SOME ARGUMENTS ARE DEPRECATED. They will be removed in a future version. Instructions for updating: keep_dims is deprecated, use keepdims instead
Reduces input_tensor along the dimensions given in axis. Unless keepdims is true, the rank of the tensor is reduced by 1 for each entry in axis. If keepdims is true, the reduced dimensions are retained with length 1.
If axis has no entries, all dimensions are reduced, and a tensor with a single element is returned.
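The axis and keepdims behaviour described above can be illustrated on a 2-D list. This is a pure-Python sketch restricted to matrices, not the TensorFlow op itself; the function below merely borrows the name reduce_mean for readability.

```python
def reduce_mean(matrix, axis=None, keepdims=False):
    """Mean of a 2-D list of numbers along `axis` (None, 0 or 1)."""
    rows, cols = len(matrix), len(matrix[0])
    if axis is None:
        # Reduce every dimension: a single value remains.
        mean = sum(sum(row) for row in matrix) / (rows * cols)
        return [[mean]] if keepdims else mean
    if axis == 0:
        # Collapse the rows: one mean per column.
        means = [sum(row[j] for row in matrix) / rows for j in range(cols)]
        return [means] if keepdims else means
    if axis == 1:
        # Collapse the columns: one mean per row.
        means = [sum(row) / cols for row in matrix]
        return [[m] for m in means] if keepdims else means
    raise ValueError("axis must be None, 0 or 1")


x = [[1.0, 1.0], [2.0, 2.0]]
print(reduce_mean(x))
print(reduce_mean(x, axis=0))
print(reduce_mean(x, axis=1))
print(reduce_mean(x, axis=1, keepdims=True))
```

With keepdims the result keeps two dimensions (shape 2x1 here), which is convenient when the mean has to be broadcast back against the original matrix.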
tf.train.AdamOptimizer
Class AdamOptimizer
Inherits From: Optimizer
Defined in tensorflow/python/training/adam.py.
See the guide: Training > Optimizers
Optimizer that implements the Adam algorithm.
See Kingma et al., 2014 (pdf).
Note that Adam combines gradient descent with momentum (the 1st moment estimate) and RMSprop (the 2nd moment estimate).
- learning_rate: A Tensor or a floating point value. The learning rate.
- beta1: A float value or a constant float tensor. The exponential decay rate for the 1st moment estimates.
- beta2: A float value or a constant float tensor. The exponential decay rate for the 2nd moment estimates.
- epsilon: A small constant for numerical stability. This epsilon is "epsilon hat" in the Kingma and Ba paper (in the formula just before Section 2.1), not the epsilon in Algorithm 1 of the paper.
- use_locking: If True use locks for update operations.
- name: Optional name for the operations created when applying gradients. Defaults to "Adam".
# Common use
tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
minimize(
    loss,
    global_step=None,
    var_list=None,
    gate_gradients=GATE_OP,
    aggregation_method=None,
    colocate_gradients_with_ops=False,
    name=None,
    grad_loss=None
)
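To show how the momentum part and the RMSprop part fit together, here is a minimal single-parameter sketch of one Adam update in plain Python. It follows the "epsilon hat" formulation mentioned in the argument list, with the optimizer's default hyperparameters; adam_step and the toy cost (w - 3)^2 are assumptions made for illustration, not the library's code.

```python
import math


def adam_step(param, grad, m, v, t,
              learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-8):
    """One Adam update for a scalar parameter at step t (t starts at 1)."""
    m = beta1 * m + (1 - beta1) * grad         # 1st moment: momentum-like average
    v = beta2 * v + (1 - beta2) * grad * grad  # 2nd moment: RMSprop-like average
    # Bias correction folded into the step size ("epsilon hat" form).
    lr_t = learning_rate * math.sqrt(1 - beta2 ** t) / (1 - beta1 ** t)
    param = param - lr_t * m / (math.sqrt(v) + epsilon)
    return param, m, v


# Toy problem: minimize cost(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w, m, v = 0.0, 0.0, 0.0
for t in range(1, 1001):
    grad = 2.0 * (w - 3.0)
    w, m, v = adam_step(w, grad, m, v, t, learning_rate=0.1)
print(w)
```

One consequence of the normalization by sqrt(v) is that the very first step has a size close to learning_rate regardless of how large the gradient is.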
tf.global_variables vs tf.local_variables
tf.clip_by_value
tf.clip_by_value(
    t,
    clip_value_min,
    clip_value_max,
    name=None
)
Clips tensor values to a specified min and max.
- Args:
t: A Tensor.
clip_value_min: A 0-D (scalar) Tensor, or a Tensor with the same shape as t. The minimum value to clip by.
clip_value_max: A 0-D (scalar) Tensor, or a Tensor with the same shape as t. The maximum value to clip by.
name: A name for the operation (optional).
- Returns:
A clipped Tensor.
cross_entropy = -tf.reduce_mean(
    y_ * tf.log(tf.clip_by_value(y, 1e-10, 1.0))
    + (1 - y_) * tf.log(tf.clip_by_value(1 - y, 1e-10, 1.0))
)
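The reason for clipping before the logarithm is that a prediction of exactly 0 or 1 would otherwise produce log(0). The pure-Python sketch below mirrors the expression above on plain lists to show that the clipped loss stays finite; the helper names and the toy inputs are assumptions, and it is not TensorFlow code.

```python
import math


def clip_by_value(value, clip_min, clip_max):
    """Limit a scalar to the range [clip_min, clip_max]."""
    return max(clip_min, min(value, clip_max))


def binary_cross_entropy(labels, predictions, eps=1e-10):
    """Mean binary cross-entropy with predictions clipped away from 0."""
    total = 0.0
    for y_true, y_pred in zip(labels, predictions):
        total += y_true * math.log(clip_by_value(y_pred, eps, 1.0))
        total += (1 - y_true) * math.log(clip_by_value(1 - y_pred, eps, 1.0))
    return -total / len(labels)


# The first prediction is exactly 0 for a positive label: without the clip,
# math.log(0.0) would raise a ValueError here.
loss = binary_cross_entropy([1.0, 0.0], [0.0, 0.0])
print(loss, math.isfinite(loss))
```

The clip bounds the per-element penalty at -log(1e-10), so a confidently wrong prediction yields a large but finite loss.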