Hands-on TensorBoard (TensorFlow Dev Summit 2017)

By: Google Developers


Uploaded on 02/15/2017

Join Dandelion Mané in this talk as they demonstrate all the amazing things you can do with TensorBoard. You'll learn how to visualize your TensorFlow graphs, monitor training performance, and explore how your models represent your data. The code examples shown are available here: https://goo.gl/ZwGnPE.

Visit the TensorFlow website for all session recordings: https://goo.gl/bsYmza

Subscribe to the Google Developers channel at http://goo.gl/mQyv5L

Comments (13):

By anonymous    2017-09-20

Sadly, I cannot find more comprehensive documentation. Below I collect all related resources:

PS: Thanks for upvoting me. Now I can post all the links.

Original Thread

By anonymous    2017-09-20

While @rmeerten's answer is correct, you could also consider using TensorBoard, which can be a useful tool for debugging your models and seeing what's happening. For background, you can also check out the TensorBoard session from the TensorFlow Dev Summit.

Original Thread

By anonymous    2017-09-20

There are two ways to profile models. One way is TensorBoard. Here is a comprehensive tutorial about it, and here is a good video.

Additionally, clicking on a node will display the exact total memory, compute time, and tensor output sizes.
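
For context, a minimal sketch of how those per-node numbers get into the graph view (the log directory and tag name below are made up): trace one step with full profiling enabled and attach the resulting RunMetadata to the FileWriter.

import tensorflow as tf

# Build a tiny graph to profile.
x = tf.random_normal([64, 256])
w = tf.Variable(tf.random_normal([256, 10]))
y = tf.matmul(x, w)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    writer = tf.summary.FileWriter("/tmp/profile_demo", sess.graph)  # made-up logdir
    # Run one step with full tracing so per-op compute time and memory are recorded.
    run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
    run_metadata = tf.RunMetadata()
    sess.run(y, options=run_options, run_metadata=run_metadata)
    writer.add_run_metadata(run_metadata, "step_0")                  # made-up tag
    writer.close()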



Another way is the TensorFlow debugger (tfdbg), which also has tutorials.

Original Thread

By anonymous    2017-10-22

There is an awesome video tutorial (https://www.youtube.com/watch?v=eBbEDRsCmv4) on TensorBoard that describes almost everything about TensorBoard (graphs, summaries, etc.).

Original Thread

By anonymous    2017-10-22

  1. Variable summaries (scalar, histogram, image, text, etc.) help track your model through the learning process. For example, tf.summary.scalar('v_loss', validation_loss) will add one point to the loss curve each time you call the summary op, giving you a rough idea of whether the model has converged and when to stop (see the sketch after this list).
  2. It depends on the variable type. For values like the loss, tf.summary.scalar shows the trend across epochs; for variables like the weights in a layer, it is better to use tf.summary.histogram, which shows how the entire distribution of the weights changes over time. I typically use tf.summary.image and tf.summary.text to check the images / texts my model generates across epochs.
  3. The graph shows your model structure and the size of the tensors flowing through each op. I found it hard at the beginning to organise ops nicely in the graph view, and I learnt a lot about variable scopes from doing so. The other answer links to a great tutorial for beginners.
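
As a rough illustration of points 1 and 2 (the variable names, dummy loss values, and log directory below are my own placeholders):

import tensorflow as tf

# Hypothetical tensors: a scalar loss value fed from Python and a weight matrix.
validation_loss = tf.placeholder(tf.float32, shape=[], name="v_loss_value")
weights = tf.Variable(tf.truncated_normal([784, 10]), name="layer1_weights")

tf.summary.scalar('v_loss', validation_loss)      # one point per call -> loss curve
tf.summary.histogram('layer1_weights', weights)   # distribution of the weights over time
merged = tf.summary.merge_all()

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    writer = tf.summary.FileWriter("/tmp/summary_demo")   # placeholder logdir
    for step in range(100):
        current_loss = 1.0 / (step + 1)                    # dummy value standing in for a real validation loss
        s = sess.run(merged, feed_dict={validation_loss: current_loss})
        writer.add_summary(s, step)
    writer.close()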

Original Thread

By anonymous    2017-11-13

I am new to TensorFlow and TFLearn, and while following some tutorials I found the Projector tool https://www.youtube.com/watch?v=eBbEDRsCmv4&t=629s. I was trying to use it with TFLearn, but I couldn't find any example on the internet, and the documentation on the TensorFlow page is not very intuitive https://www.tensorflow.org/programmers_guide/embedding. Can somebody help me with a proper example that integrates TFLearn and the Projector?
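
(For what it's worth, a bare-bones plain-TensorFlow sketch of wiring an embedding variable into the Projector; this does not use TFLearn, and the variable name, shape, and paths are placeholders.)

import os
import tensorflow as tf
from tensorflow.contrib.tensorboard.plugins import projector

LOG_DIR = "/tmp/projector_demo"                      # placeholder log directory
embedding_var = tf.Variable(tf.random_normal([1000, 64]), name="word_embeddings")

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    writer = tf.summary.FileWriter(LOG_DIR)

    # Tell the Projector which variable to visualize.
    config = projector.ProjectorConfig()
    embedding = config.embeddings.add()
    embedding.tensor_name = embedding_var.name
    embedding.metadata_path = os.path.join(LOG_DIR, "metadata.tsv")  # optional: one label per row
    projector.visualize_embeddings(writer, config)

    # The Projector reads the values from a checkpoint, so one must be saved.
    tf.train.Saver([embedding_var]).save(sess, os.path.join(LOG_DIR, "model.ckpt"))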

Original Thread

By anonymous    2017-11-20

Straight to the point. I'm using SkipGram (see the Word2Vec Tutorial) to obtain word embeddings for sequences of words. I've used Hands-on TensorBoard as a starting point. I'd like to run the model for different hyperparameters and compare the resulting weight matrices using t-SNE (even if this is ill-advised). I understand that there are several ways to output the weight matrix and get around this problem, but I'd like to use tf.train.Saver() as described below.

  • Problem: I save each run in a separate folder, namely Tensorboard_data/folder1, Tensorboard_data/folder2, etc. Each folder contains the output of a tf.summary.FileWriter() and of the tf.train.Saver() class (saved after training is completed). Afterwards I run tensorboard --logdir /Tensorboard_data. As stated in Hands-on TensorBoard, I successfully obtain a comparative plot of, say, 4 runs in the histogram, scalar, weight, and graph sections. Once I open the drop-down menu marked "Inactive" (the error might be here; why is it inactive?) and select Projector, I once again have 4 runs. However, it seems I have messed up my checkpoint files somehow: every run has the same amount of variance explained in PCA (and if I do tensorboard --logdir /Tensorboard_data/folder1 I get a different result). Only the last run, say folder4, corresponds to the amount of variance explained.

I'm at a loss as to how TensorFlow/TensorBoard interprets the checkpoint files output by tf.train.Saver() and is able to overwrite the previous runs despite the files being in different folders. This might be a bug, but since I'm not sure about this, I didn't want to bother the TensorFlow people over on GitHub.
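
A stripped-down sketch of the layout described above (the embedding shape and folder names are placeholders, and the training step is omitted):

import os
import tensorflow as tf

LOG_ROOT = "Tensorboard_data"                        # placeholder root directory

for run_name in ["folder1", "folder2", "folder3", "folder4"]:
    run_dir = os.path.join(LOG_ROOT, run_name)
    with tf.Graph().as_default(), tf.Session() as sess:
        embeddings = tf.Variable(tf.random_normal([5000, 128]), name="embeddings")
        sess.run(tf.global_variables_initializer())
        # ... training for this hyperparameter setting ...
        writer = tf.summary.FileWriter(run_dir, sess.graph)
        # Saving into the run's own folder should give each run its own
        # `checkpoint` file for the Projector to read.
        tf.train.Saver().save(sess, os.path.join(run_dir, "model.ckpt"))
        writer.close()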

Original Thread

By anonymous    2018-03-26

So I am running a CNN for a classification problem. I have 3 conv layers with 3 pooling layers. P3 is the output of the last pooling layer, whose dimensions are [Batch_size, 4, 12, 48], and I want to flatten that matrix into a [Batch_size, 2304] matrix, 2304 being 4*12*48. I had been working with "Option A" (see below) for a while, but one day I wanted to try out "Option B", which should theoretically give me the same result. However, it did not. I had checked the following thread before:

Is tf.contrib.layers.flatten(x) the same as tf.reshape(x, [n, 1])?

but that just added more confusion, since trying "Option C" (taken from the aforementioned thread) gave a new different result.

P3 = tf.nn.max_pool(A3, ksize = [1, 2, 2, 1], strides = [1, 2, 2, 1], padding='VALID')

P3_shape = P3.get_shape().as_list()

P = tf.contrib.layers.flatten(P3)                             <-----Option A

P = tf.reshape(P3, [-1, P3_shape[1]*P3_shape[2]*P3_shape[3]]) <---- Option B

P = tf.reshape(P3, [tf.shape(P3)[0], -1])                     <---- Option C

I am more inclined to go with "Option B" since that is the one I have seen in a video by Dandelion Mané (https://www.youtube.com/watch?v=eBbEDRsCmv4&t=631s), but I would like to understand why these 3 options give different results.
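
One quick sanity check (a sketch with a made-up random input) is to evaluate all three flattenings on the same tensor and compare the values, to see whether they actually differ numerically or only in their inferred static shapes:

import numpy as np
import tensorflow as tf

P3 = tf.placeholder(tf.float32, shape=[None, 4, 12, 48])
P3_shape = P3.get_shape().as_list()

flat_a = tf.contrib.layers.flatten(P3)                                    # Option A
flat_b = tf.reshape(P3, [-1, P3_shape[1] * P3_shape[2] * P3_shape[3]])    # Option B
flat_c = tf.reshape(P3, [tf.shape(P3)[0], -1])                            # Option C

with tf.Session() as sess:
    x = np.random.rand(8, 4, 12, 48).astype(np.float32)
    a, b, c = sess.run([flat_a, flat_b, flat_c], feed_dict={P3: x})
    print(np.allclose(a, b), np.allclose(b, c))   # element-wise comparison of the three results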

Thanks for any help!

Original Thread

By anonymous    2018-03-26

In the Hands-on TensorBoard video by Dandelion Mané he writes the following code when talking about collecting some summaries and writing them to disk:

#(... some code and some summaries...)
merged_summary = tf.summary.merge_all()
writer = tf.summary.FileWriter("/tmp/mnist_demo/3")
writer.add_graph(sess.graph)

for i in range(2001):
  batch = mnist.train.next_batch(100)
  if i % 5 == 0:
    s = sess.run(merged_summary, feed_dict={x:batch[0], y: batch[1]})
    writer.add_summary(s, i)

So I took inspiration from there for my code; below I show a snippet:

costs = []   # To keep track of the cost per epoch
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=Z5, labels=Y))
tf.summary.scalar('cost', cost)
merged_summary = tf.summary.merge_all()              # merges the summaries defined above
file_writer = tf.summary.FileWriter("/tmp/my_run")   # placeholder log directory; both are used below

for epoch in range(num_epochs):

        minibatches_cost = 0
        seed = seed + 1
        minibatches_train = random_mini_batches(X_train, Y_train, minibatch_size, seed)
        num_minibatches_train = len(minibatches_train)

        for minibatch in minibatches_train:

            # Select a minibatch
            (minibatch_X, minibatch_Y) = minibatch

            # Run the session to execute the optimizer and the cost, the feedict should contain a minibatch for (X,Y).
            _ , minibatch_cost = sess.run([optimizer, cost], feed_dict={X:minibatch_X, Y:minibatch_Y})

            minibatches_cost += minibatch_cost    # Adding the cost per minibatch

        epoch_cost = minibatches_cost / num_minibatches_train  # Cost per epoch

        if print_cost == True and epoch % 5 == 0:      # Print the cost
            print ("Cost after epoch %i: %f" % (epoch, epoch_cost))
            print ("Time elapsed: %i" % t_elapsed)

        if epoch % 1 == 0:                             # Append the cost
            costs.append(epoch_cost)

        if epoch % 1 == 0:                             # Write summaries
            summary_str = merged_summary.eval(feed_dict={X:minibatch_X, Y:minibatch_Y})
            file_writer.add_summary(summary_str, epoch)

My question is whether I am feeding the correct data to the session when evaluating merged_summary. The way I am doing it now, the cost written to disk in the summary is the cost of a single minibatch (actually the last minibatch generated by random_mini_batches), whereas the cost per epoch (epoch_cost in the code), which I save in the costs variable to plot and study its evolution, is the average cost over the epoch (a more accurate measure of the cost than the per-minibatch cost, I assume).

I guess feeding the whole training set is not the solution, but I am a bit confused about why only one batch of the training data should be fed when evaluating the summaries.
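
(For reference, a sketch of one possible alternative, using the epoch_cost and file_writer names from the snippet above: compute the epoch average in Python and write it with a hand-built Summary proto instead of re-running the merged summary op.)

        # Inside the epoch loop, after epoch_cost has been computed:
        epoch_summary = tf.Summary(
            value=[tf.Summary.Value(tag="epoch_cost", simple_value=epoch_cost)])
        file_writer.add_summary(epoch_summary, epoch)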

Thanks for any help

Original Thread

By anonymous    2018-04-02

I am working on a project that aims to detect objects in certain difficult circumstances. I ran a test with Mask_RCNN on a dataset that contains that specific type of difficult examples, and it did a pretty good job on some of them.

But, surprisingly, some other examples didn't get detected, with no obvious reason. To understand the reason behind this performance difference, I've been advised to use TensorBoard. But then I realized that it's mostly used during the training phase, as I understood from this video.

At the end of the video, however, they mention an integration project for TensorBoard, namely the TensorFlow Debugger integration. But unfortunately I could not find further information about the continuation of that feature.

Is there any way to visualize weights and activation maps inside a CNN during inference/evaluation phase?
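
(A hedged sketch of what plain TensorFlow summaries allow during an evaluation pass, independent of the Debugger integration mentioned above; the layer name, input shape, random input, and log directory are all made up.)

import numpy as np
import tensorflow as tf

# Hypothetical one-conv-layer graph.
images = tf.placeholder(tf.float32, [None, 28, 28, 1])
conv = tf.layers.conv2d(images, filters=16, kernel_size=3, name="conv1")

# Summary ops are not tied to training: here the 16 feature maps of the first
# image are rearranged into a batch of single-channel images and logged.
act_images = tf.transpose(conv[:1], [3, 1, 2, 0])    # -> [16, H, W, 1]
act_summary = tf.summary.image("conv1_activations", act_images, max_outputs=16)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    writer = tf.summary.FileWriter("/tmp/eval_logs")  # made-up log directory
    s = sess.run(act_summary,
                 feed_dict={images: np.random.rand(1, 28, 28, 1)})
    writer.add_summary(s, global_step=0)
    writer.close()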

Original Thread
