Sequence Models and the RNN API (TensorFlow Dev Summit 2017)

By: Google Developers

Uploaded on 02/16/2017

In this talk, Eugene Brevdo discusses how to build flexible, high-performance sequence-to-sequence models. He covers reading and batching sequence data, the RNN API, fully dynamic calculation, fused RNN cells that optimize for special cases, and dynamic decoding.

Comments:

By greato    2017-09-20

Using tf.scan is a bad idea.

scan has strict semantics: it always executes the same number of timesteps, no matter what value the accumulator holds (even NaN).

while_loop implements dynamic execution (it exits as soon as cond is no longer met) and, at the same time, lets ops that do not depend on the accumulator run in parallel.

If you read the code for `dynamic_rnn` and the contrib.legacy Seq2seq models, you'll find while_loop. I have yet to see TensorFlow library code use tf.scan anywhere!

Also, internally, scan is defined using while_loop. In my own code I find scan lacking for RNNs and always have to fall back to while_loop.
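To make the difference concrete, here is a pure-Python emulation of the two semantics. It is only a sketch: `scan_like` and `while_loop_like` are hypothetical helpers, not TensorFlow functions, and the real `tf.scan` and `tf.while_loop` build graph ops rather than running Python loops.

```python
def scan_like(fn, elems, initializer):
    # Strict semantics: fn is applied once per element, so the number of
    # steps is always len(elems), whatever value the accumulator takes.
    acc = initializer
    outputs = []
    for x in elems:
        acc = fn(acc, x)
        outputs.append(acc)
    return outputs


def while_loop_like(cond, body, loop_vars):
    # Dynamic semantics: stop as soon as cond(*loop_vars) is False.
    steps = 0
    while cond(*loop_vars):
        loop_vars = body(*loop_vars)
        steps += 1
    return loop_vars, steps


elems = list(range(1, 11))

# Running sum with scan: always 10 steps.
sums = scan_like(lambda acc, x: acc + x, elems, 0)

# Same running sum with a loop that exits once the sum exceeds 10.
final, steps = while_loop_like(
    lambda i, acc: i < len(elems) and acc <= 10,
    lambda i, acc: (i + 1, acc + elems[i]),
    (0, 0),
)
print(sums)          # [1, 3, 6, 10, 15, 21, 28, 36, 45, 55]
print(final, steps)  # (5, 15) 5
```

The scan version does all 10 steps even though the caller might only care about the first few; the while version stops after 5.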

Here is a video of a talk by the RNN/Seq2Seq author himself:

By anonymous    2017-09-20

You have to keep in mind that TensorFlow is, as far as the user is concerned, "just" a machine learning API. People may happen to use it for image classification (the 2017 Dev Summit showed medical use cases in skin cancer detection and retinal imaging), but all the topics of supervised and unsupervised machine learning are candidates for TensorFlow, just as they are for any other ML library: regression of sales on advertising budget, clustering of users in a social network, and recommending books based on previous purchases via collaborative filtering, to name a few.

If you have heard about recent self-driving car projects, think about deriving steering-wheel and brake control commands from a live camera feed. NVIDIA has a paper on it, for example.

One rather interesting use case is sequence-to-sequence models, which transform one arbitrary sequence of inputs into another; according to this video, Google Translate might be taking advantage of them on the phone. If you're thinking of image and video retrieval, sequence labelling is another topic, where you train a network to describe the content of a video in human words. Or even natural language processing, where you try to determine the concepts within written text.

There are also papers like this describing the use of recurrent models such as LSTMs for energy usage prediction (note that the paper isn't specific to TensorFlow, but LSTMs are part of the core library). Here are slides on electricity price forecasting with TensorFlow, if you're interested.

By anonymous    2017-09-20

I do not know the dataset, but I think your problem is the following: you have a very long sequence and you want to know how to shape it in order to feed it to the network.

`tf.contrib.rnn.static_rnn` has the following signature:

tf.contrib.rnn.static_rnn(cell, inputs, initial_state=None, dtype=None, sequence_length=None, scope=None)


inputs: A length T list of inputs, each a Tensor of shape [batch_size, input_size], or a nested tuple of such elements.

So the inputs need to be shaped into a list in which element t holds the batch of inputs at time step t.
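If your batch is stored batch-major, i.e. [batch_size, T, input_size], building that per-time-step list amounts to swapping the first two axes. In TensorFlow this is commonly done with `tf.unstack(x, axis=1)`; the pure-Python sketch below, using a hypothetical `to_time_major_list` helper on nested lists, only illustrates the regrouping.

```python
def to_time_major_list(batch):
    # batch: batch_size sequences, each a list of T feature vectors.
    # Returns a length-T list whose element t holds the feature vectors
    # of every sequence at time step t, i.e. shape [batch_size, input_size].
    num_steps = len(batch[0])
    return [[seq[t] for seq in batch] for t in range(num_steps)]


# batch_size = 2, T = 3, input_size = 1
batch = [
    [[1.0], [2.0], [3.0]],
    [[4.0], [5.0], [6.0]],
]
inputs = to_time_major_list(batch)
print(inputs)  # [[[1.0], [4.0]], [[2.0], [5.0]], [[3.0], [6.0]]]
```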

The length of this list depends on your problem and/or on computational constraints.

  • In Natural Language Processing, for example, the length of this list can be the maximum sentence length in your document, with shorter sentences padded to that length. As in this case, in many domains the sequence length is dictated by the problem.
  • However, your problem may offer no such guidance, or the sequences may still be very long. Long sequences are computationally very heavy: the BPTT algorithm used to train these models "unfolds" the recurrent network into a very deep feedforward network with shared parameters and backpropagates through it. In such cases, it is still convenient to "cut" the sequence to a fixed length.
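For the padding case in the first bullet, a minimal sketch (the `pad_batch` helper and the pad value 0 are assumptions for illustration) pads every sequence to the batch maximum and records the true lengths, which is the kind of information the `sequence_length` argument in the signature above is meant to carry.

```python
def pad_batch(sequences, pad_value=0):
    # Pad each sequence on the right to the length of the longest one,
    # and keep the original lengths so padded steps can be ignored later.
    max_len = max(len(s) for s in sequences)
    padded = [list(s) + [pad_value] * (max_len - len(s)) for s in sequences]
    lengths = [len(s) for s in sequences]
    return padded, lengths


padded, lengths = pad_batch([[5, 6, 7], [8], [9, 10]])
print(padded)   # [[5, 6, 7], [8, 0, 0], [9, 10, 0]]
print(lengths)  # [3, 1, 2]
```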

And here we arrive at your question: given this fixed length, say 10, how do I shape my input?

Usually, the dataset is cut into non-overlapping windows (in your example, 0-9, 10-19, 20-29, etc.). The network then only looks at the 10 elements of the current window each time it updates the weights with BPTT.
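Cutting a sequence into non-overlapping windows is plain slicing. A sketch, assuming a window length of 10 and dropping any incomplete trailing window (one possible choice; padding it is another); `non_overlapping_windows` is a hypothetical helper.

```python
def non_overlapping_windows(seq, window):
    # Consecutive, non-overlapping slices of length `window`; a trailing
    # remainder shorter than `window` is dropped.
    num_full = len(seq) // window
    return [seq[i * window:(i + 1) * window] for i in range(num_full)]


sequence = list(range(35))
windows = non_overlapping_windows(sequence, 10)
print(len(windows))  # 3
print(windows[1])    # [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
```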

However, since the sequence has been cut arbitrarily, predictions will likely need to exploit evidence far back in the sequence, outside the current window. To handle this, we initialize the state of the RNN at window i with the final state of window i-1, using the parameter:

initial_state: (optional) An initial state for the RNN.
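The effect of passing the previous window's final state as `initial_state` can be checked with a toy model. The sketch below uses a hypothetical scalar RNN, h_t = tanh(w*h_{t-1} + u*x_t), with made-up fixed weights instead of a TensorFlow cell. Running the windows one after another while carrying the state forward reproduces the outputs of running the whole sequence at once; restarting each window from a zero state does not.

```python
import math


def run_rnn(inputs, initial_state=0.0, w=0.5, u=1.0):
    # Toy scalar RNN: h_t = tanh(w * h_{t-1} + u * x_t).
    # w and u are arbitrary fixed weights chosen only for illustration.
    h = initial_state
    outputs = []
    for x in inputs:
        h = math.tanh(w * h + u * x)
        outputs.append(h)
    return outputs, h


sequence = [0.1 * t for t in range(30)]
windows = [sequence[i:i + 10] for i in range(0, len(sequence), 10)]

# Window i starts from the final state of window i-1 (zero for the first).
state = 0.0
carried = []
for window in windows:
    outs, state = run_rnn(window, initial_state=state)
    carried.extend(outs)

# Every window restarts from a zero state.
reset = []
for window in windows:
    outs, _ = run_rnn(window, initial_state=0.0)
    reset.extend(outs)

full, _ = run_rnn(sequence)
print(carried == full)  # True
print(reset == full)    # False
```

Only the forward computation matches. During training, gradients still stop at each window boundary because the carried-over state enters as a value; that truncation is what keeps BPTT affordable on long sequences.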

Finally, here are two sources that go into more detail:

  1. RNN Tutorial: the official TensorFlow tutorial, applied to the task of language modeling. At one point in the code you will see the final state being fed back into the network from one run to the next, which implements what is described above.

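    # model.initial_state: one LSTMStateTuple(c, h) per layer;
    # state: the final state values returned by the previous session.run.
    feed_dict = {}
    for i, (c, h) in enumerate(model.initial_state):
      feed_dict[c] = state[i].c  # carry over the cell (memory) state
      feed_dict[h] = state[i].h  # carry over the hidden output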
  2. DevSummit 2017: a video of a talk from the TensorFlow Dev Summit 2017 whose first section (Reading and Batching Sequence Data) explains how, and with which functions, you should shape your sequence inputs.

Hope this helps :)
