# 12. Learning: Neural Nets, Back Propagation


Uploaded on
01/10/2014

MIT 6.034 Artificial Intelligence, Fall 2010

View the complete course: http://ocw.mit.edu/6-034F10

Instructor: Patrick Winston

How do we model neurons? In the neural net problem, we want a set of weights that makes the actual output match the desired output. We use a simple neural net to work out the back propagation algorithm, and show that it is a local computation.
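The lecture's idea can be sketched in code. The tiny one-input, one-hidden-unit network, the toy dataset, and all parameter names below are illustrative assumptions, not the lecture's exact example; the point is that each weight update in the backward pass uses only locally available quantities.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train(data, epochs=3000, lr=1.0):
    # illustrative random initialization of the four parameters
    random.seed(0)
    w1, b1, w2, b2 = [random.uniform(-1, 1) for _ in range(4)]
    for _ in range(epochs):
        for x, d in data:
            # forward pass
            h = sigmoid(w1 * x + b1)
            y = sigmoid(w2 * h + b2)
            # backward pass: each delta is a local computation,
            # built from the unit's own output and the delta above it
            delta_out = (y - d) * y * (1 - y)
            delta_hid = delta_out * w2 * h * (1 - h)
            w2 -= lr * delta_out * h
            b2 -= lr * delta_out
            w1 -= lr * delta_hid * x
            b1 -= lr * delta_hid
    return w1, b1, w2, b2

# toy task: learn to invert the input (0 -> 1, 1 -> 0)
data = [(0.0, 1.0), (1.0, 0.0)]
params = train(data)
```

After training, the actual outputs approach the desired outputs, which is exactly the condition the weight search is trying to satisfy.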

License: Creative Commons BY-NC-SA

More information at http://ocw.mit.edu/terms

More courses at http://ocw.mit.edu

Comments (2):


By anonymous 2017-09-20

TLDR: Word2Vec builds word projections (embeddings) in a latent space of N dimensions, N being the size of the word vectors obtained. The float values represent the coordinates of the words in this N-dimensional space. The major idea behind latent-space projections, putting objects in a different and continuous space, is that your objects end up with a representation (a vector) that has more interesting mathematical properties than the basic objects.

Word2Vec algorithms do this:

Imagine that you have a sentence:

You obviously want to fill in the blank with the word "outside", but you could also use "out". The w2v algorithms are inspired by this idea: you'd like all the words that can fill the blank to be near each other, because they belong together. Therefore the words "out" and "outside" will be close together, whereas a word like "carrot" will be farther away.
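A toy sketch of that intuition: the 3-dimensional vectors below are made up for illustration (real word2vec vectors are learned, and much longer), but they show how "closeness" is typically measured, via cosine similarity.

```python
import math

# made-up embeddings: "out" and "outside" are placed near each other,
# "carrot" is placed far away
vectors = {
    "out":     [0.9, 0.8, 0.1],
    "outside": [0.8, 0.9, 0.2],
    "carrot":  [0.1, 0.0, 0.9],
}

def cosine(a, b):
    # cosine similarity: dot product divided by the product of norms
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

sim_near = cosine(vectors["out"], vectors["outside"])  # high
sim_far = cosine(vectors["out"], vectors["carrot"])    # low
```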

This is sort of the "intuition" behind word2vec. For a real mathematical explanation of what's going on, I'd suggest reading:

For paragraph vectors, the idea is the same as in w2v. Each paragraph can be represented by its words. Two models are presented in the paper.

In the PV-DM model, a fixed-length paragraph vector acts as an extra token in the paragraph's word contexts and is used, together with the word vectors, to predict the paragraph's words. By backpropagating the gradient, the model gets "a sense" of what is missing, bringing paragraphs with the same "missing" words/topic close together.

Bits from the article:

For a full understanding of how these vectors are built, you'll need to learn how neural nets are built and how the backpropagation algorithm works. (I'd suggest starting with this video and Andrew Ng's Coursera class.)

NB: Softmax is just a fancy way of saying classification; each word in w2v algorithms is treated as a class. Hierarchical softmax and negative sampling are tricks to speed up softmax and handle a large number of classes.

Original Thread
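As a minimal sketch of what "softmax over the vocabulary" means (the logit values below are made up), softmax turns a vector of scores into a probability distribution, and the normalizing sum runs over every class:

```python
import math

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# imaginary scores for a 3-word "vocabulary"
logits = [2.0, 1.0, 0.1]
probs = softmax(logits)
```

The cost of the normalizing sum grows with vocabulary size, which is exactly what hierarchical softmax and negative sampling are designed to avoid.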