Recurrent neural networks (RNN) Deep Learning - Part 2

Understand the difference between feedforward neural networks (FNN) and RNN
Learn various RNN types and architectures
Learn how to create a neural network using Galaxy’s deep learning tools
Solve a sentiment analysis problem on the IMDB movie review dataset using an RNN in Galaxy

Published: May 31, 2021

Last Updated: May 31, 2021

What is a recurrent neural network (RNN)?

RNNs model sequential data (temporal/ordinal)
In an RNN, each training example is a sequence, presented to the network one element at a time
- E.g., a sequence of English words is passed to the RNN, one word at a time
- And the RNN generates a sequence of Persian words, one word at a time
In an RNN, the output of the network at time t is used as input at time t+1
RNNs are applied to image description, machine translation, sentiment analysis, etc.
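
The feedback from time t to time t+1 can be sketched with a single recurrent unit. This is only an illustration, not the network built in the tutorial: the function name `rnn_forward`, the weights `w_x`, `w_h`, `b`, and the tanh activation are all assumptions chosen for the example.

```python
import math

def rnn_forward(inputs, w_x=0.5, w_h=0.8, b=0.0):
    """Run one hypothetical recurrent unit over a sequence, one element at a time."""
    h = 0.0  # state before any input has been seen
    states = []
    for x in inputs:
        # The previous output h is fed back in together with the current input x.
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states

# Only the first input is non-zero, yet later outputs stay non-zero:
# the unit carries information forward through its own previous output.
rnn_states = rnn_forward([1.0, 0.0, 0.0])
print(rnn_states)
```

A feedforward unit would map the second and third inputs (both zero) to the same value regardless of what came before; here the outputs differ because of the recurrence.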

Neurons forming a one-to-many recurrent neural network

Neurons forming a many-to-one recurrent neural network

Neurons forming a many-to-many recurrent neural network

We perform sentiment analysis on the IMDB movie reviews dataset
Train the RNN on the training set (25000 positive/negative movie reviews)
Test the RNN on the test set (25000 positive/negative movie reviews)
Training and test sets have no overlap
Since we are dealing with text data, it is good to review mechanisms for representing text

Bag of words (BoW): use if you don’t care about the order of the words in a document
2D array. Rows represent documents. Columns represent words in the vocabulary
- All unique words in all documents
If a word is not present in a document, the entry at that row and column is zero
If a word is present in a document, the entry at that row and column is one
- Or, we could use the word count or frequency

Table showing a bag-of-words representation of sample documents
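
A minimal sketch of the binary bag-of-words array described above. The function name `bag_of_words`, the whitespace tokenization, and the two sample documents are assumptions made for illustration.

```python
def bag_of_words(documents):
    """Return (vocabulary, matrix): rows are documents, columns are vocabulary words."""
    # Assumption: lowercase and split on whitespace; real pipelines tokenize more carefully.
    tokenized = [set(doc.lower().split()) for doc in documents]
    # The vocabulary is every unique word across all documents.
    vocabulary = sorted(set().union(*tokenized))
    # 1 if the word occurs in the document, 0 otherwise.
    matrix = [[1 if word in tokens else 0 for word in vocabulary] for tokens in tokenized]
    return vocabulary, matrix

bow_vocab, bow_matrix = bag_of_words(["the movie was great", "the movie was boring"])
print(bow_vocab)
for row in bow_matrix:
    print(row)
```

Replacing the 1/0 test with a word count gives the count-based variant mentioned in the last bullet.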

BoW is simple, but does not consider rarity of words across documents
- Important for document classification

TF-IDF: use if you don’t care about the order of the words in a document
Similar to BoW, we have an entry for each document-word pair
Each entry is the product of
- Term frequency (TF), the frequency of a word in a document, and
- Inverse document frequency (IDF), the total number of documents divided by the number of documents that contain the word
  - Usually we use the logarithm of this ratio
TF-IDF takes into account the rarity of a word across documents
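
The product above can be sketched as follows. The function name `tf_idf` and the sample documents are hypothetical, and normalizing the term frequency by document length is an assumption; libraries differ in their exact TF and IDF variants.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return (vocabulary, matrix) with one TF-IDF score per document-word pair."""
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    n_docs = len(tokenized)
    # Document frequency: the number of documents that contain each word.
    doc_freq = {w: sum(1 for tokens in tokenized if w in tokens) for w in vocabulary}
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        row = []
        for w in vocabulary:
            tf = counts[w] / len(tokens)          # assumption: count divided by document length
            idf = math.log(n_docs / doc_freq[w])  # log(total documents / documents with the word)
            row.append(tf * idf)
        matrix.append(row)
    return vocabulary, matrix

tfidf_vocab, tfidf_matrix = tf_idf(["the movie was great", "the movie was boring"])
for word, score in zip(tfidf_vocab, tfidf_matrix[0]):
    print(word, round(score, 4))
```

Words that occur in every document ("the", "movie", "was") score zero, while "great", which occurs in only one document, gets a positive score in that document.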

Mathematical vectors representing one-hot-encoding representation of words orange, apple, and banana

One-hot encoding (OHE) problems
- Requires a tremendous amount of storage for very large vocabularies
- Also, has no concept of word similarity
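
Both problems are visible in a small sketch. The helper names `one_hot_encode` and `dot` are hypothetical; the three words are the ones from the figure.

```python
def one_hot_encode(vocabulary):
    """Map each word to a vector with a single 1 at the word's index."""
    size = len(vocabulary)
    return {
        word: [1 if position == index else 0 for position in range(size)]
        for index, word in enumerate(vocabulary)
    }

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

ohe = one_hot_encode(["orange", "apple", "banana"])
print(ohe["apple"])

# Storage: every vector is as long as the vocabulary.
print(len(ohe["orange"]))

# Similarity: any two different words have a dot product of zero,
# so no pair of words is closer than any other pair.
print(dot(ohe["orange"], ohe["apple"]))
```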

Word2Vec: each word is represented as an n-dimensional vector
- n is much smaller than the vocabulary size
Words that have similar meanings are close in vector space
Words are considered similar if they co-occur often in documents
Two Word2Vec architectures
- Continuous BoW
  - predicts probability of a word given the surrounding words
- Continuous skip-gram
  - given a word predicts probability of the surrounding words
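
The difference between the two architectures is easiest to see in the training examples each one is built from. The sketch below only generates those examples and does not train any embedding; the function names and the `window` parameter are assumptions for illustration.

```python
def context_of(tokens, index, window):
    """Words within `window` positions of tokens[index], excluding the word itself."""
    start = max(0, index - window)
    stop = min(len(tokens), index + window + 1)
    return [tokens[j] for j in range(start, stop) if j != index]

def cbow_examples(tokens, window=1):
    """Continuous BoW: (surrounding words, target word) examples."""
    return [(context_of(tokens, i, window), word) for i, word in enumerate(tokens)]

def skipgram_pairs(tokens, window=1):
    """Continuous skip-gram: (given word, surrounding word) pairs."""
    return [
        (word, neighbor)
        for i, word in enumerate(tokens)
        for neighbor in context_of(tokens, i, window)
    ]

w2v_tokens = ["the", "movie", "was", "great"]
w2v_cbow = cbow_examples(w2v_tokens)
w2v_skipgram = skipgram_pairs(w2v_tokens)
print(w2v_cbow)
print(w2v_skipgram)
```

Continuous BoW takes the context list as input and predicts the single target word; skip-gram reverses the direction, taking one word as input and predicting each of its neighbors.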

Sentiment classification of IMDB movie reviews with RNN
Train an RNN using IMDB movie reviews
Goal is to learn a model that, given a review, predicts whether the review is positive or negative
We evaluate the trained RNN on the test dataset and plot the confusion matrix
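
For a two-class problem, the confusion matrix is a 2x2 table of counts. A minimal sketch follows, assuming labels are encoded as 0 (negative) and 1 (positive); the labels and predictions are made-up values for illustration, not results from the tutorial.

```python
def confusion_matrix(y_true, y_pred):
    """Return a 2x2 matrix where entry [t][p] counts reviews with true label t predicted as p."""
    matrix = [[0, 0], [0, 0]]
    for true_label, predicted_label in zip(y_true, y_pred):
        matrix[true_label][predicted_label] += 1
    return matrix

# Hypothetical labels: 1 = positive review, 0 = negative review.
cm_true = [1, 0, 1, 1, 0, 0]
cm_pred = [1, 0, 0, 1, 1, 0]
cm = confusion_matrix(cm_true, cm_pred)

# Correct predictions lie on the diagonal.
cm_accuracy = (cm[0][0] + cm[1][1]) / len(cm_true)
print(cm)
print(cm_accuracy)
```

The off-diagonal entries separate the two kinds of error: `cm[0][1]` counts negative reviews predicted as positive, and `cm[1][0]` counts positive reviews predicted as negative.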

This material is the result of a collaborative work. Thanks to the Galaxy Training Network and all the contributors!