
Neural Networks That Don't Forget Long-Term Information: LSTM

LSTM (Long Short-Term Memory) is a neural network architecture that improves upon the structure of RNNs.

A basic RNN can remember past information, but it tends to forget earlier inputs as more time steps pass.

LSTM was created to address this issue, and it has a structure that can maintain important information over a prolonged period.


Why Do We Need LSTM?

Traditional RNNs struggle to carry information from early in a sequence as sentences grow longer.

For instance, in a lengthy sentence where subjects and verbs are far apart, an RNN may fail to remember their relationship, leading to inaccurate predictions.

To solve this problem, LSTM introduces the concept of a cell state.

The cell state retains important information over the long term and erases information once it is no longer relevant.


Core Structure of LSTM

LSTM processes inputs sequentially over time, similar to an RNN, but its internal structure is more complex.

The key components are:

  • Cell State: Acts like a conveyor belt that carries information forward through time. Its value is updated at each step, keeping necessary information and discarding what is unneeded.

  • Gates: Mechanisms that decide how much information to retain and discard. Each gate is made up of a small neural network.
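The cell-state update boils down to a weighted combination of old memory and new information. Here is a minimal arithmetic sketch; the gate values are hand-picked for illustration, whereas in a real LSTM they are produced by learned neural networks:

```python
# Cell state update: c_t = f * c_prev + i * candidate
# A forget value f near 1 preserves old memory; i scales in new info.
c_prev = 5.0        # existing long-term memory
candidate = 2.0     # new candidate information
f, i = 0.9, 0.1     # hand-picked gate activations (normally learned)

c_new = f * c_prev + i * candidate
print(c_new)        # 4.7: most of the old memory survives
```

Because the update is mostly additive, information can flow along the cell state for many steps without being repeatedly squashed, which is what lets LSTM hold on to long-term information.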


The Gates of LSTM

LSTM utilizes the following three types of gates:

  • Input Gate: Determines how much of the new information should be stored in the cell.

  • Forget Gate: Decides which parts of the existing information should be erased.

  • Output Gate: Determines which information to pass on to the next stage based on current needs.

Thanks to these gate structures, LSTM can remember essential information for long periods and discard irrelevant information naturally.
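Under some stated assumptions (toy dimensions, random weights standing in for trained parameters), a single LSTM step with the three gates can be sketched in NumPy:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n_in, n_hid = 3, 4                      # toy sizes (assumptions)

# One weight matrix per gate, plus one for the candidate values.
# Each acts on the concatenation [h_prev, x].
W_f, W_i, W_o, W_g = (rng.normal(scale=0.1, size=(n_hid, n_hid + n_in))
                      for _ in range(4))
b_f = b_i = b_o = b_g = np.zeros(n_hid)

def lstm_step(x, h_prev, c_prev):
    z = np.concatenate([h_prev, x])
    f = sigmoid(W_f @ z + b_f)          # forget gate: what to erase
    i = sigmoid(W_i @ z + b_i)          # input gate: what to write
    o = sigmoid(W_o @ z + b_o)          # output gate: what to expose
    g = np.tanh(W_g @ z + b_g)          # candidate cell values
    c = f * c_prev + i * g              # update the cell state
    h = o * np.tanh(c)                  # hidden state passed onward
    return h, c

x = rng.normal(size=n_in)
h, c = lstm_step(x, np.zeros(n_hid), np.zeros(n_hid))
print(h.shape, c.shape)                 # (4,) (4,)
```

Note that each gate is just a small neural network: a linear transform of the previous hidden state and current input, followed by a sigmoid that squashes its output into the 0-to-1 range, so it can act as a soft "how much to keep" dial.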


A Summary of How LSTM Operates

LSTM operates through the following steps:

  1. It decides which information to retain and which to forget based on the previous cell state and current input.

  2. Through the input gate, new information is written to the cell and the existing cell state is updated.

  3. Through the output gate, it outputs the necessary information at the current time step and passes it to the next stage.

By repeating this process, LSTM effectively processes sequential data, maintaining meaningful information even in long sentences or complex time sequences.
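The repeated process above can be sketched as a loop over time steps. This is a toy NumPy sketch, not a real implementation: the weights are random stand-ins for trained parameters, and the four gate transforms are fused into one matrix, as is common in practice:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
n_in, n_hid, T = 3, 4, 5                # toy sizes and sequence length

# Fused weights: rows for the forget, input, and output gates
# plus the candidate values, all computed in one matrix product.
W = rng.normal(scale=0.1, size=(4 * n_hid, n_hid + n_in))
b = np.zeros(4 * n_hid)

h = np.zeros(n_hid)                     # hidden state
c = np.zeros(n_hid)                     # cell state (long-term memory)
xs = rng.normal(size=(T, n_in))         # an input sequence

outputs = []
for x in xs:                            # steps 1-3 repeated at each time step
    f, i, o, g = np.split(W @ np.concatenate([h, x]) + b, 4)
    f, i, o, g = sigmoid(f), sigmoid(i), sigmoid(o), np.tanh(g)
    c = f * c + i * g                   # forget old info, write new info
    h = o * np.tanh(c)                  # output what is needed right now
    outputs.append(h)

print(np.stack(outputs).shape)          # (5, 4): one hidden state per step
```

The same `h` and `c` are threaded through every iteration, which is how information from early inputs can still influence the output many steps later.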


In the next lesson, we will explore GRU, which is similar to LSTM but has a simpler structure.
