
Methodology

This section develops four major sentiment analysis models using machine learning techniques: LSTM, Bag of Tricks (BoT), CNN, and transformer.


LSTM-Based Sentiment Analysis

Neural networks use the backpropagation algorithm to update their weights via the chain rule of calculus. In large, deep neural networks, backpropagation can run into problems such as vanishing or exploding gradients. The Long Short-Term Memory (LSTM) architecture is an improved version of the Recurrent Neural Network (RNN) that overcomes the vanishing gradient problem with an extra recurrent state called a memory cell. LSTM can learn long-range dependencies in sequential data, which makes it a suitable technique for sentiment analysis.

To increase the performance of the LSTM-based model, the hidden states of a forward RNN and a backward RNN are concatenated into a single tensor (a bidirectional LSTM). In addition to bidirectionality, multiple LSTM layers can be stacked on top of each other to increase performance further. We use two stacked LSTM layers with a dropout of 0.5 on the hidden states for regularization, reducing the risk of overfitting.
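
A minimal sketch of such a bidirectional, two-layer LSTM classifier in PyTorch is given below; the vocabulary size, embedding dimension, and hidden size are illustrative placeholders rather than the exact values used in the study.

```python
import torch
import torch.nn as nn

class BiLSTMSentiment(nn.Module):
    """Bidirectional, two-layer LSTM classifier (illustrative hyperparameters)."""
    def __init__(self, vocab_size, embed_dim=100, hidden_dim=256, num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        # Two stacked LSTM layers, bidirectional, with 0.5 dropout between layers.
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers=2,
                            bidirectional=True, dropout=0.5, batch_first=True)
        self.dropout = nn.Dropout(0.5)
        # Forward and backward final hidden states are concatenated -> 2 * hidden_dim.
        self.fc = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, token_ids):
        embedded = self.embedding(token_ids)        # [batch, seq_len, embed_dim]
        _, (hidden, _) = self.lstm(embedded)        # hidden: [num_layers * 2, batch, hidden_dim]
        # Concatenate the last layer's forward and backward hidden states.
        combined = torch.cat((hidden[-2], hidden[-1]), dim=1)
        return self.fc(self.dropout(combined))

# Example usage with dummy token ids (batch of 4 sequences of length 20).
model = BiLSTMSentiment(vocab_size=25_000)
logits = model(torch.randint(1, 25_000, (4, 20)))
```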


Bag of Tricks-Based Sentiment Analysis

Compared with DL techniques, linear classifiers can achieve similar performance with a much simpler design. However, one of their disadvantages is that they cannot share parameters among features and classes. The Bag of Tricks (BoT) architecture builds on linear models with a rank constraint and a fast loss approximation. The mean of the words' vector representations is fed to a linear classifier to obtain a probability distribution over the classes. In this study, a bag of n-grams, i.e., bigrams, is used instead of full word-order information to improve performance; the n-gram technique stores n adjacent words together as a single feature.
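
The following sketch illustrates this fastText-style design under simple assumptions: bigrams are hashed into extra rows of the same embedding table, the averaged embeddings pass through a single linear layer, and all sizes (bucket count, embedding dimension) are placeholders rather than the study's settings.

```python
import torch
import torch.nn as nn

def add_bigrams(token_ids, vocab_size, num_buckets=100_000):
    """Append hashed bigram ids to the unigram token ids (illustrative hashing scheme)."""
    bigram_ids = [vocab_size + (hash((a, b)) % num_buckets)
                  for a, b in zip(token_ids, token_ids[1:])]
    return token_ids + bigram_ids

class BagOfTricks(nn.Module):
    """Average the unigram + bigram embeddings and apply a single linear classifier."""
    def __init__(self, vocab_size, num_buckets=100_000, embed_dim=100, num_classes=2):
        super().__init__()
        # EmbeddingBag with mode="mean" computes the average embedding per document.
        self.embedding = nn.EmbeddingBag(vocab_size + num_buckets, embed_dim, mode="mean")
        self.fc = nn.Linear(embed_dim, num_classes)

    def forward(self, flat_ids, offsets):
        # flat_ids: 1-D tensor of ids for all documents; offsets: start index of each document.
        return self.fc(self.embedding(flat_ids, offsets))
```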

This architecture also does not rely on pre-trained word embeddings, which eases its use for languages that do not yet have good pre-trained embeddings. The model has fewer parameters than the other models and achieves comparable results in less training time.


CNN-Based Sentiment Analysis

A Convolutional Neural Network (CNN) is a DL approach applied primarily to raw data. With its architecture, the CNN covers a wide range of application fields, from image recognition to NLP. A CNN is a multi-layered feed-forward neural network. It aims to reduce the data to a compact representation without losing important features along the way, which helps keep prediction accuracy and quality high. Its architecture contains convolutional and pooling layers that reshape the data during training. Traditionally, CNNs have one or more convolutional layers followed by one or more linear layers.

In a convolutional layer, the data is processed and reshaped by k×k filters, usually with k = 3. Each filter is a small set of learnable weights applied to the data points it covers. The intuition behind learning these weights is that the convolutional layers act as feature extractors, picking out the most informative parts of the data; in this way, the dominant features are extracted.

CNNs have long been used mainly in image-related fields: image recognition, object detection, image analysis, and so on. However, they have recently been applied to NLP tasks as well, with significant results. In this study, the convolutional filters operate on k consecutive words in a piece of text. Whereas the k×k filter described above would cover a patch of an image in vision tasks, here a 1×k filter is used to focus on k consecutive words, as in bigrams (a 1×2 filter), trigrams (a 1×3 filter), or general n-grams (a 1×n filter) inside the text.
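
A minimal sketch of these 1×k convolutions over word embeddings is shown below; the filter sizes (2, 3, 4), the filter count, and the dropout rate are illustrative assumptions rather than the study's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextCNN(nn.Module):
    """Convolutions over k consecutive words (bigram, trigram, 4-gram filters)."""
    def __init__(self, vocab_size, embed_dim=100, num_filters=100,
                 filter_sizes=(2, 3, 4), num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        # Each filter spans k consecutive words and the full embedding width.
        self.convs = nn.ModuleList(
            nn.Conv2d(1, num_filters, kernel_size=(k, embed_dim)) for k in filter_sizes
        )
        self.dropout = nn.Dropout(0.5)
        self.fc = nn.Linear(num_filters * len(filter_sizes), num_classes)

    def forward(self, token_ids):
        # Sequences must be at least as long as the largest filter size.
        x = self.embedding(token_ids).unsqueeze(1)              # [batch, 1, seq_len, embed_dim]
        conved = [F.relu(conv(x)).squeeze(3) for conv in self.convs]
        # Max-pool each feature map over the sequence dimension.
        pooled = [F.max_pool1d(c, c.size(2)).squeeze(2) for c in conved]
        return self.fc(self.dropout(torch.cat(pooled, dim=1)))
```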


Transformer-Based Sentiment Analysis

The transformer is a state-of-the-art network architecture proposed in 2017. With this architecture, transformer-based models outperformed previous techniques on NLP tasks. Following the transformer architecture, various NLP-focused models such as BERT, RoBERTa, and ELECTRA were proposed. In particular, the BERT (Bidirectional Encoder Representations from Transformers) model is one of the most robust state-of-the-art approaches in NLP. BERT was introduced in 2019 by Google AI Language and has since been adopted very quickly in both academia and industry. It is a pre-trained model that is easy to fine-tune on our dataset, and it is available for a wide range of languages.

The BERT architecture is a multi-layer bidirectional transformer encoder. Its input representation combines three embedding layers: token embeddings, segment embeddings, and position embeddings. During pre-training, BERT uses two unsupervised tasks, Masked Language Modeling (MLM) and Next Sentence Prediction (NSP), instead of traditional left-to-right sequence modeling. BERT was pre-trained on a corpus of more than 3,000 million words.
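
These three components correspond to the inputs produced by a BERT tokenizer: token ids, segment (token type) ids, and implicit positions. A short illustration with the Hugging Face tokenizer follows; the bert-base-uncased checkpoint is only an example choice, not necessarily the one used in the study.

```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# A sentence pair, as used in BERT's Next Sentence Prediction pre-training task.
encoded = tokenizer("the movie was great", "i would watch it again")

print(encoded["input_ids"])       # token embedding indices, with [CLS]/[SEP] added
print(encoded["token_type_ids"])  # segment embedding: 0 for sentence A, 1 for sentence B
# Position embeddings are added inside the model from each token's index, 0..seq_len-1.
```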

In this study, we used the transformers library to obtain a pre-trained BERT model, which serves as our embedding layer. Only the rest of the model, which learns from the transformer's representations, was trained; the pre-trained weights themselves were not updated. The transformer provides both a pooled output and an embedding for every token in the sequence. Given the purpose of this study (sentiment outputs), the model does not use the pooled output.
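
A minimal sketch of this setup follows, with the BERT weights kept frozen and only a small head trained on the per-token sequence output; the GRU head, its size, and the bert-base-uncased checkpoint are illustrative assumptions rather than the exact configuration of the study.

```python
import torch
import torch.nn as nn
from transformers import BertModel

class BertSentiment(nn.Module):
    """Frozen BERT used as an embedding layer; only the head on top is trained."""
    def __init__(self, bert_name="bert-base-uncased", hidden_dim=256, num_classes=2):
        super().__init__()
        self.bert = BertModel.from_pretrained(bert_name)
        for p in self.bert.parameters():          # keep the pre-trained weights fixed
            p.requires_grad = False
        # Head trained on the per-token sequence output (the pooled output is not used).
        self.rnn = nn.GRU(self.bert.config.hidden_size, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, input_ids, attention_mask):
        with torch.no_grad():
            outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        sequence_output = outputs.last_hidden_state   # [batch, seq_len, hidden_size]
        _, hidden = self.rnn(sequence_output)          # hidden: [1, batch, hidden_dim]
        return self.fc(hidden.squeeze(0))
```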

The input sequences were tokenized and truncated to the maximum sequence length. The tokenized input was converted to tensors and prepared for fine-tuning. After fine-tuning, the model was used to evaluate the sentiment of new sequences.
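
The sketch below illustrates this preparation and prediction step, reusing the hypothetical BertSentiment model and tokenizer from the previous sketches; the 512-token limit is BERT's architectural maximum, not necessarily the cut-off used in the study.

```python
import torch

def predict_sentiment(model, tokenizer, text, max_length=512):
    """Tokenize, truncate to the maximum sequence size, and score a single text."""
    encoded = tokenizer(text, truncation=True, max_length=max_length,
                        return_tensors="pt")        # converts directly to tensors
    model.eval()
    with torch.no_grad():
        logits = model(encoded["input_ids"], encoded["attention_mask"])
    return torch.softmax(logits, dim=1).squeeze(0)   # class probabilities

# e.g. predict_sentiment(model, tokenizer, "This film was a pleasant surprise.")
```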