Algorithm – Gibbs Sampling
We used Gibbs sampling, one of the algorithms commonly used to fit LDA models [6]. To explain how the algorithm works, we break the process into the following steps.
Initialise Parameters
The algorithm must also be told how many topics to find, which is a drawback because it requires some prior notion of the number of topics to look for. Testing is therefore required to confirm whether the results are satisfactory. The number of iterations also needs to be tested; the optimal number is reached when the model converges, in other words when it produces the same or very similar results. The hyperparameters, alpha and beta, are parameters of a prior distribution, the probability distribution that expresses the uncertainty about a variable before the data is taken into account [7]. Hyperparameters can be determined by statistical methods such as the L-Tangent Norm [8].
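As a minimal sketch of this step (the variable names and values below are illustrative assumptions, not the settings used in our experiments), the parameters might be initialised as follows:

```python
# Illustrative values only -- K and the iteration count have to be tested;
# enough iterations have been run once the model converges, i.e. repeated
# runs produce the same or very similar topics.
K = 10               # number of topics to find (must be chosen in advance)
n_iterations = 1000  # Gibbs sampling sweeps over the whole corpus

# Symmetric Dirichlet priors; these values are assumptions here and could
# instead be chosen with a method such as the L-Tangent Norm [8].
alpha = 0.1          # prior on each document's topic distribution (θ)
beta = 0.01          # prior on each topic's word distribution (ϕ)
```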
Initialise Topic Assignments Randomly
After initialising the parameters, the algorithm randomly assigns a topic to each word in each document.
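Continuing the sketch, and assuming the corpus is already held as `docs`, a list of documents where each document is a list of word ids (an assumption of this illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# z[d][i] is the topic currently assigned to word i of document d,
# drawn uniformly at random from the K topics to start with
z = [rng.integers(0, K, size=len(doc)) for doc in docs]
```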
Iterate
Based on the initial parameters, such as the number of topics (K), and the assignments (W and Z), the algorithm computes the frequency of words in each topic and the distribution of topics in each document.
It therefore creates the temporary variables ϕ(k) and θ(i), where ϕ(k) is the word distribution for topic k and θ(i) is the topic distribution for document i.
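Continuing the same sketch, these quantities can be derived from two count matrices (W is the vocabulary size and D the number of documents; all names here are assumptions of the illustration):

```python
import numpy as np

D = len(docs)                             # number of documents
W = max(max(doc) for doc in docs) + 1     # vocabulary size (word ids 0..W-1)

C_WT = np.zeros((W, K))                   # word-topic counts
C_DT = np.zeros((D, K))                   # document-topic counts
for d, doc in enumerate(docs):
    for i, w in enumerate(doc):
        C_WT[w, z[d][i]] += 1
        C_DT[d, z[d][i]] += 1

# phi[:, k] is the word distribution for topic k,
# theta[d]  is the topic distribution for document d
phi = (C_WT + beta) / (C_WT.sum(axis=0) + W * beta)
theta = (C_DT + alpha) / (C_DT.sum(axis=1, keepdims=True) + K * alpha)
```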
Resampling
The topic of each word in each document is resampled, given all the other words and their current topic assignments (a sketch of this step follows the list below). This resampling is ultimately answering the following questions:
- Which topics occur in this document?
- Which topics like the word Y?
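Continuing the illustrative sketch, one resampling sweep over the corpus could look as follows; the product inside the loop combines exactly the two questions above (how strongly the document uses each topic, and how strongly each topic likes the current word):

```python
for d, doc in enumerate(docs):
    for i, w in enumerate(doc):
        k_old = z[d][i]
        # take the current token out of the counts ("all the other words")
        C_WT[w, k_old] -= 1
        C_DT[d, k_old] -= 1

        # "which topics occur in this document?" x "which topics like word w?"
        p = (C_DT[d] + alpha) * (C_WT[w] + beta) / (C_WT.sum(axis=0) + W * beta)
        p /= p.sum()

        # draw a new topic from this conditional distribution and put the
        # token back into the counts under its new assignment
        k_new = rng.choice(K, p=p)
        C_WT[w, k_new] += 1
        C_DT[d, k_new] += 1
        z[d][i] = k_new
```

In the full algorithm this sweep is repeated for the chosen number of iterations, after which ϕ and θ are re-estimated from the final counts.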
Gibbs Sampling: Mathematical Description
The collection of documents is represented by a set of word indices w_i and document indices d_i for each word token i. Gibbs sampling considers each word token in the text collection in turn and estimates the probability of assigning the current word token to each topic. From this conditional distribution, a topic is sampled and stored as the new topic assignment for this word token. This conditional distribution can be written as P(z_i = j | z_{-i}, w_i, d_i, ·),
where z_i = j represents the assignment of word token i to topic j, z_{-i} refers to the topic assignments of all the other word tokens, and "·" refers to all the other known or observed information, such as all the other word tokens w_{-i}, the document indices d_{-i}, and the hyperparameters α and β. This is calculated by combining, for each topic j, how often word w_i has already been assigned to topic j with how often topic j already occurs in document d_i. Thus,

\[
P(z_i = j \mid z_{-i}, w_i, d_i, \cdot) \;\propto\; \frac{C^{WT}_{w_i j} + \beta}{\sum_{w=1}^{W} C^{WT}_{wj} + W\beta} \cdot \frac{C^{DT}_{d_i j} + \alpha}{\sum_{t=1}^{T} C^{DT}_{d_i t} + T\alpha}
\]
where C^{WT} and C^{DT} are matrices of counts with dimensions W × T and D × T respectively; C^{WT}_{w_i j} contains the number of times word w_i is assigned to topic j and C^{DT}_{dj} contains the number of times topic j is assigned to some word token in document d, in both cases not including the current instance i [9].
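As a small illustration (a sketch only; the function and argument names are assumptions), the same conditional distribution can be computed directly from the two count matrices, provided they already exclude the current token i:

```python
import numpy as np

def conditional_topic_distribution(w_i, d_i, C_WT, C_DT, alpha, beta):
    """P(z_i = j | z_-i, w_i, d_i, .) for every topic j, assuming the
    count matrices C_WT (W x T) and C_DT (D x T) exclude token i."""
    W, T = C_WT.shape
    word_term = (C_WT[w_i] + beta) / (C_WT.sum(axis=0) + W * beta)
    doc_term = (C_DT[d_i] + alpha) / (C_DT[d_i].sum() + T * alpha)
    p = word_term * doc_term
    return p / p.sum()   # normalise so the T topic probabilities sum to one
```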