Markov chains are memoryless mathematical process (or a sequence) which jump from one state to another, following the rules of the Markov property. A state can be thought of as a situation/event or a set of values. One example to demonstrate a markov chain can be weather conditions; Sunny and Rainy being two weather conditions (states), one such sample of a sequence of events can be as follows:

Rainy Sunny Rainy Rainy Sunny Sunny Sunny Sunny Rainy...

The Markov chain follows the shifts or transitions based on a Transition Probability Matrix, \(T\), which contain information about how probable it is to visit state \(j\) when the current state is \(i\), for all possible states of the system (called the state space, \(S\)). The Markov property states that the conditional probability distribution of the future states depends only on the present state, not the sequence of previous states. Mathematically, assume \(X\) is a sequence of states \(x_i \in S\), then \(X = x_n, x_{n-1}, ..., x_0\) is a Markov sequence iff:

\[\mathbb{P}(X_n = x_n | X_{n-1} = x_{n-1}, ..., X_0 = x_0) = \mathbb{P}(X_n = x_n | X_{n-1} = x_{n-1})\]

Where the probability of transition is taken from \(T\), i.e.

\[\mathbb{P}(X_n = j | X_{n-1} = i) = T_{ij}; T = \begin{bmatrix} p_{11} & p_{12} & p_{13} & \dots & p_{1m} \\ p_{21} & p_{22} & p_{23} & \dots & p_{2m} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ p_{m1} & p_{m2} & p_{m3} & \dots & p_{mm} \end{bmatrix}\]

It is because of the markov property that the markov chain is called a memoryless process since there is no requirement to store the past states in the memory. The system jumps from one state to another following the probability distribution given by the transition probability matrix \(T\). An excellent interactive example of a markov chain can be found here.

Also, since markov chains predict the probability of going from a state \(i\) to state \(j\) (\(i, j \in S\)) in one step, they can also be used to predict the probability of going from state \(i\) to state \(j\) in some \(k\) number of steps. The probability of going from \(i\) to \(j\) in 2 steps (reaching an intermediate state \(p\) in between) \(i \to p \to j\) is:

\[\mathbb{P}(X_n = j | X_{n-1} = p) . \mathbb{P}(X_{n-1} = p | X_{n-2} = i) = T_{ip} . T_{pj}\]

which is essentially the element at position (\(i, j\)) in a matrix \(A = T^2\). In general, this probability for \(k\) steps can be computed from \(T_{ij}^{k}\).

Few popular applications of Markov chains include Google PageRank, Autocomplete/typing word prediction, Generating sequences of text (for sentences) or pixels (for images) etc.

TODO: Add code + example for text generation using markov chains.