# Markov assumption and regression

When doing unsupervised learning with sequences (well… it’s not entirely unsupervised, but still), one possible way to map our problem to a supervised learning problem is to make the Markov assumption of order $k$.

To clarify: for a sequence of size $T$ we are trying, for example, to maximize the likelihood $\mathbb{P}(x_{1}, x_{2}, \dots, x_{T-1}, x_{T}) = \prod_{t=1}^{T}{\mathbb{P}(x_{t} \mid x_{1}, \dots, x_{t-1})}$. The Markov assumption of order $k$ states that $\mathbb{P}(x_{t} \mid x_{1}, \dots, x_{t-1}) = \mathbb{P}(x_{t} \mid x_{t-k}, \dots, x_{t-1})$. AR(p) is a special case (the linear one) of such a model.

So we just have to define such a conditional model (including the limit case for the first $k$ steps) in the same way we would define a supervised regression model, thus obtaining a probabilistic model whose likelihood we maximize.
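As a minimal sketch of this idea (toy data and helper names are mine, not from the actual code): under an order-$k$ Markov assumption, sliding a window of length $k$ over the sequence turns it into a supervised regression dataset, and fitting a linear least-squares model on those windows is exactly the AR(k) special case mentioned above.

```python
import numpy as np

def make_windows(x, k):
    """Order-k Markov windowing: predict x[t] from (x[t-k], ..., x[t-1])."""
    X = np.array([x[t - k:t] for t in range(k, len(x))])
    y = np.array([x[t] for t in range(k, len(x))])
    return X, y

# Toy AR(2) sequence: x_t = 0.6 x_{t-1} - 0.2 x_{t-2} + noise
rng = np.random.default_rng(0)
x = np.zeros(500)
for t in range(2, len(x)):
    x[t] = 0.6 * x[t - 1] - 0.2 * x[t - 2] + 0.1 * rng.standard_normal()

# Linear least squares on the windows = fitting an AR(2) model,
# i.e. the linear special case of the conditional model above.
X, y = make_windows(x, k=2)
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coef)  # columns are (x_{t-2}, x_{t-1}); should be close to (-0.2, 0.6)
```

A nonlinear conditional model (e.g. an MLP taking the same $k$-length window as input) drops into the same setup, which is the point of the mapping.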

Anyway, this is just to try to justify the name here.

P.S.: There is no preprocessing yet. I’m not using phones AND phonemes. If people want to add that, feel free to make a separate branch.

P.P.S.: I’m working with Vincent on a more pylearn2-friendly implementation of this.

EDIT: Here are some occurrences of this kind of modeling:

I might have forgotten some. If so please tell me.