When doing unsupervised learning (well… it’s not entirely unsupervised but still) with sequences a possible way to map our problem to a supervised learning problem is making the Markov assumption of order k.
To clarify, in a sequence of size we are trying for example to maximize the likelihood . The Markov assumption of order k is stating that . AR(p) is a special case (the linear one) of such model.
So we just have to define such conditional model (including the limit case for the k first step) in the same way we would define a supervised regression model, thus obtaining a probabilistic model with which we maximize likelihood.
Anyway, this is just to try to justify the name here.
EDIT: Here are some occurrences of such modelization:
I might have forgotten some. If so please tell me.