Monthly Archives: February 2014

# Breaking the sinus

OK, I’ve seen some experiments done. That’s great.

However, I have a small remark about generating from the model. I have mostly seen people using mean squared error, which can be put in this probabilistic setting by saying that $\mathbb{P}(x_{t} \mid x_{t-k}, \dots, x_{t-1}) = \mathcal{N}(x_{t};\, \mathrm{MLP}(x_{t-k}, \dots, x_{t-1}),\, \sigma^{2})$ for some fixed $\sigma^{2}$, which is also equivalent to maximizing the log-likelihood $-\frac{(x_{t} - \mathrm{MLP}(x_{t-k}, \dots, x_{t-1}))^{2}}{2\sigma^{2}} + \text{const}$.

One way to find this $\sigma^{2}$ after training the MLP is to use the estimator $\hat{\sigma}^{2} = \frac{1}{N} \sum_{i}{\left(x_{t}^{(i)} - \mathrm{MLP}(x_{t-k}^{(i)}, \dots, x_{t-1}^{(i)})\right)^{2}}$, where $i$ runs over the $N$ training examples (and possibly the validation ones).
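In code, the estimator above is just the mean squared residual. A minimal sketch (the function name and toy arrays are mine, not from the repo; `predictions` stands for the MLP outputs on the training windows):

```python
import numpy as np

def estimate_sigma2(targets, predictions):
    """Maximum-likelihood estimate of the fixed Gaussian noise
    variance: the mean squared residual over all examples."""
    residuals = np.asarray(targets) - np.asarray(predictions)
    return np.mean(residuals ** 2)

# Toy check: predictions off by a constant 0.5 give sigma^2 = 0.25.
targets = np.array([1.0, 2.0, 3.0])
predictions = targets - 0.5
print(estimate_sigma2(targets, predictions))  # 0.25
```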

You can then sample from $\mathcal{N}(\mathrm{MLP}(x_{t-k}, \dots, x_{t-1}),\, \hat{\sigma}^{2})$ instead of taking the mean. Randomization might break convergence to some attractor behavior like the ones seen here (sinusoid) and here (flatline) and might give some more interesting results.
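A minimal generation loop under this scheme might look like the following sketch; the one-step predictor here is a placeholder lambda standing in for the trained MLP, and all names are hypothetical:

```python
import numpy as np

def generate(predict, seed, sigma2, n_steps, rng=None):
    """Sample autoregressively: at each step draw
    x_t ~ N(predict(last k values), sigma2) instead of
    propagating the deterministic mean."""
    rng = rng if rng is not None else np.random.RandomState(0)
    k = len(seed)
    history = list(seed)
    for _ in range(n_steps):
        mean = predict(history[-k:])
        history.append(mean + rng.normal(0.0, np.sqrt(sigma2)))
    return history[k:]

# Placeholder predictor: the mean of the window.
samples = generate(lambda w: sum(w) / len(w), seed=[0.0, 1.0],
                   sigma2=0.01, n_steps=5)
print(len(samples))  # 5
```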

P.S: Other options include having $\sigma$ be a function of $(x_{t-k}, \dots, x_{t-1})$ as well, and then minimizing the log-likelihood cost $\frac{(x_{t} - \mathrm{MLP}(x_{t-k}, \dots, x_{t-1}))^{2}}{2\sigma^{2}(x_{t-k}, \dots, x_{t-1})} + \frac{1}{2}\log \sigma^{2}(x_{t-k}, \dots, x_{t-1})$.
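For that heteroscedastic option, the per-example cost is the Gaussian negative log-likelihood. A sketch of just the cost (how `mu` and `sigma2` are produced by the network is left out; names are mine):

```python
import numpy as np

def gaussian_nll(x, mu, sigma2):
    """Negative log-likelihood of x under N(mu, sigma2); minimizing
    this jointly fits the mean and the input-dependent variance."""
    return 0.5 * (np.log(2 * np.pi * sigma2) + (x - mu) ** 2 / sigma2)

# With sigma2 = 1 this reduces to 0.5*(x - mu)^2 plus a constant.
print(gaussian_nll(1.0, 0.0, 1.0))  # 0.5 * (1 + log(2*pi))
```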

# Markov assumption and regression

When doing unsupervised learning (well… it’s not entirely unsupervised, but still) with sequences, a possible way to map our problem to a supervised learning problem is to make the Markov assumption of order $k$.

To clarify, in a sequence of size $T$ we are trying, for example, to maximize the likelihood $\mathbb{P}(x_{1}, x_{2}, \dots, x_{T-1}, x_{T}) = \prod_{t=1}^{T}{\mathbb{P}(x_{t} \mid x_{1}, \dots, x_{t-1})}$. The Markov assumption of order $k$ states that $\mathbb{P}(x_{t} \mid x_{1}, \dots, x_{t-1}) = \mathbb{P}(x_{t} \mid x_{t-k}, \dots, x_{t-1})$. AR(p) is a special case (the linear one) of such a model.

So we just have to define such a conditional model (including the limit case for the first $k$ steps) in the same way we would define a supervised regression model, thus obtaining a probabilistic model whose likelihood we maximize.
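Concretely, the mapping to a supervised regression problem is just windowing. A minimal sketch (function name is mine):

```python
import numpy as np

def make_windows(sequence, k):
    """Map one sequence to supervised (input, target) pairs under a
    Markov assumption of order k: the input is the k previous
    samples, the target is the next one."""
    x = np.asarray(sequence)
    inputs = np.stack([x[t - k:t] for t in range(k, len(x))])
    targets = x[k:]
    return inputs, targets

X, y = make_windows([0, 1, 2, 3, 4], k=2)
print(X.shape, y.shape)  # (3, 2) (3,)
```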

Anyway, this is just to try to justify the name here.

P.S: There is still no preprocessing yet. I’m not using phones AND phonemes. If people want to add that, feel free to make a separate branch.

P.P.S: I’m working with Vincent for a more pylearn2-friendly implementation of this.

EDIT: Here are some occurrences of this kind of modeling:

I might have forgotten some. If so please tell me.

# TIMIT in a class

So I wrapped TIMIT into a class. You can use it as you see fit.

I haven’t added any preprocessing (centering, normalization, wavelets, Fourier transform, LPC…). (EDIT: I do however use the segment_axis function used by João here to cut the sequence into frames; copy this file into your Python path.)
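To give an idea of what the framing does, here is a simplified stand-in for segment_axis (only an illustration, not the actual function; the real one also handles padding of the tail):

```python
import numpy as np

def frames(signal, length, overlap=0):
    """Cut a 1-D signal into frames of `length` samples, with
    adjacent frames sharing `overlap` samples. Frames that would
    run past the end of the signal are simply dropped."""
    step = length - overlap
    x = np.asarray(signal)
    n = (len(x) - overlap) // step
    return np.stack([x[i * step:i * step + length] for i in range(n)])

f = frames(np.arange(8), length=4, overlap=2)
print(f)
# [[0 1 2 3]
#  [2 3 4 5]
#  [4 5 6 7]]
```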

This class uses a reduced set of phonemes, as the same phoneme can be written (and is written) in multiple ways (mentioned here).

# Those who can’t, teach machine to do it

People seem skeptical of the processing I have done on the data.

I’m fine with that.

Because, actually, just by looking at the integer vector, I can’t really tell whether it’s supposed to be a sound or whether someone has been playing a prank on me by replacing the meaningful waveform vectors with random ones, nor whether the data is raw or in another representation like wavelets or MFCC. It’s actually somewhat interesting that we expect our machine learning algorithm to figure this out.

So, I’ve made a Python script to check that the vectors make sense. I pick a random sentence in the training data, look at its waveform and the corresponding phonemes and words, and output a .WAV file. I also output the features of the speaker so I can check whether the voice fits.
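The .WAV-writing part of such a check fits in a few lines. A sketch (this writes a synthetic 440 Hz tone rather than an actual TIMIT sentence; TIMIT audio is 16 kHz 16-bit, and the path is hypothetical):

```python
import numpy as np
from scipy.io import wavfile

rate = 16000  # TIMIT sampling rate (Hz)
t = np.arange(rate) / rate  # one second of samples

# Synthetic stand-in for a loaded waveform vector: a 440 Hz tone,
# scaled into the 16-bit integer range expected by .WAV files.
waveform = (0.3 * np.sin(2 * np.pi * 440 * t) * 32767).astype(np.int16)

wavfile.write("check.wav", rate, waveform)  # listen to verify
```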

It’s supposed to say “Diane may splurge and buy a turquoise necklace”.

It does.

Also, reading the script might help you understand how to use the .npy and .pkl files.

P.S: On an unrelated note, while I would expect words not to bring much information over the phonemes, I would consider the final punctuation to be obviously important for learning prosody (assertion, question, exclamation…). So important that I haven’t included this feature yet…

P.P.S: Now, if I wanted to transform the data into mainstream representations like the Fourier transform or wavelets, I might want to try the signal processing (scipy.signal) and discrete Fourier transform (scipy.fftpack) scipy packages.
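As a sketch of what that would look like for one frame (all parameters here are my own choices, not anything settled): the magnitude spectrum of a pure 1 kHz tone, sampled at 16 kHz with a 256-sample frame, should peak at the bin nearest 1000 Hz.

```python
import numpy as np
from scipy import fftpack

rate, n = 16000, 256
t = np.arange(n) / rate
frame = np.sin(2 * np.pi * 1000 * t)  # 16 full cycles in the frame

# Keep only the positive-frequency half of the spectrum.
spectrum = np.abs(fftpack.fft(frame))[:n // 2]
freqs = fftpack.fftfreq(n, d=1.0 / rate)[:n // 2]

print(freqs[np.argmax(spectrum)])  # ~1000.0 (Hz)
```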