Alright.

First, about Waffles. It seems to be mostly a toolkit for supervised learning, which means "given a set of things from class A and a set of things from class B, generate a model that allows me to classify things into A and B automatically". It also has tools for clustering, that is, "given this data, figure out what classes there are (possibly with the number of classes as input)". Neither of these is necessarily useful for generating new things that are 'like' some given input.

For doing that, the typical Thing To Do (tm) is training a Markov model with your data. In short:

A Markov model consists of "states" and "transitions", where the transitions have "probabilities" and "emissions". For every "state", there are a number of "transitions" to different "states", each of which is taken with some given "probability" and then "emits" something. It's probably easiest to look at this graphically.

This is a graphical representation of a Markov model. The circles are states, and the arrows are transitions. The things in "" are emissions, and the numbers are probabilities. The transition probabilities here depend only on the current state, making this a "first order" Markov model. If they depended on the previous two states, it would be a second order model, etc. If they didn't depend on any state at all, it would be a 0th order Markov model, but that's silly so let's ignore that.

It should be relatively obvious how to generate new things when you have a model like this: you just start in the start state, pick a transition according to the probabilities given, append that transition's emission to the output, and keep doing that until you land in the end state. The actual problem is figuring out the model, given some input data.
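To make the generation loop concrete, here is a minimal sketch in Python. The model, its words and its probabilities are all made up for illustration, and I'm assuming each transition simply emits the name of the state it leads to.

```python
import random

# Hypothetical hand-written model: state -> list of (next state, probability).
# Assumption: a transition emits the name of the state it leads to.
MODEL = {
    "START": [("the", 1.0)],
    "the": [("cat", 0.5), ("dog", 0.5)],
    "cat": [("sleeps", 1.0)],
    "dog": [("sleeps", 0.7), ("barks", 0.3)],
    "sleeps": [("END", 1.0)],
    "barks": [("END", 1.0)],
}


def generate(model, rng=random):
    """Walk from START to END, collecting the emission of each transition."""
    state = "START"
    output = []
    while True:
        next_states, weights = zip(*model[state])
        # Pick one transition according to the given probabilities.
        state = rng.choices(next_states, weights=weights, k=1)[0]
        if state == "END":
            return output
        output.append(state)


print(" ".join(generate(MODEL)))
```

Passing a seeded `random.Random` instance as `rng` makes the output repeatable, which is handy when you're debugging a model.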

There are two cases here, one being a bit harder but more powerful, and one being a little less powerful, but easier. The more powerful thing is "hidden" Markov models - hidden referring to the state sequence that is passed through, which you can't observe directly. They are generally a pain, are trained with variations of the Baum-Welch algorithm, and generally only first order models are used. They are important for speech recognition and the like, but since we only want to generate text, and need not assume hidden states, we can do something easier.

Simply put, we count how often each input symbol follows each state (state being the last input symbol - or, for an nth order Markov model, the last n symbols), and finally divide each count by the total number of occurrences of that state to get probabilities. So, for example, given:

**START** I love cake **END**
**START** I love pancakes **END**
**START** I really love cake **END**
**START** You love cake **END**

We'd get the states:
**START**, I, love, cake, really, pancakes, You, **END**

With occurrence counting:
"**START**" => 3 * I, 1 * You
"I" => 2 * love, 1 * really
"love" => 3 * cake, 1 * pancakes
[...etcetera]
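The counting step can be sketched in a few lines of Python. This is just one way to do it, assuming whitespace-separated words and plain `START`/`END` marker strings; the function name `train` is my own.

```python
from collections import Counter, defaultdict

SENTENCES = [
    "I love cake",
    "I love pancakes",
    "I really love cake",
    "You love cake",
]


def train(sentences):
    """Count which word follows which, then turn the counts into probabilities."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = ["START"] + sentence.split() + ["END"]
        # Each adjacent pair (current, following) is one observed transition.
        for current, following in zip(tokens, tokens[1:]):
            counts[current][following] += 1

    # Divide each count by the total for its state to get probabilities.
    model = {}
    for state, followers in counts.items():
        total = sum(followers.values())
        model[state] = {word: n / total for word, n in followers.items()}
    return counts, model


counts, model = train(SENTENCES)
print(dict(counts["love"]))  # {'cake': 3, 'pancakes': 1}
print(model["START"])        # {'I': 0.75, 'You': 0.25}
```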

So now we have a model, which can then generate text, which might or might not make a lot of sense. This model, for example, might generate "You love pancakes". It will never generate "You really love cake" or "I love you", since the training data never had "really" following "You", or "you" following "love". This example is too small to produce real nonsense, but as soon as you have a little more training data, grammar goes out of the window fast. Higher order models might make a little more sense, but might become overtrained and mostly reproduce their input instead of generating new things. Generally, second and third order is pretty decent at not being *too* nonsensical.
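To play with the order yourself, here is a rough sketch of an nth order version, where the state is a tuple of the last n words. The names `train` and `generate` and the `START`/`END` padding scheme are my own choices, and it assumes `order` is at least 1.

```python
import random
from collections import Counter, defaultdict

SENTENCES = [
    "I love cake",
    "I love pancakes",
    "I really love cake",
    "You love cake",
]


def train(sentences, order=2):
    """Count followers of each state, a state being the last `order` words."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        # Pad with `order` START markers so the first word has a full state.
        tokens = ["START"] * order + sentence.split() + ["END"]
        for i in range(order, len(tokens)):
            state = tuple(tokens[i - order:i])
            counts[state][tokens[i]] += 1
    return counts


def generate(counts, order=2, rng=random):
    """Sample words until END is drawn, weighting by the raw counts."""
    state = ("START",) * order
    output = []
    while True:
        followers = counts[state]
        word = rng.choices(list(followers), weights=list(followers.values()), k=1)[0]
        if word == "END":
            return " ".join(output)
        output.append(word)
        # Slide the window: drop the oldest word, append the new one.
        state = state[1:] + (word,)


print(generate(train(SENTENCES, order=1), order=1))
print(generate(train(SENTENCES, order=2), order=2))
```

With `order=2` and only these four sentences, every two-word state has been seen in just one context, so the model can only reproduce the training sentences. That's the overtraining effect in miniature. With `order=1` it can also come up with new ones like "You love pancakes".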