B9 Indifference

13 Nov 2017

Output from my script, with Riker speaking the following: "Captain Picard, this is the USS Enterprise. I'm sorry, Worf... but I can't help you -- I don't know who any of you are. Give us everything you can to get them out of there Mister O'Brien... We're fine tuned enough to see your parents, that's your business... but we don't get in that thing, I guarantee you won't either."

More captains for NaNoGenMo 2017, but I have moved from the eighteenth century to the twenty-fourth! I also moved from simpler random text generation to something slightly more sophisticated – Markov chains.

I can’t pretend to understand the mathematical underpinnings of Markov chains, but I was able to find the wonderful markovify package to do a lot of the heavy lifting for me. With it I was able to take a body of text – in my case a collection of Star Trek: The Next Generation scripts – and use that to generate random sentences that are probabilistically based on the originals. The package also does a good job of making sure that the sentences it generates don’t overlap too closely with those originals. It seems to build each sentence up word by word, picking a likely word to follow the words it has just produced, and you can almost see it doing this, and see how it gets it wrong:

PICARD: I’m wary about making changes in this time period, the Klingons have taken over the Romulan Empire… But more than that, I was beginning to get used to the idea of death has a terrible sense of finality to it. Shock, certainly, at the sight of your friends and family.
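To make the word-by-word idea concrete, here is a minimal from-scratch sketch of a Markov text generator using only the standard library. It is not markovify's code or API: the names build_model, generate, BEGIN, END and CORPUS are made up for illustration, and the two corpus lines are invented stand-ins rather than real script lines. One thing it shows is that the next word is drawn at random in proportion to how often it followed the current state, so it is a likely word rather than always the single most likely one.

```python
import random
from collections import defaultdict

# Hypothetical sentinel tokens marking the start and end of a sentence.
BEGIN, END = "___BEGIN__", "___END__"


def build_model(sentences, state_size=2):
    """Map each run of `state_size` words to the words seen right after it."""
    model = defaultdict(list)
    for sentence in sentences:
        words = [BEGIN] * state_size + sentence.split() + [END]
        for i in range(len(words) - state_size):
            state = tuple(words[i:i + state_size])
            model[state].append(words[i + state_size])
    return model


def generate(model, state_size=2, rng=random):
    """Walk the chain one word at a time until the END marker is drawn."""
    state = (BEGIN,) * state_size
    out = []
    while True:
        # A follower seen more often appears more often in the list,
        # so this is a weighted random draw, not an argmax.
        word = rng.choice(model[state])
        if word == END:
            return " ".join(out)
        out.append(word)
        # Slide the window: forget the oldest word, remember the newest.
        state = state[1:] + (word,)


# Invented stand-in corpus (assumption: NOT taken from the real scripts).
CORPUS = [
    "the idea of death has a terrible sense of finality",
    "I was beginning to get used to the idea of living here",
]

model = build_model(CORPUS, state_size=2)
print(generate(model, state_size=2))
```

Because both stand-in lines contain "the idea of", the walk can enter that phrase from one line and leave it along the other, which is the same kind of splice visible in Picard's line.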

The markovify package gives you some dials you can tweak to fine-tune your model, and one of them is state_size – “State size is a number of words the probability of a next word depends on.” I found two or three to be the right number for my model overall, but you can see how easily it gets tripped up in Picard’s line above, here with a state size of three:

given “the idea of” it picks “death”;
given “idea of death” it picks “has”;
given “of death has” it picks “a”; and
given “death has a” it picks “terrible”

At that point the sense from the beginning of the sentence has completely changed and it no longer scans; it’s almost like the model doesn’t quite know when to quit while it’s ahead! I do enjoy these little moments, though, where you can almost “see” the underlying model, this invisible mathematical force at work.
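The sliding window in that walk-through can be reproduced with a few lines of Python. This is only a sketch for illustration: trace_states is a hypothetical helper, not part of markovify, and it simply lists each three-word state alongside the word that came next in the generated fragment.

```python
def trace_states(text, state_size=3):
    """Pair every run of `state_size` words with the word that follows it."""
    words = text.split()
    return [
        (tuple(words[i:i + state_size]), words[i + state_size])
        for i in range(len(words) - state_size)
    ]


# Fragment taken from the generated Picard line quoted earlier.
for state, next_word in trace_states("the idea of death has a terrible"):
    print(" ".join(state), "->", next_word)

# the idea of -> death
# idea of death -> has
# of death has -> a
# death has a -> terrible
```

By the time the state is "of death has", the words "get used to" have already slid out of the window, so nothing in the model's view says the sentence was ever about getting used to something.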

GitHub repo