Tomorrow night, President Barack Obama will give the annual State of the Union address to a joint session of Congress. Today, you can generate your own random speech with Sunlight's new State of the Union Machine, which is modeled on the language of different presidents' previous addresses.
The project generates random text, so the speeches will likely be a mix of eloquent presidential prose and uncomfortable executive dissonance. To generate a new speech, just adjust the sliders to alter the weight given to each president.
We built the State of the Union Machine using language models that randomly generate text based on different presidents' previous speeches. These models are called "n-gram models" or "Markov models," and they are used in many places, from machine translation to DNA sequencing. There is one model for each of the nine presidents we chose. Each sentence is generated by a single model, which uses input from the previous sentence as context. You can mouse over a section of the generated speech to see which president's model produced it.
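To make the slider idea concrete, here is a minimal sketch in Python of how one model might be picked per sentence in proportion to its weight. It is not the code behind the Machine: the names `pick_president`, `generate_speech`, `president_a` and `president_b` are hypothetical, and the stand-in models only return a labeled placeholder instead of real generated text.

```python
import random


def pick_president(slider_weights, rng):
    """Pick one president at random, in proportion to the slider weights."""
    names = list(slider_weights)
    weights = [slider_weights[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


def generate_speech(models, slider_weights, n_sentences, rng):
    """Build a speech one sentence at a time, one model per sentence."""
    speech = []
    previous = ""  # the first sentence has no previous sentence as context
    for _ in range(n_sentences):
        president = pick_president(slider_weights, rng)
        sentence = models[president](previous, rng)
        speech.append((president, sentence))  # keep the label for mouse-over
        previous = sentence
    return speech


def make_stub_model(label):
    """Hypothetical stand-in for a trained model: returns a placeholder."""
    def model(previous, rng):
        return f"[sentence in the style of {label}]"
    return model


# Hypothetical labels; the real project has one model per chosen president.
models = {name: make_stub_model(name) for name in ("president_a", "president_b")}
slider_weights = {"president_a": 1.0, "president_b": 0.0}

speech = generate_speech(models, slider_weights, n_sentences=5, rng=random.Random(0))
for president, sentence in speech:
    print(president, sentence)
```

With one slider at zero, that president's model is never chosen. At least one weight has to be positive, because `random.choices` raises an error when every weight is zero.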
Each model is trained on a corpus of text, from which it learns the probability of a word given its preceding context. It's a bit like a robot that learns how to fill in the blanks. For instance, models trained on recent presidents have learned that the words "my fellow" are frequently followed by the word "Americans."
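The fill-in-the-blanks idea can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not our training code: it uses a two-word context (a trigram model), the helper names are hypothetical, and the three training sentences are made up for the example, not taken from any address.

```python
from collections import Counter, defaultdict


def train_trigram_counts(sentences):
    """Count how often each word follows each two-word context."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        # Pad with start and end markers so sentence boundaries are learned too.
        tokens = ["<s>", "<s>"] + sentence.lower().split() + ["</s>"]
        for first, second, following in zip(tokens, tokens[1:], tokens[2:]):
            counts[(first, second)][following] += 1
    return counts


def next_word_probabilities(counts, context):
    """Turn the raw counts for one context into relative frequencies."""
    followers = counts.get(context, Counter())
    total = sum(followers.values())
    if total == 0:
        return {}  # context never seen in training
    return {word: n / total for word, n in followers.items()}


# Made-up toy corpus (an assumption for illustration only).
toy_corpus = [
    "my fellow americans the state of our union is strong",
    "my fellow americans we have work to do",
    "my fellow citizens tonight i report to you",
]

counts = train_trigram_counts(toy_corpus)
probs = next_word_probabilities(counts, ("my", "fellow"))
print(probs)
```

In this toy corpus, "americans" follows "my fellow" in two of three sentences, so it gets the highest probability. Generating text then amounts to repeatedly sampling the next word from these probabilities and sliding the context forward by one word.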
We trained our models on the archive of previous State of the Union addresses made available by researchers at the University of California, Santa Barbara's American Presidency Project. We used the Natural Language Toolkit's language modeling tools to train them, and all the code is available as open source in our GitHub repo.
We had a great time building it and hope you enjoy it, too!