My NaNoWriMo Stats

12/21/2013

by Gabe Koss

On a total whim, I decided to participate in National Novel Writing Month. This is a month-long writing marathon in which participants attempt to write a 50,000-word novel during November. I cheated a little bit and started on October 26.

Total Words	54173
October Words	2917
November Words	51256
Avg Words/Day (Nov)	1709

Progress Over Time

The vertical axis represents the word count of the story as it grew. Each bar shows the total number of words reached by the end of that day. Hovering over a bar shows the exact number of words reached on that date. The light line traces the word count at each substantial save.
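Since the story lived in Git, the per-save and per-day totals behind a chart like this could be pulled from the commit history. What follows is only a sketch and not the script used for this post: it assumes each substantial save is a commit, that the manuscript is story.md at the root of the repository, and the function names are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helpers; run them from the root of the repository.

# Print "YYYY-MM-DD word_count" for every commit touching a file, oldest first.
word_counts_by_commit() {
  file="$1"
  git log --reverse --date=short --format='%H %ad' -- "$file" |
  while read -r sha day; do
    # git show prints the file as it was at that commit; wc -w counts words.
    count=$(git show "$sha:$file" </dev/null | wc -w | tr -d ' ')
    echo "$day $count"
  done
}

# Reduce the per-commit list on stdin to one line per day,
# keeping the count from the last commit made on that day.
daily_totals() {
  awk '{ last[$1] = $2 } END { for (d in last) print d, last[d] }' | sort
}
```

Running `word_counts_by_commit story.md` would give the points for a per-save line, and piping that through `daily_totals` would give one value per day for the bars.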

Common Words

After excluding common English stop words such as "that" or "is", the 10 most common words in my story were as follows:

sage	631 instances
out	315 instances
rama	249 instances
back	184 instances
one	165 instances
down	144 instances
looked	139 instances
here	138 instances
more	125 instances
know	124 instances

Common Bigrams

Bigrams are two-word units such as "depraved heathen" or "kind soul". The most common two-word groupings were as follows:

of the	390 instances
in the	222 instances
to the	188 instances
on the	163 instances
into the	151 instances
she had	107 instances
was a	104 instances
from the	92 instances
out of	91 instances
she was	90 instances

Code snippets:

I wrote the story with Vim and tracked my progress with Git. I did the analysis on this data using a combination of Ruby, D3.js and the Linux command line. Much of my data analysis was inspired by the classic Unix for Poets.

Here are some of the tools I used to do this analysis.

Extract top 10 words:

tr -sc '[A-Z][a-z]' '[\012*]' < story.md | tr '[A-Z]' '[a-z]' | sort | grep -E -v '^.{0,2}$' | grep -E -v -f ../stop_words.grep | uniq -c | sort -n | tail -n 10
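The contents of stop_words.grep are not shown here, so the following is a guess at one workable layout rather than the actual file: one anchored pattern per line, such as ^that$, so that only whole words are dropped. The sketch builds such a file from a plain word list and wraps the same pipeline shape in a function. The names make_stop_patterns and top_words are invented for illustration.

```shell
#!/bin/sh
# Hypothetical helper: turn a plain word list (one word per line) into
# anchored patterns such as ^that$ so grep only drops whole-word matches.
make_stop_patterns() {
  sed 's/.*/^&$/' "$1"
}

# Same pipeline shape as the one-liner above, reading text on stdin.
# $1 is the pattern file; $2 is how many words to print (default 10).
top_words() {
  tr -sc 'A-Za-z' '\n' |          # one word per line
    tr 'A-Z' 'a-z' |              # fold to lower case
    grep -E -v '^.{0,2}$' |       # drop words of two letters or fewer
    grep -E -v -f "$1" |          # drop stop words
    sort | uniq -c | sort -n | tail -n "${2:-10}"
}
```

For example, `make_stop_patterns stop_words.txt > stop_words.grep` followed by `top_words stop_words.grep < story.md` would print the ten most frequent remaining words with their counts. Anchoring matters: an unanchored pattern like "the" would also throw away "theme" and "other".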

Extract top 10 bigrams:

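tr -sc '[A-Z][a-z]' '[\012*]' < story.md > nano.words
# nano.next is nano.words shifted up by one line, so paste pairs each word with the word that follows it
tail -n +2 nano.words > nano.next
paste nano.words nano.next | sort | uniq -c | sort -n > nano.bigrams
tail -n 10 nano.bigrams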
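The same pairing can be done without intermediate files. This variant is a sketch and not what was run for the numbers above: it swaps the paste trick for a short awk program that remembers the previous word, and it folds everything to lower case first so that "She was" and "she was" count together. The function name top_bigrams is invented for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the most common bigrams in the text on stdin.
# $1 is how many bigrams to print (default 10).
top_bigrams() {
  tr -sc 'A-Za-z' '\n' |          # one word per line
    tr 'A-Z' 'a-z' |              # fold to lower case
    grep -v '^$' |                # drop the empty line left by leading punctuation
    awk 'NR > 1 { print prev, $0 } { prev = $0 }' |   # pair each word with its predecessor
    sort | uniq -c | sort -n | tail -n "${1:-10}"
}
```

Usage would be `top_bigrams < story.md`, or `top_bigrams 25 < story.md` for a longer list.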