A Token of Tolkien

The Hobbit and The Lord of the Rings are undeniably great books, with rich complex characters and a deep thoughtful story. So, this seems like an ideal case to apply an autosummary algorithm!

The method I programmed is pretty straightforward, where I find the most representative sentence within a given blob of text. This is accomplished by tokenizing each sentence, converting to a tf-idf matrix, multiplying this matrix by it's transpose to get a similarity between sentences, mapping to a graph where each node is a sentence and each edge is its similarity, then PageRanking to determine the 'best' sentence.

The 'ok, these seems reasonable, I guess' results are shown below:

Photo

Special Bonus: When I apply this autosummary to sentences that contain the word "Sam" throughout The Lord of the Rings trilogy, I get:

Photo