Last month Nicholas Rougeux released a series of posters titled “Sonnet Signatures”, which are computer-generated visualizations of William Shakespeare’s 154 sonnets. Each visualization is unique, and they all look like abstract calligraphic symbols. Overall, they have a remarkably “Chinese” or “Japanese” look to them, which is a marvelous, if ahistorical, association for Will’s poetry.
Let’s be clear: these glyphs are very abstract, and have no obvious semantic relation to the texts, their oral performance, or their emotional significance. If anything, I would say that the stark, pristine strokes are the antithesis of the deep, passionate emotional content of the sonnets.
I’m no expert on poetry or these sonnets, but I do know this: these works were created to be oral recitations. The ink on paper, and now the pixels on a screen, are not in any way the intended representation of the work; they are just mnemonics for oral interpretations.
By extension, the visualizations by Rougeux are yet another irrelevant representation and, because they are based on the written text, a second-order irrelevancy. It would make more sense to visualize a digital recording of an oral reading of the sonnets, no?
In any case, curiosity drove me to wonder what techniques were used to generate these striking representations, however misguided. I was not surprised that the technique is simple and shallow. (“Simple and shallow” is not necessarily a bad thing for an algorithm, especially if it yields interesting results.)
The actual technique does simple arithmetic, coding each letter as a number (a = 1, b = 2, etc.), and computing the average numeric value for each line of the sonnet. So, each sonnet should have 14 numbers, one per line. These numbers are then visualized by plotting them and tracing a line through the points in the order of the poem. The line is rendered to suggest a calligraphic stroke, fat at the start and steadily thinning until it disappears at the end.
It’s just that simple.
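As a rough illustration, here is a minimal Python sketch of the arithmetic as I understand it from the description above. This is not Rougeux’s code: the function names are my own, and I am assuming that spaces, punctuation, and upper versus lower case are ignored. The sketch stops at the list of numbers, since how they are placed on the page and drawn as a stroke is a separate rendering step.

```python
import string


def line_average(line):
    """Average letter value of one line, with a = 1, b = 2, ..., z = 26.

    Assumption: anything that is not an ASCII letter is simply skipped.
    """
    values = [
        ord(ch) - ord("a") + 1
        for ch in line.lower()
        if ch in string.ascii_lowercase
    ]
    # A line with no letters gets 0.0 rather than a division error.
    return sum(values) / len(values) if values else 0.0


def sonnet_signature(text):
    """One number per non-blank line; a 14-line sonnet yields 14 numbers."""
    return [line_average(line) for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    opening = (
        "Shall I compare thee to a summer's day?\n"
        "Thou art more lovely and more temperate:"
    )
    for value in sonnet_signature(opening):
        print(round(value, 3))
```

Feeding in a full sonnet gives the 14 points that would then be connected in order.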
Looking at the method, we can see immediately that each visualization will be different, though this represents only the effect of the encoding scheme, not anything particular about the words themselves, or the overall poem. His comment that “No two are the same—or even similar” is vacuous.
It is also clear that this encoding has nothing to do with meaning, in any sense of that word. The specific letters used to spell out the words are arbitrary in the first place (for English spelling is not phonetic and generally chaotic), and the chosen encoding (“a” = “1”) is a further level of arbitrariness. (“a” could equal any number you want, right?) Taking the average of these meaningless numbers yields a meaningless number, and one that tosses away a lot of information contained in the encoding.
The word “any” has the “average” (1 + 14 + 25) / 3 = 13.333, which equals “m” and 1/3. In turn the word “some” has an average of (19 + 15 + 13 + 5) / 4 = 13.0, which equals “m”. These numbers are pure nonsense.
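The arithmetic is easy to check. The snippet below is a small sketch with a helper name of my own choosing (`word_average`), using the a = 1 coding described in this post. It also shows the information loss directly: the four-letter word “some” and the single letter “m” come out as the same number.

```python
def word_average(word):
    """Average letter value of a word, with a = 1, b = 2, ..., z = 26."""
    values = [ord(ch) - ord("a") + 1 for ch in word.lower() if "a" <= ch <= "z"]
    if not values:
        raise ValueError("word contains no letters a-z")
    return sum(values) / len(values)


print(round(word_average("any"), 3))   # 13.333
print(word_average("some"))            # 13.0
print(word_average("m"))               # 13.0
```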
Now, there are ways that this idea could be done with much more respect for the actual poetry. First of all, sonnets are composed with great attention to syllables (not letters). Perhaps one could assign numeric values to syllables, and compute a representative number for each line.
Furthermore, the lines are rhythmic, with different emphasis on the syllables, and with rhymes. I note that people have been diagramming these structures for centuries, so there are definitely ways to encode these artifacts. Wouldn’t it make more sense to apply the visualization to the poetic pattern, rather than the meaningless atomic letters?
Even better, shouldn’t the input be a digitized voice, reciting the poems? Speech analysis is much more difficult than trivial text analysis, but I’m sure that a reading could be represented by a handful of interesting features, which could then be visualized. (It would be even cooler to generate the visualization in real time, as the recording plays. Watch the glyph unfold as the reader speaks the lines.)
Am I being unfair to Rougeux? Am I taking his work out of context, applying inappropriate technical demands, and generally being a pest?
I don’t think so.
He admits that this technique is shallow, but he thinks the results are provocative and interesting.
“Connections between the shape and the meaning of a sonnet is coincidental but a welcome interpretation. The signatures are not meant to assign meaning but to inspire others to think about them differently than before.”
Well–he inspired me to think about this stuff more carefully, so he has succeeded, and I am following his intentions.
He also suggests that,
“What’s more interesting to consider is the hidden shapes revealed by looking at centuries-old poetry through a different lens.”
I don’t know about that. These shapes aren’t so much “hidden” and “revealed” as just plain made up.
One more point.
These visualizations are actually an interesting device to illustrate the way that simple arithmetic on digital text can generate “signatures” that distinguish texts from each other. This, of course, is the principle underlying checksums (used for error checking), digital hashing (used in cryptography and digital signatures), compression and encryption.
For instance, you can immediately see how these visualizations form a digital fingerprint for each sonnet. We can immediately tell one from another by comparing the glyphs (the 14 numbers in each), and we can reject a fake sonnet or a modified version, because the signature will not match any of the
This principle underlies important technologies including passwords and passphrases, document verification, and even cryptocurrencies such as Bitcoin. These visualizations are quite attractive, and they show the concept quite nicely. In other words, this is a nice approach, though not necessarily for the reasons Rougeux suggests.
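To make the fingerprint idea concrete, here is a hedged Python sketch that compares two versions of a text in two ways: with a letter-average signature like the one described in this post, and with SHA-256 from Python’s standard `hashlib` module as a stand-in for the hashing used in real systems. The helper names are mine, and the “altered” text is just a made-up example in which one word has been changed.

```python
import hashlib


def signature(text):
    """Tuple of per-line letter averages (a = 1 ... z = 26), rounded to 3 places."""
    sig = []
    for line in text.splitlines():
        values = [ord(ch) - ord("a") + 1 for ch in line.lower() if "a" <= ch <= "z"]
        if values:  # skip lines with no letters
            sig.append(round(sum(values) / len(values), 3))
    return tuple(sig)


def sha256_hex(text):
    """Hex digest of the UTF-8 bytes of the text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


original = (
    "Shall I compare thee to a summer's day?\n"
    "Thou art more lovely and more temperate:"
)
# Hypothetical tampered copy: "day" replaced by "eve" in the first line.
altered = original.replace("day", "eve")

print(signature(original) == signature(altered))    # False
print(sha256_hex(original) == sha256_hex(altered))  # False
```

Only the first number in the signature changes here, because only the first line was edited. The cryptographic digest changes as a whole, and unlike a simple average it is designed to make two different texts with the same value impractical to find.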