This series uses the same bigram dataset as the word
spectrum visualization. Please refer
to that page for an extended description of the data and processing.
To eliminate occlusion, I developed an entirely different layout.
Now, instead of a continuous spectrum of words, words are bucketed
into one of 25 different rays. Each ray represents a different
tendency of use (ranging from 0 to 100% in 4% intervals). Words
are sorted by decreasing frequency within each ray. I render as
many words as can fit onto the canvas. There is a nice visual analogy
at play: the "lean" of each ray represents the strength
of the tendency towards one of the two terms. As in the word
spectrum visualization, font size is based on an inverse power
function (set separately for each visualization, so font sizes can't be
compared across pieces). Common words (a, the, for, as, etc.) are not shown.
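The bucketing, sorting, and sizing steps described above can be sketched roughly as follows in Python. This is only an illustration under stated assumptions, not the code used to produce the pieces: the function names, the definition of "tendency" as a word's share of pairings with the first term, the use of rank as the input to the power function, and the `max_size`, `exponent`, and `min_size` values are all hypothetical.

```python
from collections import defaultdict

NUM_RAYS = 25  # 0 to 100% in 4% intervals


def ray_index(count_a, count_b):
    """Pick a ray (0..24) from a word's bigram counts with each of the two terms.

    Assumption: tendency = count_a / (count_a + count_b). Integer arithmetic
    avoids floating-point trouble at bucket boundaries.
    """
    total = count_a + count_b
    # A word used only with term A (100%) is clamped into the last ray.
    return min(count_a * NUM_RAYS // total, NUM_RAYS - 1)


def build_rays(counts, stop_words=frozenset()):
    """Group words into rays, most frequent first, skipping common words."""
    rays = defaultdict(list)
    for word, (count_a, count_b) in counts.items():
        if word in stop_words or count_a + count_b == 0:
            continue
        rays[ray_index(count_a, count_b)].append((word, count_a + count_b))
    for words in rays.values():
        words.sort(key=lambda pair: pair[1], reverse=True)
    return dict(rays)


def font_size(rank, max_size=48.0, exponent=0.5, min_size=4.0):
    """Hypothetical inverse power function: size shrinks as rank ** -exponent."""
    return max(min_size, max_size * rank ** -exponent)


# Made-up counts for illustration: word -> (uses with term A, uses with term B)
example_counts = {
    "peace": (10, 90),
    "battle": (20, 180),
    "fight": (30, 70),
    "win": (100, 0),
    "the": (500, 500),
}
example_rays = build_rays(example_counts, stop_words={"the"})
```

In a real layout the loop over each ray would stop once the next word no longer fits on the canvas, and the exponent would be tuned per piece, which is why sizes are not comparable between visualizations.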
I was really pleased by how many interesting details get packed
into these fairly simple visualizations. I think they offer a revealing
insight into our language and into which topics are prevalent on the web.
This is only a subset of possible word pairings. If you have an
interesting idea for a word comparison, email me.
Warning: the visualizations use actual word frequencies from the
web - foul language is present!
Each thumbnail links to a PDF version.