Word Associations
Visualizing Google's Bi-Gram Data

This series uses the same bigram dataset as the word spectrum visualization. Please refer to that page for an extended description of the data and processing.

To eliminate occlusion, I developed an entirely different layout. Instead of being placed on a continuous spectrum, words are bucketed into one of 25 rays. Each ray represents a different tendency of use (ranging from 0 to 100% in 4% intervals). Within each ray, words are sorted by decreasing frequency, and I render as many words as fit on the canvas. There is a nice visual analogy at play: the "lean" of each ray represents the strength of the tendency towards one of the two terms.

As in the word spectrum visualization, font size is based on an inverse power function (set uniquely for each visualization, so you can't compare sizes across pieces). Common words (a, the, for, as, etc.) are not shown.

I was really pleased at how many interesting details get packed into these fairly simple visualizations. I think they offer an interesting insight into our language and into which topics are prevalent on the web.

Warning: the visualizations use actual word frequencies from the web, so foul language is present!

Each thumbnail links to a PDF version.
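
The bucketing described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code behind the visualizations: the function names, the input format (a word mapped to its bigram counts with each of the two terms), the definition of tendency as one term's share of the combined count, and the font size formula with its `max_size` and `exponent` parameters are all hypothetical. The text says only that font size follows an inverse power function; the sketch assumes it falls off with a word's rank within its ray.

```python
from collections import defaultdict

NUM_RAYS = 25  # 0-100% in 4% steps, as described in the text


def ray_index(count_a, count_b, num_rays=NUM_RAYS):
    """Map a word's tendency toward term A to a ray number (0 .. num_rays - 1).

    Assumption: tendency = count_a / (count_a + count_b). Integer arithmetic
    avoids floating-point surprises at the bucket boundaries.
    """
    total = count_a + count_b
    # A tendency of exactly 100% falls into the last ray.
    return min(count_a * num_rays // total, num_rays - 1)


def build_rays(counts, stop_words=frozenset()):
    """counts: {word: (count_with_a, count_with_b)} -> {ray: [words]}."""
    rays = defaultdict(list)
    for word, (a, b) in counts.items():
        if word in stop_words or a + b == 0:
            continue  # skip common words and words with no data
        rays[ray_index(a, b)].append((a + b, word))
    # Most frequent words first within each ray.
    return {
        ray: [word for _, word in sorted(items, key=lambda item: -item[0])]
        for ray, items in rays.items()
    }


def font_size(rank, max_size=48.0, exponent=0.5):
    """Hypothetical inverse power law: rank 1 is the most frequent word."""
    return max_size * rank ** -exponent
```

For example, `build_rays({"car": (90, 10), "truck": (180, 20)})` places both words in the same ray, with "truck" listed first because its combined count is higher. A renderer would then draw each ray's words outward until the canvas runs out.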
© Chris Harrison