web association visualization

 

This series uses the same bigram dataset as the word spectrum visualization. Please refer to that page for an extended description of the data and processing.

To eliminate occlusion, I developed an entirely different layout. Now, instead of a continuous spectrum of words, words are bucketed into one of 25 different rays. Each ray represents a different tendency of use (ranging from 0 to 100% in 4% intervals). Words are sorted by decreasing frequency within each ray. I render as many words as can fit onto the canvas. There is a nice visual analogy at play - the "lean" of each ray represents the strength of the tendency towards one of the two terms. As in the word spectrum visualization, font size is based on an inverse power function (uniquely set for each visualization, so you can't compare across pieces). Common words (a, the, for, as, etc.) are not shown.
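
The bucketing and sorting described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code used to produce the visualizations: the function names, the input shape (a word mapped to its bigram counts with each of the two terms), the use of the combined count as the sort frequency, the handling of the 100% edge case, and the font size parameters are all hypothetical.

```python
from collections import defaultdict

NUM_RAYS = 25  # 25 rays, each covering a 4% slice of tendency


def ray_index(count_a, count_b):
    """Map a word's counts with term A and term B to a ray number 0..24.

    Assumption: tendency is the share of the word's uses that fall with
    term A. Integer arithmetic avoids floating point edge cases.
    """
    index = (count_a * NUM_RAYS) // (count_a + count_b)
    return min(index, NUM_RAYS - 1)  # fold exactly 100% into the last ray


def build_rays(counts, stopwords=frozenset()):
    """counts: {word: (count_a, count_b)} -> {ray: [words, most frequent first]}."""
    rays = defaultdict(list)
    for word, (a, b) in counts.items():
        if word in stopwords or a + b == 0:
            continue  # common words are not shown
        rays[ray_index(a, b)].append((a + b, word))
    return {
        ray: [word for _, word in sorted(items, key=lambda item: -item[0])]
        for ray, items in rays.items()
    }


def font_size(rank, max_size=48.0, exponent=0.5):
    """Hypothetical inverse power falloff: rank 0 gets max_size, later ranks shrink.

    Assumption: the function is applied to a word's rank within its ray;
    max_size and exponent stand in for the per-visualization tuning.
    """
    return max_size / (rank + 1) ** exponent
```

With made-up counts such as `build_rays({"mail": (300, 700), "search": (30, 70)})`, both words land in the same ray because they share a 30% tendency, and "mail" comes first because its combined count is higher. A renderer would then walk each ray outward and stop once the words no longer fit on the canvas.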

I was really pleased at how many interesting details get packed into these fairly simple visualizations. I think they offer an interesting insight into our language and what topics are prevalent on the web.

This is only a subset of possible word pairings. If you have an interesting idea for a word comparison, email me.

Warning: the visualizations use actual word frequencies from the web - foul language is present!

Each thumbnail links to a PDF version.

 
google vs yahoo visualization
mac vs pc visualization
microsoft vs apple visualization
chinese vs american visualization
british vs russian visualization
science vs faith visualization

 

 

   
chris.harrison@cs.cmu.edu
© Chris Harrison