searchclock visualization

I was curious about how people used the internet. Specifically, I wanted to see how internet behavior changed over the course of a day. Search engines are the gateway to the internet for most people, and so search queries provide insight into what people are doing and thinking. I had several assumptions before I started:

  • Overall, internet usage is highest during the day, tapers off at night, and reaches a lull in the early morning hours.
  • People search for information during the workday (8-6ish).
  • People socialize or look for information of personal interest when they get home from work (6ish to midnight).
  • People look for entertainment (often of the sexual variety) late at night and into the wee hours (midnight-6am).

I was curious to see if data from search engines would support my anecdotal observations. I built a simple clock-like visualization that displays the top search terms over a 24-hour period. Displaying search terms in a cyclical layout (like a clock) allows continuous examination of trends that would otherwise be broken up where a linear timeline starts and ends.

The data I had access to was both large and noisy. In response, I combined hourly data into weekly or yearly averages. All search strings were broken up into single words (periods, commas, and similar punctuation were treated as whitespace). This helped pool frequent terms and better illuminate search motivation (e.g., “information about taxes” and “information about chinchillas” counted as two hits for “information”). The top five search terms are shown for each hour, sized to reflect their relative frequency (larger = more popular). A list of stop words was developed to eliminate uninteresting terms (e.g., that, for, an, not, free). Beyond these steps, I have not modified the data in any way; you see it as it is.
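
For readers who want to try something similar, the word-counting steps above can be sketched roughly as follows in Python. This is not the code behind the visualization, only a minimal illustration under several assumptions: queries arrive as (hour, query string) pairs, words are lowercased before counting, the stop-word set contains only the five examples listed above, and counts pooled by hour of day stand in for the weekly or yearly averaging. The function names and the font-size range are hypothetical.

```python
import re
from collections import Counter, defaultdict

# Assumption: only the example stop words from the text; the real list was longer.
STOP_WORDS = {"that", "for", "an", "not", "free"}


def tokenize(query):
    """Split a query into lowercase words, treating punctuation as whitespace."""
    return [word for word in re.split(r"\W+", query.lower()) if word]


def top_terms_by_hour(records, n=5):
    """Count words per hour of day and keep the n most frequent ones.

    records: iterable of (hour, query) pairs, with hour in 0..23.
    Returns a dict mapping hour -> list of (word, count) pairs.
    """
    counts = defaultdict(Counter)
    for hour, query in records:
        for word in tokenize(query):
            if word not in STOP_WORDS:
                counts[hour][word] += 1
    return {hour: counter.most_common(n) for hour, counter in counts.items()}


def font_size(count, max_count, smallest=10, largest=40):
    """Scale a count linearly into a font size (hypothetical size range)."""
    return smallest + (largest - smallest) * count / max_count


def clock_angle(hour):
    """Angle in degrees, clockwise from the top, on a 24-hour dial."""
    return (hour % 24) * 360 / 24
```

Calling `top_terms_by_hour` on the two example queries from the text, both filed under the same hour, gives "information" a count of 2. Each hour's terms can then be drawn at `clock_angle(hour)` and sized with `font_size`, using the largest count in that hour as `max_count`.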

Some might wonder whether international users in different time zones affected the search distribution. They probably did. However, my guess is that most users were based in North America (especially for Magellan in the late 90s and AOL in general). The data seems to support this as well, with search activity slowing down at night (western hemisphere time).

I ran the visualization with two unique data sets:

Special note: this page was slashdotted on March 2nd, 2007.

chris.harrison@cs.cmu.edu
© Chris Harrison