Magellan Voyeur Data Visualization

Magellan, a search engine of yesteryear, offered a service called Voyeur, which displayed the last 10 search queries. Brian Amento of AT&T Labs archived this data at 10-minute intervals from 1997 to 2001. There are gaps in the data set due to outages and changes to the Voyeur service. However, these events are assumed to be random, and thus should have little impact on the distribution of search terms. Furthermore, because the data spans a four-year period, I combined hourly data into yearly averages, which further helps to compensate for gaps and noise.
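
For the curious, here is a minimal Python sketch of how this kind of aggregation could work. It is not the original script: the function name, the input format (timestamped lists of query terms), the lowercasing of terms, and the choice to average over only those days that actually have data are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from datetime import datetime


def hourly_yearly_averages(snapshots):
    """Average term counts per (year, hour of day).

    snapshots: iterable of (datetime, list_of_terms) pairs, one per
    archived Voyeur snapshot (hypothetical input format).
    Returns {(year, hour): {term: average hits per observed day}}.
    """
    counts = defaultdict(Counter)  # (year, hour) -> term -> total hits
    days = defaultdict(set)        # (year, hour) -> dates that have data

    for timestamp, terms in snapshots:
        key = (timestamp.year, timestamp.hour)
        days[key].add(timestamp.date())
        # Assumption: terms are compared case-insensitively.
        counts[key].update(term.lower() for term in terms)

    # Divide by the number of days observed for that hour, so that
    # outages shrink the sample instead of dragging the average down.
    return {
        key: {term: total / len(days[key]) for term, total in counter.items()}
        for key, counter in counts.items()
    }


if __name__ == "__main__":
    sample = [
        (datetime(1997, 1, 1, 17, 0), ["chat", "Chat", "sex"]),
        (datetime(1997, 1, 2, 17, 10), ["chat"]),
    ]
    print(hourly_yearly_averages(sample))
```

Averaging over observed days only is one way to keep randomly placed gaps from skewing the result; a missing day simply contributes nothing to either the total or the divisor.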

This data set is interesting for a few reasons. Foremost, it is more than a decade old. People were searching for different things back then, and it shows. Secondly, the data spans a multi-year period, which helps bring out overarching trends. Lastly, and perhaps most importantly, Magellan was used to search for a variety of content by a diverse user group (including people at work, unlike the AOL data set).

Medium resolution
GIF - 2000x2000

Notes:

The innermost ring is the average for 1997. Rings then work outward one year at a time until 2000. 2001 was not included because only a fraction of the year was collected. Font size scales linearly with the number of times the term appeared in that hour (e.g. 100 hits = Courier size 100). All times are EST.
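
As a rough illustration of these layout rules, here is a small Python sketch. The ring ordering and the "100 hits = size 100" scale come from the notes above; the function names, the `points_per_hit` parameter, and the idea that the 24 hours are spread evenly around the circle are assumptions, not a description of how the image was actually generated.

```python
FIRST_YEAR = 1997  # innermost ring
LAST_YEAR = 2000   # outermost ring (2001 is excluded)


def font_size(hits, points_per_hit=1.0):
    """Linear mapping from hits to font size: 100 hits -> size 100."""
    return hits * points_per_hit


def ring_index(year):
    """Return 0 for the innermost ring (1997) up to 3 for 2000."""
    if not FIRST_YEAR <= year <= LAST_YEAR:
        raise ValueError(f"year {year} is not drawn in the visualization")
    return year - FIRST_YEAR


def hour_angle_degrees(hour):
    """Assumed placement: 24 hours spaced evenly around 360 degrees."""
    return (hour % 24) * 360 / 24


if __name__ == "__main__":
    # A term with 100 average hits at 5pm in 1998.
    print(font_size(100), ring_index(1998), hour_angle_degrees(17))
```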

Interesting Trends:

I could explain every trend for you, but half the fun is exploring the data! For those who are lazy, here are some major (and obvious) trends to get you started:

Overall

  • There appears to be a dramatic shift away from chat and towards information retrieval between 1997 and 2000.
  • People are diurnal - search activity dies down at night and picks up again as people get up for work.

1997

  • It is clear that chat is most prevalent when people are home (evening). You can see chat frequency starting to grow around 11am, dominating by 5pm, and tapering off around 1am. It is supplanted by sex around 5am.
  • It seems people are curious about adult topics throughout the day. You can see sex jump in frequency around 11pm, reaching a climax around 2am (no pun intended) and dying down to nominal levels by 5am. However, since everyone is in bed, it clings to the top spot until pictures jumps to life, snatching the top spot as people roll out of bed.
  • Secondary terms are interesting as well. Entertainment-oriented terms are popular in the afternoon and evening. University and software make their main appearance during the workday (8am-5pm). Warez makes it into the top five from 5am-7am thanks to late-night pirates and people who can’t get to sleep.

1998

  • Chat and pictures vie for the top spot starting around 5pm, continuing until 2am. However, mp3s (and download) make a strong appearance, especially at night.

1999 & 2000

  • These two years are similar, so I've grouped them for brevity. The data shows that chat, mp3s and porn begin to lose out to information, which dominates around the clock. MP3s remain popular in 1999. By 2000, e-commerce has matured; people are increasingly searching for things to buy.

Special note: this page was slashdotted on March 2nd, 2007.

chris.harrison@cs.cmu.edu
© Chris Harrison