AOL Data Visualization

The AOL data set will live in infamy for its much-hyped breach of privacy. The data is a nice complement to the voyeur data set, as it differs in several important ways. First, it is significantly larger (~30 million search queries). Second, the data was collected over a three-month period, from March to May 2006, and for a subset of users. Third, AOL caters to a very different user demographic; it is primarily targeted at home users, and thus search queries seem to reflect more personal and less work-related topics. Adding to this difference is the fact that the population on the internet has changed dramatically since the late 90s.

 

Medium resolution
GIF - 3000 x 3000

 

Notes

  • Each month has four weeks (the first week is days 1-7, the second week is days 8-14, etc.). Months are separated by a gap. The inner four rings make up the month of March, followed by April in the middle, and finally May on the outside.
  • Font size is a non-linear function of the number of times the term appeared in that hour. This was necessary to dampen very frequent terms, such as myspace, and allow less popular terms to remain readable (e.g. (1000 hits)^(0.66) = Courier size 95). Of course, the non-native-resolution versions (like the thumbnail above) were also scaled down linearly on top of this.
  • This is AOL’s data, and I have no idea how they put it together. Thus, I cannot vouch for its reliability or independence. We just have to assume it’s somewhat randomized. At a quick glance, it seems that more data was included for March than for the other months (based on term frequency; you can see this clearly in the image above). I chose not to normalize frequency by the total number of searches, as I felt that would remove some transparency from the visualization; you see it as it was rendered, straight from the data.
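
The ring layout and font scaling described in the notes above can be sketched in a few lines of Python. This is only an illustration, not the code used to render the image: the function names are hypothetical, and folding days 29-31 into the fourth week is an assumption, since the notes do not say how those days were handled.

```python
def font_size(hits: int, exponent: float = 0.66) -> float:
    """Dampened font size: the hourly hit count raised to a power below 1."""
    return hits ** exponent


def ring_index(month: int, day: int) -> int:
    """0-based ring counted from the center (first week of March = ring 0).

    Assumes month is 3, 4, or 5 (March-May), seven-day weeks, and that
    days 29-31 are folded into the fourth week (an assumption, not
    something the notes confirm).
    """
    week = min((day - 1) // 7, 3)
    return (month - 3) * 4 + week


# A 10x jump in hits yields far less than a 10x jump in font size,
# which is what keeps less popular terms readable next to myspace.
for hits in (10, 100, 1000, 10000):
    print(hits, round(font_size(hits)))
```

With the 0.66 exponent, 1000 hits comes out just above 95, matching the Courier size quoted above.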

 

Interesting Trends

This data only spans nine weeks, and searching trends seem to have changed little over this period (unlike the drastic differences in the multi-year voyeur data set). However, this is not necessarily bad; it simply shows that searching behavior is not that volatile from week to week.

The most obvious trend is that myspace is popular - searches for the social website increase as people get home from work and fall off as people go to bed.

Perhaps more revealing are the second through fifth search terms. eBay picks up in the afternoon and evening, as one would expect. Entertainment-related terms (lyrics and games) grow from 4pm onwards until bedtime. Sex and other porn-related terms are prevalent at night, starting around 11pm, although their frequency pales in comparison to daytime searches. Civic terms, such as state, county, gov and Florida, are surprisingly ubiquitous, although mostly popular during the workday. Is AOL's average user a retired Floridian?

There are a few week-specific blips. Some are explainable, such as "Easter" and "happy" (see 24:00 hours on the 6th ring out, i.e. the week before Easter in 2006). I have no clue why "profileedit" and "myspace" become so popular in the 8th week (22-23:00 hours). Adultfriendfinder(.com) is also popular for a week (5th ring out, 3am-6am).

 

 

AOL Data Visualization with Google and Yahoo

Many AOL users tend to type URLs, or the names of popular sites, into the search field. This is why we see myspace as a search term even though people could just type myspace.com in the address bar. Google and Yahoo were added to my stopwords because they were so prevalent and not that revealing about what people were searching for (only that they were searching). However, I felt it would be more complete to provide an additional visualization that includes Google and Yahoo. To dampen these very frequent terms, I lowered the exponent of my power (really root) function to 0.65.
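
As a rough illustration of the stopword filtering described here, the sketch below tallies terms per hour with and without a stopword list. It is not the original pipeline: the function name, the (hour, query) input shape, the stopword list, and the sample queries are all made up for the example.

```python
from collections import Counter

# Hypothetical stopword list; the actual list used is not given.
STOPWORDS = {"google", "yahoo", "the", "of", "and"}


def hourly_term_counts(queries, stopwords=STOPWORDS):
    """Tally terms per hour of day, skipping stopwords.

    queries: iterable of (hour, query_text) pairs (an assumed input shape).
    Returns a dict mapping hour -> Counter of terms.
    """
    counts = {}
    for hour, text in queries:
        terms = [t for t in text.lower().split() if t not in stopwords]
        counts.setdefault(hour, Counter()).update(terms)
    return counts


# Made-up sample queries, for illustration only.
sample = [
    (20, "myspace"),
    (20, "google"),
    (20, "myspace layouts"),
    (23, "yahoo mail"),
]

with_filter = hourly_term_counts(sample)
without_filter = hourly_term_counts(sample, stopwords=set())

# Second visualization: stopwords kept, exponent lowered to 0.65.
sizes = {term: n ** 0.65 for term, n in without_filter[20].items()}
```

Passing an empty stopword set corresponds to the second visualization, where Google and Yahoo are kept and the lower exponent does the dampening instead.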

 

Medium resolution
GIF - 3000 x 3000

 

 

Special note: this page was slashdotted on March 2nd, 2007.

 

 

 

 

   
chris.harrison@cs.cmu.edu
© Chris Harrison