The AOL data set will live in infamy for its much-hyped breach
of privacy. The data is a nice complement to the voyeur
data set, as it differs in several important ways. First,
it is significantly larger (~30 million search queries). Second,
the data was collected over a three-month period, from March to May 2006,
and for a subset of users. Third, AOL caters to a very different
user demographic; it is primarily targeted at home users, and thus,
search queries seem to reflect more personal and less work-related
topics. Adding to this difference is the fact that the population on
the internet has changed dramatically since the late 90s.
- Each month has four weeks (the first week is days 1-7, the second
week is days 8-13, etc.). Months are separated by a gap. The inner four
rings make up the month of March, followed by April in the middle,
and finally by May on the outside.
- The font size is a non-linear function of the number of times
the term appeared in that hour. This was necessary to dampen very
frequent terms, such as myspace, and allow less popular terms to
remain readable (e.g. (1000 hits)^(0.66) ≈ Courier size 95).
Of course, the non-native-resolution versions (like the thumbnail
above) had a linear scale applied as well.
- This is AOL’s data, and I have no idea how they put it
together. Thus, I cannot vouch for its reliability or independence.
We just have to assume it’s somewhat randomized. At a quick
glance, it seems that more data was included for March than the
other months (based on term frequency - you can see this in the
image above clearly). I chose not to normalize frequency based
on total number of searches, as I felt that removed some transparency
from the visualization; you see it as it was rendered, straight from the data.
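
The ring layout and font scaling described in the notes above can be sketched in a few lines of Python. This is only an illustration, not the code used to render the images: the function names are hypothetical, the week bins are assumed to be plain seven-day bins, and days 29 to 31 are assumed to fold into the fourth week, since the notes do not say how leftover days were handled.

```python
# Months covered by the data set (March, April, May 2006).
MONTHS = [3, 4, 5]
WEEKS_PER_MONTH = 4

def week_of_month(day):
    """0-based week index. ASSUMPTION: seven-day bins, with days 29-31
    folded into the fourth week."""
    return min((day - 1) // 7, WEEKS_PER_MONTH - 1)

def ring_number(month, day):
    """1-based ring counted outward from the centre:
    March is rings 1-4, April 5-8, May 9-12."""
    return MONTHS.index(month) * WEEKS_PER_MONTH + week_of_month(day) + 1

def font_size(hits, exponent=0.66):
    """Sub-linear (root-like) scaling so very frequent terms do not
    drown out the rest. 0.66 is the exponent quoted in the notes."""
    return hits ** exponent

if __name__ == "__main__":
    print(ring_number(4, 10))      # ring for April 10
    print(int(font_size(1000)))    # font size for a term with 1000 hits
```

Under these assumptions a term with 1000 hits lands at roughly size 95, in line with the example given in the notes.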
This data only spans nine weeks, and search trends seem to have
changed little over this period (unlike the drastic differences
in the multi-year voyeur data set). However, this is not
necessarily bad: it simply shows that searching behavior on a
weekly basis is not that volatile.
The most obvious trend is that myspace is popular - searches for
the social website increase as people get home from work and fall
off as people go to bed.
Perhaps more revealing are the second through fifth search terms.
eBay picks up in the afternoon and evening, as one would expect.
Entertainment-related terms (lyrics and games) grow from 4pm onwards
until bedtime. Sex and other porn-related terms are prevalent at
night, starting around 11pm, although their frequency pales in comparison
to daytime searches. Civic terms, such as state, county, gov and
Florida are surprisingly ubiquitous, although mostly popular during
the workday. Is AOL's average user a retired Floridian?
There are a few week-specific blips. Some are explainable, such
as "Easter" and "happy" (see 24:00 hours on the
6th ring out, i.e. the week before Easter in 2006). I have no clue
why "profileedit" and "myspace" became so popular
in the 8th week (22-23:00 hours). Adultfriendfinder(.com) is also
popular for a week (5th ring out, 3am-6am).
Data Visualization with Google and Yahoo
Many AOL users tend to type URLs into the search field, or the
names of popular sites. This is why we see myspace as a search term
even though people could just type myspace.com in the address bar.
Google and Yahoo were added to my stopwords because they were so
prevalent and not that revealing about what people were searching
for (only that they were searching). However, I felt it would be
more complete if I provided an additional visualization which included
Google and Yahoo. To dampen how frequently these terms appeared,
I tweaked the exponent of my power (really root) function to 0.65.
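
As a rough sketch of the two choices described above (treating Google and Yahoo as stopwords, or keeping them and lowering the exponent), something like the following would do. The stopword lists, function names, and sample queries are hypothetical; the page does not give the actual lists or code.

```python
from collections import Counter

# HYPOTHETICAL stopword sets; the real lists are not given on this page.
BASE_STOPWORDS = {"the", "of", "and", "www", "com"}
PORTAL_TERMS = {"google", "yahoo"}

def count_terms(queries, include_portals=False):
    """Count terms across queries, dropping stopwords.
    Portal names are dropped too unless include_portals is True."""
    stop = BASE_STOPWORDS if include_portals else BASE_STOPWORDS | PORTAL_TERMS
    counts = Counter()
    for query in queries:
        for term in query.lower().split():
            if term not in stop:
                counts[term] += 1
    return counts

def font_size(hits, exponent):
    """Power (really root) scaling of a term's hit count."""
    return hits ** exponent

if __name__ == "__main__":
    sample = ["google maps", "yahoo mail", "myspace", "Google"]
    print(count_terms(sample))
    print(count_terms(sample, include_portals=True))
    # A slightly smaller exponent shrinks large counts a little more.
    print(font_size(1000, 0.66), font_size(1000, 0.65))
```

The drop from 0.66 to 0.65 looks small, but because it acts on the exponent it trims the largest terms the most, which is the point when two very frequent terms are added back in.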
Special note: this page was slashdotted
on March 2nd, 2007.