| 
Aaron Swartz, who runs theinfo.org,
contacted me back in January '08 with an interesting data set. He
had built a list of 735,323 books by crawling Amazon.
Of course a gigantic list is pretty boring, but Aaron had also captured
similarity data between books. In particular, he had amassed a whopping
10,316,775 connections (edges) between books Amazon believed were
related. This allowed me to throw the data into my old wikiviz engine
to spatially lay out a huge mosaic of books (I let it run for 140
hours). Items that were noted as being similar had attractive forces,
bringing them together, often into large groups. Unsurprisingly,
when we color coded by Amazon book category, there was an obvious
coalescence. The way various high-level categorizations mix and
meet also seems fairly logical.
I produced a few versions of what I am dubbing the Amazon Book
Map. The first visualization is a huge mosaic of book covers, tinted
by their respective category colors. I can't produce this in one
go at full resolution because the memory requirements are enormous.
The second version uses color-coded dots.
The layout (clustering-wise) is decent, but not great. I don't
think my algorithm works all that well for highly unstructured graphs.
For those who are curious, I've included a small graph of how the
layout converged. Details below.

Book Cover Version - Download
Full Resolution JPG (10,296 x 15,444)

Close-up of Book Cover Version

Super Close-up of Book Cover Version (email
me if you want this level of resolution).

Dot Version - Download Full
Resolution JPG (8580 x 8580)
 |
This is a graph illustrating how the layout
converged over time. The visualization can be thought of as
a huge mosaic or checkerboard - a single book occupies a single
spot. When first starting the layout process, books are on
average 353.25 spots away from the centroid of the items they
are associated with. The layout algorithm shuffles books around
in an attempt to place items that are associated near each
other, thus producing an increasingly better layout. After
10,000 iterations, the average distance had decreased to 9.12.
|
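For the curious, here is a rough Python sketch of the kind of measurement described above. It is not the wikiviz code: the function names, the Euclidean distance, and the greedy random-swap step are all assumptions made for illustration, and a real run on 735,323 books would need something far more efficient than rescoring the whole layout after every swap.

```python
import math
import random


def mean_centroid_distance(pos, edges):
    """Average distance from each item to the centroid of its neighbors.

    pos maps an item to its (column, row) spot on the mosaic grid;
    edges is a list of (item, item) similarity pairs.
    Euclidean distance is an assumption, the original metric is not stated.
    """
    neighbors = {item: [] for item in pos}
    for a, b in edges:
        neighbors[a].append(b)
        neighbors[b].append(a)

    total, counted = 0.0, 0
    for item, nbrs in neighbors.items():
        if not nbrs:
            continue  # items with no connections have no centroid
        cx = sum(pos[n][0] for n in nbrs) / len(nbrs)
        cy = sum(pos[n][1] for n in nbrs) / len(nbrs)
        total += math.hypot(pos[item][0] - cx, pos[item][1] - cy)
        counted += 1
    return total / counted if counted else 0.0


def swap_pass(pos, edges, rng, tries):
    """Hypothetical shuffle step: swap two random items, keep the swap
    only if the average centroid distance goes down."""
    items = list(pos)
    best = mean_centroid_distance(pos, edges)
    for _ in range(tries):
        a, b = rng.sample(items, 2)
        pos[a], pos[b] = pos[b], pos[a]
        score = mean_centroid_distance(pos, edges)
        if score < best:
            best = score
        else:
            pos[a], pos[b] = pos[b], pos[a]  # undo the swap
    return best
```

Calling `mean_centroid_distance` before and after `swap_pass(pos, edges, random.Random(0), 1000)` on a small toy grid gives a miniature version of the convergence curve: the score never goes up, because a swap that fails to improve it is undone.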

Color Coding Key - Amazon Book Categories
|