amazon book map

 

Aaron Swartz, who runs theinfo.org, contacted me back in January '08 with an interesting data set. He had built a list of 735,323 books by crawling Amazon. Of course, a gigantic list is pretty boring, but Aaron had also captured similarity data between books. In particular, he had amassed a whopping 10,316,775 connections (edges) between books Amazon believed were related. This allowed me to throw the data into my old wikiviz engine to spatially lay out a huge mosaic of books (I let it run for 140 hours). Items that were noted as being similar had attractive forces, bringing them together, often into large groups. Unsurprisingly, when we color-coded by Amazon book category, there was an obvious coalescence. The way various high-level categorizations mix and meet also seems fairly logical.

I produced a few versions of what I am dubbing the Amazon Book Map. The first visualization is a huge mosaic of book covers, tinted by their respective category colors. I can't produce this in one go at full resolution because the memory requirements are enormous. The second version uses color-coded dots.

The layout (clustering-wise) is decent, but not great. I don't think my algorithm works all that well for highly unstructured graphs. For those who are curious, I've included a small graph of how the layout converged. Details below.

 

amazon book map visualization

Book Cover Version - Download Full Resolution JPG (10,296 x 15,444)

 

 

amazon book map visualization

Close-up of Book Cover Version

 

 

amazon book map visualization

Super Close-up of Book Cover Version (email me if you want this level of resolution).

 

 

amazon book map visualization

Dot Version - Download Full Resolution JPG (8580 x 8580)

 

 

wikiviz layout graph convergence

 

This is a graph illustrating how the layout converged over time. The visualization can be thought of as a huge mosaic or checkerboard, where a single book occupies a single spot. When first starting the layout process, books are on average 353.25 spots away from the centroid of the items they are associated with. The layout algorithm shuffles books around in an attempt to place associated items near each other, thus producing an increasingly better layout. After 10,000 iterations, the average distance had decreased to 9.12.
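For readers who want to experiment with the idea, here is a minimal Python sketch of a grid layout driven by the same kind of metric: the average distance from each book to the centroid of its associated books. This is not the wikiviz code, whose internals are not described on this page. The greedy swap rule, the function names (`build_neighbors`, `centroid_distance`, `improve_layout`) and the toy data are all assumptions made for illustration.

```python
import math
import random


def build_neighbors(edges):
    """Turn (a, b) similarity pairs into a symmetric adjacency map."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    return neighbors


def centroid_distance(book, pos, neighbors):
    """Distance from a book's spot to the centroid of its associated books."""
    linked = neighbors[book]
    cx = sum(pos[n][0] for n in linked) / len(linked)
    cy = sum(pos[n][1] for n in linked) / len(linked)
    x, y = pos[book]
    return math.hypot(x - cx, y - cy)


def average_distance(pos, neighbors):
    """The convergence metric: mean centroid distance over all books."""
    total = sum(centroid_distance(b, pos, neighbors) for b in neighbors)
    return total / len(neighbors)


def improve_layout(neighbors, side, iterations, seed=0):
    """Greedy swap heuristic (an assumption, not the original algorithm).

    Books start on random spots of a side x side grid. Each iteration picks
    two books, swaps their spots, and undoes the swap if it made things worse.
    """
    books = list(neighbors)
    if len(books) > side * side:
        raise ValueError("grid is too small for the number of books")
    rng = random.Random(seed)
    cells = [(x, y) for x in range(side) for y in range(side)]
    rng.shuffle(cells)
    pos = dict(zip(books, cells))  # one book per spot; spare spots stay empty
    start = average_distance(pos, neighbors)
    for _ in range(iterations):
        a, b = rng.sample(books, 2)
        # With a symmetric graph, a swap only changes the distances of the
        # two books themselves and of the books linked to them.
        affected = {a, b} | neighbors[a] | neighbors[b]
        before = sum(centroid_distance(k, pos, neighbors) for k in affected)
        pos[a], pos[b] = pos[b], pos[a]
        after = sum(centroid_distance(k, pos, neighbors) for k in affected)
        if after > before:
            pos[a], pos[b] = pos[b], pos[a]  # undo a swap that hurt the layout
    return pos, start, average_distance(pos, neighbors)


if __name__ == "__main__":
    # Toy data: four groups of four mutually similar "books".
    toy_edges = [
        (f"g{g}-{i}", f"g{g}-{j}")
        for g in range(4)
        for i in range(4)
        for j in range(i + 1, 4)
    ]
    layout, start, final = improve_layout(
        build_neighbors(toy_edges), side=4, iterations=2000
    )
    print(f"average distance to centroid: {start:.2f} -> {final:.2f}")
```

Because a swap is kept only when it does not increase the summed distance of the affected books, the average can only stay level or fall, which gives a convergence curve of the same general shape as the one plotted above. A greedy rule like this can get stuck in local minima, and at the scale of the real data set (735,323 books) a practical implementation would need to be considerably more careful about both speed and memory.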

 

 

amazon map color code key

Color Coding Key - Amazon Book Categories

 

 

 

   
chris.harrison@cs.cmu.edu
© Chris Harrison