Royal Society Archive Visualization: 1665-2005

Introduction

The Royal Society recently provided access to an archive of papers published in the scientific academy's prestigious journals. Some 25,000 scholarly works are represented, dating from 1665 to 2005, including papers by many notable scientific minds such as Isaac Newton, Michael Faraday and Charles Darwin. This interesting data set was ripe for some visual tinkering. The database I used was put together by Brian Amento and Mike Yang of AT&T Labs.

The images are extremely large because of the huge volume of content and the need for high-resolution print-outs, so the entire timeline has been segmented into 10 sections. Medium-resolution versions (5000x500 pixels) are linked from the thumbnails; contact me for high or custom resolution versions. I think this would make a unique and educational installation for a hallway or ceiling. The length could range from 10 feet to 10,000 feet (I can render at any resolution). The following journals are included:
Author Distribution

This visualization displays papers chronologically. Paper titles radiate downward from the vertical midpoint at a 45-degree angle. Within a single year, papers are sorted alphabetically. The year each volume was published is shown centered within its block of papers, and the label's size varies linearly with the number of papers published in that year's volume.

Authors are shown radiating upward from the vertical midpoint at a 45-degree angle. Each author's position is the average position of the papers they authored, and the size of the author's name varies linearly with how prolific they were. Essentially, author names are "centered" above the time period in which they were active.

Technical Note: Many of the papers in the Royal Society database are missing author names, probably because of the labor needed to copy them from the old texts. In addition, names vary in format and spelling. For example, Edmond Halley also appears as Edmund Halley, E. Halley and Edm. Halley. To compensate for these variations, names were truncated to a first initial and full last name (e.g. E. Halley). However, this reduces uniqueness and increases the likelihood of collisions, where different people end up with the same name. To avoid biasing the computation of average dates, a filtering process is applied. It works roughly as follows: the standard deviation of the author's publication dates is computed. If the standard deviation is large (which indicates several prolific authors, active at different times, sharing one name), the name is simply excluded. If the standard deviation is sufficiently small, the average date is recomputed with outliers excluded. This is often the case when there is one major author and one or more lesser authors sharing the name.

It's really interesting to explore these images! For example, the first section (1665-1710) includes Edmond Halley (of Halley's Comet fame), Isaac Newton, Antony van Leeuwenhoek (a pioneer of microscopy) and other famous scholars. What does this show? Well, you should take a look yourself. Here are some obvious ones:
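The name normalization and date filtering from the technical note can be sketched in Python. This is a minimal sketch under stated assumptions, not the code actually used to render these images. The helper names (`normalize_name`, `robust_mean_year`, `author_layout`), the input format of `(year, [author names])` pairs, the thresholds (reject a name if its dates have a standard deviation above 15 years; treat dates more than 1.5 standard deviations from the mean as outliers) and the font-size constants are all hypothetical. It also assumes the first token of a name is the given name and the last token (plus any lowercase particle such as "van") is the surname. It averages publication years; the visualization itself averages the papers' horizontal positions, which follow the same chronological order.

```python
import statistics
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def normalize_name(full_name: str) -> Optional[str]:
    """Truncate a name to first initial + surname, e.g. 'Edm. Halley' -> 'E. Halley'.

    Assumes the first token is a given name and the last token is the surname;
    lowercase particles such as 'van' or 'de' just before it are kept with it.
    """
    tokens = full_name.split()
    if len(tokens) < 2:
        return None  # missing or single-word names are skipped
    surname = [tokens[-1]]
    for tok in reversed(tokens[1:-1]):
        if not tok.islower():
            break
        surname.insert(0, tok)
    return f"{tokens[0][0].upper()}. {' '.join(surname)}"


def robust_mean_year(years: List[int], reject_sd: float = 15.0,
                     outlier_k: float = 1.5) -> Optional[float]:
    """Average publication year, or None if the name looks like a collision."""
    if len(years) == 1:
        return float(years[0])
    sd = statistics.pstdev(years)
    if sd > reject_sd:
        return None  # widely spread dates: probably several prolific authors
    mean = statistics.mean(years)
    # Small spread: recompute the mean without dates far from the rest
    # (e.g. a few papers by a lesser author who shares the truncated name).
    kept = [y for y in years if abs(y - mean) <= outlier_k * sd]
    return float(statistics.mean(kept))


def author_layout(papers: Iterable[Tuple[int, List[str]]],
                  base_size: float = 4.0,
                  size_per_paper: float = 1.0) -> Dict[str, Tuple[float, float]]:
    """Map each normalized author to (center year, font size).

    Font size grows linearly with the number of papers attributed to the name.
    """
    years_by_author: Dict[str, List[int]] = defaultdict(list)
    for year, authors in papers:
        for raw_name in authors:
            name = normalize_name(raw_name)
            if name is not None:
                years_by_author[name].append(year)

    layout = {}
    for name, years in years_by_author.items():
        center = robust_mean_year(years)
        if center is not None:
            layout[name] = (center, base_size + size_per_paper * len(years))
    return layout


if __name__ == "__main__":
    sample = [  # hypothetical records, not taken from the real archive
        (1672, ["Isaac Newton"]),
        (1676, ["Antony van Leeuwenhoek"]),
        (1684, ["Edmond Halley"]),
        (1686, ["E. Halley"]),
        (1687, ["Edm. Halley", "Isaac Newton"]),
        (1693, ["Edmund Halley"]),
        (1700, ["John Smith"]),
        (1850, ["James Smith"]),
        (1705, []),  # paper with no recorded author
    ]
    for name, (year, size) in sorted(author_layout(sample).items()):
        print(f"{name:20s} centered at {year:.1f}, size {size:.0f}")
```

On this toy data, the four spellings of Halley collapse into a single E. Halley entry (its 1693 paper is dropped as an outlier when computing the center), while the two hypothetical Smiths are excluded entirely because their dates lie 150 years apart. The thresholds are the main design choice: too strict and genuinely long careers are discarded, too loose and collisions slip through.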
Word Distribution

This visualization has the same visual characteristics as the author distribution (above). However, instead of authors, it explores the distribution of words in publication titles. Word size is determined by a square-root function, which dampens extremely common words (e.g. 'the' and 'of'). Only words used three or more times are shown. It's interesting to see how words evolve and how fields like photography and electronics emerge. Some interesting and popular words with their average year:
Special Note: Average location can be deceiving. Words can have U-shaped (parabolic) or other irregular distributions, which can cause a word to "center" above a time period that may have no relevance to it. However, after inspecting the data, I believe this is a limited problem, affecting a small minority of words.
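The word distribution, and the caveat about averages, can also be sketched in Python. This is a minimal sketch with hypothetical names (`word_layout`, `WORD_RE`), a naive lowercase letters-only tokenizer, and an assumed scale factor for the square-root sizing; it is not the original implementation. The `gap` value (distance from a word's average year to its nearest actual use) is an illustrative addition not described in the text, included to show how a misleading center could be flagged.

```python
import math
import re
import statistics
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

WORD_RE = re.compile(r"[a-z]+")  # crude tokenizer: runs of ASCII letters


def word_layout(titles: Iterable[Tuple[int, str]], min_count: int = 3,
                size_scale: float = 2.0) -> Dict[str, Tuple[float, float, float]]:
    """Map each title word to (average year, size, gap).

    size grows with the square root of the word's frequency, so very common
    words such as 'the' and 'of' are damped. gap is the distance in years
    from the average to the nearest actual use; a large gap warns that the
    word "centers" over a period in which it was never used.
    """
    years_by_word: Dict[str, List[int]] = defaultdict(list)
    for year, title in titles:
        for word in WORD_RE.findall(title.lower()):
            years_by_word[word].append(year)

    layout = {}
    for word, years in years_by_word.items():
        if len(years) < min_count:  # only words used three or more times
            continue
        mean_year = statistics.mean(years)
        size = size_scale * math.sqrt(len(years))
        gap = min(abs(y - mean_year) for y in years)
        layout[word] = (mean_year, size, gap)
    return layout


if __name__ == "__main__":
    sample = [  # hypothetical titles, not taken from the real archive
        (1670, "Of Light and Colours"),
        (1672, "A New Theory of Light"),
        (1700, "An Account of a Comet"),
        (1702, "Observations of the Comet"),
        (1704, "On the Comet and the Tides"),
        (1900, "The Pressure of Light"),
        (1902, "Light and the Microscope"),
        (1903, "Notes on the Microscope"),
    ]
    layout = word_layout(sample)
    for word in ("comet", "light"):
        mean_year, size, gap = layout[word]
        print(f"{word}: average year {mean_year}, size {size:.2f}, gap {gap} years")
```

In this toy data, "comet" averages to 1702, a year in which it actually appears, while "light" averages to 1786 with a gap of 114 years, exactly the kind of misleading center the special note describes. "microscope" appears only twice and is filtered out by the three-use threshold.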
Combination

I considered several designs for combining the author and word distributions into a single timeline, and ultimately settled on the design below. From a visualization standpoint, however, it is far less understandable because of overlapping elements. Since the rendering was already plagued with readability issues, I figured I'd go all out and include almost all keywords and authors regardless of significance. The resulting infographic leans more toward aesthetics.
© Chris Harrison