This map displays location information extracted from 6,826,827 commit messages from the public GitHub timeline on June 23, 2012.
Users can enter virtually anything they like in their GitHub Location setting. Some real-world examples include "right behind you", "Earth", "The moon", "arrakis" and "The Internet". Fortunately, most of the provided locations are more realistic and allow for automatic identification of countries.
To do so, I wrote some Python scripts, which you can find in this GitHub repo, and, as a byproduct, a Python library called geonamescache, which provides access to a small part of the public data available from the GeoNames database without requiring requests to their web service.
While the official names of the world's countries are unique, city names are not. There is more than one San Francisco, but when a user specified "San Francisco" I assumed the largest city with this name, i.e. the one in the US. The same applies to many other city names.
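The largest-city rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code from the repo: the `CITIES` table, its rough population figures and the `resolve_country` helper are hypothetical stand-ins for the GeoNames data.

```python
# Hypothetical sample records. The populations are rough illustrative
# figures, not values taken from GeoNames.
CITIES = [
    {"name": "San Francisco", "country": "US", "population": 800_000},
    {"name": "San Francisco", "country": "AR", "population": 60_000},
    {"name": "Paris", "country": "FR", "population": 2_000_000},
    {"name": "Paris", "country": "US", "population": 25_000},
]


def resolve_country(location, cities=CITIES):
    """Return the country code of the largest city named `location`, or None."""
    wanted = location.strip().lower()
    matches = [c for c in cities if c["name"].lower() == wanted]
    if not matches:
        return None  # e.g. "right behind you" stays unresolved
    # Ambiguous names are settled by picking the most populous candidate.
    return max(matches, key=lambda c: c["population"])["country"]


print(resolve_country("San Francisco"))
print(resolve_country("right behind you"))
```

Picking the most populous match is a heuristic: it will misattribute users who live in one of the smaller namesakes, which seemed an acceptable trade-off for a country-level map.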
I also identified some countries manually with the help of Google Search, but I had to draw the line at some point, so the location data for 197,139 commits (not included in the number above) remains unresolved.
By default the map displays the number of commits per 100,000 inhabitants using different color values from a diverging color range.
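The default metric is a simple normalization. A minimal sketch, assuming commit counts and population figures are available per country (the function name `commits_per_100k` is my own, not taken from the repo):

```python
def commits_per_100k(commits, population):
    """Return commits per 100,000 inhabitants, or None if population is unknown."""
    if not population:
        return None  # avoid dividing by zero for missing population data
    # Multiply first so that whole-number inputs divide as cleanly as possible.
    return commits * 100_000 / population


# Hypothetical figures: 500 commits from a country with 1,000,000 inhabitants.
print(commits_per_100k(500, 1_000_000))
```

Normalizing by population keeps large countries from dominating the map purely through their size, but it also makes very small populations extremely sensitive to a handful of commits.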
To make the map more interactive and hopefully more useful, I added controls for choosing different regions of the world, other metrics (total number of commits and country population), color schemes, and the data range to be displayed.
The latter is especially important because a single GitHub user from Pitcairn is responsible for a huge gap between the highest commit ratio and the runners-up. By adjusting the maximum of the color range, you can bring out the differences between the remaining countries' ratios.
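To illustrate why the adjustable maximum matters, here is a minimal Python sketch of mapping a ratio to a color bucket with clamping. It is not the map's actual code: the function `color_index`, the nine-color scheme and the sample ratios are all assumptions made for the example.

```python
def color_index(value, range_max, n_colors, range_min=0.0):
    """Map `value` to a bucket in 0..n_colors-1, clamping to [range_min, range_max]."""
    if range_max <= range_min:
        raise ValueError("range_max must be greater than range_min")
    clamped = min(max(value, range_min), range_max)
    fraction = (clamped - range_min) / (range_max - range_min)
    # A fraction of exactly 1.0 would overflow, so cap at the last bucket.
    return min(int(fraction * n_colors), n_colors - 1)


# Hypothetical ratios: two ordinary countries and one extreme outlier.
ratios = [10, 30, 2000]

# With the maximum set to the outlier, the ordinary countries share one bucket.
print([color_index(r, 2000, 9) for r in ratios])

# Lowering the maximum spreads them out; the outlier is clamped to the top bucket.
print([color_index(r, 50, 9) for r in ratios])
```

Values above the chosen maximum all land in the last bucket, so the outlier is still shown as the highest, just no longer at the cost of the contrast between everyone else.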
The map uses the HTML5 range input element and, of the three browsers I tested (Chrome, Firefox and Opera), works best in Chrome.
Last but not least, the hundreds of thousands of developers who host their open source projects on GitHub provided the foundation for this map.
Ramiro is a developer who likes open source, data mining, visualization, and writing. To learn more about him and this site, see the about page.