
GitHub Commit Map

This map displays location information extracted from 6,826,827 commit messages from the public GitHub timeline on June 23, 2012.

Country Identification

Users can enter virtually anything they like in their GitHub Location setting. Real-world examples include "right behind you", "Earth", "The moon", "arrakis" and "The Internet". Fortunately, most provided locations are more realistic and allow for automatic identification of countries.

To do so, I wrote some Python scripts, which you can find in this GitHub repo, and, as a byproduct, a Python library called geonamescache. It provides access to a small part of the public data available from the GeoNames database without requests to their web service.
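As a rough illustration of the simplest case, a location string that names a country outright can be matched against a lookup table. This is only a sketch, not the code from the repo: the COUNTRY_CODES table and the country_from_location function are hypothetical, and a real table would be built from GeoNames data.

```python
# Hypothetical, hand-made sample; a real table would be built from GeoNames data.
COUNTRY_CODES = {
    "germany": "DE",
    "united states": "US",
    "france": "FR",
}


def country_from_location(location):
    """Return a country code if any comma-separated part names a known country."""
    for part in location.split(","):
        code = COUNTRY_CODES.get(part.strip().lower())
        if code:
            return code
    # Locations such as "right behind you" stay unresolved.
    return None
```

Splitting on commas lets a string like "Berlin, Germany" resolve through its country part even when the city is not in any table.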

Caveats

While the official names for the world's countries are unique, city names are not. There is more than one San Francisco, but when a user specified "San Francisco", I assumed the largest city with this name, i.e. the one in the US. The same applies to many other city names.
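The "largest city wins" rule can be sketched as follows. The records and the country_for_city function are hypothetical, and the population figures are made-up round numbers for illustration only.

```python
# Hypothetical sample records: (city name, country code, population).
# The populations are illustrative round numbers, not real data.
CITIES = [
    ("San Francisco", "US", 800000),
    ("San Francisco", "AR", 60000),
    ("Paris", "FR", 2000000),
    ("Paris", "US", 25000),
]


def country_for_city(name, cities=CITIES):
    """Return the country of the most populous city with the given name."""
    wanted = name.strip().lower()
    matches = [city for city in cities if city[0].lower() == wanted]
    if not matches:
        return None
    # Ambiguous names resolve to the largest city, which may be wrong
    # for users living in one of the smaller namesakes.
    return max(matches, key=lambda city: city[2])[1]
```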

I also identified some countries manually, thanks to Google Search, but had to draw the line at some point, so location data from 197,139 commits (not included in the number above) remains unresolved.

What is on the Map

By default the map displays the number of commits per 100,000 inhabitants using different color values from a diverging color range.
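The ratio itself is a simple normalization. A minimal sketch, with a hypothetical function name and made-up inputs:

```python
def commits_per_100k(commits, population):
    """Return the number of commits per 100,000 inhabitants."""
    if population <= 0:
        raise ValueError("population must be positive")
    # Multiply first so that round inputs give exact results.
    return commits * 100_000 / population
```

For example, 500 commits in a country of one million inhabitants gives a ratio of 50. A country with very few inhabitants can reach a very high ratio with only a handful of commits.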

To make the map more interactive and hopefully more useful, I added controls for choosing different regions of the world, other data (i.e. total number of commits and country population), color schemes, and the data range to be displayed.

The latter is especially important because one GitHub user from Pitcairn is responsible for a huge gap between the highest commit ratio and the runners-up. By adjusting the color range maximum, you can bring out the differences among the remaining countries' ratios.
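To see why capping the color range helps, consider a sketch that maps a value to one of a fixed number of color buckets. This is an assumption for illustration, not how the charting library assigns colors internally; the color_index function and its parameters are hypothetical.

```python
def color_index(value, range_max, n_colors):
    """Map a value to a color bucket from 0 to n_colors - 1.

    Values above range_max are clamped, so a single outlier lands in the
    top bucket instead of stretching the scale for everyone else.
    """
    if range_max <= 0 or n_colors < 1:
        raise ValueError("range_max must be positive and n_colors at least 1")
    clamped = min(max(value, 0), range_max)
    return min(int(clamped / range_max * n_colors), n_colors - 1)
```

With a maximum of 100 and five colors, a value of 50 falls into the middle bucket, while an outlier of 5,000 simply shares the top bucket with everything at or above 100. Without the cap, the scale would run to 5,000 and nearly all other countries would collapse into the lowest bucket.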

The map uses the HTML5 range input element and, of the three browsers I tested (Chrome, Firefox and Opera), works best in Chrome.

If you want to play around with the code for this map, you can fork this gist. The map data is available as a public Google spreadsheet.

Credits

Data for the map was obtained from the GitHub Archive public dataset via Google's BigQuery service. The map visualization uses the Google Chart Tools JavaScript library and API and color schemes from the Colorbrewer tool.

Last but not least, the hundreds of thousands of developers who host their open source projects on GitHub provided the foundation for this map.


