GitHub Commit Map
This map displays location information extracted from 6,826,827 commit messages from the public GitHub timeline on June 23, 2012.
Country Identification
Users can enter virtually anything they like in their GitHub Location setting. Some real-world examples include "right behind you", "Earth", "The moon", "arrakis" and "The Internet". Fortunately, most of the locations provided are more realistic and allow for automatic identification of countries.
To do so, I wrote some Python scripts, which you can find in this GitHub repo. As a byproduct, I created a Python library called geonamescache, which provides access to a small part of the public data available from the GeoNames database without requiring requests to their web service.
Caveats
While the official names for the world's countries are unique, city names are not. There is more than one San Francisco, but when a user specified "San Francisco", I assumed the largest city with this name, i.e. the one in the US. The same applies to many other city names.
I also identified some countries manually, thanks to Google Search, but had to draw the line at some point, so location data from 197,139 commits (not included in the number above) remains unresolved.
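The lookup order described above can be sketched roughly as follows. This is not the code from the repo: the tiny COUNTRIES and CITIES tables, the resolve_country function and the population figures are all hypothetical placeholders that stand in for the GeoNames data, and the real scripts may work differently.

```python
# Hypothetical miniature gazetteer standing in for the GeoNames data.
# Population figures are rough placeholders, not authoritative values.
COUNTRIES = {"germany": "DE", "united states": "US", "argentina": "AR"}

CITIES = [
    # (city name, ISO country code, population)
    ("San Francisco", "US", 800_000),
    ("San Francisco", "AR", 60_000),
    ("Berlin", "DE", 3_400_000),
]


def resolve_country(location):
    """Return an ISO country code for a free-text location, or None."""
    # Split "Berlin, Germany" into lower-cased parts: ["berlin", "germany"].
    parts = [p.strip().lower() for p in location.split(",") if p.strip()]

    # Country names are unique, so try them first, last part first.
    for part in reversed(parts):
        if part in COUNTRIES:
            return COUNTRIES[part]

    # City names are not unique: assume the most populous match.
    for part in parts:
        matches = [city for city in CITIES if city[0].lower() == part]
        if matches:
            return max(matches, key=lambda city: city[2])[1]

    # Nothing matched, e.g. "right behind you".
    return None
```

With these placeholder tables, resolve_country("San Francisco") picks the US entry because it has the larger population, while a location such as "The moon" stays unresolved and returns None.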
What is on the Map
By default the map displays the number of commits per 100,000 inhabitants using different color values from a diverging color range.
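As a small illustration of the default metric, here is a sketch of how commits per 100,000 inhabitants can be computed. The function name commits_per_100k and the handling of a missing population are my assumptions for this example, not something taken from the map's code.

```python
def commits_per_100k(commits, population):
    """Return commits per 100,000 inhabitants, or None without a population."""
    # A missing or zero population makes the ratio undefined.
    if not population:
        return None
    # Multiply before dividing to keep round inputs exact.
    return commits * 100_000 / population
```

For a hypothetical country with 500 commits and 1,000,000 inhabitants, this gives 50 commits per 100,000 inhabitants.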
To make the map more interactive and hopefully more useful, I added controls for choosing different regions of the world, other data (i.e. total number of commits and country population), color schemes, and the data range to be displayed.
The latter is especially important because one GitHub user from Pitcairn is responsible for a huge gap between the highest commit ratio and the runners-up. By adjusting the color range maximum, you can highlight the differences between the other countries' ratios.
The map uses the HTML5 range input element. Of the three browsers I tested, i.e. Chrome, Firefox and Opera, it works best in Chrome.
If you want to play around with the code for this map, you can fork this gist. The map data is available as a public Google spreadsheet.
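To see why lowering the color range maximum helps, consider this minimal sketch of clamping a value to a chosen maximum before mapping it onto the color scale. The function color_position is hypothetical and only illustrates the idea; on the map itself the charting library does this work.

```python
def color_position(value, range_max):
    """Map a value onto 0.0..1.0 of the color scale, clamped at range_max."""
    if range_max <= 0:
        raise ValueError("range_max must be positive")
    # Values above the maximum all get the color at the top of the scale.
    clamped = min(max(value, 0.0), range_max)
    return clamped / range_max
```

Take made-up ratios of 30, 50 and an outlier of 2000. With the maximum at 2000, the first two land at 0.015 and 0.025 and look nearly identical. With the maximum lowered to 100, they move to 0.3 and 0.5, while the outlier simply stays at the top of the scale.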
Credits
Data for the map was obtained from the GitHub Archive public dataset via Google's BigQuery service. The map visualization uses the Google Chart Tools JavaScript library and API and color schemes from the Colorbrewer tool.
Last but not least, the hundreds of thousands of developers who host their open source projects on GitHub provided the foundation for this map.
This post was written by Ramiro Gómez (@yaph) and published on . Subscribe to the Geeksta RSS feed to be informed about new posts.
Tags: data analysis git github map visualization