(Ab)using Pandas to Migrate Disqus Threads

I recently converted an old site of mine from Drupal to a static Web site created with Logya to save some kittens' lives. I intend to write a more detailed post about the process, but will focus on a URL issue here.

Logya is flexible regarding URLs, accepting common file extensions like .htm, .html and even .php, but the most straightforward way is to end them with a forward slash. On the old Drupal site I had a mix of URLs that either ended with .html or had no file extension and no trailing slash, e.g. www.ramiro.org/blog/umstellung-von-joomla-auf-drupal.

In theory I could have kept all URLs as they were, because Apache takes care of the redirect if the path corresponds to a directory on the server, which it does. In practice this caused a problem, since I use Disqus for comments and the redirected URLs differed from the ones Disqus knew about.

To resolve this issue I took advantage of the Migrate Threads tool Disqus offers. You can find it at your-site-id.disqus.com/admin/tools/migrate/. For cases like this you can download a file containing the URLs Disqus knows about on your site and upload a CSV file that maps old URLs to new ones, hence the name URL mapper.

To create this mapping I wrote a short Python script using the pandas library. pandas is actually meant for more sophisticated tasks like data analysis, but it also takes the pain out of dealing with CSV files in Python.
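A minimal sketch of such a script might look like the following. It assumes the Disqus export is a CSV file with one URL per line and no header row, and that the URL mapper expects rows of the form old URL, new URL. The names `map_url` and `build_mapping` are hypothetical helpers chosen for illustration, not part of pandas or Disqus.

```python
import re

import pandas as pd


def map_url(url: str) -> str:
    """Return the new URL for an old one (hypothetical helper)."""
    # Remove the www subdomain, keeping an optional http(s):// scheme.
    new_url = re.sub(r'^(https?://)?www\.', r'\1', url)
    # Append a slash unless the URL ends with .html or already has one.
    if not new_url.endswith(('.html', '/')):
        new_url += '/'
    return new_url


def build_mapping(source, target):
    """Read old URLs from source and write old,new pairs to target."""
    # Assumption: one URL per line, no header row in the export.
    df = pd.read_csv(source, header=None, names=['old'])
    df['new'] = df['old'].apply(map_url)
    # Keep only the rows where the URL actually changes.
    df = df[df['old'] != df['new']]
    # Write the mapping without header and index columns.
    df.to_csv(target, header=False, index=False)
    return df
```

Calling `build_mapping('disqus-urls.csv', 'url-map.csv')` with file names of your choice would then produce the file to upload. Both arguments can be paths or file-like objects, since `read_csv` and `to_csv` accept either.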

In addition to appending a slash to URLs that don't end with .html, the script removes the www subdomain, because short URLs are sooo en vogue. To have Apache redirect from www to non-www I added a generic rewrite rule to the .htaccess file.

Usually, I use the Python standard library for reading and writing CSV files, but pandas came in quite handy here.

This post was written by Ramiro Gómez (@yaph). Subscribe to the Geeksta RSS feed to be informed about new posts.

Tags: migration pandas python tutorial

