(Ab)using Pandas to Migrate Disqus Threads


I recently converted an old site of mine from Drupal to a static Web site created with Logya to save some kittens' lives. I intend to write a more detailed post about the process, but will focus on a URL issue here.

Logya is flexible regarding URLs, accepting common file extensions like .htm, .html and even .php, but the most straightforward way is to end them with a forward slash. On the old Drupal site I had a mix of URLs that either ended with .html or had no file extension and no trailing slash, e.g. www.ramiro.org/blog/umstellung-von-joomla-auf-drupal.

In theory I could have kept all URLs as they were, because Apache redirects a request without a trailing slash to the URL with one whenever the path corresponds to a directory on the server, which it does here. In practice this caused a problem: I use Disqus for comments, and the redirected URLs differed from the ones Disqus knew about, so existing threads no longer matched their pages.

To resolve this issue I took advantage of the Migrate Threads tool Disqus offers. You can find it at your-site-id.disqus.com/admin/tools/migrate/. For cases like this, the tool's URL mapper lets you download a file containing the URLs Disqus knows about on your site and upload a CSV file that maps old URLs to new ones.

To create this mapping I wrote the following short Python script, using the pandas library, which is actually meant to facilitate more sophisticated tasks like doing data analysis, but also takes the pain out of dealing with CSV files in Python.
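A minimal sketch of such a mapping script might look like the one below. It is an illustration, not the original script: it assumes the Disqus export contains one URL per line without a header row, and the names `new_url`, `build_url_map` and `write_url_map` are hypothetical.

```python
import re

import pandas as pd


def new_url(old_url: str) -> str:
    """Return the migrated form of a single URL."""
    # Drop a leading www subdomain, with or without a scheme in front of it.
    url = re.sub(r"^(https?://)?www\.", r"\1", old_url)
    # Append a trailing slash unless the URL ends with .html or already has one.
    if not url.endswith(".html") and not url.endswith("/"):
        url += "/"
    return url


def build_url_map(urls: pd.Series) -> pd.DataFrame:
    """Build a two-column old/new mapping, keeping only URLs that change."""
    df = pd.DataFrame({"old": urls.str.strip()})
    df["new"] = df["old"].map(new_url)
    return df[df["old"] != df["new"]]


def write_url_map(in_path: str, out_path: str) -> None:
    """Read the exported URL list and write the mapping as a header-less CSV."""
    # Assumption: the export has a single column of URLs and no header row.
    urls = pd.read_csv(in_path, header=None, names=["old"])["old"]
    build_url_map(urls).to_csv(out_path, header=False, index=False)
```

Calling `write_url_map` with the path of the downloaded export and a path for the output (both placeholders you choose) would produce a CSV with the old URL in the first column and the new URL in the second.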

In addition to appending a slash to URLs that don't end with .html, the script removes the www subdomain, because short URLs are sooo en vogue. To have Apache redirect from www to non-www I added the following generic rewrite rule to the .htaccess file.
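A generic rule of this kind can be sketched as follows. This is an assumption about what such a rule typically looks like, not necessarily the exact one used on the site: it presumes mod_rewrite is enabled and that the site is served over plain HTTP.

```apache
# Sketch only: redirect any www.<host> request to <host>, keeping the path.
RewriteEngine On
RewriteCond %{HTTP_HOST} ^www\.(.+)$ [NC]
RewriteRule ^ http://%1%{REQUEST_URI} [R=301,L]
```

Because the host name is captured from the request rather than hard-coded, the same rule works unchanged on any domain.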

Usually I use the Python standard library for reading and writing CSV files, but pandas came in quite handy here. I'm curious to learn about other somewhat deviant use cases, so feel free to share yours in the comments.


