I recently converted an old site of mine from Drupal to a static Web site created with Logya to save some kittens' lives. I intend to write a more detailed post about the process, but will focus on a URL issue here.
Logya is flexible regarding URLs, accepting common file extensions like .html and even .php, but the most straightforward way is to end them with a forward slash. On the old Drupal site I had a mix of URLs ending with .html or with no file extension and no trailing slash, e.g. www.ramiro.org/blog/umstellung-von-joomla-auf-drupal.
In theory I could have kept all URLs as they were, because Apache redirects a request without a trailing slash to the slashed URL when the path corresponds to a directory on the server, which is the case here. In practice that was not enough: I use Disqus for comments, and the redirected URLs differed from the ones Disqus knew about.
To resolve this issue I took advantage of the Migrate Threads tool Disqus offers. You find it at your-site-id.disqus.com/admin/tools/migrate/. For cases like this you can download a file containing the URLs Disqus knows about on your site and upload a CSV file that maps old URLs to new ones, hence the name URL mapper.
To create this mapping I wrote the following short Python script, using the pandas library, which is actually meant to facilitate more sophisticated tasks like doing data analysis, but also takes the pain out of dealing with CSV files in Python.
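A minimal sketch of such a script follows. It is not necessarily the original one. It assumes the Disqus export is a one-column CSV of URLs without a header row, that the mapping file takes one `old,new` pair per row, and that URLs ending in .html are left without a trailing slash. The function and file names are made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def map_url(url):
    """Return the new URL for an old one.

    Assumed rules: strip a leading "www." from the host and append a
    slash unless the path already ends with "/" or ".html".
    """
    parts = urlsplit(url)
    host = parts.netloc
    if host.startswith('www.'):
        host = host[len('www.'):]
    path = parts.path
    if not path.endswith(('/', '.html')):
        path += '/'
    return urlunsplit((parts.scheme, host, path, parts.query, parts.fragment))


def write_mapping(infile, outfile):
    """Read the exported URLs and write the old,new mapping as CSV."""
    # Imported here so map_url stays usable without pandas installed.
    import pandas as pd

    # Assumption: one URL per row, no header row.
    urls = pd.read_csv(infile, header=None, names=['old'])
    urls['new'] = urls['old'].apply(map_url)
    # Only URLs that actually change need a mapping entry.
    changed = urls[urls['old'] != urls['new']]
    changed.to_csv(outfile, header=False, index=False)
```

With hypothetical file names, calling `write_mapping('disqus-urls.csv', 'url-map.csv')` would produce the file to upload to the URL mapper.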
In addition to appending a trailing slash where needed, the script removes the www subdomain, because short URLs are sooo en vogue. To have Apache redirect from www to non-www I added a generic rewrite rule to the server configuration.
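A common generic form of such a www to non-www redirect looks like the sketch below, which is not necessarily the exact rule used on the site. It assumes mod_rewrite is enabled, the rule sits in a per-directory .htaccess file at the document root, and the site is served over plain http.

```apache
# Assumption: per-directory (.htaccess) context with mod_rewrite enabled.
RewriteEngine On
# Match any host that starts with "www." and capture the rest as %1.
RewriteCond %{HTTP_HOST} ^www\.(.+)$ [NC]
# Permanently redirect to the same path on the host without "www.".
RewriteRule ^(.*)$ http://%1/$1 [R=301,L]
```

Because the host name is captured rather than hard-coded, the rule works for any domain. In a virtual host block the matched path starts with a slash, so the substitution would need adjusting there.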
Usually, I use the Python standard library for reading and writing CSV files, but pandas came in quite handy here. I'm curious to learn about other somewhat deviant use cases, so feel free to share yours in the comments.
This post was written by Ramiro Gómez (@yaph). Ramiro is a developer who likes open source, data mining, visualization and writing. To be informed about new posts you can subscribe to the Geeksta RSS feed.
Tags: pandas disqus python migration code