Taking Web Page Screenshots with PhantomJS and Python

Taking screenshots of Web pages is a common task for Web developers and publishers, so it's a good idea to automate it. Searching for existing solutions, I found PhantomJS, a headless WebKit browser with a JavaScript API that makes this pretty easy.

Simple PhantomJS example

With just the few lines of PhantomJS code below, you fetch the Web page at http://example.com, render it, and save the screenshot as a PNG file with the specified width and height.

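var page = require('webpage').create(),
    url = 'http://example.com',
    w = 1024,
    h = 768

// Size of the virtual browser window used for layout
page.viewportSize = { width: w, height: h }
page.open(url, function(status) {
    if (status !== 'success') {
        console.log('Unable to load url: ' + url)
        // Always exit, otherwise the phantomjs process keeps running
        phantom.exit(1)
    } else {
        // Give the page a moment to finish rendering
        window.setTimeout(function() {
            // Crop the output to the viewport, then save it
            page.clipRect = { top: 0, left: 0, width: w, height: h }
            page.render('img.png')
            phantom.exit()
        }, 200)
    }
})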

This code creates a webpage object, defines some variables, sets the browser viewport to the given values, requests the URL, and if successful, crops the image to the desired size and saves it as img.png.

Without setting page.clipRect, the created image may be taller than the requested height, since PhantomJS renders the entire Web page and not only the portion that fits into the viewport.

Of course, it's not satisfying to modify a script each time a different URL or resolution is required. Thankfully, the PhantomJS API enables you to process command-line options, so that variables can be passed to the script.

I ended up creating the webshots project. It currently consists of a Python script that calls phantomjs repeatedly to create screenshots for common browser resolutions or, if called with width and height arguments, creates a single screenshot with those dimensions.
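To illustrate the idea, here is a minimal sketch of such a wrapper. It is not the actual webshots source: the list of resolutions, the output file names, and the order of the arguments passed to webshots.js are all assumptions made for this example.

```python
import subprocess

# Assumed set of common resolutions; the real webshots list may differ.
COMMON_RESOLUTIONS = [(320, 480), (800, 600), (1024, 768), (1280, 800), (1920, 1080)]


def build_command(url, width, height, script="webshots.js"):
    """Return the phantomjs command line for one screenshot.

    The argument order (url, width, height, output file) is hypothetical.
    """
    output = "screenshot_{}x{}.png".format(width, height)
    return ["phantomjs", script, url, str(width), str(height), output]


def take_screenshots(url, width=None, height=None):
    """Run phantomjs once per resolution, or once if width and height are given."""
    if width and height:
        resolutions = [(width, height)]
    else:
        resolutions = COMMON_RESOLUTIONS
    for w, h in resolutions:
        # One short-lived phantomjs process per screenshot.
        subprocess.call(build_command(url, w, h))
```

Keeping the command construction in its own function makes it easy to check what would be executed without having PhantomJS installed.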

Why Python?

Initially I created a JavaScript-only solution, but repeatedly calling a function that opened and rendered the page at different resolutions didn't work well without high timeout values. Even then, when errors occurred, it either failed to terminate because phantom.exit() was never called, or produced weird JavaScript error messages.

Instead of constantly running into these errors or ending up in callback hell, I decided to write a Python script that spawns one phantomjs process for each screenshot, which works well.
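A side benefit of one process per screenshot is that a stuck run can be stopped from the outside. The following sketch shows one way to do that in Python 3; the run_once helper and the timeout are additions for this example, not something webshots is known to do.

```python
import subprocess
import sys


def run_once(command, timeout=30):
    """Run one command and report whether it finished successfully.

    subprocess.run kills the child when the timeout expires, so a
    script that never reaches phantom.exit() cannot block the batch.
    """
    try:
        result = subprocess.run(command, timeout=timeout)
    except subprocess.TimeoutExpired:
        print("Timed out: " + " ".join(command), file=sys.stderr)
        return False
    except OSError as err:
        # For example, phantomjs is not installed or not on the PATH.
        print("Could not start {}: {}".format(command[0], err), file=sys.stderr)
        return False
    return result.returncode == 0
```

A failed or hanging screenshot then only costs one process, and the remaining resolutions can still be rendered.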

Installing and using webshots

To use webshots, you first need to download and install PhantomJS. If you run Ubuntu Linux, you can install it via sudo apt-get install phantomjs, which, at the time of writing, gives you version 1.4.0 rather than the latest release, 1.6.1.

Then clone the webshots repository and create symbolic links to the Python and JavaScript files in a directory that is on your PATH. On Linux-based systems you would do something along the lines of:

git clone https://github.com/yaph/webshots.git
ln -s /path/to/webshots/webshots $HOME/bin
ln -s /path/to/webshots/webshots.js $HOME/bin

Once that's done, you can create several screenshots at common browser resolutions by calling:

webshots http://example.com

or a single screenshot at the given browser resolution with:

webshots --width=800 --height=600 http://example.com
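For readers curious how such a command line could be handled in Python, here is a small sketch using argparse. The option names match the calls above, but the parse_args function, the help texts, and the rule that both options must be given together are assumptions, not taken from the webshots source.

```python
import argparse


def parse_args(argv=None):
    """Parse a URL plus optional --width and --height options."""
    parser = argparse.ArgumentParser(description="Create screenshots of a Web page.")
    parser.add_argument("url", help="address of the page to capture")
    parser.add_argument("--width", type=int, help="viewport width in pixels")
    parser.add_argument("--height", type=int, help="viewport height in pixels")
    args = parser.parse_args(argv)
    # Width and height only make sense as a pair.
    if (args.width is None) != (args.height is None):
        parser.error("--width and --height must be given together")
    return args
```

When neither option is present, both values are None, which a caller could take as the signal to fall back to a list of common resolutions.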

What's next?

Right now webshots works well for me on the one system I've used it on. A potential next step would be to release it as a PyPI package to make installation and future upgrades easier. If you have ideas for how to improve webshots or run into trouble, feel free to leave feedback in the comments below or create an issue on GitHub.
