Taking Web Page Screenshots with PhantomJS and Python
Taking screenshots of Web pages is a common task for Web developers and publishers, so it's a good idea to automate it. Searching for existing solutions to the problem, I found PhantomJS, a headless WebKit browser with a JavaScript API that makes this pretty easy.
Simple PhantomJS example
With just the few lines of PhantomJS code below, you fetch the Web page at http://example.com, render it, and save the screenshot as a PNG file with the specified width and height.
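// Create a page object and define the target URL and screenshot size.
var page = require('webpage').create(),
    url = 'http://example.com',
    w = 1024,
    h = 768
page.viewportSize = { width: w, height: h }
page.open(url, function(status) {
    if (status !== 'success') {
        console.log('Unable to load url: ' + url)
        // Exit here too, otherwise the phantomjs process never terminates.
        phantom.exit(1)
    } else {
        // Give the page a moment to finish rendering before capturing it.
        window.setTimeout(function() {
            // Crop the output to the viewport size.
            page.clipRect = { top: 0, left: 0, width: w, height: h }
            page.render('img.png')
            phantom.exit()
        }, 200)
    }
})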
This code creates a webpage object, defines some variables, sets the browser viewport to the given values, requests the URL, and if successful, crops the image to the desired size and saves it as img.png.
Without setting page.clipRect, the height of the created image may be bigger, since PhantomJS renders the entire Web page and not only the portion that fits into the viewport.
Of course, it's not satisfying to modify a script each time a different URL or resolution is required. Thankfully, the PhantomJS API enables you to process command-line options, so that variables can be passed to the script.
I ended up creating the webshots project. It currently consists of a Python script that simply calls phantomjs repeatedly to create screenshots for common browser resolutions or, if called with width and height arguments, creates one screenshot image with the appropriate dimensions.
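The behavior described above can be sketched roughly as follows. This is a minimal illustration and not the actual webshots code: the resolution list, the function name build_commands, the output file naming, and the argument order passed to the PhantomJS script (URL, width, height, output file) are all assumptions.

```python
# Hypothetical list of common browser resolutions (an assumption, not taken from webshots).
COMMON_RESOLUTIONS = [(1024, 768), (1280, 800), (1366, 768), (1920, 1080)]


def build_commands(url, width=None, height=None, script="webshots.js"):
    """Return one phantomjs command line per screenshot to create."""
    if width and height:
        # Explicit dimensions given: create a single screenshot.
        sizes = [(width, height)]
    else:
        # No dimensions given: fall back to the list of common resolutions.
        sizes = COMMON_RESOLUTIONS
    commands = []
    for w, h in sizes:
        output = "screenshot_%dx%d.png" % (w, h)
        # Assumed argument order: script, url, width, height, output file.
        commands.append(["phantomjs", script, url, str(w), str(h), output])
    return commands
```

Calling build_commands("http://example.com") would yield one command per entry in the resolution list, while build_commands("http://example.com", 800, 600) yields exactly one.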
Why Python?
Initially I created a JavaScript-only solution, but repeatedly calling a function that opened and rendered the page at different resolutions didn't work well without setting high timeout values. Even then, in case of errors, it either did not terminate, because phantom.exit() was never called, or produced weird JavaScript error messages.
Instead of constantly getting these errors or ending in callback hell, I decided to write a Python script that spawns one phantomjs process for each screenshot, which works well.
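One way to get this kind of isolation is sketched below, assuming Python 3 and the standard subprocess module. The function name run_isolated and the 30 second default timeout are illustrative choices, not part of webshots.

```python
import subprocess


def run_isolated(cmd, timeout=30):
    """Run one command in its own process; return True if it exited cleanly."""
    try:
        result = subprocess.run(cmd, timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child when the timeout expires,
        # so a script that never calls phantom.exit() cannot hang the batch.
        return False
    except OSError:
        # Raised, for example, when the executable is not installed.
        return False
    return result.returncode == 0
```

Because every screenshot runs in a fresh process, a failure or hang at one resolution does not affect the others; the caller can simply log the failed command and move on.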
Installing and using webshots
To use webshots you need to download and install PhantomJS first. If you run Ubuntu Linux, you can install it via sudo apt-get install phantomjs, which gives you version 1.4.0 rather than 1.6.1, the latest at the time of writing.
Then clone the webshots repository and create symbolic links to the Python and JavaScript files in a directory that contains executable scripts. On Linux-based systems you would do something along the lines of:
git clone https://github.com/yaph/webshots.git
ln -s /path/to/webshots/webshots $HOME/bin
ln -s /path/to/webshots/webshots.js $HOME/bin
If that's all done, you can create several screenshots at common browser resolutions by calling:
webshots http://example.com
or a single screenshot at the given browser resolution with:
webshots --width=800 --height=600 http://example.com
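A command-line interface of this shape could be parsed with Python's argparse module, as in the sketch below. It is not the webshots implementation: the function name parse_args and the rule that both dimensions must be given together are assumptions made for illustration.

```python
import argparse


def parse_args(argv=None):
    """Parse a URL plus optional --width and --height options."""
    parser = argparse.ArgumentParser(description="Create Web page screenshots.")
    parser.add_argument("url", help="address of the page to capture")
    parser.add_argument("--width", type=int, help="viewport width in pixels")
    parser.add_argument("--height", type=int, help="viewport height in pixels")
    args = parser.parse_args(argv)
    # Require both dimensions or neither, so a single value is not silently ignored.
    if (args.width is None) != (args.height is None):
        parser.error("--width and --height must be given together")
    return args
```

When neither option is passed, width and height are both None, which a caller could treat as the signal to create screenshots at several common resolutions.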
What's next?
Right now webshots works well for me on the one system I have used it on. A potential next step would be to release it as a PyPI package to make installation and possible future upgrades easier. If you have ideas for how to improve webshots or encounter a bug, feel free to create an issue on GitHub.
This post was written by Ramiro Gómez (@yaph).
Tags: automation code javascript python tutorial web development