Convert Web Pages to Ebooks in MOBI Format using Wget and Calibre

If you're an avid reader, you probably know that ebooks can be a great way to carry your favorite books with you wherever you go. However, what about those websites that you enjoy reading? You might not want to be online all the time just to read them. That's where converting web pages to ebooks comes in handy.

This article will walk you through how to convert web pages to ebooks in MOBI format using Wget and Calibre.

Wget

Wget is a free and open-source command-line utility for downloading files from the internet. It is available for Linux, macOS, and Windows operating systems.

In our case, we will use Wget to download web pages and all their linked assets, such as images, CSS files, and JavaScript files. Wget can also convert all links in the downloaded files to relative links so that they work offline.

Here is an example command that downloads a web page:

wget --level=inf --no-clobber --page-requisites --html-extension --convert-links --restrict-file-names=windows "https://www.example.com"

Let's break down what each option does:

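  • --level=inf: Sets the maximum recursion depth to unlimited. This only takes effect together with --recursive; without it, as in the command above, Wget fetches just the page and the files needed to display it.
  • --no-clobber: Skips files that already exist locally, so you can run the command again without re-downloading everything. Recent Wget releases print a warning and ignore this option when --convert-links is also given.
  • --page-requisites: Downloads all files needed to display the page, including CSS files, images, and JavaScript files.
  • --html-extension: Saves HTML files with a .html extension, even if the original URL didn't have one. Newer Wget versions call this option --adjust-extension and keep the old name as an alias.
  • --convert-links: Rewrites links in the downloaded files so they work offline. Links to files that were downloaded become relative links, and links to files that were not downloaded become absolute URLs.
  • --restrict-file-names=windows: Escapes characters in filenames that are not allowed on Windows (such as : or ?), so the files can be saved on any file system.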
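
The command above fetches a single page. If you also want Wget to follow links from that page, --level has to be combined with --recursive. The following is a minimal sketch under that assumption. The fetch_page function name, its optional depth argument, and the WGET override variable are made up for this example, and it uses --adjust-extension, the current name of --html-extension.

```shell
# Download a page for offline reading, optionally following links.
# fetch_page, its depth argument, and the WGET variable are illustrative
# assumptions, not part of the original article.
fetch_page() {
  url=$1
  depth=${2:-}

  # Options for a single page plus the assets needed to display it.
  set -- --page-requisites --adjust-extension --convert-links \
    --restrict-file-names=windows

  # --level is only honoured together with --recursive.
  if [ -n "$depth" ]; then
    set -- "$@" --recursive "--level=$depth"
  fi

  # WGET can point to another binary (or to echo for a dry run).
  "${WGET:-wget}" "$@" "$url"
}
```

For example, fetch_page "https://www.example.com" 1 would also download the pages linked directly from the start page, while omitting the second argument keeps the single-page behaviour. Setting WGET=echo prints the arguments instead of downloading anything.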

Calibre

Calibre is a free and open-source ebook management tool that allows you to convert ebooks between different formats. It is available for Linux, macOS, and Windows operating systems.

In our case, we will use Calibre to convert the downloaded HTML file to MOBI format. Here is an example command that converts an HTML file to MOBI format:

ebook-convert input.html output.mobi --output-profile kindle_pw

In this example, input.html is the path to the HTML file, and output.mobi is the desired output file. The --output-profile kindle_pw option specifies the output profile, which is optimized for the Kindle Paperwhite device.
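
ebook-convert also accepts metadata options such as --title and --authors, which can be handy when the page title is not what you want to see in your Kindle library. Here is a small sketch of that idea. The convert_with_metadata function name and the EBOOK_CONVERT override variable are assumptions made for this example.

```shell
# Convert an HTML file to MOBI and set title and author explicitly.
# convert_with_metadata and EBOOK_CONVERT are illustrative assumptions.
convert_with_metadata() {
  input=$1
  output=$2
  title=$3
  author=$4

  # EBOOK_CONVERT can point to another binary (or to echo for a dry run).
  "${EBOOK_CONVERT:-ebook-convert}" "$input" "$output" \
    --output-profile kindle_pw \
    --title "$title" \
    --authors "$author"
}
```

A call could look like convert_with_metadata input.html output.mobi "Example Page" "Jane Doe", where the title and author are placeholder values.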

Putting it all together

Now that we know how to use Wget and Calibre separately, let's put it all together and create a script that automates the conversion process.

Once the web page is downloaded, the script uses the find command to locate the downloaded HTML file. find searches a directory hierarchy and lists the files that match the given criteria. Here it looks for files with the .html or .htm extension in the directory named after the hostname of the provided URL, which is where Wget saves the page. The html_file variable is set to the path of the first HTML file found. If no HTML file is found, the script prints an error message and exits.

Finally, the script converts the downloaded HTML file to MOBI format using the ebook-convert command from the Calibre package, which converts between various ebook formats, including EPUB, MOBI, and PDF. As before, the --output-profile kindle_pw option selects the output profile for the Kindle Paperwhite.

Once the conversion is complete, the resulting MOBI file is saved in the same directory as the downloaded HTML file, with the same name as the HTML file but with the .mobi extension. The resulting MOBI file can be transferred to a Kindle device or app and read like any other ebook.
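
The script itself is not reproduced in this text. The steps described above could be sketched roughly as follows; this is an illustration, not the author's original script. The function names and the WGET and EBOOK_CONVERT override variables are assumptions, and the hostname extraction assumes a URL without port or credentials.

```shell
#!/bin/sh
# Sketch of the download-and-convert workflow described above.
# All function names and the WGET / EBOOK_CONVERT variables are assumptions.

# Extract the hostname from a URL (assumes no port or credentials).
host_from_url() {
  printf '%s\n' "$1" | sed -e 's|^[a-zA-Z][a-zA-Z0-9+.-]*://||' -e 's|[/?#].*$||'
}

# Print the first .html or .htm file found below the given directory.
first_html_file() {
  find "$1" -type f \( -name '*.html' -o -name '*.htm' \) 2>/dev/null | sort | head -n 1
}

# Download a page, locate its HTML file, and convert it to MOBI.
page_to_mobi() {
  url=$1
  host=$(host_from_url "$url")

  # Same options as the Wget command shown earlier. Wget may return a
  # non-zero status if a single asset fails, so the result is checked below.
  "${WGET:-wget}" --level=inf --no-clobber --page-requisites --html-extension \
    --convert-links --restrict-file-names=windows "$url"

  html_file=$(first_html_file "$host")
  if [ -z "$html_file" ]; then
    echo "Error: no HTML file found in $host" >&2
    return 1
  fi

  # Same path and name as the HTML file, with a .mobi extension.
  mobi_file="${html_file%.*}.mobi"
  "${EBOOK_CONVERT:-ebook-convert}" "$html_file" "$mobi_file" --output-profile kindle_pw
}

# Run only when a URL is passed on the command line.
if [ "$#" -gt 0 ]; then
  page_to_mobi "$1"
fi
```

Saved under a name of your choice, for example web2mobi.sh (a hypothetical filename), it could be run as sh web2mobi.sh "https://www.example.com".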


This post was written by Ramiro Gómez (@yaph).

Tags: code, command line, document conversion, tutorial
