Amazon To Bookwyrm Browser Extension

Overview

I’ve created a Firefox extension to make it easy to populate BookWyrm’s “Create Book” form with information from Amazon.

This post describes the approach taken, and provides info for anyone who wishes to create a fork that can extract data from other web sites.

If you just want the extension, you can find it on the Firefox Add-ons site: https://addons.mozilla.org/amazon-to-bookwyrm

⚠️ Note: This post is being written before the extension has made its way through the approval process. If that link doesn’t work, you can either try again tomorrow or - if you’re a geek - install it from the source code (see below).

A screenshot of the extension being used on an Amazon book's product listing.

Why this exists

BookWyrm is a wonderful federated book review platform, but adding a new book that doesn’t already exist in the database requires manually entering all the book’s metadata. This is tedious and time-consuming, especially when all that information already exists on Amazon.

Most of the books I read aren’t in BookWyrm’s database yet, and needing to manually look up and enter all the data for a book has kept me from writing a lot of reviews I really wanted to write.

So, I created a browser extension to copy the data from Amazon.

What it does and why

It would be lovely to use Amazon’s books API to get nice structured data for books. However, that API requires AWS account credentials and isn’t free. Running a service that used their API to populate a site like BookWyrm would probably violate their Terms of Service too. We shouldn’t expect to see this kind of functionality within BookWyrm itself, because its creators would likely get in trouble with Amazon.

So, this extension just reads the HTML of the Amazon product page you’re viewing and checks known locations in it for the information you’ll need to populate the “Create Book” form on your BookWyrm instance.
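For anyone forking this for another site, the “check known locations” idea can be sketched roughly as follows. This is a hypothetical TypeScript sketch, not the extension’s actual code: `firstMatch`, `extractFields`, and the `FIELD_SELECTORS` table are invented names, and the selectors are illustrative guesses that may not match Amazon’s current markup. The DOM lookup is passed in as a function so the logic can be exercised without a browser.

```typescript
// Hypothetical sketch: for each form field, try a list of candidate CSS
// selectors in order and keep the first non-empty text found.
// The selectors are illustrative guesses, not the extension's real ones.

// Returns the text content for a selector, or null if nothing matched.
type Lookup = (selector: string) => string | null;

// Hypothetical field -> candidate-selector table (most specific first).
const FIELD_SELECTORS: Record<string, readonly string[]> = {
  title: ["#productTitle", "#ebooksProductTitle"],
  series: ["#seriesBulletWidget_feature_div a"],
};

function firstMatch(lookup: Lookup, selectors: readonly string[]): string | null {
  for (const selector of selectors) {
    const text = lookup(selector)?.trim();
    if (text) {
      return text; // first non-empty hit wins
    }
  }
  return null; // nothing found: leave the field blank for the user to fill in
}

function extractFields(lookup: Lookup): Record<string, string | null> {
  const fields: Record<string, string | null> = {};
  for (const [field, selectors] of Object.entries(FIELD_SELECTORS)) {
    fields[field] = firstMatch(lookup, selectors);
  }
  return fields;
}

// In a content script, the lookup would be backed by the live page, e.g.:
// extractFields((sel) => document.querySelector(sel)?.textContent ?? null);
```

Keeping a list of fallback selectors per field means that when a site changes its markup, a fork can usually just add another selector rather than rewrite the extraction logic.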

It does not submit the form, because you really need to verify the data it extracts. There are lots of edge cases that can lead to odd results. Series, for example, is the biggest offender. When a book is part of a series, some authors/publishers set up the book data so that it shows only the series name; others include the book’s position in the series or the total number of books, each with its own formatting; and some put the series in the title itself. Using Brandon Sanderson’s “The Final Empire” as an example, here are all the ways I’ve seen a book recorded so far:

  • title: The Final Empire, series: Mistborn
  • title: The Final Empire, series: Book 1 of 7: Mistborn
  • title: The Final Empire, series: Part of: Mistborn (7 books)
  • title: The Final Empire: Mistborn Book 1, series: Book 1 of 7: Mistborn
  • title: The Final Empire (Mistborn Book 1), series: Book 1 of 7: Mistborn

And all sorts of other combinations of the above. Extracting the “correct” data is pretty hard when there’s no consistency in how that data appears in the source.
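To illustrate why this is hard, here is a hypothetical TypeScript sketch that normalizes only the series formats listed above. `parseSeries`, `stripSeriesFromTitle`, and `SeriesInfo` are invented for illustration and are not taken from the extension. Anything that doesn’t match a known pattern falls through to being treated as a bare series name, which is exactly the kind of guess you’d need to check by hand.

```typescript
// Hypothetical sketch (not the extension's actual code): normalize the
// series strings listed above into a name plus optional number/total.

interface SeriesInfo {
  name: string;
  number?: number;     // position within the series, e.g. 1
  totalBooks?: number; // number of books in the series, e.g. 7
}

function parseSeries(raw: string): SeriesInfo {
  const s = raw.trim();

  // "Book 1 of 7: Mistborn"
  let m = s.match(/^Book\s+(\d+)\s+of\s+(\d+):\s*(.+)$/i);
  if (m) {
    return { name: m[3].trim(), number: Number(m[1]), totalBooks: Number(m[2]) };
  }

  // "Part of: Mistborn (7 books)"
  m = s.match(/^Part of:\s*(.+?)\s*\((\d+)\s+books?\)$/i);
  if (m) {
    return { name: m[1], totalBooks: Number(m[2]) };
  }

  // Anything else is assumed to be a bare series name, e.g. "Mistborn".
  return { name: s };
}

// Remove a trailing series reference such as ": Mistborn Book 1" or
// " (Mistborn Book 1)" from a title, given the series name.
function stripSeriesFromTitle(title: string, seriesName: string): string {
  // Escape regex metacharacters in the series name.
  const escaped = seriesName.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const suffix = new RegExp(
    `(?::\\s*${escaped}\\s+Book\\s+\\d+|\\s*\\(${escaped}\\s+Book\\s+\\d+\\))$`,
    "i",
  );
  return title.replace(suffix, "").trim();
}
```

Even this handles only the exact wordings listed above; any other variation slips through, which is why the extension leaves the final check to you.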

We don’t want to fill your BookWyrm instance with bad data. So, please take the time to validate all of the copied data before submitting, and, obviously, search for the book on your BookWyrm instance before creating a new one. It’s much better to use / improve an existing book that others have been reading / reviewing than to create a duplicate.

Why Only Amazon?

It doesn’t currently support other sites for one simple reason: tons of indie authors only publish their books on Amazon. This is largely because it’s a requirement if you want your book to be available via Kindle Unlimited. There are lots of reasons this is terrible for authors, and publishers, and readers, but that’s a separate discussion. What’s important for the purposes of this plugin is that regardless of how you feel about them, Amazon is the one place where you can consistently go and find the book you want to import to BookWyrm.

It currently only supports amazon.com and amazon.co.uk, because most - or all - of the other Amazon sites are in languages I can’t read, and thus I can’t verify that the extension works on them.

Why only Firefox?

Because that’s what I use. Well, technically I use Waterfox because Mozilla seems determined to make Firefox awful. I might make a Chrome extension, but - for many reasons - I don’t like using Chrome, so I’m not strongly motivated to go through all the work. 😉

If you’re a motivated geek, I’ll happily accept a Pull Request that creates a “chrome” folder containing a Chrome extension that reuses the core code. Or just fork it and turn it into a Chrome-only extension. Have fun.

Geekery

The GitHub repo can be found here: https://github.com/masukomi/firefox_amazon_to_bookwyrm/

I’ve endeavored to include all the info you’ll need to get started modifying this to work on other sites.

Please fork it, make changes, submit Pull Requests, all the good stuff.