TL;DR: Amazon turns a blind eye to book piracy. A cynical take is that they do it to line their pocketbooks.

Gaming the system

Why would the world's largest book marketplace encourage piracy?

A shady side of Amazon rears its head and is the bane of book authors everywhere. This is not limited to self-published books; it also affects books published by the largest publishing houses. I'm talking about the rampant piracy on Amazon's book marketplace, enabled by KDP. (I'm going to focus on KDP and ignore the equally problematic issue of counterfeit or "bootleg" books that are also frequently available on Amazon.)

KDP (Kindle Direct Publishing) has minted a new generation of authors. You are no longer at the mercy of a publishing house. Simply publish your content (as Hugh Howey, author of Wool, did) and wait for the checks to come rolling in (some estimate Hugh makes $4M per year). Now, like all content games, there is long-tail behavior here and Howey is an outlier. However, even small-market books can provide a nice income (my books' monthly sales would generally pay my mortgage).

When you disrupt a market and ease the ability to play, you will find folks gaming the system. And some do it in less reputable ways.

Book Reception

My most recent book, Effective Pandas, was released at the end of 2021. Reception and reviews for the book have been excellent. In the past, I relied solely upon Amazon for book distribution (which is fair since they are the largest platform).

"This collection will get you close to 98%-99% of all the necessary core skills to be a good Data Scientists." @tunguz

However, for this book, I changed things up slightly. I only offered the physical book on Amazon and sold the digital book from my own store (Amazon forces you to price ebooks between $3 and $9.99, and I felt that my book offered much more value than that). I also wanted to make sure that I'm building up my own platform to future-proof myself.

Amazon can handle everything with the physical book and I'll gladly give them a cut for dealing with that. (OK, one issue is book cost. This is a full-color book and some complain that the physical book is too expensive in places like India. I don't really have a solution here, as creating a greyscale version would IMO drastically degrade the reading experience.)

On my platform, I get a large cut of a book that is priced more in line with the value it provides. I also get to interact with readers and provide more Pandas best practices via my mailing list. Admittedly, I am missing out on some organic Amazon users who don't want the physical book.

The scheme has worked out so well that the book is routinely recommended and is ranked relatively high on Amazon. And you can reverse engineer proceeds from Amazon ranks. Enterprising folks, I'll just come out and call them "pirates", do exactly that. They look for ways to ride on the coattails of successful books.

Two Recent Experiences

In March of 2022, I was made aware that some readers were upset with the formatting of my Kindle version. This surprised me because I didn't have a Kindle version! Or at least I thought that was the case until I went to Amazon and found that a Kindle version had been uploaded! Right on my book page, there was now a Kindle version available.

You might think, "Why do you care, someone else took the effort to put up a Kindle version, so you can sit back and enjoy the proceeds?"

Well, I don't get the proceeds from the Kindle version. The anonymous uploader does. And Amazon let them easily upload a book as if they were me.

This was a time when my book was selling very well. And this pirate was able to jump in and capitalize on that.

This ebook had major formatting issues that my ebooks don't suffer from, so my worst review at the time was from a book I didn't even create.

In June 2022, I was made aware that the content of my book had been stolen and uploaded to Amazon under a new title, Best Practices for Manipulating Data with Pandas (Treading on Python). The author, "Samuel Kramer", has also re-published Francois Chollet's wonderful deep learning book, Sebastian Raschka's ML book, Dan Roberts's deep learning book, and Stefan Jansen's Algorithmic Trading book. (Interesting that many of these are in @tunguz's tweet in the image above.)

My book is self-published, but the others are published by Packt, Cambridge University Press, and Manning.

This is not just a problem with self-published content.

The Flex Tape Solution

Dealing with piracy is a never-ending game of whack-a-mole. I get that some folks legitimately can't afford a book at full price. (This is one of the advantages of running my own platform: I can provide pricing parity discounts.) And other people seem to enjoy the thrill of collecting "free" things. They love a deal.

However, when your biggest partner undermines you, it kind of hurts. Even more so when this is a problem that could easily be solved by a summer intern. (I'm happy to consult with Amazon to fix this.)

I'll jump through the hoops that Amazon sets up to get this content removed: playing email tag, proving that I own the content and that my copyright is being violated, all while the pirates continue to profit from my work. After a week or so, Amazon will take down the offending content.

I don't know who created and uploaded either book, but I estimate that the person who uploaded my unauthorized Kindle version made around $1,000 during the short period that it was available. It doesn't look like "Samuel Kramer" has many sales right now, but that could change quickly if they learn how to game a few other things to boost the rank of these books.

Why this won't get fixed

In addition to teaching machine learning, I also consult around ML.

This problem could be fixed easily with ML.
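
To make that claim a little more concrete, here is a minimal sketch of the kind of check a marketplace could run when a new title is uploaded. It is not ML proper, and it is certainly not Amazon's actual process. It just compares overlapping word n-grams ("shingles") between an upload and an existing title using Jaccard similarity. The function names, the shingle size, and the 0.5 threshold are all assumptions made up for illustration.

```python
import re


def shingles(text, size=5):
    """Return the set of overlapping word n-grams ("shingles") in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        # Text shorter than one shingle: treat the whole thing as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def looks_like_copy(upload, catalog_text, threshold=0.5):
    """Flag an upload whose shingles overlap heavily with an existing title.

    threshold is a hypothetical cutoff, not a tuned value.
    """
    return jaccard(shingles(upload), shingles(catalog_text)) >= threshold
```

A verbatim re-upload under a new title and author name would score close to 1.0 against the original, so even a crude check like this would catch it. A real system would need to handle formatting differences, excerpts, and legitimate quotations, which is where a trained model would come in.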

However, the best ML projects are projects that make a boatload of money or save money. What happens if Amazon fixes this gaping hole? The cynic in me says they don't care to because they get paid either way, for both real books and pirated books.

Could they save money? I generally have four back-and-forth emails to resolve issues like this. I'm not sure of the backend process but I think it involves multiple continents, managers, and manual processes. Let's say it takes 1-3 hours at $100/hour overhead. These are the business decisions that actually go into deploying ML. I can't imagine that Jeff and folks haven't run the calculus and determined that turning a blind eye is better for their pocketbooks.
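
The back-of-the-envelope estimate above (1-3 hours at $100/hour) works out as follows. The hours and the hourly rate are my guesses from the paragraph above. The number of takedowns per year in the usage example is purely hypothetical, since I have no idea how many requests Amazon actually handles.

```python
def takedown_cost(hours, hourly_overhead=100):
    """Dollar cost of manually handling one takedown request."""
    return hours * hourly_overhead


def annual_cost(takedowns_per_year, hours, hourly_overhead=100):
    """Dollar cost of manually handling a year's worth of takedown requests."""
    return takedowns_per_year * takedown_cost(hours, hourly_overhead)


low = takedown_cost(1)
high = takedown_cost(3)
print(f"Manual cost per takedown: ${low}-${high}")

# Hypothetical volume, for illustration only.
print(f"At 1,000 takedowns a year: ${annual_cost(1000, 1):,}-${annual_cost(1000, 3):,}")
```

Whether an automated fix is worth it depends on comparing a number like this against both the cost of building the fix and the revenue lost from no longer selling pirated copies.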

Takeaways for authors

What are the takeaways for authors? Piracy is a never-ending game. When you release something on the internet you need to be aware that it will be pirated.

Your publisher (or you, if you self-publish) will probably do little to counter piracy. You need to determine what your stance will be. I've grown into the opinion that someone who steals my book is not a customer that I'm interested in bending over backward to cater to. I mostly ignore them. I would like them to consider how they would feel if I asked them to perform their work for me for free.

However, if someone is making a US salary and pushing others in a similar position to steal books, that is not cool. (I was at a meetup where the coordinator shared a drive of pirated books with attendees. I called him out and he wasn't happy.)

Amazon enabling this is pretty depressing. I don't have more words to describe it. One thought is to stop writing books altogether.

Moving to platforms that give authors more control might be comforting but can have financial ramifications. Amazon has a huge search engine that enables purchases with a single click. If you can crack that nut, the algorithm can generate many sales.

I'm positive that I have fewer readers because my ebook is only available on my store and not Amazon, but I'm OK with that. I want to be able to provide extra value to readers and I don't feel like Amazon enables that.

Another way of phrasing this is "own your platform". If Amazon, Twitter, YouTube (or insert social media platform) decides to cancel you (which happens), you still have access to your customers.

What are your thoughts?