After a long day of manual labor last weekend, I spent a couple of minutes relaxing by converting some PDFs to mobi files so my mom could read them on her Kindle. Her Kindle supports PDF, but reading PDFs on a Kindle (especially the non-DX e-ink models) is a pain. You can zoom into sections, but that isn't practical for long reading, and the default fonts are too small. One of her favorite features is bumping up the font size on mobi files, especially at night. So I obliged.
To that end, I’ve created some code that helps with this process: ebookgenerators. Most readers don’t care about the mechanics of converting PDF to mobi, but how I cleaned up the text might be interesting. I used a chain of Python generators!
A few readers have complained that my Iteration and Generator book doesn’t have enough real-world examples. Some form of this blog post will probably end up as a chapter there. So without further ado, here’s some code.
A concept I briefly mention in my book is a Peeker class. A peeker lets you look ahead during iteration, which is useful when deciding what to output requires seeing more than one item. Here’s mine:
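A minimal sketch of such a Peeker follows. It assumes a `peek(n)` method that looks `n` items ahead and raises `PeekDone` when the input runs out. The method name, its signature, and the deque buffer are assumptions for illustration, not necessarily how the original class works.

```python
from collections import deque


class PeekDone(Exception):
    """Raised by Peeker.peek() when there are not enough items left to look ahead."""


class Peeker:
    """Iterator wrapper that can look ahead without consuming items."""

    def __init__(self, iterable):
        self._it = iter(iterable)
        # Items pulled from the underlying iterator but not yet handed out by __next__
        self._buffer = deque()

    def __iter__(self):
        return self

    def __next__(self):
        # Serve already-peeked items first, then fall back to the real iterator
        if self._buffer:
            return self._buffer.popleft()
        return next(self._it)

    def peek(self, n=1):
        """Return the item n positions ahead (n=1 is the next item) without consuming it."""
        if n < 1:
            raise ValueError("n must be at least 1")
        while len(self._buffer) < n:
            try:
                self._buffer.append(next(self._it))
            except StopIteration:
                # Signal exhaustion with an exception instead of a sentinel return value
                raise PeekDone() from None
        return self._buffer[n - 1]
```

With `p = Peeker("abc")`, `p.peek()` returns `'a'` without consuming it, so a following `next(p)` also returns `'a'`. Anything pulled in by a failed `peek` stays buffered, so no items are lost.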
I use the PeekDone exception to signal that there is nothing left to peek at, rather than returning a special sentinel value. Here’s an example of a generator that removes double blank lines from text using Peeker:
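A sketch of that filter follows. To keep it runnable on its own, it includes a minimal one-item-lookahead `Peeker`. The name `remove_double_blanks` is hypothetical. Blank lines are detected with `strip()`, so the filter works whether or not lines keep their trailing newlines.

```python
class PeekDone(Exception):
    """Signals that there is nothing left to peek at."""


class Peeker:
    """Minimal iterator wrapper with one item of lookahead."""

    _EMPTY = object()  # marks "nothing buffered"

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._next = self._EMPTY

    def __iter__(self):
        return self

    def __next__(self):
        if self._next is not self._EMPTY:
            item, self._next = self._next, self._EMPTY
            return item
        return next(self._it)

    def peek(self):
        if self._next is self._EMPTY:
            try:
                self._next = next(self._it)
            except StopIteration:
                raise PeekDone() from None
        return self._next


def remove_double_blanks(lines):
    """Yield lines, collapsing each run of blank lines into a single blank line."""
    p = Peeker(lines)
    for line in p:
        if not line.strip():
            try:
                # Drop this blank line if the next one is blank too;
                # only the last blank of a run gets through
                if not p.peek().strip():
                    continue
            except PeekDone:
                pass  # trailing blank line at end of input: keep it
        yield line
```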
Someone fluent in awk could probably do that in two lines. But here’s one I wouldn’t want to attempt in awk:
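Here is a hedged illustration of a filter that needs real lookahead. The hypothetical `join_hyphenated` generator rejoins words split across lines with a trailing hyphen, a common artifact of PDF text extraction. It shows the kind of multi-line step meant here and is not the original code. It assumes lines without trailing newlines and repeats a minimal `Peeker` so it runs standalone.

```python
class PeekDone(Exception):
    """Signals that there is nothing left to peek at."""


class Peeker:
    """Minimal iterator wrapper with one item of lookahead."""

    _EMPTY = object()  # marks "nothing buffered"

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._next = self._EMPTY

    def __iter__(self):
        return self

    def __next__(self):
        if self._next is not self._EMPTY:
            item, self._next = self._next, self._EMPTY
            return item
        return next(self._it)

    def peek(self):
        if self._next is self._EMPTY:
            try:
                self._next = next(self._it)
            except StopIteration:
                raise PeekDone() from None
        return self._next


def join_hyphenated(lines):
    """Hypothetical cleanup step: rejoin words hyphenated across line breaks."""
    p = Peeker(lines)
    for line in p:
        # Keep merging while the line ends in a hyphen and the next line
        # starts with a lowercase letter (i.e. looks like a word continuation)
        while line.endswith("-"):
            try:
                nxt = p.peek()
            except PeekDone:
                break
            if not nxt[:1].islower():
                break
            next(p)  # consume the continuation line
            line = line[:-1] + nxt
        yield line
```

The lowercase check is only a heuristic. It will also merge a genuine compound that happens to break at a line end (`well-` followed by `known` becomes `wellknown`), so the output still deserves a skim.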
Here’s a simple generator that doesn’t need Peeker. I need to ensure that paragraphs have an empty line between them, because docutils only starts a new paragraph after a blank line:
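One way to do this without lookahead is to remember whether the previous line was blank and emit an empty line before any paragraph start that directly follows text. This sketch assumes, hypothetically, that the extracted text marks each new paragraph with an indented first line. The function name `separate_paragraphs` is illustrative.

```python
def separate_paragraphs(lines):
    """Insert a blank line before each indented paragraph start that follows text.

    Assumption (hypothetical): a new paragraph begins with an indented line.
    """
    prev_blank = True  # treat the start of input as if it followed a blank line
    for line in lines:
        starts_paragraph = line[:1].isspace() and line.strip() != ""
        if starts_paragraph:
            if not prev_blank:
                yield ""
            # Drop the indent; in reStructuredText an indented paragraph
            # would otherwise be parsed as a block quote
            line = line.lstrip()
        yield line
        prev_blank = line.strip() == ""
```

Stripping the indent matters as much as adding the blank line. In reStructuredText, an indented paragraph is parsed as a block quote.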
In the end, using a chain of these generators, I was able to generate three mini-ebooks for my mother before she left for a week-long cruise.
My scripts for cleaning up the text looked something like this:
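Here is a hedged, self-contained sketch of chaining such generators, where each step takes an iterable of lines and returns a generator. The step names (`rstrip_lines`, `drop_page_numbers`, `collapse_blanks`) and the `clean` helper are hypothetical and are not the ebookgenerators API. Dropping digit-only lines as page numbers is likewise an assumed example step.

```python
import re
from functools import reduce


def rstrip_lines(lines):
    """Remove trailing whitespace from every line."""
    for line in lines:
        yield line.rstrip()


def drop_page_numbers(lines):
    """Skip lines consisting only of digits (hypothetical: leftover PDF page numbers)."""
    for line in lines:
        if not re.fullmatch(r"\s*\d+\s*", line):
            yield line


def collapse_blanks(lines):
    """Collapse runs of blank lines into one by remembering the previous line."""
    prev_blank = False
    for line in lines:
        blank = not line.strip()
        if not (blank and prev_blank):
            yield line
        prev_blank = blank


def clean(text, steps):
    """Feed the lines of text through each generator in steps, in order."""
    lines = text.splitlines()
    # Each step wraps the previous iterator; no generator body runs
    # until "\n".join() starts pulling lines through the chain
    cleaned = reduce(lambda it, step: step(it), steps, lines)
    return "\n".join(cleaned)


raw = "Chapter 1   \n\n\n12\nIt was a dark night.  \n\n\nThe end."
print(clean(raw, [rstrip_lines, drop_page_numbers, collapse_blanks]))
```

Because every step is lazy, lines stream through the whole chain one at a time, and adding or reordering a cleanup step only means editing the list passed to `clean`.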