PDF from EPUB
An entry in my dirty little AI tools repo
Apr

At the current juncture, where the internet is still open enough to be searched by agents and models have shed many of the issues of earlier versions, it feels like we’re in a sweet spot between the old and new web. By the new web, I mean a time when data becomes increasingly walled off and access falls almost entirely into a transactional model.

The Peter Watts Rifters trilogy touched on this back in 1999 better than I think a lot of modern fiction does. In it, people had to go through an intermediary AI (a digital organism) to negotiate access to information on a web that had become choked with self-replicating bots vying for turf, hardware, information, and whatever else.

What this means for future us, I have no idea, but for today it means that I can use an LLM to rip out some nice little programs for things I’ve been wanting. I have a number of them now, and I plan to tuck them away in a single repo to keep my GitHub from becoming a mess. The first one, which I apologize for introducing in this rather roundabout way, is something I made for a single book: Frank Meyer, plant hunter in Asia.

I have an old print version of this book, but it’s now sitting at my parents' place back in the U.S. I was able to find a scanned PDF version, but reading a scanned PDF on a tablet is not the most enjoyable experience. Converting it to an EPUB seemed to be the best choice, so I iterated for a bit on getting Claude to produce something reasonable: starting with pdftotext, pulling out images so they have their own pages, and making sure that paragraphs were stitched together correctly. Altogether it took about an hour, and from my initial spot checks it seems to be more than readable enough. I’m going to keep track of issues as I read it, list them here, and then create a follow-up tool to address them.
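To give a feel for the paragraph-stitching step, here is a minimal sketch in Python. It is not the code Claude produced for the repo, just an illustration under a few assumptions: the input is plain text of the kind pdftotext emits, blank lines mark paragraph breaks, and a line ending in a hyphen followed by a lowercase letter is a word split across lines. The function name `stitch_paragraphs` is made up for this example.

```python
def stitch_paragraphs(text: str) -> list[str]:
    """Join hard-wrapped lines into paragraphs.

    Assumptions (hypothetical, for illustration only):
    - a blank line marks a paragraph break;
    - a trailing hyphen followed by a lowercase letter on the next
      line means a word was split across the line break.
    """
    paragraphs: list[str] = []
    current = ""
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            # Blank line: close the paragraph in progress, if any.
            if current:
                paragraphs.append(current)
                current = ""
            continue
        if current.endswith("-") and line[0].islower():
            # Rejoin a word hyphenated across the line break.
            current = current[:-1] + line
        elif current:
            current += " " + line
        else:
            current = line
    if current:
        paragraphs.append(current)
    return paragraphs


sample = "The collector trav-\nelled for months\nbefore returning.\n\nA second paragraph."
print(stitch_paragraphs(sample))
```

One known weakness of this heuristic: a genuinely hyphenated compound that happens to break at the hyphen (say, "well-" followed by "known") gets merged into one word, which is exactly the kind of issue a follow-up pass would need to catch.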

The long-term goal would be a more generic tool that can be used for any PDF, but given the variation in books and formatting, that may be more difficult.

The repo: https://github.com/kilroyjones/misc-ai-tools/