I’ve been a big fan of Austin Kleon’s blog and newsletter for a long time. He recently referenced his Year with Thoreau, where every day he read Henry David Thoreau’s journal entries for that day of the year. He went old school and used a physical book, but it got me thinking how helpful it would be to have a digital version automatically curated every day. In the past, this kind of project would’ve stopped at the idea phase. It’s neat, but not worth the long hours of work to fetch, prepare, format, and package nicely. The latest LLMs lower the bar to complete this kind of just-for-fun project, so I decided to give it a go.
I started by downloading Thoreau’s journals from one of my favorite sites, Project Gutenberg. They have journals from 1837-1846 and 1850-1851 available, but I chose the 1837-1846 journals since they were the most organized. In Thoreau’s later journals, many entries are undated, whereas the earlier entries are more consistent, making them easier to parse automatically.
I worked with Claude to separate the journal entries by date. Data preparation ended up being the most difficult part of the project. I discovered a lot of edge cases in the journals: endnotes, links, and the way dates are formatted. In some entries, Thoreau placed the day of the week beside the date, and he varied his abbreviations throughout. Some days he wrote multiple entries, and other entries aren’t dated at all. This variance reveals Thoreau’s humanity, but it makes it hard to automatically parse the entries. After each improvement to the journal parsing, I used Datasette to manually inspect the data, which helped me discover new elements to improve. One of those was the many endnote links scattered throughout the entries. These are quite helpful when reading an eBook, but weren’t something I wanted to include. I also used Readability to remove any images and to sanitize the HTML.
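To give a flavor of the date-heading problem, here’s a minimal sketch (not the project’s actual parser) of matching Thoreau’s varied headings: an optional day of the week, an abbreviated or full month name, a day number with an optional ordinal suffix, and an optional year. The regex and function names are illustrative assumptions.

```typescript
// Hypothetical sketch of parsing varied date headings like
// "Dec. 12, 1837", "Sunday, March 5th, 1838", or "Aug. 1. Wednesday."
const MONTHS: Record<string, number> = {
  jan: 1, feb: 2, mar: 3, apr: 4, may: 5, jun: 6,
  jul: 7, aug: 8, sep: 9, oct: 10, nov: 11, dec: 12,
};

// Optional weekday (full or abbreviated), month name, day number with
// optional ordinal suffix, optional comma-separated year.
const DATE_RE =
  /^(?:(?:Sun|Mon|Tues?|Wed(?:nes)?|Thurs?|Fri|Sat(?:ur)?)(?:day)?\.?,?\s+)?([A-Z][a-z]+)\.?\s+(\d{1,2})(?:st|nd|rd|th)?(?:,?\s+(\d{4}))?/;

function parseEntryDate(
  line: string
): { month: number; day: number; year?: number } | null {
  const m = DATE_RE.exec(line.trim());
  if (!m) return null;
  // Normalize the month name to its three-letter abbreviation.
  const month = MONTHS[m[1].slice(0, 3).toLowerCase()];
  if (!month) return null;
  const day = parseInt(m[2], 10);
  if (day < 1 || day > 31) return null;
  return { month, day, year: m[3] ? parseInt(m[3], 10) : undefined };
}
```

Undated entries simply return `null` here; in practice they’d inherit the date of the preceding entry or be flagged for manual review.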
Once I had the data prepared, it was all downhill. I had Claude and ChatGPT generate the HTML and CSS for the page and help me deploy it to a Cloudflare bucket. I made most of the tech stack decisions before starting, which made it easier to point the robots in a consistent direction. I mostly went with my normal stack of TypeScript, Tailwind, and Bun, but used vanilla HTML and vanilla TypeScript rather than a framework. I didn’t want to run a server to generate the response for each day, so I decided to ship the entire (2MB) database with the page and let client-side JavaScript retrieve the correct entries based on the viewer’s date.
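The client-side lookup can be sketched roughly like this, assuming the shipped “database” is a JSON file mapping `"MM-DD"` keys to arrays of entries. The file name, data shape, and DOM structure here are illustrative assumptions, not the site’s actual code.

```typescript
// Hypothetical shape for one journal entry in the shipped JSON payload.
interface Entry {
  year: number;
  html: string; // sanitized entry body
}

// Build a "MM-DD" key from the viewer's local date.
function todayKey(now: Date = new Date()): string {
  const mm = String(now.getMonth() + 1).padStart(2, "0");
  const dd = String(now.getDate()).padStart(2, "0");
  return `${mm}-${dd}`;
}

// Fetch the whole database and render today's entries, newest-first.
async function renderToday(): Promise<void> {
  const res = await fetch("/journal.json"); // hypothetical ~2MB payload
  const db: Record<string, Entry[]> = await res.json();
  const entries = db[todayKey()] ?? [];
  const main = document.querySelector("main")!;
  main.innerHTML = entries.length
    ? entries
        .map((e) => `<article><h2>${e.year}</h2>${e.html}</article>`)
        .join("")
    : "<p>No journal entries for today.</p>";
}
```

Shipping the whole dataset trades a larger initial download for zero server logic: the page is a static file, and the browser does the date math.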
This project highlights the best of building with LLMs. Far from replacing human work, they can be used to highlight it and make it more accessible. This is the bright side of the new abilities we’ve found ourselves with. In the best case, they remove the friction that keeps projects as ideas in your head and, like any good tool, enhance the capabilities of the one wielding it. Check it out at Today in Thoreau's Journal!