Migration to Jekyll from Blogger

I’ve been writing on this blog for something like 18 years now, which is just wild. When I originally fired up this blog as a way to improve my technical writing out of High School, I picked Blogger since they had already been acquired by Google at that point, so it was certain that Blogger would continue to be a thing into the foreseeable future.
Fast forward to 2024, and Blogger being a Google product now makes its ultimate demise a certainty, so clearly we need to come up with a new plan before Google yanks the plug with relatively short notice.
Since I’ve been running my own hosting network since 2017, and comments on blog posts as an idea have run their course to the spam- and hate-filled conclusion, migrating my whole blog to a static site generator and throwing it in my self-hosted network seems like a very reasonable next step.
I originally started this work in late 2020, and rapidly realized how much work it was going to be to migrate more than 500 blog posts to literally anything. If you’ve been wondering why this blog went oddly quiet for the last few years, a lot of that has been that I didn’t want to write anything new until the migration was complete, but every time I picked up the migration again, I’d chip away at another dozen or two blog posts, and then lose interest again and put it back down.
At this point, I’m just calling it. I’m not done with the migration, and have actually only updated about 20% of the posts to using Markdown, but I came to the conclusion that if anyone is looking at any of my writing prior to 2015 still, they’re just going to need to come to terms with some of the formatting being a little wonky and we’re all going to move on from that.
The Plan
I am a… lukewarm fan of Jekyll the static site generator. It is pretty good, with a lot of problems around dependency management, and it seems like the new hotness has moved on past it to other static site generators like Hugo, but I’ve already got a half dozen projects under my belt using Jekyll, so better to stick with the devil I know.
The Minima theme from Jekyll gets the job done, but I did actually go as far as editing the color scheme in the sass files to get it so this website doesn’t glaringly look like the default Jekyll theme like all my other websites.
Starting from the single huge XML file that Blogger exports, Jekyll actually has a handy tool that will parse it out into individual Markdown files, one per blog post, with all the requisite front matter to get each post to render with the correct URL, etc.
A good first step, but all of each post’s content is still the machine-generated HTML out of the Blogger editor, with the photos still embedded as one of about three different generations of how Blogger handled photos in blog posts.
There were some broad first passes applied to ALL of the posts at once using sed and awk to convert <br /> tags into new lines, break each sentence onto its own line so interacting with the Markdown files wouldn’t involve 1kB long single lines, delete nbsp entities, etc.
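As a rough sketch of what that kind of bulk pass can look like (not the exact commands that were run; the function name clean_post is made up for illustration, it uses only sed rather than sed and awk, and it assumes GNU sed so that \n in a replacement means a newline):

```shell
#!/bin/sh
# Hypothetical helper: reads Blogger-flavored HTML on stdin and writes a
# lightly cleaned version to stdout. Assumes GNU sed.
clean_post() {
  sed -E \
    -e 's/&nbsp;/ /g' \
    -e 's#<br ?/?>#\n#g' \
    -e 's/([.!?]) +/\1\n/g'
  # 1st rule: turn non-breaking space entities into plain spaces
  # 2nd rule: turn <br>, <br/> and <br /> into newlines
  # 3rd rule: start a new line after sentence-ending punctuation
}
```

Usage would be something like clean_post < old-post.md > old-post.cleaned.md, where both file names are placeholders. The sentence splitting is deliberately naive and will also break after abbreviations like "e.g. ", so it is only a starting point.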
But there’s still a lot of per post work that just took… ages. To handle the pictures, which were still getting linked from Google’s own CDN, I needed to download all of them, delete the hrefs from the Markdown, and insert a proper image reference to the local copy.
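For the "delete the href, insert a local reference" part, a minimal sketch might look like the following. Everything specific here is an assumption: the helper name localize_images, the /assets/img/ destination, and the idea that the link wrapper has href as its first attribute and directly wraps a single img tag. Real Blogger markup varies a lot between eras, so treat this as a starting point and not a one-shot fix.

```shell
#!/bin/sh
# Hypothetical helper: replaces <a href=".../name.jpg" ...><img ... /></a>
# with a Markdown image that points at a local copy of the same file name.
# The /assets/img/ prefix is an assumed location, not one from the post.
localize_images() {
  sed -E 's#<a href="[^"]*/([^"/]+\.(jpg|png))"[^>]*><img [^>]*></a>#![](/assets/img/\1)#g'
}
```

It reads a post on stdin and writes the rewritten post to stdout, so it can be tried on a copy of a file before touching the original.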
The one thing that made this slightly easier is that each Blogger photo uses a lower resolution copy, and then links to a higher resolution copy that always has “1600” in the URL, so this one-liner:
grep -Po "http.*?1600.*?jpg" blogpost-im-currently-working-on.md | while read -r PICTURE; do wget "$PICTURE"; done
That did a respectably good job of pulling out most of the images per post! For posts where I was using more diagrams and figures, I’d need to run it again looking for PNGs, but that automated most of the external work.
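One way to avoid the second run for PNGs is to match both extensions in a single pass. This is a sketch and not the command used above: list_images and fetch_images are made-up names, and it swaps grep -P for grep -E with a pattern that stops at quotes, spaces, and angle brackets, on the assumption that the URLs sit inside HTML attributes.

```shell
#!/bin/sh
# Hypothetical helper: prints each full-resolution image URL (JPG or PNG,
# any letter case) found in the given post, one per line, de-duplicated.
list_images() {
  grep -Eio 'https?://[^"<> ]*1600[^"<> ]*\.(jpg|png)' "$1" | sort -u
}

# Hypothetical helper: downloads everything list_images finds into the
# current directory. Needs wget and network access.
fetch_images() {
  list_images "$1" | while read -r url; do
    wget "$url"
  done
}
```

Running list_images on a post first gives a quick preview of what fetch_images would download.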
Rinse and repeat for about 100 of the posts so far, and I’ve just had it. So if you go back and look at any of the posts in the 2011-2015 era, it’s likely the formatting might be kind of wonky or really wonky, but ideally all of the content should still be there in one form or another. Links and code snippets are a total mess, so a low level of consistency there, but again, we’re just going to live with it. If anything is completely mangled, I have faith in our true lord and savior the Internet Wayback Machine, which should have a very good view of all of my blog posts by now.
I ultimately decided that getting this new framework live and load bearing was more important than updating alllll the old posts beforehand (perfect really is the enemy of done). So hope you enjoy; let me know if anything breaks wildly. I tried to at least get the main RSS feed in the same place as where Blogger put it, but Blogger’s support for things like an alternative Atom feed and… RSS feeds for comments per blog post (I think?) just wasn’t going to happen.
As for this now being a totally static site without support for comments, if you want to yell at me, my email address is right down there, and if you feel like you need to dunk on my ideas in public, it shouldn’t be too hard to post a screenshot of any of this on… TikTok?
Stay mindful everyone, and keep being humans posting about things you’re passionate about online. It seems like that’s becoming an increasingly rare commodity these days.