Migrating From Blogger to Hugo
For a while now I had been meaning to migrate my (not recently updated) food blog from Blogger to a static site. This week I finally got around to doing it!
It turned out to be relatively straightforward using Blogger's XML export facility and a handy script I found on the Internet, blog2md, to convert this XML into Markdown.
This gives you a set of files that you could use immediately in Hugo, but all the image links still point at the Blogger CDN and there are a few hard-coded links to the old Blogger site. I also wanted to remove all of the Amazon affiliate links and tracking images. Of course I wrote a Guile script to do this extra clean-up...
The rest of this post outlines the process, from the Blogger export to a published Hugo site.
Export content from Blogger
I began by following these instructions to export the blog posts to XML.
Create a new Hugo site
I already had Hugo installed, so I just ran:
hugo new site start-again-at-zero
I decided to use the fairly minimal Congo theme for the blog, which I installed as a git submodule:
cd start-again-at-zero
git init .
git submodule add -b stable https://github.com/jpanther/congo.git themes/congo
Next I removed the generated hugo.toml file and copied the Congo configuration into place:
rm hugo.toml
cp -r themes/congo/config .
...and made a few small changes to the configuration for my site. The most important thing is to add theme = "congo" to config/_default/config.toml, but I also changed the base URL and blog name, enabled recent posts on the home page, and enabled search.
Convert the Blogger export to Markdown
Install the blog2md utility:
cd ..
git clone https://github.com/palaniraja/blog2md.git
cd blog2md
# Install dependencies
npm install
Run the Markdown conversion:
node index.js b ~/Downloads/blog-08-27-2024.xml ../start-again-at-zero/content/posts/
At this point you have a working Hugo site that you can test locally by running:
cd ../start-again-at-zero
hugo serve
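Before cleaning anything up, it can be useful to see how much of the converted content still points at Blogger. The following is a minimal shell sketch, not part of the original workflow: list_blogger_urls is a hypothetical helper, and the host names it matches (blogspot.com and blogger.googleusercontent.com) are assumptions about where Blogger serves pages and images from.

```shell
# Hypothetical helper: print each distinct Blogger-hosted URL found in the
# files under the given directory. The host names are assumptions.
list_blogger_urls() {
  grep -rhoE 'https?://[^] )"<>]*(blogspot\.com|blogger\.googleusercontent\.com)[^] )"<>]*' "$1" \
    | LC_ALL=C sort -u
}

# Usage, from the root of the Hugo site:
# list_blogger_urls content/posts
```

An empty result would suggest that nothing in the posts refers to those hosts any more, which makes this a handy check to repeat after the clean-up.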
Additional clean-up
As I mentioned earlier, the images and links in the Markdown are still pointing to Blogger, and the content contains some Amazon affiliate links and tracking that I don't want to keep. I wrote a Guile script that updates the Markdown files to:
- Remove Amazon affiliate tracking images.
- Remove Amazon affiliate links.
- Download images from Blogger to the Hugo static/ folder and replace the image source with a relative link.
- Replace links to images in Blogger with a relative link, downloading the image if necessary.
- Replace internal links to Blogger with relative links to the new blog.
It's a bit of an ugly script, relying a lot on regular expressions, but it got the job done.
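To give a flavour of the regular-expression approach, here is a minimal sketch of the first clean-up step, written in shell with sed rather than Guile. It is not the script described here: strip_tracking_images is a hypothetical helper, and it assumes the tracking pixels survive the conversion as Markdown images whose URL contains assoc-amazon or amazon-adsystem.

```shell
# Hypothetical helper: delete Markdown images whose URL looks like an Amazon
# tracking pixel. The two host fragments are assumptions, not taken from the
# original script.
strip_tracking_images() {
  sed -E 's/!\[[^]]*\]\([^)]*(assoc-amazon|amazon-adsystem)[^)]*\)//g'
}

# Reads Markdown on stdin and writes the cleaned text to stdout.
printf '%s\n' 'Try this book![](https://www.assoc-amazon.co.uk/e/ir?t=example-21&l=as2) today.' \
  | strip_tracking_images
# prints: Try this book today.
```

Ordinary images are left alone because the pattern only matches when one of the assumed host fragments appears inside the parentheses.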
I'm still relatively new to Guile and made use of a couple of new (to me) libraries in this script. The first was SRFI 26, which was recommended by someone on the #guile IRC channel. It provides a cut macro that is similar to, but more flexible than, Clojure's partial. For example, the scandir function takes an optional select? argument to filter the returned files, and we can use cut to build a select function:
(cut string-suffix? ".md" <>)
The equivalent long-hand would be:
(lambda (x) (string-suffix? ".md" x))
We can use the <> placeholder in any argument position, while Clojure's partial only allows us to specify the leading arguments. To build an equivalent function in Clojure, we would have to use the #() reader macro:
#(clojure.string/ends-with? % ".md")
The second library that came in handy was SRFI 197, which provides pipeline operators similar to Clojure's -> (thread first) and ->> (thread last) macros, but again more flexible as you can specify where the placeholder goes. I used this to chain together a number of document transforms:
(define (process-file path)
  (format #t "process-file ~a~%" path)
  (let ((doc (chain (call-with-input-file path get-string-all)
                    (remove-amazon-tracking-images _)
                    (remove-amazon-links _)
                    (replace-images _)
                    (replace-image-links _)
                    (replace-self-links _))))
    (call-with-output-file path (cut put-string <> doc))))
This function also shows another use of cut, this time creating a lambda to pass to call-with-output-file.
Publishing the Blog
My domain and web site are already hosted by Mythic Beasts, so I simply headed over to their control panel to configure web hosting for a new subdomain. I added shell access to my hosting package so I can publish content using rsync:
# Generate the static site
hugo
# Push to the Mythic Beasts hosting server
rsync -rvct --delete public/. \
bobcat.mythic-beasts.com:www/start-again-at-zero.1729.org.uk
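Because --delete removes anything on the server that is missing from public/, a dry run can be a cheap safety check before the real push. This is a minimal sketch rather than part of the original workflow: preview_publish is a hypothetical wrapper name, while -n / --dry-run is a standard rsync option that reports what would change without transferring or deleting anything.

```shell
# Hypothetical wrapper: show what a publish would transfer or delete,
# without changing anything on the destination.
preview_publish() {
  rsync -rvct --delete --dry-run "$1" "$2"
}

# Usage, with the same source and destination as the real command:
# preview_publish public/. bobcat.mythic-beasts.com:www/start-again-at-zero.1729.org.uk
```

If the listing looks right, running the original command without --dry-run performs the actual upload.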
In case you don't have shell access, they also support upload by SFTP or (not recommended) FTP.
That's it! The site is now live at https://start-again-at-zero.1729.org.uk.