Converting DokuWiki Blog to Publii
While having a blog in 2023 may be anachronistic or perhaps even crazy, what's even crazier is having content that dates all the way back to 2004. I've managed this in the past by writing scripts to convert from one blog system to another, and the conversion to Publii is no different.
I had a few things to work through in the wiki-to-publii conversion:
- DokuWiki uses its own markup, while Publii uses HTML
- Image galleries, which in DokuWiki are just folders of images, had to be converted
- DokuWiki posts are just text files, while Publii stores its posts in a SQLite database
My tool of choice for this type of work is Jupyter Notebook, a web-based interface for Python. And there are libraries available to address each of my conversion concerns. My, concersions.
Pandoc
Pandoc let me convert DokuWiki markup into the HTML that Publii expects. I filtered a few things, like "read more" links and the image galleries, out of the wiki markup before passing each post to Pandoc, and when I was finished I had pretty good HTML that would work with Publii.
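Here is a rough sketch of what that filter-then-convert step could look like. It is not the code from the notebook. The two regular expressions are assumptions about how galleries and "read more" links appear in the wiki text (a `{{gallery>...}}` tag and a `[[target|Read more]]` link), and the function names are made up for illustration. It calls the `pandoc` command-line tool, which needs to be installed and on the PATH.

```python
import re
import subprocess

# Assumed patterns: adjust these to whatever your own wiki text actually uses.
GALLERY_TAG = re.compile(r"\{\{gallery>[^}]*\}\}")
READ_MORE_LINK = re.compile(r"\[\[[^\]|]*\|\s*read more[^\]]*\]\]", re.IGNORECASE)


def strip_wiki_extras(wiki_text):
    """Remove gallery tags and "read more" links before conversion."""
    text = GALLERY_TAG.sub("", wiki_text)
    text = READ_MORE_LINK.sub("", text)
    return text.strip()


def dokuwiki_to_html(wiki_text):
    """Convert DokuWiki markup to HTML by piping it through pandoc."""
    result = subprocess.run(
        ["pandoc", "--from=dokuwiki", "--to=html"],
        input=strip_wiki_extras(wiki_text),
        capture_output=True,
        text=True,
        encoding="utf-8",
        check=True,  # raise if pandoc reports an error
    )
    return result.stdout
```

Keeping the filtering in its own function makes it easy to check the cleaned-up markup on a few posts before running everything through Pandoc.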
Pillow & PyExiv2
For galleries, I first wrote code that checked each post for an image gallery. If a gallery was found, I used PyExiv2 to extract the caption stored in each image so that I could include it in the Publii gallery. After that, I used some of the HTML I found in the Publii database as a model to build a fragment of HTML that displays the gallery. I also used the Python imaging library, Pillow, to create a thumbnail version of each image.
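A minimal sketch of the thumbnail and gallery-fragment steps might look like the following. The HTML structure, the class names, the `-thumbnail` file suffix, and the 720-pixel limit are all placeholders, not Publii's actual markup or settings, so copy the real structure from an existing gallery in your own Publii database. Captions are passed in as plain strings here; reading them with PyExiv2 is left out because it depends on which metadata field holds the caption.

```python
from html import escape
from pathlib import Path


def thumbnail_path(image_path, thumb_dir):
    """Build the (hypothetical) thumbnail file name for an image."""
    image_path = Path(image_path)
    return Path(thumb_dir) / f"{image_path.stem}-thumbnail{image_path.suffix}"


def make_thumbnail(image_path, thumb_dir, max_size=(720, 720)):
    """Write a thumbnail that fits inside max_size and return its path."""
    from PIL import Image  # Pillow; imported here so the other helpers work without it

    target = thumbnail_path(image_path, thumb_dir)
    target.parent.mkdir(parents=True, exist_ok=True)
    with Image.open(image_path) as img:
        img.thumbnail(max_size)  # resizes in place and keeps the aspect ratio
        img.save(target)
    return target


def gallery_html(items):
    """Build a gallery fragment from (image_url, thumb_url, caption) tuples."""
    figures = []
    for image_url, thumb_url, caption in items:
        figures.append(
            '<figure class="gallery__item">'
            f'<a href="{escape(image_url)}">'
            f'<img src="{escape(thumb_url)}" alt="{escape(caption)}"></a>'
            f"<figcaption>{escape(caption)}</figcaption>"
            "</figure>"
        )
    return '<div class="gallery">' + "".join(figures) + "</div>"
```

Escaping the captions matters because image metadata can contain ampersands or quotes that would otherwise break the generated HTML.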
SQLite & Files
Python has good support for SQLite, so once I had my text and my images, a few basic queries were all it took to add all my old posts to my new site. The image files were automatically copied to the Publii data directory, and with a few clicks I had all my old posts in the new format.
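To give an idea of how little code the database step needs, here is a sketch using Python's built-in `sqlite3` module. The table and column names are a simplified stand-in and not Publii's real schema, and the example writes to an in-memory database instead of a real site database. Look at the actual tables in your site's database file before writing anything to it, and work on a backup copy.

```python
import sqlite3


def insert_posts(conn, posts):
    """Insert (title, slug, html, created_at) tuples and return the row count."""
    # Stand-in table for illustration only; a real Publii database already has its own.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS posts ("
        "id INTEGER PRIMARY KEY, title TEXT, slug TEXT, text TEXT, created_at INTEGER)"
    )
    with conn:  # commits on success, rolls back if an insert fails
        conn.executemany(
            "INSERT INTO posts (title, slug, text, created_at) VALUES (?, ?, ?, ?)",
            posts,
        )
    return conn.execute("SELECT COUNT(*) FROM posts").fetchone()[0]


conn = sqlite3.connect(":memory:")
count = insert_posts(
    conn,
    [
        ("First post", "blog_post1", "<p>Hello</p>", 1080000000),
        ("Second post", "blog_post2", "<p>World</p>", 1090000000),
    ],
)
```

Using `?` placeholders instead of string formatting lets SQLite handle quotes and other special characters in the post HTML.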
There's probably some weird formatting in a few pages, but I'm not going to worry too much about it. I'm fine with a little loss of fidelity - it happens each time I move to a new system.
If it helps anyone else, I posted the Jupyter notebook on my GitHub page. To use it, you'll need two folders:
- A folder containing all the DokuWiki text files
- A folder with subfolders for each image gallery. Each subfolder should have the same name as a text file, just without the ".txt" extension
TextFiles\
|-blog_post1.txt
|-blog_post2.txt
Pictures\
|-blog_post1\
|-blog_post2\
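The naming convention above makes it simple to match each post with its gallery. As an illustration only (the function name is made up, and this is not taken from the notebook), the pairing could be done like this:

```python
from pathlib import Path


def pair_posts_with_galleries(text_root, pictures_root):
    """Map each .txt post to its gallery folder, or None if it has no gallery."""
    pairs = {}
    for post_file in sorted(Path(text_root).glob("*.txt")):
        # The gallery folder shares the post's name, minus the ".txt" extension.
        folder = Path(pictures_root) / post_file.stem
        pairs[post_file.name] = folder if folder.is_dir() else None
    return pairs
```

Calling `pair_posts_with_galleries("TextFiles", "Pictures")` on the layout shown would map `blog_post1.txt` to `Pictures/blog_post1`, and any post without a matching subfolder to `None`.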