Generate RSS For Blog

February 08, 2024

I'm not sure how many people still use RSS, but I still want to have a RSS feed for my blog though I don't use RSS myself either.

From what I known, many people recommend using ox-rss.el to generate RSS XML from blog files. The ox-rss.el is a org-export backed derived from the HTML backend and it exports org files to an RSS XML file. But ox-rss.el is not part of org-contrib right now.

Luckily, Python has many modules which can generate and parse RSS feeds (python-feedgen, feedparser, rfeed, etc.). I love Python, it is easy to use. So I decided to use python-feedgen to generate an RSS XML file from HTML files of my blog.

Getting Posts List

I published all of blog files into a public directory, firstly I need to filter all necessary HTML files.

import os
import fnmatch

htmls = ["public/index.html"]
for root, dirs, files in os.walk(os.getcwd() + "/public/posts"):
    relpath = os.path.relpath(root)
    for filename in fnmatch.filter(files, "*.html"):
        filename = os.path.join(relpath, filename)
        filename = filename.replace("\\", "/")
        htmls.append(filename)
htmls.sort(reverse=True)

Parsing HTML

Beautiful Soup is a powerful library that make it easy to scrape information from HTML files. I need to scrape title and page content from HTML files, and remove all blank lines in content.

from bs4 import BeautifulSoup

soup = BeautifulSoup(htmlfile, "html.parser")
title = soup.title.string
description = soup.find("div", class_="content").text
# delete blank lines
description = "\n".join([line for line in description.split("\n") if line.strip()])

Generating RSS XML

It is easy to create a feed and generate a feed by following the documentation. The most important elements are title, link and description.

This blog is published by org, here is a trick to generate rss automatically.

(with-eval-after-load 'ox-publish
  (defun dotemacs-generate-rss (project &optional force async)
    (let ((default-directory dotemacs-org-site-dir))
      (dotemacs-call-process "python" "rss.py"))
    (message "dotemacs generate rss in %s" dotemacs-org-site-dir))
  (advice-add 'org-publish-project :after #'dotemacs-generate-rss))

Top