Generate RSS For Blog
I'm not sure how many people still use RSS, but I still want to have a RSS feed for my blog though I don't use RSS myself either.
From what I known, many people recommend using ox-rss.el to generate RSS XML from blog files. The ox-rss.el is a org-export backed derived from the HTML backend and it exports org files to an RSS XML file. But ox-rss.el is not part of org-contrib right now.
Luckily, Python has many modules which can generate and parse RSS feeds (python-feedgen, feedparser, rfeed, etc.). I love Python, it is easy to use. So I decided to use python-feedgen to generate an RSS XML file from HTML files of my blog.
Getting Posts List
I published all of blog files into a public
directory, firstly I need to filter all necessary HTML files.
import os import fnmatch htmls = ["public/index.html"] for root, dirs, files in os.walk(os.getcwd() + "/public/posts"): relpath = os.path.relpath(root) for filename in fnmatch.filter(files, "*.html"): filename = os.path.join(relpath, filename) filename = filename.replace("\\", "/") htmls.append(filename) htmls.sort(reverse=True)
Parsing HTML
Beautiful Soup is a powerful library that make it easy to scrape information from HTML files. I need to scrape title
and page content
from HTML files, and remove all blank lines in content
.
from bs4 import BeautifulSoup soup = BeautifulSoup(htmlfile, "html.parser") title = soup.title.string description = soup.find("div", class_="content").text # delete blank lines description = "\n".join([line for line in description.split("\n") if line.strip()])
Generating RSS XML
It is easy to create a feed and generate a feed by following the documentation. The most important elements are title
, link
and description
.
This blog is published by org, here is a trick to generate rss automatically.
(with-eval-after-load 'ox-publish (defun dotemacs-generate-rss (project &optional force async) (let ((default-directory dotemacs-org-site-dir)) (dotemacs-call-process "python" "rss.py")) (message "dotemacs generate rss in %s" dotemacs-org-site-dir)) (advice-add 'org-publish-project :after #'dotemacs-generate-rss))