Lightweight RSS Content-Filtering
How I use DistilBERT & AWS SageMaker to filter my RSS subscriptions
I’ve been tinkering with an idea for a content filter built on LLMs and RSS. The premise is that many of a user’s preferences can be determined from keyword identification and contextual understanding of RSS item titles alone. I tested this by tracking clicks on items in my feed reader, then training the filter’s models on that data. Based on my subjective experience, the idea proved to be at least partially successful.
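To make the click-tracking step concrete, here is a minimal sketch of how tracked clicks could become labeled training data. It assumes that an item shown in the reader but never clicked counts as a negative example, which is my simplification for illustration; the function name `label_items` is hypothetical and not taken from the project.

```python
from typing import Iterable, List, Tuple


def label_items(shown_titles: Iterable[str],
                clicked_titles: Iterable[str]) -> List[Tuple[str, int]]:
    """Pair each shown title with 1 if it was clicked, else 0.

    Assumption: "shown but not clicked" is treated as a negative label.
    """
    clicked = set(clicked_titles)
    return [(title, 1 if title in clicked else 0) for title in shown_titles]


if __name__ == "__main__":
    shown = ["Rust 1.80 released", "Celebrity gossip roundup", "Tuning SQLite"]
    clicked = ["Rust 1.80 released", "Tuning SQLite"]
    for title, label in label_items(shown, clicked):
        print(label, title)
```

Rows like these are the kind of input both a keyword-based classifier and a fine-tuned title classifier can train on.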
The first version of the filter used a technique called in-context learning (ICL), and with it I commonly spent ~$0.50 USD per day on inference alone. I wasn’t even filtering many feeds at the time; my rough estimate is around 3 feeds, twice a day. I’ve since swapped ICL for a fine-tuning approach, but either way I figured that a distilled model could handle the task at a much lower cost.
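For readers unfamiliar with in-context learning, the sketch below shows the general shape of the technique: labeled examples are placed directly in the prompt, followed by the item to classify. The prompt wording and the `build_icl_prompt` helper are hypothetical, not the prompt the first version actually used, and no API call is made here.

```python
from typing import List, Tuple


def build_icl_prompt(examples: List[Tuple[str, bool]], candidate: str) -> str:
    """Build a few-shot prompt from (title, clicked) pairs plus one new title."""
    lines = [
        "Decide whether the reader would click each RSS item title.",
        "Answer with exactly one word: click or skip.",
        "",
    ]
    for title, clicked in examples:
        lines.append(f"Title: {title}")
        lines.append(f"Label: {'click' if clicked else 'skip'}")
        lines.append("")
    # The model is expected to complete the final, unlabeled entry.
    lines.append(f"Title: {candidate}")
    lines.append("Label:")
    return "\n".join(lines)


if __name__ == "__main__":
    history = [("Rust 1.80 released", True), ("Celebrity gossip roundup", False)]
    print(build_icl_prompt(history, "Tuning SQLite for small servers"))
```

Because every request carries the examples along with the new title, the token count per call grows with the number of examples, which is one plausible reason inference costs add up with this approach.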
So I swapped out ChatGPT for DistilBERT. My filter is now a weighted ensemble of two models: a TF-IDF logistic regression classifier and a fine-tuned DistilBERT classifier. I run inference entirely on my Beelink SER5 Pro Mini, but that hardware isn’t enough for fine-tuning, so I run AWS SageMaker jobs once a week and then download the model artifacts. The results are quite similar to those of the previous content filter, but now at next to no cost.
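As a rough illustration of the weighted ensemble, here is a minimal sketch that combines per-model click probabilities into one score. The two scorers are keyword-based stand-ins for the real TF-IDF and DistilBERT classifiers, and the weights and threshold are made-up values; none of these names or numbers come from the project.

```python
from typing import Callable, Sequence

# A scorer maps an RSS item title to a click probability in [0, 1].
Scorer = Callable[[str], float]


def ensemble_score(title: str, scorers: Sequence[Scorer],
                   weights: Sequence[float]) -> float:
    """Weighted average of each scorer's probability for one title."""
    if not scorers or len(scorers) != len(weights):
        raise ValueError("need exactly one weight per scorer")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * s(title) for s, w in zip(scorers, weights)) / total


def keep_item(title: str, scorers: Sequence[Scorer],
              weights: Sequence[float], threshold: float = 0.5) -> bool:
    """Keep the item when the ensemble score reaches the threshold."""
    return ensemble_score(title, scorers, weights) >= threshold


# Stand-in for the TF-IDF logistic regression classifier (hypothetical).
def tfidf_stub(title: str) -> float:
    return 0.8 if "rust" in title.lower() else 0.2


# Stand-in for the fine-tuned DistilBERT classifier (hypothetical).
def distilbert_stub(title: str) -> float:
    return 0.6 if "release" in title.lower() else 0.3


if __name__ == "__main__":
    models = [tfidf_stub, distilbert_stub]
    weights = [0.4, 0.6]  # assumed weights, chosen only for the example
    for t in ["Rust 1.80 release notes", "Celebrity gossip roundup"]:
        print(keep_item(t, models, weights), t)
```

Dividing by the sum of the weights keeps the combined score in [0, 1] even when the weights don't add up to 1, so the same threshold still applies after re-weighting the models.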
If you’d like to check out the project, please visit github.com/sltptr/lss.