reinaelyabut.work@gmail.com

PXX RSS Feed Filter

Intelligent Web Scraping & AI-Powered Reporting
Role: Full Stack AI Engineer
Private Repo

The Overview

PXX RSS Feed Filter is an internal intelligence tool that automates the process of monitoring, scraping, and analyzing web-based RSS feeds. Rather than manually reading through hundreds of articles and updates, the system autonomously collects data from configured RSS sources, extracts key insights, and uses Google Gemini to generate concise, actionable summary reports for the team.

How It Works

The system follows a structured pipeline to transform raw web data into digestible intelligence:

  • RSS Feed Ingestion: Configurable RSS feed URLs are monitored periodically. New entries are detected and queued for processing automatically.
  • Web Scraping Engine: A Python-based scraper extracts the full article content from each feed entry, handling various page structures and content formats.
  • AI Summarization (Gemini): Scraped content is passed to Google Gemini, which generates structured summaries highlighting key takeaways, trends, and action items.
  • Report Generation: The summarized insights are compiled into a clean, human-readable report accessible through the Streamlit dashboard interface.
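The first three pipeline stages can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions, not the project's code (the repository is private): it assumes RSS 2.0 feeds whose `<item>` elements carry a `<guid>` or `<link>`, tracks processed entries in an in-memory set (a real deployment would more likely persist this state in the database), and uses `html.parser` as a stand-in for whatever extraction library the scraping engine actually uses. All function names are hypothetical, and the Gemini call itself is omitted; the prompt string would be passed to Google's Gemini SDK.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical sample feed; a real poller would fetch this XML over HTTP.
SAMPLE_RSS = """<rss version="2.0"><channel><title>Example Feed</title>
<item><title>First post</title><link>https://example.com/a</link><guid>a</guid></item>
<item><title>Second post</title><link>https://example.com/b</link><guid>b</guid></item>
</channel></rss>"""


def parse_items(rss_xml: str) -> list[dict]:
    """Return each RSS 2.0 <item> as a dict with guid, title and link."""
    root = ET.fromstring(rss_xml)
    items = []
    for item in root.iter("item"):
        link = item.findtext("link", default="")
        items.append({
            "guid": item.findtext("guid") or link,  # fall back to link when guid is absent
            "title": item.findtext("title", default=""),
            "link": link,
        })
    return items


def new_entries(items: list[dict], seen: set[str]) -> list[dict]:
    """Keep only items whose guid has not been processed yet; mark them as seen."""
    fresh = []
    for item in items:
        if item["guid"] not in seen:
            seen.add(item["guid"])
            fresh.append(item)
    return fresh


class _TextExtractor(HTMLParser):
    """Collect visible text nodes, skipping <script> and <style> contents."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def extract_text(html: str) -> str:
    """Flatten an article page into plain text for summarization."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def build_summary_prompt(title: str, body: str, max_chars: int = 4000) -> str:
    """Build the LLM instruction; max_chars is an arbitrary cap on prompt size."""
    return (
        "Summarize the article below for an internal team report. "
        "List the key takeaways, notable trends, and action items.\n\n"
        f"Title: {title}\n\n{body[:max_chars]}"
    )


if __name__ == "__main__":
    seen = {"a"}  # pretend entry "a" was handled on an earlier poll
    for entry in new_entries(parse_items(SAMPLE_RSS), seen):
        page = "<p>Body of the article</p><script>track()</script>"  # stand-in for the fetched page
        print(build_summary_prompt(entry["title"], extract_text(page)))
```

Deduplicating on `guid` rather than on the title is one reasonable choice, since titles are sometimes edited after publication; falling back to `link` covers feeds that omit `guid`.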

Architecture & Design

The backend is powered by FastAPI, providing a high-performance REST API for managing feed sources, triggering scraping jobs, and serving generated reports. PostgreSQL stores all feed metadata, scraped content, and historical summaries, enabling trend analysis over time. The Streamlit frontend provides an interactive dashboard where users can configure feeds, view real-time scraping progress, and browse the latest AI-generated reports.
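The storage layer can be illustrated with a small relational schema. This sketch uses Python's built-in `sqlite3` in place of PostgreSQL so it runs without a database server; the tables (`feeds`, `articles`, `summaries`) and both helper functions are hypothetical, not the project's actual schema. A `UNIQUE (feed_id, guid)` constraint combined with `ON CONFLICT ... DO NOTHING` (syntax supported by both PostgreSQL and SQLite 3.24+) lets re-polled entries be skipped at insert time, and grouping by publication date yields a basic articles-per-day series for trend analysis.

```python
import sqlite3

# sqlite3 stands in for PostgreSQL so the example runs without a server.
# Table and column names are hypothetical, not the project's real schema.
SCHEMA = """
CREATE TABLE feeds (
    id  INTEGER PRIMARY KEY,
    url TEXT NOT NULL UNIQUE
);
CREATE TABLE articles (
    id           INTEGER PRIMARY KEY,
    feed_id      INTEGER NOT NULL REFERENCES feeds(id),
    guid         TEXT NOT NULL,
    title        TEXT,
    content      TEXT,
    published_on TEXT NOT NULL,   -- ISO date, e.g. '2024-05-01'
    UNIQUE (feed_id, guid)        -- one row per entry per feed
);
CREATE TABLE summaries (
    id         INTEGER PRIMARY KEY,
    article_id INTEGER NOT NULL REFERENCES articles(id),
    summary    TEXT NOT NULL,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
"""


def save_article(conn, feed_id, guid, title, published_on, content=None) -> bool:
    """Insert an article; return False if this feed already stored the same guid."""
    cur = conn.execute(
        """
        INSERT INTO articles (feed_id, guid, title, content, published_on)
        VALUES (?, ?, ?, ?, ?)
        ON CONFLICT (feed_id, guid) DO NOTHING
        """,
        (feed_id, guid, title, content, published_on),
    )
    return cur.rowcount == 1


def articles_per_day(conn, feed_url):
    """Count stored articles per publication date for one feed (a simple trend series)."""
    return conn.execute(
        """
        SELECT a.published_on, COUNT(*)
        FROM articles a JOIN feeds f ON f.id = a.feed_id
        WHERE f.url = ?
        GROUP BY a.published_on
        ORDER BY a.published_on
        """,
        (feed_url,),
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    fid = conn.execute("INSERT INTO feeds (url) VALUES (?)", ("https://example.com/rss",)).lastrowid
    for guid, day in [("a", "2024-05-01"), ("b", "2024-05-01"), ("a", "2024-05-01"), ("c", "2024-05-02")]:
        save_article(conn, fid, guid, f"Post {guid}", day)  # the repeated "a" is skipped
    print(articles_per_day(conn, "https://example.com/rss"))  # [('2024-05-01', 2), ('2024-05-02', 1)]
```

Porting this to PostgreSQL would mainly mean switching the `?` placeholders to the driver's style (for example `%s` with psycopg) and using native `DATE`/`TIMESTAMPTZ` column types.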

Project Screenshots

Dashboard view — Feed configuration and scraping status

RSS Feed Filter - AI Summary Report

Editable configuration of the RSS feed scraper

RSS Feed Filter - Dashboard View

Credits & Acknowledgements

This project was developed during my internship at Parallaxx. The system and its implementation are the exclusive property of Parallaxx. Special thanks to the Digital Team for their collaboration and guidance.

Tech Stack

  • Python (Core Engine)
  • FastAPI (REST API)
  • Streamlit (Dashboard UI)
  • Google Gemini (AI Summarization)
  • PostgreSQL (Data Store)

Key Features

  • Automated RSS Feed Monitoring
  • Intelligent Web Scraping Pipeline
  • AI-Generated Summary Reports
  • Historical Trend Analysis
  • Interactive Streamlit Dashboard