ScrapeGraphAI

ScrapeGraphAI is a web scraping Python library that uses LLMs and direct graph logic to create scraping pipelines for websites and local documents.
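
To give a rough idea of what a graph-based scraping pipeline means, here is a minimal sketch in plain Python. It is not ScrapeGraphAI's API: the node functions, `run_graph`, and `stub_llm` are hypothetical names made up for illustration, and the LLM call is replaced by a stub so the example runs offline.

```python
import re
from typing import Callable, Dict

State = Dict[str, str]
Node = Callable[[State], State]


def stub_llm(prompt: str, context: str) -> str:
    # Hypothetical stand-in for a real LLM call; it only reports what it received.
    return f"[stub answer to {prompt!r} from {len(context.split())} words]"


def fetch_node(state: State) -> State:
    # Assumption: the "page" is already in memory, so no network request is made.
    return {**state, "html": state["source"]}


def parse_node(state: State) -> State:
    # Naive tag stripping, good enough for a sketch (not a real HTML parser).
    text = re.sub(r"<[^>]+>", " ", state["html"])
    return {**state, "text": " ".join(text.split())}


def answer_node(state: State) -> State:
    # Pass the user prompt and the extracted text to the (stubbed) model.
    return {**state, "answer": stub_llm(state["prompt"], state["text"])}


# Nodes are the pipeline steps; edges say which step follows which.
NODES: Dict[str, Node] = {
    "fetch": fetch_node,
    "parse": parse_node,
    "answer": answer_node,
}
EDGES: Dict[str, str] = {"fetch": "parse", "parse": "answer"}


def run_graph(nodes: Dict[str, Node], edges: Dict[str, str], entry: str, state: State) -> State:
    # Walk the graph from the entry node until a node has no outgoing edge.
    current = entry
    while current is not None:
        state = nodes[current](state)
        current = edges.get(current)
    return state


if __name__ == "__main__":
    result = run_graph(
        NODES,
        EDGES,
        "fetch",
        {"source": "<h1>Title</h1><p>Hello world</p>", "prompt": "What is the title?"},
    )
    print(result["text"])
    print(result["answer"])
```

Each node reads a shared state and returns an updated copy, so steps can be swapped or rewired by editing `NODES` and `EDGES` without touching the runner.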

The project was founded by my good friend and university mate Marco Vinciguerra, who came up with the idea during his Erasmus semester after a frustrating experience with traditional scrapers. He went on to develop it with his competitive programming team and released it as open source. The project was an overnight success, amassing thousands of GitHub stars within a few months and ranking as the 2nd open source AI project of 2024 worldwide.

Marco and his team were soon overwhelmed by the number of pull requests and issues on the repository. I stepped in as a maintainer for a while to help with bug fixing, pull request triage and CI/CD maintenance.

Thanks to this, I learned a lot about maintaining large codebases and working under pressure. Even though I'm no longer actively contributing to ScrapeGraphAI, I'm still grateful to Marco and his friends for letting me participate in the project.