The open-source stack,
curated by builders.
Tell us what you're building or what you need. We'll surface the right repo, with an honest take on whether to use it.
Matches for "scraper"
8 repos
The 2026 default for LLM-grade scraping. Fires Playwright behind the scenes, returns clean markdown ready to feed into RAG. Self-host for free or use the hosted tier; both expose the same API. The sw…
Python-native LLM-friendly crawler. Strong at extracting structured data (JSON schemas) from messy HTML using an embedded LLM. Heavier setup than Firecrawl but more control over extraction prompts. Be…
Node-native crawler from the Apify team. Built-in queues, retries, proxy rotation, headless browser pool — production patterns out of the box. Switches between Playwright, Puppeteer, and plain HTTP ba…
The Python scraping veteran. Mature ecosystem, plugins for everything (caching, proxies, middlewares), and a years-honed pipeline architecture. Steeper learning curve than the modern alternatives but…
The browser automation library the rest of the scraping tools depend on. Direct API for when you want fine-grained control: stealth mode, anti-bot bypass, multi-context, headless or headed. Also doubl…
Chrome-only browser automation from Google. Slightly more raw than Playwright with fewer batteries included, but lighter weight and battle-tested on Chrome quirks. The choice when Playwright's multi-b…
jQuery-style HTML parsing for Node, with no browser. Ridiculously fast because it never renders JS. The default when you're scraping server-rendered pages (most blogs, docs, news, marketplace listings…
Go's answer to Scrapy. Built-in rate limiting, caching, parallelism, and storage backends. Compiles to a single binary which makes deployment to a cheap VPS trivial. Use when you want to scrape millio…
We'll add it in 60 minutes.
Tell us what tool or use case is missing. We'll research the best repo for it, write an honest take, add it to the directory, and email you the link. No paywall, no signup required.