stackpicks.dev
Preview mode — 111 repos, zero database

The open-source stack,
curated by builders.

Tell us what you're building or what you need. We'll surface the right repo, with an honest take on whether to use it.

Or paste a GitHub repo — owner/repo — to preview it live.


Matches for "scraper"

8 repos
mendableai
firecrawl

The 2026 default for LLM-grade scraping. Fires Playwright behind the scenes, returns clean markdown ready to feed into RAG. Self-host for free or use the hosted tier — both expose the same API. The sw…

Scraping & Crawling, AI & ML
You're building a RAG agent, a competitive-intel tool, or any system that ingests web content as training/search data.
You're scraping structured APIs (JSON endpoints) — overkill. Use plain fetch + Zod.
mendableai/firecrawl
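The "plain fetch + Zod" route for JSON endpoints might look roughly like the sketch below. To stay dependency-free, a hand-written type guard stands in for the Zod schema; the `Product` shape, the helper names (`isProduct`, `parseProducts`, `fetchProducts`), and whatever URL you pass in are all hypothetical.

```typescript
// Hypothetical record shape; a real endpoint will differ.
interface Product {
  id: number;
  name: string;
  price: number;
}

// Type guard standing in for a Zod schema (assumption: swap in Zod in real code).
function isProduct(value: unknown): value is Product {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.id === "number" &&
    typeof v.name === "string" &&
    typeof v.price === "number"
  );
}

// Validate an untrusted JSON payload; throw on the first malformed entry.
function parseProducts(payload: unknown): Product[] {
  if (!Array.isArray(payload)) {
    throw new Error("expected a JSON array");
  }
  return payload.map((item, index) => {
    if (!isProduct(item)) {
      throw new Error(`item ${index} does not match the Product shape`);
    }
    return item;
  });
}

// Fetch and validate in one step. The URL is supplied by the caller.
async function fetchProducts(url: string): Promise<Product[]> {
  const response = await fetch(url);
  if (!response.ok) {
    throw new Error(`request failed with status ${response.status}`);
  }
  return parseProducts(await response.json());
}
```

Validating at the boundary means a changed or broken endpoint fails loudly at ingestion time instead of corrupting downstream data.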
unclecode
crawl4ai

Python-native LLM-friendly crawler. Strong at extracting structured data (JSON schemas) from messy HTML using an embedded LLM. Heavier setup than Firecrawl but more control over extraction prompts. Be…

Scraping & Crawling, AI & ML
You're in Python, want self-hosted, and need extraction with an exact JSON schema.
You don't need LLM-driven extraction — Scrapy or Crawlee will be cheaper and faster.
unclecode/crawl4ai
apify
crawlee

Node-native crawler from the Apify team. Built-in queues, retries, proxy rotation, headless browser pool — production patterns out of the box. Switches between Playwright, Puppeteer, and plain HTTP ba…

Scraping & Crawling
You need to crawl hundreds of thousands of pages reliably in TypeScript or Node.
A 50-line Playwright script will do — Crawlee is overhead for a quick scrape.
apify/crawlee
scrapy
scrapy

The Python scraping veteran. Mature ecosystem, plugins for everything (caching, proxies, middlewares), and a years-honed pipeline architecture. Steeper learning curve than the modern alternatives but…

Scraping & Crawling
You're a Python team scraping at scale and want middleware/pipeline patterns out of the box.
You're scraping JS-heavy SPAs: Scrapy needs a separate Playwright integration, which is awkward; Crawlee is cleaner.
scrapy/scrapy
microsoft
playwright

The browser automation library that several of the tools above build on. Direct API for when you want fine-grained control: multiple browser contexts, headless or headed runs, and stealth or anti-bot tweaks (through community plugins rather than built-in features). Also doubl…

Scraping & Crawling, Testing
You want a single library that scrapes AND runs your E2E tests — one toolchain, two jobs.
You only need HTTP + HTML parsing (no JS execution): Cheerio or BeautifulSoup will be far faster because no browser is launched.
microsoft/playwright
puppeteer
puppeteer

Chrome-first browser automation from Google (Firefox is also supported). Slightly more raw than Playwright with fewer batteries included, but lighter weight and battle-tested on Chrome quirks. The choice when Playwright's multi-b…

Scraping & Crawling
You're Chrome-only and want a thinner abstraction than Playwright.
You need WebKit (Safari's engine) coverage: Playwright bundles it, Puppeteer does not.
puppeteer/puppeteer
cheeriojs
cheerio

jQuery-style HTML parsing for Node, with no browser. Ridiculously fast because it never renders JS. The default when you're scraping server-rendered pages (most blogs, docs, news, marketplace listings…

Scraping & Crawling
The page works without JavaScript — view-source contains the content you want.
The page renders content client-side (SPA, infinite scroll) — you need Playwright/Puppeteer.
cheeriojs/cheerio
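The "view-source contains the content you want" test can be automated with a rough heuristic: fetch the raw HTML without running any JavaScript and check whether a known snippet of text is already there. The sketch below assumes a regex-based tag strip is good enough for a quick check (it is not a real HTML parser), and the helper names `rawHtmlContains` and `worksWithoutJavaScript` are hypothetical.

```typescript
// Strip <script>/<style> blocks and all tags, then search the remaining text.
// Heuristic only: a regex is not a full HTML parser.
function rawHtmlContains(html: string, needle: string): boolean {
  const text = html
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1>/gi, " ")
    .replace(/<[^>]+>/g, " ")
    .replace(/\s+/g, " ")
    .toLowerCase();
  return text.includes(needle.trim().toLowerCase());
}

// Fetch the page as a plain HTTP client would see it (no JS execution)
// and report whether the expected text is present in the server response.
async function worksWithoutJavaScript(url: string, needle: string): Promise<boolean> {
  const response = await fetch(url);
  return rawHtmlContains(await response.text(), needle);
}
```

If the check passes, a parser such as Cheerio is likely enough; if the text only appears inside a script payload or not at all, the page probably renders client-side and needs a browser.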
gocolly
colly

Go's answer to Scrapy. Built-in rate limiting, caching, parallelism, and storage backends. Compiles to a single binary which makes deployment to a cheap VPS trivial. Use when you want to scrape millio…

Scraping & Crawling
You're in Go and need scraping with low memory + single-binary deploy.
You're not in Go — Crawlee or Scrapy will have a richer ecosystem.
gocolly/colly
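The trade-offs across the eight cards can be condensed into a rough decision helper. This is a deliberately simplified sketch: the `Needs` fields and the `suggestRepo` function are hypothetical, the branch order is one reasonable reading of the takes above, and edge cases (such as Puppeteer versus Playwright, or JS-heavy sites in Python) are left out.

```typescript
type Language = "python" | "node" | "go";

// Hypothetical summary of a project's requirements.
interface Needs {
  language: Language;
  rendersClientSide: boolean; // content appears only after JavaScript runs
  llmExtraction: boolean; // wants LLM-driven extraction or LLM-ready markdown
  largeScale: boolean; // hundreds of thousands of pages or more
}

// Map requirements to one repo from the list, following the cards' advice.
function suggestRepo(needs: Needs): string {
  if (needs.llmExtraction) {
    // Crawl4AI is Python-native; Firecrawl is used through its API.
    return needs.language === "python" ? "unclecode/crawl4ai" : "mendableai/firecrawl";
  }
  if (needs.language === "go") return "gocolly/colly";
  if (needs.language === "python") return "scrapy/scrapy";
  // Node from here on.
  if (needs.largeScale) return "apify/crawlee";
  return needs.rendersClientSide ? "microsoft/playwright" : "cheeriojs/cheerio";
}
```

For example, a small Node scrape of a server-rendered page resolves to `cheeriojs/cheerio`, while the same job on a client-rendered page resolves to `microsoft/playwright`.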
Don't see what you need?

We'll add it in 60 minutes.

Tell us what tool or use case is missing. We'll research the best repo for it, write an honest take, add it to the directory, and email you the link. No paywall, no signup required.

We respond in under 60 minutes during business hours (10:00–18:00 IST).
