How to Scrape Amazon Without Getting Blocked (2026 Guide)
Amazon has the most aggressive anti-scraping defenses of any major site. Here's exactly what works in 2026: residential proxies, header rotation, and the specific libraries to use.
Published: 2026-04-17 | Updated: 2026-04-17
Why Amazon Is Uniquely Hard to Scrape
Amazon runs a multi-layer anti-bot stack that makes it harder to scrape than almost any other e-commerce site. The layers are IP reputation (datacenter IPs get captchas within 5-10 requests), TLS fingerprinting (curl and python-requests give themselves away in the TLS ClientHello), JavaScript challenges on product and search pages, behavioral analysis (mouse movement, scroll patterns, timing), and aggressive rate limiting per IP and per account. A naive scraper built on `requests.get("https://www.amazon.com/dp/...")` gets 503s or captchas within minutes.
Use Residential Proxies, Not Datacenter
Datacenter proxies are the #1 reason scrapers get blocked on Amazon: IPs from hosting providers such as AWS, Hetzner, DigitalOcean, and OVH are trivially identified by their network ownership and are effectively pre-flagged. Residential proxies route through real consumer ISPs (Comcast, BT, Deutsche Telekom) and appear as regular shoppers to Amazon. **Bright Data** has one of the largest residential pools (72M+ IPs across 195 countries) and is what most serious Amazon scraping operations use. **Oxylabs** has similar quality with 100M+ IPs. **Smartproxy** is cheaper at $8.50/GB and works well for smaller operations. Budget expectation: $5-$15 per GB of bandwidth for residential.
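Because residential proxies bill by bandwidth, a rough monthly budget is pages fetched × average transferred page size × price per GB. The sketch below assumes decimal units (1 GB = 1,000,000 KB) and a hypothetical 200 KB average page weight that you should replace with a size measured from your own responses; `monthly_proxy_cost` is an illustrative helper, not part of any provider's SDK.

```python
def monthly_proxy_cost(
    pages_per_day: int,
    avg_page_kb: float,
    price_per_gb: float,
    days: int = 30,
) -> float:
    """Estimate monthly residential-proxy spend in dollars.

    Assumes decimal units (1 GB = 1,000,000 KB) and that every fetch,
    retries included, transfers roughly avg_page_kb through the proxy.
    """
    total_kb = pages_per_day * days * avg_page_kb
    total_gb = total_kb / 1_000_000
    return total_gb * price_per_gb


if __name__ == "__main__":
    # Hypothetical inputs: 10,000 pages/day at an assumed 200 KB per page,
    # priced at the low and high ends of the $5-$15/GB range.
    for price in (5.0, 15.0):
        cost = monthly_proxy_cost(10_000, avg_page_kb=200, price_per_gb=price)
        print(f"${price:.2f}/GB -> ${cost:,.2f}/month")
```

With those assumed inputs the scraper moves 60 GB a month, or $300-$900 at $5-$15/GB, which is why page weight matters as much as the per-GB price.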
TLS Fingerprint: Use curl-impersonate or tls-client
Amazon inspects the TLS ClientHello sent by your scraper and compares it against real browsers. Python's `requests` library and Node's `axios` produce TLS fingerprints that match no mainstream browser, which marks them as bots immediately. Use **curl-impersonate** (a patched curl/libcurl with bindings for Python, Go, and Node; `curl_cffi` is the usual Python one) or **tls-client** (Go library with Python bindings) to send a ClientHello that matches Chrome, Firefox, or Safari. This single change can raise success rates from around 40% to 90%+ without any other modifications.
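A minimal sketch of the curl-impersonate route in Python, assuming the `curl_cffi` package is installed (`pip install curl_cffi`) and that your provider gives you a proxy URL. `fetch_with_browser_tls` and `proxy_map` are hypothetical helper names. The `"chrome"` target asks recent curl_cffi releases for their newest Chrome fingerprint; older releases need a versioned target such as `"chrome110"`. When impersonating, curl_cffi also sends browser-matching default headers, so overriding User-Agent with another browser's string would reintroduce a mismatch.

```python
from __future__ import annotations


def proxy_map(proxy_url: str | None) -> dict[str, str] | None:
    """Route both http and https traffic through one proxy URL, requests-style."""
    if not proxy_url:
        return None
    return {"http": proxy_url, "https": proxy_url}


def fetch_with_browser_tls(url: str, proxy_url: str | None = None) -> str:
    """Fetch a page with a Chrome-like TLS ClientHello via curl_cffi.

    The import is deferred so this module loads even where curl_cffi is
    not installed; only calling this function requires it.
    """
    from curl_cffi import requests as cffi_requests

    resp = cffi_requests.get(
        url,
        impersonate="chrome",  # mimic a recent Chrome TLS and HTTP/2 fingerprint
        proxies=proxy_map(proxy_url),
        timeout=30,
    )
    # Raises on 4xx/5xx such as Amazon's 503; captcha pages may still return 200.
    resp.raise_for_status()
    return resp.text
```

Call it as `fetch_with_browser_tls(product_url, proxy_url=your_proxy_url)` and still check the returned HTML for a captcha form before parsing.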
Headless Browsers: When and How
For product pages with lazy-loaded content, price variations, or A/B tests, raw HTTP scraping misses data, so you need a real browser. **Playwright with a stealth plugin** works better than Selenium in 2026; on Puppeteer (Node), **puppeteer-extra-plugin-stealth** fills the same role. Bright Data's **Scraping Browser** is a hosted Playwright-compatible browser that handles proxy rotation, captcha solving, and fingerprint randomization for you at around $8.40/GB, which is often cheaper than running your own fleet of browsers.
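A minimal Playwright (Python) sketch that routes headless Chromium through an authenticated residential proxy and returns the rendered HTML. It deliberately omits a stealth plugin, whose API differs between packages and versions. `playwright_proxy_config` and `render_page` are hypothetical helper names, and the `http://user:pass@host:port` proxy URL format is an assumption about your provider.

```python
from __future__ import annotations

from urllib.parse import unquote, urlsplit


def playwright_proxy_config(proxy_url: str) -> dict[str, str]:
    """Convert http://user:pass@host:port into Playwright's proxy settings dict."""
    parts = urlsplit(proxy_url)
    server = f"{parts.scheme}://{parts.hostname}"
    if parts.port is not None:
        server += f":{parts.port}"
    config = {"server": server}
    # Playwright takes credentials as separate fields, not embedded in the URL.
    if parts.username:
        config["username"] = unquote(parts.username)
    if parts.password:
        config["password"] = unquote(parts.password)
    return config


def render_page(url: str, proxy_url: str, user_agent: str | None = None) -> str:
    """Load a page in headless Chromium and return the rendered HTML."""
    # Deferred import: only calling this function requires Playwright
    # (pip install playwright && playwright install chromium).
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True, proxy=playwright_proxy_config(proxy_url))
        try:
            context = browser.new_context(
                user_agent=user_agent,  # None keeps Chromium's own User-Agent
                viewport={"width": 1366, "height": 768},
                locale="en-US",
            )
            page = context.new_page()
            # Wait for network activity to settle so lazy-loaded blocks render.
            page.goto(url, wait_until="networkidle", timeout=60_000)
            return page.content()
        finally:
            browser.close()
```

Launching a fresh browser per page keeps the sketch simple; a real fleet would reuse one browser and open a new context per proxy session.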
Rate Limiting: The Math That Works
A single residential IP can make roughly 10-30 Amazon requests per hour before getting captcha'd, fewer if it keeps hitting the same category. That works out to 240-720 requests per IP per day, so a catalog of 100,000 products refreshed daily needs on the order of 140-420 IPs in rotation, with smart rotation so no IP exceeds its hourly budget. Add jitter to request timing (1-5 second random delays), vary User-Agent across real browser strings (keeping each one consistent with the TLS fingerprint you present), and rotate the Accept-Language header. Never hammer a single ASIN repeatedly; spread requests across the catalog to look organic.
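The pool size follows from dividing daily page loads by what one IP can handle in a day: at 10 requests/hour an IP covers 240 pages/day, at 30 requests/hour 720, so 100,000 daily refreshes need roughly 139-417 IPs working around the clock. The sketch below encodes that arithmetic plus two of the pacing rules above: interleaving ASINs round-robin across categories so no category is hit in a burst, and a random 1-5 second pause before each request. The category grouping, the helper names, and the one-request-per-product-per-day assumption are all hypothetical.

```python
from __future__ import annotations

import math
import random
from itertools import zip_longest


def ips_needed(pages_per_day: int, requests_per_ip_per_hour: float, active_hours: float = 24) -> int:
    """Minimum IP pool size if every IP stays under its hourly request budget."""
    per_ip_per_day = requests_per_ip_per_hour * active_hours
    return math.ceil(pages_per_day / per_ip_per_day)


def interleave_by_category(asins_by_category: dict[str, list[str]], seed: int | None = None) -> list[str]:
    """Round-robin across shuffled categories so consecutive requests differ."""
    rng = random.Random(seed)
    columns = []
    for asins in asins_by_category.values():
        shuffled = list(asins)
        rng.shuffle(shuffled)  # random order within each category
        columns.append(shuffled)
    rng.shuffle(columns)  # vary which category leads each pass
    order = []
    for row in zip_longest(*columns):
        order.extend(asin for asin in row if asin is not None)
    return order


def jittered_delays(n: int, low: float = 1.0, high: float = 5.0, seed: int | None = None) -> list[float]:
    """Uniform random pause, in seconds, to sleep before each of n requests."""
    rng = random.Random(seed)
    return [rng.uniform(low, high) for _ in range(n)]


if __name__ == "__main__":
    print(ips_needed(100_000, 10), ips_needed(100_000, 30))
    plan = interleave_by_category({"books": ["b1", "b2", "b3"], "toys": ["t1", "t2"]}, seed=7)
    for asin, pause in zip(plan, jittered_delays(len(plan), seed=7)):
        print(f"sleep {pause:.1f}s, then fetch {asin}")
```

Seeding the generators makes a plan reproducible for debugging; leave `seed=None` in production so timing is not identical from run to run.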
The Legal Side
Scraping publicly accessible Amazon product pages is generally not a crime in the US: in hiQ v. LinkedIn (2022), the Ninth Circuit held that accessing publicly available web data likely does not violate the CFAA. The same case shows the limits of that protection, because the district court later found hiQ had breached LinkedIn's User Agreement. Amazon's Terms of Service prohibit scraping, and Amazon can block your IPs and ban your account. That exposure is civil (contract), not criminal. Avoid logging into an Amazon account while scraping (you have then explicitly accepted the ToS, which strengthens any claim against you), scraping customer reviews and reseller data (privacy concerns), and redistributing scraped data without transformation (copyright concerns on product descriptions).
Should You Just Use the Amazon PA API?
Amazon's Product Advertising API (PA API v5) is the official way to get product data: legal, stable, and free. The catch: you need to be an approved Amazon Associate, you only get API access after making 3 qualifying sales within 180 days, you lose access if your account goes 30 consecutive days without referred sales, and rate limits are harsh (starting at 1 request/second and 8,640 requests/day, rising with referred revenue). For commercial competitive-pricing tools, the PA API is too limited, so you'll need scraping. For affiliate content sites, the PA API is fine and removes the proxy/fingerprint headache entirely.
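On the PA API route, most client-side work is pacing. A minimal sketch, assuming the starting limits of 1 request/second and 8,640 requests/day and that `GetItems` accepts up to 10 ASINs per call. `PaApiThrottle` and `batches` are hypothetical helpers; the signed request itself is left to whichever PA API SDK you use.

```python
from __future__ import annotations

import time
from typing import Callable, Iterator, Sequence


def batches(asins: Sequence[str], size: int = 10) -> Iterator[list[str]]:
    """Chunk ASINs into GetItems-sized groups (up to 10 per call)."""
    for start in range(0, len(asins), size):
        yield list(asins[start:start + size])


class PaApiThrottle:
    """Space calls at least 1/per_second seconds apart and cap calls per day.

    calls_today never resets here; create a new throttle (or zero the
    counter) at the start of each day.
    """

    def __init__(
        self,
        per_second: float = 1.0,
        per_day: int = 8640,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = 1.0 / per_second
        self.per_day = per_day
        self.clock = clock  # injectable so tests can use a fake clock
        self.sleep = sleep
        self.last_call: float | None = None
        self.calls_today = 0

    def wait(self) -> None:
        """Block until the next call is allowed; raise once the daily quota is spent."""
        if self.calls_today >= self.per_day:
            raise RuntimeError("daily PA API quota exhausted")
        now = self.clock()
        if self.last_call is not None:
            remaining = self.min_interval - (now - self.last_call)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last_call = now
        self.calls_today += 1
```

For example, refreshing 1,000 ASINs takes 100 `GetItems` calls: call `throttle.wait()` before each batch from `batches(asins)`, and the throttle spaces them about a second apart.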
FAQ
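- What's the cheapest way to scrape Amazon reliably?
- Smartproxy residential proxies at $8.50/GB combined with curl-impersonate for TLS fingerprinting is the budget-friendly stack. Because residential bandwidth is billed per GB, monthly cost is roughly pages fetched × average transferred page size × price per GB, so measure your real page weight before budgeting for something like 10,000 products a day, and fetch only the HTML document rather than images and scripts. Bright Data and Oxylabs are more expensive but have higher success rates for large-scale operations.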
- Can I scrape Amazon with just Puppeteer?
- Plain Puppeteer gets blocked within a few requests from a datacenter IP. You need puppeteer-extra-plugin-stealth plus residential proxies plus viewport/user-agent randomization. Even then, a hosted solution like Bright Data Scraping Browser is usually more cost-effective than maintaining your own stealth browser fleet.
- Is it illegal to scrape Amazon product prices?
- Generally no. In the US, scraping public product data is unlikely to violate the CFAA (hiQ v. LinkedIn, 2022); the EU picture is less settled, since database rights and the GDPR can apply. Amazon's ToS prohibits scraping, so they can block your IPs and pursue contract claims, but that's a civil issue, not a crime. Don't log in, don't scrape reviews (privacy), and don't redistribute copyrighted descriptions verbatim.
- How often does Amazon update its anti-bot systems?
- Constantly. TLS fingerprint checks tightened in 2024, JavaScript challenges got harder in early 2026, and new captcha variants roll out monthly. Production scrapers need at least monthly maintenance to stay unblocked. This is why hosted solutions (Bright Data Scraping Browser, Oxylabs Web Scraper API) often make sense — they maintain the anti-anti-bot layer so you don't have to.