How to Bypass Cloudflare Anti-Bot Protection: A Complete Technical Guide (2026)

Over 24 million active websites (at least 20% of all websites on the internet) use Cloudflare, making it the primary anti-bot solution behind many of the sites you might want to scrape. When your scraper hits Cloudflare protection, you're not dealing with one layer; you're dealing with five simultaneously. Understanding each one is the difference between a scraper that works and one that gets blocked before the first request lands.
This is the most comprehensive guide to Cloudflare bypass available in 2026. We cover every detection layer in technical depth, the specific bypass technique for each, real Python implementations, and when to stop building infrastructure and use purpose-built bypass tools like ScrapeBadger. No hand-waving. No vague advice to "use proxies." Concrete techniques, working code, and honest assessments of where each approach breaks down.
What Cloudflare Actually Does (And Why It's Hard)
Cloudflare is not a single anti-bot system. It's a layered security platform where each layer is independently capable of blocking your scraper. Most bypass guides treat it as one problem. That's why most bypass guides are wrong within a month.
Cloudflare's web application firewall identifies and blocks bots based on several distinct traits: TLS fingerprints that identify clients and their configurations, HTTP/2 fingerprints that match against known bot signatures, HTTP header inspection for bot-like configurations, JavaScript fingerprints that gather browser, OS, and hardware details, and behavioural analysis that monitors request rates, mouse movements, and idle times with machine learning. (Apify)
A scraper that addresses only one or two of these layers gets through one check and fails another. The five layers interact: a perfect TLS fingerprint combined with a datacenter IP gets past fingerprinting but fails at IP reputation. Residential proxies with perfect headers fail if your TLS stack is wrong. You need all five addressed simultaneously for reliable bypass.
Layer 1: IP Reputation and Rate Limiting
The first check is the simplest and the one most developers understand: Cloudflare maintains databases of IP reputation. Every major cloud provider's IP range (AWS, Google Cloud, DigitalOcean, Hetzner, OVH) is pre-flagged. A `requests.get()` call from a server IP reaches Cloudflare's edge and fails before your headers or fingerprint are even evaluated.
Cloudflare has a global database of IP addresses and their reputation. If an IP is known for scraping, spam, or suspicious activity, it may be blocked or challenged. Requests coming from a single IP address are more likely to be rate limited or blocked, while using multiple IP addresses can help distribute traffic and avoid detection. Sending too many requests in a short amount of time will trigger rate limiting rules and block further access.
The fix: residential and mobile proxies
Residential IPs (addresses allocated by ISPs to real home users) carry significantly higher trust than datacenter IPs. Mobile IPs (from cellular networks) carry the highest trust of all, because carriers share IPs across hundreds of users simultaneously, making individual blocking costly in false positives for Cloudflare.
The proxy type hierarchy from lowest to highest trust: datacenter → ISP/static residential → rotating residential → mobile.
For most Cloudflare-protected sites, rotating residential proxies are the baseline requirement. Datacenter IPs don't make it past the first check on properly configured Cloudflare deployments.
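As a minimal sketch of what rotation looks like in practice (the proxy endpoints below are hypothetical placeholders; substitute your provider's gateways), picking a fresh residential exit per request spreads traffic across IPs:
```python
import random
from curl_cffi import requests as cf_requests

# Hypothetical residential proxy endpoints -- substitute your provider's gateways
RESIDENTIAL_PROXIES = [
    "http://user:pass@res-proxy-1.example.com:8080",
    "http://user:pass@res-proxy-2.example.com:8080",
    "http://user:pass@res-proxy-3.example.com:8080",
]

def fetch_with_rotation(url: str):
    """Pick a random residential proxy for each request to spread traffic."""
    proxy = random.choice(RESIDENTIAL_PROXIES)
    return cf_requests.get(
        url,
        impersonate="chrome120",
        proxies={"http": proxy, "https": proxy},
    )
```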
Rate limiting matters as much as proxy type. Real users don't make 100 requests per second. Cloudflare's rate limiting tracks per-IP request frequency and will block even a clean residential IP if it sends requests at machine speed. Add realistic delays: a minimum of 1-3 seconds between requests, with random variance to avoid pattern detection.
```python
import time
import random

def polite_delay(min_seconds: float = 1.0, max_seconds: float = 4.0):
    """
    Simulate human browsing timing.
    Never scrape at constant machine speed -- it's the easiest detection signal.
    """
    base_delay = random.uniform(min_seconds, max_seconds)
    # Occasional longer pauses -- real users sometimes stop to read
    if random.random() < 0.15:
        base_delay += random.uniform(3.0, 10.0)
    time.sleep(base_delay)
```
Layer 2: TLS Fingerprinting (JA3/JA4)
This is the detection layer that breaks more scrapers than any other, and the one most tutorials skip entirely.
The construction of a TLS fingerprint occurs during the client-server TLS handshake. In this process, Cloudflare analyzes the "client hello" message fields (cipher suites, extensions, and elliptic curves) to compute a fingerprint hash for a given client. It then looks up that hash in a database of pre-collected fingerprint hashes to determine the client sending the request. (ScraperAPI)
The JA3 fingerprint for Python's requests library is 7e262e4d3a3f90eb9de01e5acef8a6ce. Every enterprise anti-bot system has this signature in its blocklist. Your request fails at the TLS handshake level: before HTTP headers, before cookies, before any application data is exchanged.
Cloudflare checks whether the TLS and HTTP/2 fingerprints match the declared browser headers. A mismatch suggests the request is coming from a bot trying to spoof a real browser. Different versions of browsers and HTTP clients have distinct TLS and HTTP/2 fingerprints, which Cloudflare compares against the declared headers to verify authenticity. (ScraperAPI)
JA4, introduced in 2023 and widely adopted by 2025, extends this analysis to capture transport protocol details, extension ordering, and ALPN values. It's harder to spoof because it captures more of the handshake structure.
Researchers reverse-engineering the JA3 handshake found that Cloudflare flags "impossible" combinations, such as a Chrome 144 header paired with a TLS stack that didn't support Chrome 144's capabilities, a combination no real browser would produce. (Firecrawl)
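To make the mechanism concrete: a JA3 hash is the MD5 of five ClientHello fields serialised into a comma-separated string, with list values joined by dashes. A minimal sketch, using illustrative field values rather than any real browser's handshake:
```python
import hashlib

def ja3_hash(tls_version: int, ciphers: list[int], extensions: list[int],
             curves: list[int], point_formats: list[int]) -> str:
    """Compute a JA3 hash: MD5 over the serialised ClientHello fields."""
    fields = [
        str(tls_version),
        "-".join(map(str, ciphers)),
        "-".join(map(str, extensions)),
        "-".join(map(str, curves)),
        "-".join(map(str, point_formats)),
    ]
    return hashlib.md5(",".join(fields).encode()).hexdigest()

# Illustrative values only -- a real ClientHello carries dozens of entries
print(ja3_hash(771, [4865, 4866, 4867], [0, 23, 65281], [29, 23, 24], [0]))
```
Because the hash covers the exact cipher and extension lists in order, any HTTP client built on a different TLS stack than a real browser produces a different, recognisable hash no matter what headers it sends.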
The fix: curl_cffi for browser-accurate TLS
The curl_cffi library is the current standard for TLS fingerprint spoofing in Python. It uses libcurl compiled with BoringSSL (the same TLS library Chrome uses) and can accurately impersonate any Chrome or Firefox TLS profile:
```python
from curl_cffi import requests as cf_requests
import os

# Chrome 120 TLS profile -- matches the actual Chrome 120 TLS stack exactly
session = cf_requests.Session(impersonate="chrome120")
response = session.get(
    "https://cloudflare-protected-site.com/data",
    headers={
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.9",
        "Accept-Encoding": "gzip, deflate, br",
        "Connection": "keep-alive",
        "Upgrade-Insecure-Requests": "1",
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    },
    proxies={
        "https": os.environ.get("RESIDENTIAL_PROXY_URL")
    }
)
print(f"Status: {response.status_code}")
print(response.text[:500])
```
Available impersonation profiles in curl_cffi include chrome99, chrome100, chrome101, chrome104, chrome107, chrome110, chrome116, chrome119, chrome120, chrome123, chrome124, firefox91, firefox98, firefox102, firefox109, and safari15_3. Always use a recent Chrome profile; using chrome99 in 2026 looks suspicious because no real browser is still on that version.
The TLS version and Chrome version in your impersonation profile must match your User-Agent header. Claiming to be Chrome 120 with a TLS profile that Chrome 120 would never produce is detectable. Ensure your browser headers, TLS fingerprint, and HTTP/2 fingerprint are all consistent and indicate that the request is coming from a real browser.
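One way to enforce that consistency is to keep each impersonation profile and its matching User-Agent in a single table so they can never drift apart. A minimal sketch (the UA strings follow Chrome's standard format; verify them against the exact versions you impersonate):
```python
from curl_cffi import requests as cf_requests

# Each impersonation profile paired with the User-Agent it must declare,
# so the TLS fingerprint and the claimed browser version always agree.
PROFILES = {
    "chrome120": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                 "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "chrome124": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                 "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
}

def make_consistent_session(profile: str = "chrome124") -> cf_requests.Session:
    """Create a session whose TLS profile and User-Agent header match."""
    session = cf_requests.Session(impersonate=profile)
    session.headers["User-Agent"] = PROFILES[profile]
    return session
```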
Layer 3: HTTP/2 Fingerprinting (AKAMAI)
HTTP/2 introduced binary framing, header compression (HPACK), and stream multiplexing. Cloudflare fingerprints the specific HTTP/2 parameter values that clients send: frame ordering, SETTINGS frames, window sizes, and HEADERS frame pseudo-header ordering.
The HTTP/2 specification extends HTTP/1.1 with new parameters that improve the performance of concurrent web applications. Cloudflare's Bot Manager fingerprints these values in incoming requests to detect bot-like behaviour. Of all the passive bot detection techniques, TLS and HTTP/2 fingerprinting are the most technically challenging to control in request-based bots.
The fingerprinting method is sometimes called the "AKAMAI fingerprint" in security research because Akamai independently developed similar techniques. It captures (see the sketch after this list):
- SETTINGS frame values: initial window size, max concurrent streams, header table size
- WINDOW_UPDATE frame: the initial connection-level window size increment
- HEADERS frame: pseudo-header ordering (`:method`, `:path`, `:scheme`, `:authority`)
- PRIORITY frame: whether the client sends stream priorities (Chrome does; simple HTTP clients often don't)
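Security tooling commonly serialises these components into a single string of the form settings|window_update|priority|pseudo-header-order. A hedged sketch with illustrative values (not a verified Chrome capture):
```python
def akamai_http2_fingerprint(settings: dict[int, int], window_update: int,
                             priorities: str, header_order: str) -> str:
    """Serialise HTTP/2 session parameters in the common 'Akamai' format:
    settings|window_update|priority|pseudo-header-order."""
    settings_part = ";".join(f"{k}:{v}" for k, v in sorted(settings.items()))
    return f"{settings_part}|{window_update}|{priorities or '0'}|{header_order}"

# Example values for illustration only -- capture real ones from your client
print(akamai_http2_fingerprint(
    settings={1: 65536, 2: 0, 4: 6291456, 6: 262144},  # SETTINGS frame entries
    window_update=15663105,                             # WINDOW_UPDATE increment
    priorities="",                                      # no PRIORITY frames sent
    header_order="m,a,s,p",                             # :method,:authority,:scheme,:path
))
```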
The fix: curl_cffi also handles HTTP/2 fingerprinting. When you impersonate chrome120, curl_cffi reproduces Chrome 120's exact HTTP/2 parameter values โ not just the TLS profile. This is the key advantage of curl_cffi over manually setting headers in httpx or aiohttp, which control headers but not the underlying protocol parameters.
For the highest-fidelity HTTP/2 fingerprinting, full browser automation (Playwright with real Chromium) is the gold standard: you're using the actual browser engine rather than replicating it.
Layer 4: JavaScript Challenges (Cloudflare Managed Challenge and Turnstile)
When Cloudflare's passive detection layers (TLS, HTTP/2, IP reputation) are inconclusive, it escalates to active JavaScript challenges that require a browser environment to solve.
Managed Challenge (formerly "5-second check")
The Managed Challenge is a more aggressive security measure. When Cloudflare detects highly suspicious traffic, it presents a full-page interstitial screen, often with a brief delay ("Challenge 5s") or a visible Turnstile widget. (Oxylabs)
The challenge serves a JavaScript interstitial page that executes several checks:
- Proof-of-work computation (deterministic but CPU-intensive for bots)
- Browser API probing (Canvas, WebGL, AudioContext, battery API, device memory)
- Cookie validation: the challenge sets a `cf_clearance` cookie upon successful completion
- Timer validation: the challenge has a minimum completion time to detect speed cheating
Solving this requires a real JavaScript engine. A requests or curl_cffi call gets the interstitial page and nothing else.
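Before escalating to a browser, it's worth detecting that you received a challenge page at all. A minimal heuristic sketch; the cf-mitigated: challenge response header and the "Just a moment..." title are the usual tells, though the exact markers can change:
```python
def is_cloudflare_challenge(response) -> bool:
    """Heuristically detect a Cloudflare challenge interstitial in a response."""
    # Cloudflare sets this header on challenge responses
    if response.headers.get("cf-mitigated") == "challenge":
        return True
    # Fall back to well-known interstitial markers in the body
    body = response.text[:4096]
    return response.status_code in (403, 503) and (
        "Just a moment" in body or "Checking your browser" in body
    )
```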
Cloudflare Turnstile
Turnstile runs a series of small, non-interactive JavaScript challenges to gather signals about the visitor or browser environment. These challenges include proof-of-work (computational puzzles), proof-of-space, probing for web APIs, and various other checks for detecting browser quirks and human behaviour. As a result, Cloudflare can fine-tune the difficulty of the challenge to the specific request and avoid showing a visual or interactive puzzle to a real user. (RapidSeedbox)
For Turnstile, the actual act of checking a box isn't important; it's the background data being analysed while the box is checked that matters. The current deployment of Turnstile checks billions of visitors every day, and Cloudflare can identify browser abnormalities that bots exhibit while attempting to pass those tests. (Apify)
Turnstile comes in three modes: Managed (automatically decides whether to show a checkbox based on visitor risk level), Non-interactive (visitors never need to interact with the widget), and Invisible (the widget is completely invisible to the visitor, but the challenge still runs in the background). (Thunderbit)
The invisible mode is the most challenging for bypass tools: there's no visible element to detect or interact with.
The fix: Playwright with stealth patches
For JavaScript challenge handling, full browser automation is the only reliable approach. Playwright launches real Chromium, which executes the challenge JavaScript natively:
```python
from playwright.sync_api import sync_playwright
import time
import random

def bypass_cloudflare_playwright(
    url: str,
    proxy_url: str = None,
    wait_for_selector: str = None,
) -> tuple[str, dict]:
    """
    Bypass Cloudflare using real browser automation.
    Returns (html_content, cookies) for session reuse.
    """
    with sync_playwright() as p:
        browser_args = [
            "--no-sandbox",
            "--disable-blink-features=AutomationControlled",
            "--disable-dev-shm-usage",
            "--disable-gpu",
        ]
        launch_options = {
            "headless": True,
            "args": browser_args,
        }
        if proxy_url:
            launch_options["proxy"] = {"server": proxy_url}
        browser = p.chromium.launch(**launch_options)
        context = browser.new_context(
            viewport={"width": 1920, "height": 1080},
            user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
            locale="en-US",
            timezone_id="America/New_York",
            java_script_enabled=True,
        )
        # Patch navigator.webdriver (and friends) before any page load
        context.add_init_script("""
            Object.defineProperty(navigator, 'webdriver', {
                get: () => undefined,
                configurable: true
            });
            Object.defineProperty(navigator, 'plugins', {
                get: () => [1, 2, 3, 4, 5],
            });
            Object.defineProperty(navigator, 'languages', {
                get: () => ['en-US', 'en'],
            });
            window.chrome = {
                runtime: {},
                loadTimes: function() {},
                csi: function() {},
                app: {}
            };
        """)
        page = context.new_page()
        # Navigate and wait for Cloudflare challenge to complete
        page.goto(url, wait_until="networkidle", timeout=30000)
        # Wait for challenge to resolve -- Cloudflare typically takes 3-8 seconds
        time.sleep(random.uniform(4, 8))
        # Check if we're still on a challenge page
        if "Just a moment" in page.title() or "Checking your browser" in page.content():
            # Wait longer -- challenge is still processing
            page.wait_for_load_state("networkidle", timeout=30000)
            time.sleep(random.uniform(3, 6))
        if wait_for_selector:
            try:
                page.wait_for_selector(wait_for_selector, timeout=15000)
            except Exception:
                pass
        html = page.content()
        # Capture cookies for session reuse
        cookies = {
            cookie["name"]: cookie["value"]
            for cookie in context.cookies()
        }
        browser.close()
        return html, cookies

# Usage
html, cookies = bypass_cloudflare_playwright(
    "https://cloudflare-protected-site.com",
    proxy_url="http://user:pass@residential-proxy:8080"
)
# Reuse the cf_clearance cookie in subsequent requests
cf_clearance = cookies.get("cf_clearance")
print(f"Got cf_clearance: {cf_clearance[:20]}...")
```
Session reuse: the key to efficiency
Launching a full browser for every request is resource-intensive and slow. The correct pattern is:
1. Launch Playwright once to solve the initial challenge
2. Extract the `cf_clearance` cookie and session cookies
3. Reuse those cookies in faster `curl_cffi` requests for all subsequent pages
```python
from curl_cffi import requests as cf_requests
import os

def make_session_with_clearance(
    cf_clearance: str,
    user_agent: str,
    proxy_url: str = None,
) -> cf_requests.Session:
    """
    Create a curl_cffi session pre-loaded with Cloudflare clearance.
    Far faster than Playwright for pages that don't trigger new challenges.
    """
    session = cf_requests.Session(impersonate="chrome120")
    session.cookies.set("cf_clearance", cf_clearance)
    session.headers.update({
        "User-Agent": user_agent,
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.9",
        "Accept-Encoding": "gzip, deflate, br",
    })
    if proxy_url:
        session.proxies = {"https": proxy_url}
    return session

# One Playwright call to get clearance
html, cookies = bypass_cloudflare_playwright(
    "https://protected-site.com",
    proxy_url=os.environ.get("RESIDENTIAL_PROXY_URL")
)

# All subsequent requests via curl_cffi (5-10x faster)
session = make_session_with_clearance(
    cf_clearance=cookies.get("cf_clearance"),
    user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64)...",
    proxy_url=os.environ.get("RESIDENTIAL_PROXY_URL")
)

# Now scrape multiple pages without relaunching Playwright
for url in product_urls:
    response = session.get(url)
    process_page(response.text)
    polite_delay()
```
Important: The `cf_clearance` cookie is tied to the IP address used to obtain it. If you change proxies between obtaining the cookie and using it, Cloudflare will reject the clearance. Keep proxy session consistency throughout.
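Most residential providers offer "sticky" sessions for exactly this reason, typically by encoding a session ID in the proxy username so the same exit IP is reused across requests. The username convention below is an assumption; formats vary by provider, so check your provider's documentation:
```python
import uuid

def sticky_proxy_url(base_user: str, password: str, host: str, port: int) -> str:
    """Build a proxy URL pinned to one exit IP via a session ID in the username.
    The 'user-session-<id>' convention is provider-specific -- adjust to yours."""
    session_id = uuid.uuid4().hex[:8]
    return f"http://{base_user}-session-{session_id}:{password}@{host}:{port}"

# Use the SAME sticky URL for the Playwright solve and the curl_cffi follow-ups,
# so cf_clearance stays bound to one IP for the whole session.
proxy = sticky_proxy_url("customer123", "secret", "residential.example.com", 8080)
```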
Layer 5: Browser Environment Fingerprinting
Even with correct TLS, HTTP/2, headers, and IP, JavaScript-executing pages run additional checks against the browser environment itself. This is where headless Playwright fails without stealth patches.
The signals Cloudflare collects include:
- `navigator.webdriver`: Standard Playwright sets this to true. No real browser does. This single flag identifies your session as automated if not patched.
- Canvas fingerprint: The browser renders a canvas element and hashes the pixel data. The output varies by GPU, driver, and browser version. Headless Chrome renders via software (SwiftShader) and produces a known software-renderer hash that differs from any real GPU.
- WebGL renderer string: `gl.getParameter(gl.RENDERER)` returns "Google SwiftShader" in headless Chrome. Real Chrome returns GPU-specific strings like "ANGLE (NVIDIA, NVIDIA GeForce RTX 3080 Direct3D11 vs_5_0 ps_5_0, D3D11)".
- `navigator.plugins`: Real Chrome has plugins. Headless Chrome has none by default. An empty plugins array is a common automation signal.
- Timing attacks: Cloudflare scripts measure JavaScript execution timing. Real browsers have deterministic timing patterns; automated environments sometimes show impossible timing values.
The fix: stealth patches
```javascript
// Inject before any page load via context.add_init_script()

// 1. Remove webdriver flag
Object.defineProperty(navigator, 'webdriver', {
    get: () => undefined,
    configurable: true
});

// 2. Fake plugins (real Chrome has 3 by default)
Object.defineProperty(navigator, 'plugins', {
    get: () => {
        return [
            {name: "Chrome PDF Plugin", filename: "internal-pdf-viewer"},
            {name: "Chrome PDF Viewer", filename: "mhjfbmdgcfjbbpaeojofohoefgiehjai"},
            {name: "Native Client", filename: "internal-nacl-plugin"}
        ];
    }
});

// 3. Fake permissions API (headless doesn't implement this the same way)
const originalQuery = window.navigator.permissions.query;
window.navigator.permissions.query = (parameters) => (
    parameters.name === 'notifications' ?
        Promise.resolve({ state: Notification.permission }) :
        originalQuery(parameters)
);

// 4. Add chrome runtime object (missing in headless)
window.chrome = {
    app: {isInstalled: false},
    webstore: {onInstallStageChanged: {}, onDownloadProgress: {}},
    runtime: {
        PlatformOs: {MAC: 'mac', WIN: 'win', ANDROID: 'android', CROS: 'cros', LINUX: 'linux', OPENBSD: 'openbsd'},
        PlatformArch: {ARM: 'arm', X86_32: 'x86-32', X86_64: 'x86-64'},
        RequestUpdateCheckStatus: {THROTTLED: 'throttled', NO_UPDATE: 'no_update', UPDATE_AVAILABLE: 'update_available'},
        OnInstalledReason: {INSTALL: 'install', UPDATE: 'update', CHROME_UPDATE: 'chrome_update', SHARED_MODULE_UPDATE: 'shared_module_update'},
        OnRestartRequiredReason: {APP_UPDATE: 'app_update', OS_UPDATE: 'os_update', PERIODIC: 'periodic'}
    }
};

// 5. Correct screen dimensions (headless often reports 0,0)
Object.defineProperty(screen, 'colorDepth', {get: () => 24});
Object.defineProperty(screen, 'pixelDepth', {get: () => 24});
```
The playwright-stealth Python package applies most of these patches automatically:
```bash
pip install playwright-stealth
```
```python
from playwright.sync_api import sync_playwright
from playwright_stealth import stealth_sync

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    stealth_sync(page)  # Apply all stealth patches
    page.goto("https://cloudflare-protected-site.com")
```
The Error Codes: What Cloudflare Is Telling You
When bypass fails, Cloudflare returns specific error codes. Understanding them helps you diagnose exactly which layer blocked you:
| Error | Meaning | Fix |
|---|---|---|
| 1003 | Direct IP access (no hostname in request) | Request the domain name, not the raw IP, so a valid `Host` header is sent |
| 1010 | Browser fingerprint banned | Switch TLS profile; use `curl_cffi` impersonation |
| 1012 | IP address banned | Switch to a residential proxy; wait for IP rotation |
| 1015 | Rate limited | Reduce request frequency; add delays |
| 1020 | WAF rule blocked | Request pattern matched a custom WAF rule; change approach |
| 403 with cf-ray | Request blocked by Cloudflare WAF | Evaluate headers and request pattern |
| Challenge page (HTML) | JavaScript challenge triggered | Use Playwright to solve; extract `cf_clearance` |
| Empty body / 200 but no content | Challenge page returned as 200; JS not executed | Switch to Playwright: you're getting the interstitial |
The cf-ray header in every Cloudflare response contains a unique identifier for the blocked request. If you need to debug a specific block with Cloudflare support (for legitimate automation with site owner permission), this ID is what to reference.
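A small sketch of capturing the cf-ray ID alongside failures so blocked requests can be correlated later:
```python
import logging

def log_cloudflare_block(response) -> None:
    """Record the cf-ray ID of a blocked request for later diagnosis."""
    cf_ray = response.headers.get("cf-ray", "unknown")
    logging.warning(
        "Blocked by Cloudflare: status=%s cf-ray=%s url=%s",
        response.status_code, cf_ray, response.url,
    )
```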
Cloudflare Bot Fight Mode vs. Bot Management
An important distinction that affects bypass difficulty:
Bot Fight Mode (free): Available on all Cloudflare plans. Applies JavaScript challenges and IP reputation checks. Bypassable with the techniques above.
Super Bot Fight Mode (Pro/Business): Adds more aggressive challenge types and blocks "definitely automated" traffic more aggressively. Still bypassable with full browser automation and residential proxies.
Bot Management (Enterprise only): Machine learning models trained on network-wide traffic patterns. Intent-based detection that goes beyond request analysis. This is what protects major e-commerce platforms, financial services sites, and heavily targeted properties. Significantly harder to bypass with DIY approaches: the ML models identify behavioural patterns across session history, not just individual requests.
If the site you're targeting uses Enterprise Bot Management, the DIY path requires genuine behavioural simulation: realistic scroll patterns, mouse movements that follow natural B-spline curves, interaction timing that matches human reading speeds, and session histories that look like returning visitors. This level of automation is complex to build and maintain.
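To give a flavour of what that simulation involves, here is a hedged sketch of human-like cursor movement in Playwright, interpolating along a jittered quadratic Bezier curve instead of teleporting the mouse. (A quadratic curve stands in for the B-spline trajectories mentioned above; it's illustrative, not a guarantee against ML-based detection.)
```python
import random
import time

def human_mouse_move(page, x1: float, y1: float, x2: float, y2: float, steps: int = 30):
    """Move the mouse along a jittered quadratic Bezier curve from (x1,y1) to (x2,y2)."""
    # A random control point bows the path, so the trajectory is never a straight line
    cx = (x1 + x2) / 2 + random.uniform(-100, 100)
    cy = (y1 + y2) / 2 + random.uniform(-100, 100)
    for i in range(1, steps + 1):
        t = i / steps
        # Quadratic Bezier interpolation with per-point jitter
        x = (1 - t) ** 2 * x1 + 2 * (1 - t) * t * cx + t ** 2 * x2 + random.uniform(-1.5, 1.5)
        y = (1 - t) ** 2 * y1 + 2 * (1 - t) * t * cy + t ** 2 * y2 + random.uniform(-1.5, 1.5)
        page.mouse.move(x, y)
        time.sleep(random.uniform(0.005, 0.025))  # uneven pacing, like a human hand
```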
When DIY Breaks Down: The Production Reality
The techniques above handle standard Cloudflare deployments reliably. At some point, the maintenance overhead of keeping bypass logic current exceeds the cost of using infrastructure built specifically for this problem.
The signs that you've hit that point:
- Cloudflare updated its detection and your bypass stopped working, with no warning or documentation
- You're targeting Enterprise Bot Management sites where behavioural simulation is required
- You need geo-targeted data across multiple residential proxy pools in different countries
- Success rates degrade over weeks as your proxy IPs accumulate session history and get flagged
- You're managing bypass for dozens of target sites simultaneously, each potentially on a different Cloudflare plan tier
ScrapeBadger's Cloudflare bypass infrastructure handles all five detection layers at the infrastructure level. TLS fingerprinting, HTTP/2 parameter matching, browser environment simulation, residential proxy rotation, and session management are built into every request. When Cloudflare pushes updates to its detection models (which happens continuously), the bypass adapts without any changes to your code.
```python
import requests

# One API call. ScrapeBadger handles all five Cloudflare detection layers.
response = requests.get(
    "https://api.scrapebadger.com/v1/scrape",
    headers={"X-API-Key": "YOUR_API_KEY"},
    params={
        "url": "https://cloudflare-protected-site.com/products",
        "render_js": True,
        "wait_for": "networkidle",
    }
)
data = response.json()
print(data["html"][:500])
```
From your code's perspective: one API call, clean HTML back. From ScrapeBadger's infrastructure: residential proxy selection, TLS fingerprint matching, browser execution, challenge solving, session management. The complexity is invisible, which is exactly the point.
Full technical documentation at docs.scrapebadger.com. The ScrapeBadger CLI supports scheduled Cloudflare bypass at scale without managing any of this infrastructure yourself.
Putting It Together: The Complete Bypass Stack
For teams building production scrapers against Cloudflare-protected sites, the full stack ordered by effectiveness:
Level 1: Basic Cloudflare (Bot Fight Mode, standard IP reputation)
- `curl_cffi` with `chrome120` impersonation
- Rotating residential proxies
- Complete browser headers with consistent versions

Level 2: Moderate Cloudflare (Super Bot Fight Mode, Managed Challenges)
- Playwright with stealth patches
- Extract `cf_clearance`, reuse in `curl_cffi` sessions
- Rotating residential proxies with session stickiness
- Polite request timing with randomised delays

Level 3: Aggressive Cloudflare (Enterprise Bot Management, ML-based intent detection)
- Full browser automation with genuine interaction patterns
- Mobile proxies for highest IP trust
- Session histories built over time (returning visitor pattern)
- Or: ScrapeBadger's bypass infrastructure, which handles this at the platform level
The decision between Level 2 (DIY) and Level 3 (infrastructure) is an engineering economics question. If Cloudflare bypass is a means to an end for your project, the ongoing maintenance cost of keeping Level 2 working as Cloudflare updates (realistically 5-15 engineering hours per month per protected target) often exceeds the cost of using production infrastructure. If you're building a scraping product where bypass is core, owning the stack makes sense.
The guide above gives you the complete technical picture to make that decision well. As covered in the ScrapeBadger guide to scraping without getting blocked, the most successful scraping teams are honest about what's worth building in-house and what's worth buying, and Cloudflare bypass is one of the highest-maintenance components of any production scraping stack.
Frequently Asked Questions
Does using requests with the right headers bypass Cloudflare?
No. If you have a non-browser User-Agent string, such as python-requests/2.22.0, your scraper can easily be picked out as a bot. But even with perfect headers, requests fails at TLS fingerprinting: the JA3 fingerprint of the requests library is in every major anti-bot blocklist. Use curl_cffi with a browser impersonation profile as a minimum. (Bright Data)
Does rotating proxies alone bypass Cloudflare?
No. Proxy rotation addresses IP reputation only, one of five detection layers. Cloudflare checks for unique browser characteristics like headers, installed plugins, screen resolution, and rendering engines, and also uses TLS fingerprinting by analysing the TLS handshake and client hello messages. You need all layers addressed. (GitHub)
Can headless Playwright bypass Cloudflare without stealth patches?
Rarely on protected sites. The navigator.webdriver flag alone identifies automation to any moderately configured Cloudflare deployment. Apply stealth patches before page navigation.
How long does a cf_clearance cookie last?
Typically 30 minutes to 24 hours depending on the site's Cloudflare configuration. Build session expiry detection and re-solve logic into any production pipeline.
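A hedged sketch of that re-solve loop, assuming the helpers defined earlier in this guide (`bypass_cloudflare_playwright`, `make_session_with_clearance`, and a challenge detector like `is_cloudflare_challenge`) are in scope:
```python
def get_with_resolve(session, url: str, proxy_url: str, user_agent: str,
                     max_retries: int = 2):
    """Fetch a URL; if the clearance has expired, re-solve via Playwright and retry."""
    for _ in range(max_retries):
        response = session.get(url)
        if not is_cloudflare_challenge(response):
            return response, session
        # Clearance expired or rejected: solve again and rebuild the fast session
        _, cookies = bypass_cloudflare_playwright(url, proxy_url=proxy_url)
        session = make_session_with_clearance(
            cf_clearance=cookies.get("cf_clearance"),
            user_agent=user_agent,
            proxy_url=proxy_url,
        )
    raise RuntimeError(f"Could not clear Cloudflare challenge for {url}")
```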
Is bypassing Cloudflare legal?
Scraping publicly visible data is generally lawful under established precedent (see hiQ v. LinkedIn). Cloudflare's protection doesn't change the legality of accessing public data. Violating a site's Terms of Service is a civil matter. Scraping authenticated content, personal data, or attempting DDoS-style volume attacks carries different legal risk. Use reasonable rates, scrape only publicly visible data, and consult legal counsel for commercial redistribution.

Written by
Thomas Shultz
Thomas Shultz is the Head of Data at ScrapeBadger, working on public web data, scraping infrastructure, and data reliability. He writes about real-world scraping, data pipelines, and turning unstructured web data into usable signals.