Web Scraping with Python: What Actually Works in 2026
A practical guide to scraping web data with requests and BeautifulSoup — including the anti-bot measures you'll encounter and how to handle them ethically.
Web scraping gets a bad reputation, most of it undeserved. Scraping public data for research, price tracking, or automation is a legitimate use case with a long history. Here's how to do it correctly.
Before Scraping: Check These First
1. robots.txt: example.com/robots.txt shows what the site explicitly disallows crawling. Disallowed paths in robots.txt are off-limits by convention.
2. Public API: many sites provide one (Twitter/X, Reddit, GitHub). An API is always preferable: structured data, no parsing needed, officially supported.
3. Terms of Service: look for 'scraping', 'crawling', 'automated access'. Commercial use of scraped data often requires permission.
4. Rate limits: if you must scrape, how fast is too fast? Mimicking human browsing speed (seconds between requests) is a reasonable baseline.
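The robots.txt check can be scripted with the standard library's urllib.robotparser. A minimal sketch, parsing a hypothetical rule set inline (in practice you would call set_url() and read() to fetch the site's real robots.txt):

```python
import urllib.robotparser

# Hypothetical rules for illustration; a real check would fetch them
# with rp.set_url('https://example.com/robots.txt'); rp.read()
rules = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# can_fetch() tests a path against the rules for your user agent
print(rp.can_fetch('research-bot', '/articles/index.html'))  # True
print(rp.can_fetch('research-bot', '/private/data.html'))    # False
print(rp.crawl_delay('research-bot'))                        # 2
```

A Crawl-delay value, when present, is a good starting point for your sleep between requests.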
The Basic Pattern with requests + BeautifulSoup
scraper.py

```python
import requests
from bs4 import BeautifulSoup
import time

headers = {
    'User-Agent': 'Mozilla/5.0 (compatible; research-bot/1.0)'
}

def scrape_page(url):
    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()  # raises on 4xx/5xx
    soup = BeautifulSoup(response.text, 'html.parser')
    # Find elements by CSS selector
    titles = soup.select('h2.article-title')
    return [t.get_text(strip=True) for t in titles]

urls = ['https://example.com/page1', 'https://example.com/page2']
for url in urls:
    data = scrape_page(url)
    print(data)
    time.sleep(2)  # be polite
```

Finding the Right CSS Selectors
Open your browser's DevTools, right-click the element you want, and choose 'Inspect' to see its HTML. Right-click the element in the DevTools tree → 'Copy' → 'Copy selector' for an auto-generated CSS selector. These auto-selectors are often overly specific (nth-child(3) of nth-child(2)...); look at the actual element for a cleaner selector based on class names or IDs.
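To see the difference concretely, here's a toy page (the HTML structure and class names are invented for illustration) scraped with both styles of selector:

```python
from bs4 import BeautifulSoup

html = """
<div id="main">
  <section>
    <article class="post"><h2 class="article-title">First post</h2></article>
    <article class="post"><h2 class="article-title">Second post</h2></article>
  </section>
</div>
"""

soup = BeautifulSoup(html, 'html.parser')

# Auto-generated style: brittle, breaks the moment the layout shifts
brittle = soup.select('#main > section > article:nth-child(2) > h2')

# Class-based style: survives layout changes as long as the class name stays
robust = soup.select('h2.article-title')

print([t.get_text(strip=True) for t in robust])  # ['First post', 'Second post']
```

The brittle selector silently returns an empty list if the site inserts so much as one extra element above the target; the class-based one keeps working.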
When You Need a Real Browser
JavaScript-rendered content requires Playwright (modern choice) or Selenium. Playwright example:
playwright_scrape.py

```python
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()  # headless by default
    page = browser.new_page()
    page.goto('https://example.com')
    # Wait until the JS-rendered content is attached to the DOM
    page.wait_for_selector('.content')
    data = page.eval_on_selector_all(
        '.product-title',
        'elements => elements.map(e => e.textContent)')
    print(data)
    browser.close()
```

Frequently Asked Questions
Is web scraping legal?
Why does my scraper return empty results even though the data is on the page?
How do I avoid getting blocked while scraping?
What's the difference between requests and Scrapy?
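On the getting-blocked question above: the usual levers are an honest User-Agent, pacing, and backing off when the server pushes back with 429 or 5xx. A minimal sketch (the retry statuses and delay schedule are illustrative choices, not this guide's prescription):

```python
import time
import requests

RETRY_STATUSES = {429, 500, 502, 503}  # transient responses worth retrying

def backoff_delays(base=1.0, tries=4):
    """Exponential backoff schedule: base, 2*base, 4*base, ..."""
    return [base * (2 ** i) for i in range(tries)]

def fetch_with_backoff(url, session=None):
    """GET a URL, sleeping longer after each retryable failure."""
    session = session or requests.Session()
    session.headers.setdefault(
        'User-Agent', 'Mozilla/5.0 (compatible; research-bot/1.0)')
    response = None
    for delay in backoff_delays():
        response = session.get(url, timeout=10)
        if response.status_code not in RETRY_STATUSES:
            return response
        time.sleep(delay)  # wait before trying again
    return response
```

This is roughly what urllib3's built-in Retry machinery does for you; hand-rolling it as above just makes the behavior explicit.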
FreeToolKit Team
We build free browser-based tools and write practical guides without the fluff.