If you've ever tried requests.get("https://www.youtube.com/...") and parsed the result with BeautifulSoup looking for video titles, you already know the punchline: you got back a wall of empty <div> tags and exactly zero useful data.
That's the single most common frustration I see from developers trying to scrape YouTube for the first time. YouTube is a single-page application — it renders almost everything client-side via JavaScript. The HTML your Python script receives is a shell. The actual video titles, view counts, and metadata? They're buried in a massive JSON blob called ytInitialData that gets injected by JS after the page loads.
So your perfectly reasonable soup.find("div", class_="ytd-video-renderer") returns None because that element literally doesn't exist in the raw HTTP response. Once I understood this, the whole puzzle clicked — and the four methods below are the result of a lot of testing, breaking things, and reading way too many GitHub issues. I'll walk you through each approach, show you exactly when to use which one, and toss in a no-code shortcut at the end for anyone who just wants the data without the project setup.
Why Scrape YouTube with Python in the First Place?
YouTube isn't just a video platform; it's a data source. There's an enormous amount of publicly visible information (view counts, titles, comments, upload dates) that businesses, researchers, and creators want to analyze programmatically.
The catch is that YouTube's built-in analytics only shows data for your own channel. If you want to understand a competitor's upload cadence, track trending topics in your niche, or analyze audience sentiment from comments on someone else's videos, you need to scrape.
Here are the most common real-world use cases I've seen:
| Use Case | Who Needs It | Data Involved |
|---|---|---|
| Competitor analysis | Marketing teams, content strategists | View counts, upload frequency, engagement rates |
| Lead generation | Sales teams, B2B outreach | Channel contact info, business emails in descriptions |
| Market research | Product managers, analysts | Trending topics, audience sentiment via comments |
| Content strategy | YouTubers, agencies | High-performing formats, optimal title/tag patterns |
| SEO / keyword research | SEO specialists | Video titles, tags, descriptions, ranking signals |
| Brand monitoring | PR teams, brand managers | Mentions in video titles, comments, descriptions |
| Academic research | Researchers, data scientists | Comment datasets for sentiment analysis (one 2025 study hit 93.1% accuracy fine-tuning BERT on 45K YouTube comments) |
A DJI vs. GoPro vs. Insta360 competitive analysis, for example, is exactly the kind of study that's invisible from inside YouTube Studio, which only shows data for your own channel.
Why requests + BeautifulSoup Alone Won't Scrape YouTube
Before we get to the methods that work, you need to understand why the obvious approach fails. This isn't academic — it'll save you hours of debugging.
The "obvious" approach looks something like this conceptually:
```python
import requests
from bs4 import BeautifulSoup

response = requests.get("https://www.youtube.com/@somechannel/videos")
soup = BeautifulSoup(response.text, "html.parser")
videos = soup.find_all("a", id="video-title-link")
print(len(videos))  # 0, every time
```
The result is always zero. The page is loaded dynamically, which the requests library doesn't support: using requests and BeautifulSoup alone, you cannot execute JavaScript.
The mechanism: YouTube is built as a Single Page Application (SPA). When you use basic HTTP requests, you only receive the initial HTML shell; the actual content hasn't been rendered yet. The video data is hidden inside JavaScript objects that a browser would normally execute and inject into the DOM.
The good news: YouTube does embed all the data you need in the raw HTML. It's just not in DOM elements — it's in two JSON blobs inside <script> tags:
- `ytInitialData`: page structure, video listings, engagement metrics, comment continuation tokens
- `ytInitialPlayerResponse`: core video metadata (title, description, duration, formats, captions)
Both are accessible with a single requests.get() — no browser required — once you know how to extract and parse them. That's Method 1 below.
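As a minimal sketch of the idea (run here against a synthetic HTML snippet rather than a live YouTube response, which needs the headers covered in Method 1), the extract-and-parse step looks like this:

```python
import json
import re

# Synthetic stand-in for a YouTube page; real pages embed the same pattern.
html = '<script>var ytInitialData = {"header": {"title": "demo"}};</script>'

match = re.search(r"var ytInitialData\s*=\s*({.*?});</script>", html, re.DOTALL)
data = json.loads(match.group(1)) if match else {}
print(data["header"]["title"])  # demo
```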
4 Ways to Scrape YouTube with Python: A Side-by-Side Comparison
Before diving into each method, here's the decision matrix. I've tested all four approaches and compared them across the criteria that actually matter when you're choosing a tool for a real project.
| Criteria | requests + BS4 (ytInitialData) | Selenium / Playwright | yt-dlp | YouTube Data API | No-Code (Thunderbit) |
|---|---|---|---|---|---|
| Setup complexity | Low | Medium | Low | Medium (API key) | None |
| Handles JS rendering | Partial (JSON parse) | Yes | Yes | N/A (structured API) | Yes |
| Speed | Fast | Slow | Fast | Fast | Fast (cloud) |
| Anti-bot risk | Medium | High | Low | None | Handled |
| Quota / rate limits | None (but IP blocks) | None (but detection) | None | 10,000 units/day | Credit-based |
| Comments extraction | Hard | Possible but complex | Built-in | Built-in | Depends on page |
| Transcripts | No | Complex | Yes | No | No |
| Best for | Quick metadata | Search results, dynamic pages | Bulk metadata + comments | Structured data at scale | Non-coders, quick exports |
What YouTube Data Can You Actually Extract (and With Which Method)?
This is the reference table I wish existed when I started. No single method covers every field — which is exactly why this article covers four.
| Data Field | BS4 (ytInitialData) | Selenium/Playwright | yt-dlp | YouTube API | Thunderbit |
|---|---|---|---|---|---|
| Video title | ✅ | ✅ | ✅ | ✅ | ✅ |
| View count | ✅ | ✅ | ✅ | ✅ | ✅ |
| Like count | ⚠️ Inconsistent | ✅ | ✅ | ✅ | ✅ |
| Comments (text) | ❌ | ⚠️ Complex | ✅ | ✅ | ⚠️ |
| Transcript/subtitles | ❌ | ⚠️ | ✅ | ❌ | ❌ |
| Tags | ✅ | ✅ | ✅ | ✅ | ⚠️ |
| Thumbnail URLs | ✅ | ✅ | ✅ | ✅ | ✅ |
| Channel subscriber count | ⚠️ | ✅ | ✅ | ✅ | ✅ |
| Upload date | ✅ | ✅ | ✅ | ✅ | ✅ |
| Video duration | ✅ | ✅ | ✅ | ✅ | ✅ |
| Shorts-specific data | ❌ | ⚠️ | ✅ | ⚠️ | ⚠️ |
Pick your method based on which rows matter most to your project. If you need comments and transcripts, yt-dlp is the clear winner. If you need structured stats at moderate scale, the API is your best bet. If you need data in two minutes, keep reading to the Thunderbit section.

Method 1: Scrape YouTube with Python Using requests + BeautifulSoup (ytInitialData Parsing)
This method exploits the fact that YouTube embeds all page data as JSON inside the raw HTML. You don't need a browser — you just need to know where to look.
- Difficulty: Beginner
- Time Required: ~15 minutes
- What You'll Need: Python 3.10+, `requests`, `beautifulsoup4`
Step 1: Send a GET Request to the YouTube Page
Send a request with a realistic User-Agent header. The default python-requests/2.x header gets blocked instantly; this is the single biggest footgun for beginners.
```python
import requests

HEADERS = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
                  "AppleWebKit/537.36 (KHTML, like Gecko) "
                  "Chrome/114.0.0.0 Safari/537.36",
    "Accept-Language": "en-US,en;q=0.9",
    "Cookie": "CONSENT=YES+cb",  # bypasses EU consent wall
}

url = "https://www.youtube.com/@mkbhd/videos"
response = requests.get(url, headers=HEADERS)
print(response.status_code)  # Should be 200
```
The CONSENT cookie is critical — without it, EU-region requests redirect to consent.youtube.com, which serves HTML containing no ytInitialData at all.
Step 2: Parse the HTML and Locate the ytInitialData Script
Use BeautifulSoup or a regex to find the <script> tag containing var ytInitialData =:
```python
import re
import json

# Extract the ytInitialData JSON blob
match = re.search(
    r"var ytInitialData\s*=\s*({.*?});</script>",
    response.text,
    re.DOTALL,
)
if match:
    data = json.loads(match.group(1))
    print("ytInitialData extracted successfully")
else:
    print("ytInitialData not found - check your headers/cookies")
```
A common mistake: using a non-greedy .*? with just }; as the end sentinel. That two-character sequence appears all over the JSON (inside string values and nested structures) and will truncate your capture early. Use };</script> as the sentinel instead; the assignment is the last statement in its own script block, so it's unambiguous.
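You can see the failure mode with a toy payload (synthetic, not a real YouTube response) where `};` appears inside a string value:

```python
import json
import re

page = '<script>var ytInitialData = {"script": "if (x) {};", "views": 42};</script>'

# Non-greedy capture ending at the FIRST "};" truncates inside the string value:
bad = re.search(r"var ytInitialData\s*=\s*({.*?});", page, re.DOTALL)
print(bad.group(1))  # '{"script": "if (x) {}' - not valid JSON

# Anchoring on "};</script>" captures the full object:
good = re.search(r"var ytInitialData\s*=\s*({.*?});</script>", page, re.DOTALL)
data = json.loads(good.group(1))
print(data["views"])  # 42
```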
Step 3: Navigate the JSON Structure to Extract Video Data
The JSON is deeply nested. Rather than hard-coding paths that break whenever YouTube reshuffles the structure (which happens regularly; there have been multiple documented format changes since 2023), use a recursive key search:
```python
def search_dict(partial, search_key):
    """Yield every value stored under search_key, anywhere in a nested structure."""
    stack = [partial]
    while stack:
        cur = stack.pop()
        if isinstance(cur, dict):
            for k, v in cur.items():
                if k == search_key:
                    yield v
                else:
                    stack.append(v)
        elif isinstance(cur, list):
            stack.extend(cur)

# Extract video info from the channel page
videos = []
for vr in search_dict(data, "videoRenderer"):
    videos.append({
        "video_id": vr.get("videoId"),
        "title": vr["title"]["runs"][0]["text"],
        "views": vr.get("viewCountText", {}).get("simpleText", "N/A"),
        "published": vr.get("publishedTimeText", {}).get("simpleText", "N/A"),
    })

print(f"Found {len(videos)} videos")
for v in videos[:5]:
    print(f"  {v['title']} - {v['views']}")
```
This recursive approach is what yt-dlp, Scrapfly, and most open-source YouTube scrapers converged on; it survives YouTube's frequent JSON restructuring.
Step 4: Export Scraped Data to CSV or Excel
```python
import csv

with open("youtube_videos.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["video_id", "title", "views", "published"])
    writer.writeheader()
    writer.writerows(videos)

print("Data exported to youtube_videos.csv")
```
When to Use This Method (and When Not To)
Best for: Quick metadata pulls from a handful of channel or video pages. Lightweight SEO tools. One-off analyses where you need title, views, and upload date.
Limitations: The JSON structure changes — documented breakages include the like button refactor (2023: toggleButtonRenderer → segmentedLikeDislikeButtonViewModel), the description refactor (2023: description.runs[] → attributedDescription.content), and the channel Videos tab redesign (2022–2023: gridRenderer → richGridRenderer). Datacenter IPs typically get soft-blocked after 50–200 requests. No comments, no transcripts.
Method 2: Scrape YouTube with Python Using Selenium or Playwright
When you need to interact with the page — scrolling through search results, clicking tabs, expanding descriptions — browser automation is the way to go.
- Difficulty: Intermediate
- Time Required: ~30 minutes
- What You'll Need: Python 3.10+, Playwright (`pip install playwright && playwright install`) or Selenium + ChromeDriver
I recommend Playwright over Selenium for new projects. Benchmarks consistently show Playwright running faster than Selenium for the same flows: Playwright drives the browser over a persistent WebSocket via the Chrome DevTools Protocol, while Selenium uses WebDriver-over-HTTP, which adds a translation layer per command.
Step 1: Set Up Playwright
```bash
pip install playwright
playwright install chromium
```
```python
from playwright.sync_api import sync_playwright

pw = sync_playwright().start()
browser = pw.chromium.launch(headless=False)  # headful avoids some detection
context = browser.new_context()

# Preset the consent cookie to bypass the EU wall
context.add_cookies([{
    "name": "SOCS",
    "value": "CAISNQgDEitib3FfaWRlbnRpdHlmcm9udGVuZHVpc2VydmVyXzIwMjMwODI5LjA3X3AxGgJlbiACGgYIgJnPpwY",
    "domain": ".youtube.com",
    "path": "/",
}])
page = context.new_page()
```
Step 2: Navigate to a YouTube Page and Wait for Content to Load
```python
page.goto("https://www.youtube.com/@mkbhd/videos")
page.wait_for_selector("a#video-title-link", timeout=15000)
print("Page loaded - video elements visible")
```
If you're scraping search results, navigate to https://www.youtube.com/results?search_query=your+query instead.
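Since free-text queries need URL encoding, a small helper (a sketch; the URL pattern is the one shown above) keeps that tidy:

```python
from urllib.parse import quote_plus

def search_url(query: str) -> str:
    """Build a YouTube search-results URL from a free-text query."""
    return f"https://www.youtube.com/results?search_query={quote_plus(query)}"

print(search_url("python web scraping"))
# https://www.youtube.com/results?search_query=python+web+scraping
```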
Step 3: Handle Infinite Scroll to Load More Videos
YouTube uses infinite scroll on channel pages and search results. Here's the canonical scrollHeight loop:
```python
prev_height = -1
max_scrolls = 20  # cap this - a 10K-video channel will scroll forever
scroll_count = 0

while scroll_count < max_scrolls:
    page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
    page.wait_for_timeout(1500)  # wait for new content to load
    new_height = page.evaluate("document.body.scrollHeight")
    if new_height == prev_height:
        break  # no new content loaded
    prev_height = new_height
    scroll_count += 1

print(f"Scrolled {scroll_count} times")
```
Step 4: Extract Video Data from the Rendered Page
```python
video_elements = page.query_selector_all("a#video-title-link")
videos = []
for el in video_elements:
    title = el.inner_text()
    href = el.get_attribute("href")
    video_id = href.split("v=")[-1] if href else None
    videos.append({
        "title": title,
        "video_id": video_id,
        "url": f"https://www.youtube.com{href}",
    })

print(f"Extracted {len(videos)} videos")
```
For view counts and upload dates, you'll need to grab sibling elements. Be aware that id="video-title-link" isn't universal; YouTube ships multiple page variants. The robust fallback selector is a[href*="watch"].
Step 5: Export to CSV or Google Sheets
```python
import csv

with open("youtube_playwright.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["title", "video_id", "url"])
    writer.writeheader()
    writer.writerows(videos)

browser.close()
pw.stop()
```
When to Use This Method (and When Not To)
Best for: Scraping search results, interacting with dynamic page elements (clicking tabs, expanding descriptions), anything that requires a fully rendered DOM.
Limitations: Slow (~1.5–3 seconds per video on a scroll-extract flow). High anti-bot detection risk: vanilla Selenium sets navigator.webdriver === true, which anti-bot scripts check for explicitly. Resource-heavy (each browser instance uses 200–500 MB RAM). For 100 videos, expect 3–8 minutes versus seconds with yt-dlp.
Method 3: Scrape YouTube with Python Using yt-dlp
yt-dlp is the Swiss Army knife of YouTube scraping. It's a community fork of youtube-dl with active nightly releases and built-in support for metadata, comments, transcripts, and batch scraping, all without needing a browser or API key.
- Difficulty: Beginner to Intermediate
- Time Required: ~10 minutes
- What You'll Need: Python 3.10+, `pip install yt-dlp`
Step 1: Install yt-dlp
```bash
pip install yt-dlp
```
No browser drivers, no API keys, no config files.
Step 2: Extract Video Metadata Without Downloading
```python
import yt_dlp

opts = {
    "quiet": True,
    "skip_download": True,  # no video bytes - metadata only
    "no_warnings": True,
}

with yt_dlp.YoutubeDL(opts) as ydl:
    info = ydl.extract_info(
        "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
        download=False,
    )

print(f"Title: {info['title']}")
print(f"Views: {info['view_count']:,}")
print(f"Likes: {info.get('like_count', 'N/A')}")
print(f"Duration: {info['duration']}s")
print(f"Upload date: {info['upload_date']}")
print(f"Channel: {info['channel']} ({info.get('channel_follower_count', 'N/A')} subs)")
print(f"Tags: {info.get('tags', [])[:5]}")
```
A typical extract_info call returns 80–120 fields depending on video state: id, title, channel, channel_id, channel_follower_count, view_count, like_count, comment_count, upload_date, duration, tags, categories, description, thumbnails, is_live, availability, automatic_captions, subtitles, chapters, heatmap, and more.
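Two of those fields trip people up: upload_date comes back as a YYYYMMDD string, and duration is raw seconds. A small normalizer, assuming those documented formats:

```python
from datetime import datetime

def normalize(info: dict) -> dict:
    """Convert yt-dlp's YYYYMMDD upload_date and integer duration to friendlier forms."""
    date = datetime.strptime(info["upload_date"], "%Y%m%d").date()
    mins, secs = divmod(info["duration"], 60)
    return {"upload_date": date.isoformat(), "duration": f"{mins}:{secs:02d}"}

print(normalize({"upload_date": "20091025", "duration": 213}))
# {'upload_date': '2009-10-25', 'duration': '3:33'}
```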
Step 3: Extract Comments from a YouTube Video
```python
opts = {
    "quiet": True,
    "skip_download": True,
    "getcomments": True,
    "extractor_args": {
        "youtube": {
            "max_comments": ["200", "50", "50", "10"],  # total, parents, replies-per, replies-total
            "comment_sort": ["top"],
        }
    },
}

with yt_dlp.YoutubeDL(opts) as ydl:
    info = ydl.extract_info(
        "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
        download=False,
    )

comments = info.get("comments", [])
print(f"Fetched {len(comments)} comments")
for c in comments[:3]:
    print(f"  [{c.get('like_count', 0)} likes] {c['author']}: {c['text'][:80]}...")
```
Comment extraction is slow. Community reports put comment fetching at roughly 30 KB/s, so a video with 100K comments can take hours, and there are documented cases where format URLs expire (~6 hours) before comment pagination finishes. Set max_comments aggressively for large videos.
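Once fetched, each comment is a plain dict. Keys like author, text, like_count, and parent are what recent yt-dlp versions return, but treat the exact field set as an assumption and access it defensively. A sketch that flattens the comments to CSV:

```python
import csv

def comments_to_csv(comments: list[dict], path: str) -> int:
    """Flatten yt-dlp comment dicts into a CSV.

    Assumes the author/text/like_count/parent keys; any extra keys
    (timestamps, author IDs, ...) are silently ignored.
    """
    fields = ["author", "text", "like_count", "parent"]
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fields, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(comments)
    return len(comments)
```

Call it as `comments_to_csv(info.get("comments", []), "comments.csv")` after the extraction above.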
Step 4: Extract Transcripts and Subtitles
Neither the YouTube Data API nor BS4 parsing can give you full transcripts. This is yt-dlp's unique advantage.
```python
opts = {
    "quiet": True,
    "skip_download": True,
    "writesubtitles": True,
    "writeautomaticsub": True,
    "subtitleslangs": ["en", "en-orig"],
    "subtitlesformat": "json3",  # machine-parseable: start/dur in ms + text
    "outtmpl": "%(id)s.%(ext)s",
}

with yt_dlp.YoutubeDL(opts) as ydl:
    info = ydl.extract_info(
        "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
        download=False,
    )

# Access subtitle data directly from the info dict
auto_captions = info.get("automatic_captions", {})
manual_subs = info.get("subtitles", {})
print(f"Auto-caption languages: {list(auto_captions.keys())[:10]}")
print(f"Manual subtitle languages: {list(manual_subs.keys())}")
```
The json3 format is preferred for machine parsing — each segment has start/dur in milliseconds plus text runs. Language codes are BCP-47 (en, en-US, zh-Hans, ja, es).
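A json3 payload is a dict with an events list; each event carries tStartMs plus segs[].utf8 text runs. A minimal flattener (sketched against a synthetic payload; treat the exact schema as an assumption to verify against a real download):

```python
def json3_to_text(payload: dict) -> list[tuple[float, str]]:
    """Flatten a json3 caption payload into (start_seconds, text) pairs,
    skipping whitespace-only events such as bare newlines."""
    lines = []
    for event in payload.get("events", []):
        text = "".join(seg.get("utf8", "") for seg in event.get("segs", []))
        if text.strip():
            lines.append((event["tStartMs"] / 1000, text.strip()))
    return lines

sample = {"events": [
    {"tStartMs": 1200, "segs": [{"utf8": "Never gonna "}, {"utf8": "give you up"}]},
    {"tStartMs": 5000, "segs": [{"utf8": "\n"}]},  # layout-only event, filtered out
]}
print(json3_to_text(sample))  # [(1.2, 'Never gonna give you up')]
```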
Step 5: Batch Scrape Multiple Videos or an Entire Channel
```python
opts = {
    "quiet": True,
    "skip_download": True,
    "extract_flat": "in_playlist",  # fast - just video IDs and titles
    "sleep_interval": 2,
    "max_sleep_interval": 6,
}

with yt_dlp.YoutubeDL(opts) as ydl:
    info = ydl.extract_info(
        "https://www.youtube.com/@mkbhd/videos",
        download=False,
    )

entries = info.get("entries", [])
print(f"Found {len(entries)} videos on channel")
for e in entries[:5]:
    print(f"  {e.get('title', 'N/A')} - {e.get('id')}")
```
Pass a channel URL, playlist URL, or even a search query (ytsearch10:python scraping) and yt-dlp handles pagination internally.
When to Use This Method (and When Not To)
Best for: Bulk metadata extraction, comments, transcripts, downloading videos, channel-wide scrapes where you need the full field set.
Limitations: Not ideal for scraping search result pages (Selenium/Playwright is better for that). The 2024–2026 anti-bot arms race has made yt-dlp more complex to run at scale; YouTube now enforces PO (proof-of-origin) tokens on some clients. For production use, install a PO Token provider plugin and use --cookies-from-browser chrome with a throwaway account (the yt-dlp team warns that cookies from a real Google account can get that account banned).
Method 4: Scrape YouTube with Python Using the YouTube Data API
The official YouTube Data API v3 is the most reliable and structured way to get YouTube data. The responses are clean JSON, the fields are documented, and there's no anti-bot cat-and-mouse game. But there's a catch that most tutorials gloss over: the quota system.
- Difficulty: Intermediate
- Time Required: ~20 minutes (including API key setup)
- What You'll Need: Python 3.10+, a Google Cloud project, `pip install google-api-python-client`
Step 1: Get a YouTube Data API Key
- Go to the Google Cloud Console
- Create a new project (or select an existing one)
- Navigate to APIs & Services → Library → search for "YouTube Data API v3" → Enable
- Go to APIs & Services → Credentials → Create Credentials → API Key
- Copy the key — you'll use it in the code below
Step 2: Make Your First API Call
```python
from googleapiclient.discovery import build

API_KEY = "YOUR_API_KEY_HERE"
youtube = build("youtube", "v3", developerKey=API_KEY)

# Fetch details for a specific video
response = youtube.videos().list(
    part="snippet,statistics,contentDetails",
    id="dQw4w9WgXcQ",
).execute()

video = response["items"][0]
print(f"Title: {video['snippet']['title']}")
print(f"Views: {video['statistics']['viewCount']}")
print(f"Likes: {video['statistics'].get('likeCount', 'hidden')}")
print(f"Comments: {video['statistics'].get('commentCount', 'disabled')}")
print(f"Duration: {video['contentDetails']['duration']}")
print(f"Tags: {video['snippet'].get('tags', [])[:5]}")
```
The response is clean, typed, and documented. No JSON archaeology required.
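One gotcha: contentDetails.duration arrives as an ISO 8601 string like PT3M33S, not as seconds. A small converter (a sketch that ignores the rare day-length P#DT… form):

```python
import re

def iso8601_to_seconds(duration: str) -> int:
    """Convert a YouTube API PT#H#M#S duration string to seconds."""
    match = re.fullmatch(r"PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?", duration)
    if not match:
        raise ValueError(f"unsupported duration: {duration}")
    h, m, s = (int(g) if g else 0 for g in match.groups())
    return h * 3600 + m * 60 + s

print(iso8601_to_seconds("PT3M33S"))    # 213
print(iso8601_to_seconds("PT1H2M10S")) # 3730
```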
Step 3: Extract Video Details, Channel Info, and Comments
```python
# Search for videos
search_response = youtube.search().list(
    part="snippet",
    q="python web scraping tutorial",
    type="video",
    maxResults=10,
    order="viewCount",
).execute()

for item in search_response["items"]:
    print(f"  {item['snippet']['title']} - {item['id']['videoId']}")

# Fetch comments
comments_response = youtube.commentThreads().list(
    part="snippet",
    videoId="dQw4w9WgXcQ",
    maxResults=20,
    order="relevance",
).execute()

for item in comments_response["items"]:
    comment = item["snippet"]["topLevelComment"]["snippet"]
    print(f"  [{comment['likeCount']} likes] {comment['authorDisplayName']}: {comment['textDisplay'][:80]}")
```
The YouTube API Quota Reality (What No One Tells You)
This is the section that separates a useful guide from a copy-paste tutorial. The default allocation is 10,000 units per day, reset at midnight Pacific Time. Here's what each call costs:
| API Endpoint | Quota Cost per Call | Max Results per Call |
|---|---|---|
| search.list | 100 units | 50 results |
| videos.list | 1 unit | 50 video IDs (batched) |
| channels.list | 1 unit | 50 channel IDs |
| commentThreads.list | 1 unit | 100 comments |
| captions.list | 50 units | N/A |
Now the math. Say you want to scrape 1,000 search results:
- Search calls: 1,000 results ÷ 50 per page = 20 calls × 100 units = 2,000 units (20% of your daily budget, gone)
- Video details for those 1,000 videos: 1,000 IDs ÷ 50 per batch = 20 calls × 1 unit = 20 units (cheap; batched `videos.list` is the saving grace)
- Comments for those 1,000 videos (assuming 1 page each): 1,000 calls × 1 unit = 1,000 units
Total: ~3,020 units for a modest scrape. But if those videos have deep comment threads (50+ pages each), you'll blow through the remaining 7,000 units fast. A video with 50,000 comments = ~500 pages = 500 units. Scrape 20 such videos and you're done for the day.
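The arithmetic above generalizes into a quick estimator you can run before committing to a scrape (the per-call costs mirror the table above):

```python
import math

COSTS = {"search.list": 100, "videos.list": 1, "commentThreads.list": 1}

def quota_for_search_scrape(n_results: int, comment_pages_per_video: int = 1) -> int:
    """Estimate quota units for scraping n search results plus details and comments."""
    search = math.ceil(n_results / 50) * COSTS["search.list"]    # 50 results per page
    details = math.ceil(n_results / 50) * COSTS["videos.list"]   # 50 IDs per batched call
    comments = n_results * comment_pages_per_video * COSTS["commentThreads.list"]
    return search + details + comments

print(quota_for_search_scrape(1000))  # 3020
```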
Requesting a quota increase requires a full compliance audit: privacy policy URL, ToS URL, video walkthrough of the app, quota math justification. Community reports suggest typical Google response in 3–5 business days, full approval can take weeks to months, and many applications get rejected, especially "I want more data for analysis" use cases.
When to use the API: Small-to-medium scale, when you need structured/reliable data, when comments and channel stats matter, when you can tolerate the quota ceiling.
When scraping makes more sense: Large-scale projects (>10K videos/day), fields the API doesn't expose (full transcripts — captions.download requires OAuth + video owner permission), or when you need more than 500 search results per query (hard API limit regardless of totalResults claim).
The No-Code Shortcut: Scrape YouTube with Thunderbit (No Python Required)
If you need Python for a data pipeline, use Methods 1–4 above. But if you need YouTube data in 2 minutes — maybe you're a marketer pulling competitor stats, or a developer who just wants a quick data pull without setting up a project environment — there's a faster path.
Thunderbit is an AI web scraper Chrome extension that we built specifically for cases where writing code is overkill. It works on YouTube pages directly in your browser.
How to Scrape YouTube with Thunderbit in 3 Steps
Step 1: Install the Thunderbit Chrome extension and open a YouTube channel page, search results page, or video page.
Step 2: Click "AI Suggest Fields" in the Thunderbit sidebar. The AI reads the page and suggests columns like video title, views, upload date, duration, channel name, and thumbnail URL. You can add, remove, or rename columns as needed.
Step 3: Click "Scrape" and export to Google Sheets, Excel, CSV, Airtable, or Notion. The data lands in a clean table, ready to use.
Who This Is For
- Marketers who need competitor channel data but don't code
- Developers who want a quick data pull without setting up a virtual environment and installing dependencies
- Anyone hitting anti-bot walls — Thunderbit scrapes in the user's own logged-in browser session, which inherits your cookies and PO tokens. This sidesteps many of the blocking issues that plague server-side scrapers
- Thunderbit can also visit each video page in a result set and enrich the table with additional details (like count, description, tags)
For a deeper look at how Thunderbit handles YouTube specifically, check out the guides on the Thunderbit blog.
Tips to Scrape YouTube with Python Without Getting Blocked
These tips apply across all four Python methods. YouTube's anti-bot measures are substantial, with three main signals: IP behavioral analysis, JS execution requirements, and a frequently changing HTML structure.
For all methods:
- Rotate User-Agent strings and the full header set: `Accept`, `Accept-Language`, and `Sec-CH-UA` client hints must all match the declared UA.
- Add random delays of 2–8 seconds between requests. Fixed intervals are a detection signal.
- Use residential proxies for anything beyond a handful of pages. Datacenter IPs (AWS, GCP, Hetzner) are flagged almost immediately.
- Rotate session + IP together: YouTube ties sessions to IP, and the same session cookie appearing on two IPs is a red flag.
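The random-delay advice above, as a tiny helper you can drop between requests:

```python
import random
import time

def polite_sleep(lo: float = 2.0, hi: float = 8.0) -> float:
    """Sleep a random interval so request timing doesn't form a fixed pattern."""
    delay = random.uniform(lo, hi)
    time.sleep(delay)
    return delay

polite_sleep()  # somewhere between 2 and 8 seconds
```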
For requests + BS4: Set the CONSENT=YES+cb cookie. Without it, EU requests get redirected to a consent page with no data.
For Selenium/Playwright: Run headful with xvfb on Linux servers rather than --headless=new; headless Chrome still leaks enough fingerprint signals for sophisticated detectors. Consider a stealth plugin, which applies roughly 17 separate evasions.
For yt-dlp: Use the sleep_interval and max_sleep_interval options. Install a PO Token provider plugin. Use --cookies-from-browser chrome with a throwaway account.
For the API: Monitor quota usage in the Google Cloud Console and batch requests efficiently. A single videos.list call with 50 comma-separated IDs costs just 1 unit; use it.
For Thunderbit: Anti-bot measures are handled automatically since scraping happens in your browser session. You're essentially just automating what you'd do manually.
Is It Legal to Scrape YouTube with Python?
It depends on what you scrape, how you scrape it, and what you do with the data.
The legal landscape shifted in 2024 with Meta Platforms v. Bright Data (N.D. Cal., January 2024), where the court sided with Bright Data, finding that scraping publicly accessible data while logged out did not breach Meta's terms. Scraping publicly accessible data became "significantly less risky" after this ruling. On the other hand, hiQ v. LinkedIn ended with a judgment against hiQ for breach of ToS, CFAA violations (fake accounts), and trespass to chattels, plus a permanent injunction.
YouTube's own are explicit: "You are not allowed to access the Service using any automated means (such as robots, botnets or scrapers)" except with prior written permission or as permitted by applicable law. The YouTube Data API is the officially sanctioned way to access data.
Some practical ground rules:
- Scraping publicly visible data for personal research or non-commercial analysis is generally lower risk
- The API is the safest path — it's explicitly authorized
- Avoid scraping private/login-gated content, downloading copyrighted videos for redistribution, or violating GDPR with personal data from comments
- YouTube comments contain personal data under GDPR Art. 4(1) — handle EU data subjects' information carefully
- Consult legal counsel for commercial scraping projects
None of this is legal advice. The landscape is shifting fast; a new wave of litigation over AI companies that scraped YouTube for training data is reshaping it in 2025–2026.
Which Method Should You Use to Scrape YouTube with Python?
The decision guide:
- Need quick metadata from a few pages? → Method 1 (requests + BS4). Fast, lightweight, no dependencies beyond `requests` and `beautifulsoup4`.
- Need to scrape search results or interact with dynamic pages? → Method 2 (Selenium/Playwright). Full browser rendering, infinite scroll support, but slow and detection-prone.
- Need bulk metadata, comments, or transcripts? → Method 3 (yt-dlp). The most capable single tool.
- Need structured, reliable data at moderate scale? → Method 4 (YouTube Data API). Official, clean, but quota-limited to 10,000 units/day.
- Need data in 2 minutes without writing code? → Thunderbit. Browser-based, AI-powered, exports to Google Sheets in clicks.
No single method covers every use case. Bookmark the comparison table and extractable-fields reference above; they'll save you time on your next project. And if you want to explore more web scraping techniques, the Thunderbit blog has plenty of guides.
FAQs
Can I scrape YouTube without an API key?
Yes. Methods 1 (requests + BS4), 2 (Selenium/Playwright), and 3 (yt-dlp) don't require an API key. Only Method 4 (YouTube Data API) needs one. Thunderbit also works without any API key — it scrapes directly in your browser.
What's the fastest way to scrape YouTube with Python?
For Python, yt-dlp and requests + BS4 are the fastest — both avoid browser overhead and can pull metadata in seconds per video. yt-dlp is particularly fast for batch operations since it handles pagination internally. For non-Python users, Thunderbit is the fastest overall since there's zero setup time.
How do I scrape YouTube comments with Python?
yt-dlp has built-in comment extraction via the getcomments option — it's the simplest path. The YouTube Data API also supports comments via commentThreads.list (1 quota unit per call, up to 100 comments per page). Selenium/Playwright can technically do it by scrolling and extracting rendered comment elements, but it's slow and fragile.
Can I scrape YouTube Shorts with Python?
Yes. yt-dlp handles Shorts metadata well; it treats them as regular videos with additional Shorts-specific fields. The YouTube Data API has partial support (Shorts view counting changed: views now count any playback start or replay). BS4 and Selenium/Playwright have limited Shorts support since the Shorts shelf uses different DOM structures.
How many YouTube videos can I scrape per day?
With the YouTube Data API, you're limited to ~10,000 quota units/day. Using batched videos.list calls (50 IDs per call at 1 unit), that's up to 500,000 video stat lookups per day, but search.list at 100 units per call eats the budget fast. With scraping methods (BS4, Selenium, yt-dlp), the limit is practical rather than hard-coded: IP blocks typically kick in after a few hundred to a few thousand requests per IP per day, depending on your proxy setup and request patterns. Thunderbit uses a credit-based system tied to your plan.