Scrape LinkedIn with Python: A Step-by-Step Guide

Last Updated on April 14, 2026

If you’ve ever tried to build a B2B lead list, run a competitor analysis, or just keep your CRM up to date, you know the goldmine that is LinkedIn. But let’s be honest—manually copying profile info is about as fun as watching paint dry, and LinkedIn’s own tools rarely give you the data you really want. That’s why, in 2026, more sales and operations teams than ever are looking to scrape LinkedIn with Python—turning hours of tedious clicking into a few lines of code and a spreadsheet full of prospects.


But here’s the catch: LinkedIn is now the Fort Knox of business data. With over 1.3 billion members and a whopping 310 million monthly active users, it’s the #1 source for B2B leads—but also the most heavily defended against bots and scrapers. In fact, LinkedIn restricted over 30 million accounts in 2025 alone for scraping or automation. So, how do you actually extract LinkedIn data with Python in 2026—without getting your account sent to the digital gulag? Let’s break it down, step by step, from setup to safe scraping, data cleaning, and how tools like Thunderbit can turbocharge your workflow.

What Does It Mean to Scrape LinkedIn with Python?

When we talk about scraping LinkedIn with Python, we’re really talking about using Python scripts and libraries to automate the process of collecting data from LinkedIn’s web pages. Instead of copying and pasting names, job titles, or company info one by one, you write a script that does the heavy lifting—visiting profiles, extracting the fields you want, and saving them in a structured format.

Manual data collection is like picking apples one at a time. LinkedIn data extraction with Python is like shaking the whole tree and catching the apples in a basket. The phrases you’ll see around this topic—LinkedIn data extraction with Python, Python LinkedIn scraper, automating LinkedIn scraping—all describe the same idea: using code to gather LinkedIn data at scale, faster and (hopefully) safer than any human could.

Business scenarios where LinkedIn scraping is used:

  • Building targeted lead lists for sales outreach
  • Enriching CRM records with up-to-date job titles and companies
  • Monitoring competitor hiring trends or executive moves
  • Mapping out industry networks for market research
  • Aggregating company posts or job listings for analysis

In short, if you need structured LinkedIn data and you don’t want to spend your weekend clicking “Connect,” Python is your friend.

Why Automate LinkedIn Scraping? Key Business Use Cases

Let’s get real: LinkedIn isn’t just a social network—it’s the backbone of modern B2B sales and marketing. Here’s why teams are obsessed with automating LinkedIn scraping in 2026:

  • Lead Generation: 62% of B2B marketers say LinkedIn actually produces leads for them, and LinkedIn delivers 277% more leads than Facebook and Twitter combined.
  • Market & Competitor Research: LinkedIn is the only place where you can see real-time org charts, hiring trends, and company news at scale.
  • CRM Enrichment: Keeping your CRM fresh is a nightmare without automation. Scraping LinkedIn means you can update titles, companies, and contact info in bulk.
  • Content & Event Analysis: Want to know who’s posting, speaking, or hiring in your niche? LinkedIn scraping gives you the data.

Here’s a quick table of the most common use cases:

| Team | Use Case | Value Delivered |
| --- | --- | --- |
| Sales | Lead list building, outreach prep | More meetings, higher conversion |
| Marketing | Audience research, content curation | Better targeting, higher engagement |
| Operations | CRM enrichment, org mapping | Cleaner data, less manual entry |
| Recruiting | Talent sourcing, competitor tracking | Faster hiring, smarter pipelines |

And the ROI? Teams using AI-driven automation for prospecting report saving 2–3 hours per day, and companies like TripMaster have seen 650% ROI from LinkedIn-based lead gen. That’s not just a time-saver—it’s a pipeline multiplier.

Python vs. Other LinkedIn Scraping Solutions: What You Need to Know

So, why use Python instead of a browser extension or a SaaS tool? Here’s the honest breakdown:

Manual Copy-Paste

  • Pros: No setup, no risk (unless you get carpal tunnel)
  • Cons: Slow, error-prone, impossible to scale

Browser Extensions (like PhantomBuster, Evaboot)

  • Pros: Easy setup, no coding, decent for small jobs
  • Cons: Limited scale, high ban risk, often require Sales Navigator, monthly fees

SaaS APIs (like Bright Data, Apify)

  • Pros: High scale, low maintenance, compliance handled by provider
  • Cons: Expensive at volume, sometimes laggy/cached data, less flexibility

Python Scripts

  • Pros: Maximum flexibility, lowest per-row cost at scale, real-time data
  • Cons: High technical skill needed, highest ban risk, ongoing maintenance

Here’s a side-by-side comparison:

| Dimension | DIY Python | Browser Extension | SaaS API |
| --- | --- | --- | --- |
| Setup time | Days–weeks | Minutes | Hours |
| Technical skill | High | Low | Medium |
| Cost (10K rows) | ~$200 (proxies) | $50–300 | $300–500 |
| Scale ceiling | High | Low–Medium | High |
| Ban risk | Highest | High | Lowest |
| Data freshness | Real-time | Real-time | Cached |
| Maintenance | Ongoing | Low | None |
| Compliance | User-owned risk | User-owned risk | Provider-owned |

Bottom line: If you’re technical and want full control, Python is unbeatable. But for most business users, tools like Thunderbit offer a much faster, safer path to LinkedIn data—especially as LinkedIn’s defenses get tougher every year.

Getting Started: Setting Up Your Python LinkedIn Scraper

Ready to roll up your sleeves? Here’s how to set up your Python environment for LinkedIn scraping in 2026:

1. Install Python and Key Libraries

  • Python 3.10+ is recommended for best compatibility.
  • Core libraries:
    • Playwright (the new standard for browser automation)
    • Selenium (still popular, but slower and easier to detect)
    • Beautiful Soup (for parsing HTML)
    • Requests (for simple HTTP requests; limited use on LinkedIn)
    • pandas (for data cleaning/export)

Install via pip:

pip install playwright selenium beautifulsoup4 pandas

For Playwright, you’ll also need to install browser binaries:

playwright install

2. Set Up Browser Drivers

  • Playwright manages its own drivers.
  • Selenium needs ChromeDriver (for Chrome) or geckodriver (for Firefox).
  • Make sure your browser and driver versions match.

3. Prepare for Login

  • You’ll need a LinkedIn account (preferably aged, with real activity).
  • For most scripts, you’ll either:
    • Automate the login flow (risk of CAPTCHA)
    • Inject your li_at session cookie (faster, but still risky)
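
Cookie injection can be sketched like this. This is a hedged sketch, not a definitive recipe: the helper names (`li_at_cookie`, `open_logged_in_page`) are mine, and the cookie attributes reflect how the li_at cookie is typically set; paste in your own cookie value.

```python
def li_at_cookie(value: str) -> dict:
    """Cookie dict in the shape Playwright's context.add_cookies expects."""
    return {
        "name": "li_at",
        "value": value,          # paste your own session cookie here
        "domain": ".www.linkedin.com",
        "path": "/",
        "httpOnly": True,
        "secure": True,
        "sameSite": "None",
    }

def open_logged_in_page(cookie_value: str):
    # Imported lazily so li_at_cookie stays usable without Playwright installed.
    from playwright.sync_api import sync_playwright
    pw = sync_playwright().start()
    browser = pw.chromium.launch(headless=True)
    context = browser.new_context()
    context.add_cookies([li_at_cookie(cookie_value)])
    page = context.new_page()
    page.goto("https://www.linkedin.com/feed/")  # lands logged in if cookie is valid
    return page
```

Skipping the login form this way avoids the CAPTCHA-prone login flow, but an expired or flagged cookie will bounce you to the authwall.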

4. Respect LinkedIn’s Terms

Warning: Scraping LinkedIn, even with your own account, violates their User Agreement. The legal landscape is complex (see the hiQ v. LinkedIn saga), and LinkedIn is now extremely aggressive in enforcement. Use these scripts for educational or internal research purposes, and never sell or publicly distribute scraped data.

Avoiding Bans: How to Scrape LinkedIn Without Getting Blocked

Here’s where things get tricky. LinkedIn’s anti-bot defenses in 2026 are no joke. They’ve shut down entire businesses (RIP Proxycurl) and restricted over 30 million accounts in 2025 alone. So, how do you scrape without getting burned?

The Main Risks

  • Rate Limits: Unauthenticated users get about 50 profile views per day per IP. Logged-in accounts can do a few hundred before hitting CAPTCHAs or bans.
  • CAPTCHAs: Frequent, especially after rapid profile views or logins.
  • Account Restrictions: LinkedIn can lock, restrict, or permanently ban accounts for suspicious activity.

Proven Strategies to Reduce Risk

  • Use Mobile or Aged Residential Proxies: Mobile proxies have an 85% survival rate on LinkedIn, compared to 50% for residential and near-zero for datacenter IPs.
  • Randomize Delays: Don’t use fixed time.sleep(5). Instead, randomize delays between 2–8 seconds.
  • Warm Up Accounts: Don’t hit 100 profiles on a fresh account. Start slow, mimic real user behavior.
  • Scrape During Business Hours: Match the timezone of your account.
  • Rotate User Agents Per Session: But don’t change mid-session—LinkedIn flags this.
  • Scroll Naturally: Use browser automation to scroll and trigger lazy-loaded content.
  • Separate IP per Account: Never run multiple accounts behind one proxy.
  • Monitor for Early Warnings: 429 errors, redirects to /authwall, or empty profile bodies mean you’re close to a ban.
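
Two of the strategies above—randomized delays and backing off when you see early warnings—can be sketched as small helpers. The function names and the 30-second backoff base are my own illustrative choices, not values from any library:

```python
import random

def human_delay(lo: float = 2.0, hi: float = 8.0) -> float:
    """Pick a randomized pause instead of a fixed time.sleep(5)."""
    return random.uniform(lo, hi)

def backoff_delay(attempt: int, base: float = 30.0, cap: float = 900.0) -> float:
    """Exponential backoff with jitter after a 429 or /authwall redirect.

    attempt 0 -> ~30s, attempt 1 -> ~60s, ... capped at 15 minutes,
    then scaled by +/-50% jitter so retries don't look clockwork-regular.
    """
    return min(cap, base * (2 ** attempt)) * random.uniform(0.5, 1.5)
```

In a scraping loop you would call `time.sleep(human_delay())` between profile visits, and `time.sleep(backoff_delay(n))` after the n-th warning signal before pausing the run entirely.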

Pro tip: Even the best stealth plugins (Playwright Stealth, undetected-chromedriver) only patch surface-level fingerprints. LinkedIn’s detection goes much deeper—so don’t get cocky.

Choosing the Right Python Libraries for LinkedIn Data Extraction

In 2026, the Python scraping landscape is clearer than ever. Here’s how the main libraries stack up:

| Library | Static HTML | JS-rendered | Login flows | Speed | Best for |
| --- | --- | --- | --- | --- | --- |
| Requests + BS4 | Yes | No | No | Fastest | Small, public-only pages |
| Selenium 4.x | Yes | Yes | Yes | Slow | Legacy, broad browser support |
| Playwright (Python) | Yes | Yes | Yes | Fast | Default for LinkedIn in 2026 |
| Scrapy | Yes | With plugin | With effort | Fast | High-volume structured crawls |

Why Playwright is the winner for LinkedIn:

  • 12% faster page loads and 15% lower memory usage than Selenium
  • Handles LinkedIn’s asynchronous loading without manual hacks
  • Native tab management for parallel scraping
  • Official stealth plugin for basic fingerprint evasion

Beginner tip: If you’re just starting out, Playwright is your best bet. Selenium is still useful for legacy projects, but it’s slower and easier to detect.

Step-by-Step: Your First Python LinkedIn Scraper Script

Let’s walk through a basic example using Selenium (for beginners) and Playwright (for production). Remember: these scripts are for educational use only.

Example 1: Minimal Selenium Login and Profile Scrape

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
import time, random

driver = webdriver.Chrome()
driver.get("https://www.linkedin.com/login")
driver.find_element(By.ID, "username").send_keys("you@example.com")
driver.find_element(By.ID, "password").send_keys("yourpassword" + Keys.RETURN)
time.sleep(random.uniform(3, 6))  # randomized delay

# Visit a profile
driver.get("https://www.linkedin.com/in/some-profile/")
time.sleep(random.uniform(4, 8))

# Scroll to trigger lazy-load
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

# Extract data (simplified)
name = driver.find_element(By.CSS_SELECTOR, "h1").text
print("Name:", name)
driver.quit()

Note: For production, you’ll want to inject your li_at cookie instead of logging in every time (to avoid CAPTCHAs).

Example 2: Async Scrape with a Saved Session

import asyncio
from linkedin_scraper import BrowserManager, PersonScraper

async def main():
    async with BrowserManager() as browser:
        await browser.load_session("session.json")  # stores your login session
        scraper = PersonScraper(browser.page)
        person = await scraper.scrape("https://linkedin.com/in/username")
        print(person.name, person.experiences)

asyncio.run(main())

Where to insert anti-ban measures:

  • Use mobile proxies in your browser manager
  • Randomize delays between actions
  • Scrape in small batches, not all at once

Warning: Any selector-based scraper will break when LinkedIn updates their DOM (which happens every few weeks). Be ready to maintain your scripts.

Cleaning and Formatting LinkedIn Data with Python

Scraping is only half the battle. LinkedIn data is messy—think duplicate names, inconsistent job titles, and weird Unicode characters. Here’s how to clean it up:

1. Use pandas for Table Wrangling

import pandas as pd

df = pd.read_csv("linkedin_raw.csv")
df = df.drop_duplicates(subset=["email", "phone"])  # Exact dedupe
df["name"] = df["name"].str.lower().str.strip()

2. Fuzzy Matching for Company Names

from rapidfuzz import fuzz

def is_similar(a, b):
    # Compare case-insensitively; partial_ratio tolerates extra suffixes
    return fuzz.partial_ratio(a.lower(), b.lower()) > 90

# e.g. is_similar("Acme Corp", "ACME Corporation")
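
If you’d rather avoid an extra dependency, the standard library’s difflib gives a comparable similarity score. The suffix list and 0.9 threshold below are illustrative choices of mine, not canonical values:

```python
import re
from difflib import SequenceMatcher

# Common legal suffixes to strip before comparing (illustrative, extend as needed)
SUFFIXES = {"inc", "llc", "ltd", "corp", "corporation", "co", "gmbh"}

def normalize_company(name: str) -> str:
    """Lowercase, strip punctuation, and drop common legal suffixes."""
    words = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(w for w in words if w not in SUFFIXES)

def same_company(a: str, b: str, threshold: float = 0.9) -> bool:
    a, b = normalize_company(a), normalize_company(b)
    return SequenceMatcher(None, a, b).ratio() >= threshold
```

With this normalization, "Acme Corp" and "ACME Corporation" both reduce to "acme" and match, while genuinely different companies stay apart.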

3. Normalize Phone Numbers and Emails

import phonenumbers
from email_validator import validate_email, EmailNotValidError

# Phone normalization
num = phonenumbers.parse("+1 415-555-1234", None)
print(phonenumbers.format_number(num, phonenumbers.PhoneNumberFormat.E164))

# Email validation (skip the DNS check for faster, offline-friendly runs)
try:
    v = validate_email("someone@example.com", check_deliverability=False)
    print(v.email)
except EmailNotValidError as e:
    print("Invalid email:", e)

4. Export to Excel, Google Sheets, or CRM

  • Excel: df.to_excel("cleaned_data.xlsx")
  • Google Sheets: Use gspread library
  • Airtable: Use pyairtable
  • Salesforce/HubSpot: Use their respective Python API clients

Pro tip: Always clean and deduplicate before importing to your CRM. Nothing kills a sales rep’s mood like calling the same prospect twice.

Boosting LinkedIn Scraping Efficiency with Thunderbit

Now, let’s talk about making your life even easier. As much as I love Python, maintaining scrapers for LinkedIn is a never-ending game of whack-a-mole. That’s why, at Thunderbit, we built an AI web scraper that takes the pain out of LinkedIn data extraction.

Why Thunderbit?

  • 2-Click Scraping: Just click “AI Suggest Fields” and Thunderbit reads the page, proposes columns, and extracts the data—no code, no selectors, no headaches.
  • Subpage Scraping: Scrape a search results page, then let Thunderbit visit each profile and enrich your table automatically.
  • Instant Templates: Pre-built for LinkedIn, Amazon, Google Maps, and more—get started in seconds.
  • Free Export: Send data to Excel, Google Sheets, Airtable, Notion, or download as CSV/JSON.
  • AI Autofill: Automate form-filling and repetitive workflows—great for sales ops and CRM admins.
  • Cloud or Browser Scraping: Choose the mode that fits your use case and login needs.
  • No Maintenance: Thunderbit’s AI adapts to LinkedIn’s layout changes, so you’re not constantly fixing broken scripts.

Thunderbit is trusted by over 100,000 users worldwide and has a 4.4★ rating on the Chrome Web Store. For most business users, it’s the fastest, safest way to extract LinkedIn data—without risking your account or your sanity.

Advanced Tips: Scaling and Automating LinkedIn Scraping Workflows

If you’re ready to go pro, here’s how to scale up your LinkedIn scraping game:

1. Scheduling Scripts

  • cron (Linux/Mac) or Task Scheduler (Windows) for simple jobs
  • APScheduler or Prefect 3 for Python-native scheduling and retries
  • Airflow for enterprise-grade orchestration

2. Cloud Deployment

  • AWS Lambda (with Playwright in a container)
  • GCP Cloud Run
  • Railway / Fly.io / Render for easy Playwright hosting
  • Apify for scraping-specific cloud workflows

3. Monitoring and Drift Detection

  • Sentry for error tracking
  • Custom alerts for spikes in 429 errors or DOM changes
  • Hash-based diffing to detect when LinkedIn’s layout changes
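
Hash-based diffing can be as simple as fingerprinting the tag skeleton of a page and comparing it between runs. A minimal sketch (the regex-based tag extraction is a rough assumption; a real HTML parser would be more robust):

```python
import hashlib
import re

def layout_fingerprint(html: str) -> str:
    """Hash only the tag skeleton, so copy changes don't trigger false alarms."""
    tags = re.findall(r"</?([a-zA-Z][a-zA-Z0-9-]*)", html)
    return hashlib.sha256(" ".join(tags).encode()).hexdigest()

def layout_changed(known_fp: str, html: str) -> bool:
    """True when the page structure differs from the stored fingerprint."""
    return layout_fingerprint(html) != known_fp
```

Store the fingerprint after each successful run; when it changes, alert yourself before your selectors silently start returning empty fields.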

4. CRM Integration

  • Use APIs for Salesforce, HubSpot, Notion, or Airtable to push cleaned data automatically
  • Build a pipeline: Scheduler → Scraper → pandas clean/dedupe → Enrichment → CRM push → Alerts
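
The pipeline shape above can be sketched as a list of composable steps. Everything here—the step functions, the dedupe key—is illustrative, not from a specific framework:

```python
from typing import Callable

Row = dict
Step = Callable[[list[Row]], list[Row]]

def run_pipeline(rows: list[Row], steps: list[Step]) -> list[Row]:
    """Apply each stage (clean, dedupe, enrich, push) in order."""
    for step in steps:
        rows = step(rows)
    return rows

def strip_names(rows: list[Row]) -> list[Row]:
    # Normalize before deduping, so " Jane " and "jane" collapse together
    return [{**r, "name": r.get("name", "").strip().lower()} for r in rows]

def dedupe(rows: list[Row]) -> list[Row]:
    seen, out = set(), []
    for r in rows:
        key = (r.get("email"), r.get("name"))
        if key not in seen:
            seen.add(key)
            out.append(r)
    return out
```

Real stages for enrichment and the CRM push would slot in as additional `Step` functions, which keeps each piece independently testable.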

5. Staying Compliant

  • Never scrape more than a few hundred profiles per account per day
  • Rotate proxies and user agents
  • Monitor for early ban signals and pause scripts if you see them

Pro tip: Even with all this automation, LinkedIn can (and will) change the rules. Always have a backup plan—and consider using Thunderbit for the most critical workflows.

Conclusion & Key Takeaways

Scraping LinkedIn with Python in 2026 is both more powerful and riskier than ever. Here’s what you need to remember:

  • LinkedIn is the #1 B2B data source—but also the most heavily defended against scrapers.
  • Python gives you maximum flexibility for LinkedIn data extraction, but comes with high ban risk and ongoing maintenance.
  • Playwright is now the gold standard for LinkedIn scraping—faster and more reliable than Selenium.
  • Reducing ban risk is all about proxies, delays, and mimicking real user behavior—mobile proxies survive at 85%, residential at 50%, datacenter at 0%.
  • Data cleaning is essential—use pandas, fuzzy matching, and validation libraries before importing to your CRM.
  • Thunderbit offers a safer, faster alternative—with AI-powered scraping, subpage enrichment, instant export, and no code required.
  • Scaling up means automating everything—from scheduling to monitoring to CRM integration.

And above all: scrape ethically and responsibly. LinkedIn’s legal team is not known for their sense of humor.

If you’re tired of fighting LinkedIn’s ever-changing defenses, give Thunderbit a try. It’s the tool I wish I’d had when I started—and it might just save you (and your LinkedIn account) a world of pain.

Want to go deeper? Check out the Thunderbit blog for more guides on web scraping, automation, and sales ops best practices.


FAQs

1. Is scraping LinkedIn with Python legal in 2026?
The legal landscape is complex. While the hiQ v. LinkedIn case ruled that scraping public data doesn’t violate the CFAA, LinkedIn can (and does) enforce its User Agreement, which prohibits scraping. In 2025, LinkedIn shut down Proxycurl and restricted over 30 million accounts for scraping. Always use scraping scripts for internal or educational purposes, and never sell or publicly distribute scraped data.

2. What’s the safest way to automate LinkedIn scraping?
Use aged accounts, mobile proxies (85% survival rate), randomize delays, and scrape during business hours. Never use datacenter IPs, and monitor for early ban signals. For most business users, tools like Thunderbit offer a much lower-risk alternative to DIY Python scripts.

3. Which Python library is best for LinkedIn scraping in 2026?
Playwright is now the default choice—faster, more reliable, and better at handling LinkedIn’s dynamic content than Selenium. For simple, public pages, Requests + Beautiful Soup still works, but for anything involving login or JavaScript, use Playwright.

4. How do I clean and format LinkedIn data after scraping?
Use pandas for table wrangling and deduplication, RapidFuzz for fuzzy matching, phonenumbers and email-validator for contact info, and export to Excel, Google Sheets, or your CRM using their respective Python libraries.

5. How does Thunderbit improve LinkedIn data extraction?
Thunderbit uses AI to suggest fields, handle subpage scraping, and export data directly to your favorite tools—no code required. It adapts to LinkedIn’s frequent layout changes, reducing maintenance and ban risk. Plus, it’s free to try and trusted by over 100,000 users worldwide.

Curious to see LinkedIn scraping in action—without the headaches? Install Thunderbit and start extracting data in just two clicks. Your sales team (and your LinkedIn account) will thank you.


Shuai Guan
Co-founder/CEO @ Thunderbit. Passionate about cross section of AI and Automation. He's a big advocate of automation and loves making it more accessible to everyone. Beyond tech, he channels his creativity through a passion for photography, capturing stories one picture at a time.
