Selenium web scraping is still one of the most dependable ways to extract data from dynamic, JavaScript-heavy websites. In 2025, it's smoother and faster than ever.
Selenium is a browser automation toolkit with bindings for all major programming languages, including Python, which we'll focus on here. It talks to browsers through the WebDriver protocol, giving you control over Chrome, Firefox, Safari, or even remote setups. Originally built for testing, Selenium has grown into a full automation tool that can click, type, scroll, and extract data just like a real user.
Its main advantage? It runs JavaScript. Static scrapers only see the raw HTML, missing data rendered after the page loads. Selenium executes scripts, scrolls the page, fills forms, and waits for elements to appear, letting you capture the data that's otherwise hidden behind client-side rendering.
In this guide, we'll go step-by-step through a modern Selenium scraping setup, explore when it's the right tool, and see how you can combine it with ScrapingBee for speed, reliability, and automatic proxy handling.

Quick answer (TL;DR)
Here's a simple Selenium web scraping script that runs headless, opens a page, grabs some text, and saves a screenshot. A full mini-workflow in one go.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
def main():
    opts = Options()
    opts.add_argument("--headless")  # run without GUI
    driver = webdriver.Chrome(options=opts)
    driver.get("https://example.com")

    # Extract title and save screenshot
    print(driver.title)
    driver.save_screenshot("page.png")

    # Example element extraction
    links = driver.find_elements(By.TAG_NAME, "a")
    for link in links[:5]:
        text = link.text.strip()
        href = link.get_attribute("href")
        print(f"{text}: {href}")

    driver.quit()

if __name__ == "__main__":
    main()
That's basically it: install Selenium, launch Chrome headless, load a page, pull out elements, and snap a screenshot. You've got the foundation for any real-world scraper right there.
If you like command-line snippets, check out the ScrapingBee Curl Converter to turn any cURL request into Python instantly.
Installing Selenium and setting up WebDriver in 2025
Selenium's still the go-to when you actually need a real browser: to run JavaScript, click stuff, wait for popups, and deal with everything static scrapers can't touch. The good news? Setting it up in 2025 is way smoother than before. No more hunting for ChromeDriver binaries or juggling PATH variables like it's 2018.
Installing Selenium and starting a project
You've got two clean options. Pick whichever fits your workflow; both get you ready for Selenium web scraping in minutes.
Option A — pip (classic)
# 1 - Create and activate a virtual env (recommended)
python -m venv .venv
# macOS/Linux
source .venv/bin/activate
# Windows (PowerShell)
.venv\Scripts\Activate.ps1
# 2 - Upgrade basics and install selenium
python -m pip install --upgrade pip
pip install --upgrade selenium
Option B — uv (fast and modern)
# 1 - Initialize a project (creates pyproject.toml)
uv init selenium-scraper
cd selenium-scraper
# 2 - Add selenium as a dependency
uv add selenium
Now just create a main.py file in your project root. That's where you'll drop the code examples from this guide.
Bonus for 2025: Selenium now includes the Selenium Manager, which automatically downloads and manages the right browser drivers for you, so no more manual ChromeDriver or GeckoDriver setup. Please note that you still need a real browser installed on your PC (Chrome or Firefox, just download from the official website).
Launching Chrome or Firefox with WebDriver
Now let's see how to launch Selenium with Chrome or Firefox.
Chrome (modern headless mode)
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
opts = Options()
opts.add_argument("--headless") # use the modern headless mode
opts.add_argument("--no-sandbox") # handy for CI or Docker
opts.add_argument("--disable-dev-shm-usage") # avoids /dev/shm issues in containers
driver = webdriver.Chrome(options=opts) # Selenium Manager grabs the right driver
driver.get("https://example.com")
print(driver.title)
driver.quit()
The --headless flag runs Chrome in its newer headless mode, introduced in Chrome 109+. It renders pages almost exactly like full Chrome (fonts, CSS, layout), making it more reliable for Selenium web scraping on dynamic sites.
Firefox (still fully supported via GeckoDriver)
from selenium import webdriver
from selenium.webdriver.firefox.options import Options
opts = Options()
opts.add_argument("-headless")
driver = webdriver.Firefox(options=opts) # driver auto-managed too
driver.get("https://example.com")
print(driver.title)
driver.quit()
If you've scraped sites before 2023, you probably remember the painful chromedriver download dance: unzipping, PATH edits, version mismatches, and the rest. That's over!
Selenium Manager (built into Selenium since 4.6) automatically finds and installs the right driver version for your browser. No manual setup, no PATH tweaks, no update headaches. The only time you'll need to handle it manually is if you're using something exotic like Brave or a Chromium snapshot, or if your network blocks external downloads. For everyone else: one install, zero config. Yay!
Verifying your setup and fixing common issues
Time for a sanity check. Let's make sure everything actually runs and cover the usual "why won't it launch?" moments.
Pick one of the scripts from above, drop it into main.py, and run it:
python main.py
# With uv
uv run python main.py
If you see "Example Domain" printed in your terminal, you're good to go.
Common issues (and quick fixes)
Browser not found – Selenium's there, but no Chrome or Firefox is installed.
- macOS and Windows: just install Chrome or Firefox normally.
- Debian/Ubuntu servers:
sudo apt-get update
# Debian/Ubuntu (Chromium from repo)
sudo apt-get install -y chromium
# or Google Chrome (official repo)
# wget -qO- https://dl.google.com/linux/linux_signing_key.pub | sudo gpg --dearmor -o /usr/share/keyrings/google-linux.gpg
# echo "deb [arch=amd64 signed-by=/usr/share/keyrings/google-linux.gpg] http://dl.google.com/linux/chrome/deb/ stable main" | sudo tee /etc/apt/sources.list.d/google-chrome.list
# sudo apt-get update && sudo apt-get install -y google-chrome-stable
# Firefox (Debian)
sudo apt-get install -y firefox-esr
Timed out waiting for driver or version mismatch – usually old cached drivers.
- Clear the Selenium Manager cache or upgrade Selenium:
pip install -U selenium
# or
uv add --upgrade selenium
Crashes in Docker/CI – add --no-sandbox and --disable-dev-shm-usage.
- You can also increase shared memory (--shm-size=2g) if you control the container config.
Headless rendering quirks – some sites behave differently in headless mode.
- Try running with the window visible (remove --headless) for debugging, or compare with Firefox.
TLS/certificate warnings (corporate networks) – proxies sometimes break HTTPS.
- Fix system certs or use a custom browser profile.
- As a last resort, disable cert checks with flags only for debugging (never in production).
Launching your first Selenium script
Now that everything's installed, it's time to see Selenium web scraping in action. The first move is simple: open a browser, visit a page, and maybe grab a screenshot.
Opening a webpage with driver.get()
Here's the smallest possible script that does something real:
from selenium import webdriver
def main():
    driver = webdriver.Firefox()  # or use Chrome instead
    driver.get("https://www.scrapingbee.com/")
    print(driver.title)
    driver.quit()

if __name__ == "__main__":
    main()
driver.get() tells Selenium to open the given URL exactly like typing it into your own browser. It's the starting point for everything else: clicks, waits, extractions, and full scraping workflows.
Run this script and you should see a browser window pop up, then "ScrapingBee – The Best Web Scraping API" printed in your terminal. That's your first successful Selenium run. Nice!
Running a browser in headless mode with Options()
Headless mode is your best friend for automation. It runs the browser without opening a window, which makes it perfect for servers, CI pipelines, or background jobs.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
def main():
    opts = Options()
    opts.add_argument("--headless")  # modern headless mode (Chrome 109+)
    driver = webdriver.Chrome(options=opts)
    driver.get("https://www.scrapingbee.com/")
    print(driver.title)
    driver.quit()

if __name__ == "__main__":
    main()
This setup loads full pages (including JavaScript) without a visible window. The --headless flag tells Chrome to use its new headless mode, introduced in Chrome 109+. It renders pages almost identically to the full browser: better fonts, CSS, and layout accuracy.
If you're using Firefox, its -headless flag still works fine, but Chrome's new mode gives you closer-to-real results and fewer visual differences when scraping modern, JS-heavy sites.
When you run this script, nothing pops up on screen, but you'll still see the same terminal output proving the browser worked invisibly behind the scenes.
Saving a screenshot with driver.save_screenshot()
Sometimes you just want proof that the page loaded and rendered right. Maybe to debug a layout, confirm a login, or verify what your scraper actually "saw". That's just one line:
driver.save_screenshot("page.png")
Drop it right after driver.get(), and Selenium will save a PNG of the current browser view. It's super handy for checking dynamic pages, tracking visual changes, or keeping a quick record during Selenium web scraping runs.
Locating elements with XPath, CSS, and ID
Your scraping lives or dies by your selectors. Good locators keep your Selenium web scraping scripts stable when the frontend inevitably changes. The golden rules in 2025:
- Prefer IDs or data-* attributes whenever possible — they're fast, unique, and rarely renamed. That said, websites change often, so a script that worked a week ago might break today after a layout update.
- Avoid fragile chains of utility classes. Many sites generate gibberish class names (e.g. ._a1b2c3, .css-19kzrtu, hashed Tailwind builds) that change on every deploy — don't rely on those.
- Use CSS selectors for speed and clarity (.class, a[href="/pricing/"], etc.).
- Use XPath when you need more complex matching — text search, relative paths, or parent/child traversal that CSS can't express cleanly.
- Test your selectors in DevTools before coding them. If they're flaky there, they'll break in Selenium too.
For quick patterns you'll actually use in real scraping, keep this close: XPath/CSS Cheat Sheet.
Using find_element() vs find_elements()
In Selenium, how you find elements makes a big difference. Both methods work, but they behave slightly differently:
- find_element() returns the first matching element or throws a NoSuchElementException if nothing is found. Use it when the element must exist (like a login button or search field).
- find_elements() returns a list of matches, possibly empty if nothing is found. Use it for optional elements or collections (like product cards, menu links, etc.).
A good pattern is to combine find_elements() with a length check to avoid exceptions when something isn't guaranteed to appear.
Here's an example that opens the ScrapingBee homepage and grabs all links from the top navigation bar:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
def main():
    opts = Options()
    opts.add_argument("--headless")
    driver = webdriver.Chrome(options=opts)
    driver.get("https://www.scrapingbee.com/")
    print(driver.title)

    links = driver.find_elements(By.CSS_SELECTOR, ".navbar-wrap nav a")
    if links:
        print(f"Found {len(links)} nav links:")
        for link in links:
            print("-", link.text.strip())

    driver.quit()

if __name__ == "__main__":
    main()
Using find_elements() here avoids crashes if the selector doesn't match anything: you'll just get an empty list.
Supported locator strategies
Both find_element() and find_elements() rely on the By class to tell Selenium how to find elements. Each strategy looks for elements differently: by ID, class, CSS selector, XPath, or even link text. Knowing which one to use and when makes your Selenium web scraping scripts cleaner, faster, and more resilient.
Here's a quick cheat sheet of the most useful locators:
| Locator Type | What It Does | Example DOM | Example Code |
|---|---|---|---|
By.ID | Finds an element by its unique HTML id. Fastest and most reliable. | <div id="myID"> | driver.find_element(By.ID, "myID") |
By.NAME | Targets elements by their name attribute — great for form fields. | <input name="email"> | driver.find_element(By.NAME, "email") |
By.XPATH | Uses XPath syntax to navigate the DOM — powerful but verbose. | <span>My <a>Link</a></span> | driver.find_element(By.XPATH, "//span/a") |
By.LINK_TEXT | Finds <a> elements that exactly match given link text. | <a>My Link</a> | driver.find_element(By.LINK_TEXT, "My Link") |
By.PARTIAL_LINK_TEXT | Matches part of a link's text — handy when the text changes slightly. | <a>My Link</a> | driver.find_element(By.PARTIAL_LINK_TEXT, "Link") |
By.TAG_NAME | Selects elements by tag (like h1, p, or img). | <h1>Hello</h1> | driver.find_element(By.TAG_NAME, "h1") |
By.CLASS_NAME | Finds elements by class name — avoid if classes overlap too much. | <div class="button primary"> | driver.find_element(By.CLASS_NAME, "button") |
By.CSS_SELECTOR | The most flexible option — uses standard CSS syntax. | <span><a>Link</a></span> | driver.find_element(By.CSS_SELECTOR, "span > a") |
💡 Pro tip: Stick with By.ID or By.CSS_SELECTOR whenever possible — they're fast, readable, and less likely to break. Save XPath for when you really need text matching or deep DOM traversal.
By.ID, By.CLASS_NAME, and By.XPATH usage examples
Let's put the locators into action. Before you can find anything on a page, make sure you import the By helper:
from selenium.webdriver.common.by import By
The By class tells Selenium how to search — by ID, class name, CSS selector, XPath, tag name, and so on. It's what gives find_element() and find_elements() their flexibility.
Here's a working example that loads ScrapingBee and finds a few elements in different ways:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
def main():
    opts = Options()
    opts.add_argument("--headless")
    driver = webdriver.Chrome(options=opts)
    driver.get("https://www.scrapingbee.com/")

    # ID (fastest) – use whenever available
    countdown = driver.find_element(By.ID, "countdown")

    # CSS – readable and stable (great for href or attribute-based targeting)
    pricing = driver.find_element(By.CSS_SELECTOR, 'nav a[href="/pricing/"]')
    faq = driver.find_element(By.CSS_SELECTOR, 'nav a[href="/faq/"]')
    blog = driver.find_element(By.CSS_SELECTOR, 'nav a[href="/blog/"]')

    # External absolute links (login/signup)
    login = driver.find_element(By.CSS_SELECTOR, 'a[href="https://app.scrapingbee.com/account/login"]')
    signup = driver.find_element(By.CSS_SELECTOR, 'a[href="https://app.scrapingbee.com/account/register"]')

    # XPath – best for text matches or messy class-heavy layouts
    features = driver.find_element(By.XPATH, '//nav//a[normalize-space()="Features"]')
    developers = driver.find_element(By.XPATH, '//nav//a[normalize-space()="Developers"]')

    # Multiple elements – grab all top-level nav links
    top_links = driver.find_elements(By.CSS_SELECTOR, "nav > ul li > a")
    for link in top_links:
        print(link.text.strip(), "→", link.get_attribute("href"))

    driver.quit()

if __name__ == "__main__":
    main()
Breaking it down
find_element() always takes two arguments:
- The search strategy — like By.ID, By.CSS_SELECTOR, or By.XPATH.
- The query string — the actual selector, e.g. 'nav a[href="/pricing/"]'.
So this line:
pricing = driver.find_element(By.CSS_SELECTOR, 'nav a[href="/pricing/"]')
Means: "Find the first <a> tag inside a <nav> element that has an href attribute exactly equal to /pricing/."
That's a CSS selector, using the same syntax you'd use in DevTools or a stylesheet. CSS selectors are fast, human-readable, and perfect for most Selenium web scraping tasks.
More examples can be found in the Using CSS Selectors for Web Scraping tutorial.
By.ID is the simplest and fastest option:
countdown = driver.find_element(By.ID, "countdown")
This says: "Find the element with id="countdown"."
It's blazing fast because IDs are supposed to be unique in the DOM (well yeah... supposed to be).
By.XPATH works differently. It uses an XML-style query language that describes where an element sits in the document tree:
features = driver.find_element(By.XPATH, '//nav//a[normalize-space()="Features"]')
This means: "Find an <a> tag anywhere inside a <nav> that has visible text equal to 'Features', ignoring extra spaces."
Here, // means "search anywhere below this node," and normalize-space() trims weird whitespace. XPath can be slower than CSS, but it's unbeatable for matching text or walking complex nested layouts.
find_elements() works the same way but returns a list instead of a single element:
top_links = driver.find_elements(By.CSS_SELECTOR, "nav > ul li > a")
Translation: "Find all <a> tags that live inside a <li> inside a <ul> directly under a <nav>."
You can loop through them, grab text and href attributes, and process them however you want.
💡 Pro tip: always test your selector in DevTools first: $$('selector') for CSS or $x('xpath') for XPath.
Inspecting elements with browser DevTools
Before you write a single find_element() call, you need to know what to target. The easiest way to do that is with your browser's built-in DevTools.
Open DevTools
- Chrome / Edge: Ctrl + Shift + C (Windows) or Cmd + Option + C (macOS)
- Firefox: Ctrl + Shift + I or Cmd + Option + I
This opens the Elements panel, where you can inspect and explore the live DOM.
Hover and copy selectors
- Right-click the element, then press Inspect
- In the Elements tab, right-click the highlighted line → Copy → Copy selector (for CSS) or Copy → Copy XPath (for XPath).
These give you quick, ready-to-use starting points for your Selenium locators.
Test selectors in the Console
- CSS: type $$('nav a[href="/pricing/"]')
- XPath: type $x('//nav//a[normalize-space()="Pricing"]')
Both commands return matching elements instantly, so you can confirm your selector logic before using it in code.
Simplify and stabilize
Skip long auto-generated selectors like div:nth-child(7) > span. Instead, aim for meaningful anchors — IDs, href attributes, or visible text. If it's solid in DevTools, it'll be solid in Selenium too.
💡 Pro tip: If you control the frontend, add data-testid or similar data-* attributes to key elements. They make scraping and testing far more reliable and less likely to break when classes or layout change.
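For instance, if the frontend exposed a data-testid attribute (the name and value below are made up for illustration; this assumes the driver and By import from the earlier examples), targeting it is a one-liner with a CSS attribute selector:

# Hypothetical attribute: target it directly, ignoring fragile class names
pricing = driver.find_element(By.CSS_SELECTOR, '[data-testid="pricing-link"]')
print(pricing.text)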
Interacting with web elements for data extraction
Finding elements is step one. Now it's time to do something with them. For Selenium web scraping, these are the essential moves: getting text, grabbing links, and triggering JavaScript-driven updates (like pagination or "Load More" buttons).
Below are the interactions you'll use most often in scraping scripts.
Selenium WebElement
A WebElement in Selenium represents a single HTML element on the page: a link, button, input, paragraph, anything. Once you've located it, you can interact with it exactly as a human would. In Selenium web scraping, this is the bridge between finding and doing.
Here are the key actions you'll use with a WebElement:
- Read text: use element.text to get the visible content. Ideal for titles, prices, labels, or descriptions.
- Click elements: trigger user actions like clicking buttons, links, or submitting forms with element.click().
- Get attributes: extract any attribute value with element.get_attribute("attr"), for example element.get_attribute("href") or element.get_attribute("class").
- Type into fields: send keystrokes to inputs or textareas using element.send_keys("your_text"). Great for logins, search boxes, or filling out forms dynamically.
- Check visibility: element.is_displayed() returns True only if the element is actually visible on the page. Useful for ignoring hidden traps, debugging layout quirks, or making sure dynamic content loaded properly.
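Here's a compact sketch that strings these actions together on a hypothetical search form. The selectors (q, button[type="submit"], a.result) are assumptions for illustration, and the driver is the one created in the earlier examples:

from selenium.webdriver.common.by import By

# Hypothetical search form: locate, check visibility, type, and submit
box = driver.find_element(By.NAME, "q")  # assumed field name
if box.is_displayed():  # skip hidden/honeypot fields
    box.send_keys("selenium web scraping")  # type a query
    driver.find_element(By.CSS_SELECTOR, 'button[type="submit"]').click()

# Read text and attributes from the first result link (assumed selector)
first = driver.find_element(By.CSS_SELECTOR, "a.result")
print(first.text, first.get_attribute("href"))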
Extracting text with element.text
The .text property gives you the visible content of an element: exactly what a real user would see on the page.
Here's a quick example using elements from earlier:
countdown = driver.find_element(By.ID, "countdown")
print(countdown.text)
pricing = driver.find_element(By.CSS_SELECTOR, 'nav a[href="/pricing/"]')
print(pricing.text)
This works perfectly for grabbing product names, article titles, or any other visible copy. If the text changes dynamically (after an AJAX call, for example), call .text again later; or better yet, use explicit waits to make sure the content has finished updating before you read it.
Clicking buttons with element.click()
Need to trigger a "Next Page," "Show More," or "Accept Cookies" button? Just click it! Selenium runs the site's actual JavaScript, so pagination, modals, or pop-ups all behave like in a normal browser.
Here's an example that clicks a "Register" button, waits for the page to load, and then takes a screenshot:
register_btn = driver.find_element(By.CSS_SELECTOR, '.announcement-banner .announcement-cta')
register_btn.click()
driver.save_screenshot("register.png")
That one line (.click()) is enough to fire any on-page event: JavaScript handlers, AJAX calls, or full page navigations. In headless mode, clicks behave exactly the same as in a visible browser, which makes it ideal for automated Selenium web scraping pipelines.
💡 Pro tip: For pages that load new content dynamically after a click, add an explicit wait before taking your next action. It'll save you from chasing race conditions later.
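A minimal version of that wait-after-click pattern might look like this; the .signup-form selector is an assumption for illustration, and WebDriverWait is covered in detail later in this guide:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

register_btn.click()
# Wait up to 10 seconds for the (assumed) signup form to appear before continuing
WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.CSS_SELECTOR, ".signup-form"))
)
driver.save_screenshot("register.png")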
Getting attributes with get_attribute()
When you need URLs, image sources, or metadata, use get_attribute().
Here's an example that prints out link text and their href values:
links = driver.find_elements(By.CSS_SELECTOR, "a")
for link in links:
    href = link.get_attribute("href")
    text = link.text.strip()
    print(f"{text}: {href}")
You can extract anything: href, src, alt, title, or even custom data-* attributes. That's how you collect the raw material for your Selenium web scraping pipeline.
Logging into a website with Selenium
Sometimes your scraper needs to reach data that's behind a login: dashboards, saved searches, account pages. That's where Selenium web scraping really shines: it can log in just like a real user! No API tokens, no cookie hacks, just fill the form, click the button, and you're in.
Let's walk through a quick example using Hacker News.
Step-by-step login flow
- Go to the login page – use driver.get().
- Enter your credentials – locate input fields and call .send_keys().
- Submit the form – find the button and click it with .click().
Here's what that looks like in code:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.common.exceptions import NoSuchElementException
def main():
    opts = Options()
    opts.add_argument("--headless")
    driver = webdriver.Chrome(options=opts)
    driver.get("https://news.ycombinator.com/login")

    # Enter credentials (replace with your own)
    # If IDs or names exist on the fields, prefer them over type-only XPaths
    driver.find_element(By.XPATH, '//input[@type="text"]').send_keys("YOUR_USERNAME")
    driver.find_element(By.XPATH, '//input[@type="password"]').send_keys("YOUR_PASSWORD")

    # Submit login form
    driver.find_element(By.XPATH, '//input[@value="login"]').click()

    # Verify login success
    try:
        driver.find_element(By.LINK_TEXT, "logout")
        print("✅ Logged in successfully")
    except NoSuchElementException:
        print("❌ Login failed")

    driver.quit()

if __name__ == "__main__":
    main()
Post-login verification
After you submit the form, it's smart to double-check whether the login actually worked. Look for something that confirms success (like a "logout" link or a username element) and handle failures cleanly.
- Success: page contains a logout button, profile name, or user menu.
- Failure: page shows an error like "invalid password" or "user not found."
That's why the example uses a try/except block with NoSuchElementException: it lets your script respond gracefully whether login succeeds or fails.
Check our Practical XPath for Web Scraping tutorial to learn more.
Tips for real-world scraping
- Use explicit waits: after logging in, wait for the dashboard or target elements to fully load before parsing. WebDriverWait is your best friend here.
- Never hardcode credentials: store usernames and passwords in environment variables or a secrets manager instead.
- Reuse sessions: once logged in, you can save cookies and reuse them in later Selenium runs (or even with a faster HTTP client) to skip the login step.
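A minimal cookie-reuse sketch for that last tip could look like this. The file name is a placeholder, and note that cookies can only be added while the browser is already on the matching domain:

import json

# After a successful login: save the session cookies
with open("cookies.json", "w") as f:
    json.dump(driver.get_cookies(), f)

# In a later run: load the domain first, then re-add the cookies
driver.get("https://news.ycombinator.com/")
with open("cookies.json") as f:
    for cookie in json.load(f):
        cookie.pop("expiry", None)  # some drivers reject stale expiry values
        driver.add_cookie(cookie)
driver.refresh()  # reload so the site picks up the restored session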
Handling JavaScript-rendered content and infinite scroll
A big part of Selenium web scraping in 2025 is handling modern frontends: React, Vue, Next.js, Angular, and everything in between. These sites don't just spit out all the data at once. Content loads dynamically, scrolls infinitely, or only appears after JavaScript finishes doing its thing.
The good news: Selenium speaks JavaScript fluently. You can wait for elements, scroll to load more, or even run custom scripts inside the browser context. Here's how.
Waiting for elements with WebDriverWait and expected_conditions
Modern apps build their pages after load using JavaScript and async calls. That means your scraper might run too early — before the data even exists in the DOM. If you call .find_element() right away, Selenium throws a NoSuchElementException because the element isn't there yet.
There are two main ways to handle this:
- time.sleep() (the brute-force way): you can just pause the script for a few seconds before scraping, but it's hit or miss. Sometimes it's too short, sometimes way too long. Network speed, CPU load, or slow third-party scripts can all throw it off, and you'll either miss elements or waste time waiting.
- WebDriverWait (the smart way): this waits dynamically until a condition is met, like an element appearing, text loading, or a button becoming clickable. Selenium checks every half second until the condition passes or the timeout expires.
Here's what a clean version looks like in Selenium web scraping:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
wait = WebDriverWait(driver, 10)
element = wait.until(
    EC.presence_of_element_located((By.CSS_SELECTOR, "div.product-item"))
)
print(element.text)
This pattern prevents race conditions and keeps your scraper fast. Selenium waits just long enough for the page to render, no more, no less.
💡 Pro tip: Use presence_of_element_located() for content that just needs to exist, and element_to_be_clickable() for buttons or interactive items you plan to click.
Using expected conditions in Selenium
With JavaScript-heavy sites, timing is everything. Pages load asynchronously, popups appear late, and buttons aren't always clickable right away. That's why WebDriverWait and expected conditions are core tools in any Selenium web scraping workflow.
Instead of hardcoding time.sleep() calls, you can wait for specific conditions, like when an element becomes visible, clickable, or when certain text appears.
Common expected conditions
- alert_is_present — waits for a browser alert to appear.
- element_to_be_clickable — ensures the element exists and is ready for clicks.
- text_to_be_present_in_element — pauses until the target text shows up.
- visibility_of — ensures an element is both in the DOM and visible to the user.
| Method | Description | Example Usage |
|---|---|---|
alert_is_present() | Waits for an alert popup to show up. | WebDriverWait(driver, 10).until(EC.alert_is_present()) |
element_to_be_clickable() | Waits until the element is clickable (enabled and visible). | WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.ID, "submit"))) |
text_to_be_present_in_element() | Waits until specific text appears in an element. | WebDriverWait(driver, 10).until(EC.text_to_be_present_in_element((By.ID, "status"), "Loaded")) |
visibility_of_element_located() | Waits for an element to be both present and visible. | WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.ID, "visibleElement"))) |
Because you're running a real browser, Selenium also lets you execute custom JavaScript when needed:
driver.execute_script("alert('Scraping complete!')")
Scrolling with execute_script() and detecting new content
Many modern sites don't have classic pagination; they load more results as you scroll. Think social feeds, search pages, or e-commerce catalogs. For Selenium web scraping, scrolling is just another interaction you can automate. The browser runs real JavaScript, so you can scroll exactly like a user and keep fetching more data.
Here's the classic infinite-scroll pattern:
import time
last_height = driver.execute_script("return document.body.scrollHeight")
while True:
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(2)  # let new content load
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height
That loop scrolls to the bottom, waits a bit for new content to load, and checks if the page height changed. If it didn't, the loop stops, meaning you've reached the end.
This approach works fine for simple infinite scrolls, but you can make it even better with a few tweaks:
- Use document.documentElement instead of body — it's more reliable across browsers.
- Add a maximum number of scrolls — avoids infinite loops if a site keeps loading empty placeholders.
- Use WebDriverWait for new elements — instead of time.sleep(), wait until new items appear (for example, when the number of .product-card elements increases).
💡 Pro tip: some sites use a "Load More" button instead of continuous scroll. In that case, click it in a loop until it disappears or no new results show up.
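A sketch of that "Load More" pattern could look like this; the .load-more and .product-card selectors are assumptions, so adjust them to the target site:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.common.exceptions import TimeoutException

wait = WebDriverWait(driver, 10)
while True:
    buttons = driver.find_elements(By.CSS_SELECTOR, ".load-more")  # assumed selector
    if not buttons or not buttons[0].is_displayed():
        break  # button gone or hidden: nothing more to load
    before = len(driver.find_elements(By.CSS_SELECTOR, ".product-card"))
    buttons[0].click()
    try:
        # wait until more cards exist than before the click
        wait.until(lambda d: len(d.find_elements(By.CSS_SELECTOR, ".product-card")) > before)
    except TimeoutException:
        break  # no new results showed up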
Using execute_async_script() for asynchronous JavaScript
Some sites trigger background actions (API requests, lazy-loaded widgets, or smooth animations) that don't finish right away. With execute_async_script(), you can hook directly into that async behavior instead of just waiting and hoping it's done.
Here's a simple example:
driver.set_script_timeout(15)
result = driver.execute_async_script("""
    const done = arguments[0];
    setTimeout(() => {
        done(document.querySelectorAll('div.product-item').length);
    }, 3000);
""")
print(f"Products loaded: {result}")
This runs JavaScript inside the browser, waits for it to finish, and then passes the result back to Python. The done() callback signals completion; Selenium won't move on until it's called or the timeout expires.
Combining Selenium with BeautifulSoup for efficient parsing
Selenium is perfect for handling dynamic, JavaScript-heavy sites, but when it comes to parsing and data extraction, it's not the fastest tool in the shed. That's where BeautifulSoup comes in. Once Selenium has finished rendering the page, you can pass the final HTML to BeautifulSoup for fast, lightweight parsing.
This setup gives you the best of both worlds:
- Selenium takes care of browser automation and JavaScript rendering.
- BeautifulSoup handles structured data extraction from the fully rendered DOM.
It's one of the most efficient and common patterns in modern Selenium web scraping workflows.
👉 For a deeper dive into parsing itself, check out this guide: BeautifulSoup web scraping.
Extracting page_source and parsing with BeautifulSoup
After the page has loaded and dynamic elements have rendered, grab the full HTML source with driver.page_source and feed it into BeautifulSoup:
from bs4 import BeautifulSoup
html = driver.page_source
soup = BeautifulSoup(html, "html.parser")
title = soup.find("h1").text
print(title)
That one line turns Selenium's live DOM into a fully navigable BeautifulSoup object, ready for .find() and .find_all() searches. From there, you can extract text, links, images, or any structured data.
Using soup.find_all() to extract structured data
BeautifulSoup shines when it comes to structured data extraction. Once Selenium has done its job rendering the page, BS4 can quickly sift through the HTML and pull out repeating patterns like product cards, listings, or articles.
Here's a simple example:
products = soup.find_all("div", class_="product-item")
for p in products:
    name = p.find("h2").text.strip()
    price = p.find("span", class_="price").text.strip()
    print(name, price)
After rendering, BeautifulSoup runs much faster than Selenium's DOM methods, making it ideal for processing dozens or hundreds of elements efficiently.
When to use Selenium vs BeautifulSoup
- Use Selenium when the page relies heavily on JavaScript or requires real user actions: logging in, clicking buttons, scrolling, or waiting for content to appear.
- Use BeautifulSoup when you already have the final HTML and just need to parse it quickly and cleanly.
In most real-world scraping setups, Selenium handles rendering and interaction, while BeautifulSoup handles data extraction. It's a clean split of responsibility that scales beautifully and keeps your code simple.
Combining Selenium and BeautifulSoup
Here's an example scraping news titles from Hacker News:
from selenium import webdriver
from bs4 import BeautifulSoup
driver = webdriver.Chrome()
driver.get("https://news.ycombinator.com/")
# Grab the rendered HTML
html = driver.page_source
driver.quit()
# Parse with BeautifulSoup
soup = BeautifulSoup(html, "html.parser")
# Extract titles
titles = soup.find_all("tr", class_="athing")
for t in titles:
    link = t.find("span", class_="titleline").find("a")
    print(link.get_text())
This workflow is both faster and cleaner than relying on Selenium alone. You let Selenium deal with the hard part and let BeautifulSoup handle the easy part: parsing and data extraction.
💡 Want to go deeper? Check out our full BeautifulSoup web scraping guide.
Avoiding detection: honeypots, CAPTCHAs, and headless browsing
Blocks are inevitable if you crawl aggressively. The goal is to reduce noise and avoid obvious red flags so your Selenium web scraping jobs run longer and cleaner. Do this ethically: respect robots.txt, the site's terms, and sensible rate limits. Below are practical tactics that actually help in real projects.
Detecting honeypots with is_displayed() (and a few extras)
Honeypots are invisible traps: inputs or links present in the HTML but hidden from real users via CSS (display:none, visibility:hidden, off-screen positioning), tiny size, or aria-hidden flags. A bot that blindly fills every field or clicks every link hands itself to anti-bot logic.
Selenium's is_displayed() is the first and easiest check: it returns True only for elements actually visible to the user. Combine it with a couple of other checks for extra safety (type="hidden", size, aria-hidden, or zero width/height).
Example pattern:
from selenium.webdriver.common.by import By
from selenium.common.exceptions import NoSuchElementException
def safe_send_keys(el, value):
    # basic visibility checks before interacting
    if not el.is_displayed():
        return False
    if el.get_attribute("type") == "hidden":
        return False
    w, h = el.size.get("width", 0), el.size.get("height", 0)
    if w == 0 or h == 0:
        return False
    if el.get_attribute("aria-hidden") in ("true", "True"):
        return False
    el.send_keys(value)
    return True

# usage
try:
    el = driver.find_element(By.ID, "custId")
    if safe_send_keys(el, "12345"):
        print("Interacted safely")
    else:
        print("Honeypot or hidden field — skipped")
except NoSuchElementException:
    print("Field not present")
When iterating over multiple nodes, keep the same mentality:
for el in driver.find_elements(By.CSS_SELECTOR, "a.some-link"):
    if not el.is_displayed():
        continue
    if el.size.get("width", 0) == 0:
        continue
    print(el.text.strip(), "→", el.get_attribute("href"))
Key takeaways: Staying clear of honeypots
Honeypots are one of the easiest traps to fall into when scraping interactively. They're invisible to users but loud to naive bots, so always test what you're about to click or type into.
Here's what actually keeps your Selenium scrapers safe and stealthy:
- Always check is_displayed() before clicking or typing. If it's not visible, skip it.
- Don't fill hidden inputs or auto-fill every form field. Only interact with elements a real user would touch.
- Ignore suspicious elements — type="hidden", aria-hidden="true", or zero-size nodes.
- Watch out for weird names — inputs like qwerty_123 or trap_field usually scream honeypot. Legit ones have clear names (email, password, search, etc.).
- Respect site structure and ethics. Interact only with real, visible elements and never brute-force or spam hidden forms.
A little visibility check goes a long way. That single is_displayed() call can mean the difference between a clean Selenium web scraping session and getting flagged as a bot.
Handling CAPTCHA: manual vs third-party services
CAPTCHAs exist for one reason — to stop bots. So don't treat them like bugs; treat them like speed bumps. When your Selenium web scraping run hits one, you've got three ways forward.
1. Manual solve (human in the loop)
Pause the scraper, screenshot the CAPTCHA, and have a human solve it. Perfect for small runs, debugging, or anything where you care more about accuracy than speed.
Pros: cheap, safe, reliable. Cons: slow, not scalable.
2. Third-party solver services
APIs like 2Captcha, Anti-Captcha, or CapMonster can solve reCAPTCHA or hCaptcha for you. You send them the challenge, wait a few seconds, and they return a token.
Pros: automatic, works at scale. Cons: costs money and means sending data to a third party. Know the risks and read the fine print.
If you go this route, handle retries, random failures, and per-solve fees. More info here: Bypass Captcha.
3. Avoid triggering CAPTCHAs altogether
The best CAPTCHA is the one you never see. Try this:
- Slow down and randomize your timing between requests.
- Rotate IPs or use realistic residential proxies.
- Set real user-agent strings and screen sizes.
- Reuse cookies and sessions instead of re-logging in every time.
- Don't spam endpoints or send 100 identical requests per second.
Sometimes these small tweaks are enough to stay under the radar.
TL;DR
- Low-volume jobs → manual solve.
- Big pipelines → third-party API (carefully).
- Long-term stability → prevent CAPTCHAs before they happen.
Disabling images and JavaScript to improve speed
Running a full browser in Selenium web scraping is awesome. You get real rendering, JS, screenshots, the works. But if you don't need all that visual stuff, it's just wasted horsepower. Turning off images (and sometimes JavaScript) can make your scraper way faster and lighter.
Why bother?
If you only need text, prices, or links, you don't need the browser loading every image, font, and animation. Disabling them saves bandwidth, CPU, and time, especially useful on VPS, CI, or when running hundreds of scrapers in parallel.
How to do it
Use browser Options to tweak performance. Chrome, for example, lets you block images and optionally JavaScript:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
opts = Options()
# Block images, optionally JavaScript
prefs = {
    "profile.managed_default_content_settings.images": 2,  # block images
    # "profile.managed_default_content_settings.javascript": 2,  # (optional) block JS
}
opts.add_experimental_option("prefs", prefs)
driver = webdriver.Chrome(options=opts)
driver.get("https://example.com")
print(driver.title)
driver.quit()
This setup skips image downloads but still runs JavaScript; perfect balance for most scrapers. If you're targeting static sites, set "javascript": 2 to disable it completely.
Quick reality check: Many modern sites rely on JS to render content. If you turn it off, you might end up with an empty page. So use this trick when you're sure the data lives in plain HTML or you'll be scraping air.
Extra practical tips
- Use real User-Agent strings and avoid obvious flags like "HeadlessChrome."
- Add small random delays (0.5–2 seconds) between actions as tight, predictable loops look robotic.
- Rotate IPs or proxies if you're running at scale; quality residential or datacenter pools help avoid bans.
- Consider a managed scraping API (like ScrapingBee) when you need scale, reliability, or easy CAPTCHA handling. They take care of proxies, retries, and stealth automatically.
- Monitor responses for anomalies such as login pages, CAPTCHA prompts, or blank HTML, and handle them gracefully instead of crashing.
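Here's a minimal sketch of the first two tips, assuming Chrome. The user-agent string below is just an example and should be kept in sync with a real, current browser release:

import random
import time
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

opts = Options()
opts.add_argument("--headless")
# Example user-agent: replace with a string matching a current Chrome release
opts.add_argument("--user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                  "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36")
opts.add_argument("--window-size=1366,768")  # realistic viewport instead of headless defaults

driver = webdriver.Chrome(options=opts)
for url in ["https://example.com/page/1", "https://example.com/page/2"]:  # placeholder URLs
    driver.get(url)
    time.sleep(random.uniform(0.5, 2.0))  # small human-like pause between actions
    print(driver.title)
driver.quit()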
For more on headless setups and trade-offs, read: What is a headless browser — best solutions for web scraping at scale.
Scaling web scraping with Selenium Grid and proxies
Eventually, your Selenium web scraping setup will outgrow a single browser. Too many pages, too many waits, not enough hours, and you hit the wall. Scaling isn't just "open more tabs." You need a proper structure, smart proxy management, and clean parallelism. Here's how to do it without turning your setup into chaos.
When to use Selenium Grid
Selenium Grid is the standard way to scale Selenium. It lets you run multiple browser sessions at once, either across your local machine, a cluster, or cloud containers. Perfect for big scrapes or testing across multiple browsers.
The idea is simple:
- Hub: the control center that manages and schedules sessions.
- Nodes: the worker browsers that execute your scraping jobs.
Instead of one lonely browser crawling 1,000 pages in sequence, you can launch 10 or 20 parallel sessions and finish in a fraction of the time. You can start local (Hub + a few Nodes in Docker) and later expand to a distributed or cloud setup. Same code, just more power.
Using Docker for Selenium Grid
Docker is the easiest way to spin up a scalable Selenium Grid. You can start small with a single, self-contained Chrome instance, or scale out with multiple nodes. To launch a quick standalone setup:
# simplest
docker run -d -p 4444:4444 selenium/standalone-chrome
# with live viewer at http://localhost:7900/?autoconnect=1&resize=scale
docker run -d -p 4444:4444 -p 7900:7900 selenium/standalone-chrome
Either command gives you a ready-to-use Selenium Grid with Chrome built in. You can connect to it from your Python script using the Remote WebDriver.
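Connecting from Python is mostly a matter of swapping webdriver.Chrome for webdriver.Remote and pointing it at the Grid URL; the localhost:4444 address below matches the docker run commands above:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

opts = Options()
opts.add_argument("--headless")

# Connect to the Grid / standalone container started above
driver = webdriver.Remote(
    command_executor="http://localhost:4444",
    options=opts,
)
driver.get("https://example.com")
print(driver.title)
driver.quit()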
For larger setups, use the classic hub/node pattern:
selenium/hub
selenium/node-chrome
selenium/node-firefox
Each node runs in isolation, preventing memory leaks and browser conflicts. Containers are easy to restart, replace, or scale horizontally. Suitable for CI/CD pipelines, distributed scraping, or cloud environments.
Managing proxies for geo and scale
Once your Selenium web scraping workflow starts hitting hundreds or thousands of requests, proxies stop being optional. They help you:
- Avoid IP bans and throttling.
- Access region-specific content (geo-targeting).
- Keep sessions clean and harder to fingerprint.
You can add proxies manually through browser options or environment variables, but at scale, it's better to use rotating proxy APIs. These handle IP rotation, retries, and failover automatically without manual list management.
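For a single proxy, the quickest manual route is Chrome's --proxy-server flag; the host and port below are placeholders:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

opts = Options()
opts.add_argument("--headless")
# Placeholder proxy endpoint: replace with your own (http://, socks5://, etc.)
opts.add_argument("--proxy-server=http://proxy.example.com:8080")

driver = webdriver.Chrome(options=opts)
driver.get("https://httpbin.org/ip")  # quick check of the exit IP
print(driver.page_source)
driver.quit()

Note that proxies requiring a username and password need extra handling (an extension or a rotating gateway), which is one more reason managed rotation is easier at scale.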
When to switch to a managed API
If managing Selenium Grid, proxies, and browser quirks starts eating your time, consider moving up a layer. Services like ScrapingBee handle all the messy parts for you:
- Headless browser rendering at scale.
- Built-in proxy rotation and rate limits.
- Automatic retries and error handling.
You just send a request and get clean HTML back: no Docker, no IP juggling, no maintenance.
For details on performance and pricing, check out the ScrapingBee Documentation and Pricing — a good next step if you're hitting scaling or reliability limits.
Start scaling with ScrapingBee today
Once your Selenium web scraping workflow is solid, the next step is cutting out the maintenance grind — browser updates, proxy bans, and flaky headless setups. That's exactly where ScrapingBee takes over.
ScrapingBee gives you managed JavaScript rendering, rotating proxies, and smart rate limiting. No need to run Selenium Grid, tweak Chrome flags, or babysit browser containers. You send an API call with your target URL, and get back the fully rendered HTML suitable for BeautifulSoup, Pandas, or any data processing pipeline.
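As a rough illustration, a call can be as small as the sketch below; parameter names follow ScrapingBee's public HTTP API, so check the documentation for the current options before relying on them:

import requests

response = requests.get(
    "https://app.scrapingbee.com/api/v1/",
    params={
        "api_key": "YOUR_API_KEY",      # your ScrapingBee API key
        "url": "https://example.com",   # target page
        "render_js": "true",            # render JavaScript before returning HTML
    },
)
print(response.status_code)
html = response.text  # fully rendered HTML, ready for BeautifulSoup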
If you're ready to scale scraping without scaling the pain, let ScrapingBee handle the heavy lifting. Try ScrapingBee today — faster, cleaner, and built for 2025-scale web scraping.
Conclusion
Selenium has come a long way. What used to be a clunky QA tool is now a stable, flexible engine for web scraping in 2025. With built-in driver management, reliable headless modes, and full JavaScript support, you can scrape just about any modern site confidently.
The key isn't just using Selenium; it's knowing when to hand things off. For smaller, dynamic sites, Selenium alone works great. For heavier workloads, pass rendered HTML to BeautifulSoup or use a managed API like ScrapingBee for rendering, proxy rotation, and rate limiting. That's how you move from "it works" to "it scales."

Build your workflow smart: start local, automate what matters, and outsource the overhead. In the end, great scraping isn't about more code, it's about fewer headaches, cleaner pipelines, and faster results.
Thank you for staying with me, and until next time.
Frequently asked questions
Is Selenium still the best choice for web scraping in 2025?
Yes. When you need real user actions (logins, clicks, infinite scroll, JS-heavy flows), Selenium is still the right tool.
For simple HTML fetches, requests + BeautifulSoup is faster. The common 2025 pattern is hybrid: use Selenium web scraping to render and interact, then hand the final HTML to BeautifulSoup for parsing, or use a managed renderer (ScrapingBee) for scale.
How can I avoid detection when using Selenium for web scraping?
Act like a regular user: randomize small delays, respect rate limits, rotate quality proxies, reuse sessions/cookies, and avoid interacting with hidden fields or weird form inputs. Use realistic user-agents and screen sizes, and test in non-headless mode to compare behavior. If you see frequent CAPTCHAs or blocks, consider a managed service or stricter proxy hygiene.
Quick visibility check example:
from selenium.webdriver.common.by import By
el = driver.find_element(By.CSS_SELECTOR, "a.some-link")
if not el.is_displayed():
    # skip it — likely a honeypot or hidden element
    pass
What about CAPTCHAs?
Treat CAPTCHAs as a signal, not a bug. Options: manual solve for low-volume work, third-party solvers for automated flows (cost/privacy trade-offs), or prevention (slow down, better proxies, realistic browser signals). Always log CAPTCHA occurrences and respect the target site's terms.
What are the advantages of combining Selenium with BeautifulSoup?
Selenium handles rendering and interaction, while BeautifulSoup handles fast parsing — a perfect split of duties. Selenium loads the page, executes JavaScript, and exposes the final HTML; BeautifulSoup then parses that HTML quickly and cleanly.
Example pattern:
html = driver.page_source
soup = BeautifulSoup(html, "html.parser")
items = [e.text for e in soup.select(".product-name")]
Result: Selenium gives you complete, rendered pages; BeautifulSoup extracts the data in milliseconds. It's faster, cleaner, and less error-prone than doing everything directly through Selenium.
How do I handle infinite scrolling websites with Selenium?
Use JavaScript scrolling in a loop until no new content loads. This triggers the site's real lazy-loading logic, just like a user would:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
wait = WebDriverWait(driver, 12)
prev = 0
while True:
    driver.execute_script("window.scrollTo(0, document.documentElement.scrollHeight);")
    # wait until more cards exist or we time out
    try:
        wait.until(lambda d: len(d.find_elements(By.CSS_SELECTOR, ".product-card")) > prev)
        prev = len(driver.find_elements(By.CSS_SELECTOR, ".product-card"))
    except Exception:
        break
- Dynamic scrolling: scrolls down and waits until new .product-card elements load, no fixed sleeps.
- document.documentElement: more reliable scroll target across browsers than document.body.
- WebDriverWait + lambda: waits until the number of product cards increases, then continues.
- prev counter: tracks how many cards are currently loaded to detect when scrolling stops adding more.
- try/except: breaks the loop cleanly when no new items appear (timeout reached).
What's the best way to locate elements when web scraping with Selenium?
Use stable, predictable selectors. IDs and data-* attributes are your best friends as they're fast and rarely change. CSS selectors are clean and readable, while XPath is perfect when you need text matching or complex DOM traversal.
Examples:
el = driver.find_element(By.ID, "countdown")
pricing = driver.find_element(By.CSS_SELECTOR, 'nav a[href="/pricing/"]')
features = driver.find_element(By.XPATH, '//nav//a[normalize-space()="Features"]')
Always test your selectors in DevTools first: $$('selector') for CSS and $x('xpath') for XPath. If it's flaky there, it'll be flaky in Selenium too.
Can Selenium scrape JavaScript-heavy websites better than Requests or BeautifulSoup?
Yes. Selenium runs a full browser engine, so it executes JavaScript exactly like a real user. That makes it perfect for SPAs and sites that load data dynamically after page load. Just remember: it's heavier and slower than requests-based scraping, so only use it when you actually need JavaScript rendering.
Is headless mode detectable by websites?
Sometimes. Some sites detect headless browsers through subtle fingerprinting: things like missing plugins, off-size viewports, or missing WebGL data. Modern Selenium (with --headless) hides most of these, but it's still smart to rotate proxies, add human-like delays, and keep browser settings realistic.
Can I run Selenium scrapers in the cloud?
Absolutely. Use Selenium Grid, Docker, or managed platforms like ScrapingBee. These handle browser scaling, proxy management, and CI/CD integration so you can run distributed scraping jobs without keeping local Chrome instances alive.
How can I handle cookie consent popups or modals automatically?
Search for buttons with text like "Accept" or "Got it" and click them right after page load. For recurring banners, store cookies between sessions or wrap the click in a try/except so your script doesn't crash if the popup's missing:
from selenium.common.exceptions import TimeoutException
try:
    # XPath 1.0 (what browsers support) has no matches(); use contains() per label instead
    btn = WebDriverWait(driver, 5).until(
        EC.element_to_be_clickable((By.XPATH,
            "//button[contains(., 'Accept') or contains(., 'Agree') or contains(., 'Got it')]"))
    )
    btn.click()
except TimeoutException:
    pass

Ilya is an IT tutor and author, web developer, and ex-Microsoft/Cisco specialist. His primary programming languages are Ruby, JavaScript, Python, and Elixir. He enjoys coding, teaching people and learning new things. In his free time he writes educational posts, participates in OpenSource projects, tweets, goes in for sports and plays music.


