If you've ever tried to Python download image from URL, you already know the theory looks stupidly simple: call requests.get() and boom — image saved. Except that's not how the real world usually works. Sites block bots, images hide behind JavaScript, redirects go in circles, and bulk downloads crumble if you're not streaming, retrying, or handling files properly.
This guide takes the actually useful route: how to stream images safely, name files without creating a junkyard, avoid duplicates, scale to thousands of downloads, and bring in ScrapingBee when a site decides to get spicy. By the end, you'll have a toolkit that works on real websites, not toy examples.

Quick answer: Download an image in Python, fast
To Python download image from URL, you grab it with requests.get(), check for errors, and dump the bytes to a file. This is the basic pattern behind every Python requests download image or Python save image from URL trick — whether you're pulling a random JPG or wiring it into a bigger Python web scraping pipeline.
The baseline (Requests)
import requests

url = "https://example.com/image.jpg"

# Fetch the image from the URL
resp = requests.get(url)
resp.raise_for_status()  # Make sure the request didn't fail

# Save the image bytes to disk
# (fetch the whole image into memory at once)
with open("image.jpg", "wb") as f:
    f.write(resp.content)

# Alternative approach: streaming
# (useful for larger images)
# with requests.get(url, stream=True) as resp:
#     resp.raise_for_status()
#     with open("large-image.jpg", "wb") as f:
#         for chunk in resp.iter_content(chunk_size=8192):
#             if chunk:
#                 f.write(chunk)
What the basic version does:
- Grabs the entire image into memory in one go
- Saves it straight to disk
- Totally fine for small or medium-sized files or quick one-off scripts
And if your image is large, streaming is the better option because:
- You don't load the whole file into RAM at once
- It's safer when downloading hundreds or thousands of images
- Slow servers won't choke your script with giant responses
- You write chunks as they arrive, which keeps things smooth and predictable
The ScrapingBee version (more reliable on real sites)
import requests

SB_ENDPOINT = "https://app.scrapingbee.com/api/v1"

params = {
    "api_key": "YOUR_API_KEY",
    "url": "https://example.com/image.jpg",
}

# Basic: fetch the rendered/processed image through ScrapingBee
resp = requests.get(SB_ENDPOINT, params=params)
resp.raise_for_status()

with open("image.jpg", "wb") as f:
    f.write(resp.content)

# Alternative: stream so we don't load giant images into memory at once
# with requests.get(SB_ENDPOINT, params=params, stream=True) as resp:
#     resp.raise_for_status()
#     with open("large-image.jpg", "wb") as f:
#         for chunk in resp.iter_content(chunk_size=8192):
#             if chunk:
#                 f.write(chunk)
What the basic version does:
- Calls ScrapingBee and saves the bytes — simple and fine for small files.
- The ScrapingBee call behaves just like a normal requests.get(), except it handles proxies, bot checks, and JavaScript for you.
And if your image is large, streaming is again the better option: it keeps memory usage low and avoids loading the whole file at once.
Prerequisites
Before we start slurping images off the internet like civilized devs, let's make sure your setup isn't held together by duct tape and hope.
You'll need just three things:
- Python 3 — any reasonably recent version
- A text editor — VS Code, Vim, PyCharm; whatever works for you
- Optional: uv — a stupidly fast Python package manager that feels like pip after hitting the gym
Now let's check your Python installation. Pop open your terminal and run:
python3 --version
# or
python --version
If you see something like Python 3.10.12, you're good to go.
If you want to roll with uv, here's the quickest way to spin up a fresh project with requests and BeautifulSoup already installed:
uv init image-downloader
cd image-downloader
uv add requests
uv add beautifulsoup4
The newly created project will contain a main.py file — our code will go there. To execute it, simply run:
uv run python main.py
That's it!
Using the Requests package
Alright, let's get into the real work: Python download image, the right way. Most devs start with requests.get(url).content, and yeah, that works... until you try pulling down a 200MB image and your RAM starts making death noises. (Well, I'm exaggerating a bit but you got the idea.)
So here's the rule of the land: if the file is even remotely large, always use stream=True and iterate with iter_content(). This is the difference between downloading gracefully and detonating your laptop.
- response.content loads the entire file into memory. It's great for tiny PNGs but not so good for larger files.
- iter_content() with stream=True downloads the file in chunks. You stay memory-friendly, efficient, and less on fire.
If you're writing tutorials, docs, or production code, the chunked pattern is the one you use. No exceptions.
Downloading an image with Requests (streaming)
Here's the standard pattern you should reach for when doing Python requests download image or save image Python use cases:
import requests

url = "https://sample-files.com/downloads/images/jpg/landscape_hires_4000x2667_6.83mb.jpg"
headers = {"User-Agent": "Mozilla/5.0"}

with requests.get(url, stream=True, headers=headers) as resp:
    resp.raise_for_status()
    with open("large.jpg", "wb") as f:
        for chunk in resp.iter_content(chunk_size=8192):
            f.write(chunk)
This accomplishes three critical things:
- Uses a real User-Agent (yes, some sites absolutely care).
- Streams the response to avoid loading the entire file into RAM.
- Writes the image chunk-by-chunk so Python stays calm and functional.
Downloading images through ScrapingBee
When plain Requests starts throwing fits — 403 errors, weird JavaScript redirects, or "region not allowed" messages — ScrapingBee is the next step. Instead of fighting bot checks and browser-only behavior yourself, you let ScrapingBee proxy the request on your behalf.
The flow is simple: you send your api_key and the target url to ScrapingBee's API, and it returns the raw bytes just like a normal requests.get() call. Your download logic stays the same, but the heavy lifting happens on ScrapingBee's side.
You can sign up for free and get 1,000 credits, which is plenty for testing image downloads.
ScrapingBee can also forward headers (Accept, Referer, etc.), run JavaScript when a site requires it, and route traffic through premium proxies for geo-sensitive content.
Below are three common recipes.
1. Direct image download (straightforward case)
If the image URL is direct and there's no special protection, this is all you need:
import requests

SB_ENDPOINT = "https://app.scrapingbee.com/api/v1"

params = {
    "api_key": "YOUR_API_KEY",
    "url": "https://sample-files.com/downloads/images/jpg/landscape_hires_4000x2667_6.83mb.jpg"
}

with requests.get(SB_ENDPOINT, params=params, stream=True) as resp:
    resp.raise_for_status()
    with open("bee_image.jpg", "wb") as f:
        for chunk in resp.iter_content(chunk_size=8192):
            f.write(chunk)
What this does:
- Sends the image request through ScrapingBee (so JS, bot checks, and cookies are handled for you)
- Streams the file in chunks to avoid loading a multi-MB image into memory
- Writes each chunk directly to bee_image.jpg until the download finishes
2. Image behind JavaScript
Some pages generate or reveal the real image URL only after running JavaScript. ScrapingBee can handle that by enabling JavaScript rendering:
import requests

SB_ENDPOINT = "https://app.scrapingbee.com/api/v1"

params = {
    "api_key": "YOUR_API_KEY",
    "url": "https://sample-files.com/downloads/images/jpg/landscape_hires_4000x2667_6.83mb.jpg",
    "render_js": "true",  # Enable JS rendering
}

with requests.get(SB_ENDPOINT, params=params, stream=True) as resp:
    resp.raise_for_status()
    with open("js_image.jpg", "wb") as f:
        for chunk in resp.iter_content(chunk_size=8192):
            f.write(chunk)
3. Geo-blocked / Heavily protected image
If the target site only serves the file to certain IP regions or uses tougher anti-bot rules, enable the premium proxy layer:
import requests

SB_ENDPOINT = "https://app.scrapingbee.com/api/v1"

params = {
    "api_key": "YOUR_API_KEY",
    "url": "https://sample-files.com/downloads/images/jpg/landscape_hires_4000x2667_6.83mb.jpg",
    "premium_proxy": "true",
    "country_code": "us",
}

with requests.get(SB_ENDPOINT, params=params, stream=True) as resp:
    resp.raise_for_status()
    with open("geo_image.jpg", "wb") as f:
        for chunk in resp.iter_content(chunk_size=8192):
            f.write(chunk)
Why use premium proxy:
- It routes the request through a trusted residential/geo-specific proxy
- Helps bypass stricter anti-bot systems and region-locked content
- Useful when the site won't serve images to regular datacenter IPs
At the end of the day, you can wrestle with headers, cookies, redirects, JS execution, and IP restrictions yourself — but ScrapingBee handles all of that in one clean request. For tricky downloads, it's simply the superior option.
Extract image URLs, then download
Downloading one image is cool. Downloading a whole gallery of them is where scrape images from website Python actually starts paying off.
The usual workflow for Python web scraping images goes like this:
- Fetch the HTML of the page (we'll use ScrapingBee so it works even on grumpy sites).
- Parse all <img> tags with BeautifulSoup.
- Normalize URLs (turn relative paths into absolute links).
- Handle different attributes: src, data-src, or even srcset.
- Loop over the image URLs and download them one by one.
Let's walk through this using the classic demo site Books to Scrape, and pull down the first 10 book covers.
Scraping book cover images with ScrapingBee + BeautifulSoup
Here's a full working example of image scraping with Python that:
- Fetches the homepage with ScrapingBee
- Extracts <img> tags from the book grid
- Normalizes the cover image URLs
- Downloads the first 10 covers into an images/ folder
import os
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

SB_ENDPOINT = "https://app.scrapingbee.com/api/v1"
API_KEY = "YOUR_API_KEY"

page_url = "https://books.toscrape.com/"
params = {
    "api_key": API_KEY,
    "url": page_url,
    # This page is static, so we don't actually need JS here.
    # For JS-heavy sites, add: "render_js": "true"
}

# 1. Fetch page HTML via ScrapingBee
resp = requests.get(SB_ENDPOINT, params=params)
resp.raise_for_status()
soup = BeautifulSoup(resp.text, "html.parser")

# 2. Extract image sources from <img> tags
img_tags = soup.find_all("img")
img_urls = []

for tag in img_tags:
    # Prefer src, then data-src, then first item from srcset if present
    src = tag.get("src") or tag.get("data-src")
    if not src:
        srcset = tag.get("srcset")
        if srcset:
            # srcset is like "url1 1x, url2 2x" → take the first URL
            src = srcset.split(",")[0].strip().split()[0]
    if not src:
        continue

    # 3. Normalize to absolute URL
    full_url = urljoin(page_url, src)
    img_urls.append(full_url)

# Make sure output directory exists
os.makedirs("images", exist_ok=True)

# 4. Download first 10 images via ScrapingBee
for i, img_url in enumerate(img_urls[:10], start=1):
    img_params = {
        "api_key": API_KEY,
        "url": img_url,
    }
    with requests.get(SB_ENDPOINT, params=img_params, stream=True) as r:
        r.raise_for_status()
        filename = os.path.join("images", f"book_{i}.jpg")
        with open(filename, "wb") as f:
            for chunk in r.iter_content(chunk_size=8192):
                f.write(chunk)

print(f"Downloaded {min(10, len(img_urls))} images into ./images/")
What this script does:
- Fetches the HTML through ScrapingBee so the page loads cleanly even if it had JS or bot protections
- Parses all <img> tags and pulls out image URLs from src, data-src, or srcset
- Converts any relative paths to absolute URLs so they're safe to download
- Creates an images/ folder if it doesn't exist
- Downloads the first 10 images via ScrapingBee, streaming them to disk chunk by chunk
- Saves everything as book_1.jpg, book_2.jpg, etc.
You can adapt this to:
- Use render_js="true" on JavaScript-heavy galleries
- Forward headers like Referer or Accept through ScrapingBee when sites are picky
- Follow pagination links to walk through a multi-page gallery
If you want to go deeper on HTML extraction in general, have a look at ScrapingBee's web data extraction feature, and for JS-heavy scenarios, our JavaScript web scraper examples are worth a read.
Pro tip: paginate pages, not one giant in-memory image list. Process one page at a time and write images to disk as you go. That's how you keep memory usage predictable, even on big sites.
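Here's a minimal sketch of that page-by-page approach, reusing the same ScrapingBee setup as above. Treat it as a sketch rather than a drop-in: the li.next a selector for the "next" link matches Books to Scrape and will need adjusting for other sites, and the file naming is deliberately simple.
import os
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

SB_ENDPOINT = "https://app.scrapingbee.com/api/v1"
API_KEY = "YOUR_API_KEY"

def fetch_html(page_url: str) -> str:
    # Fetch a single gallery page through ScrapingBee
    resp = requests.get(SB_ENDPOINT, params={"api_key": API_KEY, "url": page_url})
    resp.raise_for_status()
    return resp.text

page_url = "https://books.toscrape.com/"
os.makedirs("images", exist_ok=True)
count = 0

while page_url:
    soup = BeautifulSoup(fetch_html(page_url), "html.parser")

    # Download this page's images before fetching the next page,
    # so only one page's worth of URLs is ever held in memory
    for tag in soup.find_all("img"):
        src = tag.get("src")
        if not src:
            continue
        count += 1
        img_url = urljoin(page_url, src)
        img_params = {"api_key": API_KEY, "url": img_url}
        with requests.get(SB_ENDPOINT, params=img_params, stream=True) as r:
            r.raise_for_status()
            with open(os.path.join("images", f"img_{count}.jpg"), "wb") as f:
                for chunk in r.iter_content(chunk_size=8192):
                    f.write(chunk)

    # Follow the "next" link if there is one (this selector is site-specific)
    next_link = soup.select_one("li.next a")
    page_url = urljoin(page_url, next_link["href"]) if next_link else None
Because each page's images are written to disk before the next page is fetched, memory usage stays flat no matter how many pages the gallery has.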
Name files correctly and preserve type
Once you can collect and download images, the next step is saving them correctly. Plenty of beginners just do open("image.jpg") for everything, but once you start handling multiple formats or big batches, that falls apart instantly.
A solid filename strategy should:
- Detect the actual file extension (.jpg, .png, .gif, etc.)
- Normalize the name so it's filesystem-safe
- Prevent collisions when different images share the same basename
- Stay readable for both humans and scripts
This pattern works well for Python save image, save an image Python, and Python save image from URL workflows.
Below is a compact helper built around the large sample file we've been using:
import os
import re
import hashlib
from urllib.parse import urlparse

import requests

EXT_FROM_CTYPE = {
    "image/jpeg": ".jpg",
    "image/jpg": ".jpg",
    "image/png": ".png",
    "image/gif": ".gif",
    "image/webp": ".webp",
    "image/avif": ".avif",
}

def safe_filename(url: str, resp: requests.Response) -> str:
    # 1. Try to infer extension from Content-Type
    ctype = resp.headers.get("Content-Type", "").split(";")[0].strip().lower()
    ext = EXT_FROM_CTYPE.get(ctype)

    # 2. Fallback to URL suffix if no known Content-Type
    clean_url = url.split("?", 1)[0]
    if not ext:
        ext = os.path.splitext(clean_url)[1] or ".bin"

    # 3. Slugify base name
    path = urlparse(clean_url).path
    base = os.path.basename(path) or "image"
    base = re.sub(r"[^a-zA-Z0-9_-]", "_", base).strip("_") or "image"

    # Optional: cap length so we don't create zombie-long filenames
    if len(base) > 50:
        base = base[:50]

    # 4. Short hash tail (avoid collisions)
    # we can hash resp.url in case the original URL was a redirect
    hash_tail = hashlib.sha256(resp.url.encode("utf-8")).hexdigest()[:8]

    return f"{base}_{hash_tail}{ext}"

# Example usage
url = "https://sample-files.com/downloads/images/jpg/landscape_hires_4000x2667_6.83mb.jpg"

with requests.get(url, stream=True) as resp:
    resp.raise_for_status()
    filename = safe_filename(url, resp)
    with open(filename, "wb") as f:
        for chunk in resp.iter_content(chunk_size=8192):
            f.write(chunk)

print("Saved as:", filename)
Key points:
- Content-Type detection — first we trust whatever the server says in the Content-Type header. If it tells us image/jpeg or image/png, cool, we use that to pick the right extension.
- Fallback to URL extension — if the header is useless (and plenty of sites mess this up), we grab the extension straight from the URL. Not perfect, but way better than saving everything as .jpg.
- Slugifying the base name — we take the original filename and scrub out all the sketchy characters. Underscores instead of chaos → filenames that won't break on weird filesystems.
- Hash tail for uniqueness — a tiny SHA-256 tail keeps things collision-proof. Two images with the same name? No problem, they won't stomp on each other.
- Streaming write — we save the file in chunks with iter_content(). Keeps memory usage tiny and makes big downloads behave like adults instead of blowing up your script.
Batch downloads that don't fall over
Looping over one or two images is fine. Looping over hundreds in a strict sequence... not so much. If you really want to download images with Python at scale, you need:
- Some light concurrency (10–20 threads is usually enough)
- A shared Session with retries and backoff
- Per-request timeouts so one slow host doesn't stall the whole run
And when you pair that with ScrapingBee's web scraping API, you get a pretty resilient Python image download setup.
Here's a compact pattern that puts it all together.
import os
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Iterable

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

SB_ENDPOINT = "https://app.scrapingbee.com/api/v1"
API_KEY = "YOUR_API_KEY"

# --- Shared session with retries + ScrapingBee defaults ---
session = requests.Session()

retries = Retry(
    total=3,            # total retry attempts
    backoff_factor=1,   # sleep 1s, 2s, 4s between retries
    status_forcelist=[429, 500, 502, 503, 504],
    allowed_methods=["GET"],
)

adapter = HTTPAdapter(
    max_retries=retries,
    pool_connections=20,
    pool_maxsize=20,
)
session.mount("http://", adapter)
session.mount("https://", adapter)

# Every request will automatically send these params
session.params = {
    "api_key": API_KEY,
    # Set your defaults here so workers inherit them
    # "render_js": "true",
}

def download_image(url: str, filename: str, render_js: bool = False) -> str:
    params = {"url": url}
    if render_js:
        params["render_js"] = "true"

    try:
        with session.get(SB_ENDPOINT, params=params, timeout=15, stream=True) as resp:
            resp.raise_for_status()
            with open(filename, "wb") as f:
                for chunk in resp.iter_content(chunk_size=8192):
                    f.write(chunk)
        return f"Saved {filename}"
    except requests.RequestException as e:
        return f"Failed {url}: {e}"

def batch_download(image_urls: Iterable[str], max_workers: int = 10) -> None:
    os.makedirs("downloads", exist_ok=True)

    jobs = [
        (url, os.path.join("downloads", f"img_{i+1}.jpg"))
        for i, url in enumerate(image_urls)
    ]

    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # fire off tasks in parallel and collect the results,
        # so failures don't disappear silently
        futures = [
            executor.submit(download_image, url, filename)
            for url, filename in jobs
        ]
        for future in as_completed(futures):
            print(future.result())

# Example list of image URLs (could come from your scraper)
image_urls = [
    "https://images.pexels.com/photos/2280547/pexels-photo-2280547.jpeg",
    "https://images.pexels.com/photos/276267/pexels-photo-276267.jpeg",
    "https://images.pexels.com/photos/159045/the-interior-of-the-repair-interior-design-159045.jpeg",
]

batch_download(image_urls, max_workers=12)
What this setup gives you:
- Concurrency: a ThreadPoolExecutor with 10–20 workers so you speed things up without flattening the target site.
- Shared session: one Session reused across all threads, which cuts connection overhead and keeps things snappy.
- Retries and backoff: temporary 429s or 5xx hiccups get retried automatically with growing delays, so flaky hosts don't kill the batch.
- Timeouts: every download has a firm timeout=15, meaning one slow server can't freeze the whole operation.
- ScrapingBee defaults in one place: putting your api_key, render_js, and other defaults in session.params keeps config clean and ensures all workers behave the same way.
Learn how to send POST requests with Python in our tutorial.
Handling blocks, errors, and status codes
Even if your Python requests download image code is flawless, servers can still hit you with a "nah bro." Status codes are your early-warning system, and ScrapingBee usually gives you a direct switch to flip when things get weird.
Here's a quick troubleshooting map for common Python download image failures:
| Problem / Symptom | What's actually happening | ScrapingBee fix to try |
|---|---|---|
| 403 Forbidden (or blank image) | Basic bot rules, missing headers, or simple anti-hotlinking | Add premium_proxy=true, set country_code, forward User-Agent / Referer |
| 429 Too Many Requests | You're rate-limited | Keep retries + backoff, lower concurrency, try premium_proxy=true for bigger pools |
| Endless redirect / login loop | Site keeps sending you to consent/login/region pages | Enable render_js=true so JS redirects and cookies get handled on ScrapingBee's side |
| Hotlinking blocked (works only on the site itself) | Image requires a specific Referer or Origin | Send the page URL as Referer + use a realistic User-Agent |
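For the last row (hotlink-protected images), here's a minimal sketch using plain Requests. The page and image URLs are placeholders; the point is simply to send the embedding page as Referer along with a realistic User-Agent:
import requests

page_url = "https://example.com/gallery"     # the page that embeds the image (placeholder)
image_url = "https://example.com/image.jpg"  # the protected image itself (placeholder)

headers = {
    "User-Agent": "Mozilla/5.0",  # realistic browser UA
    "Referer": page_url,          # pretend the request comes from the embedding page
}

with requests.get(image_url, headers=headers, stream=True, timeout=15) as resp:
    resp.raise_for_status()
    with open("image.jpg", "wb") as f:
        for chunk in resp.iter_content(chunk_size=8192):
            f.write(chunk)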
And one rule you never skip:
resp.raise_for_status()
If something goes wrong, you want it exploding loudly, not quietly writing out a 0-byte "image".
A quick example of proper error handling
import requests

url = "https://example.com/image.jpg"
filename = "image.jpg"

try:
    with requests.get(url, timeout=15, stream=True) as resp:
        resp.raise_for_status()  # catch 4xx/5xx immediately
        with open(filename, "wb") as f:
            for chunk in resp.iter_content(chunk_size=8192):
                if chunk:
                    f.write(chunk)
    print("Saved:", filename)
except requests.HTTPError as e:
    print(f"HTTP error while downloading {url}: {e}")
except requests.Timeout:
    print(f"Timeout reached while fetching {url}")
except requests.RequestException as e:
    print(f"Request failed for {url}: {e}")
What this gives you:
- Errors explode early instead of corrupting files
- Timeouts are treated clearly and separately
- You never write a 0-byte "image" because the request failed upstream
- The control flow stays clean; success path is simple, errors are explicit
Download large files safely
When you use Python to download image from URL and the file is big (more than 5–10 MB), you really don't want to load the whole thing into memory. Large Python image download jobs should always be streamed, otherwise your script will chew RAM or die halfway through.
The safe pattern looks like this:
- Always set stream=True
- Read the response in chunks (8–16 KB is the sweet spot)
- Check Content-Length when the server provides it so you know the transfer actually finished
- Optionally show a progress bar with tqdm
- ScrapingBee will forward Content-Length when the origin server includes it — but not every server sends that header, so don't rely on it blindly
Standard large-file streaming (with tqdm)
First of all, make sure to install tqdm:
uv add tqdm
And here's the code that streams your file and shows a nice progress bar:
import requests
from tqdm import tqdm

url = "https://sample-files.com/downloads/images/jpg/landscape_hires_4000x2667_6.83mb.jpg"
filename = "large.jpg"

# Optional: reuse a session if you plan multiple downloads
session = requests.Session()

with session.get(url, stream=True, timeout=30) as resp:
    resp.raise_for_status()
    total = int(resp.headers.get("Content-Length", 0))

    with open(filename, "wb") as f, tqdm(
        total=total or None,  # handle missing Content-Length
        unit="B",
        unit_scale=True,
        desc=filename,
    ) as pbar:
        for chunk in resp.iter_content(chunk_size=16384):
            if not chunk:
                continue
            f.write(chunk)
            pbar.update(len(chunk))

print("Saved:", filename)
Key points:
- Session reuse — reusing a Session keeps connections warm and makes repeated downloads noticeably faster.
- Timeout added — timeout=30 prevents the script from hanging forever if a server stops responding.
- Graceful handling of missing Content-Length — total = total or None lets tqdm show a progress bar even when the server doesn't report the file size.
- Chunked streaming — iter_content(chunk_size=16384) pulls the file down in safe 16 KB blocks, avoiding memory spikes on large downloads.
- Early error detection — calling resp.raise_for_status() ensures failures blow up immediately instead of silently writing garbage.
- Skip empty chunks — if not chunk: continue filters out keep-alive packets so you only write real file data.
ScrapingBee variant (same logic, cleaner upstream handling)
import requests
from tqdm import tqdm

SB_ENDPOINT = "https://app.scrapingbee.com/api/v1"
API_KEY = "YOUR_API_KEY"  # replace with your real key
IMAGE_URL = "https://sample-files.com/downloads/images/jpg/landscape_hires_4000x2667_6.83mb.jpg"
filename = "landscape.jpg"

params = {
    "api_key": API_KEY,
    "url": IMAGE_URL,
}

# Optional but recommended: use a Session
session = requests.Session()

# The timeout makes sure we do not hang forever if something is wrong
with session.get(SB_ENDPOINT, params=params, stream=True, timeout=30) as resp:
    resp.raise_for_status()  # fail fast on bad status codes
    total = int(resp.headers.get("Content-Length", 0))

    with open(filename, "wb") as f, tqdm(
        total=total or None,  # None lets tqdm handle "unknown size"
        unit="B",
        unit_scale=True,
        desc=filename,
    ) as pbar:
        for chunk in resp.iter_content(chunk_size=16384):
            if not chunk:
                continue
            f.write(chunk)
            pbar.update(len(chunk))

print("Saved:", filename)
Key points in this large-file ScrapingBee downloader:
- Timeouts prevent hangs — timeout=30 makes the request fail fast instead of sitting there forever when a server goes sleepy.
- stream=True keeps memory usage low — the file arrives in manageable chunks, so you never load a 50–500 MB blob into RAM at once.
- tqdm works with known and unknown sizes — total = total or None lets tqdm show a progress bar whether Content-Length exists or not.
- raise_for_status() catches failures early — if ScrapingBee returns a bad status (wrong API key, 404, 429, whatever), the script stops before writing junk.
- Session reuse = fewer slowdowns — one shared Session keeps connections alive and matches the best practices you'll use later in batch jobs.
- Chunked writes are safer for big files — writing 16 KB chunks keeps downloads smooth and stable across all image formats and network speeds.
De-duplication and metadata
When you save image Python style at scale, the fastest way to fill your disk with regret is downloading the same JPEG a few hundred times. The clean fix is simple: compute a hash while streaming the file. If you've already seen that hash, skip the write.
SHA-256 is perfect for this — strong, reliable, and effectively collision-free for anything you'll hit in real scraping work.
import hashlib
import os

import requests

image_urls = [
    "https://images.pexels.com/photos/2280547/pexels-photo-2280547.jpeg",
    "https://images.pexels.com/photos/276267/pexels-photo-276267.jpeg",
    "https://images.pexels.com/photos/159045/the-interior-of-the-repair-interior-design-159045.jpeg",
]

seen_hashes = {}  # hash -> filename

os.makedirs("dedup_images", exist_ok=True)

def download_and_hash(url: str, index: int) -> None:
    # Temporary filename while we don't yet know if it's a duplicate
    tmp_name = f"dedup_images/tmp_{index}.bin"
    hasher = hashlib.sha256()

    with requests.get(url, stream=True, timeout=30) as resp:
        resp.raise_for_status()
        with open(tmp_name, "wb") as f:
            for chunk in resp.iter_content(chunk_size=8192):
                if not chunk:
                    continue
                hasher.update(chunk)
                f.write(chunk)

    file_hash = hasher.hexdigest()

    if file_hash in seen_hashes:
        print(f"Duplicate found → {url}")
        print(f"Already saved as: {seen_hashes[file_hash]}")
        os.remove(tmp_name)
        return

    final_name = f"dedup_images/image_{index}.jpg"
    os.rename(tmp_name, final_name)
    seen_hashes[file_hash] = final_name

    print(f"Saved unique image: {final_name}")
    print(f"SHA-256: {file_hash}")

# Run through all URLs
for i, url in enumerate(image_urls, start=1):
    download_and_hash(url, i)
Key points:
- Hash while streaming — the SHA-256 digest is built chunk-by-chunk as the file downloads, so we never load the whole image into memory.
- SHA-256 is the safest mainstream choice — strong, collision-resistant, and still fast enough to run hundreds or thousands of times in a scraping loop.
- Dictionary lookup for duplicates — a simple in-memory map (hash → filename) gives an instant O(1) way to check if we've already seen the file.
- Write only unique images — the file is saved only if its hash is new, which keeps your dataset clean and stops you from wasting disk space on copies.
This small pattern is enough to keep thousands of downloaded images deduplicated without complex logic or expensive pixel comparisons.
Best practices and ethics
When you're web scraping images using Python, the goal isn't to grab every file in sight — it's to do it cleanly, responsibly, and without becoming "that person" who shows up in an admin's logs at 3 a.m. A handful of simple habits goes a long way.
1. Respect rules
Before you scrape, check:
- The site's robots.txt (e.g. https://example.com/robots.txt)
- The site's Terms of Service
Some sites explicitly restrict automated scraping, hotlinking, or bulk downloading. Even demo sites often spell out what's allowed. ScrapingBee also has options for no-code web scraping and web data extraction that can keep things structured and predictable.
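If you want to automate the robots.txt part, the standard library's urllib.robotparser can answer "am I allowed to fetch this?" before you download anything. A minimal sketch (the bot name and image path are made-up examples):
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.set_url("https://books.toscrape.com/robots.txt")
rp.read()

# Ask whether our (made-up) bot may fetch a given URL before downloading it
image_url = "https://books.toscrape.com/media/cache/example-cover.jpg"  # example path
if rp.can_fetch("MyImageBot/1.0", image_url):
    print("Allowed by robots.txt, safe to download")
else:
    print("Disallowed by robots.txt, skipping")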
2. Avoid overload
Don't carpet-bomb a server with 200 parallel requests just because "threads are cool." Keep things sane:
- Use modest concurrency (10–20 workers, not hundreds)
- Add tiny pauses between page fetches
- Back off when you start getting 429s or notice the site slowing down
A polite Python web scraping images setup keeps the target site happy and dramatically reduces blocks, CAPTCHAs, and bizarre edge cases you'd otherwise waste hours debugging.
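Here's what that politeness can look like in practice. This is a minimal sketch, not production code: the page list is a made-up example, the retry cap of 3 is arbitrary, and the "parse the page" comment stands in for the extraction logic shown earlier.
import time

import requests

# Made-up list of gallery pages to walk through politely
page_urls = [f"https://books.toscrape.com/catalogue/page-{n}.html" for n in range(1, 6)]

delay = 1.0  # small pause between page fetches, in seconds

for page_url in page_urls:
    for attempt in range(3):  # arbitrary retry cap for this sketch
        resp = requests.get(page_url, timeout=15)
        if resp.status_code != 429:
            break
        # The site is telling us to slow down: honor Retry-After if it's numeric,
        # otherwise double our own delay before trying again
        retry_after = resp.headers.get("Retry-After", "")
        delay = float(retry_after) if retry_after.isdigit() else delay * 2
        print(f"Rate limited on {page_url}, backing off for {delay:.0f}s")
        time.sleep(delay)

    resp.raise_for_status()
    # ... parse the page and queue its images here (see the scraper above) ...

    time.sleep(delay)  # tiny pause before the next page fetch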
3. Be smart about re-use
You don't need to download the same image five times:
- Cache successful downloads (by URL, by hash, or both)
- Log failed URLs so you can retry them later without re-running everything
- Track simple metadata (hash, filename, source page) so you can dedupe, resume, and audit cleanly
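A minimal sketch of that bookkeeping, using a made-up downloaded.json manifest file (the content hash itself would come from the de-duplication helper above):
import json
import os

MANIFEST = "downloaded.json"  # made-up manifest file: url -> {"file": ..., "hash": ...}

# Load the record of previous runs so we can skip finished work
manifest = {}
if os.path.exists(MANIFEST):
    with open(MANIFEST) as f:
        manifest = json.load(f)

failed_urls = []  # log failures here so a later run can retry just these

def already_downloaded(url: str) -> bool:
    # Treat a URL as cached only if its file is still on disk
    entry = manifest.get(url)
    return bool(entry) and os.path.exists(entry["file"])

def record_download(url: str, filename: str, file_hash: str) -> None:
    # Remember the URL, filename, and content hash for dedupe/resume/audit
    manifest[url] = {"file": filename, "hash": file_hash}
    with open(MANIFEST, "w") as f:
        json.dump(manifest, f, indent=2)
Check already_downloaded(url) before fetching, call record_download() after a successful save, and append to failed_urls when a download fails, and the next run can pick up exactly where this one stopped.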
Why ScrapingBee helps here
ScrapingBee's proxy rotation, JavaScript rendering, and structured web data extraction mean you spend less time fighting blocks and more time running stable pipelines. Combine that with good etiquette — respect rules, avoid overload, cache smartly — and your large-scale web scraping images using Python stays both effective and sustainable.
Downloading images using urllib (legacy option)
Python ships with urllib, and it can download image from URL Python style without any third-party packages. But in practice, most developers skip it now as requests is cleaner, safer, and much easier to extend or wrap with ScrapingBee when sites get tricky.
Still, if you ever need a zero-dependency fallback for python save image from url tasks, here's a simple, modern Python 3 example:
import urllib.request

url = "https://images.pexels.com/photos/2280547/pexels-photo-2280547.jpeg"
file_name = "urllib_image.jpg"

# Provide a User-Agent to avoid trivial blocks
headers = {"User-Agent": "Mozilla/5.0"}
req = urllib.request.Request(url, headers=headers)

with urllib.request.urlopen(req, timeout=20) as resp:
    with open(file_name, "wb") as f:
        while True:
            chunk = resp.read(8192)
            if not chunk:
                break
            f.write(chunk)

print("Image saved as", file_name)
So, urllib works, but if you're doing anything beyond tiny scripts, requests and ScrapingBee will make your life significantly easier.
Using the wget module
If you just want a quick download image Python one-liner, the wget module does the job. It's a tiny wrapper around basic HTTP downloads: great for quick hacks or throwaway scripts, but not something you rely on when you need headers, sessions, retries, proxies, or any real Python image download workflow.
Here's the bare-bones version:
import wget
url = "https://images.pexels.com/photos/2280547/pexels-photo-2280547.jpeg"
file_name = wget.download(url)
print("\nImage saved as", file_name)
Learn how to use Python with curl in our tutorial.
Ready to scrape smarter with Python?
If you're actually serious about scraping images at scale — not just poking at a few URLs — then it's time to stop fighting blocks, redirects, and flaky headers on your own. ScrapingBee gives you JS rendering, proxy rotation, stable HTML, and a dead-simple API that plugs straight into your Python workflow.
Grab your free 1,000 credits and see how smooth scraping can be: Get started now.
Conclusion
Downloading images in Python is easy, but doing it properly is what turns a quick script into a real, production-ready workflow.
You've seen how to stream large files safely, avoid memory spikes, generate clean filenames, deduplicate with hashes, parallelize downloads, scrape galleries, and deal with everything from rate limits to hotlink protection.
requests gives you the control you need, and ScrapingBee carries you through the messy parts: stable HTML, JS rendering, proxy rotation, and predictable results even on stubborn sites. Pair those tools with smart habits like caching, being polite with concurrency, retry logic, and solid file handling, and your Python image pipelines end up fast, reliable, and ready for whatever you throw at them.
Python image download FAQs
How do I download an image from a URL in Python?
Use requests.get(url, stream=True) and write the response in chunks with iter_content(). This keeps memory usage low, handles large files safely, and follows the pattern recommended in the requests docs. Add raise_for_status() to catch errors early, and use a Session if you're downloading multiple images.
Why does a direct requests.get return 403?
Because the site doesn't like you showing up "naked." Many servers block default clients when headers like User-Agent or Referer are missing, or when your IP triggers bot checks. Adding realistic headers sometimes works, but the most reliable fix is using ScrapingBee with premium proxies as you inherit real browser behavior and a clean IP pool automatically.
How do I save images with the correct extension?
Look at the server's Content-Type header (image/jpeg, image/png, etc.) and map it to the right extension. When the header is missing or vague, fall back to the URL's extension after stripping query parameters. This keeps your files consistent and avoids saving everything as .jpg by accident.
How do I download many images quickly without bans?
Use smart parallelism, not brute force:
- 10–20 worker threads (not hundreds)
- Retries with exponential backoff
- Per-request timeouts
- A shared Session for connection reuse
If you're scraping at scale, ScrapingBee's proxy rotation spreads requests across a large IP pool, massively reducing blocks and rate limits.
When should I enable render_js in ScrapingBee?
Turn on render_js=true when the images only appear after JavaScript executes — lazy-loaded galleries, React/Vue pages, JS redirects, cookie walls, etc. For static pages or direct image URLs, leave it off. You'll get faster performance, fewer resources used, and lower credit consumption.
Can I pass headers (e.g., Referer) through ScrapingBee?
Yes. To forward headers, set forward_headers=true and prefix each forwarded header with Spb-. For example:
params = {
    "api_key": API_KEY,
    "url": target_image_url,
    "forward_headers": "true",
    "Spb-Referer": "https://example.com",
    "Spb-User-Agent": "Mozilla/5.0"
}
This tells ScrapingBee to send those exact headers to the target website.



