If you've ever tried to Python download image from URL, you already know the theory looks stupidly simple: call requests.get() and boom — image saved. Except that's not how the real world usually works. Sites block bots, images hide behind JavaScript, redirects go in circles, and bulk downloads crumble if you're not streaming, retrying, or handling files properly.
This guide takes the actually useful route: how to stream images safely, name files without creating a junkyard, avoid duplicates, scale to thousands of downloads, and bring in ScrapingBee when a site decides to get spicy. By the end, you'll have a toolkit that works on real websites, not toy examples.

Quick answer: Download an image in Python, fast
To Python download image from URL, you grab it with requests.get(), check for errors, and dump the bytes to a file. This is the basic pattern behind every Python requests download image or Python save image from URL trick — whether you're pulling a random JPG or wiring it into a bigger Python web scraping pipeline.
The baseline (Requests)
import requests

url = "https://example.com/image.jpg"

# Fetch the image from the URL
resp = requests.get(url)
resp.raise_for_status()  # Make sure the request didn't fail

# Save the image bytes to disk
# (fetch the whole image into memory at once)
with open("image.jpg", "wb") as f:
    f.write(resp.content)

# Alternative approach: streaming
# (useful for larger images)
# with requests.get(url, stream=True) as resp:
#     resp.raise_for_status()
#     with open("large-image.jpg", "wb") as f:
#         for chunk in resp.iter_content(chunk_size=8192):
#             if chunk:
#                 f.write(chunk)
What the basic version does:
- Grabs the entire image into memory in one go
- Saves it straight to disk
- Totally fine for small or medium-sized files or quick one-off scripts
And if your image is large, streaming is the better option because:
- You don't load the whole file into RAM at once
- It's safer when downloading hundreds or thousands of images
- Slow servers won't choke your script with giant responses
- You write chunks as they arrive, which keeps things smooth and predictable
The ScrapingBee version (more reliable on real sites)
import requests

SB_ENDPOINT = "https://app.scrapingbee.com/api/v1"

params = {
    "api_key": "YOUR_API_KEY",
    "url": "https://example.com/image.jpg",
}

# Basic: fetch the rendered/processed image through ScrapingBee
resp = requests.get(SB_ENDPOINT, params=params)
resp.raise_for_status()

with open("image.jpg", "wb") as f:
    f.write(resp.content)

# Alternative: stream so we don't load giant images into memory at once
# with requests.get(SB_ENDPOINT, params=params, stream=True) as resp:
#     resp.raise_for_status()
#     with open("large-image.jpg", "wb") as f:
#         for chunk in resp.iter_content(chunk_size=8192):
#             if chunk:
#                 f.write(chunk)
What the basic version does:
- Calls ScrapingBee and saves the bytes — simple and fine for small files.
- The ScrapingBee call behaves just like a normal requests.get(), except it handles proxies, bot checks, and JavaScript for you.
And if your image is large, streaming is again the better option: it keeps memory usage low and avoids loading the whole file at once.
Prerequisites
Before we start slurping images off the internet like civilized devs, let's make sure your setup isn't held together by duct tape and hope.
You'll need just three things:
- Python 3 — any reasonably recent version
- A text editor — VS Code, Vim, PyCharm; whatever works for you
- Optional: uv — a stupidly fast Python package manager that feels like pip after hitting the gym
Now let's check your Python installation. Pop open your terminal and run:
python3 --version
# or
python --version
If you see something like Python 3.10.12, you're good to go.
If you want to roll with uv, here's the quickest way to spin up a fresh project with requests and BeautifulSoup already installed:
uv init image-downloader
cd image-downloader
uv add requests
uv add beautifulsoup4
The newly created project will contain a main.py file — our code will go there. To execute it, simply run:
uv run python main.py
That's it!
Using the Requests package
Alright, let's get into the real work: Python download image, the right way. Most devs start with requests.get(url).content, and yeah, that works... until you try pulling down a 200MB image and your RAM starts making death noises. (Well, I'm exaggerating a bit but you got the idea.)
So here's the rule of the land: if the file is even remotely large, always use stream=True and iterate with iter_content(). This is the difference between downloading gracefully and detonating your laptop.
- response.content loads the entire file into memory. It's great for tiny PNGs but not so good for larger files.
- iter_content() with stream=True downloads the file in chunks. You stay memory-friendly, efficient, and less on fire.
If you're writing tutorials, docs, or production code, the chunked pattern is the one you use. No exceptions.
Downloading an image with Requests (streaming)
Here's the standard pattern you should reach for when doing Python requests download image or save image Python use cases:
import requests

url = "https://sample-files.com/downloads/images/jpg/landscape_hires_4000x2667_6.83mb.jpg"
headers = {"User-Agent": "Mozilla/5.0"}

with requests.get(url, stream=True, headers=headers) as resp:
    resp.raise_for_status()
    with open("large.jpg", "wb") as f:
        for chunk in resp.iter_content(chunk_size=8192):
            f.write(chunk)
This accomplishes three critical things:
- Uses a real User-Agent (yes, some sites absolutely care).
- Streams the response to avoid loading the entire file into RAM.
- Writes the image chunk-by-chunk so Python stays calm and functional.
Downloading images through ScrapingBee
When plain Requests starts throwing fits — 403 errors, weird JavaScript redirects, or "region not allowed" messages — ScrapingBee is the next step. Instead of fighting bot checks and browser-only behavior yourself, you let ScrapingBee proxy the request on your behalf.
The flow is simple: you send your api_key and the target url to ScrapingBee's API, and it returns the raw bytes just like a normal requests.get() call. Your download logic stays the same, but the heavy lifting happens on ScrapingBee's side.
You can sign up for free and get 1,000 credits, which is plenty for testing image downloads.
ScrapingBee can also forward headers (Accept, Referer, etc.), run JavaScript when a site requires it, and route traffic through premium proxies for geo-sensitive content.
Below are three common recipes.
1. Direct image download (straightforward case)
If the image URL is direct and there's no special protection, this is all you need:
import requests

SB_ENDPOINT = "https://app.scrapingbee.com/api/v1"

params = {
    "api_key": "YOUR_API_KEY",
    "url": "https://sample-files.com/downloads/images/jpg/landscape_hires_4000x2667_6.83mb.jpg"
}

with requests.get(SB_ENDPOINT, params=params, stream=True) as resp:
    resp.raise_for_status()
    with open("bee_image.jpg", "wb") as f:
        for chunk in resp.iter_content(chunk_size=8192):
            f.write(chunk)
What this does:
- Sends the image request through ScrapingBee (so JS, bot checks, and cookies are handled for you)
- Streams the file in chunks to avoid loading a multi-MB image into memory
- Writes each chunk directly to bee_image.jpg until the download finishes
2. Image behind JavaScript
Some pages generate or reveal the real image URL only after running JavaScript. ScrapingBee can handle that by enabling JavaScript rendering:
import requests

SB_ENDPOINT = "https://app.scrapingbee.com/api/v1"

params = {
    "api_key": "YOUR_API_KEY",
    "url": "https://sample-files.com/downloads/images/jpg/landscape_hires_4000x2667_6.83mb.jpg",
    "render_js": "true",  # Enable JS rendering
}

with requests.get(SB_ENDPOINT, params=params, stream=True) as resp:
    resp.raise_for_status()
    with open("js_image.jpg", "wb") as f:
        for chunk in resp.iter_content(chunk_size=8192):
            f.write(chunk)
3. Geo-blocked / Heavily protected image
If the target site only serves the file to certain IP regions or uses tougher anti-bot rules, enable the premium proxy layer:
import requests

SB_ENDPOINT = "https://app.scrapingbee.com/api/v1"

params = {
    "api_key": "YOUR_API_KEY",
    "url": "https://sample-files.com/downloads/images/jpg/landscape_hires_4000x2667_6.83mb.jpg",
    "premium_proxy": "true",
    "country_code": "us",
}

with requests.get(SB_ENDPOINT, params=params, stream=True) as resp:
    resp.raise_for_status()
    with open("geo_image.jpg", "wb") as f:
        for chunk in resp.iter_content(chunk_size=8192):
            f.write(chunk)
Why use premium proxy:
- It routes the request through a trusted residential/geo-specific proxy
- Helps bypass stricter anti-bot systems and region-locked content
- Useful when the site won't serve images to regular datacenter IPs
At the end of the day, you can wrestle with headers, cookies, redirects, JS execution, and IP restrictions yourself — but ScrapingBee handles all of that in one clean request. For tricky downloads, it's simply the superior option.
Extract image URLs, then download
Downloading one image is cool. Downloading a whole gallery of them is where scrape images from website Python actually starts paying off.
The usual workflow for Python web scraping images goes like this:
- Fetch the HTML of the page (we'll use ScrapingBee so it works even on grumpy sites).
- Parse all <img> tags with BeautifulSoup.
- Normalize URLs (turn relative paths into absolute links).
- Handle different attributes: src, data-src, or even srcset.
- Loop over the image URLs and download them one by one.
Let's walk through this using the classic demo site Books to Scrape, and pull down the first 10 book covers.
Scraping book cover images with ScrapingBee + BeautifulSoup
Here's a full working example of image scraping with Python that:
- Fetches the homepage with ScrapingBee
- Extracts <img> tags from the book grid
- Normalizes the cover image URLs
- Downloads the first 10 covers into an images/ folder
import os
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

SB_ENDPOINT = "https://app.scrapingbee.com/api/v1"
API_KEY = "YOUR_API_KEY"

page_url = "https://books.toscrape.com/"
params = {
    "api_key": API_KEY,
    "url": page_url,
    # This page is static, so we don't actually need JS here.
    # For JS-heavy sites, add: "render_js": "true"
}

# 1. Fetch page HTML via ScrapingBee
resp = requests.get(SB_ENDPOINT, params=params)
resp.raise_for_status()
soup = BeautifulSoup(resp.text, "html.parser")

# 2. Extract image sources from <img> tags
img_tags = soup.find_all("img")
img_urls = []

for tag in img_tags:
    # Prefer src, then data-src, then first item from srcset if present
    src = tag.get("src") or tag.get("data-src")
    if not src:
        srcset = tag.get("srcset")
        if srcset:
            # srcset is like "url1 1x, url2 2x" → take the first URL
            src = srcset.split(",")[0].strip().split()[0]
    if not src:
        continue

    # 3. Normalize to absolute URL
    full_url = urljoin(page_url, src)
    img_urls.append(full_url)

# Make sure output directory exists
os.makedirs("images", exist_ok=True)

# 4. Download first 10 images via ScrapingBee
for i, img_url in enumerate(img_urls[:10], start=1):
    img_params = {
        "api_key": API_KEY,
        "url": img_url,
    }
    with requests.get(SB_ENDPOINT, params=img_params, stream=True) as r:
        r.raise_for_status()
        filename = os.path.join("images", f"book_{i}.jpg")
        with open(filename, "wb") as f:
            for chunk in r.iter_content(chunk_size=8192):
                f.write(chunk)

print(f"Downloaded {min(10, len(img_urls))} images into ./images/")
What this script does:
- Fetches the HTML through ScrapingBee so the page loads cleanly even if it had JS or bot protections
- Parses all <img> tags and pulls out image URLs from src, data-src, or srcset
- Converts any relative paths to absolute URLs so they're safe to download
- Creates an images/ folder if it doesn't exist
- Downloads the first 10 images via ScrapingBee, streaming them to disk chunk by chunk
- Saves everything as book_1.jpg, book_2.jpg, etc.
You can adapt this to:
- Use render_js="true" on JavaScript-heavy galleries
- Forward headers like Referer or Accept through ScrapingBee when sites are picky
- Follow pagination links to walk through a multi-page gallery
If you want to go deeper on HTML extraction in general, have a look at ScrapingBee's web data extraction feature, and for JS-heavy scenarios, our JavaScript web scraper examples are worth a read.
Pro tip: paginate pages, not one giant in-memory image list. Process one page at a time and write images to disk as you go. That's how you keep memory usage predictable, even on big sites.
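Here's a minimal sketch of that page-by-page approach, reusing the same ScrapingBee setup as above. Treat it as a sketch rather than a drop-in: the li.next a selector for the "next" link matches Books to Scrape and will need adjusting for other sites, and the file naming is deliberately simple.
import os
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

SB_ENDPOINT = "https://app.scrapingbee.com/api/v1"
API_KEY = "YOUR_API_KEY"

def fetch_html(page_url: str) -> str:
    # Fetch a single gallery page through ScrapingBee
    resp = requests.get(SB_ENDPOINT, params={"api_key": API_KEY, "url": page_url})
    resp.raise_for_status()
    return resp.text

page_url = "https://books.toscrape.com/"
os.makedirs("images", exist_ok=True)
count = 0

while page_url:
    soup = BeautifulSoup(fetch_html(page_url), "html.parser")

    # Download this page's images before fetching the next page,
    # so only one page's worth of URLs is ever held in memory
    for tag in soup.find_all("img"):
        src = tag.get("src")
        if not src:
            continue
        count += 1
        img_url = urljoin(page_url, src)
        img_params = {"api_key": API_KEY, "url": img_url}
        with requests.get(SB_ENDPOINT, params=img_params, stream=True) as r:
            r.raise_for_status()
            with open(os.path.join("images", f"img_{count}.jpg"), "wb") as f:
                for chunk in r.iter_content(chunk_size=8192):
                    f.write(chunk)

    # Follow the "next" link if there is one (this selector is site-specific)
    next_link = soup.select_one("li.next a")
    page_url = urljoin(page_url, next_link["href"]) if next_link else None
Because each page's images are written to disk before the next page is fetched, memory usage stays flat no matter how many pages the gallery has.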
Name files correctly and preserve type
Once you can collect and download images, the next step is saving them correctly. Plenty of beginners just do open("image.jpg") for everything, but once you start handling multiple formats or big batches, that falls apart instantly.
A solid filename strategy should:
- Detect the actual file extension (.jpg, .png, .gif, etc.)
- Normalize the name so it's filesystem-safe
- Prevent collisions when different images share the same basename
- Stay readable for both humans and scripts
This pattern works well for Python save image, save an image Python, and Python save image from URL workflows.
Below is a compact helper built around the large sample file we've been using:
import os
import re
import hashlib
from urllib.parse import urlparse

import requests

EXT_FROM_CTYPE = {
    "image/jpeg": ".jpg",
    "image/jpg": ".jpg",
    "image/png": ".png",
    "image/gif": ".gif",
    "image/webp": ".webp",
    "image/avif": ".avif",
}

def safe_filename(url: str, resp: requests.Response) -> str:
    # 1. Try to infer extension from Content-Type
    ctype = resp.headers.get("Content-Type", "").split(";")[0].strip().lower()
    ext = EXT_FROM_CTYPE.get(ctype)

    # 2. Fallback to URL suffix if no known Content-Type
    clean_url = url.split("?", 1)[0]
    if not ext:
        ext = os.path.splitext(clean_url)[1] or ".bin"

    # 3. Slugify base name
    path = urlparse(clean_url).path
    base = os.path.basename(path) or "image"
    base = re.sub(r"[^a-zA-Z0-9_-]", "_", base).strip("_") or "image"

    # Optional: cap length so we don't create zombie-long filenames
    if len(base) > 50:
        base = base[:50]

    # 4. Short hash tail (avoid collisions)
    # we can hash resp.url in case the original URL was a redirect
    hash_tail = hashlib.sha256(resp.url.encode("utf-8")).hexdigest()[:8]

    return f"{base}_{hash_tail}{ext}"

# Example usage
url = "https://sample-files.com/downloads/images/jpg/landscape_hires_4000x2667_6.83mb.jpg"

with requests.get(url, stream=True) as resp:
    resp.raise_for_status()
    filename = safe_filename(url, resp)
    with open(filename, "wb") as f:
        for chunk in resp.iter_content(chunk_size=8192):
            f.write(chunk)

print("Saved as:", filename)
Key points:
- Content-Type detection — first we trust whatever the server says in the Content-Type header. If it tells us image/jpeg or image/png, cool, we use that to pick the right extension.
- Fallback to URL extension — if the header is useless (and plenty of sites mess this up), we grab the extension straight from the URL. Not perfect, but way better than saving everything as .jpg.
- Slugifying the base name — we take the original filename and scrub out all the sketchy characters. Underscores instead of chaos → filenames that won't break on weird filesystems.
- Hash tail for uniqueness — a tiny SHA-256 tail keeps things collision-proof. Two images with the same name? No problem, they won't stomp on each other.
- Streaming write — we save the file in chunks with iter_content(). Keeps memory usage tiny and makes big downloads behave like adults instead of blowing up your script.
Batch downloads that don't fall over
Looping over one or two images is fine. Looping over hundreds in a strict sequence... not so much. If you really want to download images with Python at scale, you need:
- Some light concurrency (10–20 threads is usually enough)
- A shared Session with retries and backoff
- Per-request timeouts so one slow host doesn't stall the whole run
And when you pair that with ScrapingBee's web scraping API, you get a pretty resilient Python image download setup.
Here's a compact pattern that puts it all together.
import os
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Iterable

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

SB_ENDPOINT = "https://app.scrapingbee.com/api/v1"
API_KEY = "YOUR_API_KEY"

# --- Shared session with retries + ScrapingBee defaults ---
session = requests.Session()

retries = Retry(
    total=3,            # total retry attempts
    backoff_factor=1,   # sleep 1s, 2s, 4s between retries
    status_forcelist=[429, 500, 502, 503, 504],
    allowed_methods=["GET"],
)

adapter = HTTPAdapter(
    max_retries=retries,
    pool_connections=20,
    pool_maxsize=20,
)
session.mount("http://", adapter)
session.mount("https://", adapter)

# Every request will automatically send these params
session.params = {
    "api_key": API_KEY,
    # Set your defaults here so workers inherit them
    # "render_js": "true",
}

def download_image(url: str, filename: str, render_js: bool = False) -> str:
    params = {"url": url}
    if render_js:
        params["render_js"] = "true"

    try:
        with session.get(SB_ENDPOINT, params=params, timeout=15, stream=True) as resp:
            resp.raise_for_status()
            with open(filename, "wb") as f:
                for chunk in resp.iter_content(chunk_size=8192):
                    f.write(chunk)
        return f"Saved {filename}"
    except requests.RequestException as e:
        return f"Failed {url}: {e}"

def batch_download(image_urls: Iterable[str], max_workers: int = 10) -> None:
    os.makedirs("downloads", exist_ok=True)

    jobs = [
        (url, os.path.join("downloads", f"img_{i+1}.jpg"))
        for i, url in enumerate(image_urls)
    ]

    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # fire off tasks in parallel and collect the results,
        # so failures don't disappear silently
        futures = [
            executor.submit(download_image, url, filename)
            for url, filename in jobs
        ]
        for future in as_completed(futures):
            print(future.result())

# Example list of image URLs (could come from your scraper)
image_urls = [
    "https://images.pexels.com/photos/2280547/pexels-photo-2280547.jpeg",
    "https://images.pexels.com/photos/276267/pexels-photo-276267.jpeg",
    "https://images.pexels.com/photos/159045/the-interior-of-the-repair-interior-design-159045.jpeg",
]

batch_download(image_urls, max_workers=12)
What this setup gives you:
- Concurrency: a ThreadPoolExecutor with 10–20 workers so you speed things up without flattening the target site.
- Shared session: one Session reused across all threads, which cuts connection overhead and keeps things snappy.
- Retries and backoff: temporary 429s or 5xx hiccups get retried automatically with growing delays, so flaky hosts don't kill the batch.
- Timeouts: every download has a firm timeout=15, meaning one slow server can't freeze the whole operation.
- ScrapingBee defaults in one place: putting your api_key, render_js, and other defaults in session.params keeps config clean and ensures all workers behave the same way.
Learn how to send POST requests with Python in our tutorial.
Handling blocks, errors, and status codes
Even if your Python requests download image code is flawless, servers can still hit you with a "nah bro." Status codes are your early-warning system, and ScrapingBee usually gives you a direct switch to flip when things get weird.
Here's a quick troubleshooting map for common Python download image failures:
| Problem / Symptom | What's actually happening | ScrapingBee fix to try |
|---|---|---|
| 403 Forbidden (or blank image) | Basic bot rules, missing headers, or simple anti-hotlinking | Add premium_proxy=true, set country_code, forward User-Agent / Referer |
| 429 Too Many Requests | You're rate-limited | Keep retries + backoff, lower concurrency, try premium_proxy=true for bigger pools |
| Endless redirect / login loop | Site keeps sending you to consent/login/region pages | Enable render_js=true so JS redirects and cookies get handled on ScrapingBee's side |
| Hotlinking blocked (works only on the site itself) | Image requires a specific Referer or Origin | Send the page URL as Referer + use a realistic User-Agent |
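For the last row (hotlink-protected images), here's a minimal sketch using plain Requests. The page and image URLs are placeholders; the point is simply to send the embedding page as Referer along with a realistic User-Agent:
import requests

page_url = "https://example.com/gallery"     # the page that embeds the image (placeholder)
image_url = "https://example.com/image.jpg"  # the protected image itself (placeholder)

headers = {
    "User-Agent": "Mozilla/5.0",  # realistic browser UA
    "Referer": page_url,          # pretend the request comes from the embedding page
}

with requests.get(image_url, headers=headers, stream=True, timeout=15) as resp:
    resp.raise_for_status()
    with open("image.jpg", "wb") as f:
        for chunk in resp.iter_content(chunk_size=8192):
            f.write(chunk)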
And one rule you never skip:
resp.raise_for_status()
If something goes wrong, you want it exploding loudly, not quietly writing out a 0-byte "image".
A quick example of proper error handling
import requests

url = "https://example.com/image.jpg"
filename = "image.jpg"

try:
    with requests.get(url, timeout=15, stream=True) as resp:
        resp.raise_for_status()  # catch 4xx/5xx immediately
        with open(filename, "wb") as f:
            for chunk in resp.iter_content(chunk_size=8192):
                if chunk:
                    f.write(chunk)
    print("Saved:", filename)
except requests.HTTPError as e:
    print(f"HTTP error while downloading {url}: {e}")
except requests.Timeout:
    print(f"Timeout reached while fetching {url}")
except requests.RequestException as e:
    print(f"Request failed for {url}: {e}")
What this gives you:
- Errors explode early instead of corrupting files
- Timeouts are treated clearly and separately
- You never write a 0-byte "image" because the request failed upstream
- The control flow stays clean; success path is simple, errors are explicit
Download large files safely
When you use Python to download image from URL and the file is big (more than 5–10 MB), you really don't want to load the whole thing into memory. Large Python image download jobs should always be streamed, otherwise your script will chew RAM or die halfway through.
The safe pattern looks like this:
- Always set stream=True
- Read the response in chunks (8–16 KB is the sweet spot)
- Check Content-Length when the server provides it so you know the transfer actually finished
- Optionally show a progress bar with tqdm
- ScrapingBee will forward Content-Length when the origin server includes it — but not every server sends that header, so don't rely on it blindly
Standard large-file streaming (with tqdm)
First of all, make sure to install tqdm:
uv add tqdm
And here's the code that streams your file and shows a nice progress bar:
import requests
from tqdm import tqdm

url = "https://sample-files.com/downloads/images/jpg/landscape_hires_4000x2667_6.83mb.jpg"
filename = "large.jpg"

# Optional: reuse a session if you plan multiple downloads
session = requests.Session()

with session.get(url, stream=True, timeout=30) as resp:
    resp.raise_for_status()
    total = int(resp.headers.get("Content-Length", 0))

    with open(filename, "wb") as f, tqdm(
        total=total or None,  # handle missing Content-Length
        unit="B",
        unit_scale=True,
        desc=filename,
    ) as pbar:
        for chunk in resp.iter_content(chunk_size=16384):
            if not chunk:
                continue
            f.write(chunk)
            pbar.update(len(chunk))

print("Saved:", filename)
Key points:
- Session reuse — reusing a Session keeps connections warm and makes repeated downloads noticeably faster.
- Timeout added — timeout=30 prevents the script from hanging forever if a server stops responding.
- Graceful handling of missing Content-Length — total = total or None lets tqdm show a progress bar even when the server doesn't report the file size.
- Chunked streaming — iter_content(chunk_size=16384) pulls the file down in safe 16 KB blocks, avoiding memory spikes on large downloads.
- Early error detection — calling resp.raise_for_status() ensures failures blow up immediately instead of silently writing garbage.
- Skip empty chunks — if not chunk: continue filters out keep-alive packets so you only write real file data.
ScrapingBee variant (same logic, cleaner upstream handling)
import requests
from tqdm import tqdm

SB_ENDPOINT = "https://app.scrapingbee.com/api/v1"
API_KEY = "YOUR_API_KEY"  # replace with your real key
IMAGE_URL = "https://sample-files.com/downloads/images/jpg/landscape_hires_4000x2667_6.83mb.jpg"
filename = "landscape.jpg"

params = {
    "api_key": API_KEY,
    "url": IMAGE_URL,
}

# Optional but recommended: use a Session
session = requests.Session()

# The timeout makes sure we do not hang forever if something is wrong
with session.get(SB_ENDPOINT, params=params, stream=True, timeout=30) as resp:
    resp.raise_for_status()  # fail fast on bad status codes
    total = int(resp.headers.get("Content-Length", 0))

    with open(filename, "wb") as f, tqdm(
        total=total or None,  # None lets tqdm handle "unknown size"
        unit="B",
        unit_scale=True,
        desc=filename,
    ) as pbar:
        for chunk in resp.iter_content(chunk_size=16384):
            if not chunk:
                continue
            f.write(chunk)
            pbar.update(len(chunk))

print("Saved:", filename)
Key points in this large-file ScrapingBee downloader:
- Timeouts prevent hangs — timeout=30 makes the request fail fast instead of sitting there forever when a server goes sleepy.
- stream=True keeps memory usage low — the file arrives in manageable chunks, so you never load a 50–500 MB blob into RAM at once.
- tqdm works with known and unknown sizes — total = total or None lets tqdm show a progress bar whether Content-Length exists or not.
- raise_for_status() catches failures early — if ScrapingBee returns a bad status (wrong API key, 404, 429, whatever), the script stops before writing junk.
- Session reuse = fewer slowdowns — one shared Session keeps connections alive and matches the best practices you'll use later in batch jobs.
- Chunked writes are safer for big files — writing 16 KB chunks keeps downloads smooth and stable across all image formats and network speeds.
De-duplication and metadata
When you save image Python style at scale, the fastest way to fill your disk with regret is downloading the same JPEG a few hundred times. The clean fix is simple: compute a hash while streaming the file. If you've already seen that hash, skip the write.
SHA-256 is perfect for this — strong, reliable, and effectively collision-free for anything you'll hit in real scraping work.
import hashlib
import os

import requests

image_urls = [
    "https://images.pexels.com/photos/2280547/pexels-photo-2280547.jpeg",
    "https://images.pexels.com/photos/276267/pexels-photo-276267.jpeg",
    "https://images.pexels.com/photos/159045/the-interior-of-the-repair-interior-design-159045.jpeg",
]

seen_hashes = {}  # hash -> filename

os.makedirs("dedup_images", exist_ok=True)

def download_and_hash(url: str, index: int) -> None:
    # Temporary filename while we don't yet know if it's a duplicate
    tmp_name = f"dedup_images/tmp_{index}.bin"
    hasher = hashlib.sha256()

    with requests.get(url, stream=True, timeout=30) as resp:
        resp.raise_for_status()
        with open(tmp_name, "wb") as f:
            for chunk in resp.iter_content(chunk_size=8192):
                if not chunk:
                    continue
                hasher.update(chunk)
                f.write(chunk)

    file_hash = hasher.hexdigest()

    if file_hash in seen_hashes:
        print(f"Duplicate found → {url}")
        print(f"Already saved as: {seen_hashes[file_hash]}")
        os.remove(tmp_name)
        return

    final_name = f"dedup_images/image_{index}.jpg"
    os.rename(tmp_name, final_name)
    seen_hashes[file_hash] = final_name

    print(f"Saved unique image: {final_name}")
    print(f"SHA-256: {file_hash}")

# Run through all URLs
for i, url in enumerate(image_urls, start=1):
    download_and_hash(url, i)
Key points:
- Hash while streaming — the SHA-256 digest is built chunk-by-chunk as the file downloads, so we never load the whole image into memory.
- SHA-256 is the safest mainstream choice — strong, collision-resistant, and still fast enough to run hundreds or thousands of times in a scraping loop.
- Dictionary lookup for duplicates — a simple in-memory map (hash → filename) gives an instant O(1) way to check if we've already seen the file.
- Write only unique images — the file is saved only if its hash is new, which keeps your dataset clean and stops you from wasting disk space on copies.
This small pattern is enough to keep thousands of downloaded images deduplicated without complex logic or expensive pixel comparisons.
Best practices and ethics
When you're web scraping images using Python, the goal isn't to grab every file in sight — it's to do it cleanly, responsibly, and without becoming "that person" who shows up in an admin's logs at 3 a.m. A handful of simple habits goes a long way.
1. Respect rules
Before you scrape, check:
- The site's robots.txt (e.g. https://example.com/robots.txt)
- The site's Terms of Service
Some sites explicitly restrict automated scraping, hotlinking, or bulk downloading. Even demo sites often spell out what's allowed. ScrapingBee also has options for no-code web scraping and web data extraction that can keep things structured and predictable.
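If you want to automate the robots.txt part, the standard library's urllib.robotparser can answer "am I allowed to fetch this?" before you download anything. A minimal sketch (the bot name and image path are made-up examples):
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.set_url("https://books.toscrape.com/robots.txt")
rp.read()

# Ask whether our (made-up) bot may fetch a given URL before downloading it
image_url = "https://books.toscrape.com/media/cache/example-cover.jpg"  # example path
if rp.can_fetch("MyImageBot/1.0", image_url):
    print("Allowed by robots.txt, safe to download")
else:
    print("Disallowed by robots.txt, skipping")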
2. Avoid overload
Don't carpet-bomb a server with 200 parallel requests just because "threads are cool." Keep things sane:
- Use modest concurrency (10–20 workers, not hundreds)
- Add tiny pauses between page fetches
- Back off when you start getting 429s or notice the site slowing down
A polite Python web scraping images setup keeps the target site happy and dramatically reduces blocks, CAPTCHAs, and bizarre edge cases you'd otherwise waste hours debugging.
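Here's what that politeness can look like in practice. This is a minimal sketch, not production code: the page list is a made-up example, the retry cap of 3 is arbitrary, and the "parse the page" comment stands in for the extraction logic shown earlier.
import time

import requests

# Made-up list of gallery pages to walk through politely
page_urls = [f"https://books.toscrape.com/catalogue/page-{n}.html" for n in range(1, 6)]

delay = 1.0  # small pause between page fetches, in seconds

for page_url in page_urls:
    for attempt in range(3):  # arbitrary retry cap for this sketch
        resp = requests.get(page_url, timeout=15)
        if resp.status_code != 429:
            break
        # The site is telling us to slow down: honor Retry-After if it's numeric,
        # otherwise double our own delay before trying again
        retry_after = resp.headers.get("Retry-After", "")
        delay = float(retry_after) if retry_after.isdigit() else delay * 2
        print(f"Rate limited on {page_url}, backing off for {delay:.0f}s")
        time.sleep(delay)

    resp.raise_for_status()
    # ... parse the page and queue its images here (see the scraper above) ...

    time.sleep(delay)  # tiny pause before the next page fetch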
3. Be smart about re-use
You don't need to download the same image five times:
- Cache successful downloads (by URL, by hash, or both)
- Log failed URLs so you can retry them later without re-running everything
- Track simple metadata (hash, filename, source page) so you can dedupe, resume, and audit cleanly
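A minimal sketch of that bookkeeping, using a made-up downloaded.json manifest file (the content hash itself would come from the de-duplication helper above):
import json
import os

MANIFEST = "downloaded.json"  # made-up manifest file: url -> {"file": ..., "hash": ...}

# Load the record of previous runs so we can skip finished work
manifest = {}
if os.path.exists(MANIFEST):
    with open(MANIFEST) as f:
        manifest = json.load(f)

failed_urls = []  # log failures here so a later run can retry just these

def already_downloaded(url: str) -> bool:
    # Treat a URL as cached only if its file is still on disk
    entry = manifest.get(url)
    return bool(entry) and os.path.exists(entry["file"])

def record_download(url: str, filename: str, file_hash: str) -> None:
    # Remember the URL, filename, and content hash for dedupe/resume/audit
    manifest[url] = {"file": filename, "hash": file_hash}
    with open(MANIFEST, "w") as f:
        json.dump(manifest, f, indent=2)
Check already_downloaded(url) before fetching, call record_download() after a successful save, and append to failed_urls when a download fails, and the next run can pick up exactly where this one stopped.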
Why ScrapingBee helps here
ScrapingBee's proxy rotation, JavaScript rendering, and structured web data extraction mean you spend less time fighting blocks and more time running stable pipelines. Combine that with good etiquette — respect rules, avoid overload, cache smartly — and your large-scale web scraping images using Python stays both effective and sustainable.
Downloading images using urllib (legacy option)
Python ships with urllib, and it can download image from URL Python style without any third-party packages. But in practice, most developers skip it now as requests is cleaner, safer, and much easier to extend or wrap with ScrapingBee when sites get tricky.
Still, if you ever need a zero-dependency fallback for python save image from url tasks, here's a simple, modern Python 3 example:
import urllib.request

url = "https://images.pexels.com/photos/2280547/pexels-photo-2280547.jpeg"
file_name = "urllib_image.jpg"

# Provide a User-Agent to avoid trivial blocks
headers = {"User-Agent": "Mozilla/5.0"}
req = urllib.request.Request(url, headers=headers)

with urllib.request.urlopen(req, timeout=20) as resp:
    with open(file_name, "wb") as f:
        while True:
            chunk = resp.read(8192)
            if not chunk:
                break
            f.write(chunk)

print("Image saved as", file_name)
So, urllib works, but if you're doing anything beyond tiny scripts, requests and ScrapingBee will make your life significantly easier.
Using the wget module
If you just want a quick download image Python one-liner, the wget module does the job. It's a tiny wrapper around basic HTTP downloads: great for quick hacks or throwaway scripts, but not something you rely on when you need headers, sessions, retries, proxies, or any real Python image download workflow.
Here's the bare-bones version:
import wget
url = "https://images.pexels.com/photos/2280547/pexels-photo-2280547.jpeg"
file_name = wget.download(url)
print("\nImage saved as", file_name)
Learn how to use Python with curl in our tutorial.
Ready to scrape smarter with Python?
If you're actually serious about scraping images at scale — not just poking at a few URLs — then it's time to stop fighting blocks, redirects, and flaky headers on your own. ScrapingBee gives you JS rendering, proxy rotation, stable HTML, and a dead-simple API that plugs straight into your Python workflow.
Grab your free 1,000 credits and see how smooth scraping can be: Get started now.
Conclusion
Downloading images in Python is easy, but doing it properly is what turns a quick script into a real, production-ready workflow.
You've seen how to stream large files safely, avoid memory spikes, generate clean filenames, deduplicate with hashes, parallelize downloads, scrape galleries, and deal with everything from rate limits to hotlink protection.
requests gives you the control you need, and ScrapingBee carries you through the messy parts: stable HTML, JS rendering, proxy rotation, and predictable results even on stubborn sites. Pair those tools with smart habits like caching, being polite with concurrency, retry logic, and solid file handling, and your Python image pipelines end up fast, reliable, and ready for whatever you throw at them.
Python image download FAQs
How do I download an image from a URL in Python?
Use requests.get(url, stream=True) and write the response in chunks with iter_content(). This keeps memory usage low, handles large files safely, and follows the pattern recommended in the requests docs. Add raise_for_status() to catch errors early, and use a Session if you're downloading multiple images.
Why does a direct requests.get return 403?
Because the site doesn't like you showing up "naked." Many servers block default clients when headers like User-Agent or Referer are missing, or when your IP triggers bot checks. Adding realistic headers sometimes works, but the most reliable fix is using ScrapingBee with premium proxies as you inherit real browser behavior and a clean IP pool automatically.
How do I save images with the correct extension?
Look at the server's Content-Type header (image/jpeg, image/png, etc.) and map it to the right extension. When the header is missing or vague, fall back to the URL's extension after stripping query parameters. This keeps your files consistent and avoids saving everything as .jpg by accident.
How do I download many images quickly without bans?
Use smart parallelism, not brute force:
- 10–20 worker threads (not hundreds)
- Retries with exponential backoff
- Per-request timeouts
- A shared Session for connection reuse
If you're scraping at scale, ScrapingBee's proxy rotation spreads requests across a large IP pool, massively reducing blocks and rate limits.
When should I enable render_js in ScrapingBee?
Turn on render_js=true when the images only appear after JavaScript executes — lazy-loaded galleries, React/Vue pages, JS redirects, cookie walls, etc. For static pages or direct image URLs, leave it off. You'll get faster performance, fewer resources used, and lower credit consumption.
Can I pass headers (e.g., Referer) through ScrapingBee?
Yes. To forward headers, set forward_headers=true and prefix each forwarded header with Spb-. For example:
params = {
    "api_key": API_KEY,
    "url": target_image_url,
    "forward_headers": "true",
    "Spb-Referer": "https://example.com",
    "Spb-User-Agent": "Mozilla/5.0"
}
This tells ScrapingBee to send those exact headers to the target website.



