How to Get All URLs from the Wayback Machine

by David Gossage March 28, 2026

The Wayback Machine (Archive.org) is one of the most powerful resources for SEO, migrations, and competitor research. But if you’ve ever tried to extract a full list of URLs from it, you’ll know it’s not exactly straightforward.

Contents:

Using the Wayback Machine interface
Using the CDX API
Limitations of the API approach
A faster way to get all URLs using a free tool
Practical SEO use cases

Before we go on, please bear in mind that archive.org is a wonderful tool that is free to use, but also needs our help to stay active. Please consider donating to archive.org to keep this service available for everyone.

1. Using the Wayback Machine Interface (Manual Method)

[Image: Wayback Machine calendar view]

The most common way to explore archived pages is via the Wayback Machine interface.

How it works:

Enter a domain (e.g. example.com)
Browse the timeline and calendar view
Click on specific snapshots to view archived pages

The problem:

This method is useful for browsing, but not for extracting URLs at scale.

You’ll quickly run into limitations:

No way to export URLs
No structured list of pages
Pages are slow to load
Difficult to analyse large sites

2. Using the Wayback Machine API (CDX API)

To extract URLs properly, you need to use the CDX API, which queries the capture index that powers the Wayback Machine.

Example API request:

http://web.archive.org/cdx/search/cdx?url=example.com/*&output=json&fl=original,timestamp,statuscode

This returns:

URL
Timestamp
HTTP status code
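With output=json, the response is a JSON array of arrays. The first row holds the field names requested with fl, and every following row is one capture. The Python sketch below uses only the standard library and turns that response into one dictionary per capture. The function names are hypothetical, and the sample string shows an illustrative shape, not real archive data.

```python
import json
from urllib.request import urlopen


def parse_cdx_json(text: str) -> list[dict[str, str]]:
    """Turn a CDX output=json response into one dict per capture.

    The first row is the header (the fields requested via fl=). An empty
    result is treated as no captures, whether it arrives as [] or an empty body.
    """
    if not text.strip():
        return []
    rows = json.loads(text)
    if not rows:
        return []
    header, *captures = rows
    return [dict(zip(header, capture)) for capture in captures]


def fetch_cdx(query_url: str, timeout: float = 60.0) -> list[dict[str, str]]:
    """Download a CDX query URL and parse it (requires network access)."""
    with urlopen(query_url, timeout=timeout) as response:
        return parse_cdx_json(response.read().decode("utf-8"))


# Illustrative response shape for fl=original,timestamp,statuscode
sample = """[["original","timestamp","statuscode"],
 ["http://example.com/","20200601120000","200"],
 ["http://example.com/about","20210315080000","301"]]"""

for capture in parse_cdx_json(sample):
    print(capture["original"], capture["timestamp"], capture["statuscode"])
```

To query the live API, pass the request URL above to fetch_cdx. Large domains can be slow or time out, so a narrow date range is a sensible first test.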

Adding a date range

You can refine results using from and to:

&from=20200501&to=20250531
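The from and to values are CDX timestamps: digits only, most significant first. YYYYMMDD is used here, but the API also accepts shorter prefixes such as a bare year. Your dates may come in another format, such as the YYYY/MM/DD used by the tool later in this article. In that case, a small helper can convert and sanity-check them. This is a sketch, and the helper name is hypothetical.

```python
from datetime import datetime


def cdx_date_range(start: str, end: str, fmt: str = "%Y/%m/%d") -> str:
    """Convert two dates (YYYY/MM/DD by default) into a CDX &from=...&to= fragment."""
    start_dt = datetime.strptime(start, fmt)
    end_dt = datetime.strptime(end, fmt)
    if start_dt > end_dt:
        raise ValueError(f"start date {start} is after end date {end}")
    # YYYYMMDD is an 8-digit prefix of the full YYYYMMDDhhmmss CDX timestamp
    return f"&from={start_dt:%Y%m%d}&to={end_dt:%Y%m%d}"


print(cdx_date_range("2020/05/01", "2025/05/31"))  # &from=20200501&to=20250531
```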

Deduplicating URLs

To return only one version per URL:

&collapse=original
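Bear in mind that collapse only compares each row with the one before it. A row is dropped when its value in the named field matches the previous row’s. Results are sorted by URL key, so this usually leaves one row per URL. However, variants of the same page (http vs https, www vs bare domain) share a URL key and can interleave, so a few duplicates may survive. Collapsing on urlkey, a normalised form of the URL, should treat those variants as one page.

The sketch below imitates the behaviour locally with itertools.groupby, which also groups only consecutive items, and puts a strict dedupe alongside it for comparison. The function names and rows are hypothetical.

```python
from itertools import groupby


def collapse_adjacent(rows: list[dict[str, str]], field: str) -> list[dict[str, str]]:
    """Keep the first row of each run of consecutive rows sharing `field` (like collapse=)."""
    return [next(group) for _, group in groupby(rows, key=lambda row: row[field])]


def dedupe_strict(rows: list[dict[str, str]], field: str) -> list[dict[str, str]]:
    """Keep only the first row for each distinct `field` value, wherever it appears."""
    seen: set[str] = set()
    kept = []
    for row in rows:
        if row[field] not in seen:
            seen.add(row[field])
            kept.append(row)
    return kept


# Illustrative rows, interleaved the way http/https variants can be
rows = [
    {"original": "http://example.com/about", "timestamp": "20200601000000"},
    {"original": "http://example.com/about", "timestamp": "20210101000000"},
    {"original": "https://example.com/about", "timestamp": "20220101000000"},
    {"original": "http://example.com/about", "timestamp": "20230101000000"},
]

print(len(collapse_adjacent(rows, "original")))  # 3
print(len(dedupe_strict(rows, "original")))  # 2
```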

Full example:

http://web.archive.org/cdx/search/cdx?url=example.com/*&output=json&fl=original,timestamp,statuscode&from=20200501&to=20250531&collapse=original
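Rather than typing the query string by hand, you can assemble it from a dictionary of parameters with Python’s urllib.parse.urlencode. In this sketch, the safe characters are chosen so the output reads like the hand-written example above. The function name is hypothetical.

```python
from urllib.parse import quote_plus, urlencode

CDX_ENDPOINT = "http://web.archive.org/cdx/search/cdx"


def build_cdx_query(domain: str, **extra: str) -> str:
    """Build a CDX query URL for every capture under `domain`.

    Extra keyword arguments become additional query parameters. `from` is a
    Python keyword, so pass it via a dict: build_cdx_query(d, **{"from": ...}).
    """
    params = {
        "url": f"{domain}/*",  # trailing /* matches every path on the domain
        "output": "json",
        "fl": "original,timestamp,statuscode",
        **extra,
    }
    # Leave / * , and : unescaped so the URL stays human-readable
    return f"{CDX_ENDPOINT}?{urlencode(params, safe='/*,:', quote_via=quote_plus)}"


print(build_cdx_query(
    "example.com",
    **{"from": "20200501", "to": "20250531", "collapse": "original"},
))
```

Other parameters can be passed the same way, for example filter="statuscode:200" to request only 200 captures.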

Limitations of the API approach

While powerful, the API isn’t exactly user-friendly:

Requires manual URL construction
Returns raw JSON (not easy to analyse)
Filtering (e.g. only 200 pages) means adding yet more parameters, such as filter=statuscode:200
No export or visual interface
Large datasets can be hard to handle

This is where most people get stuck.
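If you are comfortable with a little scripting, some of these limitations are easy to work around once you have the JSON. The sketch below loads a response, summarises it by status code, and keeps only 200 pages. It uses only the standard library, and the function names and sample rows are illustrative.

```python
import json
from collections import Counter


def load_captures(text: str) -> list[dict[str, str]]:
    """Parse a CDX output=json response (first row = field names) into dicts."""
    rows = json.loads(text) if text.strip() else []
    return [dict(zip(rows[0], row)) for row in rows[1:]] if rows else []


def only_status(captures: list[dict[str, str]], status: str = "200") -> list[dict[str, str]]:
    """Keep captures with the given HTTP status, e.g. drop redirects and errors."""
    return [c for c in captures if c.get("statuscode") == status]


def status_summary(captures: list[dict[str, str]]) -> Counter:
    """Count captures per status code for a quick overview of a large result."""
    return Counter(c.get("statuscode", "") for c in captures)


sample = """[["original","timestamp","statuscode"],
 ["http://example.com/","20200601120000","200"],
 ["http://example.com/old","20200701120000","301"],
 ["http://example.com/gone","20210101120000","404"],
 ["http://example.com/blog","20220101120000","200"]]"""

captures = load_captures(sample)
print(status_summary(captures).most_common())
print([c["original"] for c in only_status(captures)])
```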

3. A Faster Way: Use a Wayback URL Extractor Tool

Instead of manually building API requests, you can use a tool to handle everything for you.

Use the Wayback Machine Extractor tool here

What the tool does:

Pulls all archived URLs for a domain
Lets you set a date range (YYYY/MM/DD)
Displays status codes and timestamps
Links directly to archived pages
Includes filters:
- Deduplicate URLs
- Show only 200 status pages
Allows CSV export or copy to clipboard
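Two of these features are simple to reproduce yourself. An archived copy can be opened at https://web.archive.org/web/<timestamp>/<original URL>, and Python’s csv module handles the export. This is a rough sketch of the idea, not the tool’s actual code, and the function names and sample row are hypothetical.

```python
import csv
import io

WAYBACK_PREFIX = "https://web.archive.org/web/"


def archive_link(original: str, timestamp: str) -> str:
    """Link to the capture of `original` taken at `timestamp` (YYYYMMDDhhmmss)."""
    return f"{WAYBACK_PREFIX}{timestamp}/{original}"


def to_csv(captures: list[dict[str, str]]) -> str:
    """Render captures as CSV text with an extra column linking to each snapshot."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["original", "timestamp", "statuscode", "archive_url"],
        extrasaction="ignore",  # silently skip any other CDX fields
    )
    writer.writeheader()
    for c in captures:
        writer.writerow({**c, "archive_url": archive_link(c["original"], c["timestamp"])})
    return buffer.getvalue()


captures = [
    {"original": "http://example.com/", "timestamp": "20200601120000", "statuscode": "200"},
]
print(to_csv(captures))
```

To save the result, write the returned string to a file opened with newline="" so the csv module’s line endings are preserved.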

How to use the tool

Enter your domain
Set your date range
Click Fetch URLs
Apply filters if needed:
- Deduplicate → one row per URL
- Only 200s → remove redirects/errors
Export or copy results

You’ll get a structured list of URLs without building a single API request, saving you time and headaches.
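If you would rather script the same steps, the sketch below chains them. It builds the query for a domain and date range, fetches it, then applies the two optional filters. It approximates the tool’s options rather than reproducing its code. The function names are hypothetical, and fetch_urls needs network access.

```python
import json
from urllib.parse import quote_plus, urlencode
from urllib.request import urlopen

CDX_ENDPOINT = "http://web.archive.org/cdx/search/cdx"


def apply_options(captures: list[dict[str, str]], dedupe: bool = False,
                  only_200: bool = False) -> list[dict[str, str]]:
    """Mirror the tool's two filters on already-fetched captures."""
    if only_200:
        captures = [c for c in captures if c["statuscode"] == "200"]
    if dedupe:
        seen: set[str] = set()
        unique = []
        for c in captures:
            if c["original"] not in seen:
                seen.add(c["original"])
                unique.append(c)
        captures = unique
    return captures


def fetch_urls(domain: str, date_from: str, date_to: str, dedupe: bool = False,
               only_200: bool = False, timeout: float = 120.0) -> list[dict[str, str]]:
    """Query the CDX API for `domain` between two YYYYMMDD dates (needs network)."""
    params = {
        "url": f"{domain}/*",
        "output": "json",
        "fl": "original,timestamp,statuscode",
        "from": date_from,
        "to": date_to,
    }
    query = f"{CDX_ENDPOINT}?{urlencode(params, safe='/*,', quote_via=quote_plus)}"
    with urlopen(query, timeout=timeout) as response:
        body = response.read().decode("utf-8")
    rows = json.loads(body) if body.strip() else []
    captures = [dict(zip(rows[0], row)) for row in rows[1:]] if rows else []
    return apply_options(captures, dedupe=dedupe, only_200=only_200)


# Example (performs a live request):
# for c in fetch_urls("example.com", "20200501", "20250531", dedupe=True, only_200=True):
#     print(c["original"])
```

Filtering by status before deduplicating means a URL whose first capture was a redirect, but a later one returned 200, is still kept. The tool may order these steps differently.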

Practical SEO Use Cases

1. Redirect mapping

Pull historical URLs to:

build 301 redirect maps
identify missing redirects
recover legacy pages
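For redirect mapping, a useful first step is to reduce archived URLs to unique paths and pair each one with its new destination. This sketch emits Apache mod_alias "Redirect 301" lines and lists the old paths that have no mapping yet, i.e. missing redirects. The mapping and URLs are hypothetical, so adapt the output format to your server or CMS. Apache’s Redirect matches path prefixes, so exact-match rules (e.g. RedirectMatch) may suit some sites better.

```python
from urllib.parse import urlsplit


def archived_paths(urls: list[str]) -> list[str]:
    """Unique paths in first-seen order, ignoring scheme, host and query string."""
    return list(dict.fromkeys(urlsplit(u).path or "/" for u in urls))


def redirect_rules(paths: list[str], mapping: dict[str, str]) -> tuple[list[str], list[str]]:
    """Return (Apache 'Redirect 301' lines, paths that still need a destination)."""
    rules, missing = [], []
    for path in paths:
        target = mapping.get(path)
        if target is None:
            missing.append(path)
        elif target != path:  # unchanged paths need no redirect
            rules.append(f"Redirect 301 {path} {target}")
    return rules, missing


urls = [
    "http://example.com/",
    "http://example.com/old-page",
    "https://www.example.com/old-page?ref=x",
    "http://example.com/services.html",
]
mapping = {"/": "/", "/old-page": "/new-page"}  # hypothetical old -> new paths

rules, missing = redirect_rules(archived_paths(urls), mapping)
print("\n".join(rules))
print("Missing:", missing)
```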

2. Website migrations

Before or after a migration:

verify old URLs
ensure coverage
find dropped pages
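To find dropped pages, compare the archived paths with the URLs the site serves today, taken from a crawl export or sitemap, for example. This is a minimal sketch using set operations. Both input lists are hypothetical, and ignoring trailing slashes is an assumption you may want to change.

```python
from urllib.parse import urlsplit


def normalise(url: str) -> str:
    """Compare by path only, ignoring scheme, host, query and a trailing slash."""
    path = urlsplit(url).path.rstrip("/")
    return path or "/"


def migration_gaps(archived: list[str], current: list[str]) -> dict[str, list[str]]:
    """Split archived paths into those missing from the current site and those kept."""
    old = {normalise(u) for u in archived}
    new = {normalise(u) for u in current}
    return {
        "dropped": sorted(old - new),  # candidates for redirects or restoration
        "kept": sorted(old & new),
    }


archived = [
    "http://example.com/",
    "http://example.com/about/",
    "http://example.com/blog/post-1",
    "http://example.com/contact",
]
current = ["https://example.com/", "https://example.com/about", "https://example.com/contact-us"]

print(migration_gaps(archived, current))
```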

3. Competitor analysis

See what competitors:

used to rank for
have removed or consolidated
previously targeted
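One rough way to spot pages a competitor has removed or consolidated is to record when each URL was last captured. Pages whose latest capture is well in the past may no longer exist. This is only a signal, since the Wayback Machine may simply not have crawled a page recently. The cutoff and sample captures below are illustrative.

```python
def last_seen(captures: list[dict[str, str]]) -> dict[str, str]:
    """Map each URL to its most recent capture timestamp."""
    latest: dict[str, str] = {}
    for c in captures:
        url, ts = c["original"], c["timestamp"]
        # 14-digit CDX timestamps sort correctly as plain strings
        if ts > latest.get(url, ""):
            latest[url] = ts
    return latest


def possibly_removed(captures: list[dict[str, str]], cutoff: str) -> list[str]:
    """URLs with no capture on or after `cutoff` (a YYYYMMDD... prefix)."""
    return sorted(url for url, ts in last_seen(captures).items() if ts < cutoff)


captures = [
    {"original": "http://example.com/pricing", "timestamp": "20190101000000"},
    {"original": "http://example.com/pricing", "timestamp": "20240301000000"},
    {"original": "http://example.com/old-guide", "timestamp": "20210615000000"},
    {"original": "http://example.com/features", "timestamp": "20240101000000"},
]

print(possibly_removed(captures, "20230101"))
```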

4. Expired domain research

Check:

historical site structure
spam signals
content themes
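For a quick picture of a domain’s historical structure and content themes, count unique paths by their first path segment. Sections you would not expect on the domain can be worth a closer look for spam. This is a sketch using collections.Counter, and the sample URLs are hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit


def section_counts(urls: list[str]) -> Counter:
    """Count unique paths by first segment, e.g. /blog for /blog/post-1."""
    paths = {urlsplit(u).path for u in urls}  # also merges http/https duplicates
    counts: Counter = Counter()
    for path in paths:
        segments = [s for s in path.split("/") if s]
        counts["/" + segments[0] if segments else "/"] += 1
    return counts


urls = [
    "http://example.com/",
    "http://example.com/blog/a",
    "http://example.com/blog/b",
    "https://example.com/blog/b",
    "http://example.com/casino/bonus",
]

for section, n in section_counts(urls).most_common():
    print(section, n)
```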

5. Content recovery

Find and restore:

deleted blog posts
product pages
landing pages
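To recover deleted content, pick the latest successful capture of each URL you need and open it in the Wayback Machine. The sketch below does that for URLs under a path prefix such as /blog/. It also builds a raw-content link by adding the id_ modifier after the timestamp, which should return the archived response without the Wayback Machine toolbar and rewritten links. Check that on a single capture before relying on it in bulk. The function names and sample data are hypothetical.

```python
from urllib.parse import urlsplit


def latest_ok_captures(captures: list[dict[str, str]], path_prefix: str) -> dict[str, str]:
    """Latest 200-status timestamp for each URL whose path starts with `path_prefix`."""
    latest: dict[str, str] = {}
    for c in captures:
        if c["statuscode"] != "200":
            continue
        if not urlsplit(c["original"]).path.startswith(path_prefix):
            continue
        if c["timestamp"] > latest.get(c["original"], ""):
            latest[c["original"]] = c["timestamp"]
    return latest


def recovery_links(captures: list[dict[str, str]], path_prefix: str) -> list[tuple[str, str, str]]:
    """(URL, viewable snapshot link, raw id_ link) for each matching page."""
    return [
        (url,
         f"https://web.archive.org/web/{ts}/{url}",
         f"https://web.archive.org/web/{ts}id_/{url}")
        for url, ts in sorted(latest_ok_captures(captures, path_prefix).items())
    ]


captures = [
    {"original": "http://example.com/blog/post-1", "timestamp": "20190101000000", "statuscode": "200"},
    {"original": "http://example.com/blog/post-1", "timestamp": "20210101000000", "statuscode": "200"},
    {"original": "http://example.com/blog/post-2", "timestamp": "20220101000000", "statuscode": "404"},
    {"original": "http://example.com/products/widget", "timestamp": "20200101000000", "statuscode": "200"},
]

for url, view, raw in recovery_links(captures, "/blog/"):
    print(url, view, raw, sep="\n  ")
```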