How to Get All URLs from the Wayback Machine

The Wayback Machine (Archive.org) is one of the most powerful resources for SEO, migrations, and competitor research. But if you’ve ever tried to extract a full list of URLs from it, you’ll know it’s not exactly straightforward.

Contents:

  • Using the archive.org interface
  • Using the CDX API
  • Limitations
  • A faster way to get all URLs using a free tool
  • Practical SEO use cases

Before we go on, please bear in mind that archive.org is a wonderful tool that is free to use, but also needs our help to stay active. Please consider donating to archive.org to keep this service available for everyone.

1. Using the Wayback Machine Interface (Manual Method)



The most common way to explore archived pages is via the Wayback Machine interface.

How it works:

  1. Enter a domain (e.g. example.com)
  2. Browse the timeline and calendar view
  3. Click on specific snapshots to view archived pages

The problem:

This method is useful for browsing, but not for extracting URLs at scale.

You’ll quickly run into limitations:

  • No way to export URLs
  • No structured list of pages
  • Pages are slow to load
  • Difficult to analyse large sites

2. Using the Wayback Machine API (CDX API)

To extract URLs properly, you need the CDX API, which queries the capture index behind the Wayback Machine.

Example API request:

http://web.archive.org/cdx/search/cdx?url=example.com/*&output=json&fl=original,timestamp,statuscode

This returns one row per capture, with these fields:

  • URL
  • Timestamp
  • HTTP status code
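With output=json, the response is a JSON array whose first row holds the field names and whose remaining rows are individual captures. Here is a minimal sketch, using only the Python standard library, that fetches a query URL and turns those rows into dictionaries. The function names are illustrative, not part of any Archive.org library.

```python
import json
from urllib.request import urlopen


def parse_cdx_json(text: str) -> list[dict[str, str]]:
    """Turn a CDX output=json response into one dict per capture."""
    rows = json.loads(text) if text.strip() else []
    if not rows:
        return []  # no captures matched the query
    header, *captures = rows  # the first row holds the field names
    return [dict(zip(header, row)) for row in captures]


def fetch_cdx(query_url: str, timeout: float = 60.0) -> list[dict[str, str]]:
    """Download a CDX query URL and parse its JSON rows."""
    with urlopen(query_url, timeout=timeout) as resp:
        return parse_cdx_json(resp.read().decode("utf-8"))
```

Calling fetch_cdx with the example request above should give you a list of dicts keyed by original, timestamp and statuscode. For big domains, narrow the query first, because the whole response is read into memory.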

Adding a date range

You can limit results to an inclusive date range using from and to (yyyyMMdd format):

&from=20200501&to=20250531
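Capture timestamps use a 14-digit yyyyMMddhhmmss format, and from and to also accept a shorter prefix such as 20200501 or just a year. A small sketch for turning Python dates into those parameters and reading returned timestamps back. Both helper names are hypothetical.

```python
from datetime import date, datetime


def cdx_date_range(start: date, end: date) -> dict[str, str]:
    """Build from/to parameters for a CDX query (the range is inclusive)."""
    if start > end:
        raise ValueError("start date must not be after end date")
    # A yyyyMMdd prefix is enough; the server treats it as a partial timestamp.
    return {"from": start.strftime("%Y%m%d"), "to": end.strftime("%Y%m%d")}


def parse_cdx_timestamp(ts: str) -> datetime:
    """Convert a 14-digit capture timestamp (yyyyMMddhhmmss) to a datetime."""
    return datetime.strptime(ts, "%Y%m%d%H%M%S")
```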

Deduplicating URLs

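To return only one capture per URL, collapse consecutive rows that share the same original URL (the first capture in each run is kept):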

&collapse=original
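The API's collapse works on adjacent rows of the sorted result. If you have already downloaded every capture, you can also deduplicate locally. This sketch keeps the earliest capture of each URL. It assumes rows are dicts with original and timestamp keys, as you would get from parsing the JSON response.

```python
def dedupe_by_url(rows: list[dict[str, str]]) -> list[dict[str, str]]:
    """Keep only the earliest capture of each original URL."""
    earliest: dict[str, dict[str, str]] = {}
    for row in rows:
        url = row["original"]
        # Fixed-width 14-digit timestamps compare correctly as strings.
        if url not in earliest or row["timestamp"] < earliest[url]["timestamp"]:
            earliest[url] = row
    return sorted(earliest.values(), key=lambda r: r["original"])
```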

Full example:

http://web.archive.org/cdx/search/cdx?url=example.com/*&output=json&fl=original,timestamp,statuscode&from=20200501&to=20250531&collapse=original
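Rather than hand-editing that query string, you can assemble it from parameters. Here is a minimal sketch using urllib.parse.urlencode. The function name and its defaults are illustrative, and it uses the https form of the endpoint.

```python
from typing import Optional
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_url(domain: str, date_from: Optional[str] = None,
                  date_to: Optional[str] = None, dedupe: bool = True) -> str:
    """Assemble a CDX query for every archived URL under a domain."""
    params = {
        "url": f"{domain}/*",  # the trailing /* makes this a prefix query
        "output": "json",
        "fl": "original,timestamp,statuscode",
    }
    if date_from:
        params["from"] = date_from
    if date_to:
        params["to"] = date_to
    if dedupe:
        params["collapse"] = "original"
    # Keep / * and , readable instead of percent-encoding them.
    return f"{CDX_ENDPOINT}?{urlencode(params, safe='/*,')}"
```

For instance, build_cdx_url("example.com", "20200501", "20250531") produces the full example above, apart from the https scheme.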

Limitations of the API approach

While powerful, the API isn’t exactly user-friendly:

  • Requires manual URL construction
  • Returns raw JSON (not easy to analyse)
  • Filtering (e.g. only 200 pages) means adding more parameters, such as &filter=statuscode:200
  • No export or visual interface
  • Large sites return very large responses that are slow to download and hard to work with
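If you are working with the raw JSON, a few lines of Python cover filtering and export. This sketch assumes the response was requested with fl=original,timestamp,statuscode. It optionally keeps only status 200 captures and returns CSV text.

```python
import csv
import io
import json


def cdx_json_to_csv(text: str, only_200: bool = True) -> str:
    """Convert a CDX output=json response into CSV text."""
    rows = json.loads(text) if text.strip() else []
    if not rows:
        return ""
    header, *captures = rows  # the first row holds the field names
    status_col = header.index("statuscode")
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    for row in captures:
        if only_200 and row[status_col] != "200":
            continue  # skip redirects, errors and rows without a status
        writer.writerow(row)
    return buf.getvalue()
```

You can then save the result with pathlib.Path("urls.csv").write_text(...) and open it in a spreadsheet.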

This is where most people get stuck.

3. A Faster Way: Use a Wayback URL Extractor Tool

Instead of manually building API requests, you can use a tool to handle everything for you.

Use the Wayback Machine Extractor tool here

What the tool does:

  • Pulls all archived URLs for a domain
  • Lets you set a date range (YYYY/MM/DD)
  • Displays status codes and timestamps
  • Links directly to archived pages
  • Includes filters:
    • Deduplicate URLs
    • Show only 200 status pages
  • Allows CSV export or copy to clipboard

How to use the tool

  1. Enter your domain
  2. Set your date range
  3. Click Fetch URLs
  4. Apply filters if needed:
    • Deduplicate → one row per URL
    • Only 200s → remove redirects/errors
  5. Export or copy results

You’ll get a structured list of URLs without building a single API request, saving you time and headaches.

Practical SEO Use Cases

1. Redirect mapping

Pull historical URLs to:

  • build 301 redirect maps
  • identify missing redirects
  • recover legacy pages
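A common starting point for a redirect map is a two-column sheet of old paths with the new destination left blank. This sketch assumes you already have a list of archived URLs. It reduces each URL to its path and query, so http/https and www variants collapse into one row. Those normalisation rules are an assumption, so adjust them to your site.

```python
import csv
import io
from urllib.parse import urlsplit


def redirect_map_csv(archived_urls: list[str]) -> str:
    """Build a redirect-map skeleton: one row per unique old path."""
    paths = set()
    for url in archived_urls:
        parts = urlsplit(url)
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query  # query URLs may need their own redirects
        paths.add(path)
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["old_path", "new_url"])
    for path in sorted(paths):
        writer.writerow([path, ""])  # new_url is filled in by hand
    return buf.getvalue()
```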

2. Website migrations

Before or after a migration:

  • verify old URLs
  • ensure coverage
  • find dropped pages
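To find dropped pages, compare archived paths with a list of live URLs, for example from your sitemap or a crawl export. This is a set-difference sketch. Where the live list comes from is up to you, and treating a trailing slash as insignificant is an assumption.

```python
from urllib.parse import urlsplit


def normalise_path(url: str) -> str:
    """Reduce a URL to a comparable path (ignores scheme, host and trailing slash)."""
    path = urlsplit(url).path.rstrip("/")
    return path or "/"


def find_dropped(archived_urls: list[str], live_urls: list[str]) -> list[str]:
    """Return paths that were archived but are missing from the live URL list."""
    live = {normalise_path(u) for u in live_urls}
    archived = {normalise_path(u) for u in archived_urls}
    return sorted(archived - live)
```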

3. Competitor analysis

See what competitors:

  • used to rank for
  • have removed or consolidated
  • previously targeted

4. Expired domain research

Check:

  • historical site structure
  • spam signals
  • content themes
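For a quick view of historical site structure, count archived URLs by their first path segment. This sketch uses collections.Counter. Grouping by the first segment is one simple heuristic, not the only way to read a site's structure.

```python
from collections import Counter
from urllib.parse import urlsplit


def section_counts(archived_urls: list[str]) -> Counter:
    """Count unique archived paths per top-level section, e.g. /blog or /shop."""
    paths = {urlsplit(u).path or "/" for u in archived_urls}
    counts: Counter = Counter()
    for path in paths:
        segment = path.strip("/").split("/")[0]
        counts["/" + segment if segment else "/"] += 1
    return counts
```

section_counts(urls).most_common(10) then lists the largest sections first.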

5. Content recovery

Find and restore:

  • deleted blog posts
  • product pages
  • landing pages
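Each CDX row contains enough to open the archived copy, because the Wayback Machine serves a capture at https://web.archive.org/web/<timestamp>/<original>. This sketch builds those links for status 200 captures. The optional id_ flag after the timestamp is assumed to return the page without the Wayback banner, which helps when copying content back into a CMS. Check it on a sample page first.

```python
def snapshot_url(timestamp: str, original: str, raw: bool = False) -> str:
    """Link to an archived capture; raw=True adds the id_ flag."""
    flag = "id_" if raw else ""
    return f"https://web.archive.org/web/{timestamp}{flag}/{original}"


def recovery_links(rows: list[dict[str, str]]) -> list[str]:
    """Build snapshot links for captures archived with status 200."""
    return [snapshot_url(r["timestamp"], r["original"])
            for r in rows if r.get("statuscode") == "200"]
```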