HTTrack is a website cloner for a web that doesn't exist anymore


Published: Mar 6, 2026

Apollo dying in Rocky's arms in Rocky IV (1985)

Trying to clone a website

I paste a URL into HTTrack and start the mirror.

Files begin appearing in the project folder - images, CSS, JavaScript bundles, fonts. It feels a little like watching a website get pulled apart piece by piece and laid neatly onto my disk.

When the crawl finishes, I open the mirrored folder and double-click index.html. Instead of the homepage, I see this.

404 Page Not Found on index.html — The *mirror* that HTTrack downloaded

The file I opened is literally index.html. The crawl completed without errors. HTTrack downloaded a bunch of files. The folder structure looks intact. But the homepage still says the page doesn’t exist.

The mirror looks fine

My first instinct is that something must have gone wrong with the mirror. I start checking the usual things. Maybe some assets didn’t download. Maybe some JavaScript file failed to load. But the mirror looks normal. The CSS files are there. The JavaScript bundles are there. The images are there. HTTrack seems to have done exactly what it was supposed to do.

So I start searching. Queries like “httrack not working,” “httrack javascript website,” and eventually “httrack react website.”

A pattern shows up quickly. HTTrack often struggles with React websites.

A small realization

Once I see that explanation, another observation starts to make sense.

The HTTrack website itself looks like it hasn’t been updated since the late 90s. When you land on the site, it almost feels like stepping into an older part of the internet. Plain HTML pages. Simple layout. Dense blocks of text. At first glance it looks outdated.

HTTrack's website is a yesteryear beauty queen

But it also reflects the kind of web HTTrack was built for. A web made of files.

The web changed

Around the same time, I start looking at how common React actually is now.

React is used by 85% of frontend developers — React is way out in the front

React usage has been climbing steadily for years. A large share of modern websites - especially the polished marketing pages and product sites people share online - are built with React.

The animated landing pages.
The interactive demos.
The startup sites with smooth transitions and elaborate UI details.

Many of the most cutting-edge landing pages are built with React.

Which means the problem I ran into isn’t unusual. It’s becoming more common.

What HTTrack actually does

For a long time, websites mostly consisted of files that already represented the finished page. When a browser requested a page, the server sent back HTML that already contained the text, the layout, and the structure of the page.

HTTrack works perfectly in that world.

It downloads the HTML, the CSS, the JavaScript files, the images, and whatever other assets the page references. Then it recreates the folder structure locally so those files can be loaded again without the original server. Open the mirrored page and the browser loads the same files. The page appears almost exactly as it did online.
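To make that concrete, here is a rough Python sketch of the idea: read the HTML, collect every URL it references, and map each one to a path inside a local folder. This is not HTTrack's code. The `AssetCollector` class, the `local_path` helper, the `mirror/<host>/...` layout, and the `example.com` page are all made up for illustration, and a real mirroring tool also has to rewrite links, follow CSS references, and handle many edge cases.

```python
from html.parser import HTMLParser
from pathlib import PurePosixPath
from urllib.parse import urljoin, urlparse


class AssetCollector(HTMLParser):
    """Collects URLs referenced by src/href attributes in static HTML."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("src", "href") and value:
                # Resolve relative references against the page URL.
                self.urls.append(urljoin(self.base_url, value))


def local_path(url, root="mirror"):
    """Map a URL to a file path under a local folder (hypothetical layout)."""
    parsed = urlparse(url)
    path = parsed.path
    if path == "" or path.endswith("/"):
        # Directory-style URLs are stored as an index.html file.
        path += "index.html"
    return str(PurePosixPath(root) / parsed.netloc / path.lstrip("/"))


# A tiny static page: everything the crawler needs is named in the markup.
page = """
<html><head><link rel="stylesheet" href="/css/site.css"></head>
<body><img src="img/logo.png"><a href="docs/">Docs</a></body></html>
"""

collector = AssetCollector("https://example.com/")
collector.feed(page)
paths = [local_path(u) for u in collector.urls]
for p in paths:
    print(p)
```

The key point is that the crawler only ever works from what is written in the files. If a URL or a piece of content is not in the markup, it never gets collected.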

Why React websites break

React sites work differently.

Instead of sending a finished page, the server often sends a minimal HTML shell and a JavaScript bundle. The browser loads the shell first. Then the JavaScript starts running and begins constructing the page dynamically.

Components render themselves. Routes are handled on the client side. Data is often fetched from APIs after the page has already loaded. A large portion of the page doesn’t exist yet when the initial HTML arrives.

HTTrack never sees that part. It downloads what the server sends: the shell, the scripts, the static assets. But much of the actual content only appears once the JavaScript executes in the browser and begins talking to APIs or assembling components.
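A small Python sketch shows the difference from the crawler's side. It pulls out the text that is readable in the raw HTML without running any JavaScript. Both sample pages are invented for illustration: `static_page` stands in for a traditional server-rendered page, and `spa_shell` for the kind of minimal shell a client-rendered React app might send (the `root` div and the `main.js` path are assumptions, not taken from any particular site).

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collects the text a non-JavaScript crawler can read from raw HTML."""

    SKIPPED = ("script", "style", "title")

    def __init__(self):
        super().__init__()
        self.skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text outside script/style/title elements.
        if not self.skip_depth and data.strip():
            self.chunks.append(data.strip())


def visible_text(html):
    parser = VisibleText()
    parser.feed(html)
    return " ".join(parser.chunks)


# Hypothetical server-rendered page: the content is already in the HTML.
static_page = """
<html><head><title>Pricing</title></head>
<body><h1>Pricing</h1><p>Plans start at $10.</p></body></html>
"""

# Hypothetical client-rendered shell: the content arrives later, via JavaScript.
spa_shell = """
<html><head><title>App</title></head>
<body><div id="root"></div><script src="/static/js/main.js"></script></body></html>
"""

print(repr(visible_text(static_page)))
print(repr(visible_text(spa_shell)))
```

For the static page the function returns the heading and the paragraph. For the shell it returns an empty string, because nothing has filled the `root` element yet.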

That’s where the strange behavior starts.

When you open the mirrored site, the files are technically all there. But the runtime environment the application expects is missing. API calls fail. Routes don’t resolve. The application falls back to its internal error state. Which is how I ended up opening index.html and seeing “Page Not Found.” The mirror technically succeeded. But the page itself never really existed in the files HTTrack downloaded.
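One plausible way this happens, sketched in Python: many client-side routers match the current URL path against a list of known routes and render a "not found" component when nothing matches. Opened from disk, the path is a filesystem path ending in index.html rather than `/`. The route table, component names, and file path below are hypothetical, and I am assuming path-based routing here. I have not confirmed this is exactly what happened on the site I mirrored.

```python
import re

# Hypothetical route table for a client-side router.
ROUTES = [
    (re.compile(r"^/$"), "HomePage"),
    (re.compile(r"^/pricing/?$"), "PricingPage"),
    (re.compile(r"^/blog/[^/]+/?$"), "BlogPost"),
]


def resolve(pathname):
    """Return the component for a path, or the fallback when no route matches."""
    for pattern, component in ROUTES:
        if pattern.match(pathname):
            return component
    return "NotFoundPage"


# On the live site the homepage is served at "/".
print(resolve("/"))

# Opened from a local mirror, the browser's path is the file's location on disk.
print(resolve("/Users/me/mirror/example.com/index.html"))
```

The first call resolves to `HomePage`. The second falls through to `NotFoundPage`, even though the file being opened is the homepage.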

Where HTTrack still works well

This doesn’t mean HTTrack is a bad tool.

It still works extremely well in the situations it was designed for. Static websites mirror beautifully. Documentation sites copy almost perfectly. Blogs download cleanly and can be browsed offline without any problems.

When a page already exists as a set of files on the server, HTTrack can recreate it with surprising accuracy. In that sense it behaves more like a web preservation tool than a development tool. Researchers use it to archive websites. People use it to save documentation locally.

It’s extremely good at capturing what the server actually sends to the browser.

When the mirror isn’t enough

Where things start to feel frustrating is when someone expects something different.

Sometimes the goal isn’t to archive the site. Sometimes the goal is to reuse it. To study a landing page layout. To copy the structure of a marketing page. To recreate a UI.

HTTrack downloads the files. But if the site is built as a React application, the mirror often isn’t usable.

The layout might be there, but the behavior is missing.

Buttons don’t work.
Sections stay blank.
Navigation breaks.

That’s usually the moment people start looking for a website cloner.

Why I built Cuttly

That’s the problem I ran into while trying to recreate pages.

HTTrack mirrors websites exactly as the server sends them. But modern websites often depend on things that only exist at runtime: JavaScript execution, client-side routing, API responses, application state.

If the goal is to preserve the site, that model works perfectly.

If the goal is to recreate the page so it can actually be reused, the mirror isn’t enough.

That’s the problem I built Cuttly to solve.

Instead of downloading the files exactly as they exist on the server, Cuttly reconstructs the page itself. The goal isn’t to archive the site but to produce a working copy of the page that can actually be edited and reused.

HTTrack mirrors websites. Cuttly recreates them.
