Reddit has announced that it will block the Internet Archive (IA) from indexing popular Reddit threads to prevent artificial intelligence (AI) firms from scraping the content for training purposes.
According to Reddit, it has found evidence that AI companies scraped its data through the Internet Archive’s Wayback Machine.
As a result of the block, the Wayback Machine will no longer archive Reddit pages, profiles, comments, and threads. From now on, only the news headlines and popular posts of a given day will be archived.
Reddit spokesperson Tim Rathschmidt said, “Internet Archive provides services to the open web, but we have been made aware of instances where AI companies violate platform policies, including ours, and scrape data from the Wayback Machine.”
“Until they are able to defend their site and comply with platform policies such as respecting content and user privacy, we are limiting some of their access to Reddit data to protect redditors,” Rathschmidt added.
Rathschmidt also urged the IA to make a concerted effort to tackle the issue of data scraping.
However, Reddit has not yet named the companies involved in these violations.
Recently, Reddit has taken steps to cut off access for scraping tools. In 2024, it struck a deal with Google covering AI training data and Google Search, and blocked search engines from using its data without payment.
It has also made API changes, driven by fears over AI training, that forced third-party apps to shut down.
In June 2025, Reddit filed a lawsuit against Anthropic, accusing it of scraping Reddit data.
In response, Mark Graham, director of the Wayback Machine, issued a statement emphasizing the long-standing relationship between the two organizations and their ongoing discussions about the matter.