The Internet Archive’s Wayback Machine provides free access to over 1 trillion archived web pages

Major publishers, including The New York Times and The Guardian, are increasingly blocking the Internet Archive’s Wayback Machine, which provides free access to over 1 trillion archived web pages.

Why?
Publishers claim the site exposes their content to a broader risk: being illegally scraped to train the large language models that power AI bots.

These concerns are bolstered by a 2023 Washington Post report showing that Internet Archive content was used to train Google’s T5 and Meta’s Llama models.

An analysis by Originality AI reveals that 241 news sites across nine countries now disallow the Internet Archive’s crawling bots. The trend is led by USA Today Co., which accounts for 87% of these restrictions.
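
For readers curious how such a block works: restrictions like these are usually declared in a site's robots.txt file, which crawlers are expected to honor voluntarily. The sketch below is illustrative only. The robots.txt contents, the example URL, and the helper name blocked_agents are hypothetical, and the user-agent tokens archive.org_bot and ia_archiver are assumed here to identify Internet Archive crawlers; none of this is taken from the Originality AI analysis.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body for an imaginary news site.
# Assumption: these two tokens identify Internet Archive crawlers.
SAMPLE_ROBOTS_TXT = """\
User-agent: archive.org_bot
Disallow: /

User-agent: ia_archiver
Disallow: /

User-agent: *
Allow: /
"""


def blocked_agents(robots_txt, agents, url="https://news.example/story"):
    """Return the crawlers from `agents` that the rules forbid from fetching `url`."""
    parser = RobotFileParser()
    # Parse the rules from text, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


# Check two assumed archive crawlers alongside an unrelated one.
blocked = blocked_agents(
    SAMPLE_ROBOTS_TXT,
    ["archive.org_bot", "ia_archiver", "Googlebot"],
)
print(blocked)
```

A survey along the lines described above would presumably run a check like this against each site's live robots.txt, though the method actually used may differ.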

Meanwhile, over 100 journalists have signed a letter in support of the Internet Archive’s preservation mission.

Beyond newspapers, Reddit also cut off the online repository’s access to its site last August.