Daily Shorts

The Internet Archive’s Wayback Machine provides free access to over 1 trillion archived web pages.

Major publishers, including The New York Times and The Guardian, are increasingly blocking the Internet Archive’s Wayback Machine, which provides free access to over 1 trillion archived web pages.

Why?
Publishers claim the site poses a risk that their content will be illegally scraped to train the large language models powering AI bots.

These concerns are bolstered by a 2023 Washington Post report showing the Internet Archive was used to train Google’s T5 and Meta’s Llama.

An analysis by Originality AI reveals that 241 news sites across nine countries now disallow the Internet Archive’s crawling bots. USA Today Co. leads the list, accounting for 87% of these restrictions.

Meanwhile, over 100 journalists have signed a letter in support of the Internet Archive’s preservation mission.

Apart from newspapers, Reddit cut off the online repository’s access last August.

Major news organizations are blocking the Wayback Machine