Publishers are blocking the Internet Archive for fear AI scrapers can use it as a workaround

The Internet Archive has long been a valuable resource for journalists, whether for finding records of deleted tweets or providing academic texts for background research. However, the advent of AI has created a new tension between the archive and news publishers. A few major publications have begun blocking the nonprofit digital library's access to their content based on concerns that AI companies' bots are using the Internet Archive's collections to indirectly scrape their articles.

"A lot of these AI businesses are looking for readily available, structured databases of content," Robert Hahn, head of business affairs and licensing for The Guardian, told Nieman Lab. "The Internet Archive’s API would have been an obvious place to plug their own machines into and suck out the IP."

The New York Times took a similar step. "We are blocking the Internet Archive's bot from accessing the Times because the Wayback Machine provides unfettered access to Times content — including by AI companies — without authorization," a representative from the newspaper confirmed to Nieman Lab. Subscription-focused publication the Financial Times and social forum Reddit have also made moves to selectively block how the Internet Archive catalogs their material.

Many publishers have attempted to sue AI businesses for how they access content used to train large language models. To name a few just from the realm of journalism:

Other media outlets have sought financial deals before offering up their libraries as training material, although those arrangements seem to compensate the publishing companies rather than the writers. And that's not even delving into the copyright and piracy fights that other creative fields, from fiction writers to visual artists to musicians, are also waging against AI tools. The whole Nieman Lab story is well worth a read for anyone who has been following any of these creative industries’ responses to artificial intelligence.

This article originally appeared on Engadget at https://ift.tt/waxuPFR
