
Reddit sues Perplexity and three other companies for allegedly using its content without paying

Reddit is suing SerpApi, Oxylabs, AWMProxy and Perplexity for allegedly scraping its data from search results and using it without a license, The New York Times reports. The new lawsuit follows Reddit's earlier legal action against AI startup Anthropic, which allegedly used Reddit content to train its Claude chatbot.

As of 2023, Reddit charges companies for access to posts and other content, in the hopes of making money on data that could be used for AI training. The company has also signed licensing deals with companies like Google and OpenAI, and has even built an AI answer machine of its own to leverage the knowledge in users' posts. Scraping search results for Reddit content sidesteps those payments, which is why the company is seeking financial damages and a permanent injunction that would prevent the companies from selling previously scraped Reddit material.

Some of the companies Reddit is targeting, like SerpApi, Oxylabs and AWMProxy, are not exactly household names, but they've all made collecting data from search results and selling it a key part of their business. Perplexity's inclusion in the lawsuit is more obvious. The AI company needs data to train its models, and it has already been caught seemingly copying and regurgitating material it hasn't paid to license. That reportedly includes ignoring the robots.txt protocol, a way for websites to communicate that they don't want their material scraped.
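For context, robots.txt is just a plain text file of per-crawler rules served from a site's root. The short sketch below uses Python's standard urllib.robotparser module to show how a well-behaved crawler is supposed to consult those rules before fetching a page; the bot names and URL are hypothetical, not taken from the lawsuit.

    from urllib import robotparser

    # A minimal, made-up robots.txt rule set: one crawler is told to stay
    # out entirely, everyone else is allowed.
    rules = [
        "User-agent: ExampleAnswerBot",
        "Disallow: /",
        "",
        "User-agent: *",
        "Allow: /",
    ]

    rp = robotparser.RobotFileParser()
    rp.parse(rules)

    # A crawler that respects the protocol checks before fetching a page.
    print(rp.can_fetch("ExampleAnswerBot", "https://example.com/r/some-post"))  # False
    print(rp.can_fetch("SomeOtherBot", "https://example.com/r/some-post"))      # True

The protocol is purely advisory: nothing technically stops a crawler from ignoring the answer, which is part of why Reddit is leaning on licensing deals and litigation instead.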

Per a copy of the lawsuit provided to Engadget, Reddit had already sent Perplexity a cease-and-desist asking it to stop scraping posts without a license. Perplexity claimed it didn't use Reddit data, but it continued to cite the platform in its chatbot's answers. Reddit says it was able to prove Perplexity was using scraped Reddit content by creating a "test post" that "could only be crawled by Google’s search engine and was not otherwise accessible anywhere on the internet." Within a few hours, queries made to Perplexity's answer engine were able to reproduce the content of the post.

"The only way that Perplexity could have obtained that Reddit content and then used it in its 'answer engine' is if it and/or its co-defendants scraped Google [search results] for that Reddit content and Perplexity then quickly incorporated that data into its answer engine," the lawsuit claims.

When asked to comment, Perplexity provided the following statement:

Perplexity has not yet received the lawsuit, but we will always fight vigorously for users’ rights to freely and fairly access public knowledge. Our approach remains principled and responsible as we provide factual answers with accurate AI, and we will not tolerate threats against openness and the public interest.

This new lawsuit fits with the aggressive stance Reddit has taken towards protecting its data, including rate-limiting unknown bots and web crawlers in 2024, and even limiting what access the Internet Archive's Wayback Machine has to its site in August 2025. The company has also sought to define new terms around how websites are crawled by adopting the Really Simple Licensing standard, which adds licensing terms to robots.txt.
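The Really Simple Licensing (RSL) standard works by extending that same robots.txt file with a pointer to machine-readable license terms. The lines below are a loose, illustrative sketch of that approach, not Reddit's actual configuration; the URL is made up and the exact directive syntax should be checked against the published RSL spec.

    User-agent: *
    Allow: /
    # RSL: point crawlers at a machine-readable license document
    License: https://example.com/license.xml

A crawler that honors RSL is expected to fetch the referenced license document and comply with its terms, such as paying for AI-training use, before ingesting the site's content.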

This article originally appeared on Engadget at https://ift.tt/iQCnHNW
