
Reddit sues Perplexity and three other companies for allegedly using its content without paying

Reddit is suing SerpApi, Oxylabs, AWMProxy and Perplexity for allegedly scraping its data from search results and using it without a license, The New York Times reports. The new lawsuit follows Reddit's earlier legal action against AI startup Anthropic, which allegedly used Reddit content to train its Claude chatbot.

As of 2023, Reddit charges companies for access to posts and other content, in the hopes of making money on data that could be used for AI training. The company has also signed licensing deals with companies like Google and OpenAI, and has even built an AI answer machine of its own to leverage the knowledge in users' posts. Scraping search results for Reddit content sidesteps those payments, which is why the company is seeking financial damages and a permanent injunction that would prevent the companies from selling previously scraped Reddit material.

Some of the companies Reddit is targeting, like SerpApi, Oxylabs and AWMProxy, are not exactly household names, but they've all made collecting data from search results and selling it a key part of their business. Perplexity's inclusion in the lawsuit is more obvious. The AI company needs data to train its models, and it has already been caught seemingly copying and regurgitating material it hasn't paid to license. That reportedly includes ignoring the robots.txt protocol, a way for websites to communicate that they don't want their material scraped.
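For context, robots.txt is just a plain text file of per-crawler rules served from a site's root. The short sketch below uses Python's standard urllib.robotparser module to show how a well-behaved crawler is supposed to consult those rules before fetching a page; the bot names and URL are hypothetical, not taken from the lawsuit.

    from urllib import robotparser

    # A minimal, made-up robots.txt rule set: one crawler is told to stay
    # out entirely, everyone else is allowed.
    rules = [
        "User-agent: ExampleAnswerBot",
        "Disallow: /",
        "",
        "User-agent: *",
        "Allow: /",
    ]

    rp = robotparser.RobotFileParser()
    rp.parse(rules)

    # A crawler that respects the protocol checks before fetching a page.
    print(rp.can_fetch("ExampleAnswerBot", "https://example.com/r/some-post"))  # False
    print(rp.can_fetch("SomeOtherBot", "https://example.com/r/some-post"))      # True

The protocol is purely advisory: nothing technically stops a crawler from ignoring the answer, which is part of why Reddit is leaning on licensing deals and litigation instead.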

Per a copy of the lawsuit provided to Engadget, Reddit had already sent Perplexity a cease-and-desist asking it to stop scraping posts without a license. Perplexity claimed it didn't use Reddit data, but it continued to cite the platform in its chatbot's answers. Reddit says it was able to prove Perplexity was using scraped Reddit content by creating a "test post" that "could only be crawled by Google’s search engine and was not otherwise accessible anywhere on the internet." Within a few hours, queries made to Perplexity's answer engine were able to reproduce the content of the post.

"The only way that Perplexity could have obtained that Reddit content and then used it in its 'answer engine' is if it and/or its co-defendants scraped Google [search results] for that Reddit content and Perplexity then quickly incorporated that data into its answer engine," the lawsuit claims.

When asked to comment, Perplexity provided the following statement:

Perplexity has not yet received the lawsuit, but we will always fight vigorously for users’ rights to freely and fairly access public knowledge. Our approach remains principled and responsible as we provide factual answers with accurate AI, and we will not tolerate threats against openness and the public interest.

This new lawsuit fits with the aggressive stance Reddit has taken towards protecting its data, including rate-limiting unknown bots and web crawlers in 2024, and even limiting what access the Internet Archive's Wayback Machine has to its site in August 2025. The company has also sought to define new terms around how websites are crawled by adopting the Really Simple Licensing standard, which adds licensing terms to robots.txt.
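The Really Simple Licensing (RSL) standard works by extending that same robots.txt file with a pointer to machine-readable license terms. The lines below are a loose, illustrative sketch of that approach, not Reddit's actual configuration; the URL is made up and the exact directive syntax should be checked against the published RSL spec.

    User-agent: *
    Allow: /
    # RSL: point crawlers at a machine-readable license document
    License: https://example.com/license.xml

A crawler that honors RSL is expected to fetch the referenced license document and comply with its terms, such as paying for AI-training use, before ingesting the site's content.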

This article originally appeared on Engadget at https://ift.tt/iQCnHNW
