Anthropic Claims Chinese AI Firms ‘Distilled’ Claude to Train Their Models
Questions about how AI models can be copied and replicated are moving from theory into active security debates after Anthropic, the developer of the Claude AI chatbot, accused several companies of attempting to extract knowledge from the Claude language model. In a recent blog post, the company said it detected coordinated activity aimed at using Claude outputs to train competing systems, a practice known as model distillation. In AI, distillation means training a new model on the outputs of an existing model rather than on original training data. While the process has legitimate uses across the industry, Anthropic argues that large-scale automated querying designed to replicate a model’s capabilities crosses into abuse.
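To make the idea of distillation concrete, here is a minimal sketch in plain Python. It is a toy illustration, not a description of how any company named in this article trains its models. The `teacher` function is a hypothetical stand-in for an existing model, and the "student" is a two-parameter model fitted only to the teacher's answers, with no access to the teacher's internals or training data.

```python
import math
import random


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def teacher(x: float) -> float:
    """Hypothetical existing model: returns a probability for input x."""
    return sigmoid(3.0 * x - 1.0)


def distill(queries, epochs=2000, lr=0.5):
    """Fit a student sigmoid(w * x + b) using only the teacher's outputs."""
    # Step 1: query the teacher and record its answers (soft labels).
    soft_labels = [teacher(x) for x in queries]
    w, b = 0.0, 0.0
    n = len(queries)
    # Step 2: gradient descent on the cross-entropy between teacher and student.
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, t in zip(queries, soft_labels):
            p = sigmoid(w * x + b)
            grad_w += (p - t) * x
            grad_b += p - t
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


random.seed(0)
queries = [random.uniform(-2.0, 2.0) for _ in range(200)]
w, b = distill(queries)

# Largest disagreement between student and teacher on a few held-out inputs.
max_gap = max(
    abs(teacher(x) - sigmoid(w * x + b)) for x in [-1.5, -0.5, 0.0, 0.5, 1.5]
)
```

In this toy setting the student ends up closely matching the teacher after seeing nothing but question-and-answer pairs, which is the property that makes large-scale querying attractive to anyone trying to replicate a model.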
According to the company, investigators observed patterns suggesting that DeepSeek and two other China-based AI firms, MiniMax and Moonshot AI, accessed Claude in ways intended to extract structured responses at scale. Anthropic claims these activities involved bypassing platform safeguards and export restrictions tied to advanced chips and software, raising concerns that the effort required coordination beyond normal usage. In the case of DeepSeek, researchers reported more than 150,000 exchanges focused on reasoning tasks across different domains, as well as rubric-based grading workflows that effectively turned Claude into a reward model for reinforcement learning.
As for the other two firms, Anthropic attributes more than 3.4 million exchanges to Moonshot AI, which it says concentrated on agentic reasoning, coding and data analysis, computer-use agents, and computer vision workflows. MiniMax accounted for the largest volume at over 13 million exchanges, with activity focused on agentic coding and tool orchestration, areas that allow AI systems to plan tasks and coordinate multiple functions. According to Anthropic, the structured nature and volume of these interactions indicated systematic data collection rather than ordinary user behaviour.
Anthropic said it is developing detection systems designed to identify suspicious querying patterns associated with distillation attacks. These include monitoring for unusual prompt sequences, automated request patterns, and attempts to harvest structured knowledge in bulk.
Security experts say the issue extends beyond major AI labs. William Wright, CEO of Closed Door Security, warned that any organisation building customised AI assistants or chatbots could face similar risks if adversaries attempt to replicate proprietary knowledge through prompting alone.
“The statement from Anthropic highlights a threat that most businesses are not talking about,” Wright said. “Distillation doesn’t just raise misalignment risks: it means that any company that has built a customised AI chatbot, agent, or assistant has effectively packaged its proprietary knowledge into something that can be queried, and therefore copied.”
“An attacker does not need access to the code or the training data to steal business IP; they just need to prompt the model,” he said.
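For organisations wondering what monitoring for automated, bulk querying might look like in its simplest form, the sketch below flags accounts that send a high volume of near-identical prompts. It is an illustrative heuristic only and does not describe Anthropic's detection systems. The function names, the thresholds (`min_requests`, `min_repeat_share`) and the log format are all assumptions made for the example.

```python
import re
from collections import Counter


def template_of(prompt: str) -> str:
    """Collapse numbers and whitespace so near-identical prompts share a template."""
    collapsed = re.sub(r"\d+", "<NUM>", prompt.lower())
    return re.sub(r"\s+", " ", collapsed).strip()


def flag_accounts(logs, min_requests=100, min_repeat_share=0.8):
    """Return account ids whose traffic is both high-volume and highly repetitive.

    logs: dict mapping an account id to the list of prompts it sent.
    Both thresholds are illustrative assumptions, not recommended values.
    """
    flagged = []
    for account, prompts in logs.items():
        if len(prompts) < min_requests:
            continue  # too little traffic to judge
        counts = Counter(template_of(p) for p in prompts)
        top_share = counts.most_common(1)[0][1] / len(prompts)
        if top_share >= min_repeat_share:
            flagged.append(account)
    return sorted(flagged)


topics = ["billing", "refunds", "shipping", "returns", "login", "pricing"]
logs = {
    # One fixed template repeated hundreds of times with only a number changed.
    "acct_bulk": [f"Grade answer {i} against rubric 7 and return JSON." for i in range(500)],
    # High volume, but spread across several different kinds of question.
    "acct_varied": [f"{topics[i % 6]} question {i}" for i in range(120)],
    # Repetitive, but far too few requests to matter.
    "acct_small": ["What is your refund policy?"] * 5,
}
flagged = flag_accounts(logs)
```

A real system would need far richer signals than this, since an attacker can vary wording or spread requests across many accounts, but the example shows why volume and structure together stand out from ordinary use.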