Oh No! Someone Stole My LLM!
How model theft exposes the urgent need for AI-focused security and privacy
Wait a Second: Can AI Models Be Stolen?
A 2023 RTInsights article reports on a survey of 300 senior cybersecurity leaders, finding that 20% of companies experienced an AI model attack or compromise in the prior 12 months. That’s literally one in five businesses reporting incidents involving their AI systems. OWASP’s “Machine Learning Security Top Ten 2023” explicitly calls out Model Theft (ML05) as a distinct category of attack where adversaries extract or clone model parameters, urging encryption, strict access controls, and obfuscation to defend against it.
According to HiddenLayer's AI Threat Landscape report, 74% of organizations "definitely knew" they'd had an AI-related security breach in 2024 (up from 67% the year before), underscoring how rapidly these threats are growing. In that same RTInsights survey, 68% of respondents worried employees might leak sensitive data via tools like ChatGPT and 49% feared threat actors poisoning their AI/ML models.
U.S. prosecutors unsealed a 14‑count federal indictment accusing former Google engineer Linwei “Leon” Ding of economic espionage and trade‑secret theft, alleging he stole “artificial intelligence trade secrets” (including details of the hardware infrastructure and software platform for training large models) to benefit two Chinese companies.
Reuters/New York Times reported that a hacker breached OpenAI’s internal messaging systems in 2023 and “stole details about the design of the company’s artificial intelligence technologies,” which further shows how even leading AI labs are not immune to model‑related data theft.
In December 2024, researchers at North Carolina State University demonstrated a novel "model-stealing" technique that recovers nearly the entire model purely by feeding it noise inputs and observing the outputs, without ever hacking the server itself. In other words, a remote attacker can clone an AI model through nothing more than its public API.
Research on AI model theft goes back years: multiple studies have shown that attackers can reconstruct models by issuing large numbers of queries to machine-learning APIs. What started as proofs of concept has since been recognized as a real-world threat against commercial AI services.
What Is Model Theft?
Model theft (also called model extraction or cloning) is the unauthorized copying or exfiltration of an AI system’s proprietary components. At its core, it means an attacker walks away with the “secret sauce”: the model architecture, learned weights, or training data you’ve invested time and money to develop.
In other words, the learned patterns and rules that make a model smart are obtained and cloned without permission, often to profit, compete, or cause harm. As mentioned earlier, attackers don't even need backdoor access: they can bombard a model's public interface with queries and train a copy that mimics its outputs, a technique researchers have demonstrated since 2017 and recently refined with noise-based tricks in 2024.
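To make the mechanics concrete, here is a minimal sketch of a query-based extraction attack, using scikit-learn as a stand-in for a deployed prediction service. Everything here is illustrative: `victim_predict` plays the role of a public endpoint, and the attacker never touches the victim's weights, only its answers.

```python
# Toy illustration of query-based model extraction (all names hypothetical).
# A "victim" classifier stands in for a deployed prediction API; the attacker
# only ever sees its outputs.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Victim: the proprietary model behind a public API.
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
victim = LogisticRegression(max_iter=1000).fit(X, y)

def victim_predict(queries):
    """Stand-in for the public inference endpoint: inputs in, labels out."""
    return victim.predict(queries)

# Attacker: send synthetic (even pure-noise) queries and record the answers.
queries = rng.normal(size=(5000, 10))
labels = victim_predict(queries)

# Train a surrogate ("stolen") model purely on the query/response pairs.
surrogate = DecisionTreeClassifier().fit(queries, labels)

# Check how closely the clone mimics the victim on fresh inputs.
test = rng.normal(size=(1000, 10))
agreement = (surrogate.predict(test) == victim_predict(test)).mean()
print(f"Surrogate agrees with victim on {agreement:.1%} of fresh queries")
```

Real attacks on large models are far more query-efficient and use carefully chosen inputs, but the loop is the same: query, record the responses, retrain a copy.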
Why Does Model Theft Still Happen?
Despite being a known risk for years, model theft remains common because of a combination of human, technical, and economic factors:
Stealing a well‑trained model can be far cheaper than re‑training from scratch. Once exfiltrated, a cloned model can be re‑branded, deployed by shadow players, or reverse‑engineered for competitive advantage.
Exposing predictive services via public APIs is business-critical, but each open endpoint is a potential extraction channel if it isn't properly rate-limited or monitored (a minimal rate-limiting sketch follows this list).
Research into extraction and side-channel techniques has accelerated, and readily available libraries can automate reconstruction with minimal expertise.
Encryption at rest, strict access controls, watermarking, and obfuscation are still not universally adopted or standardized in many AI deployments, particularly for smaller teams or rapid prototypes.
Organizations often avoid publicizing breaches to protect reputation or client trust. Even when theft occurs, it can be hard to trace back to a specific vulnerability or threat actor, blunting incentives for rigorous disclosure and defense.
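As a rough illustration of the rate-limiting point above, here is a minimal per-client, sliding-window query budget. The window size, limit, and client ID are hypothetical; a real deployment would enforce this at an API gateway and pair it with alerting.

```python
# Minimal sketch of a per-client sliding-window query budget for an inference
# endpoint (hypothetical thresholds and client IDs).
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 60
MAX_QUERIES_PER_WINDOW = 100   # beyond this, traffic looks less like a user
                               # and more like an extraction script

_recent = defaultdict(deque)   # client_id -> timestamps of recent queries

def allow_query(client_id, now=None):
    """Return True if the client is still within its budget for the window."""
    now = time.time() if now is None else now
    q = _recent[client_id]
    # Evict timestamps that have slid out of the window.
    while q and now - q[0] > WINDOW_SECONDS:
        q.popleft()
    q.append(now)
    return len(q) <= MAX_QUERIES_PER_WINDOW

# Example: a scripted burst of 500 queries in five seconds is mostly rejected,
# while a normal interactive user would never hit the limit.
rejected = sum(not allow_query("client-42", now=i * 0.01) for i in range(500))
print(f"{rejected} of 500 burst queries rejected")
```

Sustained rejections from a single key are also a useful signal to feed into the monitoring and incident-response practices discussed below.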
What Can Be Done to Counter Model Theft Attacks?
Mitigating model theft demands a blend of best practices, from standard cryptography to hardware-backed isolation and privacy techniques that protect inference and parameter access:
Encrypt model files, enforce strict API quotas, embed watermarks, and adopt “zero‑trust” architecture around AI assets.
Continuous monitoring of API usage patterns, regular security audits of storage and communication channels, and swift incident response playbooks.
Clear documentation of AI ownership boundaries, developer training on secure deployment, and industry-wide standards for watermarking or fingerprinting models (a toy watermarking sketch appears after this list).
Run models inside Trusted Execution Environments (TEEs), hardware enclaves such as Intel SGX, AMD SEV, or Arm TrustZone. That way, even if someone compromises a host, they can't directly read out the raw weights or intermediate activations.
Use differentially private inference, adding calibrated noise or clipping to API outputs so that attackers can't precisely reconstruct model parameters while end-users still get high-utility predictions (see the sketch below).
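As a rough illustration of that last point, here is a minimal sketch of noised inference using the Laplace mechanism. The epsilon, clipping bound, and scores are illustrative only and do not constitute a calibrated privacy guarantee.

```python
# Toy sketch of differentially private inference: clip the raw class scores,
# then add Laplace noise before returning them to the caller.
# Epsilon, the clipping bound, and the example scores are all illustrative.
import numpy as np

rng = np.random.default_rng(0)

def dp_scores(raw_scores, epsilon=5.0, clip=5.0):
    """Clip scores to [-clip, clip], then add Laplace noise scaled to that
    range divided by epsilon (smaller epsilon = more noise = stronger privacy)."""
    clipped = np.clip(raw_scores, -clip, clip)
    sensitivity = 2 * clip  # maximum possible change in a clipped score
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon, size=clipped.shape)
    return clipped + noise

# The same query returns slightly different scores each time, so an attacker
# cannot pin down the model's exact outputs by observation alone.
raw = np.array([3.2, -1.5, 0.7])  # hypothetical per-class scores
for _ in range(3):
    print(dp_scores(raw))
```

In practice the noise scale is tuned against a privacy budget, and many services go further by returning only top-k labels or coarsely rounded confidences.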
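And as a toy illustration of the watermarking idea mentioned above, here is a trigger-set sketch (again using scikit-learn, with illustrative names and sizes): the owner trains the model to memorize a handful of secret inputs with chosen labels, then checks a suspect model's accuracy on those triggers to argue provenance.

```python
# Toy trigger-set watermarking sketch (illustrative, not a production scheme).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(7)

# Normal training data.
X, y = make_classification(n_samples=2000, n_features=10, random_state=7)

# Secret trigger set: random inputs paired with labels only the owner knows.
triggers = rng.normal(size=(20, 10))
trigger_labels = rng.integers(0, 2, size=20)

# Train on the union so the model memorizes the watermark alongside the task.
model = RandomForestClassifier(n_estimators=200, random_state=7)
model.fit(np.vstack([X, triggers]), np.concatenate([y, trigger_labels]))

# Ownership check: a suspect model that answers the secret triggers correctly
# far above chance is very likely derived from the watermarked original.
trigger_accuracy = (model.predict(triggers) == trigger_labels).mean()
print(f"Trigger-set accuracy: {trigger_accuracy:.0%}")
```

Near-chance accuracy on the triggers suggests an independently trained model; near-perfect accuracy is strong evidence the model was copied or fine-tuned from the watermarked original.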
What Now?
AI systems are undeniably powerful and promise immense value, but that very power makes them prime targets for theft, tampering, and misuse. As models grow more capable and the services they enable become more critical, organizations must embed advanced security and privacy-enhancing technologies throughout the AI lifecycle. From hardware-backed enclaves and encryption to rigorous access controls, differential privacy, and continual monitoring, the "secret sauce" that drives AI must be protected just as relentlessly as the data it ingests. Only by matching AI's rising utility with equally robust defenses can we ensure these transformative tools remain both innovative and secure.