* This blog post is a summary of this video.

Protecting Large Language Models Like GPT-4 From Cyber Theft

Author: Asianometry
Time: 2024-02-10 05:50:01


The Challenges of Protecting Massive AI Models and Datasets

At its core, a Large Language Model (LLM) can be reduced to just two files: a short program (around 500 lines of C code) that runs the model, and a file holding billions, even hundreds of billions, of seemingly random numbers that represent the model's parameters. Current evaluations suggest that the more parameters a model has and the more data it is trained on, the more capable it becomes. These models and their training data therefore have tremendous economic value and embody proprietary trade secrets. Separated from their safety systems, they can also be put to malicious use.

There is a significant incentive for unethical parties to steal these assets. State actors, rogue AI labs or hacktivists could skip years of expensive development by exfiltrating a leading model like GPT-4. The main challenge for a thief is getting these multi-terabyte models and datasets out undetected, given their sheer size.
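To see why size is the obstacle, note that the raw size of a model's weights is roughly its parameter count multiplied by the bytes used to store each parameter. The Python sketch below does that arithmetic. The parameter counts are illustrative assumptions, not disclosed figures for GPT-4, and the estimate ignores optimizer state, extra checkpoints and file-format overhead.

```python
# Approximate bytes needed to store one parameter at common precisions.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "int8": 1, "int4": 0.5}


def weight_size_gb(num_params: float, dtype: str = "fp16") -> float:
    """Raw weight size in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


# Illustrative parameter counts only; real frontier-model sizes are often undisclosed.
for params in (70e9, 175e9, 1e12):
    print(f"{params / 1e9:,.0f}B parameters -> {weight_size_gb(params):,.0f} GB at fp16")
```

Under these assumptions, a hypothetical one-trillion-parameter model stored at 16-bit precision is about 2 TB of weights, before any training data is counted.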

Exfiltration Techniques Used to Steal AI Assets

Well-studied exfiltration techniques, such as hiding stolen data inside innocuous image, video and email files, can work. However, exfiltrating a terabyte-scale model this way would likely take a prohibitive amount of time and effort. At the same time, because LLMs must be deeply embedded into products and distributed across dozens of servers for inference, they open new attack surfaces: attackers may no longer need physical access to a lab's premises and can instead target third-party cloud data centers.
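A back-of-the-envelope calculation shows why hiding a model inside ordinary files is so slow. The Python sketch below assumes classic least-significant-bit embedding at one hidden bit per cover byte (a generous capacity; real schemes usually carry less to avoid detection) and a hypothetical "low and slow" outbound rate of 10 Mbit/s chosen to stay under traffic alarms. Both numbers are assumptions for illustration.

```python
SECONDS_PER_DAY = 86_400


def stego_cover_bytes(payload_bytes: float, bits_per_cover_byte: float = 1.0) -> float:
    """Cover data needed when each cover byte carries `bits_per_cover_byte` hidden bits
    (1.0 corresponds to classic least-significant-bit embedding)."""
    return payload_bytes * 8 / bits_per_cover_byte


def transfer_days(num_bytes: float, bits_per_second: float) -> float:
    """Days needed to move num_bytes at a sustained rate of bits_per_second."""
    return num_bytes * 8 / bits_per_second / SECONDS_PER_DAY


payload = 1e12   # assumed 1 TB of model weights
rate = 10e6      # assumed 10 Mbit/s sustained exfiltration rate
cover = stego_cover_bytes(payload)

print(f"Raw transfer:   {transfer_days(payload, rate):.1f} days")
print(f"Stego cover:    {cover / 1e12:.0f} TB of innocuous files")
print(f"Stego transfer: {transfer_days(cover, rate):.1f} days")
```

With these inputs, moving the raw terabyte takes roughly 9 days, but smuggling it inside 8 TB of cover files takes roughly 74 days of continuous, unnoticed traffic.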

How Embedding LLMs Creates New Security Issues

During inference, copies of the LLM are spread across data centers, with the model's weights sitting unencrypted in GPU memory. Adversaries could steal the parameters, along with user data, by monitoring the memory bus or attaching physical probes. Such attacks were not a priority in the past because they require physical access. There is also an inherent tradeoff between protecting data in use and maintaining model performance: heavily restricting access to memory adds latency that degrades the user experience, and companies are unlikely to accept products whose capabilities are significantly reduced by security measures.

Securing AI Models and Data In-Use

The Linux Foundation's Confidential Computing Consortium promotes hardware-based approaches to securing data in use across computing environments. The central technology is the Trusted Execution Environment (TEE), which isolates an application and its memory from the rest of the system, including the host operating system, and can produce verifiable attestations about which programs are running inside it.
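Attestation is what lets a remote party trust a TEE: the hardware measures (hashes) the code it loaded, signs that measurement together with a fresh challenge, and the verifier releases secrets, such as a key that decrypts the model weights, only if the measurement matches an approved build. The Python sketch below shows that flow under loud assumptions. Real TEEs sign reports with asymmetric, vendor-certified keys that are checked against a certificate chain; here a shared HMAC key (`DEVICE_KEY`, hypothetical) stands in so the example stays in the standard library. All function names are illustrative, not any vendor's API.

```python
import hashlib
import hmac
import secrets
from typing import Optional, Tuple

# HYPOTHETICAL stand-in for the hardware root of trust. Real TEEs use an
# asymmetric, vendor-certified signing key; HMAC keeps this sketch stdlib-only.
DEVICE_KEY = secrets.token_bytes(32)


def measure(code: bytes) -> bytes:
    """A measurement is a cryptographic hash of the code loaded into the TEE."""
    return hashlib.sha256(code).digest()


def generate_report(code: bytes, nonce: bytes) -> Tuple[bytes, bytes]:
    """Runs 'inside' the TEE: bind the measurement to the verifier's nonce and sign it."""
    measurement = measure(code)
    tag = hmac.new(DEVICE_KEY, measurement + nonce, hashlib.sha256).digest()
    return measurement, tag


def verify_report(measurement: bytes, tag: bytes, nonce: bytes, expected: bytes) -> bool:
    """Check the signature over (measurement, nonce) and that the code is an approved build."""
    good_tag = hmac.new(DEVICE_KEY, measurement + nonce, hashlib.sha256).digest()
    return hmac.compare_digest(tag, good_tag) and hmac.compare_digest(measurement, expected)


def release_model_key(measurement: bytes, tag: bytes, nonce: bytes,
                      expected: bytes, model_key: bytes) -> Optional[bytes]:
    """Hand the weight-decryption key only to an enclave that passed attestation."""
    return model_key if verify_report(measurement, tag, nonce, expected) else None


approved = b"inference-server v1.2"       # hypothetical approved build
expected = measure(approved)
model_key = secrets.token_bytes(32)       # hypothetical key protecting the weights
nonce = secrets.token_bytes(16)           # fresh challenge so old reports cannot be replayed

print(release_model_key(*generate_report(approved, nonce), nonce, expected, model_key) is not None)

tampered = b"inference-server v1.2 + memory dumper"
print(release_model_key(*generate_report(tampered, nonce), nonce, expected, model_key) is not None)
```

The approved build receives the key while the tampered build does not, even though its report is validly signed, because its measurement differs. The fresh nonce is the design choice that stops an attacker from replaying a report captured from a legitimate enclave.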

TEEs were first developed for CPUs and later extended to GPUs such as Nvidia's H100. The H100 splits its memory into protected and unprotected regions. In confidential computing mode, nothing outside the TEE can access protected memory, and once the TEE is initialized, communication between the CPU and GPU is encrypted.

Many experts see Confidential Computing as a promising way to address the long-standing vulnerability of data in use without a significant loss of performance.

Vulnerabilities Unique to Large Language Models

Remote adversaries can also exploit attack vectors unique to LLMs, with no need to breach a data center. Model extraction uses a model's public API to build a copy of it, while training-data extraction reveals what data the model was trained on, sometimes verbatim. Both are described below.

Insider attacks are always a concern when protecting national assets. However, cybersecurity reports indicate that most data breaches originate from outsiders rather than insiders. The goal should therefore be strong technical controls that do not rely solely on trusting individuals.

Model Extraction Attacks

Attackers can send large numbers of inputs to a commercial LLM's API and use the outputs to train a student model, incrementally replicating the target. With clever prompting, it may take only tens of thousands of queries to extract a specialty model focused on a topic like medicine or law.
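The core idea, using the victim's answers as training labels, can be shown with a deliberately tiny example. In the Python sketch below, a hypothetical `victim_api` hides a linear model and the attacker sees only its outputs; fitting ordinary least squares to the query/response pairs recovers the hidden weights. This illustrates the principle only: extracting an LLM means training a neural student on prompt/response pairs, which is far costlier and reproduces behavior approximately rather than exactly.

```python
import random
from typing import Callable, Tuple


def victim_api(x: float) -> float:
    """HYPOTHETICAL black-box model: the attacker sees inputs and outputs only."""
    return 3.0 * x + 2.0  # hidden "weights" the attacker wants to copy


def extract_linear_student(api: Callable[[float], float], num_queries: int,
                           seed: int = 0) -> Tuple[float, float]:
    """Query the API, then fit a student (slope, intercept) to its answers."""
    rng = random.Random(seed)
    xs = [rng.uniform(-10, 10) for _ in range(num_queries)]
    ys = [api(x) for x in xs]  # the victim's outputs become training labels

    # Ordinary least squares for a single input feature.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = extract_linear_student(victim_api, num_queries=100)
print(f"student: y = {slope:.4f} * x + {intercept:.4f}")
```

Because the toy victim is noiseless and linear, a hundred queries suffice to copy it; rate limits, query monitoring and output perturbation are the usual defenses against the real-world version.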

Extracting the Training Data

Adversaries can use membership inference attacks to determine whether specific data was used to train an LLM. And because LLMs memorize parts of their training data, attackers can also craft prompts that make the model reveal proprietary text verbatim, violating privacy and copyright.
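A common baseline for membership inference is a loss threshold: models tend to assign lower loss (higher likelihood) to text they were trained on. The Python sketch below assumes a toy bigram language model with add-one smoothing as a stand-in for an LLM, and a threshold of 2.5 nats per token chosen by hand for this example. Practical attacks calibrate the threshold, for instance against reference models, and target far larger models.

```python
import math
from collections import Counter, defaultdict


def train_bigram(corpus):
    """Count word bigrams (with start/end markers) over the training texts."""
    counts = defaultdict(Counter)
    vocab = set()
    for text in corpus:
        tokens = ["<s>"] + text.split() + ["</s>"]
        vocab.update(tokens)
        for prev, cur in zip(tokens, tokens[1:]):
            counts[prev][cur] += 1
    return counts, vocab


def avg_nll(model, text):
    """Average negative log-likelihood per token, with add-one smoothing."""
    counts, vocab = model
    vocab_size = len(vocab) + 1  # +1 reserves probability mass for unseen words
    tokens = ["<s>"] + text.split() + ["</s>"]
    total = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        following = counts.get(prev, Counter())
        prob = (following[cur] + 1) / (sum(following.values()) + vocab_size)
        total -= math.log(prob)
    return total / (len(tokens) - 1)


def is_member(model, text, threshold=2.5):
    """Guess 'was in the training set' when the model finds the text unusually likely."""
    return avg_nll(model, text) < threshold


training_data = [
    "the secret launch code is alpha bravo",
    "quarterly revenue grew by ten percent",
]
model = train_bigram(training_data)

for candidate in ["the secret launch code is alpha bravo",
                  "the public launch date is next monday"]:
    print(f"{avg_nll(model, candidate):.2f}  member={is_member(model, candidate)}")
```

The training sentence scores a noticeably lower loss than the similar but unseen sentence, which is exactly the signal the attack exploits; the same memorization is what lets a model regurgitate the secret when prompted with its opening words.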

Preparing for Inevitable Breaches

History shows that even cybersecurity elites suffer breaches. Data and models will inevitably be exfiltrated from AI labs through some combination of emerging attack vectors. We must consider the potential ramifications of GPT-4 or similar models being openly released by hostile nation-states.

Such a breach may come sooner than expected, because LLMs themselves make attacks such as phishing cheaper and more effective. More research is needed into securing data in use and preventing malicious generation.


Q: How big are large language models like GPT-4?
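A: OpenAI has not published GPT-4's size, but frontier models are widely estimated to have hundreds of billions to trillions of parameters. At 16-bit precision, a trillion parameters alone amounts to about 2 TB of weights, which requires special techniques to store and protect.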

Q: What is model extraction?
A: Model extraction is an attack in which adversaries repeatedly query a model's API and use the responses to train a copy. Even a partial replica can reproduce much of the original's capability, especially in a narrow domain.

Q: How does confidential computing help?
A: Confidential computing isolates the model and its memory inside a hardware-based trusted execution environment, preventing many physical-access attacks.