We live in the age of Artificial Intelligence (AI), and the impact of this technology is not restricted to just tools like ChatGPT or Google Bard. AI is revolutionizing many sectors worldwide as businesses leverage the power of Machine Learning (ML) to offload key activities onto AI applications for better efficiency. At the same time, however, the broad adoption of these tools has not escaped the attention of cyber attackers, and new types of attacks targeting AI applications have started cropping up.
In this article, we cover two attacks that uniquely target how AI-based applications work by exploiting their inner workings: Membership Inference and Data Poisoning.
Membership Inference Attacks
To understand Membership Inference attacks, we must first understand how Machine Learning applications work. These applications are trained on massive amounts of data to predict future outcomes and make decisions. This data can be quite sensitive, such as when AI applications are used in sectors like healthcare, payments, or government services.
In a Membership Inference attack, the attacker’s goal is to find out whether a particular data record was used to train the model. A successful attack can have significant privacy implications, such as enabling the attacker to learn whether a particular face was used to train a facial recognition app or whether a person underwent a specific medical procedure.
Most AI applications do not expose their training data directly, but they do return confidence scores when queried. By repeatedly querying the machine learning model and observing the confidence scores it produces, attackers can piece together the type of data that was used in training.
This attack is unique in how it abuses the very nature of AI applications and their dependence on data to make decisions. Models tend to respond differently to data they were trained on than to data they have never seen, and an attacker can detect these differences and build up the pattern over time.
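As an illustration, the confidence-gap idea can be sketched in a few lines of Python. The "model" below is a hypothetical stand-in, not a real ML library: it simply scores queries by proximity to its training records, mimicking an overfit model that is most confident on data it has memorised.

```python
import numpy as np

rng = np.random.default_rng(0)

# "members" stand in for records used to train the model;
# "non_members" are records the model has never seen.
members = rng.normal(size=(100, 5))
non_members = rng.normal(size=(100, 5))

def confidence(training_data, x):
    # Toy stand-in for an overfit model: confidence decays with the
    # distance from the query to the closest training record, so
    # memorised training records score a perfect 1.0.
    nearest = np.linalg.norm(training_data - x, axis=1).min()
    return np.exp(-nearest)

# The attacker's rule of thumb: unusually high confidence => member.
THRESHOLD = 0.9
flagged_members = sum(confidence(members, x) >= THRESHOLD for x in members)
flagged_non_members = sum(confidence(members, x) >= THRESHOLD for x in non_members)

print(f"flagged {flagged_members}/100 true members and "
      f"{flagged_non_members}/100 non-members as 'in the training set'")
```

Against a real deployed model, an attacker would query its public API and typically calibrate the threshold with "shadow models" trained on similar data, but the principle is the same: members score systematically higher.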
Data Poisoning Attacks
If Membership Inference attacks the privacy and confidentiality of a Machine Learning application, then the next type of attack, Data Poisoning, targets its integrity.
As we mentioned earlier, Machine Learning needs data, and lots of it. The accuracy and integrity of this data are paramount, as it forms the basis of the model's training and affects how it makes decisions in the future. In this attack, the attacker poisons the data used for training to corrupt how the application makes predictions or decisions going forward.
Imagine an anti-malware program trained to recognize specific behaviors. If an attacker introduces malicious code into the “approved” behavior dataset, the product might never flag that code, giving the attacker a backdoor into any company using it! An attacker could also deliberately corrupt a dataset used by other machine learning programs, causing widespread chaos.
An attacker could mislabel the training dataset used for self-driving cars, causing them to misidentify objects and pedestrians alike and potentially leading to injury or even loss of life. Whether the goal is to “trick” the AI application or to degrade its performance, data poisoning can have profound security implications for the companies that use these applications.
The risk is amplified by the fact that most companies do not create training datasets from scratch but rely on pre-built datasets that are shared amongst thousands of companies. If an attacker could poison this shared data pool, they could corrupt decision-making across all of them.
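The mislabelling scenario above can be sketched with a toy experiment. Everything here is a hypothetical assumption for illustration: synthetic two-class data and a 1-nearest-neighbour classifier, chosen because it is especially sensitive to bad labels.

```python
import numpy as np

rng = np.random.default_rng(1)

# Clean, well-separated two-class training data (hypothetical toy task).
X_train = np.concatenate([rng.normal(-2, 1, (100, 2)), rng.normal(2, 1, (100, 2))])
y_train = np.array([0] * 100 + [1] * 100)
X_test = np.concatenate([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y_test = np.array([0] * 50 + [1] * 50)

def predict_1nn(X, y, queries):
    # 1-nearest-neighbour classifier: each query takes the label of its
    # closest training point, so every bad label does direct damage.
    dists = np.linalg.norm(queries[:, None, :] - X[None, :, :], axis=2)
    return y[dists.argmin(axis=1)]

clean_acc = (predict_1nn(X_train, y_train, X_test) == y_test).mean()

# Poisoning: the attacker silently flips 40% of the training labels.
y_poisoned = y_train.copy()
flip = rng.choice(len(y_train), size=80, replace=False)
y_poisoned[flip] = 1 - y_poisoned[flip]

poisoned_acc = (predict_1nn(X_train, y_poisoned, X_test) == y_test).mean()
print(f"accuracy on clean labels: {clean_acc:.2f}; after poisoning: {poisoned_acc:.2f}")
```

The same training inputs with corrupted labels produce a markedly worse model, which is exactly why the integrity of shared training data matters so much.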
Protecting against these attacks
New attacks require new types of controls, and cybersecurity professionals must upskill themselves regarding these new threats.
A few essential controls that can be implemented are:
- For Membership Inference: Adding “noise” to the responses given by the model makes it difficult for attackers to distinguish the telltale differences in responses that reveal which data was used to train the model. This is similar to how outputs are sanitized in web applications so that error messages do not give away too much information. Other security controls can also be implemented, such as alerts when excessive querying occurs from a single location, which may indicate that an attacker is attempting an inference attack.
- For Data Poisoning: Implement robust security controls around the data stores used for training so that attackers cannot compromise them. AI applications should also be tested whenever their models are retrained so that any changes in behavior or decision-making are immediately identified.
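The noise-adding control from the first bullet could be sketched as follows. The function name, Laplace noise, and noise scale are illustrative assumptions rather than a specific library's API; production systems would use a calibrated mechanism such as differential privacy.

```python
import numpy as np

rng = np.random.default_rng(2)

def noisy_confidences(raw_scores, scale=0.05):
    # Perturb the model's class probabilities with Laplace noise before
    # returning them, blunting the small confidence gaps that membership
    # inference relies on.
    noisy = np.asarray(raw_scores, dtype=float) + rng.laplace(0.0, scale, len(raw_scores))
    noisy = np.clip(noisy, 1e-6, None)   # keep probabilities positive
    return noisy / noisy.sum()           # renormalise to sum to 1

released = noisy_confidences([0.91, 0.06, 0.03])
print("scores released to the client:", released)
```

The trade-off is accuracy of the reported scores versus privacy: larger noise makes inference harder but the returned confidences less useful to legitimate clients.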
Cybersecurity professionals must understand how this new breed of attacks works, and the underlying techniques used to compromise machine learning applications, to be in a position to protect against them going forward. AI is the future, and attacks on AI applications are here to stay. As cyberattacks evolve, security controls must evolve and adapt with them.
FREQUENTLY ASKED QUESTIONS
What is a membership inference attack?
A membership inference attack is a privacy breach where an attacker tries to determine if a specific data record was used to train an AI model. If successful, this can reveal sensitive information about individuals.
How does a membership inference attack work?
Membership inference attacks exploit the tendency of AI models to overfit their training data. An attacker can infer whether a data point was part of the training set by observing differences in the model’s behavior for data it was trained on versus unseen data.
What is a data poisoning attack?
A data poisoning attack is a type of cyber threat where an attacker manipulates the training data of an AI model with the intent to influence its behavior. Injecting malicious data can lead the model to make incorrect predictions or decisions.
What are the types of data poisoning attacks?
There are two main types of data poisoning attacks: targeted and exploratory. In targeted attacks, the attacker aims to manipulate specific predictions. In exploratory attacks, the goal is to degrade the model’s overall performance.