A team of scientists has demonstrated a new attack using the Text-to-SQL model to generate malicious code that can allow an attacker to collect sensitive information and conduct DoS attacks.
To better interact with users, database applications use artificial intelligence techniques that can translate human queries into SQL queries (Text-to-SQL model), according to the researchers.
By sending special requests, an attacker can fool the text-to-SQL transformation models to create malicious code. Since such code is automatically executed in the database, it can lead to data leakage to the dark web and DoS attacks.
The findings, which were confirmed by two commercial solutions BAIDU-UNIT and AI2sql, mark the first experimental case where natural language processing (NLP) models have been used as an attack vector.
There are many ways to install backdoors in pre-trained language models (PLMs) by poisoning training samples, such as word substitution, development of special prompts, and changing sentence styles. Attacks on 4 different open source models (BART-BASE, BART-LARGE, T5-BASE and T5-3B) using malicious images have achieved 100% success rate with little performance impact, making such issues hard to detect in the real world.
As a mitigation, experts suggest including classifiers to check for suspicious strings in inputs, evaluate off-the-shelf models to prevent supply chain threats, and adhere to software engineering best practices.