It’s disturbingly easy to trick AI into doing something deadly (Reposted)
Updated: Feb 6
Artificial intelligence researchers have a big problem. Even as they design powerful new technologies, hackers are figuring out how to trick the tech into doing things it was never meant to — with potentially deadly consequences.
The scariest part is that hackers can do this using something as simple as...stickers.
In a recent report, Tencent’s Keen Security Lab showed how they were able to bamboozle a Tesla Model S into switching lanes so that it would drive directly into oncoming traffic. All they had to do was place three stickers on the road, forming the appearance of a line. The car’s Autopilot system, which relies on computer vision, detected the stickers and interpreted them to mean the lane was veering left. So it steered the car that way.
If this had happened in the real world, the results could have been lethal. Luckily, it was an experiment by security researchers who were deliberately probing the technology for weaknesses that hackers could exploit in so-called “adversarial attacks” on machine learning systems.
That’s a very real risk, and it’s becoming an increasing source of concern to AI researchers. It has serious implications for fields that rely heavily on AI, from self-driving cars to medicine to the military.
Machine learning is a type of AI that involves feeding computers example after example of something, until they “learn” to make their own determinations. The aim of adversarial machine learning is to trick the computers by feeding them inputs that’ll mess up their determinations.
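The core trick can be sketched in a few lines of code. This is a toy illustration of my own, assuming nothing about any real system: a linear “image” classifier over 784 pixel values, where a positive score means the model sees a stop sign.

```python
import numpy as np

# Hypothetical linear classifier: score > 0 means "stop sign".
rng = np.random.default_rng(0)
w = rng.normal(size=784)          # the model's learned weights

def label(x):
    return "stop sign" if w @ x > 0 else "speed limit 45"

x = rng.normal(size=784)
x *= 10.0 / (w @ x)               # rescale so the clean image scores exactly +10

# The adversarial input: shift every pixel by just 0.05, each in the
# direction that lowers the score (the sign of its weight). No single
# pixel changes much, but 784 tiny shifts add up and flip the label.
x_adv = x - 0.05 * np.sign(w)

print(label(x))       # stop sign
print(label(x_adv))   # speed limit 45
```

This is the intuition behind “fast gradient sign” attacks: in high-dimensional inputs like images, many imperceptibly small per-pixel changes can combine into a large swing in the model’s output.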
Placing stickers on the road is one example of that. In another commonly cited example, researchers placed stickers on a stop sign that made a self-driving car’s vision system read it as a 45-mile-per-hour speed limit sign. The same trick works on other kinds of objects, too: a carefully crafted sticker has even fooled an image classifier into labeling a banana as a toaster.
Responding to the Keen Security Lab report, Tesla co-founder and CEO Elon Musk said it was “solid work by Keen, as usual.” This is not the first time Keen, a leading security research team, has probed the vulnerabilities of a Tesla.
However, a Tesla spokesperson responding to the recent report said the vulnerability it identifies is “not a realistic concern given that a driver can easily override Autopilot at any time by using the steering wheel or brakes and should always be prepared to do so.” But that seems too flippant. Realistically, people in self-driving cars are not going to be prepared to jump into action at any moment, because the very premise of Autopilot will have conditioned them to think they can afford to let their minds wander.
UC Berkeley computer science professor Dawn Song, who studies adversarial attacks, says that within the research community, people are taking the risk of such attacks seriously. “Everyone has recognized the importance of this topic — researchers from Google and Facebook as well as OpenAI are actively working in this domain,” she told me, adding that the last two years have seen “an explosion” of interest as AI gets more powerful, more ubiquitous, and therefore more dangerous.
Although an adversarial attack involving the use of stickers to fool AI hasn’t yet been observed in the real world, there’s a sense that it may not be long before bad actors try this sort of thing. “Once you understand how to do it,” Song said, “it’s very cheap and easy to do.”
The presence of these risks doesn’t mean we should jettison AI and the many benefits it offers us. But it does mean we should be figuring out how to make our AI systems robust against attacks. To do that, we need to anticipate what hackers might come up with, and stay one step ahead of them.
How adversarial attacks could affect medicine, warfare, and more
Song has studied various types of adversarial machine learning methods, one of which MIT Technology Review sums up like this:
One project, conducted in collaboration with Google, involved probing machine-learning algorithms trained to generate automatic responses from e-mail messages (in this case the Enron email data set). The effort showed that by creating the right messages, it is possible to have the machine model spit out sensitive data such as credit card numbers. The findings were used by Google to prevent Smart Compose, the tool that auto-generates text in Gmail, from being exploited.
Another scenario looks at an adversarial attack that targets the health care system. A study by Harvard and MIT researchers, published last month in Science, showed how machine-learning systems can be fooled into participating in medical fraud.
Let’s say you’re a doctor and your patient has a mole. An image of it is fed into a machine-learning system, which correctly identifies it as benign. But then you add a “perturbation” to the image — a layer of pixels that changes how the system reads the underlying image. Suddenly the mole is classified as malignant. You claim that an excision is necessary and you request reimbursement for it. Because you’ve gamed the classification, the health insurance company is willing to dish out the money.
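To make that mechanism concrete, here is a minimal sketch using a hypothetical logistic-regression “mole classifier” of my own construction, not the actual model from the Science study. The model outputs a probability of “malignant,” and a faint layer of pixels aligned with the model’s weights pushes that probability across the decision boundary.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy logistic model over a 16x16 = 256-pixel mole image:
# sigmoid(w @ x) is the predicted probability of "malignant".
rng = np.random.default_rng(1)
w = rng.normal(size=256)          # the model's learned weights

x = rng.normal(size=256)
x -= w * (w @ x + 3.0) / (w @ w)  # fix the clean score at -3: clearly benign
print(round(sigmoid(w @ x), 3))   # 0.047 -> read as benign

# The "perturbation": a faint pixel layer proportional to the weights,
# chosen to raise the score by exactly +6. It is tiny relative to the
# image, but it is the single most score-efficient direction to push.
delta = 6.0 * w / (w @ w)
x_adv = x + delta
print(round(sigmoid(w @ x_adv), 3))  # 0.953 -> read as malignant
```

In this sketch the overlay moves the predicted probability of malignancy from about 5 percent to about 95 percent, while the added pixel layer is a small fraction of the image’s overall intensity.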