Business Standard

Hate speech-detecting AIs can be fooled: Study

IANS  |  London 

Hate speech detectors deployed by major online platforms to track abusive content are "brittle and easy to deceive", a study claims.

The study, led by researchers from Aalto University in Finland, found that bad grammar and awkward spelling -- intentional or not -- might make toxic comments harder for artificial intelligence (AI) detectors to spot.

Modern natural language processing (NLP) techniques can classify text based on individual characters, words or sentences. When faced with textual data that differs from the data used in their training, they begin to fumble, the researchers said.

"We inserted typos, changed word boundaries or added neutral words to the original Removing spaces between words was the most powerful attack, and a combination of these methods was effective even against Google's comment-ranking system Perspective," said Tommi Grondahl, a doctoral student at the varsity.

The team put seven state-of-the-art detectors to the test for the study. All of them failed.

Among them was Google's Perspective. It ranks the "toxicity" of comments using text analysis methods.

Earlier, it was found that "Perspective" can be fooled by introducing simple typos.

But, Grondahl's team discovered that although "Perspective" has since become resilient to simple typos, it can still be fooled by other modifications such as removing spaces or adding innocuous words like "love".

A sentence like "I hate you" slipped through the sieve and became non-hateful when modified into "Ihateyou love".

Hate speech is subjective and context-specific, which renders text analysis techniques insufficient on their own, the researchers said.

They recommend that more attention be paid to the quality of data sets used to train models -- rather than refining the model design.

The results will be presented at the forthcoming ACM AISec workshop.



(This story has not been edited by Business Standard staff and is auto-generated from a syndicated feed.)

First Published: Sun, September 16 2018. 17:28 IST