Can AI clean up online hate speech?

Posted 29 January 2018
Can AI stop online trolls from posting hateful comments?

Two major online platforms, Facebook and the New York Times, have both announced that they will use Machine Learning (ML) systems to moderate their websites in an attempt to eliminate abusive trolling.

But can AI successfully flush out the scourge of the trolls? It seems unlikely, at least for now: not only is the technology still in its infancy, but there are also some serious problems with using AI to moderate content.

How do AI systems learn which comments need to be flagged up and removed, and which comments are sweet as candy? We tell them, of course! Behind every ML system there is a team of developers who feed the machine examples of good comments and bad comments.
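To make that concrete, here is a minimal sketch of learning from labelled examples. It is a toy word-counting classifier, not the method used by Facebook or the New York Times; the example comments, the labels and the function names are all invented for illustration.

```python
from collections import Counter

# Hypothetical training data: the labels are human judgements, not facts.
LABELLED_COMMENTS = [
    ("thanks for a great article", "good"),
    ("really helpful thanks", "good"),
    ("you are an idiot", "bad"),
    ("what an idiot this writer is", "bad"),
]

def train(examples):
    """Count how often each word appears under each label."""
    counts = {"good": Counter(), "bad": Counter()}
    for text, label in examples:
        counts[label].update(text.lower().split())
    return counts

def classify(counts, text):
    """Pick the label whose training words overlap most with the comment."""
    words = text.lower().split()
    scores = {label: sum(c[w] for w in words) for label, c in counts.items()}
    # Ties (including comments made only of unseen words) default to "good".
    return "bad" if scores["bad"] > scores["good"] else "good"

model = train(LABELLED_COMMENTS)
print(classify(model, "you idiot"))

# Same code, opposite labels: the verdict flips with the human judgements.
flipped = [(text, "good" if label == "bad" else "bad")
           for text, label in LABELLED_COMMENTS]
print(classify(train(flipped), "you idiot"))
```

The first call flags the comment and the second lets it through, even though nothing in the code has changed. The machine has no view of its own; it only repeats the pattern in the labels it was given.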

When we think of AI moderating comments, we might imagine that machines make objective decisions about what is acceptable, or at least more objective decisions than a human would. But that’s simply not the case. Take the New York Times’s AI software as an example: Perspective, a project by Google offshoot Jigsaw.

A new Perspective

The ML system, Perspective, was developed using millions of comments collected from websites such as Wikipedia and the New York Times. According to a report by Wired in February 2017, these comments were shown to panels of ten people, who rated them in terms of “toxicity”, with 100% being the most toxic. These judgements were fed into the AI, which uses them as its basis for rating other comments.
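As a rough sketch of how panel judgements could become a single score, assume each of the ten raters simply answers “toxic” or “not toxic”, and that the score is the share who said “toxic”. The real rating scheme may well differ; the function name and the votes below are invented for illustration.

```python
def toxicity_score(votes):
    """Return the percentage of raters who judged a comment toxic."""
    if not votes:
        raise ValueError("need at least one vote")
    return 100 * sum(votes) / len(votes)

# Hypothetical panel of ten: True means the rater judged the comment toxic.
panel_votes = [True, True, True, False, True, True, False, True, True, False]
print(toxicity_score(panel_votes))  # 7 of 10 raters -> 70.0
```

Under this assumption the number measures agreement among a particular panel, so a different group of ten could give the same comment a different score.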

Has it worked? No, said American writer David Auerbach in a Facebook post in February 2017, in which he revealed that Perspective had scored “please gas the joos. Thank you” at 7% and “She was asking for it” at 3%.

By contrast, “I think you’re being racist” scored 70%; “few Muslims are a terrorist threat” scored 79% toxic; and “Trump sucks”, referring to American President Donald Trump, scored 96%, marking it as close to the worst thing a user could say.

These are some pretty big mistakes, but the point of ML is, of course, that it learns, and the researchers behind Perspective say that the AI will grow exponentially through continued use. These problems may already have been ironed out at the time of writing.

Deciding right from wrong 

One thing that Perspective can’t do is completely disregard the initial judgements fed to it by its developers, tried and tested as they were. And what if some of these judgements just don’t fit the bill?

Case in point: Facebook. Documents leaked to the Guardian in May 2017 show that Facebook’s human moderators are told that posts such as “To snap a bitch’s neck, make sure to apply all your pressure to the middle of her throat” and “Little girl needs to keep to herself before daddy breaks her face” are acceptable and should not be deleted from the website.

Facebook’s AI software has not yet been named, but we know it’s an ML algorithm that works on “a feedback loop and [gets] better over time”. The AI will primarily be used to crack down on extremist and terrorist content.

Just like Perspective, it will be developed by being given an initial bank of ‘good comments’ and ‘bad comments’ to work from. Some might say that a ‘good comment’ at Facebook can be violently misogynistic, but the AI won’t be able to make that decision for itself.

AI systems have been shown to pick up racist, misogynistic, and other prejudices from the humans they interact with. (You only need to look at Microsoft’s now-removed chatbot, Tay, to understand how learning from others can have negative consequences.)

Slipping through the cracks 

This means a comments section tended by an ML system will always contain some comments that a number of users find offensive. Is that surprising? It shouldn’t be, if we consider that moderation forces someone to decide what is ‘hate speech’ and what is ‘free speech’, and there is simply no consensus on that.

It’s a question that has provoked a lot of throwing about of brains since time immemorial, and we’re no closer to solving it now than the Greeks or the Romans were. Some might even say we’ve gotten further away from the truth.

When it comes to making value judgements, we can only teach AI to see things the way that we see them, because we don’t have an algorithm for morality or even decorum. That is, unless somebody creates a superintelligent AI that gives us that formula, which could lead to excellent AI moderators and a lot of job losses in the trolling industry.