In experiments, the error rates of three commercial programmes in determining the gender of light-skinned men were never worse than 0.8 per cent.
For darker-skinned women, however, the error rates ballooned to more than 20 per cent in one case and more than 34 per cent in the other two.
The findings raise questions about how today's neural networks, which learn to perform computational tasks by looking for patterns in huge data sets, are trained and evaluated.
For instance, researchers at a major US technology company claimed an accuracy rate of more than 97 per cent for a face-recognition system they'd designed - but such a figure says little on its own, because it depends heavily on the make-up of the benchmark data set used to evaluate the system.
"What's really important here is the method and how that method applies to other applications," said Joy Buolamwini, a researcher at Massachusetts Institute of Technology (MIT) in the US.
"The same data-centric techniques that can be used to try to determine somebody's gender are also used to identify a person when you're looking for a criminal suspect or to unlock your phone," said Buolamwini.
"It's not just about computer vision. I'm really hopeful that this will spur more work into looking at other disparities," he said.
All three systems treated gender classification as a binary decision - male or female - which made their performance on that task particularly easy to assess statistically.
However, the same types of bias probably afflict the programmes' performance on other tasks, too.
To begin investigating the programs' biases systematically, Buolamwini first assembled a set of images in which women and people with dark skin are much better-represented than they are in the data sets typically used to evaluate face-analysis systems. The final set contained more than 1,200 images.
Then she applied three commercial facial-analysis systems from major technology companies to her newly constructed data set.
Across all three, the error rates for gender classification were consistently higher for females than they were for males, and for darker-skinned subjects than for lighter-skinned subjects.
For darker-skinned women, the error rates were 20.8 per cent, 34.5 per cent, and 34.7 per cent.
But with two of the systems, the error rates for the darkest-skinned women in the data set were worse - 46.5 per cent and 46.8 per cent.
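To illustrate the kind of disaggregated evaluation described above, here is a minimal sketch in Python (with hypothetical column names and toy data, not the study's actual records) showing how error rates can be broken out by subgroup rather than reported as a single aggregate accuracy figure:

```python
import pandas as pd

# Hypothetical results table: one row per image, with the classifier's
# prediction, the ground-truth label, and a demographic annotation.
results = pd.DataFrame({
    "true_gender":      ["female", "female", "male", "male", "female", "male"],
    "predicted_gender": ["male",   "female", "male", "male", "male",   "male"],
    "skin_type":        ["darker", "darker", "lighter", "darker", "lighter", "lighter"],
})

# An error occurs wherever the prediction disagrees with the ground truth.
results["error"] = results["predicted_gender"] != results["true_gender"]

# Disaggregate: mean error rate for each (skin type, gender) subgroup.
by_group = results.groupby(["skin_type", "true_gender"])["error"].mean()
print(by_group)
```

A single overall accuracy number can hide exactly the disparities this kind of breakdown reveals, which is why the composition of the evaluation data set matters.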
