Friday, September 09, 2016

fairness, machine learning, versus optimal stopping and cognitive bias

There's a bunch of work on making sure that machine learning systems are, in some carefully defined sense, fair - see for example the MPI work by Krishna Gummadi on removing biases from various ML use cases (e.g. gender as an explicit or implicit discriminator).

For me there's a really subtle problem here which links this work to the older problems of Optimal Stopping and Cognitive Biases: how one chooses to define fairness in ML, and the feedback loop between that choice, human society, and the views we take of each other.

So let's take two simple use cases:

1. Admissions to University and Gender

Imagine a Computer Science department has 100 applicants a month, over 3 months, for 50 places, and wants to pick the best 50 people. Naive use of optimal stopping would say wait until you've seen 37% of the applicants (111 people), then pick. What if the population is drawn differently by gender - e.g. out of every 100 applicants, only 1 is female? Let's say this is because applicants self-select based on their position in the ability distribution of their own sub-population. You then have about a 1/2 chance of having 0 women in the admissions. The feedback to the population in society is that you have to be in the top 1% of female applicants, but only in the top 18% of male applicants. Assuming there isn't actually a gender basis for the ability distribution, you've just built a system that reinforces the impression that there is. To get out of this, you have to run a two-factor optimal stopping scheme (one per sub-population). If you want to do this for other groups in society, it gets more complex still...
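A rough way to see the "about a 1/2 chance of 0 women" figure is to simulate it. The sketch below is my own illustration, not part of the original argument: it draws 300 applicants whose abilities are identically distributed regardless of gender, with 1% of applicants female, applies a loose version of the 37% observe-then-pick rule to fill 50 places, and counts how often no woman is admitted. The uniform ability model, the top-up rule for unfilled places, and all names are assumptions for illustration only.

```python
import random

def run_admissions(n_applicants=300, n_places=50, frac_female=0.01,
                   observe_frac=0.37):
    """One simulated intake: observe the first ~37% of applicants without
    admitting anyone, then admit applicants who beat the best score seen in
    the observation phase, topping up any remaining places with the best of
    the rest.  Abilities are i.i.d. uniform and independent of gender - an
    illustrative assumption, not a claim from the post."""
    applicants = [(random.random(), random.random() < frac_female)
                  for _ in range(n_applicants)]
    n_observe = int(observe_frac * n_applicants)            # ~111 of 300
    threshold = max(score for score, _ in applicants[:n_observe])
    pool = applicants[n_observe:]
    admitted = [a for a in pool if a[0] > threshold][:n_places]
    if len(admitted) < n_places:                            # fill leftover places
        leftovers = sorted((a for a in pool if a not in admitted), reverse=True)
        admitted += leftovers[:n_places - len(admitted)]
    return admitted

def prob_zero_women(trials=5000):
    """Estimate how often an intake contains no women at all."""
    zero = sum(1 for _ in range(trials)
               if not any(is_female for _, is_female in run_admissions()))
    return zero / trials

if __name__ == "__main__":
    print("P(no women admitted) ~", prob_zero_women())      # typically around 0.5-0.6
```

Running this gives something close to the 1/2 figure above: a gender-blind rule plus a skewed applicant pool is enough, on its own, to produce all-male intakes roughly half the time.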

2. Stop & Search and Race

It may be the case that you stop and search people, in the name of safeguarding society, by profiling individuals based on past cases of stopping and successfully apprehending miscreants. Let's say this leads to a higher probability of stopping people who "look Middle Eastern". Again, there's a feedback loop between your "correct" but naive selection scheme and how people behave - in this case, various cognitive biases in how society regards the group you target may lead to that group being marginalised, out of proportion to even your allegedly accurate statistical model. Anchoring, for example (or many other biases), will lead to over-weighting by society, especially since humans are risk averse.
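To make the feedback loop concrete, here is a toy simulation - my own sketch, not a model from the post or from any real policing data. Two groups have identical true offence rates, and each day's stops are allocated in proportion to the recorded hits so far; since hits are only recorded for people who were actually stopped, early random fluctuations get amplified and the allocation can lock in well away from 50/50. The group names, rates, and proportional-allocation policy are all illustrative assumptions.

```python
import random

def stop_search_feedback(n_days=2000, stops_per_day=100,
                         true_rate_a=0.05, true_rate_b=0.05, seed=0):
    """Toy feedback loop: each day, stops are split between groups A and B in
    proportion to recorded hits so far; hits are only recorded for people who
    were actually stopped.  Both groups have the same true offence rate, so
    any asymmetry that emerges comes purely from the feedback."""
    rng = random.Random(seed)
    hits = {"A": 1, "B": 1}                  # pseudo-counts so day 1 splits 50/50
    for _ in range(n_days):
        share_a = hits["A"] / (hits["A"] + hits["B"])
        stops_a = round(stops_per_day * share_a)
        stops_b = stops_per_day - stops_a
        hits["A"] += sum(rng.random() < true_rate_a for _ in range(stops_a))
        hits["B"] += sum(rng.random() < true_rate_b for _ in range(stops_b))
    return hits["A"] / (hits["A"] + hits["B"])   # long-run share of stops on A

if __name__ == "__main__":
    for seed in range(5):
        share = stop_search_feedback(seed=seed)
        print(f"run {seed}: share of stops falling on group A ~ {share:.2f}")
```

Different seeds land on noticeably different final shares, which is the point: the "allegedly accurate" statistics the scheme learns are partly an artefact of where it chose to look, and the cognitive biases above then amplify that artefact further in people's heads.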