Machine Learning

Key Innovators

Noah Waisberg

Machine learning algorithms build models from training data, then use those models to interpret new data. The goal of a machine learning algorithm is to generalize from experience.
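To make "generalize from experience" concrete, here is a minimal sketch of one of the simplest learning approaches, a nearest-neighbor classifier. The animals, measurements, and function name are made up for illustration and are not taken from any product mentioned here.

```python
# Toy 1-nearest-neighbor classifier: "learn" from labeled examples,
# then label a new example by finding the most similar known one.

def predict(training_data, new_point):
    """Return the label of the training example closest to new_point."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    _, nearest_label = min(
        training_data,
        key=lambda example: squared_distance(example[0], new_point),
    )
    return nearest_label

# Hypothetical training data: (weight in kg, height in cm) -> label
training_data = [
    ((4.0, 25.0), "cat"),
    ((5.0, 23.0), "cat"),
    ((30.0, 60.0), "dog"),
    ((25.0, 55.0), "dog"),
]

# Neither of these points appears in the training data.
print(predict(training_data, (4.5, 24.0)))
print(predict(training_data, (28.0, 58.0)))
```

The model has never seen either new animal, but it can still label each one by comparing it to the examples it was given. Real systems use far more data and more sophisticated models, but the idea is the same.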

For example, Google Image Search uses machine learning to understand images, so that if you search for “cats” you get pictures of cats. Casetext uses machine learning to screen judicial opinions for citation authority—whether a case has been upheld, limited, or overruled—so that its human reviewers only have to check passages likely to contain overruling language. (Source.)

Machine learning algorithms are generally structured to improve over time. For example, I assume Casetext’s citation authority algorithm gets some form of feedback from human reviewers so that it can improve the accuracy of its model. Search engines use metrics like bounce rate and dwell time to determine whether someone found what they were looking for when they clicked a link.

Bias & Transparency

Machine learning algorithms need a set of training data to learn from. If the training data is biased, then the resulting model is likely to reproduce that bias, and may amplify it.

Let’s say a law school admissions office wants help identifying high-quality applicants, so it takes the applications of its top students from the past 10 years and uses them to train a machine learning algorithm. If the school has admitted few BIPOC students over that time, the algorithm is likely to develop a bias against BIPOC applicants. Something similar happened at St. George’s Medical School in the United Kingdom, where the admissions algorithm developed a bias against women and applicants with “non-European sounding” names. (Source.)
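A rough sketch of how this happens, using an invented admissions history and a deliberately naive "model" that just learns past admit rates per group. The groups, numbers, and function names are hypothetical. This is not how St. George’s system or any real admissions tool worked.

```python
from collections import Counter

def train(history):
    """Learn the past admit rate for each group from historical decisions."""
    seen = Counter()
    admitted = Counter()
    for group, was_admitted in history:
        seen[group] += 1
        if was_admitted:
            admitted[group] += 1
    return {group: admitted[group] / seen[group] for group in seen}

def score(model, group):
    """Score a new applicant using only what the model learned about the group."""
    return model[group]

# Made-up history: group A was admitted far more often than group B.
history = (
    [("A", True)] * 80 + [("A", False)] * 20
    + [("B", True)] * 5 + [("B", False)] * 15
)

model = train(history)
print(score(model, "A"))  # past admit rate for group A
print(score(model, "B"))  # past admit rate for group B
```

Nothing in the code says "prefer group A." The preference comes entirely from the historical decisions the model was trained on, so two otherwise identical applicants get different scores.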

There are many more examples of algorithmic bias, from criminal sentencing to search results to image recognition.

One way to reduce bias is to require machine learning algorithms to “explain” outcomes. For example, the St. George’s Medical School algorithm might have reported that 75% of its decision on rejected applicants was based on their names. If the school had seen that, they probably would have smacked their foreheads and removed names and pronouns from the training data.
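Here is one simplified way such an "explanation" could be computed, assuming a model that scores applicants by adding up weighted features. The weights, feature names, and the `explain` function are hypothetical, chosen only so the output matches the 75% figure in the example above.

```python
def explain(weights, applicant):
    """Return each feature's share of the total absolute contribution to the score."""
    contributions = {
        name: abs(weights[name] * applicant[name]) for name in weights
    }
    total = sum(contributions.values())
    if total == 0:
        # No feature influenced the score, so there is nothing to attribute.
        return {name: 0.0 for name in contributions}
    return {name: value / total for name, value in contributions.items()}

# Hypothetical learned weights: the name-based feature dominates.
weights = {"name_signal": -3.0, "grades": 0.5, "test_score": 0.5}

# Hypothetical applicant, with every feature scaled to 1.0 for simplicity.
applicant = {"name_signal": 1.0, "grades": 1.0, "test_score": 1.0}

for feature, share in explain(weights, applicant).items():
    print(f"{feature}: {share:.0%}")
```

A report like this makes it obvious when a feature that should be irrelevant is driving the outcome. Explanation methods for more complex models are more involved, but they aim to answer the same question.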

Published on January 6th, 2022. Last updated on January 11th, 2022, by Sam Glover.