Machine learning, the backbone of modern artificial intelligence, is a diverse field of methodologies that aim to solve different sorts of data-intensive problems. The types of problems addressed by these tools vary between industry and government. Ultimately, most machine learning end-users aim to use these models to make accurate predictions about the world around them: the number of grocery store shoppers on a particular day of the week, the likelihood that a loan applicant will default, or the most likely corrected spelling of a misspelled search query, for example.
Machine learning tools have come under increased scrutiny in recent years, both because of their incredible ability to spur innovation and reduce costs and because of their potential to inherit bias from the social contexts in which they are built. To best understand how these models impact the lives of everyday people, it is essential to understand the basic architecture of machine learning tools used today.
The three main machine learning frameworks (supervised learning, unsupervised learning, and reinforcement learning) represent increasingly complex ways to create a useful ‘model’ of the world. Fundamentally, a model is simply a mapping of inputs to outputs, usually in service of a prediction or classification. A straightforward model in physics is that of gravity: it can be empirically shown that, absent the effects of air resistance, objects on Earth will accelerate downward at about 32 feet per second squared. By also accounting for wind direction, air resistance, and barometric pressure, among other factors, engineers can use such models to make accurate predictions about the flight of an airplane.
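To make the idea of a model as a mapping from inputs to outputs concrete, the gravity example can be written as a short rule-based function. This is only a sketch: the function names are illustrative, and it assumes an object dropped from rest with no air resistance, using the approximate figure of 32 feet per second squared quoted above.

```python
# Approximate gravitational acceleration near Earth's surface, as quoted in the text.
G_FT_PER_S2 = 32.0


def fall_speed(seconds: float) -> float:
    """Predicted downward speed (ft/s) of an object dropped from rest."""
    return G_FT_PER_S2 * seconds


def fall_distance(seconds: float) -> float:
    """Predicted distance fallen (ft) after the given time, ignoring air resistance."""
    return 0.5 * G_FT_PER_S2 * seconds ** 2


if __name__ == "__main__":
    for t in (1.0, 2.0, 3.0):
        print(f"t={t}s  speed={fall_speed(t)} ft/s  distance={fall_distance(t)} ft")
```

The input is elapsed time and the output is a prediction. No data are needed to build this model because the rule is already known.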
Models based on established rules are the gold standard for prediction tasks. In most contexts, however, these rules are either too complex to model simultaneously or not known in the first place. Consider a model that aims to predict the proportion of a given population that spends 35% or more of their income on rent each month (hereafter referred to as rent stress). While an analyst could make an educated guess about the factors that might be correlated with rent stress, these data may not be available, and even when they are, they rarely offer a complete explanation on their own. In these situations, machine learning allows analysts to make useful predictions despite not knowing the underlying processes that determine the outcome of interest.
Using a collection of observations and ‘features’, machine learning tools make predictions. For example, using American Community Survey data for the eight District of Columbia wards, each ward would represent an observation, and the information that describes that ward (percentage in a 20+ unit complex and percentage with broadband internet) would be features of that observation (Figure 1).
Supervised learning is the most widely used machine learning framework in the private sector, where it is applied to a variety of prediction problems with well-defined target outcomes (i.e., labels). Streaming platforms like Netflix use both explicit (liked or highly rated) and implicit (clicked and watched completely) signals about content to create supervised recommendation models for each user of their services. Image recognition tools, like Google’s Cloud Vision or Amazon’s Rekognition, are trained using labeled images and can recognize a wide array of objects, people, and activities. Logistics firms use these methods to predict supply and demand from downstream consumers and upstream suppliers alike, allowing them to optimize their supply chains and reduce costs. These tools are very flexible, but they depend on labeled data, and those labels may or may not need to be created manually, depending on how the data were generated.
In supervised learning tasks, a researcher chooses one feature of the data they would like to be able to predict (referred to as a “target”) using the remaining features. Using the ACS data from Figure 1, suppose an analyst wants to predict the percentage of rent-stressed residents using the percentage of residents with broadband internet. The first step is to split the data into two groups: the training set, which will be used to tune the model; and the test set, which will be used to evaluate the model’s accuracy. In this case, Ward 1 and Ward 5 will constitute the test set, with the remaining six wards used for training. This 3-to-1 ratio of training to test observations is close to the roughly 4-to-1 split that is typical in practice, but real-world data sets will often contain thousands to millions of observations, so this model is purely for illustration.
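The split described above can be sketched in a few lines. The ward numbers and the choice of Ward 1 and Ward 5 as the test set come from the text; the function name `split_wards` is hypothetical, and no ACS values are included because only the split itself is being illustrated.

```python
# The eight District of Columbia wards, identified by number.
WARDS = list(range(1, 9))

# Wards held out for evaluation, as chosen in the text.
TEST_WARDS = {1, 5}


def split_wards(wards, test_wards):
    """Return (training, test) lists of observations, preserving order."""
    training = [w for w in wards if w not in test_wards]
    test = [w for w in wards if w in test_wards]
    return training, test


training_set, test_set = split_wards(WARDS, TEST_WARDS)

if __name__ == "__main__":
    print("training:", training_set)
    print("test:", test_set)
    print("training-to-test ratio:", len(training_set) / len(test_set))
```

With eight wards and two held out, six wards remain for training. In practice the test observations are usually chosen at random rather than by hand, so that the evaluation is not biased by the analyst's choice.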
The resulting model finds a negative relationship between the percentage of the population with broadband and the percentage that is rent-stressed. When comparing this model with other models using different predictors, the analyst uses the test data to determine how well each model predicts observations it has not seen. In this example, the prediction for Ward 5 is very good (only 1 percentage point higher than the real value of 39%), but the prediction for Ward 1 is poor (7 percentage points higher than the real value of 27%; Figure 2). It is worth noting that the level of accuracy required for a given model varies from use-case to use-case, and in some instances, an error of 7 percentage points may be acceptable. After creating many competing models using the same data, the analyst will choose whichever model best minimizes the average test error (or another evaluation metric, depending on the situation).
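The average test error mentioned above can be computed directly from the two test wards. This sketch uses mean absolute error, which is one common choice of metric and is an assumption here since the text does not name one. The actual values (27% and 39%) come from the text, and the predicted values (34% and 40%) are derived by reading "7% higher" and "1% higher" as percentage points.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute gap between predictions and real values."""
    errors = [abs(p - a) for a, p in zip(actual, predicted)]
    return sum(errors) / len(errors)


# Rent-stress percentages for the two test wards, as reported in the text.
test_results = {
    "Ward 1": {"actual": 27.0, "predicted": 34.0},
    "Ward 5": {"actual": 39.0, "predicted": 40.0},
}

actual_values = [r["actual"] for r in test_results.values()]
predicted_values = [r["predicted"] for r in test_results.values()]
average_test_error = mean_absolute_error(actual_values, predicted_values)

if __name__ == "__main__":
    print("average test error (percentage points):", average_test_error)
```

An analyst comparing several candidate models would compute the same figure for each one on the same test wards and prefer the model with the lowest value.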
Unsupervised learning aims to solve problems where there is no variable of interest to predict. A very common unsupervised learning task used in industry is anomaly detection, where new observations are compared to previous observations to determine their novelty. This sort of approach is what allows credit card companies to flag transactions as fraudulent when they occur at atypical times or locations for a given consumer, cybersecurity experts to detect nefarious activity online, and cardiologists to identify irregularities in a patient’s electrocardiogram. Since unsupervised methods do not require labeled data, they are often easier to implement than supervised learning approaches, but they are more susceptible to error and noise for this same reason.
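One simple way to compare a new observation with previous ones is a z-score check, sketched below. This is an illustrative technique, not necessarily what any card issuer uses. The purchase hours are made-up values, the threshold of 3 standard deviations is an assumption, and treating hour of day as an ordinary number ignores the wrap-around at midnight.

```python
from statistics import mean, stdev


def is_anomalous(history, new_value, threshold=3.0):
    """Flag new_value if it lies more than `threshold` standard deviations
    from the mean of previously observed values."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # All past values are identical, so any different value is novel.
        return new_value != mu
    return abs(new_value - mu) / sigma > threshold


# Hypothetical hours of the day (24-hour clock) at which one customer usually shops.
past_purchase_hours = [12, 13, 12, 14, 13, 12, 13, 14]

if __name__ == "__main__":
    print("3 a.m. purchase flagged:", is_anomalous(past_purchase_hours, 3))
    print("1 p.m. purchase flagged:", is_anomalous(past_purchase_hours, 13))
```

No labels are involved: the function never sees an example marked as fraud and relies only on how unusual the new value is relative to past behavior. That is also why such checks can misfire on legitimate but atypical activity.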
Another common use-case for unsupervised learning is ‘clustering,’ which aims to pool observations into clusters with similar attributes, usually to better understand those data. Using the same ACS data from Figure 1, suppose an analyst wants to know if the DC wards can be naturally grouped using the population percentage in a 20+ unit complex and the percentage with broadband internet. Unlike supervised learning tools, where the analyst would be able to evaluate the performance of their model by showing it new labeled data and comparing predictions to reality, there is no need for a test set as there is nothing to predict in an unsupervised learning framework.
While metrics exist to compare how “close” observations are to their own cluster versus other clusters, in practice, these unsupervised tools are typically only valuable if they help the end-user learn something useful about the data. If the clusters do not make much intuitive sense, it is typically difficult to use them effectively in a real-world setting. In this example, there appears to be a subset of wards with high-density housing and high broadband internet adoption and a subset with lower-density housing and relatively low broadband internet adoption (Figure 3).
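A minimal sketch of clustering follows, using k-means, which is one common algorithm and is an assumption here since the text does not say which method produced Figure 3. The six points are made-up (housing density %, broadband %) pairs, not ACS values, and the starting centroids are chosen by hand so the result is reproducible.

```python
from math import dist


def k_means(points, initial_centroids, iterations=10):
    """Group points by repeatedly assigning each to its nearest centroid
    and moving each centroid to the mean of its assigned points."""
    centroids = list(initial_centroids)
    for _ in range(iterations):
        # Assignment step: attach every point to its closest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: recompute each centroid as the mean of its members.
        new_centroids = [
            tuple(sum(coord) / len(members) for coord in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # Assignments have stopped changing.
        centroids = new_centroids
    labels = [min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
              for p in points]
    return labels, centroids


# Hypothetical observations: (percent in 20+ unit housing, percent with broadband).
points = [(60, 90), (55, 85), (65, 88), (20, 60), (15, 65), (25, 58)]
labels, centroids = k_means(points, initial_centroids=[points[0], points[3]])

if __name__ == "__main__":
    print("cluster labels:", labels)
    print("cluster centers:", centroids)
```

The algorithm returns group memberships but no judgment about whether the groups are meaningful. Deciding that one cluster represents "high density, high broadband" is left to the analyst.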
Reinforcement learning aims to solve problems with a long-term goal that can be decomposed into a series of short-term decisions, each with clear signals for success or failure. This approach made headlines in 2016 when DeepMind’s AlphaGo defeated 18-time international Go champion Lee Sedol in four of five games. Games such as Go, with simple rules and objectives but requiring long-term planning and strategy, represent an excellent test-bed for these learning tools. These tools have since made the leap from simulated environments to the real world in the form of autonomous driving systems produced by Tesla and Waymo and dexterous robotic manipulation arms developed by Google and OpenAI.
Reinforcement learning represents a major methodological departure from supervised and unsupervised learning. As part of the process of training a reinforcement learning tool, the data are generated on the fly, either in a simulated environment or in some real-world context. While this necessarily limits their applicability, it gives the model the ability to test out a wide range of approaches to solve a problem, rewarding or discouraging certain emergent behaviors along the way.
These tools place a decision-making “agent” (the player in a simulated game or the driver of an autonomous vehicle) in some sort of environment with available actions that it can take in service of some goal, allowing it to make decisions on its own and giving it feedback on how it’s doing. Early reinforcement learning tools were explicitly designed with games in mind. An agent would play thousands of games of checkers, for example, receiving a “reward” when it captured an opponent’s piece, and a “punishment” when it lost one of its own. Instead of providing the model with a collection of observations and using it to predict the expected outcome of some target variable, the analyst instead defines some objectives for the agent (winning the game and capturing pieces) to optimize and lets it learn on its own.
In repeated iterations with different starting conditions, this agent learns which decisions lead to the highest reward. Due to their ability to learn quickly in a variety of environments, and with relatively little supervision other than defining short-term objectives, these tools are extremely useful in solving a wide array of industry challenges that would otherwise be intractable using more traditional supervised learning or rule-based approaches. However, due to the massive amount of simulated data required and the high costs of capturing safe real-world data, there are fewer use-cases for reinforcement learning than for supervised and unsupervised learning.
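The agent, environment, action, and reward loop described above can be sketched with tabular Q-learning, a standard introductory reinforcement learning method chosen here for illustration. Everything in the example is hypothetical: the environment is a five-cell corridor in which the agent is rewarded only for reaching the rightmost cell, and the learning rate, discount factor, and exploration rate are arbitrary choices.

```python
import random

N_STATES = 5                 # Positions 0..4 in a corridor; position 4 is the goal.
ACTIONS = ("left", "right")


def step(state, action):
    """Environment: move one cell and pay a reward of 1.0 only at the goal."""
    nxt = max(0, state - 1) if action == "left" else state + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Learn action values by trial and error over many short episodes."""
    rng = random.Random(seed)
    q = {s: {a: 0.0 for a in ACTIONS} for s in range(N_STATES)}
    for _ in range(episodes):
        state = rng.randrange(N_STATES - 1)      # Vary the starting condition.
        for _step in range(100):                 # Cap the episode length.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)     # Occasionally explore.
            else:
                best = max(q[state].values())    # Otherwise act greedily,
                action = rng.choice(             # breaking ties at random.
                    [a for a in ACTIONS if q[state][a] == best])
            nxt, reward, done = step(state, action)
            # Q-learning update: nudge the estimate toward the reward plus
            # the discounted value of the best action in the next state.
            target = reward + (0.0 if done else gamma * max(q[nxt].values()))
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
            if done:
                break
    return q


q_table = train()
policy = {s: max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)}

if __name__ == "__main__":
    print("learned policy:", policy)
```

The training data here are generated by the agent's own actions rather than supplied up front, and the analyst specifies only the reward. After training, the learned policy should prefer moving right from every starting cell, even though the agent was never told where the goal is.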
These explainers are meant to serve as a foundation for BPC’s upcoming AI case study series, which will introduce examples of machine learning in the real world, the challenges that arise from their use, and some proposed social and technological solutions to these challenges. By making machine learning and artificial intelligence accessible to a policy audience, BPC hopes to help preserve their innovative potential without eroding the civil liberties and rights of American citizens.
In the next two explainers in this series, BPC will introduce the differences between conventional machine learning and deep learning, as well as the downstream issues that arise from this increase in complexity.