Cathy O’Neil’s essay, like her engaging book Weapons of Math Destruction, provides a valuable counterweight to our tendency to be overawed by imposing mathematics, and to grant the products of machine learning and big data analysis an unexamined aura of objectivity. Since my own focus is on privacy and surveillance policy, I’ll even add a few items to her bill of indictment.
One of O’Neil’s central concerns about machine learning is its potential to generate vicious feedback loops: Subsets of the population are algorithmically branded “high risk”—whether for default on loans or criminal recidivism—and that judgment, echoed across multiple institutions employing similar algorithms, ultimately contributes to the fulfillment of its own prophecy, seemingly validating the model that generated it. But there’s also a potential feedback loop on the input side, as the prospect of reaping gains—commercial or otherwise—from sophisticated algorithmic analysis generates demand for more data to train and feed ever more complex models.
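To make that output-side loop concrete, here is a deliberately toy simulation. All of the numbers are invented for illustration rather than drawn from O’Neil’s examples: a “high risk” label assigned essentially at random worsens the outcomes of those who receive it, and the resulting data then appears to vindicate the label.

```python
import random

# Toy illustration (invented numbers): a "high risk" label that itself worsens
# outcomes will appear to be validated by the very data it helps produce.
random.seed(0)

TRUE_DEFAULT_RATE = 0.10   # everyone starts with the same underlying risk
LABEL_PENALTY = 0.15       # being branded high-risk (worse loan terms, fewer
                           # options) adds this much to the real default rate

population = [{"flagged": random.random() < 0.3} for _ in range(100_000)]

for person in population:
    p = TRUE_DEFAULT_RATE + (LABEL_PENALTY if person["flagged"] else 0.0)
    person["defaulted"] = random.random() < p

def default_rate(group):
    return sum(p["defaulted"] for p in group) / len(group)

flagged = [p for p in population if p["flagged"]]
unflagged = [p for p in population if not p["flagged"]]

# The flagged group now defaults far more often, "confirming" a label that
# was assigned at random and did all the causal work itself.
print(f"default rate, flagged:   {default_rate(flagged):.3f}")
print(f"default rate, unflagged: {default_rate(unflagged):.3f}")
```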
Two related technological developments are jointly responsible for much of that algorithm food: The precipitous decline in the cost of data storage—to the point where even seemingly useless data can be stored indefinitely at trivial expense—and the explosive growth of networked computing technologies that generate structured data records of the human actions and interactions they mediate as a side effect of their ordinary operation. As big data analytics provide a means of monetizing information, data that was once, in effect, a digital waste product—granular records of how the reader’s mouse moves around a Web page, say—is increasingly retained, either for internal use or for sale (perhaps in theoretically anonymized form) to data brokers. If storage is cheap enough, it may be worth simply defaulting to retaining nearly everything, on the theory that even if it’s not useful now, some analytic use may later be found for it.
Nor, increasingly, is that approach limited to the digital realm. Cell phones are networked, sensor-enabled devices that can be used to amass a dizzying array of useful types of data that would until recently have been infeasible to gather at scale—allowing driving apps like Waze to make better predictions about traffic patterns and delays based on historical user data. In spaces like Walt Disney World, foot traffic is monitored just as meticulously via electronic bracelets that serve as both tickets and trackers. Retail goods now routinely arrive on the shelf bearing RFID tags to help give brick-and-mortar shops the same analytic insights into consumer behavior that their online counterparts take for granted.
Most of this is benign enough in itself. It’s precisely because the data collected has value that we’ve grown accustomed to getting valuable online services for free, and few Disney visitors mind having their stroll through the park tracked if it helps Big Rodent provide a better experience. But in the aggregate, the imperative to feed all those hungry algorithms generates both massive pools of data and, perhaps more importantly, pervasive architectures of surveillance that can subsequently be repurposed—whether by their owners, malicious attackers, or law enforcement and intelligence agencies. Security experts routinely recommend minimizing the retention of unnecessary data, both to reduce the attractiveness of databases to attackers and to mitigate the harms of a breach if one does occur. Exploiting big data, more or less by definition, means doing the reverse.
Thanks to a quirk of American jurisprudence, personal information effectively loses its presumption of Fourth Amendment protection once it has been shared with third-party businesses. That’s why, to pick a notorious example, the NSA’s bulk collection of Americans’ telephone records, publicized by Edward Snowden, did not require the kind of particularized search warrant that would be required to enter a home and rifle through private papers. That constitutional asymmetry means that as ever more useful data is collected by private firms, intelligence and law enforcement naturally gravitate toward investigative methods that exploit those resources when possible, sometimes to establish the probable cause needed to seek judicial authorization for a physical search or electronic communications surveillance—sometimes obviating the need to do so altogether. That temptation is particularly strong for intelligence agencies, whose mandate is not to punish particular crimes after the fact, but to anticipate and preempt terrorism or espionage before they occur.
At NSA, that led to the self-conscious adoption of a “collect it all” approach—presumably on the theory that if the attacks of 9/11 represented a failure to “connect the dots,” the solution was to collect more dots. But, as Jim Harper and Jeff Jonas argued in a Cato policy paper more than a decade ago, terrorism is rare enough and its manifestations mutable enough that data mining approaches are sure to yield vastly more false positives than true hits. In the years after 9/11, exasperated FBI agents were known to complain about time and resources wasted following up “Pizza Hut leads” generated by the intelligence community—because, say, a phone number cropping up in the call records of a seemingly suspicious number of terror suspects turned out to be the local pizza parlor.
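The arithmetic behind those wasted resources is worth spelling out. The numbers below are purely illustrative assumptions of my own (the population size, base rate, and error rates are not Harper and Jonas’s figures), but they show the structure of the problem: when genuine targets are vanishingly rare, even an impressively accurate screen produces overwhelmingly false alarms.

```python
# Illustrative base-rate arithmetic (all figures invented): even a remarkably
# accurate screen, applied to a population in which true targets are vanishingly
# rare, yields mostly false alarms.
population = 300_000_000      # people whose records are scanned
true_targets = 300            # assumed number of actual would-be attackers
sensitivity = 0.99            # fraction of real targets the model flags
false_positive_rate = 0.001   # fraction of innocents mistakenly flagged

true_hits = true_targets * sensitivity
false_alarms = (population - true_targets) * false_positive_rate
precision = true_hits / (true_hits + false_alarms)

print(f"flagged in error:   {false_alarms:,.0f}")
print(f"flagged correctly:  {true_hits:,.0f}")
print(f"share of flags that are real threats: {precision:.4%}")
# Roughly 300,000 false alarms against fewer than 300 genuine hits: on the
# order of one real lead per thousand flags, before any analyst follows up.
```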
One solution to this problem, of course, is to gather yet more data, in hopes of refining one’s algorithms and adding additional variables that help exclude false positives. But the data sets necessary to do this are often enormous. Consider, for instance, an NSA program known as CO-TRAVELLER, which seeks to map the social networks of foreign targets, not by looking at electronic communications links, but by using cell phone location data to identify people who are meeting up in person. The trouble, of course, is that you can’t do this by looking at the records of your target, which won’t have an entry for “people nearby.” Rather, you need to analyze everyone’s location records and find statistically anomalous pairings in the sea of human motion. But since even that is bound to generate a substantial number of purely coincidental matches, you likely need to consult still more data sets to figure out which of the leads thus generated are promising and which are innocuous.
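To see why such a program requires everyone’s records rather than just the target’s, here is a hypothetical sketch of the underlying co-occurrence analysis. It is not NSA code, and the data format is invented; it simply illustrates bucketing location pings by place and time and counting which pairs of devices repeatedly turn up together.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical sketch of co-traveler analysis: not any agency's actual code.
# Each record is (device_id, cell_tower_id, hour_bucket). Finding who a target
# meets in person requires bucketing *everyone's* pings, not just the target's.

def co_occurrence_counts(records):
    """Count how many distinct (tower, hour) buckets each pair of devices shares."""
    buckets = defaultdict(set)
    for device, tower, hour in records:
        buckets[(tower, hour)].add(device)

    pair_counts = defaultdict(int)
    for devices in buckets.values():
        for a, b in combinations(sorted(devices), 2):
            pair_counts[(a, b)] += 1
    return pair_counts

# Toy data: C repeatedly shows up wherever A does; D appears near A only once.
records = [
    ("A", "tower1", 9),  ("C", "tower1", 9),
    ("A", "tower7", 14), ("C", "tower7", 14),
    ("A", "tower3", 20), ("C", "tower3", 20), ("D", "tower3", 20),
    ("B", "tower9", 9),
]

for pair, n in sorted(co_occurrence_counts(records).items(), key=lambda kv: -kv[1]):
    print(pair, n)
# The repeated pairing (A, C) stands out; the single (A, D) overlap is exactly
# the kind of coincidental match that then demands still more data to rule out.
```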
Intelligence analysts sometimes refer to this as “drinking from a firehose”—the quantity of data is so enormous that sifting through it becomes overwhelming. But this complexity poses challenges for oversight as well. If we think back to the political abuses of intelligence in the 1960s and ’70s, we find instances where misconduct could be identified by observing that a wiretap or other method of data collection had been carried out on a politically sensitive target without any clear legitimate purpose, or had continued even after a legitimate investigation had been wrapped up. When data is collected at a population scale, abuse can far more easily hide in the crowd.
That’s especially true when that same size and complexity make “innocent” violations of the rules more common. In documents disclosed in 2013, the government explained to the Foreign Intelligence Surveillance Court that rules for querying the NSA’s vast telephone database had been routinely violated, not as a result of malfeasance, but because the system was so vast, complex, and compartmentalized that no single person within the agency fully understood how the pieces fit together.
Having made this case, I should note that while the dynamic I’ve described here is exacerbated by the rise of machine learning and data mining, that is hardly the sole factor, and it would be somewhat odd to focus on the algorithms if our aim is to remedy privacy problems. And I think something similar can be said of many of the problems O’Neil discusses in her essay and book.
One section of Weapons of Math Destruction, for instance, deals with the ways predatory businesses, and even outright fraudsters, can employ large data sets to target their ads at vulnerable populations. But this is hardly a problem with targeted marketing—or with the telephone and e-mail networks those businesses use to reach their targets. It’s just another instance of technologies with general utility regrettably facilitating the conduct of bad actors as well as good ones.
A more plausible case of a harm originating with algorithms comes in a section where O’Neil considers how adjusting car insurance premiums based on risk—does the policyholder live or work in, or commute through, areas with a heightened risk of theft or vandalism?—compounds the burdens on the poor. But O’Neil doesn’t really make a case that these assessments are systematically wrong, so much as that they seem unfair: The poor may have no option but to live in higher risk neighborhoods, and being faced with higher premiums as a result is just one more burden. And that may be true, but if we think there’s a social obligation to aid those who lose out as a result of such risk analysis, it’s not clear why the sensible remedy is to try to shift real risk to insurance companies, obscuring market assessments of that risk in the process.
In other cases, algorithms don’t so much introduce problems as make them harder to ignore. If racial and gender bias influences hiring and promotion, we may be able to detect the problem by either experiment or analysis of labor force statistics, but it’s notoriously difficult to remedy, especially when the bias is subconscious, since the problem is the upshot of millions of people at thousands of businesses making discrete decisions that may well seem individually defensible. If those hiring practices are used to train an algorithm for screening employment applications, we at last have a single target to train our disapproval on. And often, of course, we should—provided we recognize that an exposed algorithmic bias is likely an improvement over the scenario where the same biases are hidden in the black-box algorithm that is human judgment. Precisely because algorithmic biases are easier to monitor and tweak than their human counterparts, we should be wary of approaches that make it less costly to fall back on human judgment as a means of concealing rather than remedying those biases.
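As an illustration of what “easier to monitor” can mean in practice, here is a minimal sketch of the sort of disparate-impact check an auditor might run over a hiring model’s recommendations. The data, the group labels, and the four-fifths threshold used here are my own illustrative assumptions, not anything O’Neil prescribes.

```python
# Minimal sketch of a disparate-impact audit on a hiring model's outputs.
# The records and the 0.8 ("four-fifths") threshold are illustrative assumptions.
from collections import defaultdict

decisions = [
    # (applicant_group, model_recommended_hire)
    ("group_a", True), ("group_a", True), ("group_a", False), ("group_a", True),
    ("group_b", False), ("group_b", True), ("group_b", False), ("group_b", False),
]

totals = defaultdict(int)
hires = defaultdict(int)
for group, hired in decisions:
    totals[group] += 1
    hires[group] += hired

rates = {g: hires[g] / totals[g] for g in totals}
baseline = max(rates.values())
for group, rate in rates.items():
    ratio = rate / baseline
    flag = "  <-- below four-fifths threshold" if ratio < 0.8 else ""
    print(f"{group}: selection rate {rate:.2f}, ratio to top group {ratio:.2f}{flag}")
```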
Instances where algorithmic analysis of big data simply produces wrong results seem like the easiest case, and the one where the case for some external policy remedy is weakest. The loss that results from an algorithm that tells a bank to pass on a loan that would have been repaid, or an employer to pass on an applicant who would have performed well, may not be large enough to spur a reweighting in the individual instance, but over time—and especially for larger firms—any truly systematic analytic failure, affecting significant numbers of people, is likely to impose enough of a cumulative cost that old-fashioned greed in a competitive market provides adequate motivation for gradual improvement.
The toughest cases are apt to be of the sort we considered at the start: Where algorithms are widely enough used that they generate feedback loops that seem to validate their own predictions. Even here, though, the problem is often somewhere other than the algorithm. If a model for predicting recidivism results in longer sentences for some convicts, and the longer sentence itself increases the likelihood of recidivism upon release—because the prisoner has now spent years in a criminal social milieu and their marketable skills have atrophied—that’s a serious problem, but it’s fundamentally a problem with the carceral state, and one that would seem to recommend less punitive sentences across the board, not a tweaking of the process by which they’re allocated.
The feedback loop cases also seem like the ones least amenable to remedy by regulatory intervention in the market context because, barring a monopoly, the problem will be less a function of any one process than of the way many of them interact over time. Problems of this type may be quite serious, but they’re also the least likely to be spotted in advance.
That’s why, despite my sympathy with much of the argument O’Neil advances, I’m not terribly sanguine about the idea of giving some regulatory body responsibility for ex ante review of the algorithms deployed by private firms. The biggest downside to such an approach, I suspect, would not be the monetary expense so much as the friction imposed when problems that were unforeseeable from scrutiny of code in isolation become evident in the wild. It would be counterproductive to make denying a problem for as long as possible less expensive than tweaking the algorithm, yet that seems unavoidable if we impose a layer of regulatory review on each iteration of the trial-and-error process of refinement.
More promising, in my view, would be an intermediate approach: Encourage firms to let outside researchers review their models with an eye toward identifying potential harms that may not reduce short-term profits, and thus are more likely to fly beneath the radar of in-house coders. If there are systematic problems that seem to arise in particular sectors, that should prompt a discussion about sector-specific solutions—armed with the essential general insights O’Neil has provided.