The ABCs of Machine Learning: Privacy and Other Legal Concerns

It is 2035. Police investigate a crime that has allegedly been perpetrated by a robot. This is just the beginning of the grave threat faced by humankind in the age of Robot vs. Man. Will the machine-learning algorithms we write today one day transform the plot of 2004’s hit film I, Robot starring Will Smith into our reality? Will we have failed to develop adequate security protocols and lost control of our artificially intelligent bots? Will we have surrendered so much of our privacy that machines will know us better than we know ourselves?

Probably not.

More likely, we will have to balance our desire to explore and advance new technologies against the aspects of life we wish to retain and protect, namely our data privacy and security, by holding the parties that propagate, use and promote artificial intelligence accountable through legislation. To understand where we stand in terms of the future capabilities of machine learning, and of artificial intelligence as a whole, we need to understand what it is and where it comes from.

What Is It?

Artificial intelligence (AI) describes the simulation of human intelligence by computers. The term was coined in 1956 by computer and cognitive scientist John McCarthy (widely regarded as the father of AI). It is an umbrella term covering the processes of learning, reasoning and self-correction that computers carry out based on algorithms and rules a human inputs into the system. Machine learning is a form of AI whose development has seen growing momentum and investment from the private and public sectors alike. It focuses on giving computers the ability to learn and relearn when exposed to new data, without being reprogrammed each time a dataset changes. Instead of chewing on data and spitting it out for a human to act on, a machine learning system chews on the data, digests it, and then adjusts its own programmatic actions according to the data it was fed.
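To make "learning and relearning without being reprogrammed" concrete, here is a minimal sketch in Python. It uses a perceptron, one of the simplest learning algorithms, chosen purely for illustration; the class name OnlinePerceptron and the toy data are assumptions, not anything taken from a real product. The same code first learns one pattern and then, when the data changes, learns the opposite pattern without a single line being rewritten.

```python
class OnlinePerceptron:
    """Toy learner (hypothetical name) that updates itself one example at a time."""

    def __init__(self, n_features, learning_rate=1.0):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, features):
        # Weighted sum of the inputs; a positive score means "yes" (1).
        score = self.bias + sum(w * x for w, x in zip(self.weights, features))
        return 1 if score > 0 else 0

    def learn(self, features, label):
        # Adjust the weights only when the current prediction is wrong.
        error = label - self.predict(features)
        if error != 0:
            step = self.learning_rate * error
            self.weights = [w + step * x for w, x in zip(self.weights, features)]
            self.bias += step


def train(model, examples, passes=5):
    # Feed the same examples to the model several times.
    for _ in range(passes):
        for features, label in examples:
            model.learn(features, label)


model = OnlinePerceptron(n_features=1)

# First dataset (made up): positive inputs are labeled 1, negative inputs 0.
train(model, [([1.0], 1), ([-1.0], 0)])
before = model.predict([1.0])

# The data changes: the labels are now reversed. No code is rewritten;
# the model simply relearns from the new examples.
train(model, [([1.0], 0), ([-1.0], 1)])
after = model.predict([1.0])

print(before, after)
```

The point of the sketch is that the behavior lives in the learned weights rather than in hand-written rules, which is also why, later in this article, it becomes hard to say in advance what such a system will do with new data.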

For example, when Instagram discovered that its growth was stagnating (fewer people were subscribing, and existing users were not posting as much), the company turned to machine learning to reengage users. Instagram’s algorithm focused on learning what content a user was viewing, to determine what other, similar content they might be interested in. It worked. Instagram used machine learning a second time when it realized that users were only looking at 30% of their feed. To retain and engage those users, machine learning algorithms place the “best” 30% of a user’s feed at the top, so that feed quality always keeps the user coming back for more. This use of machine learning seems quite effective and, some may even say, considerate. By weeding out content that a user is unlikely to interact with, the algorithm gives him or her a much more meaningful social media experience.
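The idea of putting the "best" 30% of a feed at the top can be sketched in a few lines of Python. This is only an illustration under loud assumptions: Instagram's actual ranking system is not described in this article, and the function name rank_feed, the use of topic view counts as an "interest" signal, and the sample data are all hypothetical.

```python
import math
from collections import Counter


def rank_feed(candidate_posts, viewing_history, top_percent=30):
    """Order posts by how often the user has viewed each post's topic.

    Hypothetical stand-in for a learned interest model: a real system
    would use far richer signals than simple topic counts.
    """
    topic_views = Counter(viewing_history)
    # sorted() is stable, so posts with equal scores keep their original order.
    ranked = sorted(
        candidate_posts,
        key=lambda post: topic_views[post["topic"]],
        reverse=True,
    )
    cutoff = math.ceil(len(ranked) * top_percent / 100)
    return ranked, ranked[:cutoff]


# Made-up viewing history and candidate posts.
history = ["dogs", "dogs", "travel", "travel", "dogs", "food"]
topics = ["food", "dogs", "cars", "travel", "dogs",
          "news", "dogs", "food", "cars", "news"]
posts = [{"id": i, "topic": topic} for i, topic in enumerate(topics)]

ranked, top_of_feed = rank_feed(posts, history)
print([post["id"] for post in top_of_feed])
```

Even this toy version shows the privacy trade-off raised next: the ranking only works because the system keeps and analyzes a record of what the user has viewed.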

But at what cost?

Privacy Impacts of Machine Learning

One of the most common current uses of machine learning is facial recognition. Upload a photo to Facebook, and it will ask you if you would like to tag your friends, identifying them by name. Nothing is inherently wrong with the technology; it has been around for some time now, particularly for crime investigation. However, without the proper security safeguards, there is a heightened risk of such technology falling into the hands of nefarious actors among the general public. For example, a Russian application called FindFace can be used on Russia’s version of Facebook, with open access. This means that a user can be subject to search and identification through FindFace by any third party, regardless of whether they are connected on the social media platform. Take a moment to think about the privacy settings you’ve put in place on your social media profiles, and your reasoning for doing so. Now imagine that, regardless of how you customized your privacy settings, your profile is up for grabs by anyone with an account. Your privacy is but an illusion at this point.

A recent study at Cornell University concretely demonstrates how machine learning could directly attack one’s privacy. The researchers applied basic algorithms (i.e., less complex than those used commercially by the likes of Facebook and Google) to identify people in blurred, pixelated and encrypted images. They were able to “show how to train artificial neural networks to identify faces and recognize objects and handwritten digits” with 71% accuracy. For comparison, human accuracy was 0.19%. The research was conducted to warn privacy and security professionals about the threat machine learning poses to privacy. Blurred and pixelated images have long been used in the media to obfuscate identities and objects. License plate numbers are routinely blurred out, as are the faces of victims of particularly violent or graphic crimes. This technology could be used to work around the blurring tools put in place by social media platforms like YouTube, which shield protesters living under oppressive governments that do not take kindly to protest and civil disobedience.

Speech and handwriting recognition algorithms present similar concerns. On the one hand, stroke victims and others unable to speak or write naturally stand to benefit tremendously from technology that can learn and relearn a user’s behavior, preferences and patterns. On the other hand, when bad actors intervene, the same technology can be used to forge signatures, and cause dangerously incorrect things to be “said.”

What Should We Expect From the Law?

It is too soon to know what privacy legislation around machine learning will look like. So far, we only have a sense that machine learning will be part of many of the technologies currently being developed, and of those to come. The White House and the Federal Trade Commission have both released reports that function as stage-setting roadmaps rather than concrete policy. The reports address the rise of big data, the use of large datasets as fodder for machines hungry to learn, and what this may mean for the economy, the advancement of science and medicine, and the risk to consumers’ privacy. In preparing for the inevitable slew of legislation around machine learning, and AI in general, here are a few areas that lawyers and other privacy professionals will want to think about.


In 2014, Google bought a UK-based AI company called DeepMind. DeepMind’s aims are to apply AI for good, to “tackle some of our most pressing real-world challenges [f]rom climate changes to the need for radically improved healthcare…” The company’s AlphaGo rose to fame in 2016, when it defeated one of the world’s best competitive Go players. Just this May, AlphaGo played its last competitive game of Go, beating the best competitive Go player 3-0 and settling once and for all that AI is far more advanced than many scientists thought. With the confidence that its algorithms clearly work, DeepMind is ready to work on the real-world challenges in science and medicine it set out to solve. The combination of DeepMind’s powerful technology and Google’s vast access to virtually all types of data could lead to scientific breakthroughs and medical achievements at unheard-of speeds. But as the old adage goes, corporations will be corporations, and bottom lines may mean more than breakthroughs. There is no telling whether Google, if it proved more profitable, would sell medical insights to insurance companies looking to deny coverage to the masses rather than to research hospitals and experimental treatment facilities.

This is why, as a condition of its acquisition, DeepMind required Google to appoint an ethics board to monitor how its AI capabilities were used. Demand for contracting for an ethics board, or some other checks-and-balances function, is likely to increase as machine learning becomes more nuanced and its impacts on our data privacy are tangibly felt. However, while DeepMind set the precedent for an AI ethics board, the board’s members and operations remain largely a mystery. The Partnership on AI is a new organization “established to study and formulate best practices on AI technologies, to advance the public’s understanding of AI, and to serve as an open platform for discussion and engagement about AI and its influence on people and society.” Both DeepMind and Google, as well as Facebook, Amazon, Apple, IBM, and Microsoft, are listed as founding partners of the organization. These companies are in the best position to know how AI, and the machine learning industry in particular, will evolve and grow. It remains to be seen what type of information the Partnership on AI will produce and how valuable it will be, and whether the banding together of AI leaders will increase accountability and ethical responsibility.

Job Creation, Loss, and Efficiency

Ed Felten, a former deputy U.S. chief technology officer, has said that “AI will create new jobs while phasing out some old ones,” suggesting that the solution lies in preparing the incoming workforce through the education system rather than in solving employment issues with legislation. For some fields, job displacement is on the horizon. Self-driving cars are here, with companies like Uber and Alphabet Inc.’s Waymo making headlines for their pioneering achievements as well as heated trade secret wars. Self-driving vehicles are set to transform the transportation and logistics industries, and a skills development or training program is likely to have more impact on the economic situation of drivers than legislation to curb the use of AI.

Other professions have embraced AI, as it does not pose the same type of threat to job security. For example, it is unlikely that machine learning will completely displace attorneys and doctors. Rather, it will remove some functions from those professions, such as determining how a court has generally ruled on a particular matter in the past, or finding the current proper citation of a regulation as amended, and it will force more tailored legal reasoning and more creative approaches to problem solving. Machine learning is also likely to increase transparency and access to typically high-cost services by cutting down on the research and information management done by paralegals, nurses and residents.


Industry experts expect machine learning to be applied in fields that rely on confidentiality, namely the legal and medical fields. Conceivably, an AI machine could scan your grocery shopping behaviors coupled with your exercise routine, sleeping patterns, stress levels and preexisting health conditions, to diagnose a particular medical experience or condition, without a trip to the doctor’s office. IBM has already partnered with a few large law firms to introduce Watson (its AI bot) to the legal field, largely to cope with the complex structure of legal knowledge held in state and federal statutes, regulations, treatises, contracts and case law.

It will be important to understand how liability is apportioned in the case of an incorrect medical diagnosis, or whether attorney-client privilege is breached by using Watson or other AI in the practice of law. For example, who takes ownership of a medical diagnosis delivered to a patient by a system employing machine learning? The doctor, who was ultimately not consulted because the system saved the patient a trip to the hospital, may escape unscathed. But the manufacturer and developers of the AI system may also be free of fault, as the computer is, by definition, learning and relearning based on datasets, in ways unknown to the programmers.

Edith Ramirez, the former FTC chairwoman, has said that there may be “no traditional way for operators to make disclosures about what information they are collecting and how they will use it” because there is no way to know how and what a machine learning system will learn.

Should liability fall, then, on the parties that controlled the datasets in the first place? It seems a stretch, as the data is likely to be raw, unanalyzed, and without context or meaning, and therefore to carry no reasonable expectation of liability until the machine applies context itself. Laws will likely also have to address incorrect predictions, diagnoses or results that stem from bad judgment on the part of the machine, as distinguished from those that stem from manipulation of the data to create bias, whether malicious or not.

About the Author

Zainab Hussain is an attorney with Foundry Law Group in Seattle, where she works with startups and growth companies on IP, data privacy and general business issues. Contact her on Twitter @FoundryLaw.
