Microsoft Highlights Its Artificial Intelligence Efforts at Build 2020
Competition for the hearts and minds of developers has never been more relevant, which is why Microsoft Build is always a vital event for the company and the whole industry.
Build 2020 was no different in that regard, although this year the event took on a different feel thanks to its digital format, a result of the Covid-19 crisis. The show wasn’t short of news, especially in artificial intelligence (AI), where privacy and responsible machine learning took centre stage this year.
Much of the existing narrative about responsible AI focusses on high-level areas like ethics, policy and establishing principles for the technology. These are important, but often too abstract to carry any real-world relevance or provide operational guidelines for developers.
By contrast, Build saw a much deeper focus on the technical tools and practices that help AI practitioners build and deploy machine learning responsibly from the outset. Moves in this area form part of a wider push by Microsoft into responsible AI this year, in particular into tools that enable effective governance of machine learning applications and processes.
Let’s take a closer look at the key announcements and some of Microsoft’s tools for responsible AI. They have vital implications for businesses and the industry over the coming 12 months.
The Shift to Responsible AI
Responsible AI is a combination of principles, practices and tools that enable businesses to deploy AI technologies in their organizations in an ethical, transparent, secure and accountable way.
The area has been getting a lot of attention recently as more decision-makers consider introducing data and AI solutions in mission-critical and regulated areas such as finance, security, transportation and healthcare. Additionally, concerns are mounting about the ethical use of AI, the risks inherent in biased data and a lack of interpretability in the technology, as well as the potential for malicious activity such as adversarial attacks.
For these reasons, the governance of machine learning models has become a top priority for enterprises investing in such systems. A survey of senior IT decision-makers in 2019 by CCS Insight indicated that the two most important requirements when investing in AI and machine learning technology were the level of transparency of how systems work and are trained, and the ability of AI systems to ensure data security and privacy. These two requirements were cited by almost 50% of respondents.
Getting Practical with Tools for Responsible Machine Learning
One of the major areas on show at Build this year was Microsoft’s expanding portfolio of tools that help data scientists, machine learning engineers and developers get hands-on experience of responsible AI. These tools are available in open source and on Azure, and will soon be natively integrated into Azure Machine Learning.
Microsoft is focussed on building trust and transparency into the entire life cycle of machine learning, from data acquisition to modelling and deployment. Its tools focus on three principal areas — protect, control and understand — which saw several major announcements.
Protect
This area addresses scenarios in machine learning that involve sensitive information or privacy requirements such as using personal health or census data. The data security, privacy and compliance capabilities of the Azure cloud are fundamental to Microsoft’s efforts, along with Azure Machine Learning, its platform for the operationalization of the technology, from training to deployment and monitoring.
One of the notable moves at Build focused on this area and specifically on differential privacy. Differential privacy is a class of algorithms that facilitate computing and statistical analysis of sensitive, personal data while ensuring the privacy of individuals isn’t compromised. Microsoft unveiled WhiteNoise, a library of open-source algorithms that enable machine learning on private, sensitive data.
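To give a flavour of how this works, below is a minimal sketch of the Laplace mechanism, one of the core building blocks behind differential privacy. It illustrates the concept only; it isn’t the WhiteNoise API, and the function name and bounds are my own for illustration.

```python
# Minimal sketch of the Laplace mechanism: add noise calibrated to the
# query's sensitivity so one individual's record changes the output
# distribution only slightly. Conceptual only, not the WhiteNoise API.
import numpy as np

def dp_mean(values, lower, upper, epsilon):
    """Return an epsilon-differentially-private mean of bounded values."""
    values = np.clip(values, lower, upper)   # enforce the assumed bounds
    n = len(values)
    # One record can shift the mean by at most (upper - lower) / n
    sensitivity = (upper - lower) / n
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
    return values.mean() + noise

ages = np.array([34, 45, 29, 61, 52, 38, 47, 55])
print(dp_mean(ages, lower=18, upper=90, epsilon=1.0))
```

A smaller epsilon means more noise and stronger privacy; choosing that trade-off for a given data set is exactly the kind of decision libraries like WhiteNoise aim to make safer.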
As one of the strongest guarantees of privacy available, differential privacy algorithms are being adopted in several areas today. The US Census Bureau uses them to analyse demographic information and the likes of Apple, Google and Microsoft employ the technology to analyse user behaviour in their operating systems.
In 2019, Microsoft partnered with Harvard University’s Institute for Quantitative Social Science to develop an open-source platform for sharing private data with differential privacy, aiming to bring more researchers into the field. So far, however, adoption in enterprises has been minimal; with the release of WhiteNoise, Microsoft is aiming for more organizations to begin using the algorithms for machine learning on sensitive data.
Another major announcement was the unveiling of efforts to support confidential machine learning, coming to customers later this year. This enables models to be built in a secure environment where the data remains confidential and can’t be seen or accessed by anyone, including the data science team. All machine learning assets, including the inputs, models and derivatives, are kept confidential.
The capability adds to Microsoft’s approach to building models over encrypted data, following its 2018 open-source release of the Simple Encrypted Arithmetic Library (SEAL), a homomorphic encryption library that allows computations to be performed directly on encrypted data.
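To illustrate the principle of computing on encrypted data (not SEAL itself, which is a C++ library implementing far more capable lattice-based schemes), here’s a toy, deliberately insecure Paillier-style scheme in Python. A server holding only ciphertexts can add two values without ever seeing the plaintexts.

```python
# Toy additively homomorphic encryption (Paillier scheme) with tiny primes.
# Deliberately insecure; real systems use keys of 2048 bits or more.
from math import gcd
import random

def lcm(a, b):
    return a * b // gcd(a, b)

def keygen(p, q):
    n = p * q
    lam = lcm(p - 1, q - 1)
    g = n + 1
    mu = pow(lam, -1, n)        # inverse of L(g^lam mod n^2), which equals lam
    return (n, g), (lam, mu)

def encrypt(pub, m):
    n, g = pub
    n2 = n * n
    r = random.randrange(1, n)
    while gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(pub, priv, c):
    n, _ = pub
    lam, mu = priv
    n2 = n * n
    L = (pow(c, lam, n2) - 1) // n
    return (L * mu) % n

pub, priv = keygen(11, 13)            # toy primes for illustration only
c1, c2 = encrypt(pub, 20), encrypt(pub, 22)
c_sum = (c1 * c2) % (pub[0] ** 2)     # multiplying ciphertexts adds plaintexts
print(decrypt(pub, priv, c_sum))      # -> 42
```

Schemes like those in SEAL go much further, supporting multiplication and approximate arithmetic on encrypted values, but the core promise is the same: the party doing the computation never sees the data.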
Control
This area focusses on responsible development processes for machine learning, enabling them to be repeatable, reliable and accountable. Azure Machine Learning automatically tracks the lineage of data sets, so customers can maintain an audit trail of their machine learning assets, including history, training and model explanations, in a central registry. This gives data scientists, machine learning engineers and developers improved visibility and auditability in their workflows.
Microsoft also provided guidance on how custom tags in Azure Machine Learning can be used to implement datasheets for machine learning models, enabling customers to improve the documentation and metadata of their datasets. The guidance draws on Microsoft’s work with the Partnership on AI and its Annotation and Benchmarking on Understanding and Transparency of Machine learning Lifecycles (ABOUT ML) project, which aims to increase transparency and accountability in the documentation of machine learning systems.
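As a rough sketch of how datasheet-style tags might look in practice, the snippet below registers a dataset with custom tags using the Azure Machine Learning Python SDK (azureml-core). The workspace configuration, file path, dataset name and tag values are all hypothetical.

```python
# Hedged sketch: register a dataset in Azure Machine Learning with custom
# tags capturing datasheet-style documentation (provenance, intended use,
# known limitations). Assumes a local config.json for the workspace.
from azureml.core import Workspace, Dataset

ws = Workspace.from_config()
datastore = ws.get_default_datastore()

# Hypothetical file sitting in the workspace's default datastore
dataset = Dataset.Tabular.from_delimited_files(
    path=[(datastore, "loans/applications.csv")])

dataset = dataset.register(
    workspace=ws,
    name="loan-applications",
    description="Loan application records, 2018-2019",
    tags={
        "source": "internal CRM export",
        "intended_use": "credit-risk model training",
        "sensitive_features": "age, gender",
        "known_limitations": "under-represents applicants under 25",
    },
    create_new_version=True,
)
```

Because the tags live in the central registry alongside the lineage history, anyone auditing a model later can see not just which data it was trained on, but what that data was meant for.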
Understand
The third area focusses on tools that build a better understanding of machine learning models and that assess and mitigate unfairness. Microsoft has been investing heavily in tools for assessing model unfairness and improving explainability over the past few months; both are of particular importance to machine learning practitioners at the moment.
In 2019, Microsoft released Fairlearn, an open-source toolkit that assesses the fairness of machine learning models. At this year’s Build, Microsoft announced that it will integrate the toolkit natively into Azure Machine Learning in June. Fairlearn offers up to 15 “fairness metrics” against which models can be assessed and retrained. It also offers visual dashboards to show practitioners how a model performs for certain groups selected by the customer, such as age, gender or race. Microsoft plans to add to these capabilities as research in the field progresses.
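For a sense of how assessment and mitigation work in code, here’s a hedged sketch using the open-source Fairlearn package on synthetic data; exact API names vary between releases, and the Azure-integrated experience may differ.

```python
# Hedged sketch: assess a model's fairness per group, then retrain under a
# demographic parity constraint, using the open-source Fairlearn package.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from fairlearn.metrics import MetricFrame, demographic_parity_difference
from fairlearn.reductions import ExponentiatedGradient, DemographicParity

# Synthetic data with a sensitive feature (e.g. an age band)
rng = np.random.default_rng(0)
n = 1000
sensitive = rng.choice(["group_a", "group_b"], size=n)
X = rng.normal(size=(n, 4)) + (sensitive == "group_b")[:, None] * 0.4
y = (X[:, 0] + rng.normal(scale=0.5, size=n) > 0.4).astype(int)

# Assess: break a metric down by group of the sensitive feature
model = LogisticRegression().fit(X, y)
y_pred = model.predict(X)
frame = MetricFrame(metrics=accuracy_score, y_true=y, y_pred=y_pred,
                    sensitive_features=sensitive)
print(frame.by_group)   # accuracy per group
print(demographic_parity_difference(y, y_pred, sensitive_features=sensitive))

# Mitigate: retrain the model under a demographic parity constraint
mitigator = ExponentiatedGradient(LogisticRegression(),
                                  constraints=DemographicParity())
mitigator.fit(X, y, sensitive_features=sensitive)
y_mitigated = mitigator.predict(X)
print(demographic_parity_difference(y, y_mitigated,
                                    sensitive_features=sensitive))
```

The disparity metric typically shrinks after mitigation, usually at some cost to raw accuracy; surfacing that trade-off explicitly is precisely what the dashboards are for.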
Another important solution is InterpretML, a tool and set of interactive dashboards that use various techniques to deliver model explanations at inference time. For different types of model, InterpretML helps practitioners better understand the most important features in determining a model’s output, perform “what if” analyses and explore trends in the data. Microsoft also announced it was adding a new user interface equipped with a set of visualizations for interpretability, support for text-based classification and counterfactual example analysis to the toolset.
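Here’s a brief, hedged sketch of the open-source InterpretML package, training a glassbox Explainable Boosting Machine on synthetic data and surfacing global and local explanations; the feature names and data are invented for illustration.

```python
# Hedged sketch using the open-source InterpretML package: an Explainable
# Boosting Machine (a "glassbox" model) with global and local explanations.
import numpy as np
from interpret.glassbox import ExplainableBoostingClassifier
from interpret import show

# Synthetic "will this customer buy?" data; feature names are invented
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 4))
y = ((X[:, 0] + 0.5 * X[:, 1]) > 0).astype(int)

ebm = ExplainableBoostingClassifier(
    feature_names=["tenure", "monthly_spend", "site_visits", "returns"])
ebm.fit(X, y)

show(ebm.explain_global())              # which features matter most overall
show(ebm.explain_local(X[:5], y[:5]))   # why the model scored these customers
```

The same `show` dashboards also wrap black-box explainers for models that aren’t inherently interpretable, which is where the “what if” and counterfactual analyses come in.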
Visualizing Explainability for Customers
The emerging nature of these crucial areas in AI has made it challenging for some enterprises at an early stage of adoption to understand how the technologies work in practice, especially for their customers. Below is a photo I took at a recent Microsoft demonstration of InterpretML. It shows explainability in action: how a retailer could use the tool on its website, for example, to support its AI-driven product recommendations for consumers.
Microsoft has made a host of visual improvements in this area, particularly for data scientists and machine learning engineers. More importantly, in my view, it’s a great example of how businesses can build transparency and trust in AI with their customers, one of the biggest hurdles with the technology at the moment.
Trust — or a lack of it — in the technology has emerged as the biggest barrier to adoption of machine learning in enterprises. CCS Insight’s survey in 2019 found that 39% of IT decision-makers said trust was the biggest hurdle to adoption in their organization.
The Road Ahead
The announcements at Build 2020 follow a big push from Microsoft into responsible AI over the past 12 months and they put the firm ahead of many of its competitors in this area. Microsoft’s growing set of tools for machine learning engineers, data scientists and developers are some of the most comprehensive available, especially in the emerging areas of security and privacy.
But the firm can’t stand still; more work must be done to raise awareness of these tools, as adoption of responsible machine learning is still low. Microsoft needs to stand taller as a trusted advisor to practitioners and be more prescriptive with customers on the right approaches to deployment. It will also need more products in AI security, as well as further integration with GitHub and its low-code Power Platform, if it is to continue to lead the market in trusted AI.
What It Means for Practitioners
It’s early days for responsible AI, but Microsoft is demonstrating how it can help companies avoid problems and improve the performance and quality of the AI applications they deploy.
I’ve been asked many times during the past month whether the heightened pressure enterprises are now facing as a result of the Covid-19 pandemic will cause them to short-cut aspects like responsible machine learning in favour of getting pilots into production faster.
This is certainly a possibility, but in my opinion, people’s memories of the actions that enterprises take now will run much deeper than their memories of the better-planned projects that came before the pandemic or have yet to start. More organizations will therefore aim to get AI right during the crisis as well.
As practitioners get going in this area, here are a few final things to consider:
Prioritize the governance of AI early. One global bank I spoke to recently has just put in place a policy that no AI model can move into production without some interpretability and bias controls built into the life cycle of the application.
This is a fantastic approach. Embedding governance into the entire life cycle of machine learning helps to reduce problems later on and, above all, engenders confidence and trust in the AI that gets built. This ultimately leads to faster deployments, wider adoption and more responsible innovation. (A minimal sketch of what such a production gate might look like follows at the end of these considerations.)
Kjersten Moody, Chief Data and Analytics Officer at insurer State Farm, perhaps captures this best when she states: “As we introduce AI into our processes, we have to hold ourselves to the highest standard and we have to hold AI to that high or higher standard that we would hold our people to.”
Concentrate on bias and explainability. Although they are in their infancy, tools to counter potential unfairness in data and improve explainability in models are getting better. They are a good place to start in responsible AI. They will help to minimize any negative effects not only on customers but also on business processes, employees and the surrounding technologies that support AI.
Select a trusted provider. Customers I speak to care little about the conceit of algorithmic perfection from a supplier. Rather, they want to know that they are on solid foundations with a responsible provider as they advance their AI strategies.
Look to suppliers that prioritize this area, provide access to talent and best practices, and are transparent about their AI services. This includes ensuring you can extend your framework for responsible AI to models you consume from your provider but don’t own.
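Returning to the production gate mentioned earlier, here’s a minimal, hypothetical sketch of what such a check might look like in practice; the metric names, thresholds and structure are illustrative, not a standard.

```python
# Hypothetical illustration: a pre-production gate that refuses to promote
# a model unless explanations are attached and fairness metrics fall within
# agreed limits. Thresholds and metric names are invented for illustration.
def governance_gate(metrics: dict, max_dp_difference: float = 0.1) -> None:
    """Raise if the model fails the organization's responsible AI checks."""
    if not metrics.get("explanations_attached", False):
        raise RuntimeError("Blocked: no model explanations attached")
    dp = abs(metrics.get("demographic_parity_difference", float("inf")))
    if dp > max_dp_difference:
        raise RuntimeError(
            f"Blocked: demographic parity difference {dp:.3f} "
            f"exceeds the agreed limit of {max_dp_difference}")

# Example: metrics gathered during validation, e.g. with Fairlearn
governance_gate({"explanations_attached": True,
                 "demographic_parity_difference": 0.04})
print("Model cleared for production")
```

The point is less the specific checks than where they sit: in the promotion path itself, so no model reaches customers without them.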
Build 2020 did a good job showcasing the investments Microsoft is making in responsible machine learning. It will be fascinating watching enterprises progress in this key field in the months ahead.
Portions of this blog were previously published by Computer Weekly here and here.