Microsoft AI Data Exposure: 5 Things To Know

‘Those of us in IT security only need to be wrong once, while the bad actors only have to be right once,’ US itek President David Stinner says.

A Microsoft employee’s accidental exposure of company data has sparked a conversation over the security of Shared Access Signature (SAS) tokens.

SAS tokens allow users to provide select access to specified Azure Storage resources, according to Microsoft.

However, an employee accidentally exposed 38 terabytes of private data by sharing a uniform resource locator (URL) for an Azure blob store in a public code repository while working on open-source AI learning models, according to security vendor Wiz, whose researchers discovered the exposure.

[RELATED: Microsoft AI Research Team ‘Accidentally’ Exposes 38 Terabytes Of Private Data: Wiz]

Microsoft SAS Token Exposure

No customer data was exposed and no other internal services were at risk, according to Microsoft.

David Stinner, founder and president of US itek, a Buffalo, N.Y.-based MSP and Microsoft partner, told CRN that the incident is a reminder “that those of us in IT security only need to be wrong once, while the bad actors only have to be right once.”

Tim Bates, Lenovo’s CTO for the global accounts business, said on Microsoft-owned LinkedIn that “this incident underscores the challenges of securing massive amounts of data, especially in the fast-paced world of AI development.”

“It’s a wake-up call for companies to tighten their security protocols,” Bates said.

New York-based Wiz warned that SAS tokens should be considered as sensitive as account keys and recommended against using account SAS tokens for external sharing. Instead, users should consider a SAS backed by a stored access policy or a user delegation SAS.

“Token creation mistakes can easily go unnoticed and expose sensitive data,” Wiz said in its report on the exposure.
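To make Wiz’s recommendation concrete, here is a minimal sketch of generating a user delegation SAS with the azure-storage-blob and azure-identity Python packages, assuming the caller’s Azure AD identity has suitable RBAC permissions on the account. The account, container and blob names are hypothetical placeholders, not anything from the incident.

```python
# Sketch: a user delegation SAS instead of an account-key SAS.
# Account, container and blob names are hypothetical placeholders.
from datetime import datetime, timedelta, timezone

from azure.identity import DefaultAzureCredential
from azure.storage.blob import (
    BlobSasPermissions,
    BlobServiceClient,
    generate_blob_sas,
)

ACCOUNT_NAME = "examplestorage"  # hypothetical storage account
ACCOUNT_URL = f"https://{ACCOUNT_NAME}.blob.core.windows.net"

# Authenticate with Azure AD rather than a shared account key.
service = BlobServiceClient(ACCOUNT_URL, credential=DefaultAzureCredential())

# The delegation key is tied to the Azure AD identity, so access can be
# cut off centrally instead of by rotating the account's parent keys.
start = datetime.now(timezone.utc)
expiry = start + timedelta(hours=1)  # deliberately short-lived
delegation_key = service.get_user_delegation_key(start, expiry)

sas = generate_blob_sas(
    account_name=ACCOUNT_NAME,
    container_name="public-models",   # hypothetical container
    blob_name="model.bin",            # hypothetical blob
    user_delegation_key=delegation_key,
    permission=BlobSasPermissions(read=True),  # read-only
    expiry=expiry,
)

print(f"{ACCOUNT_URL}/public-models/model.bin?{sas}")
```

Because the token is signed with a short-lived delegation key rather than the account key, a leaked link of this kind expires on its own and can be revoked without touching the rest of the storage account.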

Wiz also flagged the risks that come as more organizations explore AI: they need to guard against oversharing data and against supply chain attacks delivered through shared model repositories.

Here’s what else you need to know about the exposure.

Microsoft Explains Exposure

A Monday blog post from Microsoft said that the exposure came from a Microsoft employee who shared a URL for a blob store in a public GitHub repository while working on open-source AI learning models.

The URL included a SAS token for an internal storage account, but the token was overly permissive. Azure Storage and the SAS token feature itself did not have a security issue or vulnerability, according to Microsoft.

“Like other secrets, SAS tokens should be created and managed properly,” Microsoft said in the blog post. “Additionally, we are making ongoing improvements to further harden the SAS token feature and continue to evaluate the service to bolster our secure-by-default posture.”

Microsoft Missed Exposure

Although Microsoft rescans all public repositories in Microsoft-owned or affiliated organizations and accounts, the SAS URL at the center of the exposure was incorrectly marked as a false positive.

“The root cause issue for this has been fixed and the system is now confirmed to be detecting and properly reporting on all over-provisioned SAS tokens,” Microsoft said.

Microsoft stood by SAS tokens as a security mechanism in its Monday blog post, saying that, unlike shared keys, which grant access to an entire storage account, SAS provides granular control over how clients access an organization’s data.

SAS tokens can restrict client access to specific containers, directories, blobs, blob versions and other resources. They can control the actions clients may take, such as reading or deleting data, which networks clients can connect from and how long access lasts.

Users can also revoke SAS access at any time by rotating parent keys.
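As an illustration of that granularity, the sketch below scopes a token to one container, read-only, for one hour, from a single IP range. It also attaches a stored access policy, the approach Wiz recommends, so the token can be cut off by deleting the policy rather than rotating the parent keys. All names and the key are hypothetical placeholders.

```python
# Sketch: a narrowly scoped container SAS backed by a stored access policy.
# Account name, key and container are hypothetical placeholders.
from datetime import datetime, timedelta, timezone

from azure.storage.blob import (
    AccessPolicy,
    ContainerClient,
    ContainerSasPermissions,
    generate_container_sas,
)

ACCOUNT_NAME = "examplestorage"  # hypothetical
ACCOUNT_KEY = "<account-key>"    # never commit a real key
CONTAINER = "public-models"      # hypothetical

container = ContainerClient(
    f"https://{ACCOUNT_NAME}.blob.core.windows.net",
    container_name=CONTAINER,
    credential=ACCOUNT_KEY,
)

# The policy lives server-side; deleting it later revokes every SAS
# that references it, without rotating the account keys.
policy = AccessPolicy(
    permission=ContainerSasPermissions(read=True, list=True),  # no write/delete
    start=datetime.now(timezone.utc),
    expiry=datetime.now(timezone.utc) + timedelta(hours=1),    # short-lived
)
container.set_container_access_policy(
    signed_identifiers={"external-readonly": policy}
)

# The SAS itself just points at the policy and pins a source IP range.
sas = generate_container_sas(
    account_name=ACCOUNT_NAME,
    container_name=CONTAINER,
    account_key=ACCOUNT_KEY,
    policy_id="external-readonly",
    ip="203.0.113.0-203.0.113.255",  # example client network
)
```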

GitHub Preventative Measures

GitHub, which is owned by Microsoft, runs a secret scanning service that monitors all public open-source code changes for exposed credentials and secrets, according to the company.

The service can detect Azure Storage SAS URLs, cryptographic keys and other sensitive content, and it flags SAS tokens whose expiration dates and privileges are too permissive.

Case in point: the SAS token at the center of the Microsoft exposure had an expiry date of Oct. 6, 2051.
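Such checks are possible because a SAS token carries its expiry (the `se` parameter) and its permissions (the `sp` parameter) directly in the URL’s query string. Here is a hedged sketch of that kind of audit in plain Python; the URL, the seven-day lifetime threshold and the check logic are illustrative, not GitHub’s actual scanner.

```python
# Sketch: flagging over-permissive SAS tokens by inspecting the URL itself.
# The URL below is illustrative, not the actual leaked token.
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlparse

MAX_LIFETIME = timedelta(days=7)  # illustrative policy threshold
RISKY_PERMS = set("wdac")         # write, delete, add, create

def audit_sas_url(url: str) -> list[str]:
    params = parse_qs(urlparse(url).query)
    findings = []
    # 'se' is the signed expiry, in ISO 8601 form.
    expiry = datetime.fromisoformat(params["se"][0].replace("Z", "+00:00"))
    if expiry - datetime.now(timezone.utc) > MAX_LIFETIME:
        findings.append(f"expiry too far out: {expiry:%Y-%m-%d}")
    # 'sp' lists the granted permissions as single characters.
    perms = set(params.get("sp", [""])[0])
    if perms & RISKY_PERMS:
        findings.append(f"grants more than read access: {''.join(sorted(perms))}")
    return findings

# A token expiring in 2051 with full permissions trips both checks.
print(audit_sas_url(
    "https://example.blob.core.windows.net/models?sv=2020-08-04"
    "&se=2051-10-06T00:00:00Z&sp=rwdlac&sig=REDACTED"
))
```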

38 TB Of Data Exposed

According to Wiz’s timeline of events, researchers with the security vendor found and reported the SAS token to Microsoft on June 22.

The token was first committed to GitHub in July 2020. In October 2021, the token’s expiry date was changed from Oct. 5, 2021, to Oct. 6, 2051.

After Wiz researchers found the exposure over the summer, they were able to use the token to access backups of two former employees’ workstation profiles and internal Microsoft Teams messages the two employees had shared with colleagues.

The backups included private keys, passwords and more than 30,000 internal Teams messages, according to Wiz.

Microsoft invalidated the token two days after learning about the issue and replaced it on GitHub in July. The company completed its internal investigation of the exposure in August and disclosed the incident Monday.

AI Pipeline Risks

As interest in and adoption of AI grow, the Wiz report on Microsoft’s exposure offered some warnings for organizations leveraging the emerging technology.

First, Wiz cautioned security teams to define guidelines for sharing AI datasets externally to avoid oversharing. In Microsoft’s case, the researchers could have isolated the public AI dataset in a dedicated storage account.

Wiz also warned about supply chain attacks. Researchers who pull models from public repositories for AI development need to watch for malicious code injected into the model files.

“Security teams should review and sanitize AI models from external sources, since they can be used as a remote code execution vector,” according to Wiz.
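One simple mitigation along those lines, sketched below under the assumption that a trusted digest for the model file is published out of band, is to verify the artifact’s hash before deserializing it, since pickle-style model formats can execute embedded code the moment they are loaded. The file path and digest are placeholders.

```python
# Sketch: verifying a downloaded model artifact before loading it.
# The expected digest would come from a trusted, out-of-band source.
import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Stream the file through SHA-256 in 1 MB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

MODEL_PATH = Path("model.bin")  # hypothetical downloaded model
EXPECTED = "9f2b..."            # placeholder for the published digest

if sha256_of(MODEL_PATH) != EXPECTED:
    # Refuse to deserialize: formats that embed serialized code can
    # run arbitrary code as soon as they are loaded.
    raise RuntimeError("model file does not match its published digest")
```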