Practical Advice for the Latest Okta Breach

James Plouffe

Oct 27, 2023 ・ 8 Min read

Key take-aways

HTTP Archive (HAR) files are a common part of tech support workflows, but they may contain a wide variety of sensitive data and can be difficult to sanitize.
Authentication material like session cookies and tokens is valuable to attackers for three reasons: it can be used to impersonate end users while bypassing Multi-Factor Authentication (MFA), a compromised session cookie or token can be difficult to detect, and such credentials can be comparatively long-lived.
Browsers store sensitive data in multiple locations, including Cookies, Session Storage, Local Storage, and IndexedDB, all of which may be targeted by attackers and require additional protection.

Summary of recommendations

Re-evaluate automated tools that sanitize files going to external entities, especially files of unfamiliar or uncommon types
Prioritize the collection and analysis of events that can help identify compromised tokens
Review your account creation and modification events
Evaluate tools, like Seraphic, that can provide greater protection for sensitive data that doesn’t reside in a traditional filesystem

Introduction

The recent disclosure of a breach at cloud identity company Okta marks the second high-profile identity-related compromise in as many months. The breach is remarkable less for its uniqueness than for the factors it has in common with other breaches. Data from the most recent Verizon Data Breach Investigations Report indicates that stolen credentials were involved in nearly half of confirmed breaches. According to Okta, a stolen credential (the exact nature of which remains unclear) gave adversaries access to the internal system used for managing support cases. This access enabled the attackers to steal credentials of Okta customers, which in turn granted them access to those customers’ environments (including the environments of other technology vendors). This breach is also noteworthy because, like the attack on MGM last month, it leveraged a routine business process: in both cases, attackers exploited technical support workflows to achieve their objectives. Lastly, it has echoes of other supply chain attacks in which the compromise of a vendor led to incidents or breaches at its customers.

HTTP Archive (HAR) files

One should be forgiven if the phrase “HAR file” doesn’t immediately evoke a reaction or trigger any special association. While HAR files are useful for diagnosing and troubleshooting web performance issues, they are artifacts with a particular niche and, although they are widely supported, the development of the HAR specification was abandoned by the W3C Web Performance Working Group in late 2012. This lack of ongoing development may account for some of the security-specific issues the format has. Simply put, a HAR file is a JSON-formatted log that contains all the details of the HTTP transactions between a web browser and the pages it loads; it is analogous to a PCAP file generated by a network monitoring tool, but specific to web browsers. Because HAR files are created directly by browsers, they do not require HTTPS interception or other decryption techniques to expose application layer data.

The exhaustive detail of HAR files makes them well-suited to their intended purpose but, importantly, they do not support any filtering or obfuscation at the time of generation. As a result, they may contain sensitive data (such as credentials), all of which, barring any manual intervention, is stored in cleartext. Sharing these files can therefore lead to unauthorized disclosure of that sensitive data, just as it did in the case of Okta and its customers.
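To make the cleartext exposure concrete, here is a minimal Python sketch that walks a parsed HAR file and lists where authentication material commonly appears. It relies on the standard HAR layout (log, entries, request/response, headers, cookies); the function name find_sensitive_fields and the short list of header names are illustrative assumptions, not an exhaustive detection method.

```python
import json

# Header names that commonly carry authentication material.
# Assumption: an illustrative, non-exhaustive list chosen for this sketch.
SENSITIVE_HEADERS = {"authorization", "cookie", "set-cookie"}


def find_sensitive_fields(har: dict) -> list[tuple[int, str, str]]:
    """Return (entry index, location, field name) for likely-sensitive fields."""
    findings = []
    for index, entry in enumerate(har.get("log", {}).get("entries", [])):
        for side in ("request", "response"):
            message = entry.get(side, {})
            # Headers are stored as a list of {"name": ..., "value": ...} objects.
            for header in message.get("headers", []):
                if header.get("name", "").lower() in SENSITIVE_HEADERS:
                    findings.append((index, f"{side}.headers", header["name"]))
            # Browsers also record parsed cookies separately from the raw headers.
            for cookie in message.get("cookies", []):
                findings.append((index, f"{side}.cookies", cookie.get("name", "")))
    return findings


def load_har(path: str) -> dict:
    """Load a HAR file from disk; HAR is plain JSON."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

Calling find_sensitive_fields(load_har("capture.har")) on a capture (the file name is hypothetical) would report only the obvious locations. Tokens can also sit in URLs, request bodies, and response bodies, which this sketch does not inspect.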

Although organizations that collect HAR files as part of their support processes recommend that their customers sanitize the files before sharing them, this process is hardly clear-cut. First, because of the level of detail they capture, HAR files are often comparatively large. As an example, a single login to office.com with no additional navigation on the site involved over 530 discrete web requests and generated a file almost 380,000 lines long and nearly 65MB in size. It also contained multiple access tokens to the various APIs for the other Microsoft services on which office.com depends. Some automated tools for sanitizing HAR files exist, but given the sheer size of the files created by even the most basic operations, as well as the variety of ways in which sensitive data may be captured, it is difficult to evaluate their effectiveness without additional manual verification.
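As a rough illustration of what an aggressive sanitizing pass might look like, the following Python sketch masks the most common locations of sensitive data in a parsed HAR file. The function name redact_har, the placeholder string, and the list of header names are assumptions made for this example. It is not a vetted tool, and its output would still need the manual verification described above.

```python
import copy

REDACTED = "[REDACTED]"  # placeholder value; an arbitrary choice for this sketch

# Assumption: an illustrative, non-exhaustive list of headers to mask.
SENSITIVE_HEADERS = {"authorization", "cookie", "set-cookie"}


def redact_har(har: dict) -> dict:
    """Return a deep copy of a parsed HAR with common sensitive fields masked."""
    clean = copy.deepcopy(har)  # never modify the caller's copy
    for entry in clean.get("log", {}).get("entries", []):
        for side in ("request", "response"):
            message = entry.get(side, {})
            for header in message.get("headers", []):
                if header.get("name", "").lower() in SENSITIVE_HEADERS:
                    header["value"] = REDACTED
            for cookie in message.get("cookies", []):
                cookie["value"] = REDACTED
        # Request bodies can carry passwords or tokens submitted by forms.
        post_data = entry.get("request", {}).get("postData")
        if post_data and "text" in post_data:
            post_data["text"] = REDACTED
        # Response bodies can carry tokens as well as the rendered page content.
        content = entry.get("response", {}).get("content")
        if content and "text" in content:
            content["text"] = REDACTED
    return clean
```

This sketch deliberately errs toward removing too much, and it still leaves gaps: tokens embedded in request URLs or query strings, for example, are untouched.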

It also seems possible that, in this specific instance, sanitizing the file would have defeated its purpose. Since Okta is an Identity Provider (IdP) and a core part of the platform’s functionality is OAuth authentication, support cases may necessarily involve troubleshooting using valid tokens. If a HAR file supplied by a customer is stripped of that data, it could become more difficult to reproduce the issues that necessitated the collection of the HAR file in the first place.

Finally, it is critically important to remember that authentication material isn’t the only form of sensitive data that might exist in a HAR file. HAR files can be used to reconstruct all of the content rendered by the browser during the recording process. That content could include Personally Identifiable Information (PII), intellectual property (IP), or any other form of data that should be kept confidential, which creates a new vector for data leakage or other unauthorized disclosure.

Targeting tokens

Stolen credentials are one of the easiest and lowest-cost ways for attackers to gain access to their targets. Although phishing remains a significant problem and one of the primary methods of stealing credentials, the increased use of Multi-Factor Authentication (MFA) in enterprises has made the traditional username/password combination less likely to yield the necessary access.

From an attacker’s perspective, focusing on token theft has several distinct advantages. First, tokens are generated after the authentication process (including any MFA), meaning that an adversary does not need a user’s password and there is no need to intercept a secondary authentication flow. This can significantly reduce the level of effort and infrastructure required for a successful attack, since the attack doesn’t require a convincing phishing site with the ability to give a user a functional login process. Second, it can be more difficult to identify a compromised token. Third, certain tokens, especially the refresh tokens that are used to request new access tokens, can be long-lived. These last two conditions work in concert to increase the risk. Passwords can be reset or accounts can be disabled as a precaution based on a simple suspicion of compromise, but detecting a compromised token requires more telemetry and better analytics, which will only be generated and triggered after the compromised token is in use by an attacker. Moreover, the revocation of compromised tokens may not be automatic. The longer the lifetime of the compromised token, the longer the attacker has to achieve secondary objectives like persistence, lateral movement, and privilege escalation, so extra vigilance is critical.

Support files aren’t the only treasure troves

The direct source of some of the compromised tokens involved in this breach may have been the HAR files generated by browsers, but browsers themselves have multiple storage facilities for sensitive data, including Cookies, Session Storage, Local Storage, and IndexedDB, and these locations may be equally attractive targets for adversaries. Infostealers and malicious extensions are just two of the ways that sensitive information can be extracted from browsers. It is, therefore, also important to have mechanisms in place to both prevent the introduction of malware and protect sensitive data at rest.

Recommendations

This breach once again underscores the need to safeguard credentials but also highlights the difficulty in doing so due to the unexpected places that they may reside. Organizations should:

Evaluate automated tools that help them sanitize files they are sharing with external entities, as well as tools that can prevent the sharing of files containing sensitive data (especially in unfamiliar or uncommon file types). Some vendors may provide specific guidance, so be sure to review it (if it exists) but otherwise opt for aggressive redaction of files shared with third parties.
Prioritize the collection and analysis of events that can help identify compromised tokens, especially detecting “impossible travel”, watching for logins from proxy services or Virtual Private Servers (VPS), and looking for access that is not preceded by a login event.
Carefully audit account creation and modification events.
Evaluate tools, like Seraphic, that can encrypt credentials (including session cookies and tokens) and other sensitive data that are not stored in traditional filesystems to mitigate the risk from theft.
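The “impossible travel” check mentioned in the recommendations above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the LoginEvent record, the 900 km/h speed threshold, and the availability of a latitude and longitude for each event (for example, from IP geolocation) are all hypothetical choices made for this example, not features of any particular product or log format.

```python
import math
from dataclasses import dataclass
from datetime import datetime

EARTH_RADIUS_KM = 6371.0
MAX_SPEED_KMH = 900.0  # assumed threshold, roughly airliner cruising speed


@dataclass
class LoginEvent:
    """Hypothetical event record; real logs will need mapping into this shape."""
    user: str
    timestamp: datetime
    lat: float
    lon: float


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def impossible_travel(events, max_speed_kmh: float = MAX_SPEED_KMH):
    """Return consecutive same-user event pairs that imply implausible speed."""
    flagged = []
    last_seen = {}  # most recent event per user
    for event in sorted(events, key=lambda e: e.timestamp):
        previous = last_seen.get(event.user)
        if previous is not None:
            hours = (event.timestamp - previous.timestamp).total_seconds() / 3600
            distance = haversine_km(previous.lat, previous.lon, event.lat, event.lon)
            # Simultaneous events from different places are flagged outright.
            if distance > 0 and (hours == 0 or distance / hours > max_speed_kmh):
                flagged.append((previous, event))
        last_seen[event.user] = event
    return flagged
```

In practice, geolocation data is imprecise, so a production rule would likely add a minimum-distance floor and an allowlist for known corporate VPN egress points to limit false positives.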

Seraphic gives organizations tools to help implement some of these practices. For more information, visit our Safe Browsing and DLP pages or schedule a demo to see how we can help you protect authentication material and other forms of sensitive data from accidental or intentional unauthorized disclosure.