Information Leakage: What You Need to Know

Information leakage poses a serious problem for thousands of companies around the world. Many software startups and even large, established enterprises have difficulty ensuring that technical data is kept under lock and key. Problems proliferate as secrets are improperly committed to GitHub repositories and as Amazon S3 buckets are left with improperly configured permissions, leaving sensitive information open for anyone to find.

In fact, a study by North Carolina State University found that over a six-month period, roughly 200,000 secrets such as API keys were published on GitHub, and over 80% were still available at the end of the study. This post explains how you can identify potential information leaks, remediate them, and prevent them from happening in the future.


Modern Development Practices Pose Unique Information Leakage Risks

Software development isn’t what it used to be. Many organizations today have large teams made up of full-time in-house employees and contractors, both local and from around the world. While hybrid teams have greatly increased cost-efficiency and productivity in software development, they also present a larger and more lucrative attack surface: the total set of attack vectors (entry points) through which attackers or unauthorized users can access a system, or enter or extract data. Remote work adds to the problem; for example, 47% of employees working from home fall for phishing scams, which increases risk and gives malicious actors more opportunities.

In the “new normal” of remote work, the decentralization of the workforce has not only affected information leakage risk but has also coincided with a rise in attacks. Deloitte reported that Switzerland saw 350 cyber attacks in April 2020, compared with its typical average of 100 to 150.

Large, internationally distributed development teams pose challenges for identity and access management. While a well-planned information security program can dramatically reduce the risk of exposed technical data, information leakage can still easily occur. All it takes is one junior developer misconfiguring cloud permissions to leave large amounts of data exposed to the wider internet.

Exposed S3 buckets don’t just pose a risk to the development team, either. An unsecured S3 bucket owned by a cannabis dispensary leaked tens of thousands of customer records, exposing thousands of pieces of personally identifiable information (PII). Breaches like this can cause severe reputational damage and invite regulatory action.

Types of Information Leakage

Cloud Data Source Credentials or Addresses

A cloud data breach occurs when sensitive data stored in the cloud is exposed to the wider internet. Most of the time, malicious actors gain access through misconfigured cloud storage buckets (AWS, Azure, GCP) or databases (MySQL, Elasticsearch, etc.).

What are Cloud Buckets?

Cloud buckets are containers for your data; in object storage services such as Amazon S3, every stored object lives in a bucket. You can control access to a bucket and organize the files and data within it, both of which are important for protecting it (more on that later).

Third-Party API Keys


This category covers the theft or loss of data stored by a third party, such as a Customer Relationship Management (CRM) or Enterprise Resource Planning (ERP) platform, for example through exposed third-party API keys.

An example of this kind of information leak is the 2019 Facebook data leak. More details have emerged since the initial press release, which stated that approximately 533 million users were affected. Two third-party Facebook app datasets had been breached, exposing users’ phone numbers.

Cost: $5 billion

Source Code

Source code is any collection of computer code written in a human-readable programming language. If source code and other intellectual property are leaked, the result can be a loss of competitive advantage.

In 2018, an Apple intern leaked source code as he was leaving the company. The intern shared iOS source code, which made it easier to jailbreak iPhones, with friends who were security researchers. Of course, the source code ended up being shared on GitHub. Although Apple initially stated that the leaked information didn’t affect the security of its devices, it quickly took action to take down dozens of replicated source code repositories on GitHub. Leaked source code can be exploited not only by malicious actors looking for vulnerabilities but also by competitors and customers.

What is GitHub?

GitHub is a platform where developers host and share code and collaborate on software projects. Although it supports a great community, many organizations’ source code, API keys, and data are leaked on GitHub, and among malicious actors, news of a leak often travels fast. These leaks are generally honest mistakes by employees looking to share code or work more efficiently, but they can have significant consequences for organizations.

External or Internal Server Addresses, Configuration, or Secrets

Exposed external or internal server addresses, configuration, or secrets can help malicious actors with network discovery and lateral movement, which can lead to a variety of common attacks, including data theft, data destruction, and ransomware.
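A simple safeguard is to check configuration files or snippets for internal addresses before they are shared or committed. The sketch below assumes Python 3.9+ and uses only the standard library’s `ipaddress` module; the function name `find_internal_addresses` and the sample config are hypothetical, and it only recognizes IPv4 literals, not internal hostnames or IPv6.

```python
import ipaddress
import re

# Rough IPv4 candidate pattern; each match is validated by ipaddress below.
IPV4_CANDIDATE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def find_internal_addresses(text: str) -> list[str]:
    """Return IPv4 addresses in `text` that fall in private (non-public) ranges."""
    found = []
    for match in IPV4_CANDIDATE.finditer(text):
        candidate = match.group(0)
        try:
            addr = ipaddress.IPv4Address(candidate)
        except ipaddress.AddressValueError:
            continue  # e.g. "999.1.1.1" is not a valid address
        if addr.is_private:
            found.append(candidate)
    return found


# Hypothetical config snippet about to be pasted or committed.
config = """
db_host = 10.0.4.17
cache_host = 192.168.1.20
cdn = 93.184.216.34
"""
print(find_internal_addresses(config))  # ['10.0.4.17', '192.168.1.20']
```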

Preventing Information Leakage

We’ve established that information leakage poses enormous risks to companies, but how do you prevent it from happening in the first place? Below are some best practices to prevent sensitive data from being leaked from AWS S3 buckets and GitHub repositories. The #1 thing your company can do is have a strong data leak prevention strategy.


Preventing S3 Bucket Leaks

S3 Block Public Access 

The first and most important step is to properly configure buckets to avoid information leakage. This can be accomplished by setting up the S3 Block Public Access feature, which enables account administrators to quickly and easily set permissions for buckets, access points, and objects across the organization. This also blocks resources from being publicly accessible, without the need to manually verify each bucket. 
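To confirm that Block Public Access is fully enabled, you can check its four settings (`BlockPublicAcls`, `IgnorePublicAcls`, `BlockPublicPolicy`, `RestrictPublicBuckets`). In practice the configuration would come from boto3’s `get_public_access_block` call; this hedged sketch only inspects a dictionary of that shape, so it runs without AWS credentials, and the helper name `missing_block_settings` is hypothetical. If no configuration exists at all, the real API call raises an error rather than returning an empty result, which this sketch does not handle.

```python
# The four settings that make up S3 Block Public Access.
REQUIRED_SETTINGS = (
    "BlockPublicAcls",
    "IgnorePublicAcls",
    "BlockPublicPolicy",
    "RestrictPublicBuckets",
)


def missing_block_settings(response: dict) -> list[str]:
    """Return the Block Public Access settings that are not enabled.

    `response` is assumed to have the shape returned by get_public_access_block;
    a missing key is treated as disabled.
    """
    config = response.get("PublicAccessBlockConfiguration", {})
    return [name for name in REQUIRED_SETTINGS if config.get(name) is not True]


# Hypothetical response for a bucket where one setting was switched off.
example = {
    "PublicAccessBlockConfiguration": {
        "BlockPublicAcls": True,
        "IgnorePublicAcls": True,
        "BlockPublicPolicy": False,
        "RestrictPublicBuckets": True,
    }
}
print(missing_block_settings(example))  # ['BlockPublicPolicy']
```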

Employ the Principle of Least Privilege

For all cloud environments, but particularly for S3 buckets and data storage, make sure that your organization applies the principle of least privilege. Employees and consultants should be given only the privileges they need to do their jobs. Granting users unnecessary administrative access can lead to errors that leak technical data. Applying the principle of least privilege reduces your attack surface, limits the spread of malware, and improves user productivity.
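As an illustration of least privilege in AWS, the sketch below builds an IAM policy that grants read-only access to a single prefix of a single bucket, and flags statements that allow wildcard actions or resources. The bucket name, prefix, and helper functions are hypothetical, and the wildcard check is intentionally basic; it is a starting point for a review, not a complete policy analyzer.

```python
import json


def read_only_policy(bucket: str, prefix: str) -> dict:
    """Build an IAM policy that only allows reading objects under one prefix of one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/{prefix}*"],
            },
            {
                # Listing is scoped to the same prefix via a condition key.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}*"]}},
            },
        ],
    }


def _as_list(value):
    # IAM accepts either a single value or a list for Statement, Action, and Resource.
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def overly_broad_statements(policy: dict) -> list[int]:
    """Return indexes of Allow statements with wildcard actions or a bare "*" resource.

    Deliberately simple: it does not catch patterns such as "arn:aws:s3:::*".
    """
    flagged = []
    for i, stmt in enumerate(_as_list(policy.get("Statement"))):
        if stmt.get("Effect") != "Allow":
            continue
        actions = _as_list(stmt.get("Action"))
        resources = _as_list(stmt.get("Resource"))
        if any(a == "*" or a.endswith(":*") for a in actions) or "*" in resources:
            flagged.append(i)
    return flagged


policy = read_only_policy("example-reports", "team-a/")
print(json.dumps(policy, indent=2))
print(overly_broad_statements(policy))  # []

admin = {"Version": "2012-10-17",
         "Statement": {"Effect": "Allow", "Action": "s3:*", "Resource": "*"}}
print(overly_broad_statements(admin))  # [0]
```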

Zero trust is a more extensive model built on the principle of least privilege. The concept is becoming increasingly popular because it considers not only who is being granted access but also the context of the request and its level of risk. Both concepts are important for preventing information leakage, but as organizations’ footprints become more complex, we expect more companies to adopt the zero-trust approach.
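The difference between the two models can be sketched as an access decision that first applies a least-privilege grant check and then weighs the context of the request. Everything here is hypothetical: the signals (managed device, MFA, a 0 to 1 risk score), the `GRANTS` table, and the 0.5 risk threshold stand in for whatever your identity provider and policy engine actually supply.

```python
from dataclasses import dataclass


@dataclass
class AccessRequest:
    # All fields are hypothetical signals a zero-trust policy engine might use.
    user: str
    resource: str
    device_managed: bool
    mfa_passed: bool
    risk_score: float  # 0.0 (low) to 1.0 (high), from some risk engine


# Hypothetical static grants: who may access what (least privilege).
GRANTS = {("alice", "reports-bucket"), ("bob", "build-artifacts")}


def decide(req: AccessRequest, max_risk: float = 0.5) -> str:
    """Return 'allow', 'step_up' (ask for MFA), or 'deny'."""
    # Least privilege alone would stop at this identity check.
    if (req.user, req.resource) not in GRANTS:
        return "deny"
    # Zero trust also weighs the context of this particular request.
    if not req.device_managed or req.risk_score > max_risk:
        return "deny"
    if not req.mfa_passed:
        return "step_up"
    return "allow"


print(decide(AccessRequest("alice", "reports-bucket", True, True, 0.1)))   # allow
print(decide(AccessRequest("alice", "reports-bucket", False, True, 0.1)))  # deny
print(decide(AccessRequest("alice", "reports-bucket", True, False, 0.2)))  # step_up
```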

Encrypt Data at Rest

If you use buckets to store sensitive data, make sure it is adequately encrypted to prevent unauthorized access. You can use server-side encryption with Amazon S3 managed keys (SSE-S3), which encrypts each object with a unique key and protects that key with a root key that is rotated regularly, or client-side encryption.
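Default bucket encryption can be verified programmatically. This sketch assumes the dictionary shapes used by boto3’s `put_bucket_encryption` and `get_bucket_encryption` calls, where `"AES256"` selects SSE-S3; it inspects the data only and makes no AWS calls. The helper names are hypothetical.

```python
from typing import Optional

# Default-encryption configuration for SSE-S3; "AES256" selects S3 managed keys.
# This is the shape passed as ServerSideEncryptionConfiguration to put_bucket_encryption.
SSE_S3_CONFIG = {
    "Rules": [
        {"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}
    ]
}


def default_encryption(response: dict) -> Optional[str]:
    """Return the default SSE algorithm from a get_bucket_encryption-style response.

    Returns None when no default encryption rule is present.
    """
    config = response.get("ServerSideEncryptionConfiguration", {})
    for rule in config.get("Rules", []):
        algorithm = rule.get("ApplyServerSideEncryptionByDefault", {}).get("SSEAlgorithm")
        if algorithm:
            return algorithm
    return None


def is_encrypted_at_rest(response: dict) -> bool:
    # "AES256" is SSE-S3; values starting with "aws:kms" are KMS-based variants.
    algorithm = default_encryption(response) or ""
    return algorithm == "AES256" or algorithm.startswith("aws:kms")


print(default_encryption({"ServerSideEncryptionConfiguration": SSE_S3_CONFIG}))  # AES256
print(is_encrypted_at_rest({}))  # False
```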

Utilize An External Monitoring Platform

A Digital Risk Protection Platform like Flare can enable your organization to understand your digital footprint in near real-time and quickly remediate bucket issues. Flare continually hunts for exposed buckets, GitHub repositories, and other misconfigurations and can immediately alert you if permissions have been misconfigured. Not only do we monitor the dark web, but we also scan the clear and deep web to give you a full view of your external attack surface.
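Flare’s internals aren’t described here, but as an independent illustration of one misconfiguration that bucket monitoring looks for, the sketch below flags bucket policy statements that allow access to any principal (`"*"`). The function names and the sample policy are hypothetical; statements with a `Condition` block are skipped because a condition may restrict access, so a real review would still need to inspect them.

```python
import json


def _is_everyone(principal) -> bool:
    """True if a policy Principal value grants access to anyone."""
    if principal == "*":
        return True
    if isinstance(principal, dict):
        aws = principal.get("AWS")
        values = aws if isinstance(aws, list) else [aws]
        return "*" in values
    return False


def public_statements(policy_json: str) -> list[str]:
    """Return the Sid (or index) of unconditioned Allow statements open to any principal."""
    policy = json.loads(policy_json)
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):
        statements = [statements]
    flagged = []
    for i, stmt in enumerate(statements):
        # A Condition may narrow a "*" principal, so only unconditioned ones are flagged.
        if (stmt.get("Effect") == "Allow"
                and _is_everyone(stmt.get("Principal"))
                and "Condition" not in stmt):
            flagged.append(stmt.get("Sid", str(i)))
    return flagged


# Hypothetical bucket policy: one public statement, one scoped to a single role.
bucket_policy = """{
  "Version": "2012-10-17",
  "Statement": [
    {"Sid": "PublicRead", "Effect": "Allow", "Principal": "*",
     "Action": "s3:GetObject", "Resource": "arn:aws:s3:::example-bucket/*"},
    {"Sid": "TeamAccess", "Effect": "Allow",
     "Principal": {"AWS": "arn:aws:iam::111122223333:role/team"},
     "Action": "s3:GetObject", "Resource": "arn:aws:s3:::example-bucket/*"}
  ]
}"""
print(public_statements(bucket_policy))  # ['PublicRead']
```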


Preventing GitHub Repo Leaks

Preventing leaks on GitHub is critical for software development companies and any other organization developing custom code. According to GitHub, the platform had more than 56 million users in 2020, and more than 60 million repositories were created that year. While collaborative SaaS repositories enable teams around the world to work together seamlessly, they also represent a wide attack surface. Many organizations unintentionally expose data such as API keys, secrets, and service tokens through lax coding practices and overly generous permissions.


Source code displaying database access usernames and passwords.

Of particular concern are secrets, a term used to describe the various digital authentication credentials used in an IT infrastructure, such as passwords, keys, tokens, etc.
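One way scanners spot secrets that don’t follow a known format is to measure how random a string looks. The sketch below computes Shannon entropy per character and flags long tokens above a threshold; the 4.0-bit threshold, the 20-character minimum, and the token pattern are assumptions you would tune, and this approach produces false positives (for example, on hashes) and misses short or low-entropy passwords. The sample key is AWS’s published documentation example, not a real credential.

```python
import math
import re
from collections import Counter

# Candidate tokens: long runs of characters typical of keys and tokens (assumed pattern).
TOKEN = re.compile(r"[A-Za-z0-9+/=_\-]{20,}")


def shannon_entropy(s: str) -> float:
    """Bits of entropy per character, based on character frequencies in s."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())


def high_entropy_tokens(line: str, threshold: float = 4.0) -> list[str]:
    """Return tokens in `line` that look random enough to be secrets."""
    return [t for t in TOKEN.findall(line) if shannon_entropy(t) >= threshold]


line = 'aws_secret_access_key = "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"'
# The variable name is also a 20+ character token, but its entropy (~3.1) is below 4.0.
print(high_entropy_tokens(line))  # ['wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY']
```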

Here are some key practices you can employ to reduce the risk of accidental GitHub leaks:

Standardize Coding Practices

Take the time to clearly write down the rules of the road for your development team. Once you have established actionable policies, train your developers to follow them throughout the software development lifecycle. Policies such as prohibiting account sharing as a way around permission issues, and using pre-commit Git hooks to detect secrets before they are committed, can dramatically reduce the risk of technical data leakage.
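A pre-commit hook could look something like the following minimal Python sketch. It scans only the lines added in the staged diff (`git diff --cached`) against a few illustrative patterns; the pattern set, rule names, and the `find_secrets` helper are hypothetical, and dedicated secret scanners cover far more credential formats. The sample key ID is AWS’s documentation example.

```python
import re
import subprocess
import sys

# Illustrative patterns only; real scanners ship far larger, tuned rule sets.
PATTERNS = {
    # AWS access key IDs start with "AKIA" followed by 16 uppercase letters or digits.
    "AWS access key ID": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "Private key header": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    # Hypothetical, deliberately broad rule: a password/secret/token name assigned a quoted literal.
    "Hard-coded credential": re.compile(
        r"(?i)(?:password|passwd|secret|token)\w*\s*[:=]\s*['\"][^'\"]{6,}['\"]"
    ),
}


def find_secrets(diff_text: str) -> list[tuple[str, str]]:
    """Return (rule name, line) pairs for added diff lines that match a pattern."""
    hits = []
    for line in diff_text.splitlines():
        # Only inspect added lines, skipping the "+++ b/file" header.
        if not line.startswith("+") or line.startswith("+++"):
            continue
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((name, line[1:].strip()))
    return hits


def main() -> int:
    """Scan the staged changes; a non-zero return value should block the commit."""
    staged = subprocess.run(
        ["git", "diff", "--cached", "--unified=0"],
        capture_output=True, text=True, check=True,
    ).stdout
    hits = find_secrets(staged)
    for name, line in hits:
        print(f"pre-commit: possible {name}: {line}", file=sys.stderr)
    if hits:
        print("Commit blocked: remove the secret or move it to a secrets store.", file=sys.stderr)
        return 1
    return 0


# Demo on a hypothetical staged diff (no git call needed).
SAMPLE_DIFF = """\
diff --git a/settings.py b/settings.py
--- a/settings.py
+++ b/settings.py
@@ -1,0 +2,2 @@
+aws_access_key_id = AKIAIOSFODNN7EXAMPLE
+DB_PASSWORD = "hunter2hunter2"
-OLD_TOKEN = "removed-line-is-ignored"
"""
print(find_secrets(SAMPLE_DIFF))
```

To use it, save the script as `.git/hooks/pre-commit`, make it executable, and end it with `raise SystemExit(main())` so a non-zero return blocks the commit. Hooks in `.git/hooks` are not versioned with the repository, so teams usually distribute them through a setup script or hook manager.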

Centralize your Key and Secrets Storage

It’s also worth reconsidering your current approach to storing keys and secrets. If your encryption keys aren’t secure, they can undermine the other cybersecurity practices you employ. Consider using a centralized store for keys and secrets rather than scattering them across an array of DevOps tools. Centralized storage makes it easier to keep track of highly sensitive technical data, helps your team work efficiently, and reduces the risk of accidental leakage.
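With a centralized secrets store, application code fetches secrets at runtime instead of embedding them. This sketch assumes the store (or an agent running alongside the application) injects each secret as an environment variable; the variable name, `get_secret`, and `redact` are hypothetical. Failing loudly on a missing secret avoids quietly falling back to a hard-coded default, and redacting values keeps them out of logs.

```python
import os
from collections.abc import Mapping
from typing import Optional


class MissingSecretError(RuntimeError):
    """Raised when an expected secret has not been provided."""


def get_secret(name: str, environ: Optional[Mapping[str, str]] = None) -> str:
    """Fetch a secret assumed to be injected into the environment by a central store."""
    env = os.environ if environ is None else environ
    value = env.get(name)
    if not value:
        # Fail loudly rather than silently using a hard-coded fallback.
        raise MissingSecretError(f"secret {name!r} is not set")
    return value


def redact(value: str, visible: int = 4) -> str:
    """Mask a secret for logging, keeping only the last few characters."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


# Hypothetical environment standing in for what the secrets store would inject.
fake_env = {"PAYMENTS_API_KEY": "example-key-0123456789"}
key = get_secret("PAYMENTS_API_KEY", fake_env)
print(redact(key))  # ******************6789
```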

Periodically Review Permissions

Conducting regular permission reviews based on the principle of least privilege is key. Developers who have moved to other projects should lose access they no longer need, and developers who have left the organization should have all permissions revoked immediately. Scheduling a periodic review holds you accountable and ensures that users’ permissions match the work they are doing.
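A periodic review can be partly automated by flagging grants that no longer match someone’s role. The `Grant` record, its fields, and the 90-day idle window below are hypothetical stand-ins for whatever your access-management system exports; the output is a list for a human to review, not an automatic revocation.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Grant:
    # Hypothetical export from an access-management system.
    user: str
    repo: str
    last_used: Optional[date]  # None if the grant was never used
    still_employed: bool
    on_project: bool


def grants_to_revoke(grants: list[Grant], today: date, max_idle_days: int = 90) -> list[Grant]:
    """Flag grants for departed users, users moved off the project, or idle access."""
    cutoff = today - timedelta(days=max_idle_days)
    flagged = []
    for g in grants:
        idle = g.last_used is None or g.last_used < cutoff
        if not g.still_employed or not g.on_project or idle:
            flagged.append(g)
    return flagged


review = [
    Grant("alice", "payments-api", date(2024, 5, 20), True, True),
    Grant("bob", "payments-api", date(2024, 5, 1), False, True),     # left the company
    Grant("carol", "payments-api", date(2023, 11, 2), True, True),   # idle for months
    Grant("dave", "payments-api", date(2024, 5, 30), True, False),   # moved to another project
]
for g in grants_to_revoke(review, today=date(2024, 6, 1)):
    print(g.user)  # bob, carol, dave (one per line)
```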

Encourage a Culture of Security

Many organizations go through the motions to comply with certain information security regulations and reassure key stakeholders that they take information security seriously. However, to reduce real risk you need to actively create a culture of security. This means making conversations around development security a part of routine meetings and regularly training developers on information security best practices.

Externally Monitor GitHub Environments

No matter how good your security is, mistakes happen. Utilizing external technical data leak monitoring software can help you catch leaks that you might have otherwise missed. In addition, Digital Risk Protection software can help you identify if any users have compromised credentials for sale on the dark web.

Don’t Forget about Pastebin

Pastebin is a site commonly used by both developers and hackers. It lets users easily share snippets of text, called “pastes”. Pastes are used in a variety of ways by both development teams and malicious actors. Pastebin allows fully anonymous posting of any plain text. While the company does remove malicious content, it relies heavily on user reports, which can lead to a delay between posting and removal. Some Pastebin uses include:

  • Quickly sharing and collaborating on software code
  • Sharing text files as an alternative to Word documents
  • Sharing dark web links between users
  • Posting data breach dumps of usernames, emails, and passwords
  • Posting stolen source code and other technical data

We strongly encourage organizations to actively monitor Pastebin for corporate login credentials and leaked technical data. If sensitive information is compromised, chances are high that it could appear as a paste.
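Assuming you already collect the text of recent pastes (by whatever means your monitoring setup provides), a first-pass check for corporate credentials can be as simple as matching `email:password` pairs on your domain. The domain `example.com`, the separator set, and the function name below are placeholders; real dumps come in many formats, so treat this as a sketch rather than a complete detector.

```python
import re

# Hypothetical corporate domain to watch for.
CORPORATE_DOMAIN = "example.com"

# Matches "user@domain:password" style pairs (":", ";" or "|" separators), a common dump format.
COMBO = re.compile(
    r"\b([A-Za-z0-9._%+-]+)@" + re.escape(CORPORATE_DOMAIN) + r"\s*[:;|]\s*(\S+)",
    re.IGNORECASE,
)


def corporate_combos(paste_text: str) -> list[str]:
    """Return the corporate email addresses whose credentials appear in a paste."""
    return sorted({f"{user.lower()}@{CORPORATE_DOMAIN}" for user, _pw in COMBO.findall(paste_text)})


# Hypothetical paste contents.
paste = """
alice@example.com:Summer2024!
BOB@Example.com ; hunter2
carol@gmail.com:password123
alice@example.com:Winter2024!
"""
print(corporate_combos(paste))  # ['alice@example.com', 'bob@example.com']
```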

Information Leakage Doesn’t Have to be Devastating

No matter how good your controls are, information leakage can happen to any organization. The mix of contracted developers, multiple software platforms, and dozens of people involved in software development lends itself to user error. Flare provides a unique digital risk protection platform that can help not only monitor for information leakage, but also identify threats on the dark web such as account takeover schemes and financial fraud.