Information Leakage: What You Need to Know

Information leakage poses a serious problem for thousands of companies around the world. Many software startups and even large, established enterprises have difficulty keeping technical data under lock and key. Problems proliferate when secrets are improperly committed to GitHub repositories and when Amazon S3 buckets with improperly configured permissions leave sensitive information open for anyone to find.

In fact, a study by North Carolina State University found that over a six-month period, 200,000 passwords and API keys were published on GitHub, and over 80% were still available at the end of the study. This post will explain how you can identify potential information leaks, remediate them, and prevent them from happening in the future.

Modern Development Practices Pose Unique Information Leakage Risk

Software development isn’t what it used to be. Many organizations today have large teams composed of a mix of full-time in-house employees and contractors, both locally sourced and from around the world. While hybrid teams have enormously increased cost-efficiency and productivity in software development, they also present an extremely attractive attack surface for bad actors.

Large, internationally distributed development teams present challenges for identity and access management. While a well-planned information security program can dramatically cut down on the risk of exposed technical data, information leakage can still easily occur. All it takes is one junior developer improperly configuring cloud permissions, and large amounts of data can be left exposed to the wider internet.

Exposed S3 buckets don’t just pose a risk to the development team either. An unsecured S3 bucket owned by a cannabis dispensary leaked tens of thousands of customer records, exposing thousands of instances of personally identifiable information (PII). Breaches like this can cause enormous reputational damage and invite regulatory action.

Types of Information Leakage

Cloud data source credentials or addresses: 

Could allow malicious actors to access potentially misconfigured cloud buckets (AWS, Azure, GCP) or databases (MySQL, Elasticsearch, etc.).


Third-Party API Keys: 

Can lead to theft or loss of data stored with a third party, such as a customer relationship management (CRM) or enterprise resource planning (ERP) platform.


Source Code: 

Source code and intellectual property leaks can result in loss of competitive advantage. 


External or Internal Server Addresses, Configurations, or Secrets:

Can help malicious actors with network discovery and lateral movement, which can lead to a variety of common attacks, including data theft, data destruction, and ransomware.

Preventing Information Leakage

We’ve established that information leakage poses enormous risks to companies, but how do you prevent it from happening in the first place? Below are some best practices to prevent sensitive data from being leaked from AWS S3 buckets as well as GitHub repositories.


Preventing S3 Bucket Leaks

S3 Block Public Access 

The first and most important step is to properly configure buckets to avoid information leakage. This can be accomplished with S3 Block Public Access, which lets account administrators apply public-access settings centrally, either for a whole account or for individual buckets. Once enabled, these settings block buckets from being made publicly accessible, without the need to manually verify each bucket.
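As a rough illustration, the sketch below checks whether a Block Public Access configuration has all four of its settings turned on. The four flag names are the real S3 setting names; the helper function and the sample data are hypothetical, and the dictionary is written by hand here rather than fetched from AWS.

```python
# The four flag names are the actual S3 Block Public Access settings.
# The helper name and sample dictionaries are hypothetical.
BLOCK_PUBLIC_ACCESS_FLAGS = (
    "BlockPublicAcls",
    "IgnorePublicAcls",
    "BlockPublicPolicy",
    "RestrictPublicBuckets",
)


def missing_public_access_blocks(config: dict) -> list:
    """Return the Block Public Access flags that are absent or not True."""
    return [flag for flag in BLOCK_PUBLIC_ACCESS_FLAGS if config.get(flag) is not True]


# A fully locked-down configuration. With boto3, a dict of this shape is what
# put_public_access_block accepts as PublicAccessBlockConfiguration.
locked_down = {flag: True for flag in BLOCK_PUBLIC_ACCESS_FLAGS}

# A partially configured bucket (made-up data for illustration).
partial = {"BlockPublicAcls": True, "IgnorePublicAcls": True}

print(missing_public_access_blocks(locked_down))  # []
print(missing_public_access_blocks(partial))  # ['BlockPublicPolicy', 'RestrictPublicBuckets']
```

A check like this could run on a schedule against every bucket, so that a setting someone switched off is caught quickly.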


Employ the Principle of Least Privilege

For all cloud environments, but particularly for S3 buckets and data storage, make sure that your organization employs the principle of least privilege. Employees and consultants should only be given enough privileges to perform their jobs adequately. Providing users with unnecessary account administrative access can lead to errors resulting in technical data leakage. 
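One way to put least privilege into practice is to look for wildcard grants in IAM policies. The sketch below is a minimal example that assumes the policy is available as a JSON document in the standard IAM policy format; the function name, the sample policy, and the bucket name are hypothetical, and a real review would also need to consider conditions, NotAction elements, and similar features.

```python
import json


def overly_broad_statements(policy_json: str) -> list:
    """Return the Sids of Allow statements that use a wildcard action or resource."""
    policy = json.loads(policy_json)
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a policy may hold one statement object
        statements = [statements]

    flagged = []
    for index, stmt in enumerate(statements):
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        resources = stmt.get("Resource", [])
        # Action and Resource may each be a single string or a list of strings.
        actions = [actions] if isinstance(actions, str) else actions
        resources = [resources] if isinstance(resources, str) else resources
        broad_action = any(a == "*" or a.endswith(":*") for a in actions)
        broad_resource = "*" in resources
        if broad_action or broad_resource:
            flagged.append(stmt.get("Sid", f"statement-{index}"))
    return flagged


# Hypothetical policy: one narrowly scoped statement and one that is too broad.
sample_policy = json.dumps({
    "Version": "2012-10-17",
    "Statement": [
        {"Sid": "ReadReports", "Effect": "Allow",
         "Action": ["s3:GetObject"], "Resource": "arn:aws:s3:::example-reports/*"},
        {"Sid": "TooBroad", "Effect": "Allow", "Action": "s3:*", "Resource": "*"},
    ],
})

print(overly_broad_statements(sample_policy))  # ['TooBroad']
```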


Encrypt Data at Rest

If you are using buckets to store sensitive data, ensure that adequate encryption is employed to prevent unauthorized access. You can use server-side encryption with Amazon S3 managed keys (SSE-S3) or client-side encryption to accomplish this.
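To make this concrete, the sketch below reads a bucket's default-encryption configuration and reports which algorithm it applies. The dictionary layout follows the ServerSideEncryptionConfiguration structure used by the S3 API, where "AES256" is the value that denotes SSE-S3; the function name and the sample configurations are hypothetical.

```python
def default_encryption_algorithm(encryption_config: dict):
    """Return the bucket's default SSE algorithm, or None if no rule sets one."""
    for rule in encryption_config.get("Rules", []):
        default = rule.get("ApplyServerSideEncryptionByDefault", {})
        if "SSEAlgorithm" in default:
            return default["SSEAlgorithm"]
    return None


# Hand-written sample configurations (not fetched from AWS).
sse_s3_config = {
    "Rules": [{"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}]
}
no_default_config = {"Rules": []}

print(default_encryption_algorithm(sse_s3_config))  # AES256
print(default_encryption_algorithm(no_default_config))  # None
```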


Utilize An External Monitoring Platform

A Digital Risk Protection Platform like Flare can enable your organization to understand your digital footprint in near real-time, and quickly remediate bucket issues. Flare continually hunts for exposed buckets, GitHub repositories, and other misconfigurations and can immediately alert you if permissions have been misconfigured.


Preventing GitHub Repo Leaks

Preventing leaks on GitHub is critical for software development companies and other organizations that are developing custom code. According to GitHub, the platform had more than fifty-six million users in 2020, and more than sixty million repositories were created that year. While collaborative SaaS repositories can enable teams from around the world to seamlessly work together, they also represent a wide attack surface. Many organizations unintentionally expose data such as API keys, secrets, and service tokens through lax coding practices and overly generous permissions.


Of particular concern are secrets, a term used to describe the various digital authentication credentials used in an IT infrastructure, such as passwords, keys, tokens, etc.

Here are some key practices you can employ to reduce the risk of accidental GitHub leaks:

Standardize Coding Practices

Take the time to clearly write the rules of the road for your development team. Once you have established actionable policies, train your developers to adhere to them throughout the software development lifecycle. Policies such as prohibiting account sharing as a way around permissions issues and using pre-commit Git hooks to actively detect secrets can dramatically reduce the risk of technical data leakage.
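To give a sense of what a secret-detecting pre-commit hook does, here is a minimal sketch of the scanning step. It assumes only two patterns, the well-known AWS access key ID prefix and a PEM private key header; the function name and the sample input are hypothetical, and a production hook would rely on a much larger rule set, typically from a dedicated scanning tool.

```python
import re

# Two illustrative patterns only; real scanners ship many more rules.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_header": re.compile(
        r"-----BEGIN (?:RSA |EC |DSA |OPENSSH )?PRIVATE KEY-----"
    ),
}


def find_secrets(text: str) -> list:
    """Return (line_number, pattern_name) pairs for lines that look like secrets."""
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((line_no, name))
    return findings


# In a real hook the text would be the staged changes (for example, the output
# of `git diff --cached`), and any finding would make the hook exit non-zero
# so that the commit is rejected.
staged_changes = 'region = "us-east-1"\naws_key = "AKIAIOSFODNN7EXAMPLE"\n'

print(find_secrets(staged_changes))  # [(2, 'aws_access_key_id')]
```

Pattern matching like this produces both false positives and false negatives, so it works best as one layer alongside the other practices in this list.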

Centralize your Key and Secrets Storage

In addition, it’s worth considering your current approach to storing keys and secrets. Insecure encryption keys can undermine the other cybersecurity practices you are employing. Consider using a centralized store to securely hold keys and secrets rather than relying on an array of DevOps tools. Centralized storage can make it easier to keep track of highly sensitive technical data, enable your team to work efficiently, and reduce the risk of accidental leakage.
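On the application side, centralization usually means code asks for a secret by name at runtime instead of carrying a hard-coded value. The sketch below assumes the central store injects secrets into the process environment at deploy time, which is one common pattern among several; the function, the exception class, and the secret name are all hypothetical.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when a required secret has not been provided to the process."""


def get_secret(name: str, env=None) -> str:
    """Look up a secret by name; fail loudly rather than fall back to a default."""
    source = os.environ if env is None else env
    value = source.get(name)
    if not value:
        raise MissingSecretError(f"secret {name!r} is not set; check the central store")
    return value


# A stand-in environment for demonstration; real code would omit the second
# argument and read from os.environ.
fake_env = {"CRM_API_KEY": "example-value"}
api_key = get_secret("CRM_API_KEY", fake_env)
print(len(api_key))  # 13
```

Because the value never appears in source code, there is nothing sensitive to commit by accident, and rotating a key only requires a change in one place.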

Periodically Review Permissions

Conducting a regular permissions review based on the principle of least privilege is key. Developers who have moved to other projects or who have left the organization should immediately be stripped of any permissions they no longer need. By scheduling a periodic review, you can hold yourself accountable and ensure that users’ permissions are properly configured for the work they are doing.

Encourage a Culture of Security

Many organizations go through the motions to comply with certain information security regulations and reassure key stakeholders that they take information security seriously. However, to reduce real risk you need to actively create a culture of security. This means making conversations around development security a part of routine meetings and regularly training developers on information security best practices.

Externally Monitor GitHub Environments

No matter how good your security is, mistakes happen. Utilizing external technical data leak monitoring software can help you catch leaks that you might have otherwise missed. In addition, Digital Risk Protection software can help you identify if any users have compromised credentials for sale on the dark web.

Don't Forget about Pastebin

Pastebin is a site commonly used by both developers and hackers. Essentially, the site allows users to easily share blocks of text, which are termed “pastes”. Pastes are used in a variety of ways by both development teams and malicious actors. Pastebin allows for fully anonymous posting of any plain text. While the company does remove malicious content, it relies heavily on user reports, which can lead to a delay between posting and removal. Some Pastebin uses include:

  • Quickly sharing and collaborating on software code
  • Sharing text files as an alternative to Word documents
  • Sharing dark web links between users
  • Posting data breach dumps of usernames, emails, and passwords
  • Posting stolen source code and other technical data


We strongly encourage organizations to actively monitor Pastebin for corporate login credentials and leaked technical data. If sensitive information is compromised, chances are high that it could appear as a paste.
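As a simple illustration of what such monitoring looks for, the sketch below scans the text of a paste for credential pairs that use your corporate email domain. It assumes the paste text has already been retrieved by some other means and that credentials appear in the common "email:password" dump style; the function name, the separators, and the example.com domain are hypothetical.

```python
import re


def find_corporate_credentials(paste_text: str, domain: str) -> list:
    """Return the distinct corporate email addresses that appear next to a password."""
    pattern = re.compile(
        r"\b([A-Za-z0-9._%+-]+@" + re.escape(domain) + r")[:;|,]\S+",
        re.IGNORECASE,
    )
    seen = []
    for match in pattern.finditer(paste_text):
        email = match.group(1).lower()
        if email not in seen:  # report each affected account once
            seen.append(email)
    return seen


# Made-up paste contents for demonstration.
sample_paste = (
    "alice@example.com:hunter2\n"
    "bob@other.org:pass\n"
    "Alice@Example.com:letmein\n"
    "carol@example.com;s3cret\n"
)

print(find_corporate_credentials(sample_paste, "example.com"))
# ['alice@example.com', 'carol@example.com']
```

The function returns only the affected addresses, not the passwords, so the alert itself does not spread the leaked credentials any further.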

Information Leakage Doesn't Have to be Devastating

No matter how good your controls are, information leakage can happen to any organization. The mix of contracted developers, multiple software platforms, and dozens of people involved in software development lends itself to user error. Flare provides a unique digital risk protection platform that can help not only monitor for information leakage, but also identify threats on the dark web such as account takeover schemes and financial fraud. 
