How company data and secrets leak from GitHub repositories

One boring day during the pandemic, security researcher Craig Hays decided to run an experiment. He wanted to leak an SSH username and password in a GitHub repository and see whether an attacker would find them. Hays thought he would have to wait a few days, maybe a week, before anyone noticed. The reality turned out to be more brutal. The first unauthorized connection occurred within 34 minutes. “The biggest eye-opener for me was how quickly it was exploited,” he told CSO.

In the first 24 hours, six different IP addresses connected to his honeypot a total of nine times. One attacker attempted to install a botnet client, while another attempted to use the server to launch a denial of service attack. Hays also saw someone trying to steal sensitive information from the server and someone else who was just looking around.

The experiment showed him that threat actors constantly scan GitHub and other public code repositories for sensitive data left behind by developers. The volume of secrets, including usernames, passwords, Google keys, developer tools, or private keys, continues to increase as businesses move from on-premises software to the cloud and more developers work from home. This year alone, exposed secrets will increase by at least 20% compared with the previous year, said Eric Fourrier, co-founder of French security startup GitGuardian, which analyzes public repositories to identify data that attackers could take advantage of.

How hackers uncover secrets on GitHub

Hackers know that GitHub is a great place to find sensitive information, and organizations like the United Nations, Equifax, Codecov, Starbucks, and Uber have paid the price for neglecting that. Some companies might claim that they are not at risk because they don’t run on open source code, but the truth is more nuanced: developers often use their personal repositories for work projects. According to the State of Secrets Sprawl on GitHub report, 85% of leaks occur in developers’ personal repositories and only the remaining 15% occur in repositories owned by organizations.

Developers leave behind shell command history, environment files, and copyrighted content. Sometimes they make mistakes because they are trying to streamline their processes. For example, they might hard-code their credentials while writing code because it makes debugging easier, then forget to remove them before committing. Even if they delete the secrets in a later commit or force-push to clear them, this private information is often still accessible in the Git history.

Most common types of secrets exposed on GitHub (chart: GitGuardian)

“I find a lot of passwords in older versions of files that have been replaced by newer, cleaner versions without the passwords,” says Hays. “Git commit history remembers everything, unless you delete it deliberately and explicitly.”
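Hays's point about commit history can be illustrated with a short script. This is a minimal sketch, not a tool mentioned in the article: it assumes a local clone with `git` on the PATH, the function names `scan_patch_text` and `scan_repo` are hypothetical, and the single pattern (the usual shape of an AWS access key ID) is only an example. It reads the output of `git log -p --all` and reports matching lines that were added or removed in any commit, so a secret that was later "cleaned up" still shows up.

```python
import re
import subprocess

# Illustrative pattern only: the usual shape of an AWS access key ID.
AWS_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")


def scan_patch_text(patch_text):
    """Return (commit, line) pairs for added or removed lines that match."""
    findings = []
    commit = None
    for line in patch_text.splitlines():
        if line.startswith("commit "):
            # Header line of the next commit in `git log` output.
            commit = line.split()[1]
        elif line.startswith(("+++", "---")):
            # File headers of a diff, not file content. (A removed line whose
            # content begins with "--" is also skipped; acceptable for a sketch.)
            continue
        elif line.startswith(("+", "-")) and AWS_KEY_ID.search(line):
            findings.append((commit, line[1:].strip()))
    return findings


def scan_repo(repo_path):
    """Scan the patches of every commit reachable from any ref of a local clone."""
    result = subprocess.run(
        ["git", "-C", repo_path, "log", "-p", "--all"],
        capture_output=True, text=True, check=True,
    )
    return scan_patch_text(result.stdout)
```

A secret that was committed and then deleted is reported twice: once for the commit that added it and once for the commit that removed it. Both commits keep the value readable to anyone who can clone the repository.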

Both junior and senior developers can make mistakes. “Even if you’re a big developer and educated on the problem, at some point when you’re coding late at night you can make a mistake and things happen,” Fourrier said. “Leaking secrets is human error.”

Most common file types found exposed on GitHub (chart: GitGuardian)

While any developer is prone to mistakes, those new to the job market usually divulge the most secrets. Many years ago, while she was a software engineering student, Crina Catalina Bucur set up an AWS account for development purposes and received a bill for $2,000, of which only $0.01 was rightfully hers.

“My project was an aggregate file management platform for a dozen cloud storage services, including Amazon’s S3,” she said. “This was before GitHub offered free private repositories, so my AWS access key and corresponding secret key were published along with the code to my public repository. I didn’t stop to think about it, but even if I had, I don’t think I would have paid too much attention to it.”

A few days later, she started receiving emails from AWS warning her that her account was compromised, but she didn’t read them carefully until she received the invoice. Luckily for her, AWS Support waived the extra charges. Bucur made several mistakes that hackers exploited, including hard-coding the keys for convenience and posting them to a public code repository.
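One common alternative to hard-coding keys is to read them from the process environment at run time, so the values never appear in a committed file. The sketch below is an illustration rather than anything Bucur or AWS prescribes. The function name `load_aws_credentials` is hypothetical; `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` are the environment variable names the AWS tooling conventionally reads.

```python
import os


def load_aws_credentials(env=None):
    """Read AWS credentials from environment variables instead of source code."""
    env = os.environ if env is None else env
    try:
        return {
            "access_key_id": env["AWS_ACCESS_KEY_ID"],
            "secret_access_key": env["AWS_SECRET_ACCESS_KEY"],
        }
    except KeyError as missing:
        # Fail loudly rather than falling back to a value baked into the code.
        raise RuntimeError(
            f"Missing environment variable: {missing.args[0]}"
        ) from None
```

The code that ends up in the repository then contains only the variable names, not the key material.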

Today, hackers who want to find errors like these need few resources, says Hays. He’s a bug bounty hunter in his spare time and often relies on open source intelligence (OSINT), information anyone can find on the web if they know where to look for it. “My method of choice is to search manually using the standard interface,” he said. “I use search operators to narrow down to specific file types, keywords, users, and organizations, based on the companies I’m targeting.”
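The kind of narrowing Hays describes can be sketched as a small query builder, which a security team could use to audit its own organization. This is an assumption-laden illustration: the helper `build_code_search_query` is hypothetical, and it relies on the `org:`, `user:`, and `path:` qualifiers of GitHub code search. The article does not say which operators Hays actually uses.

```python
def build_code_search_query(keyword, org=None, user=None, path_glob=None):
    """Compose a GitHub code search query string from optional qualifiers."""
    parts = [f'"{keyword}"']  # quoted so the keyword is matched as an exact string
    if org:
        parts.append(f"org:{org}")
    if user:
        parts.append(f"user:{user}")
    if path_glob:
        parts.append(f"path:{path_glob}")
    return " ".join(parts)


# Hypothetical organization name, for illustration only.
query = build_code_search_query(
    "aws_secret_access_key", org="example-org", path_glob="*.yml"
)
```

The resulting string can be pasted into the search box of the standard web interface.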

There are some tools that can make the process faster and more efficient. “Attackers run automated bots that grab GitHub content and extract sensitive information,” says Gabriel Cirlig, security researcher at HUMAN. These bots can run all the time, which means hackers can spot errors within seconds or minutes.

Once a secret is found, attackers can easily exploit it. “For example, if you find an AWS key, you have access to all of the company’s cloud infrastructure,” says Fourrier. “It’s very easy to target developers who work for a specific company and try to look at some of the assets of the company.” Depending on the nature of the secrets, hackers can do many things, including launching supply chain attacks and compromising the security of a company’s customers.

How businesses can protect secrets from GitHub leaks

As the volume of secrets increases, businesses need to get better at detecting them before it’s too late. GitHub has its own “secret scanning partner program,” which finds text strings that look like passwords, SSH keys, or API tokens. GitHub has partnered with over 40 cloud service providers to automatically deal with API keys exposed in public repositories.
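Finding "text strings that look like" secrets usually comes down to pattern matching. The following is a minimal sketch of that idea, not GitHub's implementation. The pattern set and the function name `find_secrets` are assumptions chosen for illustration, and real scanners use far larger rule sets and extra checks to cut false positives.

```python
import re

# Illustrative rules only; each maps a label to a regular expression.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_header": re.compile(
        r"-----BEGIN (?:RSA |EC |DSA |OPENSSH )?PRIVATE KEY-----"
    ),
    "hardcoded_password": re.compile(
        r"(?i)\bpassword\s*[:=]\s*['\"][^'\"]{4,}['\"]"
    ),
}


def find_secrets(text):
    """Return (line_number, rule_name) for every line that matches a rule."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings
```

Run over a file before it is committed, a check like this flags the line and the kind of secret. It cannot tell whether a credential is real or still valid, which is where the provider partnerships come in.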

“We are continually looking to expand these partnerships to better protect the ecosystem,” a GitHub spokesperson told CSO. “We are currently revoking more than 100 exposed GitHub API keys every day, often safely introducing new developers to the importance of credential security.”

Hays said the secret scanning partner program is a step in the right direction, as it makes it harder for attackers to find valid credentials. He says, however, that the initiative is not perfect. “It still leaves a void when people accidentally check in their own SSH keys, passwords, tokens or anything else sensitive,” he says. “It’s a lot harder to detect and manage because there aren’t any partner credential providers to ask questions like, ‘Is this real? Do you want to revoke it? Should either of us let the owner know?’”

Meanwhile, he advises developers to be aware of how they write and deploy their code. “One of the first things to do is add the correct settings to a .gitignore file,” he said. “This file tells Git, and therefore GitHub, which files should not be tracked and uploaded to the internet.”
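As a rough illustration of that advice, the helper below appends a few commonly sensitive file patterns to a repository's `.gitignore` if they are not already listed. It is a sketch under stated assumptions: the function name `ensure_ignored` and the pattern list are hypothetical, and each project will need its own entries.

```python
from pathlib import Path

# Example entries only: environment files, key files, shell history.
SENSITIVE_PATTERNS = [".env", "*.pem", "id_rsa", ".bash_history"]


def ensure_ignored(repo_dir, patterns=SENSITIVE_PATTERNS):
    """Append any missing patterns to repo_dir/.gitignore; return those added."""
    gitignore = Path(repo_dir) / ".gitignore"
    existing = gitignore.read_text().splitlines() if gitignore.exists() else []
    present = {line.strip() for line in existing}
    missing = [p for p in patterns if p not in present]
    if missing:
        gitignore.write_text("\n".join(existing + missing) + "\n")
    return missing
```

Note that `.gitignore` only keeps untracked files from being added. A file that has already been committed stays tracked, and in the history, until it is removed explicitly.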

Some security startups are also trying to fill the void. GittyLeaks, SecretOps, gitLeaks and GitGuardian aim to offer a few more layers of protection for business users and independent professionals. Some detect leaked secrets within seconds, allowing developers and businesses to take immediate action. “We scan all of your code on your software throughout the development cycle, the Docker container, different types of data,” says Fourrier. “We find the secrets and try to revoke them.”

Ideally, however, the best strategy is not to leak secrets at all, or to leak as few as possible, and raising awareness of the problem helps. “Training developers to write secure code and proactively stop bots is always better than playing around with leaked secrets,” Cirlig says.

Copyright © 2021 IDG Communications, Inc.
