Lucian Constantin
CSO Senior Writer

This tool can help weed out hard-coded keys from software projects

news
Jan 9, 2017 (2 mins)

Truffle Hog can find access tokens and keys that are 20 characters or longer inside source code repositories

A security researcher has developed a tool that can automatically detect sensitive access keys that have been hard-coded inside software projects.

The Truffle Hog tool was created by U.S.-based researcher Dylan Ayrey and is written in Python. It searches for hard-coded access keys by scanning deep inside Git code repositories for strings that are 20 or more characters long and have high entropy. High Shannon entropy, a measure named after American mathematician Claude E. Shannon, suggests a level of randomness that makes a string a candidate for a cryptographic secret, such as an access token.
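To make the entropy idea concrete, here is a minimal sketch of a Shannon entropy calculation in Python. It is an illustration only, not code taken from Truffle Hog: the function name `shannon_entropy` is hypothetical, and the sketch measures entropy over whatever characters appear in the string, which may differ from how the tool itself does it.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Return the Shannon entropy of `text` in bits per character.

    Hypothetical helper for illustration; not part of Truffle Hog.
    """
    if not text:
        return 0.0
    total = len(text)
    # Sum -p * log2(p) over the relative frequency p of each distinct character.
    return sum(
        -(count / total) * math.log2(count / total)
        for count in Counter(text).values()
    )


# A string made of one repeated character carries no information per character,
# while a string of evenly distributed characters scores higher.
low = shannon_entropy("aaaaaaaa")
high = shannon_entropy("abcdefgh")
```

In this sketch `low` is 0 bits per character and `high` is 3 bits per character, since eight equally likely symbols need three bits each. Randomly generated keys tend to score toward the upper end for their character set, which is what makes entropy a useful filter.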

Hard-coding access tokens for various services in software projects is considered a security risk because hackers can extract those tokens without much effort. Unfortunately, this practice is very common.

In 2014, a researcher found almost 10,000 access keys for Amazon Web Services and Elastic Compute Cloud left by developers inside publicly accessible code on GitHub. Amazon has since started scanning GitHub for such keys itself and revoking them.

Last year researchers from Detectify found 1,500 Slack tokens hard-coded by developers into GitHub projects, many of them providing access to chats, files, private messages, and other sensitive data shared inside Slack teams.

In 2015, a study by researchers from Technical University and the Fraunhofer Institute for Secure Information Technology in Darmstadt, Germany, uncovered over 1,000 access credentials for Backend-as-a-Service (BaaS) frameworks stored inside Android and iOS applications. Those credentials unlocked access to more than 18.5 million records containing 56 million data items stored on BaaS providers like Facebook-owned Parse, CloudMine or Amazon Web Services.

Truffle Hog digs deep into a project’s commit history and branches. It will evaluate the Shannon entropy for both the base64 and hexadecimal character sets for every blob of text greater than 20 characters, Ayrey said in the project’s description.
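The check described above can be sketched roughly as follows. This is a hedged illustration rather than the tool's actual code: the names `find_candidates`, `BASE64_RE` and `HEX_RE`, and the two threshold values, are assumptions made for the example. It also works on a plain string instead of walking a Git history.

```python
import math
import re
from collections import Counter

# Assumed for illustration: runs longer than 20 characters from each character set.
BASE64_RE = re.compile(r"[A-Za-z0-9+/=]{21,}")
HEX_RE = re.compile(r"[0-9a-fA-F]{21,}")

# Assumed cut-offs in bits per character; tune them for your own projects.
BASE64_THRESHOLD = 4.5
HEX_THRESHOLD = 3.0


def shannon_entropy(text: str) -> float:
    """Return the Shannon entropy of `text` in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    return sum(
        -(count / total) * math.log2(count / total)
        for count in Counter(text).values()
    )


def find_candidates(blob: str) -> list[str]:
    """Return high-entropy base64-like or hex-like strings found in `blob`."""
    hits: list[str] = []
    for pattern, threshold in (
        (BASE64_RE, BASE64_THRESHOLD),
        (HEX_RE, HEX_THRESHOLD),
    ):
        for match in pattern.finditer(blob):
            candidate = match.group(0)
            # Keep each candidate once, in the order it was first flagged.
            if shannon_entropy(candidate) > threshold and candidate not in hits:
                hits.append(candidate)
    return hits


# Made-up sample text: one hex-like value and one low-entropy filler string.
sample = (
    'key = "0123456789abcdef0123456789abcdef"\n'
    'padding = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"\n'
)
flagged = find_candidates(sample)
```

With these assumed thresholds, `flagged` contains only the hex-like value. The filler string is long enough to be examined but has zero entropy, so it is ignored. A scan like this will still produce false positives, such as hashes or encoded assets, so flagged strings are candidates for review, not confirmed secrets.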

The tool is available on GitHub and requires the GitPython library to run. Companies and independent developers can use it to scan their own software projects before hackers do so.

Lucian Constantin

Lucian Constantin writes about information security, privacy, and data protection for CSO. Before joining CSO in 2019, Lucian was a freelance writer for VICE Motherboard, Security Boulevard, Forbes, and The New Stack. Earlier in his career, he was an information security correspondent for the IDG News Service and information security news editor for Softpedia.

Before he became a journalist, Lucian worked as a system and network administrator. He enjoys attending security conferences and delving into interesting research papers. He lives and works in Romania.

You can reach him at lucian_constantin@foundryco.com or @lconstantin on X. For encrypted email, his PGP key's fingerprint is: 7A66 4901 5CDA 844E 8C6D 04D5 2BB4 6332 FC52 6D42
