Soatok<p><strong>Beyond Bcrypt</strong></p><p>In 2010, Coda Hale wrote <em><a href="https://codahale.com/how-to-safely-store-a-password/" rel="nofollow noopener noreferrer" target="_blank">How To Safely Store A Password</a></em>, which began with the repeated phrase, “Use bcrypt”, where each instance of the word bcrypt was linked to an implementation for a different programming language. </p><p>This had two effects on the technology blogosphere at the time:</p><ol><li>It convinced a lot of people that bcrypt was the right answer for storing a password.</li><li>It created a meme for how technology bloggers recommend specific cryptographic algorithms when they want attention from Hacker News.</li></ol><p>At the time, it was great advice!</p> Credit: <a href="https://cmykat.carrd.co/" rel="nofollow noopener noreferrer" target="_blank">CMYKat</a> <p>In 2010, bcrypt was the only clearly good answer for password hashing in most programming languages.</p><p>In the intervening <em>almost fifteen years</em>, we’ve learned a lot more about passwords, password cracking, authentication mechanisms beyond passwords, and password-based cryptography.</p><blockquote><p>If you haven’t already <a href="https://soatok.blog/2022/12/29/what-we-do-in-the-etc-shadow-cryptography-with-passwords/" rel="nofollow noopener noreferrer" target="_blank">read my previous post about password-based cryptography</a>, you may want to give that one a once-over before you continue.</p></blockquote><p>But we’ve also learned a lot more about bcrypt, its limitations, the various footguns involved with using it in practice, and even some cool shit you can build with it.</p><p>In light of a recent discussion about <a href="https://github.com/WordPress/wordpress-develop/pull/7333" rel="nofollow noopener noreferrer" target="_blank">switching WordPress’s password hashing algorithm</a> from PHPass (which is based on MD5) to bcrypt, I feel now is the perfect time to dive into this algorithm and its implications for real-world 
cryptography.</p><p><strong>Understanding Bcrypt in 2024</strong></p><p>Bcrypt is a password hashing function, but <a href="https://news.ycombinator.com/item?id=22028143" rel="nofollow noopener noreferrer" target="_blank">not a password KDF</a> or general-purpose cryptographic hash function.</p><p>If you’re using a sane password storage API, such as <a href="https://www.php.net/manual/en/function.password-hash.php" rel="nofollow noopener noreferrer" target="_blank">PHP’s password API</a>, you don’t even need to think about salting your passwords, securely verifying passwords, or handling weird error conditions. Instead, you only need to concern yourself with the “cost” factor, which exponentially increases the runtime of the algorithm.</p><p>There’s just one problem: <strong>bcrypt silently truncates after 72 characters</strong> (or rather, bytes, if you’re pedantic and assume non-ASCII passwords, such as emoji).</p><p>Here’s a quick script <a href="https://3v4l.org/cRhjD" rel="nofollow noopener noreferrer" target="_blank">you can run yourself</a> to test this:</p> <pre>&lt;?php
$example1 = str_repeat('A', 72);
$example2 = $example1 . 'B';
$hash = password_hash($example1, PASSWORD_BCRYPT);
// Prints bool(true): the 73rd byte is silently ignored.
var_dump(password_verify($example2, $hash));</pre> <p>This may sound ludicrous (“who uses 72 character passwords anyway?”) until you see security advisories like <a href="https://trust.okta.com/security-advisories/okta-ad-ldap-delegated-authentication-username/" rel="nofollow noopener noreferrer" target="_blank">this recent one from Okta</a>.</p><blockquote><p>The Bcrypt algorithm was used to generate the cache key where we hash a combined string of userId + username + password. 
Under a specific set of conditions, listed below, this could allow users to authenticate by providing the username with the stored cache key of a previous successful authentication.</p><p>(…)</p><ul><li>The username is 52 characters or longer</li></ul></blockquote><p>The other thing to consider is that many people use passphrases, such as those generated from Diceware, which produce longer strings with less entropy per character.</p><p>If you use bcrypt as-is, you will inevitably run into this truncation at some point.</p><p><strong>“Let’s pre-hash passwords!”</strong></p><p>In response to this limitation, many developers will suggest pre-hashing the password with a general-purpose cryptographic hash function, such as SHA-256.</p><p>And so, in pursuit of a way to avoid one footgun, developers introduced two more.</p> <a href="https://bsky.app/profile/ajlovesdinos.bsky.social" rel="nofollow noopener noreferrer" target="_blank">AJ</a> <p><strong>Truncation on NUL Bytes</strong></p><p>If you use the raw binary output of a hash function as the password input to bcrypt, be aware <a href="https://blog.ircmaxell.com/2015/03/security-issue-combining-bcrypt-with.html" rel="nofollow noopener noreferrer" target="_blank">that bcrypt will truncate on NUL (<code>0x00</code>) bytes</a>.</p><p>With respect to the WordPress issue linked above, the default for PHP’s hashing API is to output hexadecimal characters.</p><p>This is a bit wasteful. Base64 is preferable, although any isomorphism of the raw hash output that doesn’t include a <code>0x00</code> byte is safe from truncation.</p><p><strong>Hash Shucking</strong></p><p>When a system performs a migration from a cryptographic hash function (e.g., MD5) to bcrypt, its developers typically choose to re-hash the existing hashes with bcrypt. </p><p>Because users typically reuse passwords, you can often take the fast, unsalted hashes from another breach and use them as your password dictionary for bcrypt. 
</p><p>If you then succeed in verifying a bcrypt hash against one of those fast hashes, you can plug the fast hash into software like Hashcat and crack the actual password at a much faster rate (tens of billions of candidates/second, versus thousands per second).</p><p>This technique is called <a href="https://youtu.be/OQD3qDYMyYQ?t=1462" rel="nofollow noopener noreferrer" target="_blank">hash shucking</a> (YouTube link).</p><p>You can avoid hash shucking by using HMAC with a static key–either universal for all deployments of your software, or unique per application. </p><p>It doesn’t really matter which you choose; all you really need from it is domain separation from naked hashes.</p><blockquote><p>I frequently see this referred to as “peppering”, but the term “pepper” isn’t rigidly defined anywhere.</p></blockquote><p>One benefit of using a per-application HMAC secret is that it makes your hashes harder to crack for anyone who doesn’t know this secret.</p><p>For balance, one downside is that your hashes are no longer portable across applications without managing this static key.</p><p><strong>Disarming Bcrypt’s Footguns</strong></p><p>Altogether, it’s quite straightforward to avoid bcrypt’s footguns, as <a href="https://github.com/WordPress/wordpress-develop/pull/7333#pullrequestreview-2449232465" rel="nofollow noopener noreferrer" target="_blank">I had recommended to WordPress last week</a>.</p><ol><li>Pre-hash with HMAC-SHA512.</li><li>Ensure the output of step 1 is base64-encoded.</li><li>Pass the output of step 2 to PHP’s password API.</li></ol><p>Easy, straightforward, and uncontroversial. 
Right?</p><p><strong>Objections to Bcrypt Disarmament</strong></p><p>The linked discussion was <a href="https://github.com/WordPress/wordpress-develop/pull/7333#issuecomment-2499156613" rel="nofollow noopener noreferrer" target="_blank">tedious</a>, so I will briefly describe the objections raised to my suggestion.</p><ol><li>This is “rolling our own crypto”.<ul><li>Answer: No, it’s a well-understood pattern that’s been discussed in the PHP community for well over a decade.</li></ul></li><li>Passwords over 72 characters are rare and not worthy of our consideration.<ul><li>Answer: No, this has bit people in unexpected ways before (see: Okta).<p>When you develop a popular CMS, library, or framework, you cannot possibly know all the ways that your software will be used by others. It’s almost always better to be misuse-resistant.</p></li></ul></li><li>Pre-hashing introduces a Denial-of-Service attack risk.<ul><li>Answer: No. Bcrypt with a cost factor of 10 is about 100,000 times as expensive as SHA2.</li></ul></li><li>This introduces a risk of hash shucking.<ul><li>As demonstrated above, HMAC doesn’t suffer this problem (assuming the key is reasonably selected).</li></ul></li><li>Base64 encoding reduces entropy.<ul><li>Answer: No, it’s isomorphic.</li></ul></li><li>Base64 with the 72 character truncation reduces entropy.<ul><li>Answer: We’re still truncating SHA-512 to more than 256 bits of its output, so this doesn’t actually matter for any practical security reason.</li></ul></li><li>This would necessitate a special prefix (e.g. 
<code>$2w$</code>) to distinguish disarmed bcrypt from vanilla bcrypt that PHP’s password API wouldn’t know what to do with.<ul><li>This is a trivial concern, for which the fix is also trivial: <br>After password_hash(), modify the prefix with a marker to indicate pre-hashing.<br>Before password_verify(), swap the original prefix back in.</li></ul></li></ol><p>There were some other weird arguments (such as “Bcrypt is approved by NIST for FIPS”, which is just plain false).</p><p><strong>Why Bcrypt Truncating SHA-512 Doesn’t Matter</strong></p><p>If you have a random secret key, HMAC-SHA-512 is a secure pseudorandom function that you can treat as a <a href="https://crypto.stackexchange.com/a/880" rel="nofollow noopener noreferrer" target="_blank">Random Oracle</a>. </p><p>Because it’s HMAC, you don’t have to worry about Length Extension Attacks at all. Therefore, the best known attack strategy is to produce a collision.</p><p>The raw binary output of SHA-512 is 64 bytes, but may contain NUL bytes (which would truncate bcrypt’s input). To avoid this, we base64-encode the output.</p><p>When you base64-encode a SHA-512 hash, the output is 88 characters (due to base64 padding). This is longer than the 72 characters supported by bcrypt, so it will truncate silently after 72 characters.</p><p>This is still secure, but to prove this, I need to use math.</p><p>First, let’s assume you’re working with an extremely secure, high-entropy password, and might be negatively impacted by this truncation. How bad is the damage in this extreme case?</p><p>There are 64 possible characters in the base64 alphabet. 
That’s a tautology, after all.</p><p>If you have a string of length 72, for which each character can be one of 64 values, you can represent the total probability space of possible strings as 64<sup>72</sup>.</p><p>If you know that 64 = 2<sup>6</sup>, you can do a little bit of arithmetic and discover this quantity is equal to 2<sup>432</sup>.</p><p>As I discussed in <a href="https://soatok.blog/2024/07/01/blowing-out-the-candles-on-the-birthday-bound/" rel="nofollow noopener noreferrer" target="_blank">my deep dive on the birthday bound</a>, you can take the cube root of this number to find what I call the Optimal Birthday Bound.</p><p>This works out to 2<sup>144</sup> samples in order to find a 2<sup>-144</sup> probability of a single collision.</p><p>This simply isn’t going to happen in our lifetimes.</p> 2^-144 is about 17 trillion times less likely than 2^-100. <p>The real concern is the entropy of the actual password, not losing a few bits from a truncated hash.</p><p>After all, even though the outputs of HMAC-SHA512 are indistinguishable from random when you don’t know the HMAC key, the <strong>input</strong> selection is entirely based on the (probably relatively easy-to-guess) password.</p><p><strong>“Why not just use Argon2 or Scrypt?”</strong></p><p>Argon2 and scrypt don’t have the bcrypt footguns. You can hash passwords of arbitrary length and not care about NUL characters. They’re great algorithms.</p><p>Several people involved in the Password Hashing Competition (that selected Argon2 as its winner) have since lamented the emphasis on memory-hardness over <a href="https://github.com/Sc00bz/bscrypt#why-cache-hard" rel="nofollow noopener noreferrer" target="_blank">cache-hardness</a>. 
Cache-hardness is more important for short run-times (i.e., password-based authentication), while memory-hardness is more important for longer run-times (i.e., key derivation).</p><p>As Sc00bz explains in the GitHub readme for <a href="https://github.com/Sc00bz/bscrypt?tab=readme-ov-file#why-cache-hard" rel="nofollow noopener noreferrer" target="_blank">his bscrypt project</a>:</p><blockquote><p>Cache hard algorithms are better than memory hard algorithms at shorter run times. Basically cache hard algorithms forces GPUs to use 1/4 to 1/16 of the memory bandwidth because of the large bus width (commonly 256 to 1024 bits). Another way to look at it is memory transactions vs bandwidth. Also the low latency of L2 cache on CPUs and the 8 parallel look ups let’s us make a lot of random reads. With memory hard algorithms, there is a point where doubling the memory quarters a GPU attacker’s speed. There then is a point at which a memory hard algorithm will overtake a cache hard algorithm. Cache hard algorithms don’t care that GPUs will get ~100% utilization of memory transactions because it’s already very limiting.</p></blockquote><p>Ironically, bcrypt is cache-hard, while scrypt and the flavors of Argon2 that most people use are not.</p><p>Most of my peers just care that you use <em>a</em> password hashing algorithm at all. They don’t particularly care which. The bigger, and more common, vulnerability is not using one of them in the first place.</p><p>I’m mostly in agreement with them, but I would prefer that anyone that chooses bcrypt takes steps to disarm its footguns.</p><p><strong>Turning Bcrypt Into a KDF</strong></p><p>Earlier, I noted that <a href="https://soatok.blog/2022/12/29/what-we-do-in-the-etc-shadow-cryptography-with-passwords/#pbcff" rel="nofollow noopener noreferrer" target="_blank">bcrypt is not a password KDF</a>. That doesn’t mean you can’t make one out of bcrypt. 
Ryan Castellucci is an amazing hacker; they managed <a href="https://github.com/ryancdotorg/bcrypt-ext" rel="nofollow noopener noreferrer" target="_blank">to do just that</a>.</p><p>To understand why this is difficult, and why Ryan’s hack works, you need to understand what bcrypt <em>actually is</em>.</p><p>Bcrypt is <a href="https://en.wikipedia.org/wiki/Bcrypt#Algorithm" rel="nofollow noopener noreferrer" target="_blank">a relatively simple algorithm</a> at its heart:</p><ol><li>Run the Blowfish key schedule, several times, over both the password and salt.</li><li>Encrypt the string <code>"OrpheanBeholderScryDoubt"</code> 64 times in ECB mode using the expanded key from step 1.</li></ol><p>Most of the heavy work in bcrypt is actually done in the key schedule; the encryption of three blocks (remember, Blowfish is a 64-bit block cipher) just ensures you need the correct resultant key from the key schedule.</p><p><strong>“So how do you get an encryption key out of bcrypt?”</strong></p><p><em>It’s simple: we, uh, <a href="https://github.com/ryancdotorg/bcrypt-ext/blob/cd6d6f52880c0242bd356b6bae5272a6feee1cfa/blowfish.c#L239-L246" rel="nofollow noopener noreferrer" target="_blank">hash the S-box</a>.</em></p> <pre>static void BF_kwk(struct BF_data *data, uint8_t kwk[BLAKE2B_KEYBYTES]) {
  BF_word *S = (BF_word *)data->ctx.S;
  BF_htobe(S, 4*256);
  // it should not be possible for this to fail...
  int ret = blake2b_simple(kwk, BLAKE2B_KEYBYTES, S, sizeof(BF_word)*4*256);
  assert(ret == 0);
  BF_betoh(S, 4*256);
}</pre> <p>Using BLAKE2b to hash the S-box from the final Blowfish key expansion yields a key-wrapping key that can be used to encrypt whatever data is being protected.</p><p>The only feasible way to recover this key is to provide the correct password and salt to arrive at the same key schedule.</p><p>Any attack against the selection of S implies a cryptographic weakness in bcrypt, too. 
(I’ve already recommended <a href="https://github.com/ryancdotorg/bcrypt-ext/issues/1" rel="nofollow noopener noreferrer" target="_blank">domain separation</a> in a GitHub issue.)</p> <a href="https://cmykat.carrd.co/" rel="nofollow noopener noreferrer" target="_blank">CMYKat</a> <p>It’s worth remembering that Ryan’s design is a proof-of-concept, not a peer-reviewed design ready for production. Still, it’s a cool hack. </p><p>It’s also <a href="https://github.com/openbsd/src/blob/f6e19f5194481d5e142c7da4fb7ca548e5bd10af/lib/libutil/bcrypt_pbkdf.c" rel="nofollow noopener noreferrer" target="_blank">not the first of its kind</a> (thanks, <a href="https://cybervillains.com/@djm/113555841495008970" rel="nofollow noopener noreferrer" target="_blank">Damien Miller</a>).</p><p>If anyone was <strong>actually</strong> considering using this design, first, they should wait until it’s been adequately studied. Do not pass Go, do not collect $200.</p><p>Additionally, the output of the BLAKE2b hash should be used as the input keying material for, e.g., <a href="https://soatok.blog/2021/11/17/understanding-hkdf/" rel="nofollow noopener noreferrer" target="_blank">HKDF</a>. 
This lets you split the password-based key into multiple application-specific sub-keys without running the password KDF again for each derived key.</p><p><strong>Wrapping Up</strong></p><p>Although bcrypt is still an excellent cache-hard password hashing function suitable for interactive logins, it does have corner cases that sometimes cause vulnerabilities in applications that misuse it.</p><p>If you’re going to use bcrypt, make sure you use bcrypt in line with my recommendations to WordPress: HMAC-SHA-512, base64 encode, then bcrypt.</p><p>Here’s a quick proof-of-concept for PHP software:</p> <pre>&lt;?php
declare(strict_types=1);

class SafeBcryptWrapperPoC
{
    private $staticKey;
    private $cost = 12;

    public function __construct(
        #[\SensitiveParameter]
        string $staticKey,
        int $cost = 12
    ) {
        $this->staticKey = $staticKey;
        $this->cost = $cost;
    }

    /**
     * Generate password hashes here
     */
    public function hash(
        #[\SensitiveParameter]
        string $password
    ): string {
        return \password_hash(
            $this->prehash($password),
            PASSWORD_BCRYPT,
            ['cost' => $this->cost]
        );
    }

    /**
     * Verify password here
     */
    public function verify(
        #[\SensitiveParameter]
        string $password,
        #[\SensitiveParameter]
        string $hash
    ): bool {
        return \password_verify(
            $this->prehash($password),
            $hash
        );
    }

    /**
     * Pre-hashing with HMAC-SHA-512 here
     *
     * Note that this prefers the libsodium base64 code, since
     * it's implemented in constant-time
     */
    private function prehash(
        #[\SensitiveParameter]
        string $password
    ): string {
        return \sodium_bin2base64(
            \hash_hmac('sha512', $password, $this->staticKey, true),
            \SODIUM_BASE64_VARIANT_ORIGINAL_NO_PADDING
        );
    }
}</pre> <p>You can see <a href="https://3v4l.org/WLB7q" rel="nofollow noopener noreferrer" target="_blank">a modified version of this proof-of-concept on 3v4l</a>, which includes the same demo from the top of this blog post to demonstrate the 72-character truncation bug.</p><p>If you’re already using bcrypt in production, you should be cautious with adding this pre-hashing alternative. 
Having vanilla bcrypt and non-vanilla bcrypt side-by-side could introduce problems that need to be thoroughly considered.</p><p>I can safely recommend it to WordPress because they weren’t using bcrypt before. Most of the people reading this are probably not working on the WordPress core.</p><p><strong>Addendum (2024-11-28)</strong></p><p>More of the WordPress team has chimed in to signal support for vanilla bcrypt, rather than disarming the bcrypt footgun.</p><p>The reason?</p><blockquote><p>That would result in <strong>maximum compatibility for existing WordPress users who use the Password hashes outside of WordPress</strong>, while also not introducing yet-another-custom-hash into the web where it’s not overly obviously necessary, but while still gaining the bcrypt advantages for where it’s possible.</p><p><a href="https://github.com/WordPress/wordpress-develop/pull/7333#issuecomment-2505128884" rel="nofollow noopener noreferrer" target="_blank">dd32</a></p></blockquote><p>The hesitance to introduce a custom hash construction is understandable, but the goal I emphasized with bold text is weird and not a reasonable goal for any password storage system.</p><p>It’s true that the overwhelming majority of non-WordPress code written in PHP is just using the password hashing API. But that means it isn’t compatible with WordPress today. PHP’s password hashing API doesn’t implement phpass, after all.</p><p>In addition to being scope creep for a secure password storage strategy, it’s kind of a bonkers design constraint to expect password hashes to be portable. Why are you intentionally exposing hashes unnecessarily?</p> <a href="https://cmykat.carrd.co/" rel="nofollow noopener noreferrer" target="_blank">CMYKat</a> <p>At this point, it’s overwhelmingly likely that WordPress will choose to not disarm the bcrypt footguns, and will just ship it. 
</p><p>That’s certainly not the worst outcome, but I do object to arriving there for stupid reasons, and that GitHub thread is <strong>full of stupid reasons</strong> and misinformation.</p><p>The most potent source of misinformation also <a href="https://github.com/WordPress/wordpress-develop/pull/7333#issuecomment-2499314967" rel="nofollow noopener noreferrer" target="_blank">barked orders at me</a> and then tried to dismiss my technical arguments as the concerns of “the hobbyist community”, which was a great addition to my LinkedIn profile.</p><p>If WordPress’s choice turns out to be a mistake–that is to say, that their decision for vanilla bcrypt introduces a vulnerability in a plugin or theme that uses their password hashing API for, I dunno, API keys?–at least I can say I tried.</p><p>Additionally, WordPress cannot say they didn’t know the risk existed, especially in a courtroom, since me informing them of it is so thoroughly documented (and archived).</p> <a href="https://cmykat.carrd.co/" rel="nofollow noopener noreferrer" target="_blank">CMYKat</a> <p>Here’s to hoping the risk never actually manifests. Saying “I told you so” is more bitter than sweet in security. 
Happy Thanksgiving.</p> <p>Header image: Art by <a href="https://bsky.app/profile/mrjimmydafloof.bsky.social" rel="nofollow noopener noreferrer" target="_blank">Jim</a> and <a href="https://cmykat.carrd.co/" rel="nofollow noopener noreferrer" target="_blank">CMYKat</a>; a collage of some DEFCON photos, as well as Creative Commons photos of <a href="https://commons.wikimedia.org/wiki/File:Bruce_Schneier_at_CoPS2013-IMG_9174.jpg" rel="nofollow noopener noreferrer" target="_blank">Bruce Schneier</a> (inventor of the Blowfish block cipher) and <a href="https://commons.wikimedia.org/wiki/File:Niels_provos.jpg" rel="nofollow noopener noreferrer" target="_blank">Niels Provos</a> (co-designer of bcrypt, which is based on Blowfish).</p><p></p><p><a rel="nofollow noopener noreferrer" class="hashtag u-tag u-category" href="https://soatok.blog/tag/bcrypt/" target="_blank">#bcrypt</a> <a rel="nofollow noopener noreferrer" class="hashtag u-tag u-category" href="https://soatok.blog/tag/cryptography/" target="_blank">#cryptography</a> <a rel="nofollow noopener noreferrer" class="hashtag u-tag u-category" href="https://soatok.blog/tag/password-hashing/" target="_blank">#passwordHashing</a> <a rel="nofollow noopener noreferrer" class="hashtag u-tag u-category" href="https://soatok.blog/tag/passwords/" target="_blank">#passwords</a> <a rel="nofollow noopener noreferrer" class="hashtag u-tag u-category" href="https://soatok.blog/tag/security-guidance/" target="_blank">#SecurityGuidance</a></p>
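<p>For readers wondering what the prefix swap from the objections list (mark the hash after password_hash(), restore the original prefix before password_verify()) might look like in practice, here is a minimal sketch. It is not what WordPress ships or what anyone proposed verbatim: the function names <code>prehash</code>, <code>hashMarked</code>, and <code>verifyMarked</code> are hypothetical, <code>$2w$</code> is only the example marker mentioned in that discussion, and falling back to vanilla verification for unmarked hashes is an assumption on my part.</p>

```php
<?php
declare(strict_types=1);

// Hypothetical marker for pre-hashed bcrypt. "$2w$" is just the example
// prefix from the discussion, not a registered hash identifier.
const PREHASHED_PREFIX = '$2w$';
// PHP's password_hash() with PASSWORD_BCRYPT emits hashes starting with "$2y$".
const NATIVE_PREFIX = '$2y$';

function prehash(string $password, string $staticKey): string
{
    // HMAC-SHA-512, then base64, so bcrypt never sees a NUL byte.
    return base64_encode(hash_hmac('sha512', $password, $staticKey, true));
}

function hashMarked(string $password, string $staticKey, int $cost = 12): string
{
    $hash = password_hash(
        prehash($password, $staticKey),
        PASSWORD_BCRYPT,
        ['cost' => $cost]
    );
    // Swap the native prefix for the marker so the hash is self-describing.
    return PREHASHED_PREFIX . substr($hash, strlen(NATIVE_PREFIX));
}

function verifyMarked(string $password, string $hash, string $staticKey): bool
{
    if (strncmp($hash, PREHASHED_PREFIX, strlen(PREHASHED_PREFIX)) !== 0) {
        // Assumption: anything without the marker is a vanilla hash.
        return password_verify($password, $hash);
    }
    // Restore the prefix that PHP's password API understands.
    $native = NATIVE_PREFIX . substr($hash, strlen(PREHASHED_PREFIX));
    return password_verify(prehash($password, $staticKey), $native);
}
```

<p>The marker only tells the verifier which code path to take; it adds no security by itself. If you deploy something like this alongside existing vanilla bcrypt hashes, the caution from the Wrapping Up section still applies.</p>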