Royce Williams<p>The protective value of "k-anonymity"¹ for Have I Been Pwned / Pwned Passwords API lookups is significantly reduced because the responses include frequency data. And the more common the password, the stronger this effect.</p><p>An example:</p><p><a href="https://gist.github.com/roycewilliams/2034c9253d46fbcaefb13f8e5d42daa2" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">gist.github.com/roycewilliams/</span><span class="invisible">2034c9253d46fbcaefb13f8e5d42daa2</span></a></p><p>... with cracks:</p><p><a href="https://gist.github.com/roycewilliams/2bb471cc90cce7f6834204344590fcac" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">gist.github.com/roycewilliams/</span><span class="invisible">2bb471cc90cce7f6834204344590fcac</span></a></p><p>Using "k-anonymity"¹ to return all hashes that begin with <code>b2e98</code> is less "anonymous" ... when the top password in that bucket accounts for 98.6% of the occurrences (by frequency across all leaks).</p><p>It's not really hiding a needle in a haystack if you just lay it on top.</p><p>Edit: in fact, this holds even <em>without</em> the frequency data, since some passwords are much more common than others ... a heavily skewed distribution is an intrinsic property of password data. Missing frequency data can be largely reconstructed from public cracking efforts. (And even if that weren't true, the hashes can just be cracked using traditional methods. 
If the cracking community can reach a 97%+ cracking rate², what is being achieved other than plausible deniability?)</p><p>K-anonymity [as implemented by HIBP, anyway -- true k-anonymity is different¹] may just be a bad fit for password hashes.</p><p>¹ Not actually k-anonymity at all:<br><a href="https://en.wikipedia.org/wiki/K-anonymity" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">en.wikipedia.org/wiki/K-anonym</span><span class="invisible">ity</span></a></p><p>² Actually closer to 99.29% across the entire corpus, publicly:<br><a href="https://gist.github.com/roycewilliams/40f0e8c93ec9c69f5b5a1874c76f2587" rel="nofollow noopener noreferrer" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">gist.github.com/roycewilliams/</span><span class="invisible">40f0e8c93ec9c69f5b5a1874c76f2587</span></a></p><p><a href="https://infosec.exchange/tags/passwords" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>passwords</span></a> <a href="https://infosec.exchange/tags/HaveIBeenPwned" class="mention hashtag" rel="nofollow noopener noreferrer" target="_blank">#<span>HaveIBeenPwned</span></a></p>
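<p>A rough sketch of the "needle on top of the haystack" effect, assuming a range response in the <code>SUFFIX:COUNT</code> line format that the Pwned Passwords range endpoint returns. The function names, the suffixes, and the counts below are invented for illustration (this is not real output for <code>b2e98</code>); the counts are only chosen so that the top entry holds 98.6% of the bucket.</p>

```python
from typing import Dict, Tuple


def parse_range_response(body: str) -> Dict[str, int]:
    """Parse 'SUFFIX:COUNT' lines into a {suffix: count} mapping."""
    counts: Dict[str, int] = {}
    for line in body.splitlines():
        line = line.strip()
        if not line:
            continue
        suffix, _, count = line.partition(":")
        counts[suffix.upper()] = int(count)
    return counts


def top_share(counts: Dict[str, int]) -> Tuple[str, float]:
    """Return the most frequent suffix and its share of all occurrences in the bucket."""
    total = sum(counts.values())
    suffix, count = max(counts.items(), key=lambda kv: kv[1])
    return suffix, count / total


# Hypothetical bucket: four made-up 35-character suffixes with made-up counts.
SAMPLE = "\n".join([
    "0" * 34 + "1:986",
    "0" * 34 + "2:8",
    "0" * 34 + "3:4",
    "0" * 34 + "4:2",
])

if __name__ == "__main__":
    suffix, share = top_share(parse_range_response(SAMPLE))
    # An observer who sees only the 5-character prefix can still bet on this suffix.
    print(f"best guess: {suffix} ({share:.1%} of occurrences in the bucket)")
```

<p>The point of the sketch: with counts attached, an observer who sees only the prefix does not face a uniform choice among the returned hashes, so the effective anonymity set is much smaller than the number of lines in the response.</p>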