Thank you for your detailed response — I truly appreciate critical engagement like this as it helps elevate the discussion and refine the research.
You’re absolutely right that BIP39 itself is an encoding standard, and it’s not inherently “weak” from a cryptographic perspective. I agree that the standard is well-designed and widely reviewed.
However, my investigation is not targeting BIP39 as a protocol, but rather the real-world implementations of wallets and the entropy sources used during seed generation. The weakness I referred to lies in how some wallets or tools may generate seed phrases with statistical bias or flawed randomness — not the BIP39 spec itself.
Regarding the 24,000 wallet hits — perhaps I should have clarified more precisely:
- These are not confirmed “used” wallets with transaction history, but derivable addresses that can be looked up on public explorers. You’re correct that a derived address doesn’t necessarily mean the wallet was “used”, and I did not imply successful access to any such wallets. My apologies if that wasn’t clear.
- My experiment involved analyzing entropy bias in seed generation, not brute-forcing access to existing wallets. I used frequency analysis and clustered common patterns generated by certain tools. This revealed repeated phrases and derivable patterns, some of which coincided with real addresses that appear to have been used — which raised red flags.
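
To make the idea of frequency analysis concrete, here is a minimal sketch in Python. It assumes the generated phrases have already been collected as strings and mapped to BIP39 word indices (0 to 2047); the function names are hypothetical and this is not necessarily the exact procedure used in the experiment.

```python
from collections import Counter

def duplicate_phrases(phrases):
    """Return every phrase that occurs more than once, with its count."""
    counts = Counter(phrases)
    return {p: c for p, c in counts.items() if c > 1}

def chi_square_uniform(indices, n_bins=2048):
    """Pearson chi-square statistic of word indices against a uniform distribution.

    Assumes each index is in range(n_bins). A value far above n_bins - 1
    suggests the generator favors some words over others.
    """
    counts = Counter(indices)
    expected = len(indices) / n_bins
    return sum((counts.get(i, 0) - expected) ** 2 / expected
               for i in range(n_bins))
```

With a sound generator, `duplicate_phrases` should come back empty for any realistic sample size, and the chi-square statistic should stay close to the number of bins minus one.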
Of course, collisions at that scale are nearly impossible under ideal randomness, which is precisely my point: if such overlaps appear even in a small test sample, it may imply that some seed generation tools are not following best practices, such as drawing seed entropy from a CSPRNG backed by a secure entropy source.
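
The scale of that improbability can be estimated with a birthday-bound calculation. The sketch below is illustrative only: the sample size and the 32-bit "weak generator" figure are assumptions chosen for the example, not measurements, and the function names are hypothetical.

```python
import secrets
from fractions import Fraction

def expected_collision_pairs(n_samples, entropy_bits):
    """Birthday-bound estimate of colliding pairs among n uniform samples."""
    pairs = n_samples * (n_samples - 1) // 2
    return Fraction(pairs, 2 ** entropy_bits)

def new_entropy(n_bytes=16):
    """Draw seed entropy from the OS CSPRNG (16 bytes = 128 bits, a 12-word phrase)."""
    return secrets.token_bytes(n_bytes)

# One million phrases from a sound 128-bit source: effectively zero collisions.
sound = float(expected_collision_pairs(10**6, 128))

# Same sample size, but assuming a hypothetical generator whose effective
# entropy is only 32 bits (for example, seeded from a timestamp).
weak = float(expected_collision_pairs(10**6, 32))
```

Under these assumptions `sound` is on the order of 1e-27, while `weak` is above 100, so repeated phrases in a modest sample point toward the generator rather than toward chance.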
Your point about the improbability of randomly discovering active wallets is very important — and I agree. My approach is not about breaking the math, but about auditing the implementation quality of tools that generate wallets.
Thanks again for your thoughtful feedback.