Password verification against a set of breached passwords
NIST and OWASP ASVS recommend checking passwords against those obtained from previous data breaches. A list of such passwords can be downloaded from "Have I Been Pwned" (https://haveibeenpwned.com/Passwords), but it contains more than 300 million pwned passwords. Do you recommend comparing users' passwords (during registration, password change, or even every successful login) with such a large list? Will it affect performance too much?
Is it a better idea to compare passwords with smaller lists, such as the top 1,000 or 10,000 most common passwords (for example, https://github.com/danielmiessler/SecLists/tree/master/Passwords)? That is the exact recommendation from OWASP ASVS.
"Every successful login" is pointless waste unless you just implemented the password-checking feature and didn't have it before, in which case it might make sense to check every user's login once. The only other time it would make sense is right after you add a bunch of new passwords. Add a field to your authentication DB that indicates whether each password has been checked against the list (defaulting to false), and check each one once, toggling the field to true after it passes (no match). Those that do match the list get forced to change it (if they decline, leave their field false, or even just flag the account for "must change password"). New passwords (either of new users, or if the user changes or resets them) are checked at creation, and once they pass, the field is set true.
The pwned passwords list can be sorted in a few ways, but "alphabetic" (by string comparison) is an obvious one, and it allows extremely fast lookups via binary search. Put it all in a database table with appropriate indices and you won't even have to write the lookup logic yourself. There's no performance reason not to check all 300 million: a binary search over 300 million entries takes about 28 comparisons, versus about 13 for 10,000 entries, so it's only a bit more than twice as much work as checking 10,000 (which isn't enough, really). That's an easy task for any decent DB engine with a few gigabytes of RAM to use (give it enough and the whole column can be kept in memory, though even without that it will be decently fast). However, you might decide there's some threshold below which a password that has been compromised is still tolerable (maybe passwords that have been seen in dumps only once, or those outside the million most common, or some such thing). The most secure option is definitely to check them all, though.
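As a rough sketch of the "put it in an indexed table" approach, the following uses Python's built-in sqlite3 module. It assumes the downloaded list consists of lines of the form SHA-1 hash, colon, occurrence count; the table name `pwned`, the column names, the helper functions, and the `min_seen` threshold parameter are all made up for illustration, and the counts in the sample data are invented.

```python
import hashlib
import sqlite3


def sha1_hex(password: str) -> str:
    # Uppercase hex SHA-1, the form assumed for entries in the list.
    return hashlib.sha1(password.encode("utf-8")).hexdigest().upper()


def build_db(lines) -> sqlite3.Connection:
    """Load 'HASH:COUNT' lines into a table keyed (and thus indexed) by hash."""
    conn = sqlite3.connect(":memory:")  # use a file path for the real list
    conn.execute(
        "CREATE TABLE pwned (hash TEXT PRIMARY KEY, seen INTEGER NOT NULL) "
        "WITHOUT ROWID"
    )
    rows = []
    for line in lines:
        digest, _, count = line.strip().partition(":")
        rows.append((digest.upper(), int(count)))
    conn.executemany("INSERT INTO pwned VALUES (?, ?)", rows)
    conn.commit()
    return conn


def times_seen(conn: sqlite3.Connection, password: str) -> int:
    """Return how often the password appears in the list (0 if absent)."""
    row = conn.execute(
        "SELECT seen FROM pwned WHERE hash = ?", (sha1_hex(password),)
    ).fetchone()
    return row[0] if row else 0


def is_rejected(conn: sqlite3.Connection, password: str, min_seen: int = 1) -> bool:
    # min_seen=1 rejects anything ever breached (the most secure option);
    # a higher value tolerates passwords seen fewer times than the threshold.
    return times_seen(conn, password) >= min_seen


# Tiny sample with invented counts, standing in for the real download.
sample = [f"{sha1_hex('password')}:5", f"{sha1_hex('hunter2')}:1"]
conn = build_db(sample)
```

The primary key gives the engine a B-tree over the hashes, so each lookup is a logarithmic search rather than a scan, which is the same idea as the binary search described above.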