August 5, 2023, 01:22:18

Can Levenshtein distance prevent password abuse?

Question

I'm learning about passwords by writing an app around Flask's auth utilities. I've crammed all the approaches I can think of into one function, cleanpassword(). Later I will decide which ones are more trouble than they are worth.

This is the fragment in question. The rest of the function is below for context.
I downloaded a list of the worst passwords known to man (in English -- I bet the lists for other languages are just as funny). I archive each password as it is used. There is a timestamp in case I ever need to allow reuse after a while. The archived passwords are stored hashed, because seeing a user's last five passwords would make it pretty easy to guess what his next five will be or what he uses on another site. The historically bad passwords are in plain text.

Storing them in plain text allows the use of Levenshtein distance to prevent trivial modifications of bad passwords ('football', 'f00tball' and 'futball!' are all treated the same).

#    # are they on the list of 10k really bad passwords?
#    if BadPasswords.query.filter(BadPasswords.baddie == password).first():
#        flash('Password ' + password + ' is on the list of really bad passwords.', 'error')
#        return False
    # are they close to the list of 10k really bad passwords?
    badwords = BadPasswords.query.all()
    for b in badwords:
        if lev_distance(b.baddie, password) < levenshtein_limit:
            flash('Password ' + password + ' is on or near the list of really bad passwords.', 'error')
            return False
    # are they on the list of old passwords?
    # bcrypt hashes are salted, so check the candidate against each stored hash
    for old in OldPasswords.query.all():
        if bcrypt.check_password_hash(old.oldie, password):
            flash('Password ' + password + ' is on the list of previous passwords.', 'error')
            return False
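
To make the near-miss check concrete, here is a minimal, self-contained sketch of the edit-distance comparison. It does not use the Levenshtein package from the question; `levenshtein()` and `near_bad_password()` are hypothetical helper names written only for this illustration, and the limit of 3 is an assumed value, not the asker's actual LEVENSHTEIN_LIMIT.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep a match)
            ))
        previous = current
    return previous[-1]


def near_bad_password(candidate: str, bad_words: list[str], limit: int) -> bool:
    """True if the candidate is fewer than `limit` edits from any bad word."""
    return any(levenshtein(bad, candidate) < limit for bad in bad_words)


for variant in ("football", "f00tball", "futball!"):
    print(variant, levenshtein("football", variant))

print(near_bad_password("f00tball", ["football"], limit=3))
```

Because the comparison is a strict "less than", the limit has to be larger than the biggest distance you want to catch: 'f00tball' is 2 edits from 'football' and 'futball!' is 3, so a limit of 3 would catch the first but let the second through.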

So the question is: can I use a similar edit-distance test to prevent reuse of originally acceptable passwords? I think the answer is no, but it is hard to prove a negative -- particularly when authorization technology gets very deep very close to shore. My understanding is elementary on a good day.

# check for safe password
from flask import flash
from password_strength import PasswordPolicy
from Levenshtein import distance as lev_distance
from app import bcrypt, db
from models import BadPasswords, OldPasswords
from extensions import environ
# note: if this value is a string, any non-empty value (even 'False') is truthy
forgive = environ.get('FORGIVE_BAD_PASSWORDS')
#forgive = True
# cast to int so the distance comparison also works if the value is a string
levenshtein_limit = int(environ['LEVENSHTEIN_LIMIT'])
# class BadPasswords(db.Model):
#     __tablename__ = 'BadPasswords'
#     id = db.Column(db.Integer, primary_key=True)
#     baddie = db.Column(db.String(25))
# class OldPasswords(db.Model):
#     __tablename__ = 'OldPasswords'
#     id = db.Column(db.Integer, primary_key=True)
#     oldie = db.Column(db.String(255))
#     created = db.Column(db.DateTime)
# the casts assume the environment may supply these settings as strings
policy = PasswordPolicy.from_names(
    length     = int(environ['PASSWORD_LENGTH']),
    uppercase  = int(environ['PASSWORD_REQUIRE_UPPERCASE']),
    nonletters = int(environ['PASSWORD_REQUIRE_NON_LETTERS']),
    strength   = (float(environ['PASSWORD_REQUIRE_STRENGTH']), int(environ['PASSWORD_REQUIRE_ENTROPY_BITS']))
)
def cleanpassword(password, verify):
    # forgive bad passwords for debugging
    if forgive:
        return True
    # do they match?
    if password != verify:
        flash("Passwords don't match")
        return False
#    are they on the list of 10k really bad passwords?
#    if BadPasswords.query.filter(BadPasswords.baddie == password).first():
#        flash('Password ' + password + ' is on the list of really bad passwords.', 'error')
#        return False
    # are they close to the list of 10k really bad passwords?
    badwords = BadPasswords.query.all()
    for b in badwords:
        if lev_distance(b.baddie, password) < levenshtein_limit:
            flash('Password ' + password + ' is on or near the list of really bad passwords.', 'error')
            return False
    # are they on the list of old passwords?
    # bcrypt hashes are salted, so check the candidate against each stored hash
    for old in OldPasswords.query.all():
        if bcrypt.check_password_hash(old.oldie, password):
            flash('Password ' + password + ' is on the list of previous passwords.', 'error')
            return False
    # use the quality tests
    test = policy.test(password)
    if test:
        for t in test:
            fault = t.name()
            if fault == 'length':
                flash('Password needs at least ' + str(t.length) + ' characters.', 'error')
            if fault == 'uppercase':
                flash('Password needs at least ' + str(t.count) + ' upper-case letters.', 'error')
            if fault == 'nonletters':
                flash('Password needs at least ' + str(t.count) + ' non-letter characters.', 'error')
            if fault == 'strength':
                flash('Password needs more entropy.')
        return False
    # seems to have passed all the tests
    return True

Answer 1

Score: 1

Can you? Yes. Should you? Absolutely not. Since you say you only know the basics, I'll explain.

Hashing basics

As you said, stored passwords should always be hashed for safety. I hope you're salting them as well as hashing - or maybe Flask does it for you, I don't know.

For the hash function to be any good, it has to create essentially random output. If you hash two values that differ only by a single character - like foobar and foobaz - the hash output should be completely different. If all you have is the hashes, there's no way to tell whether the inputs are related (like foobar and foobaz) or completely different (like foobar and elephant). Otherwise your hash function is not secure and your passwords can be compromised.

So the properties of a good hash function make hash values completely useless for Levenshtein distance or any other measure of similarity. All relations between the inputs are destroyed by the hash.
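
A quick way to see this for yourself is the sketch below. It uses plain SHA-256 from Python's hashlib purely for demonstration (an assumption for the example: it is unsalted and fast, so it is not how passwords should be stored), and `sha256_hex()` and `differing_positions()` are hypothetical helper names.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Hex digest of the UTF-8 encoded text (64 hex digits for SHA-256)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def differing_positions(x: str, y: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    return sum(1 for cx, cy in zip(x, y) if cx != cy)


close_pair = differing_positions(sha256_hex("foobar"), sha256_hex("foobaz"))
far_pair = differing_positions(sha256_hex("foobar"), sha256_hex("elephant"))

print("foobar vs foobaz:  ", close_pair, "of 64 hex digits differ")
print("foobar vs elephant:", far_pair, "of 64 hex digits differ")
```

Both pairs should differ in nearly every position, so the digests tell you nothing about whether the inputs were one character apart or unrelated.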

How it affects your approach

This means you can only run Levenshtein distance on unhashed (plaintext) passwords. That solution is wrong for so many reasons. Passwords should never be stored anywhere in plaintext (your bad passwords list is OK because you reject those as passwords). Passwords should never be sent in plaintext. Passwords should never be sent to the server at all under any conditions. The password should only leave the client in an already hashed state.

If you follow this important rule then you can only run password checks on the client side. The client enters a possible password and local code checks its validity. If it's OK, then the password is hashed before being sent to the server. The server should never see the plaintext, or it can be compromised.

What you're attempting is impossible in this situation. Your auth server stores previous passwords, hashed. There's no way to compute a meaningful distance metric between the hashed passwords and a new password, either before or after hashing it. It doesn't matter if you send hashed previous passwords to the client, or send the plaintext new password to the server (VERY BAD). Either way, you can't compute meaningful comparisons with hashed values.
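
As a rough illustration of what a hashed archive can and cannot tell you, the sketch below stores salted hashes and checks a candidate against them. It swaps in PBKDF2 from the standard library instead of the bcrypt object used in the question, only so the example is self-contained. The names `hash_password()` and `is_reused()`, the iteration count, and the sample passwords are all assumptions made for the example.

```python
import hashlib
import hmac
import os
from typing import List, Optional, Tuple

ITERATIONS = 50_000  # illustrative work factor, not a recommendation


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest) using PBKDF2-HMAC-SHA256; a fresh salt is made if none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def is_reused(candidate: str, archive: List[Tuple[bytes, bytes]]) -> bool:
    """True only if the candidate is exactly one of the archived passwords."""
    for salt, stored in archive:
        _, digest = hash_password(candidate, salt)
        if hmac.compare_digest(digest, stored):
            return True
    return False


archive = [hash_password("Tr0ub4dor&3"), hash_password("correct horse")]

print(is_reused("correct horse", archive))   # exact reuse is detected
print(is_reused("correct horse!", archive))  # a one-character variant is not
```

The only question the archive answers is "is this exactly an old password?". A variant that is one edit away produces an unrelated digest, so there is nothing for a distance function to measure.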

How it should be done

Hopefully your password checks are done client side - it's not clear where your sample code runs. If server side, then stop immediately and rethink your design. Passwords should never be sent to the server.

Even sending hashed passwords is insecure. If my password is "foobar" which hashes to "vsfsevssdfsdga", and I send "vsfsevssdfsdga" to authenticate to the server every time, then I've accomplished nothing. An attacker can snoop the magic token "vsfsevssdfsdga" and use it to gain access, defeating the purpose of hashing.

What you should really be doing is a challenge-response system where the client and server both know a shared secret (such as a hashed password) that's never exchanged directly. Instead, the server sends a challenge like "send me the hash of your password, hashed with random string XYZ". It works like this:

Example

Take the example above: password foobar with hash "vsfsevssdfsdga". The server stores "vsfsevssdfsdga" and computes hash("vsfsevssdfsdga" + XYZ) = klojhsvndfb. The user enters foobar and the client computes hash(hash(foobar) + XYZ) = klojhsvndfb. The client sends klojhsvndfb. The only data exchanged on each login is XYZ from server to client, which is a random string each time, and klojhsvndfb from client to server. If an attacker learns the hash value klojhsvndfb, they can't use it to log in in the future. It's useless because the server will send a new challenge string instead of XYZ on every login attempt.
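
The exchange described above could be sketched roughly as follows. This is a toy model to show the flow, not something to deploy: SHA-256 stands in for "hash", and the `Server` class, `client_response()`, and the single-use challenge handling are assumptions made up for the illustration.

```python
import hashlib
import hmac
import secrets
from typing import Optional


def h(data: str) -> str:
    """Stand-in for the 'hash' in the example: SHA-256 hex digest."""
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


class Server:
    def __init__(self, stored_hash: str) -> None:
        self.stored_hash = stored_hash          # hash(password), never the password
        self.challenge: Optional[str] = None

    def new_challenge(self) -> str:
        """Issue a fresh random string (the 'XYZ' of the example)."""
        self.challenge = secrets.token_hex(16)
        return self.challenge

    def verify(self, response: str) -> bool:
        if self.challenge is None:
            return False
        expected = h(self.stored_hash + self.challenge)
        self.challenge = None                   # each challenge is single-use
        return hmac.compare_digest(expected, response)


def client_response(password: str, challenge: str) -> str:
    """Client side: hash(hash(password) + challenge)."""
    return h(h(password) + challenge)


server = Server(stored_hash=h("foobar"))

challenge = server.new_challenge()
response = client_response("foobar", challenge)
print(server.verify(response))   # legitimate login succeeds

server.new_challenge()           # the next login attempt gets a new challenge
print(server.verify(response))   # replaying the captured response fails
```

The captured response is only valid for the challenge it was computed against, which is why snooping it gains the attacker nothing on later attempts.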

In this example, client and server still need to exchange "vsfsevssdfsdga" at the beginning for subsequent logins to work. But there are protocols for client and server to establish a shared secret without directly exchanging any compromising info (i.e. someone can snoop on the entire conversation and still not be able to compute the shared secret). If a password must be involved, probably the best approach is to create a secure shared secret between client and server, then store it client side encrypted with the user's password. On login, client enters password to decrypt shared secret on local machine, then does challenge-response with server.
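
The answer does not name a specific protocol. Diffie-Hellman key exchange is one well-known example of the kind described, and the sketch below shows the idea with toy parameters. The prime, the base, and the helper names `keypair()` and `shared_secret()` are assumptions for the illustration. The exchange is unauthenticated and the numbers are far too small for real use.

```python
import hashlib
import secrets

# Toy public parameters, known to everyone including an eavesdropper.
P = 2**127 - 1   # a Mersenne prime, much too small for real security
G = 3            # public base


def keypair():
    """Return (private, public); only the public value is ever transmitted."""
    private = secrets.randbelow(P - 3) + 2   # random integer in 2..P-2
    public = pow(G, private, P)
    return private, public


def shared_secret(own_private: int, peer_public: int) -> str:
    """Combine our private value with the peer's public value."""
    value = pow(peer_public, own_private, P)
    return hashlib.sha256(str(value).encode("utf-8")).hexdigest()


client_private, client_public = keypair()
server_private, server_public = keypair()

# Only client_public and server_public cross the wire.
client_view = shared_secret(client_private, server_public)
server_view = shared_secret(server_private, client_public)

print(client_view == server_view)
```

Both sides arrive at the same value even though neither private number was sent. A real deployment would rely on a vetted library and an authenticated protocol rather than anything like this.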

Conclusion

That's the basic idea. However, please DO NOT try to implement this or any other type of authentication scheme yourself. There are many, many subtle ways to screw up and compromise your security. It's far better to use an established security library or framework to handle authentication for you. Security is hard. Trust me, it's very easy to get wrong.

Hope this helps.
