What is the standard for password hash string encoding?

Question

I'm asking about the format used to encode a password hash once it has been computed and is being prepared for storage. The dollar-sign ($) notation seems to be widespread. Is it described in a standard somewhere (including the identifiers for algorithms)?

For example, when using Go with golang.org/x/crypto/bcrypt, it gives such an encoded string (playground):

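package main

import (
	"fmt"

	"golang.org/x/crypto/bcrypt"
)

func main() {
	// Hash the password with the package's default cost factor.
	h, err := bcrypt.GenerateFromPassword([]byte("foo"), bcrypt.DefaultCost)
	if err != nil {
		panic(err)
	}

	fmt.Printf("%s", h)
	// Example output (the salt is random, so it differs on every run):
	// $2a$10$g1d5KuvDIrRoUyWL2BQs7uLOWCzlM.zqbRm8o364u20p20YNmJ.Ve
}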

However, other hashing packages like scrypt (example) and argon2 return just the resulting hash. Using the argon2 shell command, there is an encoded string returned:

echo "foo" | argon2 saltsalt
Type:           Argon2i
Iterations:     3
Memory:         4096 KiB
Parallelism:    1
Hash:           d9e4f94546b9e5b0cfb2dbf9dad81d41371845d8b6a8c25ce7caf23e13f1ef72
Encoded:        $argon2i$v=19$m=4096,t=3,p=1$c2FsdHNhbHQ$2eT5RUa55bDPstv52tgdQTcYRdi2qMJc58ryPhPx73I
0.005 seconds
Verification ok

I found a Go/argon2-specific blog post explaining this encoding. So far, so good.

Variations I found

My trouble lies with the definition of the dollar-separated string, its portability, and the variations I found.

glibc

The man 3 crypt page gives some pointers. There is a table of identifiers:

              ID   Method
              ───────────────────────────────────────────────────────────
              1    MD5
              2a   Blowfish (not in mainline glibc; added in some Linux
                   distributions)
              5    SHA-256 (since glibc 2.7)
              6    SHA-512 (since glibc 2.7)

But this doesn't cover newer types, like argon2i or scrypt.

Then there are the example strings:

$id$salt$encrypted
$id$rounds=yyy$salt$encrypted

The latter is supported only since glibc 2.7.

bcrypt

While bcrypt uses the 2a (blowfish) identifier from Glibc, its encoding seems to be slightly different as seen from the above example:

$2a$10$g1d5KuvDIrRoUyWL2BQs7uLOWCzlM.zqbRm8o364u20p20YNmJ.Ve
$id$cost$<dot-separated line of what exactly?>
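For reference, the part after the cost is not actually dot-separated: in bcrypt strings it is a 22-character salt followed directly by a 31-character digest, both written in bcrypt's own base64 alphabet, which happens to include `.` and `/`. The following is a minimal sketch of splitting such a string under that assumption; `bcryptParts` and `splitBcrypt` are hypothetical names for illustration and are not part of golang.org/x/crypto/bcrypt.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// bcryptParts is a hypothetical container for the fields of a bcrypt string.
type bcryptParts struct {
	ID     string // identifier, e.g. "2a"
	Cost   int    // log2 of the number of rounds
	Salt   string // 22 characters in bcrypt's base64 alphabet
	Digest string // 31 characters in bcrypt's base64 alphabet
}

// splitBcrypt splits "$id$cost$<salt><digest>" into its fields.
func splitBcrypt(s string) (bcryptParts, error) {
	fields := strings.Split(s, "$")
	// The leading "$" yields an empty first element: ["", id, cost, salt+digest].
	if len(fields) != 4 || fields[0] != "" {
		return bcryptParts{}, errors.New("not a bcrypt string")
	}
	cost, err := strconv.Atoi(fields[2])
	if err != nil {
		return bcryptParts{}, fmt.Errorf("invalid cost: %w", err)
	}
	if len(fields[3]) != 53 {
		return bcryptParts{}, errors.New("unexpected salt+digest length")
	}
	return bcryptParts{
		ID:     fields[1],
		Cost:   cost,
		Salt:   fields[3][:22],
		Digest: fields[3][22:],
	}, nil
}

func main() {
	p, err := splitBcrypt("$2a$10$g1d5KuvDIrRoUyWL2BQs7uLOWCzlM.zqbRm8o364u20p20YNmJ.Ve")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", p)
}
```

Note that there is no `$` between salt and digest, so a generic "split on `$`" parser has to special-case bcrypt.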

argon2

Argon2 uses five fields and a full-name identifier such as argon2i:

$argon2i$v=19$m=4096,t=3,p=1$c2FsdHNhbHQ$2eT5RUa55bDPstv52tgdQTcYRdi2qMJc58ryPhPx73I
$id$version$parameters$salt$encrypted
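The five fields above could be decoded roughly as follows. This is only a sketch: `argon2Hash` and `parseArgon2` are hypothetical names, the version field is assumed to always be present, and the salt and hash are assumed to be standard base64 without `=` padding, as in the example string.

```go
package main

import (
	"encoding/base64"
	"errors"
	"fmt"
	"strings"
)

// argon2Hash is a hypothetical container for the decoded fields.
type argon2Hash struct {
	Variant     string // e.g. "argon2i"
	Version     int
	Memory      uint32 // in KiB
	Time        uint32 // iterations
	Parallelism uint8
	Salt, Hash  []byte
}

// parseArgon2 decodes "$id$v=..$m=..,t=..,p=..$salt$hash".
func parseArgon2(encoded string) (argon2Hash, error) {
	var h argon2Hash
	f := strings.Split(encoded, "$")
	// The leading "$" yields an empty first element, so 6 elements in total.
	if len(f) != 6 || f[0] != "" {
		return h, errors.New("unexpected number of fields")
	}
	h.Variant = f[1]
	if _, err := fmt.Sscanf(f[2], "v=%d", &h.Version); err != nil {
		return h, fmt.Errorf("version: %w", err)
	}
	if _, err := fmt.Sscanf(f[3], "m=%d,t=%d,p=%d", &h.Memory, &h.Time, &h.Parallelism); err != nil {
		return h, fmt.Errorf("parameters: %w", err)
	}
	var err error
	if h.Salt, err = base64.RawStdEncoding.DecodeString(f[4]); err != nil {
		return h, fmt.Errorf("salt: %w", err)
	}
	if h.Hash, err = base64.RawStdEncoding.DecodeString(f[5]); err != nil {
		return h, fmt.Errorf("hash: %w", err)
	}
	return h, nil
}

func main() {
	h, err := parseArgon2("$argon2i$v=19$m=4096,t=3,p=1$c2FsdHNhbHQ$2eT5RUa55bDPstv52tgdQTcYRdi2qMJc58ryPhPx73I")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s v=%d m=%d t=%d p=%d salt=%q hash=%x\n",
		h.Variant, h.Version, h.Memory, h.Time, h.Parallelism, h.Salt, h.Hash)
}
```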

Why?

I want to write a package that hashes and verifies passwords in an algorithm-agnostic way, allowing consumers to change parameters and algorithms without refactoring their code. During verification, the package should therefore be able to determine which algorithm was used when the password was stored. If the stored parameters or algorithm version differ from the ones currently in use, the password is re-hashed and a new encoded string is returned.

As a bonus, I would like the package to be able to re-hash "legacy" passwords that might have been stored by older (non-Go) applications, for instance with MD5. To do all this, I would like a deeper understanding of the storage format itself.
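To make the intent concrete, here is a rough sketch of the detection step, assuming every stored hash carries its identifier between the first two `$` characters. `schemeOf` and `needsRehash` are hypothetical names; comparing the stored parameters (cost, memory, and so on) against the current ones is deliberately left out.

```go
package main

import (
	"fmt"
	"strings"
)

// schemeOf returns the identifier between the first two "$" characters,
// or "" if the string does not look like a dollar-separated hash.
func schemeOf(encoded string) string {
	if !strings.HasPrefix(encoded, "$") {
		return ""
	}
	rest := encoded[1:]
	i := strings.Index(rest, "$")
	if i <= 0 {
		return ""
	}
	return rest[:i]
}

// needsRehash reports whether the stored hash uses a different scheme
// than the currently preferred one. Parameter comparison is not covered.
func needsRehash(encoded, preferred string) bool {
	return schemeOf(encoded) != preferred
}

func main() {
	stored := []string{
		"$2a$10$g1d5KuvDIrRoUyWL2BQs7uLOWCzlM.zqbRm8o364u20p20YNmJ.Ve",
		"$argon2i$v=19$m=4096,t=3,p=1$c2FsdHNhbHQ$2eT5RUa55bDPstv52tgdQTcYRdi2qMJc58ryPhPx73I",
	}
	for _, s := range stored {
		fmt.Printf("scheme=%q needsRehash=%v\n", schemeOf(s), needsRehash(s, "argon2i"))
	}
}
```

Legacy hashes without any `$` prefix (a bare hex MD5 digest, for example) yield an empty scheme here and would need a separate, application-specific rule.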

Answer 1

Score: 3

> what is the standard for password hash string encoding?

There is none.

Hey, that was an easy answer! Clicks "Post Your Answer".

Okay, while the above statement is unfortunately true, thankfully, there are some people who have already gone through the trouble of collecting a lot of information about all of the variations in use.

In particular, the authors of the Passlib library for Python (which does essentially the same thing you want to do) have written up a page about what they call the Modular Crypt Format, which they describe as "a standard that isn't". Here are some choice quotes from that page:

> However, there’s no official specification document describing this format. Nor is there a central registry of identifiers, or actual rules. The modular crypt format is more of an ad-hoc idea rather than a true standard.

[Modular Crypt Format – Overview]

> Unfortunately, there is no specification document for this format. Instead, it exists in de facto form only

> When MCF was first introduced, most schemes choose a single digit as their identifier (e.g. $1$ for md5_crypt). Because of this, some older systems only look at the first character when attempting to distinguish hashes.

> Most modular crypt format hashes follow this convention, though some (like bcrypt) omit the $ separator between the configuration and the digest.

> [T]here is no set standard about whether configuration strings should or should not include a trailing $ at the end

[Modular Crypt Format – Requirements]

Please note that the Modular Crypt Format is not a specification or a standard. It is a description of the various formats that are used in the wild. There is an attempt at a specification by the organizers of the Password Hashing Competition (PHC), called the PHC String Format. However, the PHC is not a formal standards organization with any kind of authority. It is just a loose group of cryptographers. While they recommend that every new password hashing function use the PHC String Format, they can only mandate it for password hashing functions that are submitted to the Password Hashing Competition.

And either way, the PHC String Format only applies to new password hashing functions, not to existing ones.
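As an illustration of generating output in that shape, the sketch below formats Argon2 parameters, a salt, and a raw hash into the same layout as the `Encoded:` line in the question. `encodePHC` is a hypothetical helper, and the use of unpadded standard base64 is an assumption based on that example string.

```go
package main

import (
	"encoding/base64"
	"encoding/hex"
	"fmt"
)

// encodePHC formats the given parameters, salt, and raw hash as
// "$id$v=..$m=..,t=..,p=..$salt$hash".
func encodePHC(variant string, version int, memory, time uint32, parallelism uint8, salt, hash []byte) string {
	b64 := base64.RawStdEncoding // standard alphabet, no "=" padding
	return fmt.Sprintf("$%s$v=%d$m=%d,t=%d,p=%d$%s$%s",
		variant, version, memory, time, parallelism,
		b64.EncodeToString(salt), b64.EncodeToString(hash))
}

func main() {
	// Raw hash taken from the "Hash:" line of the argon2 output in the question.
	hash, err := hex.DecodeString("d9e4f94546b9e5b0cfb2dbf9dad81d41371845d8b6a8c25ce7caf23e13f1ef72")
	if err != nil {
		panic(err)
	}
	fmt.Println(encodePHC("argon2i", 19, 4096, 3, 1, []byte("saltsalt"), hash))
}
```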

While I strongly suggest that you should use the PHC String Format for any output you generate, you will still have to deal with inputs in all sorts of different formats, including some gems like these:

> cta_pbkdf2_sha1 and dlitz_pbkdf2_sha1 both use the same identifier. While there are other internal differences, the two can be quickly distinguished by the fact that cta hashes always end in =, while dlitz hashes contain no = at all.
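The rule from that quote could be sketched as below. `pbkdf2Flavor` is a hypothetical helper, it assumes the caller has already matched the shared identifier, and the sample strings in `main` are made-up placeholders rather than real hashes.

```go
package main

import (
	"fmt"
	"strings"
)

// pbkdf2Flavor applies the rule quoted above: cta hashes always end in "=",
// dlitz hashes contain no "=" at all. Anything else is reported as unknown.
func pbkdf2Flavor(encoded string) string {
	switch {
	case strings.HasSuffix(encoded, "="):
		return "cta_pbkdf2_sha1"
	case !strings.Contains(encoded, "="):
		return "dlitz_pbkdf2_sha1"
	default:
		return "unknown"
	}
}

func main() {
	// Placeholder inputs for illustration only; "$id$" stands in for the shared identifier.
	for _, s := range []string{"$id$params$salt$digest=", "$id$params$salt$digest"} {
		fmt.Printf("%s -> %s\n", s, pbkdf2Flavor(s))
	}
}
```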

Posted on September 4, 2022. Link to this post: https://go.coder-hub.com/73594387.html