How does text=auto work? How does Git determine if something is a “text” file?

Question

The Git documentation says of text=auto (emphasis added):

> When text is set to "auto", Git decides by itself whether the file is text or binary. If it is text and the file was not already in Git with CRLF endings, line endings are converted on checkin and checkout as described above. Otherwise, no conversion is done on checkin or checkout.

But I don't see an explanation of how Git makes this decision.

How does Git decide if a file is text or binary? Is the algorithm documented somewhere? Is it substantially different across Git versions?

Answer 1

Score: 0

As explained in https://stackoverflow.com/a/6134127/2954547:

Git v2.30.0 determines that a file is "binary" if it contains a zero byte (ASCII NUL, often denoted \0 in programming languages) in the first 8000 bytes of the file.

> builtin_diff()<sup>1</sup> calls diff_filespec_is_binary() which calls buffer_is_binary() which checks for any occurrence of a zero byte (NUL “character”) in the first 8000 bytes (or the entire length if shorter).
>
> <sup>1</sup> builtin_diff() has strings like "Binary files %s and %s differ" that should be familiar.

This is a fairly crude check, but it's also parsimonious and kind of clever. It would certainly be unusual to have a literal zero byte in most text files.
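
To make the heuristic concrete, here is a minimal Python sketch of the check as described above. It is an illustration only, not Git's code (Git is written in C); the names `looks_binary` and `SNIFF_LIMIT` are hypothetical and chosen for this example.

```python
# Number of leading bytes inspected, per the description above.
# SNIFF_LIMIT is a hypothetical name used only in this sketch.
SNIFF_LIMIT = 8000


def looks_binary(data: bytes, limit: int = SNIFF_LIMIT) -> bool:
    """Return True if a NUL byte occurs within the first `limit` bytes.

    If `data` is shorter than `limit`, the whole buffer is examined.
    """
    return b"\0" in data[:limit]


if __name__ == "__main__":
    print(looks_binary(b"plain text\n"))            # no NUL byte present
    print(looks_binary(b"\x89PNG\r\n\x1a\n\0\0"))   # NUL bytes near the start
    print(looks_binary("hello".encode("utf-16")))   # UTF-16 text contains NUL bytes
```

One consequence of a check like this is that text stored as UTF-16 is classified as binary, because ASCII-range characters in that encoding include zero bytes. A NUL byte that first appears after the inspected prefix goes unnoticed.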

Posted by huangapple on June 12, 2023, 22:48:53
Source: https://go.coder-hub.com/76457826.html