How does `text=auto` work? How does Git determine if something is a "text" file?
Question
The Git documentation says of `text=auto` (emphasis added):

> When `text` is set to "auto", **Git decides by itself whether the file is text or binary**. If it is text and the file was not already in Git with CRLF endings, line endings are converted on checkin and checkout as described above. Otherwise, no conversion is done on checkin or checkout.
But I don't see an explanation of how Git makes this decision.
How does Git decide if a file is text or binary? Is the algorithm documented somewhere? Is it substantially different across Git versions?
Answer 1
Score: 0
As explained in https://stackoverflow.com/a/6134127/2954547:
Git v2.30.0 determines that a file is "binary" if it contains a zero byte (ASCII NUL, often denoted `\0` in programming languages) in the first 8000 bytes of the file.
> `builtin_diff()`<sup>1</sup> calls `diff_filespec_is_binary()`, which calls `buffer_is_binary()`, which checks for any occurrence of a zero byte (NUL "character") in the first 8000 bytes (or the entire length if shorter).
>
> <sup>1</sup> `builtin_diff()` has strings like `Binary files %s and %s differ` that should be familiar.
This is a fairly crude check, but it's also parsimonious and kind of clever. It would certainly be unusual to have a literal zero byte in most text files.
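To make the rule concrete, here is a minimal Python sketch of the check as described above. It is not Git's code (Git is written in C), and the names `looks_binary` and `FIRST_FEW_BYTES` are chosen here for illustration only. It also models only the diff-side check the answer traces; the end-of-line conversion code behind `text=auto` may apply its own, somewhat different heuristics.

```python
# Illustrative re-creation (not Git's source) of the check described above:
# a buffer counts as "binary" if a NUL byte occurs in its first 8000 bytes.
FIRST_FEW_BYTES = 8000  # hypothetical name for the 8000-byte window


def looks_binary(data: bytes, limit: int = FIRST_FEW_BYTES) -> bool:
    """Return True if a zero byte appears within the first `limit` bytes."""
    # Slicing covers short inputs too: if len(data) < limit, all of it is scanned.
    return b"\0" in data[:limit]


if __name__ == "__main__":
    print(looks_binary(b"hello, world\n"))         # plain ASCII text
    print(looks_binary(b"PK\x03\x04\x00\x00"))     # starts like a ZIP file
    print(looks_binary("hi".encode("utf-16-le")))  # b"h\x00i\x00"
    print(looks_binary(b"a" * 8000 + b"\0"))       # NUL lies past the window
```

The UTF-16 case shows the main weakness of such a check: UTF-16 encodes ASCII characters with a zero byte, so a UTF-16 text file is classified as binary, while a NUL that appears only after byte 8000 goes unnoticed.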