Questions regarding "strxfrm()" function in "C"

Question

First, I'm aware of other threads on this matter, like this one.

Unfortunately, the explanations are not very clear to me, and the results of my own tests confuse me further. Let's start from the very beginning with this function.

The function definition is:

size_t strxfrm(argument 1, argument 2, argument 3);

Where:

  • size_t is the integer type of the value returned by the function.

  • argument 1 is a value of type "char *" and serves as the destination.

  • argument 2 is a value of type "const char *" and serves as the source.

  • argument 3 is an integer of type size_t and determines how many elements from "argument 2" will be copied into "argument 1", overwriting the values there.

So far - so good.

But by definition, the function returns

> "The length of the transformed string, not including the terminating
> null-character."

By "transformed string" I understand "the destination", i.e. "argument 1". The problem is that when I test the return value, it displays the length of "argument 2", i.e. "the source". For example:

#include <stdio.h>
#include <string.h>

int main()
{
	char arr1[100] = "Hello, World!", arr2[] = "Baxlazazasad";

	int retValue = strxfrm(arr1, arr2, 3);
	printf("Content of arr1:\t%s\nLength of arr1:\t%lu\n\nContent of arr2:\t%s\nLength of arr2:\t%lu\n\n", arr1, strlen(arr1), arr2, strlen(arr2));
	printf("retValue =\t%i\n", retValue);
	return 0;
}

Output:

Content of arr1:        Baxlo, World!
Length of arr1: 13

Content of arr2:        Baxlazazasad
Length of arr2: 12

retValue =      12

My second question regarding the function "strxfrm()" is about its action. It is clear that the function simply copies "argument 3"-count of symbols from "argument 2" into "argument 1". Why, then, is the function considered a "function for string compare" and not for "string copying"?

Answer 1

Score: 2

> Why, then, is the function considered a "function for string compare" and
> not for "string copying"?

Who considers it that way? That is wrong. It copies and transforms. The transformation produces a string in a "form such that the result of strcmp(3) on two strings that have been transformed with strxfrm() is the same as the result of strcoll(3) on the two strings before their transformation."

> and the results from the tests I provided are confusing me further.

That is because the example is invalid. As the documentation says: "The strxfrm() function returns the number of bytes required to store the transformed string in dest excluding the terminating null byte ('\0'). If the value returned is n or more, the contents of dest are indeterminate." In the question's test n is 3 and the return value is 12, so the contents of arr1 cannot be relied on.

The correct example:

#include <stdio.h>
#include <string.h>

int main(void)
{
    char arr1[100] = "Hello, World!", arr2[] = "Baxlazazasad";

    /* n is 3, so a complete result must fit in 3 bytes, terminator included */
    size_t retValue = strxfrm(arr1, arr2, 3);
    if(retValue >= 3)
    {
        printf("The result of this operation is indeterminate\n");
    }
    else
    {
        printf("Content of arr1:\t%s\nLength of arr1:\t%zu\n\nContent of arr2:\t%s\nLength of arr2:\t%zu\n\n", arr1, strlen(arr1), arr2, strlen(arr2));
        printf("retValue =\t%zu\n", retValue);
    }
    return 0;
}

Answer 2

Score: 2

> The function definition is:
>
> size_t strxfrm(argument 1, argument 2, argument 3);

The function declaration in the C standard is:

size_t strxfrm(char * restrict s1, const char * restrict s2, size_t n);

> The problem is, when I test the return value - it displays the length of "argument 2" i.e. "the source".

The function returns the length of the proper output string, meaning the string that would be the result if there were enough room in the destination buffer.

Given some input, the ideal output string has some length l. If the argument n is l+1 or more, then strxfrm puts all l characters of the ideal output string in the destination and a terminating null character (which is why space for l+1 characters is needed). If n is l or less, strxfrm is not able to put all of the desired characters in the buffer. In this case, it still returns l so that a caller knows how much space they should allocate, so they can allocate more space and call strxfrm again.

In the latter case, the C standard does not require strxfrm to put anything particular in the destination buffer. It might have started work on the buffer and put something there. It might have left it incomplete with no null terminator. It might have put a null terminator in it. Or it might just have checked the length and not started work.

This is the meaning of C 2018 7.24.4.5, paragraph 3, which specifies the return value of strxfrm:

> The strxfrm function returns the length of the transformed string (not including the terminating null character). If the value returned is n or more, the contents of the array pointed to by s1 are indeterminate.

So the intended use of strxfrm with some source string s2 is:

  • Start with some initial buffer s1 of length n. It is allowed for s1 to be a null pointer and n to be zero, or you can use an actual buffer with a larger n.
  • Execute size_t r = strxfrm(s1, s2, n);.
  • If the return value r from strxfrm is less than n, you are done. Otherwise:
    • Allocate a new buffer with r+1 bytes and set s1 to point to it.
    • Execute strxfrm(s1, s2, r+1);, so that n again counts the terminating null character.

> It is clear that the function simply copies "argument 3"-count of symbols from "argument 2" into "argument 1".

That is not clear, and it is not true. It may be that, in the cases you tried, strxfrm simply copied characters from s2 to s1. However, there are other cases, dependent on the locale, where certain characters in s2 result in not only different characters in s1 but also a different number of characters.

Answer 3

Score: 1

> But by definition, the function returns
>
> > "The length of the transformed string, not including the terminating null-character."

Yes, that's what POSIX says, for instance. But you have omitted an important qualifier:

> If the value returned is n or more, the contents of the array pointed to by s1 are unspecified.

That is in fact your case: you specified n as 3, but the return value is 12.

> By "transformed string" I understand "the destination", i.e. "argument 1".

In light of the qualification I called out above, no, that's a misunderstanding. The "transformed string" means the transformed form of the source string, which might or might not have been recorded in the destination array.

The design here accommodates the fact that the transformation performed may in some cases produce a result longer than the source, and there is no good way to be sure exactly how long it will be without attempting the transformation itself. You are meant to pass the size of the destination buffer as the third argument, and you can judge from the return value

  1. whether that was enough, and
  2. if not, how much space you actually need.

> The problem is, when I test the return value - it displays the length of "argument 2" i.e. "the source".

That is a reasonably likely result. There's nothing wrong with that. It just shows that the transformed form is the same length as the source. But you told it that the destination buffer only has capacity for 3 characters, which is not enough, so the contents are unspecified. In particular, strxfrm is not obligated to write a string terminator anywhere in that space.

> My second question regarding the function "strxfrm()" is about its action. It is clear that the function simply copies "argument 3"-count of symbols from "argument 2" into "argument 1".

Well, no. That's what it did in your test, but it is by no means clear (or correct) that it does that in every case, even if we assume that the destination is large enough to accommodate the transformed string, including its terminator (for otherwise, the contents of the destination are unspecified).

> Why, then, is the function considered a "function for string compare" and not for "string copying"?

Because under some circumstances, it will behave differently. The details are unspecified and locale dependent, but the description is of a normalization function. For example, in a Unicode locale, it might convert the source into one of the standard Unicode normalization forms. In that particular case, normalization would leave many strings unchanged, but not all of them.

huangapple
  • Posted on July 24, 2023 at 19:03:18
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76753827.html