The validity of casting in the ‘function pointer’ version of K&R’s qsort

huangapple go评论58阅读模式
英文:

The validity of casting in the 'function pointer' version of K&R's qsort

问题

I understand that you want a translation of the code-related content without any additional commentary. Here's the translation:

这个问题涉及到 K&R (第2版) 第5.11节 (第118-121页) 的“函数指针”版本的 `qsort`。有很多地方我不明白为什么或者为什么强制类型转换是如何工作的,我怀疑这个程序并不完全符合标准。

这是源代码。除了删除注释并对格式进行微小的更改外,我做了以下操作:
- 将 `qsort` 重命名为 `qsort_alt`(用于“替代 qsort”),因为标准库已经包含一个名为 `qsort` 的函数,
- 将 `main` 的函数签名更改为 `int main(void)`,并且
- 简化和修改了 `main`,使用通过 `linearr` 和 `wlinearr` 变量提供的输入,而不是 `stdin`。

(以下是源代码的部分,如有需要,请告诉我要翻译的哪个部分。)

// 代码部分...

Please let me know if you would like me to translate any specific parts of the code further or if you have any other requests related to this code.

英文:

This question is about the 'function pointer' version of qsort from K&R (2e), section 5.11 (p118-121). There are a number of places where I don't understand why or how the casts work, and I suspect that the program is not entirely standards-conformant.

Here is the source code. Aside from removing comments and making minor changes to the formatting, I:

  • renamed qsort to qsort_alt (for "alternative qsort"), because the standard library already contains a function named qsort,
  • changed the function signature of main to int main(void), and
  • simplified and altered main to use input supplied via the linearr and wlinearr variables instead of stdin.
#include <stdio.h>
#include <string.h>
#include <wchar.h>

char *linearr[] = {
    "13", "5", "10", "9", "23", "2", "5",
    "25", "3", "13", "1", "4", "14", "19",
};
wchar_t *wlinearr[] = {
    L"13", L"5", L"10", L"9", L"23", L"2", L"5",
    L"25", L"3", L"13", L"1", L"4", L"14", L"19",
    L"42", L"32", L"25", L"5", L"3", L"34",
    L"50", L"40", L"86", L"91", L"100", L"21",
    L"29", L"28", L"23", L"93", L"24", L"89",
    L"85", L"63", L"10", L"39", L"14", L"18",
    L"78", L"73", L"52", L"57", L"26", L"38",
    L"91", L"76", L"61", L"46",
};

void qsort_alt(void *v[], int left, int right,
               int (*comp)(void *, void *));
int numcmp(char *, char *);

int main(void) {
    int nlines = sizeof(linearr) / sizeof(linearr[0]);
    int nwlines = sizeof(wlinearr) / sizeof(wlinearr[0]);
    int i;

    qsort_alt((void **)linearr, 0, nlines - 1,
              (int (*)(void *, void *))numcmp);
    for (i = 0; i < nlines; i++)
        printf("%s%s", linearr[i], (i < nlines - 1) ? " " : "\n");
    /* output: 1 2 3 4 5 5 9 10 13 13 14 19 23 25 */

    qsort_alt((void **)linearr, 0, nlines - 1,
              (int (*)(void *, void *))strcmp);
    for (i = 0; i < nlines; i++)
        printf("%s%s", linearr[i], (i < nlines - 1) ? " " : "\n");
    /* output: 1 10 13 13 14 19 2 23 25 3 4 5 5 9 */

    qsort_alt((void **)wlinearr, 0, nwlines - 1,
              (int (*)(void *, void *))wcscmp);
    for (i = 0; i < nwlines; i++)
        wprintf(L"%ls%ls", wlinearr[i], (i < nwlines - 1) ? L" " : L"\n");
    /* output: 1 10 10 100 13 13 14 14 18 19 2 21 23 23 24 25 25 26 28 29 3 3 32 34 38 39 4 40 42 46 5 5 5 50 52 57 61 63 73 76 78 85 86 89 9 91 91 93 */

    return 0;
}

void qsort_alt(void *v[], int left, int right,
               int (*comp)(void *, void *)) {
    int i, last;
    void swap(void *v[], int, int);

    if (left >= right)
        return;
    swap(v, left, (left + right) / 2);
    last = left;
    for (i = left + 1; i <= right; i++)
        if ((*comp)(v[i], v[left]) < 0)
            swap(v, ++last, i);
    swap(v, left, last);
    qsort_alt(v, left, last - 1, comp);
    qsort_alt(v, last + 1, right, comp);
}

#include <stdlib.h>

int numcmp(char *s1, char *s2) {
    double v1, v2;

    v1 = atof(s1);
    v2 = atof(s2);
    if (v1 < v2)
        return -1;
    else if (v1 > v2)
        return 1;
    else
        return 0;
}

void swap(void *v[], int i, int j) {
    void *temp;

    temp = v[i];
    v[i] = v[j];
    v[j] = temp;
}

For me, this compiles fine (without warnings) with GCC and compiler flags -std=iso9899:199409 -pedantic -Wall -Wextra and produces the output shown within the comments above. (The choice of iso9899:199409 reflects the fact that <wchar.h> was introduced with C95, but actually this compiles and runs equally well with c90.)

My call to qsort_alt is substantially the same as in K&R (2e). For reference, in the original it was

qsort((void **) lineptr, 0, nlines-1,
  (int (*)(void*,void*))(numeric ? numcmp : strcmp));

with int nlines indicating the number of lines stored in char *lineptr[MAXLINES] and numeric being a boolean flag (set via the command line) that decides which of the two functions numcmp/strcmp is used. K&R's main reads input from stdin into char *lineptr[MAXLINES]; I replaced it by char *linearr[]. The input wchar_t *wlinearr[] is my own addition. Relevant here is what K&R write (p119-120):

> We have written qsort [in my code: qsort_alt] so it can process any data type, not just character strings. [...] The elaborate cast of the function argument casts the arguments of the comparison function. These will generally have no effect on actual representation, but assure the compiler that all is well.

Let's see whether all is good, ie: whether the program is portable and whether it would work for other types such as wchar_t, as implied by K&R:

1. Why are the arguments to qsort_alt [orig: qsort] cast explicitly?

The C17 standard draft says (6.5.2.2 ¶7):

> If the expression that denotes the called function has a type that does include a prototype [ie: it doesn't use old-style syntax], the arguments are implicitly converted, as if by assignment, to the types of the corresponding parameters, taking the type of each parameter to be the unqualified version of its declared type.

That is, it seems that explicit casts to match the types of the function parameters of qsort_alt aren't necessary, though they may be stylistically justified.

However, is the following wording (6.5.4 ¶3) relevant?

> Conversions that involve pointers, other than where permitted by the constraints of 6.5.16.1 ["Simple assignment"], shall be specified by means of an explicit cast.

2. Is the cast (void **)linearr [orig: (void **) lineptr] legal? What about my (void **)wlinearr?

6.3.2.3 ¶7:

> A pointer to an object type may be converted to a pointer to a different object type. If the resulting pointer is not correctly aligned[] for the referenced type, the behavior is undefined.

Pointer types are object types, hence char * and void * are object types, and therefore pointers to these types can legally be converted. However, the second sentence is relevant as well. While we know that char * and void * are identically aligned (6.2.5 ¶28), this wouldn't necessarily be true for wchar_t * and void *: it is easy to imagine a situation where the former has half the bit width of the latter. (Recall that K&R claim that their qsort "can process any data type".) That is, even if the cast is fine, my cast (void **)wlinearr might not be.

There is other language in the standard (6.7.6.1 ¶2) where I am not sure whether it has any bearing on these issues:

> For two pointer types to be compatible, both shall be identically qualified and both shall be pointers to compatible types.

linearr is of type char * [] and decays to type char ** just before the cast to type void **. For char ** and void ** to be compatible, char * and void * would need to be compatible, but it seems that they technically aren't compatible.

3. Are the casts of numcmp/strcmp from types int (*)(char *, char *)/int (*)(const char *, const char *) to type int (*)(void *, void *) legal? What about my analogous cast of wcscmp from type int (*)(const wchar_t *, const wchar_t *) to type int (*)(void *, void *)? (Note that the difference in constness isn't an issue.)

As the document "Errata for The C Programming Language, Second Edition" by Lucent Technologies (2002/2006) states (code formatting added in some places):
> [T]he comparison-routine argument is not treated well. The call shown on p 119, with an argument
> (int (*)(void*,void*))(numeric? numcmp : strcmp)
> is not only complicated, but only barely passes muster. Both numcmp and strcmp take char * arguments, but this expression casts pointers to these functions to a function pointer that takes void * arguments. The standard does say that void * and char * have the same representation, so the example will almost certainly work in practice, and is at least defensible under the standard.

According to this answer, the casting of function pointers is only safe (as in: doesn't lead to undefined behavior) if the respective types are "compatible", but char * and void * are technically not compatible (see #2 above).

But perhaps this is really just to appease the compiler (as K&R assure us), so let's have a look at what happens in qsort_alt:

4. Is the call to comp unproblematic?

qsort_alt computes v[i] and v[left], which involves pointer arithmetic on an underlying void * type. (Note that v is of type void **. While pointer arithmetic is illegal on void * (with an underlying void), it is not illegal on void ** (with an underlying void *).) Because pointers to void and to character types "have the same representation and alignment requirements" (6.2.5 ¶28), the index computations seem fine for K&R's original code, but do they also work portably for wchar_t? It seems that a type-generic version of the program would need to supply the size of the underlying type (such as wchar_t *) to qsort_alt.

Furthermore, I am not sure whether it is really true that "[t]he elaborate cast of the function argument [within main] casts the arguments of the comparison function". It seems to me that if any arguments of the comparison function are cast, that would happen within qsort_alt when it calls comp, but not within the call to qsort_alt within main. From the perspective of qsort_alt, comp is a function accepting void * arguments, and any v[index] is of type void *, and therefore there don't seem to be any implicit argument conversions going on. In this regard, I am assuming that implicit argument conversions for function parameters would be performed by the caller, not the callee, but do correct me if I'm wrong. (Caller and callee may, like here, have differing beliefs about the parameters' types.) Despite the absence of a conversion, comp will internally assume that its arguments are char * (for K&R's original) or wchar_t * (for my addition). Things should go well as long as the pointer arithmetic to get to the arguments v[i] and v[left] is proper.

Incidentally, I see calls within qsort_alt to swap as unproblematic, as swap operates on a void ** argument and only reassigns pointers.

答案1

得分: 4

First, we should call the attention of any readers of this question to the fact the qsort being discussed here, as defined in Kernighan and Ritchie, The C Programming Language (second edition), page 120, is not the qsort specified in the C standard. The one in the book is archaic and has a defective design. Its prototype is:

void qsort(void *v[], int left, int right, int (*comp)(void *, void *));

The one in the 1990 C standard has prototype:

void qsort(void *base, size_t nmemb, size_t size, int (*compar)(const void *, const void *));

The most significant difference is the first is designed only to sort an array of pointers, specifically pointers to void (first parameter type is void *[]), whereas the standard version can sort an array of any type (first parameter type is void *, and the third parameter gives the size for each element).

> 1. Why are the arguments to qsort_alt [orig: qsort] cast explicitly?

Two arguments are cast, the first and the last. In the first, the authors wish to pass their lineptr, which has type char *[5000], for a parameter nominally of type void *[]. The array lineptr will be automatically converted to char **, and the nominal parameter type will be automatically adjusted to void **, so these are different and incompatible types. For a function call with a prototype, C 2018 6.5.2.2 2 says “… Each argument shall have a type such that its value may be assigned to an object with the unqualified version of the type of its corresponding parameter.” (C 1990 6.3.2.2 is similar.)

Constraints for simple assignment are in C 2018 6.5.16.1 1. (C 1990 6.3.16.1 is similar.) Two of its cases cover when the left operand and the right operand of an assignment are pointer types. One of these says, in part, both are pointers to compatible types except for the qualifiers. This does not apply because char * and void * are not compatible. The other says, in part, the left operand may be a pointer to void, with or without qualifiers. This does not apply because void ** is not a pointer to void. Therefore, a char ** expression does not satisfy the constraints for assignment to a void ** object, so the char ** argument for a void ** does not satisfy the constraints for a function call, so this argument cannot be passed for a void ** parameter without a cast.

In the last argument, the authors wish to pass their numcmp, which, after automatic conversion of a function to a pointer, has type int (*)(char *, char *), for a parameter of type int (*)(void *, void *). Again the constraints for simple assignment apply, and this argument cannot be passed for that parameter without a cast.

> 2. Is the cast (void **)linearr [orig: (void **) lineptr] legal? What about my (void **)wlinearr?

The conversion is defined and the conversion and the use of the result to access pointers in the original array is effectively defined by the C standard, albeit by the kludgy specification in C 2018 6.2.5 28 that “A pointer to void shall have the same representation and alignment requirements as a pointer to a character type.”

The initial conversion from char ** to void ** is defined by C 2018 6.3.2.3 7, which says a pointer to an object type may be converted to a pointer to a different object type, but it is undefined if the resulting pointer is not correctly aligned for the referenced type. Since char * and void * have the same alignment requirements, the caveat does not apply. 6.3.2.3 7 further says that converting the result back to the original type yields a value equal to the original pointer. This is the only thing the standard tells us generally about the result of this conversion—for example, it does not tell us that if we convert int *i to float *f, and int and float have the same size and alignment requirements, that *f will access the same bytes as *i will. However, a footnote to 6.2.5 28 tells us “The same representation and alignment requirements are meant to imply interchangeability as arguments to functions,…” This is a bad way to say that a char * may serve as a void * or vice versa, but it does seem the intent was to say we can convert a char ** to a void ** and use the result to access the char * elements. The intent is presumably also to allow this aliasing of different types even though it nominally violates the rules for aliasing in C 2018 6.5 7.

So the conversion from char ** to void ** is defined and may be used to access the char * pointers, even when it is done using a void * type.

On the other hand, your (void **) wlinearr conversion does not enjoy this license. After array-to-pointer conversion, wlinearr yields wchar_t **, and wchar_t * does not necessarily have the same size, representation, or alignment requirements as void *. So the conversion is not guaranteed to work. Even if it does, the resulting value is not specified to be useful for accessing the elements of the wchar_t * array.

> 3. Are the casts of numcmp/strcmp from types int (*)(char *, char *)/int (*)(const char *, const char *) to type int (*)(void *, void *) legal? What about my analogous cast of wcscmp from type int (*)(const wchar_t *, const wchar_t *) to type int (*)(void *, void *)? (Note that the difference in constness isn't an issue.)

The conversions specified by the casts are defined, because C 2018 6.3.2.3 8 says a pointer to any type of function may be converted to a pointer to any other type of function. However, the only defined uses for such a pointer are converting it back to the original type, comparing it for equality to other pointers, or testing whether it is a null pointer. That paragraph says “… If a converted pointer is used to call a function whose type is not compatible with the referenced type, the behavior is undefined.”

> 4. Is the call to comp unproblematic?

It is problematic because of the rule quoted just above. The qsort defined at this point in the book is defective in standard C and should not be used. Perhaps it was designed for pedagogical purposes, to use types in a more simplistic way than proper use requires to avoid presenting too many new concepts to the student at once. Nonetheless, it is not a good design

英文:

First, we should call the attention of any readers of this question to the fact the qsort being discussed here, as defined in Kernighan and Ritchie, The C Programming Language (second edition), page 120, is not the qsort specified in the C standard. The one in the book is archaic and has a defective design. Its prototype is:

void qsort(void *v[], int left, int right, int (*comp)(void *, void *));

The one in the 1990 C standard has prototype:

void qsort(void *base, size_t nmemb, size_t size, int (*compar)(const void *, const void *));

The most significant difference is the first is designed only to sort an array of pointers, specifically pointers to void (first parameter type is void *[]), whereas the standard version can sort an array of any type (first parameter type is void *, and the third parameter gives the size for each element).

> 1. Why are the arguments to qsort_alt [orig: qsort] cast explicitly?

Two arguments are cast, the first and the last. In the first, the authors wish to pass their lineptr, which has type char *[5000], for a parameter nominally of type void *[]. The array lineptr will be automatically converted to char **, and the nominal parameter type will be automatically adjusted to void **, so these are different and incompatible types. For a function call with a prototype, C 2018 6.5.2.2 2 says “… Each argument shall have a type such that its value may be assigned to an object with the unqualified version of the type of its corresponding parameter.” (C 1990 6.3.2.2 is similar.)

Constraints for simple assignment are in C 2018 6.5.16.1 1. (C 1990 6.3.16.1 is similar.) Two of its cases cover when the left operand and the right operand of an assignment are pointer types. One of these says, in part, both are pointers to compatible types except for the qualifiers. This does not apply because char * and void * are not compatible. The other says, in part, the left operand may be a pointer to void, with or without qualifiers. This does not apply because void ** is not a pointer to void. Therefore, a char ** expression does not satisfy the constraints for assignment to a void ** object, so the char ** argument for a void ** does not satisfy the constraints for a function call, so this argument cannot be passed for a void ** parameter without a cast.

In the last argument, the authors wish to pass their numcmp, which, after automatic conversion of a function to a pointer, has type int (*)(char *, char *), for a parameter of type int (*)(void *, void *). Again the constraints for simple assignment apply, and this argument cannot be passed for that parameter without a cast.

> 2. Is the cast (void **)linearr [orig: (void **) lineptr] legal? What about my (void **)wlinearr?

The conversion is defined and the conversion and the use of the result to access pointers in the original array is effectively defined by the C standard, albeit by the kludgy specification in C 2018 6.2.5 28 that “A pointer to void shall have the same representation and alignment requirements as a pointer to a character type.”

The initial conversion from char ** to void ** is defined by C 2018 6.3.2.3 7, which says a pointer to an object type may be converted to a pointer to a different object type, but it is undefined if the resulting pointer is not correctly aligned for the referenced type. Since char * and void * have the same alignment requirements, the caveat does not apply. 6.3.2.3 7 further says that converting the result back to the original type yields a value equal to the original pointer. This is the only thing the standard tells us generally about the result of this conversion—for example, it does not tell us that if we convert int *i to float *f, and int and float have the same size and alignment requirements, that *f will access the same bytes as *i will. However, a footnote to 6.2.5 28 tells us “The same representation and alignment requirements are meant to imply interchangeability as arguments to functions,…” This is a bad way to say that a char * may serve as a void * or vice versa, but it does seem the intent was to say we can convert a char ** to a void ** and use the result to access the char * elements. The intent is presumably also to allow this aliasing of different types even though it nominally violates the rules for aliasing in C 2018 6.5 7.

So the conversion from char ** to void ** is defined and may be used to access the char * pointers, even when it is done using a void * type.

On the other hand, your (void **) wlinearr conversion does not enjoy this license. After array-to-pointer conversion, wlinearr yields wchar_t **, and wchar_t * does not necessarily have the same size, representation, or alignment requirements as void *. So the conversion is not guaranteed to work. Even if it does, the resulting value is not specified to be useful for accessing the elements of the wchar_t * array.

> 3. Are the casts of numcmp/strcmp from types int (*)(char *, char *)/int (*)(const char *, const char *) to type int (*)(void *, void *) legal? What about my analogous cast of wcscmp from type int (*)(const wchar_t *, const wchar_t *) to type int (*)(void *, void *)? (Note that the difference in constness isn't an issue.)

The conversions specified by the casts are defined, because C 2018 6.3.2.3 8 says a pointer to any type of function may be converted to a pointer to any other type of function. However, the only defined uses for such a pointer are converting it back to the original type, comparing it for equality to other pointers, or testing whether it is a null pointer. That paragraph says “… If a converted pointer is used to call a function whose type is not compatible with the referenced type, the behavior is undefined.”

> 4. Is the call to comp unproblematic?

It is problematic because of the rule quoted just above. The qsort defined at this point in the book is defective in standard C and should not be used. Perhaps it was designed for pedagogical purposes, to use types in a more simplistic way than proper use requires to avoid presenting too many new concepts to the student at once. Nonetheless, it is not a good design.

The qsort in the C standard fixes these problems. First, instead of a void ** (equivalently void *[], due to automatic adjustment) parameter, it takes void * and a size_t size parameters. These provide the base address of the elements to be sorted and the size of each element. This allows qsort to manipulate any array by treating its elements as raw bytes (which has defined behavior by the C standard). It does not need type information about the array elements, other than the size.

Second, it specifies the prototype of the comparison function as int ()(const void *, const void *). So no information about the element types is visible or used in this prototype. The function is passed raw pointers to memory, with no type information (other than the const qualifier). Inside the function, the const void * pointers should be converted to pointers to the actual element type (with const). Thus, all of the type information is inside the comparison function the qsort caller provides; none of it is built into qsort.

(The standard qsort also changes int left, int right to size_t nmemb. This simply removes a redundancy, as the caller can specify the “left” endpoint by adding to the base address passed in base, and then the number of elements provides the location of the “right” endpoint.)

答案2

得分: 2

以下是翻译好的部分:

  • "The function qsort discussed in this question is different from the qsort function specified in the C Standard." 这个问题中讨论的qsort函数与C标准中指定的qsort函数不同。
  • "It specifically sorts sorts an array of generic pointers to objects (type void *), using a pointer to a function that compares objects, itself receiving 2 void * pointers from the array." 它专门对对象的通用指针数组(类型为void *)进行排序,使用指向比较对象的函数的指针,该函数本身从数组中接收2个void *指针。
  • "The standard qsort from the C library sorts an array of objects, not an array of pointers to objects." C库中的标准qsort对对象数组进行排序,而不是对象指针数组。
  • "If passed an array of pointers, the comparison function will receive addresses of such pointers, and need to dereference them to compare the actual objects." 如果传递了一个指针数组,比较函数将接收这些指针的地址,并需要对它们进行解引用以比较实际的对象。
  • "Unlike the now standard qsort, the functions discussed here cannot sort an array of int or double or other scalar types, nor an array of structures." 与现在的标准qsort不同,这里讨论的函数不能对intdouble或其他标量类型的数组,也不能对结构体数组进行排序。
  • "Furthermore, the second and third arguments have a very different meaning: the standard qsort takes the number of elements and the element size, whereas qsort_alt takes left, the index to the first element of a slice and right, the index of the last element in the slice." 此外,第二个和第三个参数的含义有很大的不同:标准的qsort接受元素的数量和元素的大小,而qsort_alt接受left(切片中第一个元素的索引)和right(切片中最后一个元素的索引)。
  • "This does not allow for sorting empty arrays, in which there is no last element. It would be more idiomatic to pass right as the index to the first element after the slice, hence right - left would be the number of elements, or to just pass the number of elements in the array allowing for empty slices and removing the need for confusing +1/-1 adjustments in the algorithm." 这不允许对空数组进行排序,其中没有最后一个元素。 更符合习惯的做法是将right作为切片后第一个元素的索引,因此right - left将是元素的数量,或者只需传递数组中的元素数量,允许空切片并消除算法中令人困惑的+1/-1调整的需求。
  • "Also note that declaring void swap(void *v[], int, int); locally in the scope of the qsort_alt is a bad idea as the prototype has a limited scope and will not be checked for consistency when the parser eventually reaches the function definition. It is strongly recommended to always declare functions at the global scope." 还要注意,在qsort_alt的范围内局部声明void swap(void *v[], int, int);是一个不好的主意,因为原型的范围有限,在解析器最终到达函数定义时不会检查其一致性。 强烈建议始终在全局范围内声明函数。
英文:

The function qsort discussed in this question is different from the qsort function specified in the C Standard. It specifically sorts sorts an array of generic pointers to objects (type void *), using a pointer to a function that compares objects, itself receiving 2 void * pointers from the array.

The standard qsort from the C library sorts an array of objects, not an array of pointers to objects. If passed an array of pointers, the comparison function will receive addresses of such pointers, and need to dereference them to compare the actual objects.

Unlike the now standard qsort, the functions discussed here cannot sort an array of int or double or other scalar types, nor an array of structures.

Furthermore, the second and third arguments have a very different meaning: the standard qsort takes the number of elements and the element size, whereas qsort_alt takes left, the index to the first element of a slice and right, the index of the last element in the slice. This does not allow for sorting empty arrays, in which there is no last element. It would be more idiomatic to pass right as the index to the first element after the slice, hence right - left would be the number of elements, or to just pass the number of elements in the array allowing for empty slices and removing the need for confusing +1/-1 adjustments in the algorithm.

Also note that declaring void swap(void *v[], int, int); locally in the scope of the qsort_alt is a bad idea as the prototype has a limited scope and will not be checked for consistency when the parser eventually reaches the function definition. It is strongly recommended to always declare functions at the global scope.

Regarding your questions:

> 1. Why are the arguments to qsort_alt [orig: qsort] cast explicitly?
>
> 2. Is the cast (void **)linearr [orig: (void **) lineptr] legal? What about my (void **)wlinearr?

qsort_alt expects an array of void *, more precisely a pointer to void pointers, (void **). You pass arrays of char * and arrays of wchar_t *, which are not compatible with arrays of void *, as the pointed types are different and incompatible. This might seem confusing as void * is compatible with all data pointer types for assignment, but it may have a different representation from wchar_t * hence pretending a wchar_t * is a void * is incorrect. As noted in the K&R erratum, void * and char * are mandated to have the same representation by the C Standard, so you should get away with this for the (void **)linearr case, but the cast, albeit legal in C may cause undefined behavior, and more so for the other case (void **)wlinearr

In the case of standard qsort, not only is the cast not necessary, it is actually counter productive, albeit not a problem in practice as noted in the erratum. The cast is useless because qsort expects a void *, ie: a data pointer to any type. linearr is an array, so it decays to a char ** when passed as a function argument, which is converted implicitly to a void *. Casting it to void ** is problematic as explained here above, but the resulting void ** is still compatible with void *.

> 3. Are the casts of numcmp/strcmp from types int (*)(char *, char *)/int (*)(const char *, const char *) to type int (*)(void *, void *) legal?
>
> What about my analogous cast of wcscmp from type int (*)(const wchar_t *, const wchar_t *) to type int (*)(void *, void *)? (Note that the difference in constness isn't an issue.)

The answer is no, they are not legal in either case. Yet since void * and char * have the same representation, one could assume that they should be passed the same way as function arguments (which is an ABI issue not covered by the Standard), do a function such as strcmp that takes 2 const char * and returns an int or numcmp probably behaves as expected where qsort invokes a function taking 2 void * and returns an int.

Casting wcscmp as a int (*)(void *, void *) is definitely incorrect, but will cause an issue in very few systems, where wchar_t * and void * have a different representation and/or calling convention. I am personally not aware of any such system in production, but the C Standard allows for such differences, albeit the POSIX Standard does not.

> 4. Is the call to comp unproblematic?

For reasons explained above, the call may be problematic, and the pointer arithmetic would be indeed incorrect if wchar * and void * have a different size.

Here is a modified version of the code using functions that a compatible with standard qsort and without any explicit casts:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

const char *linearr[] = {
    "13", "5", "10", "9", "23", "2", "5",
    "25", "3", "13", "1", "4", "14", "19",
};
const wchar_t *wlinearr[] = {
    L"13", L"5", L"10", L"9", L"23", L"2", L"5",
    L"25", L"3", L"13", L"1", L"4", L"14", L"19",
    L"42", L"32", L"25", L"5", L"3", L"34",
    L"50", L"40", L"86", L"91", L"100", L"21",
    L"29", L"28", L"23", L"93", L"24", L"89",
    L"85", L"63", L"10", L"39", L"14", L"18",
    L"78", L"73", L"52", L"57", L"26", L"38",
    L"91", L"76", L"61", L"46",
};

int cmp_numcmp(const void *p1, const void *p2) {
    const char * const *s1 = p1;
    const char * const *s2 = p2;
    double d1 = atof(*s1);
    double d2 = atof(*s2);
    return (d2 < d1) - (d2 > d1);
}

int cmp_strcmp(const void *p1, const void *p2) {
    const char * const *s1 = p1;
    const char * const *s2 = p2;
    return strcmp(*s1, *s2);
}

int cmp_wcscmp(const void *p1, const void *p2) {
    const wchar_t * const *s1 = p1;
    const wchar_t * const *s2 = p2;
    return wcscmp(*s1, *s2);
}

void swap(unsigned char *p1, unsigned char *p2, size_t size) {
    while (size --> 0) {
        unsigned char temp = *p1;
        *p1++ = *p2;
        *p2++ = temp;
    }
}

/* same prototype as qsort from <stdlib.h> */
void qsort_alt(void *base, size_t nmemb, size_t size,
               int (*comp)(const void *, const void *))
{
    if (nmemb > 1) {
        unsigned char *p = base;
        size_t i, last;
        /* choose the middle element as the pivot
         * and swap it to the beginning of the array
         */
        swap(p, p + (nmemb / 2) * size, size);
        last = 0;
        for (i = 1; i < nmemb; i++) {
            if ((*comp)(p + i * size, p) < 0) {
                /* if the element is below the pivot, swap it into the left slice */
                swap(p + ++last * size, p + i * size, size);
            }
        }
        /* swap the pivot back to its final place between the slices */
        swap(p, p + last * size, size);
        /* sort the left slice (elements before the pivot) */
        qsort_alt(p, last, size, comp);
        /* sort the right slice (elements after the pivot) */
        qsort_alt(p + (last + 1) * size, nmemb - (last + 1), size, comp);
    }
}

/* compatibility with Standard qsort from the C library can be 
   verified by uncommenting the line below */
/* #define qsort_alt qsort */

int main(void) {
    size_t nlines = sizeof(linearr) / sizeof(linearr[0]);
    size_t nwlines = sizeof(wlinearr) / sizeof(wlinearr[0]);
    size_t i;

    qsort_alt(linearr, nlines, sizeof(linearr[0]), cmp_numcmp);
    for (i = 0; i < nlines; i++) {
        printf("%s%s", linearr[i], (i < nlines - 1) ? " " : "\n");
    }
    /* output: 1 2 3 4 5 5 9 10 13 13 14 19 23 25 */

    qsort_alt(linearr, nlines, sizeof(linearr[0]), cmp_strcmp);
    for (i = 0; i < nlines; i++) {
        printf("%s%s", linearr[i], (i < nlines - 1) ? " " : "\n");
    }
    /* output: 1 10 13 13 14 19 2 23 25 3 4 5 5 9 */

    qsort_alt(wlinearr, nwlines, sizeof(wlinearr[0]), cmp_wcscmp);
    for (i = 0; i < nwlines; i++) {
        wprintf(L"%ls%ls", wlinearr[i], (i < nwlines - 1) ? L" " : L"\n");
    }
    /* output: 1 10 10 100 13 13 14 14 18 19 2 21 23 23 24 25 25 26 28 29 3 3 32 34 38 39 4 40 42 46 5 5 5 50 52 57 61 63 73 76 78 85 86 89 9 91 91 93 */

    return 0;
}

答案3

得分: 1

如果我们需要简化为一个简短的回答,我们可以得到如下答案:

qsort_alt 的定义是错误的,因为它接受一个 char *[] 数组,传递的是一个 void *[] 数组。按照标准,你实际上不能这样做。隐藏的假设是所有数据指针都是相同类型的。虽然这在祖先平台和现代平台上是正确的,但并不是在所有80年代和90年代的平台上都是正确的。在一些平台上,sizeof(int *)sizeof(char *) 是不同的,而且在至少一个平台上,int *char * 的内存布局是不同的,尽管它们的大小相同。

虽然我不能确定存在一个调用失败的平台,但我可以证明 void * 的意图丢失了。根据上述情况,必须存在一个对于 int[](以及几乎任意的 Tstruct)或 char[] 中的某一个而失败的平台。

英文:

If we need to boil this down to a short answer, we can have an answer as follows:

The definition of qsort_alt is wrong because it takes a char *[] array passed as a void *[] array. You can't actually do that by standard. The hidden assumption is all data pointers are the same type. While this was true on the ancestral platform and true on modern platforms, it is not true of all platforms of the '80s and '90s. On some platforms, sizeof(int *) and sizeof(char *) differ, and on at least one platform the memory layout of int * and char * were different despite being the same size.

While I can't say for certain there exists a platform for which the call fails, I can prove the intent of void * was lost. Given the cases above, there must exist a platform that fails for one of int[] (and struct T for almost any arbitrary T) or char[].

huangapple
  • 本文由 发表于 2023年5月13日 14:04:24
  • 转载请务必保留本文链接:https://go.coder-hub.com/76241319.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定