Adding two byte arrays that represent unsigned numbers fails to add the last carry in C
Question
I have been writing a big-number library in C for practice, and the addition operation is causing me some issues. It mostly works, but it ignores the carry at the end, so the result is incorrect.
Here is the code so far for the addition operation.
The BIG_INT type is a typedef for an unsigned char array of 8 bytes. 8 bytes is used for testing; it will be something like 512 bytes later.
#include <stdlib.h>
#include <stdio.h>
#include <string.h>

#define BIG_INT_SIZE 8
typedef unsigned char BIG_INT[BIG_INT_SIZE];

int add_big_int(BIG_INT *left, size_t right)
{
    if (!right)
        return 0;
    char buffer[sizeof(size_t)];
    memcpy(buffer, &right, sizeof(size_t));
    unsigned int carry = 0;
    for (int64_t i = BIG_INT_SIZE - 1; i >= 0; --i) {
        unsigned int sum = (*left[i] & 0xff) + (buffer[i] & 0xff) + carry;
        *left[i] = sum;
        carry = sum >> 8;
    }
    printf("carry: %u\n", carry);
    return 0;
}

int main()
{
    BIG_INT big = { 0, 0, 0, 0, 0, 0, 0, 1 };
    add_big_int(&big, 255);
    return 0;
}
When adding numbers like 256 + 1 = 257, the byte array is [0, 0, 0, 0, 0, 0, 1, 1] (reverse order). However, when adding 255 + 1 = 256, the byte array is [0, 0, 0, 0, 0, 0, 0, 0] instead of the expected [0, 0, 0, 0, 0, 0, 1, 0], and the carry variable is set to 1.
Answer 1
Score: 0
For starters, the first parameter of the function is a pointer:
int add_big_int(BIG_INT* left, size_t right)
                ^^^^^^^^
If BIG_INT is defined like
typedef unsigned char BIG_INT[8];
then the parameter has the type unsigned char ( * )[8].
There is little point in passing the array through a pointer to it. It is enough to declare the function like
int add_big_int(BIG_INT left, size_t right)
and call it like
add_big_int(big, 255);
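A minimal sketch of why the plain array parameter is enough: a parameter declared as BIG_INT is adjusted to a pointer to the first element, so the callee still modifies the caller's bytes. The names set_low_byte and demo_low_byte are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

#define BIG_INT_SIZE 8
typedef unsigned char BIG_INT[BIG_INT_SIZE];

/* A parameter declared as BIG_INT is adjusted to unsigned char *,
   so the function writes straight into the caller's array. */
void set_low_byte(BIG_INT value, unsigned char byte)
{
    assert(value != NULL);
    value[BIG_INT_SIZE - 1] = byte;
}

/* Hypothetical caller: returns the low byte it reads back. */
unsigned char demo_low_byte(void)
{
    BIG_INT big = { 0 };
    set_low_byte(big, 42);   /* no & needed: the array decays to a pointer */
    return big[BIG_INT_SIZE - 1];
}
```

One consequence of this adjustment is that sizeof value inside the function would give the size of a pointer, not of the array, so the element count has to come from BIG_INT_SIZE.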
The function always returns 0, which is not informative for the user of the function. Either the function should return meaningful values or its return type should be void.
Using the type size_t, as for example in
char buffer[sizeof(size_t)];
only confuses readers of the code. Instead you could use the expression sizeof( BIG_INT ), provided that the array BIG_INT is large enough to store values of the type size_t.
The expression *left[i] invokes undefined behavior for any non-zero i. Instead you have to use the expression ( *left )[i]. In any case, expressions with the bitwise AND operator like
(buffer[i] & 0xff)
have no effect once the arrays have the element type unsigned char and bytes are eight bits wide. Note that buffer is currently declared as plain char, so it should be changed to unsigned char as well.
In general, because the addition can overflow and leave a final carry, you should use a dynamically allocated array enclosed in a structure, so that the carry has somewhere to go.
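One possible reading of this suggestion, as a rough sketch rather than a definitive design: a structure that owns a heap-allocated byte array and grows by one byte when the addition carries out of the top. The struct big_int layout and the function big_int_add_byte are assumptions made for this example; the masks assume 8-bit bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout: most significant byte first, as in the question. */
struct big_int {
    size_t len;             /* number of bytes in use */
    unsigned char *bytes;   /* heap-allocated; bytes[len - 1] is least significant */
};

/* Adds one byte to n in place, growing n by one byte on a final carry.
   Returns 0 on success, or -1 if growing fails (n then holds the wrapped sum). */
int big_int_add_byte(struct big_int *n, unsigned char addend)
{
    assert(n != NULL && n->bytes != NULL && n->len > 0);
    unsigned int carry = addend;
    /* Walk from the least significant byte upward while a carry remains. */
    for (size_t i = n->len; i-- > 0 && carry != 0; ) {
        unsigned int sum = n->bytes[i] + carry;
        n->bytes[i] = (unsigned char)(sum & 0xff);
        carry = sum >> 8;
    }
    if (carry == 0)
        return 0;
    unsigned char *grown = realloc(n->bytes, n->len + 1);
    if (grown == NULL)
        return -1;
    memmove(grown + 1, grown, n->len);   /* make room for a new top byte */
    grown[0] = (unsigned char)carry;
    n->bytes = grown;
    n->len += 1;
    return 0;
}
```

With this layout, adding 1 to the one-byte value 255 would leave a two-byte number instead of silently dropping the carry.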
Answer 2
Score: 0
The code is disorganized, and some organization will help fix it.
Look at this loop: for (int64_t i = BIG_INT_SIZE - 1; i >= 0; --i). That iterates through BIG_INT_SIZE elements, using left[i] and buffer[i]. How many elements are in buffer?
buffer is declared char buffer[sizeof(size_t)];, so, if sizeof(size_t) is different from BIG_INT_SIZE, the loop is massively broken. You either need buffer and left to have the same number of elements or you need code in the loop to handle disparate sizes.
Let's make them the same size, using char buffer[BIG_INT_SIZE];.
Next, compare these two declarations:
typedef unsigned char BIG_INT[BIG_INT_SIZE];
char buffer[BIG_INT_SIZE];
What is different? One uses unsigned char, and the other uses char. The C standard does not say whether char is signed or unsigned. When you put 255 in a signed char, it is likely to take on the value −1, although this is implementation-defined. You want unsigned char:
unsigned char buffer[BIG_INT_SIZE];
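A small sketch of the difference, assuming 8-bit bytes. The helper names through_plain_char and through_unsigned_char are hypothetical; what the plain char version returns for 255 depends on the implementation, which is the point of the example.

```c
#include <assert.h>   /* static_assert (C11) */
#include <limits.h>   /* CHAR_BIT */

static_assert(CHAR_BIT == 8, "this sketch assumes 8-bit bytes");

/* Stores byte in a plain char and reads it back as an int.
   Where char is signed, values above 127 come back implementation-defined,
   typically negative (255 usually becomes -1). */
int through_plain_char(unsigned char byte)
{
    char c = (char)byte;
    return c;
}

/* Stores byte in an unsigned char: the value always survives unchanged. */
int through_unsigned_char(unsigned char byte)
{
    unsigned char c = byte;
    return c;
}
```

This is also why the original & 0xff appeared to help with buffer: on a two's complement platform, masking a negative char brings it back into the range 0 to 255.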
Next, you have the call memcpy(buffer, &right, sizeof(size_t)), but we need to account for the change in how the size of buffer is set. Also, memcpy copies the bytes in whatever order they are in memory, but you want the low bytes at higher addresses, judging from your other code. So what you can do is:
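/* Store right most significant byte first: low byte at the highest index. */
for (size_t i = 0; i < sizeof right; ++i)
{
    buffer[BIG_INT_SIZE - 1 - i] = right & 0xff;
    right >>= 8;
}
/* Zero any remaining high-order elements. */
for (size_t i = sizeof right; i < BIG_INT_SIZE; ++i)
    buffer[BIG_INT_SIZE - 1 - i] = 0;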
The first loop puts the low-value byte of right into the high-address element of buffer and then shifts right to the right by eight bits, repeating for each byte. The second loop fills any remaining elements of buffer with zeros.
A potential hazard here is that right is larger than buffer, which would make the first loop overrun the buffer. We can guard against that with:
_Static_assert(sizeof right <= sizeof buffer, "right is too big for buffer");
I used & 0xff above, mimicking your existing code. Often, people would design for an eight-bit byte, in which case the & 0xff is unnecessary; the assignment could be just buffer[BIG_INT_SIZE - 1 - i] = right;, and the automatic conversion of right to unsigned char will effectively remove higher bits. If you make that change, you may want to insert a _Static_assert that CHAR_BIT is eight. CHAR_BIT is defined in <limits.h>.
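Putting the byte-order conversion and both compile-time checks together might look like the sketch below. The function name to_big_endian is an assumption for this example; it uses the static_assert convenience macro from <assert.h>, which expands to _Static_assert in C11.

```c
#include <assert.h>   /* static_assert (C11) */
#include <limits.h>   /* CHAR_BIT */
#include <stddef.h>   /* size_t */

#define BIG_INT_SIZE 8

static_assert(CHAR_BIT == 8, "this sketch assumes 8-bit bytes");
static_assert(sizeof(size_t) <= BIG_INT_SIZE, "size_t is too big for the buffer");

/* Hypothetical helper: writes value into out, most significant byte first. */
void to_big_endian(size_t value, unsigned char out[BIG_INT_SIZE])
{
    for (size_t i = 0; i < BIG_INT_SIZE; ++i) {
        /* Conversion to unsigned char keeps only the low eight bits. */
        out[BIG_INT_SIZE - 1 - i] = (unsigned char)value;
        /* Once every byte of value has been shifted out, it stays zero,
           so the remaining high-order elements are filled with zeros. */
        value >>= 8;
    }
}
```

A single loop over BIG_INT_SIZE is enough here because the shifted value reaches zero after sizeof(size_t) iterations, which replaces the separate zero-fill loop.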
Next, look at this statement:
unsigned int sum = (*left[i] & 0xff) + (buffer[i] & 0xff) + carry;
Array subscripting has higher precedence than pointer dereferencing, so *left[i] is *(left[i]), not (*left)[i]. left is a pointer to BIG_INT, and just one BIG_INT is passed, so you cannot use left[i] with i being non-zero. You need (*left) to get the BIG_INT and then [i] to select an element from it, so you want:
unsigned int sum = ((*left)[i] & 0xff) + (buffer[i] & 0xff) + carry;
Similarly, *left[i] = sum; should be (*left)[i] = sum;.
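The two parses can be compared safely when the pointer really does point into an array of several BIG_INT objects, as in this sketch. The function names are hypothetical and only label which parse each one uses.

```c
#include <assert.h>
#include <stddef.h>

#define BIG_INT_SIZE 8
typedef unsigned char BIG_INT[BIG_INT_SIZE];

/* (*p)[i]: element i of the BIG_INT that p points to. */
unsigned char element_of_first(BIG_INT *p, size_t i)
{
    assert(p != NULL && i < BIG_INT_SIZE);
    return (*p)[i];
}

/* *p[i] parses as *(p[i]): the first byte of the i-th BIG_INT after p.
   This is only valid if p points into an array of at least i + 1 BIG_INTs. */
unsigned char first_of_element(BIG_INT *p, size_t i)
{
    assert(p != NULL);
    return *p[i];
}
```

In the question only a single BIG_INT is passed, so the second form reads and writes outside the object for every non-zero i.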
With those changes made, your code will get the correct answer for the case you show. However, you should consider other design aspects, including:
- Do you want to define BIG_INT as an array? It may be safer as a structure containing an array. This would have caught the *left[i] error earlier.
- Why is the type of right size_t? size_t has a connotation that it is used with object sizes, not arbitrary arithmetic values that might be used with big-integer types. Do you want to use a signed type for it?
- Why does the loop use int64_t for i? That is a fixed-width type. You might want a type like ptrdiff_t, which is suitable for indexing arrays (it is signed, so it can be used with an i >= 0 test, and it is a type designed to adjust its width to the target platform).
- What will you do when the final carry is non-zero?
- Will add_big_int always return zero, or do you have some future plan for it?
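As one way to combine the fixes and answer the last two questions, the sketch below takes the array directly, peels bytes off right inside the loop instead of using a separate buffer, and returns the final carry. These choices are assumptions for illustration, not the only reasonable design.

```c
#include <assert.h>   /* static_assert (C11) */
#include <limits.h>   /* CHAR_BIT */
#include <stddef.h>   /* size_t */

#define BIG_INT_SIZE 8
typedef unsigned char BIG_INT[BIG_INT_SIZE];

static_assert(CHAR_BIT == 8, "this sketch assumes 8-bit bytes");
static_assert(sizeof(size_t) <= BIG_INT_SIZE, "size_t is too big for BIG_INT");

/* Adds right to left in place; left is stored most significant byte first.
   Returns the carry out of the top byte: 0 if the sum fit, 1 on overflow. */
unsigned int add_big_int(BIG_INT left, size_t right)
{
    unsigned int carry = 0;
    /* Start at the least significant byte (highest index) and move up. */
    for (size_t i = BIG_INT_SIZE; i-- > 0; ) {
        unsigned int sum = left[i] + (unsigned int)(right & 0xff) + carry;
        left[i] = (unsigned char)(sum & 0xff);
        carry = sum >> 8;
        right >>= 8;   /* expose the next byte of right */
    }
    return carry;
}
```

Called as add_big_int(big, 255) on the array from the question, this should leave [0, 0, 0, 0, 0, 0, 1, 0] in big, and the caller can inspect the return value to detect overflow.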