Expression evaluation in C vs Java
Question
int y=3;
int z=(--y) + (y=10);
When executed in C, the value of z evaluates to 20, but the same expression executed in Java gives z the value 12.
Can anyone explain why this happens and what the difference is?
Answer 1
Score: 6
> when executed in C language the value of z evaluates to 20
No, it does not. This is undefined behavior, so z could get any value, including 20. The program could also theoretically do anything, since the standard does not say what the program should do when encountering undefined behavior. Read more here: https://stackoverflow.com/q/2397984/6699433
As a rule of thumb, never modify a variable twice in the same expression.
The following question is not an exact duplicate, but it explains things in more depth. The reason for the undefined behavior here is sequence points: https://stackoverflow.com/q/949433/6699433
In C, when it comes to arithmetic operators like + and /, the order of evaluation of the operands is not specified in the standard, so if the evaluation of those operands has side effects, your program becomes unpredictable. Here is an example:
#include <stdio.h>

int foo(void)
{
    printf("foo()\n");
    return 0;
}

int bar(void)
{
    printf("bar()\n");
    return 0;
}

int main(void)
{
    /* The standard does not say whether foo() or bar() is called first. */
    int x = foo() + bar();
}
What will this program print? Well, we don't know. I'm not entirely sure if this snippet invokes undefined behavior or not, but regardless, the output is not predictable. I made a question, https://stackoverflow.com/q/63656184/6699433 , about that, so I'll update this answer later.
Some other operators, such as || and &&, do have a specified order of evaluation (left to right), and this feature is used for short-circuiting. For instance, if we take the example functions above and write foo() && bar(), only foo() is executed, because it returns 0.
I'm not very proficient in Java, but for completeness, I want to mention that Java basically has no undefined or unspecified behavior except in very special situations. Almost everything in Java is well defined. For more details, read rzwitserloot's answer.
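For contrast, here is a minimal Java sketch of the same foo() + bar() experiment. The class name OperandOrderDemo and the calls list are made up for illustration; the list simply records the call order so it can be inspected. Java evaluates the left operand of a binary operator before the right one, so the recorded order should always be foo() and then bar().

```java
import java.util.ArrayList;
import java.util.List;

class OperandOrderDemo {
    // Hypothetical log of which function ran, in order.
    static final List<String> calls = new ArrayList<>();

    static int foo() {
        calls.add("foo()");
        return 0;
    }

    static int bar() {
        calls.add("bar()");
        return 0;
    }

    // Evaluates foo() + bar() and returns the observed call order.
    static List<String> run() {
        calls.clear();
        int x = foo() + bar(); // left operand first, then right operand
        return new ArrayList<>(calls);
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The equivalent C program gives no such guarantee about which call happens first.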
Answer 2
Score: 3
There are 3 parts to this answer:
- How this works in C (undefined behaviour)
- How this works in Java (the spec is clear on how this should be evaluated)
- Why there is a difference.
For #1, you should read @klutt's fantastic answer.
For #2 and #3, you should read this answer.
How does it work in Java?
Unlike C's, Java's language specification is far more precise. For example, C doesn't even tell you how many bits the data type int is supposed to have, whereas the Java language specification does: 32 bits, even on 64-bit processors and 64-bit Java implementations.
The Java spec clearly says that x+y is to be evaluated left-to-right (vs. C's 'in any order you please, compiler'). Thus, first --y is evaluated, which is clearly 2 (with the side effect of making y 2), then y=10 is evaluated, which is clearly 10 (with the side effect of making y 10), and then 2+10 is evaluated, which is clearly 12.
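The walkthrough above can be reproduced with a few lines of Java. This is only a sketch; the class and method names (EvalOrderDemo, evaluate) are invented here, and the expression is the one from the question.

```java
class EvalOrderDemo {
    // Evaluates the expression from the question and returns {z, y}.
    static int[] evaluate() {
        int y = 3;
        // Left operand first: --y yields 2 (y becomes 2).
        // Right operand next: y = 10 yields 10 (y becomes 10).
        int z = (--y) + (y = 10);
        return new int[] { z, y };
    }

    public static void main(String[] args) {
        int[] result = evaluate();
        System.out.println("z = " + result[0] + ", y = " + result[1]); // z = 12, y = 10
    }
}
```

Because the order is fixed by the language, any conforming Java implementation should give the same result.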
Obviously, a language like Java is just better; after all, undefined behaviour is pretty much a bug by definition. What was wrong with the C spec writers that they introduced this crazy stuff?
The answer is: performance.
In C, your source code is turned into machine code by the compiler, and the machine code is then executed by the CPU. A 2-step model.
In Java, your source code is turned into bytecode by the compiler, the bytecode is then turned into machine code by the runtime, and the machine code is then executed by the CPU. A 3-step model.
If you want to introduce optimizations, you don't control what the CPU does, so for C there is only 1 step where it can be done: Compilation.
So C (the language) is designed to give lots of freedom to C compilers to attempt to produce optimized machine code. This is a cost/benefit scenario: At the cost of having a ton of 'undefined behaviour' in the lang spec, you get the benefit of better optimizing compilers.
In Java, you get a second step, and that's where Java does its optimizations: at runtime. java.exe does it to class files; javac.exe is quite 'stupid' and optimizes almost nothing. This is on purpose; at runtime you can do a better job (for example, you can use some bookkeeping to track which of two branches is more commonly taken and thus branch predict better than a C app ever could). It also means that the cost/benefit analysis now results in: the lang spec should be clear as day.
So Java code never has undefined behaviour?
Not so. Java has a memory model which includes a ton of undefined behaviour:
class X { int a, b; }

X instance = new X();

new Thread() { public void run() {
    int a = instance.a;
    int b = instance.b;
    instance.a = 5;
    instance.b = 6;
    System.out.print(a);
    System.out.print(b);
}}.start();

new Thread() { public void run() {
    int a = instance.a;
    int b = instance.b;
    instance.a = 1;
    instance.b = 2;
    System.out.print(a);
    System.out.print(b);
}}.start();
The output of this code is undefined in Java. It may print 0056, 0012, 0010, 0002, 5600, 0600, and many, many more possibilities. Something like 5000 (which it could legally print) is hard to imagine: how can the read of a 'work' but the read of b then fail?
For the exact same reason your C code produces arbitrary answers:
Optimization.
Hardcoding in the spec exactly how this code must behave would come at a large cost: you'd take away most of the room for optimization. So Java paid the price and now has a language spec that is ambiguous whenever you modify and read the same fields from different threads without establishing so-called 'happens-before' relationships using e.g. synchronized.
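As a hedged sketch of what such a guard might look like, the version below wraps each thread's reads, writes and output in a synchronized block on a shared lock object. The names SyncDemo, runOnce and lock are invented for this example, and the output is collected into a StringBuilder instead of being printed directly so the result can be inspected. With the lock in place, one thread runs its whole block before the other, so only two outputs should remain possible: 0056 or 0012.

```java
class SyncDemo {
    static class X { int a, b; }

    // Runs both threads once and returns their combined output.
    static String runOnce() {
        X instance = new X();
        StringBuilder out = new StringBuilder();
        Object lock = new Object(); // hypothetical shared monitor

        Thread t1 = new Thread(() -> {
            synchronized (lock) {
                int a = instance.a;
                int b = instance.b;
                instance.a = 5;
                instance.b = 6;
                out.append(a).append(b);
            }
        });
        Thread t2 = new Thread(() -> {
            synchronized (lock) {
                int a = instance.a;
                int b = instance.b;
                instance.a = 1;
                instance.b = 2;
                out.append(a).append(b);
            }
        });

        t1.start();
        t2.start();
        try {
            // join() makes the threads' writes visible to the caller.
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(runOnce());
    }
}
```

Which of the two outputs appears still depends on which thread gets the lock first; the lock removes the mixed results, not the race for who goes first.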
Answer 3
Score: 2
> When executed in C language the value of z evaluates to 20
That is not true. The compiler you used evaluates it to 20. Another compiler may evaluate it in a completely different way: https://godbolt.org/z/GcPsKh
This kind of behaviour is called Undefined Behaviour.
In your expression you have two problems.
- The order of evaluation (except in logical expressions) is not specified in C (this is Unspecified Behaviour).
- This expression also has a problem with sequence points (Undefined Behaviour).