Float arithmetic inconsistent between Go programs
Question
When decoding audio files with pion/opus, I occasionally get incorrect values.
I have narrowed it down to the following code. When this routine runs inside the Opus decoder, I get a different value than when I run it outside. When the two floats are added together, the rightmost bit of the result differs. The difference eventually becomes a problem as the program runs longer and the error accumulates.
Is this a bug or expected behavior? I don't know how to debug this further or dump the state of my program to understand more.
Outside the decoder:
package main

import (
	"fmt"
	"math"
)

func main() {
	a := math.Float32frombits(uint32(955684399))
	b := math.Float32frombits(uint32(927295728))
	fmt.Printf("%b\n", math.Float32bits(a))
	fmt.Printf("%b\n", math.Float32bits(b))
	fmt.Printf("%b\n", math.Float32bits(a+b))
}
Returns
111000111101101001011000101111
110111010001010110100011110000
111001000001111010000110100110
Then, inside the decoder:
fmt.Printf("%b\n", math.Float32bits(lpcVal))
fmt.Printf("%b\n", math.Float32bits(val))
fmt.Printf("%b\n", math.Float32bits(lpcVal+val))
Returns
111000111101101001011000101111
110111010001010110100011110000
111001000001111010000110100111
Answer 1
Score: 1
I guess that lpcVal and val are not Float32 but rather Float64.
If that is the case, then you are comparing two different operations:
- in the former case, each operand is rounded to Float32 before the addition:
float32(lpcVal) + float32(val)
- in the latter case, the addition is done first and the sum is rounded to Float32 afterwards:
float32(lpcVal + val)
The two 32-bit floats, written in binary, are:
1.11101101001011000101111 * 2^-14
1.10001010110100011110000 * 2^-17
The exact sum is
1.000011110100001101001101 * 2^-13
which is an exact tie between two representable Float32 values.
The result is rounded to the Float32 with the even significand:
1.00001111010000110100110 * 2^-13
But if lpcVal and val are Float64, they have 52 bits after the binary point instead of 23 (29 more).
If any one of those 29 extra bits is nonzero, the sum might not be an exact tie, but slightly larger than the tie.
Once converted to the nearest Float32, that will be
1.00001111010000110100111 * 2^-13
Since we have no idea what lpcVal and val contain in those low-order bits, anything can happen, even without the use of FMA operations.
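The rounding argument above can be checked with a small program. This is only a sketch: the helper names addFloat32 and addFloat64ThenRound are made up for illustration, and aWide is a hypothetical Float64 value that converts to the same Float32 as a but carries one extra low-order bit (2^-40). Nothing here is taken from the pion/opus source.

```go
package main

import (
	"fmt"
	"math"
)

// addFloat32 adds two float32 values; the sum is rounded once, to float32.
func addFloat32(a, b float32) float32 { return a + b }

// addFloat64ThenRound adds in float64 and rounds the sum to float32 afterwards.
func addFloat64ThenRound(a, b float64) float32 { return float32(a + b) }

func main() {
	// The two operands from the question.
	a := math.Float32frombits(955684399)
	b := math.Float32frombits(927295728)

	// Hypothetical wider operand: float32(aWide) == a, but aWide has one
	// extra bit set below float32 precision.
	aWide := float64(a) + math.Ldexp(1, -40)

	// Exact tie, rounded to the even significand.
	fmt.Printf("%b\n", math.Float32bits(addFloat32(a, b)))
	// The float64 sum of two float32 values is still the exact tie.
	fmt.Printf("%b\n", math.Float32bits(addFloat64ThenRound(float64(a), float64(b))))
	// The extra bit pushes the sum just above the tie, so it rounds up.
	fmt.Printf("%b\n", math.Float32bits(addFloat64ThenRound(aWide, float64(b))))
}
```

The first two lines printed end in 110, matching the "outside decoder" output, and the third ends in 111, matching the "inside decoder" output.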
Answer 2
Score: 0
This was happening because of fused multiply-add (FMA): multiple floating-point operations were being combined into one operation, which rounds once instead of after each step.
You can read more about it in the Go Language Spec#Floating_Point_Operators.
The change I made to my code was:
- lpcVal += currentLPCVal * (aQ12 / 4096.0)
+ lpcVal = float32(lpcVal) + float32(currentLPCVal)*float32(aQ12)/float32(4096.0)
Thank you to Bryan C. Mills for answering this on the #performance channel on the Gophers slack.
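To see why an explicit conversion matters, here is a minimal sketch with inputs chosen so that fusing changes the answer. The function names and the input values are assumptions made for this illustration, not code from pion/opus. The Go spec allows the compiler to fuse x*y + acc, and says an explicit floating-point conversion rounds to the target type and so prevents fusion across it. fusedReference emulates a fused result with math.FMA in float64, which is exact for these particular inputs.

```go
package main

import (
	"fmt"
	"math"
)

// mayFuse leaves the compiler free to fuse the multiply and the add into a
// single FMA instruction on architectures where Go does so (arm64, for example).
func mayFuse(x, y, acc float32) float32 { return x*y + acc }

// roundedProduct rounds the product to float32 before the add; the explicit
// conversion prevents the compiler from fusing the two operations.
func roundedProduct(x, y, acc float32) float32 { return float32(x*y) + acc }

// fusedReference emulates a fused multiply-add using math.FMA in float64.
// For the inputs in main the result is exact, so it equals a float32 FMA here.
func fusedReference(x, y, acc float32) float32 {
	return float32(math.FMA(float64(x), float64(y), float64(acc)))
}

func main() {
	x := float32(1.000244140625)   // 1 + 2^-12
	acc := float32(-1.00048828125) // -(1 + 2^-11)

	// x*x is exactly 1 + 2^-11 + 2^-24, a tie that float32 rounds down to 1 + 2^-11.
	fmt.Println(roundedProduct(x, x, acc)) // product rounded first, so the sum is 0
	fmt.Println(fusedReference(x, x, acc)) // single rounding keeps the 2^-24 term
	fmt.Println(mayFuse(x, x, acc))        // either of the above, depending on platform and compiler
}
```

On a build where the compiler does not fuse, mayFuse agrees with roundedProduct; where it does fuse, it can agree with fusedReference instead. That platform dependence is the kind of last-bit difference described in the question.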