Slightly different floating point math results (C to golang)
Question
I am developing a library of technical indicators written directly in golang. It is, among other things, an exercise in learning golang.
I've been validating the results of my algorithms by building test cases from data generated with TA-Lib (or rather, the Ruby wrapper around TA-Lib).
This worked fine until I got to the implementation of Bollinger Bands. My implementation seems to work, but its results differ from TA-Lib's at the 14th-15th decimal place.
I've read https://stackoverflow.com/questions/21872854/floating-point-math-in-different-programming-languages and suspect that this might be the culprit (I am doing the calculations in a slightly different order).
Edited to add:
The question above deals with a very simple manifestation of the floating point problem. It's much harder to confirm that a longer piece of code is actually hitting it.
How can I confirm that the differences are just variations in floating point math caused by the order of operations?
/ End Edit
Am I correct in my understanding?
Here is my implementation:
package ta

import (
	"math"
)

// BollingerBands returns the upper, middle, and lower bands for the
// supplied values, using two standard deviations on each side.
func BollingerBands(values []float64, period int) ([]float64, []float64, []float64) {
	deviationsUp := 2.0
	deviationsDown := 2.0

	middleBand := Sma(values, period)
	offset := len(values) - len(middleBand)
	var upperBand []float64
	var lowerBand []float64
	for idx, v := range middleBand {
		// Window of values covered by this moving-average point.
		backIdx := offset + idx - period + 1
		curIdx := offset + idx + 1
		if backIdx < 0 {
			backIdx = 0
		}
		stdDev := SliceStdDev(values[backIdx:curIdx])
		upperBand = append(upperBand, v+(stdDev*deviationsUp))
		lowerBand = append(lowerBand, v-(stdDev*deviationsDown))
	}
	return upperBand, middleBand, lowerBand
}

// Sma produces the Simple Moving Average for the
// supplied array of float64 values for a given period
func Sma(values []float64, period int) []float64 {
	var result []float64
	for index := range values {
		indexPlusOne := index + 1
		if indexPlusOne >= period {
			// The mean helper defined below is SliceMean.
			avg := SliceMean(values[indexPlusOne-period : indexPlusOne])
			result = append(result, avg)
		}
	}
	return result
}

// SliceMean returns the Mean of the slice of float64
func SliceMean(values []float64) float64 {
	var total float64
	for _, element := range values {
		total += element
	}
	return total / float64(len(values))
}

// SliceVariance returns the variance of the slice of float64.
func SliceVariance(values []float64) float64 {
	if len(values) == 0 {
		return 0.0
	}
	m := SliceMean(values)
	var sum float64
	for _, v := range values {
		d := v - m
		sum += d * d
	}
	return sum / float64(len(values))
}

// SliceStdDev returns the standard deviation of the slice of float64.
func SliceStdDev(values []float64) float64 {
	return math.Sqrt(SliceVariance(values))
}
This produces the following values for the upper band (<[]float64 | len:6, cap:8>): [94.92564730599291, 94.50588827974477, 92.12752961253167, 101.58367006802706, 114.64331379078675, 120.58088881180322]
Using Ruby:
require 'indicator/mixin'
x = [26.0, 54.0, 8.0, 77.0, 61.0, 39.0, 44.0, 91.0, 98.0, 17.0]
y = x.indicator(:bbands_5)
# {:out_real_upper_band=>[94.9256473059929, 94.50588827974477, 92.12752961253167, 101.58367006802709, 114.64331379078678, 120.58088881180323, nil, nil, nil, nil] <SNIP>}
Answer 1
Score: 2
I think the algorithms are different. For example, the variance:
/* Do the MA calculation using tight loops. */
/* Add-up the initial periods, except for the last value. */
periodTotal1 = 0;
periodTotal2 = 0;
trailingIdx = startIdx-nbInitialElementNeeded;

i=trailingIdx;
if( optInTimePeriod > 1 )
{
   while( i < startIdx ) {
      tempReal = inReal[i++];
      periodTotal1 += tempReal;
      tempReal *= tempReal;
      periodTotal2 += tempReal;
   }
}

/* Proceed with the calculation for the requested range.
 * Note that this algorithm allows the inReal and
 * outReal to be the same buffer.
 */
outIdx = 0;
do
{
   tempReal = inReal[i++];

   /* Square and add all the deviation over
    * the same periods.
    */
   periodTotal1 += tempReal;
   tempReal *= tempReal;
   periodTotal2 += tempReal;

   /* Square and add all the deviation over
    * the same period.
    */
   meanValue1 = periodTotal1 / optInTimePeriod;
   meanValue2 = periodTotal2 / optInTimePeriod;

   tempReal = inReal[trailingIdx++];
   periodTotal1 -= tempReal;
   tempReal *= tempReal;
   periodTotal2 -= tempReal;

   outReal[outIdx++] = meanValue2-meanValue1*meanValue1;
} while( i <= endIdx );
That doesn't look like your variance calculation. If you were to reproduce the algorithm so that it performed exactly the same operations in the same order, the Go version should produce the same result. Go is just doing standard IEEE 754 floating point arithmetic.
As to the question "does order matter?": it definitely does. Since floating point arithmetic is inexact, you lose information as you do the calculations. Most of the time it doesn't make much of a difference, but sometimes algorithms can be very susceptible to these changes. (So rearranging your formula algebraically may not lead to the same answer in real code.)
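A minimal Go sketch of the effect, using a well-known example rather than anything from TA-Lib: adding the same three float64 values in a different grouping gives results that differ in the last digit. The function names `sumLeft` and `sumRight` are made up for this illustration.

```go
package main

import "fmt"

// sumLeft adds left to right: (a + b) + c.
func sumLeft(a, b, c float64) float64 { return (a + b) + c }

// sumRight groups the last two operands first: a + (b + c).
func sumRight(a, b, c float64) float64 { return a + (b + c) }

func main() {
	// Variables (not constants) so the arithmetic is done in float64.
	a, b, c := 0.1, 0.2, 0.3

	fmt.Println(sumLeft(a, b, c))                      // 0.6000000000000001
	fmt.Println(sumRight(a, b, c))                     // 0.6
	fmt.Println(sumLeft(a, b, c) == sumRight(a, b, c)) // false
}
```

The two results are mathematically identical but differ by one unit in the last place, which is the same order of magnitude as the differences seen in the Bollinger Bands output.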
In libraries like these you often find that the algorithms have been designed to account for these issues, so they often don't look like the naive implementation. For example, mean is usually a trivial function, but here's how it's calculated in the GSL:
double
FUNCTION (gsl_stats, mean) (const BASE data[], const size_t stride, const size_t size)
{
  /* Compute the arithmetic mean of a dataset using the recurrence relation
     mean_(n) = mean(n-1) + (data[n] - mean(n-1))/(n+1)   */

  long double mean = 0;
  size_t i;

  for (i = 0; i < size; i++)
    {
      mean += (data[i * stride] - mean) / (i + 1);
    }

  return mean;
}
So unless you match the algorithms exactly, your answers will be subtly different (which doesn't necessarily mean your program is wrong).
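To see this with the mean, here is a sketch that puts a naive sum-then-divide mean next to a Go rendering of the recurrence used in the GSL snippet. Both function names are hypothetical, and the port is not exact: Go has no `long double`, so the accumulator here is a plain float64, whereas the GSL version may carry extra precision.

```go
package main

import "fmt"

// naiveMean sums all values and divides once at the end.
func naiveMean(data []float64) float64 {
	var total float64
	for _, v := range data {
		total += v
	}
	return total / float64(len(data))
}

// recurrenceMean follows the recurrence from the GSL snippet:
// mean(n) = mean(n-1) + (data[n] - mean(n-1)) / (n+1).
// Assumption: float64 accumulator instead of GSL's long double.
func recurrenceMean(data []float64) float64 {
	var mean float64
	for i, v := range data {
		mean += (v - mean) / float64(i+1)
	}
	return mean
}

func main() {
	x := []float64{26.0, 54.0, 8.0, 77.0, 61.0, 39.0, 44.0, 91.0, 98.0, 17.0}
	a, b := naiveMean(x), recurrenceMean(x)
	fmt.Println(a, b, a == b)
}
```

Both functions compute the same mathematical quantity, but they round at different points, so the last digits are not guaranteed to agree.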
One solution often used for this is to do equality comparisons within a very small tolerance (math.Abs(expected-result) < ɛ, where you define ɛ yourself, e.g. const ɛ = 0.0000001) rather than using ==.
Answer 2
Score: 0
As suggested by comments/answers from Caleb and Matteo, even subtle differences in the way the code is ordered result in differences in the floating point values.
I've ended up confirming, at least with a small sample size, that implementing the code exactly like TA-Lib results in the correct floating point values. As expected, deviating even slightly from the TA-Lib (C) implementation results in tiny differences in the floating point values.