When is it appropriate to use Python's fmean instead of mean?

Question

Python 3.8 added fmean to the statistics module, supplementing the existing mean function in the same module. According to the docs:

> Convert data to floats and compute the arithmetic mean. This runs faster than the mean() function and it always returns a float.

Source: https://docs.python.org/3/library/statistics.html#statistics.fmean

However, the docs don't really discuss any trade-offs. My question is: when can I use fmean instead of mean, and when should I stick with mean?

My specific example is averaging error probabilities, derived from Phred scores, in FASTQ reads. Example:

from statistics import mean

def decode(c):
    return ord(c) - 33

def phred_to_probability(phred_score):
    return 10**(-phred_score/10)

def raw_q_to_probability(q):
    return phred_to_probability(decode(q))

qualities = list("3===RONT{{QKLIFGHEH=::::CAAA@BBA@CCC....00002::;IBCHKJIIHHHEGGGHHGIJGFFFFMKKPILMLGGGGGIMNMEB@CBCDEKNMQQSJJMT{UKOKLLEEDEELGKIJKPEBA==>??@@@HD@?@AH?>?>?IIKIPFFEEFFKIEDDCEHFFHIHIKMLPOHQOH")


mean(raw_q_to_probability(q) for q in qualities)

Should I expect any difference in accuracy between the two functions for this application? I notice that fmean reports one additional digit when I run the example above, but the result is otherwise the same.

Answer 1

Score: 1

From higher up in the statistics docs:

> Unless explicitly noted, these functions support int, float, Decimal and Fraction.

statistics.mean can take a sequence of Fractions and give you the mean as a Fraction, or take a list of Decimal instances and give you a Decimal. statistics.fmean cannot do that.
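
A minimal sketch of that type behaviour, using made-up sample values that are not from the question: mean keeps Fraction and Decimal inputs in their own type, while fmean converts everything to float.

```python
from decimal import Decimal
from fractions import Fraction
from statistics import fmean, mean

# Illustrative values only; they are not taken from the question.
fractions = [Fraction(1, 3), Fraction(1, 6), Fraction(1, 2)]
decimals = [Decimal("0.1"), Decimal("0.2"), Decimal("0.3")]

frac_mean = mean(fractions)    # exact result, stays a Fraction
dec_mean = mean(decimals)      # stays a Decimal
float_mean = fmean(fractions)  # inputs are converted, result is a float

print(type(frac_mean).__name__, frac_mean)
print(type(dec_mean).__name__, dec_mean)
print(type(float_mean).__name__, float_mean)
```

If the exact rational or decimal result matters downstream, that alone is a reason to stay with mean.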

Also, even when the inputs are already floats, statistics.mean may avoid a very slight bit of rounding error, as it does all computations in exact arithmetic until the final conversion to the result type. statistics.fmean uses math.fsum to sum the inputs with as much precision as float will allow, but the result of fsum is still a float, so that's one rounding that statistics.mean avoids.
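
To see how small that difference is for Phred-derived probabilities, here is a rough sketch. It assumes a shortened, hypothetical quality string (a prefix of the one in the question) and uses an exact Fraction sum as the reference; the variable names are my own.

```python
from fractions import Fraction
from statistics import fmean, mean

def phred_to_probability(phred_score):
    return 10 ** (-phred_score / 10)

# Hypothetical short Phred+33 quality string, much shorter than a real read.
qualities = "3===RONT{{QKLIFGHEH=::::CAAA@BBA@CCC"
probabilities = [phred_to_probability(ord(c) - 33) for c in qualities]

# Reference: exact rational mean of the float inputs, rounded to float only once.
exact = sum(Fraction(p) for p in probabilities) / len(probabilities)

via_mean = mean(probabilities)
via_fmean = fmean(probabilities)

print(via_mean, via_fmean)
print("mean equals the correctly rounded exact mean:", via_mean == float(exact))
print("relative difference:", abs(via_mean - via_fmean) / via_mean)
```

Any disagreement between the two should be confined to the last bit or so of the float, which is far below the precision that Phred scores carry in the first place.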

Finally, statistics.fmean supports weights (added in Python 3.11). statistics.mean does not.
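
As a sketch of the weights parameter, assuming Python 3.11 or later for the fmean call: the per-read error means and read lengths below are invented for illustration, and the fallback branch is just one way to get the same number on older versions.

```python
import sys
from statistics import fmean

# Illustrative data: per-read mean error probabilities, weighted by read length.
read_error_means = [0.010, 0.002, 0.030]
read_lengths = [150, 250, 100]

if sys.version_info >= (3, 11):
    weighted = fmean(read_error_means, weights=read_lengths)
else:
    # Fallback for older versions, where fmean has no weights parameter.
    weighted = sum(p * n for p, n in zip(read_error_means, read_lengths)) / sum(read_lengths)

print(weighted)
```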

huangapple
  • Posted on 2023-05-07 05:13:03
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76191162.html