When is it appropriate to use Python's fmean instead of mean?
Question
Python 3.8 introduced fmean as part of the statistics module, supplementing the existing mean function in the same module. As per the docs:
>Convert data to floats and compute the arithmetic mean. This runs faster than the mean() function and it always returns a float.
Source: https://docs.python.org/3/library/statistics.html#statistics.fmean
However, the docs don't really discuss any trade-offs. My question is: when can I use fmean over mean, and when should I stick with mean?
My specific example is averaging error probabilities, derived from Phred scores, in FASTQ reads. Example:
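from statistics import mean

def decode(c):
    # Convert a quality character (ASCII offset 33) to its integer Phred score.
    return ord(c) - 33

def phred_to_probability(phred_score):
    # A Phred score Q corresponds to an error probability of 10^(-Q/10).
    return 10**(-phred_score/10)

def raw_q_to_probability(q):
    return phred_to_probability(decode(q))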
qualities = list("3===RONT{{QKLIFGHEH=::::CAAA@BBA@CCC....00002::;IBCHKJIIHHHEGGGHHGIJGFFFFMKKPILMLGGGGGIMNMEB@CBCDEKNMQQSJJMT{UKOKLLEEDEELGKIJKPEBA==>??@@@HD@?@AH?>?>?IIKIPFFEEFFKIEDDCEHFFHIHIKMLPOHQOH")
mean(raw_q_to_probability(q) for q in qualities)
Should I expect any difference in accuracy between the two functions for this application? I note that the fmean result shows one additional digit when I run the above example, but is otherwise the same.
Answer 1
Score: 1
From higher up in the statistics docs:
> Unless explicitly noted, these functions support int, float, Decimal and Fraction.
statistics.mean can take a sequence of Fractions and give you the mean as a Fraction, or take a list of Decimal instances and give you a Decimal. statistics.fmean cannot do that: it always converts to float.
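As a small illustration of the type difference (the sample values are made up for this sketch and are not from the question), the following compares what each function returns for Fraction and Decimal inputs:

```python
from decimal import Decimal
from fractions import Fraction
from statistics import fmean, mean

# Hypothetical sample data, chosen only to make the results easy to read.
fraction_data = [Fraction(1, 3), Fraction(1, 6)]
decimal_data = [Decimal("0.1"), Decimal("0.2")]

exact_fraction_mean = mean(fraction_data)   # result stays a Fraction
exact_decimal_mean = mean(decimal_data)     # result stays a Decimal
float_fraction_mean = fmean(fraction_data)  # inputs are converted to float
float_decimal_mean = fmean(decimal_data)    # inputs are converted to float

print(type(exact_fraction_mean).__name__, exact_fraction_mean)
print(type(exact_decimal_mean).__name__, exact_decimal_mean)
print(type(float_fraction_mean).__name__, float_fraction_mean)
print(type(float_decimal_mean).__name__, float_decimal_mean)
```

If the inputs are plain floats to begin with, as in the question, this distinction does not apply and both functions return a float.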
Also, even when the inputs are already floats, statistics.mean may avoid a very slight rounding error, because it does all computations in exact arithmetic until the final conversion to the result type. statistics.fmean uses math.fsum to sum the inputs with as much precision as float will allow, but the result of fsum is still a float, so that is one rounding that statistics.mean avoids.
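One way to check this for a given data set is to build an exact reference with Fraction and compare both functions against it. The sketch below uses error probabilities for Phred scores 0 to 59 as illustrative input (an assumption for this example, not the question's actual read). It does not claim that fmean will differ for this data, only that any difference should be on the order of the last bit or two of the float.

```python
from fractions import Fraction
from math import isclose
from statistics import fmean, mean

# Illustrative data: error probabilities for Phred scores 0..59.
probabilities = [10 ** (-q / 10) for q in range(60)]

# Exact reference: every float converts to a Fraction without loss,
# so the sum and the division below are exact.
exact = sum(Fraction(p) for p in probabilities) / len(probabilities)
reference = float(exact)  # a single rounding, at the very end

via_mean = mean(probabilities)
via_fmean = fmean(probabilities)

print("mean matches exact reference:", via_mean == reference)
print("fmean absolute difference:   ", abs(via_fmean - reference))
```

For averaging error probabilities, a discrepancy of this size is far below the precision of the Phred scores themselves, so the faster fmean is a reasonable choice.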
Finally, statistics.fmean supports weights (added in Python 3.11). statistics.mean does not.
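A minimal sketch of the weighted form, using made-up grades and weights. The weights argument needs Python 3.11 or later, so the example falls back to an equivalent manual computation with math.fsum on older versions (the fallback is this sketch's own workaround, not part of the statistics module):

```python
import sys
from math import fsum
from statistics import fmean

# Hypothetical data: four grades and the share each contributes to the total.
grades = [85, 92, 83, 91]
weights = [0.20, 0.20, 0.30, 0.30]

if sys.version_info >= (3, 11):
    # fmean accepts an optional second argument with one weight per data point.
    weighted = fmean(grades, weights)
else:
    # Manual weighted mean for interpreters that predate the weights parameter.
    weighted = fsum(g * w for g, w in zip(grades, weights)) / fsum(weights)

print(weighted)
```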