May 7, 2023 05:13:03

When is it appropriate to use Python's fmean instead of mean?

Question


Python 3.8 added fmean to the statistics module, supplementing the existing mean function in the same module. As per the docs:

>Convert data to floats and compute the arithmetic mean. This runs faster than the mean() function and it always returns a float.

Source: https://docs.python.org/3/library/statistics.html#statistics.fmean

However, the docs don't really discuss any trade-offs. My question is when can I use fmean over mean, and when should I stick with mean?

My specific example is averaging error probabilities, derived from Phred scores, in FASTQ reads. Example:

from statistics import mean

def decode(c):
    return ord(c) - 33

def phred_to_probability(phred_score):
    return 10**(-phred_score/10)

def raw_q_to_probability(q):
    return phred_to_probability(decode(q))

qualities = list("3===RONT{{QKLIFGHEH=::::CAAA@BBA@CCC....00002::;IBCHKJIIHHHEGGGHHGIJGFFFFMKKPILMLGGGGGIMNMEB@CBCDEKNMQQSJJMT{UKOKLLEEDEELGKIJKPEBA==>??@@@HD@?@AH?>?>?IIKIPFFEEFFKIEDDCEHFFHIHIKMLPOHQOH")


mean(raw_q_to_probability(q) for q in qualities)

Should I expect any difference in accuracy between the two functions for this application? I note the fmean solution has an additional digit reported when executing the above example but is otherwise the same.

Answer 1

Score: 1

From higher up in the statistics docs:

> Unless explicitly noted, these functions support int, float, Decimal and Fraction.

statistics.mean can take a sequence of Fractions and give you the mean as a Fraction, or take a list of Decimal instances and give you a Decimal. statistics.fmean cannot do that.
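As a small illustration of that type behaviour, the sketch below feeds the same Fraction and Decimal inputs to both functions. The sample values and variable names are made up for the example.

```python
from decimal import Decimal
from fractions import Fraction
from statistics import fmean, mean

# Illustrative inputs; any Fraction or Decimal values would do.
fractions = [Fraction(1, 3), Fraction(1, 6), Fraction(1, 2)]
decimals = [Decimal("0.1"), Decimal("0.2"), Decimal("0.3")]

# mean() keeps the input type in its result.
exact_fraction = mean(fractions)
exact_decimal = mean(decimals)

# fmean() converts everything to float first, so the result is a float.
float_from_fractions = fmean(fractions)
float_from_decimals = fmean(decimals)

for name, value in [
    ("mean(fractions)", exact_fraction),
    ("mean(decimals)", exact_decimal),
    ("fmean(fractions)", float_from_fractions),
    ("fmean(decimals)", float_from_decimals),
]:
    print(f"{name}: {type(value).__name__} {value!r}")
```

If the exact rational or decimal result matters downstream, mean is the one that preserves it.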

Also, even when the inputs are already floats, statistics.mean may avoid a very slight bit of rounding error, as it does all computations in exact arithmetic until the final conversion to the result type. statistics.fmean uses math.fsum to sum the inputs with as much precision as float will allow, but the result of fsum is still a float, so that's one rounding that statistics.mean avoids.
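The two strategies described here can be sketched roughly as follows. This is not the library source: exact_then_round and fsum_then_divide are hypothetical helper names, and the quality string is just a short prefix of the one in the question. For float inputs, any difference between the two should be confined to the last place or so of the result.

```python
from fractions import Fraction
from math import fsum
from statistics import fmean, mean


def phred_to_probability(phred_score):
    return 10 ** (-phred_score / 10)


# Short illustrative quality string (a prefix of the one in the question).
qualities = "3===RONT{{QKLIFGHEH"
probabilities = [phred_to_probability(ord(c) - 33) for c in qualities]


def exact_then_round(data):
    """Sum as exact rationals, divide, and round to float once at the end."""
    total = sum(Fraction(x) for x in data)  # every float is an exact rational
    return float(total / len(data))


def fsum_then_divide(data):
    """Round the sum to the nearest float, then divide (a second rounding)."""
    return fsum(data) / len(data)


a = exact_then_round(probabilities)
b = fsum_then_divide(probabilities)
print(a, b, abs(a - b))

# The library functions, for comparison.
print(mean(probabilities), fmean(probabilities))
```

For averaging error probabilities like these, a discrepancy of that size is far below anything a Phred score can resolve, so the faster fmean looks like a reasonable choice.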

Finally, statistics.fmean supports weights through an optional weights argument (added in Python 3.11). statistics.mean does not.
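A minimal sketch of a weighted average, assuming the weights argument is only available on Python 3.11 and later. The weighted_mean wrapper, its fallback branch, and the sample numbers are illustrative and not part of the statistics module.

```python
import sys
from math import fsum
from statistics import fmean


def weighted_mean(values, weights):
    """Weighted arithmetic mean; uses fmean's weights argument when available."""
    values, weights = list(values), list(weights)
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if sys.version_info >= (3, 11):
        return fmean(values, weights=weights)
    # Fallback for older interpreters: the same formula written out by hand.
    return fsum(v * w for v, w in zip(values, weights)) / fsum(weights)


# Illustrative data: four scores with weights that sum to 1.
scores = [85, 92, 83, 91]
weights = [0.20, 0.20, 0.30, 0.30]
result = weighted_mean(scores, weights)
print(result)
```

With mean there is no equivalent argument, so a weighted average over Fraction or Decimal inputs would have to be written by hand.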
