Saving pandas DataFrame to CSV changes values

Question

I want to save a bunch of values in a DataFrame to CSV, but I keep running into the problem that some values change while saving. Here is a minimal working example (MWE):

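import pandas as pd

df = pd.DataFrame({
  "value1": [110.589, 222.534, 390.123],
  "value2": [50.111, 40.086, 45.334]
})

df = df.round(1)  # round() returns a new DataFrame, so assign it back
# checkpoint
df.to_csv(some_path)  # some_path: placeholder for the output file path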

If I debug it and look at the values of df at the step I marked "checkpoint", i.e. after rounding, they look like

[110.6000, 222.5000, 390.1000],
[50.1000, 40.1000, 45.3000]

In reality, my DataFrame is much larger, and when I open the CSV after saving, some values (usually in a random block of a couple of dozen rows) have changed! They then look like

[110.600000000001, 222.499999999999, 390.099999999999],
[50.099999999999, 40.100000000001, 45.300000000001]

So it's always an offset of 0.000000000001 from the "real"/rounded values. Does anybody know what's going on here and how I can avoid it?

Answer 1

Score: 2

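This is a typical floating-point problem: most decimal fractions, such as 110.6, have no exact binary representation, so the stored value can differ from the rounded value in the last digits. pandas gives you the option to define a float_format when writing: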

df.to_csv(some_path, float_format='%.4f')

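This writes every float with exactly 4 decimal places (the value is rounded to 4 places, not truncated). Note that the floats are converted to strings before they are written, so if you set quoting for non-numeric values (e.g. quoting=csv.QUOTE_NONNUMERIC), these columns will be quoted as well.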
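
As a rough illustration of the points above, here is a minimal sketch that writes a small DataFrame to an in-memory CSV string instead of a file. The sample values are taken from the question's "changed" output, and the names plain, formatted and quoted are made up for this example. It assumes one decimal place is wanted, so it uses '%.1f' rather than '%.4f'.

```python
import csv

import pandas as pd

# Values as they might be stored internally after rounding
# (taken from the question's CSV output, for illustration only).
df = pd.DataFrame({
    "value1": [110.600000000001, 222.499999999999, 390.099999999999],
    "value2": [50.099999999999, 40.100000000001, 45.300000000001],
})

# With no path given, to_csv returns the CSV text as a string.
plain = df.to_csv(index=False)

# float_format is applied to every float column when writing.
formatted = df.to_csv(index=False, float_format="%.1f")

# With float_format set, floats are converted to strings first,
# so QUOTE_NONNUMERIC puts quotes around them too.
quoted = df.to_csv(index=False, float_format="%.1f",
                   quoting=csv.QUOTE_NONNUMERIC)

# The %-style format rounds rather than truncates.
rounded_text = "%.4f" % 1.23456

print(plain)
print(formatted)
print(quoted)
print(rounded_text)
```

If the quoting side effect is unwanted, one option is to leave quoting at its default, since the formatted numbers contain no delimiter characters and are then written unquoted.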

Posted by huangapple on 2023-02-20 00:57:18
Source: https://go.coder-hub.com/75501822.html