Saving pandas DataFrame to CSV changes values
Question
I want to save a bunch of values from a DataFrame to a CSV file, but I keep running into the problem that something changes the values while saving. Let's have a look at the MWE:
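import pandas as pd

df = pd.DataFrame({
    "value1": [110.589, 222.534, 390.123],
    "value2": [50.111, 40.086, 45.334],
})
df = df.round(1)  # round() returns a new DataFrame, so reassign it
# checkpoint
df.to_csv(some_path)  # some_path: placeholder for the output file path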
If I debug it and look at the values of df at the step I marked "checkpoint", i.e. after rounding, they look like this:
[110.6000, 222.5000, 390.1000],
[50.1000, 40.1000, 45.3000]
In reality, my data frame is much larger and when I open the csv after saving, some values (usually in a random block of a couple of dozen rows) have changed! They then look like
[110.600000000001, 222.499999999999, 390.099999999999],
[50.099999999999, 40.100000000001, 45.300000000001]
So it's always a 0.000000000001 offset from the "real"/rounded values. Does anybody know what's going on here/how I can avoid this?
Answer 1
Score: 2
This is a typical floating-point problem. pandas gives you the option to define a float_format:
df.to_csv(some_path, float_format='%.4f')
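This forces every float to be written with exactly 4 decimal places (the value is rounded to 4 decimals, not cut off). Note that the floats are converted to strings before writing, so if you set quoting for non-numeric values (e.g. quoting=csv.QUOTE_NONNUMERIC), these columns are quoted as well.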
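As a rough illustration of both points, here is a minimal sketch that reuses the values from the question. It assumes one decimal place is wanted (hence '%.1f' instead of '%.4f') and writes to an in-memory buffer rather than to some_path, purely so the result can be inspected; index=False is also an assumption, made to keep the output short.

```python
import io

import pandas as pd

# Binary floats cannot represent most decimal fractions exactly,
# which is where tiny offsets like 0.000000000001 come from.
sum_is_exact = (0.1 + 0.2 == 0.3)
print(sum_is_exact)  # False

df = pd.DataFrame({
    "value1": [110.589, 222.534, 390.123],
    "value2": [50.111, 40.086, 45.334],
}).round(1)

# Write to an in-memory buffer (stand-in for a real file path).
buffer = io.StringIO()
df.to_csv(buffer, index=False, float_format="%.1f")

# Every float is formatted with exactly one decimal place.
csv_lines = buffer.getvalue().splitlines()
for line in csv_lines:
    print(line)
```

With float_format set, the text written to the file no longer depends on how the rounded value happens to be stored internally, so the stray trailing digits should not appear.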