In Python, how do I read and write the actual word "None" (not the keyword) between a .csv file and a DataFrame?
Question
I have a .csv file that contains the literal word "None" as a field value. When I read it into a DataFrame, pandas treats "None" as a missing value and inserts <NA>. When I later write the DataFrame back to a .csv, every place where "None" appeared is replaced with an empty field (,,).
testdata.csv:
Membership number,Last name,Date of birth,Status,Color
240200,,Wilson,None,Red
import pandas as pd

filename = "testdata.csv"
data_file = "testoutput.csv"

with open(filename, 'r', newline=''):
    # Read data into a DataFrame
    user_df = pd.read_csv(filename)

user_df.to_csv(data_file, index=False)
testoutput.csv:
Membership number,Last name,Date of birth,Status,Color
240200,,Wilson,,Red
I want testoutput.csv to be the same as the testdata.csv.
Answer 1
Score: 0
I've removed the extraneous file open, and as I expected, I cannot duplicate the issue. Here is a session demonstrating that the output equals the input.
Source:
import pandas as pd
user_df = pd.read_csv('x.csv')
print(user_df)
user_df.to_csv('x1.csv', index=False)
Output:
timr@Tims-NUC:~/src$ cat x.csv
Membership number,Last name,Date of birth,Status,Color
240200,,Wilson,None,Red
timr@Tims-NUC:~/src$ python x.py
   Membership number  Last name  Date of birth  Status  Color
0             240200        NaN         Wilson    None    Red
timr@Tims-NUC:~/src$ cat x1.csv
Membership number,Last name,Date of birth,Status,Color
240200,,Wilson,None,Red
Follow-up:
This appears to be version-related. read_csv has a default list of strings that it interprets as NaN, and (at least in 2.0) "None" is on that list. The na_values parameter adds strings to that list, and keep_default_na controls whether the defaults are used at all.
So, the two solutions are: set keep_default_na=False on its own, which stops all NaN interpretation, or combine keep_default_na=False with a shorter na_values list that names only the strings you still want treated as NaN.
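As a minimal sketch of the keep_default_na approach, the snippet below reads the sample row with the default NA strings turned off and only the empty string still treated as missing. The names CSV_TEXT and read_keeping_none are made up for illustration, and io.StringIO stands in for testdata.csv so the example is self-contained; with a real file you would pass the filename instead.

```python
import io

import pandas as pd

# Same content as testdata.csv from the question (illustrative stand-in).
CSV_TEXT = (
    "Membership number,Last name,Date of birth,Status,Color\n"
    "240200,,Wilson,None,Red\n"
)


def read_keeping_none(source):
    """Read a CSV so that the literal text "None" stays a string.

    keep_default_na=False disables pandas' built-in list of NA strings;
    na_values=[""] then re-enables only the empty field as missing.
    """
    return pd.read_csv(source, keep_default_na=False, na_values=[""])


df = read_keeping_none(io.StringIO(CSV_TEXT))
print(df)
print(df.to_csv(index=False))
```

If empty fields should also stay as plain empty strings, dropping the na_values argument and passing only keep_default_na=False should be enough.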