In Python, how do I read and write the actual word "None" (not the Keyword) between a .csv file and DataFrame?

Question

I have a .csv that has the actual word "None" as a value in a field. When I read it into a DataFrame, the df reads "None" as the keyword and inserts <NA>. Later, when I rewrite the df to the .csv, all of the places where "None" was are replaced with blanks (,,) in the .csv.

testdata.csv:

Membership number,Last name,Date of birth,Status,Color
240200,,Wilson,None,Red

import pandas as pd

filename = "testdata.csv"
data_file = "testoutput.csv"

with open(filename, 'r', newline=''):
    # Read data into a DataFrame
    user_df = pd.read_csv(filename)
    
user_df.to_csv(data_file, index=False)

testoutput.csv:

Membership number,Last name,Date of birth,Status,Color
240200,,Wilson,,Red

I want testoutput.csv to be the same as the testdata.csv.
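The blanks on the write side can be illustrated without any files. This is a minimal sketch, assuming a DataFrame built by hand to mimic what `read_csv` returned here (the "None" status and the genuinely empty last name both already missing). `to_csv` writes every missing value using its `na_rep` argument, which defaults to an empty string, so `na_rep` alone cannot restore the original file.

```python
import pandas as pd

# Hand-built stand-in for what read_csv produced: Status ("None" in the file)
# and Last name (empty in the file) are both missing values now.
df = pd.DataFrame(
    {
        "Membership number": [240200],
        "Last name": [None],
        "Date of birth": ["Wilson"],
        "Status": [None],
        "Color": ["Red"],
    }
)

# Default na_rep is "", so every missing value becomes a blank field.
default_out = df.to_csv(index=False)

# na_rep="None" restores Status but also writes "None" into Last name,
# because to_csv cannot tell the two missing values apart.
na_rep_out = df.to_csv(index=False, na_rep="None")

print(default_out.splitlines()[1])  # 240200,,Wilson,,Red
print(na_rep_out.splitlines()[1])   # 240200,None,Wilson,None,Red
```

Once "None" has been turned into a missing value on the read side, the distinction is lost, so the fix has to happen in `read_csv`.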

Answer 1

Score: 0

I've removed the extraneous file open, and as I expected, I cannot duplicate the issue. Here is a session demonstrating that the output equals the input.

Source:

import pandas as pd

user_df = pd.read_csv('x.csv')
print(user_df)
user_df.to_csv('x1.csv', index=False)

Output:

timr@Tims-NUC:~/src$ cat x.csv
Membership number,Last name,Date of birth,Status,Color
240200,,Wilson,None,Red
timr@Tims-NUC:~/src$ python x.py
   Membership number  Last name Date of birth Status Color
0             240200        NaN        Wilson   None   Red
timr@Tims-NUC:~/src$ cat x1.csv
Membership number,Last name,Date of birth,Status,Color
240200,,Wilson,None,Red

Follow-up

This appears to be version-related. The read_csv function has a built-in list of strings that it interprets as NaN (the na_values parameter adds extra strings to it), and (at least in 2.0) "None" is on that list.
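To check which behavior an installed pandas has, a one-line CSV can be parsed directly. This is a sketch; `none_is_default_na` is a hypothetical helper name, not part of pandas. On the pandas 2.x releases the answer refers to, it should return True.

```python
import io

import pandas as pd


def none_is_default_na() -> bool:
    """Hypothetical helper: True if the bare string "None" is read as missing."""
    # One column, one data row holding the literal text None.
    df = pd.read_csv(io.StringIO("Status\nNone\n"))
    return bool(df["Status"].isna().iloc[0])


if __name__ == "__main__":
    print(f"pandas {pd.__version__}: 'None' parsed as NaN -> {none_is_default_na()}")
```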

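So, the two solutions are: set keep_default_na=False and pass a shorter list to na_values, or set keep_default_na=False on its own to stop all NaN interpretation. Passing na_values without keep_default_na=False is not enough, because those strings are added to the built-in list rather than replacing it.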
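Both options can be sketched against an in-memory copy of testdata.csv (`CSV_TEXT` is an illustrative name, assumed here in place of the real file). In the first, only empty fields count as missing; in the second, nothing does, so the empty last name comes back as an empty string instead of NaN. Either way the written CSV matches the input.

```python
import io

import pandas as pd

# In-memory copy of testdata.csv (illustrative stand-in for the file on disk).
CSV_TEXT = (
    "Membership number,Last name,Date of birth,Status,Color\n"
    "240200,,Wilson,None,Red\n"
)

# Option 1: drop the built-in NA list and supply a shorter one.
# Only empty fields become NaN, so "None" stays a plain string.
df_short_list = pd.read_csv(io.StringIO(CSV_TEXT), keep_default_na=False, na_values=[""])

# Option 2: drop NA detection entirely; empty fields are read as "".
df_no_na = pd.read_csv(io.StringIO(CSV_TEXT), keep_default_na=False)

for label, df in [("short na_values list", df_short_list), ("no NA parsing", df_no_na)]:
    # With no path argument, to_csv returns the CSV text as a string.
    out = df.to_csv(index=False)
    print(label, "->", out.splitlines()[1])  # 240200,,Wilson,None,Red
```

Option 1 keeps real gaps as NaN for later processing; option 2 is simpler but leaves every column free of NaN, so code that checks isna() for blanks would need to check for "" instead.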

huangapple
  • Posted on June 12, 2023 05:13:35
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76452541.html