Pandas数据框架:根据索引和条件替换列中的值。

huangapple go评论79阅读模式
英文:

Pandas dataframe: Replace values of columns addressed by index and based on condition

问题

我需要仅在特定列(无名称,因为我有一个没有标题的csv)中替换值,基于某些条件。具体来说,我需要用\N替换""。举例来说,假设我需要更改第8和第9列,csv的以下行:

964,64448,Alen,,2,1998,A45,,,,(Italy),e02d7543d85d91a772dc9f1cac542751

应该变成:

964,64448,Alen,,2,1998,A45,\N,\N,,(Italy),e02d7543d85d91a772dc9f1cac542751

我不能用SED做这个,我必须使用Python

所以我正在加载csv:

df = pd.read_csv(filename, quotechar='"', escapechar='\\"', dtype=str, header=None)

并且,假设columns是我必须更改的列的索引列表,我会这样做:

columns = [8, 9]
df.iloc[:, columns] = np.where(
    df.iloc[:, columns] == '""', '\\N', df.iloc[:, columns]
)
df.to_csv(...)

这种方法不会抛出任何错误,但在输出文件中根本不起作用,不会更改任何内容。我认为这是因为iloc返回的是视图而不是df的副本,但不能确定。我尝试过df.iloc[:, columns].replace('""', '\\N', inplace=True),但结果相同,可能是因为操作必须在同一个df.iloc[...]对象上执行。

我该如何完成这个任务?

英文:

I need to replace values only in certain columns (unnamed, as I have a csv without header) based on some condition.
In particular, I need to replace "" with \N.
For instance, suppose I need to change columns 8 and 9, the following row of the csv:

964,64448,Alen,,2,1998,A45,,,,(Italy),e02d7543d85d91a772dc9f1cac542751

Should become:

964,64448,Alen,,2,1998,A45,\N,\N,,(Italy),e02d7543d85d91a772dc9f1cac542751

I CANNOT DO THIS WITH SED, I must use python

So I am loading the csv:

df = pd.read_csv(filename, quotechar='"', escapechar="\\", dtype=str, header=None)

and, suppose columns is the list of indices of the columns I must change, I would do the following:

columns = [8, 9]
df.iloc[:, columns] = np.where(
    df.iloc[:, columns] == "", "\\N", df.iloc[:, columns]
)
df.to_csv(...)

This approach does not throw any error but simply does not work and changes nothing in the output file. I think because iloc returns a view and not a copy of the df, but not sure.
I have tried with df.iloc[:, columns].replace("", "\\N", inplace=True) but the result is the same, probably because the operation must be done on the same df.iloc[...] object.

How can I get this done?

答案1

得分: 1

Pandas读取你的CSV文件时似乎将其视为具有标题但没有数据。至少需要将header设置为None:

df = pd.read_csv('data', header=None, dtype=str)

然后,你可以通过它们的索引/默认名称(从0开始)来替换列,并写出输出:

df[7].fillna('\\N', inplace=True)
df[8].fillna('\\N', inplace=True)

# 去除任何其他的'NaN',因为它们被读取为空白
df.fillna('')

# 写出输出,同时去掉标题:
df.to_csv('out', header=None)
# 输出:
# 0,964,64448,Alen,,2,1998,A45,\N,\N,,(Italy),e02d7543d85d91a772dc9f1cac542751

在这里的概念相同,但现在不会将NaN插入到空白位置:

df = pd.read_csv('data', header=None, keep_default_na=False)

df[7].replace('', '\\N', inplace=True)
df[8].replace('', '\\N', inplace=True)

df.to_csv('out', header=None)
英文:

Pandas is reading your csv as if it has a header but no data. You at the very least need to set header to None:

df = pd.read_csv('data', header=None, dtype=str)

Then you can go about replacing the columns by their index/default name (starts at 0), and writing the output:

df[7].fillna('\\N', inplace=True)
df[8].fillna('\\N', inplace=True)

# Get rid of any other 'NaN' as they were read in blank
df.fillna('')

# Write the output, also stripping the header:
df.to_csv('out', header=None)
# Output:
# 0,964,64448,Alen,,2,1998,A45,\N,\N,,(Italy),e02d7543d85d91a772dc9f1cac542751

Same concept here, but now doesn't inject NaN into the blank places:

df = pd.read_csv('data', header=None, keep_default_na=False)

df[7].replace('', '\\N', inplace=True)
df[8].replace('', '\\N', inplace=True)

df.to_csv('out', header=None)

答案2

得分: 0

你可以在读取文件后使用 fillna 方法:

df = pd.read_csv(filename, quotechar='"', escapechar="\\", dtype=str, header=None)

df = df.fillna({7: r'\N', 8: r'\N'}).fillna('')
df.to_csv('output.csv', index=False, header=False)

输出:

964,64448,Alen,,2,1998,A45,\N,\N,,(Italy),e02d7543d85d91a772dc9f1cac542751
英文:

You can use fillna after reading your file:

df = pd.read_csv(filename, quotechar='"', escapechar="\\", dtype=str, header=None)

df = df.fillna({7: r'\N', 8: r'\N'}).fillna('')
df.to_csv('output.csv', index=False, header=False)

Output:

964,64448,Alen,,2,1998,A45,\N,\N,,(Italy),e02d7543d85d91a772dc9f1cac542751

huangapple
  • 本文由 发表于 2023年6月6日 02:10:51
  • 转载请务必保留本文链接:https://go.coder-hub.com/76408979.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定