Pandas not summing values in two numeric columns
Question
I have a dataframe like this:
A | B |
---|---|
2 | DIV0 |
3 | DIV0 |
5 | DIV0 |
DIV0 | 3 |
I want to add a 3rd column 'C' which would be the sum of values in A & B:
A | B | C |
---|---|---|
2 | DIV0 | 2 |
3 | DIV0 | 3 |
5 | DIV0 | 5 |
DIV0 | 3 | 3 |
In my current code, the DIV0 values are removed and A and B are summed by the following lines:
df["A"] = pd.to_numeric(df["A"], errors="coerce")
df["B"] = pd.to_numeric(df["B"], errors="coerce")
df["C"] = df["A"] + df["B"]
However, this gives me an empty C column. I've tried researching numeric columns but can't understand why this is happening. Thanks.
Answer 1
Score: 3
That's because, after coercion, every row has a NaN in either A or B, and any arithmetic operation (like +) involving NaN also returns NaN. The add method behaves the same way by default (its fill_value defaults to None, so nothing is filled in), so you need to set the fill value to 0:
s1 = pd.to_numeric(df["A"], errors="coerce")
s2 = pd.to_numeric(df["B"], errors="coerce")
df["C"] = s1.add(s2, fill_value=0)
Another variant (if you have a lot of columns) uses sum:
df["C"] = df.apply(pd.to_numeric, errors="coerce").sum(axis=1)
Output:
print(df)
      A     B    C
0     2  DIV0  2.0
1     3  DIV0  3.0
2     5  DIV0  5.0
3  DIV0     3  3.0
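For anyone who wants to reproduce this end to end, here is a minimal sketch. It assumes the frame from the question can be rebuilt with "DIV0" stored as a plain string; the variable names plain, filled and row_sum are only illustrative.

```python
import pandas as pd

# Sample frame from the question; "DIV0" is assumed to be a plain string placeholder.
df = pd.DataFrame({"A": [2, 3, 5, "DIV0"], "B": ["DIV0", "DIV0", "DIV0", 3]})

s1 = pd.to_numeric(df["A"], errors="coerce")  # "DIV0" becomes NaN
s2 = pd.to_numeric(df["B"], errors="coerce")

plain = s1 + s2                    # NaN in every row: each row has one missing operand
filled = s1.add(s2, fill_value=0)  # a lone NaN is treated as 0 before adding
row_sum = df.apply(pd.to_numeric, errors="coerce").sum(axis=1)  # sum skips NaN by default

print(plain.isna().all())
print(filled.tolist())
print(row_sum.tolist())
```

Note that with fill_value=0, a row where both A and B are missing would still come out as NaN, whereas the sum variant would return 0.0 for that row.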
Answer 2
Score: 1
You can do it like this (using the column names from the question):
df['C'] = df[['A', 'B']].apply(lambda x: pd.to_numeric(x, errors='coerce')).sum(axis=1, min_count=1)
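What min_count=1 changes is easiest to see on a row with no numeric value at all. The sketch below uses a hypothetical variation of the question's data (the last row is "DIV0" in both columns); the names nums, default_sum and strict_sum are illustrative.

```python
import pandas as pd

# Hypothetical variation on the question's data: the last row has no numeric value.
df = pd.DataFrame({"A": [2, "DIV0", "DIV0"], "B": ["DIV0", 3, "DIV0"]})
nums = df[["A", "B"]].apply(pd.to_numeric, errors="coerce")

default_sum = nums.sum(axis=1)              # an all-NaN row sums to 0.0
strict_sum = nums.sum(axis=1, min_count=1)  # an all-NaN row stays NaN

print(default_sum.tolist())
print(strict_sum.tolist())
```

So min_count=1 is probably the safer choice if a row with no valid numbers should stay missing rather than show up as 0.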