Pandas not summing values in two numeric columns
Question
I have a dataframe like this:
| A | B |
|---|---|
| 2 | DIV0 |
| 3 | DIV0 |
| 5 | DIV0 |
| DIV0 | 3 |
I want to add a 3rd column 'C' which would be the sum of values in A & B:
| A | B | C |
|---|---|---|
| 2 | DIV0 | 2 |
| 3 | DIV0 | 3 |
| 5 | DIV0 | 5 |
| DIV0 | 3 | 3 |
In my current code, the DIV0 values are removed and A and B are summed by the following lines:
df["A"] = pd.to_numeric(df["A"], errors="coerce")
df["B"] = pd.to_numeric(df["B"], errors="coerce")
df["C"] = df["A"] + df["B"]
However, this gives me an empty C column. I've tried researching numeric columns but can't understand why this is happening. Thanks.
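Answer 1
Score: 3
This happens because + propagates NaN: if either operand in a row is NaN, the result for that row is NaN. Every row here has a DIV0 (coerced to NaN) in one of the two columns, so C ends up entirely NaN. Series.add behaves the same way by default (fill_value=None), so pass fill_value=0 to treat a missing value as 0: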
s1 = pd.to_numeric(df["A"], errors="coerce")
s2 = pd.to_numeric(df["B"], errors="coerce")
df["C"] = s1.add(s2, fill_value=0)
Another variant (handy if you have a lot of columns) uses sum, which skips NaN by default:
df["C"] = df.apply(pd.to_numeric, errors="coerce").sum(axis=1)
Output:
print(df)
      A     B    C
0     2  DIV0  2.0
1     3  DIV0  3.0
2     5  DIV0  5.0
3  DIV0     3  3.0
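For anyone who wants to reproduce this end to end, here is a minimal sketch. It assumes the columns hold strings (including the literal "DIV0"), which the question does not state explicitly; the variable names a, b, plain and filled are only illustrative.

```python
import pandas as pd

# Assumed reconstruction of the question's frame: every cell is a string.
df = pd.DataFrame({
    "A": ["2", "3", "5", "DIV0"],
    "B": ["DIV0", "DIV0", "DIV0", "3"],
})

a = pd.to_numeric(df["A"], errors="coerce")  # "DIV0" becomes NaN
b = pd.to_numeric(df["B"], errors="coerce")

plain = a + b                    # NaN wherever either side is NaN
filled = a.add(b, fill_value=0)  # a missing side is treated as 0

print(plain.isna().all())  # every row has one NaN, so the plain sum is all NaN
print(filled.tolist())
```

Note that fill_value only substitutes when one side is missing; if both values in a row are NaN, the result for that row is still NaN.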
Answer 2
Score: 1
You can do it like this:
df['sum'] = (df[['A', 'B']].apply(lambda x: pd.to_numeric(x, errors='coerce')).sum(axis=1, min_count=1))
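The min_count=1 argument only matters when a row has no valid number at all. A small sketch of the difference, using a hypothetical frame (not the one from the question) whose last row is DIV0 in both columns:

```python
import pandas as pd

# Hypothetical data: the last row has no valid number in either column.
df = pd.DataFrame({
    "A": ["2", "DIV0", "DIV0"],
    "B": ["DIV0", "3", "DIV0"],
})
nums = df[["A", "B"]].apply(pd.to_numeric, errors="coerce")

default_sum = nums.sum(axis=1)              # an all-NaN row sums to 0.0
strict_sum = nums.sum(axis=1, min_count=1)  # an all-NaN row stays NaN

print(default_sum.tolist())
print(strict_sum.tolist())
```

Which behaviour is preferable depends on whether "no data" should count as zero or stay missing in the result.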