Pandas not summing values in two numeric columns

Question

I have a dataframe like this:

A B
2 DIV0
3 DIV0
5 DIV0
DIV0 3

I want to add a 3rd column 'C' which would be the sum of values in A & B:

A B C
2 DIV0 2
3 DIV0 3
5 DIV0 5
DIV0 3 3

In my current code, the DIV0 values are removed and A and B are summed by the following lines:

df["A"] = pd.to_numeric(df["A"], errors="coerce")
df["B"] = pd.to_numeric(df["B"], errors="coerce")
df["C"] = df["A"] + df["B"]

However, this gives me an empty C column. I've tried researching numeric columns but can't understand why this is happening. Thanks.
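A minimal reproduction of the behavior described above may help. This is only a sketch: it rebuilds the sample data as strings (an assumption about how the real dataframe was loaded) and uses the hypothetical names `a`, `b` and `c_plain` instead of overwriting the columns.

```python
import pandas as pd

# Sample data from the question; stored as strings because of the "DIV0" markers.
df = pd.DataFrame({"A": ["2", "3", "5", "DIV0"], "B": ["DIV0", "DIV0", "DIV0", "3"]})

# errors="coerce" turns every non-numeric entry ("DIV0") into NaN.
a = pd.to_numeric(df["A"], errors="coerce")
b = pd.to_numeric(df["B"], errors="coerce")

# Each row has exactly one NaN operand, so every row of the sum is NaN.
c_plain = a + b
print(c_plain.isna().all())
```

With this data every row contains a DIV0 in one of the two columns, which is why the whole of C looks empty.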

Answer 1

Score: 3

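That's because missing values propagate through arithmetic: to_numeric with errors="coerce" turns each DIV0 into NaN, and any operation (like +) involving NaN also returns NaN. Every row here has a DIV0 in one of the two columns, so C ends up all NaN. Series.add behaves the same way by default (its fill_value defaults to None), so you need to set the fill value to 0.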

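import pandas as pd

# Convert to numeric without overwriting the original columns; "DIV0" becomes NaN
s1 = pd.to_numeric(df["A"], errors="coerce")
s2 = pd.to_numeric(df["B"], errors="coerce")

# Treat a NaN on either side as 0 when adding
df["C"] = s1.add(s2, fill_value=0)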
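For anyone who wants to run this end to end, here is a self-contained sketch of the same approach. The dataframe is rebuilt from the question's sample, and `both_missing` is a hypothetical extra check that is not part of the original answer.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({"A": ["2", "3", "5", "DIV0"], "B": ["DIV0", "DIV0", "DIV0", "3"]})

s1 = pd.to_numeric(df["A"], errors="coerce")
s2 = pd.to_numeric(df["B"], errors="coerce")

# fill_value=0 substitutes 0 for a NaN on one side before adding.
df["C"] = s1.add(s2, fill_value=0)
print(df["C"].tolist())

# If both sides are NaN in the same position, the result is still NaN.
nan_series = pd.Series([float("nan")])
both_missing = nan_series.add(nan_series, fill_value=0)
print(both_missing.isna().all())
```

Note the second check: fill_value only helps when at least one side has a value, so a row with DIV0 in both columns would still come out as NaN.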

Another variant (useful if you have a lot of columns), using sum:

df["C"] = df.apply(pd.to_numeric, errors="coerce").sum(axis=1)
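The one-liner above converts every column in the dataframe. A slightly more defensive sketch restricts the conversion to an explicit column list, so that unrelated columns, or a C column left over from an earlier run, are not added in. The `label` column and the `cols` and `numeric` names are hypothetical, added only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["2", "3", "5", "DIV0"],
    "B": ["DIV0", "DIV0", "DIV0", "3"],
    "label": ["w", "x", "y", "z"],  # hypothetical extra column that should not be summed
})

cols = ["A", "B"]  # columns to add up

# Keyword arguments given to apply are forwarded to pd.to_numeric for each column.
numeric = df[cols].apply(pd.to_numeric, errors="coerce")

# sum skips NaN by default (skipna=True), so no fill value is needed here.
df["C"] = numeric.sum(axis=1)
print(df["C"].tolist())
```

This works because sum ignores NaN by default, unlike the + operator.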

Output:

print(df)

      A     B    C
0     2  DIV0  2.0
1     3  DIV0  3.0
2     5  DIV0  5.0
3  DIV0     3  3.0

Answer 2

Score: 1

You could do something like this:

df['sum'] = df[['A', 'B']].apply(lambda x: pd.to_numeric(x, errors='coerce')).sum(axis=1, min_count=1)
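The min_count=1 argument matters only for rows where both values are invalid. The sketch below compares the two behaviors; the sample data, including a last row with DIV0 in both columns, is hypothetical and not taken from the question.

```python
import pandas as pd

# Hypothetical data: the last row has no valid number in either column.
df = pd.DataFrame({"A": ["2", "DIV0", "DIV0"], "B": ["DIV0", "3", "DIV0"]})
numeric = df[["A", "B"]].apply(pd.to_numeric, errors="coerce")

# Default: NaN values are skipped, so an all-NaN row sums to 0.0.
default_sum = numeric.sum(axis=1)

# min_count=1 requires at least one valid value, so an all-NaN row stays NaN.
strict_sum = numeric.sum(axis=1, min_count=1)

print(default_sum.tolist())
print(strict_sum.tolist())
```

Which one to pick depends on whether a row with no valid numbers should count as 0 or stay missing.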


huangapple
  • Posted on April 19, 2023, 22:42:09
  • Please keep the link to this article when reposting: https://go.coder-hub.com/76055835.html