How can I subtract the preceding column from each column in pandas?

Question

How can I subtract from each column the column before it, for many columns, without hardcoding it? I can do it by hardcoding, as shown below:

import pandas as pd
df = pd.DataFrame({"a":[1,2,3,4],"b":[1,3,5,6],"c":[6,7,8,9]})

df['a_diff'] = df['a']-16
df['b_diff'] = df['b']-df['a']
df['c_diff'] = df['c']-df['b']

I know there is a way to do this row-wise using the shift function. Can we also do it column-wise? There are 100 columns I need to apply this technique to, so I would rather do it pythonically than hardcode it. Please note that "a_diff" was intentionally computed by subtracting a constant, since I will have to subtract a constant from that column in my code as well.

Thank you,

Sam

Answer 1

Score: 1

Use diff with axis=1 and combine_first (or fillna, with some limitations!), then rename the columns with add_suffix and join to the original DataFrame:

out = df.join(df.diff(axis=1).combine_first(df[['a']].sub(16)).add_suffix('_diff'))

Or with fillna. Note the limitation: DataFrame.fillna matches a Series against column labels rather than rows, so pass a one-column DataFrame instead of df['a']:

out = df.join(df.diff(axis=1).fillna(df[['a']].sub(16)).add_suffix('_diff'))

Output:

   a  b  c  a_diff  b_diff  c_diff
0  1  1  6   -15.0       0       5
1  2  3  7   -14.0       1       4
2  3  5  8   -13.0       2       3
3  4  6  9   -12.0       2       3
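
As a quick sanity check, the combine_first approach can be compared against the hardcoded columns from the question. This is only a sketch on the sample data; the names expected and diffs are made up for illustration, and dtypes are deliberately ignored because the filled first column may come back as float.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [1, 3, 5, 6], "c": [6, 7, 8, 9]})

# Hardcoded version from the question, used here only as a reference.
expected = pd.DataFrame({
    "a_diff": df["a"] - 16,
    "b_diff": df["b"] - df["a"],
    "c_diff": df["c"] - df["b"],
})

# diff(axis=1) subtracts each column's left neighbour. The first column has
# no neighbour and comes back as NaN, so combine_first fills it with a - 16.
diffs = df.diff(axis=1).combine_first(df[["a"]].sub(16)).add_suffix("_diff")

# Compare values only; raises AssertionError if the two frames differ.
pd.testing.assert_frame_equal(diffs, expected, check_dtype=False)
```

If the assertion passes silently, both approaches produce the same values for this frame.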

Answer 2

Score: 0

Use DataFrame.diff with axis=1, set the first column with DataFrame.fillna, rename the columns with DataFrame.add_suffix, and finally append the result to the original with DataFrame.join:

df = df.join(df.diff(axis=1).fillna({'a': df['a'].sub(16)}).add_suffix('_diff'))

Solution without hardcoding the name of the first column:

first = df.columns[0]
df = df.join(df.diff(axis=1).fillna({first: df[first].sub(16)}).add_suffix('_diff'))

Or compute the differences first and then overwrite the first column:

df1 = df.diff(axis=1)
df1.iloc[:, 0] = df.iloc[:, 0].sub(16)
df = df.join(df1.add_suffix('_diff'))
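
For many columns, the same idea could be wrapped in a small function. The sketch below is only an illustration: add_column_diffs and first_offset are hypothetical names, and the ten-column frame wide is made-up data standing in for the 100 columns mentioned in the question. It assumes all columns are numeric.

```python
import pandas as pd


def add_column_diffs(df: pd.DataFrame, first_offset: float) -> pd.DataFrame:
    """Hypothetical helper: return df with a <col>_diff column per column."""
    diffs = df.diff(axis=1)
    first = df.columns[0]
    # The first column has no left neighbour, so subtract the constant instead.
    diffs[first] = df[first] - first_offset
    return df.join(diffs.add_suffix("_diff"))


# Ten numeric columns stand in for the 100 mentioned in the question.
wide = pd.DataFrame({f"col{i}": [i, 2 * i, 3 * i] for i in range(10)})
out = add_column_diffs(wide, first_offset=16)
print(out.filter(like="_diff"))
```

The function returns a new DataFrame and leaves its input untouched, so it can be called repeatedly without accumulating _diff columns.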

Solution if the original DataFrame has no missing values:

df = df.join(df.diff(axis=1).fillna(df.sub(16)).add_suffix('_diff'))

print(df)
   a  b  c  a_diff  b_diff  c_diff
0  1  1  6     -15       0       5
1  2  3  7     -14       1       4
2  3  5  8     -13       2       3
3  4  6  9     -12       2       3
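
Since the question mentions shift, here is a possible alternative that is not part of either answer: shift the data one column to the right with DataFrame.shift(axis=1, fill_value=...) and subtract. This is a minimal sketch on the sample data. It assumes all columns share a numeric dtype that can hold the constant, and the name shifted is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [1, 3, 5, 6], "c": [6, 7, 8, 9]})

# Shift the data one column to the right. The vacated first column is filled
# with the constant, so every column lines up with the value to subtract.
shifted = df.shift(axis=1, fill_value=16)

# a - 16, b - a, c - b in one vectorised subtraction.
out = df.join(df.sub(shifted).add_suffix("_diff"))
print(out)
```

Because the constant is supplied as the fill value, no NaN is introduced and no separate fill step is needed.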

huangapple
  • Posted on 2023-03-03 20:33:42
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75627137.html