How can I subtract the preceding column from each column in pandas?
Question
How can I subtract from each column the column before it, for many columns, without hardcoding it? I can do it by hardcoding, as shown below:
import pandas as pd
df = pd.DataFrame({"a":[1,2,3,4],"b":[1,3,5,6],"c":[6,7,8,9]})
df['a_diff'] = df['a']-16
df['b_diff'] = df['b']-df['a']
df['c_diff'] = df['c']-df['b']
I know there is a way to do it row-wise by using the shift function. Can we also do it column-wise? There are 100 columns I need to use this technique on, so I would rather do it pythonically instead of hardcoding it. Please note that "a_diff" intentionally has a constant subtracted from it, since I will have to subtract a constant from that column in my code as well.
Thank you,
Sam
Answer 1
Score: 1
Use diff and combine_first (or fillna, with some limitations!), then rename with add_suffix and join to the original DataFrame:
out = df.join(df.diff(axis=1).combine_first(df[['a']].sub(16)).add_suffix('_diff'))
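Or, if you are sure that there is no NaN in the columns other than "a", fill each column from the Series with fillna (DataFrame.fillna matches a Series against column labels rather than row labels, so it is applied per column here):
out = df.join(df.diff(axis=1).apply(lambda s: s.fillna(df['a'].sub(16))).add_suffix('_diff'))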
Output:
a b c a_diff b_diff c_diff
0 1 1 6 -15.0 0 5
1 2 3 7 -14.0 1 4
2 3 5 8 -13.0 2 3
3 4 6 9 -12.0 2 3
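For comparison, here is a minimal sketch of the column-wise analogue of the shift approach mentioned in the question. It assumes the same example frame as above; the names by_shift and by_diff are illustrative only. It shows that diff(axis=1) gives the same result as subtracting a frame shifted one column to the right.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [1, 3, 5, 6], "c": [6, 7, 8, 9]})

# shift(axis=1) moves every column one position to the right,
# so each column is lined up with its left-hand neighbour.
by_shift = df - df.shift(axis=1)

# diff(axis=1) computes the same column-wise difference in one call.
by_diff = df.diff(axis=1)

# The first column has no left-hand neighbour, so it is NaN in both results.
print(by_shift)
print(by_diff)
```

Either way the first column comes out as NaN, which is why the answer fills it separately with the constant-subtracted values.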
Answer 2
Score: 0
Use DataFrame.diff, set the first column with DataFrame.fillna, rename with DataFrame.add_suffix, and finally append to the original with DataFrame.join:
df = df.join(df.diff(axis=1).fillna({'a': df['a'].sub(16)}).add_suffix('_diff'))
A solution that does not hardcode the name of the first column:
first = df.columns[0]
df = df.join(df.diff(axis=1).fillna({first: df[first].sub(16)}).add_suffix('_diff'))
Or overwrite the first column of the difference directly:
df1 = df.diff(axis=1)
df1.iloc[:, 0] = df.iloc[:, 0].sub(16)
df = df.join(df1.add_suffix('_diff'))
A solution for when the original DataFrame has no missing values:
df = df.join(df.diff(axis=1).fillna(df.sub(16)).add_suffix('_diff'))
print(df)
a b c a_diff b_diff c_diff
0 1 1 6 -15 0 5
1 2 3 7 -14 1 4
2 3 5 8 -13 2 3
3 4 6 9 -12 2 3
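For the 100-column case from the question, the steps above could be wrapped in a small function. The sketch below is only one possible arrangement: add_column_diffs, first_offset and suffix are hypothetical names, not part of pandas, and it assumes the column labels are unique.

```python
import pandas as pd


def add_column_diffs(df, first_offset, suffix="_diff"):
    """Return df joined with column-wise differences (hypothetical helper).

    Each column has its left-hand neighbour subtracted from it; the first
    column has the constant first_offset subtracted instead.
    """
    diffs = df.diff(axis=1)
    first = df.columns[0]
    # The first column of the difference is all NaN, so overwrite it.
    diffs[first] = df[first] - first_offset
    return df.join(diffs.add_suffix(suffix))


df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [1, 3, 5, 6], "c": [6, 7, 8, 9]})
out = add_column_diffs(df, first_offset=16)
print(out)
```

Because the function only refers to df.columns[0], it does not depend on how many columns there are or what they are called, and the input frame is left unchanged.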