英文:
How to check the row trend and add the difference and difference percentage in separate columns for the failing cases
问题
以下是翻译好的代码部分:
import pandas as pd
import numpy as np
d = {'Cell': ['A', 'B', 'C', 'D', 'E'], 'D1': [5, 2, 2, 6, 6], 'D2': [np.nan, 5, 6, np.nan, 3], 'D3': [7, np.nan, 5, 5, np.nan], 'D6': [17, 3, np.nan, np.nan, 2]}
df = pd.DataFrame(d)
希望对你有所帮助。如需进一步的翻译或解释,请随时提出。
英文:
An extension to the problem statement
https://stackoverflow.com/questions/75412399/how-to-check-each-row-trend-with-some-tolerance-by-ignoring-the-np-nan-values-in
import pandas as pd
import numpy as np
d = {'Cell':['A','B','C','D','E'],'D1':[5, 2, 2, 6,6], 'D2':[np.nan, 5, 6, np.nan,3], 'D3':[7,np.nan, 5, 5,np.nan], 'D6':[17, 3, np.nan,np.nan,2]}
df = pd.DataFrame(d)
Cell D1 D2 D3 D6
0 A 5 NaN 7.0 17.0
1 B 2 5.0 NaN 3.0
2 C 2 6.0 5.0 NaN
3 D 6 NaN 5.0 NaN
4 E 6 3.0 NaN 2.0
i want output like this with additional columns diff and diff% along with is_increasing and failing columns
Cell D1 D2 D3 D6 is_increasing? failing diff diff%
0 A 5 NaN 7.0 17.0 True NaN NaN NaN
1 B 2 5.0 NaN 3.0 False [D6] [-2] [40%]
2 C 2 6.0 5.0 NaN False [D3] [-1] [16.6%]
3 D 6 NaN 5.0 NaN False [D3] [-1] [16.6%]
4 E 6 3.0 NaN 2.0 False [D2, D6] [-3,-1] [50%,33%]
Explanation of the columns:
is_increasing --> whether the values are strictly increasing or not
failing --> columns whether strictly increasing is not followed when compared with previous value
diff --> difference of the values where there is failing cases
diff% --> difference in terms of percentages for the failing cases
between (6,5) numbers in the columns
diff column --> 5-6=-1
diff%--> 1-(5/6)=16.6%
Please let me the solution to this problem, i tried different ways but not able to come up with solution.
答案1
得分: 1
以下是您要翻译的内容:
# 仅筛选相关列
# 使用任何方法
df2 = df.drop(columns='Cell')
d = (df2.ffill(axis=1)
.diff(axis=1)
)
m = (d.where(df2.notna())
.lt(0)
)
df['is_increasing'] = ~m.any(axis=1)
df['failing'] = (
m.mul(df2.columns)
.where(m).stack()
.groupby(level=0).agg(list)
)
df['diff'] = (d
.where(m).stack()
.groupby(level=0).agg(list)
)
p = (df2.ffill(axis=1)
.pct_change(axis=1)
.mul(-100).round(2)
)
df['diff%'] = (p
.where(m).stack()
.groupby(level=0).agg(list)
)
print(df)
输出:
Cell D1 D2 D3 D6 is_increasing failing diff diff%
0 A 5 NaN 7.0 17.0 True NaN NaN NaN
1 B 2 5.0 NaN 3.0 False [D6] [-2.0] [40.0]
2 C 2 6.0 5.0 NaN False [D3] [-1.0] [16.67]
3 D 6 NaN 5.0 NaN False [D3] [-1.0] [16.67]
4 E 6 3.0 NaN 2.0 False [D2, D6] [-3.0, -1.0] [50.0, 33.33]
请注意,代码中的注释部分未被翻译。
英文:
You can use:
# filter only relevant columns
# use any method
df2 = df.drop(columns='Cell')
d = (df2.ffill(axis=1)
.diff(axis=1)
)
m = (d.where(df2.notna())
.lt(0)
)
df['is_increasing'] = ~m.any(axis=1)
df['failing'] = (
m.mul(df2.columns)
.where(m).stack()
.groupby(level=0).agg(list)
)
df['diff'] = (d
.where(m).stack()
.groupby(level=0).agg(list)
)
p = (df2.ffill(axis=1)
.pct_change(axis=1)
.mul(-100).round(2)
)
df['diff%'] = (p
.where(m).stack()
.groupby(level=0).agg(list)
)
print(df)
Output:
Cell D1 D2 D3 D6 is_increasing failing diff diff%
0 A 5 NaN 7.0 17.0 True NaN NaN NaN
1 B 2 5.0 NaN 3.0 False [D6] [-2.0] [40.0]
2 C 2 6.0 5.0 NaN False [D3] [-1.0] [16.67]
3 D 6 NaN 5.0 NaN False [D3] [-1.0] [16.67]
4 E 6 3.0 NaN 2.0 False [D2, D6] [-3.0, -1.0] [50.0, 33.33]
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论