How to update group in df after iteration

Question

I am trying to apply an operation to each DataFrame group and then update the corresponding rows in the original DataFrame. However, the new values are not being written to the correct rows. Effectively, I am trying to compute a diff within each group.

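import numpy as np
import pandas as pd

mydf = pd.read_csv("my.csv")
mydf["forwardDelta"] = np.nan

for name, group in mydf.groupby(["a", "b"]):
    # difference to the previous row within the same group
    group["forwardDelta"] = group["c"] - group["c"].shift(1)
    for index, row in group.iterrows():
        # iloc is positional: `index` is used as a row position, 4 as a column position
        mydf.iloc[index, 4] = row["forwardDelta"]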
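One plausible cause of the misplaced values is `mydf.iloc[index, 4]`: `iloc` is purely positional, so the row label `index` returned by `iterrows` is treated as a row position and `4` as a column position. The write only lands in the right cell if the frame has a default 0..n-1 index and `forwardDelta` really is the fifth column. The sketch below is a hedged alternative, not the answerer's method: it writes each group back with label-based `.loc` instead. The frame stands in for `my.csv` with hypothetical data (an extra column `x` and a non-default index) chosen so that labels and positions differ.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for my.csv: an extra column "x" and a non-default
# index, so row labels and positions no longer line up.
mydf = pd.DataFrame(
    {"a": [5, 8, 5, 8], "b": [1, 2, 1, 2], "x": [0, 0, 0, 0], "c": [2, 7, 5, 4]},
    index=[10, 11, 12, 13],
)
mydf["forwardDelta"] = np.nan

for _, group in mydf.groupby(["a", "b"]):
    # Difference to the previous row of the same group (first row is NaN)
    delta = group["c"] - group["c"].shift(1)
    # .loc uses the group's index labels and the column name, so each value
    # goes back to exactly the row it came from
    mydf.loc[group.index, "forwardDelta"] = delta

print(mydf)
```

Because `.loc[group.index, "forwardDelta"]` assigns a whole group at once, the inner `iterrows` loop is no longer needed; the answer's vectorized `groupby(...).diff()` removes the outer loop as well.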

Answer 1

Score: 1

I don't think any loop is needed here. Use DataFrameGroupBy.diff, and if you need to set the values in the 5th column, use DataFrame.iloc with : to select all rows:

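import pandas as pd

# sample data with two groups, (a, b) = (5, 1) and (8, 2);
# sorted by a, b only so that each group's rows appear together
mydf = pd.DataFrame({'a':[5,8]*3,
                     'b':[1,2]*3,
                     'c':[2,7,5,4,3,9],
                     'd':list('abcdef'),
                     'e':range(5,11)}).sort_values(['a','b'], ignore_index=True)

print(mydf)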
   a  b  c  d   e
0  5  1  2  a   5
1  5  1  5  c   7
2  5  1  3  e   9
3  8  2  7  b   6
4  8  2  4  d   8
5  8  2  9  f  10

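# within-group difference: c minus the previous c in the same (a, b) group
mydf["forwardDelta"] = mydf.groupby(["a", "b"])["c"].diff()

# position 4 is column 'e' here, so this overwrites 'e' with the same values
mydf.iloc[:, 4] = mydf["forwardDelta"]

print(mydf)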
   a  b  c  d    e  forwardDelta
0  5  1  2  a  NaN           NaN
1  5  1  5  c  3.0           3.0
2  5  1  3  e -2.0          -2.0
3  8  2  7  b  NaN           NaN
4  8  2  4  d -3.0          -3.0
5  8  2  9  f  5.0           5.0
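As a check on the claim that no loop is needed: `groupby(...).diff()` computes the same "c minus the previous c" within each group as the question's `group["c"] - group["c"].shift(1)`, and its result keeps the original index, so the frame does not have to be sorted first. A small sketch with hypothetical, unsorted data:

```python
import pandas as pd

# Hypothetical unsorted frame: rows of the two groups are interleaved
df = pd.DataFrame({"a": [5, 8, 5, 8, 5, 8],
                   "b": [1, 2, 1, 2, 1, 2],
                   "c": [2, 7, 5, 4, 3, 9]})

# Vectorized within-group difference (first row of each group is NaN)
via_diff = df.groupby(["a", "b"])["c"].diff()

# The same quantity written the way the question's loop computes it
via_shift = df["c"] - df.groupby(["a", "b"])["c"].shift(1)

# Both results carry df's original index, so they can be assigned directly
df["forwardDelta"] = via_diff
print(df)
```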

huangapple
  • Posted on June 6, 2023 at 13:25:48
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76411640.html