Updating MULTIPLE values in a column in a pandas DataFrame using ffill (or other methods)

Question

I have a dataframe that can be simplified like this:

```python
import numpy as np
import pandas as pd

TR = [17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29]
Event = ['SRNeg', np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 'Rating', np.nan, np.nan, 'ArrowTask', np.nan]
df = pd.DataFrame({'Event': Event, 'TR': TR})
```

I want to fill some of the NaNs after each event with that event's value. Not all NaNs are filled: each event type should only extend over a limited number of the rows that follow it. Here is the ideal output:

```python
TR = [17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29]
Event = ['SRNeg', np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 'Rating', np.nan, np.nan, 'ArrowTask', np.nan]
Event_BOLD_Duration = ['SRNeg', 'SRNeg', 'SRNeg', 'SRNeg', 'SRNeg', 'SRNeg', 'SRNeg', 'SRNeg', 'Rating', 'Rating', np.nan, 'ArrowTask', 'ArrowTask']
df = pd.DataFrame({'Event': Event, 'Event_BOLD_Duration': Event_BOLD_Duration, 'TR': TR})
```

Here is the code I have so far to complete the above task.

```python
# line 1
scores_full['Event_BOLD_Duration'] = scores_full['Event'].where(scores_full['Event'].isin(['SRNeg', 'SINeg'])).ffill(limit=7).fillna(scores_full['Event'])
# line 2
scores_full['Event_BOLD_Duration'] = scores_full['Event'].where(scores_full['Event'].isin(['ArrowTask'])).ffill(limit=4).fillna(scores_full['Event'])
# line 3
scores_full['Event_BOLD_Duration'] = scores_full['Event'].where(scores_full['Event'].isin(['Rating'])).ffill(limit=1).fillna(scores_full['Event'])
```

However, each line overwrites the previous line's output. How can I fix this issue?

Thank you!
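A minimal sketch of why the overwriting happens, run on the simplified sample above (here `df` stands in for `scores_full`, and the names `s`, `line1`, `line3` are only for illustration): each statement recomputes the whole column from the original `Event` values, and its final `.fillna(...)` restores those originals, so nothing produced by an earlier assignment is carried into the next one.

```python
import numpy as np
import pandas as pd

# The question's sample data
TR = list(range(17, 30))
Event = ['SRNeg'] + [np.nan] * 7 + ['Rating', np.nan, np.nan, 'ArrowTask', np.nan]
df = pd.DataFrame({'Event': Event, 'TR': TR})
s = df['Event']

# "line 1": fill up to 7 rows after SRNeg/SINeg
line1 = s.where(s.isin(['SRNeg', 'SINeg'])).ffill(limit=7).fillna(s)
# "line 3": rebuilt from s again, so the SRNeg fills from line 1 are gone
line3 = s.where(s.isin(['Rating'])).ffill(limit=1).fillna(s)

print(line1.iloc[1:8].tolist())  # rows 1-7 hold SRNeg after line 1...
print(line3.iloc[1:8].tolist())  # ...but are NaN again after line 3
```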

Answer 1

Score: 1

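You have to create individual groups with the common pattern `<condition>.cumsum()`, then fill the values within each group. Here the condition is `df['Event'].notna()`: its cumulative sum increases by one at every event row, so each event and the NaN rows after it share one group. To help, create a dict of limits: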

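```python
# Rows to forward-fill after each event label
limits = {'SRNeg': 7, 'SINeg': 7, 'ArrowTask': 4, 'Rating': 1}

def forward_fill(x):
    # x.iloc[0] is the event that opens this group
    event = x.iloc[0]
    return x.ffill(limit=limits[event]) if event in limits else x

# notna().cumsum() starts a new group at every non-null Event
df['Event_BOLD_Duration'] = (
    df.groupby(df['Event'].notna().cumsum())['Event']
      .transform(forward_fill)
)
```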

Output:

```
>>> df
        Event  TR  Event_BOLD_Duration
0       SRNeg  17                SRNeg
1         NaN  18                SRNeg
2         NaN  19                SRNeg
3         NaN  20                SRNeg
4         NaN  21                SRNeg
5         NaN  22                SRNeg
6         NaN  23                SRNeg
7         NaN  24                SRNeg
8      Rating  25               Rating
9         NaN  26               Rating
10        NaN  27                  NaN
11  ArrowTask  28            ArrowTask
12        NaN  29            ArrowTask
```
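To make the `<condition>.cumsum()` pattern concrete, here is a small sketch on the question's sample data (rebuilt so the block runs on its own; `groups` is just an illustrative name). `notna()` is `True` exactly on event rows, so its running sum steps up at each event and stays constant over the NaN rows that follow, which gives every event its own group.

```python
import numpy as np
import pandas as pd

# The question's sample data
TR = list(range(17, 30))
Event = ['SRNeg'] + [np.nan] * 7 + ['Rating', np.nan, np.nan, 'ArrowTask', np.nan]
df = pd.DataFrame({'Event': Event, 'TR': TR})

# True on event rows, so the running sum increases by one at each new event
groups = df['Event'].notna().cumsum()
print(groups.tolist())  # [1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3]

# Each group starts with its event label, followed by the NaN rows it may fill
for key, rows in df.groupby(groups)['Event']:
    print(key, rows.tolist())
```

Because the first row of every group is the event itself, `x.iloc[0]` in `forward_fill` is the right key into `limits`. Rows before the first event (if any) would land in group 0, whose first value is NaN; NaN is not a key in `limits`, so `forward_fill` returns that group unchanged.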

Answer 2

Score: 0

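Try this:

```python
import numpy as np

# Forward-fill every event, then blank out the row that must stay empty (index 10 here)
df['Event_BOLD_Duration'] = df['Event'].ffill()
df.loc[10, 'Event_BOLD_Duration'] = np.nan
print(df)
```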

Output:

```
        Event  TR  Event_BOLD_Duration
0       SRNeg  17                SRNeg
1         NaN  18                SRNeg
2         NaN  19                SRNeg
3         NaN  20                SRNeg
4         NaN  21                SRNeg
5         NaN  22                SRNeg
6         NaN  23                SRNeg
7         NaN  24                SRNeg
8      Rating  25               Rating
9         NaN  26               Rating
10        NaN  27                  NaN
11  ArrowTask  28            ArrowTask
12        NaN  29            ArrowTask
```
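Hard-coding `df.loc[10, ...]` only works for this particular sample. Below is a sketch of one way to generalize the same "ffill everything, then blank out what should stay empty" idea, assuming the per-event limits used in Answer 1 (the `limits` dict is the question's numbers, not a pandas feature; `filled`, `steps`, and `limit` are illustrative names). It counts how many rows each position sits after its most recent event with `groupby(...).cumcount()`, and keeps the forward-filled label only while that count is within the event's limit.

```python
import numpy as np
import pandas as pd

# The question's sample data
TR = list(range(17, 30))
Event = ['SRNeg'] + [np.nan] * 7 + ['Rating', np.nan, np.nan, 'ArrowTask', np.nan]
df = pd.DataFrame({'Event': Event, 'TR': TR})

# Per-event fill limits (same numbers as in the question's code)
limits = {'SRNeg': 7, 'SINeg': 7, 'ArrowTask': 4, 'Rating': 1}

# Plain forward fill, as in this answer
filled = df['Event'].ffill()
# Rows since the most recent event: 0 on the event row, 1 on the next row, ...
steps = df.groupby(df['Event'].notna().cumsum()).cumcount()
# Limit of the event currently in effect; labels missing from `limits` get 0
limit = filled.map(limits).fillna(0)
# Keep the filled label only within the limit, NaN everywhere else
df['Event_BOLD_Duration'] = filled.where(steps <= limit)
print(df)
```

This keeps the whole computation in built-in pandas operations rather than calling a Python function per group; whether that is noticeably faster on the real `scores_full` has not been measured here.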

huangapple
  • Published on 2023-06-29 21:57:02
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76581727.html