Updating multiple values in a column in a pandas DataFrame using ffill (or other methods)

Question

I have a dataframe that can be simplified like this:

import numpy as np
import pandas as pd

TR = [17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29]
Event = ['SRNeg', np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 'Rating', np.nan, np.nan, 'ArrowTask', np.nan]
df = pd.DataFrame({'Event': Event, 'TR': TR})

After each event, I want to fill a limited number of the following NaNs with that event's value. The limit depends on the event, so not all NaNs get filled. Here is the ideal output:

TR = [17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29]
Event = ['SRNeg', np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 'Rating', np.nan, np.nan, 'ArrowTask', np.nan]
Event_BOLD_Duration = ['SRNeg', 'SRNeg', 'SRNeg', 'SRNeg', 'SRNeg', 'SRNeg', 'SRNeg', 'SRNeg', 'Rating', 'Rating', np.nan, 'ArrowTask', 'ArrowTask']
df = pd.DataFrame({'Event': Event, 'Event_BOLD_Duration': Event_BOLD_Duration, 'TR': TR})

Here is the code I have so far to complete the above task.

#line 1
scores_full['Event_BOLD_Duration'] = scores_full['Event'].where(scores_full['Event'].isin(['SRNeg', 'SINeg'])).ffill(limit=7).fillna(scores_full['Event']) 
#line 2
scores_full['Event_BOLD_Duration'] = scores_full['Event'].where(scores_full['Event'].isin(['ArrowTask'])).ffill(limit=4).fillna(scores_full['Event']) 
#line 3
scores_full['Event_BOLD_Duration'] = scores_full['Event'].where(scores_full['Event'].isin(['Rating'])).ffill(limit=1).fillna(scores_full['Event'])

However, each line seems to override the previous line's output. How can I fix this issue?

Thank you!

Answer 1

Score: 1

You need to put each event and the NaNs that follow it into its own group, using the common pattern "<condition>.cumsum()", and then forward-fill within each group. To make this easier, first create a dict of per-event limits:

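# Maximum number of rows to forward-fill after each event.
limits = {'SRNeg': 7, 'SINeg': 7, 'ArrowTask': 4, 'Rating': 1}

def forward_fill(group):
    # Each group starts with its event label; events without a limit are left untouched.
    label = group.iloc[0]
    return group.ffill(limit=limits[label]) if label in limits else group

# A new group starts at every non-null event, so each event is grouped with the NaNs after it.
df['Event_BOLD_Duration'] = (
    df.groupby(df['Event'].notna().cumsum())['Event']
      .transform(forward_fill)
)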

Output:

>>> df
        Event  TR Event_BOLD_Duration
0       SRNeg  17               SRNeg
1         NaN  18               SRNeg
2         NaN  19               SRNeg
3         NaN  20               SRNeg
4         NaN  21               SRNeg
5         NaN  22               SRNeg
6         NaN  23               SRNeg
7         NaN  24               SRNeg
8      Rating  25              Rating
9         NaN  26              Rating
10        NaN  27                 NaN
11  ArrowTask  28           ArrowTask
12        NaN  29           ArrowTask
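
To see why the grouping works, it can help to look at the group key on its own. The sketch below rebuilds the sample frame from the question and shows that df['Event'].notna().cumsum() goes up by one at every non-null event, so each event shares a key with the NaNs that follow it. The names group_id and group_sizes are illustrative only and are not part of the answer above.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    'Event': ['SRNeg'] + [np.nan] * 7 + ['Rating', np.nan, np.nan, 'ArrowTask', np.nan],
    'TR': list(range(17, 30)),
})

# notna() is True on event rows and False on NaN rows, so the running sum
# starts a new group id at every event.
group_id = df['Event'].notna().cumsum()
print(group_id.tolist())
# [1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3]

# Number of rows per group: the event row plus its trailing NaNs.
group_sizes = df.groupby(group_id)['Event'].size()
print(group_sizes.tolist())
# [8, 3, 2]
```

Because every group begins with its event label, the first element of each group is enough to look up the fill limit for that group.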

Answer 2

Score: 0

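Try this:

# .ffill() replaces fillna(method='ffill'), which is deprecated in recent pandas versions.
df['Event_BOLD_Duration'] = df['Event'].ffill()
# Hard-coded for the sample data: row 10 is the only NaN that should stay unfilled.
df.loc[10, 'Event_BOLD_Duration'] = np.nan
print(df)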

Output:

        Event  TR Event_BOLD_Duration
0       SRNeg  17               SRNeg
1         NaN  18               SRNeg
2         NaN  19               SRNeg
3         NaN  20               SRNeg
4         NaN  21               SRNeg
5         NaN  22               SRNeg
6         NaN  23               SRNeg
7         NaN  24               SRNeg
8      Rating  25              Rating
9         NaN  26              Rating
10        NaN  27                 NaN
11  ArrowTask  28           ArrowTask
12        NaN  29           ArrowTask
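
Resetting a fixed row index only works for this exact sample. A more general variant is sketched below, assuming the per-event limits implied by the question (7 for SRNeg and SINeg, 4 for ArrowTask, 1 for Rating): forward-fill everything, then blank out the rows that sit further from their event than the limit allows. The names limits, group_id, offset, filled and allowed are illustrative, and treating events without a configured limit as "keep only the event row" is an assumption.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    'Event': ['SRNeg'] + [np.nan] * 7 + ['Rating', np.nan, np.nan, 'ArrowTask', np.nan],
    'TR': list(range(17, 30)),
})

# Assumed limits: how many rows after each event may be filled.
limits = {'SRNeg': 7, 'SINeg': 7, 'ArrowTask': 4, 'Rating': 1}

# One group per event (the event row plus the NaNs that follow it).
group_id = df['Event'].notna().cumsum()
# Distance from the event row: 0 for the event itself, 1 for the next row, ...
offset = df.groupby(group_id).cumcount()

# Fill without any limit, then look up the limit that applies to each row.
filled = df['Event'].ffill()
# Events without a configured limit get 0, so only their own row is kept.
allowed = filled.map(limits).fillna(0)

# Keep a filled value only while it is within the allowed distance.
df['Event_BOLD_Duration'] = filled.where(offset <= allowed)
print(df)
```

On the sample data this leaves row 10 as NaN without referring to its index, because it is two rows after 'Rating' and the limit for 'Rating' is 1.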

huangapple
  • Posted on June 29, 2023 at 21:57:02
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76581727.html