Removing rows that do not meet a condition
Question
I have a dataframe like this:
Date | Name | Score |
---|---|---|
Jan-1-2020 | Jake | 50 |
Jan-2-2020 | Jake | 30 |
Feb-1-2020 | Paul | 30 |
Jan-3-2020 | Jake | 30 |
Jan-2-2020 | Paul | 25 |
For each individual, I want to determine whether they scored below 35% on 2 or more consecutive days.
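First, I sorted the table by Name and Date (parsing Date first, so the sort is chronological rather than alphabetical):
Data["Date"] = pd.to_datetime(Data["Date"], format="%b-%d-%Y")
Data = Data.sort_values(["Name", "Date"])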
Date | Name | Score |
---|---|---|
Jan-1-2020 | Jake | 50 |
Jan-2-2020 | Jake | 30 |
Jan-3-2020 | Jake | 30 |
Jan-2-2020 | Paul | 25 |
Feb-1-2020 | Paul | 30 |
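Sorting only yields chronological order when Date holds real datetimes. The minimal sketch below rebuilds the question's sample data by hand (the names `data`, `by_string` and `by_date` are illustrative) and contrasts sorting the raw strings with sorting after `pd.to_datetime`:

```python
import pandas as pd

# Rebuild the sample data from the question
data = pd.DataFrame({
    "Date": ["Jan-1-2020", "Jan-2-2020", "Feb-1-2020", "Jan-3-2020", "Jan-2-2020"],
    "Name": ["Jake", "Jake", "Paul", "Jake", "Paul"],
    "Score": [50, 30, 30, 30, 25],
})

# Sorting the raw strings is lexicographic: "Feb-1-2020" sorts before "Jan-2-2020"
by_string = data.sort_values(["Name", "Date"])

# Parse to datetime first so the sort is chronological within each person
data["Date"] = pd.to_datetime(data["Date"], format="%b-%d-%Y")
by_date = data.sort_values(["Name", "Date"])
print(by_date)
```

With the raw strings, Paul's Feb-1-2020 row lands before Jan-2-2020, and a date such as Jan-10-2020 would also sort before Jan-2-2020; parsing first avoids both problems.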
I want to obtain one row for each individual that shows their minimum score over a period of 2 or more consecutive days:
Date | Name | Score |
---|---|---|
Jan-3-2020 | Jake | 30 |
Answer 1
Score: 1
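You can use rolling.sum to count, per person, the number of scores <= 35 in each 2-day window (the current day and the day before):
import pandas as pd

# Parse the "Jan-1-2020"-style strings into real datetimes
df['Date'] = pd.to_datetime(df['Date'], format='%b-%d-%Y')
idx = (df
       .sort_values(by='Date')
       # True where the score is <= 35
       .assign(flag=lambda d: d['Score'].le(35))
       .groupby('Name', group_keys=False)
       # for each person, count the flagged rows in the 2-day window ending at each row
       .apply(lambda g: g.rolling('2D', on='Date')['flag'].sum())
      )
# idx shares df's index, so select by idx's index labels, not by its count values
print(df.loc[idx[idx >= 2].index])
Output:
Date Name Score
3 2020-01-03 Jake 30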
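The rolling count flags the row where a qualifying 2-day window ends. If the goal is literally the question's expected output (one row per person holding the minimum score within a run of 2 or more consecutive low days), a different technique, grouping consecutive dates into runs, can be sketched as follows. It assumes one row per person per day, treats "low" as Score <= 35 as the answer does, and breaks ties on the minimum in favor of the later date so that it reproduces the Jan-3-2020 row; the names `low`, `run`, `in_long_run` and `result` are illustrative.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Date": ["Jan-1-2020", "Jan-2-2020", "Feb-1-2020", "Jan-3-2020", "Jan-2-2020"],
    "Name": ["Jake", "Jake", "Paul", "Jake", "Paul"],
    "Score": [50, 30, 30, 30, 25],
})
df["Date"] = pd.to_datetime(df["Date"], format="%b-%d-%Y")
df = df.sort_values(["Name", "Date"])

# Keep only the low-score days (assumption: "low" means Score <= 35)
low = df[df["Score"].le(35)].copy()

# A new run starts at each person's first low day (gap is NaT there),
# or whenever the gap to that person's previous low day is not exactly one day
gap = low.groupby("Name")["Date"].diff()
low["run"] = gap.ne(pd.Timedelta(days=1)).cumsum()

# Keep only runs that span 2 or more consecutive days
run_len = low.groupby("run")["Date"].transform("count")
in_long_run = low[run_len >= 2]

# One row per person: lowest score, ties broken toward the later date
result = (in_long_run
          .sort_values(["Score", "Date"], ascending=[True, False])
          .drop_duplicates("Name")
          .drop(columns="run"))
print(result)
```

Paul drops out because his two low days (Jan-2 and Feb-1) are not consecutive. For a run of three or more low days, this version reports the minimum over the whole run rather than one row per window.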