If the first value is zero in one DataFrame, set previous values to 1 in another DataFrame based on a condition
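Question
I have two DataFrames, df1 and df2, and I want to change values in df2 based on a condition from df1.
df1:
  name       date  flag
0  abc  4/11/2023     1
1  xyz   2/8/2023     0
df2:
  name       date  flag
0  xyz   2/6/2023     0
1  xyz   2/7/2023     0
2  xyz   2/8/2023     0
3  xyz   2/9/2023     1
4  xyz  2/10/2023     1
5  xyz  2/11/2023     1
6  xyz  2/12/2023     1
7  xyz  2/13/2023     1
In df1, the flag for 'xyz' is 0 on 2/8/2023, so in df2 the flag should be set to 1 for rows whose date is earlier than that date in df1.
Expected output: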
I am new to Python and would like to do this using pandas functions.
Answer 1
Score: 1
The exact logic is unclear, but you can use merge_asof to determine whether there is a match per name with a later date:
import pandas as pd

# ensure datetime
df1['date'] = pd.to_datetime(df1['date'], dayfirst=False)
df2['date'] = pd.to_datetime(df2['date'], dayfirst=False)

# for each df2 row, look up the next strictly later df1 date for the same name
out = (pd.merge_asof(df2.reset_index().sort_values(by='date'),
                     df1.sort_values(by='date'),
                     by='name', on='date', direction='forward',
                     allow_exact_matches=False
                     )
         .set_index('index').reindex(df2.index)
         # set flag to 1 wherever a later df1 date was found
         .assign(flag=lambda d: d.pop('flag_x').mask(d.pop('flag_y').notna(), 1))
      )
Output:
      name       date  flag
index
0      xyz 2023-02-06     1
1      xyz 2023-02-07     1
2      xyz 2023-02-08     0
3      xyz 2023-02-09     1
4      xyz 2023-02-10     1
5      xyz 2023-02-11     1
6      xyz 2023-02-12     1
7      xyz 2023-02-13     1
Intermediate before the assign:
      name       date  flag_x  flag_y
index
0      xyz 2023-02-06       0     0.0
1      xyz 2023-02-07       0     0.0
2      xyz 2023-02-08       0     NaN
3      xyz 2023-02-09       1     NaN
4      xyz 2023-02-10       1     NaN
5      xyz 2023-02-11       1     NaN
6      xyz 2023-02-12       1     NaN
7      xyz 2023-02-13       1     NaN
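Note that you can use more complex logic if needed: the value in "flag_y" is the flag of the matching df1 row (here the 2023-02-08 row, matched for indices 0 and 1).
Only one date per name in df1, or only considering the max date
If df1 has only one date per name, you can simplify to: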
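As one hedged illustration of such "more complex logic": if the intent is to update df2 only when the matching df1 row has flag 0 (an assumption drawn from the question's wording, not something the answer confirms), the matched "flag_y" value can be tested directly instead of only checking that a match exists. The sample frames are rebuilt by hand from the question, and update_flags is a hypothetical helper name.

```python
import pandas as pd

# Sample data rebuilt from the question (dates are month/day/year).
df1 = pd.DataFrame({
    "name": ["abc", "xyz"],
    "date": pd.to_datetime(["4/11/2023", "2/8/2023"], dayfirst=False),
    "flag": [1, 0],
})
df2 = pd.DataFrame({
    "name": ["xyz"] * 8,
    "date": pd.to_datetime([f"2/{d}/2023" for d in range(6, 14)], dayfirst=False),
    "flag": [0, 0, 0, 1, 1, 1, 1, 1],
})


def update_flags(df1, df2):
    """Hypothetical helper: set flag to 1 only where the next later df1 row has flag 0."""
    merged = (
        pd.merge_asof(
            df2.reset_index().sort_values(by="date"),
            df1.sort_values(by="date"),
            by="name", on="date", direction="forward",
            allow_exact_matches=False,
        )
        .set_index("index")
        .reindex(df2.index)
    )
    # flag_y holds the flag of the matched (strictly later) df1 row, NaN if no match
    hit = merged["flag_y"].eq(0)
    return df2.assign(flag=df2["flag"].mask(hit, 1))


out = update_flags(df1, df2)
print(out["flag"].tolist())  # [1, 1, 0, 1, 1, 1, 1, 1]
```

On the sample data this gives the same result as the answer's code, because the only matching df1 row has flag 0. The two would differ if df1 had flag 1 for 'xyz', in which case this sketch leaves df2 unchanged.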
df1['date'] = pd.to_datetime(df1['date'], dayfirst=False)
df2['date'] = pd.to_datetime(df2['date'], dayfirst=False)
# True where the df2 date is earlier than the df1 date for the same name
m = df2['date'].lt(df2['name'].map(df1.set_index('name')['date']))
df2.loc[m, 'flag'] = 1
Or, if there are several dates and you only want to consider the max date per name:
# compare each df2 date with the latest df1 date for the same name
m = df2['date'].lt(df2['name'].map(df1.groupby('name')['date'].max()))
df2.loc[m, 'flag'] = 1
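For reference, here is a self-contained sketch of the map-based variant on the sample data from the question. The frames are rebuilt by hand, and the intermediate name cutoff is introduced only for readability. It uses the groupby/max form, which covers both the one-date-per-name case and the several-dates case.

```python
import pandas as pd

# Sample data rebuilt from the question (dates are month/day/year).
df1 = pd.DataFrame({
    "name": ["abc", "xyz"],
    "date": pd.to_datetime(["4/11/2023", "2/8/2023"], dayfirst=False),
    "flag": [1, 0],
})
df2 = pd.DataFrame({
    "name": ["xyz"] * 8,
    "date": pd.to_datetime([f"2/{d}/2023" for d in range(6, 14)], dayfirst=False),
    "flag": [0, 0, 0, 1, 1, 1, 1, 1],
})

# Latest df1 date per name, looked up for every df2 row;
# names absent from df1 would map to NaT.
cutoff = df2["name"].map(df1.groupby("name")["date"].max())

# Comparisons against NaT evaluate to False, so unmatched names are left alone.
m = df2["date"].lt(cutoff)
df2.loc[m, "flag"] = 1

print(df2["flag"].tolist())  # [1, 1, 0, 1, 1, 1, 1, 1]
```

Like the answer's code, this compares dates only and ignores the flag value stored in df1.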