if first value is zero in one dataframe set previous values to 1 in another dataframe on condition

Question

I have two DataFrames, df1 and df2, and I want to change the values in df2 based on a condition from df1.

df1:

  name       date  flag
0  abc  4/11/2023     1
1  xyz   2/8/2023     0

df2:

  name       date  flag
0  xyz   2/6/2023     0
1  xyz   2/7/2023     0
2  xyz   2/8/2023     0
3  xyz   2/9/2023     1
4  xyz  2/10/2023     1
5  xyz  2/11/2023     1
6  xyz  2/12/2023     1
7  xyz  2/13/2023     1

In df1, the flag for 'xyz' is 0 on 2/8/2023, so in df2 the 'xyz' rows dated earlier than that date should have their flag set to 1.

Expected output: [image not preserved]

I am new to Python and want to do this using pandas functions.

Answer 1

Score: 1

The exact logic is unclear, but you need to use merge_asof to determine whether, for each name, there is a match in df1 with a later date:

import pandas as pd

# ensure datetime (month-first, so 2/8/2023 is February 8)
df1['date'] = pd.to_datetime(df1['date'], dayfirst=False)
df2['date'] = pd.to_datetime(df2['date'], dayfirst=False)

out = (pd.merge_asof(df2.reset_index().sort_values(by='date'),
                     df1.sort_values(by='date'),
                     by='name', on='date', direction='forward',
                     allow_exact_matches=False
                     )
         .set_index('index').reindex(df2.index)
         .assign(flag=lambda d: d.pop('flag_x').mask(d.pop('flag_y').notna(), 1))
      )

Output:

      name       date  flag
index                      
0      xyz 2023-02-06     1
1      xyz 2023-02-07     1
2      xyz 2023-02-08     0
3      xyz 2023-02-09     1
4      xyz 2023-02-10     1
5      xyz 2023-02-11     1
6      xyz 2023-02-12     1
7      xyz 2023-02-13     1

Intermediate before the assign:

      name       date  flag_x  flag_y
index                                
0      xyz 2023-02-06       0     0.0
1      xyz 2023-02-07       0     0.0
2      xyz 2023-02-08       0     NaN
3      xyz 2023-02-09       1     NaN
4      xyz 2023-02-10       1     NaN
5      xyz 2023-02-11       1     NaN
6      xyz 2023-02-12       1     NaN
7      xyz 2023-02-13       1     NaN

Note that you can use more complex logic if needed: the value in "flag_y" is the flag of the matching df1 row (here the 2023-02-08 row, matched for indices 0 and 1).
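As one possible reading of "more complex logic", the sketch below only sets the flag to 1 when the matched df1 row has flag 0, which is how the question phrases the condition. The sample data is rebuilt from the question; the suffixes and the names merged, hit and out are illustrative choices, not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question
df1 = pd.DataFrame({
    "name": ["abc", "xyz"],
    "date": ["4/11/2023", "2/8/2023"],
    "flag": [1, 0],
})
df2 = pd.DataFrame({
    "name": ["xyz"] * 8,
    "date": ["2/6/2023", "2/7/2023", "2/8/2023", "2/9/2023",
             "2/10/2023", "2/11/2023", "2/12/2023", "2/13/2023"],
    "flag": [0, 0, 0, 1, 1, 1, 1, 1],
})

# Assumption: dates are month/day/year
df1["date"] = pd.to_datetime(df1["date"], format="%m/%d/%Y")
df2["date"] = pd.to_datetime(df2["date"], format="%m/%d/%Y")

# For each df2 row, find the next strictly later df1 row with the same name
merged = (
    pd.merge_asof(
        df2.reset_index().sort_values("date"),
        df1.sort_values("date"),
        on="date", by="name",
        direction="forward", allow_exact_matches=False,
        suffixes=("_df2", "_df1"),
    )
    .set_index("index")
    .reindex(df2.index)  # restore the original row order of df2
)

# Only rows whose matched df1 flag is 0 are updated (NaN compares as False)
hit = merged["flag_df1"].eq(0)
out = df2.copy()
out.loc[hit, "flag"] = 1

print(out)
```

With the question's data this gives the same flags as the simpler mask, because the only matched df1 row has flag 0. The two would differ if a matched row had flag 1.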

Only one date per name in df1, or only considering the max date

If df1 only has one date per name, then you can simplify to:

df1['date'] = pd.to_datetime(df1['date'], dayfirst=False)
df2['date'] = pd.to_datetime(df2['date'], dayfirst=False)

# rows of df2 dated strictly before the date recorded for the same name in df1
m = df2['date'].lt(df2['name'].map(df1.set_index('name')['date']))
df2.loc[m, 'flag'] = 1
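The single-date shortcut can be tried end to end with a small self-contained sketch. The data is a shortened version of the question's df2, plus one hypothetical row whose name ("new") does not appear in df1, to show that unmatched names are left untouched. The variable cutoff is an illustrative name.

```python
import pandas as pd

df1 = pd.DataFrame({
    "name": ["abc", "xyz"],
    "date": pd.to_datetime(["4/11/2023", "2/8/2023"], format="%m/%d/%Y"),
    "flag": [1, 0],
})
df2 = pd.DataFrame({
    "name": ["xyz", "xyz", "xyz", "xyz", "new"],  # "new" is a made-up name
    "date": pd.to_datetime(
        ["2/6/2023", "2/7/2023", "2/8/2023", "2/9/2023", "2/6/2023"],
        format="%m/%d/%Y",
    ),
    "flag": [0, 0, 0, 1, 0],
})

# Look up the df1 date for each df2 row by name; names missing from df1 give NaT
cutoff = df2["name"].map(df1.set_index("name")["date"])

# Comparisons against NaT are False, so unmatched names are never selected
m = df2["date"].lt(cutoff)
df2.loc[m, "flag"] = 1

print(df2)
```

Note that the lookup relies on each name appearing only once in df1. With duplicate names, the mapping through a Series with a non-unique index is expected to fail.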

Or, if there are several dates and you only want to consider the max date per name:

m = df2['date'].lt(df2['name'].map(df1.groupby('name')['date'].max()))
df2.loc[m, 'flag'] = 1
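A minimal sketch of the max-date variant, using hypothetical data in which 'xyz' has two dates in df1 (this case does not occur in the question's sample). The name latest is illustrative.

```python
import pandas as pd

# Hypothetical df1 with two dates for "xyz"
df1 = pd.DataFrame({
    "name": ["xyz", "xyz", "abc"],
    "date": pd.to_datetime(["2/8/2023", "2/10/2023", "4/11/2023"], format="%m/%d/%Y"),
    "flag": [0, 0, 1],
})
df2 = pd.DataFrame({
    "name": ["xyz"] * 4,
    "date": pd.to_datetime(
        ["2/7/2023", "2/9/2023", "2/10/2023", "2/11/2023"], format="%m/%d/%Y"
    ),
    "flag": [0, 0, 0, 0],
})

# One row per name: the latest date seen in df1
latest = df1.groupby("name")["date"].max()

# Flag df2 rows dated strictly before that latest date
m = df2["date"].lt(df2["name"].map(latest))
df2.loc[m, "flag"] = 1

print(df2)
```

Here the rows dated 2/7 and 2/9 are set to 1 because they precede 2/10, the latest 'xyz' date in df1, while 2/10 and 2/11 stay at 0.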

huangapple
  • Published on July 3, 2023, 19:09:43
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76604175.html