July 3, 2023 19:09:43


If the first value is zero in one dataframe, set previous values to 1 in another dataframe on condition

Question


I have two dataframes, df1 and df2, and I want to change the values in df2 based on a condition from df1.

df1

  name       date  flag
0  abc  4/11/2023     1
1  xyz   2/8/2023     0

df2:

  name       date  flag
0  xyz   2/6/2023     0
1  xyz   2/7/2023     0
2  xyz   2/8/2023     0
3  xyz   2/9/2023     1
4  xyz  2/10/2023     1
5  xyz  2/11/2023     1
6  xyz  2/12/2023     1
7  xyz  2/13/2023     1
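
For anyone who wants to reproduce the example, the two frames above could be built roughly like this. This is only a sketch: the construction code is not part of the original post, and the dates are kept as month/day/year strings exactly as they appear in the tables.

```python
import pandas as pd

# Sample frames copied from the tables above.
# Dates are left as month/day/year strings, as in the post.
df1 = pd.DataFrame({
    "name": ["abc", "xyz"],
    "date": ["4/11/2023", "2/8/2023"],
    "flag": [1, 0],
})

df2 = pd.DataFrame({
    "name": ["xyz"] * 8,
    "date": [f"2/{day}/2023" for day in range(6, 14)],
    "flag": [0, 0, 0, 1, 1, 1, 1, 1],
})

print(df1)
print(df2)
```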

In df1, the flag for 'xyz' is 0 on 2/8/2023, so in df2 the 'xyz' rows with dates earlier than that date should have their flag set to 1.

expected output

I am new to Python and want to do this using pandas functions.

Answer 1

Score: 1

The exact logic is unclear, but you can use merge_asof to determine whether, for each name, there is a match with a later date:

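import pandas as pd

# ensure datetime (the dates are month-first)
df1['date'] = pd.to_datetime(df1['date'], dayfirst=False)
df2['date'] = pd.to_datetime(df2['date'], dayfirst=False)

# for each df2 row, look up the next strictly later df1 date with the same name
out = (pd.merge_asof(df2.reset_index().sort_values(by='date'),
                     df1.sort_values(by='date'),
                     by='name', on='date', direction='forward',
                     allow_exact_matches=False
                     )
         # restore the original row order of df2
         .set_index('index').reindex(df2.index)
         # where a later df1 date was found (flag_y is not NaN), set flag to 1
         .assign(flag=lambda d: d.pop('flag_x').mask(d.pop('flag_y').notna(), 1))
      )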

Output:

      name       date  flag
index                      
0      xyz 2023-02-06     1
1      xyz 2023-02-07     1
2      xyz 2023-02-08     0
3      xyz 2023-02-09     1
4      xyz 2023-02-10     1
5      xyz 2023-02-11     1
6      xyz 2023-02-12     1
7      xyz 2023-02-13     1

Intermediate before the assign:

      name       date  flag_x  flag_y
index                                  
0      xyz 2023-02-06       0     0.0
1      xyz 2023-02-07       0     0.0
2      xyz 2023-02-08       0     NaN
3      xyz 2023-02-09       1     NaN
4      xyz 2023-02-10       1     NaN
5      xyz 2023-02-11       1     NaN
6      xyz 2023-02-12       1     NaN
7      xyz 2023-02-13       1     NaN

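Note that you can use more complex logic if needed: the value in "flag_y" is the flag of the matching df1 row (here, the row dated 2023-02-08, which is matched for indices 0 and 1).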
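
As one possible illustration of such more complex logic, the sketch below only overwrites the flag when the matched df1 row itself has flag 0, which is one reading of the question. The sample frames are rebuilt from the question's tables, and the names `merged` and `out` are chosen here for illustration.

```python
import pandas as pd

# Sample data rebuilt from the question (dates parsed as month/day/year).
df1 = pd.DataFrame({
    "name": ["abc", "xyz"],
    "date": pd.to_datetime(["4/11/2023", "2/8/2023"], format="%m/%d/%Y"),
    "flag": [1, 0],
})
df2 = pd.DataFrame({
    "name": ["xyz"] * 8,
    "date": pd.to_datetime([f"2/{d}/2023" for d in range(6, 14)],
                           format="%m/%d/%Y"),
    "flag": [0, 0, 0, 1, 1, 1, 1, 1],
})

# Same forward as-of merge as in the answer, kept as an intermediate frame.
merged = (pd.merge_asof(df2.reset_index().sort_values(by="date"),
                        df1.sort_values(by="date"),
                        by="name", on="date", direction="forward",
                        allow_exact_matches=False)
            .set_index("index").reindex(df2.index))

# Overwrite only where the matched df1 row has flag 0.
# Rows without a match have NaN in flag_y; NaN == 0 is False,
# so those rows keep their original flag.
out = df2.assign(flag=merged["flag_x"].mask(merged["flag_y"].eq(0), 1))
print(out)
```

With the sample data this gives the same flags as the simpler `notna()` version, because the only matching df1 row has flag 0. The two versions would differ if a matched df1 row had flag 1.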

Only one date per name in `df1`, or only considering the max date

If df1 has only one date per name, you can simplify to:

df1['date'] = pd.to_datetime(df1['date'], dayfirst=False)
df2['date'] = pd.to_datetime(df2['date'], dayfirst=False)
m = df2['date'].lt(df2['name'].map(df1.set_index('name')['date']))
df2.loc[m, 'flag'] = 1

Or, if there are several dates and you only want to consider the max date per name:

m = df2['date'].lt(df2['name'].map(df1.groupby('name')['date'].max()))
df2.loc[m, 'flag'] = 1
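
A self-contained sketch of the max-date variant follows. The extra 'xyz' row dated 2/7/2023 in df1 is hypothetical, added only so that there are several dates per name; the name `cutoff` is also an assumption made for readability.

```python
import pandas as pd

# df1 with a hypothetical second 'xyz' date, to have several dates per name.
df1 = pd.DataFrame({
    "name": ["abc", "xyz", "xyz"],
    "date": pd.to_datetime(["4/11/2023", "2/7/2023", "2/8/2023"],
                           format="%m/%d/%Y"),
    "flag": [1, 0, 0],
})
df2 = pd.DataFrame({
    "name": ["xyz"] * 8,
    "date": pd.to_datetime([f"2/{d}/2023" for d in range(6, 14)],
                           format="%m/%d/%Y"),
    "flag": [0, 0, 0, 1, 1, 1, 1, 1],
})

# Latest df1 date per name; the result has one entry per name,
# so it can be used safely with Series.map.
cutoff = df1.groupby("name")["date"].max()

# True where the df2 date is strictly earlier than the cutoff for its name.
m = df2["date"].lt(df2["name"].map(cutoff))
df2.loc[m, "flag"] = 1
print(df2)
```

The groupby step matters here because mapping through `df1.set_index('name')['date']` needs unique names in df1. Reducing to one date per name first avoids that problem.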

