问题

你有两个表格，表格1包含低频数据，表格2包含高频数据。

你想要将对应的低频数据填入高频数据表格。

要求是将表格2的event_date列中的日期与表格1的start_date和end_date之间的日期进行匹配。

如果日期在这个范围内，应将低频数据表中的相应行插入到高频数据表的相应行，确保两个表中的类型一致。

我应该如何使用Polars来实现这个操作？

英文:

You have two tables, with Table 1 containing low-frequency data and Table 2 containing high-frequency data.

You want to fill the corresponding low-frequency data into the high-frequency data table.

The requirement is to match the dates in the event_date column of Table 2 with the dates between the start_date and end_date in Table 1.

If a date falls within this range, the corresponding row from the low-frequency data should be inserted into the corresponding row of the high-frequency data, ensuring that the type in both tables aligns.

How should I use Polars to do this?

Table 1:

shape: (21_803_347, 7)
┌─────────┬────────────┬──────┬─────────┬─────────┬────────────┬────────────┐
│ type    ┆ year       ┆ qtr  ┆ rating1 ┆ rating2 ┆ start_date ┆ end_date   │
│ ---     ┆ ---        ┆ ---  ┆ ---     ┆ ---     ┆ ---        ┆ ---        │
│ str     ┆ i16        ┆ i8   ┆ f64     ┆ f64     ┆ date       ┆ date       │
╞═════════╪════════════╪══════╪═════════╪═════════╪════════════╪════════════╡
│ 000001  ┆ 1995       ┆ 4    ┆ null    ┆ 0.04    ┆ 1996-03-14 ┆ 1996-08-28 │
│ 000001  ┆ 1996       ┆ 4    ┆ null    ┆ 0.04    ┆ 1997-04-21 ┆ 1997-08-27 │
│ 000001  ┆ 1997       ┆ 2    ┆ null    ┆ 0.03    ┆ 1997-08-28 ┆ 1998-03-11 │
│ 000001  ┆ 1997       ┆ 4    ┆ null    ┆ 0.02    ┆ 1998-03-12 ┆ 1998-08-25 │
│ 000001  ┆ 1998       ┆ 2    ┆ null    ┆ 0.05    ┆ 1998-08-26 ┆ 1999-04-22 │
│ …       ┆ …          ┆ …    ┆ …       ┆ …       ┆ …          ┆ …          │
│ 600883  ┆ 1998       ┆ 2    ┆ null    ┆ null    ┆ 1998-08-14 ┆ 1999-03-22 │
│ 000792  ┆ 2000       ┆ 2    ┆ null    ┆ null    ┆ 2000-08-22 ┆ 2001-03-13 │
│ 000973  ┆ 2023       ┆ 1    ┆ null    ┆ null    ┆ 2023-04-29 ┆ 2023-08-26 │
│ 300019  ┆ 2009       ┆ 4    ┆ null    ┆ null    ┆ 2010-02-10 ┆ 2010-04-25 │
│ 600919  ┆ 2016       ┆ 4    ┆ null    ┆ null    ┆ 2017-03-20 ┆ 2017-04-26 │
└─────────┴────────────┴──────┴─────────┴─────────┴────────────┴────────────┘

Table 2:

shape: (14_061_391, 5)
┌─────────┬────────────┬─────────┬─────────┬─────────┐
│ type    ┆ event_date ┆ rating1 ┆ rating2 ┆ rating3 │
│ ---     ┆ ---        ┆ ---     ┆ ---     ┆ ---     │
│ str     ┆ date       ┆ f64     ┆ f64     ┆ f64     │
╞═════════╪════════════╪═════════╪═════════╪═════════╡
│ 000001  ┆ 1991-06-26 ┆ null    ┆ 0.008   ┆ null    │
│ 000001  ┆ 1991-06-27 ┆ null    ┆ 0.008   ┆ null    │
│ 000001  ┆ 1991-06-28 ┆ null    ┆ 0.008   ┆ null    │
│ 000001  ┆ 1991-07-01 ┆ null    ┆ 0.008   ┆ null    │
│ 000001  ┆ 1991-07-02 ┆ null    ┆ 0.008   ┆ null    │
│ …       ┆ …          ┆ …       ┆ …       ┆ …       │
│ 900957  ┆ 2023-04-24 ┆ 0.259   ┆ 0.035   ┆ 0.015   │
│ 900957  ┆ 2023-04-25 ┆ 0.234   ┆ 0.035   ┆ 0.015   │
│ 900957  ┆ 2023-04-26 ┆ 0.255   ┆ 0.035   ┆ 0.015   │
│ 900957  ┆ 2023-04-27 ┆ 0.238   ┆ 0.035   ┆ 0.015   │
│ 900957  ┆ 2023-04-28 ┆ 0.234   ┆ 0.035   ┆ 0.015   │
└─────────┴────────────┴─────────┴─────────┴─────────┘

答案1

得分: 2

执行左连接，只连接类型列，然后应用日期处理的筛选条件，为了提高效率，可以使用延迟模式：

df1.lazy().join(df2.lazy(), on="type", how="left").filter(pl.col("event_date").is_between("start_date", "end_date")).collect()

英文:

It is probably easiest to do a left join on just the type column, and then apply a filter for the date handling. To make this efficient, you can use lazy mode:

df1.lazy().join(df2.lazy(), on=&quot;type&quot;, how=&quot;left&quot;).filter(pl.col(&quot;event_date&quot;).is_between(&quot;start_date&quot;, &quot;end_date&quot;)).collect()

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

如何根据日期范围在 polars 中合并两个 DataFrame？

问题

答案1

Polars数据框删除NaN值

自定义解码二进制数据在Polars中

Given a row with a list too big to explode(), how do I do a preparatory explode() to bring list size down to fit into memory?

如何按列表中的任何正则表达式模式匹配任何字符串列来过滤DataFrame？

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

发表评论