How to fill missing dates in a Polars DataFrame (Python)?

Question

I can't seem to find a Polars equivalent for this. Basically, I want to fill in the missing dates between two dates in a big dataframe. It has to be Polars because of the size of the data (> 100 million rows).

Below is the code I use with Pandas. How can I do the same thing in Polars?

import janitor  # pyjanitor; registers DataFrame.complete
import pandas as pd
from datetime import datetime, timedelta


def missing_date_filler(d):
    df = d.copy()

    time_back = 1  # Look back in days
    td = pd.to_datetime(datetime.now().strftime("%Y-%m-%d"))
    helper = timedelta(days=time_back)

    max_date = (td - helper).strftime("%Y-%m-%d")  # Today's date minus 1 day

    # Full date range from the earliest date up until yesterday
    df_date = dict(Date=pd.date_range(df.Date.min(), max_date, freq="1D"))

    # Fill in the missing dates
    df = df.complete(["Col_A", "Col_B"], df_date).sort_values("Date")

    return df

Answer 1

Score: 3

It sounds like you're looking for .upsample().

Note that you can use the by parameter (renamed group_by in more recent Polars releases) to perform the operation on a per-group basis.

import polars as pl
from datetime import datetime

df = pl.DataFrame({
   "date": [datetime(2023, 1, 2), datetime(2023, 1, 5)],
   "value": [1, 2]
})

shape: (2, 2)
┌─────────────────────┬───────┐
│ date                ┆ value │
│ ---                 ┆ ---   │
│ datetime[μs]        ┆ i64   │
╞═════════════════════╪═══════╡
│ 2023-01-02 00:00:00 ┆ 1     │
│ 2023-01-05 00:00:00 ┆ 2     │
└─────────────────────┴───────┘

>>> df.upsample(time_column="date", every="1d")
shape: (4, 2)
┌─────────────────────┬───────┐
│ date                ┆ value │
│ ---                 ┆ ---   │
│ datetime[μs]        ┆ i64   │
╞═════════════════════╪═══════╡
│ 2023-01-02 00:00:00 ┆ 1     │
│ 2023-01-03 00:00:00 ┆ null  │
│ 2023-01-04 00:00:00 ┆ null  │
│ 2023-01-05 00:00:00 ┆ 2     │
└─────────────────────┴───────┘
