Reference polars.DataFrame.height in with_columns

Question

Take this example:

import datetime

import numpy
import polars

df = (polars
  .DataFrame(dict(
    j=polars.date_range(datetime.date(2023, 1, 1), datetime.date(2023, 1, 3), '8h', closed='left', eager=True),
    ))
  .with_columns(
    # 6 is the hard-coded number of rows
    k=polars.lit(numpy.random.randint(10, 99, 6)),
    )
  )

 j                    k
 2023-01-01 00:00:00  47
 2023-01-01 08:00:00  22
 2023-01-01 16:00:00  82
 2023-01-02 00:00:00  19
 2023-01-02 08:00:00  85
 2023-01-02 16:00:00  15
shape: (6, 2)

Here, numpy.random.randint(10, 99, 6) hard-codes 6 as the height of the DataFrame, so it breaks if I change, e.g., the interval from 8h to 4h (which would require changing 6 to 12).

I know I can do it by breaking the chain:

df = polars.DataFrame(dict(
  j=polars.date_range(datetime.date(2023, 1, 1), datetime.date(2023, 1, 3), '4h', closed='left', eager=True),
  ))

df = df.with_columns(
  k=polars.lit(numpy.random.randint(10, 99, df.height)),
  )

 j                    k
 2023-01-01 00:00:00  47
 2023-01-01 04:00:00  22
 2023-01-01 08:00:00  82
 2023-01-01 12:00:00  19
 2023-01-01 16:00:00  85
 2023-01-01 20:00:00  15
 2023-01-02 00:00:00  89
 2023-01-02 04:00:00  74
 2023-01-02 08:00:00  26
 2023-01-02 12:00:00  11
 2023-01-02 16:00:00  86
 2023-01-02 20:00:00  81
shape: (12, 2)

Is there a way to do it (i.e. reference df.height or an equivalent) in one chained expression though?

Answer 1

Score: 2

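You can use .pipe(), which passes the frame built so far to a function, so df.height is available inside the chain:

import datetime

import numpy as np
import polars as pl

(
   pl.date_range(
      datetime.date(2023, 1, 1),
      datetime.date(2023, 1, 3),
      '4h',
      closed='left',
      eager=True
   )
   .to_frame()
   # the lambda receives the DataFrame, so its height can be read here
   .pipe(lambda df:
      df.with_columns(
         rand=pl.lit(np.random.randint(10, 99, df.height))
      )
   )
)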
shape: (12, 2)
┌─────────────────────┬──────┐
│ date                ┆ rand │
│ ---                 ┆ ---  │
│ datetime[μs]        ┆ i64  │
╞═════════════════════╪══════╡
│ 2023-01-01 00:00:00 ┆ 39   │
│ 2023-01-01 04:00:00 ┆ 45   │
│ 2023-01-01 08:00:00 ┆ 95   │
│ 2023-01-01 12:00:00 ┆ 72   │
│ …                   ┆ …    │
│ 2023-01-02 08:00:00 ┆ 34   │
│ 2023-01-02 12:00:00 ┆ 42   │
│ 2023-01-02 16:00:00 ┆ 30   │
│ 2023-01-02 20:00:00 ┆ 83   │
└─────────────────────┴──────┘
