Reference polars.DataFrame.height in with_columns
Question
Take this example:
import datetime
import numpy
import polars

df = (polars
    .DataFrame(dict(
        j=polars.date_range(datetime.date(2023, 1, 1), datetime.date(2023, 1, 3), '8h', closed='left', eager=True),
    ))
    .with_columns(
        # 6 is the hard-coded number of rows
        k=polars.lit(numpy.random.randint(10, 99, 6)),
    )
)
j k
2023-01-01 00:00:00 47
2023-01-01 08:00:00 22
2023-01-01 16:00:00 82
2023-01-02 00:00:00 19
2023-01-02 08:00:00 85
2023-01-02 16:00:00 15
shape: (6, 2)
Here, numpy.random.randint(10, 99, 6) hard-codes 6 as the height of the DataFrame, so it breaks if I change, e.g., the interval from 8h to 4h (which would require changing 6 to 12).
I know I can do it by breaking the chain:
df = polars.DataFrame(dict(
    j=polars.date_range(datetime.date(2023, 1, 1), datetime.date(2023, 1, 3), '4h', closed='left', eager=True),
))
df = df.with_columns(
    k=polars.lit(numpy.random.randint(10, 99, df.height)),
)
j k
2023-01-01 00:00:00 47
2023-01-01 04:00:00 22
2023-01-01 08:00:00 82
2023-01-01 12:00:00 19
2023-01-01 16:00:00 85
2023-01-01 20:00:00 15
2023-01-02 00:00:00 89
2023-01-02 04:00:00 74
2023-01-02 08:00:00 26
2023-01-02 12:00:00 11
2023-01-02 16:00:00 86
2023-01-02 20:00:00 81
shape: (12, 2)
Is there a way to do it (i.e. reference df.height or an equivalent) in one chained expression, though?
Answer 1
Score: 2
You can use .pipe(), which passes the DataFrame to a function, so df.height is available without breaking the chain:
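import datetime
import numpy as np
import polars as pl

(
    pl.date_range(
        datetime.date(2023, 1, 1),
        datetime.date(2023, 1, 3),
        '4h',
        closed='left',
        eager=True
    )
    .to_frame()
    # the lambda receives the frame built so far, so its height is known here
    .pipe(lambda df:
        df.with_columns(rand =
            pl.lit(np.random.randint(10, 99, df.height))
        )
    )
)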
shape: (12, 2)
┌─────────────────────┬──────┐
│ date ┆ rand │
│ --- ┆ --- │
│ datetime[μs] ┆ i64 │
╞═════════════════════╪══════╡
│ 2023-01-01 00:00:00 ┆ 39 │
│ 2023-01-01 04:00:00 ┆ 45 │
│ 2023-01-01 08:00:00 ┆ 95 │
│ 2023-01-01 12:00:00 ┆ 72 │
│ … ┆ … │
│ 2023-01-02 08:00:00 ┆ 34 │
│ 2023-01-02 12:00:00 ┆ 42 │
│ 2023-01-02 16:00:00 ┆ 30 │
│ 2023-01-02 20:00:00 ┆ 83 │
└─────────────────────┴──────┘