How to extract a value from a nested list column?
Question
Given an aggregated dataframe and an index dataframe, how can I extract one element from each list[<>] column, using a different index for each row?
┌──────┬────────────┬───────────┐
│ read ┆ region ┆ cov │
│ --- ┆ --- ┆ --- │
│ str ┆ list[str] ┆ list[i32] │
╞══════╪════════════╪═══════════╡ df
│ a ┆ ["x", "y"] ┆ [25, 10] │
│ b ┆ ["x", "y"] ┆ [15, 30] │
└──────┴────────────┴───────────┘
┌──────┬─────────┐
│ read ┆ cov_idx │
│ --- ┆ --- │
│ str ┆ u32 │
╞══════╪═════════╡ df_idx
│ a ┆ 0 │
│ b ┆ 1 │
└──────┴─────────┘
The following code generates the example dataframes. In practice, cov is produced by a complicated function, and df_idx is derived from it (via arg_max).
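use polars::prelude::*;
use polars::df;

fn main() -> PolarsResult<()> {
    let df0 = df![
        "read" => ["a", "a", "b", "b"],
        "region" => ["x", "y", "x", "y"],
        "cov" => [25, 10, 15, 30]
    ]?;
    // Aggregate every non-key column into one list per read.
    // (The method is groupby_stable in older Polars releases and
    // group_by_stable in newer ones.)
    let df = df0.lazy()
        .groupby_stable([col("read")])
        .agg([col("*")])
        .collect()?;
    // Per-read position of the element to extract from each list.
    let df_idx = df![
        "read" => ["a", "b"],
        "cov_idx" => [0, 1]
    ]?;
    Ok(())
}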
The expected result is
┌──────┬───────────┬───────────┐
│ read ┆ region ┆ cov │
│ --- ┆ --- ┆ --- │
│ str ┆ str ┆ i32 │
╞══════╪═══════════╪═══════════╡ result_df
│ a ┆ "x" ┆ 25 │
│ b ┆ "y" ┆ 30 │
└──────┴───────────┴───────────┘
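For readers unfamiliar with arg_max, here is a minimal plain-Rust sketch (no Polars) of how an index like cov_idx could be derived from each group's cov list. The helper names arg_max_first and build_index are hypothetical, and resolving ties to the earliest position is an assumption of this sketch, not a statement about how Polars behaves.

```rust
/// Returns the position of the first maximum in `values`, or `None` if empty.
/// Hypothetical helper; ties resolve to the earliest index (an assumption).
fn arg_max_first(values: &[i32]) -> Option<usize> {
    let mut best: Option<(usize, i32)> = None;
    for (i, &v) in values.iter().enumerate() {
        match best {
            // Keep the current best when the new value is not strictly larger.
            Some((_, b)) if v <= b => {}
            _ => best = Some((i, v)),
        }
    }
    best.map(|(i, _)| i)
}

/// Builds (read, cov_idx) pairs from aggregated (read, cov list) rows.
fn build_index(rows: &[(&str, Vec<i32>)]) -> Vec<(String, Option<usize>)> {
    rows.iter()
        .map(|(read, cov)| (read.to_string(), arg_max_first(cov)))
        .collect()
}
```

With the example data, build_index on ("a", [25, 10]) and ("b", [15, 30]) gives index 0 for a and index 1 for b, which matches df_idx above.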
Answer 1
Score: 0
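You can use the lazy API's expr.list().get(idx) to fetch the idx-th element from each list. Because the index differs per row here, first left-join df_idx onto the aggregated frame so that cov_idx is available as a column, then pass col("cov_idx") as the index: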
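use polars::prelude::*;
use polars::df;

fn main() -> PolarsResult<()> {
    let df0 = df![
        "read" => ["a", "a", "b", "b"],
        "region" => ["x", "y", "x", "y"],
        "cov" => [25, 10, 15, 30]
    ]?;
    // Per-read position of the element to extract from each list.
    let df_idx = df![
        "read" => ["a", "b"],
        "cov_idx" => [0, 1]
    ]?
    .lazy();
    let df = df0
        .lazy()
        .groupby_stable([col("read")])
        // Collect every non-key column into one list per group.
        .agg([col("*")])
        // Bring cov_idx alongside the list columns.
        .left_join(df_idx, col("read"), col("read"))
        // Replace each list column with its cov_idx-th element.
        .with_columns(["region", "cov"].map(|c| col(c).list().get(col("cov_idx"))));
    println!("{:?}", df.collect()?);
    Ok(())
}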
Result:
shape: (2, 4)
┌──────┬────────┬─────┬─────────┐
│ read ┆ region ┆ cov ┆ cov_idx │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ str ┆ i32 ┆ i32 │
╞══════╪════════╪═════╪═════════╡
│ a ┆ x ┆ 25 ┆ 0 │
│ b ┆ y ┆ 30 ┆ 1 │
└──────┴────────┴─────┴─────────┘
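To make the semantics of the join followed by list().get(col("cov_idx")) concrete, here is a minimal plain-Rust sketch (no Polars) of the same row-wise selection. AggRow and pick_by_index are hypothetical names introduced for illustration. Returning None for a missing key or an out-of-range index is a design choice of this sketch, loosely mirroring the nulls a left join produces for unmatched rows.

```rust
use std::collections::HashMap;

/// One aggregated row: a key plus parallel list columns (mirrors read/region/cov).
struct AggRow {
    read: String,
    region: Vec<String>,
    cov: Vec<i32>,
}

/// Looks up each row's index by key, then takes that element from every list.
/// Rows with no matching key or an out-of-range index yield `None` fields.
fn pick_by_index(
    rows: &[AggRow],
    index: &HashMap<String, usize>,
) -> Vec<(String, Option<String>, Option<i32>)> {
    rows.iter()
        .map(|row| {
            // The "join": find this row's index by its key.
            let i = index.get(&row.read).copied();
            // The "list get": take the i-th element of each list column.
            let region = i.and_then(|i| row.region.get(i).cloned());
            let cov = i.and_then(|i| row.cov.get(i).copied());
            (row.read.clone(), region, cov)
        })
        .collect()
}
```

Called with the two aggregated rows from the example and the index map {a: 0, b: 1}, this yields (a, x, 25) and (b, y, 30), the same values as the Polars result above.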