How to extract the value from a nested list value?

Question

Given an aggregated dataframe and an index dataframe, how can I extract data from the list[<>] columns?

┌──────┬────────────┬───────────┐
│ read ┆ region     ┆ cov       │
│ ---  ┆ ---        ┆ ---       │
│ str  ┆ list[str]  ┆ list[i32] │
╞══════╪════════════╪═══════════╡ df
│ a    ┆ ["x", "y"] ┆ [25, 10]  │
│ b    ┆ ["x", "y"] ┆ [15, 30]  │
└──────┴────────────┴───────────┘

┌──────┬─────────┐
│ read ┆ cov_idx │
│ ---  ┆ ---     │
│ str  ┆ u32     │
╞══════╪═════════╡ df_idx
│ a    ┆ 0       │
│ b    ┆ 1       │
└──────┴─────────┘

The following code generates the example dataframes. In practice, cov is produced by a more complicated function, and df_idx is derived from it (arg_max).

use polars::prelude::*;
use polars::df;

fn main() -> PolarsResult<()> {
    // Flat input: one row per (read, region) pair.
    let df0 = df![
        "read" => ["a", "a", "b", "b"],
        "region" => ["x", "y", "x", "y"],
        "cov" => [25, 10, 15, 30]
    ]?;
    // Group by read in first-seen order; the other columns become lists.
    let df = df0.lazy()
        .groupby_stable([col("read")])
        .agg([col("*")])
        .collect()?;
    // Per-read position of the element to extract from each list.
    let df_idx = df![
        "read" => ["a", "b"],
        "cov_idx" => [0, 1]
    ]?;
    Ok(())
}
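
For context, the arg_max derivation of cov_idx mentioned above could be sketched in plain Rust as follows. This is only an illustration under stated assumptions: the function names arg_max and arg_max_per_group are hypothetical, the per-group lists are modeled as Vec<i32> rather than Polars columns, and ties are resolved to the first maximum, which may not match the real function.

```rust
/// Returns the position of the largest value in `values`, or `None` if the
/// slice is empty. Ties resolve to the first maximum (an assumption made
/// for this sketch).
fn arg_max(values: &[i32]) -> Option<usize> {
    let mut best: Option<(usize, i32)> = None;
    for (i, &v) in values.iter().enumerate() {
        match best {
            // Keep the current best when the new value is not strictly larger.
            Some((_, b)) if v <= b => {}
            // First element, or a strictly larger value: record it.
            _ => best = Some((i, v)),
        }
    }
    best.map(|(i, _)| i)
}

/// Applies `arg_max` to each per-group list (hypothetical helper).
fn arg_max_per_group(groups: &[Vec<i32>]) -> Vec<Option<usize>> {
    groups.iter().map(|g| arg_max(g)).collect()
}
```

With the example data, arg_max_per_group(&[vec![25, 10], vec![15, 30]]) yields [Some(0), Some(1)], which matches the cov_idx column of df_idx.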

The expected result is:

┌──────┬───────────┬───────────┐
│ read ┆ region    ┆ cov       │
│ ---  ┆ ---       ┆ ---       │
│ str  ┆ str       ┆ i32       │
╞══════╪═══════════╪═══════════╡ result_df
│ a    ┆ "x"       ┆ 25        │
│ b    ┆ "y"       ┆ 30        │
└──────┴───────────┴───────────┘

Answer 1

Score: 0

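You can use the lazy API's expr.list().get(idx) to fetch the element at index idx from each list. The index can itself be an expression such as col("cov_idx"), so after joining df_idx onto the aggregated frame, each row is indexed by its own cov_idx value.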

use polars::prelude::*;
use polars::df;

fn main() -> PolarsResult<()> {
    let df0 = df![
        "read" => ["a", "a", "b", "b"],
        "region" => ["x", "y", "x", "y"],
        "cov" => [25, 10, 15, 30]
    ]?;
    // Index table as a LazyFrame so it can be joined below.
    let df_idx = df![
        "read" => ["a", "b"],
        "cov_idx" => [0, 1]
    ]?
    .lazy();

    let df = df0
        .lazy()
        // Aggregate every non-key column into one list per read.
        // (Later Polars releases may name this method group_by_stable.)
        .groupby_stable([col("read")])
        .agg([col("*")])
        // Attach each read's cov_idx to its aggregated row.
        .left_join(df_idx, col("read"), col("read"))
        // Replace each list column with the element at that row's cov_idx.
        .with_columns(["region", "cov"].map(|c| col(c).list().get(col("cov_idx"))));

    println!("{:?}", df.collect()?);

    Ok(())
}

Result:

shape: (2, 4)
┌──────┬────────┬─────┬─────────┐
│ read ┆ region ┆ cov ┆ cov_idx │
│ ---  ┆ ---    ┆ --- ┆ ---     │
│ str  ┆ str    ┆ i32 ┆ i32     │
╞══════╪════════╪═════╪═════════╡
│ a    ┆ x      ┆ 25  ┆ 0       │
│ b    ┆ y      ┆ 30  ┆ 1       │
└──────┴────────┴─────┴─────────┘
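
To make the per-row semantics concrete, here is a minimal plain-Rust sketch of what indexing each list by its own row's index amounts to. It is not how Polars implements list().get: the function name pick_by_index is hypothetical, the list columns are modeled as Vec<Vec<T>>, and returning None for an out-of-range index is an assumed convention for this sketch.

```rust
/// For each row, returns the element of `lists[row]` at position
/// `indices[row]`. An out-of-range index produces `None` (an assumption for
/// this sketch, standing in for a null value).
fn pick_by_index<T: Clone>(lists: &[Vec<T>], indices: &[usize]) -> Vec<Option<T>> {
    lists
        .iter()
        .zip(indices)
        // `get` performs the bounds check; `cloned` turns Option<&T> into Option<T>.
        .map(|(list, &i)| list.get(i).cloned())
        .collect()
}
```

Applied to the aggregated example, pick_by_index(&[vec![25, 10], vec![15, 30]], &[0, 1]) gives [Some(25), Some(30)], and the same indices applied to the region lists give [Some("x"), Some("y")], mirroring the rows of the result table.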

huangapple
  • Published on June 8, 2023 at 05:21:40
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76427199.html