Convert &str to f64 using a Rust Polars custom function

Question

My problem can probably be described as being very new to both Rust and Polars. Go easy on me.

I'm trying to establish a pattern using custom functions, based on this documentation: https://pola-rs.github.io/polars-book/user-guide/dsl/custom_functions.html, but so far I have been unsuccessful.

In my code, I have a function declared as follows:

pub fn convert_str_to_tb(value: &str) -> f64 {
    let value = value.replace(",", "");
    let mut parts = value.split_whitespace();
    let num = parts.next().unwrap().parse::<f64>().unwrap();
    let unit = parts.next().unwrap();

    match unit {
        "KB" => num / (1000.0 * 1000.0 * 1000.0),
        "MB" => num / (1000.0 * 1000.0),
        "GB" => num / 1000.0,
        "TB" => num,
        _ => panic!("Unsupported unit: {}", unit),
    }
}

I believe I should be able to call this function like so:

df.with_columns([
    col("value").map(|s| Ok(convert_str_to_tb(s))).alias("value_tb");
])

My first issue was that the with_columns method doesn't seem to exist; I had to use with_column. When I use with_column, I receive the following error:

the trait bound `Expr: IntoSeries` is not satisfied
the following other types implement trait `IntoSeries`:
  Arc<(dyn polars::prelude::SeriesTrait + 'static)>
  ChunkedArray<T>
  Logical<DateType, Int32Type>
  Logical<DatetimeType, Int64Type>
  Logical<DurationType, Int64Type>
  Logical<TimeType, Int64Type>
  polars::prelude::Series

The DataFrame I am trying to transform:

let mut df = df!("volume" => &["volume01", "volume02", "volume03"],
                 "value" => &["1,000 GB", "2,000,000 MB", "3 TB"]).unwrap();

Perhaps there is a way to do this without a custom function?

Answer 1

Score: 0

Problem 1, with_columns

One confusing point about the documentation: the df in the example is a lazy data frame. You can see that they call .lazy() in the full code snippet where a custom function is used. .with_columns() is a method available on the lazy data frame.

Problem 2, custom function

There is a type mismatch between what the custom function is expected to look like and what you have defined. You take a str as input and return an f64. However, as the error implies, the s parameter is actually a Series, and the returned value is expected to be an Option<Series>.

So what's happening here? The .map() function is providing you with a series that your custom function needs to iterate over.

Updating your custom function to have the appropriate arg and return type:

pub fn convert_str_to_tb(value: Series) -> Option<Series> {
    Some(value.iter().map(|v| {
        let value = v.get_str().unwrap().replace(",", "");
        let mut parts = value.split_whitespace();
        let num = parts.next().unwrap().parse::<f64>().unwrap();
        let unit = parts.next().unwrap();

        match unit {
            "KB" => num / (1000.0 * 1000.0 * 1000.0),
            "MB" => num / (1000.0 * 1000.0),
            "GB" => num / 1000.0,
            "TB" => num,
            _ => panic!("Unsupported unit: {}", unit),
        }
    }).collect())
}

Then call it like this:

df.lazy().with_columns([
    col("value").map(|s| Ok(convert_str_to_tb(s)), GetOutput::default()).alias("value_tb")
]).collect().unwrap();

This gives the output:

shape: (3, 3)
┌──────────┬──────────────┬──────────┐
│ volume   ┆ value        ┆ value_tb │
│ ---      ┆ ---          ┆ ---      │
│ str      ┆ str          ┆ f64      │
╞══════════╪══════════════╪══════════╡
│ volume01 ┆ 1,000 GB     ┆ 1.0      │
│ volume02 ┆ 2,000,000 MB ┆ 2.0      │
│ volume03 ┆ 3 TB         ┆ 3.0      │
└──────────┴──────────────┴──────────┘
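
One caveat about the function above: every unwrap() and the panic! will abort the whole query on the first malformed value. As a rough sketch only (plain Rust, no Polars involved; the names parse_to_tb and convert_column are hypothetical and not part of any library), the per-value logic could return an Option instead, so that bad rows can be treated as missing values rather than crashing:

```rust
/// Parse a string such as "1,000 GB" into terabytes, using the same
/// decimal (1000-based) units as the function in the question.
/// Returns None instead of panicking on malformed input or an unknown unit.
/// NOTE: `parse_to_tb` is a hypothetical helper name used for illustration.
pub fn parse_to_tb(value: &str) -> Option<f64> {
    // Drop thousands separators: "2,000,000 MB" becomes "2000000 MB".
    let cleaned = value.replace(',', "");
    let mut parts = cleaned.split_whitespace();

    // `?` returns None early if a token is missing or is not a number.
    let num = parts.next()?.parse::<f64>().ok()?;
    let unit = parts.next()?;

    // Reject trailing tokens such as "1 GB extra".
    if parts.next().is_some() {
        return None;
    }

    let divisor = match unit {
        "KB" => 1000.0 * 1000.0 * 1000.0,
        "MB" => 1000.0 * 1000.0,
        "GB" => 1000.0,
        "TB" => 1.0,
        _ => return None,
    };
    Some(num / divisor)
}

/// Apply the parser to a whole "column" of optional strings.
/// Missing inputs and unparseable inputs both come out as None.
/// NOTE: `convert_column` is also a hypothetical name; a plain slice
/// stands in for a Polars Series here.
pub fn convert_column(values: &[Option<&str>]) -> Vec<Option<f64>> {
    values.iter().map(|v| v.and_then(parse_to_tb)).collect()
}
```

Inside the Polars closure, the same idea would presumably mean mapping each value through a function like parse_to_tb and collecting the resulting options, so that unparseable rows end up as nulls. That part is an assumption about how you would wire it in, not something tested against a specific Polars version.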

Posted by huangapple on April 20, 2023 at 08:18:05
Original link: https://go.coder-hub.com/76059689.html