Convert &str to f64 using a Rust Polars custom function
Question
My problem can probably be described as being very new to both Rust and Polars. Go easy on me.
I'm trying to establish a pattern using custom functions, based on this documentation: https://pola-rs.github.io/polars-book/user-guide/dsl/custom_functions.html, but so far without success.
In my code, I have a function declared as follows:
pub fn convert_str_to_tb(value: &str) -> f64 {
    let value = value.replace(",", "");
    let mut parts = value.split_whitespace();
    let num = parts.next().unwrap().parse::<f64>().unwrap();
    let unit = parts.next().unwrap();
    match unit {
        "KB" => num / (1000.0 * 1000.0 * 1000.0),
        "MB" => num / (1000.0 * 1000.0),
        "GB" => num / 1000.0,
        "TB" => num,
        _ => panic!("Unsupported unit: {}", unit),
    }
}
I believe I should be able to call this function like so:
df.with_columns([
    col("value").map(|s| Ok(convert_str_to_tb(s))).alias("value_tb");
])
My first issue was that the with_columns method doesn't seem to exist, so I had to use with_column. If I use with_column, I receive the following error:
the trait bound `Expr: IntoSeries` is not satisfied
the following other types implement trait `IntoSeries`:
  Arc<(dyn polars::prelude::SeriesTrait + 'static)>
  ChunkedArray<T>
  Logical<DateType, Int32Type>
  Logical<DatetimeType, Int64Type>
  Logical<DurationType, Int64Type>
  Logical<TimeType, Int64Type>
  polars::prelude::Series
The DataFrame I am trying to transform:
let mut df = df!(
    "volume" => &["volume01", "volume02", "volume03"],
    "value" => &["1,000 GB", "2,000,000 MB", "3 TB"]
).unwrap();
Perhaps there is a way to do this without a custom function?
Answer 1
Score: 0
Problem 1, with_columns
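One confusing point about the documentation: the df in the example is a lazy data frame. You can see that they call .lazy() in the full code snippet where a custom function is used. .with_columns() is a method available on the lazy data frame. The eager DataFrame::with_column, by contrast, expects a Series (anything implementing IntoSeries) rather than an Expr, which is why the compiler reports that `Expr: IntoSeries` is not satisfied.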
Problem 2, custom function
There is a type mismatch between what .map() expects and what you have defined. Your function takes a &str and returns an f64. However, the s parameter passed to the closure is actually a Series, and the returned value is expected to be an Option<Series>.
So what's happening here? The .map() function hands you the whole Series, and your custom function needs to iterate over it.
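As a plain-Rust analogy (no Polars involved), the sketch below shows the same shape of problem: a function that receives a whole column and applies a per-value function to each element. The name `map_column` and the use of a slice as a stand-in for a Series are assumptions made purely for illustration.

```rust
/// Stand-in for a column-level function: it receives every value at once
/// and applies a per-value function to each, producing a new column.
/// `map_column` is a hypothetical helper, not part of Polars.
fn map_column<F>(column: &[&str], per_value: F) -> Vec<f64>
where
    F: Fn(&str) -> f64,
{
    // Iterate over the column and convert each element in turn.
    column.iter().map(|&v| per_value(v)).collect()
}
```

Calling `map_column(&["1.5", "2"], |v| v.parse::<f64>().unwrap())` yields a vector holding 1.5 and 2.0. The per-value closure plays the role of the original `convert_str_to_tb(&str) -> f64`, while `map_column` plays the role of the Series-level function that .map() expects.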
Update your custom function to take the appropriate argument and return type:
use polars::prelude::*;

pub fn convert_str_to_tb(value: Series) -> Option<Series> {
    // Convert every value in the column and collect the results into a new Series.
    Some(value.iter().map(|v| {
        // Strip thousands separators, then split into number and unit.
        let value = v.get_str().unwrap().replace(",", "");
        let mut parts = value.split_whitespace();
        let num = parts.next().unwrap().parse::<f64>().unwrap();
        let unit = parts.next().unwrap();
        match unit {
            "KB" => num / (1000.0 * 1000.0 * 1000.0),
            "MB" => num / (1000.0 * 1000.0),
            "GB" => num / 1000.0,
            "TB" => num,
            _ => panic!("Unsupported unit: {}", unit),
        }
    }).collect())
}
Then call it with:
df.lazy().with_columns([
    col("value").map(|s| Ok(convert_str_to_tb(s)), GetOutput::default()).alias("value_tb")
]).collect().unwrap();
This gives the output:
shape: (3, 3)
┌──────────┬──────────────┬──────────┐
│ volume   ┆ value        ┆ value_tb │
│ ---      ┆ ---          ┆ ---      │
│ str      ┆ str          ┆ f64      │
╞══════════╪══════════════╪══════════╡
│ volume01 ┆ 1,000 GB     ┆ 1.0      │
│ volume02 ┆ 2,000,000 MB ┆ 2.0      │
│ volume03 ┆ 3 TB         ┆ 3.0      │
└──────────┴──────────────┴──────────┘
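Note that the function above panics on any value that is missing, malformed, or uses an unknown unit. A minimal sketch of a panic-free variant of the per-value parsing, using only the standard library, might look like the following. The name `parse_tb` is hypothetical, and wiring it into the Series-level function (for example by turning `None` into a null) is left out because that depends on the Polars version in use.

```rust
/// Parses strings such as "1,000 GB" into terabytes (decimal units).
/// Returns None instead of panicking when the input is malformed.
/// `parse_tb` is a hypothetical helper name used for illustration.
fn parse_tb(value: &str) -> Option<f64> {
    // Strip thousands separators before splitting into number and unit.
    let cleaned = value.replace(',', "");
    let mut parts = cleaned.split_whitespace();
    // `?` returns None early if the number or the unit is missing or invalid.
    let num: f64 = parts.next()?.parse().ok()?;
    let divisor = match parts.next()? {
        "KB" => 1e9,
        "MB" => 1e6,
        "GB" => 1e3,
        "TB" => 1.0,
        _ => return None,
    };
    Some(num / divisor)
}
```

With the sample data, `parse_tb("1,000 GB")` returns `Some(1.0)` and `parse_tb("2,000,000 MB")` returns `Some(2.0)`, while an unsupported unit such as `"5 PB"` or an empty string returns `None`.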