Polars convert string of digits to list


# Question

So I have a Polars column/series whose values are strings of digits.

```python
import polars as pl

s = pl.Series("a", ["111", "123", "101"])
s
# shape: (3,)
# Series: 'a' [str]
# [
#     "111"
#     "123"
#     "101"
# ]
```

I would like to convert each string into a list of integers.
I have found a working solution, but I am not sure it is optimal.

```python
s.str.split("").arr.shift(1).arr.slice(2).arr.eval(pl.element().str.parse_int(10))
# shape: (3,)
# Series: 'a' [list[i32]]
# [
#     [1, 1, 1]
#     [1, 2, 3]
#     [1, 0, 1]
# ]
```

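I first split each string at every position. For the first row this gives me `["", "1", "1", "1", ""]`. From this I want to remove the first and last entries (the empty strings). Since I don't know the length of the entries beforehand, and `slice` doesn't let me specify an end index, I went with the shift → slice version, but I feel there has to be a better way.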

Lastly, `parse_int` is applied to each element.
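To make the intermediate lists concrete, here is a plain-Python analogue of the same three steps. It is only a sketch, not Polars code: `re.split("", s)` stands in for `str.split("")` and (on Python 3.7+) likewise leaves an empty string at both ends, while ordinary list operations stand in for `shift(1)`, `slice(2)` and `parse_int(10)`. The helper name `digits_via_split` is hypothetical.

```python
import re
from typing import List


def digits_via_split(s: str) -> List[int]:
    # Splitting on an empty pattern leaves an empty string at each end,
    # e.g. "111" -> ["", "1", "1", "1", ""]
    parts = re.split("", s)
    # shift(1) analogue: move every element one slot right; the trailing ""
    # falls off and None pads the front -> [None, "", "1", "1", "1"]
    shifted = [None] + parts[:-1]
    # slice(2) analogue: skip the padding None and the leading "" -> ["1", "1", "1"]
    trimmed = shifted[2:]
    # parse_int(10) analogue: convert each one-character string to an int
    return [int(p, 10) for p in trimmed]


for value in ["111", "123", "101"]:
    print(value, digits_via_split(value))
# 111 [1, 1, 1]
# 123 [1, 2, 3]
# 101 [1, 0, 1]
```

The net effect of `shift(1)` followed by `slice(2)` is the same as dropping the first and last element (`parts[1:-1]` in this sketch).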

This seems to work, but I'd like to know whether there are better ways to do this, or to do any of the individual steps.




# Answer 1
**Score**: 5

Use [`.extract_all()`](https://pola-rs.github.io/polars/py-polars/html/reference/expressions/api/polars.Expr.str.extract_all.html) and [`.cast()`](https://pola-rs.github.io/polars/py-polars/html/reference/expressions/api/polars.Expr.cast.html#polars.Expr.cast):
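```python
import polars as pl

s = pl.Series("a", ["111", "123", "101"])
# Extract every single digit as a string, then cast the list elements to Int64
s.str.extract_all(r"\d").cast(pl.List(pl.Int64))
# shape: (3,)
# Series: 'a' [list[i64]]
# [
#     [1, 1, 1]
#     [1, 2, 3]
#     [1, 0, 1]
# ]
```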
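Why no trimming step is needed: `extract_all` returns every non-overlapping match of the pattern, and `\d` matches exactly one digit, so each string becomes a list of one-character strings with no empty entries at the ends; the cast then turns them into integers. The sketch below is a plain-Python analogue using `re.findall`, which also returns all non-overlapping matches. It is not the Polars implementation, and the helper name `digits_via_findall` is hypothetical.

```python
import re
from typing import List


def digits_via_findall(s: str) -> List[int]:
    # extract_all(r"\d") analogue: collect every single-digit match as a string
    matches = re.findall(r"\d", s)  # "123" -> ["1", "2", "3"]
    # cast(pl.List(pl.Int64)) analogue: convert each matched string to an int
    return [int(m) for m in matches]


for value in ["111", "123", "101"]:
    print(value, digits_via_findall(value))
# 111 [1, 1, 1]
# 123 [1, 2, 3]
# 101 [1, 0, 1]
```

Because this matches digits instead of splitting on every position, any non-digit characters are simply skipped, e.g. `digits_via_findall("1-2")` gives `[1, 2]`.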

huangapple
  • Posted on May 28, 2023, 22:20:04
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76351947.html