June 15, 2023, 03:42:11

Pandas - Applying operation to dataframe but skipping over NaN values

Question

I have a Series that looks like this:

1         532
2         554
3         NaN
...       ... 
Name: score, Length: 941940, dtype: str

I split it into three columns, one per character, using apply(lambda x: pd.Series(list(x))), but this throws an error at index 3 because that value is NaN. How can I use apply so that it handles NaN and splits the values as shown below?

        score_0  score_1  score_2
1       5          3        2
2       5          5        4     
3       NaN        NaN      NaN          
...     ...        ...      ...
[941940 rows x 3 columns]

Answer 1

Score: 2

Another possible solution (similar to @GodIsOne's) uses a regex with lookarounds so that the split happens only between two digits, not at the beginning or end of each number:

s.str.split(r'(?<=\d)(?=\d)', expand=True)

Output:

     0    1    2
0    5    3    2
1    5    5    4
2  NaN  NaN  NaN
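
As a hedged sketch of how this might be used end to end, the snippet below builds a small sample Series (the values and the explicit dtype=object are assumptions for illustration), applies the same regex split, and uses add_prefix to get the score_0, score_1, score_2 column names from the question.

```python
import numpy as np
import pandas as pd

# Sample data mirroring the question; dtype=object is an assumption so the
# string methods take the plain Python regex path.
s = pd.Series(["532", "554", np.nan], index=[1, 2, 3], name="score", dtype=object)

# Split only between two adjacent digits; the NaN row stays NaN.
parts = s.str.split(r"(?<=\d)(?=\d)", expand=True)

# Integer column labels 0, 1, 2 become score_0, score_1, score_2.
parts = parts.add_prefix("score_")
print(parts)
```

Because the pattern never matches at the start or end of the string, no empty columns need to be trimmed afterwards.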

Answer 2

Score: 1

You can use .str.split("", expand=True) to split each character into a separate column. NaN values are passed through as NaN rather than raising an error:

# split on every character, then drop the empty first and last columns
df = ser.str.split("", expand=True).iloc[:, 1:-1]
# rename the columns to score_0, score_1, ...
df.columns = [f"score_{i}" for i in range(len(df.columns))]

An explanation of why the additional columns are created can be found in the answers here.
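
The extra columns appear to come from how Python's re module splits on an empty pattern, which pandas seems to fall back to when the separator is not a single character (this reading of pandas internals is an assumption, not something the answer states). A minimal sketch with the standard library, assuming Python 3.7 or later:

```python
import re

# An empty pattern matches at every position, including before the first
# and after the last character, so empty strings appear at both ends.
with_empty_ends = re.split("", "532")
print(with_empty_ends)  # ['', '5', '3', '2', '']

# Requiring a digit on both sides rules out the two boundary positions.
digits_only = re.split(r"(?<=\d)(?=\d)", "532")
print(digits_only)  # ['5', '3', '2']
```

The two empty strings at the ends are what .iloc[:, 1:-1] trims away.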

Answer 3

Score: 0

I will take a minimal example:

import numpy as np
import pandas as pd

s = pd.Series(['532', '554', np.nan])

print(s)

0    532
1    554
2    NaN
dtype: object

# split on every character; fillna(np.nan) replaces any missing entries (such as None) with NaN
k = s.str.split("", expand=True).fillna(np.nan)

To omit the first and last columns:

k = k.iloc[:, 1:-1]
print(k)

     1    2    3
0    5    3    2
1    5    5    4
2  NaN  NaN  NaN

Finally, rename the columns:

k.columns = ["score_{}".format(i) for i in range(len(k.columns))]

print(k)

  score_0 score_1 score_2
0       5       3       2
1       5       5       4
2     NaN     NaN     NaN
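
If you would rather keep the apply-based approach from the question, one possible sketch is to drop the NaN rows first, apply the split to what remains, and then reindex to restore the missing rows as NaN. The sample data and variable names here are illustrative assumptions.

```python
import numpy as np
import pandas as pd

s = pd.Series(["532", "554", np.nan], index=[1, 2, 3], name="score", dtype=object)

# list(x) fails on a float NaN, so only apply it to the non-missing strings.
split = s.dropna().apply(lambda x: pd.Series(list(x)))

# Reindexing against the original index brings the dropped rows back as NaN.
split = split.reindex(s.index).add_prefix("score_")
print(split)
```

On a Series with close to a million rows, the vectorized .str.split variants above are likely to be faster than a per-row apply, so this is mainly useful when the per-row logic is more involved than a plain split.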
