Pandas - Applying operation to dataframe but skipping over NaN values

Question

I have a Series that looks like this:

1         532
2         554
3         NaN
...       ... 
Name: score, Length: 941940, dtype: str

I split it into 3 columns, one per character, using apply(lambda x: pd.Series(list(x))), but this throws an error at index 3 because the value there is NaN. How can I use apply so that it tolerates NaN and splits the values as shown below?

        score_0  score_1  score_2
1       5          3        2
2       5          5        4     
3       NaN        NaN      NaN          
...     ...        ...      ...
[941940 rows x 3 columns]

Answer 1

Score: 2

Another possible solution (similar to @GodIsOne's), which uses regex to avoid splitting at the beginning and the end of each number:

s.str.split(r'(?<=\d)(?=\d)', expand=True)

Output:

     0    1    2
0    5    3    2
1    5    5    4
2  NaN  NaN  NaN
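
As a runnable sketch of the regex approach above, the snippet below builds a three-row sample and renames the columns to match the layout requested in the question. The sample data, the explicit dtype=object, and the variable name parts are assumptions made for illustration; results may differ with other string dtypes or pandas versions.

```python
import numpy as np
import pandas as pd

# Sample data (an assumption for illustration); dtype=object keeps NaN as a float NaN.
s = pd.Series(["532", "554", np.nan], dtype=object, name="score")

# Split only at positions that have a digit on both sides,
# so no empty strings appear at the start or end of each row.
parts = s.str.split(r"(?<=\d)(?=\d)", expand=True)

# Rename 0, 1, 2 to score_0, score_1, score_2.
parts.columns = [f"score_{i}" for i in range(parts.shape[1])]

print(parts)
```

Because the lookbehind and lookahead match only between two digits, no trailing or leading empty columns need to be dropped afterwards.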

Answer 2

Score: 1

You can use .str.split("", expand=True) to split each character into its own column. NaN values pass through as NaN:

# split on each character, then drop the empty first and last columns
df = ser.str.split("", expand=True).iloc[:, 1:-1]
# rename columns
df.columns = [f"score_{i}" for i in range(len(df.columns))]

An explanation of why the additional columns are created can be found in the answers here.
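
The linked explanation is not reproduced on this page. As a hedged illustration, and assuming pandas hands the empty pattern to Python's re module as a regular expression, the standard library alone shows where the two extra columns come from (Python 3.7 or later; the names pieces and chars are illustrative):

```python
import re

# An empty pattern matches at every position, including before the
# first character and after the last one.
pieces = re.split("", "532")
print(pieces)  # ['', '5', '3', '2', '']

# Dropping the first and last items mirrors .iloc[:, 1:-1] on the DataFrame.
chars = pieces[1:-1]
print(chars)  # ['5', '3', '2']
```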

Answer 3

Score: 0

Take a minimal example:

import numpy as np
import pandas as pd

s = pd.Series(['532', '554', np.nan])

print(s)

0    532
1    554
2    NaN
dtype: object

# Split on the empty pattern; this also yields empty first and last columns
k = s.str.split("", expand=True).fillna(np.nan)

To omit the first and last columns:

k = k.iloc[:, 1:-1]
print(k)

     1    2    3
0    5    3    2
1    5    5    4
2  NaN  NaN  NaN

Finally, rename the columns:

k.columns = ["score_{}".format(i) for i in range(len(k.columns))]

print(k)

  score_0 score_1 score_2
0       5       3       2
1       5       5       4
2     NaN     NaN     NaN
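
None of the answers keeps apply itself, which is what the question asked about. A minimal sketch of a NaN-tolerant apply follows. The helper split_chars and its width parameter are hypothetical names introduced here, and the fixed width of 3 is an assumption taken from the sample data. On roughly 940,000 rows this is likely to be much slower than the vectorized .str.split approaches above.

```python
import numpy as np
import pandas as pd

def split_chars(value, width=3):
    """Return one character per column, or all NaN when the value is missing.

    Hypothetical helper; width=3 assumes every score has three characters.
    """
    if pd.isna(value):
        return pd.Series([np.nan] * width)
    return pd.Series(list(value))

# Sample data (an assumption for illustration).
s = pd.Series(["532", "554", np.nan], dtype=object, name="score")

# When the applied function returns a Series, apply builds a DataFrame;
# add_prefix turns the column labels 0, 1, 2 into score_0, score_1, score_2.
out = s.apply(split_chars).add_prefix("score_")

print(out)
```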

huangapple
  • Posted on June 15, 2023 at 03:42:11
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76477043.html