Pandas - Applying operation to dataframe but skipping over NaN values
Question
I have a Series that looks like this:
1 532
2 554
3 NaN
... ...
Name: score, Length: 941940, dtype: str
I split it into 3 columns, one per character, using apply(lambda x: pd.Series(list(x))), but it throws an error for index 3 because the value there is NaN. How can I use apply so that it supports NaN and splits the values as shown below?
score_0 score_1 score_2
1 5 3 2
2 5 5 4
3 NaN NaN NaN
... ... ... ...
[941940 rows x 3 columns]
Answer 1
Score: 2
Another possible solution (similar to @GodIsOne's), which uses a regex to avoid splitting at the beginning and the end of each number:
s.str.split(r'(?<=\d)(?=\d)', expand=True)
Output:
0 1 2
0 5 3 2
1 5 5 4
2 NaN NaN NaN
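A minimal sketch of how this could be combined with the column names from the question, assuming an object-dtype Series of digit strings. The names s and out and the use of add_prefix are illustrative choices, not part of the original answer:

```python
import numpy as np
import pandas as pd

# Small stand-in for the real data (assumption: object dtype, digit strings).
s = pd.Series(['532', '554', np.nan], dtype=object, name='score')

# Split only between two digits, so no empty leading/trailing columns appear.
# NaN rows are passed through and become NaN in every resulting column.
out = s.str.split(r'(?<=\d)(?=\d)', expand=True)

# Prefix the integer column labels to get score_0, score_1, score_2.
out = out.add_prefix('score_')
print(out)
```

Because the lookbehind and lookahead only match between two digits, there is no need to trim columns afterwards.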
Answer 2
Score: 1
You can use .str.split("", expand=True) to split each character into a separate column. This passes over NaN values:
# split each character, delete first and last columns of empties.
df = ser.str.split("", expand=True).iloc[:, 1:-1]
# rename columns
df.columns = [f"score_{i}" for i in range(len(df.columns))]
An explanation of why the additional columns are created can be found in the answers here.
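As a rough illustration of where the two empty columns come from: my understanding (an assumption about pandas internals, for object-dtype data) is that an empty pattern ends up being handled like a regular expression, and an empty regex matches at every position, including before the first and after the last character. The standard library shows the same effect on Python 3.7+:

```python
import re

# An empty pattern matches at positions 0, 1, 2 and 3 of "532",
# so the split yields an empty string at both ends.
parts = re.split("", "532")
print(parts)  # ['', '5', '3', '2', '']
```

Those leading and trailing empty strings are what .iloc[:, 1:-1] removes.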
Answer 3
Score: 0
I will take a minimal example:
import numpy as np
import pandas as pd
s = pd.Series(['532', '554', np.nan])
print(s)
0 532
1 554
2 NaN
dtype: object
k=s.str.split("",expand=True).fillna(np.nan)
To omit the first and last columns:
k=k.iloc[:, 1:-1]
print(k)
1 2 3
0 5 3 2
1 5 5 4
2 NaN NaN NaN
Lastly, rename the columns:
k.columns = ["score_{}".format(i) for i in range(len(k.columns))]
print(k)
score_0 score_1 score_2
0 5 3 2
1 5 5 4
2 NaN NaN NaN
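Since the question asks about apply specifically, here is one possible sketch that keeps the original lambda and simply skips the NaN rows before applying it. It assumes an object-dtype Series. The index values 1, 2, 3 and the names s and out are only illustrative. Note that calling pd.Series once per row is likely to be much slower than the .str.split approaches on a Series of this size:

```python
import numpy as np
import pandas as pd

# Stand-in data shaped like the question's example (assumed object dtype).
s = pd.Series(['532', '554', np.nan], index=[1, 2, 3], dtype=object, name='score')

out = (
    s.dropna()                                # apply never sees NaN
     .apply(lambda x: pd.Series(list(x)))     # one column per character
     .reindex(s.index)                        # restore dropped rows as all-NaN
     .add_prefix('score_')                    # 0, 1, 2 -> score_0, score_1, score_2
)
print(out)
```

The reindex step is what brings the NaN rows back, so the result has the same index as the original Series.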