Extract substrings from a column of strings and place them in a list

Question

I have the following data frame:

   a    b             x  
0  id1  abc 123 tr    2  
1  id2  abd1 124 tr   6 
2  id3  abce 126 af   9 
3  id4  abe 128 nm    12 

From column b, for each item, I need to extract the substrings before the first space. Hence, I need the following result:

list_of_strings = [abc, abd1, abce, abe]

Please advise

Answer 1

Score: 2

Use a regex with `^\S+` (one or more non-whitespace characters anchored to the start of the string) and [`str.extract`](http://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.extract.html):

df['b'].str.extract(r'^(\S+)', expand=False)

Output:

0     abc
1    abd1
2    abce
3     abe
Name: b, dtype: object

For a list:

list_of_strings = df['b'].str.extract(r'^(\S+)', expand=False).tolist()
# ['abc', 'abd1', 'abce', 'abe']

[regex demo](https://regex101.com/r/R4BgiT/1)
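
For anyone who wants to run this end to end, here is a minimal sketch. The data frame is rebuilt from the table in the question. The `via_split` variant is an assumed alternative that is not part of the answer, included only for comparison.

```python
import pandas as pd

# Data frame reconstructed from the question (columns a, b, x).
df = pd.DataFrame({
    "a": ["id1", "id2", "id3", "id4"],
    "b": ["abc 123 tr", "abd1 124 tr", "abce 126 af", "abe 128 nm"],
    "x": [2, 6, 9, 12],
})

# Approach from the answer: capture the leading run of non-whitespace characters.
list_of_strings = df["b"].str.extract(r"^(\S+)", expand=False).tolist()

# Assumed alternative (not from the answer): split on whitespace, keep the first piece.
via_split = df["b"].str.split().str[0].tolist()

print(list_of_strings)  # ['abc', 'abd1', 'abce', 'abe']
print(via_split)        # ['abc', 'abd1', 'abce', 'abe']
```

Both give the same list for this data. They can differ when a value starts with whitespace: the anchored regex finds no match and yields NaN for that row, while the split variant skips the leading whitespace.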

huangapple
  • Posted on May 24, 2023, 22:50:45
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76324810.html