How to obtain multiple partial strings from a dataframe?

Question

I am trying to obtain multiple partial strings from my dataframe and put those partial strings as added columns to my dataframe. Below you will find a simple data sample:

df

	Serienummer
15	SAA VKS MSI A1 R 7,500 1
29	SAA VKS MSI A1P 7,500 1
36	SAA VKS MSI A1 R 14,370 5

I want to obtain the following dataframe:

	Serienummer                  column1    column2  column3
15	SAA VKS MSI A1 R 7,500 1     A1         7,500    1
29	SAA VKS MSI A1P 7,500 1      A1         7,500    1
36	SAA VKS MSI A1 R 14,370 5    A1         14,370   5

Any help is appreciated.

Answer 1

Score: 0

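With pd.Series.str.extract (which pulls regex capture groups out into columns) and pd.concat:

import pandas as pd

# Capture the code (e.g. A1), the comma-formatted number and the trailing count.
parts = df['Serienummer'].str.extract(r'(\b[A-Z]\d+)[\sA-Z]+(\d+,\d+) (\d+)', expand=True)
# Append the captured groups to df as columns 0, 1 and 2.
new_df = pd.concat([df, parts], axis=1)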

                 Serienummer   0       1  2
0   SAA VKS MSI A1 R 7,500 1  A1   7,500  1
1    SAA VKS MSI A1P 7,500 1  A1   7,500  1
2  SAA VKS MSI A1 R 14,370 5  A1  14,370  5
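
The extracted columns above are labelled 0, 1 and 2, while the question asks for column1, column2 and column3. A minimal sketch of one way to get those names directly, assuming the same regex rewritten with named groups. The sample frame is rebuilt here from the question's data, and the use of DataFrame.join in place of pd.concat is a choice made for this sketch, not part of the original answer:

```python
import pandas as pd

# Sample data rebuilt from the question, keeping its index labels (15, 29, 36).
df = pd.DataFrame(
    {"Serienummer": [
        "SAA VKS MSI A1 R 7,500 1",
        "SAA VKS MSI A1P 7,500 1",
        "SAA VKS MSI A1 R 14,370 5",
    ]},
    index=[15, 29, 36],
)

# Same pattern as in the answer, but with named groups so that
# str.extract labels the resulting columns instead of numbering them.
pattern = (
    r"(?P<column1>\b[A-Z]\d+)"   # a letter followed by digits, e.g. A1
    r"[\sA-Z]+"                  # any letters/whitespace in between (" R ", "P ")
    r"(?P<column2>\d+,\d+) "     # the comma-formatted number, e.g. 7,500
    r"(?P<column3>\d+)"          # the trailing count
)

# join aligns on the index, so the original row labels are preserved.
new_df = df.join(df["Serienummer"].str.extract(pattern))
print(new_df)
```

All three new columns hold strings. Rows that do not match the pattern would get NaN in each of them.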

huangapple
  • Posted on July 3, 2023, 21:47:37
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76605366.html