How to obtain multiple partial strings from a dataframe?

Question

I am trying to obtain multiple partial strings from my dataframe and put those partial strings as added columns to my dataframe. Below you will find a simple data sample:

df

                      Serienummer
    15   SAA VKS MSI A1 R 7,500 1
    29    SAA VKS MSI A1P 7,500 1
    36  SAA VKS MSI A1 R 14,370 5

I want to obtain the following dataframe:

                      Serienummer column1 column2 column3
    15   SAA VKS MSI A1 R 7,500 1      A1   7,500       1
    29    SAA VKS MSI A1P 7,500 1      A1   7,500       1
    36  SAA VKS MSI A1 R 14,370 5      A1  14,370       5

Any help is appreciated.

Answer 1

Score: 0

With pd.Series.str.extract (to extract regex pattern groups) and pd.concat:

    import pandas as pd

    # Extract the three regex groups and append them to df as columns 0, 1, 2
    new_df = pd.concat(
        [df, df['Serienummer'].str.extract(r'(\b[A-Z]\d+)[\sA-Z]+(\d+,\d+) (\d+)', expand=True)],
        axis=1)

                     Serienummer   0       1  2
    0   SAA VKS MSI A1 R 7,500 1  A1   7,500  1
    1    SAA VKS MSI A1P 7,500 1  A1   7,500  1
    2  SAA VKS MSI A1 R 14,370 5  A1  14,370  5
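
The result above has columns named 0, 1, 2, while the question asks for column1, column2, column3. As a minimal sketch of one way to get those names, named groups can be used in the same pattern, since str.extract uses group names as column names. This is a variation on the answer, not the answerer's own code. It assumes the leading numbers 15, 29, 36 in the question are the row index, and the names pattern and extracted are only illustrative.

```python
import pandas as pd

# Sample data copied from the question; 15, 29, 36 are assumed to be the index.
df = pd.DataFrame(
    {"Serienummer": [
        "SAA VKS MSI A1 R 7,500 1",
        "SAA VKS MSI A1P 7,500 1",
        "SAA VKS MSI A1 R 14,370 5",
    ]},
    index=[15, 29, 36],
)

# Same regex as in the answer, but with named groups so that the
# extracted columns are called column1, column2, column3.
pattern = (
    r"(?P<column1>\b[A-Z]\d+)"  # one capital letter plus digits, e.g. "A1"
    r"[\sA-Z]+"                 # skip letters/spaces in between, e.g. "P " or " R "
    r"(?P<column2>\d+,\d+) "    # number with a comma, e.g. "7,500"
    r"(?P<column3>\d+)"         # trailing integer
)

extracted = df["Serienummer"].str.extract(pattern)

# join aligns on the index, so the new columns line up with the original rows.
new_df = df.join(extracted)
print(new_df)
```

All three new columns hold strings. If numbers are needed, they would still have to be converted, for example by removing the comma from column2 first.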

huangapple
  • Posted on July 3, 2023 at 21:47:37
  • Link to this article: https://go.coder-hub.com/76605366.html