Convert string to dictionary in a dataframe

Question

I have a dataframe that looks like this:

    import pandas as pd

    df = pd.DataFrame({'col_1': ['1', '2', '3', '4'],
                       'col_2': ['a:b,c:d', ':v', 'w:,x:y', 'f:g,h:i,j:']
                       })

The datatype of col_2 is currently string. I want to extract the first key and the first value from col_2 as col_3 and col_4 respectively, so the output should look like this:

    pd.DataFrame({'col_1': ['1', '2', '3', '4'],
                  'col_2': ['a:b,c:d', ':v', 'w:,x:y', 'f:g,h:i,j:'],
                  'col_3': ['a', '', 'w', 'f'],
                  'col_4': ['b', 'v', '', 'g']
                  })

Here is what I have done so far:

    df['col_3'] = df['col_2'].apply(lambda x: x.split(":")[0])
    df['col_4'] = df['col_2'].apply(lambda x: x.split(":")[1])

But this obviously doesn't work because it's not a dictionary.
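To see concretely where the attempt goes wrong, here is a minimal plain-Python sketch (no pandas needed) that applies the same two split expressions to the sample values. The names `values`, `first` and `second` are only illustrative.

```python
# Sample values copied from col_2 in the question.
values = ['a:b,c:d', ':v', 'w:,x:y', 'f:g,h:i,j:']

# The same expressions used inside the two lambdas.
first = [v.split(":")[0] for v in values]
second = [v.split(":")[1] for v in values]

print(first)   # ['a', '', 'w', 'f']
print(second)  # ['b,c', 'v', ',x', 'g,h']
```

The first list already matches the desired col_3. The second does not match col_4, because splitting on ":" alone does not stop at the comma, so each value drags along the start of the next pair.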

Answer 1

Score: 2

This is a good job for a regex and str.extract:

    df[['col_3', 'col_4']] = df['col_2'].str.extract(r'^([^:,]*):([^:,]*)')

Output:

      col_1       col_2 col_3 col_4
    0     1     a:b,c:d     a     b
    1     2          :v           v
    2     3      w:,x:y     w
    3     4  f:g,h:i,j:     f     g
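As a rough illustration of how the pattern behaves, the sketch below runs it through the standard `re` module on the same four strings. `^` anchors the match at the start of the string, each `[^:,]*` captures zero or more characters that are neither a colon nor a comma, and the literal `:` between the two groups separates key from value. Using `re.search` here is an approximation of what `str.extract` does per row (it takes the groups of the first match); the variable names are illustrative only.

```python
import re

# Pattern copied from the answer.
pattern = re.compile(r'^([^:,]*):([^:,]*)')

# Sample values copied from col_2 in the question.
values = ['a:b,c:d', ':v', 'w:,x:y', 'f:g,h:i,j:']

# For each string, take the capture groups of the first match.
pairs = [pattern.search(v).groups() for v in values]

for v, (key, val) in zip(values, pairs):
    print(repr(v), '->', repr(key), repr(val))
```

Because both groups use `*` rather than `+`, an empty key (as in ':v') or an empty value (as in 'w:,x:y') still matches and comes back as an empty string.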


Answer 2

Score: 2

Another option with string methods:

    df[["col_3", "col_4"]] = df["col_2"].str.split(",", n=1).str[0].str.split(":", expand=True)

Result:

      col_1       col_2 col_3 col_4
    0     1     a:b,c:d     a     b
    1     2          :v           v
    2     3      w:,x:y     w
    3     4  f:g,h:i,j:     f     g
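For readers who want to run the string-method approach end to end, here is a self-contained sketch that breaks the chain into two steps. The intermediate name `first_pair` and the extra `n=1` on the second split are assumptions added for illustration and are not part of the answer.

```python
import pandas as pd

df = pd.DataFrame({'col_1': ['1', '2', '3', '4'],
                   'col_2': ['a:b,c:d', ':v', 'w:,x:y', 'f:g,h:i,j:']})

# Step 1: keep only the first "key:value" pair, i.e. the text before
# the first comma.
first_pair = df['col_2'].str.split(',', n=1).str[0]

# Step 2: split that pair at the colon into two columns.
# n=1 is an added assumption: it caps the result at two columns even if
# a value were to contain a further colon.
df[['col_3', 'col_4']] = first_pair.str.split(':', n=1, expand=True)

print(df)
```

On the sample data this should give the same col_3 and col_4 as the one-line version, since none of the first pairs contains more than one colon.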

huangapple
  • Posted on February 23, 2023 at 21:51:29
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75545718.html