Convert string to dictionary in a dataframe

Question

I have a dataframe that looks like this:

import pandas as pd

df = pd.DataFrame({'col_1': ['1', '2', '3', '4'],
                   'col_2': ['a:b,c:d', ':v', 'w:,x:y', 'f:g,h:i,j:']
                   })

The values in col_2 are currently plain strings. I want to extract the first key and the first value from col_2 into col_3 and col_4, respectively. The output should look like this:

pd.DataFrame({'col_1': ['1', '2', '3', '4'],
              'col_2': ['a:b,c:d', ':v', 'w:,x:y', 'f:g,h:i,j:'],
              'col_3': ['a', '', 'w', 'f'],
              'col_4': ['b', 'v', '', 'g']
              })

Here is what I have done so far:

df['col_3'] = df['col_2'].apply(lambda x: x.split(":")[0])
df['col_4'] = df['col_2'].apply(lambda x: x.split(":")[1])

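But this doesn't work, because col_2 is a plain string rather than a dictionary: splitting on ':' alone ignores the ',' separators, so col_4 gets values like 'b,c' for 'a:b,c:d' and ',x' for 'w:,x:y'.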
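For comparison, here is a minimal sketch of what the title literally asks for: parse each string into a real dict, then take its first key and value. It assumes pairs are separated by ',' and that the first ':' in each pair separates the key from the value; the helper name parse_pairs is hypothetical, not a pandas function.

```python
import pandas as pd

def parse_pairs(s):
    """Hypothetical helper: turn 'k1:v1,k2:v2' into {'k1': 'v1', 'k2': 'v2'}."""
    result = {}
    for item in s.split(","):
        # partition splits on the first ':' only; a missing side becomes ''
        key, _, value = item.partition(":")
        result[key] = value
    return result

df = pd.DataFrame({'col_1': ['1', '2', '3', '4'],
                   'col_2': ['a:b,c:d', ':v', 'w:,x:y', 'f:g,h:i,j:']})

# What the original attempt works with: splitting on ':' alone ignores the commas
print(df['col_2'].str.split(':').tolist())
# [['a', 'b,c', 'd'], ['', 'v'], ['w', ',x', 'y'], ['f', 'g,h', 'i,j', '']]

dicts = df['col_2'].apply(parse_pairs)
# dicts keep insertion order (Python 3.7+), so the first item is the first pair
df['col_3'] = dicts.apply(lambda d: next(iter(d)))
df['col_4'] = dicts.apply(lambda d: next(iter(d.values())))
print(df)
```

Because a dict holds one value per key, a key repeated later in the same string would overwrite the first value; the two answers below extract the first pair directly and do not have that issue.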

Answer 1

Score: 2

This is a good job for a regex and str.extract:

df[['col_3', 'col_4']] = df['col_2'].str.extract(r'^([^:,]*):([^:,]*)')

Output:

  col_1       col_2 col_3 col_4
0     1     a:b,c:d     a     b
1     2          :v           v
2     3      w:,x:y     w      
3     4  f:g,h:i,j:     f     g

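To see what the pattern captures, here is a small sketch that runs the same regex through Python's re module on the sample values plus one extra, hypothetical value without a colon. `^` anchors at the start of the string, and each `([^:,]*)` captures a possibly empty run of characters that are neither ':' nor ',', so matching stops after the first key:value pair.

```python
import re

# Same pattern as the answer: start of string, key, ':', value
pattern = re.compile(r'^([^:,]*):([^:,]*)')

# 'no_colon' is an illustrative extra value, not part of the question's data
samples = ['a:b,c:d', ':v', 'w:,x:y', 'f:g,h:i,j:', 'no_colon']
results = {}
for s in samples:
    m = pattern.match(s)
    # groups() holds (key, value); None means the pattern did not match at all
    results[s] = m.groups() if m else None
    print(repr(s), '->', results[s])
```

A value with no ':' does not match, and in that case str.extract fills both new columns with NaN for that row rather than raising an error.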

Answer 2

Score: 2

Another option with string methods:

df[["col_3", "col_4"]] = df["col_2"].str.split(",", n=1).str[0].str.split(":", expand=True)

Result:

  col_1       col_2 col_3 col_4
0     1     a:b,c:d     a     b
1     2          :v           v
2     3      w:,x:y     w      
3     4  f:g,h:i,j:     f     g
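The one-liner above chains two splits. This sketch breaks it into its two steps so the intermediate values are visible. The `n=1` on the second split is an addition to the answer's code, assumed as a guard: it caps the result at two columns even if a value had a second ':' before its first ','.

```python
import pandas as pd

df = pd.DataFrame({'col_1': ['1', '2', '3', '4'],
                   'col_2': ['a:b,c:d', ':v', 'w:,x:y', 'f:g,h:i,j:']})

# Step 1: split on the first ',' only and keep the first piece, i.e. the first pair
first_pair = df['col_2'].str.split(',', n=1).str[0]
print(first_pair.tolist())  # ['a:b', ':v', 'w:', 'f:g']

# Step 2: split each pair on its first ':' into two columns (0 and 1),
# which are assigned to col_3 and col_4 by position
parts = first_pair.str.split(':', n=1, expand=True)
df[['col_3', 'col_4']] = parts
print(df)
```

Without `n=1`, a hypothetical value such as 'a:b:c,d:e' would expand into three columns, and assigning three columns to ['col_3', 'col_4'] would raise a ValueError.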

huangapple
  • Published on 23 February 2023 at 21:51:29
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75545718.html