合并具有数组的数据框。

huangapple go评论75阅读模式
英文:

Merge dataframes having array

问题

isActive,trackedSearchId,Primary Keyword Group(s)
True,53436615,SEO - Directory-Deployment&SEO - Other-Glossary
True,53434228,SEO - Other
True,53434229,SEO - Glossary
英文:

I have two data frames.

DF1

isActive,trackedSearchId
True,53436615
True,53434228
True,53434229

DF2

trackedSearchIds,Primary Keyword Group(s)
"[53436613, 53436615, 53437436, 53436506]",SEO - Directory-Deployment
"[53435887, 53437509, 53437441, 53436615, 53438685, 53437636]",SEO - Other-Glossary
"[53437504, 53435090, 53435887, 53434228]",SEO - Other
"[53437504, 53435090, 53434229]",SEO - Glossary

I want to check for each row of DF1 for column trackedSearchId and check with each row in DF2 with each value of trackedSearchIds array and if the value of DF1 column is present in DF2 append it with DF1.

The output should look like:

isActive,trackedSearchId,Primary Keyword Group(s)
True,53436615,SEO - Directory-Deployment&SEO - Other-Glossary
True,53434228,SEO - Other
True,53434229,SEO - Glossary

答案1

得分: 1

Sure, here is the translated code part:

import ast

df1['trackedSearchId'] = df1['trackedSearchId'].astype(str)

out = df1.merge((df2.assign(trackedSearchIds=df2['trackedSearchIds'].str.findall(r'(\d+)')).explode('trackedSearchIds')
                    .groupby('trackedSearchIds').agg('&'.join)
               ),
               left_on='trackedSearchId', right_on='trackedSearchIds')

And here is the translated output:

   isActive trackedSearchId                         Primary Keyword Group(s)
0      True        53436615  SEO - Directory-Deployment&SEO - Other-Glossary
1      True        53434228                                      SEO - Other
2      True        53434229                                   SEO - Glossary

Please note that I have retained the code and output formatting as it is.

英文:

Assuming strings representations of lists in df2, use ast.literal_eval, then groupby.agg, explode, and merge:

import ast

df1['trackedSearchId'] = df1['trackedSearchId'].astype(str)

out = df1.merge((df2.assign(trackedSearchIds=df2['trackedSearchIds'].str.findall(r'(\d+)')).explode('trackedSearchIds')
                    .groupby('trackedSearchIds').agg('&'.join)
               ),
               left_on='trackedSearchId', right_on='trackedSearchIds')

Output:

   isActive trackedSearchId                         Primary Keyword Group(s)
0      True        53436615  SEO - Directory-Deployment&SEO - Other-Glossary
1      True        53434228                                      SEO - Other
2      True        53434229                                   SEO - Glossary

答案2

得分: 1

You can use DataFrame.explode 来展开已转换为列表的值,如果必要的话,使用 ast.literal_eval 进行转换,使用 GroupBy.agg 中的 join 进行聚合,在 DataFrame.join 中添加新列:

import ast

df = (df1.join(df2.assign(trackedSearchIds=df2['trackedSearchIds'].apply(ast.literal_eval))
           .explode('trackedSearchIds')
           .groupby('trackedSearchIds')['Primary Keyword Group(s)']
           .agg('&'.join), on='trackedSearchId'))
print(df)
   isActive  trackedSearchId                         Primary Keyword Group(s)
0      True         53436615  SEO - Directory-Deployment&SEO - Other-Glossary
1      True         53434228                                      SEO - Other
2      True         53434229                                   SEO - Glossary

如果值是列表:

df = (df1.join(df2
           .explode('trackedSearchIds')
           .groupby('trackedSearchIds')['Primary Keyword Group(s)']
           .agg('&'.join), on='trackedSearchId'))
英文:

You can use DataFrame.explode of converted values to list by ast.literal_eval if necessary, aggregate join in GroupBy.agg and append new column in DataFrame.join:

import ast

df = (df1.join(df2.assign(trackedSearchIds=df2['trackedSearchIds'].apply(ast.literal_eval))
           .explode('trackedSearchIds')
           .groupby('trackedSearchIds')['Primary Keyword Group(s)']
           .agg('&'.join), on='trackedSearchId'))
print (df)
   isActive  trackedSearchId                         Primary Keyword Group(s)
0      True         53436615  SEO - Directory-Deployment&SEO - Other-Glossary
1      True         53434228                                      SEO - Other
2      True         53434229                                   SEO - Glossary

If values are lists:

df = (df1.join(df2
           .explode('trackedSearchIds')
           .groupby('trackedSearchIds')['Primary Keyword Group(s)']
           .agg('&'.join), on='trackedSearchId'))

huangapple
  • 本文由 发表于 2023年4月13日 18:03:56
  • 转载请务必保留本文链接:https://go.coder-hub.com/76004164.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定