2023-02-24 03:27:47

Merging dataframes based on pairs

Question

I have a dataframe that looks like this:

import pandas as pd

df = pd.DataFrame({'col_1': ['1', '2', '3', '4'],
                   'col_2': ['a:b,c:d', ':v', 'w:,x:y', 'a:g,h:b,j:']
                   })

The datatype of col_2 is a string, so we must do string manipulation/regex.

I also have another dataframe that maps the key-value pairs from col_2 to other values. It looks like this:

df1 = pd.DataFrame({'col_1': ['a', 'c', '', 'w', 'x', 'a', 'h', 'j', 't'],
                    'col_2': ['b', 'd', 'v', '', 'y', 'g', 'b', '', 'g'],
                    'col_3': ['aw', 'rt', 'er', 'aa', 'ey', 'wk', 'oo', 'ri', 'ty'],
                    'col_4': ['rt', 'yu', 'gq', 'tr', 'ui', 'pi', 'pw', 'pp', 'uu']
                   })

Basically, a:b translates to aw:rt, which means you need both a and b to reach aw and rt.

I want to get all the values from col_4 that correspond to the key-value pairs in col_2, so I want my output to be:

pd.DataFrame({'col_1': ['1', '2', '3', '4'],
              'col_2': ['a:b,c:d', ':v', 'w:,x:y', 'a:g,h:b,j:'],
              'col_3': ['rt,yu', 'gq', 'tr,ui', 'pi,pw,pp']
              })

I am able to extract a key-value pair (the first one in each row) into separate columns using:

df[['c1', 'c2']] = df['col_2'].str.extract(r'^([^:,]*):([^:,]*)')

So I could extract all the key-value pairs as columns and then do a merge, but that looks like a lengthy route. Is there a more optimised way?
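For reference, the "extract everything, then merge" route described above might look roughly like the sketch below. It is only an illustration under assumptions: it splits on ',' and ':' instead of using the regex, and the helper names `exploded`, `kv`, `merged`, and the `row` column are made up for this example. It also assumes every pair has a match in df1; an unmatched pair would leave a NaN in col_4 and make the final join fail.

```python
import pandas as pd

df = pd.DataFrame({'col_1': ['1', '2', '3', '4'],
                   'col_2': ['a:b,c:d', ':v', 'w:,x:y', 'a:g,h:b,j:']})
df1 = pd.DataFrame({'col_1': ['a', 'c', '', 'w', 'x', 'a', 'h', 'j', 't'],
                    'col_2': ['b', 'd', 'v', '', 'y', 'g', 'b', '', 'g'],
                    'col_3': ['aw', 'rt', 'er', 'aa', 'ey', 'wk', 'oo', 'ri', 'ty'],
                    'col_4': ['rt', 'yu', 'gq', 'tr', 'ui', 'pi', 'pw', 'pp', 'uu']})

# One pair per row; the index still carries the label of the source row.
exploded = df['col_2'].str.split(',').explode()

# Split each pair into key and value columns (empty strings are kept as '').
kv = exploded.str.split(':', n=1, expand=True)
kv.columns = ['c1', 'c2']                      # hypothetical column names
kv = kv.rename_axis('row').reset_index()       # keep the source row as a column

# A left merge on both key and value keeps the pairs in their original order.
merged = kv.merge(df1[['col_1', 'col_2', 'col_4']],
                  left_on=['c1', 'c2'], right_on=['col_1', 'col_2'],
                  how='left')

# Collapse back to one comma-separated string per source row.
df['col_3'] = merged.groupby('row')['col_4'].agg(','.join)
print(df)
```

This works on the sample data, but it needs several intermediate frames, which is the lengthiness the question is about.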

Answer 1

Score: 2

I would use basic pandas methods here. Split and explode col_2 to get the individual pairs, build a mapping from each 'key:value' pair to col_4, and then map the pairs through it to replace the values.

pairs = df['col_2'].str.split(',').explode()
mapping = df1['col_4'].set_axis(df1['col_1'] + ':' + df1['col_2'])
df['col_3'] = pairs.map(mapping).groupby(level=0).agg(','.join)
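As a self-contained check, the snippet above can be run against the sample frames from the question, as sketched below. Two things here are assumptions rather than part of the answer: the approach relies on every 'key:value' string in df1 being unique (true for the sample data), and the `df_extra` frame with the unmapped pair 'z:z' is a made-up case showing one possible way to skip pairs that have no mapping.

```python
import pandas as pd

df = pd.DataFrame({'col_1': ['1', '2', '3', '4'],
                   'col_2': ['a:b,c:d', ':v', 'w:,x:y', 'a:g,h:b,j:']})
df1 = pd.DataFrame({'col_1': ['a', 'c', '', 'w', 'x', 'a', 'h', 'j', 't'],
                    'col_2': ['b', 'd', 'v', '', 'y', 'g', 'b', '', 'g'],
                    'col_3': ['aw', 'rt', 'er', 'aa', 'ey', 'wk', 'oo', 'ri', 'ty'],
                    'col_4': ['rt', 'yu', 'gq', 'tr', 'ui', 'pi', 'pw', 'pp', 'uu']})

# One 'key:value' string per row; the index repeats the source row label.
pairs = df['col_2'].str.split(',').explode()

# col_4 values indexed by 'key:value' (these labels must be unique for map to work).
mapping = df1['col_4'].set_axis(df1['col_1'] + ':' + df1['col_2'])

# Look up each pair, then join the results back per source row.
df['col_3'] = pairs.map(mapping).groupby(level=0).agg(','.join)
print(df)

# Hypothetical case: 'z:z' has no entry in the mapping, so map() yields NaN
# for it, and ','.join would raise a TypeError. Dropping NaN first avoids that.
df_extra = pd.DataFrame({'col_2': ['a:b,z:z']})
extra_pairs = df_extra['col_2'].str.split(',').explode()
safe = extra_pairs.map(mapping).dropna().groupby(level=0).agg(','.join)
print(safe)
```

With `dropna()`, a row whose pairs are all unmapped disappears from the grouped result, so assigning it back to the frame would leave NaN for that row.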
