2023年6月26日 16:21:03go评论114阅读模式

英文:

Joining 2 tables and creating corresponding columns

问题

我有两个如下定义的数据框：

from numpy import nan
import pandas as pd
df1 = pd.DataFrame({'ID': {0: 'A', 1: 'B', 2: 'C'}, 'Description': {0: 'Apple', 1: 'Book', 2: 'Cat'}})
df2 = pd.DataFrame({'Name': {0: 'David', 1: 'Ken'},'ID1': {0: 'A', 1: 'B'}, 'ID2': {0: 'C', 1: nan}, 'ID3': {0: 'B', 1: 'C'}})

我想要将表1的ID与表2的ID1、ID2和ID3连接，并添加相应的列DESC1、DESC2和DESC3：

  Name  ID1  DESC1  ID2  DESC2  ID3  DESC3
0 David   A   Apple    C    Cat    B   Book
1   Ken   B   Book   NaN   Null    C   Cat

考虑要高效地执行此操作，但不知道如何操作。有什么建议吗？

英文:

I have 2 DataFrames as defined below:

from numpy import nan
import pandas as pd
df1 = pd.DataFrame({&#39;ID&#39;: {0: &#39;A&#39;, 1: &#39;B&#39;, 2: &#39;C&#39;}, &#39;Description&#39;: {0: &#39;Apple&#39;, 1: &#39;Book&#39;, 2: &#39;Cat&#39;}})
df2 = pd.DataFrame({&#39;Name&#39;: {0: &#39;David&#39;, 1: &#39;Ken&#39;},&#39;ID1&#39;: {0: &#39;A&#39;, 1: &#39;B&#39;}, &#39;ID2&#39;: {0: &#39;C&#39;, 1: nan}, &#39;ID3&#39;: {0: &#39;B&#39;, 1: &#39;C&#39;}})
# df1
  ID Description
0  A       Apple
1  B        Book
2  C         Cat
# df2
    Name    ID1  ID2  ID3
0   David   A    C    B
1   Ken     B   NaN   C

I want to join ID of table1 to ID1, ID2, ID3 of table 2 and add corresponding columns DESC1, DESC2, DESC3: <br>

  Name    ID1  DESC1  ID2  DESC2 ID3  DESC3
0 David   A    Apple   C    Cat   B   Book
1 Ken     B    Book    NaN  Null  C   Cat

Thinking about a for-loop to do that efficiently but do not know how to do about. Any suggestions would be appreciated!

答案1

得分: 0

以下是代码的翻译部分：

假设使用 [tag:pandas]，首先 [`melt`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.melt.html)，然后 [`merge`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.merge.html) 和 [`pivot`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.pivot.html)：

out = (
df2.melt(ignore_index=False, var_name='col', value_name='ID')
.assign(col=lambda d: d['col'].str.extract(r'(\d+)'))
.reset_index()
.merge(df1.rename(columns={'Description': 'DESC'}), how='left')
.pivot(index='index', columns='col')
.sort_index(axis=1, level=1, sort_remaining=False)
.pipe(lambda d: d.set_axis(d.columns.map(''.join), axis=1))
)


或者使用简单的循环和 [`concat`](https://pandas.pydata.org/docs/reference/api/pandas.concat.html)：

out = pd.concat([df2[[col]].rename(columns={col: 'ID'}).merge(df1)
.add_suffix(col.removeprefix('ID'))
for col in df2], axis=1)


输出：

  ID1  DESC1  ID2 DESC2 ID3 DESC3

index
0 A Apple C Cat B Book
1 B Book NaN NaN C Cat


处理额外的列：

cols = ['Name']

out = (
df2.rename_axis('index').set_index(cols, append=True)
.melt(ignore_index=False, var_name='col', value_name='ID')
.assign(col=lambda d: d['col'].str.extract(r'(\d+)'))
.reset_index()
.merge(df1.rename(columns={'Description': 'DESC'}), how='left')
.pivot(index=['index']+cols, columns='col')
.sort_index(axis=1, level=1, sort_remaining=False)
.pipe(lambda d: d.set_axis(d.columns.map(''.join), axis=1))
.reset_index(cols)
)


使用 `concat`：

out = df2[cols].join(
pd.concat([df2[[col]].rename(columns={col: 'ID'}).merge(df1)
.add_suffix(col.removeprefix('ID'))
for col in df2.drop(columns=cols)], axis=1)
)


输出：

    Name ID1  DESC1  ID2 DESC2 ID3 DESC3

index
0 David A Apple C Cat B Book
1 Ken B Book NaN NaN C Cat

英文:

Assuming [tag:pandas], first melt, then merge and pivot:

out = (
 df2.melt(ignore_index=False, var_name=&#39;col&#39;, value_name=&#39;ID&#39;)
    .assign(col=lambda d: d[&#39;col&#39;].str.extract(r&#39;(\d+)&#39;))
    .reset_index()
    .merge(df1.rename(columns={&#39;Description&#39;: &#39;DESC&#39;}), how=&#39;left&#39;)
    .pivot(index=&#39;index&#39;, columns=&#39;col&#39;)
    .sort_index(axis=1, level=1, sort_remaining=False)
    .pipe(lambda d: d.set_axis(d.columns.map(&#39;&#39;.join), axis=1))
)

Or with a simple loop and concat:

out = pd.concat([df2[[col]].rename(columns={col: &#39;ID&#39;}).merge(df1)
                           .add_suffix(col.removeprefix(&#39;ID&#39;))
                 for col in df2], axis=1)

Output:

      ID1  DESC1  ID2 DESC2 ID3 DESC3
index                                
0       A  Apple    C   Cat   B  Book
1       B   Book  NaN   NaN   C   Cat

handling extra columns

Add the other columns to the index temporarily:

cols = [&#39;Name&#39;]
out = (
 df2.rename_axis(&#39;index&#39;).set_index(cols, append=True)
    .melt(ignore_index=False, var_name=&#39;col&#39;, value_name=&#39;ID&#39;)
    .assign(col=lambda d: d[&#39;col&#39;].str.extract(r&#39;(\d+)&#39;))
    .reset_index()
    .merge(df1.rename(columns={&#39;Description&#39;: &#39;DESC&#39;}), how=&#39;left&#39;)
    .pivot(index=[&#39;index&#39;]+cols, columns=&#39;col&#39;)
    .sort_index(axis=1, level=1, sort_remaining=False)
    .pipe(lambda d: d.set_axis(d.columns.map(&#39;&#39;.join), axis=1))
    .reset_index(cols)
)

With concat:

out = df2[cols].join(
       pd.concat([df2[[col]].rename(columns={col: &#39;ID&#39;}).merge(df1)
                            .add_suffix(col.removeprefix(&#39;ID&#39;))
                  for col in df2.drop(columns=cols)], axis=1)
      )

Output:

        Name ID1  DESC1  ID2 DESC2 ID3 DESC3
index                                       
0      David   A  Apple    C   Cat   B  Book
1        Ken   B   Book  NaN   NaN   C   Cat

答案2

得分: 0

要获得所需的输出，您可以直接创建一个字典（我们称之为 dic），其中包含键和值，如 'A': '苹果'。然后，遍历 ID 字典，并将键替换为它们对应的值以创建 desc 字典。最后，将ID和描述字典合并为一个。在这个解释中，Table1 和 Table2 分别指的是 a 和 b。

代码：

dic = {value: a[&#39;Description&#39;][key] for key, value in a[&#39;ID&#39;].items()}    #output : {&#39;A&#39;: &#39;Apple&#39;, &#39;B&#39;: &#39;Book&#39;, &#39;C&#39;: &#39;Cat&#39;}
{**b, **{key.replace(&#39;ID&#39;, &#39;DESC&#39;) : {k: dic.get(v, &#39;null&#39;) for k, v in val.items()} for key, val in b.items()}}

输出：

{&#39;ID1&#39;: {0: &#39;A&#39;, 1: &#39;B&#39;},
 &#39;ID2&#39;: {0: &#39;C&#39;, 1: nan},
 &#39;ID3&#39;: {0: &#39;B&#39;, 1: &#39;C&#39;},
 &#39;DESC1&#39;: {0: &#39;苹果&#39;, 1: &#39;书&#39;},
 &#39;DESC2&#39;: {0: &#39;猫&#39;, 1: &#39;null&#39;},
 &#39;DESC3&#39;: {0: &#39;书&#39;, 1: &#39;猫&#39;}}

英文:

To obtain the desired output, you can directly create a dictionary (let's call it dic) with keys and values like 'A': 'Apple'. Then, iterate through the ID dictionary and replace the keys with their corresponding values to create the desc dictionary. Finally, merge the ID and description dictionaries into one. In this explanation, Table1 and Table2 refer to a and b respectively.

Code:

dic = {value: a[&#39;Description&#39;][key] for key, value in a[&#39;ID&#39;].items()}    #output : {&#39;A&#39;: &#39;Apple&#39;, &#39;B&#39;: &#39;Book&#39;, &#39;C&#39;: &#39;Cat&#39;}
{**b, **{key.replace(&#39;ID&#39;, &#39;DESC&#39;) : {k: dic.get(v, &#39;null&#39;) for k, v in val.items()} for key, val in b.items()}}

Output:

{&#39;ID1&#39;: {0: &#39;A&#39;, 1: &#39;B&#39;},
 &#39;ID2&#39;: {0: &#39;C&#39;, 1: nan},
 &#39;ID3&#39;: {0: &#39;B&#39;, 1: &#39;C&#39;},
 &#39;DESC1&#39;: {0: &#39;Apple&#39;, 1: &#39;Book&#39;},
 &#39;DESC2&#39;: {0: &#39;Cat&#39;, 1: &#39;null&#39;},
 &#39;DESC3&#39;: {0: &#39;Book&#39;, 1: &#39;Cat&#39;}}

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

连接两个表格并创建相应的列

问题

答案1

handling extra columns

答案2

如何解决Python tkinter中的这个msg.showerror问题？

Python multiprocessing: sharing global real-only large data without reloading from disk for child processes

按月计算登录时长

pandas DataFrame 查询在使用 where 时不起作用。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。

发表评论