January 9, 2023 08:37:55

Merging two DataFrames with multiple rows for the same key

Question

I have medical data split into two different CSVs, and I need to merge them. One data set contains basic demographic information, and the second contains diagnosis codes. Each patient is assigned a unique identification number called INC_KEY, which I've simplified to simple numbers, as shown in this example:

df1:

INC_KEY   SEX    AGE
1         F      40
2         F      24
3         M      66

df2:

INC_KEY   DCODE
1         BW241ZZ
1         BW28ZZZ
2         0BH17EZ
3         05H633Z
2         4A103BD
3         BR30ZZZ
1         BF42ZZZ

I need to merge the two DataFrames so that the output keeps the three rows of df1, with one appended column for each DCODE belonging to that patient. Like this:

INC_KEY   SEX    AGE   DCODE1     DCODE2     DCODE3
1         F      40    BW241ZZ    BW28ZZZ    BF42ZZZ
2         F      24    0BH17EZ    4A103BD    N/A
3         M      66    05H633Z    BR30ZZZ    N/A

How can I go about this? I've tried to do a left merge but it does not give the result I am looking for.
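
For context, a plain left merge on this data returns one row per diagnosis code rather than one row per patient, which is probably why it did not look right. A minimal sketch, assuming the two frames are rebuilt by hand from the tables above (the variable name `left` is only for illustration):

```python
import pandas as pd

# Example frames rebuilt by hand from the tables in the question.
df1 = pd.DataFrame({"INC_KEY": [1, 2, 3], "SEX": ["F", "F", "M"], "AGE": [40, 24, 66]})
df2 = pd.DataFrame({
    "INC_KEY": [1, 1, 2, 3, 2, 3, 1],
    "DCODE": ["BW241ZZ", "BW28ZZZ", "0BH17EZ", "05H633Z", "4A103BD", "BR30ZZZ", "BF42ZZZ"],
})

# A left merge repeats each patient's demographics once per matching code,
# so the result is "long" (one row per code), not "wide".
left = df1.merge(df2, on="INC_KEY", how="left")
print(left.shape)  # (7, 4)
```

Getting from this long shape to one row per patient needs an extra reshaping step after (or instead of) the merge.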

Answer 1

Score: 1

You can combine the two dataframes on the INC_KEY column using .merge. Then, you can use .groupby() and pd.concat() to turn individual rows into the desired columns. Finally, you can drop the original "DCODE" column using .drop():

import pandas as pd

# Attach the demographics to every diagnosis row.
df = df1.merge(df2, on="INC_KEY", how="right")
# Collect each patient's codes into a single list.
df = df.groupby(["INC_KEY", "SEX", "AGE"]).agg({"DCODE": list}).reset_index()
# Expand the lists into DCODE0, DCODE1, ... columns.
df = pd.concat(
    (df, pd.DataFrame(df["DCODE"].values.tolist()).add_prefix("DCODE")),
    axis=1
)
df = df.drop("DCODE", axis=1)

This outputs:

   INC_KEY SEX  AGE   DCODE0   DCODE1   DCODE2
0        1   F   40  BW241ZZ  BW28ZZZ  BF42ZZZ
1        2   F   24  0BH17EZ  4A103BD     None
2        3   M   66  05H633Z  BR30ZZZ     None
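
The columns above start at DCODE0, while the question asked for DCODE1 onward. One possible variant of the same list-aggregation idea, sketched under the assumption that the frames look like the example data (the names `codes`, `wide` and `result` are illustrative only), builds the wide table from df2 first and then joins it onto df1, so patients without any codes would also be kept:

```python
import pandas as pd

# Example frames rebuilt by hand from the tables in the question.
df1 = pd.DataFrame({"INC_KEY": [1, 2, 3], "SEX": ["F", "F", "M"], "AGE": [40, 24, 66]})
df2 = pd.DataFrame({
    "INC_KEY": [1, 1, 2, 3, 2, 3, 1],
    "DCODE": ["BW241ZZ", "BW28ZZZ", "0BH17EZ", "05H633Z", "4A103BD", "BR30ZZZ", "BF42ZZZ"],
})

# Collect each patient's codes into a list, then expand the lists into columns.
codes = df2.groupby("INC_KEY")["DCODE"].agg(list)
wide = pd.DataFrame(codes.tolist(), index=codes.index)
wide.columns = [f"DCODE{i + 1}" for i in wide.columns]  # 1-based column names

# Joining on df1's INC_KEY column keeps every row of df1.
result = df1.join(wide, on="INC_KEY")
print(result)
```

Shorter lists are padded with missing values, so patients 2 and 3 end up with an empty DCODE3.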

Answer 2

Score: 0

Here's another way:

# Merge, then number each patient's codes with cumcount() and unstack that level.
df_out = df1.merge(df2, on='INC_KEY')
df_out = df_out.set_index(['INC_KEY', 'SEX', 'AGE', df_out.groupby('INC_KEY').cumcount()]).unstack()
# Flatten the ('DCODE', n) column MultiIndex into DCODE0, DCODE1, ...
df_out.columns = [f'{i}{j}' for i, j in df_out.columns]
df_out = df_out.reset_index()

Output:

   INC_KEY SEX  AGE   DCODE0   DCODE1   DCODE2
0        1   F   40  BW241ZZ  BW28ZZZ  BF42ZZZ
1        2   F   24  0BH17EZ  4A103BD      NaN
2        3   M   66  05H633Z  BR30ZZZ      NaN
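
The same cumcount idea can also be written with pivot, which some readers may find easier to follow. This is only a sketch under the assumption that the data matches the example tables; the helper column `n` and the names `numbered`, `wide` and `result` are made up for illustration:

```python
import pandas as pd

# Example frames rebuilt by hand from the tables in the question.
df1 = pd.DataFrame({"INC_KEY": [1, 2, 3], "SEX": ["F", "F", "M"], "AGE": [40, 24, 66]})
df2 = pd.DataFrame({
    "INC_KEY": [1, 1, 2, 3, 2, 3, 1],
    "DCODE": ["BW241ZZ", "BW28ZZZ", "0BH17EZ", "05H633Z", "4A103BD", "BR30ZZZ", "BF42ZZZ"],
})

# Number each patient's codes 1, 2, 3, ... in order of appearance.
numbered = df2.assign(n=df2.groupby("INC_KEY").cumcount() + 1)

# Pivot so that each number becomes its own column: DCODE1, DCODE2, ...
wide = numbered.pivot(index="INC_KEY", columns="n", values="DCODE").add_prefix("DCODE")

# A left merge onto df1 keeps one row per patient.
result = df1.merge(wide.reset_index(), on="INC_KEY", how="left")
print(result)
```

Adding 1 to cumcount() gives the DCODE1, DCODE2, DCODE3 names from the question instead of the zero-based ones.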
