2023-07-14 02:13:40

Retain original number of rows with dataset to be matched, when pairing values from two different datasets in Pandas

Question

Data

df1

ID                  stat
AA1                 exzone
BB2                 exzone5
CC4                 limit5

df2

name                state
AA1                 NY
AA1                 NY
AA1                 NY
AA1                 NY
BB2                 GA
BB2                 GA
BB2                 GA
CC4                 CA
CC4                 CA

Desired

name                stat          state
AA1                 exzone        NY
BB2                 exzone5       GA
CC4                 limit5        CA

Doing

out = pd.merge(df1, df2, left_on=['ID'], right_on=['name'], how='left')

However, the script above produces an exploded output and does not retain the row count of the original left DataFrame. Any suggestions are appreciated.

Answer 1

Score: 1

A left merge does not guarantee that the result has the same number of rows as the original left DataFrame. It guarantees only that every left key is preserved, even if it is absent from the right DataFrame. In your case, the duplicated keys on the right make the merge pair each left row with every matching right row.
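
To see the effect concretely, here is a minimal sketch that rebuilds the two frames from the question and counts the rows of the plain left merge. The way `df1` and `df2` are constructed and the name `exploded` are my own; only the values come from the question.

```python
import pandas as pd

# Frames rebuilt from the question's sample data.
df1 = pd.DataFrame({"ID": ["AA1", "BB2", "CC4"],
                    "stat": ["exzone", "exzone5", "limit5"]})
df2 = pd.DataFrame({"name": ["AA1"] * 4 + ["BB2"] * 3 + ["CC4"] * 2,
                    "state": ["NY"] * 4 + ["GA"] * 3 + ["CA"] * 2})

# Plain left merge: each left row is repeated once per matching right row.
exploded = pd.merge(df1, df2, left_on=["ID"], right_on=["name"], how="left")

print(len(df1), len(exploded))  # 3 9  (4 + 3 + 2 matches)
```

The left keys are all still there, but AA1 now appears four times, BB2 three times and CC4 twice.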

You need to first remove the duplicates:

out = pd.merge(df1, df2.drop_duplicates(), left_on=['ID'], right_on=['name'], how='left')
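
As a runnable sketch of the same idea on the question's sample data: the `validate="many_to_one"` argument and the final column selection are additions of mine, not part of the original one-liner. `validate` makes pandas raise a `MergeError` if the right-hand keys are still not unique after deduplication, and the column selection is only there to match the layout shown under "Desired". The name `lookup` is also just illustrative.

```python
import pandas as pd

# Frames rebuilt from the question's sample data.
df1 = pd.DataFrame({"ID": ["AA1", "BB2", "CC4"],
                    "stat": ["exzone", "exzone5", "limit5"]})
df2 = pd.DataFrame({"name": ["AA1"] * 4 + ["BB2"] * 3 + ["CC4"] * 2,
                    "state": ["NY"] * 4 + ["GA"] * 3 + ["CA"] * 2})

# Collapse fully identical (name, state) rows to one row each.
lookup = df2.drop_duplicates()

# validate="many_to_one" fails loudly if a name still maps to several rows.
out = pd.merge(df1, lookup, left_on=["ID"], right_on=["name"],
               how="left", validate="many_to_one")

# Keep only the columns from the desired output, in that order.
out = out[["name", "stat", "state"]]
print(out)
```

With this data the result has three rows, one per row of `df1`.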

If for some reason you have several different states per name, `drop_duplicates()` alone will still leave more than one row per name. In that case you should aggregate another way (pick the first or last state, combine the unique states into a single string, etc.), or accept the duplicated rows.
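
Two of those aggregation options could look roughly like this. Note that the `df2` here is a hypothetical variant I made up in which AA1 has two different states (NY and NJ); it is not the data from the question. The names `first_state` and `joined_state` are likewise only for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({"ID": ["AA1", "BB2", "CC4"],
                    "stat": ["exzone", "exzone5", "limit5"]})

# Hypothetical data: AA1 appears with two different states.
df2 = pd.DataFrame({"name": ["AA1", "AA1", "AA1", "BB2", "BB2", "CC4"],
                    "state": ["NY", "NJ", "NY", "GA", "GA", "CA"]})

# Option 1: keep the first state seen for each name.
first_state = df2.groupby("name")["state"].first().reset_index()

# Option 2: join the unique states of each name into a single string.
joined_state = (df2.groupby("name")["state"]
                   .agg(lambda s: ", ".join(s.unique()))
                   .reset_index())

# Either lookup has one row per name, so the left merge keeps 3 rows.
out_first = pd.merge(df1, first_state, left_on="ID", right_on="name", how="left")
out_joined = pd.merge(df1, joined_state, left_on="ID", right_on="name", how="left")

print(out_first["state"].tolist())   # ['NY', 'GA', 'CA']
print(out_joined["state"].tolist())  # ['NY, NJ', 'GA', 'CA']
```

Which option is right depends on what a second state for the same name actually means in your data.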
