2023-02-27 04:39:31

Merge or append 2 dataframes row wise and add a check in a separate column determining which one it came from

Question

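You can use the pandas concat function to stack the two DataFrames row-wise, fill the missing flag values with 0, and then collapse rows that share the same key. Example code:

import pandas as pd

# Stack the two DataFrames row-wise; a column missing from one frame becomes NaN
merged_df = pd.concat([df1, df2], ignore_index=True)

# Fill the missing flags with 0 and convert the float columns back to integers
merged_df[['common', 'alt']] = merged_df[['common', 'alt']].fillna(0).astype(int)

# Collapse rows that appear in both frames, keeping a 1 in every flag that was set
merged_df = merged_df.groupby(['altshortname', 'Code', 'Type'], as_index=False, sort=False).agg(
    commonshortname=('commonshortname', 'first'),
    common=('common', 'max'),
    alt=('alt', 'max'),
)

# Drop the 'Type' column if it is not needed
# merged_df = merged_df.drop(columns='Type')

# Print the final DataFrame
print(merged_df)

This stacks the two DataFrames, matches rows on the 'altshortname', 'Code' and 'Type' columns, and uses the 'common' and 'alt' columns (1 or 0) to indicate which frame each row came from.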

Original (English):

I have the following 2 dataframes, df1,

import pandas as pd

data = {
    'commonshortname': ['SNX.US', '002400.CH', 'CDW.US', 'CEC.GR', '300002.CH'],
    'altshortname': ['SNX.US', '002400.SHE', 'CDW.US', 'CEC.XETRA', '300002.SHE'],
    'Code': ['SNX', '002400', 'CDW', 'CEC', '300002'],
    'Type': ['Common Stock', 'Common Stock', 'Common Stock', 'Common Stock', 'Common Stock'],
    'common': [1, 1, 1, 1, 1]
}

df1 = pd.DataFrame(data)

and df2 which looks like this,

data = {'altshortname': ['SEDG.US', 'MHLD.US', 'CDW.US', 'POLA.US', 'PHASQ.US'],
        'Code': ['SEDG', 'MHLD', 'CDW', 'POLA', 'PHASQ'],
        'Type': ['Common Stock', 'Common Stock', 'Common Stock', 'Common Stock', 'Common Stock'],
        'alt': [1, 1, 1, 1, 1]}

df2 = pd.DataFrame(data)

This is what they look like in dataframe form,

     commonshortname altshortname  Code           Type   common
0          SNX.US       SNX.US      SNX   Common Stock     1
1       002400.CH    002400.SHE  002400  Common Stock      1
2          CDW.US       CDW.US      CDW   Common Stock     1
3          CEC.GR     CEC.XETRA     CEC  Common Stock      1
4       300002.CH    300002.SHE  300002  Common Stock      1
...           ...          ...     ...           ...  ...

and

     altshortname    Code         Type         alt
0         SEDG.US    SEDG  Common Stock          1
1         MHLD.US    MHLD  Common Stock          1
2          CDW.US     CDW  Common Stock          1
3         POLA.US    POLA  Common Stock          1
4        PHASQ.US   PHASQ  Common Stock          1

I want to merge these two row-wise, so that if a row exists in both, the data from the first dataframe (df1) is taken and a 1 is added to its alt column.

The final frame should look like this,

     commonshortname altshortname  Code           Type   common   alt
0          SNX.US       SNX.US      SNX   Common Stock     1
1       002400.CH    002400.SHE  002400  Common Stock      1
2          CDW.US       CDW.US      CDW   Common Stock     1       1
3          CEC.GR     CEC.XETRA     CEC  Common Stock      1
4       300002.CH    300002.SHE  300002  Common Stock      1
0                      SEDG.US    SEDG  Common Stock               1
1                      MHLD.US    MHLD  Common Stock               1
3                      POLA.US    POLA  Common Stock               1
4                     PHASQ.US   PHASQ  Common Stock               1

Basically, if the data came from df1, there will be a 1 in the common column; if it came from df2, there will be a 1 in the alt column; and if it came from both, there will be a 1 in both columns.

Can this be done in pandas?

I tried to do a merge, but it keeps joining it column wise and I end up with millions of rows.

merged_df = pd.merge(df1, df2, on=['altshortname', 'Code', 'Type'], how='outer')
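
One plausible explanation for the row blow-up, offered as an assumption because the full data is not shown: if the same key combination is repeated in both frames, an outer merge pairs every matching row on one side with every matching row on the other. The sketch below uses small made-up frames (`left` and `right` are hypothetical names, not from the question) to show the effect, and how de-duplicating the keys first, together with the `validate` argument of `pd.merge`, guards against it.

```python
import pandas as pd

keys = ["altshortname", "Code", "Type"]

# Hypothetical frames in which the same key combination appears 3 times each
left = pd.DataFrame({
    "altshortname": ["CDW.US"] * 3,
    "Code": ["CDW"] * 3,
    "Type": ["Common Stock"] * 3,
    "common": [1, 1, 1],
})
right = pd.DataFrame({
    "altshortname": ["CDW.US"] * 3,
    "Code": ["CDW"] * 3,
    "Type": ["Common Stock"] * 3,
    "alt": [1, 1, 1],
})

# Many-to-many match: every left row pairs with every right row
blown_up = pd.merge(left, right, on=keys, how="outer")
print(len(blown_up))

# Dropping repeated keys first keeps the result at one row per key;
# validate="one_to_one" raises MergeError if duplicates remain
deduped = pd.merge(
    left.drop_duplicates(keys),
    right.drop_duplicates(keys),
    on=keys,
    how="outer",
    validate="one_to_one",
)
print(len(deduped))
```

Checking `df1.duplicated(keys).sum()` and `df2.duplicated(keys).sum()` on the real data would show whether this is actually what is happening.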

Answer 1

Score: 1

IIUC, what you need is a concat and drop_duplicates.

out = pd.concat([df1, df2], ignore_index=True).drop_duplicates(
    ["altshortname", "Code", "Type"], ignore_index=True
)
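
Note that `drop_duplicates` keeps the first occurrence, so a row present in both frames keeps the df1 values and its alt cell stays empty. A minimal sketch of one way to fill in both flags afterwards, assuming the three key columns identify a row; the reduced sample frames and the helper names `in_df1` / `in_df2` are illustrative only:

```python
import pandas as pd

# Reduced sample frames based on the question's data
df1 = pd.DataFrame({
    "commonshortname": ["SNX.US", "CDW.US"],
    "altshortname": ["SNX.US", "CDW.US"],
    "Code": ["SNX", "CDW"],
    "Type": ["Common Stock", "Common Stock"],
    "common": [1, 1],
})
df2 = pd.DataFrame({
    "altshortname": ["SEDG.US", "CDW.US"],
    "Code": ["SEDG", "CDW"],
    "Type": ["Common Stock", "Common Stock"],
    "alt": [1, 1],
})

keys = ["altshortname", "Code", "Type"]

# Stack row-wise and keep the first occurrence of each key (df1 wins)
out = pd.concat([df1, df2], ignore_index=True).drop_duplicates(keys, ignore_index=True)

# Rebuild the flags from key membership in each source frame
out_keys = pd.MultiIndex.from_frame(out[keys])
in_df1 = out_keys.isin(pd.MultiIndex.from_frame(df1[keys]))
in_df2 = out_keys.isin(pd.MultiIndex.from_frame(df2[keys]))
out["common"] = in_df1.astype(int)
out["alt"] = in_df2.astype(int)

print(out)
```

This yields 0/1 flags rather than blanks; the zeros can be blanked out for display if needed.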


Answer 2

Score: 1

Here is a possible solution:

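# Outer-merge on the shared key columns so rows present in both frames collapse into one
merged_df = pd.merge(df1, df2, on=['altshortname', 'Code', 'Type'], how='outer')
# Rows missing from one side get NaN; fill with 0 so the flags can be cast to int
merged_df.fillna(0, inplace=True)

merged_df[['common', 'alt']] = merged_df[['common', 'alt']].astype(int)
# Blank out the zeros for display (this turns the affected columns into object dtype)
merged_df.replace(0, '', inplace=True)
print(merged_df)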

  commonshortname altshortname    Code          Type common alt
0          SNX.US       SNX.US     SNX  Common Stock      1    
1       002400.CH   002400.SHE  002400  Common Stock      1    
2          CDW.US       CDW.US     CDW  Common Stock      1   1
3          CEC.GR    CEC.XETRA     CEC  Common Stock      1    
4       300002.CH   300002.SHE  300002  Common Stock      1    
5                      SEDG.US    SEDG  Common Stock          1
6                      MHLD.US    MHLD  Common Stock          1
7                      POLA.US    POLA  Common Stock          1
8                     PHASQ.US   PHASQ  Common Stock          1
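
As a variation on the same outer merge, the `indicator=True` option of `pd.merge` records where each row came from, so the flags can be derived from it instead of being carried through the merge. This is a sketch under the assumption that the key columns are unique within each frame; the reduced sample frames are illustrative:

```python
import pandas as pd

# Reduced sample frames based on the question's data
df1 = pd.DataFrame({
    "commonshortname": ["SNX.US", "CDW.US"],
    "altshortname": ["SNX.US", "CDW.US"],
    "Code": ["SNX", "CDW"],
    "Type": ["Common Stock", "Common Stock"],
    "common": [1, 1],
})
df2 = pd.DataFrame({
    "altshortname": ["SEDG.US", "CDW.US"],
    "Code": ["SEDG", "CDW"],
    "Type": ["Common Stock", "Common Stock"],
    "alt": [1, 1],
})

keys = ["altshortname", "Code", "Type"]

# Merge without the flag columns; indicator=True adds a "_merge" column
# whose values are "left_only", "right_only" or "both"
merged = pd.merge(
    df1.drop(columns="common"),
    df2.drop(columns="alt"),
    on=keys,
    how="outer",
    indicator=True,
)

# Derive the source flags from the indicator, then drop it
merged["common"] = merged["_merge"].isin(["left_only", "both"]).astype(int)
merged["alt"] = merged["_merge"].isin(["right_only", "both"]).astype(int)
merged = merged.drop(columns="_merge")

print(merged)
```

Row order after an outer merge can differ between pandas versions, so sort explicitly if the order matters.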

