2023-05-17 06:44:14

Loop for merging dictionaries with the same key

Question

# Merge the sheets that share the same name
merged_sheets = {sheet_name: pd.merge(dfs_first_file[sheet_name], dfs_second_file[sheet_name], how='left', on='Concatenation') for sheet_name in dfs_first_file}
# Filter out the rows that contain NaN values
filtered_sheets = {sheet_name: merged_sheet.dropna() for sheet_name, merged_sheet in merged_sheets.items()}

I have two excel files with two sheets each, which I have stored as dictionaries:

my_first_file = pd.read_excel(my_path, sheet_name=None, skiprows=2)
my_second_file = pd.read_excel(my_path, sheet_name=None, skiprows=2)

Ideally, I would like to write a loop that applies a left merge to the sheets with the same name,
so that I can then filter out the NaN values (just like a VLOOKUP would do in Excel).

my_first_file:

{'Sheet_1':     ID     Name  Surname  Grade
 0  104  Eleanor    Rigby      6
 1  168  Barbara      Ann      8
 2  450    Polly  Cracker      7
 3   90  Little       Joe     10,
 'Sheet_2':     ID       Name   Surname  Grade
 0  106       Lucy       Sky      8
 1  128    Delilah  Gonzalez      5
 2  100  Christina   Rodwell      3
 3   40      Ziggy  Stardust      7,
 'Sheet_3':     ID   Name   Surname  Grade
 0   22   Lucy  Diamonds      9
 1   50  Grace     Kelly      7
 2  105    Uma   Thurman      7
 3   29   Lola      King      3}

my_second_file:

{'Sheet_1':     ID     Name  Surname  Grade favourite color    favourite sport
 0  104  Eleanor    Rigby      6            blue  American football
 1  168  Barbara      Ann      8            pink             Hockey
 2  450    Polly  Cracker      7           black      Skateboarding
 3   90  Little      Josy     10          orange            Cycling,
 'Sheet_2':     ID       Name   Surname  Grade favourite color favourite sport
 0  106       Lucy       Sky      8          yellow          Tennis
 1  128    Delilah     Perez      5     light green      Basketball
 2  100  Christina   Rodwell      3           black       Badminton
 3   40      Ziggy  Stardust      7             red          Squash,
 'Sheet_3':     ID   Name   Surname  Grade favourite color favourite sport
 0   22   Lucy  Diamonds      9           brown            Judo
 1   50  Grace     Kelly      7           white       Taekwondo
 2  105    Uma   Thurman      7          purple      videogames
 3   29   Lola   McQueen      3             red            Surf}

I am aware that pd.DataFrame.merge(right, how='left', on='Concatenation') applies only to DataFrames and not to dictionaries like the ones in this scenario, but I have no idea how to do it here.
My expected output after merging the two dict values for Sheet_1 would be:

{'Sheet_1':     ID      Name  Surname  Concatenation  Grade favourite color  \
 0  104  Eleanor     Rigby  Eleanor Rigby      6            blue   
 1  168  Barbara       Ann    Barbara Ann      8            pink   
 2  450    Polly   Cracker  Polly Cracker      7           black   
 3   90   Little       Joe     Little Joe     10             NaN   
 
      favourite sport  
 0  American football  
 1             Hockey  
 2      Skateboarding  
 3                NaN  ,

This is the code I have so far:

# Importing modules
import openpyxl as op
import pandas as pd
import numpy as np
import xlsxwriter
from openpyxl import Workbook, load_workbook
# Defining the two file paths
path_first_file = r'C:\Users\machukovich\Desktop\stack.xlsx'
path_second_file = r'C:\Users\machukovich\Desktop\stack_2.xlsx'
# Loading the files into a dictionary of DataFrames
dfs_first_file = pd.read_excel(path_first_file, sheet_name=None, skiprows=2)
dfs_second_file = pd.read_excel(path_second_file, sheet_name=None, skiprows=2)
# Creating a new column in each sheet to merge on later
for sheet_name, df in dfs_first_file.items():
    df.insert(3, 'Concatenation', df['Name'].map(str) + ' ' + df['Surname'].map(str))
for sheet_name, df in dfs_second_file.items():
    df.insert(3, 'Concatenation', df['Name'].map(str) + ' ' + df['Surname'].map(str))

Thanks in advance for any tips or help.

Answer 1

Score: 1

IIUC, you can use:

sheets = dfs_first_file.keys() & dfs_second_file.keys()  # common keys/sheets
dfs_output_file = {
    sh: pd.merge(dfs_first_file[sh],
                 dfs_second_file[sh],
                 on=["Name", "Surname"], suffixes=("", "_"), how="left")
                .drop(columns=["ID_", "Grade_"]) for sh in sheets
}

Explanation:

Here we use merge inside a dict comprehension to build a new dictionary, dfs_output_file, and leave the two input dictionaries (dfs_first_file and dfs_second_file) untouched. Each value is the result of a left merge between the DataFrames stored under the same sheet name. For example, when sh equals "Sheet_1", we merge dfs_first_file["Sheet_1"] with dfs_second_file["Sheet_1"]. The suffixes=("", "_") argument marks the overlapping columns that come from the second file (ID_ and Grade_) so that drop can remove them.

Output:

print(dfs_output_file["Sheet_1"])
    ID     Name  Surname  Grade favourite color    favourite sport
0  104  Eleanor    Rigby      6            blue  American football
1  168  Barbara      Ann      8            pink             Hockey
2  450    Polly  Cracker      7           black      Skateboarding
3   90   Little      Joe     10             NaN                NaN
print(dfs_output_file["Sheet_2"])
    ID       Name   Surname  Grade favourite color favourite sport
0  106       Lucy       Sky      8          yellow          Tennis
1  128    Delilah  Gonzalez      5             NaN             NaN
2  100  Christina   Rodwell      3           black       Badminton
3   40      Ziggy  Stardust      7             red          Squash
print(dfs_output_file["Sheet_3"])
    ID   Name   Surname  Grade favourite color favourite sport
0   22   Lucy  Diamonds      9           brown            Judo
1   50  Grace     Kelly      7           white       Taekwondo
2  105    Uma   Thurman      7          purple      videogames
3   29   Lola      King      3             NaN             NaN
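
As a self-contained sketch of the approach above, the snippet below rebuilds Sheet_1 from the question as small in-memory DataFrames (an assumption that stands in for the two Excel files), applies the same left merge per sheet, and then drops the rows that found no match, which is the NaN filtering the question asks about. The helper name merge_common_sheets is hypothetical, and dropping every column that ends in "_" and duplicates a left-hand column is only one possible way to generalise the hard-coded ["ID_", "Grade_"] list.

```python
import pandas as pd

# Stand-ins for pd.read_excel(..., sheet_name=None); data copied from the question.
dfs_first_file = {
    "Sheet_1": pd.DataFrame({
        "ID": [104, 168, 450, 90],
        "Name": ["Eleanor", "Barbara", "Polly", "Little"],
        "Surname": ["Rigby", "Ann", "Cracker", "Joe"],
        "Grade": [6, 8, 7, 10],
    }),
}
dfs_second_file = {
    "Sheet_1": pd.DataFrame({
        "ID": [104, 168, 450, 90],
        "Name": ["Eleanor", "Barbara", "Polly", "Little"],
        "Surname": ["Rigby", "Ann", "Cracker", "Josy"],
        "Grade": [6, 8, 7, 10],
        "favourite color": ["blue", "pink", "black", "orange"],
        "favourite sport": ["American football", "Hockey", "Skateboarding", "Cycling"],
    }),
}


def merge_common_sheets(first, second, keys=("Name", "Surname")):
    """Left-merge the DataFrames stored under the sheet names both dicts share."""
    merged = {}
    # sorted() only makes the iteration order deterministic
    for sheet in sorted(first.keys() & second.keys()):
        df = pd.merge(first[sheet], second[sheet],
                      on=list(keys), how="left", suffixes=("", "_"))
        # Drop the right-hand copies of overlapping columns (here "ID_" and "Grade_")
        duplicated = [c for c in df.columns
                      if c.endswith("_") and c[:-1] in first[sheet].columns]
        merged[sheet] = df.drop(columns=duplicated)
    return merged


merged_sheets = merge_common_sheets(dfs_first_file, dfs_second_file)

# Keep only the rows that found a match in the second file
filtered_sheets = {
    sheet: df.dropna(subset=["favourite color", "favourite sport"])
    for sheet, df in merged_sheets.items()
}
print(filtered_sheets["Sheet_1"])
```

With this data, "Little Joe" has no counterpart in the second file ("Little Josy"), so his row carries NaN after the merge and disappears after dropna.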

Answer 2

Score: 0

You can try:

out = {}
for k in dfs_first_file.keys() & dfs_second_file.keys():
    # Inner merge (the default): rows without an exact match are dropped
    out[k] = pd.merge(dfs_first_file[k], dfs_second_file[k], on=['ID', 'Name', 'Surname', 'Grade'])
    out[k]['Concatenation'] = out[k]['Name'] + ' ' + out[k]['Surname']
print(out)

Prints:

{'Sheet_3':     ID   Name   Surname  Grade favourite color favourite sport  Concatenation
0   22   Lucy  Diamonds      9           brown            Judo  Lucy Diamonds
1   50  Grace     Kelly      7           white       Taekwondo    Grace Kelly
2  105    Uma   Thurman      7          purple      videogames    Uma Thurman, 'Sheet_1':     ID     Name  Surname  Grade favourite color    favourite sport  Concatenation
0  104  Eleanor    Rigby      6            blue  American football  Eleanor Rigby
1  168  Barbara      Ann      8            pink             Hockey    Barbara Ann
2  450    Polly  Cracker      7           black      Skateboarding  Polly Cracker, 'Sheet_2':     ID       Name   Surname  Grade favourite color favourite sport      Concatenation
0  106       Lucy       Sky      8          yellow          Tennis           Lucy Sky
1  100  Christina   Rodwell      3           black       Badminton  Christina Rodwell
2   40      Ziggy  Stardust      7             red          Squash     Ziggy Stardust}
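
Note that this answer relies on the default inner merge, so unmatched rows vanish instead of showing NaN. If it helps to see which rows failed to match before filtering them, a minimal sketch using the indicator parameter of pd.merge could look like the following. The two frames are Sheet_2 from the question rebuilt in memory (an assumption standing in for the Excel files), and the variable names first, second, inner, left and unmatched are illustrative only.

```python
import pandas as pd

# Sheet_2 from both files, rebuilt in memory from the data in the question
first = pd.DataFrame({
    "ID": [106, 128, 100, 40],
    "Name": ["Lucy", "Delilah", "Christina", "Ziggy"],
    "Surname": ["Sky", "Gonzalez", "Rodwell", "Stardust"],
    "Grade": [8, 5, 3, 7],
})
second = pd.DataFrame({
    "ID": [106, 128, 100, 40],
    "Name": ["Lucy", "Delilah", "Christina", "Ziggy"],
    "Surname": ["Sky", "Perez", "Rodwell", "Stardust"],
    "Grade": [8, 5, 3, 7],
    "favourite color": ["yellow", "light green", "black", "red"],
    "favourite sport": ["Tennis", "Basketball", "Badminton", "Squash"],
})
keys = ["ID", "Name", "Surname", "Grade"]

# Default how="inner": only rows present in both frames survive
inner = pd.merge(first, second, on=keys)

# how="left" keeps every row of `first`; indicator=True adds a "_merge"
# column whose value is "both" or "left_only" for each row
left = pd.merge(first, second, on=keys, how="left", indicator=True)

unmatched = left.loc[left["_merge"] == "left_only", "Name"].tolist()
print(len(inner), len(left), unmatched)
```

Here the inner merge returns three rows, while the left merge keeps all four and flags "Delilah" (Gonzalez in the first file, Perez in the second) as left_only.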
