英文:
Loop for merging dictionaries with the same key
问题
# 合并相同名称的表格
merged_sheets = {sheet_name: pd.merge(dfs_first_file[sheet_name], dfs_second_file[sheet_name], how='left', on='Concatenation') for sheet_name in dfs_first_file}
# 过滤NaN值
filtered_sheets = {sheet_name: merged_sheet.dropna() for sheet_name, merged_sheet in merged_sheets.items()}
英文:
I have two excel files with two sheets each, which I have stored as dictionaries:
my_first_file = pd.read_excel(my_path, sheet_name=None, skiprows=2)
my_second_file = pd.read_excel(my_path, sheet_name=None, skiprows=2)
Ideally, I would like to write a loop that allows me to apply a left merge to the sheets with the same name.
So then I could filter the NaN
values (just like a vlookup
would do in Excel).
- my_first_file:
{'Sheet_1': ID Name Surname Grade
0 104 Eleanor Rigby 6
1 168 Barbara Ann 8
2 450 Polly Cracker 7
3 90 Little Joe 10,
'Sheet_2': ID Name Surname Grade
0 106 Lucy Sky 8
1 128 Delilah Gonzalez 5
2 100 Christina Rodwell 3
3 40 Ziggy Stardust 7,
'Sheet_3': ID Name Surname Grade
0 22 Lucy Diamonds 9
1 50 Grace Kelly 7
2 105 Uma Thurman 7
3 29 Lola King 3}
- my_second_file:
{'Sheet_1': ID Name Surname Grade favourite color favourite sport
0 104 Eleanor Rigby 6 blue American football
1 168 Barbara Ann 8 pink Hockey
2 450 Polly Cracker 7 black Skateboarding
3 90 Little Josy 10 orange Cycling,
'Sheet_2': ID Name Surname Grade favourite color favourite sport
0 106 Lucy Sky 8 yellow Tennis
1 128 Delilah Perez 5 light green Basketball
2 100 Christina Rodwell 3 black Badminton
3 40 Ziggy Stardust 7 red Squash,
'Sheet_3': ID Name Surname Grade favourite color favourite sport
0 22 Lucy Diamonds 9 brown Judo
1 50 Grace Kelly 7 white Taekwondo
2 105 Uma Thurman 7 purple videogames
3 29 Lola McQueen 3 red Surf}
I am aware that pd.df.merge(right, how='left', on='Concatenation')
is only appliable to DataFrames and not dictionaries like in this scenario but I have no clue on how to make it.
My expected output after merging the two dict keys for Sheet_1
would be:
{'Sheet_1': ID Name Surname Contatenation Grade favourite color \
0 104 Eleanor Rigby Eleanor Rigby 6 blue
1 168 Barbara Ann Barbara Ann 8 pink
2 450 Polly Cracker Polly Cracker 7 black
3 90 Little Joe Little Joe 10 NaN
favourite sport
0 American football
1 Hockey
2 Skateboarding
3 NaN ,
I have proceded with this code so far:
# Importing modules
import openpyxl as op
import pandas as pd
import numpy as np
import xlsxwriter
from openpyxl import Workbook, load_workbook
# Defining the two file paths
path_first_file = r'C:\Users\machukovich\Desktop\stack.xlsx'
path_second_file = r'C:\Users\machukovich\Desktop\stack_2.xlsx'
# Loading the files into a dictionary of Dataframes
dfs_first_file = pd.read_excel(path_first_file, sheet_name=None, skiprows=2)
dfs_second_file = pd.read_excel(path_second_file, sheet_name=None, skiprows=2)
# Creating a new column in each sheet to merge later respectively
for sheet_name, df in dfs_first_file.items():
df.insert(3, 'Concatenation', df['Name'].map(str) + ' ' + df['Surname'].map(str))
for sheet_name, df in dfs_second_file.items():
df.insert(3, 'Concatenation', df['Name'].map(str) + ' ' + df['Surname'].map(str))
Thanks in advance for any tip and or help.
答案1
得分: 1
以下是您要翻译的内容:
IIUC,您可以使用:
sheets = dfs_first_file.keys() & dfs_second_file.keys() # 共同的键/表格
dfs_output_file = {
sh: pd.merge(dfs_first_file[sh],
dfs_second_file[sh],
on=["Name", "Surname"], suffixes=("", "_"), how="left")
.drop(columns=["ID_", "Grade_"]) for sh in sheets
}
解释:
在这里,我们在dictcomp内部使用merge
来覆盖两个字典(dfs_first_file
和dfs_second_file
)的值(这些值是数据帧)。我们用同一表格的相应数据帧之间的左合并结果来覆盖它们。例如,在第一次迭代中,sh
等于"Sheet1"
,所以在这种情况下,我们将dfs_first_file[sh]
与dfs_second_file[sh]
合并(同时 sh==Sheet1
)。
输出:
print(dfs_output_file["Sheet_1"])
ID Name Surname Grade favourite color favourite sport
0 104 Eleanor Rigby 6 blue American football
1 168 Barbara Ann 8 pink Hockey
2 450 Polly Cracker 7 black Skateboarding
3 90 Little Joe 10 NaN NaN
print(dfs_output_file["Sheet_2"])
ID Name Surname Grade favourite color favourite sport
0 106 Lucy Sky 8 yellow Tennis
1 128 Delilah Gonzalez 5 NaN NaN
2 100 Christina Rodwell 3 black Badminton
3 40 Ziggy Stardust 7 red Squash
print(dfs_output_file["Sheet_3"])
ID Name Surname Grade favourite color favourite sport
0 22 Lucy Diamonds 9 brown Judo
1 50 Grace Kelly 7 white Taekwondo
2 105 Uma Thurman 7 purple videogames
3 29 Lola King 3 NaN NaN
英文:
IIUC, you can use :
sheets = dfs_first_file.keys() & dfs_second_file.keys() #common keys/sheets
dfs_output_file = {
sh: pd.merge(dfs_first_file[sh],
dfs_second_file[sh],
on=["Name", "Surname"], suffixes=("", "_"), how="left")
.drop(columns=["ID_", "Grade_"]) for sh in sheets
}
Explanation :
Here we use merge
inside a dictcomp to overwrite the values (which are DataFrames) of the two dictionnaries (dfs_first_file
and dfs_second_file
). We overwrite them with the result of the left merge between the corresponding dataframes of the same sheet. For example, in the first iteration, sh
equals "Sheet1"
, so in this case we merge dfs_first_file[sh]
with dfs_second_file[sh]
(while sh==Sheet1
).
Output :
print(dfs_output_file["Sheet_1"])
ID Name Surname Grade favourite color favourite sport
0 104 Eleanor Rigby 6 blue American football
1 168 Barbara Ann 8 pink Hockey
2 450 Polly Cracker 7 black Skateboarding
3 90 Little Joe 10 NaN NaN
print(dfs_output_file["Sheet_2"])
ID Name Surname Grade favourite color favourite sport
0 106 Lucy Sky 8 yellow Tennis
1 128 Delilah Gonzalez 5 NaN NaN
2 100 Christina Rodwell 3 black Badminton
3 40 Ziggy Stardust 7 red Squash
print(dfs_output_file["Sheet_3"])
ID Name Surname Grade favourite color favourite sport
0 22 Lucy Diamonds 9 brown Judo
1 50 Grace Kelly 7 white Taekwondo
2 105 Uma Thurman 7 purple videogames
3 29 Lola King 3 NaN NaN
答案2
得分: 0
out = {}
for k in dfs_first_file.keys() & dfs_second_file.keys():
out[k] = pd.merge(dct1[k], dct2[k], on=['ID', 'Name', 'Surname', 'Grade'])
out[k]['Concatenation'] = out[k]['Name'] + ' ' + out[k]['Surname']
print(out)
Prints:
{'Sheet_3': ID Name Surname Grade favourite color favourite sport Concatenation
0 22 Lucy Diamonds 9 brown Judo Lucy Diamonds
1 50 Grace Kelly 7 white Taekwondo Grace Kelly
2 105 Uma Thurman 7 purple videogames Uma Thurman, 'Sheet_1': ID Name Surname Grade favourite color favourite sport Concatenation
0 104 Eleanor Rigby 6 blue American football Eleanor Rigby
1 168 Barbara Ann 8 pink Hockey Barbara Ann
2 450 Polly Cracker 7 black Skateboarding Polly Cracker, 'Sheet_2': ID Name Surname Grade favourite color favourite sport Concatenation
0 106 Lucy Sky 8 yellow Tennis Lucy Sky
1 100 Christina Rodwell 3 black Badminton Christina Rodwell
2 40 Ziggy Stardust 7 red Squash Ziggy Stardust}
英文:
You can try
out = {}
for k in dfs_first_file.keys() & dfs_second_file.keys():
out[k] = pd.merge(dct1[k], dct2[k], on=['ID', 'Name', 'Surname', 'Grade'])
out[k]['Concatenation'] = out[k]['Name'] + ' ' + out[k]['Surname']
print(out)
Prints:
{'Sheet_3': ID Name Surname Grade favourite color favourite sport Concatenation
0 22 Lucy Diamonds 9 brown Judo Lucy Diamonds
1 50 Grace Kelly 7 white Taekwondo Grace Kelly
2 105 Uma Thurman 7 purple videogames Uma Thurman, 'Sheet_1': ID Name Surname Grade favourite color favourite sport Concatenation
0 104 Eleanor Rigby 6 blue American football Eleanor Rigby
1 168 Barbara Ann 8 pink Hockey Barbara Ann
2 450 Polly Cracker 7 black Skateboarding Polly Cracker, 'Sheet_2': ID Name Surname Grade favourite color favourite sport Concatenation
0 106 Lucy Sky 8 yellow Tennis Lucy Sky
1 100 Christina Rodwell 3 black Badminton Christina Rodwell
2 40 Ziggy Stardust 7 red Squash Ziggy Stardust}
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论