循环以合并具有相同键的字典。

huangapple go评论99阅读模式
英文:

Loop for merging dictionaries with the same key

问题

  1. # 合并相同名称的表格
  2. merged_sheets = {sheet_name: pd.merge(dfs_first_file[sheet_name], dfs_second_file[sheet_name], how='left', on='Concatenation') for sheet_name in dfs_first_file}
  3. # 过滤NaN值
  4. filtered_sheets = {sheet_name: merged_sheet.dropna() for sheet_name, merged_sheet in merged_sheets.items()}
英文:

I have two excel files with two sheets each, which I have stored as dictionaries:

  1. my_first_file = pd.read_excel(my_path, sheet_name=None, skiprows=2)
  2. my_second_file = pd.read_excel(my_path, sheet_name=None, skiprows=2)

Ideally, I would like to write a loop that allows me to apply a left merge to the sheets with the same name.
So then I could filter the NaN values (just like a vlookup would do in Excel).

  • my_first_file:
  1. {'Sheet_1': ID Name Surname Grade
  2. 0 104 Eleanor Rigby 6
  3. 1 168 Barbara Ann 8
  4. 2 450 Polly Cracker 7
  5. 3 90 Little Joe 10,
  6. 'Sheet_2': ID Name Surname Grade
  7. 0 106 Lucy Sky 8
  8. 1 128 Delilah Gonzalez 5
  9. 2 100 Christina Rodwell 3
  10. 3 40 Ziggy Stardust 7,
  11. 'Sheet_3': ID Name Surname Grade
  12. 0 22 Lucy Diamonds 9
  13. 1 50 Grace Kelly 7
  14. 2 105 Uma Thurman 7
  15. 3 29 Lola King 3}
  • my_second_file:
  1. {'Sheet_1': ID Name Surname Grade favourite color favourite sport
  2. 0 104 Eleanor Rigby 6 blue American football
  3. 1 168 Barbara Ann 8 pink Hockey
  4. 2 450 Polly Cracker 7 black Skateboarding
  5. 3 90 Little Josy 10 orange Cycling,
  6. 'Sheet_2': ID Name Surname Grade favourite color favourite sport
  7. 0 106 Lucy Sky 8 yellow Tennis
  8. 1 128 Delilah Perez 5 light green Basketball
  9. 2 100 Christina Rodwell 3 black Badminton
  10. 3 40 Ziggy Stardust 7 red Squash,
  11. 'Sheet_3': ID Name Surname Grade favourite color favourite sport
  12. 0 22 Lucy Diamonds 9 brown Judo
  13. 1 50 Grace Kelly 7 white Taekwondo
  14. 2 105 Uma Thurman 7 purple videogames
  15. 3 29 Lola McQueen 3 red Surf}

I am aware that pd.df.merge(right, how='left', on='Concatenation') is only appliable to DataFrames and not dictionaries like in this scenario but I have no clue on how to make it.
My expected output after merging the two dict keys for Sheet_1 would be:

  1. {'Sheet_1': ID Name Surname Contatenation Grade favourite color \
  2. 0 104 Eleanor Rigby Eleanor Rigby 6 blue
  3. 1 168 Barbara Ann Barbara Ann 8 pink
  4. 2 450 Polly Cracker Polly Cracker 7 black
  5. 3 90 Little Joe Little Joe 10 NaN
  6. favourite sport
  7. 0 American football
  8. 1 Hockey
  9. 2 Skateboarding
  10. 3 NaN ,

I have proceded with this code so far:

  1. # Importing modules
  2. import openpyxl as op
  3. import pandas as pd
  4. import numpy as np
  5. import xlsxwriter
  6. from openpyxl import Workbook, load_workbook
  7. # Defining the two file paths
  8. path_first_file = r'C:\Users\machukovich\Desktop\stack.xlsx'
  9. path_second_file = r'C:\Users\machukovich\Desktop\stack_2.xlsx'
  10. # Loading the files into a dictionary of Dataframes
  11. dfs_first_file = pd.read_excel(path_first_file, sheet_name=None, skiprows=2)
  12. dfs_second_file = pd.read_excel(path_second_file, sheet_name=None, skiprows=2)
  13. # Creating a new column in each sheet to merge later respectively
  14. for sheet_name, df in dfs_first_file.items():
  15. df.insert(3, 'Concatenation', df['Name'].map(str) + ' ' + df['Surname'].map(str))
  16. for sheet_name, df in dfs_second_file.items():
  17. df.insert(3, 'Concatenation', df['Name'].map(str) + ' ' + df['Surname'].map(str))

Thanks in advance for any tip and or help.

答案1

得分: 1

以下是您要翻译的内容:

IIUC,您可以使用:

  1. sheets = dfs_first_file.keys() & dfs_second_file.keys() # 共同的键/表格
  2. dfs_output_file = {
  3. sh: pd.merge(dfs_first_file[sh],
  4. dfs_second_file[sh],
  5. on=["Name", "Surname"], suffixes=("", "_"), how="left")
  6. .drop(columns=["ID_", "Grade_"]) for sh in sheets
  7. }

解释:

在这里,我们在dictcomp内部使用merge来覆盖两个字典(dfs_first_filedfs_second_file)的值(这些值是数据帧)。我们用同一表格的相应数据帧之间的左合并结果来覆盖它们。例如,在第一次迭代中,sh等于"Sheet1",所以在这种情况下,我们将dfs_first_file[sh]dfs_second_file[sh]合并(同时 sh==Sheet1)。

输出:

  1. print(dfs_output_file["Sheet_1"])
  2. ID Name Surname Grade favourite color favourite sport
  3. 0 104 Eleanor Rigby 6 blue American football
  4. 1 168 Barbara Ann 8 pink Hockey
  5. 2 450 Polly Cracker 7 black Skateboarding
  6. 3 90 Little Joe 10 NaN NaN
  7. print(dfs_output_file["Sheet_2"])
  8. ID Name Surname Grade favourite color favourite sport
  9. 0 106 Lucy Sky 8 yellow Tennis
  10. 1 128 Delilah Gonzalez 5 NaN NaN
  11. 2 100 Christina Rodwell 3 black Badminton
  12. 3 40 Ziggy Stardust 7 red Squash
  13. print(dfs_output_file["Sheet_3"])
  14. ID Name Surname Grade favourite color favourite sport
  15. 0 22 Lucy Diamonds 9 brown Judo
  16. 1 50 Grace Kelly 7 white Taekwondo
  17. 2 105 Uma Thurman 7 purple videogames
  18. 3 29 Lola King 3 NaN NaN
英文:

IIUC, you can use :

  1. sheets = dfs_first_file.keys() & dfs_second_file.keys() #common keys/sheets
  2. dfs_output_file = {
  3. sh: pd.merge(dfs_first_file[sh],
  4. dfs_second_file[sh],
  5. on=["Name", "Surname"], suffixes=("", "_"), how="left")
  6. .drop(columns=["ID_", "Grade_"]) for sh in sheets
  7. }

Explanation :

Here we use merge inside a dictcomp to overwrite the values (which are DataFrames) of the two dictionnaries (dfs_first_file and dfs_second_file). We overwrite them with the result of the left merge between the corresponding dataframes of the same sheet. For example, in the first iteration, sh equals "Sheet1", so in this case we merge dfs_first_file[sh] with dfs_second_file[sh] (while sh==Sheet1).

Output :

  1. print(dfs_output_file["Sheet_1"])
  2. ID Name Surname Grade favourite color favourite sport
  3. 0 104 Eleanor Rigby 6 blue American football
  4. 1 168 Barbara Ann 8 pink Hockey
  5. 2 450 Polly Cracker 7 black Skateboarding
  6. 3 90 Little Joe 10 NaN NaN
  7. print(dfs_output_file["Sheet_2"])
  8. ID Name Surname Grade favourite color favourite sport
  9. 0 106 Lucy Sky 8 yellow Tennis
  10. 1 128 Delilah Gonzalez 5 NaN NaN
  11. 2 100 Christina Rodwell 3 black Badminton
  12. 3 40 Ziggy Stardust 7 red Squash
  13. print(dfs_output_file["Sheet_3"])
  14. ID Name Surname Grade favourite color favourite sport
  15. 0 22 Lucy Diamonds 9 brown Judo
  16. 1 50 Grace Kelly 7 white Taekwondo
  17. 2 105 Uma Thurman 7 purple videogames
  18. 3 29 Lola King 3 NaN NaN

答案2

得分: 0

  1. out = {}
  2. for k in dfs_first_file.keys() & dfs_second_file.keys():
  3. out[k] = pd.merge(dct1[k], dct2[k], on=['ID', 'Name', 'Surname', 'Grade'])
  4. out[k]['Concatenation'] = out[k]['Name'] + ' ' + out[k]['Surname']
  5. print(out)

Prints:

  1. {'Sheet_3': ID Name Surname Grade favourite color favourite sport Concatenation
  2. 0 22 Lucy Diamonds 9 brown Judo Lucy Diamonds
  3. 1 50 Grace Kelly 7 white Taekwondo Grace Kelly
  4. 2 105 Uma Thurman 7 purple videogames Uma Thurman, 'Sheet_1': ID Name Surname Grade favourite color favourite sport Concatenation
  5. 0 104 Eleanor Rigby 6 blue American football Eleanor Rigby
  6. 1 168 Barbara Ann 8 pink Hockey Barbara Ann
  7. 2 450 Polly Cracker 7 black Skateboarding Polly Cracker, 'Sheet_2': ID Name Surname Grade favourite color favourite sport Concatenation
  8. 0 106 Lucy Sky 8 yellow Tennis Lucy Sky
  9. 1 100 Christina Rodwell 3 black Badminton Christina Rodwell
  10. 2 40 Ziggy Stardust 7 red Squash Ziggy Stardust}
英文:

You can try

  1. out = {}
  2. for k in dfs_first_file.keys() & dfs_second_file.keys():
  3. out[k] = pd.merge(dct1[k], dct2[k], on=['ID', 'Name', 'Surname', 'Grade'])
  4. out[k]['Concatenation'] = out[k]['Name'] + ' ' + out[k]['Surname']
  5. print(out)

Prints:

  1. {'Sheet_3': ID Name Surname Grade favourite color favourite sport Concatenation
  2. 0 22 Lucy Diamonds 9 brown Judo Lucy Diamonds
  3. 1 50 Grace Kelly 7 white Taekwondo Grace Kelly
  4. 2 105 Uma Thurman 7 purple videogames Uma Thurman, 'Sheet_1': ID Name Surname Grade favourite color favourite sport Concatenation
  5. 0 104 Eleanor Rigby 6 blue American football Eleanor Rigby
  6. 1 168 Barbara Ann 8 pink Hockey Barbara Ann
  7. 2 450 Polly Cracker 7 black Skateboarding Polly Cracker, 'Sheet_2': ID Name Surname Grade favourite color favourite sport Concatenation
  8. 0 106 Lucy Sky 8 yellow Tennis Lucy Sky
  9. 1 100 Christina Rodwell 3 black Badminton Christina Rodwell
  10. 2 40 Ziggy Stardust 7 red Squash Ziggy Stardust}

huangapple
  • 本文由 发表于 2023年5月17日 06:44:14
  • 转载请务必保留本文链接:https://go.coder-hub.com/76267526.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定