Merging multiple dataframes in loop based on same suffix in variable names
Question
I want to merge the dataframes in demand_dataframe_list with those in supply_dataframe_list when their variable-name suffixes are identical.
demand_dataframe_list = [data_Market1, data_Market2]
supply_dataframe_list = [df_supply2_Market1, df_supply2_Market2]
For example, data_Market1 should be merged with df_supply2_Market1, and data_Market2 should be merged with df_supply2_Market2.
The Market1 and Market2 suffixes should determine which dataframes are paired. Each pair is then merged on the columns the dataframes have in common, 'Col1' and 'Col2'.
Below is my attempt, but it gives me an empty dataframe. I appreciate your help!
merged_dataframes = []
for demand_df, supply_df in zip(demand_dataframe_list, supply_dataframe_list):
    print(demand_df)
    demand_suffix = demand_df.name.split('_')[-1]  # Extract the suffix from the demand dataframe name
    supply_suffix = supply_df.name.split('_')[-1]  # Extract the suffix from the supply dataframe name
    merged_df = pd.merge(demand_df, supply_df, how="inner", on=['Col1', 'Col2'])
    merged_dataframes.append(merged_df)
Answer 1
Score: 1
Unless a name attribute has previously been set on each dataframe, accessing it raises an AttributeError.
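A minimal check of that behaviour, assuming a small made-up dataframe (the sample values are hypothetical; only the column names Col1 and Col2 come from the question):

```python
import pandas as pd

# Hypothetical sample frame; the column names follow the question.
data_Market1 = pd.DataFrame({"Col1": [1, 2], "Col2": ["a", "b"]})

try:
    data_Market1.name
    has_name = True
except AttributeError:
    # A DataFrame has no built-in name attribute, and this frame
    # has no column called "name" that attribute access could return.
    has_name = False

print(has_name)
```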
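The following helper function gets the suffix from the variable name itself. It works only when each dataframe is bound to a global variable in the same module, and it uses the first matching name it finds: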
def get_suffix(df):
    # Find the first global name bound to df and return the part after its last underscore
    return [x for x in globals() if globals()[x] is df][0].split("_")[-1]
Then you can compare every dataframe in one list with every dataframe in the other by replacing zip with product from the itertools module of the Python standard library, and make the code more readable with a list comprehension:
from itertools import product

merged_dataframes = [
    pd.merge(demand_df, supply_df, how="inner", on=["Col1", "Col2"])
    for demand_df, supply_df in product(demand_dataframe_list, supply_dataframe_list)
    if get_suffix(demand_df) == get_suffix(supply_df)
]
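As an alternative sketch that avoids looking variable names up in globals(), the dataframes could be stored in dictionaries keyed by market suffix, which makes the pairing explicit. The sample data and the names demand_by_market, supply_by_market, and merged_by_market are assumptions for illustration and do not appear in the original question.

```python
import pandas as pd

# Hypothetical sample data; only Col1 and Col2 come from the question.
data_Market1 = pd.DataFrame({"Col1": [1, 2], "Col2": ["a", "b"], "demand": [10, 20]})
data_Market2 = pd.DataFrame({"Col1": [3], "Col2": ["c"], "demand": [30]})
df_supply2_Market1 = pd.DataFrame({"Col1": [1, 2], "Col2": ["a", "b"], "supply": [5, 6]})
df_supply2_Market2 = pd.DataFrame({"Col1": [3], "Col2": ["c"], "supply": [7]})

# Key each dataframe by its market suffix instead of relying on variable names.
demand_by_market = {"Market1": data_Market1, "Market2": data_Market2}
supply_by_market = {"Market1": df_supply2_Market1, "Market2": df_supply2_Market2}

# Merge only the markets that appear in both dictionaries.
merged_by_market = {
    market: pd.merge(
        demand_by_market[market],
        supply_by_market[market],
        how="inner",
        on=["Col1", "Col2"],
    )
    for market in demand_by_market.keys() & supply_by_market.keys()
}

for market in sorted(merged_by_market):
    print(market, merged_by_market[market].shape)
```

If an inner merge still comes back empty with correctly paired dataframes, one possible cause is that no row has the same Col1 and Col2 values on both sides, for example because of stray whitespace or different capitalization in the key columns.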