根据变量名称中相同的后缀,在循环中合并多个数据框。

huangapple go评论51阅读模式
英文:

Merging multiple dataframes in loop based on same suffix in variable names

问题

我想要将demand_dataframe_list中的DataFrame与supply_dataframe_list中的DataFrame合并,当后缀相同时。
例如,data_Market1应该与df_supply2_Market1合并,data_Market2应该与df_supply2_Market2合并。

在这里,应该使用Market1和Market2后缀来基于每个DataFrame中共有的列('Col1'和'Col2')来获取合并后的数据。

以下是我的尝试,但我得到了空的DataFrame。感谢您的帮助!

merged_dataframes = []

for demand_df, supply_df in zip(demand_dataframe_list, supply_dataframe_list):
    print(demand_df)
    demand_suffix = demand_df.name.split('_')[-1]  # 从demand DataFrame名称中提取后缀
    supply_suffix = supply_df.name.split('_')[-1]  # 从supply DataFrame名称中提取后缀
    merged_df = pd.merge(demand_df, supply_df, how="inner", on=['Col1', 'Col2'])
    merged_dataframes.append(merged_df)
英文:

I want to merge dataframes from demand_dataframe_list with supply_dataframe_list when the suffix is identical.

demand_dataframe_list = [data_Market1, data_Market2] 
supply_dataframe_list = [df_supply2_Market1, df_supply2_Market2]

For example, data_Market1 should be merged with df_supply2_Market1 and data_Market2 should be merged with df_supply2_Market2.

Here Market1 and Market2 suffix should be used to get the merged data based on common columns present in each dataframes which is 'Col1' and 'Col2'.

Below is my try
I am getting the empty dataframe using the code help. Appreciate your help !

merged_dataframes = []

for demand_df, supply_df in zip(demand_dataframe_list, supply_dataframe_list):
    print(demand_df)
    demand_suffix = demand_df.name.split('_')[-1]  # Extract the suffix from the demand dataframe name
    supply_suffix = supply_df.name.split('_')[-1]  # Extract the suffix from the supply dataframe name
    merged_df = pd.merge(demand_df, supply_df, how="inner", on=['Col1', 'Col2'])
    merged_dataframes.append(merged_df)

答案1

得分: 1

除非在不同的数据框上先前已经设置了 name 属性,否则获取它将引发异常。

以下辅助函数提供了一种更稳健的方法来获取变量名称的后缀:

def get_suffix(df):
    return [x for x in globals() if globals()[x] is df][0].split("_")[-1]

然后,您可以通过将 zip 替换为 Python 标准库的 itertools 模块中的 product 来对两个列表进行更广泛的比较,并通过 list comprehension 使您的代码更易读:

merged_dataframes = [
    pd.merge(demand_df, supply_df, how="inner", on=["Col1", "Col2"])
    for demand_df, supply_df in product(demand_dataframe_list, supply_dataframe_list)
    if get_suffix(demand_df) == get_suffix(supply_df)
]
英文:

Unless name attribute has previously been set on the different dataframes, getting it will raise an exception.

The following helper function provides a more robust way to get the suffix of the variable names:

def get_suffix(df):
    return [x for x in globals() if globals()[x] is df][0].split("_")[-1]

Then, you can do a more extensive comparison of both lists by replacing zip with product from Python standard library's itertools module and make your code more readable with a list comprehension:

merged_dataframes = [
    pd.merge(demand_df, supply_df, how="inner", on=["Col1", "Col2"])
    for demand_df, supply_df in product(demand_dataframe_list, supply_dataframe_list)
    if get_suffix(demand_df) == get_suffix(supply_df)
]

huangapple
  • 本文由 发表于 2023年6月1日 09:41:33
  • 转载请务必保留本文链接:https://go.coder-hub.com/76378181.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定