You're trying to access a column, but multiple columns have that name
Question
I am trying to join two DataFrames that both have the following named columns. What is the best way to do a LEFT OUTER join?
df = df.join(df_forecast, ["D_ACCOUNTS_ID", "D_APPS_ID", "D_CONTENT_PAGE_ID"], 'left')
Currently, I get this error:
You're trying to access a column, but multiple columns have that name.
What am I missing?
Answer 1
Score: 1
join_keys = ["D_ACCOUNTS_ID", "D_APPS_ID", "D_CONTENT_PAGE_ID"]
df = (
    df
    .join(df_forecast, join_keys, 'left')
    .select(
        # The join keys appear only once in the result when joining on a list of names.
        *join_keys,
        # Columns from the left side of the join that are not join keys, prefixed with 'df_'.
        *[df[element].alias('df_' + element) for element in df.columns if element not in join_keys],
        # Columns from the right side of the join that are not join keys, prefixed with 'df_forecast_'.
        *[df_forecast[element].alias('df_forecast_' + element) for element in df_forecast.columns if element not in join_keys]
    )
)
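The aliasing above only matters for columns that exist on both sides outside the join keys, since those are the names that make a later column reference ambiguous. A minimal sketch in plain Python, working only on lists of column names, shows how one might find them and plan a rename of the right-hand side before joining. The helper names `overlapping_columns` and `rename_plan`, the `forecast_` prefix, and the sample non-key columns are hypothetical and not from the original post.

```python
def overlapping_columns(left_cols, right_cols, join_keys):
    """Return non-key column names that appear on both sides, in left-side order."""
    right = set(right_cols)
    keys = set(join_keys)
    return [c for c in left_cols if c in right and c not in keys]


def rename_plan(left_cols, right_cols, join_keys, prefix="forecast_"):
    """Map each clashing right-side column to a prefixed name (the prefix is an assumption)."""
    clashes = overlapping_columns(left_cols, right_cols, join_keys)
    return {c: prefix + c for c in clashes}


join_keys = ["D_ACCOUNTS_ID", "D_APPS_ID", "D_CONTENT_PAGE_ID"]
# Hypothetical column lists standing in for df.columns and df_forecast.columns.
left_cols = join_keys + ["VIEWS", "DATE"]
right_cols = join_keys + ["VIEWS", "MODEL"]

plan = rename_plan(left_cols, right_cols, join_keys)
print(plan)  # {'VIEWS': 'forecast_VIEWS'}
```

With real DataFrames, the lists would come from `df.columns` and `df_forecast.columns`, and each entry in the plan could be applied with `df_forecast.withColumnRenamed(old, new)` before the join, so that the joined result carries no duplicate names.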