What is going wrong with my DataFrame merge?
Question
I'm trying to merge two dataframes, cleaned and weather, that have no column in common.
When I run this code:
combined = cleaned.merge(weather, how='cross')
combined
I get a merged dataset that contains 90 rows.
I know one dataset has one row fewer than the other. But how can I merge them without the rows being duplicated up to 90? I just want to mash these two datasets together and drop the 9th row.
I also tried variations on how=None.
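For context on where the 90 comes from: a cross join pairs every row of the left frame with every row of the right one, so the result has len(left) * len(right) rows. The sketch below uses made-up stand-ins for cleaned and weather (10 and 9 rows; the column names value and temp and all the data are assumptions, not the asker's real frames) and needs pandas 1.2 or later for how='cross'.

```python
import pandas as pd

# Hypothetical stand-ins for the asker's frames: 10 rows and 9 rows
cleaned = pd.DataFrame({"Date": list(range(2010, 2020)), "value": list(range(10))})
weather = pd.DataFrame({
    "date_time": pd.to_datetime([f"{y}-01-01" for y in range(2010, 2019)]),
    "temp": list(range(9)),
})

# A cross join pairs every left row with every right row
combined = cleaned.merge(weather, how="cross")
print(len(cleaned), len(weather), len(combined))
```

With a 10-row and a 9-row frame the product is 90, which matches the row count reported in the question.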
Answer 1
Score: 2
To merge two dataframes, you have to find a common key. You can extract the year from the date_time column of your second dataframe (weather). That gives you a common key for an inner join:
import pandas as pd
# If needed, convert date_time to datetime64
weather['date_time'] = pd.to_datetime(weather['date_time'])
# If needed, convert Date to int
cleaned['Date'] = cleaned['Date'].astype(int)
# Add a temporary Date column (the year) to weather and join on it
out = cleaned.merge(weather.assign(Date=weather['date_time'].dt.year), on='Date')
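Here is a minimal end-to-end sketch of that approach on invented data, to show that the inner join keeps one row per matching year instead of 90. The sample values and the value and temp columns are assumptions for illustration only; the real frames may look different.

```python
import pandas as pd

# Hypothetical sample data: years stored as strings, timestamps as strings
cleaned = pd.DataFrame({
    "Date": [str(y) for y in range(2010, 2020)],   # 10 rows
    "value": list(range(10)),
})
weather = pd.DataFrame({
    "date_time": [f"{y}-01-01" for y in range(2010, 2019)],  # 9 rows
    "temp": list(range(9)),
})

# Normalise both sides so the key has a comparable type
weather["date_time"] = pd.to_datetime(weather["date_time"])
cleaned["Date"] = cleaned["Date"].astype(int)

# Inner join on the year; the year with no weather row is dropped
out = cleaned.merge(weather.assign(Date=weather["date_time"].dt.year), on="Date")
print(out.shape)
```

Using assign leaves weather itself unchanged, so the extra Date column only exists for the merge.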
Answer 2
Score: 1
Create a new column from 'date_time' that contains only the year, then merge on that column (named 'Date' here so it matches the key in cleaned). Afterwards, drop the date_time column if you don't want to keep it in your dataframe.
import pandas as pd
weather['date_time'] = pd.to_datetime(weather['date_time'])
weather['Date'] = weather['date_time'].dt.year
combined = cleaned.merge(weather, on='Date')
# After the merge, drop the 'date_time' column
combined.drop(['date_time'], axis=1, inplace=True)
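Since one frame has a row the other lacks, it can help to see which key goes unmatched before relying on the inner join. One way, sketched here on invented data (the value and temp columns and all values are assumptions), is an outer merge with indicator=True, which adds a _merge column marking each row as 'left_only', 'right_only', or 'both'.

```python
import pandas as pd

# Hypothetical sample data: cleaned has one more year than weather
cleaned = pd.DataFrame({"Date": list(range(2010, 2020)), "value": list(range(10))})
weather = pd.DataFrame({
    "date_time": pd.to_datetime([f"{y}-01-01" for y in range(2010, 2019)]),
    "temp": list(range(9)),
})
weather["Date"] = weather["date_time"].dt.year

# Outer join keeps unmatched rows; _merge records where each row came from
check = cleaned.merge(weather, on="Date", how="outer", indicator=True)
unmatched = check[check["_merge"] != "both"]
print(unmatched[["Date", "_merge"]])
```

Any row listed in unmatched is one that the inner merge on 'Date' would silently drop.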