What is going wrong with my DataFrame merge?

Question

I'm trying to merge two DataFrames, cleaned and weather, that don't share a common column. (The original post showed both as screenshots.)

However, when I enter this code:

combined = cleaned.merge(weather, how='cross')
combined

I get a merged dataset that contains 90 rows.

I know one dataset has one row fewer than the other. But how can I merge them without the rows being duplicated into 90? I just want to put these two datasets side by side and drop the 9th row.

I have tried variations on how=None.

Answer 1

Score: 2

To merge two DataFrames, you have to find a common key. You can extract the year from the date_time column of your second DataFrame (weather), which gives you a key that matches the Date column of cleaned. With that common key you can perform an inner join:

import pandas as pd

# If needed, convert date_time to datetime64
weather['date_time'] = pd.to_datetime(weather['date_time'])

# If needed, convert Date to int
cleaned['Date'] = cleaned['Date'].astype(int)

# Add the year as a temporary Date key on weather, then inner-join on it
out = cleaned.merge(weather.assign(Date=weather['date_time'].dt.year), on='Date')
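To see why the cross join returned 90 rows while a key-based join does not, here is a minimal sketch on made-up data. The shapes (10 rows and 9 rows) are inferred from the 90-row result in the question, and the value and temp columns are hypothetical stand-ins for whatever the real DataFrames contain.

```python
import pandas as pd

# Hypothetical stand-ins for the asker's data: 10 yearly rows and 9 weather rows.
cleaned = pd.DataFrame({
    "Date": [str(y) for y in range(2010, 2020)],   # years stored as strings
    "value": range(10),                            # made-up measurement
})
weather = pd.DataFrame({
    "date_time": [f"{y}-01-01" for y in range(2010, 2019)],  # no 2019 row
    "temp": [float(t) for t in range(9)],                    # made-up readings
})

# A cross join pairs every left row with every right row: 10 * 9 = 90 rows.
crossed = cleaned.merge(weather, how="cross")

# Build a shared year key, then inner-join on it.
weather["date_time"] = pd.to_datetime(weather["date_time"])
cleaned["Date"] = cleaned["Date"].astype(int)
out = cleaned.merge(weather.assign(Date=weather["date_time"].dt.year), on="Date")

print(len(crossed))  # 90
print(len(out))      # 9: the 2019 row in cleaned has no match and is dropped
```

Because the default for merge is how='inner', the row without a matching year disappears on its own, so no separate step is needed to drop it.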

Answer 2

Score: 1

You need to create a new column from 'date_time' that contains only the year (named 'Date' here so that it matches the key in cleaned) and then merge on that column. After that, drop the 'date_time' column if you don't want to keep it in your DataFrame.

import pandas as pd

weather['date_time'] = pd.to_datetime(weather['date_time'])
weather['Date'] = weather['date_time'].dt.year

combined = cleaned.merge(weather, on='Date')

# After that, drop the 'date_time' column
combined = combined.drop(columns=['date_time'])
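If it is unclear which row has no partner, one option is to check before dropping anything. The sketch below uses made-up data (the value and temp columns and the specific years are assumptions, not taken from the question) together with the indicator and validate parameters of merge.

```python
import pandas as pd

# Hypothetical toy data with the same shape as in the question: 10 rows vs 9 rows.
cleaned = pd.DataFrame({"Date": range(2010, 2020), "value": range(10)})
weather = pd.DataFrame({
    "date_time": pd.to_datetime([f"{y}-06-15" for y in range(2010, 2019)]),
    "temp": [20.0 + i for i in range(9)],
})
weather["Date"] = weather["date_time"].dt.year

# how="left" keeps every row of cleaned; indicator=True adds a `_merge` column
# saying where each row came from; validate raises MergeError if a year repeats.
checked = cleaned.merge(
    weather, on="Date", how="left", indicator=True, validate="one_to_one"
)

# Years present in cleaned but missing from weather.
unmatched = checked.loc[checked["_merge"] == "left_only", "Date"].tolist()
print(unmatched)  # [2019]
```

Once the unmatched year is known, switching back to the default inner join gives the same result as the code above, with that row left out.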

huangapple
  • Posted on June 9, 2023, 12:17:27
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76437174.html