How to compare columns from dataframes across all rows regardless of index value

Question

I'm trying to find all rows where the timestamps from one dataframe are the same as the timestamps from another dataframe. The problem is that the two dataframes have different lengths, and when they share a timestamp, the matching rows sit at different positions in their respective dataframes. In short, I want to compare each timestamp in one dataframe against every row of the other dataframe.

I used df1[df1.eq(df2) == True], but this relies on matching timestamps having the same index value. The eq method returns False when the same timestamp exists in both dataframes but on different rows.

Example:

df1
0   2000-01-01 00:00:00
1   2000-01-02 00:00:00
2   2000-01-03 00:00:00

df2
0   2000-01-02 00:00:00
1   2000-01-03 00:00:00
2   2000-01-04 00:00:00

df1.eq(df2) returns
0    False
1    False
2    False

even though 2000-01-02 00:00:00 and 2000-01-03 00:00:00 exist in both dataframes.
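
The behaviour described above can be reproduced with a minimal sketch. The column name "date_column" is an assumption (the question does not name the column), and the frames are rebuilt from the example values. eq lines rows up by index label before comparing, so a timestamp that sits on a different row in the other frame is never compared with its match.

```python
import pandas as pd

# Assumed column name "date_column"; values copied from the example above.
df1 = pd.DataFrame({"date_column": pd.to_datetime(["2000-01-01", "2000-01-02", "2000-01-03"])})
df2 = pd.DataFrame({"date_column": pd.to_datetime(["2000-01-02", "2000-01-03", "2000-01-04"])})

# eq aligns on index labels: row 0 is compared with row 0, row 1 with row 1, and so on.
aligned = df1.eq(df2)
print(aligned)
```

Every position compares two different timestamps, which is why the result is False throughout even though two of the values are shared.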

Answer 1

Score: 1

Merge the dataframes with an inner join and you will know which dates are present in both of them:

dates_in_both_dfs = df1.merge(df2, on="date_column", how="inner")["date_column"].tolist()

Depending on your desired output, you can then add new columns to the dataframes. For example:

df1['repeated_date'] = df1["date_column"].isin(dates_in_both_dfs)
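
A self-contained sketch of the approach above, run on the example data from the question. The sample frames and the column name "date_column" are assumptions for illustration, as is the extra `matching_rows` variable; adapt them to the real data.

```python
import pandas as pd

# Hypothetical sample data mirroring the question; "date_column" is an assumed name.
df1 = pd.DataFrame({"date_column": pd.to_datetime(["2000-01-01", "2000-01-02", "2000-01-03"])})
df2 = pd.DataFrame({"date_column": pd.to_datetime(["2000-01-02", "2000-01-03", "2000-01-04"])})

# An inner merge keeps only the timestamps found in both frames,
# regardless of which row (index value) they sit on.
dates_in_both_dfs = df1.merge(df2, on="date_column", how="inner")["date_column"].tolist()

# Flag the rows of each frame whose timestamp also occurs in the other frame.
df1["repeated_date"] = df1["date_column"].isin(dates_in_both_dfs)
df2["repeated_date"] = df2["date_column"].isin(dates_in_both_dfs)

# Boolean indexing then returns just the matching rows of df1.
matching_rows = df1[df1["repeated_date"]]
print(matching_rows)
```

If only the boolean mask is needed, df1["date_column"].isin(df2["date_column"]) should give the same flags without the intermediate merge. The merge is more useful when columns from both frames are wanted side by side.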

huangapple
  • Posted on June 9, 2023, 00:18:03
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76433885.html