How to remove dataframe rows based on multiple conditions
Question
I'm trying to remove rows from one DataFrame (df1) using multiple conditions based on values from a second DataFrame (df2). The columns I want to compare in these dataframes are labelled 'Timestamp' (T) and 'delta_t' (dt).
The rule I want to apply is: when T_{df1} == T_{df2}, remove all rows where dt_{df2} - 0.1 < dt_{df1} < dt_{df2}.
In other words, when the timestamp values from the two dataframes are equal, I want to compare the delta_t values. If the delta_t values of df1 fall within +/- 0.1 of the delta_t values of df2, those rows should be removed from df1.
Any help is much appreciated!
Cheers!
I have tried using df1.loc['timestamp'].isin(df2['timestamp']) to get the rows with matching timestamp values, but I'm not sure how to compare the delta_t values and remove the rows that fall within a specific range.
EDIT:
The data is originally saved in one dataframe with many columns. One of the columns is labelled 'channels'. To form the two dataframes (df1, df2) that I compare, I separate based on the channel value using the following:
noise = df1[df1['channel'] == 3]['timestamp_copy']
df2 = df1.loc[(df1['timestamp_copy'].isin(noise))]
Therefore, the number of rows in df1 >> df2.
Answer 1
Score: 1
If I understand you correctly, this does what you need: select the index labels where your condition holds, then drop them from df1. Note that the comparison is done row by row, so it only works when df1 and df2 have the same index (same length and labels):
import pandas as pd
# Small example frames: 'a' stands in for the timestamp, 'b' for delta_t
df1 = pd.DataFrame([[1, 2], [10, 11]], columns=['a', 'b'])
df2 = pd.DataFrame([[1, 2], [11, 10]], columns=['a', 'b'])
# Index labels of rows where 'a' matches and 'b' differs by at most 0.1
indices_to_remove = df1[(df1['a'] == df2['a']) & ((df1['b'] - df2['b']).abs() <= 0.1)].index
df1 = df1.drop(indices_to_remove)
print(df1)
Just replace a and b with your column names.
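Since the question says df1 has far more rows than df2, a row-by-row comparison may not apply directly. One possible alternative is to pair the rows up by timestamp with a merge first. The following is only a sketch under assumptions: the column names timestamp and delta_t, the helper name drop_close_rows, and the sample values are all made up for illustration, and df1 is assumed to have a unique index.

```python
import pandas as pd


def drop_close_rows(df1, df2, tol=0.1, key="timestamp", val="delta_t"):
    """Drop df1 rows whose `val` is within `tol` of a df2 row sharing the same `key`.

    Hypothetical helper; assumes df1 has a unique index.
    """
    # Carry df1's index labels through the merge in a helper column.
    left = df1[[key, val]].copy()
    left["_row"] = left.index

    # Inner merge: one output row per (df1 row, df2 row) pair with equal keys.
    pairs = left.merge(df2[[key, val]], on=key, suffixes=("_1", "_2"))

    # Symmetric +/- tol window around df2's value.
    close = (pairs[f"{val}_1"] - pairs[f"{val}_2"]).abs() <= tol

    # A df1 row is dropped if it is close to at least one matching df2 row.
    rows_to_drop = pairs.loc[close, "_row"].unique()
    return df1.drop(index=rows_to_drop)


# Made-up sample data: df1 is longer than df2.
df1 = pd.DataFrame({"timestamp": [1, 1, 2, 3],
                    "delta_t": [0.50, 0.90, 0.30, 0.70]})
df2 = pd.DataFrame({"timestamp": [1, 2],
                    "delta_t": [0.55, 0.80]})

result = drop_close_rows(df1, df2)
print(result)  # only row 0 is dropped (timestamp 1, |0.50 - 0.55| <= 0.1)
```

The sketch uses the symmetric +/- 0.1 reading of the condition. If the one-sided form dt_{df2} - 0.1 < dt_{df1} < dt_{df2} is what is wanted, the `close` line could instead test that the difference `delta_t_1 - delta_t_2` is greater than `-tol` and less than 0.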