How to remove dataframe rows based on multiple conditions

Question

I'm trying to remove rows from one DataFrame (df1) using multiple conditions based on values from a second DataFrame (df2). The columns I want to compare in these dataframes are labelled 'Timestamp' (T) and 'delta_t' (dt).

The rule I want to apply is: when T_{df1} == T_{df2}, remove all rows where dt_{df2} - 0.1 < dt_{df1} < dt_{df2}.

In other words, when the timestamp values from the two dataframes are equal, I want to compare the delta_t values. If the delta_t values of df1 fall within +/- 0.1 of the delta_t values of df2, then remove those rows from df1.

Any help is much appreciated!

Cheers!

I have tried using df1.loc['timestamp'].isin(df2['timestamp']) to get the rows with matching timestamp values, but I'm not sure how to compare the delta_t values and remove the rows that fall within a specific range.

EDIT:
The data is originally saved in one dataframe with many columns. One of the columns is labelled 'channels'. To form the two dataframes (df1, df2) that I compare, I separate based on the channel value using the following:

noise = df1[df1['channel'] == 3]['timestamp_copy']
df2 = df1.loc[(df1['timestamp_copy'].isin(noise))]

Therefore, df1 has far more rows than df2.

Answer 1

Score: 1

If I understand correctly, the following does what you need: select the indices of the rows where your condition holds, then drop them from df1:

import pandas as pd

df1 = pd.DataFrame([[1, 2], [10, 11]], columns=['a', 'b'])
df2 = pd.DataFrame([[1, 2], [11, 10]], columns=['a', 'b'])

# Rows where 'a' matches and 'b' differs by at most 0.1 (compared row by row)
indices_to_remove = df1[(df1['a'] == df2['a']) & (abs(df1['b'] - df2['b']) <= 0.1)].index
df1 = df1.drop(indices_to_remove)
print(df1)

Just replace a and b with your column names. Note that this compares df1 and df2 row by row, so it assumes both dataframes have the same index.
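Since the question says df1 has far more rows than df2, a row-by-row comparison may not apply directly. Below is a minimal sketch of one alternative, not part of the original answer: pair the rows through a merge on the timestamp, then drop every df1 row that has at least one close match in df2. The helper name drop_close_rows, the column names 'timestamp' and 'delta_t', the sample values, and the symmetric +/- 0.1 tolerance are all assumptions made for illustration, and df1 is assumed to have a unique index.

```python
import pandas as pd


def drop_close_rows(df1, df2, key="timestamp", value="delta_t", tol=0.1):
    """Hypothetical helper: drop df1 rows whose `value` lies within `tol`
    of the `value` of any df2 row sharing the same `key`."""
    left = df1[[key, value]].copy()
    left["_row"] = df1.index  # remember the original row labels of df1

    # Inner merge pairs every df1 row with every df2 row that has the same key.
    pairs = left.merge(df2[[key, value]], on=key, suffixes=("_1", "_2"))

    # Symmetric window (assumption). For the one-sided rule in the question,
    # the mask would instead test dt2 - tol < dt1 < dt2.
    close = (pairs[f"{value}_1"] - pairs[f"{value}_2"]).abs() <= tol

    return df1.drop(index=pairs.loc[close, "_row"].unique())


# Made-up sample data for illustration only.
df1 = pd.DataFrame({"timestamp": [1, 1, 2, 3],
                    "delta_t": [0.50, 0.95, 0.30, 0.70]})
df2 = pd.DataFrame({"timestamp": [1, 2],
                    "delta_t": [0.55, 0.90]})

result = drop_close_rows(df1, df2)
print(result)
```

With this sample data, only the first row of df1 (timestamp 1, delta_t 0.50) has a df2 partner within 0.1, so it is the only row removed. The merge does not require the two dataframes to have the same length or index, but it can produce many pairs when timestamps repeat heavily in both frames.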

huangapple
  • Posted on 2023-07-14 01:34:15
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76681966.html