July 14, 2023, 01:34:15

How to remove dataframe rows based on multiple conditions

Question

I'm trying to remove rows from one DataFrame (df1) using multiple conditions based on values from a second DataFrame (df2). The data I want to compare in these DataFrames is labelled 'Timestamp' (T) and 'delta_t' (dt).

The rule I want to apply is: when T_{df1} == T_{df2}, remove all rows where dt_{df2} - 0.1 < dt_{df1} < dt_{df2}.

In other words, when the timestamp values from the two DataFrames are equal, I want to compare the delta_t values. If the delta_t values of df1 fall within +/- 0.1 of the delta_t values of df2, then remove those rows from df1.

Any help is much appreciated!

Cheers!

I have tried using df1.loc['timestamp'].isin(df2['timestamp']) to get the rows with corresponding timestamp values, but I'm not sure how to compare the delta_t values and remove the rows that fall within a specific range.

EDIT:
The data is originally saved in one DataFrame with many columns, one of which is labelled 'channel'. To form the two DataFrames (df1, df2) that I compare, I split on the channel value as follows:

noise = df1[df1['channel'] == 3]['timestamp_copy']
df2 = df1.loc[(df1['timestamp_copy'].isin(noise))]

Therefore, df1 has far more rows than df2.
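To make the split concrete, here is a toy version of it. This is only a sketch: the values are invented for illustration, and only the column names 'channel' and 'timestamp_copy' come from the snippet above ('delta_t' is assumed to be a column of the same frame).

```python
import pandas as pd

# Invented sample values; the real data has many more columns and rows.
df1 = pd.DataFrame({
    "timestamp_copy": [100, 100, 200, 300, 300],
    "channel": [1, 3, 2, 3, 1],
    "delta_t": [0.2, 0.3, 0.5, 0.1, 0.4],
})

# Timestamps at which channel 3 fired.
noise = df1[df1["channel"] == 3]["timestamp_copy"]

# Every row of df1 (any channel) that shares one of those timestamps.
df2 = df1.loc[df1["timestamp_copy"].isin(noise)]
print(df2)
```

Note that df2 built this way is a subset of df1 and keeps df1's index labels, so it also contains rows from other channels that happen to share a channel-3 timestamp.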

Answer 1

Score: 1

If I understand you correctly, you can select the indices of the rows where your condition holds and then drop them from df1:

import pandas as pd

df1 = pd.DataFrame([[1, 2], [10, 11]], columns=['a', 'b'])
df2 = pd.DataFrame([[1, 2], [11, 10]], columns=['a', 'b'])

# Row-by-row comparison: this assumes df1 and df2 share the same index labels.
mask = (df1['a'] == df2['a']) & ((df1['b'] - df2['b']).abs() <= 0.1)
indices_to_remove = df1[mask].index
df1 = df1.drop(indices_to_remove)
print(df1)

Just replace a and b with your column names.
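The comparison above works row by row, so it only lines up when both frames have the same index. In the question df1 is much longer than df2, so one possible alternative is to pair rows by timestamp with a merge first. The following is a minimal sketch, not a definitive solution: the helper name drop_close_rows, the tol parameter, and the sample values are made up for illustration, and it uses the +/- 0.1 reading of the condition.

```python
import numpy as np
import pandas as pd

# Invented sample data; column names follow the question.
df1 = pd.DataFrame({
    "timestamp": [1, 1, 2, 3, 4],
    "delta_t": [0.50, 0.90, 1.00, 2.00, 3.00],
})
df2 = pd.DataFrame({
    "timestamp": [1, 2, 3],
    "delta_t": [0.55, 1.50, 2.05],
})

def drop_close_rows(df1, df2, tol=0.1):
    """Drop df1 rows whose delta_t is within tol of a df2 row with the same timestamp."""
    left = df1[["timestamp", "delta_t"]].copy()
    left["_pos"] = np.arange(len(left))  # remember each row's position in df1

    # Pair every df1 row with every df2 row that has the same timestamp.
    pairs = left.merge(df2[["timestamp", "delta_t"]],
                       on="timestamp", suffixes=("_1", "_2"))

    # Symmetric window; for the one-sided rule in the question, use instead:
    # (pairs["delta_t_1"] > pairs["delta_t_2"] - tol) & (pairs["delta_t_1"] < pairs["delta_t_2"])
    close = (pairs["delta_t_1"] - pairs["delta_t_2"]).abs() <= tol

    # Mark every df1 position that matched at least one df2 row.
    keep = np.ones(len(df1), dtype=bool)
    keep[pairs.loc[close, "_pos"].unique()] = False
    return df1[keep]

result = drop_close_rows(df1, df2)
print(result)
```

With this sample data, the rows at timestamps 1 (delta_t 0.50) and 3 (delta_t 2.00) are dropped because each lies within 0.1 of the df2 value at the same timestamp. The timestamp 4 row is kept because df2 has no matching timestamp.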
