2023-02-16 03:37:24

Drop rows in a pandas DataFrame up to a certain value

Question

I'm currently working with a pandas data frame, with approximately 80000 rows, like the following one:

artist	date
Drake	2014-10-12
Kendrick Lamar	2014-10-12
Ed Sheeran	2014-10-12
Maroon 5	2014-10-12
Rihanna	2014-10-19
Foo Fighters	2014-10-19
Bad Bunny	2014-10-19
Eminem	2014-10-19
Drake	2014-10-26
Eminem	2014-10-26
Taylor Swift	2014-10-26
Kendrick Lamar	2014-10-26
Rihanna	2014-11-02
Ed Sheeran	2014-11-02
Kanye West	2014-11-02
Lime Cordiale	2014-11-02

I only want to keep the rows that have a date greater than or equal to 2014-10-26. The result should look like the following table:

artist	date
Drake	2014-10-26
Eminem	2014-10-26
Taylor Swift	2014-10-26
Kendrick Lamar	2014-10-26
Rihanna	2014-11-02
Ed Sheeran	2014-11-02
Kanye West	2014-11-02
Lime Cordiale	2014-11-02

I tried using the pandas .drop() method, as in the following lines:

    dataset = pd.read_csv("charts.csv")
    dataset = pd.DataFrame(dataset)
    dataset = dataset.drop(dataset.loc[dataset['date'] <= "2014-10-19", :])

but after executing I get the following error:

    KeyError: "['track_id', 'name', 'country', 'date', 'position', 'streams', 'artists', 'artist_genres', 'duration', 'explicit'] not found in axis"

Answer 1

Score: 0

I'm not sure which error you got; it helps to include the full error log.

In any case, you can drop rows by index: filter the data to get the index labels of the unwanted rows, then drop them:

    # Index labels of the rows dated on or before 2014-10-19
    indexx = dataset[dataset['date'] <= "2014-10-19"].index
    # Remove those rows from dataset in place
    dataset.drop(indexx, inplace=True)
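A self-contained sketch of the index-based approach may make it easier to try out. It uses a handful of rows from the question's sample instead of charts.csv; the small frame and the names `cutoff`, `rows_to_drop`, and `kept` are assumptions for illustration, and it returns a new frame rather than using inplace=True.

```python
import pandas as pd

# Small stand-in for charts.csv, built from a few of the question's sample rows.
dataset = pd.DataFrame({
    "artist": ["Drake", "Rihanna", "Eminem", "Taylor Swift", "Kanye West"],
    "date": ["2014-10-12", "2014-10-19", "2014-10-26", "2014-10-26", "2014-11-02"],
})

cutoff = "2014-10-19"  # last date to drop (hypothetical variable name)

# Index labels of the rows to remove. Comparing YYYY-MM-DD strings works
# because their lexicographic order matches chronological order.
rows_to_drop = dataset[dataset["date"] <= cutoff].index

# drop() takes labels, so passing index labels removes exactly those rows.
kept = dataset.drop(index=rows_to_drop)
print(kept)
```

This relies on the index labels being unique, as they are with the default index from read_csv. With duplicate labels, drop() would remove every row sharing a dropped label.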

Answer 2

Score: 0

You could use:

    # Parse the cutoff and the column as datetimes so the comparison is chronological
    last_date_to_drop = pd.to_datetime("2014-10-19")
    dataset["date"] = pd.to_datetime(dataset["date"])
    # Keep only the rows strictly after the last date to drop
    dataset = dataset.loc[dataset["date"].gt(last_date_to_drop)].copy()

You don't need to sort or drop. Just subset the DataFrame and copy it, as above.

Also, drop does not do what you expect here. It does not drop rows by their values; it drops by column or index labels.
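The point about labels can be seen on a small example. The sketch below, using a few rows assumed from the question's sample, shows that iterating over a DataFrame yields its column labels, which is why passing a DataFrame to drop() produces the "not found in axis" KeyError, and then applies the boolean-mask filter instead. The names `to_drop`, `raised`, and `filtered` are illustrative.

```python
import pandas as pd

# A few rows assumed from the question's sample data.
dataset = pd.DataFrame({
    "artist": ["Drake", "Rihanna", "Eminem", "Kanye West"],
    "date": ["2014-10-12", "2014-10-19", "2014-10-26", "2014-11-02"],
})

# Iterating over a DataFrame yields its column labels, not its rows, so
# passing it to drop() asks for row labels named "artist" and "date".
to_drop = dataset.loc[dataset["date"] <= "2014-10-19", :]
print(list(to_drop))  # ['artist', 'date']

raised = False
try:
    dataset.drop(to_drop)
except KeyError as exc:
    raised = True
    print("KeyError:", exc)

# Boolean-mask filtering keeps the rows on or after the cutoff directly.
dataset["date"] = pd.to_datetime(dataset["date"])
filtered = dataset.loc[dataset["date"] >= pd.Timestamp("2014-10-26")].copy()
print(filtered)
```

Converting the column with pd.to_datetime is optional for YYYY-MM-DD strings, but it keeps the comparison correct if the date format ever varies.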

Answer 3

Score: 0

    import pandas as pd

    df = pd.DataFrame({'artist': ['Drake', 'Kendrick Lamar', 'Kendrick Lamar', 'Drake'],
                       'date': ['2014-10-12', '2014-10-12', '2014-10-26', '2014-10-26']})
    # Be cautious: sort first, so that keep='last' retains the most recent row per artist
    df = (df.sort_values(by='date', key=lambda t: pd.to_datetime(t, format='%Y-%m-%d'))
            .drop_duplicates(subset=['artist'], keep='last'))
    print(df)
    #            artist        date
    # 2  Kendrick Lamar  2014-10-26
    # 3           Drake  2014-10-26
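Keeping the latest row per artist is not quite the same operation as keeping every row on or after a cutoff date. A minimal sketch of the difference, on a few rows assumed from the question's sample (the names `latest_per_artist` and `on_or_after` are illustrative):

```python
import pandas as pd

# A few rows assumed from the question's sample data.
df = pd.DataFrame({
    "artist": ["Drake", "Maroon 5", "Rihanna", "Drake", "Rihanna"],
    "date": ["2014-10-12", "2014-10-12", "2014-10-19", "2014-10-26", "2014-11-02"],
})
df["date"] = pd.to_datetime(df["date"])

# Latest row per artist: Maroon 5 survives even though its only row
# (2014-10-12) is before the cutoff.
latest_per_artist = (df.sort_values("date")
                       .drop_duplicates(subset=["artist"], keep="last"))

# Date filter: every row on or after the cutoff, regardless of artist.
on_or_after = df.loc[df["date"] >= pd.Timestamp("2014-10-26")]

print(latest_per_artist)
print(on_or_after)
```

The two results coincide only when every artist has a row on or after the cutoff and no artist appears more than once in that range.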
