May 13, 2023 16:22:11


How to drop rows based on value of index

Question


I want to drop rows based on their index values, but without listing them one by one. Is there a way to do this without writing out the years manually?

I want to drop the rows whose index is below 2000. Is there any way to do this with a formula? I'm thinking of something like `drop(label=[df.index<2000])`.

Obviously the code is incorrect, but I hope it gives an idea of what I want to happen.
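For reference, the pseudocode above is close to working pandas. `DataFrame.drop` takes index labels through its `labels` parameter (not `label`) and does not accept a boolean mask directly, but the mask can first be used to pick the matching labels out of `df.index`. A minimal sketch, assuming a small hypothetical DataFrame indexed by year:

```python
import pandas as pd

# Hypothetical example data: one row per year
df = pd.DataFrame({"value": [10, 20, 30, 40, 50]},
                  index=[1998, 1999, 2000, 2001, 2002])

# Build a boolean mask over the index, then use it to select the labels to drop
labels_to_drop = df.index[df.index < 2000]
df = df.drop(labels=labels_to_drop)

print(df)
```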


Answer 1

Score: 0


Here is one way to do it:

import numpy as np
import pandas as pd
# Set the random seed for reproducibility
np.random.seed(42)
# Generate a random DataFrame
index_values = np.arange(1000, 3001)  # Index values from 1000 to 3000 inclusive
data = np.random.randn(len(index_values), 3)  # Random data
columns = ['A', 'B', 'C']  # Column names
df = pd.DataFrame(data, index=index_values, columns=columns)
# Print the original DataFrame
print(df)
# Drop rows whose index is below 2000
df_filtered = df.drop(df[df.index < 2000].index)
# Print the resulting DataFrame
print(df_filtered)

Before filtering:

      A         B         C
1000  0.496714 -0.138264  0.647689
1001  1.523030 -0.234153 -0.234137
1002  1.579213  0.767435 -0.469474
1003  0.542560 -0.463418 -0.465730
1004  0.241962 -1.913280 -1.724918
...        ...       ...       ...
2996  0.434941 -0.393987  0.537768
2997  0.306389 -0.998307  0.518793
2998  0.863528  0.171469  1.152648
2999 -1.217404  0.467950 -1.170281
3000 -1.114081 -0.630931 -0.942060

After filtering:

      A         B         C
2000 -1.907808 -0.860385 -0.413606
2001  1.887688  0.556553 -1.335482
2002  0.486036 -1.547304  1.082691
2003 -0.471125 -0.093636  1.325797
2004 -1.287164 -1.397118 -0.583599
...        ...       ...       ...
2996  0.434941 -0.393987  0.537768
2997  0.306389 -0.998307  0.518793
2998  0.863528  0.171469  1.152648
2999 -1.217404  0.467950 -1.170281
3000 -1.114081 -0.630931 -0.942060
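Instead of computing the labels to drop, you can also keep the rows you want directly. The sketch below (an alternative to the answer's approach, not part of it) reuses the same random setup and checks that a boolean mask and a label-based `.loc` slice give the same result as `drop`. The slice `df.loc[2000:]` is only reliable when the index is sorted in increasing order; call `sort_index()` first if it might not be.

```python
import numpy as np
import pandas as pd

# Same setup as the answer: index runs from 1000 to 3000
np.random.seed(42)
index_values = np.arange(1000, 3001)
df = pd.DataFrame(np.random.randn(len(index_values), 3),
                  index=index_values, columns=["A", "B", "C"])

# The answer's approach: drop the labels whose index is below 2000
dropped = df.drop(df[df.index < 2000].index)

# Keep rows whose index is 2000 or above (boolean mask)
kept_by_mask = df[df.index >= 2000]

# Label-based slice, inclusive of 2000; assumes a sorted index
kept_by_slice = df.loc[2000:]

print(kept_by_mask.equals(dropped), kept_by_mask.equals(kept_by_slice))
```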

Answer 2

Score: 0

To select all rows whose index is greater than 2000, you can use the mask df.index > 2000. To include 2000 itself, use df.index >= 2000. Indexing the DataFrame with this mask returns a new DataFrame without the rows that have a smaller index; the original DataFrame is not modified unless you reassign it. Calling .copy() makes the result independent of the original, so you can compare the two to see the difference.

import pandas as pd
df = pd.DataFrame({'a': [0, 1, 2, 3, 4]}, index=[1998.0, 1999, 2000, 2001, 2002])
dropped_df = df[df.index > 2000].copy()
print(dropped_df)
        a
2001.0  3
2002.0  4
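The difference between the two comparisons, and the fact that filtering leaves the original untouched, can be checked on the answer's own example data. A minimal sketch:

```python
import pandas as pd

# Same data as the answer; the 1998.0 entry makes the index float64
df = pd.DataFrame({"a": [0, 1, 2, 3, 4]}, index=[1998.0, 1999, 2000, 2001, 2002])

strictly_greater = df[df.index > 2000]   # excludes the row labelled 2000
at_least = df[df.index >= 2000]          # includes the row labelled 2000

print(strictly_greater.index.tolist())  # [2001.0, 2002.0]
print(at_least.index.tolist())          # [2000.0, 2001.0, 2002.0]
print(len(df))                          # the original still has 5 rows
```

To keep the filtered result, reassign it, for example df = df[df.index >= 2000].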


Answer 3

Score: 0


You can try boolean indexing:

df = df.drop(df[df.index < 2000].index)
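Since the question asks for something like a formula, `DataFrame.query` is another option (a different technique from the answer above). When the index has no name, a query string can refer to it with the special identifier `index`. A minimal sketch, assuming a small hypothetical DataFrame with an unsorted year index:

```python
import pandas as pd

# Hypothetical example data; the index is deliberately not sorted
df = pd.DataFrame({"value": [10, 20, 30, 40]}, index=[1999, 2000, 2001, 1998])

# In DataFrame.query, an unnamed index can be referred to as `index`
df = df.query("index >= 2000")

print(df)
```

Unlike a `.loc` label slice, this works regardless of the index order, and the remaining rows keep their original order.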
