2023-05-29 09:08:38

How to filter rows in a pandas DataFrame

Question


Confused.

Here's a dataset.

Unnamed: 0	Unnamed: 1	Admissions	Number of admissions per 100,000 population6	Unnamed: 4	Admissions.1	Number of admissions per 100,000 population6.1	Unnamed: 7	Admissions.2	Number of admissions per 100,000 population6.2	...	Unnamed: 28	Admissions.9	Number of admissions per 100,000 population6.9	Unnamed: 31	Admissions.10	Number of admissions per 100,000 population6.10	Unnamed: 34	Admissions.11	Number of admissions per 100,000 population6.11	Unnamed: 37
0	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
1	E92000001	ENGLAND7	976420.0	1810.0	NaN	713550.0	2810.0	NaN	262870.0	940.0	...	NaN	841760	1620	NaN	614050	2530	NaN	227710	840	NaN
2	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
3	NaN	Unknown	5100.0	0.0	NaN	4360.0	0.0	NaN	730.0	0.0	...	NaN	5220	0	NaN	4460	0	NaN	760	0	NaN
4	NaN	1	2.0	3.0	4.0	5.0	6.0	7.0

My code to transform it is:

df = pd.read_excel("LAPE_Statistical_Tables_for_England_2021.xlsx", sheet_name="1.3", skiprows=5, skipfooter=24)
df = (
    df.dropna(how="all", axis="columns"),
    df.dropna(how="all", axis="rows"),
    # Filter rows where "Unnamed: 1" column contains all uppercase characters using pd.query()
    df.dropna(subset=["Unnamed: 1"], how="any")
)

df = pd.concat(df)

# Reset the index if needed
df = df.reset_index(drop=True)
df

But why does it not remove the rows where column 1 clearly contains NaN? I also want to remove the rows where column 1 is all uppercase.

Why does this not work?

Answer 1

Score: 1


There are a couple of issues in the code. The parentheses with commas create a tuple of three separate DataFrames, each computed from the original df, instead of modifying df step by step. pd.concat then stacks those three results on top of each other, so the rows you tried to drop reappear from the other copies. To fix this, assign the result of each dropna call back to df and drop the pd.concat call.

I considered only the first two columns. Here's the corrected code:
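To see the tuple behavior in isolation, here is a minimal sketch. It uses a small made-up DataFrame standing in for the first two columns of the spreadsheet (the values and the name `parts` are assumptions for illustration, not the real file contents):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the first two columns of the spreadsheet
df = pd.DataFrame({
    "Unnamed: 0": [np.nan, "E92000001", np.nan, np.nan],
    "Unnamed: 1": [np.nan, "ENGLAND7", np.nan, "Unknown"],
})

# Parentheses with commas build a tuple of three independent results,
# each one computed from the original, unmodified df
parts = (
    df.dropna(how="all", axis="columns"),
    df.dropna(how="all", axis="rows"),
    df.dropna(subset=["Unnamed: 1"], how="any"),
)
print(type(parts))              # <class 'tuple'>
print([len(p) for p in parts])  # [4, 2, 2]

# pd.concat stacks all three results, so the all-NaN rows kept by the
# first element show up again in the combined frame
stacked = pd.concat(parts)
print(len(stacked))                        # 8
print(stacked["Unnamed: 1"].isna().sum())  # 2
```

The first element only drops all-NaN columns, so it still carries every NaN row, which is why they survive the concatenation.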

import pandas as pd

df = pd.read_excel("test1.xlsx")
df = df.dropna(how="all", axis="columns")  # Remove columns with all NaN values
df = df.dropna(how="all", axis="rows")     # Remove rows with all NaN values

# Drop rows where the "Unnamed: 1" column is NaN
df = df.dropna(subset=["Unnamed: 1"], how="any")

# Reset the index if needed
df = df.reset_index(drop=True)

Output using your code:

df

Unnamed: 0	Unnamed: 1
0	NaN
1	E92000001
2	NaN
3	NaN
4	NaN
0	NaN
1	E92000001
2	NaN
3	NaN
4	NaN
1	E92000001

Output after applying my modifications:

Unnamed: 0	Unnamed: 1
1	E92000001
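The question also asks how to remove rows where column 1 is all uppercase, which the code above does not do. One possible approach, sketched here on a made-up DataFrame (the values and the names `is_upper` and `filtered` are assumptions for illustration), is a boolean mask built with `Series.str.isupper()`:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the real sheet has many more columns
df = pd.DataFrame({
    "Unnamed: 0": [np.nan, "E92000001", np.nan, "E12000001"],
    "Unnamed: 1": [np.nan, "ENGLAND7", "Unknown", "North East"],
})

# Step 1: drop rows where "Unnamed: 1" is NaN
df = df.dropna(subset=["Unnamed: 1"])

# Step 2: flag rows whose "Unnamed: 1" value is all uppercase.
# astype(str) guards against non-string cells; digits are ignored by
# isupper(), so "ENGLAND7" still counts as uppercase
is_upper = df["Unnamed: 1"].astype(str).str.isupper()

# Step 3: keep only the rows that are NOT all uppercase
filtered = df[~is_upper].reset_index(drop=True)
print(filtered["Unnamed: 1"].tolist())  # ['Unknown', 'North East']
```

To keep only the uppercase rows instead, index with `df[is_upper]`.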
