如何在pandas数据帧中筛选行

huangapple go评论68阅读模式
英文:

How to filter rows in pandas dataframed

问题

以下是已翻译的内容:

  1. df = pd.read_excel("LAPE_Statistical_Tables_for_England_2021.xlsx", sheet_name="1.3", skiprows=5, skipfooter=24)
  2. df = (
  3. df.dropna(how="all", axis="columns"),
  4. df.dropna(how="all", axis="rows"),
  5. # 使用 pd.query() 过滤包含全部大写字符的 "Unnamed: 1" 列的行
  6. df.dropna(subset=["Unnamed: 1"], how="any")
  7. )
  8. df = pd.concat(df)
  9. # 如果需要的话,重置索引
  10. df = df.reset_index(drop=True)
  11. df

为什么它不会删除包含 NaN 的列 1,以及为什么它不会删除列 1 包含全大写字符的行?为什么它不起作用?

英文:

Confused.

Here's a dataset.

  1. Unnamed: 0 Unnamed: 1 Admissions Number of admissions per 100,000 population6 Unnamed: 4 Admissions.1 Number of admissions per 100,000 population6.1 Unnamed: 7 Admissions.2 Number of admissions per 100,000 population6.2 ... Unnamed: 28 Admissions.9 Number of admissions per 100,000 population6.9 Unnamed: 31 Admissions.10 Number of admissions per 100,000 population6.10 Unnamed: 34 Admissions.11 Number of admissions per 100,000 population6.11 Unnamed: 37
  2. 0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
  3. 1 E92000001 ENGLAND7 976420.0 1810.0 NaN 713550.0 2810.0 NaN 262870.0 940.0 ... NaN 841760 1620 NaN 614050 2530 NaN 227710 840 NaN
  4. 2 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
  5. 3 NaN Unknown 5100.0 0.0 NaN 4360.0 0.0 NaN 730.0 0.0 ... NaN 5220 0 NaN 4460 0 NaN 760 0 NaN
  6. 4 NaN 1 2.0 3.0 4.0 5.0 6.0 7.0

My code to transform it is:

  1. df = pd.read_excel("LAPE_Statistical_Tables_for_England_2021.xlsx", sheet_name="1.3", skiprows=5, skipfooter=24)
  2. df = (
  3. df.dropna(how="all", axis="columns"),
  4. df.dropna(how="all", axis="rows"),
  5. # Filter rows where "Unnamed: 1" column contains all uppercase characters using pd.query()
  6. df.dropna(subset=["Unnamed: 1"], how="any")
  7. )
  8. df = pd.concat(df)
  9. # Reset the index if needed
  10. df = df.reset_index(drop=True)
  11. df

But why does it not remove the NaNs where column 1 is clearly contains NaN. I also want to remove the rows where column 1 is all uppercase.

Why does this not work?

答案1

得分: 1

在代码中存在一些问题。括号创建了一个元组,而不是修改原始的 DataFrame df,而且不是将一个元组 (df) 连接起来,而是连接了前面操作生成的各个 DataFrame 来解决错误。

我考虑了前两列,以下是已经更正的代码:

  1. df = pd.read_excel("test1.xlsx")
  2. df = df.dropna(how="all", axis="columns") # 移除所有 NaN 值的列
  3. df = df.dropna(how="all", axis="rows") # 移除所有 NaN 值的行
  4. # 使用 pd.query() 过滤 "Unnamed: 1" 列中包含全部大写字符的行
  5. df = df.dropna(subset=["Unnamed: 1"], how="any")
  6. # 如果需要,重置索引
  7. df = df.reset_index(drop=True)

使用你的代码输出:

  1. df
  2. Unnamed: 0 Unnamed: 1
  3. 0 NaN
  4. 1 E92000001
  5. 2 NaN
  6. 3 NaN
  7. 4 NaN
  8. 0 NaN
  9. 1 E92000001
  10. 2 NaN
  11. 3 NaN
  12. 4 NaN
  13. 1 E92000001

应用我的修改后的输出:

  1. Unnamed: 0 Unnamed: 1
  2. 1 E92000001
英文:

There are couple of issues in the code. The parentheses create a tuple instead of modifying the original DataFrame df and instead of concatenating a tuple (df), concatenate the individual DataFrames resulting from the previous operations to resolve the error.

I have considered first two columns and Here's the corrected code:

  1. df = pd.read_excel("test1.xlsx")
  2. df = df.dropna(how="all", axis="columns") # Remove columns with all NaN values
  3. df = df.dropna(how="all", axis="rows") # Remove rows with all NaN values
  4. # Filter rows where "Unnamed: 1" column contains all uppercase characters using pd.query()
  5. df = df.dropna(subset=["Unnamed: 1"], how="any")
  6. # Reset the index if needed
  7. df = df.reset_index(drop=True)

Output using your code:

  1. df
  2. Unnamed: 0 Unnamed: 1
  3. 0 NaN
  4. 1 E92000001
  5. 2 NaN
  6. 3 NaN
  7. 4 NaN
  8. 0 NaN
  9. 1 E92000001
  10. 2 NaN
  11. 3 NaN
  12. 4 NaN
  13. 1 E92000001

Output after applying my modifications:

  1. Unnamed: 0 Unnamed: 1
  2. 1 E92000001

huangapple
  • 本文由 发表于 2023年5月29日 09:08:38
  • 转载请务必保留本文链接:https://go.coder-hub.com/76354169.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定