英文:
How to top and tail data from NHS or ONS using pandas dataframe
问题
以下是您要翻译的内容:
"I'm playing with Pandas to look at alcohol-related admissions from https://fingertips.phe.org.uk/documents/LAPE_Statistical_Tables_for_England_2021.xlsx
I've got a bit of code that sort of looks like this:
df = pd.read_excel("../LAPE_Statistical_Tables_for_England_2021.xlsx", sheet_name="1.3", skiprows=5)
df = (df.dropna(how="all", axis="columns"),
df.dropna(how="all", axis="rows")
)
df
The results look like this:
Unnamed: 0 Unnamed: 1 Admissions Number of admissions per 100,000 population6 Admissions.1 Number of admissions per 100,000 population6.1 Admissions.2 Number of admissions per 100,000 population6.2 Admissions.3 Number of admissions per 100,000 population6.3 ...
1 E92000001 ENGLAND7 976420.0 1810.0 713550.0 2810.0 262870.0 940.0 938620 1770 ... 635060.0 2570.0 235020.0 860.0 841760 1620 614050 2530 227710 840
3 NaN Unknown 5100.0 0.0 4360.0 0.0 730.0 0.0 5260 0 ... 4370.0 0.0 620.0 0.0 5220 0 4460 0 760 0
5 E12000004 EAST MIDLANDS 86990.0 1820.0 63300.0 2790.0 23690.0 970.0 81060 1720 ... 54360.0 2470.0 20760.0 870.0 67800 1480 48930 2260 18870 800
7 E07000032 Amber Valley 2630.0 1900.0 1860.0 2810.0 780.0 1110.0 2280 1670 ... 1500.0 2350.0 610.0 890.0 2040 1550 1450 2330 590 880
8 E07000170 Ashfieldⁱ 2550.0 2010.0 1830.0 3050.0 710.0 1090.0 2550 2030 ... 1650.0 2810.0 620.0 970.0 0 0 0 0 0 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
382 Sources NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
383 ● Hospital Episode Statistics, NHS Digital - D... NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
384 ● Public Health England - Attributable fractio... NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
385 ● Mid-year Population Estimatesin England, The... NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
387 Hospital Episode Statistics (HES) Copyright © ... NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
363 rows × 29 columns
I can change the column names manually but I want to drop rows where column Unnamed: 1 contains all upper case characters.
I ideally want to drop all the attribution rows at the bottom. Is that possible with pd.read_excel()?
UPDATE: yes, it is possible to skip the footer with skipfooter in pd.read_excel."
英文:
I'm playing with Pandas to look at alcohol-related admissions from https://fingertips.phe.org.uk/documents/LAPE_Statistical_Tables_for_England_2021.xlsx
I've got a bit of code that sort of looks like this:
df = pd.read_excel("../LAPE_Statistical_Tables_for_England_2021.xlsx", sheet_name="1.3", skiprows=5)
df = (df.dropna(how="all", axis="columns"),
df.dropna(how="all", axis="rows")
)
df
The results look like this:
Unnamed: 0 Unnamed: 1 Admissions Number of admissions per 100,000 population6 Admissions.1 Number of admissions per 100,000 population6.1 Admissions.2 Number of admissions per 100,000 population6.2 Admissions.3 Number of admissions per 100,000 population6.3 ... Admissions.7 Number of admissions per 100,000 population6.7 Admissions.8 Number of admissions per 100,000 population6.8 Admissions.9 Number of admissions per 100,000 population6.9 Admissions.10 Number of admissions per 100,000 population6.10 Admissions.11 Number of admissions per 100,000 population6.11
1 E92000001 ENGLAND7 976420.0 1810.0 713550.0 2810.0 262870.0 940.0 938620 1770 ... 635060.0 2570.0 235020.0 860.0 841760 1620 614050 2530 227710 840
3 NaN Unknown 5100.0 0.0 4360.0 0.0 730.0 0.0 5260 0 ... 4370.0 0.0 620.0 0.0 5220 0 4460 0 760 0
5 E12000004 EAST MIDLANDS 86990.0 1820.0 63300.0 2790.0 23690.0 970.0 81060 1720 ... 54360.0 2470.0 20760.0 870.0 67800 1480 48930 2260 18870 800
7 E07000032 Amber Valley 2630.0 1900.0 1860.0 2810.0 780.0 1110.0 2280 1670 ... 1500.0 2350.0 610.0 890.0 2040 1550 1450 2330 590 880
8 E07000170 Ashfieldⁱ 2550.0 2010.0 1830.0 3050.0 710.0 1090.0 2550 2030 ... 1650.0 2810.0 620.0 970.0 0 0 0 0 0 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
382 Sources NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
383 ● Hospital Episode Statistics, NHS Digital - D... NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
384 ● Public Health England - Attributable fractio... NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
385 ● Mid-year Population Estimatesin England, The... NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
387 Hospital Episode Statistics (HES) Copyright © ... NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
363 rows × 29 columns
I can change the column names manually but I want to drop rows where column Unnamed: 1 contains all upper case characters.
I ideally want to drop all the attribution rows at the bottom. Is that possible with pd.read_excel()?
UPDATE: yes, it is possible to skip the footer with skipfooter in pd.read_excel.
答案1
得分: 1
这是您可以删除值在Unnamed: 1
中全部由大写字符组成的行的方法。请注意,我还调整了您的代码,删除NaN值,因为原始版本返回了一个包含DataFrames
元组,而不是在同一DataFrame
上应用.dropna()
操作(我假设您希望在同一DataFrame
上应用这两个操作)。
df = pd.read_excel(filepath, sheet_name="1.3", skiprows=5)
df = (
df
.dropna(how="all", axis="columns")
.dropna(how="all", axis="rows")
)
print(df['Unnamed: 1'].head(n=5).to_markdown(index=False))
df = df.loc[
df['Unnamed: 1'] != df['Unnamed: 1'].str.upper(),
:
]
print(df['Unnamed: 1'].head(n=5).to_markdown(index=False))
请注意,这是您提供的代码的翻译部分。
英文:
Here's how you can drop the rows whose value in Unnamed: 1
consists entirely of uppercase characters. Note that I've also adjusted your code dropping NaN's as the original version returned a tuple of DataFrames
instead of applying both .dropna()
on the same DataFrame
(I assume you wanted to apply both operations on the same DataFrame
).
df = pd.read_excel(filepath, sheet_name="1.3", skiprows=5)
df = (
df
.dropna(how="all", axis="columns")
.dropna(how="all", axis="rows")
)
print(df['Unnamed: 1'].head(n=5).to_markdown(index=False))
# | Unnamed: 1 |
# |:--------------|
# | ENGLAND7 |
# | Unknown |
# | EAST MIDLANDS |
# | Amber Valley |
# | Ashfieldⁱ |
df = df.loc[
df['Unnamed: 1'] != df['Unnamed: 1'].str.upper(),
:
]
print(df['Unnamed: 1'].head(n=5).to_markdown(index=False))
# | Unnamed: 1 |
# |:-------------|
# | Unknown |
# | Amber Valley |
# | Ashfieldⁱ |
# | Bassetlaw |
# | Blaby |
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论