KeyError: 'date' - I don't know why I keep getting this error
Question
Can anyone help with this error? I am trying to plot high-level charts to get an idea of the various developments in a Covid-19 dataset that I'm using.
# Convert date column to datetime type
df['date'] = pd.to_datetime(df.date)
# Group data by quarter and calculate total cases per million
df_total_cases_quarterly = df.groupby(df['date'].dt.quarter).agg({'total_cases_per_million': 'sum'}).reset_index() # df['date'].dt.quarter
# Create bar plot for total cases per million on a quarterly basis
sns.set_style('whitegrid')
sns.barplot(x ='date', y = 'total_cases_per_million', data = df_total_cases_quarterly)
plt.title('Total Cases per Million (Worldwide) - Quarterly')
plt.xlabel('Quarter')
plt.ylabel('Total Cases per Million')
plt.xticks(ticks=[0, 1, 2, 3], labels=['Q1', 'Q2', 'Q3', 'Q4']) # Update x-axis labels to show quarters
plt.show()
This is the error I get whenever I run the code above:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
/opt/anaconda3/lib/python3.9/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
3628 try:
-> 3629 return self._engine.get_loc(casted_key)
3630 except KeyError as err:
/opt/anaconda3/lib/python3.9/site-packages/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
/opt/anaconda3/lib/python3.9/site-packages/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'date'
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
/var/folders/rf/8yc43r0d13l2gw8m9r17pc7h0000gn/T/ipykernel_942/1110610808.py in <module>
3
4 # Group data by quarter and calculate total cases per million
----> 5 df_total_cases_quarterly = df.groupby(df['date'].dt.quarter).agg({'total_cases_per_million': 'sum'}).reset_index() # df['date'].dt.quarter
6
7 # Create bar plot for total cases per million on a quarterly basis
/opt/anaconda3/lib/python3.9/site-packages/pandas/core/frame.py in __getitem__(self, key)
3503 if self.columns.nlevels > 1:
3504 return self._getitem_multilevel(key)
-> 3505 indexer = self.columns.get_loc(key)
3506 if is_integer(indexer):
3507 indexer = [indexer]
/opt/anaconda3/lib/python3.9/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
3629 return self._engine.get_loc(casted_key)
3630 except KeyError as err:
-> 3631 raise KeyError(key) from err
3632 except TypeError:
3633 # If we have a listlike key, _check_indexing_error will raise
KeyError: 'date'
I grouped a COVID-19 dataset by quarter and calculated the total cases per million so that the result can be displayed using seaborn. The date column is not missing: I used it earlier in the code shown below, which ran successfully, so I am not sure why I am getting the error above. I would like to know what is causing the error and how to fix it. Thanks for your help!
This code runs successfully:
# Filter data for daily new deaths per million
df_daily_new_deaths = df.groupby('date').agg({'new_deaths_per_million': 'sum'}).reset_index()
# Create line plot for daily new deaths per million
sns.set_style('whitegrid')
sns.lineplot(x = 'date', y = 'new_deaths_per_million', data = df_daily_new_deaths)
plt.title('Daily New Deaths per Million (Worldwide)')
plt.xlabel('Date')
plt.ylabel('Daily New Deaths per Million')
plt.xticks(rotation = 45)
plt.show()
Answer 1
Score: 1
Your code should work as written: accessing dt.quarter doesn't change the column name, so the grouped result still has a date column after reset_index. You are probably doing something else that isn't shown here, or perhaps using an old version of pandas with a bug.
I tested this on Python 3.8 + pandas 1.5.2 and on Python 3.11 + pandas 2.0.0.
Example:
import pandas as pd
import numpy as np
# set up dummy data
np.random.seed(0)
df = pd.DataFrame({'date': ['2023-01-01', '2023-04-01', '2023-07-01', '2023-10-01']*5,
                   'total_cases_per_million': np.random.random(20),
                  })
df['date'] = pd.to_datetime(df.date)
# running your exact code
import matplotlib.pyplot as plt
import seaborn as sns
# Group data by quarter and calculate total cases per million
df_total_cases_quarterly = df.groupby(df['date'].dt.quarter).agg({'total_cases_per_million': 'sum'}).reset_index() # df['date'].dt.quarter
sns.set_style('whitegrid')
sns.barplot(x='date', y='total_cases_per_million', data=df_total_cases_quarterly)
plt.title('Total Cases per Million (Worldwide) - Quarterly')
plt.xlabel('Quarter')
plt.ylabel('Total Cases per Million')
plt.xticks(ticks=[0, 1, 2, 3], labels=['Q1', 'Q2', 'Q3', 'Q4']) # Update x-axis labels to show quarters
plt.show()
Output: [bar chart of total cases per million for Q1 to Q4; image not included]
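If the code really is unchanged, one thing worth checking is whether 'date' is still a column when the groupby line runs. A KeyError raised by df['date'] means the label is not in df.columns at that moment. The sketch below assumes two possible causes that the question does not confirm: an earlier cell moved the column into the index (for example with set_index('date')), or the header contains stray whitespace. The helper name ensure_date_column is hypothetical.

```python
import pandas as pd


def ensure_date_column(df: pd.DataFrame, name: str = "date") -> pd.DataFrame:
    """Return a copy of df where `name` is a regular datetime column.

    Hypothetical helper covering two assumed causes of the KeyError:
    the column was moved into the index, or the header has whitespace.
    """
    out = df.copy()
    # Strip stray whitespace from string column labels, e.g. ' date'
    out.columns = [c.strip() if isinstance(c, str) else c for c in out.columns]
    if name not in out.columns:
        if name in out.index.names:
            # The column was moved into the index earlier; bring it back
            out = out.reset_index(level=name)
        else:
            raise KeyError(f"{name!r} not found; columns are {list(out.columns)}")
    out[name] = pd.to_datetime(out[name])
    return out


# Simulate the assumed situation: 'date' was set as the index in an earlier cell
raw = pd.DataFrame({
    "date": ["2023-01-01", "2023-04-01"],
    "total_cases_per_million": [1.0, 2.0],
}).set_index("date")

print("date" in raw.columns)  # membership check; raw['date'] would raise KeyError here
fixed = ensure_date_column(raw)
print(list(fixed.columns))
```

Printing df.columns and df.index.names right before the failing line is usually enough to tell which case applies; the helper only automates the repair for these two assumed cases.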
Answer 2
Score: 0
You can overwrite the date column with the quarter number before calling groupby:
df_total_cases_quarterly = (df.assign(date=df['date'].dt.quarter)
                              .groupby('date')
                              .agg({'total_cases_per_million': 'sum'})
                              .reset_index())
Or set the index name with DataFrame.rename_axis before resetting the index:
df_total_cases_quarterly = (df.groupby(df['date'].dt.quarter)
                              .agg({'total_cases_per_million': 'sum'})
                              .rename_axis('date')
                              .reset_index())
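As a quick check, the two variants can be run side by side on a small made-up frame. The data below is illustrative only and is not taken from the question's dataset; the names by_assign and by_rename are placeholders. Both results should contain a 'date' column holding the quarter number, which is what sns.barplot(x='date', ...) needs.

```python
import pandas as pd

# Made-up data: one row per quarter start, repeated twice
df = pd.DataFrame({
    "date": pd.to_datetime(
        ["2023-01-01", "2023-04-01", "2023-07-01", "2023-10-01"] * 2
    ),
    "total_cases_per_million": [1.0, 2.0, 3.0, 4.0, 10.0, 20.0, 30.0, 40.0],
})

# Variant 1: overwrite 'date' with the quarter number, then group by name
by_assign = (df.assign(date=df["date"].dt.quarter)
               .groupby("date")
               .agg({"total_cases_per_million": "sum"})
               .reset_index())

# Variant 2: group by the quarter Series and name the index explicitly
by_rename = (df.groupby(df["date"].dt.quarter)
               .agg({"total_cases_per_million": "sum"})
               .rename_axis("date")
               .reset_index())

print(by_assign)
print(by_rename)
```

Neither variant helps if 'date' is missing from df.columns in the first place, since both still evaluate df['date'] before grouping.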