KeyError: 'date' - I don't know why I keep getting this error

Question

Can anyone help with this error? I am trying to plot high-level charts to get an idea of the various developments in a Covid-19 dataset that I'm using.

# Convert date column to datetime type
df['date'] = pd.to_datetime(df.date)

# Group data by quarter and calculate total cases per million
df_total_cases_quarterly = df.groupby(df['date'].dt.quarter).agg({'total_cases_per_million': 'sum'}).reset_index() # df['date'].dt.quarter

# Create bar plot for total cases per million on a quarterly basis
sns.set_style('whitegrid')
sns.barplot(x ='date', y = 'total_cases_per_million', data = df_total_cases_quarterly)
plt.title('Total Cases per Million (Worldwide) - Quarterly')
plt.xlabel('Quarter')
plt.ylabel('Total Cases per Million')
plt.xticks(ticks=[0, 1, 2, 3], labels=['Q1', 'Q2', 'Q3', 'Q4'])  # Update x-axis labels to show quarters
plt.show()

This is the error I get whenever I run the code above:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/opt/anaconda3/lib/python3.9/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   3628             try:
-> 3629                 return self._engine.get_loc(casted_key)
   3630             except KeyError as err:

/opt/anaconda3/lib/python3.9/site-packages/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

/opt/anaconda3/lib/python3.9/site-packages/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'date'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
/var/folders/rf/8yc43r0d13l2gw8m9r17pc7h0000gn/T/ipykernel_942/1110610808.py in <module>
      3 
      4 # Group data by quarter and calculate total cases per million
----> 5 df_total_cases_quarterly = df.groupby(df['date'].dt.quarter).agg({'total_cases_per_million': 'sum'}).reset_index() # df['date'].dt.quarter
      6 
      7 # Create bar plot for total cases per million on a quarterly basis

/opt/anaconda3/lib/python3.9/site-packages/pandas/core/frame.py in __getitem__(self, key)
   3503             if self.columns.nlevels > 1:
   3504                 return self._getitem_multilevel(key)
-> 3505             indexer = self.columns.get_loc(key)
   3506             if is_integer(indexer):
   3507                 indexer = [indexer]

/opt/anaconda3/lib/python3.9/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   3629                 return self._engine.get_loc(casted_key)
   3630             except KeyError as err:
-> 3631                 raise KeyError(key) from err
   3632             except TypeError:
   3633                 # If we have a listlike key, _check_indexing_error will raise

KeyError: 'date'

I grouped a COVID-19 dataset by quarter and calculated the total cases per million so that the result can be displayed using seaborn. The date column is not missing: I used it earlier in the code, as shown below, and that code ran successfully, so I am not sure why I am getting the error above. I would like to know what is causing the error and how to fix it. Thanks for your help!

This code runs successfully:

# Filter data for daily new deaths per million
df_daily_new_deaths = df.groupby('date').agg({'new_deaths_per_million': 'sum'}).reset_index()

# Create line plot for daily new deaths per million
sns.set_style('whitegrid')
sns.lineplot(x = 'date', y = 'new_deaths_per_million', data = df_daily_new_deaths)
plt.title('Daily New Deaths per Million (Worldwide)')
plt.xlabel('Date')
plt.ylabel('Daily New Deaths per Million')
plt.xticks(rotation = 45)
plt.show()

Answer 1

Score: 1

Your code should work fine: accessing dt.quarter shouldn't change the column name. You are probably doing something else that you are not showing here, or perhaps using an old version of pandas with a bug.

This was tested on both Python 3.8 + pandas 1.5.2 and Python 3.11 + pandas 2.0.0.

Example:

import pandas as pd
import numpy as np

# set up dummy data
np.random.seed(0)
df = pd.DataFrame({'date': ['2023-01-01', '2023-04-01', '2023-07-01', '2023-10-01']*5,
                   'total_cases_per_million': np.random.random(20),
                  })
df['date'] = pd.to_datetime(df.date)

# running your exact code

import matplotlib.pyplot as plt  # needed for the plt.* calls below
import seaborn as sns

# Group data by quarter and calculate total cases per million
df_total_cases_quarterly = df.groupby(df['date'].dt.quarter).agg({'total_cases_per_million': 'sum'}).reset_index() # df['date'].dt.quarter

sns.set_style('whitegrid')
sns.barplot(x='date', y='total_cases_per_million', data=df_total_cases_quarterly)
plt.title('Total Cases per Million (Worldwide) - Quarterly')
plt.xlabel('Quarter')
plt.ylabel('Total Cases per Million')
plt.xticks(ticks=[0, 1, 2, 3], labels=['Q1', 'Q2', 'Q3', 'Q4'])  # Update x-axis labels to show quarters
plt.show()

Output:

[Image: the resulting bar plot]
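If the error persists, it may help to check whether df still has a date column at the point where the groupby runs. The sketch below uses a small made-up frame (the asker's real dataset is not shown) to illustrate two things: the quarter Series keeps the name date, so reset_index() produces a date column, and one possible cause of the KeyError, assumed here purely for illustration, is that an earlier cell moved date into the index.

```python
import pandas as pd

# Hypothetical stand-in for the asker's data; the real dataset is not shown.
df = pd.DataFrame({
    "date": pd.to_datetime(["2023-01-15", "2023-04-15", "2023-07-15", "2023-10-15"]),
    "total_cases_per_million": [1.0, 2.0, 3.0, 4.0],
})

# The grouping key is a Series named 'date', so reset_index() yields a 'date' column.
quarterly = (
    df.groupby(df["date"].dt.quarter)
      .agg({"total_cases_per_million": "sum"})
      .reset_index()
)
print(list(quarterly.columns))

# Assumed cause, for illustration only: an earlier step moved 'date' into the index.
indexed = df.set_index("date")
has_date_column = "date" in indexed.columns     # df['date'] would raise KeyError here
has_date_index = indexed.index.name == "date"

# Restoring the column makes indexed-style frames usable with df['date'] again.
restored = indexed.reset_index()
print(has_date_column, has_date_index, "date" in restored.columns)
```

Printing df.columns right before the failing line is usually the quickest way to confirm or rule out this kind of cause.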

Answer 2

Score: 0

You can overwrite the date column with its quarter before the groupby:

df_total_cases_quarterly = (df.assign(date=df['date'].dt.quarter)
                              .groupby('date')
                              .agg({'total_cases_per_million': 'sum'})
                              .reset_index())

Or set the index name with DataFrame.rename_axis:

df_total_cases_quarterly = (df.groupby(df['date'].dt.quarter)
                              .agg({'total_cases_per_million': 'sum'})
                              .rename_axis('date')
                              .reset_index())
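As a rough check, the two variants can be run side by side on a small made-up frame (the values below are illustrative, not taken from the asker's dataset). Both should end up with a date column holding the quarter number, which is what sns.barplot(x='date', ...) expects.

```python
import pandas as pd

# Illustrative data only; two rows fall in Q1 so the sum is visible.
df = pd.DataFrame({
    "date": pd.to_datetime(
        ["2023-01-01", "2023-02-01", "2023-04-01", "2023-07-01", "2023-10-01"]
    ),
    "total_cases_per_million": [1.0, 2.0, 3.0, 4.0, 5.0],
})

# Variant 1: overwrite 'date' with the quarter, then group by the column name.
via_assign = (
    df.assign(date=df["date"].dt.quarter)
      .groupby("date")
      .agg({"total_cases_per_million": "sum"})
      .reset_index()
)

# Variant 2: group by the quarter Series and name the resulting index 'date'.
via_rename = (
    df.groupby(df["date"].dt.quarter)
      .agg({"total_cases_per_million": "sum"})
      .rename_axis("date")
      .reset_index()
)

print(via_assign)
print(via_rename)
```

Neither variant modifies df itself, so the original datetime column stays available for other plots.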

huangapple
  • Posted on April 13, 2023 at 15:43:11
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76002860.html