April 13, 2023, 15:43:11

KeyError: 'date' - I don't know why I keep getting this error

Question

Can anyone help with this error? I am trying to plot high-level charts to get an idea of the various developments in a Covid-19 dataset that I'm using.

# Convert date column to datetime type
df['date'] = pd.to_datetime(df.date)

# Group data by quarter and calculate total cases per million
df_total_cases_quarterly = df.groupby(df['date'].dt.quarter).agg({'total_cases_per_million': 'sum'}).reset_index() # df['date'].dt.quarter

# Create bar plot for total cases per million on a quarterly basis
sns.set_style('whitegrid')
sns.barplot(x='date', y='total_cases_per_million', data=df_total_cases_quarterly)
plt.title('Total Cases per Million (Worldwide) - Quarterly')
plt.xlabel('Quarter')
plt.ylabel('Total Cases per Million')
plt.xticks(ticks=[0, 1, 2, 3], labels=['Q1', 'Q2', 'Q3', 'Q4'])  # Update x-axis labels to show quarters
plt.show()

This is the error I get whenever I run the code above:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/opt/anaconda3/lib/python3.9/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   3628             try:
-> 3629                 return self._engine.get_loc(casted_key)
   3630             except KeyError as err:

/opt/anaconda3/lib/python3.9/site-packages/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

/opt/anaconda3/lib/python3.9/site-packages/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'date'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
/var/folders/rf/8yc43r0d13l2gw8m9r17pc7h0000gn/T/ipykernel_942/1110610808.py in <module>
      3 
      4 # Group data by quarter and calculate total cases per million
----> 5 df_total_cases_quarterly = df.groupby(df['date'].dt.quarter).agg({'total_cases_per_million': 'sum'}).reset_index() # df['date'].dt.quarter
      6 
      7 # Create bar plot for total cases per million on a quarterly basis

/opt/anaconda3/lib/python3.9/site-packages/pandas/core/frame.py in __getitem__(self, key)
   3503             if self.columns.nlevels > 1:
   3504                 return self._getitem_multilevel(key)
-> 3505             indexer = self.columns.get_loc(key)
   3506             if is_integer(indexer):
   3507                 indexer = [indexer]

/opt/anaconda3/lib/python3.9/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   3629                 return self._engine.get_loc(casted_key)
   3630             except KeyError as err:
-> 3631                 raise KeyError(key) from err
   3632             except TypeError:
   3633                 # If we have a listlike key, _check_indexing_error will raise

KeyError: 'date'

I grouped a COVID-19 dataset by quarter and calculated the total cases per million so that the result can be displayed using seaborn. The date column is not missing: I used it earlier in the code shown below, which ran successfully, so I am not sure why I am getting the error above. I would like to know what is causing the error and how to fix it. Thanks for your help!

This code runs successfully

# Filter data for daily new deaths per million
df_daily_new_deaths = df.groupby('date').agg({'new_deaths_per_million': 'sum'}).reset_index()

# Create line plot for daily new deaths per million
sns.set_style('whitegrid')
sns.lineplot(x='date', y='new_deaths_per_million', data=df_daily_new_deaths)
plt.title('Daily New Deaths per Million (Worldwide)')
plt.xlabel('Date')
plt.ylabel('Daily New Deaths per Million')
plt.xticks(rotation=45)
plt.show()

Answer 1

Score: 1

Your code should work fine; accessing dt.quarter shouldn't change the column name. You are probably doing something else that is not shown here, or perhaps using an old version of pandas with a bug.

This was tested on both Python 3.8 + pandas 1.5.2 and Python 3.11 + pandas 2.0.0.

Example:

import pandas as pd
import numpy as np

# set up dummy data
np.random.seed(0)
df = pd.DataFrame({'date': ['2023-01-01', '2023-04-01', '2023-07-01', '2023-10-01']*5,
                   'total_cases_per_million': np.random.random(20),
                  })
df['date'] = pd.to_datetime(df.date)

# running your exact code

import matplotlib.pyplot as plt  # needed for the plt.* calls below
import seaborn as sns

# Group data by quarter and calculate total cases per million
df_total_cases_quarterly = df.groupby(df['date'].dt.quarter).agg({'total_cases_per_million': 'sum'}).reset_index() # df['date'].dt.quarter

sns.set_style('whitegrid')
sns.barplot(x='date', y='total_cases_per_million', data=df_total_cases_quarterly)
plt.title('Total Cases per Million (Worldwide) - Quarterly')
plt.xlabel('Quarter')
plt.ylabel('Total Cases per Million')
plt.xticks(ticks=[0, 1, 2, 3], labels=['Q1', 'Q2', 'Q3', 'Q4'])  # Update x-axis labels to show quarters
plt.show()

Output: [bar plot with one bar per quarter, Q1 to Q4; image not preserved]


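The traceback in the question fails on the df['date'] lookup itself, which suggests that at that point the frame has no column with that exact label. Below is a minimal sketch of how one might check for this. It assumes, which the question does not confirm, that date has ended up in the index or that the header differs by whitespace or case. The helper ensure_date_column and the dummy data are hypothetical and only for illustration.

```python
import pandas as pd


def ensure_date_column(df: pd.DataFrame, name: str = "date") -> pd.DataFrame:
    """Hypothetical helper: return a frame in which `name` is a regular column."""
    if name in df.columns:
        return df
    # The label may have been moved into the index (for example by set_index
    # or read_csv(index_col=...)); reset_index turns it back into a column.
    if name in df.index.names:
        return df.reset_index()
    # Stray whitespace or different capitalisation in a CSV header is
    # another possible cause (an assumption, not something the question shows).
    matches = [col for col in df.columns if str(col).strip().lower() == name]
    if matches:
        return df.rename(columns={matches[0]: name})
    raise KeyError(f"{name!r} not found; columns are {list(df.columns)}")


# Dummy data in which 'date' is the index, so df['date'] would raise KeyError.
df = pd.DataFrame({
    "date": ["2023-01-01", "2023-04-01", "2023-07-01", "2023-10-01"],
    "total_cases_per_million": [1.0, 2.0, 3.0, 4.0],
}).set_index("date")

df = ensure_date_column(df)
df["date"] = pd.to_datetime(df["date"])

# Same grouping as in the question; the grouping Series is named 'date',
# so reset_index produces a column with that name.
quarterly = (
    df.groupby(df["date"].dt.quarter)
      .agg({"total_cases_per_million": "sum"})
      .reset_index()
)
print(quarterly.columns.tolist())
```

Printing df.columns and df.index.names right before the failing line is usually enough to tell which of these cases applies.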
Answer 2

Score: 0

You can reassign the date column to its quarter before the groupby:

df_total_cases_quarterly = (df.assign(date=df['date'].dt.quarter)
                              .groupby('date')
                              .agg({'total_cases_per_million': 'sum'})
                              .reset_index())

Or change the index name with DataFrame.rename_axis:

df_total_cases_quarterly = (df.groupby(df['date'].dt.quarter)
                              .agg({'total_cases_per_million': 'sum'})
                              .rename_axis('date')
                              .reset_index())
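As a rough check, the two variants can be run side by side on dummy data to confirm that both produce a frame with a date column holding the quarter number. The random data and the names via_assign and via_rename_axis are assumptions made for illustration; they are not part of the original dataset.

```python
import numpy as np
import pandas as pd

# Dummy data: five rows for each quarter of 2023 (illustrative only).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "date": pd.to_datetime(
        ["2023-01-01", "2023-04-01", "2023-07-01", "2023-10-01"] * 5
    ),
    "total_cases_per_million": rng.random(20),
})

# Variant 1: overwrite 'date' with its quarter, then group by that column.
via_assign = (
    df.assign(date=df["date"].dt.quarter)
      .groupby("date")
      .agg({"total_cases_per_million": "sum"})
      .reset_index()
)

# Variant 2: group by the quarter Series and name the resulting index explicitly.
via_rename_axis = (
    df.groupby(df["date"].dt.quarter)
      .agg({"total_cases_per_million": "sum"})
      .rename_axis("date")
      .reset_index()
)

# Both frames have the same columns, the same quarter keys and the same sums.
pd.testing.assert_frame_equal(via_assign, via_rename_axis)
print(via_assign.columns.tolist())
```

Either result can then be passed to sns.barplot with x='date', as in the question.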
