Pandas groupby(pd.Grouper) is throwing an error for datetime, but I'm running it on a datetime object

Question

I'm using pandas in Python and am trying to group a set of dates by month and determine the highest value in the dates_and_grades["Grade_Values"] column for each month. I wrote the following code to attempt this:

data = pd.read_csv(input_filepath)
data['Date'] = pd.to_datetime(data['Date'], format='ISO8601')

roped = ["Sport", "Trad"]

YDS_DICT={"N/A":"N/A",'3-4':0,'5':1,'5.0':1,'5.1':2,'5.2':3,'5.3':4,'5.4':5,
      '5.5':6,'5.6':7,'5.7':8,'5.8':9,'5.9':10,
      '5.10a':11,'5.10b':12, '5.10': 12, '5.10c':13,'5.10d':14,
      '5.11a':15,'5.11b':16, '5.11':16, '5.11c':17,'5.11d':18,
      '5.12a':19,'5.12b':20,'5.12c':21,'5.12d':22,
      '5.13a':23,'5.13b':24,'5.13c':25,'5.13d':26,
      '5.14a':27,'5.14b':28,'5.14c':29,'5.14d':30,
      '5.15a':31,'5.15b':32,'5.15c':33,'5.15d':34}

roped_only_naive = data.loc[data['Route Type'].isin(roped)].copy()
roped_only_naive["Rating"] = roped_only_naive['Rating'].map(slash_grade_converter)
roped_only_naive["Rating"] = roped_only_naive['Rating'].map(flatten_plus_and_minus_grades)
roped_only_naive["Rating"] = roped_only_naive['Rating'].map(remove_risk_ratings)
dates_and_grades = roped_only_naive[['Date', 'Rating']]
print(dates_and_grades.dtypes)
dates_and_grades["Grade_Values"] = dates_and_grades["Rating"].map(lambda data: YDS_DICT[data])
print(dates_and_grades.dtypes)
dates_and_grades['Date'] = dates_and_grades['Date'].groupby(pd.Grouper(freq='M'))
print(dates_and_grades)

However, I get the following error when I run it:

TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'Index'

What is strange is that when I check the dtypes of my DataFrame using

print(dates_and_grades.dtypes)

I get the following printout:

Date            datetime64[ns]
Rating                  object
Grade_Values             int64

So it looks like my Date column does hold datetime values.

My question, then: why doesn't groupby(pd.Grouper(freq='M')) work on my dates_and_grades['Date'] column if dates_and_grades['Date'] really does have a datetime dtype?
Answer 1

Score: 1

Pass the key parameter to pd.Grouper. Also, the output of groupby cannot be assigned to a new column unless you apply a transformation function. To store the monthly maximum in a new column:

dates_and_grades['new'] = dates_and_grades.groupby(pd.Grouper(freq='M', key='Date'))["Grade_Values"].transform('max')

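If the key parameter is omitted, Grouper requires a DatetimeIndex, so the error is expected: dates_and_grades['Date'] holds datetime values, but its index is a plain, non-datetime Index, and Grouper without key groups on the index rather than on the values.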
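The difference between grouping on the index and grouping on a column can be seen in a small sketch. The sample data below is hypothetical and only stands in for dates_and_grades. The sketch uses pd.offsets.MonthEnd() in place of the 'M' alias, on the assumption that it gives the same month-end bins, because newer pandas releases rename that alias to 'ME'.

```python
import pandas as pd

# Hypothetical sample data standing in for dates_and_grades.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2023-01-05", "2023-01-20", "2023-02-03", "2023-02-25"]),
    "Grade_Values": [9, 12, 15, 11],
})

# Month-end frequency object, used here instead of the 'M' string alias.
month_end = pd.offsets.MonthEnd()

# Without key=, Grouper inspects the index (a RangeIndex here), not the values,
# so this fails even though the values are datetime64.
try:
    df["Date"].groupby(pd.Grouper(freq=month_end)).max()
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# Option 1: point Grouper at the datetime column with key=.
by_key = df.groupby(pd.Grouper(key="Date", freq=month_end))["Grade_Values"].max()

# Option 2: move the dates into the index, so no key= is needed.
by_index = df.set_index("Date").groupby(pd.Grouper(freq=month_end))["Grade_Values"].max()

print(by_key)
print(by_index)
```

Both options should give the same monthly maxima. Which one to prefer mostly depends on whether the rest of the code expects Date to remain a regular column.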

If you need the rows with the maximum Grade_Values for each month, use:

out = dates_and_grades.loc[dates_and_grades.groupby(pd.Grouper(freq='M', key='Date'))["Grade_Values"].idxmax()]
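As a rough illustration of how the two approaches in this answer differ, here is a minimal sketch on hypothetical data (the rows and the Monthly_Max column name are made up for the example). transform('max') keeps one value per original row, while idxmax() returns one row label per month. As an assumption for portability across pandas versions, pd.offsets.MonthEnd() is used in place of the 'M' alias.

```python
import pandas as pd

# Hypothetical sample data standing in for dates_and_grades.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2023-01-05", "2023-01-20", "2023-02-03", "2023-02-25"]),
    "Rating": ["5.8", "5.10b", "5.11a", "5.10a"],
    "Grade_Values": [9, 12, 15, 11],
})

# Group the Grade_Values column by calendar month of the Date column.
monthly = df.groupby(pd.Grouper(key="Date", freq=pd.offsets.MonthEnd()))["Grade_Values"]

# transform('max') returns a result aligned to the original rows,
# so it can be assigned directly as a new column.
df["Monthly_Max"] = monthly.transform("max")

# idxmax() returns the index label of the hardest climb in each month;
# .loc then selects those full rows.
best_rows = df.loc[monthly.idxmax()]

print(df)
print(best_rows)
```

If the date range contains months with no rows at all, those empty bins may need separate handling before the .loc lookup, since they have no row label to return.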

huangapple
  • Posted on June 15, 2023 at 14:19:58
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76479625.html