Pandas groupby(pd.Grouper) is throwing an error for datetime, but I'm running it on a datetime object

Question

I'm using pandas in Python and am trying to group a set of dates by month and determine the highest value in the dates_and_grades["Grade_Values"] column for each month. I wrote the following code to attempt this:

    import pandas as pd

    # input_filepath and the three helper functions mapped over 'Rating' below
    # are defined elsewhere and are not shown here.
    data = pd.read_csv(input_filepath)
    data['Date'] = pd.to_datetime(data['Date'], format='ISO8601')
    roped = ["Sport", "Trad"]
    YDS_DICT = {"N/A": "N/A", '3-4': 0, '5': 1, '5.0': 1, '5.1': 2, '5.2': 3, '5.3': 4, '5.4': 5,
                '5.5': 6, '5.6': 7, '5.7': 8, '5.8': 9, '5.9': 10,
                '5.10a': 11, '5.10b': 12, '5.10': 12, '5.10c': 13, '5.10d': 14,
                '5.11a': 15, '5.11b': 16, '5.11': 16, '5.11c': 17, '5.11d': 18,
                '5.12a': 19, '5.12b': 20, '5.12c': 21, '5.12d': 22,
                '5.13a': 23, '5.13b': 24, '5.13c': 25, '5.13d': 26,
                '5.14a': 27, '5.14b': 28, '5.14c': 29, '5.14d': 30,
                '5.15a': 31, '5.15b': 32, '5.15c': 33, '5.15d': 34}
    roped_only_naive = data.loc[data['Route Type'].isin(roped)].copy()
    roped_only_naive["Rating"] = roped_only_naive['Rating'].map(slash_grade_converter)
    roped_only_naive["Rating"] = roped_only_naive['Rating'].map(flatten_plus_and_minus_grades)
    roped_only_naive["Rating"] = roped_only_naive['Rating'].map(remove_risk_ratings)
    dates_and_grades = roped_only_naive[['Date', 'Rating']]
    print(dates_and_grades.dtypes)
    dates_and_grades["Grade_Values"] = dates_and_grades["Rating"].map(lambda data: YDS_DICT[data])
    print(dates_and_grades.dtypes)
    # This is the line that raises the TypeError described below.
    dates_and_grades['Date'] = dates_and_grades['Date'].groupby(pd.Grouper(freq='M'))
    print(dates_and_grades)

However, I get the following error when I run it:

    TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'Index'

What is strange is that when I check the types on my dataframe using

    print(dates_and_grades.dtypes)

I get the following printout:

    Date            datetime64[ns]
    Rating                  object
    Grade_Values             int64

So it looks like my Date column is indeed a datetime object.

My question, then, is: why doesn't groupby(pd.Grouper(freq='M')) work on my dates_and_grades['Date'] column if dates_and_grades['Date'] really is a datetime type?

Answer 1

Score: 1

Use the key parameter in Grouper. Also, the output of groupby cannot be assigned to a new column unless the per-group result is broadcast back to the original rows with a function such as transform. For a new column:

    dates_and_grades['new'] = dates_and_grades.groupby(pd.Grouper(freq='M', key='Date'))["Grade_Values"].transform('max')
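
As a rough illustration of what transform('max') produces, here is a minimal sketch on made-up data. The sample rows and the Monthly_Max column name are assumptions for the example, not part of the original question. The offset object pd.offsets.MonthEnd() stands in for the 'M' string, since recent pandas releases (2.2 onward, as far as I know) deprecate that alias in favor of 'ME'.

```python
import pandas as pd

# Hypothetical sample data standing in for dates_and_grades.
dates_and_grades = pd.DataFrame({
    "Date": pd.to_datetime(["2023-01-05", "2023-01-20", "2023-02-03", "2023-02-17"]),
    "Grade_Values": [9, 12, 15, 11],
})

# key='Date' tells Grouper to bin on the Date column instead of the index.
monthly = pd.Grouper(key="Date", freq=pd.offsets.MonthEnd())

# transform returns one value per original row (the maximum of that row's month),
# so the result lines up with the frame and can be assigned as a column.
dates_and_grades["Monthly_Max"] = (
    dates_and_grades.groupby(monthly)["Grade_Values"].transform("max")
)
print(dates_and_grades)
```

Each row keeps its own Grade_Values and gains the maximum of its month next to it, which is why the assignment works here while assigning the bare groupby object does not.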

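If you omit the key parameter, Grouper groups on the index of the object and requires it to be a DatetimeIndex (or a TimedeltaIndex or PeriodIndex). The Date column has dtype datetime64[ns], but the index is still a plain Index, so the error is expected.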
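
The difference between the column dtype and the index type can be checked directly. The following is a small sketch on made-up data (the frame and variable names are assumptions for illustration). It reproduces the TypeError without key, then shows that the same Grouper works once Date is moved into the index.

```python
import pandas as pd

# Hypothetical two-row frame with a datetime column and a default integer index.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2023-01-05", "2023-02-03"]),
    "Grade_Values": [9, 15],
})

print(df["Date"].dtype)   # the column holds datetimes
print(type(df.index))     # but the index is a RangeIndex, not a DatetimeIndex

# MonthEnd() is used instead of the 'M' alias, which newer pandas deprecates.
month_end = pd.offsets.MonthEnd()

# Without key, Grouper looks at the index, so this fails.
try:
    df["Date"].groupby(pd.Grouper(freq=month_end)).max()
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# With Date as the index, the same Grouper has a DatetimeIndex to work with.
by_index = (
    df.set_index("Date")
      .groupby(pd.Grouper(freq=month_end))["Grade_Values"]
      .max()
)
print(by_index)
```

Passing key='Date' and calling set_index('Date') are two routes to the same grouping. The key form leaves the original index untouched, which is usually convenient when the result is assigned back to the frame.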

If you need the rows with the maximum Grade_Values for each month, use:

    out = dates_and_grades.loc[dates_and_grades.groupby(pd.Grouper(freq='M', key='Date'))["Grade_Values"].idxmax()]
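
To make the idxmax approach concrete, here is a minimal sketch on made-up data. The sample rows are assumptions, and every month in the range has at least one row. Months with no rows at all are not covered here and may need separate handling. As an illustrative substitution, pd.offsets.MonthEnd() replaces the 'M' string, which newer pandas versions deprecate.

```python
import pandas as pd

# Hypothetical sample data; every month in the range has at least one row.
dates_and_grades = pd.DataFrame({
    "Date": pd.to_datetime(["2023-01-05", "2023-01-20", "2023-02-03", "2023-02-17"]),
    "Rating": ["5.8", "5.10b", "5.11a", "5.10a"],
    "Grade_Values": [9, 12, 15, 11],
})

monthly = pd.Grouper(key="Date", freq=pd.offsets.MonthEnd())

# idxmax returns, for each month, the index label of the row with the largest value.
best_idx = dates_and_grades.groupby(monthly)["Grade_Values"].idxmax()

# .loc with those labels pulls the complete rows, keeping Date and Rating.
out = dates_and_grades.loc[best_idx]
print(out)
```

Unlike transform('max'), this keeps only one row per month, so the Date and Rating of the hardest climb in each month stay attached to the value.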

huangapple
  • Published on June 15, 2023 at 14:19:58
  • Link to this article: https://go.coder-hub.com/76479625.html