Converting a timestamp to a date and to a period

Question

I know this example code might look inefficient, but on the original DataFrame I must apply a regex function to a column, so I need to do it through iterrows.

My question is how to convert the date column of the data3 DataFrame to a period ('M') column, because I want to run a groupby.

    import pandas as pd
    import datetime as dt

    ts = dt.datetime.now()

    data = pd.DataFrame({
        'status': ['pending', 'pending', 'pending'],
        'brand': ['brand_1', 'brand_2', 'brand_3'],
        'date': [pd.Timestamp(dt.datetime.now()), pd.Timestamp(dt.datetime.now()), pd.Timestamp(dt.datetime.now())]})

    data2 = list()
    for index, row in data.iterrows():
        a = row['status']
        b = row['brand']
        c = row['date'].date()
        data2.append((a, b, c))

    data3 = pd.DataFrame(data=data2, columns=['status', 'brand', 'date'])

This gives data3:

        status    brand        date
    0  pending  brand_1  2023-08-04
    1  pending  brand_2  2023-08-04
    2  pending  brand_3  2023-08-04

But when I then try to run a groupby, I can't get it to work.

    a = data3.groupby([data3['date'].dt.to_period('M')], observed=True).aggregate({'brand':'count'})

The error:

    AttributeError                            Traceback (most recent call last)
    Cell In[26], line 1
    ----> 1 a = data3.groupby([data3['date'].dt.to_period('M')], observed=True).aggregate({'brand':'count'})
          2 a

    File c:\Users\fabio\AppData\Local\Programs\Python\Python311\Lib\site-packages\pandas\core\generic.py:5989, in NDFrame.__getattr__(self, name)
       5982 if (
       5983     name not in self._internal_names_set
       5984     and name not in self._metadata
       5985     and name not in self._accessors
       5986     and self._info_axis._can_hold_identifiers_and_holds_name(name)
       5987 ):
       5988     return self[name]
    -> 5989 return object.__getattribute__(self, name)

    File c:\Users\fabio\AppData\Local\Programs\Python\Python311\Lib\site-packages\pandas\core\accessor.py:224, in CachedAccessor.__get__(self, obj, cls)
        221 if obj is None:
        222     # we're accessing the attribute of the class, i.e., Dataset.geo
        223     return self._accessor
    --> 224 accessor_obj = self._accessor(obj)
        225 # Replace the property with the accessor object. Inspired by:
        226 # https://www.pydanny.com/cached-property.html
        227 # We need to use object.__setattr__ because we overwrite __setattr__ on
        228 # NDFrame
    ...
        577 elif is_period_dtype(data.dtype):
        578     return PeriodProperties(data, orig)
    --> 580 raise AttributeError("Can only use .dt accessor with datetimelike values")

    AttributeError: Can only use .dt accessor with datetimelike values

Answer 1

Score: 1

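Remove .date() in your for loop. It returns plain Python datetime.date objects, so the column ends up with object dtype and the .dt accessor is unavailable. Without it, the values stay as pandas Timestamps and the column keeps the datetime64[ns] dtype: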

    data2 = list()
    for index, row in data.iterrows():
        ...
        c = row['date']  # <-- removed `.date()`
        ...
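As a rough illustration of the dtype difference, here is a minimal sketch. The timestamps and the names `with_timestamps`, `with_dates`, and `error_message` are made up for this example and are not from the original post.

```python
import pandas as pd

# Fixed timestamps (illustrative values, not from the original post)
stamps = [pd.Timestamp("2023-08-04 10:00"), pd.Timestamp("2023-08-04 11:30")]

# Storing pd.Timestamp values keeps a datetime64 column
with_timestamps = pd.DataFrame({"date": stamps})

# Storing the result of .date() gives plain datetime.date objects (object dtype)
with_dates = pd.DataFrame({"date": [s.date() for s in stamps]})

timestamps_ok = pd.api.types.is_datetime64_any_dtype(with_timestamps["date"])
dates_ok = pd.api.types.is_datetime64_any_dtype(with_dates["date"])
print(timestamps_ok, dates_ok)

# The .dt accessor is only available on the datetime-like column
error_message = None
try:
    with_dates["date"].dt.to_period("M")
except AttributeError as exc:
    error_message = str(exc)
print(error_message)
```

Checking the dtype this way before calling .dt is a quick way to confirm which of the two cases a column falls into.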

Now this works:

    data3.groupby([data3['date'].dt.to_period('M')], observed=True).aggregate({'brand':'count'})

Output:

             brand
    date
    2023-08      3
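If the date-only values produced by `.date()` need to stay in the column, another option (not part of the answer above) is to convert the column back with `pd.to_datetime` just before using `.dt`. The sketch below assumes a frame shaped like data3; the dates and the helper column name `month` are illustrative only.

```python
import datetime as dt
import pandas as pd

# A frame shaped like data3, with fixed illustrative dates (object dtype)
data3 = pd.DataFrame({
    "status": ["pending", "pending", "pending"],
    "brand": ["brand_1", "brand_2", "brand_3"],
    "date": [dt.date(2023, 8, 4), dt.date(2023, 8, 4), dt.date(2023, 9, 1)],
})

# Convert the datetime.date objects to datetime64, then to monthly periods
data3["month"] = pd.to_datetime(data3["date"]).dt.to_period("M")

# Count brands per month
counts = data3.groupby("month").agg({"brand": "count"})
print(counts)
```

This keeps the original `date` column untouched and groups on the derived period column instead.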


huangapple
  • Published on August 5, 2023 at 03:58:23
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76838830.html