Converting timestamp to date and to period

Question

I know this example code might look inefficient, but on the original DataFrame I must apply a regex function to a column, so I need to do it through iterrows.

My question is: how do I convert the date column of the data3 DataFrame to a period ('M') column? I want to run a groupby on it.

import pandas as pd
import datetime as dt
ts = dt.datetime.now()

data = pd.DataFrame({
    'status' :  ['pending', 'pending','pending'],
    'brand' : ['brand_1', 'brand_2', 'brand_3'],
    'date' : [pd.Timestamp(dt.datetime.now()),pd.Timestamp(dt.datetime.now()),pd.Timestamp(dt.datetime.now())]})

data2 = list()
for index, row in data.iterrows():
    a = row['status']
    b = row['brand']
    c = row['date'].date()
    data2.append((a,b,c))

data3 = pd.DataFrame(data=data2,columns=['status','brand','date'])
    status    brand        date
0  pending  brand_1  2023-08-04
1  pending  brand_2  2023-08-04
2  pending  brand_3  2023-08-04

But when I try to run a groupby, I can't get it to work.

a = data3.groupby([data3['date'].dt.to_period('M')], observed=True).aggregate({'brand':'count'})

AttributeError                            Traceback (most recent call last)
Cell In[26], line 1
----> 1 a = data3.groupby([data3['date'].dt.to_period('M')], observed=True).aggregate({'brand':'count'})
      2 a

File c:\Users\fabio\AppData\Local\Programs\Python\Python311\Lib\site-packages\pandas\core\generic.py:5989, in NDFrame.__getattr__(self, name)
   5982 if (
   5983     name not in self._internal_names_set
   5984     and name not in self._metadata
   5985     and name not in self._accessors
   5986     and self._info_axis._can_hold_identifiers_and_holds_name(name)
   5987 ):
   5988     return self[name]
-> 5989 return object.__getattribute__(self, name)

File c:\Users\fabio\AppData\Local\Programs\Python\Python311\Lib\site-packages\pandas\core\accessor.py:224, in CachedAccessor.__get__(self, obj, cls)
    221 if obj is None:
    222     # we're accessing the attribute of the class, i.e., Dataset.geo
    223     return self._accessor
--> 224 accessor_obj = self._accessor(obj)
    225 # Replace the property with the accessor object. Inspired by:
    226 # https://www.pydanny.com/cached-property.html
    227 # We need to use object.__setattr__ because we overwrite __setattr__ on
    228 # NDFrame
...
    577 elif is_period_dtype(data.dtype):
    578     return PeriodProperties(data, orig)
--> 580 raise AttributeError("Can only use .dt accessor with datetimelike values")

AttributeError: Can only use .dt accessor with datetimelike values

Answer 1

Score: 1

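Remove .date() in your for loop so the column keeps its datetime64[ns] dtype. Calling .date() returns plain Python datetime.date objects, which pandas stores in an object column, and the .dt accessor does not work on object columns. The loop becomes: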

data2 = list()
for index, row in data.iterrows():
    ...
    c = row['date'] # <-- removed `.date()`
    ...
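To see the difference, a quick dtype check can help. The sketch below is only an illustration: the fixed timestamp and the names kept and dropped are assumptions made for this example, not part of the original code.

```python
import pandas as pd

# Fixed timestamp, assumed here so the example is reproducible
ts = pd.Timestamp("2023-08-04 10:30:00")

# Built from pd.Timestamp values: pandas infers a datetime64 dtype
kept = pd.DataFrame({"date": [ts, ts, ts]})

# Built from .date() results: the values are datetime.date objects,
# so the column falls back to object dtype
dropped = pd.DataFrame({"date": [ts.date(), ts.date(), ts.date()]})

kept_is_datetime = pd.api.types.is_datetime64_any_dtype(kept["date"])
dropped_is_object = dropped["date"].dtype == object

print(kept["date"].dtype)     # a datetime64 dtype
print(dropped["date"].dtype)  # object
```

Checking data3['date'].dtype in the same way before calling .dt is a cheap way to confirm which case applies.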

Now this works:

data3.groupby([data3['date'].dt.to_period('M')], observed=True).aggregate({'brand':'count'})

Output:

         brand
date
2023-08      3
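If the loop has to keep producing datetime.date values, another option is to convert the column back with pd.to_datetime before grouping. This is a minimal sketch of that alternative, not the approach given in the answer; the sample frame below is rebuilt by hand with a fixed date, and the name monthly is chosen only for this example.

```python
import datetime as dt

import pandas as pd

# Hand-built stand-in for data3: the date column holds datetime.date
# objects, so it starts out with object dtype
data3 = pd.DataFrame({
    "status": ["pending", "pending", "pending"],
    "brand": ["brand_1", "brand_2", "brand_3"],
    "date": [dt.date(2023, 8, 4)] * 3,
})

# Convert the object column back to a datetime64 column
data3["date"] = pd.to_datetime(data3["date"])

# The .dt accessor is available again, so the monthly period works as a group key
monthly = data3.groupby(data3["date"].dt.to_period("M")).agg({"brand": "count"})
print(monthly)
```

The observed=True argument is left out here because it only affects categorical group keys; keeping it should not change the result for a period key.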

huangapple
  • Published on August 5, 2023, 03:58:23
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76838830.html