Converting timestamp to date and to period
Question
I know this example code might look inefficient, but on the original DataFrame I must apply a regex function to a column, so I need to do it through iterrows.
My question is: how do I convert the date column of the data3 DataFrame to a period ('M') column? I want to run a groupby on it.
import pandas as pd
import datetime as dt

ts = dt.datetime.now()
data = pd.DataFrame({
    'status': ['pending', 'pending', 'pending'],
    'brand': ['brand_1', 'brand_2', 'brand_3'],
    'date': [pd.Timestamp(dt.datetime.now()), pd.Timestamp(dt.datetime.now()), pd.Timestamp(dt.datetime.now())]})

data2 = list()
for index, row in data.iterrows():
    a = row['status']
    b = row['brand']
    c = row['date'].date()
    data2.append((a, b, c))

data3 = pd.DataFrame(data=data2, columns=['status', 'brand', 'date'])
    status    brand        date
0  pending  brand_1  2023-08-04
1  pending  brand_2  2023-08-04
2  pending  brand_3  2023-08-04
But when I then try to run a groupby, I can't get it working.
a = data3.groupby([data3['date'].dt.to_period('M')], observed=True).aggregate({'brand':'count'})
AttributeError Traceback (most recent call last)
Cell In[26], line 1
----> 1 a = data3.groupby([data3['date'].dt.to_period('M')], observed=True).aggregate({'brand':'count'})
2 a
File c:\Users\fabio\AppData\Local\Programs\Python\Python311\Lib\site-packages\pandas\core\generic.py:5989, in NDFrame.__getattr__(self, name)
5982 if (
5983 name not in self._internal_names_set
5984 and name not in self._metadata
5985 and name not in self._accessors
5986 and self._info_axis._can_hold_identifiers_and_holds_name(name)
5987 ):
5988 return self[name]
-> 5989 return object.__getattribute__(self, name)
File c:\Users\fabio\AppData\Local\Programs\Python\Python311\Lib\site-packages\pandas\core\accessor.py:224, in CachedAccessor.__get__(self, obj, cls)
221 if obj is None:
222 # we're accessing the attribute of the class, i.e., Dataset.geo
223 return self._accessor
--> 224 accessor_obj = self._accessor(obj)
225 # Replace the property with the accessor object. Inspired by:
226 # https://www.pydanny.com/cached-property.html
227 # We need to use object.__setattr__ because we overwrite __setattr__ on
228 # NDFrame
...
577 elif is_period_dtype(data.dtype):
578 return PeriodProperties(data, orig)
--> 580 raise AttributeError("Can only use .dt accessor with datetimelike values")
AttributeError: Can only use .dt accessor with datetimelike values
Answer 1
Score: 1
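Remove .date() in your for loop so that the column keeps its datetime64[ns] dtype. Calling .date() turns each Timestamp into a plain Python datetime.date, so pandas stores the column as object dtype, and the .dt accessor only works on datetimelike columns. Like this: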
data2 = list()
for index, row in data.iterrows():
    ...
    c = row['date']  # <-- removed `.date()`
    ...
Now this works:
data3.groupby([data3['date'].dt.to_period('M')], observed=True).aggregate({'brand':'count'})
Output:
         brand
date
2023-08      3
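If the loop has to keep calling .date() for some other reason, a possible alternative is to convert the column back with pd.to_datetime before using .dt. The following is a minimal sketch, not part of the original answer: the data3 frame is rebuilt here with a fixed date as a stand-in, and the month column name is hypothetical.

```python
import datetime as dt

import pandas as pd

# Stand-in for data3 from the question: 'date' holds plain datetime.date objects.
data3 = pd.DataFrame({
    'status': ['pending', 'pending', 'pending'],
    'brand': ['brand_1', 'brand_2', 'brand_3'],
    'date': [dt.date(2023, 8, 4)] * 3,
})

# The column is object dtype here, not a datetime dtype, so .dt is unavailable.
print(data3['date'].dtype)

# Convert the date objects back to a datetime dtype, then derive the monthly period.
data3['date'] = pd.to_datetime(data3['date'])
data3['month'] = data3['date'].dt.to_period('M')  # 'month' is a hypothetical helper column

result = data3.groupby('month').aggregate({'brand': 'count'})
print(result)
```

Storing the period in its own column keeps the original dates available, and the groupby can then refer to the column by name.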