2023-08-05 03:58:23


Converting timestamp to date and to period

Question


I know this example code might look inefficient, but on the original DataFrame I must apply a regex function to a column, so I need to go through iterrows.

My question is: how do I convert the date column of the data3 DataFrame to a period ('M') column? I want to run a groupby on it.

import pandas as pd
import datetime as dt
ts = dt.datetime.now()
data = pd.DataFrame({
    'status' :  ['pending', 'pending','pending'],
    'brand' : ['brand_1', 'brand_2', 'brand_3'],
    'date' : [pd.Timestamp(dt.datetime.now()),pd.Timestamp(dt.datetime.now()),pd.Timestamp(dt.datetime.now())]})
data2 = list()
for index, row in data.iterrows():
    a = row['status']
    b = row['brand']
    c = row['date'].date()
    data2.append((a,b,c))
data3 = pd.DataFrame(data=data2,columns=['status','brand','date'])

    status    brand        date
0  pending  brand_1  2023-08-04
1  pending  brand_2  2023-08-04
2  pending  brand_3  2023-08-04

But when I then try to run a groupby, I can't get it to work.

a = data3.groupby([data3['date'].dt.to_period('M')], observed=True).aggregate({'brand':'count'})

AttributeError                            Traceback (most recent call last)
Cell In[26], line 1
----> 1 a = data3.groupby([data3['date'].dt.to_period('M')], observed=True).aggregate({'brand':'count'})
      2 a
File c:\Users\fabio\AppData\Local\Programs\Python\Python311\Lib\site-packages\pandas\core\generic.py:5989, in NDFrame.__getattr__(self, name)
   5982 if (
   5983     name not in self._internal_names_set
   5984     and name not in self._metadata
   5985     and name not in self._accessors
   5986     and self._info_axis._can_hold_identifiers_and_holds_name(name)
   5987 ):
   5988     return self[name]
-> 5989 return object.__getattribute__(self, name)
File c:\Users\fabio\AppData\Local\Programs\Python\Python311\Lib\site-packages\pandas\core\accessor.py:224, in CachedAccessor.__get__(self, obj, cls)
    221 if obj is None:
    222     # we're accessing the attribute of the class, i.e., Dataset.geo
    223     return self._accessor
--> 224 accessor_obj = self._accessor(obj)
    225 # Replace the property with the accessor object. Inspired by:
    226 # https://www.pydanny.com/cached-property.html
    227 # We need to use object.__setattr__ because we overwrite __setattr__ on
    228 # NDFrame
...
    577 elif is_period_dtype(data.dtype):
    578     return PeriodProperties(data, orig)
--> 580 raise AttributeError("Can only use .dt accessor with datetimelike values")
AttributeError: Can only use .dt accessor with datetimelike values

Answer 1

Score: 1

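Calling .date() on each Timestamp returns a plain Python datetime.date object, so the date column of data3 ends up with object dtype and the .dt accessor is not available. Remove .date() in your for loop to keep the column as datetime64[ns], like this: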

data2 = list()
for index, row in data.iterrows():
    ...
    c = row['date'] # <-- removed `.date()`
    ...
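
To see why the original loop breaks the groupby, the following minimal sketch compares the two column types. The frame names with_date and with_ts and the fixed timestamp are illustrative assumptions, not part of the question's code.

```python
import pandas as pd

# Fixed timestamp so the sketch is reproducible (the question uses now()).
ts = pd.Timestamp("2023-08-04 12:00:00")

# Hypothetical frames: one built from datetime.date objects, one from Timestamps.
with_date = pd.DataFrame({"date": [ts.date()] * 3})
with_ts = pd.DataFrame({"date": [ts] * 3})

# datetime.date values are stored as generic Python objects.
is_object = pd.api.types.is_object_dtype(with_date["date"])
# Timestamps are stored in a native datetime64 column.
is_datetime = pd.api.types.is_datetime64_any_dtype(with_ts["date"])
print(is_object, is_datetime)

# The .dt accessor is only defined for datetime-like columns.
try:
    with_date["date"].dt.to_period("M")
    raised = False
except AttributeError as exc:
    raised = True
    print(exc)
```

Checking the dtype of the column before reaching for .dt is usually the quickest way to diagnose this error.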

Now this works:

data3.groupby([data3['date'].dt.to_period('M')], observed=True).aggregate({'brand':'count'})

Output:

         brand
date
2023-08      3
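
If the loop cannot be changed and data3 already holds datetime.date objects, one alternative is to convert the column back with pd.to_datetime before using the .dt accessor. This is a sketch under that assumption; the fixed date and the names month and result are illustrative, and observed=True is omitted because it only affects categorical group keys.

```python
import datetime as dt
import pandas as pd

# data3 as produced by the question's loop: the date column has object dtype.
data3 = pd.DataFrame({
    "status": ["pending", "pending", "pending"],
    "brand": ["brand_1", "brand_2", "brand_3"],
    "date": [dt.date(2023, 8, 4)] * 3,
})

# Convert the date objects back to datetime64, then derive a monthly period.
month = pd.to_datetime(data3["date"]).dt.to_period("M")

# Group by the monthly period and count brands per month.
result = data3.groupby(month).agg({"brand": "count"})
print(result)
```

Assigning the converted values back to the column (data3["date"] = pd.to_datetime(data3["date"])) would make the original groupby call work unchanged.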

