2023-02-24 16:46:10

Generate a fake date between two dates in a Python data frame

Question

I have a data frame called "test", shown below, and I would like to generate a random date between the two dates in each row.

id        first_month     last_month
PT1     2011-06-01     2019-10-01
PT3     2020-09-01     2022-06-01

I tried the following code, but it raises an error:

import random
test["random_date"] = test.first_month + (test.last_month - start) * random.random()

The error is:

TypeError: unsupported operand type(s) for +: 'TimedeltaArray' and 'datetime.date'

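Answer 1

Score: 1

Take the difference between the two columns in days with Series.dt.days, multiply it by random numbers from numpy.random.uniform, and add the resulting timedeltas to the original first_month column: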

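import numpy as np
import pandas as pd

# Make sure both columns are datetime64, not strings or datetime.date objects
df['first_month'] = pd.to_datetime(df['first_month'])
df['last_month'] = pd.to_datetime(df['last_month'])

# Number of days in each range, scaled by a random factor in [0, 1)
n = df['last_month'].sub(df['first_month']).dt.days * np.random.uniform(size=len(df))

# Truncate to whole days and add the offset to the start date
df["random_date"] = df["first_month"] + pd.to_timedelta(n.astype(int), 'D')
print(df)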
    id first_month last_month random_date
0  PT1  2011-06-01 2019-10-01  2016-06-15
1  PT3  2020-09-01 2022-06-01  2021-08-17
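
Because numpy.random.uniform draws from [0, 1) and the result is truncated to whole days, the approach above can return first_month but never last_month itself. If an inclusive end date and reproducible output are wanted, one possible variant is sketched below. It is not part of the original answer: it swaps in NumPy's Generator API, the seed value 0 is arbitrary, and the names span_days and offsets are illustrative.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question so the sketch is self-contained
df = pd.DataFrame({
    "id": ["PT1", "PT3"],
    "first_month": pd.to_datetime(["2011-06-01", "2020-09-01"]),
    "last_month": pd.to_datetime(["2019-10-01", "2022-06-01"]),
})

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility

# Length of each date range in whole days
span_days = (df["last_month"] - df["first_month"]).dt.days.to_numpy()

# One integer offset per row; endpoint=True makes the upper bound inclusive
offsets = rng.integers(0, span_days, endpoint=True)

# Wrap the offsets in a Series that shares df's index before converting
df["random_date"] = df["first_month"] + pd.to_timedelta(
    pd.Series(offsets, index=df.index), unit="D"
)
print(df)
```

Drawing integer day offsets directly avoids the float multiplication and truncation step, at the cost of depending on the newer Generator interface.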

Performance:

# 20k rows
df = pd.concat([df] * 10000, ignore_index=True)

In [183]: %%timeit
     ...: n = df['last_month'].sub(df['first_month']).dt.days * np.random.uniform(size=len(df))
     ...: 
     ...: df["random_date"] = df["first_month"] + pd.to_timedelta(n.astype(int), 'D')
     ...: 
2.75 ms ± 85.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [185]: %%timeit
     ...: df['random_date'] = [np.random.choice(pd.date_range(first, last), 1)[0]
     ...:                  for first, last in zip(df['first_month'], df['last_month'])]
     ...:                  
3.87 s ± 531 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
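
The %%timeit cells are specific to IPython. As a rough sketch, a similar comparison can be run from a plain script with the standard-library timeit module. The function names vectorized and per_row are illustrative, the frame is kept at 200 rows so the slower variant finishes quickly, and absolute timings will differ from the figures quoted above depending on machine and library versions.

```python
import timeit

import numpy as np
import pandas as pd

base = pd.DataFrame({
    "first_month": pd.to_datetime(["2011-06-01", "2020-09-01"]),
    "last_month": pd.to_datetime(["2019-10-01", "2022-06-01"]),
})
df = pd.concat([base] * 100, ignore_index=True)  # 200 rows


def vectorized(frame):
    # Scale each range length (in days) by a random factor, then add to the start
    n = frame["last_month"].sub(frame["first_month"]).dt.days * np.random.uniform(size=len(frame))
    return frame["first_month"] + pd.to_timedelta(n.astype(int), unit="D")


def per_row(frame):
    # Build a full daily date_range for every row and pick one element
    return [np.random.choice(pd.date_range(first, last), 1)[0]
            for first, last in zip(frame["first_month"], frame["last_month"])]


runs = 5
t_vec = timeit.timeit(lambda: vectorized(df), number=runs) / runs
t_row = timeit.timeit(lambda: per_row(df), number=runs) / runs
print(f"vectorized: {t_vec:.6f} s per call")
print(f"per-row:    {t_row:.6f} s per call")
```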

Answer 2

Score: 0

One option is a list comprehension with date_range and numpy.random.choice:

import numpy as np
import pandas as pd

# For each row, build the full daily range and pick one date at random
df['random_date'] = [np.random.choice(pd.date_range(first, last), 1)[0]
                     for first, last in zip(df['first_month'], df['last_month'])]

Output:

    id first_month  last_month random_date
0  PT1  2011-06-01  2019-10-01  2019-03-15
1  PT3  2020-09-01  2022-06-01  2020-12-21
