Generate a fake date between two dates in a Python data frame


Question

I have a data frame called "test" as follows, and I would like to generate a random date between the two dates in each row.

id     first_month    last_month
PT1    2011-06-01     2019-10-01
PT3    2020-09-01     2022-06-01


I tried this code:

import random
test["random_date"] = test.first_month_active + (test.last_month_active - start) * random.random()

but it raises:

TypeError: unsupported operand type(s) for +: 'TimedeltaArray' and 'datetime.date'

Answer 1

Score: 1

Take the difference between the two columns in days with Series.dt.days, multiply it by numpy.random.uniform, and add the resulting timedeltas to the original first_month column:

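import numpy as np
import pandas as pd

# df holds the question's data (called test there); convert both columns to datetime64
df['first_month'] = pd.to_datetime(df['first_month'])
df['last_month'] = pd.to_datetime(df['last_month'])

# Day span per row, scaled by one uniform draw in [0, 1) per row
n = df['last_month'].sub(df['first_month']).dt.days * np.random.uniform(size=len(df))

# Truncate to whole days and add them to the start date
df["random_date"] = df["first_month"] + pd.to_timedelta(n.astype(int), 'D')
print(df)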
    id first_month last_month random_date
0  PT1  2011-06-01 2019-10-01  2016-06-15
1  PT3  2020-09-01 2022-06-01  2021-08-17
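A minimal sketch of the same technique wrapped in a reusable function, assuming both columns are pandas Series that pd.to_datetime can parse and that last_month is later than first_month. The helper name random_dates_between and the optional seeded numpy Generator (np.random.default_rng) are illustrative additions, not part of the answer. Because each uniform draw lies in [0, 1) and astype(int) truncates, every result falls on or after first_month and strictly before last_month.

```python
import numpy as np
import pandas as pd


def random_dates_between(first, last, rng=None):
    """Hypothetical helper: one random whole-day date per row.

    first and last are pandas Series of date-like values. Each result
    satisfies first <= date < last when last is later than first.
    """
    rng = np.random.default_rng() if rng is None else rng
    first = pd.to_datetime(first)
    last = pd.to_datetime(last)
    # Whole days between the two dates, one value per row
    span_days = (last - first).dt.days
    # One uniform draw in [0, 1) per row; astype(int) truncates toward zero,
    # so the offset is at most span_days - 1
    offsets = (span_days * rng.random(len(span_days))).astype(int)
    # Add the offsets as day-based timedeltas (aligned on the Series index)
    return first + pd.to_timedelta(offsets, unit="D")


df = pd.DataFrame({
    "id": ["PT1", "PT3"],
    "first_month": ["2011-06-01", "2020-09-01"],
    "last_month": ["2019-10-01", "2022-06-01"],
})
df["random_date"] = random_dates_between(
    df["first_month"], df["last_month"], rng=np.random.default_rng(42)
)
print(df)
```

Passing a seeded Generator makes the draw reproducible, which can help in tests; omitting rng falls back to fresh randomness on every call.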

Performance:

#20k rows
df = pd.concat([df] * 10000, ignore_index=True)

In [183]: %%timeit
     ...: n = df['last_month'].sub(df['first_month']).dt.days * np.random.uniform(size=len(df))
     ...: 
     ...: df["random_date"] = df["first_month"] + pd.to_timedelta(n.astype(int), 'D')
     ...: 
2.75 ms ± 85.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [185]: %%timeit
     ...: df['random_date'] = [np.random.choice(pd.date_range(first, last), 1)[0]
     ...:                      for first, last in zip(df['first_month'], df['last_month'])]
     ...:                      
3.87 s ± 531 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Answer 2

Score: 0

One option is a list comprehension with date_range and numpy.random.choice. Since date_range includes both endpoints, last_month itself can also be drawn:

import numpy as np
import pandas as pd

df['random_date'] = [np.random.choice(pd.date_range(first, last), 1)[0]
                     for first, last in zip(df['first_month'], df['last_month'])]

Output:

    id first_month  last_month random_date
0  PT1  2011-06-01  2019-10-01  2019-03-15
1  PT3  2020-09-01  2022-06-01  2020-12-21
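The list comprehension builds a full date range for every row before picking one element. A vectorized sketch with the same inclusive-on-both-ends behaviour is possible, assuming numpy's Generator.integers, which accepts an array of per-row upper bounds and an endpoint flag. The helper name random_dates_inclusive is hypothetical and this is a swapped-in technique, not the answerer's code.

```python
import numpy as np
import pandas as pd


def random_dates_inclusive(first, last, rng=None):
    """Hypothetical helper: random whole-day dates with first <= date <= last."""
    rng = np.random.default_rng() if rng is None else rng
    first = pd.to_datetime(first)
    last = pd.to_datetime(last)
    # Day span per row as a plain integer array
    span_days = (last - first).dt.days.to_numpy()
    # endpoint=True makes each draw uniform over the integers 0..span_days
    # (inclusive), so last itself can be chosen, as with date_range
    offsets = rng.integers(0, span_days, endpoint=True)
    # Wrap in a Series with the same index so the addition lines up row by row
    return first + pd.to_timedelta(pd.Series(offsets, index=first.index), unit="D")


df = pd.DataFrame({
    "id": ["PT1", "PT3"],
    "first_month": ["2011-06-01", "2020-09-01"],
    "last_month": ["2019-10-01", "2022-06-01"],
})
df["random_date"] = random_dates_inclusive(df["first_month"], df["last_month"])
print(df)
```

Each day in the range is equally likely, matching a uniform np.random.choice over the date_range, without materializing the range per row.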

huangapple
  • Published on 24 February 2023 at 16:46:10
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75554366.html