Generate a fake date between two dates in a Python data frame
Question
I have a data frame called "test" as follows, and I would like to generate a random date between the two dates in each row.
id first_month last_month
PT1 2011-06-01 2019-10-01
PT3 2020-09-01 2022-06-01
I tried this code, but it raises an error:
import random
test["random_date"] = test.first_month + (test.last_month - start) * random.random()
The error is:
TypeError: unsupported operand type(s) for +: 'TimedeltaArray' and 'datetime.date'
Answer 1
Score: 1
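Use Series.dt.days on the difference between the two columns, multiply it by random numbers from numpy.random.uniform, and add the resulting timedeltas to the original first_month column: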
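import numpy as np
import pandas as pd
df['first_month'] = pd.to_datetime(df['first_month'])
df['last_month'] = pd.to_datetime(df['last_month'])
# days between the two dates, scaled by a uniform random fraction in [0, 1) per row
n = df['last_month'].sub(df['first_month']).dt.days * np.random.uniform(size=len(df))
# truncate to whole days and add the offset to first_month
df["random_date"] = df["first_month"] + pd.to_timedelta(n.astype(int), 'D')
print(df)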
id first_month last_month random_date
0 PT1 2011-06-01 2019-10-01 2016-06-15
1 PT3 2020-09-01 2022-06-01 2021-08-17
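The question's own formula, first + (last - first) * fraction, also works once both columns are datetime64: the difference is then a timedelta64 Series that can be scaled by a float array. One catch is that random.random() returns a single number, so the question's version would place every row at the same relative position. A minimal sketch, assuming the sample frame below (rebuilt from the question's table) and hypothetical helper names frac and offset, draws one fraction per row and floors the result to whole days:

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question's table (assumed string dates)
test = pd.DataFrame({
    "id": ["PT1", "PT3"],
    "first_month": ["2011-06-01", "2020-09-01"],
    "last_month": ["2019-10-01", "2022-06-01"],
})

# Convert to datetime64 so the date arithmetic is vectorized
test["first_month"] = pd.to_datetime(test["first_month"])
test["last_month"] = pd.to_datetime(test["last_month"])

# One uniform fraction in [0, 1) per row (hypothetical name)
frac = np.random.uniform(size=len(test))

# (last - first) is timedelta64; multiplying by frac gives a random offset per row
offset = (test["last_month"] - test["first_month"]) * frac

# Add the offset, then drop the time-of-day part that the scaling introduces
test["random_date"] = (test["first_month"] + offset).dt.floor("D")
print(test)
```

Flooring to midnight is a choice made here so the output matches the date-only columns; without it, random_date would carry a random time of day.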
Performance:
# 20k rows
df = pd.concat([df] * 10000, ignore_index=True)
In [183]: %%timeit
...: n = df['last_month'].sub(df['first_month']).dt.days * np.random.uniform(size=len(df))
...:
...: df["random_date"] = df["first_month"] + pd.to_timedelta(n.astype(int), 'D')
...:
2.75 ms ± 85.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [185]: %%timeit
...: df['random_date'] = [np.random.choice(pd.date_range(first, last), 1)[0]
...: for first, last in zip(df['first_month'], df['last_month'])]
...:
3.87 s ± 531 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Answer 2
Score: 0
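One option using a list comprehension with date_range and numpy.random.choice:
import numpy as np
import pandas as pd
# pick one day at random from each row's inclusive daily range
df['random_date'] = [np.random.choice(pd.date_range(first, last), 1)[0]
                     for first, last in zip(df['first_month'], df['last_month'])]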
Output:
id first_month last_month random_date
0 PT1 2011-06-01 2019-10-01 2019-03-15
1 PT3 2020-09-01 2022-06-01 2020-12-21
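The list comprehension above samples uniformly from every day in pd.date_range(first, last), which includes both endpoints, but it builds a full range per row and is slow on large frames. A vectorized sketch of the same inclusive draw, assuming a seeded numpy.random.default_rng Generator is acceptable (the seed 42 and the names days and offset are illustrative), picks an integer day offset per row with Generator.integers:

```python
import numpy as np
import pandas as pd

# Sample frame built from the question's table
df = pd.DataFrame({
    "id": ["PT1", "PT3"],
    "first_month": pd.to_datetime(["2011-06-01", "2020-09-01"]),
    "last_month": pd.to_datetime(["2019-10-01", "2022-06-01"]),
})

# Seeded generator so the draw is reproducible (seed value is arbitrary)
rng = np.random.default_rng(42)

# Whole days between the two dates for each row
days = (df["last_month"] - df["first_month"]).dt.days.to_numpy()

# Integer offset in [0, days] per row; high is exclusive, so add 1 to include last_month
offset = rng.integers(0, days + 1)

df["random_date"] = df["first_month"] + pd.to_timedelta(offset, unit="D")
print(df)
```

Passing days + 1 keeps last_month reachable, matching date_range. By contrast, the uniform-fraction approach in Answer 1 truncates a value strictly below the day count, so its random_date never lands exactly on last_month.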