Date Countdown with pandas
Question
I'm trying to calculate the difference between a date and today in months.
Here is what I have so far:
import pandas as pd
import numpy as np
from datetime import date

def calc_date_countdown(df):
    today = date.today()
    df['countdown'] = df['date'].apply(lambda x: (x-today)/np.timedelta64(1, 'M'))
    df['countdown'] = df['countdown'].astype(int)
    return df
Any pointers on what I'm doing wrong, or maybe a more efficient way of doing it?
When I run it on my dataset, this is the error I'm getting: TypeError: unsupported operand type(s) for -: 'Timestamp' and 'datetime.date'
The error occurs because you are subtracting a date (datetime.date) from a pandas Timestamp, and that operation is not supported. Convert the date object to a Timestamp first, then subtract. Here is the corrected code:
import pandas as pd
import numpy as np
from datetime import date

def calc_date_countdown(df):
    # Convert today's datetime.date into a Timestamp so it can be subtracted from Timestamps
    today = pd.to_datetime(date.today())
    # Each row yields a plain float number of months; int() truncates it
    # (a float has no .astype method, so the cast cannot be chained inside the lambda)
    df['countdown'] = df['date'].apply(lambda x: int((pd.to_datetime(x) - today) / np.timedelta64(1, 'M')))
    return df
This resolves the error and computes the difference in months between each date and today.
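To see the type mismatch behind the TypeError concretely, here is a minimal sketch. It assumes a small hypothetical example frame and a fixed stand-in date (2023-04-15) in place of date.today() so the result is reproducible. Values read from a datetime64 column come back as pd.Timestamp objects, and wrapping the datetime.date in pd.Timestamp() makes the subtraction a Timestamp minus Timestamp, which returns a Timedelta.

```python
import pandas as pd
from datetime import date

# Hypothetical example frame; pd.to_datetime gives the column a datetime64 dtype
df = pd.DataFrame({"date": pd.to_datetime(["2023-01-01", "2023-07-01"])})

first = df["date"].iloc[0]      # a pd.Timestamp, not a datetime.date
today = date(2023, 4, 15)       # fixed stand-in for date.today()
today_ts = pd.Timestamp(today)  # datetime.date -> pd.Timestamp at midnight

diff = first - today_ts         # Timestamp - Timestamp -> Timedelta
print(type(first).__name__, type(today).__name__, diff)
```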
Answer 1
Score: 2
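import pandas as pd

def calc_date_countdown(df):
    # pd.Timestamp.today() includes the current time of day
    today = pd.Timestamp.today()
    # Approximates a month as 30 days; .days and // both floor, so past dates round to more negative counts
    df['countdown'] = df['date'].apply(lambda x: (x - today).days // 30)
    return df
This should work as long as the date column in your DataFrame holds Timestamp values (datetime64 dtype). If it doesn't, convert it first with pd.to_datetime(), e.g. df['date'] = pd.to_datetime(df['date']), before running the function.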
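The same 30-day rule can be applied to the whole column at once, without apply, by taking .dt.days of the difference. A minimal sketch, assuming the column already has a datetime64 dtype; the function name countdown_30day and its today parameter are hypothetical additions so the result can be reproduced (in practice you might pass pd.Timestamp.today().normalize() to drop the time of day).

```python
import pandas as pd

def countdown_30day(dates: pd.Series, today: pd.Timestamp) -> pd.Series:
    """Months until each date, approximating a month as 30 days (hypothetical helper)."""
    # Subtracting a Timestamp from a datetime64 Series gives a timedelta64 Series
    days = (dates - today).dt.days
    # Floor division: dates in the past round toward more negative counts
    return days // 30

dates = pd.Series(pd.to_datetime(["2023-01-01", "2023-05-01", "2023-07-01"]))
result = countdown_30day(dates, pd.Timestamp("2023-04-15"))
print(result.tolist())  # [-4, 0, 2]
```

The first row is 104 days in the past, so floor division gives -4; truncating instead (int(-104 / 30)) would give -3. Which one is right depends on how you want past dates counted.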
Answer 2
Score: 1
Using apply is not very efficient here, because subtracting today from every date is an array operation that pandas can perform on the whole column at once.
See the example below:
import pandas as pd
import numpy as np
from datetime import date

def per_array(df):
    # Vectorized: subtract one Timestamp from the whole datetime64 column at once
    # (date - today, so future dates give positive counts, matching using_apply)
    df['months'] = ((df['date'] - pd.to_datetime(date.today())) / np.timedelta64(1, 'M')).astype(int)
    return df

def using_apply(df):
    today = date.today()
    # Row by row: one Python-level subtraction per element
    df['months'] = df['date'].apply(lambda x: (x - pd.to_datetime(today)) / np.timedelta64(1, 'M'))
    df['months'] = df['months'].astype(int)
    return df

df = pd.DataFrame({'date': [pd.to_datetime(f"2023-0{i}-01") for i in range(1, 8)]})
print(df)
#         date
# 0 2023-01-01
# 1 2023-02-01
# 2 2023-03-01
# 3 2023-04-01
# 4 2023-05-01
# 5 2023-06-01
# 6 2023-07-01
Timing it:
%%timeit
per_array(df)
195 µs ± 5.14 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
%%timeit
using_apply(df)
384 µs ± 3.22 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
As the timings show, the vectorized version is roughly twice as fast as the apply version (about 195 µs vs. 384 µs per call on this 7-row frame).
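Both functions divide by np.timedelta64(1, 'M'). NumPy treats its month unit specially because the length of a month varies, so how 'M' gets converted to a fixed duration depends on the library doing the conversion. A minimal sketch that keeps the vectorized approach but divides by an explicit mean Gregorian month (365.2425 / 12 days = 2,629,746 seconds) instead, plus a calendar-month variant that counts month boundaries and ignores the day of the month. The names AVG_MONTH, avg_month_countdown, calendar_months and the today parameter are hypothetical and are not part of either answer.

```python
import pandas as pd

# Mean Gregorian month: 365.2425 days / 12 = 30.436875 days = 2,629,746 seconds
AVG_MONTH = pd.Timedelta(seconds=2_629_746)

def avg_month_countdown(dates: pd.Series, today: pd.Timestamp) -> pd.Series:
    # Vectorized: timedelta64 Series / Timedelta -> float Series of months
    months = (dates - today) / AVG_MONTH
    # astype(int) truncates toward zero, like the original answers
    return months.astype(int)

def calendar_months(dates: pd.Series, today: pd.Timestamp) -> pd.Series:
    # Counts calendar-month boundaries only; the day of the month is ignored
    return (dates.dt.year - today.year) * 12 + (dates.dt.month - today.month)

dates = pd.Series(pd.to_datetime(["2023-01-01", "2023-05-01", "2023-07-01"]))
today = pd.Timestamp("2023-04-15")
print(avg_month_countdown(dates, today).tolist())  # [-3, 0, 2]
print(calendar_months(dates, today).tolist())      # [-3, 1, 3]
```

The two definitions disagree on purpose: 2023-05-01 is only 16 days after the stand-in date, so it is 0 average months away but 1 calendar month away. Pick whichever matches what "months left" should mean for your countdown.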