How to fix - Overflow in int64 addition
Question
I am trying to calculate future dates by adding the number of days in df['num_days'] to the dates in df['sampling_date'], but I get the error "Overflow in int64 addition".
Source code:
df['sampling_date']=pd.to_datetime(df['sampling_date'], errors='coerce')
df['future_date'] = df['sampling_date'] + pd.to_timedelta(df['num_days'], unit='D')
df['future_date'] = pd.to_datetime(df['future_date']).dt.strftime('%Y-%m-%d')
df['future_date'] = df['future_date'].astype(np.str)
df['future_date'] = np.where(df['num_days']<=0,0, df['future_date'])
The values in df['num_days'] are [0, 866, 729, 48357555, 567, 478].
I am running this on a Unix server. Please help me resolve it.
Answer 1
Score: 0
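The issue is the value 48357555: that many days (over 130,000 years) is far outside the range pandas can represent with nanosecond-resolution timestamps, so the addition overflows.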
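To see why that value is out of range, the limits can be inspected directly. This is a small sketch, assuming pandas' default nanosecond resolution; the variable names are illustrative only.

```python
import pandas as pd

# Limits of the default 64-bit nanosecond representation
max_delta_days = pd.Timedelta.max.days   # largest whole-day offset
max_timestamp = pd.Timestamp.max         # latest representable date

print(max_delta_days)      # 106751
print(max_timestamp.year)  # 2262

too_big = 48357555
print(too_big > max_delta_days)  # True
```

Any offset that pushes a date past the year 2262 will therefore fail, whichever way the addition is written.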
You can wrap the addition in a small function that returns NaT when an error is raised:
import numpy as np
import pandas as pd

# Here is an example df
df = pd.DataFrame({
    'sampling_date': ['2022-01-01', '2022-02-01', '2022-03-01', '2022-04-01', '2022-05-01', '2022-06-01'],
    'num_days': [0, 866, 729, 48357555, 567, 478]
})
df['sampling_date'] = pd.to_datetime(df['sampling_date'], errors='coerce')

def calculate_future_date(row):
    try:
        return row['sampling_date'] + pd.to_timedelta(row['num_days'], unit='D')
    except (OverflowError, ValueError):
        # Out-of-range offsets or dates cannot be represented; fall back to NaT
        return pd.NaT

# Apply the function to each row
df['future_date'] = df.apply(calculate_future_date, axis=1)
# Rows with no positive offset keep the sampling date
df['future_date'] = np.where(df['num_days'] <= 0, df['sampling_date'], df['future_date'])
# Format as strings; rows that overflowed (NaT) become '0'
df['future_date'] = df['future_date'].dt.strftime('%Y-%m-%d').fillna('0')
print(df)
  sampling_date  num_days future_date
0    2022-01-01         0  2022-01-01
1    2022-02-01       866  2024-06-16
2    2022-03-01       729  2024-02-28
3    2022-04-01  48357555           0
4    2022-05-01       567  2023-11-19
5    2022-06-01       478  2023-09-22
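If the row-by-row apply is too slow on a large frame, the same result can probably be reached without a try/except by checking each offset against the remaining room before adding it. The following is only a sketch of that idea, not part of the original answer. The names limit, max_days, fits and days are made up for illustration. It assumes nanosecond resolution and sampling dates recent enough that subtracting them from pd.Timestamp.max does not itself overflow.

```python
import pandas as pd

df = pd.DataFrame({
    'sampling_date': pd.to_datetime(['2022-01-01', '2022-02-01', '2022-04-01']),
    'num_days': [0, 866, 48357555],
})

# Latest date representable at nanosecond resolution
limit = pd.Timestamp.max

# Whole days each row can still move forward without passing the limit
# (assumes this subtraction does not overflow for the dates in the frame)
max_days = (limit - df['sampling_date']).dt.days

# Treat non-positive offsets as zero, matching the np.where step above
days = df['num_days'].clip(lower=0)
fits = days <= max_days

# Add only offsets that fit; rows that do not fit end up as NaT
future = df['sampling_date'] + pd.to_timedelta(days.where(fits, 0), unit='D')
future = future.where(fits)

df['future_date'] = future.dt.strftime('%Y-%m-%d').fillna('0')
print(df['future_date'].tolist())  # ['2022-01-01', '2024-06-16', '0']
```

The out-of-range row is replaced with a zero offset before the addition, so nothing overflows, and it is masked back to NaT afterwards.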