英文:
How can I go about fixing the following messy dates in my .xlxs file
问题
我有以下的 .xlxs 文件,我已经将它上传到了谷歌云盘 这里,其中包含了 10 个不同的日期,我正在尝试使用 pandas.to_datetime 来清理混乱的日期。然而,我意识到这个函数对于第 2 行(1414/01/2019)和第 3 行(110/05/2019)的日期有一些限制。
显然,我想要修复第 2 行和第 3 行的日期分别为 (14/01/2019) 和 (11/05/2019)。但是如果这不能实现,是否有一种方法可以跳过具有无法使用 pandas.to_datetime 修复的日期格式的行?
import pandas as pd
test = pd.read_excel('Messy_Dates.xlsx', usecols=[0], header=None, skiprows=[0])
英文:
I have the following .xlxs file, I have uploaded it on google drive here, with 10 different dates and I was experimenting with pandas.to_datetime to clean the messy dates that I have. However I realised that the function has some limits specifically for the dates I have in row 2 (1414/01/2019) and 3 (110/05/2019).
Obviously I would like to fix the dates in row 2 and 3 to (14/01/2019) and (11/05/2019) respectively. However if this cannot be done is there a way on how I can skip over rows that have a date format that cannot be fixed with pandas.to_datetime?
import pandas as pd
test=pd.read_excel('Messy_Dates.xlsx', usecols=[0], header=None, skiprows = [0])
答案1
得分: 1
使用to_datetime
,参数设置为dayfirst=True
和errors='coerce'
,以便通过Series.dropna
删除无法解析的日期时间:
test=pd.read_excel('Messy_Dates.xlsx', usecols=[0], header=None, skiprows=[0])
out = pd.to_datetime(test[0], dayfirst=True, errors='coerce').dropna()
print(out)
0 2018-05-16
1 2018-05-17
4 2019-05-22
5 2019-05-24
6 2019-05-25
7 2019-05-28
Name: 0, dtype: datetime64[ns]
英文:
Use to_datetime
with dayfirst=True
and errors='coerce'
, so possible remove not parseable datetimes by Series.dropna
:
test=pd.read_excel('Messy_Dates.xlsx', usecols=[0], header=None, skiprows = [0])
out = pd.to_datetime(test[0], dayfirst=True, errors='coerce').dropna()
print (out)
0 2018-05-16
1 2018-05-17
4 2019-05-22
5 2019-05-24
6 2019-05-25
7 2019-05-28
Name: 0, dtype: datetime64[ns]
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论