How can I convert these dates to the correct format in a Pandas DataFrame?
Question
I have a dataframe with some dates that I want to convert to datetime format, so I used the pd.to_datetime function. However, it only works for some of the dates, because the others are written in a different order. Example:
import pandas as pd
df = pd.DataFrame({'dates': ['December 2021 17', '2005 July 01', 'December 2000 01', '2008 May 11',
                             'October 2000 04', 'September 2016 04', 'May 1998 09']})
Using pd.to_datetime only returns values for the dates written in year-month-day order. I tried splitting the strings into lists and reordering them, but that didn't work for me.
Answer 1
Score: 3
You can use apply and pass it to_datetime, so that each value is parsed on its own:
df.dates = df.dates.apply(pd.to_datetime)
This is the output of df now:
dates
0 2021-12-17
1 2005-07-01
2 2000-12-01
3 2008-05-11
4 2000-10-04
5 2016-09-04
6 1998-05-09
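The per-element idea behind this answer can be illustrated without pandas. The following is a minimal sketch, not the answerer's code: it assumes only the two orderings seen in the question ('Month YYYY DD' and 'YYYY Month DD'), English month names in the active locale, and a hypothetical helper name parse_mixed_date.

```python
from datetime import datetime

# Assumed formats: the two orderings that appear in the question's sample data.
CANDIDATE_FORMATS = ("%B %Y %d", "%Y %B %d")


def parse_mixed_date(text):
    """Try each candidate format in turn and return the first successful parse."""
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue  # this ordering did not match, try the next one
    raise ValueError(f"no known format matches {text!r}")


samples = ["December 2021 17", "2005 July 01", "May 1998 09"]
parsed = [parse_mixed_date(s).date().isoformat() for s in samples]
print(parsed)
```

With pandas, a helper like this could be applied as df['dates'].map(parse_mixed_date). Newer pandas releases (2.0 and later) also accept format='mixed' in pd.to_datetime to request per-element format inference; it is worth checking the documentation for the installed version before relying on it.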
Answer 2
Score: 1
One option is to extract the year, month and day separately, then parse them in a fixed order:
y = df['dates'].str.extract(r'(?P<year>\b\d{4}\b)', expand=False)
d = df['dates'].str.extract(r'(?P<day>\b\d{2}\b)', expand=False)
m = df['dates'].str.extract(r'(?P<month>\b[A-Za-z]+\b)', expand=False)
pd.to_datetime(y.str.cat([m, d]), format='%Y%B%d')
Output:
0 2021-12-17
1 2005-07-01
2 2000-12-01
3 2008-05-11
4 2000-10-04
5 2016-09-04
6 1998-05-09
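The same extract-and-reorder idea can be sketched for a single string using only the standard library. This is an illustration under stated assumptions rather than part of the answer: the helper name normalize_date is hypothetical, and it assumes a four-digit year, a two-digit day, and an English month name, as in the question's data.

```python
import re
from datetime import datetime


def normalize_date(text):
    """Pull year, month name, and day out of the string, whatever their order."""
    year = re.search(r"\b\d{4}\b", text)      # four digits standing alone
    day = re.search(r"\b\d{2}\b", text)       # two digits standing alone
    month = re.search(r"\b[A-Za-z]+\b", text)  # the only alphabetic word
    if not (year and day and month):
        raise ValueError(f"could not find year, month, and day in {text!r}")
    # Reassemble in a fixed order so that a single format string is enough.
    canonical = f"{year.group()} {month.group()} {day.group()}"
    return datetime.strptime(canonical, "%Y %B %d")


print(normalize_date("October 2000 04").date())
print(normalize_date("2008 May 11").date())
```

The word boundaries keep the two-digit pattern from matching inside the four-digit year, which is why the day and year can be extracted independently of their position.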
Answer 3
Score: 1
If you are not comfortable using the apply function (functional programming), as suggested by @Marcelo Paco, you can try this instead.
Suppose your dataframe is called date_df. You can convert the dates column to the desired format as follows:
import pandas as pd
date_df['dates'] = pd.to_datetime(date_df['dates'])
date_df
Output:
dates
0 2021-12-17
1 2005-07-01
2 2000-12-01
3 2008-05-11
4 2000-10-04
5 2016-09-04
6 1998-05-09