How can I convert these dates to the correct format in a Pandas Dataframe?

Question

I have a dataframe with dates stored as strings, and I want to convert them to datetime. I used pd.to_datetime, but it only works for some of the dates because the others are not written in the expected order. Example:

import pandas as pd

df = pd.DataFrame({'dates': ['December 2021 17', '2005 July 01', 'December 2000 01', '2008 May 11',
                             'October 2000 04', 'September 2016 04', 'May 1998 09']})

pd.to_datetime only returns values for the strings written in year-month-day order. I tried splitting the strings into lists and reordering the parts, but I could not get that to work.

Answer 1

Score: 3

You can use apply and pass it pd.to_datetime, so that each value is parsed on its own:

df.dates = df.dates.apply(pd.to_datetime)

This is the output of df now:

       dates
0 2021-12-17
1 2005-07-01
2 2000-12-01
3 2008-05-11
4 2000-10-04
5 2016-09-04
6 1998-05-09
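
If the column only mixes a few known layouts, another option is to try each explicit format in turn and keep the first one that parses. This is a minimal sketch, assuming the two layouts seen in the sample data ('%B %Y %d' and '%Y %B %d') and English month names; the names formats and parsed are illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'dates': ['December 2021 17', '2005 July 01', 'December 2000 01', '2008 May 11',
                             'October 2000 04', 'September 2016 04', 'May 1998 09']})

# Candidate layouts, assumed from the sample data:
# "Month YYYY DD" and "YYYY Month DD".
formats = ['%B %Y %d', '%Y %B %d']

# Parse with the first format; rows that do not match become NaT.
parsed = pd.to_datetime(df['dates'], format=formats[0], errors='coerce')

# Fill the remaining NaT rows using each of the other formats.
for fmt in formats[1:]:
    parsed = parsed.fillna(pd.to_datetime(df['dates'], format=fmt, errors='coerce'))

df['dates'] = parsed
print(df)
```

Any row that matches none of the listed formats stays NaT, which makes unexpected layouts easy to spot with df['dates'].isna().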

Answer 2

Score: 1

One option is to extract the year, month, and day separately, then join them in a fixed order before parsing:

y = df['dates'].str.extract(r'(?P<year>\b\d{4}\b)', expand=False)
d = df['dates'].str.extract(r'(?P<day>\b\d{2}\b)', expand=False)
m = df['dates'].str.extract(r'(?P<month>\b[A-Za-z]+\b)', expand=False)

pd.to_datetime(y.str.cat([m, d]), format='%Y%B%d')

Output:

0   2021-12-17
1   2005-07-01
2   2000-12-01
3   2008-05-11
4   2000-10-04
5   2016-09-04
6   1998-05-09
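
To see what each of the three patterns captures, the same idea can be sketched in plain Python for a single string. The helper name reorder_and_parse is hypothetical, and the sketch assumes an English month name, a four-digit year, and a two-digit day, as in the sample data.

```python
import re
from datetime import datetime


def reorder_and_parse(text):
    """Pull out year, month name and day regardless of their order, then parse."""
    year = re.search(r'\b\d{4}\b', text).group()       # exactly four digits
    day = re.search(r'\b\d{2}\b', text).group()        # exactly two digits
    month = re.search(r'\b[A-Za-z]+\b', text).group()  # the month name
    # Rebuild the string in a fixed year-month-day order before parsing.
    return datetime.strptime(f'{year} {month} {day}', '%Y %B %d')


print(reorder_and_parse('December 2021 17'))  # 2021-12-17 00:00:00
print(reorder_and_parse('2005 July 01'))      # 2005-07-01 00:00:00
```

The word boundaries (\b) are what keep the two-digit pattern from matching inside the four-digit year.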

Answer 3

Score: 1

If you are not comfortable using the apply function (functional programming), as suggested by @Marcelo Paco, you can try this instead.

Suppose your dataframe is called date_df. You can convert the dates column to datetime as follows:

import pandas as pd

date_df['dates'] = pd.to_datetime(date_df['dates'])
date_df

Output:

    dates
0  2021-12-17
1  2005-07-01
2  2000-12-01
3  2008-05-11
4  2000-10-04
5  2016-09-04
6  1998-05-09

huangapple
  • Posted on 2023-03-12 11:08:57
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75710856.html