Set the first day of the year when a value has only the year in a pandas DataFrame

Question

I have a column called "date" in a pandas DataFrame. These are the first 10 rows:

0    22-Oct-2022
1     3-Dec-2019
2    27-Jun-2022
3           2023
4    15-Jul-2017
5           2019
6     7-Sep-2022
7           2021
8    30-Sep-2022
9    17-Aug-2021

I want to convert all of these dates to YYYY-MM-DD format, for example:

0    2023-05-19 
1    2023-01-20 
2    ...

For the rows that contain only the year, I want the first day of that year. For example, if the original df has:

5           2019
7           2021

the result should be:

5           2019-01-01
7           2021-01-01

In other words, in these cases I want the first date of the year, keeping the original year rather than the current year.

I tried:

df['date'] = pd.to_datetime(df['date'], errors='coerce', format='%d-%b-%Y')

However, this generates NaT values for the year-only rows. I would appreciate any ideas on how to fix this.

Thanks.

Answer 1

Score: 5

You can set format="mixed" (new in pandas 2.0.0, see GH50972) when calling to_datetime:

> format : str, default None
>
> "mixed", to infer the format for each element individually. This is
> risky, and you should probably use it along with dayfirst.

df["date"] = pd.to_datetime(df["date"], format="mixed", dayfirst=True)
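As a minimal sketch of the call above on a few rows from the question (the small sample frame is an assumption made for illustration, and format="mixed" requires pandas 2.0 or later):

```python
import pandas as pd

# Hypothetical sample: a subset of the rows shown in the question.
df = pd.DataFrame({"date": ["22-Oct-2022", "3-Dec-2019", "2023", "2019"]})

# format="mixed" parses each element on its own instead of inferring
# one format for the whole column; dayfirst=True resolves ambiguous
# numeric dates such as 01-02-2022 as day-month-year.
df["date"] = pd.to_datetime(df["date"], format="mixed", dayfirst=True)

print(df["date"].dt.strftime("%Y-%m-%d").tolist())
```

Year-only strings are parsed as the first day of that year, which is the behaviour the question asks for.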

Or use the classic approach of parsing twice and combining the results with fillna:

df["date"] = (
    pd.to_datetime(df["date"], errors="coerce", format="%Y")
        .fillna(pd.to_datetime(df["date"], errors="coerce", dayfirst=True))
)

Output:

print(df)

        date
0 2022-10-22
1 2019-12-03
2 2022-06-27
3 2023-01-01
4 2017-07-15
5 2019-01-01
6 2022-09-07
7 2021-01-01
8 2022-09-30
9 2021-08-17
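A variant of the double-parsing idea spells out both formats, so nothing depends on format inference. This is only a sketch: it assumes the column holds just the two shapes shown in the question (day-month-year with an abbreviated English month name, or a bare four-digit year), and the sample frame is made up for illustration.

```python
import pandas as pd

# Hypothetical sample built from rows in the question.
df = pd.DataFrame({"date": ["22-Oct-2022", "2023", "7-Sep-2022", "2021"]})

# First pass: full dates. Year-only strings do not match and become NaT.
full = pd.to_datetime(df["date"], format="%d-%b-%Y", errors="coerce")

# Second pass: year-only strings. Full dates do not match and become NaT.
year_only = pd.to_datetime(df["date"], format="%Y", errors="coerce")

# Each row is filled by whichever pass succeeded.
df["date"] = full.fillna(year_only)

print(df["date"].dt.strftime("%Y-%m-%d").tolist())
```

Any value that matches neither format stays NaT, which makes unexpected inputs easy to spot with df["date"].isna().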

Answer 2

Score: 1

You'll have to update the values manually. First, standardize the rows that contain only the year:

condition = data['date'].astype(str).str.len() == 4
data.loc[condition, 'date'] = '1-Jan-' + data['date'].astype(str)

Then call to_datetime on the result.
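Put together, the normalize-then-parse idea might look like the sketch below. The regular-expression check for a four-digit year and the sample frame are assumptions added for illustration; they are not part of the original answer.

```python
import pandas as pd

# Hypothetical sample built from rows in the question.
data = pd.DataFrame({"date": ["22-Oct-2022", "2023", "7-Sep-2022", "2019"]})

as_text = data["date"].astype(str).str.strip()

# True only for values that are exactly four digits, i.e. a bare year.
is_year_only = as_text.str.fullmatch(r"\d{4}")

# Prefix bare years with "1-Jan-" so every row has the same shape.
normalized = as_text.where(~is_year_only, "1-Jan-" + as_text)

# A single strict format now covers the whole column.
data["date"] = pd.to_datetime(normalized, format="%d-%b-%Y")

print(data["date"].dt.strftime("%Y-%m-%d").tolist())
```

Because the final call omits errors="coerce", a value that still fails to match raises an error instead of silently turning into NaT.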

huangapple
  • Published on 2023-05-25 07:06:28
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76327919.html