Set the first date of the year when a value has only the year in a pandas DataFrame
Question
I have a column called "date" in a pandas DataFrame. These are the first 10 rows:
0 22-Oct-2022
1 3-Dec-2019
2 27-Jun-2022
3 2023
4 15-Jul-2017
5 2019
6 7-Sep-2022
7 2021
8 30-Sep-2022
9 17-Aug-2021
I want to convert all of those dates to a format like this, for example:
0 2023-05-19
1 2023-01-20
2 ...
For the rows that have only the year, I want to set the first day of that year. For example, if the original df has:
0 2019
1 2021
it should become:
5 2019-01-01
7 2021-01-01
In other words, for these cases I want to set the first date of the year while keeping the original year, not the current year.
I tried:
df['date'] = pd.to_datetime(df['date'], errors='coerce', format='%d-%b-%Y')
However, it generates NaT values for the year-only rows. I hope the case is clear; I would appreciate any ideas for fixing this problem.
Thanks.
Answer 1
Score: 5
You can set the format to "mixed" (new in pandas 2.0.0, see GH50972) when calling to_datetime:
> format : str, default None
>
> "mixed", to infer the format for each element individually. This is
> risky, and you should probably use it along with dayfirst.
df["date"] = pd.to_datetime(df["date"], format="mixed", dayfirst=True)
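Or a classic double date-parsing pass combined with fillna:
df["date"] = (
    # first pass parses the year-only rows; everything else becomes NaT
    pd.to_datetime(df["date"], errors="coerce", format="%Y")
    # second pass fills those NaT rows from the full-date parse
    .fillna(pd.to_datetime(df["date"], errors="coerce", dayfirst=True))
)
Output: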
print(df)
date
0 2022-10-22
1 2019-12-03
2 2022-06-27
3 2023-01-01
4 2017-07-15
5 2019-01-01
6 2022-09-07
7 2021-01-01
8 2022-09-30
9 2021-08-17
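For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch of the two-pass approach using the ten sample rows from the question. It is a variation rather than the exact snippet above: the second pass uses the explicit format "%d-%b-%Y" instead of dayfirst inference, on the assumption that every full date in the column follows that layout and that month abbreviations are in English. It also keeps the question's original attempt in a separate variable (attempt, a name chosen here for illustration) to show why NaT appeared: the year-only strings simply do not match the day-month-year format.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({"date": [
    "22-Oct-2022", "3-Dec-2019", "27-Jun-2022", "2023", "15-Jul-2017",
    "2019", "7-Sep-2022", "2021", "30-Sep-2022", "17-Aug-2021",
]})

# The attempt from the question: year-only strings do not match
# "%d-%b-%Y", so errors="coerce" turns exactly those rows into NaT.
attempt = pd.to_datetime(df["date"], format="%d-%b-%Y", errors="coerce")

# Pass 1: parse only the bare years. With "%Y" alone, the missing month
# and day default to 1 January; full dates become NaT here.
years = pd.to_datetime(df["date"], format="%Y", errors="coerce")

# Pass 2: fill the rows that were not bare years from the full-date parse.
df["date"] = years.fillna(attempt)

print(df)
```

Keeping both formats explicit avoids per-element inference, at the cost of silently producing NaT for any row that matches neither layout, so it may be worth checking df["date"].isna().sum() afterwards.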
Answer 2
Score: 1
You'll have to update the values manually. First, you can standardize the rows that only have the year like this:
condition = df['date'].str.len() == 4
df.loc[condition, 'date'] = '1-Jan-' + df['date'].astype(str)
and then use the to_datetime function on the result.
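Put together, the normalize-then-parse idea might look like the sketch below. The four sample values are a subset of the question's data, and the mask name year_only is chosen here for illustration. It assumes the column holds strings and that any four-character value is a bare year.

```python
import pandas as pd

# A small subset of the question's data, stored as strings.
df = pd.DataFrame({"date": ["22-Oct-2022", "2019", "7-Sep-2022", "2021"]})

# Treat values that are exactly four characters long as year-only rows.
year_only = df["date"].str.len() == 4

# Prefix those rows so they read "1-Jan-<year>", matching the other rows.
df.loc[year_only, "date"] = "1-Jan-" + df.loc[year_only, "date"]

# Every row now shares one layout, so a single explicit format is enough.
df["date"] = pd.to_datetime(df["date"], format="%d-%b-%Y")

print(df)
```

Because no errors="coerce" is passed in the final call, any value that still does not fit the format raises an error instead of quietly becoming NaT, which can make bad rows easier to spot.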