May 25, 2023 07:06:28

Set the first day of the year when a value has only the year in a pandas DataFrame

Question

I have a column called "date" in a pandas DataFrame. These are the first 10 rows:

0    22-Oct-2022
1     3-Dec-2019
2    27-Jun-2022
3           2023
4    15-Jul-2017
5           2019
6     7-Sep-2022
7           2021
8    30-Sep-2022
9    17-Aug-2021

I want to convert all of these dates to a format like:

0    2023-05-19 
1    2023-01-20 
2    ...

For the rows that have only the year, if the original df has:

5           2019
7           2021

it should become:

5           2019-01-01
7           2021-01-01

In other words, in these cases I want the first day of the year, keeping the original year rather than the current year.

I tried:

df['date'] = pd.to_datetime(df['date'], errors='coerce', format='%d-%b-%Y')

However, it generates NaT values. I hope the problem is clear; I'd appreciate any ideas on how to fix it.

Thanks.
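A minimal reproduction of the attempt above, assuming the column holds the strings shown in the question, makes it easy to see which rows end up as NaT: the single fixed format "%d-%b-%Y" cannot match a bare year, and errors="coerce" turns every mismatch into NaT instead of raising.

```python
import pandas as pd

# Sample data copied from the question; the values are assumed to be strings.
df = pd.DataFrame({"date": [
    "22-Oct-2022", "3-Dec-2019", "27-Jun-2022", "2023", "15-Jul-2017",
    "2019", "7-Sep-2022", "2021", "30-Sep-2022", "17-Aug-2021",
]})

# Same call as in the question: one fixed format applied to every row.
parsed = pd.to_datetime(df["date"], errors="coerce", format="%d-%b-%Y")

# Rows that did not match "%d-%b-%Y" were coerced to NaT.
print(df.loc[parsed.isna(), "date"].tolist())  # ['2023', '2019', '2021']
```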

Answer 1

Score: 5

You can pass format="mixed" (new in pandas 2.0.0, see GH50972) when calling to_datetime:

> format : str, default None
>
> "mixed", to infer the format for each element individually. This is
> risky, and you should probably use it along with dayfirst.

df["date"] = pd.to_datetime(df["date"], format="mixed", dayfirst=True)
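A self-contained version of the one-liner above, assuming pandas 2.0 or later (format="mixed" does not exist in earlier releases) and the string sample from the question. Each element's format is inferred on its own, so "2023" becomes 2023-01-01 while "22-Oct-2022" is parsed as a full date.

```python
import pandas as pd  # format="mixed" requires pandas >= 2.0

df = pd.DataFrame({"date": [
    "22-Oct-2022", "3-Dec-2019", "27-Jun-2022", "2023", "15-Jul-2017",
    "2019", "7-Sep-2022", "2021", "30-Sep-2022", "17-Aug-2021",
]})

# The format is inferred per element. dayfirst=True only matters for ambiguous
# numeric dates such as "01-02-2022"; this sample uses month names, so none occur.
df["date"] = pd.to_datetime(df["date"], format="mixed", dayfirst=True)
print(df)
```

Because per-element inference is permissive, it is worth checking the result on real data; the docs' warning about pairing "mixed" with dayfirst applies whenever day and month are both written as numbers.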

Or use the classic approach of parsing twice and combining the results with fillna:

df["date"] = (
    pd.to_datetime(df["date"], errors="coerce", format="%Y")
        .fillna(pd.to_datetime(df["date"], errors="coerce", dayfirst=True))
)
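The same two-pass idea, split into named steps so each intermediate result can be inspected. This is a sketch assuming the string sample from the question; the names year_only and full_dates are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"date": [
    "22-Oct-2022", "3-Dec-2019", "27-Jun-2022", "2023", "15-Jul-2017",
    "2019", "7-Sep-2022", "2021", "30-Sep-2022", "17-Aug-2021",
]})

# Pass 1: values that are exactly a year ("2023") become Jan 1 of that year;
# everything else fails the "%Y" format and is coerced to NaT.
year_only = pd.to_datetime(df["date"], errors="coerce", format="%Y")

# Pass 2: parse the full dates; values it cannot handle are coerced to NaT.
full_dates = pd.to_datetime(df["date"], errors="coerce", dayfirst=True)

# Keep pass 1 where it succeeded and fill the remaining NaT rows from pass 2.
df["date"] = year_only.fillna(full_dates)
print(df)
```

Order matters: the strict "%Y" pass goes first, so only values that are exactly a year come from it, and fillna touches only the rows that pass left as NaT. This variant does not depend on format="mixed", so it is also an option on pandas versions older than 2.0.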

Output:

print(df)
        date
0 2022-10-22
1 2019-12-03
2 2022-06-27
3 2023-01-01
4 2017-07-15
5 2019-01-01
6 2022-09-07
7 2021-01-01
8 2022-09-30
9 2021-08-17

Answer 2

Score: 1

You'll have to update the values manually. First, standardize the rows that only have the year, like this:

condition = data['date'].astype(str).str.len() == 4  # rows holding only a year, e.g. "2019"
data.loc[condition, 'date'] = '1-Jan-' + data.loc[condition, 'date'].astype(str)  # "2019" -> "1-Jan-2019"

Then call to_datetime on the result.
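Putting the two steps together, a minimal end-to-end sketch could look like this. It assumes, as in the answer, that the DataFrame is named data and that the year-only values are four-character strings (or integers, which astype(str) turns into such strings). After normalization every row shares the "%d-%b-%Y" layout, so an explicit format can be passed to to_datetime.

```python
import pandas as pd

data = pd.DataFrame({"date": [
    "22-Oct-2022", "3-Dec-2019", "27-Jun-2022", "2023", "15-Jul-2017",
    "2019", "7-Sep-2022", "2021", "30-Sep-2022", "17-Aug-2021",
]})

# Year-only values have exactly four characters once converted to str.
condition = data["date"].astype(str).str.len() == 4

# Prefix "1-Jan-" so "2019" becomes "1-Jan-2019", matching the other rows.
data.loc[condition, "date"] = "1-Jan-" + data.loc[condition, "date"].astype(str)

# One explicit format now covers every row.
data["date"] = pd.to_datetime(data["date"], format="%d-%b-%Y")
print(data)
```

Since errors defaults to "raise", an explicit format also makes the conversion fail loudly, rather than silently producing NaT, if some other unexpected layout is present in the column.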