February 18, 2023, 03:40:00

Pandas Sort Two Columns with Day of Year Wrap-Around to New Year

Question

I have data in which the day_of_year sequence wraps around at the start of each year, so the "year" column needs to change to the new year when day_of_year == 1. I have not been able to figure this out and am not sure how to start, so any help is much appreciated. My data looks like this:

Here is my df1:

day_of_year	year	var_1
364	2017	17.71666667
364	2018	5.166666667
364	2019	2
364	2020	1.595833333
364	2021	3.75
364	2022	6.8875
365	2017	14.83333333
365	2018	2.758333333
365	2019	4.108333333
365	2020	5.766666667
365	2021	5.291666667
365	2022	10.58636364
1	2017	2.0125
1	2018	14.0125
1	2019	-0.504166667
1	2020	7.666666667
1	2021	5.520833333
1	2022	1.229166667
2	2017	1.7625
2	2018	15.10416667
2	2019	-0.391666667
2	2020	9.5
2	2021	7.645833333
2	2022	0.9125

After reformatting, I need it to look like the sorted df below, with "n/a" filled in for any data that is expected but missing in a given year. Thank you again.

final df:

day_of_year	year	var_1
364	2017	17.71666667
365	2017	14.83333333
1	2018	14.0125
2	2018	15.10416667
364	2018	5.166666667
365	2018	2.758333333
1	2019	-0.504166667
2	2019	-0.391666667
364	2019	2
365	2019	4.108333333
1	2020	7.666666667
2	2020	9.5
364	2020	1.595833333
365	2020	5.766666667
1	2021	5.520833333
2	2021	7.645833333
364	2021	3.75
365	2021	5.291666667
1	2022	1.229166667
2	2022	0.9125
364	2022	6.8875
365	2022	10.58636364
n/a	n/a	n/a
n/a	n/a	n/a

Answer 1

Score: 1

Why would you change the year based on the day? Just sort by the two columns:

df.sort_values(by=['year', 'day_of_year'])

Output:

    day_of_year  year      var_1
12            1  2017   2.012500
18            2  2017   1.762500
0           364  2017  17.716667
6           365  2017  14.833333
13            1  2018  14.012500
19            2  2018  15.104167
1           364  2018   5.166667
7           365  2018   2.758333
14            1  2019  -0.504167
20            2  2019  -0.391667
2           364  2019   2.000000
8           365  2019   4.108333
15            1  2020   7.666667
21            2  2020   9.500000
3           364  2020   1.595833
9           365  2020   5.766667
16            1  2021   5.520833
22            2  2021   7.645833
4           364  2021   3.750000
10          365  2021   5.291667
17            1  2022   1.229167
23            2  2022   0.912500
5           364  2022   6.887500
11          365  2022  10.586364
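The question also asks for "n/a" rows where data is missing, which a plain sort does not produce. One possible sketch is shown below, assuming that "missing" means a (year, day_of_year) combination that is absent from the frame. The small sample frame, the dropped row, and the names full_index and filled are illustrative only and do not come from the original post.

```python
import pandas as pd

# Small subset of the question's df1; the (day 2, 2018) row is left out on purpose
df = pd.DataFrame({
    "day_of_year": [364, 364, 365, 365, 1, 1, 2],
    "year":        [2017, 2018, 2017, 2018, 2017, 2018, 2017],
    "var_1":       [17.71666667, 5.166666667, 14.83333333, 2.758333333,
                    2.0125, 14.0125, 1.7625],
})

# Every (year, day_of_year) pair we assume should exist,
# ordered by year first and then by day
full_index = pd.MultiIndex.from_product(
    [sorted(df["year"].unique()), sorted(df["day_of_year"].unique())],
    names=["year", "day_of_year"],
)

filled = (
    df.set_index(["year", "day_of_year"])
      .reindex(full_index)  # pairs absent from df become NaN rows
      .reset_index()[["day_of_year", "year", "var_1"]]
)
print(filled)
```

The missing pair shows up with NaN in var_1; if a literal "n/a" string is needed for display, calling fillna("n/a") on the result would do that at the cost of turning the column into object dtype.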

If for some reason you really need to fix the year, use a conditional with mask:

(df.assign(year=df['year'].mask(df['day_of_year'].le(2), df['year'].add(1)))
   .sort_values(by=['year', 'day_of_year'])
)

Or, if you want to increment the year for every row that comes after day_of_year drops from 365 to a lower value (this relies on the rows being in their original order):

(df.assign(year=df['year'].add(df['day_of_year'].diff().lt(0).cumsum()))
   .sort_values(by=['year', 'day_of_year'])
)

Output:

    day_of_year  year      var_1
0           364  2017  17.716667
6           365  2017  14.833333
12            1  2018   2.012500
18            2  2018   1.762500
1           364  2018   5.166667
7           365  2018   2.758333
13            1  2019  14.012500
19            2  2019  15.104167
2           364  2019   2.000000
8           365  2019   4.108333
14            1  2020  -0.504167
20            2  2020  -0.391667
3           364  2020   1.595833
9           365  2020   5.766667
15            1  2021   7.666667
21            2  2021   9.500000
4           364  2021   3.750000
10          365  2021   5.291667
16            1  2022   5.520833
22            2  2022   7.645833
5           364  2022   6.887500
11          365  2022  10.586364
17            1  2023   1.229167
23            2  2023   0.912500
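As a rough check that the two year-shifting variants above agree on this data, the following sketch rebuilds the day_of_year and year columns in the same row order as the question's df1 and compares the results. The variable names (by_mask, wrapped, by_diff, same, result) are illustrative and not part of the original answer.

```python
import pandas as pd

# Same row order as df1 in the question: all 364s, then 365s, then 1s, then 2s
days = [364] * 6 + [365] * 6 + [1] * 6 + [2] * 6
years = list(range(2017, 2023)) * 4
df = pd.DataFrame({"day_of_year": days, "year": years})

# Variant 1: add one to the year wherever day_of_year is 1 or 2
by_mask = df["year"].mask(df["day_of_year"].le(2), df["year"].add(1))

# Variant 2: count how many times day_of_year has dropped so far
# and add that count to the year
wrapped = df["day_of_year"].diff().lt(0).cumsum()
by_diff = df["year"].add(wrapped)

same = bool((by_mask == by_diff).all())
print(same)

result = df.assign(year=by_diff).sort_values(by=["year", "day_of_year"])
print(result.head())
```

The first variant hard-codes which days belong to the new year, while the second depends only on where the day counter drops, so it would give different results if the rows were shuffled or sorted before it is applied.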

Answer 2

Score: 0

I would convert everything to datetime first. Just run:

df['ymd'] = pd.to_datetime(df['day_of_year'].astype(str) + '-' + df['year'].astype(str),
                           format='%j-%Y')

I assign it to column ymd and sort, yielding the following:

>>> df.sort_values('ymd')
    day_of_year  year      var_1        ymd
12            1  2017   2.012500 2017-01-01
18            2  2017   1.762500 2017-01-02
0           364  2017  17.716667 2017-12-30
6           365  2017  14.833333 2017-12-31
13            1  2018  14.012500 2018-01-01
19            2  2018  15.104167 2018-01-02
1           364  2018   5.166667 2018-12-30
7           365  2018   2.758333 2018-12-31
14            1  2019  -0.504167 2019-01-01
20            2  2019  -0.391667 2019-01-02
2           364  2019   2.000000 2019-12-30
8           365  2019   4.108333 2019-12-31
15            1  2020   7.666667 2020-01-01
21            2  2020   9.500000 2020-01-02
3           364  2020   1.595833 2020-12-29
9           365  2020   5.766667 2020-12-30
16            1  2021   5.520833 2021-01-01
22            2  2021   7.645833 2021-01-02
4           364  2021   3.750000 2021-12-30
10          365  2021   5.291667 2021-12-31
17            1  2022   1.229167 2022-01-01
23            2  2022   0.912500 2022-01-02
5           364  2022   6.887500 2022-12-30
11          365  2022  10.586364 2022-12-31
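A self-contained version of the datetime approach might look like the sketch below. It uses a small made-up frame rather than the full df1, and the name ordered is illustrative. It also shows why day 365 of 2020 appears as 2020-12-30 in the output above: 2020 is a leap year, so its last day is day 366.

```python
import pandas as pd

# Tiny illustrative frame spanning two year boundaries
df = pd.DataFrame({
    "day_of_year": [365, 1, 365, 1],
    "year":        [2019, 2020, 2020, 2021],
})

# Build strings like "365-2019" and parse them as day-of-year plus year
df["ymd"] = pd.to_datetime(
    df["day_of_year"].astype(str) + "-" + df["year"].astype(str),
    format="%j-%Y",
)

# Sorting on the real date gives chronological order across year boundaries
ordered = df.sort_values("ymd").reset_index(drop=True)
print(ordered)
```

Sorting on an actual date column avoids reasoning about the wrap-around by hand, although it only helps if the year column is already correct for each row.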
