How to get a unique week number for start and end dates across multiple years - Pandas

Question

I have a dataframe in which two columns hold the start and end date of each data record. The dates span multiple years. My goal is to add a new column that gives the time step of the record in each row. Since I also have a location column, some of these weeks will repeat.

import numpy as np
import pandas as pd

# 20 weekly dates (Sundays), starting with the first Sunday on or after 2021-11-11
dates = pd.date_range(start='2021-11-11', periods=20, freq='W')

# One row per country for every week
df = pd.DataFrame({
    'start_date': np.repeat(dates, 5),
    'end_date': np.repeat(dates + pd.DateOffset(days=6), 5),
    'country': ['USA', 'Canada', 'UK', 'Australia', 'Russia'] * 20
})

df = df.sort_values("start_date")


	start_date	end_date	country	
0	2021-11-14	2021-11-20	USA	
1	2021-11-14	2021-11-20	Canada
2	2021-11-14	2021-11-20	UK
3	2021-11-14	2021-11-20	Australia	
4	2021-11-14	2021-11-20	Russia	

I can get the week number using isocalendar().week, but that gives the week number within the corresponding year. What I want is a number relative to the data frame: if 2021-11-14 to 2021-11-20 is the first week in the data frame, it should get 1. The data may skip the next week and have another record starting from 2021-11-27. That time step should then be the second week in the data frame.

Answer 1

Score: 2

IIUC, you can use groupby(...).ngroup(), which numbers each distinct start_date group:

df['week'] = df.groupby(df['start_date']).ngroup().add(1)
print(df)

# Output
   start_date   end_date    country  week
0  2021-11-14 2021-11-20        USA     1
1  2021-11-14 2021-11-20     Canada     1
2  2021-11-14 2021-11-20         UK     1
3  2021-11-14 2021-11-20  Australia     1
4  2021-11-14 2021-11-20     Russia     1
..        ...        ...        ...   ...
98 2022-03-27 2022-04-02  Australia    20
95 2022-03-27 2022-04-02        USA    20
96 2022-03-27 2022-04-02     Canada    20
97 2022-03-27 2022-04-02         UK    20
99 2022-03-27 2022-04-02     Russia    20

[100 rows x 4 columns]
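
To see how this handles the skipped-week case from the question, here is a minimal sketch on made-up data (the `demo` frame and its dates are assumptions for illustration, not part of the original post). The week starting 2021-11-21 is missing and the rows are deliberately out of order:

```python
import pandas as pd

# Hypothetical data: the week starting 2021-11-21 is absent, and the rows
# are not in chronological order.
demo = pd.DataFrame({
    "start_date": pd.to_datetime(
        ["2022-01-02", "2021-11-14", "2021-11-28", "2021-11-14", "2021-11-28"]
    ),
    "country": ["USA", "USA", "USA", "Canada", "Canada"],
})

# groupby sorts the group keys by default (sort=True), so ngroup() numbers
# the distinct start dates chronologically from 0; add(1) makes it 1-based.
demo["week"] = demo.groupby("start_date").ngroup().add(1)
print(demo)
```

Because the numbering follows the sorted distinct dates, the missing week does not leave a gap, and the result does not depend on the row order.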

Alternative with pd.factorize (only if the dataframe is already sorted by start_date):

df['week'] = pd.factorize(df['start_date'])[0] + 1
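
If the frame might not be sorted, one option is to pass sort=True to pd.factorize so the codes follow the sorted unique values rather than the order of first appearance. A small sketch on assumed data (the `demo` frame is hypothetical):

```python
import pandas as pd

# Hypothetical unsorted data
demo = pd.DataFrame({
    "start_date": pd.to_datetime(
        ["2021-11-28", "2021-11-14", "2021-11-28", "2022-01-02"]
    ),
})

# Default: codes are assigned in order of first appearance
by_appearance = pd.factorize(demo["start_date"])[0] + 1

# sort=True: unique values are sorted first, so codes are chronological
by_date = pd.factorize(demo["start_date"], sort=True)[0] + 1

print(by_appearance.tolist())
print(by_date.tolist())
```

With sort=True the result should match the groupby/ngroup numbering regardless of row order.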

Answer 2

Score: 1

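You can take the ISO week number of start_date and subtract the minimum week from it, then add 1 so that the first week is labeled 1. Series.dt.week was removed in pandas 2.0, so use dt.isocalendar().week instead. Because ISO week numbers restart every year, this only gives correct steps when all dates fall within a single ISO year, which is not the case for the sample data in the question.

df['start_date'] = pd.to_datetime(df['start_date'])
# ISO week number of each start date (restarts at 1 every year)
week = df['start_date'].dt.isocalendar().week
df['time_step'] = week - week.min() + 1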
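
For data that spans a year boundary, one possible variant is to count whole weeks elapsed since the earliest start date instead of relying on the week-of-year number. This is a sketch under the assumption that every start_date falls on the same weekday; the `demo` frame below is built for illustration from the same date range as the question. Unlike the ngroup approach, a skipped week leaves a gap in the numbering here.

```python
import pandas as pd

# Same weekly range as in the question: 20 Sundays from 2021-11-14
dates = pd.date_range(start="2021-11-11", periods=20, freq="W")
demo = pd.DataFrame({"start_date": dates})

# ISO week numbers restart around the new year, so they are not
# monotonic over this range.
iso_week = demo["start_date"].dt.isocalendar().week

# Whole weeks elapsed since the earliest start date, made 1-based
elapsed_days = (demo["start_date"] - demo["start_date"].min()).dt.days
demo["time_step"] = elapsed_days // 7 + 1

print(demo.assign(iso_week=iso_week))
```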

huangapple
  • Posted on June 19, 2023, 22:12:03
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76507449.html