Assign a row number of matching dates
Question
I have a DataFrame:

Work | start Date | work date |
---|---|---|
A | 20/01/2023 | 30/12/2023 |
B | 17/12/2023 | 05/12/2023 |
A | 20/01/2023 | 20/01/2023 |
B | 17/12/2023 | 07/09/2023 |
A | 20/01/2023 | 30/12/2023 |
B | 17/12/2023 | 05/05/2023 |
B | 17/12/2023 | 17/12/2023 |
A | 20/01/2023 | 30/06/2023 |
B | 17/12/2023 | 17/12/2023 |
C | 08/08/2023 | 17/12/2023 |
C | 08/08/2023 | 30/06/2023 |
C | 08/08/2023 | 17/06/2023 |
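
To make the example reproducible, the table above can be built roughly as follows. This is a minimal sketch: the variable names `df` and `parsed` are assumptions, and the dates are kept as day-first strings exactly as shown in the table.

```python
import pandas as pd

# Sample data copied from the table above; dates are day-first strings.
df = pd.DataFrame({
    "Work": ["A", "B", "A", "B", "A", "B", "B", "A", "B", "C", "C", "C"],
    "start Date": ["20/01/2023", "17/12/2023", "20/01/2023", "17/12/2023",
                   "20/01/2023", "17/12/2023", "17/12/2023", "20/01/2023",
                   "17/12/2023", "08/08/2023", "08/08/2023", "08/08/2023"],
    "work date": ["30/12/2023", "05/12/2023", "20/01/2023", "07/09/2023",
                  "30/12/2023", "05/05/2023", "17/12/2023", "30/06/2023",
                  "17/12/2023", "17/12/2023", "30/06/2023", "17/06/2023"],
})

# Parse with dayfirst=True so "05/12/2023" is read as 5 December, not 12 May.
parsed = pd.to_datetime(df["work date"], dayfirst=True)
print(parsed.iloc[1])
```

Comparing the dates as parsed datetimes rather than as raw strings is what makes "before" and "after" the start date well defined.
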

I need output like the table below. If `start Date` and `work date` match within a `Work` group, create a `row_number` column and assign 0 to that row. Rows with a work date after the start date should get 1, 2, 3, 4, etc., and rows with a work date before the start date should get -1, -2, -3, etc. in `row_number`:

Work | start Date | work date | row_number |
---|---|---|---|
A | 20/01/2023 | 30/12/2022 | -1 |
B | 17/06/2023 | 05/12/2023 | 1 |
A | 20/01/2023 | 20/01/2023 | 0 |
B | 17/06/2023 | 07/09/2022 | -1 |
A | 20/01/2023 | 30/12/2023 | 1 |
B | 17/06/2023 | 05/05/2023 | -2 |
B | 17/06/2023 | 17/12/2023 | 2 |
A | 20/01/2023 | 30/06/2023 | 2 |
B | 17/06/2023 | 17/06/2023 | 0 |
C | 08/08/2023 | 17/12/2023 | 1 |
C | 08/08/2023 | 30/06/2023 | -1 |
C | 08/08/2023 | 17/06/2023 | 2 |

This is what I tried:

    def add_values(group):
        match_index = group['Date'].eq(group['sop']).idxmax()  # find the index of the matching date
        group['values'] = ''  # create a new column to hold the values
        for i in range(len(group)):
            if i == match_index:
                group.loc[i, 'values'] = '0'
            elif i < match_index:
                group.loc[i, 'values'] = '-' + str(match_index - i)
            else:
                group.loc[i, 'values'] = '+' + str(i - match_index)
        return group

    result = df.groupby('model').apply(add_values)

Answer 1
Score: 3
I had the same interpretation as @Corralien.

Here is my approach using `rank`:

    import pandas as pd

    sd = pd.to_datetime(df['start Date'], dayfirst=True)
    wd = pd.to_datetime(df['work date'], dayfirst=True)
    m = sd == wd  # True where the work date equals the start date

    # dense-rank the work dates per group, then subtract the rank of the matching row
    df['row_number'] = (wd.groupby(df['Work'])
                          .transform(lambda g: (r:=g.rank(method='dense'))-r[m].squeeze())
                       )

Output:

      Work  start Date   work date  row_number
    0    A  20/01/2023  30/12/2022        -1.0
    1    B  17/06/2023  05/12/2023         2.0
    2    A  20/01/2023  20/01/2023         0.0
    3    B  17/06/2023  07/09/2023         1.0
    4    A  20/01/2023  30/12/2023         2.0
    5    B  17/06/2023  05/05/2023        -1.0
    6    B  17/06/2023  17/12/2023         3.0
    7    A  20/01/2023  30/06/2023         1.0
    8    B  17/06/2023  17/06/2023         0.0

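The rank-and-subtract idea can be followed on a single group. The sketch below uses the four work dates of group A from the output above; the names `start`, `r` and `offset` are illustrative assumptions, not part of the answer's code.

```python
import pandas as pd

# Work dates of group A from the output above (day-first strings).
wd = pd.to_datetime(
    pd.Series(["30/12/2022", "20/01/2023", "30/12/2023", "30/06/2023"]),
    dayfirst=True,
)
start = pd.Timestamp("2023-01-20")  # start date shared by every row of group A

# Dense rank: 1 for the earliest distinct date, 2 for the next, and so on.
r = wd.rank(method="dense")

# Subtract the rank of the row whose work date equals the start date,
# so that row becomes 0 and the others become signed offsets.
offset = r - r[wd == start].iloc[0]
print(offset.tolist())  # [-1.0, 0.0, 2.0, 1.0]
```

Because the rank is dense, repeated work dates share one rank and no numbers are skipped.
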
Handling groups that have no row where the two dates are equal:

    sd = pd.to_datetime(df['start Date'], dayfirst=True)
    wd = pd.to_datetime(df['work date'], dayfirst=True)
    m = sd == wd
    df['row_number'] = (wd.groupby(df['Work'])
                          .transform(lambda g: (r:=g.rank(method='dense'))
                                               -(1 if r[m].empty else r[m].iloc[0])
                                    )
                       )

Output:

       Work  start Date   work date  row_number
    0     A  20/01/2023  30/12/2023         2.0
    1     B  17/12/2023  05/12/2023        -1.0
    2     A  20/01/2023  20/01/2023         0.0
    3     B  17/12/2023  07/09/2023        -2.0
    4     A  20/01/2023  30/12/2023         2.0
    5     B  17/12/2023  05/05/2023        -3.0
    6     B  17/12/2023  17/12/2023         0.0
    7     A  20/01/2023  30/06/2023         1.0
    8     B  17/12/2023  17/12/2023         0.0
    9     C  08/08/2023  17/12/2023         2.0
    10    C  08/08/2023  30/06/2023         1.0
    11    C  08/08/2023  17/06/2023         0.0

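The `.0` suffix in these outputs appears because `rank` returns floats. The sketch below is restricted to group C from the question and uses a hypothetical helper named `offsets` (not part of the answer) to spell out the no-match fallback and cast the result to integers.

```python
import pandas as pd

# Group C only: no work date equals the start date.
df = pd.DataFrame({
    "Work": ["C", "C", "C"],
    "start Date": ["08/08/2023"] * 3,
    "work date": ["17/12/2023", "30/06/2023", "17/06/2023"],
})
sd = pd.to_datetime(df["start Date"], dayfirst=True)
wd = pd.to_datetime(df["work date"], dayfirst=True)
m = sd == wd  # all False here


def offsets(g):
    """Dense rank of g minus the rank of the matching row (or 1 if none)."""
    r = g.rank(method="dense")
    hit = r[m.loc[g.index]]  # ranks of rows where work date == start date
    base = 1 if hit.empty else hit.iloc[0]
    return r - base


df["row_number"] = wd.groupby(df["Work"]).transform(offsets).astype(int)
print(df["row_number"].tolist())  # [2, 1, 0]
```

With this fallback, the earliest work date in a group without a match becomes 0 even though it falls before the start date, which is worth keeping in mind when negative values are expected there.
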
Answer 2
Score: 2
> If there is no matching date for value C but it has a start date, it prints negative values for all rows. I need positive values for dates after the start date and negative values for dates before it, even when there is no matching date.

You can use:

    import numpy as np
    import pandas as pd

    # Convert to datetime if not already the case
    df['start Date'] = pd.to_datetime(df['start Date'], dayfirst=True)
    df['work date'] = pd.to_datetime(df['work date'], dayfirst=True)

    def row_num(df):
        days = df['work date'].sub(df['start Date']).dt.days
        before = days < 0
        after = days > 0
        days[before] = np.arange(-before.sum(), 0, 1)  # -n, ..., -1
        days[after] = np.arange(1, after.sum()+1, 1)   # 1, ..., n
        return days

    df['row_number'] = (df.sort_values('work date')
                          .groupby(['Work', 'start Date'], as_index=False)
                          .apply(row_num).droplevel(0))

Output:

    >>> df.sort_values(['Work', 'row_number'])
       Work start Date  work date  row_number
    0     A 2023-01-20 2022-12-30          -1
    2     A 2023-01-20 2023-01-20           0
    7     A 2023-01-20 2023-06-30           1
    4     A 2023-01-20 2023-12-30           2
    3     B 2023-06-17 2022-09-07          -2
    5     B 2023-06-17 2023-05-05          -1
    8     B 2023-06-17 2023-06-17           0
    1     B 2023-06-17 2023-12-05           1
    6     B 2023-06-17 2023-12-17           2
    11    C 2023-08-08 2023-06-17          -2
    10    C 2023-08-08 2023-06-30          -1
    9     C 2023-08-08 2023-12-17           1

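The `np.arange` numbering can be traced on group C alone, which has no matching row. This is a small sketch with illustrative names (`start`, `work`); the dates are the ones shown for C in the output above, already sorted by work date.

```python
import numpy as np
import pandas as pd

start = pd.Timestamp("2023-08-08")
work = pd.Series(pd.to_datetime(["2023-06-17", "2023-06-30", "2023-12-17"]))

# Signed distance in days between each work date and the start date.
days = (work - start).dt.days

# Build both masks before overwriting any values.
before = days < 0
after = days > 0

# Count backwards for earlier dates: -n, ..., -1.
days[before] = np.arange(-before.sum(), 0)
# Count forwards for later dates: 1, ..., n.
days[after] = np.arange(1, after.sum() + 1)

print(days.tolist())  # [-2, -1, 1]
```

Rows with a difference of 0 days are left untouched, so a matching row keeps 0 when one exists, and the numbering still works when there is none.
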
**Another solution with `rank`**

You can use `rank` if you prefer, but you have to rank the work dates before and after the start date separately:

    # Convert to datetime if not already the case
    df['start Date'] = pd.to_datetime(df['start Date'], dayfirst=True)
    df['work date'] = pd.to_datetime(df['work date'], dayfirst=True)

    def row_num(x):
        before = x < 0
        after = x > 0
        return pd.concat([-x[before].rank(method='dense', ascending=False),
                          x[~before&~after],
                          x[after].rank(method='dense')]).astype(int)

    df['row_number'] = (df.assign(days=df['work date'].sub(df['start Date']).dt.days)
                          .groupby(['Work', 'start Date'])['days'].transform(row_num))

Output:

    >>> df.sort_values(['Work', 'row_number'])
       Work start Date  work date  row_number
    0     A 2023-01-20 2022-12-30          -1
    2     A 2023-01-20 2023-01-20           0
    7     A 2023-01-20 2023-06-30           1
    4     A 2023-01-20 2023-12-30           2
    3     B 2023-06-17 2022-09-07          -2
    5     B 2023-06-17 2023-05-05          -1
    8     B 2023-06-17 2023-06-17           0
    1     B 2023-06-17 2023-12-05           1
    6     B 2023-06-17 2023-12-17           2
    11    C 2023-08-08 2023-06-17          -2
    10    C 2023-08-08 2023-06-30          -1
    9     C 2023-08-08 2023-12-17           1