Assign a row number of matching dates

Question

I have a dataframe:

Work  start Date  work date
A     20/01/2023  30/12/2023
B     17/12/2023  05/12/2023
A     20/01/2023  20/01/2023
B     17/12/2023  07/09/2023
A     20/01/2023  30/12/2023
B     17/12/2023  05/05/2023
B     17/12/2023  17/12/2023
A     20/01/2023  30/06/2023
B     17/12/2023  17/12/2023
C     08/08/2023  17/12/2023
C     08/08/2023  30/06/2023
C     08/08/2023  17/06/2023

I need output like the table below. Within each Work value, if start Date and work date match, create a row_number column and assign 0 to that row. Rows with a work date after the start date should get 1, 2, 3, 4, etc., and rows with a work date before the start date should get -1, -2, -3, etc. in row_number.

Work  start Date  work date   row_number
A     20/01/2023  30/12/2022  -1
B     17/06/2023  05/12/2023  1
A     20/01/2023  20/01/2023  0
B     17/06/2023  07/09/2022  -1
A     20/01/2023  30/12/2023  1
B     17/06/2023  05/05/2023  -2
B     17/06/2023  17/12/2023  2
A     20/01/2023  30/06/2023  2
B     17/06/2023  17/06/2023  0
C     08/08/2023  17/12/2023  1
C     08/08/2023  30/06/2023  -1
C     08/08/2023  17/06/2023  2

My attempt so far:

def add_values(group):
    match_index = group['Date'].eq(group['sop']).idxmax() # find the index of the matching date
    group['values'] = '' # create a new column to hold the values
    for i in range(len(group)):
        if i == match_index:
            group.loc[i, 'values'] = '0'
        elif i < match_index:
            group.loc[i, 'values'] = '-' + str(match_index - i)
        else:
            group.loc[i, 'values'] = '+' + str(i - match_index)
    return group

result = df.groupby('model').apply(add_values)

Answer 1

Score: 3

I had the same interpretation as @Corralien.

Here is my approach using rank:

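import pandas as pd

sd = pd.to_datetime(df['start Date'], dayfirst=True)
wd = pd.to_datetime(df['work date'], dayfirst=True)

# True where the work date equals the start date
m = sd == wd

# Dense rank of the work dates per group, shifted so the matching row gets 0
# (assumes exactly one matching row in every group)
df['row_number'] = (wd.groupby(df['Work'])
                      .transform(lambda g: (r:=g.rank(method='dense'))-r[m].squeeze())
                   )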

Output:

  Work  start Date   work date  row_number
0    A  20/01/2023  30/12/2022        -1.0
1    B  17/06/2023  05/12/2023         2.0
2    A  20/01/2023  20/01/2023         0.0
3    B  17/06/2023  07/09/2023         1.0
4    A  20/01/2023  30/12/2023         2.0
5    B  17/06/2023  05/05/2023        -1.0
6    B  17/06/2023  17/12/2023         3.0
7    A  20/01/2023  30/06/2023         1.0
8    B  17/06/2023  17/06/2023         0.0
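
To see why subtracting the matching row's rank gives the signed numbering, here is a minimal sketch on a single group. The four dates are the "A" rows from the output above; the names `start`, `anchor` and `row_number` are only illustrative and not part of the answer's code.

```python
import pandas as pd

# Work dates of one group (the "A" rows), parsed day-first.
wd = pd.to_datetime(
    pd.Series(["30/12/2022", "20/01/2023", "30/12/2023", "30/06/2023"]),
    dayfirst=True,
)
start = pd.Timestamp("2023-01-20")  # start date shared by the whole group

r = wd.rank(method="dense")       # chronological position: 1.0, 2.0, 4.0, 3.0
anchor = r[wd == start].iloc[0]   # rank of the row whose work date equals the start date
row_number = (r - anchor).astype(int)

print(row_number.tolist())  # [-1, 0, 2, 1]
```

Because rank returns floats, the column in the answer is float; the `astype(int)` here is an optional extra step.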

Handling groups that have no matching date:

sd = pd.to_datetime(df['start Date'], dayfirst=True)
wd = pd.to_datetime(df['work date'], dayfirst=True)

m = sd == wd

df['row_number'] = (wd.groupby(df['Work'])
                      .transform(lambda g: (r:=g.rank(method='dense'))
                                          -(1 if r[m].empty else r[m].iloc[0])
                                 )
                   )

Output:

   Work  start Date   work date  row_number
0     A  20/01/2023  30/12/2023         2.0
1     B  17/12/2023  05/12/2023        -1.0
2     A  20/01/2023  20/01/2023         0.0
3     B  17/12/2023  07/09/2023        -2.0
4     A  20/01/2023  30/12/2023         2.0
5     B  17/12/2023  05/05/2023        -3.0
6     B  17/12/2023  17/12/2023         0.0
7     A  20/01/2023  30/06/2023         1.0
8     B  17/12/2023  17/12/2023         0.0
9     C  08/08/2023  17/12/2023         2.0
10    C  08/08/2023  30/06/2023         1.0
11    C  08/08/2023  17/06/2023         0.0
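
As a small check of what the fallback does, the sketch below runs the same expression on group "C" alone, where no work date equals the start date. The variable names `offset` and `row_number` are mine; the data are the three "C" rows from the output above.

```python
import pandas as pd

df = pd.DataFrame({
    "Work": ["C", "C", "C"],
    "start Date": ["08/08/2023"] * 3,
    "work date": ["17/12/2023", "30/06/2023", "17/06/2023"],
})
sd = pd.to_datetime(df["start Date"], dayfirst=True)
wd = pd.to_datetime(df["work date"], dayfirst=True)
m = sd == wd  # all False: this group has no matching row

r = wd.rank(method="dense")                  # 3.0, 2.0, 1.0
offset = 1 if r[m].empty else r[m].iloc[0]   # no match, so fall back to 1
row_number = (r - offset).astype(int)

print(row_number.tolist())  # [2, 1, 0]
```

So in a group without a match the earliest work date gets 0 and the rest count up from there; the numbers are not signed relative to the start date in that case.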

Answer 2

Score: 2

> If there is no matching date for value C but it has a start date, it prints negative values for all rows. I need positive values for work dates after the start date and negative values for work dates before it, even when there is no matching date.

You can use:

import numpy as np
import pandas as pd

# Convert to datetime if not already the case
df['start Date'] = pd.to_datetime(df['start Date'], dayfirst=True)
df['work date'] = pd.to_datetime(df['work date'], dayfirst=True)

def row_num(df):
    # Signed day offset from the start date; rows arrive sorted by work date
    days = df['work date'].sub(df['start Date']).dt.days
    before = days < 0
    after = days > 0
    # Rows with days == 0 keep 0; the others are numbered by position
    days[before] = np.arange(-before.sum(), 0, 1)
    days[after] = np.arange(1, after.sum()+1, 1)
    return days

df['row_number'] = (df.sort_values('work date')
                      .groupby(['Work', 'start Date'], as_index=False)
                      .apply(row_num).droplevel(0))

Output:

>>> df.sort_values(['Work', 'row_number'])
   Work start Date  work date  row_number
0     A 2023-01-20 2022-12-30          -1
2     A 2023-01-20 2023-01-20           0
7     A 2023-01-20 2023-06-30           1
4     A 2023-01-20 2023-12-30           2
3     B 2023-06-17 2022-09-07          -2
5     B 2023-06-17 2023-05-05          -1
8     B 2023-06-17 2023-06-17           0
1     B 2023-06-17 2023-12-05           1
6     B 2023-06-17 2023-12-17           2
11    C 2023-08-08 2023-06-17          -2
10    C 2023-08-08 2023-06-30          -1
9     C 2023-08-08 2023-12-17           1
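
For reference, here is a self-contained sketch of the same positional numbering that computes the day offset first and then uses `groupby(...).transform` on that single column. This is a variant, not the code above: I am assuming it is preferable on newer pandas versions, which deprecate having `groupby.apply` operate on the grouping columns. The helper name `number_sorted` and the temporary frame `tmp` are mine; the data are the rows shown in the output above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Work": list("ABABABBABCCC"),
    "start Date": ["20/01/2023", "17/06/2023", "20/01/2023", "17/06/2023",
                   "20/01/2023", "17/06/2023", "17/06/2023", "20/01/2023",
                   "17/06/2023", "08/08/2023", "08/08/2023", "08/08/2023"],
    "work date": ["30/12/2022", "05/12/2023", "20/01/2023", "07/09/2022",
                  "30/12/2023", "05/05/2023", "17/12/2023", "30/06/2023",
                  "17/06/2023", "17/12/2023", "30/06/2023", "17/06/2023"],
})
df["start Date"] = pd.to_datetime(df["start Date"], dayfirst=True)
df["work date"] = pd.to_datetime(df["work date"], dayfirst=True)

def number_sorted(days):
    # `days` holds one group's signed day offsets, already in chronological order.
    out = days.copy()
    before, after = days < 0, days > 0
    out[before] = np.arange(-before.sum(), 0)     # ..., -2, -1
    out[after] = np.arange(1, after.sum() + 1)    # 1, 2, ...
    return out                                    # offsets equal to 0 stay 0

# Sort once so every group is seen in chronological order.
tmp = (df.assign(days=(df["work date"] - df["start Date"]).dt.days)
         .sort_values("work date"))
# Assignment aligns on the index, so the original row order is kept.
df["row_number"] = tmp.groupby(["Work", "start Date"])["days"].transform(number_sorted)

print(df.sort_values(["Work", "row_number"]))
```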

Another solution with rank

You can use rank if you prefer, but you have to rank the work dates before the start date and those after it separately:

# Convert to datetime if not already the case
df['start Date'] = pd.to_datetime(df['start Date'], dayfirst=True)
df['work date'] = pd.to_datetime(df['work date'], dayfirst=True)

def row_num(x):
    before = x < 0
    after = x > 0
    return pd.concat([-x[before].rank(method='dense', ascending=False),
                      x[~before&~after],
                      x[after].rank(method='dense')]).astype(int)

df['row_number'] = (df.assign(days=df['work date'].sub(df['start Date']).dt.days)
                      .groupby(['Work', 'start Date'])['days'].transform(row_num))

Output:

>>> df.sort_values(['Work', 'row_number'])
   Work start Date  work date  row_number
0     A 2023-01-20 2022-12-30          -1
2     A 2023-01-20 2023-01-20           0
7     A 2023-01-20 2023-06-30           1
4     A 2023-01-20 2023-12-30           2
3     B 2023-06-17 2022-09-07          -2
5     B 2023-06-17 2023-05-05          -1
8     B 2023-06-17 2023-06-17           0
1     B 2023-06-17 2023-12-05           1
6     B 2023-06-17 2023-12-17           2
11    C 2023-08-08 2023-06-17          -2
10    C 2023-08-08 2023-06-30          -1
9     C 2023-08-08 2023-12-17           1
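
The two solutions give the same result on this data, but they differ when a group contains the same work date more than once. The sketch below uses made-up day offsets for one group (not taken from the question) to show the difference; the helper names `positional` and `dense_signed` are mine.

```python
import numpy as np
import pandas as pd

# Hypothetical signed day offsets for one group, in chronological order.
# The offset 10 appears twice, i.e. two rows share a work date.
days = pd.Series([-30, -5, 0, 10, 10, 40])

def positional(d):
    # Numbering by position, as in the np.arange solution.
    out = d.copy()
    before, after = d < 0, d > 0
    out[before] = np.arange(-before.sum(), 0)
    out[after] = np.arange(1, after.sum() + 1)
    return out

def dense_signed(d):
    # Dense ranking on each side of zero, as in the rank solution.
    before, after = d < 0, d > 0
    parts = [-d[before].rank(method="dense", ascending=False),
             d[~before & ~after],
             d[after].rank(method="dense")]
    return pd.concat(parts).astype(int).reindex(d.index)

print(positional(days).tolist())    # [-2, -1, 0, 1, 2, 3]
print(dense_signed(days).tolist())  # [-2, -1, 0, 1, 1, 2]
```

With positional numbering, duplicated dates get consecutive numbers; with dense ranking they share one. Which one is right depends on what row_number should mean for repeated dates, which the question does not specify.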

huangapple
  • Published on April 7, 2023 at 03:35:30
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75953174.html