2023-04-07 03:35:30


Assign a row number of matching dates

Question


I have a dataframe

Work	start Date	work date
A	20/01/2023	30/12/2023
B	17/12/2023	05/12/2023
A	20/01/2023	20/01/2023
B	17/12/2023	07/09/2023
A	20/01/2023	30/12/2023
B	17/12/2023	05/05/2023
B	17/12/2023	17/12/2023
A	20/01/2023	30/06/2023
B	17/12/2023	17/12/2023
C	08/08/2023	17/12/2023
C	08/08/2023	30/06/2023
C	08/08/2023	17/06/2023

Now I need output like the following: if start Date and work date match for a particular Work value, create a row_number column and assign 0 to that row. Values above the start date should get 1, 2, 3, 4, etc., and values below the start date should get -1, -2, -3, etc. in row_number.

Work	start Date	work date	row_number
A	20/01/2023	30/12/2022	-1
B	17/06/2023	05/12/2023	1
A	20/01/2023	20/01/2023	0
B	17/06/2023	07/09/2022	-1
A	20/01/2023	30/12/2023	1
B	17/06/2023	05/05/2023	-2
B	17/06/2023	17/12/2023	2
A	20/01/2023	30/06/2023	2
B	17/06/2023	17/06/2023	0
C	08/08/2023	17/12/2023	1
C	08/08/2023	30/06/2023	-1
C	08/08/2023	17/06/2023	2

def add_values(group):
    match_index = group['Date'].eq(group['sop']).idxmax() # find the index of the matching date
    group['values'] = '' # create a new column to hold the values
    for i in range(len(group)):
        if i == match_index:
            group.loc[i, 'values'] = '0'
        elif i < match_index:
            group.loc[i, 'values'] = '-' + str(match_index - i)
        else:
            group.loc[i, 'values'] = '+' + str(i - match_index)
    return group

result = df.groupby('model').apply(add_values)

Answer 1

Score: 3


I had the same interpretation as @Corralien.

Here is my approach using rank:

import pandas as pd

sd = pd.to_datetime(df['start Date'], dayfirst=True)
wd = pd.to_datetime(df['work date'], dayfirst=True)

# True on rows where the work date equals the start date
m = sd == wd

# dense rank of the work dates per group, shifted so the matching row is 0
df['row_number'] = (wd.groupby(df['Work'])
                      .transform(lambda g: (r:=g.rank(method='dense'))-r[m].squeeze())
                   )

Output:

  Work  start Date   work date  row_number
0    A  20/01/2023  30/12/2022        -1.0
1    B  17/06/2023  05/12/2023         2.0
2    A  20/01/2023  20/01/2023         0.0
3    B  17/06/2023  07/09/2023         1.0
4    A  20/01/2023  30/12/2023         2.0
5    B  17/06/2023  05/05/2023        -1.0
6    B  17/06/2023  17/12/2023         3.0
7    A  20/01/2023  30/06/2023         1.0
8    B  17/06/2023  17/06/2023         0.0
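
The subtraction above relies on `.squeeze()` returning a scalar, which is only the case when exactly one row in the group matches. Here is a minimal sketch of both situations, using made-up ranks for a single three-row group (the values and variable names are illustrative assumptions, not taken from the answer):

```python
import pandas as pd

# Hypothetical dense ranks for one group of three rows
ranks = pd.Series([3.0, 1.0, 2.0])

# Exactly one matching row: squeeze() collapses the one-element Series to a scalar
one_match = ranks[[False, True, False]].squeeze()
with_match = ranks - one_match

# No matching row: squeeze() leaves the empty Series unchanged, and
# subtracting it aligns on the index, so every result is NaN
no_match = ranks[[False, False, False]].squeeze()
without_match = ranks - no_match

print(with_match.tolist())
print(without_match.isna().all())
```

In the second case the whole group ends up as NaN, so groups without a matching date need separate handling.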

Handling groups that have no matching date:

sd = pd.to_datetime(df['start Date'], dayfirst=True)
wd = pd.to_datetime(df['work date'], dayfirst=True)

m = sd == wd

df['row_number'] = (wd.groupby(df['Work'])
                      .transform(lambda g: (r:=g.rank(method='dense'))
                                          -(1 if r[m].empty else r[m].iloc[0])
                                 )
                   )

Output:

   Work  start Date   work date  row_number
0     A  20/01/2023  30/12/2023         2.0
1     B  17/12/2023  05/12/2023        -1.0
2     A  20/01/2023  20/01/2023         0.0
3     B  17/12/2023  07/09/2023        -2.0
4     A  20/01/2023  30/12/2023         2.0
5     B  17/12/2023  05/05/2023        -3.0
6     B  17/12/2023  17/12/2023         0.0
7     A  20/01/2023  30/06/2023         1.0
8     B  17/12/2023  17/12/2023         0.0
9     C  08/08/2023  17/12/2023         2.0
10    C  08/08/2023  30/06/2023         1.0
11    C  08/08/2023  17/06/2023         0.0
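
For readers who find the walrus-in-lambda form hard to follow, the same logic can be sketched with a named function. This is a self-contained restatement under a few assumptions: the sample frame is rebuilt from the question's input table, the names `offset_from_match` and `is_match` are hypothetical, and the dates are parsed with an explicit `format` rather than `dayfirst=True`.

```python
import pandas as pd

# Sample data copied from the question's input table
df = pd.DataFrame({
    "Work": list("ABABABBABCCC"),
    "work date": [
        "30/12/2023", "05/12/2023", "20/01/2023", "07/09/2023",
        "30/12/2023", "05/05/2023", "17/12/2023", "30/06/2023",
        "17/12/2023", "17/12/2023", "30/06/2023", "17/06/2023",
    ],
})
df["start Date"] = df["Work"].map(
    {"A": "20/01/2023", "B": "17/12/2023", "C": "08/08/2023"}
)

sd = pd.to_datetime(df["start Date"], format="%d/%m/%Y")
wd = pd.to_datetime(df["work date"], format="%d/%m/%Y")
is_match = sd.eq(wd)  # True where the work date equals the start date


def offset_from_match(group):
    """Dense rank of the work dates, shifted so the matching date gets 0."""
    ranks = group.rank(method="dense")
    matched = ranks[is_match.loc[group.index]]
    # With no matching date, fall back to the lowest rank (1) as the zero point
    zero_point = matched.iloc[0] if not matched.empty else 1
    return ranks - zero_point


df["row_number"] = (
    wd.groupby(df["Work"]).transform(offset_from_match).astype(int)
)
print(df)
```

The explicit `is_match.loc[group.index]` restricts the mask to the current group instead of relying on index alignment inside the lambda, and the final `astype(int)` drops the `.0` shown in the output above.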

Answer 2

Score: 2


> If there is no matching date for value C but it has a start date, then it prints a negative value for all rows. I need a positive value for values above the start date and a negative value for values below it, even when there is no matching date.

You can use:

import numpy as np
import pandas as pd

# Convert to datetime if not already the case
df['start Date'] = pd.to_datetime(df['start Date'], dayfirst=True)
df['work date'] = pd.to_datetime(df['work date'], dayfirst=True)

def row_num(df):
    # signed day difference between work date and start date
    days = df['work date'].sub(df['start Date']).dt.days
    same = days == 0
    before = days < 0
    after = days > 0
    # rows arrive sorted by work date, so number them by position
    days[before] = np.arange(-before.sum(), 0, 1)
    days[after] = np.arange(1, after.sum()+1, 1)
    return days

df['row_number'] = (df.sort_values('work date')
                      .groupby(['Work', 'start Date'], as_index=False)
                      .apply(row_num).droplevel(0))

Output:

>>> df.sort_values(['Work', 'row_number'])
   Work start Date  work date  row_number
0     A 2023-01-20 2022-12-30          -1
2     A 2023-01-20 2023-01-20           0
7     A 2023-01-20 2023-06-30           1
4     A 2023-01-20 2023-12-30           2
3     B 2023-06-17 2022-09-07          -2
5     B 2023-06-17 2023-05-05          -1
8     B 2023-06-17 2023-06-17           0
1     B 2023-06-17 2023-12-05           1
6     B 2023-06-17 2023-12-17           2
11    C 2023-08-08 2023-06-17          -2
10    C 2023-08-08 2023-06-30          -1
9     C 2023-08-08 2023-12-17           1
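
One property of numbering with `np.arange` is worth keeping in mind: rows are numbered by position after sorting, so two rows with the same work date receive different numbers, whereas a dense rank gives them the same number. A small sketch with made-up day offsets (the values are illustrative assumptions) shows the difference:

```python
import numpy as np
import pandas as pd

# Hypothetical day offsets after the start date, already sorted; the last two tie
days = pd.Series([161, 344, 344])

# Position-based numbering: every row gets its own number
positional = np.arange(1, len(days) + 1)

# Dense ranking: tied values share a number
dense = days.rank(method="dense").astype(int)

print(positional.tolist())
print(dense.tolist())
```

Which behaviour is wanted depends on whether repeated work dates should count as separate steps or as one.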

Another solution with rank

You can use rank if you prefer, but you have to handle work dates before and after the start date separately:

# Convert to datetime if not already the case
df['start Date'] = pd.to_datetime(df['start Date'], dayfirst=True)
df['work date'] = pd.to_datetime(df['work date'], dayfirst=True)

def row_num(x):
    before = x < 0
    after = x > 0
    # rank negative and positive day offsets separately; zeros stay 0
    return pd.concat([-x[before].rank(method='dense', ascending=False),
                      x[~before&~after],
                      x[after].rank(method='dense')]).astype(int)

df['row_number'] = (df.assign(days=df['work date'].sub(df['start Date']).dt.days)
                      .groupby(['Work', 'start Date'])['days'].transform(row_num))

Output:

>>> df.sort_values(['Work', 'row_number'])
   Work start Date  work date  row_number
0     A 2023-01-20 2022-12-30          -1
2     A 2023-01-20 2023-01-20           0
7     A 2023-01-20 2023-06-30           1
4     A 2023-01-20 2023-12-30           2
3     B 2023-06-17 2022-09-07          -2
5     B 2023-06-17 2023-05-05          -1
8     B 2023-06-17 2023-06-17           0
1     B 2023-06-17 2023-12-05           1
6     B 2023-06-17 2023-12-17           2
11    C 2023-08-08 2023-06-17          -2
10    C 2023-08-08 2023-06-30          -1
9     C 2023-08-08 2023-12-17           1
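
As a variant of the same idea, the two ranks can also be computed with the built-in `GroupBy.rank` instead of a custom `transform` function. This is a sketch rather than the answer's own code: the frame is rebuilt from the output table above, the names `before_rank` and `after_rank` are hypothetical, and dates are parsed with an explicit `format`.

```python
import pandas as pd

# Sample data copied from the output table above
df = pd.DataFrame({
    "Work": list("ABABABBABCCC"),
    "work date": [
        "30/12/2022", "05/12/2023", "20/01/2023", "07/09/2022",
        "30/12/2023", "05/05/2023", "17/12/2023", "30/06/2023",
        "17/06/2023", "17/12/2023", "30/06/2023", "17/06/2023",
    ],
})
df["start Date"] = df["Work"].map(
    {"A": "20/01/2023", "B": "17/06/2023", "C": "08/08/2023"}
)

sd = pd.to_datetime(df["start Date"], format="%d/%m/%Y")
wd = pd.to_datetime(df["work date"], format="%d/%m/%Y")
days = (wd - sd).dt.days  # signed offset from the start date
keys = [df["Work"], sd]

# Mask out the other side with NaN so each rank only sees its own rows
before_rank = (days.where(days < 0)
                   .groupby(keys).rank(method="dense", ascending=False))
after_rank = days.where(days > 0).groupby(keys).rank(method="dense")

# Rows on the start date are NaN in both ranks and therefore become 0
df["row_number"] = after_rank.fillna(0).sub(before_rank.fillna(0)).astype(int)
print(df.sort_values(["Work", "row_number"]))
```

Because rows on the start date simply fall out as 0, groups with no matching date (such as C) need no special case.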
