How to use "rank" or similar grouping functions with conditions in R

Question

I am trying to perform a ranking or grouping in R in which consecutive records share the same rank when both of the following hold:

  • The "Object" field of the next record is equal to the "Object" field of the current record
  • The date of the next record is sequential (it is the following day)

Example data:

OBJECT DATE
PRODUCT1 01/02/2023
PRODUCT1 02/02/2023
PRODUCT1 21/02/2023
PRODUCT2 07/02/2023
PRODUCT2 09/02/2023
PRODUCT2 10/02/2023
PRODUCT2 11/02/2023
PRODUCT2 23/02/2023

I am using the following code but the result is not correct:

library(plyr)
ddply(df, .(object), transform, rank = (seq_along(date)))

The expected result is similar to the following table:

OBJECT DATE RANK
PRODUCT1 01/02/2023 1
PRODUCT1 02/02/2023 1
PRODUCT1 21/02/2023 2
PRODUCT2 07/02/2023 1
PRODUCT2 09/02/2023 2
PRODUCT2 10/02/2023 2
PRODUCT2 11/02/2023 2
PRODUCT2 23/02/2023 3

I appreciate your help in solving this question.

Answer 1

Score: 1

Group the data by OBJECT and order it by DATE within each group. The wanted value starts at 1 in each group and is incremented by 1 whenever the gap between the previous row and the current row is greater than 1 day.
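
To see how the increment is computed, here is a minimal base R sketch on the PRODUCT2 dates alone. `diff()` gives the gap in days between neighbouring dates, the comparison flags gaps longer than one day, and `cumsum()` turns those flags into a running number. The variable names `dates`, `gaps`, and `run_id` are illustrative only and are not part of the original answer.

```r
# Dates for PRODUCT2 from the example, already sorted
dates <- as.Date(c("2023-02-07", "2023-02-09", "2023-02-10",
                   "2023-02-11", "2023-02-23"))

# Gap in days between each date and the one before it
gaps <- as.numeric(diff(dates))   # 2 1 1 12

# Start at 1, then add 1 each time a gap exceeds one day
run_id <- cumsum(c(1, gaps > 1))  # 1 2 2 2 3
```

The code that follows applies the same expression once per OBJECT group.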

df <- data.frame(OBJECT = c("PRODUCT1", "PRODUCT1", "PRODUCT1", "PRODUCT2", "PRODUCT2", "PRODUCT2", "PRODUCT2", "PRODUCT2"),
                 DATE = as.Date(c("2023-02-01", "2023-02-02", "2023-02-21", "2023-02-07", "2023-02-09", "2023-02-10", "2023-02-11", "2023-02-23")))

library(dplyr)

# add the wanted column
df <- df %>%
  group_by(OBJECT) %>%
  # .by_group = TRUE sorts by DATE within each OBJECT; a plain arrange(DATE) ignores the grouping
  arrange(DATE, .by_group = TRUE) %>%
  mutate(wanted = cumsum(c(1, diff(DATE) > 1)))

df

|  OBJECT  |    DATE    | wanted |
+----------+------------+--------+
| PRODUCT1 | 2023-02-01 |      1 |
| PRODUCT1 | 2023-02-02 |      1 |
| PRODUCT1 | 2023-02-21 |      2 |
| PRODUCT2 | 2023-02-07 |      1 |
| PRODUCT2 | 2023-02-09 |      2 |
| PRODUCT2 | 2023-02-10 |      2 |
| PRODUCT2 | 2023-02-11 |      2 |
| PRODUCT2 | 2023-02-23 |      3 |
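
If dplyr is not available, the same idea can be sketched in base R with `ave()`. This version assumes, as in the question, that DATE arrives as a day/month/year string, so it is parsed with `as.Date()` first. The column name `RANK` follows the question's expected output. This is one possible approach under those assumptions, not the original answer's method.

```r
df <- data.frame(
  OBJECT = c("PRODUCT1", "PRODUCT1", "PRODUCT1",
             "PRODUCT2", "PRODUCT2", "PRODUCT2", "PRODUCT2", "PRODUCT2"),
  DATE = c("01/02/2023", "02/02/2023", "21/02/2023",
           "07/02/2023", "09/02/2023", "10/02/2023", "11/02/2023", "23/02/2023")
)

# Parse day/month/year strings into Date objects (assumed input format)
df$DATE <- as.Date(df$DATE, format = "%d/%m/%Y")

# Sort by OBJECT, then DATE, so diff() compares neighbouring days within a product
df <- df[order(df$OBJECT, df$DATE), ]

# ave() applies the function to the dates of each OBJECT separately;
# the rank grows by 1 whenever the gap to the previous row exceeds one day
df$RANK <- ave(as.numeric(df$DATE), df$OBJECT,
               FUN = function(d) cumsum(c(1, diff(d) > 1)))
```

Sorting before calling `ave()` matters here, because `diff()` only measures gaps between rows that are adjacent in the data frame.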

huangapple
  • Published on 13 May 2023 at 12:34:59
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76241095.html