Finding how to use "rank" or other similar grouping functions with conditionals in R
Question
I am trying to rank or group rows in R so that consecutive rows share the same rank when both of the following hold:
- The "Object" field of the next record is equal to the "Object" field of the current record
- The date of the next record is sequential (it is the following day)
Example data:
OBJECT | DATE |
---|---|
PRODUCT1 | 01/02/2023 |
PRODUCT1 | 02/02/2023 |
PRODUCT1 | 21/02/2023 |
PRODUCT2 | 07/02/2023 |
PRODUCT2 | 09/02/2023 |
PRODUCT2 | 10/02/2023 |
PRODUCT2 | 11/02/2023 |
PRODUCT2 | 23/02/2023 |
I am using the following code but the result is not correct:
library(plyr)
ddply(df, .(object), transform, rank = (seq_along(date)))
The expected result is similar to the following table:
OBJECT | DATE | RANK |
---|---|---|
PRODUCT1 | 01/02/2023 | 1 |
PRODUCT1 | 02/02/2023 | 1 |
PRODUCT1 | 21/02/2023 | 2 |
PRODUCT2 | 07/02/2023 | 1 |
PRODUCT2 | 09/02/2023 | 2 |
PRODUCT2 | 10/02/2023 | 2 |
PRODUCT2 | 11/02/2023 | 2 |
PRODUCT2 | 23/02/2023 | 3 |
I appreciate your help in solving this question.
Answer 1
Score: 1
Group by "object" and order by "date" within each group. The wanted value starts at 1 for each object and is incremented by 1 whenever the gap between the previous row and the current row is greater than 1 day.
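To make the counting step concrete, here is a minimal base R sketch of the same idea applied to the PRODUCT2 dates alone. The variable names `dates`, `gaps`, `new_run`, and `run_id` are illustrative and not part of the original answer.

```r
# Dates for PRODUCT2 from the example, already sorted
dates <- as.Date(c("2023-02-07", "2023-02-09", "2023-02-10",
                   "2023-02-11", "2023-02-23"))

# Days between each row and the one before it: 2 1 1 12
gaps <- as.numeric(diff(dates))

# The first row always starts a run; later rows start one when the gap exceeds 1 day
new_run <- c(TRUE, gaps > 1)

# Counting the run starts cumulatively gives the rank: 1 2 2 2 3
run_id <- cumsum(new_run)
print(run_id)
```

The full code that follows applies this same calculation separately to each OBJECT group.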
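library(dplyr)

# example data; DATE must be of class Date so that diff() returns gaps in days
df <- data.frame(OBJECT = c("PRODUCT1", "PRODUCT1", "PRODUCT1", "PRODUCT2", "PRODUCT2", "PRODUCT2", "PRODUCT2", "PRODUCT2"),
                 DATE = as.Date(c("2023-02-01", "2023-02-02", "2023-02-21", "2023-02-07", "2023-02-09", "2023-02-10", "2023-02-11", "2023-02-23")))

# add the wanted column
df <- df %>%
  group_by(OBJECT) %>%
  # arrange() ignores groups unless .by_group = TRUE
  arrange(DATE, .by_group = TRUE) %>%
  # start a new rank whenever the gap to the previous row exceeds 1 day
  mutate(wanted = cumsum(c(1, diff(DATE) > 1))) %>%
  ungroup()
df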
| OBJECT | DATE | wanted |
+----------+------------+--------+
| PRODUCT1 | 2023-02-01 | 1 |
| PRODUCT1 | 2023-02-02 | 1 |
| PRODUCT1 | 2023-02-21 | 2 |
| PRODUCT2 | 2023-02-07 | 1 |
| PRODUCT2 | 2023-02-09 | 2 |
| PRODUCT2 | 2023-02-10 | 2 |
| PRODUCT2 | 2023-02-11 | 2 |
| PRODUCT2 | 2023-02-23 | 3 |
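The dates in the question are written as dd/mm/yyyy, while the code above starts from ISO-formatted dates. As a hedged sketch, assuming the real data holds the dates as dd/mm/yyyy strings, the same result can be reached in base R by parsing the strings first and using `ave()` for the per-object calculation. The column name `RANK` follows the question's expected table; this is one possible alternative, not the original answer's method.

```r
# Example data with dates as dd/mm/yyyy strings, as shown in the question
df <- data.frame(
  OBJECT = c("PRODUCT1", "PRODUCT1", "PRODUCT1",
             "PRODUCT2", "PRODUCT2", "PRODUCT2", "PRODUCT2", "PRODUCT2"),
  DATE = c("01/02/2023", "02/02/2023", "21/02/2023",
           "07/02/2023", "09/02/2023", "10/02/2023", "11/02/2023", "23/02/2023")
)

# Parse the strings into Date objects so differences are measured in days
df$DATE <- as.Date(df$DATE, format = "%d/%m/%Y")

# Sort by object, then by date, so diff() compares consecutive records
df <- df[order(df$OBJECT, df$DATE), ]

# ave() applies the function within each OBJECT group;
# RANK goes up by 1 whenever the gap to the previous row exceeds 1 day
df$RANK <- ave(as.numeric(df$DATE), df$OBJECT,
               FUN = function(d) cumsum(c(1, diff(d) > 1)))
print(df)
```

Without the `format` argument, `as.Date()` would not read these strings as day/month/year, so the parsing step matters before any gap is computed.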