For loop - how to change a varying number of entries if requirement is met in another column
Question
I want to create a new column summary based on the values in order_result and order_date. The column order gives the sequence number of each order for each customer.
I have the following code:
customer <- c("A", "A", "B", "B", "C", "C", "C", "D", "E", "E", "E", "F")
order <- c("1", "2", "1", "2", "1", "2", "3", "1", "1", "2", "3", "1")
order_result <- c("positive", "lost", "negative", "return", "negative", "lost", "negative", "lost", "lost", "return", "lost", "return")
order_date <- c("2018-09-14", "2020-08-20", "2018-09-15", "2019-08-25", "2017-09-12", "2018-09-16", "2020-08-21", "2018-08-10", "2017-09-13", "2018-02-16", "2020-08-21", "2017-05-20")
df1 <- data.frame(customer, order, order_result, order_date)
I want to go through order_result for each customer, from earliest to latest order_date, and create a new column summary with TRUE or FALSE entries.
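Processing "from earliest to latest order_date" needs the dates as real dates rather than the character strings used above, and the rows in customer-then-date order. A minimal base-R check, assuming the example vectors from the question (ord is a hypothetical name):

```r
# Example values from the question; order_date is converted from text to Date
customer <- c("A", "A", "B", "B", "C", "C", "C", "D", "E", "E", "E", "F")
order_date <- as.Date(c("2018-09-14", "2020-08-20", "2018-09-15", "2019-08-25",
                        "2017-09-12", "2018-09-16", "2020-08-21", "2018-08-10",
                        "2017-09-13", "2018-02-16", "2020-08-21", "2017-05-20"))
# Row positions sorted by customer first, then from earliest to latest date
ord <- order(customer, order_date)
ord   # 1 2 ... 12: the example data is already in this order
```

On a full data frame, df1[order(df1$customer, df1$order_date), ] would put the rows into this processing order.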
The first order for a customer is always TRUE in the summary column.
Then, going row by row, if the order_result is "negative", an unspecified number of later rows in summary for that customer are either FALSE, if their order_date is ≤400 days after the index order's order_date, or TRUE, if it is more than 400 days after it. The number of rows to fill with FALSE depends on the actual order_dates, which vary by customer.
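The 400-day rule boils down to subtracting the index order's date from each later date. A small worked example using customer C from the data above, whose first order (the index) is "negative"; c_dates and gaps are illustrative names only:

```r
# Customer C's order dates; the first one is the "negative" index order
c_dates <- as.Date(c("2017-09-12", "2018-09-16", "2020-08-21"))
# Date - Date gives a difftime in days; as.numeric() drops the units
gaps <- as.numeric(c_dates[-1] - c_dates[1])
gaps          # 369 1074
gaps > 400    # FALSE TRUE, i.e. summary for C's orders 2 and 3
```

This matches the expected summary for customer C (TRUE, FALSE, TRUE): order 2 falls inside the 400-day window and order 3 falls outside it.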
If the order_result is "return" or "lost", the next order has TRUE in summary. If the order_result is "positive", all later orders for that customer have FALSE in summary. We then move down the list of orders by jumping to the next order that has TRUE in summary (which becomes the next index order) and repeat.
Each customer is independent of the others. The result should be:
summary <- c(TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE)
My question is that I am not sure how to fill an unspecified number of rows with FALSE, since it depends on the actual order_dates and the time elapsed since the index date. In addition, how can I jump from one TRUE row to the next TRUE row in summary (i.e., change index dates) within the same customer (not the whole dataset) and repeat the process? Is there a way to do this with for loops, iterating through orders by customer and order_date? I tried using lag/lead in dplyr but was not getting the correct output. Thanks!
EDIT: Here is my pseudocode:
if (order == 1) {
  summary <- TRUE  # first order for a customer is always TRUE
}
if (order_result[row_number()] == 'positive') {  # if result is positive
  summary[row_number() + length(?)] <- FALSE  # after a positive result, all subsequent rows in summary for that customer are FALSE
}
if (order_result[row_number()] == 'negative') {  # if result is negative, there are 2 options based on the time difference between that order date and subsequent order dates
  if (diff_time(order_date[row_number()], order_date[row_number(?)]) <= 400) {
    summary[row_number() + length(?)] <- FALSE  # all subsequent rows within 400 days of that order for that customer are FALSE
  } else {
    summary[row_number() + length(?)] <- TRUE  # otherwise all subsequent rows more than 400 days later for that customer are TRUE
  }
}
if (order_result[row_number()] == 'lost' | order_result[row_number()] == 'return') {
  summary[row_number() + 1] <- TRUE  # if the result is lost or return, the next order for that customer is TRUE
}
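A runnable base-R sketch of the pseudocode, under these assumptions: each customer's rows are sorted from earliest to latest order_date, order_date is a Date, order_result has no missing values, and only rows that end up TRUE act as index orders. The function name summarise_customer() and its window argument are hypothetical, not taken from the question. Instead of lag()/lead(), an explicit while loop jumps from one index row to the next.

```r
# Hypothetical helper: summary for ONE customer's rows, already sorted by date.
summarise_customer <- function(result, date, window = 400) {
  n <- length(result)
  summary <- logical(n)              # every row starts as FALSE
  if (n == 0) return(summary)
  summary[1] <- TRUE                 # the first order is always TRUE
  idx <- 1                           # position of the current index order
  while (idx < n) {
    later <- (idx + 1):n
    summary[later] <- FALSE          # later rows are decided by this index alone
    res <- result[idx]
    if (res == "negative") {
      # Date - Date is a difftime in days; TRUE only if more than `window` days later
      summary[later] <- as.numeric(date[later] - date[idx]) > window
    } else if (res %in% c("lost", "return")) {
      summary[idx + 1] <- TRUE       # only the very next order is TRUE
    }
    # "positive" (or any other value) leaves every later row FALSE
    nxt <- later[summary[later]]     # later rows that are now TRUE
    if (length(nxt) == 0) break      # no further index order for this customer
    idx <- nxt[1]                    # jump to the next TRUE row and repeat
  }
  summary
}

df1 <- data.frame(
  customer = c("A", "A", "B", "B", "C", "C", "C", "D", "E", "E", "E", "F"),
  order = c("1", "2", "1", "2", "1", "2", "3", "1", "1", "2", "3", "1"),
  order_result = c("positive", "lost", "negative", "return", "negative", "lost",
                   "negative", "lost", "lost", "return", "lost", "return"),
  order_date = as.Date(c("2018-09-14", "2020-08-20", "2018-09-15", "2019-08-25",
                         "2017-09-12", "2018-09-16", "2020-08-21", "2018-08-10",
                         "2017-09-13", "2018-02-16", "2020-08-21", "2017-05-20"))
)
df1 <- df1[order(df1$customer, df1$order_date), ]  # earliest to latest per customer

# A plain for loop over customers; each customer is handled independently
df1$summary <- NA
for (cust in unique(df1$customer)) {
  rows <- which(df1$customer == cust)
  df1$summary[rows] <- summarise_customer(df1$order_result[rows], df1$order_date[rows])
}
df1$summary  # TRUE FALSE TRUE FALSE TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE
```

Because each index order resets every later row before applying its own rule, rows skipped over as FALSE never influence the result. Looping over unique(df1$customer) keeps customers independent.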
Answer 1
Score: 0
The great thing about mutate is that you can use group_by to group your data and then pass the columns you are interested in to a function: the function receives each column as a vector, already filtered to the current group.
That's the basis of what I did below:
library(tidyverse)
library(lubridate)  # provides ymd(); loaded explicitly in case library(tidyverse) does not attach it
order_process <- function(order_col, result_col, date_col) {
  summary <- c()
  negative_order_date <- NA
  lost_or_returned <- FALSE
  for (i in seq_along(order_col)) {
    # first row is always TRUE
    if (i == 1) {
      summary[i] <- TRUE
    # if the order was lost or returned, the next one is TRUE. This is that next one
    } else if (lost_or_returned) {
      summary[i] <- TRUE
      lost_or_returned <- FALSE
    # If positive
    } else if (result_col[i] == "positive") {
      # everything from now until the end of the column is FALSE
      summary[i:length(order_col)] <- FALSE
      return(summary)
    # If negative
    } else if (result_col[i] == "negative") {
      if (is.na(negative_order_date)) {
        negative_order_date <- date_col[i]
      }
      summary[i] <- date_col[i] - negative_order_date > 400
    # If lost or returned
    } else if (result_col[i] == "return" || result_col[i] == "lost") {
      summary[i] <- FALSE
      lost_or_returned <- TRUE
    }
  }
  return(summary)
}
df1 %>%
  as_tibble() %>%
  mutate(order_date = ymd(order_date),
         order = as.numeric(order),
         order_result = as.factor(order_result)) %>%
  group_by(customer) %>%
  mutate(summary = order_process(order, order_result, order_date)) %>%
  pull(summary)
The output matches the expected result for every value except the third-last one (customer E's second order): you didn't specify what should happen to the current row when order_result is lost or return, so I guessed FALSE.