英文:
Changing a different number of subsequent entries in a row based on values from other rows
问题
I understand that you want me to translate the code-related parts of your request. Here's the translated code part:
我想基于“order_result”和“order_date”的值创建一个新列“summary”。列“orders”是每个“customer”的订单数量。
我有以下代码-
customer <- c("A", "A", "B", "B", "C", "C", "C", "D", "E", "E", "E", "F", "G", "G", "G", "H", "H", "I", "I", "I", "J", "J", "J", "K", "K", "K", "K", "K")
order <- c("1", "2", "1", "2", "1", "2", "3", "1", "1", "2", "3", "1", "1", "2", "3", "1", "2", "1", "2", "3", "1", "2", "3", "1", "2", "3", "4", "5")
order_result <- c("positive", "lost", "negative", "return", "negative", "lost", "negative", "lost", "lost", "return", "lost", "return", "lost", "negative", "lost", "lost", "positive", "return", "negative", "lost", "lost","positive", "lost", "return", "negative", "lost", "lost", "negative")
order_date <- c("2018-09-14", "2020-08-20", "2018-09-15", "2019-08-25", "2017-09-12", "2018-09-16", "2020-08-21", "2018-08-10", "2017-09-13", "2018-02-16", "2020-08-21", "2017-05-20", "2018-07-05", "2019-02-15", "2021-11-04", "2017-08-07", "2021-08-05", "2019-07-30", "2020-09-23", "2020-11-23","2017-10-30", "2018-04-09", "2020-04-09", "2019-07-30", "2019-12-04", "2020-04-04", "2020-08-03", "2021-12-24")
df1 <- data.frame(customer, order, order_result, order_date)
我想要遍历每个客户的“order_result”,按照订单日期从早到晚的顺序,并创建一个新列“summary”,其中包含是或否的条目。每个客户的第一笔订单在“summary”列中始终为“是”,然后我们根据订单日期(从早到晚)为每个客户遍历“orders”。我们查看“order_result”以确定在“summary”中填充多少行以及使用什么值。假设我们从索引订单开始。如果“order_result”是“negative”,那么“summary”中的后续行数是未指定的,如果其“order_date”与索引订单相差≤400天,则为“否”,如果大于400天,则为“是”。要填充“否”的行数取决于实际的订单日期,这可以根据客户的不同而变化。如果“order_result”是“return”或“lost”,则下一笔订单在“summary”中为“是”。我们通过跳转到具有“summary”中的“是”的下一笔订单(这成为下一个索引订单)并重复来遍历订单列表。
每个“customerid”都是独立的。结果应为:
summary <- c("Yes", "No", "Yes", "No", "Yes", "No", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "No", "Yes", "Yes", "No", "Yes", "Yes", "No", "No", "Yes")
我的问题是,我不确定如何填充未指定数量的“否”行,因为它完全取决于实际的订单日期和与索引日期的经过时间。此外,如何跳转从“summary”中的一行“是”到同一客户的下一行“是”(而不是整个数据集)并重复整个过程?我尝试使用lag/lead但未获得正确的输出。谢谢!
这是我的伪代码:
如果(order == 1){
summary == 'TRUE'} #客户的第一笔订单始终为TRUE。
如果(order_result[row_number()] == 'positive'){ #如果结果为正面
summary[row_number()+ length(?)] == FALSE} #在正面结果后,该客户的所有后续行在summary中都为FALSE。
如果(order_result[row_number()] == 'negative'){ #如果结果为负面,则根据该订单日期和后续订单日期之间的时间差异有两个选项。
如果(diff_time(orderdate[row_number()],orderdate[row_number(?)]) <= 400 {
summary[row_number()+ length(?)] == FALSE} #在负面结果后,该客户的所有后续行在summary中都在该订单之后的400天内为FALSE
否则 summary[row_number()+ length(?)] == TRUE} #否则,该客户的所有后续行在summary中都在400天后为TRUE
如果(order_result[row_number()] == 'lost' | order_result[row_number()] == 'return') {
summary[row_number() + 1] == TRUE} #如果订单结果为“丢失”或“退回”,则该客户的下一笔订单在summary中为TRUE。
英文:
*EDIT: NEW question that has not been answered by the responses below or in other SO questions. Is there a way to do this with for-loops and iterating through orders based on customerid and order_date? Or by using elseif, etc in dplyr??
I want to create a new column summary
based on the values in order_result
and order_date
. The column orders
is the number of orders for each customer
.
I have the following code -
customer <- c("A", "A", "B", "B", "C", "C", "C", "D", "E", "E", "E", "F", "G", "G", "G", "H", "H", "I", "I", "I", "J", "J", "J", "K", "K", "K", "K", "K")
order <- c("1", "2", "1", "2", "1", "2", "3", "1", "1", "2", "3", "1", "1", "2", "3", "1", "2", "1", "2", "3", "1", "2", "3", "1", "2", "3", "4", "5")
order_result <- c("positive", "lost", "negative", "return", "negative", "lost", "negative", "lost", "lost", "return", "lost", "return", "lost", "negative", "lost", "lost", "positive", "return", "negative", "lost", "lost","positive", "lost", "return", "negative", "lost", "lost", "negative")
order_date <- c("2018-09-14", "2020-08-20", "2018-09-15", "2019-08-25", "2017-09-12", "2018-09-16", "2020-08-21", "2018-08-10", "2017-09-13", "2018-02-16", "2020-08-21", "2017-05-20", "2018-07-05", "2019-02-15", "2021-11-04", "2017-08-07", "2021-08-05", "2019-07-30", "2020-09-23", "2020-11-23","2017-10-30", "2018-04-09", "2020-04-09", "2019-07-30", "2019-12-04", "2020-04-04", "2020-08-03", "2021-12-24")
df1 <- data.frame(customer, order, order_result, order_date)
I want to parse through order_results for each customer from earliest to latest order_date and create a new column summary
with yes or no entries. The 1st order for a customer always is "yes" in the summary
column, and we keep moving down the list of orders for each customer. As we go through the orders
based on the order_date (earliest to latest) for each customer, we look at the order_result
to determine how many rows to fill out in summary
and with what value. Say we start with an index order. If the order_result
is "negative", then an unspecified number of later rows in summary
for that customer are either "no" if their order_date
is ≤400 days from the index order or yes if >400 days from the index order_date
. The number of rows to fill out with "no" depends on the actual order_dates, which can vary based on customer. If the order_result
is "return" or "lost", the next order has "yes" in summary
. If the order_result
is "positive", all later orders for that patient have "no" in summary
. We move down the list of orders by jumping to the next order that has yes in summary
(which becomes the next index order) and repeating.
Each customerid is independent of each other. The result should be:
summary <- c("Yes", "No", "Yes", "No", "Yes", "No", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "No", "Yes", "Yes", "No", "Yes", "Yes", "No", "No", "Yes")
My question is I am not sure how to approach filling out an unspecified number of rows with "no" since it all depends on the actual order_dates and the elapsed time from the index date. In addition, how can I jump from 1 row with "yes" to the next row with "yes" in summary
(ie, change index dates) for the same customer (not the whole dataset) and repeat the process all over again? I tried using lag/lead but was not getting the correct output. Thanks!
EDIT: Here is my pseudocode:
if (order == 1) {
summary == 'TRUE'} #first order for a customer is always TRUE.
if (order_result[row_number()] == 'positive') { #If result is positive
summary[row_number()+ length(?)] == FALSE} #After a positive result, all subsequent rows in summary for that customer are FALSE.
if (order_result[row_number()] == 'negative') { #If result is negative, there are 2 options based on time difference between that order date and subsequent order dates.
if (diff_time(orderdate[row_number()], orderdate[row_number(?)]) <= 400 {
summary[row_number()+ length(?)] == FALSE} #After a negative result, all subsequent rows under summary within 400 days of that order for that customer are FALSE
else summary[row_number()+ length(?)] == TRUE} #Otherwise all subsequent rows in summary over 400 days for that customer are TRUE
if (order_result[row_number()] == 'lost' | order_result[row_number()] == 'return') {
summary[row_number() + 1] == TRUE} #If order result is lost or return, the next order for that customer is true under summary.
}
答案1
得分: 1
请检查
df %>%
mutate(new=ifelse(order==1, '是', NA),
lag=ifelse(order!=1,lag(order_date), NA),
diff=as.Date(order_date)-as.Date(lag),
negpos=ifelse(order==1 & order_result %in% c('positive','negative'),order_result,NA_character_),
negpos2=ifelse(order!=1 & !order_result %in% c('positive','negative'),order_result,NA_character_)) %>%
group_by(customer) %>%
fill(negpos,negpos2) %>%
ungroup() %>%
mutate(summary=case_when(
order!=1 & order_result=='negative' & negpos=='negative' & negpos2 %in% c('lost','return') ~ '是',
order!=1 & diff<=400 & negpos=='negative' ~ '否',
order!=1 & diff>400 & negpos=='negative' ~ '是',
order==1 ~ '是',
order!=1 & order_result %in% c('lost','return') & negpos=='negative' ~ '否',
order!=1 & negpos=='positive' ~ '否'
)) %>%
select(-c(new,lag,negpos,negpos2,diff))
# 输出
# 一个 tibble: 8 × 5
客户 order order_result 订单日期 概要
<chr> <chr> <chr> <chr> <chr>
1 A 1 positive 2018-09-14 是
2 A 2 lost 2020-08-20 否
3 B 1 negative 2018-09-15 是
4 B 2 return 2019-08-25 否
5 C 1 negative 2017-09-12 是
6 C 2 lost 2018-09-16 否
7 C 3 negative 2020-08-21 是
8 D 1 lost 2018-08-10 是
英文:
Please check
df %>%
mutate(new=ifelse(order==1, 'Yes', NA),
lag=ifelse(order!=1,lag(order_date), NA),
diff=as.Date(order_date)-as.Date(lag),
negpos=ifelse(order==1 & order_result %in% c('positive','negative'),order_result,NA_character_),
negpos2=ifelse(order!=1 & !order_result %in% c('positive','negative'),order_result,NA_character_)) %>%
group_by(customer) %>%
fill(negpos,negpos2) %>% ungroup() %>%
mutate(summary=case_when(
order!=1 & order_result=='negative' & negpos=='negative' & negpos2 %in% c('lost','return') ~ 'Yes',
order!=1 & diff<=400 & negpos=='negative' ~ 'No',
order!=1 & diff>400 & negpos=='negative' ~ 'Yes',
order==1 ~ 'Yes',
order!=1 & order_result %in% c('lost','return') & negpos=='negative' ~ 'No',
order!=1 & negpos=='positive' ~ 'No'
)) %>% select(-c(new,lag,negpos,negpos2,diff))
# output
# A tibble: 8 × 5
customer order order_result order_date summary
<chr> <chr> <chr> <chr> <chr>
1 A 1 positive 2018-09-14 Yes
2 A 2 lost 2020-08-20 No
3 B 1 negative 2018-09-15 Yes
4 B 2 return 2019-08-25 No
5 C 1 negative 2017-09-12 Yes
6 C 2 lost 2018-09-16 No
7 C 3 negative 2020-08-21 Yes
8 D 1 lost 2018-08-10 Yes
答案2
得分: 1
根据您的评论,以下是更新后的代码:
df %>%
mutate(new = ifelse(order == 1, 'Yes', NA),
lag = ifelse(order != 1, lag(order_date), NA),
diff = as.Date(order_date) - as.Date(lag),
negpos = ifelse(order == 1 & order_result %in% c('positive', 'negative'), order_result, NA_character_),
negpos2 = ifelse(order != 1 & !order_result %in% c('positive', 'negative'), order_result, NA_character_)) %>%
group_by(customer) %>%
fill(negpos, negpos2) %>%
ungroup() %>%
mutate(summary = case_when(
order != 1 & order_result == 'negative' & negpos == 'negative' & negpos2 %in% c('lost', 'return') ~ 'Yes',
order != 1 & diff <= 400 & negpos == 'negative' ~ 'No',
order != 1 & diff > 400 & negpos == 'negative' ~ 'Yes',
order == 1 ~ 'Yes',
order != 1 & order_result %in% c('lost', 'return') & negpos == 'negative' ~ 'No',
order != 1 & negpos == 'positive' ~ 'No',
order != 1 & is.na(negpos) & !is.na(negpos2) ~ 'Yes',
order != 1 & diff <= 400 & order_result == 'negative' ~ 'No',
order != 1 & diff > 400 & order_result == 'negative' ~ 'No',
order != 1 & order_result == 'positive' ~ 'No'
)) %>%
select(-c(new, lag, negpos, negpos2, diff))
创建于2023-07-04,使用 reprex v2.0.2
# 一个 tibble: 12 × 5
customer order order_result order_date summary
<chr> <chr> <chr> <chr> <chr>
1 A 1 positive 2018-09-14 Yes
2 A 2 lost 2020-08-20 No
3 B 1 negative 2018-09-15 Yes
4 B 2 return 2019-08-25 No
5 C 1 negative 2017-09-12 Yes
6 C 2 lost 2018-09-16 No
7 C 3 negative 2020-08-21 Yes
8 D 1 lost 2018-08-10 Yes
9 E 1 lost 2017-09-13 Yes
10 E 2 return 2018-02-16 Yes
11 E 3 lost 2020-08-21 Yes
12 F 1 return 2017-05-20 Yes
英文:
based on your comment, here is the updated code
df %>%
mutate(new=ifelse(order==1, 'Yes', NA),
lag=ifelse(order!=1,lag(order_date), NA),
diff=as.Date(order_date)-as.Date(lag),
negpos=ifelse(order==1 & order_result %in% c('positive','negative'),order_result,NA_character_),
negpos2=ifelse(order!=1 & !order_result %in% c('positive','negative'),order_result,NA_character_)) %>%
group_by(customer) %>%
fill(negpos,negpos2) %>% ungroup() %>%
mutate(summary=case_when(
order!=1 & order_result=='negative' & negpos=='negative' & negpos2 %in% c('lost','return') ~ 'Yes',
order!=1 & diff<=400 & negpos=='negative' ~ 'No',
order!=1 & diff>400 & negpos=='negative' ~ 'Yes',
order==1 ~ 'Yes',
order!=1 & order_result %in% c('lost','return') & negpos=='negative' ~ 'No',
order!=1 & negpos=='positive' ~ 'No',
order!=1 & is.na(negpos) & !is.na(negpos2) ~ 'Yes',
order!=1 & diff<=400 & order_result=='negative' ~ 'No',
order!=1 & diff>400 & order_result=='negative' ~ 'No',
order!=1 & order_result=='positive' ~ 'No'
)) %>% select(-c(new,lag,negpos,negpos2,diff))
<sup>Created on 2023-07-04 with reprex v2.0.2</sup>
# A tibble: 12 × 5
customer order order_result order_date summary
<chr> <chr> <chr> <chr> <chr>
1 A 1 positive 2018-09-14 Yes
2 A 2 lost 2020-08-20 No
3 B 1 negative 2018-09-15 Yes
4 B 2 return 2019-08-25 No
5 C 1 negative 2017-09-12 Yes
6 C 2 lost 2018-09-16 No
7 C 3 negative 2020-08-21 Yes
8 D 1 lost 2018-08-10 Yes
9 E 1 lost 2017-09-13 Yes
10 E 2 return 2018-02-16 Yes
11 E 3 lost 2020-08-21 Yes
12 F 1 return 2017-05-20 Yes
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论