根据其他行的值更改一行中后续条目的不同数量。

huangapple go评论74阅读模式
英文:

Changing a different number of subsequent entries in a row based on values from other rows

问题

I understand that you want me to translate the code-related parts of your request. Here's the translated code part:

我想基于“order_result”和“order_date”的值创建一个新列“summary”。列“orders”是每个“customer”的订单数量。

我有以下代码-

    customer <- c("A", "A", "B", "B", "C", "C", "C", "D", "E", "E", "E", "F", "G", "G", "G", "H", "H", "I", "I", "I", "J", "J", "J", "K", "K", "K", "K", "K")
    order <- c("1", "2", "1", "2", "1", "2", "3", "1", "1", "2", "3", "1", "1", "2", "3", "1", "2", "1", "2", "3", "1", "2", "3", "1", "2", "3", "4", "5")
    order_result <- c("positive", "lost", "negative", "return", "negative", "lost", "negative", "lost", "lost", "return", "lost", "return", "lost", "negative", "lost", "lost", "positive", "return", "negative", "lost", "lost","positive", "lost", "return", "negative", "lost", "lost", "negative")
    order_date <- c("2018-09-14", "2020-08-20", "2018-09-15", "2019-08-25", "2017-09-12", "2018-09-16", "2020-08-21", "2018-08-10", "2017-09-13", "2018-02-16", "2020-08-21", "2017-05-20", "2018-07-05", "2019-02-15", "2021-11-04", "2017-08-07", "2021-08-05", "2019-07-30", "2020-09-23", "2020-11-23","2017-10-30", "2018-04-09", "2020-04-09", "2019-07-30", "2019-12-04", "2020-04-04", "2020-08-03", "2021-12-24")
    df1 <- data.frame(customer, order, order_result, order_date)

我想要遍历每个客户的“order_result”,按照订单日期从早到晚的顺序,并创建一个新列“summary”,其中包含是或否的条目。每个客户的第一笔订单在“summary”列中始终为“是”,然后我们根据订单日期(从早到晚)为每个客户遍历“orders”。我们查看“order_result”以确定在“summary”中填充多少行以及使用什么值。假设我们从索引订单开始。如果“order_result”是“negative”,那么“summary”中的后续行数是未指定的,如果其“order_date”与索引订单相差≤400天,则为“否”,如果大于400天,则为“是”。要填充“否”的行数取决于实际的订单日期,这可以根据客户的不同而变化。如果“order_result”是“return”或“lost”,则下一笔订单在“summary”中为“是”。我们通过跳转到具有“summary”中的“是”的下一笔订单(这成为下一个索引订单)并重复来遍历订单列表。

每个“customerid”都是独立的。结果应为:

    summary <- c("Yes", "No", "Yes", "No", "Yes", "No", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "No", "Yes", "Yes", "No", "Yes", "Yes", "No", "No", "Yes")

我的问题是,我不确定如何填充未指定数量的“否”行,因为它完全取决于实际的订单日期和与索引日期的经过时间。此外,如何跳转从“summary”中的一行“是”到同一客户的下一行“是”(而不是整个数据集)并重复整个过程?我尝试使用lag/lead但未获得正确的输出。谢谢!

这是我的伪代码:

    如果(order == 1){
      summary == 'TRUE'}  #客户的第一笔订单始终为TRUE。 
    如果(order_result[row_number()] == 'positive'){   #如果结果为正面
      summary[row_number()+ length(?)] == FALSE} #在正面结果后,该客户的所有后续行在summary中都为FALSE。
    如果(order_result[row_number()] == 'negative'){   #如果结果为负面,则根据该订单日期和后续订单日期之间的时间差异有两个选项。
      如果(diff_time(orderdate[row_number()],orderdate[row_number(?)]) <= 400 {  
        summary[row_number()+ length(?)] == FALSE}   #在负面结果后,该客户的所有后续行在summary中都在该订单之后的400天内为FALSE
      否则 summary[row_number()+ length(?)] == TRUE}  #否则,该客户的所有后续行在summary中都在400天后为TRUE
    如果(order_result[row_number()] == 'lost' | order_result[row_number()] == 'return')  {
      summary[row_number() + 1] == TRUE}  #如果订单结果为“丢失”或“退回”,则该客户的下一笔订单在summary中为TRUE。
英文:

*EDIT: NEW question that has not been answered by the responses below or in other SO questions. Is there a way to do this with for-loops and iterating through orders based on customerid and order_date? Or by using elseif, etc in dplyr??

I want to create a new column summary based on the values in order_result and order_date. The column orders is the number of orders for each customer.

I have the following code -

customer &lt;- c(&quot;A&quot;, &quot;A&quot;, &quot;B&quot;, &quot;B&quot;, &quot;C&quot;, &quot;C&quot;, &quot;C&quot;, &quot;D&quot;, &quot;E&quot;, &quot;E&quot;, &quot;E&quot;, &quot;F&quot;, &quot;G&quot;, &quot;G&quot;, &quot;G&quot;, &quot;H&quot;, &quot;H&quot;, &quot;I&quot;, &quot;I&quot;, &quot;I&quot;, &quot;J&quot;, &quot;J&quot;, &quot;J&quot;, &quot;K&quot;, &quot;K&quot;, &quot;K&quot;, &quot;K&quot;, &quot;K&quot;)
order &lt;- c(&quot;1&quot;, &quot;2&quot;, &quot;1&quot;, &quot;2&quot;, &quot;1&quot;, &quot;2&quot;, &quot;3&quot;, &quot;1&quot;, &quot;1&quot;, &quot;2&quot;, &quot;3&quot;, &quot;1&quot;, &quot;1&quot;, &quot;2&quot;, &quot;3&quot;, &quot;1&quot;, &quot;2&quot;, &quot;1&quot;, &quot;2&quot;, &quot;3&quot;, &quot;1&quot;, &quot;2&quot;, &quot;3&quot;, &quot;1&quot;, &quot;2&quot;, &quot;3&quot;, &quot;4&quot;, &quot;5&quot;)
order_result &lt;- c(&quot;positive&quot;, &quot;lost&quot;, &quot;negative&quot;, &quot;return&quot;, &quot;negative&quot;, &quot;lost&quot;, &quot;negative&quot;, &quot;lost&quot;, &quot;lost&quot;, &quot;return&quot;, &quot;lost&quot;, &quot;return&quot;, &quot;lost&quot;, &quot;negative&quot;, &quot;lost&quot;, &quot;lost&quot;, &quot;positive&quot;, &quot;return&quot;, &quot;negative&quot;, &quot;lost&quot;, &quot;lost&quot;,&quot;positive&quot;, &quot;lost&quot;, &quot;return&quot;, &quot;negative&quot;, &quot;lost&quot;, &quot;lost&quot;, &quot;negative&quot;)
order_date &lt;- c(&quot;2018-09-14&quot;, &quot;2020-08-20&quot;, &quot;2018-09-15&quot;, &quot;2019-08-25&quot;, &quot;2017-09-12&quot;, &quot;2018-09-16&quot;, &quot;2020-08-21&quot;, &quot;2018-08-10&quot;, &quot;2017-09-13&quot;, &quot;2018-02-16&quot;, &quot;2020-08-21&quot;, &quot;2017-05-20&quot;, &quot;2018-07-05&quot;, &quot;2019-02-15&quot;, &quot;2021-11-04&quot;, &quot;2017-08-07&quot;, &quot;2021-08-05&quot;, &quot;2019-07-30&quot;, &quot;2020-09-23&quot;, &quot;2020-11-23&quot;,&quot;2017-10-30&quot;, &quot;2018-04-09&quot;, &quot;2020-04-09&quot;, &quot;2019-07-30&quot;, &quot;2019-12-04&quot;, &quot;2020-04-04&quot;, &quot;2020-08-03&quot;, &quot;2021-12-24&quot;)
df1 &lt;- data.frame(customer, order, order_result, order_date)

I want to parse through order_results for each customer from earliest to latest order_date and create a new column summary with yes or no entries. The 1st order for a customer always is "yes" in the summary column, and we keep moving down the list of orders for each customer. As we go through the orders based on the order_date (earliest to latest) for each customer, we look at the order_result to determine how many rows to fill out in summary and with what value. Say we start with an index order. If the order_result is "negative", then an unspecified number of later rows in summary for that customer are either "no" if their order_date is ≤400 days from the index order or yes if >400 days from the index order_date. The number of rows to fill out with "no" depends on the actual order_dates, which can vary based on customer. If the order_result is "return" or "lost", the next order has "yes" in summary. If the order_result is "positive", all later orders for that patient have "no" in summary. We move down the list of orders by jumping to the next order that has yes in summary (which becomes the next index order) and repeating.

Each customerid is independent of each other. The result should be:

summary &lt;- c(&quot;Yes&quot;, &quot;No&quot;, &quot;Yes&quot;, &quot;No&quot;, &quot;Yes&quot;, &quot;No&quot;, &quot;Yes&quot;, &quot;Yes&quot;, &quot;Yes&quot;, &quot;Yes&quot;, &quot;Yes&quot;, &quot;Yes&quot;, &quot;Yes&quot;, &quot;Yes&quot;, &quot;Yes&quot;, &quot;Yes&quot;, &quot;Yes&quot;, &quot;Yes&quot;, &quot;Yes&quot;, &quot;No&quot;, &quot;Yes&quot;, &quot;Yes&quot;, &quot;No&quot;, &quot;Yes&quot;, &quot;Yes&quot;, &quot;No&quot;, &quot;No&quot;, &quot;Yes&quot;)

My question is I am not sure how to approach filling out an unspecified number of rows with "no" since it all depends on the actual order_dates and the elapsed time from the index date. In addition, how can I jump from 1 row with "yes" to the next row with "yes" in summary (ie, change index dates) for the same customer (not the whole dataset) and repeat the process all over again? I tried using lag/lead but was not getting the correct output. Thanks!

EDIT: Here is my pseudocode:

if (order == 1) {
summary == &#39;TRUE&#39;}  #first order for a customer is always TRUE. 
if (order_result[row_number()] == &#39;positive&#39;) {   #If result is positive
summary[row_number()+ length(?)] == FALSE} #After a positive result, all subsequent rows in summary for that customer are FALSE.
if (order_result[row_number()] == &#39;negative&#39;) {   #If result is negative, there are 2 options based on time difference between that order date and subsequent order dates. 
if (diff_time(orderdate[row_number()], orderdate[row_number(?)]) &lt;= 400 {  
summary[row_number()+ length(?)] == FALSE}   #After a negative result, all subsequent rows under summary within 400 days of that order for that customer are FALSE
else summary[row_number()+ length(?)] == TRUE}  #Otherwise all subsequent rows in summary over 400 days for that customer are TRUE
if (order_result[row_number()] == &#39;lost&#39; | order_result[row_number()] == &#39;return&#39;)  {
summary[row_number() + 1] == TRUE}  #If order result is lost or return, the next order for that customer is true under summary.
}

答案1

得分: 1

请检查


df %>%
  mutate(new=ifelse(order==1, '是', NA),
  lag=ifelse(order!=1,lag(order_date), NA),
  diff=as.Date(order_date)-as.Date(lag),
  negpos=ifelse(order==1 & order_result %in% c('positive','negative'),order_result,NA_character_),
  negpos2=ifelse(order!=1 & !order_result %in% c('positive','negative'),order_result,NA_character_)) %>%
  group_by(customer) %>%
  fill(negpos,negpos2) %>%
  ungroup() %>%
  mutate(summary=case_when(
    order!=1 & order_result=='negative' & negpos=='negative' & negpos2 %in% c('lost','return') ~ '是',
    order!=1 & diff<=400 & negpos=='negative' ~ '否',
    order!=1 & diff>400 & negpos=='negative' ~ '是',
    order==1 ~ '是',
    order!=1 & order_result %in% c('lost','return') & negpos=='negative' ~ '否',
    order!=1 & negpos=='positive' ~ '否'
    )) %>%
  select(-c(new,lag,negpos,negpos2,diff))

# 输出

# 一个 tibble: 8 × 5
  客户 order order_result 订单日期 概要
  <chr>    <chr> <chr>        <chr>      <chr>  
1 A        1     positive     2018-09-142 A        2     lost         2020-08-203 B        1     negative     2018-09-154 B        2     return       2019-08-255 C        1     negative     2017-09-126 C        2     lost         2018-09-167 C        3     negative     2020-08-218 D        1     lost         2018-08-10
英文:

Please check


df %&gt;% 
  mutate(new=ifelse(order==1, &#39;Yes&#39;, NA),
  lag=ifelse(order!=1,lag(order_date), NA),
  diff=as.Date(order_date)-as.Date(lag),
  negpos=ifelse(order==1 &amp; order_result %in% c(&#39;positive&#39;,&#39;negative&#39;),order_result,NA_character_),
  negpos2=ifelse(order!=1 &amp; !order_result %in% c(&#39;positive&#39;,&#39;negative&#39;),order_result,NA_character_)) %&gt;% 
  group_by(customer) %&gt;% 
  fill(negpos,negpos2) %&gt;% ungroup() %&gt;% 
  mutate(summary=case_when(
    order!=1 &amp; order_result==&#39;negative&#39; &amp; negpos==&#39;negative&#39; &amp; negpos2 %in% c(&#39;lost&#39;,&#39;return&#39;) ~ &#39;Yes&#39;,
    order!=1 &amp; diff&lt;=400 &amp; negpos==&#39;negative&#39; ~ &#39;No&#39;,
    order!=1 &amp; diff&gt;400 &amp; negpos==&#39;negative&#39; ~ &#39;Yes&#39;,
    order==1 ~ &#39;Yes&#39;,
    order!=1 &amp; order_result %in% c(&#39;lost&#39;,&#39;return&#39;) &amp; negpos==&#39;negative&#39; ~ &#39;No&#39;,
    order!=1 &amp; negpos==&#39;positive&#39; ~ &#39;No&#39;
    )) %&gt;% select(-c(new,lag,negpos,negpos2,diff))

# output

# A tibble: 8 &#215; 5
  customer order order_result order_date summary
  &lt;chr&gt;    &lt;chr&gt; &lt;chr&gt;        &lt;chr&gt;      &lt;chr&gt;  
1 A        1     positive     2018-09-14 Yes    
2 A        2     lost         2020-08-20 No     
3 B        1     negative     2018-09-15 Yes    
4 B        2     return       2019-08-25 No     
5 C        1     negative     2017-09-12 Yes    
6 C        2     lost         2018-09-16 No     
7 C        3     negative     2020-08-21 Yes    
8 D        1     lost         2018-08-10 Yes    

答案2

得分: 1

根据您的评论,以下是更新后的代码:

df %>%
  mutate(new = ifelse(order == 1, 'Yes', NA),
         lag = ifelse(order != 1, lag(order_date), NA),
         diff = as.Date(order_date) - as.Date(lag),
         negpos = ifelse(order == 1 & order_result %in% c('positive', 'negative'), order_result, NA_character_),
         negpos2 = ifelse(order != 1 & !order_result %in% c('positive', 'negative'), order_result, NA_character_)) %>%
  group_by(customer) %>%
  fill(negpos, negpos2) %>%
  ungroup() %>%
  mutate(summary = case_when(
    order != 1 & order_result == 'negative' & negpos == 'negative' & negpos2 %in% c('lost', 'return') ~ 'Yes',
    order != 1 & diff <= 400 & negpos == 'negative' ~ 'No',
    order != 1 & diff > 400 & negpos == 'negative' ~ 'Yes',
    order == 1 ~ 'Yes',
    order != 1 & order_result %in% c('lost', 'return') & negpos == 'negative' ~ 'No',
    order != 1 & negpos == 'positive' ~ 'No',
    order != 1 & is.na(negpos) & !is.na(negpos2) ~ 'Yes',
    order != 1 & diff <= 400 & order_result == 'negative' ~ 'No',
    order != 1 & diff > 400 & order_result == 'negative' ~ 'No',
    order != 1 & order_result == 'positive' ~ 'No'
  )) %>%
  select(-c(new, lag, negpos, negpos2, diff))

创建于2023-07-04,使用 reprex v2.0.2

# 一个 tibble: 12 × 5
   customer order order_result order_date summary
   <chr>    <chr> <chr>        <chr>      <chr>  
 1 A        1     positive     2018-09-14 Yes    
 2 A        2     lost         2020-08-20 No     
 3 B        1     negative     2018-09-15 Yes    
 4 B        2     return       2019-08-25 No     
 5 C        1     negative     2017-09-12 Yes    
 6 C        2     lost         2018-09-16 No     
 7 C        3     negative     2020-08-21 Yes    
 8 D        1     lost         2018-08-10 Yes    
 9 E        1     lost         2017-09-13 Yes    
10 E        2     return       2018-02-16 Yes    
11 E        3     lost         2020-08-21 Yes    
12 F        1     return       2017-05-20 Yes    
英文:

based on your comment, here is the updated code

df %&gt;% 
  mutate(new=ifelse(order==1, &#39;Yes&#39;, NA),
         lag=ifelse(order!=1,lag(order_date), NA),
         diff=as.Date(order_date)-as.Date(lag),
         negpos=ifelse(order==1 &amp; order_result %in% c(&#39;positive&#39;,&#39;negative&#39;),order_result,NA_character_),
         negpos2=ifelse(order!=1 &amp; !order_result %in% c(&#39;positive&#39;,&#39;negative&#39;),order_result,NA_character_)) %&gt;% 
  group_by(customer) %&gt;% 
  fill(negpos,negpos2) %&gt;% ungroup() %&gt;% 
  mutate(summary=case_when(
    order!=1 &amp; order_result==&#39;negative&#39; &amp; negpos==&#39;negative&#39; &amp; negpos2 %in% c(&#39;lost&#39;,&#39;return&#39;) ~ &#39;Yes&#39;,
    order!=1 &amp; diff&lt;=400 &amp; negpos==&#39;negative&#39; ~ &#39;No&#39;,
    order!=1 &amp; diff&gt;400 &amp; negpos==&#39;negative&#39; ~ &#39;Yes&#39;,
    order==1 ~ &#39;Yes&#39;,
    order!=1 &amp; order_result %in% c(&#39;lost&#39;,&#39;return&#39;) &amp; negpos==&#39;negative&#39; ~ &#39;No&#39;,
    order!=1 &amp; negpos==&#39;positive&#39; ~ &#39;No&#39;,
    order!=1 &amp; is.na(negpos) &amp; !is.na(negpos2) ~ &#39;Yes&#39;,
    order!=1 &amp; diff&lt;=400 &amp; order_result==&#39;negative&#39; ~ &#39;No&#39;,
    order!=1 &amp; diff&gt;400 &amp; order_result==&#39;negative&#39; ~ &#39;No&#39;,
    order!=1 &amp; order_result==&#39;positive&#39; ~ &#39;No&#39;
  )) %&gt;% select(-c(new,lag,negpos,negpos2,diff))

<sup>Created on 2023-07-04 with reprex v2.0.2</sup>


# A tibble: 12 &#215; 5
   customer order order_result order_date summary
   &lt;chr&gt;    &lt;chr&gt; &lt;chr&gt;        &lt;chr&gt;      &lt;chr&gt;  
 1 A        1     positive     2018-09-14 Yes    
 2 A        2     lost         2020-08-20 No     
 3 B        1     negative     2018-09-15 Yes    
 4 B        2     return       2019-08-25 No     
 5 C        1     negative     2017-09-12 Yes    
 6 C        2     lost         2018-09-16 No     
 7 C        3     negative     2020-08-21 Yes    
 8 D        1     lost         2018-08-10 Yes    
 9 E        1     lost         2017-09-13 Yes    
10 E        2     return       2018-02-16 Yes    
11 E        3     lost         2020-08-21 Yes    
12 F        1     return       2017-05-20 Yes    

huangapple
  • 本文由 发表于 2023年6月29日 02:58:15
  • 转载请务必保留本文链接:https://go.coder-hub.com/76575990.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定