2023年6月29日 02:58:15go评论101阅读模式

英文:

Changing a different number of subsequent entries in a row based on values from other rows

问题

I understand that you want me to translate the code-related parts of your request. Here's the translated code part:

我想基于“order_result”和“order_date”的值创建一个新列“summary”。列“orders”是每个“customer”的订单数量。
我有以下代码-
    customer <- c("A", "A", "B", "B", "C", "C", "C", "D", "E", "E", "E", "F", "G", "G", "G", "H", "H", "I", "I", "I", "J", "J", "J", "K", "K", "K", "K", "K")
    order <- c("1", "2", "1", "2", "1", "2", "3", "1", "1", "2", "3", "1", "1", "2", "3", "1", "2", "1", "2", "3", "1", "2", "3", "1", "2", "3", "4", "5")
    order_result <- c("positive", "lost", "negative", "return", "negative", "lost", "negative", "lost", "lost", "return", "lost", "return", "lost", "negative", "lost", "lost", "positive", "return", "negative", "lost", "lost","positive", "lost", "return", "negative", "lost", "lost", "negative")
    order_date <- c("2018-09-14", "2020-08-20", "2018-09-15", "2019-08-25", "2017-09-12", "2018-09-16", "2020-08-21", "2018-08-10", "2017-09-13", "2018-02-16", "2020-08-21", "2017-05-20", "2018-07-05", "2019-02-15", "2021-11-04", "2017-08-07", "2021-08-05", "2019-07-30", "2020-09-23", "2020-11-23","2017-10-30", "2018-04-09", "2020-04-09", "2019-07-30", "2019-12-04", "2020-04-04", "2020-08-03", "2021-12-24")
    df1 <- data.frame(customer, order, order_result, order_date)
我想要遍历每个客户的“order_result”，按照订单日期从早到晚的顺序，并创建一个新列“summary”，其中包含是或否的条目。每个客户的第一笔订单在“summary”列中始终为“是”，然后我们根据订单日期（从早到晚）为每个客户遍历“orders”。我们查看“order_result”以确定在“summary”中填充多少行以及使用什么值。假设我们从索引订单开始。如果“order_result”是“negative”，那么“summary”中的后续行数是未指定的，如果其“order_date”与索引订单相差≤400天，则为“否”，如果大于400天，则为“是”。要填充“否”的行数取决于实际的订单日期，这可以根据客户的不同而变化。如果“order_result”是“return”或“lost”，则下一笔订单在“summary”中为“是”。我们通过跳转到具有“summary”中的“是”的下一笔订单（这成为下一个索引订单）并重复来遍历订单列表。
每个“customerid”都是独立的。结果应为：
    summary <- c("Yes", "No", "Yes", "No", "Yes", "No", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "No", "Yes", "Yes", "No", "Yes", "Yes", "No", "No", "Yes")
我的问题是，我不确定如何填充未指定数量的“否”行，因为它完全取决于实际的订单日期和与索引日期的经过时间。此外，如何跳转从“summary”中的一行“是”到同一客户的下一行“是”（而不是整个数据集）并重复整个过程？我尝试使用lag/lead但未获得正确的输出。谢谢！
这是我的伪代码：
    如果（order == 1）{
      summary == 'TRUE'}  #客户的第一笔订单始终为TRUE。 
    如果（order_result[row_number()] == 'positive'）{   #如果结果为正面
      summary[row_number()+ length(?)] == FALSE} #在正面结果后，该客户的所有后续行在summary中都为FALSE。
    如果（order_result[row_number()] == 'negative'）{   #如果结果为负面，则根据该订单日期和后续订单日期之间的时间差异有两个选项。
      如果（diff_time（orderdate[row_number()]，orderdate[row_number(?)]) <= 400 {  
        summary[row_number()+ length(?)] == FALSE}   #在负面结果后，该客户的所有后续行在summary中都在该订单之后的400天内为FALSE
      否则 summary[row_number()+ length(?)] == TRUE}  #否则，该客户的所有后续行在summary中都在400天后为TRUE
    如果（order_result[row_number()] == 'lost' | order_result[row_number()] == 'return'）  {
      summary[row_number() + 1] == TRUE}  #如果订单结果为“丢失”或“退回”，则该客户的下一笔订单在summary中为TRUE。

英文:

*EDIT: NEW question that has not been answered by the responses below or in other SO questions. Is there a way to do this with for-loops and iterating through orders based on customerid and order_date? Or by using elseif, etc in dplyr??

I want to create a new column summary based on the values in order_result and order_date. The column orders is the number of orders for each customer.

I have the following code -

customer &lt;- c(&quot;A&quot;, &quot;A&quot;, &quot;B&quot;, &quot;B&quot;, &quot;C&quot;, &quot;C&quot;, &quot;C&quot;, &quot;D&quot;, &quot;E&quot;, &quot;E&quot;, &quot;E&quot;, &quot;F&quot;, &quot;G&quot;, &quot;G&quot;, &quot;G&quot;, &quot;H&quot;, &quot;H&quot;, &quot;I&quot;, &quot;I&quot;, &quot;I&quot;, &quot;J&quot;, &quot;J&quot;, &quot;J&quot;, &quot;K&quot;, &quot;K&quot;, &quot;K&quot;, &quot;K&quot;, &quot;K&quot;)
order &lt;- c(&quot;1&quot;, &quot;2&quot;, &quot;1&quot;, &quot;2&quot;, &quot;1&quot;, &quot;2&quot;, &quot;3&quot;, &quot;1&quot;, &quot;1&quot;, &quot;2&quot;, &quot;3&quot;, &quot;1&quot;, &quot;1&quot;, &quot;2&quot;, &quot;3&quot;, &quot;1&quot;, &quot;2&quot;, &quot;1&quot;, &quot;2&quot;, &quot;3&quot;, &quot;1&quot;, &quot;2&quot;, &quot;3&quot;, &quot;1&quot;, &quot;2&quot;, &quot;3&quot;, &quot;4&quot;, &quot;5&quot;)
order_result &lt;- c(&quot;positive&quot;, &quot;lost&quot;, &quot;negative&quot;, &quot;return&quot;, &quot;negative&quot;, &quot;lost&quot;, &quot;negative&quot;, &quot;lost&quot;, &quot;lost&quot;, &quot;return&quot;, &quot;lost&quot;, &quot;return&quot;, &quot;lost&quot;, &quot;negative&quot;, &quot;lost&quot;, &quot;lost&quot;, &quot;positive&quot;, &quot;return&quot;, &quot;negative&quot;, &quot;lost&quot;, &quot;lost&quot;,&quot;positive&quot;, &quot;lost&quot;, &quot;return&quot;, &quot;negative&quot;, &quot;lost&quot;, &quot;lost&quot;, &quot;negative&quot;)
order_date &lt;- c(&quot;2018-09-14&quot;, &quot;2020-08-20&quot;, &quot;2018-09-15&quot;, &quot;2019-08-25&quot;, &quot;2017-09-12&quot;, &quot;2018-09-16&quot;, &quot;2020-08-21&quot;, &quot;2018-08-10&quot;, &quot;2017-09-13&quot;, &quot;2018-02-16&quot;, &quot;2020-08-21&quot;, &quot;2017-05-20&quot;, &quot;2018-07-05&quot;, &quot;2019-02-15&quot;, &quot;2021-11-04&quot;, &quot;2017-08-07&quot;, &quot;2021-08-05&quot;, &quot;2019-07-30&quot;, &quot;2020-09-23&quot;, &quot;2020-11-23&quot;,&quot;2017-10-30&quot;, &quot;2018-04-09&quot;, &quot;2020-04-09&quot;, &quot;2019-07-30&quot;, &quot;2019-12-04&quot;, &quot;2020-04-04&quot;, &quot;2020-08-03&quot;, &quot;2021-12-24&quot;)
df1 &lt;- data.frame(customer, order, order_result, order_date)

I want to parse through order_results for each customer from earliest to latest order_date and create a new column summary with yes or no entries. The 1st order for a customer always is "yes" in the summary column, and we keep moving down the list of orders for each customer. As we go through the orders based on the order_date (earliest to latest) for each customer, we look at the order_result to determine how many rows to fill out in summary and with what value. Say we start with an index order. If the order_result is "negative", then an unspecified number of later rows in summary for that customer are either "no" if their order_date is ≤400 days from the index order or yes if >400 days from the index order_date. The number of rows to fill out with "no" depends on the actual order_dates, which can vary based on customer. If the order_result is "return" or "lost", the next order has "yes" in summary. If the order_result is "positive", all later orders for that patient have "no" in summary. We move down the list of orders by jumping to the next order that has yes in summary (which becomes the next index order) and repeating.

Each customerid is independent of each other. The result should be:

summary &lt;- c(&quot;Yes&quot;, &quot;No&quot;, &quot;Yes&quot;, &quot;No&quot;, &quot;Yes&quot;, &quot;No&quot;, &quot;Yes&quot;, &quot;Yes&quot;, &quot;Yes&quot;, &quot;Yes&quot;, &quot;Yes&quot;, &quot;Yes&quot;, &quot;Yes&quot;, &quot;Yes&quot;, &quot;Yes&quot;, &quot;Yes&quot;, &quot;Yes&quot;, &quot;Yes&quot;, &quot;Yes&quot;, &quot;No&quot;, &quot;Yes&quot;, &quot;Yes&quot;, &quot;No&quot;, &quot;Yes&quot;, &quot;Yes&quot;, &quot;No&quot;, &quot;No&quot;, &quot;Yes&quot;)

My question is I am not sure how to approach filling out an unspecified number of rows with "no" since it all depends on the actual order_dates and the elapsed time from the index date. In addition, how can I jump from 1 row with "yes" to the next row with "yes" in summary (ie, change index dates) for the same customer (not the whole dataset) and repeat the process all over again? I tried using lag/lead but was not getting the correct output. Thanks!

EDIT: Here is my pseudocode:

if (order == 1) {
summary == &#39;TRUE&#39;}  #first order for a customer is always TRUE. 
if (order_result[row_number()] == &#39;positive&#39;) {   #If result is positive
summary[row_number()+ length(?)] == FALSE} #After a positive result, all subsequent rows in summary for that customer are FALSE.
if (order_result[row_number()] == &#39;negative&#39;) {   #If result is negative, there are 2 options based on time difference between that order date and subsequent order dates. 
if (diff_time(orderdate[row_number()], orderdate[row_number(?)]) &lt;= 400 {  
summary[row_number()+ length(?)] == FALSE}   #After a negative result, all subsequent rows under summary within 400 days of that order for that customer are FALSE
else summary[row_number()+ length(?)] == TRUE}  #Otherwise all subsequent rows in summary over 400 days for that customer are TRUE
if (order_result[row_number()] == &#39;lost&#39; | order_result[row_number()] == &#39;return&#39;)  {
summary[row_number() + 1] == TRUE}  #If order result is lost or return, the next order for that customer is true under summary.
}

答案1

得分: 1

请检查


df %>%
  mutate(new=ifelse(order==1, '是', NA),
  lag=ifelse(order!=1,lag(order_date), NA),
  diff=as.Date(order_date)-as.Date(lag),
  negpos=ifelse(order==1 & order_result %in% c('positive','negative'),order_result,NA_character_),
  negpos2=ifelse(order!=1 & !order_result %in% c('positive','negative'),order_result,NA_character_)) %>%
  group_by(customer) %>%
  fill(negpos,negpos2) %>%
  ungroup() %>%
  mutate(summary=case_when(
    order!=1 & order_result=='negative' & negpos=='negative' & negpos2 %in% c('lost','return') ~ '是',
    order!=1 & diff<=400 & negpos=='negative' ~ '否',
    order!=1 & diff>400 & negpos=='negative' ~ '是',
    order==1 ~ '是',
    order!=1 & order_result %in% c('lost','return') & negpos=='negative' ~ '否',
    order!=1 & negpos=='positive' ~ '否'
    )) %>%
  select(-c(new,lag,negpos,negpos2,diff))
# 输出
# 一个 tibble: 8 × 5
  客户 order order_result 订单日期 概要
  <chr>    <chr> <chr>        <chr>      <chr>  
1 A        1     positive     2018-09-14 是    
2 A        2     lost         2020-08-20 否     
3 B        1     negative     2018-09-15 是    
4 B        2     return       2019-08-25 否     
5 C        1     negative     2017-09-12 是    
6 C        2     lost         2018-09-16 否     
7 C        3     negative     2020-08-21 是    
8 D        1     lost         2018-08-10 是

英文:

Please check


df %&gt;% 
  mutate(new=ifelse(order==1, &#39;Yes&#39;, NA),
  lag=ifelse(order!=1,lag(order_date), NA),
  diff=as.Date(order_date)-as.Date(lag),
  negpos=ifelse(order==1 &amp; order_result %in% c(&#39;positive&#39;,&#39;negative&#39;),order_result,NA_character_),
  negpos2=ifelse(order!=1 &amp; !order_result %in% c(&#39;positive&#39;,&#39;negative&#39;),order_result,NA_character_)) %&gt;% 
  group_by(customer) %&gt;% 
  fill(negpos,negpos2) %&gt;% ungroup() %&gt;% 
  mutate(summary=case_when(
    order!=1 &amp; order_result==&#39;negative&#39; &amp; negpos==&#39;negative&#39; &amp; negpos2 %in% c(&#39;lost&#39;,&#39;return&#39;) ~ &#39;Yes&#39;,
    order!=1 &amp; diff&lt;=400 &amp; negpos==&#39;negative&#39; ~ &#39;No&#39;,
    order!=1 &amp; diff&gt;400 &amp; negpos==&#39;negative&#39; ~ &#39;Yes&#39;,
    order==1 ~ &#39;Yes&#39;,
    order!=1 &amp; order_result %in% c(&#39;lost&#39;,&#39;return&#39;) &amp; negpos==&#39;negative&#39; ~ &#39;No&#39;,
    order!=1 &amp; negpos==&#39;positive&#39; ~ &#39;No&#39;
    )) %&gt;% select(-c(new,lag,negpos,negpos2,diff))
# output
# A tibble: 8 &#215; 5
  customer order order_result order_date summary
  &lt;chr&gt;    &lt;chr&gt; &lt;chr&gt;        &lt;chr&gt;      &lt;chr&gt;  
1 A        1     positive     2018-09-14 Yes    
2 A        2     lost         2020-08-20 No     
3 B        1     negative     2018-09-15 Yes    
4 B        2     return       2019-08-25 No     
5 C        1     negative     2017-09-12 Yes    
6 C        2     lost         2018-09-16 No     
7 C        3     negative     2020-08-21 Yes    
8 D        1     lost         2018-08-10 Yes

答案2

得分: 1

根据您的评论，以下是更新后的代码：

df %>%
  mutate(new = ifelse(order == 1, 'Yes', NA),
         lag = ifelse(order != 1, lag(order_date), NA),
         diff = as.Date(order_date) - as.Date(lag),
         negpos = ifelse(order == 1 & order_result %in% c('positive', 'negative'), order_result, NA_character_),
         negpos2 = ifelse(order != 1 & !order_result %in% c('positive', 'negative'), order_result, NA_character_)) %>%
  group_by(customer) %>%
  fill(negpos, negpos2) %>%
  ungroup() %>%
  mutate(summary = case_when(
    order != 1 & order_result == 'negative' & negpos == 'negative' & negpos2 %in% c('lost', 'return') ~ 'Yes',
    order != 1 & diff <= 400 & negpos == 'negative' ~ 'No',
    order != 1 & diff > 400 & negpos == 'negative' ~ 'Yes',
    order == 1 ~ 'Yes',
    order != 1 & order_result %in% c('lost', 'return') & negpos == 'negative' ~ 'No',
    order != 1 & negpos == 'positive' ~ 'No',
    order != 1 & is.na(negpos) & !is.na(negpos2) ~ 'Yes',
    order != 1 & diff <= 400 & order_result == 'negative' ~ 'No',
    order != 1 & diff > 400 & order_result == 'negative' ~ 'No',
    order != 1 & order_result == 'positive' ~ 'No'
  )) %>%
  select(-c(new, lag, negpos, negpos2, diff))

^{创建于2023-07-04，使用 reprex v2.0.2}

# 一个 tibble: 12 × 5
   customer order order_result order_date summary
   <chr>    <chr> <chr>        <chr>      <chr>  
 1 A        1     positive     2018-09-14 Yes    
 2 A        2     lost         2020-08-20 No     
 3 B        1     negative     2018-09-15 Yes    
 4 B        2     return       2019-08-25 No     
 5 C        1     negative     2017-09-12 Yes    
 6 C        2     lost         2018-09-16 No     
 7 C        3     negative     2020-08-21 Yes    
 8 D        1     lost         2018-08-10 Yes    
 9 E        1     lost         2017-09-13 Yes    
10 E        2     return       2018-02-16 Yes    
11 E        3     lost         2020-08-21 Yes    
12 F        1     return       2017-05-20 Yes

英文:

based on your comment, here is the updated code

df %&gt;% 
  mutate(new=ifelse(order==1, &#39;Yes&#39;, NA),
         lag=ifelse(order!=1,lag(order_date), NA),
         diff=as.Date(order_date)-as.Date(lag),
         negpos=ifelse(order==1 &amp; order_result %in% c(&#39;positive&#39;,&#39;negative&#39;),order_result,NA_character_),
         negpos2=ifelse(order!=1 &amp; !order_result %in% c(&#39;positive&#39;,&#39;negative&#39;),order_result,NA_character_)) %&gt;% 
  group_by(customer) %&gt;% 
  fill(negpos,negpos2) %&gt;% ungroup() %&gt;% 
  mutate(summary=case_when(
    order!=1 &amp; order_result==&#39;negative&#39; &amp; negpos==&#39;negative&#39; &amp; negpos2 %in% c(&#39;lost&#39;,&#39;return&#39;) ~ &#39;Yes&#39;,
    order!=1 &amp; diff&lt;=400 &amp; negpos==&#39;negative&#39; ~ &#39;No&#39;,
    order!=1 &amp; diff&gt;400 &amp; negpos==&#39;negative&#39; ~ &#39;Yes&#39;,
    order==1 ~ &#39;Yes&#39;,
    order!=1 &amp; order_result %in% c(&#39;lost&#39;,&#39;return&#39;) &amp; negpos==&#39;negative&#39; ~ &#39;No&#39;,
    order!=1 &amp; negpos==&#39;positive&#39; ~ &#39;No&#39;,
    order!=1 &amp; is.na(negpos) &amp; !is.na(negpos2) ~ &#39;Yes&#39;,
    order!=1 &amp; diff&lt;=400 &amp; order_result==&#39;negative&#39; ~ &#39;No&#39;,
    order!=1 &amp; diff&gt;400 &amp; order_result==&#39;negative&#39; ~ &#39;No&#39;,
    order!=1 &amp; order_result==&#39;positive&#39; ~ &#39;No&#39;
  )) %&gt;% select(-c(new,lag,negpos,negpos2,diff))

<sup>Created on 2023-07-04 with reprex v2.0.2</sup>


# A tibble: 12 &#215; 5
   customer order order_result order_date summary
   &lt;chr&gt;    &lt;chr&gt; &lt;chr&gt;        &lt;chr&gt;      &lt;chr&gt;  
 1 A        1     positive     2018-09-14 Yes    
 2 A        2     lost         2020-08-20 No     
 3 B        1     negative     2018-09-15 Yes    
 4 B        2     return       2019-08-25 No     
 5 C        1     negative     2017-09-12 Yes    
 6 C        2     lost         2018-09-16 No     
 7 C        3     negative     2020-08-21 Yes    
 8 D        1     lost         2018-08-10 Yes    
 9 E        1     lost         2017-09-13 Yes    
10 E        2     return       2018-02-16 Yes    
11 E        3     lost         2020-08-21 Yes    
12 F        1     return       2017-05-20 Yes

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

根据其他行的值更改一行中后续条目的不同数量。

问题

答案1

答案2

保存R环境中的绘图列表

为什么在R的persp3D图的坐标轴标签中，一些Unicode下标显示为矩形？

如何动态分配响应对象的名称？

在R中查找数据框列中字符的第一次出现。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。