July 3, 2023, 11:36:44

For loop - how to change a varying number of entries if requirement is met in another column

Question

I want to create a new column summary based on the values in order_result and order_date. The column order is the sequence number of each order for a customer.

I have the following data:

customer <- c("A", "A", "B", "B", "C", "C", "C", "D", "E", "E", "E", "F")
order <- c("1", "2", "1", "2", "1", "2", "3", "1", "1", "2", "3", "1")
order_result <- c("positive", "lost", "negative", "return", "negative", "lost", "negative", "lost", "lost", "return", "lost", "return")
order_date <- c("2018-09-14", "2020-08-20", "2018-09-15", "2019-08-25", "2017-09-12", "2018-09-16", "2020-08-21", "2018-08-10", "2017-09-13", "2018-02-16", "2020-08-21", "2017-05-20")
df1 <- data.frame(customer, order, order_result, order_date)

I want to go through order_result for each customer from the earliest to the latest order_date and create a new column summary with TRUE or FALSE entries.

The first order for a customer is always TRUE in the summary column. That order is the first index order.

Then, going row by row, if the order_result of the index order is "negative", an unspecified number of later rows in summary for that customer are FALSE if their order_date is ≤400 days from the index order_date, or TRUE if it is >400 days from the index order_date. The number of rows to fill with FALSE depends on the actual order_dates, which vary by customer.

If the order_result is "return" or "lost", the next order has TRUE in summary. If the order_result is "positive", all later orders for that customer have FALSE in summary. We move down the list of orders by jumping to the next order that has TRUE in summary (which becomes the next index order) and repeating.

Each customer is independent of the others. The result should be:

summary <- c(TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE)

My question is how to fill an unspecified number of rows with FALSE, since that depends on the actual order_dates and the time elapsed since the index date. In addition, how can I jump from one row with TRUE to the next row with TRUE in summary (i.e., change index dates) for the same customer (not the whole dataset) and repeat the process? Is there a way to do this with for loops, iterating through orders by customer and order_date? I tried using lag/lead in dplyr but was not getting the correct output. Thanks!

EDIT: Here is my pseudocode:

if (order == 1) {
  summary == 'TRUE'  # First order for a customer is always TRUE.
}
if (order_result[row_number()] == 'positive') {  # If result is positive
  summary[row_number() + length(?)] == FALSE  # After a positive result, all subsequent rows in summary for that customer are FALSE.
}
if (order_result[row_number()] == 'negative') {  # If result is negative, there are 2 options based on time difference between that order date and subsequent order dates.
  if (diff_time(orderdate[row_number()], orderdate[row_number(?)]) <= 400) {
    summary[row_number() + length(?)] == FALSE  # After a negative result, all subsequent rows in summary within 400 days of that order for that customer are FALSE.
  } else {
    summary[row_number() + length(?)] == TRUE  # Otherwise all subsequent rows in summary over 400 days for that customer are TRUE.
  }
}
if (order_result[row_number()] == 'lost' | order_result[row_number()] == 'return') {
  summary[row_number() + 1] == TRUE  # If order result is lost or return, the next order for that customer is TRUE in summary.
}

Answer 1

Score: 0

The great thing about mutate is that it respects group_by: once the data is grouped, any function you call inside mutate receives each column you pass as a plain vector that contains only the rows of the current group.
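To make that concrete, here is a small sketch on made-up data (the `toy` data frame and the `group_size` helper are hypothetical names used only for illustration). The helper simply reports how many values it was handed, which shows that it is called once per customer rather than once for the whole column:

```r
library(dplyr)

# Hypothetical helper: returns, for every element, the length of the vector it received.
group_size <- function(x) rep(length(x), length(x))

# Made-up data: customer A has 2 orders, B has 1, C has 3.
toy <- data.frame(
  customer = c("A", "A", "B", "C", "C", "C"),
  order    = c(1, 2, 1, 1, 2, 3)
)

res <- toy %>%
  group_by(customer) %>%
  mutate(n_seen = group_size(order)) %>%  # group_size() sees one customer's orders at a time
  ungroup()

res$n_seen
# 2 2 1 3 3 3
```

Without the group_by step the helper would receive all six values at once and n_seen would be 6 in every row.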

That's the basis of what I did below:

library(tidyverse)
library(lubridate)  # provides ymd(); attached by tidyverse >= 2.0.0, loaded explicitly for older versions

# Called once per customer: each argument is that customer's column as a vector.
order_process <- function(order_col, result_col, date_col) {
  summary <- c()
  negative_order_date <- NA
  lost_or_returned <- FALSE
  for (i in seq_along(order_col)) {

    # first row is always TRUE
    if (i == 1) {
      summary[i] <- TRUE

      # if the order was lost or returned, the next one is TRUE. This is that next one
    } else if (lost_or_returned) {
      summary[i] <- TRUE
      lost_or_returned <- FALSE

      # If positive
    } else if (result_col[i] == "positive") {
      # everything from now until the end of the column is FALSE
      summary[i:length(order_col)] <- FALSE
      return(summary)

      # If negative
    } else if (result_col[i] == "negative") {
      # remember the date of the first negative order seen for this customer
      if (is.na(negative_order_date)) {
        negative_order_date <- date_col[i]
      }
      # TRUE only if more than 400 days have passed since that negative order
      summary[i] <- date_col[i] - negative_order_date > 400

      # If lost or returned
    } else if (result_col[i] == "return" | result_col[i] == "lost") {
      summary[i] <- FALSE
      lost_or_returned <- TRUE
    }
  }
  return(summary)
}

df1 %>%
  as_tibble() %>%
  mutate(order_date = ymd(order_date),
         order = as.numeric(order),
         order_result = as.factor(order_result)) %>%
  group_by(customer) %>%
  mutate(summary = order_process(order, order_result, order_date)) %>%
  pull(summary)

The output matches the expected result for every value except the third from last. You didn't specify what should happen on the current row when order_result is lost or returned, so I guessed that it should be FALSE.
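One possible explanation for that remaining mismatch is that the `i == 1` branch marks the first order TRUE without looking at its order_result, so a first order that is "lost", "return", "negative" or "positive" never influences the rows after it. Below is a rough sketch of an alternative, assuming the question's rules are read as "each TRUE row becomes the index order, and its result decides which later row is the next TRUE". The function name `summarise_orders` is hypothetical, and the code uses only base R:

```r
customer <- c("A", "A", "B", "B", "C", "C", "C", "D", "E", "E", "E", "F")
order_result <- c("positive", "lost", "negative", "return", "negative", "lost",
                  "negative", "lost", "lost", "return", "lost", "return")
order_date <- as.Date(c("2018-09-14", "2020-08-20", "2018-09-15", "2019-08-25",
                        "2017-09-12", "2018-09-16", "2020-08-21", "2018-08-10",
                        "2017-09-13", "2018-02-16", "2020-08-21", "2017-05-20"))
df1 <- data.frame(customer, order_result, order_date)

# Hypothetical helper: takes one customer's results and dates (sorted by date)
# and walks from one index order to the next.
summarise_orders <- function(result, date, window_days = 400) {
  n <- length(result)
  out <- logical(n)            # every row starts as FALSE
  if (n == 0) return(out)
  out[1] <- TRUE               # the first order is always TRUE and is the first index
  index <- 1
  while (index < n) {
    later <- seq(index + 1, n)
    res <- as.character(result[index])
    if (res == "positive") {
      next_index <- NA         # all later rows stay FALSE
    } else if (res %in% c("lost", "return")) {
      next_index <- index + 1  # the very next order is TRUE
    } else if (res == "negative") {
      # first later order more than window_days after the index order (NA if none)
      gap <- as.numeric(difftime(date[later], date[index], units = "days"))
      next_index <- later[which(gap > window_days)[1]]
    } else {
      stop("unexpected order_result: ", res)
    }
    if (is.na(next_index)) break
    out[next_index] <- TRUE
    index <- next_index        # jump to the new index order and repeat
  }
  out
}

# Apply the helper customer by customer, in date order within each customer.
df1$summary <- NA
for (rows in split(seq_len(nrow(df1)), df1$customer)) {
  rows <- rows[order(df1$order_date[rows])]
  df1$summary[rows] <- summarise_orders(df1$order_result[rows], df1$order_date[rows])
}

df1$summary
# TRUE FALSE TRUE FALSE TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE
```

On the sample data this reproduces the summary vector given in the question. Rows between two index orders are never inspected for their own order_result, which is an assumption about the intended rules rather than something the question states explicitly.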
