For loop - how to change a varying number of entries if requirement is met in another column

Question


I want to create a new column, summary, based on the values in order_result and order_date. The column order is the sequence number of each order for a given customer.

I have the following data:

customer <- c("A", "A", "B", "B", "C", "C", "C", "D", "E", "E", "E", "F")
order <- c("1", "2", "1", "2", "1", "2", "3", "1", "1", "2", "3", "1")
order_result <- c("positive", "lost", "negative", "return", "negative", "lost", "negative", "lost", "lost", "return", "lost", "return")
order_date <- c("2018-09-14", "2020-08-20", "2018-09-15", "2019-08-25", "2017-09-12", "2018-09-16", "2020-08-21", "2018-08-10", "2017-09-13", "2018-02-16", "2020-08-21", "2017-05-20")
df1 <- data.frame(customer, order, order_result, order_date)

I want to go through order_result for each customer from the earliest to the latest order_date and create a new column, summary, with TRUE or FALSE entries.

The first order for a customer is always TRUE in the summary column.

Then, going row by row, if the order_result is "negative", an unspecified number of later rows for that customer get FALSE in summary if their order_date is ≤400 days from the index order, or TRUE if it is >400 days from the index order_date. The number of rows to fill with FALSE depends on the actual order_dates, which vary by customer.

If the order_result is "return" or "lost", the next order has TRUE in summary. If the order_result is "positive", all later orders for that customer have FALSE in summary. We move down the list of orders by jumping to the next order that has TRUE in summary (which becomes the next index order) and repeating.

Each customer is independent of the others. The result should be:

summary <- c(TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE)
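
To make the 400-day rule concrete, the gaps for customer C can be checked with base R date arithmetic. This is only an illustrative sketch: c_dates and gap_days are names made up for this example and are not columns of df1.

```r
# Order dates for customer C, copied from the sample data
c_dates <- as.Date(c("2017-09-12", "2018-09-16", "2020-08-21"))

# Days elapsed since the index order (customer C's first order, a "negative")
gap_days <- as.numeric(c_dates - c_dates[1])
gap_days          # 0 369 1074

# Later orders within 400 days of the index get FALSE; beyond 400 days, TRUE
gap_days > 400    # FALSE FALSE TRUE
```

The first element is the index order itself, which is always TRUE in summary, so only the second and third comparisons matter here. They give FALSE and TRUE, matching the expected entries for customer C.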

My question: I am not sure how to fill an unspecified number of rows with FALSE, since the count depends on the actual order_dates and the time elapsed since the index date. In addition, how can I jump from one row with TRUE in summary to the next row with TRUE (i.e., change the index date) within the same customer rather than across the whole dataset, and then repeat the process? Is there a way to do this with for loops, iterating through orders by customer and order_date? I tried lag/lead in dplyr but did not get the correct output. Thanks!

EDIT: Here is my pseudocode:

if (order == 1) {
  summary <- TRUE  # first order for a customer is always TRUE
}
if (order_result[row_number()] == 'positive') {  # if result is positive
  summary[row_number() + length(?)] <- FALSE  # after a positive result, all subsequent rows in summary for that customer are FALSE
}
if (order_result[row_number()] == 'negative') {  # if result is negative, there are 2 options based on the time difference between that order date and subsequent order dates
  if (diff_time(orderdate[row_number()], orderdate[row_number(?)]) <= 400) {
    summary[row_number() + length(?)] <- FALSE  # all subsequent rows within 400 days of that order for that customer are FALSE
  } else {
    summary[row_number() + length(?)] <- TRUE  # otherwise, all subsequent rows over 400 days for that customer are TRUE
  }
}
if (order_result[row_number()] == 'lost' | order_result[row_number()] == 'return') {
  summary[row_number() + 1] <- TRUE  # if order result is lost or return, the next order for that customer is TRUE
}

Answer 1

Score: 0

A useful property of mutate is that, after group_by, any column you pass to a function inside mutate arrives as a vector containing only the values for the current group.

That's the basis of what I did below:
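
As a small illustration of that behaviour, the sketch below uses a toy tibble rather than the question's data. The names toy, describe_group and group_size are hypothetical and exist only for this example.

```r
library(dplyr)

# Toy data: group "a" has two rows, group "b" has one
toy <- tibble(g = c("a", "a", "b"), x = c(10, 20, 30))

# Hypothetical helper: reports how many values it was handed,
# repeated once per row so the result fits back into the group
describe_group <- function(x) {
  rep(length(x), length(x))
}

out <- toy %>%
  group_by(g) %>%
  mutate(group_size = describe_group(x)) %>%
  ungroup()

out$group_size  # 2 2 1
```

The helper sees two values for group "a" and one for group "b", so any per-customer loop can be written as an ordinary function of vectors.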

library(tidyverse)
library(lubridate)  # for ymd(); library(tidyverse) attaches it only from tidyverse 2.0.0 onward

order_process <- function(order_col, result_col, date_col) {
  # one slot per order; rows with an unrecognised result stay NA
  summary <- rep(NA, length(order_col))
  negative_order_date <- NA
  lost_or_returned <- FALSE
  for (i in seq_along(order_col)) {

    # first row is always TRUE
    if (i == 1) {
      summary[i] <- TRUE

      # if the order was lost or returned, the next one is TRUE. This is that next one
    } else if (lost_or_returned) {
      summary[i] <- TRUE
      lost_or_returned <- FALSE

      # If positive
    } else if (result_col[i] == "positive") {
      # everything from now until the end of the column is FALSE
      summary[i:length(order_col)] <- FALSE
      return(summary)

      # If negative
    } else if (result_col[i] == "negative") {
      # remember the date of the first negative order that reaches this branch
      if (is.na(negative_order_date)) {
        negative_order_date <- date_col[i]
      }
      # TRUE only when more than 400 days have passed since that date
      summary[i] <- as.numeric(date_col[i] - negative_order_date, units = "days") > 400

      # If lost or returned
    } else if (result_col[i] %in% c("return", "lost")) {
      summary[i] <- FALSE
      lost_or_returned <- TRUE
    }
  }
  return(summary)
}

df1 %>%
  as_tibble() %>%
  mutate(order_date = ymd(order_date),
         order = as.numeric(order),
         order_result = as.factor(order_result)) %>%
  group_by(customer) %>%
  mutate(summary = order_process(order, order_result, order_date)) %>%
  pull(summary)

The output matches your expected result for every value except the third-to-last. You didn't specify what the current row should get when order_result is "lost" or "return", so I guessed FALSE.
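
For comparison, here is a minimal base R sketch that follows the index-order rule as the question words it: start at the first order, decide from its result which later order becomes the next TRUE, jump there, and repeat. It is one possible reading, not a definitive implementation. The names flag_index_orders and window_days are hypothetical, and it assumes order_date has been converted to Date and that a "lost" or "return" index makes the very next order TRUE.

```r
# Hypothetical helper (not from the original post): flags index orders
# for ONE customer whose orders are already sorted by date.
flag_index_orders <- function(result, date, window_days = 400) {
  n <- length(result)
  flag <- logical(n)            # everything starts as FALSE
  if (n == 0) return(flag)
  idx <- 1                      # position of the current index order
  flag[idx] <- TRUE             # first order is always TRUE
  while (idx < n) {
    later <- seq(idx + 1, n)    # positions after the index order
    nxt <- switch(
      as.character(result[idx]),
      # positive: no later order ever becomes TRUE
      positive = NA_integer_,
      # negative: first later order more than window_days after the index
      negative = {
        gap <- as.numeric(date[later] - date[idx])
        later[which(gap > window_days)[1]]
      },
      # assumed default (lost / return): the very next order
      idx + 1L
    )
    if (is.na(nxt)) break       # nothing left to flag for this customer
    flag[nxt] <- TRUE
    idx <- nxt                  # jump to the new index order
  }
  flag
}

df1 <- data.frame(
  customer = c("A", "A", "B", "B", "C", "C", "C", "D", "E", "E", "E", "F"),
  order_result = c("positive", "lost", "negative", "return", "negative", "lost",
                   "negative", "lost", "lost", "return", "lost", "return"),
  order_date = as.Date(c("2018-09-14", "2020-08-20", "2018-09-15", "2019-08-25",
                         "2017-09-12", "2018-09-16", "2020-08-21", "2018-08-10",
                         "2017-09-13", "2018-02-16", "2020-08-21", "2017-05-20"))
)

# Sort so each customer's orders run from earliest to latest
df1 <- df1[order(df1$customer, df1$order_date), ]

# Apply the helper to each customer's rows independently
df1$summary <- NA
for (rows in split(seq_len(nrow(df1)), df1$customer)) {
  df1$summary[rows] <- flag_index_orders(df1$order_result[rows], df1$order_date[rows])
}
```

On the sample data this reading gives TRUE for the third-to-last row as well, so df1$summary equals the expected vector from the question. Whether rows skipped after a "lost" or "return" should behave this way on other data is a design choice the question leaves open.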

huangapple
  • Posted on July 3, 2023 at 11:36:44
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76601714.html