Add Fail datetime to next Pass event, and reduce to single event

huangapple go评论76阅读模式
英文:

Add Fail datetime to next Pass event, and reduce to single event

问题

I understand your request. Here's the translated content you provided:

抱歉,这可能是一个基本问题,但我还在学习R。
我有一个包含测试结果的数据框,它包括项目名称,资源标识符,单个日期时间字段和测试结果的布尔值,'test_fail'=TRUE 表示测试失败。

**示例数据:**

```R
test_data <- data.frame(
  project_name = c("project-1","project-1","project-1","project-2","project-1","project-2","project-1","project-2",
                   "project-2","project-1","project-1","project-3","project-3","project-3","project-2","project-3","project-2","project-2"),
  grp_res = c("A","A","D","B","D","B","A","B","B","D","D","E","E","E","B","E","B","B"),
  test_fail = as.logical(c(FALSE,TRUE,FALSE,FALSE,FALSE,TRUE,FALSE,TRUE,FALSE,TRUE,FALSE,FALSE,TRUE,TRUE,FALSE,FALSE,TRUE,FALSE)),
  test_dt = ymd_hm(c("2023-05-01 08:45", "2023-05-01 09:10","2023-05-01 11:00","2023-05-01 14:00", "2023-05-02 09:00",
                      "2023-05-02 14:00","2023-05-03 09:10","2023-05-03 10:10","2023-05-03 14:10","2023-05-03 17:00",
                     "2023-05-04 09:30","2023-05-04 11:30","2023-05-05 13:30","2023-05-06 08:00","2023-05-06 16:00","2023-05-07 10:20","2023-05-10 08:30","2023-05-10 17:30")),
  test_status = as.numeric(c(0,1,0,0,0,1,0,1,0,1,0,0,1,1,0,0,1,0)),
  stringsAsFactors = FALSE
)

我尝试了这里建议的两种解决方案:column value changes in R group find date difference

因为这看起来非常接近我尝试实现的内容,但对我来说并不起作用,可能是因为我有一个日期时间字段。

我想捕获失败测试的日期时间 'test_dt',然后查找下一个在给定 'project_name' 和 'grp_res' 中转为通过的测试,将失败的日期时间值分配给已通过事件的新列。

最终,我希望数据看起来像下面的示例:

project_name grp_res test_fail test_dt test_status failed_dt fixed_dt
project-1 A TRUE 2023-05-01 08:45:00 1 2023-05-01 08:45:00 2023-05-01 09:10:00
project-2 B TRUE 2023-05-02 14:00:00 1 2023-05-02 14:00:00 2023-05-03 14:10:00
project-2 B TRUE 2023-05-10 08:30:00 1 2023-05-10 08:30:00 2023-05-10 17:30:00
project-1 D TRUE 2023-05-03 17:00:00 1 2023-05-03 17:00:00 2023-05-04 09:30:00
project-3 E TRUE 2023-05-05 13:30:00 1 2023-05-05 13:30:00 2023-05-07 10:20:00
final_data <- data.frame(
  project_name = c("project-1","project-2","project-2","project-1","project-3"),
  grp_res = c("A","B","B","D","E"),
  test_fail = as.logical(c(TRUE,TRUE,TRUE,TRUE,TRUE)),
  test_dt = ymd_hm(c("2023-05-01 08:45","2023-05-02 14:00","2023-05-10 08:30","2023-05-03 17:00","2023-05-05 13:30")),
  failed_dt = ymd_hm(c("2023-05-01 08:45","2023-05-02 14:00","2023-05-10 08:30","2023-05-03 17:00","2023-05-05 13:30")),
  fixed_dt = ymd_hm(c("2023-05-01 09:10","2023-05-03 14:10","2023-05-10 17:30","2023-05-04 09:30","2023-05-07 10:20")),
  test_status = as.numeric(c(1,1,1,1,1)),
  stringsAsFactors = FALSE
)

我一直在思考这个问题,并在Stackoverflow上花了不少时间寻找其他可能的解决方案,但没有成功。

任何帮助和建议将不胜感激。


If you have any specific questions or need further assistance with this code, please feel free to ask.

<details>
<summary>英文:</summary>

Excuse what might be a basic question, but I’m still learning r.
I have a data frame that contains test results, it includes project name, a resource identifier, a single date-time field, and Boolean value for the result of the test, ‘test_fail’=TRUE equates to a failed test.

**Example Data:**

test_data <- data.frame(
project_name = c("project-1","project-1","project-1","project-2","project-1","project-2","project-1","project-2",
"project-2","project-1","project-1","project-3","project-3","project-3","project-2","project-3","project-2","project-2"),
grp_res = c("A","A","D","B","D","B","A","B","B","D","D","E","E","E","B","E","B","B"),
test_fail = as.logical(c(FALSE,TRUE,FALSE,FALSE,FALSE,TRUE,FALSE,TRUE,FALSE,TRUE,FALSE,FALSE,TRUE,TRUE,FALSE,FALSE,TRUE,FALSE)),
test_dt = ymd_hm(c("2023-05-01 08:45", "2023-05-01 09:10","2023-05-01 11:00","2023-05-01 14:00", "2023-05-02 09:00",
"2023-05-02 14:00","2023-05-03 09:10","2023-05-03 10:10","2023-05-03 14:10","2023-05-03 17:00",
"2023-05-04 09:30","2023-05-04 11:30","2023-05-05 13:30","2023-05-06 08:00","2023-05-06 16:00","2023-05-07 10:20","2023-05-10 08:30","2023-05-10 17:30")),
test_status = as.numeric(c(0,1,0,0,0,1,0,1,0,1,0,0,1,1,0,0,1,0)),
stringsAsFactors = FALSE
)


I tried both solutions suggested here: [column value changes in R group find date difference](https://stackoverflow.com/questions/68447878/column-value-changes-in-r-group-find-date-difference)

as this looked very close to what I&#39;m attempting to achieve, but it did not work for me, possibly due to me having a datetime field.

I want to capture the datetime ‘test_dt’ for a failed test and then look forward to the next test that changed to a pass for the given ‘project_name’ and ‘grp_res’, assign the failed datetime value to a new column for the event that passed.

Ultimately, I would like the data to look like the example below:

| project_name | grp_res | test_fail | test_dt | test_status | failed_dt | fixed_dt |
| --- | --- | --- | --- | --- | --- | --- |
| project-1 | A | TRUE | 2023-05-01 08:45:00 | 1 | 2023-05-01 08:45:00 | 2023-05-01 09:10:00 |
| project-2 | B | TRUE | 2023-05-02 14:00:00 | 1 | 2023-05-02 14:00:00 | 2023-05-03 14:10:00 |
| project-2 | B | TRUE | 2023-05-10 08:30:00 | 1 | 2023-05-10 08:30:00 | 2023-05-10 17:30:00 |
| project-1 | D | TRUE | 2023-05-03 17:00:00 | 1 | 2023-05-03 17:00:00 | 2023-05-04 09:30:00 |
| project-3 | E | TRUE | 2023-05-05 13:30:00 | 1 | 2023-05-05 13:30:00 | 2023-05-07 10:20:00 |

final_data <- data.frame(
project_name = c("project-1","project-2","project-2","project-1","project-3"),
grp_res = c("A","B","B","D","E"),
test_fail = as.logical(c(TRUE,TRUE,TRUE,TRUE,TRUE)),
test_dt = ymd_hm(c("2023-05-01 08:45","2023-05-02 14:00","2023-05-10 08:30","2023-05-03 17:00","2023-05-05 13:30")),
failed_dt = ymd_hm(c("2023-05-01 08:45","2023-05-02 14:00","2023-05-10 08:30","2023-05-03 17:00","2023-05-05 13:30")),
fixed_dt = ymd_hm(c("2023-05-01 09:10","2023-05-03 14:10","2023-05-10 17:30","2023-05-04 09:30","2023-05-07 10:20")),
test_status = as.numeric(c(1,1,1,1,1)),
stringsAsFactors = FALSE
)


I&#39;ve been scratching my head on this one, and spent more than a few hours looking for other possible solutions on Stackoverflow, with no luck.

And help and tips will be very much appreciated.

</details>


# 答案1
**得分**: 0

以下是您要翻译的代码部分的内容:

``` r
library(dplyr)
library(purrr)
library(lubridate)

split(test_data, ~ t_fail) %>% 
  map_at("FALSE", \(df_) select(df_, proj_name, grp_res, fixed_dt = test_dt)) %>% 
  { left_join(.$`TRUE`, .$`FALSE`, 
            by = join_by(proj_name, grp_res, test_dt < fixed_dt), 
            multiple = "first")
  } %>% 
  slice_min(test_dt, by = c(proj_name, grp_res, fixed_dt)) %>% 
  arrange(test_dt) 
#>   proj_name grp_res t_fail             test_dt t_status            fixed_dt
#> 1 project-1       A   TRUE 2023-05-01 09:10:00        1 2023-05-03 09:10:00
#> 2 project-2       B   TRUE 2023-05-02 14:00:00        1 2023-05-03 14:10:00
#> 3 project-1       D   TRUE 2023-05-03 17:00:00        1 2023-05-04 09:30:00
#> 4 project-3       E   TRUE 2023-05-05 13:30:00        1 2023-05-07 10:20:00
#> 5 project-2       B   TRUE 2023-05-10 08:30:00        1 2023-05-10 17:30:00

输入数据:

test_data <- data.frame(
  proj_name = c("project-1","project-1","project-1","project-2","project-1","project-2","project-1","project-2",
                   "project-2","project-1","project-1","project-3","project-3","project-3","project-2","project-3","project-2","project-2"),
  grp_res = c("A","A","D","B","D","B","A","B","B","D","D","E","E","E","B","E","B","B"),
  t_fail = as.logical(c(FALSE,TRUE,FALSE,FALSE,FALSE,TRUE,FALSE,TRUE,FALSE,TRUE,FALSE,FALSE,TRUE,TRUE,FALSE,FALSE,TRUE,FALSE)),
  test_dt = ymd_hm(c("2023-05-01 08:45", "2023-05-01 09:10","2023-05-01 11:00","2023-05-01 14:00", "2023-05-02 09:00",
                     "2023-05-02 14:00","2023-05-03 09:10","2023-05-03 10:10","2023-05-03 14:10","2023-05-03 17:00",
                     "2023-05-04 09:30","2023-05-04 11:30","2023-05-05 13:30","2023-05-06 08:00","2023-05-06 16:00","2023-05-07 10:20","2023-05-10 08:30","2023-05-10 17:30")),
  t_status = as.numeric(c(0,1,0,0,0,1,0,1,0,1,0,0,1,1,0,0,1,0)),
  stringsAsFactors = FALSE
)

希望这有所帮助。如果您需要进一步的翻译或解释,请随时提问。

英文:

One option could be (conditional) join. The dataset is first split by test_fail values, items in resulting list are named &quot;TRUE&quot; and &quot;FALSE&quot;. The part with failures is kept as-is, while for the other only proj_name, grp_res & fixed_dt(renamed test_dt) columns are kept. When joining both parts, only the first match from the right side is selected. And to handle cases where different failures have been matched to the same fixed_dt value, slice_min() with relevant grouping keeps only the first failure.

library(dplyr)
library(purrr)
library(lubridate)

split(test_data, ~ t_fail) %&gt;% 
  map_at(&quot;FALSE&quot;, \(df_) select(df_, proj_name, grp_res, fixed_dt = test_dt)) %&gt;% 
  { left_join(.$`TRUE`, .$`FALSE`, 
            by = join_by(proj_name, grp_res, test_dt &lt; fixed_dt), 
            multiple = &quot;first&quot;)
  } %&gt;% 
  slice_min(test_dt, by = c(proj_name, grp_res, fixed_dt)) %&gt;% 
  arrange(test_dt) 
#&gt;   proj_name grp_res t_fail             test_dt t_status            fixed_dt
#&gt; 1 project-1       A   TRUE 2023-05-01 09:10:00        1 2023-05-03 09:10:00
#&gt; 2 project-2       B   TRUE 2023-05-02 14:00:00        1 2023-05-03 14:10:00
#&gt; 3 project-1       D   TRUE 2023-05-03 17:00:00        1 2023-05-04 09:30:00
#&gt; 4 project-3       E   TRUE 2023-05-05 13:30:00        1 2023-05-07 10:20:00
#&gt; 5 project-2       B   TRUE 2023-05-10 08:30:00        1 2023-05-10 17:30:00

Input data:

test_data &lt;- data.frame(
  proj_name = c(&quot;project-1&quot;,&quot;project-1&quot;,&quot;project-1&quot;,&quot;project-2&quot;,&quot;project-1&quot;,&quot;project-2&quot;,&quot;project-1&quot;,&quot;project-2&quot;,
                   &quot;project-2&quot;,&quot;project-1&quot;,&quot;project-1&quot;,&quot;project-3&quot;,&quot;project-3&quot;,&quot;project-3&quot;,&quot;project-2&quot;,&quot;project-3&quot;,&quot;project-2&quot;,&quot;project-2&quot;),
  grp_res = c(&quot;A&quot;,&quot;A&quot;,&quot;D&quot;,&quot;B&quot;,&quot;D&quot;,&quot;B&quot;,&quot;A&quot;,&quot;B&quot;,&quot;B&quot;,&quot;D&quot;,&quot;D&quot;,&quot;E&quot;,&quot;E&quot;,&quot;E&quot;,&quot;B&quot;,&quot;E&quot;,&quot;B&quot;,&quot;B&quot;),
  t_fail = as.logical(c(FALSE,TRUE,FALSE,FALSE,FALSE,TRUE,FALSE,TRUE,FALSE,TRUE,FALSE,FALSE,TRUE,TRUE,FALSE,FALSE,TRUE,FALSE)),
  test_dt = ymd_hm(c(&quot;2023-05-01 08:45&quot;, &quot;2023-05-01 09:10&quot;,&quot;2023-05-01 11:00&quot;,&quot;2023-05-01 14:00&quot;, &quot;2023-05-02 09:00&quot;,
                     &quot;2023-05-02 14:00&quot;,&quot;2023-05-03 09:10&quot;,&quot;2023-05-03 10:10&quot;,&quot;2023-05-03 14:10&quot;,&quot;2023-05-03 17:00&quot;,
                     &quot;2023-05-04 09:30&quot;,&quot;2023-05-04 11:30&quot;,&quot;2023-05-05 13:30&quot;,&quot;2023-05-06 08:00&quot;,&quot;2023-05-06 16:00&quot;,&quot;2023-05-07 10:20&quot;,&quot;2023-05-10 08:30&quot;,&quot;2023-05-10 17:30&quot;)),
  t_status = as.numeric(c(0,1,0,0,0,1,0,1,0,1,0,0,1,1,0,0,1,0)),
  stringsAsFactors = FALSE
)

<sup>Created on 2023-06-12 with reprex v2.0.2</sup>

huangapple
  • 本文由 发表于 2023年6月12日 22:24:38
  • 转载请务必保留本文链接:https://go.coder-hub.com/76457606.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定