2023年6月12日 22:24:38go评论104阅读模式

英文:

Add Fail datetime to next Pass event, and reduce to single event

问题

I understand your request. Here's the translated content you provided:

抱歉，这可能是一个基本问题，但我还在学习R。
我有一个包含测试结果的数据框，它包括项目名称，资源标识符，单个日期时间字段和测试结果的布尔值，'test_fail'=TRUE 表示测试失败。
**示例数据：**
```R
test_data <- data.frame(
  project_name = c("project-1","project-1","project-1","project-2","project-1","project-2","project-1","project-2",
                   "project-2","project-1","project-1","project-3","project-3","project-3","project-2","project-3","project-2","project-2"),
  grp_res = c("A","A","D","B","D","B","A","B","B","D","D","E","E","E","B","E","B","B"),
  test_fail = as.logical(c(FALSE,TRUE,FALSE,FALSE,FALSE,TRUE,FALSE,TRUE,FALSE,TRUE,FALSE,FALSE,TRUE,TRUE,FALSE,FALSE,TRUE,FALSE)),
  test_dt = ymd_hm(c("2023-05-01 08:45", "2023-05-01 09:10","2023-05-01 11:00","2023-05-01 14:00", "2023-05-02 09:00",
                      "2023-05-02 14:00","2023-05-03 09:10","2023-05-03 10:10","2023-05-03 14:10","2023-05-03 17:00",
                     "2023-05-04 09:30","2023-05-04 11:30","2023-05-05 13:30","2023-05-06 08:00","2023-05-06 16:00","2023-05-07 10:20","2023-05-10 08:30","2023-05-10 17:30")),
  test_status = as.numeric(c(0,1,0,0,0,1,0,1,0,1,0,0,1,1,0,0,1,0)),
  stringsAsFactors = FALSE
)

我尝试了这里建议的两种解决方案：column value changes in R group find date difference

因为这看起来非常接近我尝试实现的内容，但对我来说并不起作用，可能是因为我有一个日期时间字段。

我想捕获失败测试的日期时间 'test_dt'，然后查找下一个在给定 'project_name' 和 'grp_res' 中转为通过的测试，将失败的日期时间值分配给已通过事件的新列。

最终，我希望数据看起来像下面的示例：

project_name	grp_res	test_fail	test_dt	test_status	failed_dt	fixed_dt
project-1	A	TRUE	2023-05-01 08:45:00	1	2023-05-01 08:45:00	2023-05-01 09:10:00
project-2	B	TRUE	2023-05-02 14:00:00	1	2023-05-02 14:00:00	2023-05-03 14:10:00
project-2	B	TRUE	2023-05-10 08:30:00	1	2023-05-10 08:30:00	2023-05-10 17:30:00
project-1	D	TRUE	2023-05-03 17:00:00	1	2023-05-03 17:00:00	2023-05-04 09:30:00
project-3	E	TRUE	2023-05-05 13:30:00	1	2023-05-05 13:30:00	2023-05-07 10:20:00

final_data <- data.frame(
  project_name = c("project-1","project-2","project-2","project-1","project-3"),
  grp_res = c("A","B","B","D","E"),
  test_fail = as.logical(c(TRUE,TRUE,TRUE,TRUE,TRUE)),
  test_dt = ymd_hm(c("2023-05-01 08:45","2023-05-02 14:00","2023-05-10 08:30","2023-05-03 17:00","2023-05-05 13:30")),
  failed_dt = ymd_hm(c("2023-05-01 08:45","2023-05-02 14:00","2023-05-10 08:30","2023-05-03 17:00","2023-05-05 13:30")),
  fixed_dt = ymd_hm(c("2023-05-01 09:10","2023-05-03 14:10","2023-05-10 17:30","2023-05-04 09:30","2023-05-07 10:20")),
  test_status = as.numeric(c(1,1,1,1,1)),
  stringsAsFactors = FALSE
)

我一直在思考这个问题，并在Stackoverflow上花了不少时间寻找其他可能的解决方案，但没有成功。

任何帮助和建议将不胜感激。


If you have any specific questions or need further assistance with this code, please feel free to ask.
<details>
<summary>英文:</summary>
Excuse what might be a basic question, but I’m still learning r.
I have a data frame that contains test results, it includes project name, a resource identifier, a single date-time field, and Boolean value for the result of the test, ‘test_fail’=TRUE equates to a failed test.
**Example Data:**

test_data <- data.frame(
project_name = c("project-1","project-1","project-1","project-2","project-1","project-2","project-1","project-2",
"project-2","project-1","project-1","project-3","project-3","project-3","project-2","project-3","project-2","project-2"),
grp_res = c("A","A","D","B","D","B","A","B","B","D","D","E","E","E","B","E","B","B"),
test_fail = as.logical(c(FALSE,TRUE,FALSE,FALSE,FALSE,TRUE,FALSE,TRUE,FALSE,TRUE,FALSE,FALSE,TRUE,TRUE,FALSE,FALSE,TRUE,FALSE)),
test_dt = ymd_hm(c("2023-05-01 08:45", "2023-05-01 09:10","2023-05-01 11:00","2023-05-01 14:00", "2023-05-02 09:00",
"2023-05-02 14:00","2023-05-03 09:10","2023-05-03 10:10","2023-05-03 14:10","2023-05-03 17:00",
"2023-05-04 09:30","2023-05-04 11:30","2023-05-05 13:30","2023-05-06 08:00","2023-05-06 16:00","2023-05-07 10:20","2023-05-10 08:30","2023-05-10 17:30")),
test_status = as.numeric(c(0,1,0,0,0,1,0,1,0,1,0,0,1,1,0,0,1,0)),
stringsAsFactors = FALSE
)


I tried both solutions suggested here: [column value changes in R group find date difference](https://stackoverflow.com/questions/68447878/column-value-changes-in-r-group-find-date-difference)
as this looked very close to what I&#39;m attempting to achieve, but it did not work for me, possibly due to me having a datetime field.
I want to capture the datetime ‘test_dt’ for a failed test and then look forward to the next test that changed to a pass for the given ‘project_name’ and ‘grp_res’, assign the failed datetime value to a new column for the event that passed.
Ultimately, I would like the data to look like the example below:
| project_name | grp_res | test_fail | test_dt | test_status | failed_dt | fixed_dt |
| --- | --- | --- | --- | --- | --- | --- |
| project-1 | A | TRUE | 2023-05-01 08:45:00 | 1 | 2023-05-01 08:45:00 | 2023-05-01 09:10:00 |
| project-2 | B | TRUE | 2023-05-02 14:00:00 | 1 | 2023-05-02 14:00:00 | 2023-05-03 14:10:00 |
| project-2 | B | TRUE | 2023-05-10 08:30:00 | 1 | 2023-05-10 08:30:00 | 2023-05-10 17:30:00 |
| project-1 | D | TRUE | 2023-05-03 17:00:00 | 1 | 2023-05-03 17:00:00 | 2023-05-04 09:30:00 |
| project-3 | E | TRUE | 2023-05-05 13:30:00 | 1 | 2023-05-05 13:30:00 | 2023-05-07 10:20:00 |

final_data <- data.frame(
project_name = c("project-1","project-2","project-2","project-1","project-3"),
grp_res = c("A","B","B","D","E"),
test_fail = as.logical(c(TRUE,TRUE,TRUE,TRUE,TRUE)),
test_dt = ymd_hm(c("2023-05-01 08:45","2023-05-02 14:00","2023-05-10 08:30","2023-05-03 17:00","2023-05-05 13:30")),
failed_dt = ymd_hm(c("2023-05-01 08:45","2023-05-02 14:00","2023-05-10 08:30","2023-05-03 17:00","2023-05-05 13:30")),
fixed_dt = ymd_hm(c("2023-05-01 09:10","2023-05-03 14:10","2023-05-10 17:30","2023-05-04 09:30","2023-05-07 10:20")),
test_status = as.numeric(c(1,1,1,1,1)),
stringsAsFactors = FALSE
)


I&#39;ve been scratching my head on this one, and spent more than a few hours looking for other possible solutions on Stackoverflow, with no luck.
And help and tips will be very much appreciated.
</details>
# 答案1
**得分**: 0
以下是您要翻译的代码部分的内容：
``` r
library(dplyr)
library(purrr)
library(lubridate)
split(test_data, ~ t_fail) %>% 
  map_at("FALSE", \(df_) select(df_, proj_name, grp_res, fixed_dt = test_dt)) %>% 
  { left_join(.$`TRUE`, .$`FALSE`, 
            by = join_by(proj_name, grp_res, test_dt < fixed_dt), 
            multiple = "first")
  } %>% 
  slice_min(test_dt, by = c(proj_name, grp_res, fixed_dt)) %>% 
  arrange(test_dt) 
#>   proj_name grp_res t_fail             test_dt t_status            fixed_dt
#> 1 project-1       A   TRUE 2023-05-01 09:10:00        1 2023-05-03 09:10:00
#> 2 project-2       B   TRUE 2023-05-02 14:00:00        1 2023-05-03 14:10:00
#> 3 project-1       D   TRUE 2023-05-03 17:00:00        1 2023-05-04 09:30:00
#> 4 project-3       E   TRUE 2023-05-05 13:30:00        1 2023-05-07 10:20:00
#> 5 project-2       B   TRUE 2023-05-10 08:30:00        1 2023-05-10 17:30:00

输入数据：

test_data <- data.frame(
  proj_name = c("project-1","project-1","project-1","project-2","project-1","project-2","project-1","project-2",
                   "project-2","project-1","project-1","project-3","project-3","project-3","project-2","project-3","project-2","project-2"),
  grp_res = c("A","A","D","B","D","B","A","B","B","D","D","E","E","E","B","E","B","B"),
  t_fail = as.logical(c(FALSE,TRUE,FALSE,FALSE,FALSE,TRUE,FALSE,TRUE,FALSE,TRUE,FALSE,FALSE,TRUE,TRUE,FALSE,FALSE,TRUE,FALSE)),
  test_dt = ymd_hm(c("2023-05-01 08:45", "2023-05-01 09:10","2023-05-01 11:00","2023-05-01 14:00", "2023-05-02 09:00",
                     "2023-05-02 14:00","2023-05-03 09:10","2023-05-03 10:10","2023-05-03 14:10","2023-05-03 17:00",
                     "2023-05-04 09:30","2023-05-04 11:30","2023-05-05 13:30","2023-05-06 08:00","2023-05-06 16:00","2023-05-07 10:20","2023-05-10 08:30","2023-05-10 17:30")),
  t_status = as.numeric(c(0,1,0,0,0,1,0,1,0,1,0,0,1,1,0,0,1,0)),
  stringsAsFactors = FALSE
)

希望这有所帮助。如果您需要进一步的翻译或解释，请随时提问。

英文:

One option could be (conditional) join. The dataset is first split by test_fail values, items in resulting list are named "TRUE" and "FALSE". The part with failures is kept as-is, while for the other only proj_name, grp_res & fixed_dt(renamed test_dt) columns are kept. When joining both parts, only the first match from the right side is selected. And to handle cases where different failures have been matched to the same fixed_dt value, slice_min() with relevant grouping keeps only the first failure.

library(dplyr)
library(purrr)
library(lubridate)
split(test_data, ~ t_fail) %&gt;% 
  map_at(&quot;FALSE&quot;, \(df_) select(df_, proj_name, grp_res, fixed_dt = test_dt)) %&gt;% 
  { left_join(.$`TRUE`, .$`FALSE`, 
            by = join_by(proj_name, grp_res, test_dt &lt; fixed_dt), 
            multiple = &quot;first&quot;)
  } %&gt;% 
  slice_min(test_dt, by = c(proj_name, grp_res, fixed_dt)) %&gt;% 
  arrange(test_dt) 
#&gt;   proj_name grp_res t_fail             test_dt t_status            fixed_dt
#&gt; 1 project-1       A   TRUE 2023-05-01 09:10:00        1 2023-05-03 09:10:00
#&gt; 2 project-2       B   TRUE 2023-05-02 14:00:00        1 2023-05-03 14:10:00
#&gt; 3 project-1       D   TRUE 2023-05-03 17:00:00        1 2023-05-04 09:30:00
#&gt; 4 project-3       E   TRUE 2023-05-05 13:30:00        1 2023-05-07 10:20:00
#&gt; 5 project-2       B   TRUE 2023-05-10 08:30:00        1 2023-05-10 17:30:00

Input data:

test_data &lt;- data.frame(
  proj_name = c(&quot;project-1&quot;,&quot;project-1&quot;,&quot;project-1&quot;,&quot;project-2&quot;,&quot;project-1&quot;,&quot;project-2&quot;,&quot;project-1&quot;,&quot;project-2&quot;,
                   &quot;project-2&quot;,&quot;project-1&quot;,&quot;project-1&quot;,&quot;project-3&quot;,&quot;project-3&quot;,&quot;project-3&quot;,&quot;project-2&quot;,&quot;project-3&quot;,&quot;project-2&quot;,&quot;project-2&quot;),
  grp_res = c(&quot;A&quot;,&quot;A&quot;,&quot;D&quot;,&quot;B&quot;,&quot;D&quot;,&quot;B&quot;,&quot;A&quot;,&quot;B&quot;,&quot;B&quot;,&quot;D&quot;,&quot;D&quot;,&quot;E&quot;,&quot;E&quot;,&quot;E&quot;,&quot;B&quot;,&quot;E&quot;,&quot;B&quot;,&quot;B&quot;),
  t_fail = as.logical(c(FALSE,TRUE,FALSE,FALSE,FALSE,TRUE,FALSE,TRUE,FALSE,TRUE,FALSE,FALSE,TRUE,TRUE,FALSE,FALSE,TRUE,FALSE)),
  test_dt = ymd_hm(c(&quot;2023-05-01 08:45&quot;, &quot;2023-05-01 09:10&quot;,&quot;2023-05-01 11:00&quot;,&quot;2023-05-01 14:00&quot;, &quot;2023-05-02 09:00&quot;,
                     &quot;2023-05-02 14:00&quot;,&quot;2023-05-03 09:10&quot;,&quot;2023-05-03 10:10&quot;,&quot;2023-05-03 14:10&quot;,&quot;2023-05-03 17:00&quot;,
                     &quot;2023-05-04 09:30&quot;,&quot;2023-05-04 11:30&quot;,&quot;2023-05-05 13:30&quot;,&quot;2023-05-06 08:00&quot;,&quot;2023-05-06 16:00&quot;,&quot;2023-05-07 10:20&quot;,&quot;2023-05-10 08:30&quot;,&quot;2023-05-10 17:30&quot;)),
  t_status = as.numeric(c(0,1,0,0,0,1,0,1,0,1,0,0,1,1,0,0,1,0)),
  stringsAsFactors = FALSE
)

<sup>Created on 2023-06-12 with reprex v2.0.2</sup>

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

Add Fail datetime to next Pass event, and reduce to single event

问题

对于一个RMarkdown报告，泛化变量名称

In R, 有关 dplyr::bind_rows 合并数据框的问题。

将雷达数据重新投影到不同的坐标系。

使用dplyr检查一列的数值序列。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。