英文:
Add Fail datetime to next Pass event, and reduce to single event
问题
I understand your request. Here's the translated content you provided:
抱歉,这可能是一个基本问题,但我还在学习R。
我有一个包含测试结果的数据框,它包括项目名称,资源标识符,单个日期时间字段和测试结果的布尔值,'test_fail'=TRUE 表示测试失败。
**示例数据:**
```R
test_data <- data.frame(
project_name = c("project-1","project-1","project-1","project-2","project-1","project-2","project-1","project-2",
"project-2","project-1","project-1","project-3","project-3","project-3","project-2","project-3","project-2","project-2"),
grp_res = c("A","A","D","B","D","B","A","B","B","D","D","E","E","E","B","E","B","B"),
test_fail = as.logical(c(FALSE,TRUE,FALSE,FALSE,FALSE,TRUE,FALSE,TRUE,FALSE,TRUE,FALSE,FALSE,TRUE,TRUE,FALSE,FALSE,TRUE,FALSE)),
test_dt = ymd_hm(c("2023-05-01 08:45", "2023-05-01 09:10","2023-05-01 11:00","2023-05-01 14:00", "2023-05-02 09:00",
"2023-05-02 14:00","2023-05-03 09:10","2023-05-03 10:10","2023-05-03 14:10","2023-05-03 17:00",
"2023-05-04 09:30","2023-05-04 11:30","2023-05-05 13:30","2023-05-06 08:00","2023-05-06 16:00","2023-05-07 10:20","2023-05-10 08:30","2023-05-10 17:30")),
test_status = as.numeric(c(0,1,0,0,0,1,0,1,0,1,0,0,1,1,0,0,1,0)),
stringsAsFactors = FALSE
)
我尝试了这里建议的两种解决方案:column value changes in R group find date difference
因为这看起来非常接近我尝试实现的内容,但对我来说并不起作用,可能是因为我有一个日期时间字段。
我想捕获失败测试的日期时间 'test_dt',然后查找下一个在给定 'project_name' 和 'grp_res' 中转为通过的测试,将失败的日期时间值分配给已通过事件的新列。
最终,我希望数据看起来像下面的示例:
project_name | grp_res | test_fail | test_dt | test_status | failed_dt | fixed_dt |
---|---|---|---|---|---|---|
project-1 | A | TRUE | 2023-05-01 08:45:00 | 1 | 2023-05-01 08:45:00 | 2023-05-01 09:10:00 |
project-2 | B | TRUE | 2023-05-02 14:00:00 | 1 | 2023-05-02 14:00:00 | 2023-05-03 14:10:00 |
project-2 | B | TRUE | 2023-05-10 08:30:00 | 1 | 2023-05-10 08:30:00 | 2023-05-10 17:30:00 |
project-1 | D | TRUE | 2023-05-03 17:00:00 | 1 | 2023-05-03 17:00:00 | 2023-05-04 09:30:00 |
project-3 | E | TRUE | 2023-05-05 13:30:00 | 1 | 2023-05-05 13:30:00 | 2023-05-07 10:20:00 |
final_data <- data.frame(
project_name = c("project-1","project-2","project-2","project-1","project-3"),
grp_res = c("A","B","B","D","E"),
test_fail = as.logical(c(TRUE,TRUE,TRUE,TRUE,TRUE)),
test_dt = ymd_hm(c("2023-05-01 08:45","2023-05-02 14:00","2023-05-10 08:30","2023-05-03 17:00","2023-05-05 13:30")),
failed_dt = ymd_hm(c("2023-05-01 08:45","2023-05-02 14:00","2023-05-10 08:30","2023-05-03 17:00","2023-05-05 13:30")),
fixed_dt = ymd_hm(c("2023-05-01 09:10","2023-05-03 14:10","2023-05-10 17:30","2023-05-04 09:30","2023-05-07 10:20")),
test_status = as.numeric(c(1,1,1,1,1)),
stringsAsFactors = FALSE
)
我一直在思考这个问题,并在Stackoverflow上花了不少时间寻找其他可能的解决方案,但没有成功。
任何帮助和建议将不胜感激。
If you have any specific questions or need further assistance with this code, please feel free to ask.
<details>
<summary>英文:</summary>
Excuse what might be a basic question, but I’m still learning r.
I have a data frame that contains test results, it includes project name, a resource identifier, a single date-time field, and Boolean value for the result of the test, ‘test_fail’=TRUE equates to a failed test.
**Example Data:**
test_data <- data.frame(
project_name = c("project-1","project-1","project-1","project-2","project-1","project-2","project-1","project-2",
"project-2","project-1","project-1","project-3","project-3","project-3","project-2","project-3","project-2","project-2"),
grp_res = c("A","A","D","B","D","B","A","B","B","D","D","E","E","E","B","E","B","B"),
test_fail = as.logical(c(FALSE,TRUE,FALSE,FALSE,FALSE,TRUE,FALSE,TRUE,FALSE,TRUE,FALSE,FALSE,TRUE,TRUE,FALSE,FALSE,TRUE,FALSE)),
test_dt = ymd_hm(c("2023-05-01 08:45", "2023-05-01 09:10","2023-05-01 11:00","2023-05-01 14:00", "2023-05-02 09:00",
"2023-05-02 14:00","2023-05-03 09:10","2023-05-03 10:10","2023-05-03 14:10","2023-05-03 17:00",
"2023-05-04 09:30","2023-05-04 11:30","2023-05-05 13:30","2023-05-06 08:00","2023-05-06 16:00","2023-05-07 10:20","2023-05-10 08:30","2023-05-10 17:30")),
test_status = as.numeric(c(0,1,0,0,0,1,0,1,0,1,0,0,1,1,0,0,1,0)),
stringsAsFactors = FALSE
)
I tried both solutions suggested here: [column value changes in R group find date difference](https://stackoverflow.com/questions/68447878/column-value-changes-in-r-group-find-date-difference)
as this looked very close to what I'm attempting to achieve, but it did not work for me, possibly due to me having a datetime field.
I want to capture the datetime ‘test_dt’ for a failed test and then look forward to the next test that changed to a pass for the given ‘project_name’ and ‘grp_res’, assign the failed datetime value to a new column for the event that passed.
Ultimately, I would like the data to look like the example below:
| project_name | grp_res | test_fail | test_dt | test_status | failed_dt | fixed_dt |
| --- | --- | --- | --- | --- | --- | --- |
| project-1 | A | TRUE | 2023-05-01 08:45:00 | 1 | 2023-05-01 08:45:00 | 2023-05-01 09:10:00 |
| project-2 | B | TRUE | 2023-05-02 14:00:00 | 1 | 2023-05-02 14:00:00 | 2023-05-03 14:10:00 |
| project-2 | B | TRUE | 2023-05-10 08:30:00 | 1 | 2023-05-10 08:30:00 | 2023-05-10 17:30:00 |
| project-1 | D | TRUE | 2023-05-03 17:00:00 | 1 | 2023-05-03 17:00:00 | 2023-05-04 09:30:00 |
| project-3 | E | TRUE | 2023-05-05 13:30:00 | 1 | 2023-05-05 13:30:00 | 2023-05-07 10:20:00 |
final_data <- data.frame(
project_name = c("project-1","project-2","project-2","project-1","project-3"),
grp_res = c("A","B","B","D","E"),
test_fail = as.logical(c(TRUE,TRUE,TRUE,TRUE,TRUE)),
test_dt = ymd_hm(c("2023-05-01 08:45","2023-05-02 14:00","2023-05-10 08:30","2023-05-03 17:00","2023-05-05 13:30")),
failed_dt = ymd_hm(c("2023-05-01 08:45","2023-05-02 14:00","2023-05-10 08:30","2023-05-03 17:00","2023-05-05 13:30")),
fixed_dt = ymd_hm(c("2023-05-01 09:10","2023-05-03 14:10","2023-05-10 17:30","2023-05-04 09:30","2023-05-07 10:20")),
test_status = as.numeric(c(1,1,1,1,1)),
stringsAsFactors = FALSE
)
I've been scratching my head on this one, and spent more than a few hours looking for other possible solutions on Stackoverflow, with no luck.
And help and tips will be very much appreciated.
</details>
# 答案1
**得分**: 0
以下是您要翻译的代码部分的内容:
``` r
library(dplyr)
library(purrr)
library(lubridate)
split(test_data, ~ t_fail) %>%
map_at("FALSE", \(df_) select(df_, proj_name, grp_res, fixed_dt = test_dt)) %>%
{ left_join(.$`TRUE`, .$`FALSE`,
by = join_by(proj_name, grp_res, test_dt < fixed_dt),
multiple = "first")
} %>%
slice_min(test_dt, by = c(proj_name, grp_res, fixed_dt)) %>%
arrange(test_dt)
#> proj_name grp_res t_fail test_dt t_status fixed_dt
#> 1 project-1 A TRUE 2023-05-01 09:10:00 1 2023-05-03 09:10:00
#> 2 project-2 B TRUE 2023-05-02 14:00:00 1 2023-05-03 14:10:00
#> 3 project-1 D TRUE 2023-05-03 17:00:00 1 2023-05-04 09:30:00
#> 4 project-3 E TRUE 2023-05-05 13:30:00 1 2023-05-07 10:20:00
#> 5 project-2 B TRUE 2023-05-10 08:30:00 1 2023-05-10 17:30:00
输入数据:
test_data <- data.frame(
proj_name = c("project-1","project-1","project-1","project-2","project-1","project-2","project-1","project-2",
"project-2","project-1","project-1","project-3","project-3","project-3","project-2","project-3","project-2","project-2"),
grp_res = c("A","A","D","B","D","B","A","B","B","D","D","E","E","E","B","E","B","B"),
t_fail = as.logical(c(FALSE,TRUE,FALSE,FALSE,FALSE,TRUE,FALSE,TRUE,FALSE,TRUE,FALSE,FALSE,TRUE,TRUE,FALSE,FALSE,TRUE,FALSE)),
test_dt = ymd_hm(c("2023-05-01 08:45", "2023-05-01 09:10","2023-05-01 11:00","2023-05-01 14:00", "2023-05-02 09:00",
"2023-05-02 14:00","2023-05-03 09:10","2023-05-03 10:10","2023-05-03 14:10","2023-05-03 17:00",
"2023-05-04 09:30","2023-05-04 11:30","2023-05-05 13:30","2023-05-06 08:00","2023-05-06 16:00","2023-05-07 10:20","2023-05-10 08:30","2023-05-10 17:30")),
t_status = as.numeric(c(0,1,0,0,0,1,0,1,0,1,0,0,1,1,0,0,1,0)),
stringsAsFactors = FALSE
)
希望这有所帮助。如果您需要进一步的翻译或解释,请随时提问。
英文:
One option could be (conditional) join. The dataset is first split by test_fail
values, items in resulting list are named "TRUE"
and "FALSE"
. The part with failures is kept as-is, while for the other only proj_name
, grp_res
& fixed_dt
(renamed test_dt
) columns are kept. When joining both parts, only the first match from the right side is selected. And to handle cases where different failures have been matched to the same fixed_dt
value, slice_min()
with relevant grouping keeps only the first failure.
library(dplyr)
library(purrr)
library(lubridate)
split(test_data, ~ t_fail) %>%
map_at("FALSE", \(df_) select(df_, proj_name, grp_res, fixed_dt = test_dt)) %>%
{ left_join(.$`TRUE`, .$`FALSE`,
by = join_by(proj_name, grp_res, test_dt < fixed_dt),
multiple = "first")
} %>%
slice_min(test_dt, by = c(proj_name, grp_res, fixed_dt)) %>%
arrange(test_dt)
#> proj_name grp_res t_fail test_dt t_status fixed_dt
#> 1 project-1 A TRUE 2023-05-01 09:10:00 1 2023-05-03 09:10:00
#> 2 project-2 B TRUE 2023-05-02 14:00:00 1 2023-05-03 14:10:00
#> 3 project-1 D TRUE 2023-05-03 17:00:00 1 2023-05-04 09:30:00
#> 4 project-3 E TRUE 2023-05-05 13:30:00 1 2023-05-07 10:20:00
#> 5 project-2 B TRUE 2023-05-10 08:30:00 1 2023-05-10 17:30:00
Input data:
test_data <- data.frame(
proj_name = c("project-1","project-1","project-1","project-2","project-1","project-2","project-1","project-2",
"project-2","project-1","project-1","project-3","project-3","project-3","project-2","project-3","project-2","project-2"),
grp_res = c("A","A","D","B","D","B","A","B","B","D","D","E","E","E","B","E","B","B"),
t_fail = as.logical(c(FALSE,TRUE,FALSE,FALSE,FALSE,TRUE,FALSE,TRUE,FALSE,TRUE,FALSE,FALSE,TRUE,TRUE,FALSE,FALSE,TRUE,FALSE)),
test_dt = ymd_hm(c("2023-05-01 08:45", "2023-05-01 09:10","2023-05-01 11:00","2023-05-01 14:00", "2023-05-02 09:00",
"2023-05-02 14:00","2023-05-03 09:10","2023-05-03 10:10","2023-05-03 14:10","2023-05-03 17:00",
"2023-05-04 09:30","2023-05-04 11:30","2023-05-05 13:30","2023-05-06 08:00","2023-05-06 16:00","2023-05-07 10:20","2023-05-10 08:30","2023-05-10 17:30")),
t_status = as.numeric(c(0,1,0,0,0,1,0,1,0,1,0,0,1,1,0,0,1,0)),
stringsAsFactors = FALSE
)
<sup>Created on 2023-06-12 with reprex v2.0.2</sup>
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论