Question

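I have a student dataset in which each response to a question is scored as right or wrong. There is also a time variable, in seconds. I would like to create time flags that record the number of correct and incorrect responses at the `1-minute`, `2-minute`, and `3-minute` thresholds. Here is a sample dataset.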

    df <- data.frame(id = c(1,2,3,4,5),
                     gender = c("m","f","m","f","m"),
                     age = c(11,12,12,13,14),
                     i1 = c(1,0,NA,1,0),
                     i2 = c(0,1,0,"1]",1),
                     i3 = c("1]",1,"1]",0,"0]"),
                     i4 = c(0,"0]",1,1,0),
                     i5 = c(1,1,NA,"0]","1]"),
                     i6 = c(0,0,"0]",1,1),
                     i7 = c(1,"1]",1,0,0),
                     i8 = c(0,0,0,"1]","1]"),
                     i9 = c(1,1,1,0,NA),
                     time = c(115,138,148,195, 225))
    
    > df
      id gender age i1 i2 i3 i4   i5 i6 i7 i8 i9 time
    1  1      m  11  1  0 1]  0    1  0  1  0  1  115
    2  2      f  12  0  1  1 0]    1  0 1]  0  1  138
    3  3      m  12 NA  0 1]  1 <NA> 0]  1  0  1  148
    4  4      f  13  1 1]  0  1   0]  1  0 1]  0  195
    5  5      m  14  0  1 0]  0   1]  1  0 1] NA  225

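The minute thresholds are marked by a `]` to the right of the score.

For example, for `id = 3`, the `1-minute` threshold falls at item `i3` and the `2-minute` threshold at item `i6`. Each student may have different thresholds.

I need to create flag variables that count the correct and incorrect responses up to each of the `1-minute`, `2-minute`, and `3-minute` thresholds.

How can I produce the desired dataset shown below?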

    > df1
      id gender age i1 i2 i3 i4   i5 i6 i7 i8 i9 time one_true one_false two_true two_false three_true three_false
    1  1      m  11  1  0 1]  0    1  0  1  0  1  115        2         1       NA        NA         NA          NA
    2  2      f  12  0  1  1 0]    1  0 1]  0  1  138        2         2        4         3         NA          NA
    3  3      m  12 NA  0 1]  1 <NA> 0]  1  0  1  148        1         1        2         2         NA          NA
    4  4      f  13  1 1]  0  1   0]  1  0 1]  0  195        2         0        3         2          5           3
    5  5      m  14  0  1 0]  0   1]  1  0 1] NA  225        1         2        2         3          4           4

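Answer 1

Score: 1

Here's a dplyr pipeline that produces what you want.

I'm optionally using `xfun::n2w` to convert 1 to "one", and so on. If plain numbers are acceptable, you don't need it.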

    library(dplyr)
    library(tidyr) # pivot_*
    # library(xfun) # n2w, convert numbers to words
    df %>%
      select(-gender, -age, -time) %>%
      mutate(across(-id, as.character)) %>%
      pivot_longer(-id) %>%
      arrange(id, name) %>%
      mutate(
        grp = 1L + cumsum(grepl("]", lag(value), fixed = TRUE)),
        num = if_else(gsub("[^0-9]", "", value) == "1", "true", "false"),
        .by = id) %>%
      filter(
        any(grepl("]", value, fixed = TRUE)),
        .by = c(id, grp)) %>%
      count(id, grp, num) %>%
      filter(!is.na(num)) %>%
      mutate(n = cumsum(n), .by = c(id, num)) %>%
      # convert to words only after counting: as numbers the groups sort 1, 2, 3,
      # whereas "three" would sort before "two" and break the cumulative sums
      mutate(grp = xfun::n2w(grp)) %>%
      pivot_wider(
        id_cols = id, names_sep = "_",
        names_from = c("grp", "num"), values_from = "n"
      ) %>%
      left_join(df, ., by = "id")
    #   id gender age i1 i2 i3 i4   i5 i6 i7 i8 i9 time one_false one_true two_false two_true three_false three_true
    # 1  1      m  11  1  0 1]  0    1  0  1  0  1  115         1        2        NA       NA          NA         NA
    # 2  2      f  12  0  1  1 0]    1  0 1]  0  1  138         2        2         3        4          NA         NA
    # 3  3      m  12 NA  0 1]  1 <NA> 0]  1  0  1  148         1        1         2        2          NA         NA
    # 4  4      f  13  1 1]  0  1   0]  1  0 1]  0  195        NA        2         2        3           3          5
    # 5  5      m  14  0  1 0]  0   1]  1  0 1] NA  225         2        1         3        2           4          4

This uses `.by=`, so it requires dplyr 1.1.0 or newer.
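On an older dplyr, the same per-`id` grouping can be expressed with `group_by()` followed by `ungroup()`. The following is a minimal sketch on a made-up two-student frame (the data and the names `long`, `new_style` and `old_style` are illustrative, not from the question); it only shows that the two spellings give the same group numbers.

```r
library(dplyr)

# Illustrative long-format data: two students, three responses each
long <- data.frame(
  id    = c(1, 1, 1, 2, 2, 2),
  value = c("1", "0]", "1", "0", "1", "1]")
)

# dplyr >= 1.1.0: grouping applies to this one mutate() only
new_style <- long %>%
  mutate(grp = 1L + cumsum(grepl("]", lag(value), fixed = TRUE)), .by = id)

# Older dplyr: persistent grouping that has to be dropped afterwards
old_style <- long %>%
  group_by(id) %>%
  mutate(grp = 1L + cumsum(grepl("]", lag(value), fixed = TRUE))) %>%
  ungroup()

identical(new_style$grp, old_style$grp)
```

The practical difference is that `group_by()` leaves the frame grouped until `ungroup()` is called, while `.by=` never changes the grouping of the result.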

Most of the work is done on a temporary frame that holds `id` and the summary columns (`one_false` and so on).

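- `mutate(across(-id, as.character))` is needed because `pivot_longer` requires compatible column classes, and your sample data mixes numeric and character columns.
- `pivot_longer` reshapes the data from "wide" into three columns: `id`, `name` ("i1", "i2", ...) and `value` ("1", "0", "1]", ...).
- `mutate(grp = ...)` numbers the groups so that everything up to and including the first `]` is group 1, everything after it up to the second `]` is group 2, and so on. Converting 2 to "two" with `xfun::n2w` is purely aesthetic; you can skip it if you can accept column names that start with a digit, such as `1_true` (assuming you still convert `num`).
- `mutate(num = ...)` converts your 1s and 0s to "true" and "false". This is mostly aesthetic, to match your intended output; if you prefer plain 0 and 1, you still need to strip the `]` so that the values are counted correctly.
- `filter(any(...))` drops groups that contain no `]`, i.e. the trailing responses after a student's last threshold.
- `count` tallies the rows per `id`, `grp` and `num`. Combinations that never occur produce no row, so they end up as `NA` in the result (for example `one_false` for `id = 4`, where the desired output shows 0).
- `mutate(n = cumsum(n))`, grouped by `id` and `num` rather than by `grp`, makes the counts cumulative across thresholds, so `two_true` includes `one_true`. This relies on the rows being sorted by threshold number.
- `pivot_wider` turns the rows back into columns, undoing the earlier reshape.
- `left_join` attaches the summary to the original `df`.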
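The grouping and cumulative-count logic can also be checked by hand on a single student. The following is a minimal base-R sketch, not part of the pipeline above; `count_by_threshold` is a hypothetical helper name introduced here for illustration, and it assumes responses are coded as "0"/"1" with an optional trailing `]`.

```r
# Hypothetical helper (not from the original answer): cumulative counts of
# correct and incorrect responses at each "]" marker in one student's responses.
count_by_threshold <- function(x) {
  x <- as.character(x)
  marks <- which(grepl("]", x, fixed = TRUE))  # positions of the thresholds
  score <- gsub("[^0-9]", "", x)               # "1]" -> "1"; NA stays NA
  data.frame(
    threshold = seq_along(marks),
    # count everything from the first item up to and including each marker
    true  = vapply(marks, function(m) sum(score[seq_len(m)] == "1", na.rm = TRUE), integer(1)),
    false = vapply(marks, function(m) sum(score[seq_len(m)] == "0", na.rm = TRUE), integer(1))
  )
}

# Responses i1..i9 for id = 4 in the sample data
res <- count_by_threshold(c("1", "1]", "0", "1", "0]", "1", "0", "1]", "0"))
res
```

For `id = 4` this gives 2, 3, 5 correct and 0, 2, 3 incorrect at the three thresholds, matching the desired output. The explicit 0 appears because the helper sums over item positions instead of counting rows, so an empty category is counted as zero rather than dropped.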
