August 9, 2023, 04:36:38

How can I use mutate to create new variables in nested for loops?

Question

I have a dataset with unique Participant_IDs, each rated by two different Rater_IDs on many different variables (Q1, Q2, and Q3 here). I am trying to compute a variable that flags whether the two raters' ratings differ by more than 1 point. Here's a simplified version of the data I'm working with:

library(tidyverse)
Participant_ID <- rep(1:3, 2)
Rater_ID <- c(rep("A", 3), rep("B", 3))
Q1 <- c(5, 2, 1, 3, 3, 4)
Q2 <- c(4, 2, 2, 3, 5, 2)
Q3 <- c(4, 3, 3, 3, 4, 5)
df <- tibble(Participant_ID, Rater_ID, Q1, Q2, Q3)

I am able to do this by spelling out each check by hand, using the code below:

df <- df %>% group_by(Participant_ID) %>%
  mutate(Check_Q1 = ifelse((abs(Q1[1]-Q1[2]) > 1), 1, 0),
         Check_Q2 = ifelse((abs(Q2[1]-Q2[2]) > 1), 1, 0),
         Check_Q3 = ifelse((abs(Q3[1]-Q3[2]) > 1), 1, 0)) %>% ungroup()

Q1 is flagged (assigned a 1) for participant 1, Q2 is flagged for participant 2, and both Q1 and Q3 are flagged for participant 3, as the ratings have a difference > 1.

However, in my real data, there are not only 3 "Q" variables, there are many. Plus, I want this code to be used in a variety of situations where the number of Q variables will change. The user will specify the number_of_questions before running the code. I have been trying to figure out how to do this with a for loop but I cannot figure it out. This is as far as I've gotten:

number_of_questions <- 3
questions <- grep("Q", names(df), value=TRUE)
df <- df %>% group_by(Participant_ID)
for(q in questions){
  for(x in 1:number_of_questions){
    check_varname <- paste0("Check_Q",x)

    df <- df %>%
      mutate(!!check_varname := ifelse((abs(get(q)[1]-get(q)[2]) > 1), 1, 0))
  }
}
df <- df %>% ungroup()

I don't get any errors, but the output is not correct. It is assigning a 1 to Q1, Q2, and Q3 for Participant_ID 3. Can anyone help me understand what I'm doing wrong?
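A likely reason for the wrong output, assuming the loop is run on the original data: the inner loop over x writes the result for the current q into every Check_Qx column, so each pass of the outer loop overwrites all of them. After the last pass, all three check columns hold the result for Q3, and participant 3 is the only one flagged on Q3. A minimal sketch of a single-loop version follows. The anchored pattern "^Q[0-9]+$", the use of .data[[q]] in place of get(q), and the name df_checked are choices made for this illustration, not part of the original post.

```r
library(dplyr)
library(tibble)

df <- tibble(
  Participant_ID = rep(1:3, 2),
  Rater_ID = c(rep("A", 3), rep("B", 3)),
  Q1 = c(5, 2, 1, 3, 3, 4),
  Q2 = c(4, 2, 2, 3, 5, 2),
  Q3 = c(4, 3, 3, 3, 4, 5)
)

# Anchor the pattern so that columns such as Check_Q1 are not matched
# if the script is run a second time on the same data frame.
questions <- grep("^Q[0-9]+$", names(df), value = TRUE)

df_checked <- df %>% group_by(Participant_ID)
for (q in questions) {
  # One loop only: the check column name is derived from the question name.
  check_varname <- paste0("Check_", q)
  df_checked <- df_checked %>%
    mutate(!!check_varname := ifelse(abs(.data[[q]][1] - .data[[q]][2]) > 1, 1, 0))
}
df_checked <- df_checked %>% ungroup()
```

Because the question names are read from the data, number_of_questions is no longer needed in this version.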

Answer 1

Score: 4

You can do this using across() with its .names argument.

df %>%
  mutate(across(starts_with("Q"), ~ +(abs(.[1] - .[2]) > 1), # thanks @r2evans for improved code
                .names = "Check_{.col}"), .by = Participant_ID)

Output:

  Participant_ID Rater_ID    Q1    Q2    Q3 Check_Q1 Check_Q2 Check_Q3
           <int> <chr>    <dbl> <dbl> <dbl>    <int>    <int>    <int>
1              1 A            5     4     4        1        0        0
2              2 A            2     2     3        0        1        0
3              3 A            1     2     3        1        0        1
4              1 B            3     3     3        1        0        0
5              2 B            3     5     4        0        1        0
6              3 B            4     2     5        1        0        1

across() also lets you specify a variable number of questions, for instance:

# consecutive questions
number_of_questions <- 2
qcols <- paste0("Q", seq_len(number_of_questions))
# all_of() selects the columns named in an external character vector
df %>%
  mutate(across(all_of(qcols), ~ +(abs(.x[1] - .x[2]) > 1),
                .names = "Check_{.col}"), .by = Participant_ID)
#  Participant_ID Rater_ID    Q1    Q2    Q3 Check_Q1 Check_Q2
#            <int> <chr>    <dbl> <dbl> <dbl>    <int>    <int>
# 1              1 A            5     4     4        1        0
# 2              2 A            2     2     3        0        1
# 3              3 A            1     2     3        1        0
# 4              1 B            3     3     3        1        0
# 5              2 B            3     5     4        0        1
# 6              3 B            4     2     5        1        0

# Alternative for non-consecutive questions,
# specify columns this way:
number_of_questions <- c(2, 6, 8)
qcols <- paste0("Q", number_of_questions)

(Note I assumed all questions started with "Q")
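When the requested question numbers may not all exist in the data (the sample data has no Q6 or Q8), one option is any_of(), which selects only the requested columns that are present. The sketch below assumes dplyr 1.1.0 or later for the .by argument. The names wanted and res are made up for this illustration.

```r
library(dplyr)
library(tibble)

df <- tibble(
  Participant_ID = rep(1:3, 2),
  Rater_ID = c(rep("A", 3), rep("B", 3)),
  Q1 = c(5, 2, 1, 3, 3, 4),
  Q2 = c(4, 2, 2, 3, 5, 2),
  Q3 = c(4, 3, 3, 3, 4, 5)
)

# Q6 and Q8 are not in this sample data; any_of() skips them silently.
wanted <- paste0("Q", c(2, 6, 8))

res <- df %>%
  mutate(
    across(any_of(wanted), ~ +(abs(.x[1] - .x[2]) > 1), .names = "Check_{.col}"),
    .by = Participant_ID
  )

# Only Check_Q2 is added, because Q2 is the only requested column present.
names(res)
```

If a missing column should be treated as a mistake, all_of() is the stricter choice, since it raises an error when a requested column does not exist.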

Answer 2

Score: 2

It may be more efficient and/or readable to reshape the data so that each question is a row and each rater is a column:

df_long <- df %>%
  pivot_longer(-c(Participant_ID, Rater_ID)) %>%
  pivot_wider(names_from = Rater_ID, values_from = value)

  Participant_ID name      A     B
           <int> <chr> <dbl> <dbl>
1              1 Q1        5     3
2              1 Q2        4     3
3              1 Q3        4     3
4              2 Q1        2     3
5              2 Q2        2     5
6              2 Q3        3     4
7              3 Q1        1     4
8              3 Q2        2     2
9              3 Q3        3     5

From there, it's easy to create a check column:

df_long %>%
  mutate(check = abs(A - B) > 1)

  Participant_ID name      A     B check
           <int> <chr> <dbl> <dbl> <lgl>
1              1 Q1        5     3 TRUE
2              1 Q2        4     3 FALSE
3              1 Q3        4     3 FALSE
4              2 Q1        2     3 FALSE
5              2 Q2        2     5 TRUE
6              2 Q3        3     4 FALSE
7              3 Q1        1     4 TRUE
8              3 Q2        2     2 FALSE
9              3 Q3        3     5 TRUE

And this could be pivoted into a wider format:

df_long %>%
  mutate(check = abs(A - B) > 1) %>%
  select(-c(A, B)) %>%
  pivot_wider(names_from = name, values_from = check, names_prefix = 'check_')

  Participant_ID check_Q1 check_Q2 check_Q3
           <int> <lgl>    <lgl>    <lgl>
1              1 TRUE     FALSE    FALSE
2              2 FALSE    TRUE     FALSE
3              3 TRUE     FALSE    TRUE
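The reshaping above relies on the rater columns being named A and B. If the rater IDs differ between participants, a variant that never spreads the raters into columns may be easier to reuse. The sketch below assumes exactly two ratings per participant and question, and dplyr 1.1.0 or later for the .by argument. The names question, rating, and agreement are made up for this illustration.

```r
library(dplyr)
library(tidyr)
library(tibble)

df <- tibble(
  Participant_ID = rep(1:3, 2),
  Rater_ID = c(rep("A", 3), rep("B", 3)),
  Q1 = c(5, 2, 1, 3, 3, 4),
  Q2 = c(4, 2, 2, 3, 5, 2),
  Q3 = c(4, 3, 3, 3, 4, 5)
)

agreement <- df %>%
  # One row per participant, rater, and question.
  pivot_longer(starts_with("Q"), names_to = "question", values_to = "rating") %>%
  # With exactly two ratings per group, diff() returns their single difference.
  summarise(check = abs(diff(rating)) > 1, .by = c(Participant_ID, question))
```

The result has one row per participant and question, with check set to TRUE where the two ratings differ by more than 1 point.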
