How can I use mutate to create new variables in nested for loops?
Question
I have a dataset with unique Participant_IDs, each rated by two different Rater_IDs on many variables (Q1, Q2, and Q3 here). I am trying to compute a variable that indicates whether the two raters' ratings differ by more than 1 point. Here's a simplified version of the data I'm working with:
library(tidyverse)
Participant_ID <- rep(1:3, 2)
Rater_ID <- c(rep("A", 3), rep("B", 3))
Q1 <- c(5, 2, 1, 3, 3, 4)
Q2 <- c(4, 2, 2, 3, 5, 2)
Q3 <- c(4, 3, 3, 3, 4, 5)
df <- tibble(Participant_ID, Rater_ID, Q1, Q2, Q3)
I can do this by writing out each check explicitly:
df <- df %>%
  group_by(Participant_ID) %>%
  mutate(Check_Q1 = ifelse(abs(Q1[1] - Q1[2]) > 1, 1, 0),
         Check_Q2 = ifelse(abs(Q2[1] - Q2[2]) > 1, 1, 0),
         Check_Q3 = ifelse(abs(Q3[1] - Q3[2]) > 1, 1, 0)) %>%
  ungroup()
Q1 is flagged (assigned a 1) for participant 1, Q2 is flagged for participant 2, and both Q1 and Q3 are flagged for participant 3, as the ratings have a difference > 1.
However, my real data has many more than 3 "Q" variables. I also want this code to work in situations where the number of Q variables changes; the user will specify number_of_questions before running the code. I have been trying to do this with a for loop but cannot get it to work. This is as far as I've gotten:
number_of_questions <- 3
questions <- grep("Q", names(df), value = TRUE)
df <- df %>% group_by(Participant_ID)
for (q in questions) {
  for (x in 1:number_of_questions) {
    check_varname <- paste0("Check_Q", x)
    df <- df %>%
      mutate(!!check_varname := ifelse(abs(get(q)[1] - get(q)[2]) > 1, 1, 0))
  }
}
df <- df %>% ungroup()
I don't get any errors, but the output is not correct: it assigns a 1 to all of Check_Q1, Check_Q2, and Check_Q3 for Participant_ID 3. Can anyone help me understand what I'm doing wrong?
Answer 1
Score: 4
You can do this using `across()` with its `.names` argument.
df %>%
  mutate(
    # unary + converts the logical result to integer 1/0
    across(starts_with("Q"), ~ +(abs(.x[1] - .x[2]) > 1),  # thanks @r2evans for improved code
           .names = "Check_{.col}"),
    .by = Participant_ID
  )
Output:
  Participant_ID Rater_ID    Q1    Q2    Q3 Check_Q1 Check_Q2 Check_Q3
           <int> <chr>    <dbl> <dbl> <dbl>    <int>    <int>    <int>
1              1 A            5     4     4        1        0        0
2              2 A            2     2     3        0        1        0
3              3 A            1     2     3        1        0        1
4              1 B            3     3     3        1        0        0
5              2 B            3     5     4        0        1        0
6              3 B            4     2     5        1        0        1
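The `.by` argument of `mutate()` was added in dplyr 1.1.0, so the call above fails on older installations. As a minimal sketch of the same computation for those versions, the grouping can be done with `group_by()` and `ungroup()` instead. The data is rebuilt here so the block runs on its own, and the name `flagged` is only illustrative:

```r
library(dplyr)

# Sample data from the question
df <- tibble::tibble(
  Participant_ID = rep(1:3, 2),
  Rater_ID = rep(c("A", "B"), each = 3),
  Q1 = c(5, 2, 1, 3, 3, 4),
  Q2 = c(4, 2, 2, 3, 5, 2),
  Q3 = c(4, 3, 3, 3, 4, 5)
)

# Same check as above, with explicit grouping instead of `.by`
flagged <- df %>%
  group_by(Participant_ID) %>%
  mutate(across(starts_with("Q"),
                ~ as.integer(abs(.x[1] - .x[2]) > 1),
                .names = "Check_{.col}")) %>%
  ungroup()
```

Here `as.integer()` plays the same role as the unary `+`: it turns the logical comparison into 1/0.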
`across()` also lets you specify a variable number of questions, for instance:
# consecutive questions
number_of_questions <- 2
qcols <- paste0("Q", seq_len(number_of_questions))
df %>%
  mutate(
    # all_of() is the supported way to select with an external character vector
    across(all_of(qcols), ~ +(abs(.x[1] - .x[2]) > 1),
           .names = "Check_{.col}"),
    .by = Participant_ID
  )
#   Participant_ID Rater_ID    Q1    Q2    Q3 Check_Q1 Check_Q2
#            <int> <chr>    <dbl> <dbl> <dbl>    <int>    <int>
# 1              1 A            5     4     4        1        0
# 2              2 A            2     2     3        0        1
# 3              3 A            1     2     3        1        0
# 4              1 B            3     3     3        1        0
# 5              2 B            3     5     4        0        1
# 6              3 B            4     2     5        1        0
# Alternative for non consecutive questions,
# specify columns this way:
number_of_questions <- c(2,6,8)
qcols <- paste0("Q", number_of_questions)
(Note: I assumed all question columns start with "Q".)
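As for why the nested loop in the question misbehaves: on every pass of the outer loop, the inner loop over `x` overwrites all of the `Check_Qx` columns with the result for the current `q`. After the last pass, every check column therefore holds the result for Q3, where only participant 3 differs by more than 1. If a loop is preferred over `across()`, a single loop over the column names should be enough. A minimal sketch, assuming the question columns are named "Q" followed by digits (the regex and the name `result` are illustrative choices, not part of the original code):

```r
library(dplyr)

# Sample data from the question
df <- tibble::tibble(
  Participant_ID = rep(1:3, 2),
  Rater_ID = rep(c("A", "B"), each = 3),
  Q1 = c(5, 2, 1, 3, 3, 4),
  Q2 = c(4, 2, 2, 3, 5, 2),
  Q3 = c(4, 3, 3, 3, 4, 5)
)

# Assumption: question columns look like Q1, Q2, ...
questions <- grep("^Q[0-9]+$", names(df), value = TRUE)

result <- df %>% group_by(Participant_ID)
for (q in questions) {
  # One check column per question, named after that question
  check_varname <- paste0("Check_", q)
  result <- result %>%
    mutate(!!check_varname := as.integer(abs(.data[[q]][1] - .data[[q]][2]) > 1))
}
result <- ungroup(result)
```

Using `.data[[q]]` instead of `get(q)` keeps the lookup restricted to columns of the data frame, so a same-named object in the global environment cannot be picked up by accident.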
Answer 2
Score: 2
It may be more efficient and/or readable to reshape the data so that each question is in its own row and each rater is in its own column:
df_long <- df %>%
  pivot_longer(-c(Participant_ID, Rater_ID)) %>%
  pivot_wider(names_from = Rater_ID, values_from = value)

  Participant_ID name      A     B
           <int> <chr> <dbl> <dbl>
1              1 Q1        5     3
2              1 Q2        4     3
3              1 Q3        4     3
4              2 Q1        2     3
5              2 Q2        2     5
6              2 Q3        3     4
7              3 Q1        1     4
8              3 Q2        2     2
9              3 Q3        3     5
From there, it's easy to create a check column:
df_long %>%
  mutate(check = abs(A - B) > 1)

  Participant_ID name      A     B check
           <int> <chr> <dbl> <dbl> <lgl>
1              1 Q1        5     3 TRUE
2              1 Q2        4     3 FALSE
3              1 Q3        4     3 FALSE
4              2 Q1        2     3 FALSE
5              2 Q2        2     5 TRUE
6              2 Q3        3     4 FALSE
7              3 Q1        1     4 TRUE
8              3 Q2        2     2 FALSE
9              3 Q3        3     5 TRUE
This can then be pivoted back into a wider format:
df_long %>%
  mutate(check = abs(A - B) > 1) %>%
  select(-c(A, B)) %>%
  pivot_wider(names_from = name, values_from = check, names_prefix = 'check_')

  Participant_ID check_Q1 check_Q2 check_Q3
           <int> <lgl>    <lgl>    <lgl>
1              1 TRUE     FALSE    FALSE
2              2 FALSE    TRUE     FALSE
3              3 TRUE     FALSE    TRUE
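The wide result has one row per participant, whereas the question's expected output keeps one row per participant and rater. If that layout is needed, the flags can be joined back onto the original data. A minimal sketch of the full pipeline, with the data rebuilt so it runs on its own (the names `checks` and `df_flagged` are illustrative):

```r
library(dplyr)
library(tidyr)

# Sample data from the question
df <- tibble::tibble(
  Participant_ID = rep(1:3, 2),
  Rater_ID = rep(c("A", "B"), each = 3),
  Q1 = c(5, 2, 1, 3, 3, 4),
  Q2 = c(4, 2, 2, 3, 5, 2),
  Q3 = c(4, 3, 3, 3, 4, 5)
)

# One row per participant, one logical flag per question
checks <- df %>%
  pivot_longer(-c(Participant_ID, Rater_ID)) %>%
  pivot_wider(names_from = Rater_ID, values_from = value) %>%
  mutate(check = abs(A - B) > 1) %>%
  select(-c(A, B)) %>%
  pivot_wider(names_from = name, values_from = check, names_prefix = "check_")

# Attach the flags to every rater row of the original data
df_flagged <- left_join(df, checks, by = "Participant_ID")
```

Because the columns `A` and `B` come from the values of Rater_ID, this sketch only works as written when the raters are labelled exactly "A" and "B", as in the sample data.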