How can I recode scores in a set of columns based on the scores in another set of columns with related names?


Question


So I have recognition data for 96 different stimuli. For each stimulus I have a) whether they recognized it and b) how confident they are that they recognized it, e.g. alligator_recognition, sheep_recognition, worm_recognition, alligator_recog_confidence, sheep_recog_confidence, worm_recog_confidence.
For each stimulus, I want to subset the data to the cases where they recognized it (coded 1 on _recognition) and rated it high on confidence (above 7 on _recog_confidence). So for alligator, keep the cases with a 1 on alligator_recognition and a score above 7 on alligator_recog_confidence, and do the same for all 96 stimuli. Any ideas on how I can do this efficiently?

I can subset the data on either condition alone using grep, or subset each stimulus separately (e.g. alligator_recognition together with alligator_recog_confidence), repeat that for all 96, and then score them before merging everything back together, but I'm hoping there is a more efficient way.

```R
alligator_recognition <- c(1,1,2,2,2,2,2,2,2)
alligator_recog_confidence <- c(7,9,11,1,10,5,9,8,8)
sheep_recognition <- c(2,2,1,2,2,1,2,2,2)
sheep_recog_confidence <- c(3,8,1,2,9,3,8,11,5)
worm_recognition <- c(2,2,1,2,2,1,2,1,1)
worm_recog_confidence <- c(9,9,11,1,10,6,8,11,2)
data <- data.frame(alligator_recognition, alligator_recog_confidence,
                   sheep_recognition, sheep_recog_confidence,
                   worm_recognition, worm_recog_confidence)
```

Answer 1

Score: 3


A good first step is probably to pivot your data to long format:

```R
library(dplyr)
library(tidyr)

data_long <-
  data %>%
  # numbers the rows of `data` (not the stimuli);
  # the stimulus name ends up in the `species` column
  mutate(stimulus = row_number()) %>%
  # split each column name into a species prefix and a measure suffix;
  # ".value" turns each measure into its own column
  pivot_longer(-stimulus,
               names_pattern = "(.*)_(recognition|recog_confidence)",
               names_to = c("species", ".value"),
               names_transform = list(species = factor))
data_long
# # A tibble: 27 × 4
#    stimulus species   recognition recog_confidence
#       <int> <fct>           <dbl>            <dbl>
#  1        1 alligator           1                7
#  2        1 sheep               2                3
#  3        1 worm                2                9
#  4        2 alligator           1                9
#  5        2 sheep               2                8
#  6        2 worm                2                9
#  7        3 alligator           2               11
#  8        3 sheep               1                1
#  9        3 worm                1               11
# 10        4 alligator           2                1
# # … with 17 more rows
# # ℹ Use `print(n = ...)` to see more rows
```
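To see what names_pattern does here, the same regular expression can be applied to the column names with base R's regexec() and regmatches(). Capture group 1 is what ends up in species, and group 2 (recognition or recog_confidence) becomes a column name via ".value". This is a stand-alone illustration of the pattern, not tidyr's own code path; nms, stimulus and measure are illustrative names.

```R
# Column names from the question's example data (3 of the 96 stimuli)
nms <- c("alligator_recognition", "alligator_recog_confidence",
         "sheep_recognition", "sheep_recog_confidence",
         "worm_recognition", "worm_recog_confidence")

# Same pattern as names_pattern: group 1 = stimulus, group 2 = measure
pattern <- "(.*)_(recognition|recog_confidence)"
parts <- regmatches(nms, regexec(pattern, nms))

# Each element is c(full_match, group1, group2); keep the two groups
stimulus <- vapply(parts, `[`, character(1), 2)
measure  <- vapply(parts, `[`, character(1), 3)
data.frame(stimulus, measure)
```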

Then you can filter and split it much more easily:

```R
data_long %>%
  # keep rows that were recognised (coded 1) with confidence above 7
  filter(recognition == 1, recog_confidence > 7) %>%
  # one data frame per species (empty levels are kept)
  split(.$species)
# $alligator
# # A tibble: 1 × 4
#   stimulus species   recognition recog_confidence
#      <int> <fct>           <dbl>            <dbl>
# 1        2 alligator           1                9
#
# $sheep
# # A tibble: 0 × 4
# # … with 4 variables: stimulus <int>, species <fct>, recognition <dbl>,
# #   recog_confidence <dbl>
# # ℹ Use `colnames()` to see all variable names
#
# $worm
# # A tibble: 2 × 4
#   stimulus species recognition recog_confidence
#      <int> <fct>         <dbl>            <dbl>
# 1        3 worm              1               11
# 2        8 worm              1               11
```
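The question's title talks about recoding. If you would rather keep the data wide and get one TRUE/FALSE indicator per stimulus (recognised, coded 1, with confidence above 7), a base-R sketch can pair the columns by name instead of pivoting. It assumes every stimulus has exactly one <stimulus>_recognition and one <stimulus>_recog_confidence column; rec_cols, stimuli and high_conf are illustrative names.

```R
# The question's example data
data <- data.frame(
  alligator_recognition      = c(1, 1, 2, 2, 2, 2, 2, 2, 2),
  alligator_recog_confidence = c(7, 9, 11, 1, 10, 5, 9, 8, 8),
  sheep_recognition          = c(2, 2, 1, 2, 2, 1, 2, 2, 2),
  sheep_recog_confidence     = c(3, 8, 1, 2, 9, 3, 8, 11, 5),
  worm_recognition           = c(2, 2, 1, 2, 2, 1, 2, 1, 1),
  worm_recog_confidence      = c(9, 9, 11, 1, 10, 6, 8, 11, 2)
)

# Stimulus names, derived from the *_recognition columns
rec_cols <- grep("_recognition$", names(data), value = TRUE)
stimuli  <- sub("_recognition$", "", rec_cols)

# For each stimulus: TRUE if recognised (coded 1) AND confidence above 7.
# sapply() returns a logical matrix with one named column per stimulus.
high_conf <- sapply(stimuli, function(s) {
  data[[paste0(s, "_recognition")]] == 1 &
    data[[paste0(s, "_recog_confidence")]] > 7
})

high_conf
```

From there, data[high_conf[, "alligator"], ] gives the alligator subset, and colSums(high_conf) counts the qualifying rows per stimulus (1, 0 and 2 for the example data).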

Answer 2

Score: 3

Alternatively, split the columns by their name prefix (i.e. the species) and loop over the resulting list to filter each animal:

```R
library(dplyr)    # filter(), pick() (pick() needs dplyr >= 1.1.0)
library(stringr)  # str_remove()
library(purrr)    # map()

# group the columns by their prefix (the text before the first "_"), then
# filter each group: column 1 is *_recognition, column 2 is *_recog_confidence
split.default(data, str_remove(names(data), "_.*")) |>
  map(~ .x %>% filter(pick(1)[[1]] == 1, pick(2)[[1]] > 7))
```

-output

```
$alligator
  alligator_recognition alligator_recog_confidence
1                     1                          9

$sheep
[1] sheep_recognition      sheep_recog_confidence
<0 rows> (or 0-length row.names)

$worm
  worm_recognition worm_recog_confidence
1                1                    11
2                1                    11
```
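The same split-by-prefix idea can also be sketched in base R, without dplyr, stringr or purrr. Like the answer above, it assumes that within each group the *_recognition column comes first and *_recog_confidence second, and that no two stimuli share the text before their first underscore (hypothetical names such as polar_bear and polar_fox would end up in the same group). by_stimulus and filtered are illustrative names.

```R
# The question's example data
data <- data.frame(
  alligator_recognition      = c(1, 1, 2, 2, 2, 2, 2, 2, 2),
  alligator_recog_confidence = c(7, 9, 11, 1, 10, 5, 9, 8, 8),
  sheep_recognition          = c(2, 2, 1, 2, 2, 1, 2, 2, 2),
  sheep_recog_confidence     = c(3, 8, 1, 2, 9, 3, 8, 11, 5),
  worm_recognition           = c(2, 2, 1, 2, 2, 1, 2, 1, 1),
  worm_recog_confidence      = c(9, 9, 11, 1, 10, 6, 8, 11, 2)
)

# Group the columns by the text before the first "_" (the stimulus name);
# split.default() on a data frame returns a list of data frames
by_stimulus <- split.default(data, sub("_.*", "", names(data)))

# Within each group: column 1 = *_recognition, column 2 = *_recog_confidence
filtered <- lapply(by_stimulus, function(d) {
  d[d[[1]] == 1 & d[[2]] > 7, , drop = FALSE]
})

filtered
```

Base-R subsetting keeps the original row names, so the worm result shows rows 3 and 8, i.e. which rows of data qualified.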

huangapple
  • Published on 17 May 2023 at 14:38:40
  • If reposting, please keep the link to this article: https://go.coder-hub.com/76269175.html