How can I recode scores in a set of columns based on the scores in another set of columns with related names?

Question


I have recognition data for 96 different stimuli. For each one I have (a) whether participants recognized it and (b) how confident they are that they recognized it, e.g., alligator_recognition, sheep_recognition, worm_recognition, alligator_recog_confidence, sheep_recog_confidence, worm_recog_confidence.
For each stimulus, I want to subset the data to the cases where they recognized it (coded 1 on _recognition) and rated it high on confidence (>7 on _recog_confidence). So for alligator, keep the rows with a 1 on alligator_recognition and above 7 on alligator_recog_confidence, and do the same for all 96 stimuli. Any ideas on how I can do this efficiently?

I can subset the data for either set of columns using the grep function, or subset each stimulus separately (alligator_recognition of 1 and high alligator_recog_confidence), repeat that for each of the 96, and then score them before merging everything back together, but I'm hoping for a more efficient way.

alligator_recognition <- c(1,1,2,2,2,2,2,2,2) 
alligator_recog_confidence <- c(7,9,11,1,10,5,9,8,8)
sheep_recognition <- c(2,2,1,2,2,1,2,2,2)
sheep_recog_confidence <- c(3,8,1,2,9,3,8,11,5)
worm_recognition <- c(2,2,1,2,2,1,2,1,1)
worm_recog_confidence <- c(9,9,11,1,10,6,8,11,2)
data <- data.frame(alligator_recognition, alligator_recog_confidence, 
                   sheep_recognition, sheep_recog_confidence, worm_recognition,
                   worm_recog_confidence)

Answer 1

Score: 3


A good first step is probably to pivot your data to long format, like so:

library(dplyr)
library(tidyr)

data_long <- 
  data %>% 
  mutate(stimulus = row_number()) %>% 
  pivot_longer(-stimulus,
               names_pattern = "(.*)_(recognition|recog_confidence)",
               names_to = c("species", ".value"),
               names_transform = list(species = factor))

# # A tibble: 27 × 4
#    stimulus species   recognition recog_confidence
#       <int> <fct>           <dbl>            <dbl>
#  1        1 alligator           1                7
#  2        1 sheep               2                3
#  3        1 worm                2                9
#  4        2 alligator           1                9
#  5        2 sheep               2                8
#  6        2 worm                2                9
#  7        3 alligator           2               11
#  8        3 sheep               1                1
#  9        3 worm                1               11
# 10        4 alligator           2                1
# # … with 17 more rows
# # ℹ Use `print(n = ...)` to see more rows

Then, you can subset more easily:

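# Keep confident recognitions, then split into one tibble per species.
# Note: `>= 7` also keeps a confidence of exactly 7; use `> 7` for "above 7".
data_long %>% 
  filter(recognition == 1, recog_confidence >= 7) %>% 
  split(.$species)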

# $alligator
# # A tibble: 2 × 4
#   stimulus species   recognition recog_confidence
#      <int> <fct>           <dbl>            <dbl>
# 1        1 alligator           1                7
# 2        2 alligator           1                9
# 
# $sheep
# # A tibble: 0 × 4
# # … with 4 variables: stimulus <int>, species <fct>, recognition <dbl>,
# #   recog_confidence <dbl>
# # ℹ Use `colnames()` to see all variable names
# 
# $worm
# # A tibble: 2 × 4
#   stimulus species recognition recog_confidence
#      <int> <fct>         <dbl>            <dbl>
# 1        3 worm              1               11
# 2        8 worm              1               11
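
If a single table is more convenient than a list of tibbles, the same long-format idea can be kept as one filtered data frame. The following is a minimal sketch under a few assumptions: it rebuilds the example data from the question, reads "above 7" as strictly greater than 7, and uses the illustrative names `participant`, `min_confidence`, `hits`, and `hit_counts`, none of which appear in the original post.

```r
library(dplyr)
library(tidyr)

# Example data from the question (each row is one participant)
data <- data.frame(
  alligator_recognition      = c(1, 1, 2, 2, 2, 2, 2, 2, 2),
  alligator_recog_confidence = c(7, 9, 11, 1, 10, 5, 9, 8, 8),
  sheep_recognition          = c(2, 2, 1, 2, 2, 1, 2, 2, 2),
  sheep_recog_confidence     = c(3, 8, 1, 2, 9, 3, 8, 11, 5),
  worm_recognition           = c(2, 2, 1, 2, 2, 1, 2, 1, 1),
  worm_recog_confidence      = c(9, 9, 11, 1, 10, 6, 8, 11, 2)
)

# Assumption: "above 7" means strictly greater than 7
min_confidence <- 7

hits <- data %>%
  mutate(participant = row_number()) %>%
  pivot_longer(-participant,
               names_pattern = "(.*)_(recognition|recog_confidence)",
               names_to = c("stimulus", ".value")) %>%
  filter(recognition == 1, recog_confidence > min_confidence)

# Number of confident recognitions per stimulus
hit_counts <- count(hits, stimulus)
```

Because `stimulus` is a plain character column here, stimuli with no confident recognitions simply do not appear in `hit_counts`. Converting it to a factor first and calling `count(hits, stimulus, .drop = FALSE)` should keep them with a count of zero.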




Answer 2

Score: 3


One option is to split the data by the column-name prefix (i.e. the species) and loop over the resulting list to filter each animal:

library(dplyr)
library(stringr)
library(purrr)
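# Split the columns by stimulus prefix, then filter each two-column data frame.
# Note: `>= 7` also keeps a confidence of exactly 7; use `> 7` for "above 7".
split.default(data, str_remove(names(data), "_.*")) |> 
    map(~ .x %>% filter(pick(1)[[1]] == 1, pick(2)[[1]] >= 7))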

-output

$alligator
  alligator_recognition alligator_recog_confidence
1                     1                          7
2                     1                          9

$sheep
[1] sheep_recognition      sheep_recog_confidence
<0 rows> (or 0-length row.names)

$worm
  worm_recognition worm_recog_confidence
1                1                    11
2                1                    11
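
For comparison, the same pairing of columns by name can be sketched in base R without any packages, which may suit the wide layout if the goal is to flag or recode cells rather than split the data. This is only an illustration: it assumes every stimulus has exactly one `_recognition` and one `_recog_confidence` column, reads "above 7" as strictly greater than 7, and the names `stems` and `confident` are made up for the example.

```r
# Example data from the question (each row is one participant)
data <- data.frame(
  alligator_recognition      = c(1, 1, 2, 2, 2, 2, 2, 2, 2),
  alligator_recog_confidence = c(7, 9, 11, 1, 10, 5, 9, 8, 8),
  sheep_recognition          = c(2, 2, 1, 2, 2, 1, 2, 2, 2),
  sheep_recog_confidence     = c(3, 8, 1, 2, 9, 3, 8, 11, 5),
  worm_recognition           = c(2, 2, 1, 2, 2, 1, 2, 1, 1),
  worm_recog_confidence      = c(9, 9, 11, 1, 10, 6, 8, 11, 2)
)

# Stimulus names, recovered from the "_recognition" columns
stems <- sub("_recognition$", "",
             grep("_recognition$", names(data), value = TRUE))

# Logical matrix: one row per participant, one column per stimulus;
# TRUE where the stimulus was recognized (1) with confidence above 7
confident <- sapply(stems, function(s) {
  data[[paste0(s, "_recognition")]] == 1 &
    data[[paste0(s, "_recog_confidence")]] > 7
})

# Confident recognitions per stimulus
colSums(confident)
```

The matrix can then be used to index or recode the original columns, for example `data[confident[, "worm"], ]` for the worm rows.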

Posted by huangapple on May 17, 2023, 14:38:40
Link to this article: https://go.coder-hub.com/76269175.html