Using purrr to recode across multiple columns with multiple mappings
Question
I have a dataframe with questionnaire response labels. I always like to make a tibble with item-answer definitions and then use `dplyr::recode()` to replace all item labels with their corresponding definitions. For ease of use, the definitions tibble `recode_df` stores these correspondences as strings; within `dplyr::recode()` they can be evaluated and unpacked with bang-bang-bang (`!!!`). In the following toy example there are 4 items, two for `qa` and two for `qb`, and items with the same prefix share the same answer definitions.
library(tidyverse)
set.seed(42)

# columns starting with `qa` and `qb` share the same answer structure
data_df <- tibble(
  qa_1 = sample(c(0, 1), 5, replace = TRUE),
  qa_2 = sample(c(0, 1), 5, replace = TRUE),
  qb_1 = sample(1:5, 5, replace = TRUE),
  qb_3 = sample(1:5, 5, replace = TRUE)
)

# `answer` column stores string definitions for use with `dplyr::recode()`
recode_df <- tibble(
  question = c("qa", "qb"),
  answer = c(
    'c("0" = "foo0", "1" = "foo1")',
    'c("1" = "bar1", "2" = "bar2", "3" = "bar3", "4" = "bar5", "5" = "bar5")'
  )
)

# Desired result
data_df %>%
  mutate(
    across(
      .cols = starts_with("qa"),
      .fns = ~recode(., !!!eval(parse(text = recode_df$answer[str_detect(recode_df$question, "qa")])))
    ),
    across(
      .cols = starts_with("qb"),
      .fns = ~recode(., !!!eval(parse(text = recode_df$answer[str_detect(recode_df$question, "qb")])))
    )
  )
#> # A tibble: 5 x 4
#> qa_1 qa_2 qb_1 qb_3
#> <chr> <chr> <chr> <chr>
#> 1 foo0 foo1 bar5 bar2
#> 2 foo0 foo1 bar1 bar3
#> 3 foo0 foo1 bar5 bar1
#> 4 foo0 foo0 bar5 bar1
#> 5 foo1 foo1 bar2 bar3
<sup>Created on 2023-02-26 with reprex v2.0.2</sup>
I can reach my desired result by using one `mutate()` with an `across()` for each row of `recode_df`, but I am sure there is an elegant `purrr` solution that iterates and recodes without repeating code. Thank you.
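For readers unfamiliar with the idiom, here is a minimal sketch of what the string-plus-`!!!` step does for a single vector. The names `def_string`, `def_vec`, and `x` are illustrative assumptions, not part of the example above.

```r
library(dplyr)

# One definition string, in the same form as the entries of `recode_df$answer`
# (`def_string` is a hypothetical name used only for this illustration)
def_string <- 'c("0" = "foo0", "1" = "foo1")'

# parse() turns the string into an expression; eval() runs it,
# which yields an ordinary named character vector
def_vec <- eval(parse(text = def_string))

# `!!!` splices the elements of the vector into recode() as named arguments,
# so this call behaves like recode(x, "0" = "foo0", "1" = "foo1")
x <- c(0, 1, 1, 0)
recoded <- recode(x, !!!def_vec)
recoded
```

Here `recoded` holds `"foo0" "foo1" "foo1" "foo0"`. Because the definition is evaluated as code, this pattern only makes sense when the strings come from a trusted source.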
Answer 1
Score: 2
There are a number of alternatives to consider, especially if you store your answer key in a different form. However, given the present data frames, you could try the following, which uses `map_dfc` to column-bind the end result. The recoding function is applied once for each element of a character vector of prefixes, such as "qa" and "qb". Let me know if this helps.
library(tidyverse)

map_dfc(
  recode_df$question,
  \(x) {
    map(
      select(data_df, contains(x)),
      \(y) recode(y, !!!eval(parse(text = recode_df$answer[str_detect(recode_df$question, x)])))
    )
  }
)
Output
qa_1 qa_2 qb_1 qb_3
<chr> <chr> <chr> <chr>
1 foo0 foo1 bar5 bar2
2 foo0 foo1 bar1 bar3
3 foo0 foo1 bar5 bar1
4 foo0 foo0 bar5 bar1
5 foo1 foo1 bar2 bar3
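As one illustration of "storing the answer key in a different form", the sketch below keeps the mappings as a named list of named vectors instead of strings, which avoids `eval(parse())` entirely. The name `recode_key`, the small hand-written `data_df`, and the assumption that the question prefix is everything before the first underscore are all hypothetical choices for this example.

```r
library(tidyverse)

# Small hand-written stand-in for the question's data (values are made up)
data_df <- tibble(
  qa_1 = c(0, 0, 1),
  qa_2 = c(1, 0, 1),
  qb_1 = c(5L, 1L, 2L),
  qb_3 = c(2L, 3L, 1L)
)

# Hypothetical alternative key: one named vector per question prefix
recode_key <- list(
  qa = c("0" = "foo0", "1" = "foo1"),
  qb = c("1" = "bar1", "2" = "bar2", "3" = "bar3", "4" = "bar5", "5" = "bar5")
)

# imap() hands each column and its name to the function;
# the part of the name before the first "_" selects the mapping
result <- imap(data_df, \(col, nm) {
  prefix <- str_extract(nm, "^[^_]+")
  recode(col, !!!recode_key[[prefix]])
}) |>
  as_tibble()

result
```

With this layout, adding a new question group only means adding one entry to `recode_key`; the iteration code stays the same.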
Answer 2
Score: 1
You can do this more cheaply.
data_df[] <- lapply(names(data_df), \(x) if (grepl('qa', x)) paste0('foo', data_df[[x]]) else paste0('bar', data_df[[x]]))
If there are many more columns, you can use a simple dictionary `dc`: a named vector with the data prefixes as elements and the column prefixes as names.
dc <- c(qa='foo', qb='bar')
## alternatively using `grep` to identify columns
# dc <- setNames(c('foo', 'bar'), unique(gsub('_\\d+$', '', names(data_df))))
We can now feed `startsWith` with the column name and the names of `dc` to identify the correct entry in `dc`.
data_df[] <- lapply(names(data_df), \(x) paste0(dc[startsWith(x, names(dc))], data_df[[x]]))
data_df
# qa_1 qa_2 qb_1 qb_3
# 1 foo0 foo1 bar4 bar2
# 2 foo0 foo1 bar1 bar3
# 3 foo0 foo1 bar5 bar1
# 4 foo0 foo0 bar4 bar1
# 5 foo1 foo1 bar2 bar3
This should also work well with hundreds of columns. It is hard to avoid defining the translation at least once.
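Note that pasting a prefix onto the raw value assumes every label is simply prefix plus code. In the question's key, "4" maps to "bar5", which is why the output above shows bar4 where the desired result has bar5. A minimal base R sketch that honors an arbitrary mapping is given below; the list `key` and the small `data_df` are hypothetical stand-ins, and the lookup assumes each column name starts with exactly one of the names of `key`.

```r
# Small hand-written stand-in for the question's data (values are made up)
data_df <- data.frame(
  qa_1 = c(0, 0, 1),
  qb_1 = c(4L, 1L, 5L)
)

# Hypothetical key: one named vector per column prefix
key <- list(
  qa = c("0" = "foo0", "1" = "foo1"),
  qb = c("1" = "bar1", "2" = "bar2", "3" = "bar3", "4" = "bar5", "5" = "bar5")
)

data_df[] <- lapply(names(data_df), \(x) {
  # pick the mapping whose name is a prefix of the column name
  mapping <- key[[which(startsWith(x, names(key)))]]
  # look each value up by its character form; drop the lookup names
  unname(mapping[as.character(data_df[[x]])])
})

data_df
```

Values that are missing from the mapping come back as `NA` with this lookup, so it is worth checking the key covers every code that occurs in the data.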