February 26, 2023, 19:31:06

Using purrr to recode across multiple columns with multiple mappings

Question

I have a data frame with questionnaire response labels. I like to make a tibble of item-answer definitions and then use dplyr::recode() to replace all item labels with their corresponding definitions. For ease of use, the definitions tibble recode_df stores these correspondences as strings; within dplyr::recode() they can be evaluated and then unpacked with the splice operator !!! ("bang-bang-bang"). In the following toy example there are 4 items, two for qa and two for qb, and items with the same prefix share the same answer definitions.

library(tidyverse)
set.seed(42)
# columns starting with `qa` and `qb` share the same answer structure 
data_df <- tibble(
  qa_1 = sample(c(0, 1), 5, replace = TRUE),
  qa_2 = sample(c(0, 1), 5, replace = TRUE),
  qb_1 = sample(1:5, 5, replace = TRUE),
  qb_3 = sample(1:5, 5, replace = TRUE)
)
# `answer` column stores string definitions for use with `dplyr::recode()`
recode_df <- tibble(
  question = c("qa", "qb"),
  answer = c(
    'c("0" = "foo0", "1" = "foo1")',
    'c("1" = "bar1", "2" = "bar2", "3" = "bar3", "4" = "bar5", "5" = "bar5")'
  )
)
# Desired result
data_df %>%
  mutate(
    across(
      .cols = starts_with("qa"),
      .fns = ~recode(., !!!eval(parse(text = recode_df$answer[str_detect(recode_df$question, "qa")])))
    ),
    across(
      .cols = starts_with("qb"),
      .fns = ~recode(., !!!eval(parse(text = recode_df$answer[str_detect(recode_df$question, "qb")])))
    )
  )
#> # A tibble: 5 x 4
#>   qa_1  qa_2  qb_1  qb_3 
#>   <chr> <chr> <chr> <chr>
#> 1 foo0  foo1  bar5  bar2 
#> 2 foo0  foo1  bar1  bar3 
#> 3 foo0  foo1  bar5  bar1 
#> 4 foo0  foo0  bar5  bar1 
#> 5 foo1  foo1  bar2  bar3

<sup>Created on 2023-02-26 with reprex v2.0.2</sup>

I can reach my desired result by writing one across() call inside mutate() for each row of recode_df, but I am sure there is an elegant purrr solution that iterates over the rows and recodes without repeating code. Thank you.
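For readers unfamiliar with the splicing step, here is a minimal sketch of what the string definition and !!! do for a single vector. The names x, key_string, and key are illustrative and not part of the question's data.

```r
library(dplyr)

# Illustrative input vector (not from the question's data)
x <- c(0, 1, 1, 0)

# A definition stored as a string, in the same form as recode_df$answer
key_string <- 'c("0" = "foo0", "1" = "foo1")'

# parse() turns the string into an expression; eval() runs it and
# returns a named character vector
key <- eval(parse(text = key_string))

# !!! splices each element of `key` into recode() as a named argument,
# so the two calls below are equivalent
spliced  <- recode(x, !!!key)
explicit <- recode(x, "0" = "foo0", "1" = "foo1")

identical(spliced, explicit)
```

Because the splice works on any named vector, the definitions do not have to be stored as strings; that is only the storage choice made in the question.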

Answer 1

Score: 2

There are a number of alternatives to consider, especially if you are willing to store your answer key in a different form. However, given the present data frames, you could try the following, which uses map_dfc() to column-bind the end result. The outer function is applied to each element of the character vector recode_df$question ("qa" and "qb"), and the inner map() recodes every column whose name contains that prefix. Let me know if this helps.

library(tidyverse)
map_dfc(
  recode_df$question,
  \(x) {
    map(
      select(data_df, contains(x)),
      \(y) recode(y, !!!eval(parse(text = recode_df$answer[str_detect(recode_df$question, x)])))
    )
  }
)

Output

  qa_1  qa_2  qb_1  qb_3 
  <chr> <chr> <chr> <chr>
1 foo0  foo1  bar5  bar2 
2 foo0  foo1  bar1  bar3 
3 foo0  foo1  bar5  bar1 
4 foo0  foo0  bar5  bar1 
5 foo1  foo1  bar2  bar3
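As one possible "different form" for the answer key, the sketch below stores the definitions as a named list of named character vectors, which avoids eval(parse()) entirely. It is only an illustration under assumptions: recode_key is a hypothetical name, and the prefix is assumed to be everything before the first underscore in the column name. It also uses imap() with as_tibble() rather than map_dfc(), which is marked as superseded in recent purrr versions.

```r
library(dplyr)
library(purrr)
library(tibble)

# Same toy data as in the question
set.seed(42)
data_df <- tibble(
  qa_1 = sample(c(0, 1), 5, replace = TRUE),
  qa_2 = sample(c(0, 1), 5, replace = TRUE),
  qb_1 = sample(1:5, 5, replace = TRUE),
  qb_3 = sample(1:5, 5, replace = TRUE)
)

# Assumption: the key is kept as a named list instead of strings
# (`recode_key` is a hypothetical name, not from the question)
recode_key <- list(
  qa = c("0" = "foo0", "1" = "foo1"),
  qb = c("1" = "bar1", "2" = "bar2", "3" = "bar3", "4" = "bar5", "5" = "bar5")
)

# imap() passes each column together with its name; the prefix of the
# name selects which definition to splice into recode()
result <- imap(data_df, \(col, nm) {
  prefix <- sub("_.*$", "", nm)   # "qa_1" -> "qa"
  recode(col, !!!recode_key[[prefix]])
}) |>
  as_tibble()

result
```

Looking the definition up by name means each column is visited once, and adding a new question block only requires a new entry in recode_key.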

Answer 2

Score: 1

There is a cheaper way to do this in base R.

data_df[] <- lapply(names(data_df), \(x) if (grepl('qa', x)) paste0('foo', data_df[[x]]) else paste0('bar', data_df[[x]]))

If there are many more columns, you can use a simple dictionary dc: a named vector with the value prefixes as elements and the column prefixes as names.

dc <- c(qa='foo', qb='bar')
## alternatively, using `gsub` to derive the column prefixes
# dc <- setNames(c('foo', 'bar'), unique(gsub('_\\d+$', '', names(data_df))))

We can now pass each column name and names(dc) to startsWith() to pick the matching entry in dc.

data_df[] <- lapply(names(data_df), \(x) paste0(dc[startsWith(x, names(dc))], data_df[[x]]))
data_df
#   qa_1 qa_2 qb_1 qb_3
# 1 foo0 foo1 bar4 bar2
# 2 foo0 foo1 bar1 bar3
# 3 foo0 foo1 bar5 bar1
# 4 foo0 foo0 bar4 bar1
# 5 foo1 foo1 bar2 bar3

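This should also work well with hundreds of columns. It is hard to avoid defining the translation at least once. Note that this approach pastes a prefix onto the raw value, so it matches the desired output only where the label follows that pattern: 4 becomes bar4 here, whereas recode_df maps 4 to bar5.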
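If the labels do not follow a prefix-plus-value pattern (for example, recode_df in the question maps 4 to bar5), the same base R idea can be kept by indexing a named lookup vector instead of pasting. The following is a minimal sketch under assumptions: key and recoded are hypothetical names, and the data values are fixed by hand rather than sampled so the result is easy to check.

```r
# Small hand-written data with the same column naming as the question
data_df <- data.frame(
  qa_1 = c(0, 0, 1),
  qb_1 = c(4L, 1L, 5L)
)

# Assumption: one named lookup vector per column prefix
# (`key` is a hypothetical name)
key <- list(
  qa = c("0" = "foo0", "1" = "foo1"),
  qb = c("1" = "bar1", "2" = "bar2", "3" = "bar3", "4" = "bar5", "5" = "bar5")
)

recoded <- data_df
recoded[] <- lapply(names(data_df), \(x) {
  # Pick the lookup vector whose name is a prefix of the column name
  lookup <- key[[which(startsWith(x, names(key)))]]
  # Index the lookup by the values converted to character
  unname(lookup[as.character(data_df[[x]])])
})

recoded
#   qa_1 qb_1
# 1 foo0 bar5
# 2 foo0 bar1
# 3 foo1 bar5
```

Values that are missing from the lookup vector come back as NA, which makes gaps in the key easy to spot.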
