July 17, 2023 19:29:30

Expanding values from multiple columns into binary values

Question

I have columns containing either character values (letters in the example below) or NA. I want to reshape them so that each new column is named after one of the distinct values and holds a binary 0/1 indicator of whether that value appears in the row.

The actual dataset contains many other variables, so the code needs to operate only on the 'stress' variables.

Example dataset:

df <- data.frame(
  stress1 = c('A', 'A', 'B', 'A'),
  stress2 = c(NA, 'B', 'C', 'B'),
  stress3 = c(NA, NA, NA, 'C'),
  stress4 = c(NA, NA, NA, 'D')
)

Desired outcome:

desiredoutcome <- data.frame(
  A = c(1, 1, 0, 1),
  B = c(0, 1, 1, 1),
  C = c(0, 0, 1, 1),
  D = c(0, 0, 0, 1)
)
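A minimal base R sketch of the "stress variables only" requirement, assuming the stress variables are exactly the columns whose names start with "stress". The id and group columns are hypothetical stand-ins for the other variables in the real data:

```r
# Hypothetical wider data set: `id` and `group` stand in for the other variables
df_full <- data.frame(
  id = 1:4,
  group = c("x", "y", "x", "y"),
  stress1 = c("A", "A", "B", "A"),
  stress2 = c(NA, "B", "C", "B"),
  stress3 = c(NA, NA, NA, "C"),
  stress4 = c(NA, NA, NA, "D"),
  stringsAsFactors = FALSE  # explicit for R < 4.0, where the default was TRUE
)

# Logical index of the columns whose names begin with "stress"
stress_cols <- startsWith(names(df_full), "stress")
stress <- df_full[stress_cols]

# Distinct non-NA values; these become the new column names (sort() drops NA)
stress_levels <- sort(unique(unlist(stress, use.names = FALSE)))
print(stress_levels)
```

Any of the answers can then be applied to stress instead of the full data frame, and the result bound back to the other variables with cbind().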

Answer 1

Score: 2

In tidyr + dplyr:

library(tidyr)
library(dplyr)
df %>%
  mutate(id = row_number()) %>%                                   # remember each original row
  pivot_longer(starts_with("stress"), values_drop_na = TRUE) %>%  # reshape only the stress columns
  pivot_wider(names_from = "value", values_from = "name",
              values_fill = 0, values_fn = length)                # count of each value per row, 0 if absent
#   id A B C D
# 1  1 1 0 0 0
# 2  2 1 1 0 0
# 3  3 0 1 1 0
# 4  4 1 1 1 1

Or in base R:

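df$ID <- seq_len(nrow(df))  # one ID per row; seq_along(df) counts columns, not rows
stress_cols <- startsWith(names(df), "stress")  # restrict to the stress variables
table(cbind(df['ID'], unlist(df[stress_cols]))) |>  # native pipe needs R >= 4.1
  as.data.frame.matrix()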
#   A B C D
# 1 1 0 0 0
# 2 1 1 0 0
# 3 0 1 1 0
# 4 1 1 1 1
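The base R version works because unlist() reads the data frame column by column, so a row ID repeated once per column lines up with each value's original row. The sketch below builds that alignment explicitly instead of relying on data.frame() recycling, and adds a final step (not part of the original answer) that caps counts at 1; it assumes the stress columns are the ones whose names start with "stress":

```r
df <- data.frame(
  stress1 = c("A", "A", "B", "A"),
  stress2 = c(NA, "B", "C", "B"),
  stress3 = c(NA, NA, NA, "C"),
  stress4 = c(NA, NA, NA, "D"),
  stringsAsFactors = FALSE
)
stress <- df[startsWith(names(df), "stress")]

# unlist() walks column by column: the k-th block of nrow values is column k,
# so repeating 1..nrow once per column gives every value its original row number
values <- unlist(stress, use.names = FALSE)
ids <- rep(seq_len(nrow(stress)), times = ncol(stress))

# Cross-tabulate row ID against value; table() drops NA values by default
tab <- table(ids, values)

# Turn counts into 0/1 presence indicators, one data frame column per value
binary <- as.data.frame.matrix(1L * (unclass(tab) > 0))
print(binary)
```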

Answer 2

Score: 2

A base R approach, generalised so that it should work on real data containing any number of distinct strings:

# updated sample data frame: NAs should not be quoted, trailing comma removed
df <- data.frame(
  stress1 = c('A', 'A', 'B', 'A'),
  stress2 = c(NA, 'B', 'C', 'B'),
  stress3 = c(NA, NA, NA, 'C'),
  stress4 = c(NA, NA, NA, 'D')
)
desiredout <- data.frame(
  A = c(1, 1, 0, 1),
  B = c(0, 1, 1, 1),
  C = c(0, 0, 1, 1),
  D = c(0, 0, 0, 1)
)
out = data.frame( # for data frame output
  lapply( # iterate over each distinct value
    unique(
      unlist(df[!is.na(df)]) # all unique, non-NA values in df
      ),
    \(x) { # \(x) lambda syntax needs R >= 4.1
      rowSums(df == x, na.rm = TRUE) # number of cells equal to x in each row
    }
    ))

names(out) <- unique(unlist(df[!is.na(df)]))

Check:

> all.equal(desiredout, out)
[1] TRUE

And:

If any output column contains values other than 1 or 0, some rows probably contain more than one instance of the same string; if so, come back and we can edit the answer accordingly.
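If a string can repeat within a row, rowSums() returns a count rather than a 0/1 flag. A minimal sketch of capping it, assuming the same "stress" name prefix; the repeated "B" in row 2 and the other column are hypothetical test data, not from the question:

```r
# Hypothetical data: row 2 contains "B" twice
df <- data.frame(
  stress1 = c("A", "B", "B", "A"),
  stress2 = c(NA, "B", "C", "B"),
  stress3 = c(NA, NA, NA, "C"),
  other = c(10, 20, 30, 40),  # hypothetical non-stress variable, ignored below
  stringsAsFactors = FALSE
)
stress <- df[startsWith(names(df), "stress")]
stress_levels <- sort(unique(unlist(stress, use.names = FALSE)))

# Raw counts: row 2 gives 2 for "B"
raw_B <- rowSums(stress == "B", na.rm = TRUE)

# "> 0" turns any count into TRUE/FALSE; as.integer() gives 0/1
out <- as.data.frame(
  lapply(setNames(stress_levels, stress_levels), function(x) {
    as.integer(rowSums(stress == x, na.rm = TRUE) > 0)
  })
)
print(out)
```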

Answer 3

Score: 2

A purrr solution with pmap_dfr + table:

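library(purrr)
library(dplyr)  # supplies %>%; pmap_dfr() also needs dplyr installed to bind the rows
pmap_dfr(df, ~ unclass(table(c(...)))) %>%  # one row of value counts per input row
  replace(is.na(.), 0)                       # values absent from a row come back as NA
# # A tibble: 4 × 4
#       A     B     C     D
#   <int> <int> <int> <int>
# 1     1     0     0     0
# 2     1     1     0     0
# 3     0     1     1     0
# 4     1     1     1     1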
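The replace(is.na(.), 0) step is needed because each row's table() lists only the values present in that row. A base R sketch that swaps apply() in for pmap_dfr() and fixes the factor levels up front, so every row reports every value (including zeros); it assumes the "stress" name prefix:

```r
df <- data.frame(
  stress1 = c("A", "A", "B", "A"),
  stress2 = c(NA, "B", "C", "B"),
  stress3 = c(NA, NA, NA, "C"),
  stress4 = c(NA, NA, NA, "D"),
  stringsAsFactors = FALSE
)
stress <- df[startsWith(names(df), "stress")]
stress_levels <- sort(unique(unlist(stress, use.names = FALSE)))

# apply() passes each row as a character vector; with fixed levels, table()
# returns a count for every value, so no NA filling is needed afterwards
counts <- t(apply(stress, 1, function(row) {
  table(factor(row, levels = stress_levels))
}))
print(counts)
```

The result is an integer matrix with one column per value; as.data.frame() converts it if a data frame is needed, and counts > 0 gives strict 0/1 flags if strings can repeat within a row.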
