Expanding values from multiple columns into binary values

Question

I have columns with either numeric or NA values that I want to rearrange so that the new columns are named after the distinct numeric values, and each cell holds a binary 0 or 1.

The actual dataset contains many other variables, so the code needs to restrict the operation to the 'stress' variables only.

Example dataset:

df <- data.frame(
  stress1 = c('A', 'A', 'B', 'A'),
  stress2 = c(NA, 'B', 'C', 'B'),
  stress3 = c(NA, NA, NA, 'C'),
  stress4 = c(NA, NA, NA, 'D')
)

Desired outcome:

desiredoutcome <- data.frame(
  A = c(1, 1, 0, 1),
  B = c(0, 1, 1, 1),
  C = c(0, 0, 1, 1),
  D = c(0, 0, 0, 1)
)

Answer 1

Score: 2

With tidyr + dplyr:

library(tidyr)
library(dplyr)

df %>%
  mutate(id = row_number()) %>%
  pivot_longer(-id, values_drop_na = TRUE) %>%
  pivot_wider(names_from = "value", values_from = "name",
              values_fill = 0, values_fn = length)

#   id A B C D
# 1  1 1 0 0 0
# 2  2 1 1 0 0
# 3  3 0 1 1 0
# 4  4 1 1 1 1

Or in base R:

df$ID <- seq_len(nrow(df))  # one ID per row
table(cbind(df['ID'], unlist(df[1:4]))) |>
  as.data.frame.matrix()

#   A B C D
# 1 1 0 0 0
# 2 1 1 0 0
# 3 0 1 1 0
# 4 1 1 1 1
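
The question asks that only the 'stress' columns be processed, whereas the pipeline above pivots every column except id. A minimal sketch of one way to restrict it, assuming the relevant columns all start with "stress"; the `age` column is a hypothetical stand-in for the other variables in the real data:

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  age     = c(31, 45, 28, 52),       # hypothetical extra variable
  stress1 = c("A", "A", "B", "A"),
  stress2 = c(NA, "B", "C", "B"),
  stress3 = c(NA, NA, NA, "C"),
  stress4 = c(NA, NA, NA, "D")
)

# Keep only the row id and the stress columns before reshaping
indicators <- df %>%
  mutate(id = row_number()) %>%
  select(id, starts_with("stress")) %>%
  pivot_longer(-id, values_drop_na = TRUE) %>%
  pivot_wider(names_from = "value", values_from = "name",
              values_fill = 0, values_fn = length)

# Attach the indicator columns back onto the original data
result <- df %>%
  mutate(id = row_number()) %>%
  left_join(indicators, by = "id")
```

Note that a row whose stress columns are all NA disappears from `indicators`, so after the join its indicator columns would be NA rather than 0.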

Answer 2

Score: 2

Base R way, generalised so that it should work on real data with any number of different strings:

# updated sample data frame - NAs should not be quoted; trailing comma removed
df <- data.frame(
  stress1 = c('A', 'A', 'B', 'A'),
  stress2 = c(NA, 'B', 'C', 'B'),
  stress3 = c(NA, NA, NA, 'C'),
  stress4 = c(NA, NA, NA, 'D')
)

desiredout <- data.frame(
  A = c(1, 1, 0, 1),
  B = c(0, 1, 1, 1),
  C = c(0, 0, 1, 1),
  D = c(0, 0, 0, 1)
)

out <- data.frame( # for data frame output
  lapply( # iterate over the values
    unique(
      unlist(df[!is.na(df)]) # all unique, non-NA values in df
    ),
    \(x) {
      rowSums(df == x, na.rm = TRUE) # per-row count of x
    }
  )
)

names(out) <- unique(unlist(df[!is.na(df)]))

Check:

> all.equal(desiredout, out)
[1] TRUE

and:

> out
  A B C D
1 1 0 0 0
2 1 1 0 0
3 0 1 1 0
4 1 1 1 1

If you see values other than 0 or 1 in the output, some rows probably contain more than one instance of a given string; if so, come back and we can edit accordingly.
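
The caveat about repeated strings, and the question's requirement to touch only the 'stress' columns, could both be handled with a small variation. The following is only a sketch under stated assumptions: the `id` column and the repeated "B" in row 3 are made up for illustration, and the columns of interest are assumed to start with "stress".

```r
df <- data.frame(
  id      = 1:4,                     # hypothetical non-stress column
  stress1 = c("A", "A", "B", "A"),
  stress2 = c(NA, "B", "B", "B"),    # row 3 repeats "B" on purpose
  stress3 = c(NA, NA, "C", "C"),
  stress4 = c(NA, NA, NA, "D")
)

# Work only on the columns whose names start with "stress"
stress <- df[grepl("^stress", names(df))]

# Distinct non-NA values; sort() drops NA by default
vals <- sort(unique(unlist(stress)))

# For each value: 1 if it occurs anywhere in the row, else 0
out <- data.frame(
  lapply(setNames(vals, vals),
         function(x) as.integer(rowSums(stress == x, na.rm = TRUE) > 0)),
  check.names = FALSE                # keep names as-is, even if numeric
)
```

Comparing the row count with `> 0` caps every cell at 1, so duplicates within a row no longer produce 2s, and `cbind(df, out)` would put the indicators next to the original variables.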

Answer 3

Score: 2

A purrr solution with pmap_dfr + table:

library(purrr)

pmap_dfr(df, ~ unclass(table(c(...)))) %>%
  replace(is.na(.), 0)

# # A tibble: 4 × 4
#       A     B     C     D
#   <int> <int> <int> <int>
# 1     1     0     0     0
# 2     1     1     0     0
# 3     0     1     1     0
# 4     1     1     1     1

huangapple
  • Posted on 2023-07-17 19:29:30
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76704015.html