Expanding values from multiple columns into binary values
Question
I have columns containing either numeric or NA values (letters stand in for the values in the example below). I want to rearrange them so that there is one new column per distinct value, named after that value, holding a binary 0 or 1 that indicates whether the value appears in that row.
The actual dataset contains many other variables, so the code needs to be restricted to the 'stress' variables only.
Example dataset:
df <- data.frame(
stress1 = c('A', 'A', 'B', 'A'),
stress2 = c(NA, 'B', 'C', 'B'),
stress3 = c(NA, NA, NA, 'C'),
stress4 = c(NA, NA, NA, 'D')
)
Desired outcome:
desiredoutcome <- data.frame (
A = c(1, 1, 0, 1),
B = c(0, 1, 1, 1),
C = c(0, 0, 1, 1),
D = c(0, 0, 0, 1)
)
Answer 1
Score: 2
With tidyr + dplyr:
library(tidyr)
library(dplyr)
df %>%
mutate(id = row_number()) %>%
pivot_longer(-id, values_drop_na = TRUE) %>%
pivot_wider(names_from = "value", values_from = "name",
values_fill = 0, values_fn = length)
# id A B C D
# 1 1 1 0 0 0
# 2 2 1 1 0 0
# 3 3 0 1 1 0
# 4 4 1 1 1 1
Or in base R:
df$ID <- seq_len(nrow(df))  # one ID per row; seq_along(df) counts columns and only matched here by coincidence
table(cbind(df['ID'], unlist(df[1:4]))) |>
as.data.frame.matrix()
# A B C D
# 1 1 0 0 0
# 2 1 1 0 0
# 3 0 1 1 0
# 4 1 1 1 1
Answer 2
Score: 2
A base R approach, generalised so that it should work on real data containing any number of distinct strings:
# updated sample data frame - NA's should not be quoted, removed trailing comma
df <- data.frame(
stress1 = c('A', 'A', 'B', 'A'),
stress2 = c(NA, 'B', 'C', 'B'),
stress3 = c(NA, NA, NA, 'C'),
stress4 = c(NA, NA, NA, 'D')
)
desiredout <- data.frame (
A = c(1, 1, 0, 1),
B = c(0, 1, 1, 1),
C = c(0, 0, 1, 1),
D = c(0, 0, 0, 1)
)
out = data.frame( # for data frame output
lapply( # iterative
unique(
unlist(df[!is.na(df)]) # all unique, non-NA values in df
),
\(x) {
rowSums(df == x, na.rm = TRUE)
}
))
names(out) <- unique(unlist(df[!is.na(df)]))
Check:
> all.equal(desiredout, out)
[1] TRUE
And the result:
> out
A B C D
1 1 0 0 0
2 1 1 0 0
3 0 1 1 0
4 1 1 1 1
If the output contains values other than 0 or 1, some rows probably contain the same string more than once. If so, come back and we can edit the answer accordingly.
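The question also asks that only the 'stress' variables be processed when the data frame holds other columns. The following is a minimal base R sketch of one way that might be done, not part of the original answer. The `other` column and the names `stress_cols`, `stress` and `vals` are hypothetical, and it assumes the relevant column names all start with "stress". Comparing the row count against zero also caps each cell at 1, which covers the repeated-string case mentioned above.

```r
# Hypothetical data: the question's stress columns plus an unrelated variable
df <- data.frame(
  other   = c(10, 20, 30, 40),
  stress1 = c('A', 'A', 'B', 'A'),
  stress2 = c(NA, 'B', 'C', 'B'),
  stress3 = c(NA, NA, NA, 'C'),
  stress4 = c(NA, NA, NA, 'D')
)

# Keep only the columns whose names start with "stress" (assumed naming scheme)
stress_cols <- grep("^stress", names(df), value = TRUE)
stress <- as.matrix(df[stress_cols])

# Distinct non-NA values, sorted so the output columns have a stable order
vals <- sort(unique(stress[!is.na(stress)]))

# For each value, flag rows where it appears at least once (counts capped at 1)
out <- as.data.frame(
  sapply(vals, function(v) as.integer(rowSums(stress == v, na.rm = TRUE) > 0))
)
out
```

If the indicators need to sit alongside the remaining variables, something like `cbind(df[setdiff(names(df), stress_cols)], out)` should attach them, since the row order is unchanged.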
Answer 3
Score: 2
A purrr solution with pmap_dfr + table:
library(purrr)
pmap_dfr(df, ~ unclass(table(c(...)))) %>%
replace(is.na(.), 0)
# # A tibble: 4 × 4
# A B C D
# <int> <int> <int> <int>
# 1 1 0 0 0
# 2 1 1 0 0
# 3 0 1 1 0
# 4 1 1 1 1