How to perform a conditional row-wise imputation of a constant
Question
I am somewhat of an R newbie, am struggling to write code for what seems like simple logic, and would appreciate any help! I am trying to impute a constant value of 1 for NA cells in each row of my data set, but only for rows that have 2 or fewer NA cells. Ultimately, I will also be computing a new column with row-wise means after imputation. If one line of code could automagically achieve all of these things, that would be great!
Here is an example data set to work with.
tData <- data.frame(subID=c(1001,1002,1003,1004),
b1=c(1,1,2,NA),
b2=c(NA,1,1,NA),
b3=c(NA,2,2,NA),
b4=c(2,NA,1,NA))
I have been looking at various base and dplyr code examples but am riding the struggle bus.
Answer 1
Score: 2
You can do this in these two lines.
tData[is.na(tData) & rowSums(is.na(tData)) <= 2] <- 1
tData |> cbind(row_means=rowMeans(tData[-1]))
# subID b1 b2 b3 b4 row_means
# 1 1001 1 1 1 2 1.25
# 2 1002 1 1 2 1 1.25
# 3 1003 2 1 2 1 1.50
# 4 1004 NA NA NA NA NA
Data:
tData <- structure(list(subID = c(1001, 1002, 1003, 1004), b1 = c(1, 1,
2, NA), b2 = c(NA, 1, 1, NA), b3 = c(NA, 2, 2, NA), b4 = c(2,
NA, 1, NA)), class = "data.frame", row.names = c(NA, -4L))
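The same logic can be wrapped in a small function that only touches the item columns and leaves subID alone. This is a minimal sketch, not part of the answer above: the helper name impute_rows and its arguments cols, value and max_na are made up for illustration.

```r
# Hypothetical helper (not from the answer above): fill NAs with a constant,
# but only in rows that have at most `max_na` missing values in `cols`.
impute_rows <- function(df, cols, value = 1, max_na = 2) {
  items <- df[cols]
  na_mask <- is.na(items)                  # logical matrix, rows x cols
  eligible <- rowSums(na_mask) <= max_na   # one TRUE/FALSE per row
  # `eligible` is recycled down each column, so it lines up with the rows
  items[na_mask & eligible] <- value
  df[cols] <- items
  df$row_means <- rowMeans(items)          # stays NA for rows left unimputed
  df
}

tData <- data.frame(subID = c(1001, 1002, 1003, 1004),
                    b1 = c(1, 1, 2, NA),
                    b2 = c(NA, 1, 1, NA),
                    b3 = c(NA, 2, 2, NA),
                    b4 = c(2, NA, 1, NA))

result <- impute_rows(tData, cols = c("b1", "b2", "b3", "b4"))
print(result)
```

Counting NAs over the item columns only means an NA in an ID column would not count toward the threshold, which is probably what you want on real data.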
Answer 2
Score: 0
We can do it like this with dplyr:
library(dplyr)
tData %>%
mutate(across(-subID, ~ifelse(rowSums(is.na(tData[2:5])) <= 2 & is.na(.), 1, .))) %>%
rowwise() %>%
mutate(mean_value = mean(c_across(-subID), na.rm = TRUE))
# subID b1 b2 b3 b4 mean_value
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 1001 1 1 1 2 1.25
# 2 1002 1 1 2 1 1.25
# 3 1003 2 1 2 1 1.5
# 4 1004 NA NA NA NA NaN
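Note that the last row here is NaN, whereas the rowMeans() approach in Answer 1 gives NA. That comes from na.rm = TRUE: once every value in the row is dropped, mean() is taken over an empty vector. A short sketch of the difference, with variable names chosen only for illustration:

```r
# A row in which every item is missing, like subID 1004
all_missing <- c(NA_real_, NA_real_, NA_real_, NA_real_)

mean_keep_na <- mean(all_missing)                # NA: the missing values propagate
mean_drop_na <- mean(all_missing, na.rm = TRUE)  # NaN: mean of an empty vector

print(mean_keep_na)
print(mean_drop_na)

# If NA is preferred in the final column, NaN can be converted afterwards
means <- c(1.25, 1.25, 1.5, mean_drop_na)
means[is.nan(means)] <- NA
print(means)
```

Since the rows with 2 or fewer NAs are fully imputed before the mean is taken, dropping na.rm = TRUE from the mean() call should also yield NA for the unimputed row.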