How to do a conditional row-wise imputation of a constant

Question


I am somewhat of an R newbie, am struggling to write code for what seems like simple logic, and would appreciate any help! I am trying to impute a constant value of 1 for NA cells in each row of my data set, but only for rows that have 2 or fewer NA cells. Ultimately, I will also compute a new column of row-wise means after imputation. If one line of code could automagically achieve all of this, that would be great!

Here is an example data set to work with.

    tData <- data.frame(subID = c(1001, 1002, 1003, 1004),
                        b1 = c(1, 1, 2, NA),
                        b2 = c(NA, 1, 1, NA),
                        b3 = c(NA, 2, 2, NA),
                        b4 = c(2, NA, 1, NA))

I have been looking at various base and dplyr code examples but am riding the struggle bus.

Answer 1

Score: 2


You can do this in these two lines.

    # Fill NA cells with 1, but only in rows that have at most 2 NAs
    tData[is.na(tData) & rowSums(is.na(tData)) <= 2] <- 1
    # Append the row means of all columns except subID (|> needs R >= 4.1)
    tData |> cbind(row_means = rowMeans(tData[-1]))
    #   subID b1 b2 b3 b4 row_means
    # 1  1001  1  1  1  2      1.25
    # 2  1002  1  1  2  1      1.25
    # 3  1003  2  1  2  1      1.50
    # 4  1004 NA NA NA NA        NA

Data:

    tData <- structure(list(subID = c(1001, 1002, 1003, 1004), b1 = c(1, 1,
    2, NA), b2 = c(NA, 1, 1, NA), b3 = c(NA, 2, 2, NA), b4 = c(2,
    NA, 1, NA)), class = "data.frame", row.names = c(NA, -4L))
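
To make the logic of that first line easier to follow, here is a minimal base R sketch that spells out the intermediate steps and restricts the imputation to the value columns, so an NA in subID would never be filled. The helper name impute_sparse_rows and its arguments cols, value and max_na are hypothetical, made up for this illustration; they are not part of the answer above.

```r
# Example data from the question
tData <- data.frame(subID = c(1001, 1002, 1003, 1004),
                    b1 = c(1, 1, 2, NA),
                    b2 = c(NA, 1, 1, NA),
                    b3 = c(NA, 2, 2, NA),
                    b4 = c(2, NA, 1, NA))

# Hypothetical helper (not from the answer): fill NAs with `value`,
# but only in rows that have at most `max_na` NAs among `cols`
impute_sparse_rows <- function(df, cols, value = 1, max_na = 2) {
  vals     <- df[cols]
  na_mask  <- is.na(vals)                 # logical matrix, TRUE where a cell is NA
  eligible <- rowSums(na_mask) <= max_na  # one TRUE/FALSE per row
  # `eligible` has one entry per row, so it recycles down each column of na_mask
  vals[na_mask & eligible] <- value
  df[cols] <- vals
  df
}

value_cols <- c("b1", "b2", "b3", "b4")
out <- impute_sparse_rows(tData, value_cols)

# Row means over the value columns; the all-NA row stays NA
out$row_means <- rowMeans(out[value_cols])
print(out)
```

The recycling step is what makes the one-liner work: R stores matrices column by column, so a logical vector with one entry per row lines up with the rows of every column when combined with the NA mask.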

Answer 2

Score: 0


We can do this like this:

    library(dplyr)
    tData %>%
      mutate(across(-subID, ~ifelse(rowSums(is.na(tData[2:5])) <= 2 & is.na(.), 1, .))) %>%
      rowwise() %>%
      mutate(mean_value = mean(c_across(-subID), na.rm = TRUE))

      subID    b1    b2    b3    b4 mean_value
      <dbl> <dbl> <dbl> <dbl> <dbl>      <dbl>
    1  1001     1     1     1     2       1.25
    2  1002     1     1     2     1       1.25
    3  1003     2     1     2     1       1.5
    4  1004    NA    NA    NA    NA     NaN
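
As a possible variation on the dplyr approach above, the sketch below avoids the hard-coded tData[2:5] and the rowwise() step by counting NAs once in a temporary column and using rowMeans() for the averages. It assumes dplyr 1.1.0 or later, since it relies on pick(); the column name n_na is a throwaway name chosen for this example. Because na.rm is not set, the all-NA row should come out as NA rather than NaN.

```r
library(dplyr)

# Example data from the question
tData <- data.frame(subID = c(1001, 1002, 1003, 1004),
                    b1 = c(1, 1, 2, NA),
                    b2 = c(NA, 1, 1, NA),
                    b3 = c(NA, 2, 2, NA),
                    b4 = c(2, NA, 1, NA))

result <- tData %>%
  # count NAs per row across the value columns only (n_na is a temporary column)
  mutate(n_na = rowSums(is.na(pick(b1:b4)))) %>%
  # fill NAs with 1, but only in rows that have at most 2 NAs
  mutate(across(b1:b4, ~ ifelse(is.na(.x) & n_na <= 2, 1, .x))) %>%
  # vectorised row means over the value columns
  mutate(mean_value = rowMeans(pick(b1:b4))) %>%
  # drop the temporary counter
  select(-n_na)

print(result)
```

Whether NA or NaN is preferable for the untouched row depends on how the means are used later; keeping na.rm = TRUE as in the answer would average whatever non-missing values remain in rows that were not imputed.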

  • Posted by huangapple on July 23, 2023 at 21:41:59
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76748553.html