In R: How to apply a function to a column given the value of another column
Question
I have a data frame containing 1s and NAs. I would like to replace the NAs with zeros, but not when the entire row is NA, because that indicates a true missing value.
For example, here is a simplified data frame:
```r
A <- c(NA, NA, 1, 1)
B <- c(NA, NA, NA, 1)
C <- c(1, NA, NA, NA)
df <- data.frame(A, B, C)
df$D <- ifelse(!is.na(df$A) | !is.na(df$B) | !is.na(df$C), 1, 0)
```
```
A  B  C
NA NA 1
NA NA NA
1  NA NA
1  1  NA
```
Columns A, B, and C contain either 1s or blanks (NA). I would like to replace the blanks with zeros (0), but NOT when A, B, and C are all blank. I have created column D as an indicator of whether there is any data in A, B, or C. Now I need code that replaces NA with zero. I hope this makes sense.
I am hoping the output will look like this:
```
A  B  C  D
0  0  1  1
NA NA NA 0
1  0  0  1
1  1  0  1
```
I used the following code to produce column D:
```r
df$D <- ifelse(!is.na(df$A) | !is.na(df$B) | !is.na(df$C), 1, 0)
```
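A more compact way to build the same indicator counts the non-missing values in each row with base R's `rowSums()`. This is a sketch that assumes the data columns are exactly A, B and C; `data_cols` is an illustrative name, not something from the question:

```r
A <- c(NA, NA, 1, 1)
B <- c(NA, NA, NA, 1)
C <- c(1, NA, NA, NA)
df <- data.frame(A, B, C)

# Assumed data columns (illustrative name)
data_cols <- c("A", "B", "C")

# is.na() on a data frame returns a logical matrix; rowSums() counts the
# non-missing cells per row, so D is 1 when at least one value is present
df$D <- as.integer(rowSums(!is.na(df[data_cols])) > 0)

print(df)
```

Unlike the chained `ifelse()`, this does not need to be edited when more data columns are added to `data_cols`.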
Answer 1
Score: 2
An approach based on the result of `rowSums()` (with a hint from @Ritchie Sacramento):
```r
replace(df, rowSums(df, na.rm = TRUE) > 0 & is.na(df), 0)
```
```
   A  B  C
1  0  0  1
2 NA NA NA
3  1  0  0
4  1  1  0
```
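To see why the one-liner works, the logical mask that `replace()` receives can be built and printed separately. This sketch assumes `df` contains only the columns A, B and C, as in the output above; `row_has_data` and `mask` are illustrative names:

```r
df <- data.frame(A = c(NA, NA, 1, 1),
                 B = c(NA, NA, NA, 1),
                 C = c(1, NA, NA, NA))

# One value per row: TRUE when the sum of the non-missing values is positive
row_has_data <- rowSums(df, na.rm = TRUE) > 0

# is.na(df) is a 4 x 3 logical matrix. The length-4 vector row_has_data is
# recycled down each column, so cell [i, j] is TRUE only when that cell is
# missing AND row i contains at least one value
mask <- row_has_data & is.na(df)
print(mask)

result <- replace(df, mask, 0)
print(result)
```

Note that `rowSums(df, na.rm = TRUE) > 0` relies on the values being positive, which holds here because the data are all 1s; a row such as `c(1, -1, NA)` would sum to 0 and be treated as empty. Counting non-missing cells with `rowSums(!is.na(df)) > 0` avoids that assumption.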
Answer 2
Score: 1
<details>
A simple solution would be to capture the rows that are all `NA`, replace all the `NA` with zero, and then go back and re-populate the `NA`:
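```r
# Rows in which every column is NA
all_na <- apply(is.na(df), 1, all)
# Replace every NA with 0 ...
df[is.na(df)] <- 0
# ... then restore the rows that were entirely NA
df[all_na, ] <- NA
```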
Alternatively, you can do it row by row with `apply()`:
```r
# \(x) is the anonymous-function shorthand introduced in R 4.1.0
data.frame(t(apply(df, 1, \(x) if (all(is.na(x))) x else replace(x, is.na(x), 0))))
#    A  B  C
# 1  0  0  1
# 2 NA NA NA
# 3  1  0  0
# 4  1  1  0
```
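The `apply()` call in the first approach loops over the rows in R code. A vectorized variant (a swapped-in alternative, not the answerer's code) finds the all-`NA` rows by comparing each row's count of missing values with the number of columns. It assumes `df` holds only the columns A, B and C:

```r
df <- data.frame(A = c(NA, NA, 1, 1),
                 B = c(NA, NA, NA, 1),
                 C = c(1, NA, NA, NA))

# TRUE for rows in which every column is missing (assumes only A, B, C)
all_na <- rowSums(is.na(df)) == ncol(df)

# Fill every NA with 0, then restore the rows that were entirely missing
df[is.na(df)] <- 0
df[all_na, ] <- NA

print(df)
```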
Answer 3
Score: 0
```r
# Sample data frame
df <- data.frame(A = c(NA, NA, 1, 1),
                 B = c(NA, NA, NA, 1),
                 C = c(1, NA, NA, NA))
# Create column D
df$D <- ifelse(!is.na(df$A) | !is.na(df$B) | !is.na(df$C), 1, 0)
# Replace NAs with zeros in columns A, B, and C, based on column D
df[df$D == 1, c("A", "B", "C")] <- lapply(df[df$D == 1, c("A", "B", "C")],
                                         function(x) ifelse(is.na(x), 0, x))
# Print the modified data frame
print(df)
```
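If this has to be repeated for other data frames or column sets, the same idea can be wrapped in a function that computes the indicator itself instead of relying on a stored column D. `fill_na_unless_all_na()` and its `fill` argument are hypothetical names used only for this sketch, which assumes the selected columns are numeric:

```r
# Hypothetical helper: replace NA with `fill` in `cols`, except in rows where
# every one of `cols` is NA (those rows are left untouched)
fill_na_unless_all_na <- function(df, cols, fill = 0) {
  # Row indicator, playing the role of column D
  has_data <- rowSums(!is.na(df[cols])) > 0
  df[cols] <- lapply(df[cols], function(x) replace(x, is.na(x) & has_data, fill))
  df
}

df <- data.frame(A = c(NA, NA, 1, 1),
                 B = c(NA, NA, NA, 1),
                 C = c(1, NA, NA, NA))

out <- fill_na_unless_all_na(df, c("A", "B", "C"))
print(out)
```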
Answer 4
Score: 0
This is a good case for a `rowwise()` operation with dplyr. We can include the complex logical statement inside `replace()`.

This is a nice approach because dplyr makes it easy to apply the method to different subsets of the data.
If the indicator column D already exists:
```r
library(dplyr)

df |>
  rowwise() |>
  mutate(across(A:C, \(x) replace(x, is.na(x) & D, 0))) |>
  ungroup()
```
```
# A tibble: 4 × 4
      A     B     C     D
  <dbl> <dbl> <dbl> <dbl>
1     0     0     1     1
2    NA    NA    NA     0
3     1     0     0     1
4     1     1     0     1
```
We can also create the index on the fly with `dplyr::if_all()`:
```r
df |>
  rowwise() |>
  mutate(across(A:C, \(x) replace(x, is.na(x) & !if_all(A:C, is.na), 0))) |>
  ungroup()
```
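Without dplyr, the on-the-fly index from `if_all(A:C, is.na)` can be mimicked in base R by combining the columns' `is.na()` results with `&` via `Reduce()`. This is a swapped-in base-R sketch, not part of the answer above, and it assumes the data columns are A, B and C. Because `replace()` is vectorized, each column is handled in a single call:

```r
df <- data.frame(A = c(NA, NA, 1, 1),
                 B = c(NA, NA, NA, 1),
                 C = c(1, NA, NA, NA))

# Assumed data columns
cols <- c("A", "B", "C")

# Elementwise AND across the columns' is.na() vectors: TRUE where the whole
# row is missing, mirroring if_all(A:C, is.na)
all_missing <- Reduce(`&`, lapply(df[cols], is.na))

# Zero out NAs only in rows that are not entirely missing
df[cols] <- lapply(df[cols], function(x) replace(x, is.na(x) & !all_missing, 0))
print(df)
```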
</details>