In R: How to apply a function to a column given the value of another column

Question

I have a dataframe with 1s and NAs. I would like to replace the NAs with zeros but not if the entire row is NA as this would indicate a true NA.

For example, here is a simplified data frame:

A <- c(NA, NA, 1, 1)
B <- c(NA, NA, NA, 1)
C <- c(1, NA, NA, NA)
df <- data.frame(A, B, C)
df$D <- ifelse(!is.na(df$A) | !is.na(df$B) | !is.na(df$C), 1, 0)

A   B   C  
NA  NA  1  
NA  NA  NA  
1   NA  NA  
1   1   NA  

Columns A, B, and C contain either 1s or blanks (NA). I would like to replace the blanks with zeros (0), but NOT when A, B, and C are all blank. I have created column D as an indicator of whether there is any data in A, B, or C. Now I need code that replaces NA with zero only in the rows that have some data. I hope this makes sense.

I am hoping the output will look like this:

A B C  D
0 0 1  1
       0
1 0 0  1
1 1 0  1

I used the following code to produce column D:

df$D <- ifelse(!is.na(df$A) | !is.na(df$B) | !is.na(df$C), 1, 0)

Answer 1

Score: 2

An approach based on the result of rowSums() (with a hint from @Ritchie Sacramento):

replace(df, rowSums(df, na.rm = TRUE) > 0 & is.na(df), 0)
   A  B  C
1  0  0  1
2 NA NA NA
3  1  0  0
4  1  1  0
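
The one-liner above relies on the non-missing values summing to more than zero, which holds here because they are all 1s. A minimal sketch of a variant that counts non-missing values instead, and that only touches a chosen set of columns, might look like this. The names `cols`, `has_data`, `fill`, and `result` are illustrative and not part of the original answer.

```r
# Rebuild the example data from the question (columns A, B, C only).
df <- data.frame(
  A = c(NA, NA, 1, 1),
  B = c(NA, NA, NA, 1),
  C = c(1, NA, NA, NA)
)

cols <- c("A", "B", "C")  # columns to clean; adjust as needed

# TRUE for rows that hold at least one non-missing value in `cols`.
has_data <- rowSums(!is.na(df[cols])) > 0

# is.na(df[cols]) is a 4 x 3 logical matrix. `has_data` has one element
# per row and is recycled down each column, so it lines up with the rows.
fill <- is.na(df[cols]) & has_data

# Write zeros only into the flagged cells; all-NA rows are left alone.
result <- df
result[cols][fill] <- 0
result
```

For the example data this should give the same table as the output shown above, with row 2 left as NA.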

Answer 2

Score: 1

<details>
<summary>English:</summary>

A simple solution would be to capture the rows that are all `NA`, replace all the `NA` with zero, and then go back and re-populate the `NA`: 

```r
all_na <- apply(is.na(df), 1, all)
df[is.na(df)] <- 0
df[all_na, ] <- NA
```

Otherwise you can try it like this:

data.frame(t(apply(df, 1, \(x) if (all(is.na(x))) x else replace(x, is.na(x), 0))))
#    A  B  C
# 1  0  0  1
# 2 NA NA NA
# 3  1  0  0
# 4  1  1  0
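
If the data frame already carries the indicator column D, the capture-and-restore steps above would also blank out D in the all-NA rows. A rough sketch of the same three steps wrapped in a helper that is limited to selected columns could look like this. The function name `zero_fill_partial_na` and its arguments are hypothetical, made up for illustration.

```r
# Hypothetical helper (name and arguments are illustrative only).
zero_fill_partial_na <- function(data, cols) {
  block <- data[cols]
  # Step 1: remember which rows are NA in every selected column.
  all_na <- apply(is.na(block), 1, all)
  # Step 2: replace every NA in the selected columns with zero.
  block[is.na(block)] <- 0
  # Step 3: restore NA in the rows that were entirely missing.
  block[all_na, ] <- NA
  # Write the cleaned columns back; other columns are untouched.
  data[cols] <- block
  data
}

# Example data from the question, including the indicator column D.
df <- data.frame(
  A = c(NA, NA, 1, 1),
  B = c(NA, NA, NA, 1),
  C = c(1, NA, NA, NA)
)
df$D <- ifelse(!is.na(df$A) | !is.na(df$B) | !is.na(df$C), 1, 0)

out <- zero_fill_partial_na(df, c("A", "B", "C"))
out
```

Passing the column names explicitly keeps D as 1/0 for every row, so it can still serve as the "any data present" flag afterwards.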

Answer 3

Score: 0

# Sample data frame
df <- data.frame(A = c(NA, NA, 1, 1),
                 B = c(NA, NA, NA, 1),
                 C = c(1, NA, NA, NA))

# Create column D
df$D <- ifelse(!is.na(df$A) | !is.na(df$B) | !is.na(df$C), 1, 0)

# Replace NAs with zeros in columns A, B, and C, based on column D
df[df$D == 1, c("A", "B", "C")] <- lapply(df[df$D == 1, c("A", "B", "C")], function(x) ifelse(is.na(x), 0, x))

# Print the modified data frame
print(df)

Answer 4

Score: 0

This is a good case for a rowwise() operation with dplyr. We can include the complex logical statement inside replace().
This is a nice approach because dplyr allows good flexibility for applying the method to different subsets of data.

Answer if the index column D already exists:

library(dplyr)

df |>
    rowwise() |>
    mutate(across(A:C, \(x) replace(x, is.na(x) & D, 0))) |>
    ungroup()

# A tibble: 4 × 4
      A     B     C     D
  <dbl> <dbl> <dbl> <dbl>
1     0     0     1     1
2    NA    NA    NA     0
3     1     0     0     1
4     1     1     0     1

We can also create the index on-the-fly, with dplyr::if_all:

df |>
    rowwise() |>
    mutate(across(A:C, \(x) replace(x, is.na(x) & !if_all(A:C, is.na), 0))) |>
    ungroup()


</details>



huangapple
  • Posted on August 4, 2023 at 06:56:09
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76832041.html