R "object '.' not found" error when replacing all NAs in a dataframe with zero

Question

I'm trying to add a "total" column to my dataframe that sums the row values for specific columns, but first I need to change NAs to zero.

My data is a monthly file that has variables for every hour of every day in the month. The first 4 columns of the data are explanatory and won't be included in the total column; the full dataset has 44 variables:

library(tidyverse)

df <- structure(list(Flowday = structure(c(19417, 19417, 19417, 19417, 19417), class = "Date"),
               Interval = c("01:00", "02:00", "03:00", "04:00", "05:00"),
               Interval_int = 1:5, Sequence = c(14, 14, 14, 14, 14),
               DA_RC_AMT = c(18.3, 12.0, 5.6, 8.3, 11.5),
               DA_ASSET_EN = c(20.4, 14.6, 6.6, 3.0, 15.9),
               RT_MVP_DIST = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_),
               RT_RAA = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_)),
          row.names = c(NA, -5L), class = c("tbl_df", "tbl", "data.frame"))

Here's my code to replace NAs with zero:

df <- df |>
  replace(is.na(.), 0)

which returns this error message:

Error in [<-.tbl_df(*tmp*, list, value = 0) : object '.' not found

Here's my code to create the total column, which works as expected when I exclude the columns with NA values:

df <- df |>
  rowwise() |>
  mutate(total = sum(c_across(5:8)))

How can I replace the NAs with 0 so that my total works?
Also, is it better to use the column index or column name in the total code?

Thanks for the help!
David

Answer 1

Score: 1

Try:

library(tidyverse)
library(dplyr)

df <- df %>%
  replace(is.na(.), 0) %>%
  rowwise(.) %>%
  mutate(total = sum(c_across(contains("DA_")), na.rm = TRUE))

I got the result as:

# A tibble: 5 × 9
# Rowwise: 
  Flowday    Interval Interval_int Sequence DA_RC_AMT DA_ASSET_EN RT_MVP_DIST RT_RAA total
  <date>     <chr>           <int>    <dbl>     <dbl>       <dbl>       <dbl>  <dbl> <dbl>
1 2023-03-01 01:00               1       14      18.3        20.4           0      0  38.7
2 2023-03-01 02:00               2       14      12          14.6           0      0  26.6
3 2023-03-01 03:00               3       14       5.6         6.6           0      0  12.2
4 2023-03-01 04:00               4       14       8.3         3             0      0  11.3
5 2023-03-01 05:00               5       14      11.5        15.9           0      0  27.4

Or:

df <- df %>%
  replace(is.na(.), 0) %>%
  rowwise(.) %>%
  mutate(total = sum(c_across(contains(c("DA_", "RT_"))), na.rm = TRUE))

which would return:

# A tibble: 5 × 9
# Rowwise: 
  Flowday    Interval Interval_int Sequence DA_RC_AMT DA_ASSET_EN RT_MVP_DIST RT_RAA total
  <date>     <chr>           <int>    <dbl>     <dbl>       <dbl>       <dbl>  <dbl> <dbl>
1 2023-03-01 01:00               1       14      18.3        20.4           0      0  38.7
2 2023-03-01 02:00               2       14      12          14.6           0      0  26.6
3 2023-03-01 03:00               3       14       5.6         6.6           0      0  12.2
4 2023-03-01 04:00               4       14       8.3         3             0      0  11.3
5 2023-03-01 05:00               5       14      11.5        15.9           0      0  27.4
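
A note on why the question's version fails while the one above works, offered as a sketch rather than a definitive explanation: the `.` placeholder belongs to the magrittr pipe `%>%`. The native pipe `|>` does not define it, so `is.na(.)` goes looking for an object literally named `.` and R reports that it is not found. As far as I know, the native pipe's own `_` placeholder (R 4.2 and later) does not help here either, because it may appear only once and must be passed as a named argument. If you want to keep `|>`, two possible workarounds are shown below. The small tibble is a made-up stand-in for the question's data, and the names `filled_1` and `filled_2` are only for illustration.

```r
library(dplyr)

# Stand-in data: a character column, a complete numeric column,
# and a numeric column that is entirely NA (values taken from the question).
df <- tibble(
  Interval    = c("01:00", "02:00"),
  DA_RC_AMT   = c(18.3, 12.0),
  RT_MVP_DIST = c(NA_real_, NA_real_)
)

# Option 1: wrap the call in an anonymous function so the piped data
# gets a name (d) that can be used twice. Needs R >= 4.1.
filled_1 <- df |> (\(d) replace(d, is.na(d), 0))()

# Option 2: stay inside dplyr and fill only the numeric columns,
# leaving character and date columns untouched.
filled_2 <- df |>
  mutate(across(where(is.numeric), \(x) replace(x, is.na(x), 0)))
```

Option 2 is a little more explicit about which columns are changed, which may matter if some columns should keep their NAs.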

Answer 2

Score: 1

We don't need to replace NA with 0, because both sum and the vectorized rowSums already have an na.rm argument:

library(dplyr) # version >= 1.1.0
df %>%
  mutate(total = rowSums(pick(5:8), na.rm = TRUE))

Output:

# A tibble: 5 × 9
  Flowday    Interval Interval_int Sequence DA_RC_AMT DA_ASSET_EN RT_MVP_DIST RT_RAA total
  <date>     <chr>           <int>    <dbl>     <dbl>       <dbl>       <dbl>  <dbl> <dbl>
1 2023-03-01 01:00               1       14      18.3        20.4          NA     NA  38.7
2 2023-03-01 02:00               2       14      12          14.6          NA     NA  26.6
3 2023-03-01 03:00               3       14       5.6         6.6          NA     NA  12.2
4 2023-03-01 04:00               4       14       8.3         3            NA     NA  11.3
5 2023-03-01 05:00               5       14      11.5        15.9          NA     NA  27.4
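
On the second part of the question (column index or column name), here is a hedged sketch of the name-based alternatives to `5:8`. Which one is "better" is a judgment call, but names usually survive columns being added or reordered, while positions silently shift. The tibble below is a cut-down stand-in for the question's data, and `by_range` and `by_prefix` are illustrative names. `pick()` needs dplyr 1.1.0 or later.

```r
library(dplyr)

# Stand-in data with one leading column that should not be summed.
df <- tibble(
  Sequence    = c(14, 14),
  DA_RC_AMT   = c(18.3, 12.0),
  DA_ASSET_EN = c(20.4, 14.6),
  RT_MVP_DIST = c(NA_real_, NA_real_),
  RT_RAA      = c(NA_real_, NA_real_)
)

# Select a contiguous block by its first and last column names.
by_range <- df |>
  mutate(total = rowSums(pick(DA_RC_AMT:RT_RAA), na.rm = TRUE))

# Select by name prefix, which does not depend on column order at all.
by_prefix <- df |>
  mutate(total = rowSums(pick(starts_with(c("DA_", "RT_"))), na.rm = TRUE))
```

Both versions leave the NAs in place and skip them only while summing, so the original data stays distinguishable from true zeros.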

Posted by huangapple on April 11, 2023, 13:02:24
Link to this article: https://go.coder-hub.com/75982543.html