R "object '.' not found" error when replacing all NAs in a dataframe with zero
Question
I'm trying to add a "total" column to my dataframe that sums the row values for specific columns, but first I need to change NAs to zero.
My data is a monthly file that has variables for every hour of every day in the month. The first 4 columns of the data are explanatory and won't be included in the total column; the full dataset has 44 variables:
library(tidyverse)
df <- structure(list(Flowday = structure(c(19417, 19417, 19417, 19417, 19417), class = "Date"),
                     Interval = c("01:00", "02:00", "03:00", "04:00", "05:00"),
                     Interval_int = 1:5, Sequence = c(14, 14, 14, 14, 14),
                     DA_RC_AMT = c(18.3, 12.0, 5.6, 8.3, 11.5),
                     DA_ASSET_EN = c(20.4, 14.6, 6.6, 3.0, 15.9),
                     RT_MVP_DIST = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_),
                     RT_RAA = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_)),
                row.names = c(NA, -5L), class = c("tbl_df", "tbl", "data.frame"))
Here's my code to replace NAs with zero:
df <- df |>
  replace(is.na(.), 0)
which returns this error message:
Error in `[<-.tbl_df`(`*tmp*`, list, value = 0) : object '.' not found
Here's my code to create the total column, which works as expected when I exclude the columns with NA values:
df <- df |>
  rowwise() |>
  mutate(total = sum(c_across(5:8)))
How can I replace the NAs with 0 so that my total works?
Also, is it better to use the column index or column name in the total code?
Thanks for the help!
David
Answer 1
Score: 1
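The `.` placeholder is a feature of the magrittr pipe `%>%`; the native pipe `|>` doesn't support it, which is why R reports `object '.' not found`. Try it with `%>%`: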
library(tidyverse)  # attaches dplyr and the magrittr pipe %>%
df <- df %>%
  replace(is.na(.), 0) %>%  # `.` stands for the incoming data frame
  rowwise() %>%
  mutate(total = sum(c_across(contains("DA_")), na.rm = TRUE))
I got this result:
# A tibble: 5 × 9
# Rowwise:
Flowday Interval Interval_int Sequence DA_RC_AMT DA_ASSET_EN RT_MVP_DIST RT_RAA total
<date> <chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2023-03-01 01:00 1 14 18.3 20.4 0 0 38.7
2 2023-03-01 02:00 2 14 12 14.6 0 0 26.6
3 2023-03-01 03:00 3 14 5.6 6.6 0 0 12.2
4 2023-03-01 04:00 4 14 8.3 3 0 0 11.3
5 2023-03-01 05:00 5 14 11.5 15.9 0 0 27.4
Or:
df <- df %>%
  replace(is.na(.), 0) %>%
  rowwise() %>%
  mutate(total = sum(c_across(contains(c("DA_", "RT_"))), na.rm = TRUE))
which would return:
# A tibble: 5 × 9
# Rowwise:
Flowday Interval Interval_int Sequence DA_RC_AMT DA_ASSET_EN RT_MVP_DIST RT_RAA total
<date> <chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2023-03-01 01:00 1 14 18.3 20.4 0 0 38.7
2 2023-03-01 02:00 2 14 12 14.6 0 0 26.6
3 2023-03-01 03:00 3 14 5.6 6.6 0 0 12.2
4 2023-03-01 04:00 4 14 8.3 3 0 0 11.3
5 2023-03-01 05:00 5 14 11.5 15.9 0 0 27.4
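If you want to keep the native pipe `|>`, the replacement has to be written without magrittr's `.`. A minimal sketch, assuming R >= 4.1 (for `|>` and the `\(x)` lambda syntax) plus dplyr and tidyr; it uses a two-row copy of the question's data, and `df_zero`, `df_zero2` and `totals` are illustrative names only:

```r
library(dplyr)
library(tidyr)  # for replace_na()

# Two-row copy of the question's data
df <- tibble(
  Flowday = as.Date("2023-03-01"),
  Interval = c("01:00", "02:00"),
  Interval_int = 1:2,
  Sequence = c(14, 14),
  DA_RC_AMT = c(18.3, 12.0),
  DA_ASSET_EN = c(20.4, 14.6),
  RT_MVP_DIST = c(NA_real_, NA_real_),
  RT_RAA = c(NA_real_, NA_real_)
)

# Option 1: no placeholder needed; replace NAs in every numeric column
# (the Date and character columns are left untouched)
df_zero <- df |>
  mutate(across(where(is.numeric), \(x) replace_na(x, 0)))

# Option 2: wrap the base-R idiom in an anonymous function so the data
# frame gets a real name (d) instead of magrittr's `.`
df_zero2 <- df |>
  (\(d) replace(d, is.na(d), 0))()

# The question's rowwise total now works without na.rm
totals <- df_zero |>
  rowwise() |>
  mutate(total = sum(c_across(5:8))) |>
  ungroup()
```

Option 1 is the more explicit of the two, since it states which columns are allowed to receive a 0.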
Answer 2
Score: 1
We don't need to replace `NA` with 0, as there is already an `na.rm` argument in both `sum` and the vectorized `rowSums`:
library(dplyr)  # version >= 1.1.0, needed for pick()
df %>%
  mutate(total = rowSums(pick(5:8), na.rm = TRUE))
-output
# A tibble: 5 × 9
Flowday Interval Interval_int Sequence DA_RC_AMT DA_ASSET_EN RT_MVP_DIST RT_RAA total
<date> <chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2023-03-01 01:00 1 14 18.3 20.4 NA NA 38.7
2 2023-03-01 02:00 2 14 12 14.6 NA NA 26.6
3 2023-03-01 03:00 3 14 5.6 6.6 NA NA 12.2
4 2023-03-01 04:00 4 14 8.3 3 NA NA 11.3
5 2023-03-01 05:00 5 14 11.5 15.9 NA NA 27.4
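On the second question (index or name?): selecting by name is usually the safer choice, because `5:8` silently picks different columns if the file's column order ever changes. A minimal sketch, assuming dplyr >= 1.1.0 and assuming that every column to be totalled starts with `DA_` or `RT_` (true for the sample, not verified for the full 44-variable file); `total_by_index()` and `total_by_name()` are hypothetical helper names:

```r
library(dplyr)  # pick() requires dplyr >= 1.1.0

# Two-row copy of the question's data
df <- tibble(
  Flowday = as.Date("2023-03-01"),
  Interval = c("01:00", "02:00"),
  Interval_int = 1:2,
  Sequence = c(14, 14),
  DA_RC_AMT = c(18.3, 12.0),
  DA_ASSET_EN = c(20.4, 14.6),
  RT_MVP_DIST = c(NA_real_, NA_real_),
  RT_RAA = c(NA_real_, NA_real_)
)

# Hypothetical helpers: the same total, selected by position vs. by name
total_by_index <- function(d) {
  d |> mutate(total = rowSums(pick(5:8), na.rm = TRUE))
}
total_by_name <- function(d) {
  d |> mutate(total = rowSums(pick(starts_with(c("DA_", "RT_"))), na.rm = TRUE))
}

# Both agree on the original column order
total_by_index(df)$total
total_by_name(df)$total

# Move Sequence to the end: positions 5:8 now drop DA_RC_AMT
# and pick up Sequence instead, without any error or warning
reordered <- df |> relocate(Sequence, .after = last_col())
total_by_index(reordered)$total  # wrong totals
total_by_name(reordered)$total   # same totals as before
```

One side effect of `na.rm = TRUE` worth knowing: a row whose selected values are all `NA` gets a total of 0, not `NA`.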