Sum up columns until the cell in adjacent column is NA and then divide each of them by the preceding column [SOLVED]
Question
I'm trying to add up columns in a data table until the adjacent column has a value of NA. For example, if I had the table:
DATE | 1 | 2 | 3 |
---|---|---|---|
07/01/2022 | 2 | 12 | 12 |
08/01/2022 | 1 | 1 | NA |
09/01/2022 | 3 | NA | NA |
I would add up only rows 1 and 2 in column 1, because row 3 in column 2 is NA, and only row 1 in column 2, because row 2 in column 3 is NA. I would then divide each of those columns by the preceding column. I'd also want to ignore the date. Is there a way to do this without hard coding it? This would be the desired end row:
TOTAL | 3 | 12 | 12 |
---|---|---|---|
I tried using "adorn_table" and excluding NAs, but that didn't seem to do it.
Answer 1
Score: 2
For the original question, try:
library(tidyverse)

# create made-up data
df <- tibble(
  x = c(2, 1, 3),
  y = c(12, 1, NA)
)

df |>
  filter(cumsum(is.na(y)) == 0) |> # drop every row from the first NA in y onward
  summarize(                       # add up the remaining values of x
    s = sum(x)
  ) |>
  pull(s)
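To see why the filter condition works, it can help to look at the intermediate vectors. This is only an illustrative sketch in base R; the vector `y` below is made up (it has an extra non-NA value after the NA) and is not part of the original data.

```r
# Hypothetical input: one NA followed by a non-NA value
y <- c(12, 1, NA, 5)

na_flags <- is.na(y)          # FALSE FALSE TRUE FALSE
na_seen  <- cumsum(na_flags)  # 0 0 1 1: counts the NAs seen so far
keep     <- na_seen == 0      # TRUE TRUE FALSE FALSE

keep
```

Once the first NA has been seen, the running count never returns to zero, so every later row is dropped even if its value is not NA.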
For the clarified question, I'm assuming the number of columns is arbitrary.
library(tidyverse)

# create a function to do the work
f <- function(.x, .y) {
  keep_rows <- cumsum(is.na(.y)) == 0
  sum(.x[keep_rows])
}

# create made-up data
df <- tibble(
  date = 1:3, # dates in your case, I'm keeping it simple
  x = c(2, 1, 3),
  y = c(12, 1, NA),
  z = c(12, NA, NA)
)

total_row <- map2(
  df |> select(x:z),               # select the columns being summed
  df |> select(y, z1 = z, z2 = z), # if you want the last column to depend on itself, you'll need to do something like this
  f
)

bind_rows(
  df |> mutate(date = as.character(date)),
  c(date = "Total", total_row)
)
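The question also mentions dividing by the preceding column, which the code above does not do. Here is a minimal sketch of that step, assuming it means dividing each column's total by the total of the column to its left. That reading is an assumption, and the `totals` vector is typed in by hand from the example rather than computed.

```r
# Totals from the example (x, y, z stand in for columns 1, 2, 3)
totals <- c(x = 3, y = 12, z = 12)

# Divide each total by the total of the column to its left.
# The first column has no predecessor, so it gets no ratio.
ratios <- totals[-1] / totals[-length(totals)]

ratios  # y = 4, z = 1
```

The result keeps the names of the numerator columns, so each ratio is labelled with the column that was divided.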
Answer 2
Score: 0
1) Define a function sumToNA which sums x up to the position just before the first NA in y. If y has no NAs, it sums all of x.
Then mapply that function over DF without its first column and DF without its first two columns but with the last column repeated. No packages are used.
sumToNA <- function(x, y) sum(x[!cummax(is.na(y))], na.rm = TRUE)
n <- ncol(DF)
rbind(DF, c("Total", mapply(sumToNA, DF[2:n], DF[c(3:n, n)])))
##         DATE 1    2    3
## 1 07/01/2022 2   12   12
## 2 08/01/2022 1    1 <NA>
## 3 09/01/2022 3 <NA> <NA>
## 4      Total 3   12   12
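A quick check of sumToNA on a few edge cases may make its behaviour clearer. The input vectors below are made up for illustration; only the function itself comes from the answer.

```r
sumToNA <- function(x, y) sum(x[!cummax(is.na(y))], na.rm = TRUE)

x <- c(2, 1, 3)

# NA in the last position of y: only the first two values of x are summed
stop_at_third <- sumToNA(x, c(12, 1, NA))   # 3

# No NA in y: all of x is summed
no_na <- sumToNA(x, c(12, 1, 5))            # 6

# NA in the first position of y: nothing is kept, so the sum is 0
na_first <- sumToNA(x, c(NA, 1, 5))         # 0
```

cummax turns the logical NA flags into a 0/1 vector that stays at 1 once the first NA is reached, and negating it gives the rows to keep.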
2) Define a function NAfwd which takes a vector and returns it with every element after the first NA set to NA and every remaining non-NA element set to 0. Then apply NAfwd to each column of DF without its first two columns but with the last column repeated, and add the result to DF[-1]. Then use colSums with na.rm = TRUE to sum the columns of that and insert the result as a new row.
NAfwd <- \(z) 0 * Reduce(\(x, y) if (is.na(x)) x else y, z, accumulate = TRUE)
n <- ncol(DF)
rbind(DF, c("Total", colSums(DF[-1] + apply(DF[c(3:n, n)], 2, NAfwd), na.rm = TRUE)))
##         DATE 1    2    3
## 1 07/01/2022 2   12   12
## 2 08/01/2022 1    1 <NA>
## 3 09/01/2022 3 <NA> <NA>
## 4      Total 3   12   12
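The masking trick in NAfwd may be easier to follow on single vectors. This sketch writes the same function with `function` instead of the `\()` shorthand so that it also runs on older R versions; the test vectors are made up.

```r
NAfwd <- function(z) {
  # Carry the first NA forward, then multiply by 0 so non-NA values become 0
  0 * Reduce(function(x, y) if (is.na(x)) x else y, z, accumulate = TRUE)
}

mask_a <- NAfwd(c(12, 1, NA))   # 0 0 NA
mask_b <- NAfwd(c(12, NA, 5))   # 0 NA NA: the 5 after the NA is masked too
mask_c <- NAfwd(c(4, 5, 6))     # 0 0 0: nothing is masked

# Adding the mask to x leaves kept values unchanged and turns the rest into NA,
# so a sum with na.rm = TRUE only counts the kept values.
x <- c(2, 1, 3)
masked_sum <- sum(x + mask_a, na.rm = TRUE)   # 3
```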
Note
DF <- data.frame(
  DATE = c("07/01/2022", "08/01/2022", "09/01/2022"),
  `1` = c(2L, 1L, 3L),
  `2` = c(12L, 1L, NA),
  `3` = c(12L, NA, NA),
  check.names = FALSE
)
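One side effect of the rbind calls above is that binding a character vector converts every column to character, which is why the NAs print as <NA>. If the value columns should stay numeric, one possible variation (a sketch, not part of the original answer) is to build the total row as a one-row data frame first. The names `totals`, `total_row` and `result` are introduced here for illustration.

```r
DF <- data.frame(
  DATE = c("07/01/2022", "08/01/2022", "09/01/2022"),
  `1` = c(2L, 1L, 3L),
  `2` = c(12L, 1L, NA),
  `3` = c(12L, NA, NA),
  check.names = FALSE
)

sumToNA <- function(x, y) sum(x[!cummax(is.na(y))], na.rm = TRUE)
n <- ncol(DF)
totals <- mapply(sumToNA, DF[2:n], DF[c(3:n, n)])

# One-row data frame with the same column names as DF,
# so rbind matches columns by name and keeps their types
total_row <- setNames(data.frame("Total", t(totals)), names(DF))
result <- rbind(DF, total_row)

str(result)
```

With this variation the totals can be used directly in later arithmetic, such as the division step from the question, without converting back from character.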