英文:
Given a column name, extracting last non-NA value
问题
以下是您要翻译的内容:
"对于以下数据集 df
,我希望提供列名并返回该列的最后一个非NA
值:
日期 cumul_val1 cumul_val2 month_val1 month_val2
1 2020-05-31 48702.97 45919.59 NA NA
2 2020-06-30 69403.68 62780.21 20700.71 16860.62
3 2020-07-31 83631.36 75324.61 14227.68 12544.40
4 2020-08-31 98485.95 88454.14 14854.59 13129.53
5 2020-09-30 117072.67 103484.20 18586.72 15030.06
6 2020-10-31 133293.80 116555.76 16221.13 13071.56
7 2020-11-30 150834.45 129492.36 17540.65 12936.60
8 2020-12-31 176086.22 141442.95 25251.77 11950.59
9 2021-02-28 NA 13985.87 NA 13985.87
10 2021-03-31 NA NA NA 13589.95
11 2021-04-30 NA NA NA 12663.94
12 2021-05-31 NA NA NA 14078.32
这意味着我们可以实现类似以下的内容,但无需传递特定日期值:
df[df$date == '2020-12-31', "cumul_val1"]
[1] 176086.2
df[df$date == '2021-02-28', "cumul_val2"]
[1] 13985.87
df[df$date == '2020-12-31', "month_val1"]
[1] 25251.77
df[df$date == '2021-05-31', "month_val2"]
[1] 14078.32
请问如何实现它?谢谢。"
数据:
df <- 结构(list(date = c("2020-05-31", "2020-06-30", "2020-07-31",
"2020-08-31", "2020-09-30", "2020-10-31", "2020-11-30", "2020-12-31",
"2021-02-28", "2021-03-31", "2021-04-30", "2021-05-31"), cumul_val1 = c(48702.97,
69403.68, 83631.36, 98485.95, 117072.67, 133293.8, 150834.45,
176086.22, NA, NA, NA, NA), cumul_val2 = c(45919.59, 62780.21,
75324.61, 88454.14, 103484.2, 116555.76, 129492.36, 141442.95,
13985.87, NA, NA, NA), month_val1 = c(NA, 20700.71, 14227.68,
14854.59, 18586.72, 16221.13, 17540.65, 25251.77, NA, NA, NA,
NA), month_val2 = c(NA, 16860.62, 12544.4, 13129.53, 15030.06,
13071.56, 12936.6, 11950.59, 13985.87, 13589.95, 12663.94, 14078.32
)), class = "data.frame", row.names = c(NA, -12L))
英文:
For the following data set df
, I hope to give the column name and return the last non-NA
value of that column:
date cumul_val1 cumul_val2 month_val1 month_val2
1 2020-05-31 48702.97 45919.59 NA NA
2 2020-06-30 69403.68 62780.21 20700.71 16860.62
3 2020-07-31 83631.36 75324.61 14227.68 12544.40
4 2020-08-31 98485.95 88454.14 14854.59 13129.53
5 2020-09-30 117072.67 103484.20 18586.72 15030.06
6 2020-10-31 133293.80 116555.76 16221.13 13071.56
7 2020-11-30 150834.45 129492.36 17540.65 12936.60
8 2020-12-31 176086.22 141442.95 25251.77 11950.59
9 2021-02-28 NA 13985.87 NA 13985.87
10 2021-03-31 NA NA NA 13589.95
11 2021-04-30 NA NA NA 12663.94
12 2021-05-31 NA NA NA 14078.32
This means we can implement something like this, but without passing specific date values:
> df[df$date == '2020-12-31', "cumul_val1"]
[1] 176086.2
> df[df$date == '2021-02-28', "cumul_val2"]
[1] 13985.87
> df[df$date == '2020-12-31', "month_val1"]
[1] 25251.77
> df[df$date == '2021-05-31', "month_val2"]
[1] 14078.32
May I ask how to achieve it? Thanks.
Data:
df <- structure(list(date = c("2020-05-31", "2020-06-30", "2020-07-31",
"2020-08-31", "2020-09-30", "2020-10-31", "2020-11-30", "2020-12-31",
"2021-02-28", "2021-03-31", "2021-04-30", "2021-05-31"), cumul_val1 = c(48702.97,
69403.68, 83631.36, 98485.95, 117072.67, 133293.8, 150834.45,
176086.22, NA, NA, NA, NA), cumul_val2 = c(45919.59, 62780.21,
75324.61, 88454.14, 103484.2, 116555.76, 129492.36, 141442.95,
13985.87, NA, NA, NA), month_val1 = c(NA, 20700.71, 14227.68,
14854.59, 18586.72, 16221.13, 17540.65, 25251.77, NA, NA, NA,
NA), month_val2 = c(NA, 16860.62, 12544.4, 13129.53, 15030.06,
13071.56, 12936.6, 11950.59, 13985.87, 13589.95, 12663.94, 14078.32
)), class = "data.frame", row.names = c(NA, -12L))
答案1
得分: 2
library(tidyverse)
get_last <- function(df, column_name) {
df %>%
pull(!!sym(column_name)) %>%
na.omit() %>%
last()
}
get_last(df, "cumul_val1")
[1] 176086.2
OR
df %>%
pivot_longer(-date) %>%
group_by(name) %>%
drop_na() %>%
slice_tail(n = 1)
# A tibble: 4 x 3
# Groups: name [4]
date name value
<chr> <chr> <dbl>
1 2020-12-31 cumul_val1 176086.
2 2021-02-28 cumul_val2 13986.
3 2020-12-31 month_val1 25252.
4 2021-05-31 month_val2 14078.
英文:
library(tidyverse)
get_last <- function(df, column_name) {
df %>%
pull(!!sym(column_name)) %>%
na.omit() %>%
last()
}
get_last(df, "cumul_val1")
[1] 176086.2
OR
df %>%
pivot_longer(-date) %>%
group_by(name) %>%
drop_na() %>%
slice_tail(n = 1)
# A tibble: 4 x 3
# Groups: name [4]
date name value
<chr> <chr> <dbl>
1 2020-12-31 cumul_val1 176086.
2 2021-02-28 cumul_val2 13986.
3 2020-12-31 month_val1 25252.
4 2021-05-31 month_val2 14078.
答案2
得分: 2
A data.table
approach
library(data.table)
# 将数据框转换为 data.table
setDT(df)
# 将数据框变形为长格式,按变量获取最大日期对应的数值
melt(df, id.vars = "date")[!is.na(value), .(last_val = value[date == max(date)]), by = variable]
# variable last_val
# 1: cumul_val1 176086.22
# 2: cumul_val2 13985.87
# 3: month_val1 25251.77
# 4: month_val2 14078.32
英文:
A data.table
approach
library(data.table)
# set to data.table
setDT(df)
# melt to long format, get max data/value by variable
melt(df, id.vars = "date")[!is.na(value), .(last_val = value[date == max(date)]), by = variable]
# variable last_val
# 1: cumul_val1 176086.22
# 2: cumul_val2 13985.87
# 3: month_val1 25251.77
# 4: month_val2 14078.32
答案3
得分: 1
在基本的R中:
last_complete <- function(df, col) tail(df[[col]][!is.na(df[[col]])], 1)
last_complete(df, "cumul_val1")
#[1] 176086.2
last_complete(df, "month_val1")
#[1] 25251.77
英文:
In base R:
last_complete <- function(df, col) tail(df[[col]][!is.na(df[[col]])], 1)
last_complete(df, "cumul_val1")
#[1] 176086.2
last_complete(df, "month_val1")
#[1] 25251.77
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论