Given a column name, extract the last non-NA value

Question

For the data frame df below, I'd like to pass a column name and get back the last non-NA value in that column:

```
         date cumul_val1 cumul_val2 month_val1 month_val2
1  2020-05-31   48702.97   45919.59         NA         NA
2  2020-06-30   69403.68   62780.21   20700.71   16860.62
3  2020-07-31   83631.36   75324.61   14227.68   12544.40
4  2020-08-31   98485.95   88454.14   14854.59   13129.53
5  2020-09-30  117072.67  103484.20   18586.72   15030.06
6  2020-10-31  133293.80  116555.76   16221.13   13071.56
7  2020-11-30  150834.45  129492.36   17540.65   12936.60
8  2020-12-31  176086.22  141442.95   25251.77   11950.59
9  2021-02-28         NA   13985.87         NA   13985.87
10 2021-03-31         NA         NA         NA   13589.95
11 2021-04-30         NA         NA         NA   12663.94
12 2021-05-31         NA         NA         NA   14078.32
```

In other words, I want the equivalent of the following, but without having to pass specific date values:

```r
> df[df$date == '2020-12-31', "cumul_val1"]
[1] 176086.2
> df[df$date == '2021-02-28', "cumul_val2"]
[1] 13985.87
> df[df$date == '2020-12-31', "month_val1"]
[1] 25251.77
> df[df$date == '2021-05-31', "month_val2"]
[1] 14078.32
```

How can I achieve this? Thanks.

Data:

```r
df <- structure(list(date = c("2020-05-31", "2020-06-30", "2020-07-31",
"2020-08-31", "2020-09-30", "2020-10-31", "2020-11-30", "2020-12-31",
"2021-02-28", "2021-03-31", "2021-04-30", "2021-05-31"), cumul_val1 = c(48702.97,
69403.68, 83631.36, 98485.95, 117072.67, 133293.8, 150834.45,
176086.22, NA, NA, NA, NA), cumul_val2 = c(45919.59, 62780.21,
75324.61, 88454.14, 103484.2, 116555.76, 129492.36, 141442.95,
13985.87, NA, NA, NA), month_val1 = c(NA, 20700.71, 14227.68,
14854.59, 18586.72, 16221.13, 17540.65, 25251.77, NA, NA, NA,
NA), month_val2 = c(NA, 16860.62, 12544.4, 13129.53, 15030.06,
13071.56, 12936.6, 11950.59, 13985.87, 13589.95, 12663.94, 14078.32
)), class = "data.frame", row.names = c(NA, -12L))
```
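A minimal base R sketch of the lookup described above: instead of hard-coding a date, it finds the row of the last non-NA entry with which() and then reuses the same df[row, col] indexing as the examples. It assumes the rows are already in date order (as in the question). The helper name last_non_na_row and the trimmed copy of df (rows 8 to 12 of the question's data) are illustrative assumptions, not part of the question.

```r
# Trimmed copy of rows 8-12 of the question's df (illustrative subset)
df <- data.frame(
  date       = c("2020-12-31", "2021-02-28", "2021-03-31", "2021-04-30", "2021-05-31"),
  cumul_val1 = c(176086.22, NA, NA, NA, NA),
  cumul_val2 = c(141442.95, 13985.87, NA, NA, NA),
  month_val1 = c(25251.77, NA, NA, NA, NA),
  month_val2 = c(11950.59, 13985.87, 13589.95, 12663.94, 14078.32)
)

# Hypothetical helper: the date and value of the last non-NA entry in `col`
last_non_na_row <- function(df, col) {
  idx <- which(!is.na(df[[col]]))     # row positions that hold a value
  if (length(idx) == 0) return(NULL)  # the column is entirely NA
  df[max(idx), c("date", col)]        # same df[row, col] indexing as in the question
}

last_non_na_row(df, "cumul_val1")
last_non_na_row(df, "month_val2")
```

Returning the date alongside the value makes it easy to check which row was picked, which the hard-coded date examples above make explicit.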

Answer 1

Score: 2

```r
library(tidyverse)

get_last <- function(df, column_name) {
  df %>%
    pull(!!sym(column_name)) %>%
    na.omit() %>%
    last()
}

get_last(df, "cumul_val1")
#> [1] 176086.2
```

OR

```r
df %>%
  pivot_longer(-date) %>%
  group_by(name) %>%
  drop_na() %>%
  slice_tail(n = 1)
#> # A tibble: 4 x 3
#> # Groups:   name [4]
#>   date       name         value
#>   <chr>      <chr>        <dbl>
#> 1 2020-12-31 cumul_val1 176086.
#> 2 2021-02-28 cumul_val2  13986.
#> 3 2020-12-31 month_val1  25252.
#> 4 2021-05-31 month_val2  14078.
```
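The pivot_longer() version returns every column at once. A base R sketch of the same "all columns" result, assuming every column other than date is numeric; the helper name last_value and the trimmed copy of df are illustrative, not from the answer.

```r
# Trimmed copy of rows 8-12 of the question's df (illustrative subset)
df <- data.frame(
  date       = c("2020-12-31", "2021-02-28", "2021-03-31", "2021-04-30", "2021-05-31"),
  cumul_val1 = c(176086.22, NA, NA, NA, NA),
  cumul_val2 = c(141442.95, 13985.87, NA, NA, NA),
  month_val1 = c(25251.77, NA, NA, NA, NA),
  month_val2 = c(11950.59, 13985.87, 13589.95, 12663.94, 14078.32)
)

# Last non-NA element of a vector, or NA_real_ when every element is NA
last_value <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) NA_real_ else x[length(x)]
}

# One result per value column; vapply enforces a single number per column
value_cols <- setdiff(names(df), "date")
last_vals <- vapply(df[value_cols], last_value, numeric(1))
last_vals
```

One behavioural difference: with drop_na(), a column that is entirely NA has no rows left and silently disappears from the tibble, whereas this sketch keeps it in the named vector with an NA.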

Answer 2

Score: 2

A data.table approach

```r
library(data.table)

# convert df to a data.table (setDT modifies df in place, by reference)
setDT(df)

# melt to long format, then take the value at the max date within each variable
melt(df, id.vars = "date")[!is.na(value), .(last_val = value[date == max(date)]), by = variable]
#      variable  last_val
# 1: cumul_val1 176086.22
# 2: cumul_val2  13985.87
# 3: month_val1  25251.77
# 4: month_val2  14078.32
```
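This answer selects by the largest date rather than by row position. With the question's character dates, max(date) works because ISO 8601 "YYYY-MM-DD" strings sort in chronological order. A base R sketch of the same date-based idea, assuming the dates parse with as.Date(); it also copes with rows that are not stored in date order. The helper latest_value and the shuffled copy of rows 8 to 12 are illustrative assumptions.

```r
# Rows 8-12 of the question's df, deliberately stored out of date order
df <- data.frame(
  date       = c("2021-02-28", "2021-05-31", "2020-12-31", "2021-04-30", "2021-03-31"),
  cumul_val1 = c(NA, NA, 176086.22, NA, NA),
  cumul_val2 = c(13985.87, NA, 141442.95, NA, NA),
  month_val1 = c(NA, NA, 25251.77, NA, NA),
  month_val2 = c(13985.87, 14078.32, 11950.59, 12663.94, 13589.95)
)

dates <- as.Date(df$date)  # parse "YYYY-MM-DD" strings into Date objects

# Value at the latest date among the non-NA entries of x; NA_real_ if none
latest_value <- function(x, dates) {
  ok <- !is.na(x) & !is.na(dates)
  if (!any(ok)) return(NA_real_)
  x[ok][which.max(dates[ok])]
}

value_cols <- setdiff(names(df), "date")
latest <- vapply(df[value_cols], latest_value, numeric(1), dates = dates)
latest
```

On this shuffled copy, a position-based "last value" would return 141442.95 for cumul_val2 and 13589.95 for month_val2, i.e. whatever happens to be stored last, while the date-based version still returns the values from 2021-02-28 and 2021-05-31.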

Answer 3

Score: 1

In base R:

```r
last_complete <- function(df, col) tail(df[[col]][!is.na(df[[col]])], 1)

last_complete(df, "cumul_val1")
#[1] 176086.2
last_complete(df, "month_val1")
#[1] 25251.77
```
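Two edge cases with last_complete(): if a column has no non-NA values, tail() returns a zero-length vector (numeric(0)) rather than NA, and a misspelled column name silently yields NULL. A hedged variant that guards both; the name last_complete_or_na and the all-NA column empty are hypothetical additions for illustration.

```r
# Trimmed copy of rows 8-12 of the question's df, plus a hypothetical all-NA column
df <- data.frame(
  date       = c("2020-12-31", "2021-02-28", "2021-03-31", "2021-04-30", "2021-05-31"),
  cumul_val1 = c(176086.22, NA, NA, NA, NA),
  month_val2 = c(11950.59, 13985.87, 13589.95, 12663.94, 14078.32),
  empty      = NA_real_
)

last_complete_or_na <- function(df, col) {
  stopifnot(col %in% names(df))  # fail loudly on a misspelled column name
  x <- df[[col]]
  x <- x[!is.na(x)]
  # indexing a zero-length vector with NA_integer_ gives one NA of the same type
  if (length(x) == 0) x[NA_integer_] else x[length(x)]
}

last_complete_or_na(df, "cumul_val1")
last_complete_or_na(df, "empty")
```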

huangapple
  • This article was published on February 8, 2023 at 19:34:01
  • Please keep the link to this article when reposting: https://go.coder-hub.com/75385197.html