Bind or merge rows by duplicate dates in R
Question
I have a data frame with 6 columns like this. You can see that the dates are duplicated. How can I merge the rows and keep the information in the other columns?
date 1 2 3 4 5
2019-01-01 NA 1966439. NA NA NA
2019-01-01 NA NA NA 133.6 NA
2019-01-01 NA NA NA NA 6.2
2019-02-01 NA 1962946 NA NA NA
2019-02-01 NA NA NA 134.5 NA
2019-02-01 NA NA NA NA 6.1
2019-03-01 NA 1974072 NA NA NA
2019-03-01 NA NA NA 135.4 NA
2019-03-01 NA NA NA NA 6.3
2019-04-01 NA 1984086 NA NA NA
I want something like this, with no duplicates on the dates.
date 1 2 3 4 5
2019-01-01 NA 1966439 NA 133.6 6.2
2019-02-01 NA 1962946 NA 134.5 6.1
2019-03-01 NA 1974072 NA 135.4 6.3
2019-04-01 NA 1984086 NA NA NA
Thank you very much.
Answer 1
Score: 1
I pasted a solution below.
Hopefully my comments explain the solution well enough.
# Packages used
library(dplyr)

# Some reproducible data
dta <- data.frame(
  date = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  a = c(NA, NA, NA, NA, NA, NA, NA, NA, NA),
  x = c(123, NA, NA, 3456, NA, NA, 2345, NA, NA),
  y = c(NA, 123, NA, NA, 3456, NA, NA, 2345, NA),
  z = c(NA, NA, 123, NA, NA, 3456, NA, NA, 2345)
)

dta <- dta |>
  group_by(date) |> # group by the dates
  dplyr::summarise(
    # Collapse each group to its single value (min(), mean(), etc. give the
    # same result when a group has exactly one non-missing value).
    # Note: with na.rm = TRUE, an all-NA group sums to 0, not NA.
    a = sum(a, na.rm = TRUE),
    x = sum(x, na.rm = TRUE),
    y = sum(y, na.rm = TRUE),
    z = sum(z, na.rm = TRUE)
  ) |>
  select_if(~ sum(.) > 0) # remove columns with a sum of 0 (columns with all NA)
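One caveat with sum(..., na.rm = TRUE) is that a date with no observed value in a column comes back as 0 instead of NA, while the desired output in the question keeps those NA values. A minimal sketch of a variant that keeps them, assuming there is at most one non-missing value per date and column (first_non_na is a hypothetical helper name, not part of dplyr, and the data is shortened so that date 3 has missing values):

```r
library(dplyr)

# Shortened example data: date 3 has no value for y or z
dta <- data.frame(
  date = c(1, 1, 1, 2, 2, 2, 3),
  a = NA,
  x = c(123, NA, NA, 3456, NA, NA, 2345),
  y = c(NA, 123, NA, NA, 3456, NA, NA),
  z = c(NA, NA, 123, NA, NA, 3456, NA)
)

# Hypothetical helper: first non-missing value, or NA if the group has none
first_non_na <- function(v) {
  v <- v[!is.na(v)]
  if (length(v) == 0) NA else v[1]
}

res <- dta |>
  group_by(date) |>
  summarise(across(everything(), first_non_na))

res
```

Here across(everything(), ...) applies the helper to every non-grouping column, so the columns do not have to be listed by name, and the all-NA column a is kept rather than dropped.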
Answer 2
Score: 1
If each column has at most one non-missing value per date, the following code works:
library(tidyverse)

df <- tibble::tribble(
  ~date,        ~col1, ~col2,   ~col3, ~col4, ~col5,
  "2019-01-01", NA,    1966439, NA,    NA,    NA,
  "2019-01-01", NA,    NA,      NA,    133.6, NA,
  "2019-01-01", NA,    NA,      NA,    NA,    6.2,
  "2019-02-01", NA,    1962946, NA,    NA,    NA,
  "2019-02-01", NA,    NA,      NA,    134.5, NA,
  "2019-02-01", NA,    NA,      NA,    NA,    6.1,
  "2019-03-01", NA,    1974072, NA,    NA,    NA,
  "2019-03-01", NA,    NA,      NA,    135.4, NA,
  "2019-03-01", NA,    NA,      NA,    NA,    6.3,
  "2019-04-01", NA,    1984086, NA,    NA,    NA
)

# Return NA if every value is missing, otherwise drop the missing values
remove_na <- function(x) {
  if (all(is.na(x))) return(NA)
  discard(x, is.na)
}

df |>
  group_by(date) |>
  summarize(across(starts_with("col"), remove_na))
#> # A tibble: 4 × 6
#> date col1 col2 col3 col4 col5
#> <chr> <lgl> <dbl> <lgl> <dbl> <dbl>
#> 1 2019-01-01 NA 1966439 NA 134. 6.2
#> 2 2019-02-01 NA 1962946 NA 134. 6.1
#> 3 2019-03-01 NA 1974072 NA 135. 6.3
#> 4 2019-04-01 NA 1984086 NA NA NA
Created on 2023-03-03 with reprex v2.0.2
Please include some code that generates the dataset (like I did here) next time you post a question!
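Since the approach above relies on each column having at most one non-missing value per date, it may be worth checking that assumption before collapsing. A rough sketch of such a check, using a small made-up tibble (not the question's data) in which one date deliberately has two values in col1:

```r
library(dplyr)

# Made-up data: "2019-01-01" has two non-missing values in col1
df <- tibble::tribble(
  ~date,        ~col1, ~col2,
  "2019-01-01", 1,     NA,
  "2019-01-01", 2,     5,
  "2019-02-01", NA,    6
)

# Count the non-missing values per date and column
counts <- df |>
  group_by(date) |>
  summarise(across(starts_with("col"), ~ sum(!is.na(.x))))

# Keep only the dates where some column has more than one non-missing value
conflicts <- counts |>
  filter(if_any(starts_with("col"), ~ .x > 1))

conflicts
```

If conflicts has zero rows, every date/column pair has at most one value and the rows can be merged safely; otherwise you have to decide how to combine the competing values (for example with sum() or mean()).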