Bind or merge rows by duplicate dates in R

Question

I have a data frame with 6 columns like this. As you can see, the dates are duplicated. How can I merge the rows and keep the information in the other columns?

date         1    2         3    4       5
2019-01-01   NA   1966439   NA   NA      NA
2019-01-01   NA   NA        NA   133.6   NA
2019-01-01   NA   NA        NA   NA      6.2
2019-02-01   NA   1962946   NA   NA      NA
2019-02-01   NA   NA        NA   134.5   NA
2019-02-01   NA   NA        NA   NA      6.1
2019-03-01   NA   1974072   NA   NA      NA
2019-03-01   NA   NA        NA   135.4   NA
2019-03-01   NA   NA        NA   NA      6.3
2019-04-01   NA   1984086   NA   NA      NA

I want something like this, with no duplicates on the dates.

date         1    2         3    4       5
2019-01-01   NA   1966439   NA   133.6   6.2
2019-02-01   NA   1962946   NA   134.5   6.1
2019-03-01   NA   1974072   NA   135.4   6.3
2019-04-01   NA   1984086   NA   NA      NA

Thank you very much.

Answer 1

Score: 1

I have pasted a solution below.
Hopefully the comments explain it well enough.

# Packages used
library(dplyr)

# Some reproducible data
dta <- data.frame(
  date = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  a = c(NA, NA, NA, NA, NA, NA, NA, NA, NA),
  x = c(123, NA, NA, 3456, NA, NA, 2345, NA, NA),
  y = c(NA, 123, NA, NA, 3456, NA, NA, 2345, NA),
  z = c(NA, NA, 123, NA, NA, 3456, NA, NA, 2345)
)

dta <- dta |>
  group_by(date) |> # group by the dates
  # Each group has a single non-NA value per column, so sum() just returns it
  # (min(), mean(), etc. work just as well)
  summarise(
    a = sum(a, na.rm = TRUE),
    x = sum(x, na.rm = TRUE),
    y = sum(y, na.rm = TRUE),
    z = sum(z, na.rm = TRUE)
  ) |>
  select_if(~ sum(.) > 0) # drop columns whose sum is 0 (columns that were all NA)
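
One caveat with the `sum()` approach above: for a group where a column is entirely NA, `sum(x, na.rm = TRUE)` returns 0 rather than NA, and the final step drops all-NA columns, whereas the question's desired output keeps them. The following is a minimal sketch that keeps NA in both cases, assuming there is at most one non-missing value per date and column. `first_non_na` is a hypothetical helper name, and the data is a shortened variant of the example above.

```r
library(dplyr)

# Shortened example data: date 3 has no value for y, and column a is all NA
dta <- data.frame(
  date = c(1, 1, 1, 2, 2, 2, 3),
  a = c(NA, NA, NA, NA, NA, NA, NA),
  x = c(123, NA, NA, 3456, NA, NA, 2345),
  y = c(NA, 123, NA, NA, 3456, NA, NA)
)

# Hypothetical helper: first non-missing value, or NA if there is none
# (indexing an empty vector with [1] yields NA of the same type)
first_non_na <- function(x) {
  x[!is.na(x)][1]
}

res <- dta |>
  group_by(date) |>
  summarise(across(everything(), first_non_na))
```

Here `across(everything(), ...)` applies the helper to every non-grouping column, so the column names do not have to be listed one by one.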

Answer 2

Score: 1

If each column has at most one non-missing value per date, the following code works:

library(tidyverse)

df <- tibble::tribble(
  ~date,        ~col1, ~col2,   ~col3, ~col4, ~col5,
  "2019-01-01", NA,    1966439, NA,    NA,    NA,
  "2019-01-01", NA,    NA,      NA,    133.6, NA,
  "2019-01-01", NA,    NA,      NA,    NA,    6.2,
  "2019-02-01", NA,    1962946, NA,    NA,    NA,
  "2019-02-01", NA,    NA,      NA,    134.5, NA,
  "2019-02-01", NA,    NA,      NA,    NA,    6.1,
  "2019-03-01", NA,    1974072, NA,    NA,    NA,
  "2019-03-01", NA,    NA,      NA,    135.4, NA,
  "2019-03-01", NA,    NA,      NA,    NA,    6.3,
  "2019-04-01", NA,    1984086, NA,    NA,    NA
)

remove_na <- function(x) {
  if (all(is.na(x))) return(NA)
  discard(x, is.na)
}

df |>
  group_by(date) |>
  summarize(across(starts_with("col"), remove_na))
#> # A tibble: 4 × 6
#>   date       col1     col2 col3   col4  col5
#>   <chr>      <lgl>   <dbl> <lgl> <dbl> <dbl>
#> 1 2019-01-01 NA    1966439 NA     134.   6.2
#> 2 2019-02-01 NA    1962946 NA     134.   6.1
#> 3 2019-03-01 NA    1974072 NA     135.   6.3
#> 4 2019-04-01 NA    1984086 NA      NA   NA

Created on 2023-03-03 with reprex v2.0.2

Please include some code that generates the dataset (like I did here) next time you post a question!
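
The answer above depends on the stated condition. If a group held several non-missing values in one column, `remove_na()` would return more than one value for that group, and the result would no longer be one row per date (or the call may fail). As a hedged sketch, not part of the original answer, the check can be made explicit so that ambiguous groups raise an error instead. `single_value` is a hypothetical helper name, and the data frame is a reduced two-column version of the example.

```r
library(dplyr)

# Reduced example data with one non-missing value per date and column
df <- data.frame(
  date = c("2019-01-01", "2019-01-01", "2019-02-01", "2019-02-01"),
  col1 = c(1966439, NA, 1962946, NA),
  col2 = c(NA, 133.6, NA, 134.5)
)

# Hypothetical helper: return the single distinct non-missing value,
# NA if there is none, and stop if the group is ambiguous
single_value <- function(x) {
  vals <- unique(x[!is.na(x)])
  if (length(vals) > 1) {
    stop("more than one distinct non-missing value in a group")
  }
  vals[1]
}

res <- df |>
  group_by(date) |>
  summarise(across(starts_with("col"), single_value))
```

Failing loudly is a design choice here. If picking the first value or averaging is acceptable for the data at hand, the `stop()` branch can be replaced accordingly.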

huangapple
  • Posted on 2023-03-04 03:07:33
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75630977.html