2023-03-04 03:07:33

Bind or merge rows by duplicate dates in R

Question

I have a data frame with 6 columns like this. As you can see, the dates are duplicated. How can I merge the rows and keep the information from the other columns?

date        1   2         3   4      5
2019-01-01  NA  1966439.  NA  NA     NA
2019-01-01  NA  NA        NA  133.6  NA
2019-01-01  NA  NA        NA  NA     6.2
2019-02-01  NA  1962946   NA  NA     NA
2019-02-01  NA  NA        NA  134.5  NA
2019-02-01  NA  NA        NA  NA     6.1
2019-03-01  NA  1974072   NA  NA     NA
2019-03-01  NA  NA        NA  135.4  NA
2019-03-01  NA  NA        NA  NA     6.3
2019-04-01  NA  1984086   NA  NA     NA

I want something like this, with no duplicates on the dates.

date        1   2        3   4      5
2019-01-01  NA  1966439  NA  133.6  6.2
2019-02-01  NA  1962946  NA  134.5  6.1
2019-03-01  NA  1974072  NA  135.4  6.3
2019-04-01  NA  1984086  NA  NA     NA

Thank you very much.

Answer 1

Score: 1

I've pasted a solution below.
Hopefully the comments explain it well enough.

# Packages used
library(dplyr)

# Some reproducible data
dta <- data.frame(
  date = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  a = c(NA, NA, NA, NA, NA, NA, NA, NA, NA),
  x = c(123, NA, NA, 3456, NA, NA, 2345, NA, NA),
  y = c(NA, 123, NA, NA, 3456, NA, NA, 2345, NA),
  z = c(NA, NA, 123, NA, NA, 3456, NA, NA, 2345)
)

dta <- dta |>
  group_by(date) |> # group by the dates
  # Collapse each group to its single non-missing value
  # (min(), mean(), etc. work just as well)
  dplyr::summarise(
    a = sum(a, na.rm = TRUE),
    x = sum(x, na.rm = TRUE),
    y = sum(y, na.rm = TRUE),
    z = sum(z, na.rm = TRUE)
  ) |>
  select_if(~ sum(.) > 0) # remove columns whose sum is 0 (columns that were all NA)
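
One caveat with the approach above: sum(x, na.rm = TRUE) returns 0 when every value in a group is NA, so a date with no value in a column ends up with 0 rather than NA, and all-NA columns are dropped. If the NAs should be preserved, as in the expected output in the question, a minimal sketch could look like the following. The helper name sum_or_na and the column names x, y, z are made up for illustration, and the sample rows are a small subset of the question's data.

```r
library(dplyr)

# Small subset of the question's data (column names are illustrative)
dta <- data.frame(
  date = c("2019-01-01", "2019-01-01", "2019-04-01"),
  x = c(NA, NA, NA),              # a column that is NA everywhere
  y = c(1966439, NA, 1984086),
  z = c(NA, 133.6, NA)
)

# Hypothetical helper: sum the non-missing values, but return NA
# (not 0) when a group has no non-missing value at all
sum_or_na <- function(v) {
  if (all(is.na(v))) NA_real_ else sum(v, na.rm = TRUE)
}

res <- dta |>
  group_by(date) |>
  summarise(across(everything(), sum_or_na))  # applies to all non-grouping columns

print(res)
```

Because across(everything(), ...) covers every non-grouping column, the columns do not have to be listed one by one, and no column is dropped afterwards.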

Answer 2

Score: 1

If there is only one non-missing value per date in each column, the following code works:

library(tidyverse)

df <- tibble::tribble(
  ~date,        ~col1, ~col2,   ~col3, ~col4, ~col5,
  "2019-01-01", NA,    1966439, NA,    NA,    NA,
  "2019-01-01", NA,    NA,      NA,    133.6, NA,
  "2019-01-01", NA,    NA,      NA,    NA,    6.2,
  "2019-02-01", NA,    1962946, NA,    NA,    NA,
  "2019-02-01", NA,    NA,      NA,    134.5, NA,
  "2019-02-01", NA,    NA,      NA,    NA,    6.1,
  "2019-03-01", NA,    1974072, NA,    NA,    NA,
  "2019-03-01", NA,    NA,      NA,    135.4, NA,
  "2019-03-01", NA,    NA,      NA,    NA,    6.3,
  "2019-04-01", NA,    1984086, NA,    NA,    NA
)

# Return the non-missing value(s) of x, or NA if every value is missing
remove_na <- function(x) {
  if (all(is.na(x))) return(NA)
  discard(x, is.na)
}

df |>
  group_by(date) |>
  summarize(across(starts_with("col"), remove_na))
#> # A tibble: 4 × 6
#>   date       col1     col2 col3   col4  col5
#>   <chr>      <lgl>   <dbl> <lgl> <dbl> <dbl>
#> 1 2019-01-01 NA    1966439 NA     134.   6.2
#> 2 2019-02-01 NA    1962946 NA     134.   6.1
#> 3 2019-03-01 NA    1974072 NA     135.   6.3
#> 4 2019-04-01 NA    1984086 NA      NA   NA

Created on 2023-03-03 with reprex v2.0.2

Please include some code that generates the dataset (like I did here) next time you post a question!
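
If a date can have more than one non-missing value in a column, remove_na returns several values for that group, so one row per date is no longer guaranteed. A hedged base R sketch that always yields exactly one value per group keeps only the first non-missing one. The helper name first_non_na is hypothetical, the second col2 value for 2019-01-01 is made up to show the case, and whether "first" is the right choice depends on the data.

```r
# Sample data in the shape of the question; the value 1966440 is invented
# to give one date two non-missing values in col2
df <- data.frame(
  date = c("2019-01-01", "2019-01-01", "2019-01-01", "2019-04-01"),
  col1 = c(NA, NA, NA, NA),
  col2 = c(1966439, 1966440, NA, 1984086),
  col4 = c(NA, 133.6, NA, NA)
)

# Hypothetical helper: first non-missing value, or NA if there is none.
# Taking [1] of an empty vector yields an NA of the same type.
first_non_na <- function(x) x[!is.na(x)][1]

# aggregate() applies the helper to every non-date column within each date
res <- aggregate(df[-1], by = list(date = df$date), FUN = first_non_na)

print(res)
```

This version needs no packages, and the result has one row per distinct date regardless of how many non-missing values each group contains.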
