March 4, 2023 01:03:23

How to filter a database in R to keep only the row with the latest date for each individual?

Question

Any help is much appreciated!

I have a large database (more than 1,000 records), and I would like to remove some rows so that only the latest record for each individual's name is kept. I don't know where to start.

Original Database

Name	Year	Weight
John	2021-04-03	203
John	2022-08-02	198
John	2018-08-34	234
Patrick	2014-05-09	176
Patrick	2021-03-09	199
Patrick	2020-09-03	200
Peter	2019-09-05	204
Peter	2017-07-14	209
Peter	2019-10-05	199

Final Database

Name	Year	Weight
John	2022-08-02	198
Patrick	2021-03-09	199
Peter	2019-10-05	199

Answer 1

Score: 1

We could use slice_max(), which keeps the row(s) with the largest Year within each Name:

library(dplyr) # the `by` argument requires dplyr >= 1.1.0
df1 %>%
    slice_max(Year, by = 'Name')

-output

     Name       Year Weight
1    John 2022-08-02    198
2 Patrick 2021-03-09    199
3   Peter 2019-10-05    199

Or, with earlier versions of dplyr:

df1 %>%
   group_by(Name) %>%
   slice_max(Year) %>%
   ungroup()

-output

# A tibble: 3 × 3
  Name    Year       Weight
  <chr>   <chr>       <int>
1 John    2022-08-02    198
2 Patrick 2021-03-09    199
3 Peter   2019-10-05    199
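
One detail worth noting in the output above: Year is a character column (<chr>), so slice_max() is comparing strings rather than dates. That gives the right rows here because the values are written year-month-day, where text order and chronological order coincide. The base R sketch below illustrates the difference; the `dmy` vector is made up for illustration and is not part of the question's data.

```r
# Dates from the question, written year-month-day and stored as text.
iso <- c("2021-04-03", "2022-08-02", "2019-10-05")

# For this format, sorting the text and sorting the parsed dates agree.
same_order <- identical(order(iso), order(as.Date(iso)))

# Hypothetical day/month/year spellings of the same three dates.
dmy <- c("03/04/2021", "02/08/2022", "05/10/2019")

# max() on character compares the strings, not the dates they stand for.
latest_as_text <- max(dmy)

# Parsing first, then taking the maximum, compares actual dates.
latest_as_date <- format(max(as.Date(dmy, format = "%d/%m/%Y")), "%d/%m/%Y")

print(same_order)
print(latest_as_text)
print(latest_as_date)
```

If the dates are stored in some other layout, converting the column with as.Date() and an explicit format before calling slice_max() avoids relying on text order.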

Or in data.table:

library(data.table)
# .I holds the row indices; which.max() picks the latest date within each Name
setDT(df1)[df1[, .I[which.max(as.Date(Year))], Name]$V1]

-output

      Name       Year Weight
1:    John 2022-08-02    198
2: Patrick 2021-03-09    199
3:   Peter 2019-10-05    199

data

df1 <- structure(list(Name = c("John", "John", "John", "Patrick", "Patrick", 
"Patrick", "Peter", "Peter", "Peter"), Year = c("2021-04-03", 
"2022-08-02", "2018-08-34", "2014-05-09", "2021-03-09", "2020-09-03", 
"2019-09-05", "2017-07-14", "2019-10-05"), Weight = c(203L, 198L, 
234L, 176L, 199L, 200L, 204L, 209L, 199L)), 
class = "data.frame", row.names = c(NA, 
-9L))
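
If adding a package is not an option, the same idea can be sketched in base R: sort so that the newest row of each Name comes first, then drop the repeats with duplicated(). This is an alternative to the approaches above, not part of the original answer. The helper column Date and the intermediate objects `valid`, `sorted` and `latest` are names chosen here for illustration. The sketch assumes that rows whose Year cannot be parsed as a date should be ignored.

```r
# Sample data from the question; Year is stored as character.
df1 <- data.frame(
  Name = rep(c("John", "Patrick", "Peter"), each = 3),
  Year = c("2021-04-03", "2022-08-02", "2018-08-34",
           "2014-05-09", "2021-03-09", "2020-09-03",
           "2019-09-05", "2017-07-14", "2019-10-05"),
  Weight = c(203L, 198L, 234L, 176L, 199L, 200L, 204L, 209L, 199L),
  stringsAsFactors = FALSE
)

# Parse the text into Date values; entries that cannot be parsed become NA.
df1$Date <- as.Date(df1$Year, format = "%Y-%m-%d")

# Keep only rows with a usable date.
valid <- df1[!is.na(df1$Date), ]

# Sort by Name, then by date with the newest first.
sorted <- valid[order(valid$Name, -as.numeric(valid$Date)), ]

# duplicated() flags every row after the first per Name, so negating it
# keeps exactly the newest row for each individual.
latest <- sorted[!duplicated(sorted$Name), c("Name", "Year", "Weight")]
rownames(latest) <- NULL

print(latest)
```

Unlike slice_max() with its default settings, this keeps a single row per Name even when two rows share the same latest date.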

