Keep the records by first date occurred and Name in R
Question
I have a data frame with the columns ID, Date, Code and Names. Each ID can have multiple entries on different dates, with the same or different values in the Names column. Below is an example.
ID Date Code Names
1 2010-12-09 1.1.1 Alpha
1 2010-12-15 1.1.1 Alpha
1 2010-12-15 1.1.1 Beta
2 2010-12-09 1.1.1 Beta
2 2010-12-17 1.1.1 Beta
3 2011-02-09 1.1.1 Gamma
3 2011-04-25 1.1.1 Gamma
4 2011-04-25 1.1.1 Tango
For each ID, I want to keep only the row with the earliest date for each value of Names, and delete the later rows that repeat the same name. Below is an example of the resulting data frame.
ID Date Code Names
1 2010-12-09 1.1.1 Alpha
1 2010-12-09 1.1.1 Beta
2 2010-12-09 1.1.1 Beta
3 2011-02-09 1.1.1 Gamma
4 2011-04-25 1.1.1 Tango
Answer 1
Score: 4
You can use slice_min:
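library(dplyr)
# The `by` argument of slice_min() requires dplyr >= 1.1.0.
# Keep the row(s) with the earliest Date within each (ID, Names) group.
slice_min(your_df, Date, by = c(ID, Names))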
# ID Date Code Names
# 1 1 2010-12-09 1.1.1 Alpha
# 2 1 2010-12-15 1.1.1 Beta
# 3 2 2010-12-09 1.1.1 Beta
# 4 3 2011-02-09 1.1.1 Gamma
# 5 4 2011-04-25 1.1.1 Tango
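The call above refers to `your_df`, which is never defined in the answer. A minimal self-contained sketch might look like the following. It assumes the Date column is stored as a real `Date` (an assumption about the asker's data), and it adds `with_ties = FALSE`, which is not part of the original answer, so that exactly one row is kept even if a group contains its earliest date more than once.

```r
library(dplyr)

# Rebuild the sample data from the question.
# `your_df` is the placeholder name used in the answer above.
your_df <- data.frame(
  ID    = c(1, 1, 1, 2, 2, 3, 3, 4),
  Date  = as.Date(c("2010-12-09", "2010-12-15", "2010-12-15", "2010-12-09",
                    "2010-12-17", "2011-02-09", "2011-04-25", "2011-04-25")),
  Code  = "1.1.1",
  Names = c("Alpha", "Alpha", "Beta", "Beta", "Beta", "Gamma", "Gamma", "Tango")
)

# Keep the earliest Date within each (ID, Names) group.
# with_ties = FALSE (an addition, not in the original answer) guarantees
# a single row per group when the minimum date is duplicated.
first_rows <- slice_min(your_df, Date, by = c(ID, Names), with_ties = FALSE)

print(first_rows)
```

With the default `with_ties = TRUE`, a group whose earliest date occurs twice would return both rows, so which setting is right depends on whether such duplicates should be kept.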
Answer 2
Score: 1
Using data.table:
library(data.table)
dt <- fread("ID Date Code Names
1 2010-12-09 1.1.1 Alpha
1 2010-12-15 1.1.1 Alpha
1 2010-12-15 1.1.1 Beta
2 2010-12-09 1.1.1 Beta
2 2010-12-17 1.1.1 Beta
3 2011-02-09 1.1.1 Gamma
3 2011-04-25 1.1.1 Gamma
4 2011-04-25 1.1.1 Tango")
dt[dt[, .I[which.min(Date)], by = .(ID, Names)]$V1]
# ID Date Code Names
# 1: 1 2010-12-09 1.1.1 Alpha
# 2: 1 2010-12-15 1.1.1 Beta
# 3: 2 2010-12-09 1.1.1 Beta
# 4: 3 2011-02-09 1.1.1 Gamma
# 5: 4 2011-04-25 1.1.1 Tango
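For comparison, the same "first row per group" idea can be sketched in base R without either package. This is not taken from either answer; the name `df` and the sort-then-deduplicate approach are illustrative assumptions. The rows are sorted so that the earliest date comes first within each (ID, Names) pair, and `duplicated()` then drops every later row of that pair.

```r
# Rebuild the sample data from the question (`df` is a hypothetical name).
df <- data.frame(
  ID    = c(1, 1, 1, 2, 2, 3, 3, 4),
  Date  = as.Date(c("2010-12-09", "2010-12-15", "2010-12-15", "2010-12-09",
                    "2010-12-17", "2011-02-09", "2011-04-25", "2011-04-25")),
  Code  = "1.1.1",
  Names = c("Alpha", "Alpha", "Beta", "Beta", "Beta", "Gamma", "Gamma", "Tango")
)

# Sort by group keys, then by Date, so the earliest row of each
# (ID, Names) pair comes first.
ordered <- df[order(df$ID, df$Names, df$Date), ]

# duplicated() flags every repeat of an (ID, Names) pair after the first,
# so negating it keeps only the earliest row per pair.
first_rows <- ordered[!duplicated(ordered[c("ID", "Names")]), ]

print(first_rows)
```

Unlike `which.min()` inside a grouped data.table call, this version reorders the whole data frame first, which may matter if the original row order needs to be preserved.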