Keep the earliest record by date for each ID and Name in R

Question

I have a data frame with the columns ID, Date, Code and Names. Each ID can have multiple entries on different dates, with the same or different values in the Names column. Below is an example.

ID       Date        Code      Names
1     2010-12-09     1.1.1     Alpha
1     2010-12-15     1.1.1     Alpha
1     2010-12-15     1.1.1     Beta
2     2010-12-09     1.1.1     Beta
2     2010-12-17     1.1.1     Beta
3     2011-02-09     1.1.1     Gamma
3     2011-04-25     1.1.1     Gamma
4     2011-04-25     1.1.1     Tango

For each combination of ID and Names, I want to keep only the row with the earliest Date and delete the later rows with the same ID and name. Below is an example of the resulting data frame.

ID       Date        Code      Names
1     2010-12-09     1.1.1     Alpha
1     2010-12-09     1.1.1     Beta
2     2010-12-09     1.1.1     Beta
3     2011-02-09     1.1.1     Gamma
4     2011-04-25     1.1.1     Tango
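
To make the example reproducible, the input table can be built directly in R. This is a sketch that assumes Date should be stored as class Date (the question does not say how it is stored), so that "earliest" is compared chronologically. It reuses the placeholder name your_df from the first answer.

```r
# Reproducible version of the example input.
# Assumption: Date is stored as class "Date" rather than as text.
your_df <- data.frame(
  ID    = c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 4L),
  Date  = as.Date(c("2010-12-09", "2010-12-15", "2010-12-15", "2010-12-09",
                    "2010-12-17", "2011-02-09", "2011-04-25", "2011-04-25")),
  Code  = "1.1.1",  # recycled to every row
  Names = c("Alpha", "Alpha", "Beta", "Beta", "Beta", "Gamma", "Gamma", "Tango")
)

str(your_df)
```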

Answer 1

Score: 4

You can use slice_min() from dplyr:

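library(dplyr)  # the by argument of slice_min() needs dplyr >= 1.1.0
# For each ID/Names pair, keep the row(s) with the earliest Date
slice_min(your_df, Date, by = c(ID, Names))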

#   ID       Date  Code Names
# 1  1 2010-12-09 1.1.1 Alpha
# 2  1 2010-12-15 1.1.1  Beta
# 3  2 2010-12-09 1.1.1  Beta
# 4  3 2011-02-09 1.1.1 Gamma
# 5  4 2011-04-25 1.1.1 Tango
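
Without dplyr, roughly the same rule can be sketched in base R. By default slice_min() keeps every row tied for the minimum (with_ties = TRUE), so this sketch also keeps all rows whose Date equals the earliest Date of their ID/Names group. It assumes Date is stored as class Date. The names key, day, group_min and keep are illustrative only and are not part of any package.

```r
# Example data from the question; Date stored as class "Date" (assumption)
your_df <- data.frame(
  ID    = c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 4L),
  Date  = as.Date(c("2010-12-09", "2010-12-15", "2010-12-15", "2010-12-09",
                    "2010-12-17", "2011-02-09", "2011-04-25", "2011-04-25")),
  Code  = "1.1.1",
  Names = c("Alpha", "Alpha", "Beta", "Beta", "Beta", "Gamma", "Gamma", "Tango")
)

# One grouping key per (ID, Names) combination, e.g. "1_Alpha"
key <- paste(your_df$ID, your_df$Names, sep = "_")

# Earliest date of each group, computed on the underlying day counts
day <- as.numeric(your_df$Date)
group_min <- tapply(day, key, min)

# Keep every row whose date equals its group's minimum (tied rows are all kept)
keep <- day == as.vector(group_min[key])
result <- your_df[keep, ]
result
```

Passing with_ties = FALSE to slice_min() would instead return exactly one row per group.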

Answer 2

Score: 1

Using data.table:

library(data.table)

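# fread() reads the ISO-formatted Date column as IDate, so which.min() can compare it
dt <- fread("ID       Date        Code      Names
1     2010-12-09     1.1.1     Alpha
1     2010-12-15     1.1.1     Alpha
1     2010-12-15     1.1.1     Beta
2     2010-12-09     1.1.1     Beta
2     2010-12-17     1.1.1     Beta
3     2011-02-09     1.1.1     Gamma
3     2011-04-25     1.1.1     Gamma
4     2011-04-25     1.1.1     Tango")

# .I holds row numbers of dt: for each ID/Names group, take the row with the
# first minimum Date, then subset dt by those row numbers (column V1)
dt[dt[, .I[which.min(Date)], by = .(ID, Names)]$V1]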

#    ID       Date  Code Names
# 1:  1 2010-12-09 1.1.1 Alpha
# 2:  1 2010-12-15 1.1.1  Beta
# 3:  2 2010-12-09 1.1.1  Beta
# 4:  3 2011-02-09 1.1.1 Gamma
# 5:  4 2011-04-25 1.1.1 Tango
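
The data.table call keeps exactly one row per ID/Names group, because .I[which.min(Date)] returns the row number of the first minimum Date in each group. A rough base R analogue, sketched under the assumption that Date is stored as class Date, does two things. It sorts the row indices by Date with order(), which is stable, so tied dates keep their input order. It then keeps the first occurrence of each ID/Names pair with duplicated(). The names o and first_idx are illustrative.

```r
# Example data from the question; Date stored as class "Date" (assumption)
your_df <- data.frame(
  ID    = c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 4L),
  Date  = as.Date(c("2010-12-09", "2010-12-15", "2010-12-15", "2010-12-09",
                    "2010-12-17", "2011-02-09", "2011-04-25", "2011-04-25")),
  Code  = "1.1.1",
  Names = c("Alpha", "Alpha", "Beta", "Beta", "Beta", "Gamma", "Gamma", "Tango")
)

# Row indices sorted by Date; order() is stable, so tied dates keep input order
o <- order(your_df$Date)

# First (i.e. earliest) row index of each (ID, Names) pair in that ordering
first_idx <- o[!duplicated(your_df[o, c("ID", "Names")])]

# Restore the original row order and subset
result <- your_df[sort(first_idx), ]
result
```

As with which.min(), a tie on the earliest date is resolved in favour of the row that comes first in the input.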

huangapple
  • This article was posted on June 19, 2023 at 23:40:03
  • If you repost this article, please keep its link: https://go.coder-hub.com/76508155.html