How to delete rows whose names look like other rows' names in R
Question
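I want to delete the rows with the smaller time when their name is similar to another row's name, keeping only the row with the largest time in each group of similar names.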
df1 <- data.frame(
name = c(
"A. MAHJUM-61365",
"A. MAHJUM-61365. MAHJUM-61365",
"A. RIZAL. AD-11002795",
"A. RIZAL. AD-11002795. RIZAL. AD-11002795",
"ABD. KADIR-60447",
"ABD. KADIR-60447ABD. KADIR-60447",
"ABD. KAHAR-62551",
"ABD. RASYID DS-11002082",
"ABDREAS APUNG @SANY",
"ABDUL AZIS @HYUNDAI",
"ABDUL AZIZ @HYUNDAY",
"ABDUL AZIZ@HYUNDAI"
),
time = c(100, 5, 40, 6, 55, 7, 90, 29, 100, 20, 100, 6)
)
The expected result, df2, is:
df2 <- data.frame(name = c(
"A. MAHJUM-61365",
"A. RIZAL. AD-11002795",
"ABD. KADIR-60447",
"ABD. KAHAR-62551",
"ABD. RASYID DS-11002082",
"ABDREAS APUNG @SANY",
"ABDUL AZIS @HYUNDAY"
),
time = c(100, 40, 55, 90, 29, 100, 100)
)
I expect the resulting data frame to look like df2.
Answer 1
Score: 3
You can try adist and hclust to find similar names, then use ave to find the maximum time within each group of similar names.
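# partial = TRUE: distance between a name and the best-matching substring of another name
x <- adist(df1$name, partial=TRUE)
# symmetrize, cluster, and cut the tree at height 2 to get one group id per name
i <- cutree(hclust(as.dist(pmin(x, t(x)))), h=2)
# keep the rows whose time equals the maximum time of their group
df1[df1$time == ave(df1$time, i, FUN=max),]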
# name time
#1 A. MAHJUM-61365 100
#3 A. RIZAL. AD-11002795 40
#5 ABD. KADIR-60447 55
#7 ABD. KAHAR-62551 90
#8 ABD. RASYID DS-11002082 29
#9 ABDREAS APUNG @SANY 100
#11 ABDUL AZIZ @HYUNDAY 100
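To see what each step of the answer does, the following is a minimal sketch that runs the same pipeline on three names copied from df1. The variable names names_small, times_small, grp, grp_max and keep are illustrative only and do not appear in the answer; only base R is assumed (adist is in utils, hclust, cutree and ave are in stats).

```r
# Illustrative subset of the question's df1 (names and times copied from it);
# names_small, times_small, grp, grp_max and keep are hypothetical names.
names_small <- c("A. MAHJUM-61365",
                 "A. MAHJUM-61365. MAHJUM-61365",
                 "ABD. KAHAR-62551")
times_small <- c(100, 5, 90)

# With partial = TRUE, x[i, j] is the edit distance between the whole of
# names_small[i] and the best-matching substring of names_small[j],
# so the matrix is not symmetric.
x <- adist(names_small, partial = TRUE)
x[1, 2]  # 0: the short name occurs verbatim inside the long one
x[2, 1]  # large: the long name cannot fit inside the short one

# pmin(x, t(x)) keeps the smaller of the two directions, giving a
# symmetric dissimilarity that as.dist() and hclust() accept.
d <- as.dist(pmin(x, t(x)))

# Cut the (complete-linkage) tree at height 2: names within 2 edits of
# each other end up in the same group.
grp <- cutree(hclust(d), h = 2)

# ave() returns, for every row, the maximum time of that row's group.
grp_max <- ave(times_small, grp, FUN = max)

# Keep only the rows that hold their group's maximum time.
keep <- times_small == grp_max
names_small[keep]
# expected: "A. MAHJUM-61365" "ABD. KAHAR-62551"
```

Because the filter compares each time with its group maximum using ==, two similar names that share the same maximum time would both be kept. The cut height h = 2 is a tuning choice: a larger value could start merging names that are genuinely different.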