How to delete a row in R whose name looks like another row's name


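Question

Some rows have names that are near-duplicates of each other. Within each group of similar names, I want to delete the rows with the smaller time and keep only the row with the largest time.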

df1 <- data.frame(
  name = c(
    "A. MAHJUM-61365",
    "A. MAHJUM-61365. MAHJUM-61365",
    "A. RIZAL. AD-11002795",
    "A. RIZAL. AD-11002795. RIZAL. AD-11002795",
    "ABD. KADIR-60447",
    "ABD. KADIR-60447ABD. KADIR-60447",
    "ABD. KAHAR-62551",
    "ABD. RASYID DS-11002082",
    "ABDREAS APUNG @SANY",
    "ABDUL AZIS @HYUNDAI",
    "ABDUL AZIZ @HYUNDAY",
    "ABDUL AZIZ@HYUNDAI"
  ),
  time = c(100, 5, 40, 6, 55, 7, 90, 29, 100, 20, 100, 6)
)

The expected data frame, df2, looks like this:

df2 <- data.frame(name = c(
    "A. MAHJUM-61365",
    "A. RIZAL. AD-11002795",
    "ABD. KADIR-60447",
    "ABD. KAHAR-62551",
    "ABD. RASYID DS-11002082",
    "ABDREAS APUNG @SANY",
    "ABDUL AZIS @HYUNDAY"
  ),
  time = c(100, 40, 55, 90, 29, 100, 100)
)

I expect the resulting data frame to look like df2.


Answer 1

Score: 3

You can try adist together with hclust to find similar names, then use ave to find the maximum time within each group of similar names.

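# Partial (substring) edit distance between every pair of names
x <- adist(df1$name, partial=TRUE)
# Cluster on the symmetrised distance and cut the tree at height 2
i <- cutree(hclust(as.dist(pmin(x, t(x)))), h=2)

# Keep the rows whose time is the maximum within their cluster
df1[df1$time == ave(df1$time, i, FUN=max),]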
#                      name time
#1          A. MAHJUM-61365  100
#3    A. RIZAL. AD-11002795   40
#5         ABD. KADIR-60447   55
#7         ABD. KAHAR-62551   90
#8  ABD. RASYID DS-11002082   29
#9      ABDREAS APUNG @SANY  100
#11     ABDUL AZIZ @HYUNDAY  100
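
To make the three steps easier to follow, here is a minimal sketch that runs the same approach on only three rows taken from df1. The names `sub`, `d`, `grp` and `keep` are illustrative and not part of the original answer. The cut height of 2 is simply the value used above, and may need tuning for other data.

```r
# Three rows from df1: two variants of one name, plus a different name.
sub <- data.frame(
  name = c("ABD. KADIR-60447",
           "ABD. KADIR-60447ABD. KADIR-60447",
           "ABD. KAHAR-62551"),
  time = c(55, 7, 90)
)

# Step 1: partial (substring) edit distance. Entry [r, c] measures how far
# name r is from the closest substring of name c, so x is not symmetric.
x <- adist(sub$name, partial = TRUE)

# Step 2: symmetrise by taking the smaller of the two directions, then
# cluster (hclust defaults to complete linkage) and cut the tree at height 2.
d <- as.dist(pmin(x, t(x)))
grp <- cutree(hclust(d), h = 2)

# Step 3: keep the row(s) whose time equals the maximum within each group.
keep <- sub[sub$time == ave(sub$time, grp, FUN = max), ]

print(grp)
print(keep)
```

Here the first name is an exact substring of the second, so their symmetrised distance is 0 and they should fall into the same group, from which only the row with time 55 is kept. Note that the `==` comparison keeps every row that ties for the maximum, so a group with two equal maximum times would return both rows.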

Posted by huangapple on April 4, 2023, 15:47:40
Original link: https://go.coder-hub.com/75926762.html