How to retain a row in a dataframe with similar values in two different columns in R

Question

I have the following dataframe:

ID  estimation1  estimation2
A    0.0234       0.0220
A    0.0234       3
A    0.0234       0.034
B   -0.005       -1.89
B   -0.005        0.03
B   -0.005       -0.0052
C    0.10        -0.00067
C    0.10        -0.98
C    0.10         0.11

df <- structure(list(ID = c("A", "A", "A", "B", "B", "B", "C", "C", "C"),
  estimation1 = c(0.0234, 0.0234, 0.0234, -0.005, -0.005, -0.005, 0.10, 0.10, 0.10),
  estimation2 = c(0.022, 3, 0.034, -1.89, 0.03, -0.0052, -0.00067, -0.98, 0.11)),
  class = "data.frame", row.names = c(NA, -9L))

For each ID, I would like to retain only the row in which estimation1 and estimation2 are most similar, giving the following output:

ID  estimation1  estimation2
A    0.0234       0.0220
B   -0.005       -0.0052
C    0.10         0.11

Is there a function in R that can do this?
Thank you very much!

Answer 1

Score: 2

Update: After clarification:

One general approach is to group by ID, compute the absolute difference between the two estimates, and keep the row with the smallest difference in each group:

library(dplyr)

df %>%
  group_by(ID) %>%
  mutate(diff = abs(estimation2 - estimation1)) %>%
  filter(diff == min(diff)) %>%
  select(-diff)

  ID    estimation1 estimation2
  <chr>       <dbl>       <dbl>
1 A          0.0234      0.022
2 B         -0.005      -0.0052
3 C          0.1         0.11
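
The grouped filter above always returns a row for every ID, even when none of its rows are actually close. As a rough sketch only, the per-ID minimum can be combined with a cutoff; the name `tol` and its value 0.02 are assumptions chosen for illustration, not something the question specifies:

```r
library(dplyr)

# Same data as in the question.
df <- data.frame(
  ID = rep(c("A", "B", "C"), each = 3),
  estimation1 = rep(c(0.0234, -0.005, 0.10), each = 3),
  estimation2 = c(0.022, 3, 0.034, -1.89, 0.03, -0.0052, -0.00067, -0.98, 0.11)
)

tol <- 0.02  # hypothetical cutoff, adjust to the scale of your data

closest_within_tol <- df %>%
  group_by(ID) %>%
  mutate(diff = abs(estimation2 - estimation1)) %>%
  # keep the per-ID minimum, but only when it is also below the cutoff
  filter(diff == min(diff), diff < tol) %>%
  ungroup() %>%
  select(-diff)

print(closest_within_tol)
```

With this data every ID still contributes one row, but an ID whose best match is farther apart than `tol` would be dropped entirely.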

First answer:
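With base R, we can instead subset with a fixed "similarity" threshold (here 0.02). This keeps every row whose absolute difference is below the threshold, so an ID can contribute several rows or none: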

df[abs(df$estimation1 - df$estimation2) < 0.02, ]

  ID estimation1 estimation2
1  A      0.0234      0.0220
3  A      0.0234      0.0340
6  B     -0.0050     -0.0052
9  C      0.1000      0.1100

or with dplyr:

library(dplyr)

df %>% filter(abs(estimation1 - estimation2) < 0.02)
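
Because the estimates differ in magnitude across IDs (about 0.02 for A, 0.1 for C), a single absolute threshold may be too loose for one group and too strict for another. One possible variant, shown here only as a sketch, is a relative difference; `rel_tol` and its value 0.15 are hypothetical and were picked so the example is easy to follow:

```r
# Same data as in the question.
df <- data.frame(
  ID = rep(c("A", "B", "C"), each = 3),
  estimation1 = rep(c(0.0234, -0.005, 0.10), each = 3),
  estimation2 = c(0.022, 3, 0.034, -1.89, 0.03, -0.0052, -0.00067, -0.98, 0.11)
)

rel_tol <- 0.15  # hypothetical relative cutoff (15 % of the larger magnitude)

# Absolute difference scaled by the larger of the two magnitudes.
scale_by <- pmax(abs(df$estimation1), abs(df$estimation2))
rel_diff <- abs(df$estimation1 - df$estimation2) / scale_by

# which() drops NA/NaN, which would arise if both estimates were exactly 0.
similar <- df[which(rel_diff < rel_tol), ]

print(similar)
```

For this data the relative cutoff keeps rows 1, 6 and 9, i.e. one row per ID.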

Answer 2

Score: 1

I guess you want to keep, within each ID, the row whose two estimations are closest (smallest absolute difference). A base R option might be:

subset(
    df,
    as.logical(
        ave(
            abs(estimation1 - estimation2),
            ID,
            FUN = \(x) seq_along(x) == which.min(x)
        )
    )
)

which gives

  ID estimation1 estimation2
1  A      0.0234      0.0220
6  B     -0.0050     -0.0052
9  C      0.1000      0.1100
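
To make the `ave()` call easier to follow, the same steps can be unrolled as below. This is only an illustrative breakdown of the answer's code; the intermediate names `d`, `keep` and `closest` are made up for the example, and `function(x)` is used in place of `\(x)` so it also runs on R versions before 4.1:

```r
# Same data as in the question.
df <- data.frame(
  ID = rep(c("A", "B", "C"), each = 3),
  estimation1 = rep(c(0.0234, -0.005, 0.10), each = 3),
  estimation2 = c(0.022, 3, 0.034, -1.89, 0.03, -0.0052, -0.00067, -0.98, 0.11)
)

# Step 1: absolute difference for every row.
d <- abs(df$estimation1 - df$estimation2)

# Step 2: within each ID, flag the position of the smallest difference.
# ave() returns a vector aligned with the rows of df; because d is numeric,
# the TRUE/FALSE flags come back as 1/0, hence as.logical() in the next step.
keep <- ave(d, df$ID, FUN = function(x) seq_along(x) == which.min(x))

# Step 3: subset with the flags.
closest <- df[as.logical(keep), ]

print(keep)
print(closest)
```

Since `which.min()` returns only the first minimum, this version keeps exactly one row per ID even when several rows tie.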

If you use dplyr, you can try slice_min:

library(dplyr)

df %>%
    group_by(ID) %>%
    slice_min(abs(estimation2 - estimation1)) %>%
    ungroup()

which gives:

# A tibble: 3 × 3
  ID    estimation1 estimation2
  <chr>       <dbl>       <dbl>
1 A          0.0234      0.022
2 B         -0.005      -0.0052
3 C          0.1         0.11
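
One detail worth knowing: `slice_min()` keeps tied rows by default (`with_ties = TRUE`), so an ID can return more than one row if two differences are exactly equal. The small sketch below uses a made-up data frame `tied` (not the question's data) to show the effect of `with_ties = FALSE`:

```r
library(dplyr)

# Hypothetical data in which both rows of ID "A" are equally far apart (0.5).
tied <- data.frame(
  ID = c("A", "A", "B"),
  estimation1 = c(1, 1, 2),
  estimation2 = c(1.5, 0.5, 2.25)
)

# Default behaviour: both tied rows of "A" are kept.
with_ties <- tied %>%
  group_by(ID) %>%
  slice_min(abs(estimation2 - estimation1)) %>%
  ungroup()

# with_ties = FALSE: at most one row per ID.
one_per_id <- tied %>%
  group_by(ID) %>%
  slice_min(abs(estimation2 - estimation1), n = 1, with_ties = FALSE) %>%
  ungroup()

print(nrow(with_ties))
print(nrow(one_per_id))
```

Whether ties should be kept depends on the analysis; if exactly one row per ID is required, setting `with_ties = FALSE` is the safer choice.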

huangapple
  • Published on May 20, 2023, 22:47:45
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76295805.html