2023-05-22 22:45:31

Filter numbers that are closest to target values and eliminate duplicated observations

Question

I have this dataframe:

data_a <- read.csv(text = "
date,treatment,stage
1,a,1
2,a,10
3,a,20
4,a,30
5,a,60
6,a,70
7,a,89
8,a,91
9,a,92
1,b,1
2,b,10
3,b,20
4,b,30
5,b,59.8
6,b,60.2
7,b,88.8
8,b,90.2
9,b,92
1,c,1
2,c,10
3,c,20
4,c,60
5,c,66
6,c,70
7,c,80
8,c,85
9,c,85")

I need to filter within each treatment the observations matching stage 10, 60, and 89 (or the observation closest to those target values). The code I have is this:

library(dplyr)

filtered_data <- data_a %>%
  group_by(treatment) %>%
  filter(abs(stage - 10) == min(abs(stage - 10)) |
         abs(stage - 60) == min(abs(stage - 60)) |
         abs(stage - 89) == min(abs(stage - 89)))

This code partially does the trick, but there are problems for treatment b and c.

In b, two observations (59.8 and 60.2) are equally distant from the target 60, so both are kept, which is not desired.

In c, two observations share the same stage value (85) and are both closest to the target 89, so both are selected, which is also not desired.

The desired output is this:

filtered_data <- read.csv(text = "
date,treatment,stage
2,a,10
5,a,60
7,a,89
2,b,10
5,b,59.8
7,b,88.8
2,c,10
4,c,60
8,c,85")
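
The tie problem described above can be reproduced on a much smaller data set. The following is only a sketch: `toy`, `targets`, and `kept` are hypothetical names, and the rows are a hand-picked subset that mirrors the ties in treatments b and c, not the full `data_a`.

```r
library(dplyr)

# Hypothetical toy data mirroring the ties in treatments b and c
toy <- data.frame(
  date      = c(5, 6, 7, 4, 8, 9),
  treatment = c("b", "b", "b", "c", "c", "c"),
  stage     = c(59.8, 60.2, 88.8, 60, 85, 85)
)

targets <- c(60, 89)

# Same pattern as the question's filter(): keep every row whose
# distance to a target equals the group minimum, ties included
kept <- toy %>%
  group_by(treatment) %>%
  filter(abs(stage - 60) == min(abs(stage - 60)) |
         abs(stage - 89) == min(abs(stage - 89))) %>%
  ungroup()

# Rows kept per treatment; a count above length(targets) signals ties
count(kept, treatment)
```

With two targets, any treatment that reports more than two rows has at least one tie, which is the behaviour the question wants to avoid.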

Answer 1

Score: 2

I would do it thusly

library(tidyverse)

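# Pair every row of data_a with each target stage
crossing(
  data_a,
  target_stage = c(10, 60, 89)
  ) %>%
  group_by(treatment, target_stage) %>%
  # Keep the single closest row per treatment and target, ignoring ties
  slice_min(
    abs(stage - target_stage),
    with_ties = FALSE
    )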
#> # A tibble: 9 × 4
#> # Groups:   treatment, target_stage [9]
#>    date treatment stage target_stage
#>   <int> <chr>     <dbl>        <dbl>
#> 1     2 a          10             10
#> 2     5 a          60             60
#> 3     7 a          89             89
#> 4     2 b          10             10
#> 5     5 b          59.8           60
#> 6     7 b          88.8           89
#> 7     2 c          10             10
#> 8     4 c          60             60
#> 9     8 c          85             89

Created on 2023-05-22 with reprex v2.0.2

Expanding the grid with crossing() pairs every row with each target stage. You can then group by treatment and target stage and take the smallest distance in each group while dropping ties.
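
The two steps can be looked at separately on a tiny example. This is a minimal sketch under stated assumptions: `toy`, `expanded`, and `closest` are hypothetical names, and the four rows are made up so that stage 85 appears twice and ties for the target 89.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: stage 85 appears twice, so it ties for target 89
toy <- data.frame(
  date      = 1:4,
  treatment = "c",
  stage     = c(10, 60, 85, 85)
)

# Step 1: pair every row with every target (4 rows x 3 targets)
expanded <- crossing(toy, target_stage = c(10, 60, 89))

# Step 2: one closest row per treatment and target;
# with_ties = FALSE returns only the first of any tied rows
closest <- expanded %>%
  group_by(treatment, target_stage) %>%
  slice_min(abs(stage - target_stage), n = 1, with_ties = FALSE) %>%
  ungroup()

nrow(expanded)
closest
```

Because `with_ties = FALSE` keeps the first tied row, the order of the rows going into `slice_min()` decides which duplicate survives. If a specific tie-break is needed (for example the latest date), sorting with `arrange()` beforehand is one option.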

Answer 2

Score: 1

Using dplyr and purrr:

library(dplyr)
library(purrr)

map_dfr(c(10, 60, 89),
        ~ data_a %>%
          filter(abs(stage - .x) == min(abs(stage - .x)),
                 .by = treatment) %>%
          slice_min(stage, n = 1, with_ties = FALSE, by = treatment)) %>%
  arrange(treatment, date)

#>   date treatment stage
#> 1    2         a  10.0
#> 2    5         a  60.0
#> 3    7         a  89.0
#> 4    2         b  10.0
#> 5    5         b  59.8
#> 6    7         b  88.8
#> 7    2         c  10.0
#> 8    4         c  60.0
#> 9    8         c  85.0

Using data.table:

library(data.table)

setDT(data_a)[
  data_a[CJ(stage = c(10, 60, 89), treatment = unique(data_a$treatment)), 
                on = .(treatment, stage), 
                roll = "nearest",
                .(date, treatment)], 
  on = .(treatment, date)][
    order(treatment, date)]

#>    date treatment stage
#> 1:    2         a  10.0
#> 2:    5         a  60.0
#> 3:    7         a  89.0
#> 4:    2         b  10.0
#> 5:    5         b  59.8
#> 6:    7         b  88.8
#> 7:    2         c  10.0
#> 8:    4         c  60.0
#> 9:    9         c  85.0

Note that for treatment c this variant returns the row with date 9, whereas the desired output lists date 8. Both rows have stage 85.

Answer 3

Score: 1

You can use outer() + max.col() to find the values closest to (or, without the negation, farthest from) 10, 60, and 89.

library(dplyr)

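data_a %>%
  slice({
    # Rows are targets, columns are observations: absolute distance to each target
    mat <- abs(outer(c(10, 60, 89), stage, '-'))
    # Column of the smallest distance per target; "first" breaks ties by position
    max.col(-mat, "first")
  }, .by = treatment)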

#   date treatment stage
# 1    2         a  10.0
# 2    5         a  60.0
# 3    7         a  89.0
# 4    2         b  10.0
# 5    5         b  59.8
# 6    7         b  88.8
# 7    2         c  10.0
# 8    4         c  60.0
# 9    8         c  85.0
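
To see what happens inside `slice()` for a single group, the same computation can be run in base R on one treatment's stage values. This is an illustrative sketch: `stage`, `targets`, `mat`, and `idx` are names chosen here, and the vector is copied from treatment c in the question's data.

```r
# Stage values for one treatment (treatment c in the question's data)
stage   <- c(1, 10, 20, 60, 66, 70, 80, 85, 85)
targets <- c(10, 60, 89)

# Rows = targets, columns = observations; each cell is |target - stage|
mat <- abs(outer(targets, stage, "-"))

# max.col() returns, per row, the column of the maximum.
# Negating the distances turns that into the column of the minimum;
# ties.method = "first" keeps the earliest column when distances tie.
idx <- max.col(-mat, ties.method = "first")

idx         # 2 4 8
stage[idx]  # 10 60 85
```

The two 85s sit in columns 8 and 9 and are equally close to 89, so `"first"` selects column 8, which matches date 8 in the output above.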
