Count the number of observations between two curving boundaries

Question

I have a dataset with N observations and two non-overlapping boundaries (one higher, one lower). I want to analyse my observations relative to these boundaries, that is, to count how many observations are (1) above both boundaries, (2) between the two boundaries, and (3) below both boundaries.

Here is a simplified version of my data.

library(ggplot2)

# Upper boundary (y in 40-60) and lower boundary (y in 0-39), both spanning x from 0 to 10
data_line1 <- data.frame(line = "1", x = c(0, round(runif(18,0,10), 2), 10), y = round(runif(20,40,60), 2))
data_line2 <- data.frame(line = "2", x = c(0, round(runif(18,0,10), 2), 10), y = round(runif(20,0,39), 2))
# 200 observations
data_dots <- data.frame(x = round(runif(200,0,10), 2), y = round(runif(200,0,60), 2))

plot <- ggplot()+
  geom_line(data = data_line1, aes(x,y), color = "black")+
  geom_line(data = data_line2, aes(x,y), color = "red")+
  geom_point(data = data_dots, aes(x,y), color = "deepskyblue")

I am fairly sure there is an elegant solution to this problem, but I could not come up with anything even on paper before trying it in R (which I am also relatively new to).

Answer 1

Score: 3

For each point in data_dots, you need to find out whether the y value is higher than each line at the equivalent x value. To do this, you need to interpolate the points making up each line. We can do this with approxfun:

data_dots$line1_value <- approxfun(data_line1$x, data_line1$y)(data_dots$x)
data_dots$line2_value <- approxfun(data_line2$x, data_line2$y)(data_dots$x)
data_dots$group <- with(data_dots, 1 + (y > line2_value) + (y > line1_value))

Now we can give each group an appropriate label depending on whether it is above or below each line:

data_dots$group <- c('below', 'between', 'above')[data_dots$group]
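
As a hand-checkable sketch of the two steps above, the toy boundaries below are made up for illustration (the names `upper`, `lower`, `pts`, and `point_labels` are not from the question). `approxfun` returns a function that interpolates linearly between the supplied points, and adding the two logical comparisons gives 0, 1, or 2, which the `1 +` shifts into a valid index for the label vector.

```r
# Toy boundaries, chosen so the interpolated values are easy to verify by hand
upper <- data.frame(x = c(0, 10), y = c(40, 60))  # rises from 40 to 60
lower <- data.frame(x = c(0, 10), y = c(10, 30))  # rises from 10 to 30

# Three test points, all at x = 5
pts <- data.frame(x = c(5, 5, 5), y = c(5, 35, 55))

upper_at <- approxfun(upper$x, upper$y)(pts$x)  # 50 at x = 5
lower_at <- approxfun(lower$x, lower$y)(pts$x)  # 20 at x = 5

# TRUE counts as 1, so the sum is 1 (below), 2 (between) or 3 (above)
code <- 1 + (pts$y > lower_at) + (pts$y > upper_at)
point_labels <- c("below", "between", "above")[code]
point_labels
#> [1] "below"   "between" "above"
```

Because the comparisons are strict, a point lying exactly on a line is assigned to the group below that line.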

To show this works, let us plot the points according to their group:

ggplot(data = data_line1, aes(x, y)) +
  geom_point(data = data_dots, aes(colour = group)) +
  geom_line() +
  geom_line(data = data_line2) 

To get the actual numbers in each group, we can simply use table:

table(data_dots$group)
#> above   below between 
#>    35      59     106 
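
`table` sorts character values alphabetically, which is why the output reads above, below, between. One optional variation, sketched here on made-up labels rather than the real `data_dots`, is to store the group as a factor with explicit levels, so the counts appear in spatial order and an empty group still shows up as zero.

```r
# Made-up group labels standing in for data_dots$group
group <- c("between", "below", "between", "between", "below")

# Explicit levels fix the display order and keep empty groups in the table
group <- factor(group, levels = c("below", "between", "above"))

counts <- table(group)
counts

# Share of points in each group
prop.table(counts)
```

With the real data, the same `factor(..., levels = ...)` call could be applied to `data_dots$group` before calling `table`.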

Answer 2

Score: 2

You can use `approxfun` to define interpolation functions for the lower and upper bounds, and then check, for each point in `data_dots`, whether `y` lies between the two bounds at the given `x`, e.g.,

fu <- with(data_line1, approxfun(x, y))  # upper bound
fl <- with(data_line2, approxfun(x, y))  # lower bound
with(data_dots, sum(y >= fl(x) & y <= fu(x)))

which counts the number of points within the bounds.

----------------------------------

For the visualization, you can try

fu <- with(data_line1, approxfun(x, y))
fl <- with(data_line2, approxfun(x, y))
idx <- with(data_dots, y >= fl(x) & y <= fu(x))  # TRUE for points between the bounds

plot +
  geom_point(data = data_dots[idx, ], aes(x, y), color = "purple", size = 5)

[![enter image description here][1]][1]


  [1]: https://i.stack.imgur.com/mOxVa.png
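
One thing to watch with either answer: by default `approxfun` returns `NA` for `x` values outside the range of the points it was built from. That cannot happen with the sample data, because both lines span 0 to 10, but real boundaries may be shorter than the cloud of points. A minimal sketch with made-up numbers (the `dots` data frame and the flat bounds are assumptions for illustration), using `na.rm = TRUE` as one way to count only the points that can be classified:

```r
# Made-up boundaries that only cover x from 2 to 8
fu <- approxfun(c(2, 8), c(50, 50))  # flat upper bound at 50
fl <- approxfun(c(2, 8), c(20, 20))  # flat lower bound at 20

# The first point lies outside the covered x range
dots <- data.frame(x = c(1, 3, 5, 7), y = c(30, 30, 60, 20))

inside <- with(dots, y >= fl(x) & y <= fu(x))
inside
#> [1]    NA  TRUE FALSE  TRUE

# Count only the points that could be classified
sum(inside, na.rm = TRUE)
#> [1] 2
```

Whether to drop such points or to extend the boundaries instead (for example with `approxfun(..., rule = 2)`, which holds the end values constant) is a judgment call that depends on the data. Note also that `>=` and `<=` count a point lying exactly on a boundary as inside, as with the last point here.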



huangapple
  • Posted on July 6, 2023, 21:20:07
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76629299.html