Count the number of observations between two curving boundaries
Question
I have a dataset with N observations and two non-overlapping boundaries (one higher and one lower). I want to analyse my observations relative to these boundaries: to see how many of the observations are (1) above both boundaries, (2) between the two boundaries, and (3) below both boundaries.
Here is a simplified version of my data.
library(ggplot2)
# Two boundaries, each with 20 vertices spanning x = 0 to x = 10
data_line1 <- data.frame(line = "1", x = c(0, round(runif(18,0,10), 2), 10), y = round(runif(20,40,60), 2))
data_line2 <- data.frame(line = "2", x = c(0, round(runif(18,0,10), 2), 10), y = round(runif(20,0,39), 2))
# 200 observations; x and y are both length 200 so that y is not silently recycled
data_dots <- data.frame(x = round(runif(200,0,10), 2), y = round(runif(200,0,60), 2))
plot <- ggplot()+
  geom_line(data = data_line1, aes(x,y), color = "black")+
  geom_line(data = data_line2, aes(x,y), color = "red")+
  geom_point(data = data_dots, aes(x,y), color = "deepskyblue")
I am pretty sure there should be an elegant solution to this problem, but I have not been able to work one out even on paper, let alone in R (which I am also relatively new to).
Answer 1
Score: 3
For each point in `data_dots`, you need to find out whether the y value is higher than each line at the equivalent x value. To do this, you need to interpolate the points making up each line. We can do this with `approxfun`:
data_dots$line1_value <- approxfun(data_line1$x, data_line1$y)(data_dots$x)
data_dots$line2_value <- approxfun(data_line2$x, data_line2$y)(data_dots$x)
data_dots$group <- with(data_dots, 1 + (y > line2_value) + (y > line1_value))
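To make the interpolation step concrete, here is a minimal sketch of what `approxfun` returns, using a toy three-vertex boundary. The values of `line_x` and `line_y` are made up for illustration and are not the question's data.

```r
# Toy boundary with three vertices (hypothetical values)
line_x <- c(0, 5, 10)
line_y <- c(40, 60, 50)

# approxfun() returns a function that linearly interpolates between the vertices
f <- approxfun(line_x, line_y)
f(c(0, 2.5, 5, 7.5, 10))
#> [1] 40 50 60 55 50

# Outside the range of line_x the default (rule = 1) gives NA
f(11)
#> [1] NA

# rule = 2 extends the boundary flat, using the value at the nearest end point
f2 <- approxfun(line_x, line_y, rule = 2)
f2(11)
#> [1] 50
```

In the question's data both lines run from x = 0 to x = 10 and the points are drawn from the same range, so the default `rule = 1` should not produce any `NA` values there.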
Now we can give each group an appropriate label depending on whether it is above or below each line:
data_dots$group <- c('below', 'between', 'above')[data_dots$group]
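The grouping relies on logical values being treated as 0 or 1 when added together. A small sketch with made-up numbers (flat boundaries at 20 and 50, chosen only for illustration) shows how the index is built and then used to pick a label:

```r
# Hypothetical y values and interpolated boundary values at three points
y           <- c(10, 45, 70)
line2_value <- c(20, 20, 20)   # lower boundary
line1_value <- c(50, 50, 50)   # upper boundary

# Each comparison contributes 0 or 1, so the index is
# 1 (below both), 2 (above the lower line only) or 3 (above both)
idx <- 1 + (y > line2_value) + (y > line1_value)
idx
#> [1] 1 2 3

# Using the index to subset the label vector gives one label per point
c("below", "between", "above")[idx]
#> [1] "below"   "between" "above"
```

This only works because the boundaries do not cross: a point cannot be above the upper line while being below the lower one.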
To show this works, let us plot the points according to their group:
ggplot(data = data_line1, aes(x, y)) +
geom_point(data = data_dots, aes(colour = group)) +
geom_line() +
geom_line(data = data_line2)
To get the actual numbers in each group, we can simply use `table`:
table(data_dots$group)
#>   above   below between
#>      35      59     106
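The counts above come out in alphabetical order because `group` is a character vector. If the spatial order (below, between, above) is preferred, one option is to convert the labels to a factor with explicit levels before tabulating. The sketch below uses a short made-up `group` vector, including one `NA` of the kind that would appear if a point's x fell outside a line's x range:

```r
# Hypothetical labels; the NA stands for a point with no interpolated boundary value
group <- c("between", "below", "above", "between", NA)

# Explicit levels control the order in which table() reports the groups
group <- factor(group, levels = c("below", "between", "above"))

# useNA = "ifany" adds a column for missing labels instead of dropping them silently
counts <- table(group, useNA = "ifany")
counts
```

Here `counts` holds 1 below, 2 between, 1 above and 1 missing, in that order.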
Answer 2
Score: 2
You can use `approxfun` to define interpolation functions for the lower and upper bounds, and then check whether each `y` in `data_dots` lies between the two bounds at its `x`, e.g.,
# fu interpolates the upper boundary, fl the lower one
fu <- with(data_line1, approxfun(x, y))
fl <- with(data_line2, approxfun(x, y))
with(data_dots, sum(y >= fl(x) & y <= fu(x)))
which counts the number of points within the bounds.
----------------------------------
For the visualization, you can try
fu <- with(data_line1, approxfun(x, y))
fl <- with(data_line2, approxfun(x, y))
# Logical index of the points that fall between the two boundaries
idx <- with(data_dots, y >= fl(x) & y <= fu(x))
# Overlay the selected points on the plot object from the question
plot +
  geom_point(data = data_dots[idx, ], aes(x, y), color = "purple", size = 5)
[![enter image description here][1]][1]
[1]: https://i.stack.imgur.com/mOxVa.png
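The same pair of interpolation functions can also give all three counts the question asks for. The following is a sketch on toy data, not the question's: the flat boundaries at 20 and 50 and the data frame `dots` are assumptions made so the result is easy to check by hand.

```r
# Toy boundaries (hypothetical values): upper flat at 50, lower flat at 20
fu <- approxfun(c(0, 10), c(50, 50))
fl <- approxfun(c(0, 10), c(20, 20))

# Four made-up observations
dots <- data.frame(x = c(1, 2, 3, 4), y = c(10, 30, 40, 70))

# One sum per region; points lying exactly on a boundary count as "between"
counts <- with(dots, c(
  below   = sum(y < fl(x)),
  between = sum(y >= fl(x) & y <= fu(x)),
  above   = sum(y > fu(x))
))
counts  # below = 1, between = 2, above = 1
```

Because the three conditions are mutually exclusive and cover every case, the counts add up to the number of rows as long as no interpolated value is `NA`.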