April 19, 2023 19:25:06

How do I create a subset of previously surveyed areas which covers all survey teams and land class types?

Question

I am currently only working with dummy data (attached) but the situation is:

I will have a high number of different original survey areas (in dummy data, 100)
Each original survey area will have been surveyed by a survey team (in my dummy data, I have included 5 different survey teams: a, b, c, d, e)
Each survey area has also been allocated a land class type (in my dummy data, I have included 9 land classes: 1 - 9)

I want to write a script which will identify a certain proportion (for example's sake, let's say 25%) of survey areas for me to resurvey for quality assurance. The areas identified for resurvey must:

Evenly (as much as possible) cover all survey teams (i.e. 5 per team)
AND as a subset of that
Evenly (as much as possible) cover all land classes

Is this possible within R? Or an alternate system? I have access to AGOL and ArcPRO too.

Dummy data:

Completed survey area	Survey team	Land Class
1	a	1
2	b	2
3	c	3
4	d	4
5	e	5
6	a	6
7	b	7
8	c	8
9	d	9
10	e	1
11	a	2
12	b	3
13	c	4
14	d	5
15	e	6
16	a	7
17	b	8
18	c	9
19	d	1
20	e	2
21	a	3
22	b	4
23	c	5
24	d	6
25	e	7
26	a	8
27	b	9
28	c	1
29	d	2
30	e	3
31	a	4
32	b	5
33	c	6
34	d	7
35	e	8
36	a	9
37	b	1
38	c	2
39	d	3
40	e	4
41	a	5
42	b	6
43	c	7
44	d	8
45	e	9
46	a	1
47	b	2
48	c	3
49	d	4
50	e	5
51	a	6
52	b	7
53	c	8
54	d	9
55	e	1
56	a	2
57	b	3
58	c	4
59	d	5
60	e	6
61	a	7
62	b	8
63	c	9
64	d	1
65	e	2
66	a	3
67	b	4
68	c	5
69	d	6
70	e	7
71	a	8
72	b	9
73	c	1
74	d	2
75	e	3
76	a	4
77	b	5
78	c	6
79	d	7
80	e	8
81	a	9
82	b	1
83	c	2
84	d	3
85	e	4
86	a	5
87	b	6
88	c	7
89	d	8
90	e	9
91	a	1
92	b	2
93	c	3
94	d	4
95	e	5
96	a	6
97	b	7
98	c	8
99	d	9
100	e	1

I have not tried anything yet, as I am not sure where to begin.

Answer 1

Score: 0

Is this what you're after? I created a larger example dataset (n = 10,000) because I believe the example dataset you gave was too small to get the result you want. Note also that, due to the distribution of the "Survey team" values relative to "Land Class", slightly more than 25% of survey sites are returned (2,520 of 10,000):

library(tidyr)
library(dplyr)
# set.seed() to make results below reproducible
set.seed(1)
# Example df based on values given in your example df
df <- data.frame(`Completed survey area` = 1:10000,
                 `Survey team` = rep(letters[1:5], 2000),
                 `Land Class` = c(rep(1:9, 1111), 1),
                 check.names = FALSE)
# Return ~25% sample of dataset with even distribution of teams and land classes
resurvey <- df %>%
  group_by(`Survey team`, `Land Class`) %>%
  sample_frac(size = .25) # Sample 25% of the rows within each team/land class group
resurvey
# # A tibble: 2,520 × 3
# # Groups:   Survey team, Land Class [45]
# `Completed survey area` `Survey team` `Land Class`
#                   <int>         <chr>        <dbl>
# 1                  3016             a            1
# 2                  7471             a            1
# 3                  5761             a            1
# 4                  7246             a            1
# 5                  9631             a            1
# 6                  1891             a            1
# 7                   586             a            1
# 8                  9406             a            1
# 9                  8371             a            1
# 10                 2251             a            1
# # … with 2,510 more rows
# # ℹ Use `print(n = ...)` to see more rows
# Check sample distribution
check <- resurvey %>% group_by(`Survey team`, `Land Class`) %>% tally()
print(check, n = nrow(check))
# # A tibble: 45 × 3
# # Groups:   Survey team [5]
# `Survey team`     `Land Class`   n
# <chr>                  <dbl> <int>
# 1 a                        1    56
# 2 a                        2    56
# 3 a                        3    56
# 4 a                        4    56
# 5 a                        5    56
# 6 a                        6    56
# 7 a                        7    56
# 8 a                        8    56
# 9 a                        9    56
# 10 b                       1    56
# 11 b                       2    56
# 12 b                       3    56
# 13 b                       4    56
# 14 b                       5    56
# 15 b                       6    56
# 16 b                       7    56
# 17 b                       8    56
# 18 b                       9    56
# 19 c                       1    56
# 20 c                       2    56
# 21 c                       3    56
# 22 c                       4    56
# 23 c                       5    56
# 24 c                       6    56
# 25 c                       7    56
# 26 c                       8    56
# 27 c                       9    56
# 28 d                       1    56
# 29 d                       2    56
# 30 d                       3    56
# 31 d                       4    56
# 32 d                       5    56
# 33 d                       6    56
# 34 d                       7    56
# 35 d                       8    56
# 36 d                       9    56
# 37 e                       1    56
# 38 e                       2    56
# 39 e                       3    56
# 40 e                       4    56
# 41 e                       5    56
# 42 e                       6    56
# 43 e                       7    56
# 44 e                       8    56
# 45 e                       9    56
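
The 2,520 rows (25.2%) in the output above can be reproduced by hand. The sketch below uses base R only and assumes that the per-group sample size is the group size times 0.25, rounded to a whole number. That assumption is consistent with the 56 rows per group shown in the check, but it is not taken from the dplyr source. The names `group_sizes`, `per_group` and `total` are illustrative.

```r
# Rebuild the two grouping columns of the 10,000-row example
team <- rep(letters[1:5], 2000)
land <- c(rep(1:9, 1111), 1)

# Size of each of the 45 team / land class groups
group_sizes <- table(team, land)
range(group_sizes)          # 222 223

# Assumed per-group sample size: 25% of the group, rounded
per_group <- round(0.25 * group_sizes)

# Total sample size and its share of the full dataset
total <- sum(per_group)
total                       # 2520
total / length(team)        # 0.252
```

Because every group is rounded up to 56 rows, the overall sample ends up slightly above 25%.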

One observation: I would recommend using syntactically valid column names (e.g. avoiding spaces and special characters), so that you don't have to wrap a column name in backticks every time you refer to it.
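
For a dataset as small as the 100-row dummy data, sampling a fraction within each team/land class group leaves very few rows, because each group only holds two or three areas. A different approach, sketched below in base R, is to fix the quota per team (5 of 20 areas, i.e. 25%) and, within each team, repeatedly pick an area whose land class has been sampled least so far. This is a simple greedy heuristic and not the method used in the answer above. The function name `pick_resurvey` and the column names `survey_area`, `survey_team` and `land_class` are hypothetical, chosen to follow the naming advice above.

```r
set.seed(1)  # make the random tie-breaking reproducible

# Dummy data from the question, with syntactically valid column names
df <- data.frame(
  survey_area = 1:100,
  survey_team = rep(letters[1:5], 20),
  land_class  = c(rep(1:9, 11), 1)
)

# Greedy selection: a fixed number of areas per team, always preferring
# the land class that has been picked least often so far
pick_resurvey <- function(data, per_team) {
  class_levels <- sort(unique(data$land_class))
  class_count <- setNames(integer(length(class_levels)), class_levels)
  chosen <- integer(0)
  for (team in sort(unique(data$survey_team))) {
    for (i in seq_len(per_team)) {
      # Rows of this team that have not been selected yet
      candidates <- which(data$survey_team == team &
                            !(seq_len(nrow(data)) %in% chosen))
      if (length(candidates) == 0) break
      # How often each candidate's land class has been picked already
      counts <- class_count[as.character(data$land_class[candidates])]
      best <- candidates[counts == min(counts)]
      # Break ties at random (sample.int avoids the length-1 sample() pitfall)
      pick <- best[sample.int(length(best), 1)]
      chosen <- c(chosen, pick)
      key <- as.character(data$land_class[pick])
      class_count[key] <- class_count[key] + 1L
    }
  }
  data[sort(chosen), ]
}

resurvey <- pick_resurvey(df, per_team = 5)

# Check the coverage of teams and land classes
table(resurvey$survey_team)
table(resurvey$land_class)
```

On the dummy data this returns 25 areas, 5 per team, with the land class counts differing by at most one. With real data where some teams never surveyed certain land classes, the balance across land classes can only be approximate.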
