How do I create a subset of previously surveyed areas which covers all survey teams and land class types?

Question

I am currently only working with dummy data (attached) but the situation is:

  • I will have a high number of different original survey areas (in dummy data, 100)
  • Each original survey area will have been surveyed by a survey team (in my dummy data, I have included 5 different survey teams: a, b, c, d, e)
  • Each survey area has also been allocated a land class type (in my dummy data, I have included 9 land classes: 1 - 9)

I want to write a script which will identify a certain proportion (for example's sake, let's say 25%) of survey areas for me to resurvey for quality assurance. The areas identified for resurvey must:

  • Evenly (as much as possible) cover all survey teams (i.e. 5 per team)
    AND as a subset of that
  • Evenly (as much as possible) cover all land classes

Is this possible in R, or in another system? I also have access to AGOL and ArcGIS Pro.

Dummy data:

Completed survey area	Survey team	Land Class
1	a	1
2	b	2
3	c	3
4	d	4
5	e	5
6	a	6
7	b	7
8	c	8
9	d	9
10	e	1
11	a	2
12	b	3
13	c	4
14	d	5
15	e	6
16	a	7
17	b	8
18	c	9
19	d	1
20	e	2
21	a	3
22	b	4
23	c	5
24	d	6
25	e	7
26	a	8
27	b	9
28	c	1
29	d	2
30	e	3
31	a	4
32	b	5
33	c	6
34	d	7
35	e	8
36	a	9
37	b	1
38	c	2
39	d	3
40	e	4
41	a	5
42	b	6
43	c	7
44	d	8
45	e	9
46	a	1
47	b	2
48	c	3
49	d	4
50	e	5
51	a	6
52	b	7
53	c	8
54	d	9
55	e	1
56	a	2
57	b	3
58	c	4
59	d	5
60	e	6
61	a	7
62	b	8
63	c	9
64	d	1
65	e	2
66	a	3
67	b	4
68	c	5
69	d	6
70	e	7
71	a	8
72	b	9
73	c	1
74	d	2
75	e	3
76	a	4
77	b	5
78	c	6
79	d	7
80	e	8
81	a	9
82	b	1
83	c	2
84	d	3
85	e	4
86	a	5
87	b	6
88	c	7
89	d	8
90	e	9
91	a	1
92	b	2
93	c	3
94	d	4
95	e	5
96	a	6
97	b	7
98	c	8
99	d	9
100	e	1

I have yet to try anything, as I am not sure where to begin.

Answer 1

Score: 0

Is this what you're after? I created a larger example dataset (n = 10,000) because I believe the example dataset you gave was too small to get the result you want. Note also that, because of how the "Survey team" values are distributed relative to "Land Class", slightly more than 25% of survey sites are returned:

library(tidyr)
library(dplyr)
# set.seed() to make results below reproducible
set.seed(1)

# Example df based on values given in your example df
df <- data.frame(`Completed survey area` = 1:10000,
                 `Survey team` = rep(letters[1:5], 2000),
                 `Land Class` = c(rep(1:9, 1111), 1),
                 check.names = FALSE)

# Return ~25% sample of dataset with even distribution of teams and land classes
resurvey <- df %>%
  group_by(`Survey team`, `Land Class`) %>%
  sample_frac(size = .25) # Sample 25% of the rows within each team/land-class group

resurvey
# # A tibble: 2,520 × 3
# # Groups:   Survey team, Land Class [45]
# `Completed survey area` `Survey team` `Land Class`
#                   <int>         <chr>        <dbl>
# 1                  3016             a            1
# 2                  7471             a            1
# 3                  5761             a            1
# 4                  7246             a            1
# 5                  9631             a            1
# 6                  1891             a            1
# 7                   586             a            1
# 8                  9406             a            1
# 9                  8371             a            1
# 10                 2251             a            1
# # … with 2,510 more rows
# # ℹ Use `print(n = ...)` to see more rows

# Check sample distribution
check <- resurvey %>% group_by(`Survey team`,`Land Class`) %>% tally()
print(check, n = nrow(check))
# # A tibble: 45 × 3
# # Groups:   Survey team [5]
# `Survey team`     `Land Class`   n
# <chr>                  <dbl> <int>
# 1 a                        1    56
# 2 a                        2    56
# 3 a                        3    56
# 4 a                        4    56
# 5 a                        5    56
# 6 a                        6    56
# 7 a                        7    56
# 8 a                        8    56
# 9 a                        9    56
# 10 b                       1    56
# 11 b                       2    56
# 12 b                       3    56
# 13 b                       4    56
# 14 b                       5    56
# 15 b                       6    56
# 16 b                       7    56
# 17 b                       8    56
# 18 b                       9    56
# 19 c                       1    56
# 20 c                       2    56
# 21 c                       3    56
# 22 c                       4    56
# 23 c                       5    56
# 24 c                       6    56
# 25 c                       7    56
# 26 c                       8    56
# 27 c                       9    56
# 28 d                       1    56
# 29 d                       2    56
# 30 d                       3    56
# 31 d                       4    56
# 32 d                       5    56
# 33 d                       6    56
# 34 d                       7    56
# 35 d                       8    56
# 36 d                       9    56
# 37 e                       1    56
# 38 e                       2    56
# 39 e                       3    56
# 40 e                       4    56
# 41 e                       5    56
# 42 e                       6    56
# 43 e                       7    56
# 44 e                       8    56
# 45 e                       9    56
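
The total of 2,520 rows rather than 2,500 can be traced to the group sizes. As a rough check in base R (the column names `survey_team` and `land_class` are illustrative stand-ins for the backticked names above), each of the 45 team/land-class groups holds 222 or 223 rows, so a quarter of a group is 55.5 or 55.75 rows, and the output above shows 56 returned per group:

```r
# Rebuild the two grouping columns of the example data
# (illustrative names, not the ones used in the answer's df)
df <- data.frame(survey_team = rep(letters[1:5], 2000),
                 land_class  = c(rep(1:9, 1111), 1))

# Number of rows in each of the 45 team/land-class groups
group_sizes <- as.vector(table(df$survey_team, df$land_class))

table(group_sizes)        # 35 groups of 222 rows and 10 groups of 223
range(0.25 * group_sizes) # 55.50 to 55.75, never a whole number of rows
45 * 56                   # 2520, the row count reported for resurvey
```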

One observation: I would recommend using syntactically valid column names (e.g. avoid spaces and special characters); that way you don't have to use backticks every time you refer to a column.
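
For the original 100-row dummy data, sampling a fraction within each group returns far fewer than 25 rows, since every team/land-class group has only 2 or 3 areas. One possible alternative, sketched below in base R, is a greedy pick: take 5 areas per team, each time choosing from the land classes that have been selected least often so far. This is only an illustration under stated assumptions, not the answer's method; the names `area`, `team`, `land_class`, `n_per_team`, `class_counts` and `resurvey_small` are all made up for the example.

```r
set.seed(1) # only affects which areas are drawn, not the counts

# Dummy data from the question, with syntactically valid (illustrative) names
df <- data.frame(area       = 1:100,
                 team       = rep(letters[1:5], 20),
                 land_class = c(rep(1:9, 11), 1))

n_per_team <- 5 # 25% of 100 areas spread over 5 teams
classes <- sort(unique(df$land_class))
class_counts <- setNames(integer(length(classes)), classes)
picked <- integer(0)

for (tm in sample(unique(df$team))) {      # visit teams in random order
  for (i in seq_len(n_per_team)) {
    # Areas of this team that have not been picked yet
    pool <- df[df$team == tm & !(df$area %in% picked), ]
    # How often each candidate's land class has been picked so far
    cnt <- class_counts[as.character(pool$land_class)]
    # Keep only candidates from the least-picked land classes
    best <- pool[cnt == min(cnt), ]
    choice <- best$area[sample.int(nrow(best), 1)]
    picked <- c(picked, choice)
    lc <- as.character(df$land_class[df$area == choice])
    class_counts[lc] <- class_counts[lc] + 1L
  }
}

resurvey_small <- df[df$area %in% picked, ]
table(resurvey_small$team)       # areas selected per team
table(resurvey_small$land_class) # areas selected per land class
```

This relies on every team having surveyed every land class, which holds for the dummy data but may not for real surveys; if a team has no areas left in the least-picked classes, the loop simply falls back to the least-picked class that team does have.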

huangapple
  • Posted on April 19, 2023 at 19:25:06
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76053919.html