How do I create a subset of previously surveyed areas which covers all survey teams and land class types?

Question

I am currently only working with dummy data (attached) but the situation is:

  • I will have a high number of different original survey areas (in dummy data, 100)
  • Each original survey area will have been surveyed by a survey team (in my dummy data, I have included 5 different survey teams: a, b, c, d, e)
  • Each survey area has also been allocated a land class type (in my dummy data, I have included 9 land classes: 1 - 9)

I want to write a script which will identify a certain proportion (for example's sake, let's say 25%) of survey areas for me to resurvey for quality assurance. The areas identified for resurvey must:

  • Evenly (as much as possible) cover all survey teams (i.e. 5 per team)
    AND as a subset of that
  • Evenly (as much as possible) cover all land classes

Is this possible within R? Or an alternate system? I have access to AGOL and ArcPRO too.
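
As a starting point, the two requirements above can be sketched in base R. This is only a sketch under stated assumptions, not a definitive method: the column names `area`, `team` and `land_class` are hypothetical, the dummy data is rebuilt from its repeating pattern, and a simple greedy rule (always pick an area whose land class has been used least so far) stands in for "as evenly as possible".

```r
set.seed(1)  # make the random picks reproducible

# Rebuild the 100-row dummy data (hypothetical, syntactically valid names)
areas <- data.frame(
  area       = 1:100,
  team       = rep(letters[1:5], length.out = 100),
  land_class = rep(1:9, length.out = 100)
)

per_team     <- 5                            # 25% of 100 areas over 5 teams
class_counts <- setNames(integer(9), 1:9)    # picks per land class so far
picked       <- integer(0)                   # ids of selected areas

for (tm in sample(unique(areas$team))) {     # visit teams in random order
  pool <- areas[areas$team == tm, ]
  for (i in seq_len(per_team)) {
    # Prefer rows whose land class has been picked least often so far
    usage      <- class_counts[as.character(pool$land_class)]
    candidates <- which(usage == min(usage))
    row        <- candidates[sample.int(length(candidates), 1)]

    picked <- c(picked, pool$area[row])
    cls    <- as.character(pool$land_class[row])
    class_counts[cls] <- class_counts[cls] + 1L
    pool   <- pool[-row, ]                   # never pick the same area twice
  }
}

resurvey <- areas[areas$area %in% picked, ]
table(resurvey$team)        # 5 areas per team
table(resurvey$land_class)  # 2 or 3 areas per land class
```

With 25 picks spread over 9 land classes, a perfectly even split is impossible, so the sketch settles for 2 or 3 areas per class while keeping exactly 5 per team. It balances land classes across the whole sample rather than strictly within each team.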

Dummy data:

  Completed survey area | Survey team | Land Class
  1 | a | 1
  2 | b | 2
  3 | c | 3
  4 | d | 4
  5 | e | 5
  6 | a | 6
  7 | b | 7
  8 | c | 8
  9 | d | 9
  10 | e | 1
  11 | a | 2
  12 | b | 3
  13 | c | 4
  14 | d | 5
  15 | e | 6
  16 | a | 7
  17 | b | 8
  18 | c | 9
  19 | d | 1
  20 | e | 2
  21 | a | 3
  22 | b | 4
  23 | c | 5
  24 | d | 6
  25 | e | 7
  26 | a | 8
  27 | b | 9
  28 | c | 1
  29 | d | 2
  30 | e | 3
  31 | a | 4
  32 | b | 5
  33 | c | 6
  34 | d | 7
  35 | e | 8
  36 | a | 9
  37 | b | 1
  38 | c | 2
  39 | d | 3
  40 | e | 4
  41 | a | 5
  42 | b | 6
  43 | c | 7
  44 | d | 8
  45 | e | 9
  46 | a | 1
  47 | b | 2
  48 | c | 3
  49 | d | 4
  50 | e | 5
  51 | a | 6
  52 | b | 7
  53 | c | 8
  54 | d | 9
  55 | e | 1
  56 | a | 2
  57 | b | 3
  58 | c | 4
  59 | d | 5
  60 | e | 6
  61 | a | 7
  62 | b | 8
  63 | c | 9
  64 | d | 1
  65 | e | 2
  66 | a | 3
  67 | b | 4
  68 | c | 5
  69 | d | 6
  70 | e | 7
  71 | a | 8
  72 | b | 9
  73 | c | 1
  74 | d | 2
  75 | e | 3
  76 | a | 4
  77 | b | 5
  78 | c | 6
  79 | d | 7
  80 | e | 8
  81 | a | 9
  82 | b | 1
  83 | c | 2
  84 | d | 3
  85 | e | 4
  86 | a | 5
  87 | b | 6
  88 | c | 7
  89 | d | 8
  90 | e | 9
  91 | a | 1
  92 | b | 2
  93 | c | 3
  94 | d | 4
  95 | e | 5
  96 | a | 6
  97 | b | 7
  98 | c | 8
  99 | d | 9
  100 | e | 1

I have yet to try anything, as I am not sure where to begin.

Answer 1

Score: 0

Is this what you're after? I created a larger example dataset (n = 10,000) because I believe the example dataset you gave was too small to produce the result you want. Note also that, due to the distribution of the "Survey team" values relative to "Land Class", slightly more than 25% of survey sites are returned:

    library(tidyr)
    library(dplyr)

    # set.seed() to make results below reproducible
    set.seed(1)

    # Example df based on values given in your example df
    df <- data.frame(`Completed survey area` = 1:10000,
                     `Survey team` = rep(letters[1:5], 2000),
                     `Land Class` = c(rep(1:9, 1111), 1),
                     check.names = FALSE)

    # Return ~25% sample of dataset with even distribution of teams and land classes
    resurvey <- df %>%
      group_by(`Survey team`, `Land Class`) %>%
      sample_frac(size = .25) # Sample 25% of the rows within each team/land class group

    resurvey
    # # A tibble: 2,520 × 3
    # # Groups:   Survey team, Land Class [45]
    #    `Completed survey area` `Survey team` `Land Class`
    #                      <int> <chr>                <dbl>
    #  1                    3016 a                        1
    #  2                    7471 a                        1
    #  3                    5761 a                        1
    #  4                    7246 a                        1
    #  5                    9631 a                        1
    #  6                    1891 a                        1
    #  7                     586 a                        1
    #  8                    9406 a                        1
    #  9                    8371 a                        1
    # 10                    2251 a                        1
    # # … with 2,510 more rows
    # # ℹ Use `print(n = ...)` to see more rows

    # Check sample distribution
    check <- resurvey %>% group_by(`Survey team`, `Land Class`) %>% tally()
    print(check, n = nrow(check))
    # # A tibble: 45 × 3
    # # Groups:   Survey team [5]
    #    `Survey team` `Land Class`     n
    #    <chr>                <dbl> <int>
    #  1 a                        1    56
    #  2 a                        2    56
    #  3 a                        3    56
    #  4 a                        4    56
    #  5 a                        5    56
    #  6 a                        6    56
    #  7 a                        7    56
    #  8 a                        8    56
    #  9 a                        9    56
    # 10 b                        1    56
    # 11 b                        2    56
    # 12 b                        3    56
    # 13 b                        4    56
    # 14 b                        5    56
    # 15 b                        6    56
    # 16 b                        7    56
    # 17 b                        8    56
    # 18 b                        9    56
    # 19 c                        1    56
    # 20 c                        2    56
    # 21 c                        3    56
    # 22 c                        4    56
    # 23 c                        5    56
    # 24 c                        6    56
    # 25 c                        7    56
    # 26 c                        8    56
    # 27 c                        9    56
    # 28 d                        1    56
    # 29 d                        2    56
    # 30 d                        3    56
    # 31 d                        4    56
    # 32 d                        5    56
    # 33 d                        6    56
    # 34 d                        7    56
    # 35 d                        8    56
    # 36 d                        9    56
    # 37 e                        1    56
    # 38 e                        2    56
    # 39 e                        3    56
    # 40 e                        4    56
    # 41 e                        5    56
    # 42 e                        6    56
    # 43 e                        7    56
    # 44 e                        8    56
    # 45 e                        9    56
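
The reason the sample comes to 2,520 rows rather than exactly 2,500 can be checked with a little arithmetic. The sketch below assumes that the per-group sample size is 25% of the group size rounded to the nearest whole number, which is consistent with the 56 rows per group shown above; the variable names are only illustrative.

```r
# Rebuild just the two grouping columns of the 10,000-row example
team       <- rep(letters[1:5], 2000)
land_class <- c(rep(1:9, 1111), 1)

# Size of each of the 45 team x land class groups
group_sizes <- table(team, land_class)
range(group_sizes)                 # 222 223

# 25% of 222 or 223 is 55.5 or 55.75, and both round to 56
per_group <- round(0.25 * group_sizes)
sum(per_group)                     # 2520
```

Because 10,000 does not divide evenly into 45 groups, every group is rounded up to 56 rows, and 45 x 56 = 2,520.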

One observation: I would recommend using syntactically valid column names (e.g. avoid spaces and special characters). That way you don't have to wrap a column name in backticks every time you refer to it.
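
As a small illustration of that recommendation, base R's `make.names()` can convert the existing headers into syntactically valid ones. This is just one option, shown on a three-row toy data frame; you may prefer to choose shorter names by hand.

```r
# Toy data frame with the original, non-syntactic column names
df <- data.frame(`Completed survey area` = 1:3,
                 `Survey team` = c("a", "b", "c"),
                 `Land Class` = 1:3,
                 check.names = FALSE)

# Replace spaces with dots so the names are valid R identifiers
names(df) <- make.names(names(df))
names(df)
# "Completed.survey.area" "Survey.team" "Land.Class"

# Columns can now be referenced without backticks
table(df$Survey.team)
```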

huangapple
  • Published on April 19, 2023 at 19:25:06
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76053919.html