Randomize items with a filter in R

Question

Is there a way to shuffle a data frame's rows subject to a constraint? For instance, I have this data frame:

    data = data.frame(
      id = c(3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26),
      name = c("restructuring","restructuring","restructuring","restructuring",
               "control","control","control","control",
               "clitic filler","clitic filler","clitic filler","clitic filler",
               "clitic filler","clitic filler","clitic filler","clitic filler",
               "action filler","action filler","action filler","action filler",
               "action filler","action filler","action filler","action filler")
    )

Here ids 3 to 6 are 'restructuring', 7 to 10 are 'control', 11 to 18 are 'clitic filler', and 19 to 26 are 'action filler'. I'd like the name column never to have the same value in two consecutive rows.

I tried:

    shuffled_data = data[sample(1:nrow(data)), ]

But this obviously shuffles the rows without applying any constraint.

Answer 1

Score: 1

If your data is about this size, I would generate a batch of random shuffles and keep the one(s) that meet your criterion:

    # Shuffle all rows uniformly at random
    shuffle = function(data) {
      data[sample(1:nrow(data)), ]
    }
    # TRUE when no two consecutive rows share the same name
    check = function(data) {
      all(data$name[-1] != data$name[-nrow(data)])
    }
    set.seed(47)
    results = replicate(10000, shuffle(data), simplify = FALSE)
    results = results[sapply(results, check)]
    length(results)
    # [1] 10
    ## 10 of the 10000 shuffles meet your criteria
    ## here's one:
    results[[1]]
    #    id          name
    # 16 18 clitic filler
    # 21 23 action filler
    # 9  11 clitic filler
    # 20 22 action filler
    # 15 17 clitic filler
    # 24 26 action filler
    # 1   3 restructuring
    # 13 15 clitic filler
    # 7   9       control
    # 2   4 restructuring
    # 19 21 action filler
    # 6   8       control
    # 4   6 restructuring
    # 23 25 action filler
    # 3   5 restructuring
    # 22 24 action filler
    # 10 12 clitic filler
    # 18 20 action filler
    # 12 14 clitic filler
    # 5   7       control
    # 11 13 clitic filler
    # 8  10       control
    # 17 19 action filler
    # 14 16 clitic filler
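
If only one valid ordering is needed, the same rejection idea can stop at the first shuffle that passes the check instead of generating a fixed batch. The following is a minimal sketch under that assumption; `shuffle_until_valid` and `max_tries` are hypothetical names introduced here, and the data frame is rebuilt from the question so the block runs on its own.

```r
# Hypothetical helper (not from the original answer): reshuffle until no two
# adjacent rows share the same `name`, giving up after `max_tries` attempts.
shuffle_until_valid <- function(data, max_tries = 100000L) {
  for (i in seq_len(max_tries)) {
    candidate <- data[sample(nrow(data)), , drop = FALSE]
    # Compare every name with the name in the row directly above it
    if (all(candidate$name[-1] != candidate$name[-nrow(candidate)])) {
      return(candidate)
    }
  }
  stop("no valid shuffle found in ", max_tries, " tries")
}

# Same groups as in the question: 4, 4, 8 and 8 rows
data <- data.frame(
  id = 3:26,
  name = rep(c("restructuring", "control", "clitic filler", "action filler"),
             times = c(4, 4, 8, 8))
)

set.seed(47)
shuffled <- shuffle_until_valid(data)
```

Because only a small fraction of plain shuffles pass (10 in 10000 in the run above), this is practical for data of roughly this size; for much larger or more unbalanced groups the acceptance rate may become too low for rejection to be useful.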

Answer 2

Score: 1

Using the function from [this answer][1] with `min.dist = 1` (`prob_shuffler` is the function from that answer and must be defined before running this):

    library(data.table)
    setorder(setDT(data), name)[
      frank(prob_shuffler(cumsum(!duplicated(name)), 1L), ties.method = "random")
    ]

    #>     id          name
    #>  1:  3 restructuring
    #>  2: 10       control
    #>  3: 17 clitic filler
    #>  4:  7       control
    #>  5:  5 restructuring
    #>  6: 13 clitic filler
    #>  7: 20 action filler
    #>  8: 11 clitic filler
    #>  9:  9       control
    #> 10: 24 action filler
    #> 11: 16 clitic filler
    #> 12: 25 action filler
    #> 13: 14 clitic filler
    #> 14: 19 action filler
    #> 15:  4 restructuring
    #> 16: 18 clitic filler
    #> 17: 22 action filler
    #> 18:  6 restructuring
    #> 19: 23 action filler
    #> 20: 15 clitic filler
    #> 21:  8       control
    #> 22: 21 action filler
    #> 23: 12 clitic filler
    #> 24: 26 action filler

[1]: https://stackoverflow.com/a/65013927/9463489
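
For readers who want a constructive approach without an external helper, here is a rough base R sketch. It is not the `prob_shuffler` function from the linked answer: it builds the order one row at a time, choosing at random among the remaining rows whose label differs from the previous pick, and restarts if it reaches a dead end. The names `constrained_order` and `max_restarts` are hypothetical, and the resulting orderings are not guaranteed to be uniformly distributed over all valid orderings.

```r
# Hypothetical helper: return a permutation of seq_along(labels) in which no
# two consecutive positions carry the same label.
constrained_order <- function(labels, max_restarts = 1000L) {
  n <- length(labels)
  for (attempt in seq_len(max_restarts)) {
    remaining <- seq_len(n)
    out <- integer(0)
    prev <- NA
    dead_end <- FALSE
    while (length(remaining) > 0) {
      # Rows still available whose label differs from the previous pick
      allowed <- if (length(out) == 0) {
        remaining
      } else {
        remaining[labels[remaining] != prev]
      }
      if (length(allowed) == 0) {
        dead_end <- TRUE  # only same-label rows are left: start over
        break
      }
      # Index into `allowed` so a length-one vector is handled safely
      pick <- allowed[sample.int(length(allowed), 1)]
      out <- c(out, pick)
      prev <- labels[pick]
      remaining <- setdiff(remaining, pick)
    }
    if (!dead_end) return(out)
  }
  stop("no valid ordering found in ", max_restarts, " restarts")
}

# Same groups as in the question: 4, 4, 8 and 8 rows
data <- data.frame(
  id = 3:26,
  name = rep(c("restructuring", "control", "clitic filler", "action filler"),
             times = c(4, 4, 8, 8))
)

set.seed(1)
ord <- constrained_order(data$name)
result <- data[ord, ]
```

Restarting on a dead end keeps the code short; the trade-off is that some valid orderings are more likely than others, so the rejection approach in Answer 1 is the safer choice when uniformity matters.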

huangapple
  • Published on March 7, 2023 at 01:58:23
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75654242.html