Randomize items with a filter in R

Question

Is there a way to shuffle a data frame's rows subject to a constraint? For instance, I have this data frame:

data=data.frame(id=c(3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26),
                name=c("restructuring","restructuring","restructuring","restructuring",
                       "control","control","control","control","clitic filler","clitic filler","clitic filler","clitic filler","clitic filler","clitic filler","clitic filler","clitic filler","action filler","action filler","action filler","action filler","action filler","action filler","action filler","action filler")
               )

Here ids 3 to 6 are 'restructuring', 7 to 10 are 'control', 11 to 18 are 'clitic filler', and 19 to 26 are 'action filler'. I would like the name column never to have the same value in two consecutive rows.

I tried:

shuffled_data = data[sample(1:nrow(data)), ]

But this obviously shuffles the rows without applying any constraint.

Answer 1

Score: 1

If your data is about this size, I would generate many random shuffles and keep the one(s) that meet your criterion:

shuffle = function(data) {
  data[sample(1:nrow(data)), ]
}

check = function(data) {
  all(data$name[-1] != data$name[-nrow(data)])
}

set.seed(47)
results = replicate(10000, shuffle(data), simplify = FALSE)
results = results[sapply(results, check)]
length(results)
# [1] 10
## 10 of the 10000 shuffles meet your criterion

## here's one:
results[[1]]
#    id          name
# 16 18 clitic filler
# 21 23 action filler
# 9  11 clitic filler
# 20 22 action filler
# 15 17 clitic filler
# 24 26 action filler
# 1   3 restructuring
# 13 15 clitic filler
# 7   9       control
# 2   4 restructuring
# 19 21 action filler
# 6   8       control
# 4   6 restructuring
# 23 25 action filler
# 3   5 restructuring
# 22 24 action filler
# 10 12 clitic filler
# 18 20 action filler
# 12 14 clitic filler
# 5   7       control
# 11 13 clitic filler
# 8  10       control
# 17 19 action filler
# 14 16 clitic filler
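
Since only about 1 shuffle in 1000 passed the check above, it may be cheaper to stop at the first valid shuffle instead of generating all 10000. The following is a minimal sketch of that idea, not part of the original answer; `shuffle_until_valid` and `max_tries` are hypothetical names, and the data frame is rebuilt here with `rep()` (so `id` is integer rather than double) only to keep the block self-contained.

```r
# Sketch: redraw until no two neighbouring rows share a name.
# `shuffle_until_valid` and `max_tries` are illustrative names (assumptions).
shuffle_until_valid <- function(data, max_tries = 1e5) {
  for (i in seq_len(max_tries)) {
    candidate <- data[sample(nrow(data)), , drop = FALSE]
    # Compare each name with the name in the following row
    if (all(candidate$name[-1] != candidate$name[-nrow(candidate)])) {
      return(candidate)
    }
  }
  stop("no valid shuffle found in ", max_tries, " tries")
}

# Same content as the question's data frame, built more compactly
data <- data.frame(
  id = 3:26,
  name = rep(c("restructuring", "control", "clitic filler", "action filler"),
             times = c(4, 4, 8, 8))
)

set.seed(1)
shuffled <- shuffle_until_valid(data)
```

Because every candidate is a uniformly random permutation and invalid ones are simply discarded, the returned shuffle is drawn uniformly from all valid orderings. The cost is that the expected number of tries grows quickly as the data gets larger or the groups more unbalanced.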

Answer 2

Score: 1

Using the function from [this answer][1] with `min.dist = 1` (`prob_shuffler` in the code below is that function; it is not defined here and must be loaded first):

    library(data.table)

    setorder(setDT(data), name)[
      frank(prob_shuffler(cumsum(!duplicated(name)), 1L), ties.method = "random")
    ]
    #>     id          name
    #>  1:  3 restructuring
    #>  2: 10       control
    #>  3: 17 clitic filler
    #>  4:  7       control
    #>  5:  5 restructuring
    #>  6: 13 clitic filler
    #>  7: 20 action filler
    #>  8: 11 clitic filler
    #>  9:  9       control
    #> 10: 24 action filler
    #> 11: 16 clitic filler
    #> 12: 25 action filler
    #> 13: 14 clitic filler
    #> 14: 19 action filler
    #> 15:  4 restructuring
    #> 16: 18 clitic filler
    #> 17: 22 action filler
    #> 18:  6 restructuring
    #> 19: 23 action filler
    #> 20: 15 clitic filler
    #> 21:  8       control
    #> 22: 21 action filler
    #> 23: 12 clitic filler
    #> 24: 26 action filler

  [1]: https://stackoverflow.com/a/65013927/9463489
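
Whichever approach produces the ordering, it can be worth checking the result before using it. The helper below is a small sketch for that purpose; `no_adjacent_repeats` is a hypothetical name and does not come from either answer. It relies on base R's `rle()`, which collapses runs of equal values.

```r
# Hypothetical helper (not from either answer): TRUE when no value
# appears in two consecutive positions of x.
no_adjacent_repeats <- function(x) {
  # A run of length greater than 1 means a value was repeated back to back
  all(rle(as.character(x))$lengths == 1L)
}

no_adjacent_repeats(c("control", "clitic filler", "control"))  # TRUE
no_adjacent_repeats(c("control", "control", "clitic filler"))  # FALSE
```

Applied to a shuffled data frame or data.table, the call would be `no_adjacent_repeats(shuffled$name)`, where `shuffled` stands for whatever object holds the result.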



huangapple
  • Published on March 7, 2023, 01:58:23
  • Please keep this link when reposting: https://go.coder-hub.com/75654242.html