Randomize items with a filter in R
Question
Is there a way to shuffle a data frame's rows subject to a constraint? For instance, I have this data frame:
data=data.frame(id=c(3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26),
name=c("restructuring","restructuring","restructuring","restructuring",
"control","control","control","control","clitic filler","clitic filler","clitic filler","clitic filler","clitic filler","clitic filler","clitic filler","clitic filler","action filler","action filler","action filler","action filler","action filler","action filler","action filler","action filler")
)
Here, ids 3 to 6 are 'restructuring', 7-10 are 'control', 11-18 are 'clitic filler', and 19-26 are 'action filler'. I'd like the `name` column to never have the same value in two consecutive rows.
I tried:
shuffled_data= data[sample(1:nrow(data)), ]
But this obviously shuffles the rows without applying any constraint.
Answer 1
Score: 1
If your data is about this size, I would generate many random shuffles and keep the one(s) that meet your criterion:
# Return the rows of `data` in a random order
shuffle = function(data) {
  data[sample(nrow(data)), ]
}
# TRUE if no two consecutive rows share the same name
check = function(data) {
  all(data$name[-1] != data$name[-nrow(data)])
}
set.seed(47)
# Generate 10000 shuffles and keep only those that pass the check
results = replicate(10000, shuffle(data), simplify = FALSE)
results = results[sapply(results, check)]
length(results)
# [1] 10
## 10 of the 10000 shuffles meet your criterion
## here's one:
results[[1]]
# id name
# 16 18 clitic filler
# 21 23 action filler
# 9 11 clitic filler
# 20 22 action filler
# 15 17 clitic filler
# 24 26 action filler
# 1 3 restructuring
# 13 15 clitic filler
# 7 9 control
# 2 4 restructuring
# 19 21 action filler
# 6 8 control
# 4 6 restructuring
# 23 25 action filler
# 3 5 restructuring
# 22 24 action filler
# 10 12 clitic filler
# 18 20 action filler
# 12 14 clitic filler
# 5 7 control
# 11 13 clitic filler
# 8 10 control
# 17 19 action filler
# 14 16 clitic filler
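If only one valid ordering is needed, the same idea can be written as a rejection loop that stops at the first shuffle passing the check, instead of storing all 10000 candidates. The following is a minimal sketch and not part of the original answer: `no_adjacent_repeats`, `shuffle_until_valid`, and `max_tries` are hypothetical names, the data frame is rebuilt with `rep()` to hold the same ids and names as in the question, and the loop assumes a valid ordering exists at all.

```r
# Rebuild the example data from the question (same ids and names).
data <- data.frame(
  id = 3:26,
  name = rep(c("restructuring", "control", "clitic filler", "action filler"),
             times = c(4, 4, 8, 8))
)

# TRUE when no two consecutive elements of x are equal.
no_adjacent_repeats <- function(x) {
  all(x[-1] != x[-length(x)])
}

# Hypothetical helper: reshuffle until the check passes or max_tries is reached.
shuffle_until_valid <- function(data, max_tries = 100000) {
  for (i in seq_len(max_tries)) {
    candidate <- data[sample(nrow(data)), ]
    if (no_adjacent_repeats(candidate$name)) {
      return(candidate)
    }
  }
  stop("no valid shuffle found in ", max_tries, " tries")
}

set.seed(1)
shuffled <- shuffle_until_valid(data)
```

Since roughly 10 in 10000 shuffles passed in the run above, the loop would typically need on the order of a thousand attempts for data of this shape. For much larger data, or groups that dominate the rows, rejection sampling may become impractically slow.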
Answer 2
Score: 1
Using the function from [this answer][1] (`prob_shuffler`, which must be defined first) with `min.dist = 1`:
library(data.table)
setorder(setDT(data), name)[
frank(prob_shuffler(cumsum(!duplicated(name)), 1L), ties.method = "random")
]
#> id name
#> 1: 3 restructuring
#> 2: 10 control
#> 3: 17 clitic filler
#> 4: 7 control
#> 5: 5 restructuring
#> 6: 13 clitic filler
#> 7: 20 action filler
#> 8: 11 clitic filler
#> 9: 9 control
#> 10: 24 action filler
#> 11: 16 clitic filler
#> 12: 25 action filler
#> 13: 14 clitic filler
#> 14: 19 action filler
#> 15: 4 restructuring
#> 16: 18 clitic filler
#> 17: 22 action filler
#> 18: 6 restructuring
#> 19: 23 action filler
#> 20: 15 clitic filler
#> 21: 8 control
#> 22: 21 action filler
#> 23: 12 clitic filler
#> 24: 26 action filler
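
Whichever method produces the ordering, the constraint can be checked afterwards by looking at run lengths. This is a small sketch using base R's `rle()`; `max_run` is a hypothetical helper name and the two vectors are made-up illustrations, not data from the question.

```r
# Hypothetical helper: length of the longest run of identical consecutive values.
max_run <- function(x) max(rle(as.character(x))$lengths)

ok  <- c("control", "clitic filler", "control", "action filler")
bad <- c("control", "control", "clitic filler", "action filler")

max_run(ok)   # 1: no value repeats in consecutive positions
max_run(bad)  # 2: "control" appears twice in a row
```

A result of 1 for `max_run(data$name)` means no two consecutive rows share a name.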
[1]: https://stackoverflow.com/a/65013927/9463489