Expand vector positionally based on another vector (vectorised)
Question
I'm looking for a way to speed this process up. I have two kinds of vector: x, my reference vector, and y, my data-holding vector.
x <- rep(c(0,1), 25)
y <- rep(c("a", "b", "c"), 8)
I need the output to be NA wherever x == 0 and the next unused element of y wherever x == 1.
I currently have:
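out <- character(50)
count <- 0   # position in x (and in out)
count2 <- 0  # position in y
for (i in x) {
  count <- count + 1
  if (i == 1) {
    # take the next unused element of y
    count2 <- count2 + 1
    out[count] <- y[count2]
  } else {
    # gap: no y value at this position
    out[count] <- NA
  }
}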
The intended output is:
Intended_out <- c(NA, "a", NA, "b", NA, "c", NA, "a", NA, "b", NA, "c", NA, "a", NA, "b", NA, "c", NA, "a", NA, "b", NA, "c", NA, "a", NA, "b",
NA, "c", NA, "a", NA, "b", NA, "c", NA, "a", NA, "b", NA, "c",
NA, "a", NA, "b", NA, "c", NA, NA)
x == 0 marks a position where a gap needs to be inserted, and x == 1 marks a position where a value from y should be used. The first 1 corresponds to the first element of y, the second 1 to the second element of y, and so on. The output should end up the same length as x, with all the y values in the same order and NA wherever x == 0.
The loop does what I need, but I can't think of a way to do this at scale. I have lots of y vectors, which all differ in content but have the same length. x always stays the same and is always longer than y. Is there a way to vectorise this process? It would be great if I could bind all the y's into a data frame, preferably in a pipeable way, and then apply this logic to the whole data frame, but my brain has given up at this point.
Thanks.
Answer 1
Score: 2
Do you want something like this?
> y[cumsum(x) * NA^(1 - x)]
[1] NA "a" NA "b" NA "c" NA "a" NA "b" NA "c" NA "a" NA "b" NA "c" NA
[20] "a" NA "b" NA "c" NA "a" NA "b" NA "c" NA "a" NA "b" NA "c" NA "a"
[39] NA "b" NA "c" NA "a" NA "b" NA "c" NA NA
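This works because, in R, NA^0 is 1 and NA^1 is NA: the index equals cumsum(x) where x is 1 and NA where x is 0, and indexing y with NA (or past its end) returns NA. Since x never changes, the index can be computed once and reused for every y vector. Below is a minimal sketch of that idea, assuming the y vectors are collected as columns of a data frame; the names `ys`, `y1`, `y2`, `idx`, and `expanded` are made up for illustration and are not from the question.

```r
x <- rep(c(0, 1), 25)

# Index into y: running count of 1s where x == 1, NA where x == 0.
idx <- cumsum(x) * NA^(1 - x)

# Hypothetical data frame holding several y vectors of equal length.
ys <- data.frame(
  y1 = rep(c("a", "b", "c"), 8),
  y2 = rep(c("d", "e", "f"), 8),
  stringsAsFactors = FALSE
)

# Apply the same index to every column; each result has length(x) elements.
expanded <- as.data.frame(
  lapply(ys, function(y) y[idx]),
  stringsAsFactors = FALSE
)
```

Because the work is a single subsetting step per column, the same `lapply` call could also sit at the end of a pipe if the y vectors arrive as a data frame.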
Answer 2
Score: 1
If the length of the vectors is <= 100, you could also use pmatch:
y[pmatch(x,x[x==1])]
[1] NA "a" NA "b" NA "c" NA "a" NA "b" NA "c" NA "a" NA "b" NA "c" NA "a"
[21] NA "b" NA "c" NA "a" NA "b" NA "c" NA "a" NA "b" NA "c" NA "a" NA "b"
[41] NA "c" NA "a" NA "b" NA "c" NA NA
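Here pmatch coerces x to character and, with its default duplicates.ok = FALSE, uses each table entry at most once, so the k-th 1 in x matches the k-th entry of x[x == 1] and every 0 gets NA. For comparison, the same result can be sketched with plain logical indexing, which does not rely on string matching. This is an alternative not proposed in the answers above, and the names `keep` and `out` are only illustrative.

```r
x <- rep(c(0, 1), 25)
y <- rep(c("a", "b", "c"), 8)

# Start with all gaps, then fill the positions where x == 1.
out <- rep(NA_character_, length(x))
keep <- x == 1

# seq_len(sum(keep)) may run past the end of y; those extra slots become NA,
# which matches the behaviour of the original loop.
out[keep] <- y[seq_len(sum(keep))]
```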