R - multiple nested for in loop not fully populating in dataframe
Question
I am trying to create a dataframe that holds the output of a function that takes two different 'character' arguments. I am using nested for loops, but the dataframe is only populated with the results for the last element of the second argument. Can you please help me? My code first makes a siber object and then runs the nested for loops. See below:
First, here are the R packages I have in my code: "tidyverse","SIBER","GGally","reshape2"
Here is my data that I work with.
iso1 iso2 group community
1 -8.75 8.87 a z
2 -9.21 8.53 a z
3 -8.52 8.32 a z
4 -10.69 5.79 a z
5 -10.08 7.67 a z
6 -14.88 7.16 a z
7 -15.22 6.43 a z
8 -14.09 6.70 a z
9 -14.84 6.80 a z
10 -13.76 6.40 a z
11 -14.10 5.79 a z
12 -13.92 6.16 a z
13 -12.11 6.07 a z
14 -14.48 6.48 a z
15 -10.33 6.45 a z
16 -10.85 7.01 a z
17 -11.75 6.62 a z
18 -11.64 6.78 a z
19 -11.60 4.97 a z
20 -11.91 7.86 a z
21 -11.34 6.68 a z
22 -9.80 7.78 a z
23 -11.19 7.43 a z
24 -10.74 6.85 a z
25 -10.57 7.16 a z
26 -10.29 5.71 a z
27 -11.21 7.80 a z
28 -11.69 7.85 a z
29 -10.89 7.62 a z
30 -10.18 4.73 b z
31 -9.20 6.63 b z
32 -14.28 5.47 b z
33 -12.80 4.21 b z
34 -14.87 5.72 b z
35 -12.43 6.13 b z
36 -12.60 5.02 b z
37 -11.06 5.06 b z
38 -10.65 5.63 b z
39 -13.69 2.95 c z
40 -13.19 2.41 c z
41 -14.41 3.11 c z
42 -11.47 4.88 d z
43 -11.73 4.79 d z
44 -12.62 5.59 d z
45 -11.57 3.63 e z
46 -11.79 2.75 e z
47 -12.67 4.93 f z
48 -12.50 5.09 f z
49 -13.18 5.11 f z
50 -12.71 5.36 f z
51 -10.24 5.49 f z
52 -10.10 4.46 f z
53 -10.54 4.09 f z
54 -10.46 4.22 f z
55 -11.05 4.78 f z
56 -11.08 4.67 f z
57 -11.64 4.19 f z
58 -11.61 4.82 f z
59 -11.22 4.50 f z
60 -6.00 1.29 g z
61 -7.30 4.08 g z
62 -7.30 1.68 g z
63 -6.90 1.87 g z
64 -8.10 1.13 g z
65 -5.10 2.54 g z
66 -6.90 2.38 g z
67 -6.50 4.00 g z
68 -7.10 3.60 g z
69 -5.30 2.31 g z
70 -7.30 3.97 g z
71 -4.20 1.03 g z
72 -10.50 2.85 g z
73 -8.68 2.61 g z
74 -8.04 3.06 g z
75 -14.33 2.13 e z
76 -12.05 2.21 e z
77 -12.06 2.45 e z
78 -13.18 2.84 e z
79 -12.26 1.35 e z
80 -13.14 3.01 e z
81 -14.20 3.55 e z
82 -13.56 3.36 e z
83 -11.98 2.93 e z
84 -14.49 2.68 e z
85 -14.45 3.00 e z
86 -15.08 2.32 e z
Here is my code. Note: df1 and df2 hold combinations of the "community" and "group" variables in my data, written the way they are listed in the siber.test object I made.
siber.test = createSiberObject(si.test)
df1 <- data.frame(c("z.a","z.b","z.c","z.d","z.e","z.f","z.g"))
df2 <- data.frame(c("z.a","z.b"))
mylist <- list() #create an empty list
for (i in 1:nrow(df1)) {
  for (j in 1:nrow(df2)) {
    ij.overlap <- maxLikOverlap(df1[i,], df2[j,], siber.test,
                                p.interval = NULL, n = 100)
    ij.over = ij.overlap[3]/(ij.overlap[2]+ij.overlap[1]-ij.overlap[3])
    vec = c(df1[i,], df2[j,], ij.overlap, ij.over)
    mylist[[i]] = vec
  }
}
df=do.call("rbind",mylist)
Right now, the code only gives me a dataframe with just the last combination of df2, see below:
area.1 area.2 overlap overlap
[1,] "z.a" "z.b" "4.93603972039955" "4.89919820826708" "0.0841731887902768" "0.00863220489615416"
[2,] "z.b" "z.b" "4.89919820826708" "4.89919820826708" "4.89919820826767" "1.00000000000024"
[3,] "z.c" "z.b" "0.559725304862695" "4.89919820826708" "9.54097911787244e-18" "1.74777666236292e-18"
[4,] "z.d" "z.b" "0.522204858582882" "4.89919820826708" "0.45393111227755" "0.0913807096303221"
[5,] "z.e" "z.b" "2.49971788496032" "4.89919820826708" "4.85722573273506e-17" "6.56478012662033e-18"
[6,] "z.f" "z.b" "1.38774348908464" "4.89919820826708" "0.742263994748963" "0.133869637616724"
[7,] "z.g" "z.b" "5.21007280975346" "4.89919820826708" "0" "0"
Instead, what I want to get, but the code does not produce, is the following:
area.1 area.2 overlap overlap
[1,] "z.a" "z.a" "4.93603972039955" "4.93603972039955" "4.93603972040026" "1.00000000000029"
[2,] "z.b" "z.a" "4.89919820826708" "4.93603972039955" "0.0841731887902741" "0.00863220489615387"
[3,] "z.c" "z.a" "0.559725304862695" "4.93603972039955" "8.67361737988404e-18" "1.57823657671211e-18"
[4,] "z.d" "z.a" "0.522204858582882" "4.93603972039955" "6.93889390390723e-18" "1.27126840937583e-18"
[5,] "z.e" "z.a" "2.49971788496032" "4.93603972039955" "2.77555756156289e-17" "3.73271656887014e-18"
[6,] "z.f" "z.a" "1.38774348908464" "4.93603972039955" "3.64291929955129e-17" "5.76066442962145e-18"
[7,] "z.g" "z.a" "5.21007280975346" "4.93603972039955" "0" "0"
[8,] "z.a" "z.b" "4.93603972039955" "4.89919820826708" "0.0841731887902768" "0.00863220489615416"
[9,] "z.b" "z.b" "4.89919820826708" "4.89919820826708" "4.89919820826767" "1.00000000000024"
[10,] "z.c" "z.b" "0.559725304862695" "4.89919820826708" "9.54097911787244e-18" "1.74777666236292e-18"
[11,] "z.d" "z.b" "0.522204858582882" "4.89919820826708" "0.45393111227755" "0.0913807096303221"
[12,] "z.e" "z.b" "2.49971788496032" "4.89919820826708" "4.85722573273506e-17" "6.56478012662033e-18"
[13,] "z.f" "z.b" "1.38774348908464" "4.89919820826708" "0.742263994748963" "0.133869637616724"
[14,] "z.g" "z.b" "5.21007280975346" "4.89919820826708" "0" "0"
After several helpful comments below, I followed the recommendation to build a data frame in each iteration and append it to mylist. Note that I had to wrap ij.overlap and ij.over in the transpose function t() inside the tmpframe call, because ij.overlap has three output values that would otherwise each become a new row.
siber.test = createSiberObject(si.test)
df1 <- data.frame(c("z.a","z.b","z.c","z.d","z.e","z.f","z.g"))
df2 <- data.frame(c("z.a","z.b"))
mylist <- list() #create an empty list
for (i in 1:nrow(df1)) {
  for (j in 1:nrow(df2)) {
    ij.overlap <- maxLikOverlap(df1[i,], df2[j,], siber.test,
                                p.interval = NULL, n = 100)
    ij.over = ij.overlap[3]/(ij.overlap[2]+ij.overlap[1]-ij.overlap[3])
    tmpframe <- data.frame(df1[i,], df2[j,], t(ij.overlap), t(ij.over))
    mylist <- c(mylist, list(tmpframe))
  }
}
df=do.call("rbind",mylist)
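To illustrate why t() matters here, the following is a minimal base R sketch. The values in ov are made up and merely stand in for the named length-3 vector that maxLikOverlap() appears to return in the output above; nothing in it calls SIBER.

```r
# Made-up values standing in for a maxLikOverlap() result (assumption).
ov <- c(area.1 = 4.9, area.2 = 4.8, overlap = 0.08)

# Without t(): the length-3 vector becomes a single column with three rows,
# and the scalar labels are recycled down those rows.
long <- data.frame(a = "z.a", b = "z.b", ov)
dim(long)

# With t(): the 1 x 3 matrix contributes three columns to a single row.
wide <- data.frame(a = "z.a", b = "z.b", t(ov))
dim(wide)
names(wide)
```

The wide form is the one that stacks cleanly with do.call("rbind", mylist), since every iteration then contributes exactly one row.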
Answer 1
Score: 2
I don't have SIBER, but perhaps this will work: it first creates an expansion of every combination of the values in df1 and df2, then runs the function on each pair.
We can use mapply (to replace your two for loops); this returns a matrix here (which needs to be transposed with t(), easy enough), which is okay because the return value from maxLikOverlap is a length-3 numeric vector.
eg <- expand.grid(
a = c("z.a","z.b","z.c","z.d","z.e","z.f","z.g"),
b = c("z.a","z.b"),
stringsAsFactors = FALSE)
eg
# a b
# 1 z.a z.a
# 2 z.b z.a
# 3 z.c z.a
# 4 z.d z.a
# 5 z.e z.a
# 6 z.f z.a
# 7 z.g z.a
# 8 z.a z.b
# 9 z.b z.b
# 10 z.c z.b
# 11 z.d z.b
# 12 z.e z.b
# 13 z.f z.b
# 14 z.g z.b
res <- t(mapply(function(a, b) maxLikOverlap(a, b, siber.test, p.interval = NULL, n = 100),
eg$a, eg$b))
From here, you can call cbind(eg, data.frame(res)), and you'll now have columns X1 through X3 (or is the vector returned from maxLikOverlap named?), which you can rename as you prefer.
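Since the answer was written without SIBER installed, here is a self-contained sketch of the same expand.grid / mapply / cbind pipeline that runs in base R. The function fake_overlap() is a hypothetical stand-in for maxLikOverlap() and is not part of SIBER; it only mimics the shape of the result, a named length-3 numeric vector, which is an assumption based on the output shown in the question.

```r
# Hypothetical stand-in for SIBER::maxLikOverlap(); returns a named
# length-3 numeric vector so the pipeline can run without SIBER.
fake_overlap <- function(a, b) {
  c(area.1 = nchar(a), area.2 = nchar(b), overlap = as.numeric(a == b))
}

eg <- expand.grid(
  a = c("z.a", "z.b", "z.c"),
  b = c("z.a", "z.b"),
  stringsAsFactors = FALSE)

# mapply() gives a 3 x 6 matrix (one column per pair); t() makes it 6 x 3.
# USE.NAMES = FALSE avoids duplicated dimnames taken from eg$a.
res <- t(mapply(fake_overlap, eg$a, eg$b, USE.NAMES = FALSE))

out <- cbind(eg, data.frame(res))

# Derived proportion, computed the same way as ij.over in the question.
out$prop <- out$overlap / (out$area.1 + out$area.2 - out$overlap)
out
```

Because the stand-in returns a named vector, the columns come out as area.1, area.2 and overlap rather than X1 through X3; with the real function the names depend on what maxLikOverlap() returns.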
mapply calls the function (its first argument) once for each pair of elements taken from the vectors/lists you provide after it; any further named arguments you pass to mapply are handed on to the function. For instance, it "unrolls" to
maxLikOverlap("z.a", "z.a", siber.test, p.interval = NULL, n = 100)
maxLikOverlap("z.b", "z.a", siber.test, p.interval = NULL, n = 100)
maxLikOverlap("z.c", "z.a", siber.test, p.interval = NULL, n = 100)
...
and returns the results as a vector/array/matrix (if all return values are vectors of the same length and class) or a list (otherwise).
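The simplification behaviour can be checked with a small base R sketch; the toy functions below are made up for illustration and have nothing to do with SIBER.

```r
# Every call returns a length-2 numeric vector, so the result is
# simplified to a 2 x 3 matrix (one column per call).
m <- mapply(function(x, y) c(sum = x + y, prod = x * y), 1:3, 4:6)
dim(m)

# Return values of different lengths cannot be simplified: a list comes back.
l <- mapply(function(x, n) rep(x, n), 1:3, 1:3)
class(l)

# SIMPLIFY = FALSE always returns a list, which can be row-bound afterwards.
l2 <- mapply(function(x, y) c(sum = x + y, prod = x * y), 1:3, 4:6,
             SIMPLIFY = FALSE)
rb <- do.call(rbind, l2)
dim(rb)
```

Using SIMPLIFY = FALSE followed by do.call(rbind, ...) is an alternative to transposing the simplified matrix; both give one row per pair.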
Issues where your code is breaking:
- vec=c(df1[i,],df2[j,],ij.overlap,ij.overlap.95,ij.over,ij.over.95) is concatenating strings (from df1[i,] and df2[j,]) with the numeric return value in ij.overlap; try c("A", 1) to see what happens to the 1. You should not do this. If you want to combine them into one row, I suggest something like
  tmpframe <- data.frame(arg1=df1[i,], arg2=df2[j,], area.1=ij.overlap[1], ...) # fill in rest
- mylist[[i]]=vec is overwriting the previous j-iteration each time in the loop, which I don't think is intentional. Instead, you might try mylist <- c(mylist, list(tmpframe)), and then, after your loops, do your do.call("rbind", mylist).
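Both issues can be reproduced and fixed in a short base R sketch. As before, fake_overlap() is a hypothetical stand-in for maxLikOverlap() (not a SIBER function), and g1 and g2 are plain character vectors used in place of the one-column data frames df1 and df2.

```r
# Mixing character and numeric in c() coerces everything to character.
x <- c("A", 1)
class(x)

# Hypothetical stand-in for maxLikOverlap(); returns a named numeric vector.
fake_overlap <- function(a, b) {
  c(area.1 = 1, area.2 = 2, overlap = as.numeric(a == b))
}

g1 <- c("z.a", "z.b", "z.c")
g2 <- c("z.a", "z.b")

mylist <- list()
for (i in seq_along(g1)) {
  for (j in seq_along(g2)) {
    ov <- fake_overlap(g1[i], g2[j])
    # A one-row data frame keeps labels as text and areas as numbers.
    tmpframe <- data.frame(arg1 = g1[i], arg2 = g2[j], t(ov))
    # Append; assigning to mylist[[i]] would be overwritten for each j.
    mylist <- c(mylist, list(tmpframe))
  }
}
df <- do.call("rbind", mylist)
dim(df)
```

With three values in g1 and two in g2, the result has six rows, one per (i, j) pair, instead of the three rows that indexing by i alone would leave behind.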
Answer 2
Score: 0
Instead of using nested for-loops, you could:
- Create two ellipse vectors of the same length that you can iterate over simultaneously (df1, df2 in your code).
- Iterate over the two ellipse vectors (using purrr::map2(), or base::mapply()) to obtain the results for each ellipse combination.
- Row bind (dplyr::bind_rows()) the results and add the ellipse vectors as columns.
library(tidyverse)
library(GGally)
library(SIBER)
# I got an error with the data you shared; is it maybe truncated? Instead, I'm using
# the demo data provided in the `SIBER` package.
data(demo.siber.data)
# Modify the demo data to be more similar to your data.
si.test <-
  demo.siber.data |>
  mutate(group = factor(group, labels = letters[1:3]),
         community = factor(community, labels = rev(letters)[1:2]))
# The code below should work for your data as is.
siber.test <- createSiberObject(si.test)
# This extracts all group, and community combinations found in the data. Make sure to adjust
# this only to include the combinations you are looking for.
ellipses1 <-
  distinct(si.test, group, community) |>
  mutate(ellipse = paste(community, group, sep = ".")) |>
  pull(ellipse)
# This will create the second ellipse vector.
ellipses2 <- rep_len(ellipses1[1:2], length(ellipses1))
# Using `purrr::map2()` we can simultaneously iterate over ellipses1 and ellipses2.
# Each iteration feeds one element of each vector into `maxLikOverlap()`.
# `bind_rows()` combines everything into a data.frame/tibble. `mutate()` adds
# the ellipse vectors. `relocate()` places the ellipse columns before the
# value columns.
map2(
  ellipses1,
  ellipses2,
  maxLikOverlap,
  siber.object = siber.test,
  p.interval = NULL,
  n = 100
) |>
  bind_rows() |>
  mutate(elipse1 = ellipses1,
         elipse2 = ellipses2) |>
  relocate(elipse1, elipse2, .before = area.1)
#> # A tibble: 6 × 5
#> elipse1 elipse2 area.1 area.2 overlap
#> <chr> <chr> <dbl> <dbl> <dbl>
#> 1 z.a z.a 5.99 5.99 5.99
#> 2 z.b z.b 3.37 3.37 3.37
#> 3 z.c z.a 5.31 5.99 0
#> 4 y.a z.b 0.893 3.37 0
#> 5 y.b z.a 3.58 5.99 0
#> 6 y.c z.b 0.459 3.37 0
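Note that rep_len() recycles ellipses1[1:2] alongside ellipses1, so each ellipse is paired with only one of the two reference ellipses, which is why the output above has six rows. If every combination is wanted, as in the question's expected output, one possible adjustment is to build the two vectors with expand.grid(). The sketch below is base R only and uses a hard-coded ellipses1 matching the output above; it does not call SIBER.

```r
# Ellipse labels as they appear in the output above (hard-coded here).
ellipses1 <- c("z.a", "z.b", "z.c", "y.a", "y.b", "y.c")

# Recycling: element k of ellipses1 is paired with element k of the
# recycled reference vector, giving 6 pairs.
paired <- data.frame(e1 = ellipses1,
                     e2 = rep_len(ellipses1[1:2], length(ellipses1)))
nrow(paired)

# Full crossing: every ellipse against both reference ellipses, 12 pairs.
crossed <- expand.grid(e1 = ellipses1, e2 = ellipses1[1:2],
                       stringsAsFactors = FALSE)
nrow(crossed)
```

The columns crossed$e1 and crossed$e2 could then be passed to map2() or mapply() in place of ellipses1 and ellipses2.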