R - nested for loops not fully populating a data frame

Question

I am trying to create a dataframe from the output of a function that takes two different 'character' arguments. I am using nested for loops, but the dataframe only ends up with the results for the last value of the second argument. Can you please help me? My code first makes a siber object and then runs the for loops. See below:

First, here are the R packages I have in my code: "tidyverse","SIBER","GGally","reshape2"

Here is the data I am working with (si.test in the code below):

     iso1 iso2 group community
1   -8.75 8.87     a         z
2   -9.21 8.53     a         z
3   -8.52 8.32     a         z
4  -10.69 5.79     a         z
5  -10.08 7.67     a         z
6  -14.88 7.16     a         z
7  -15.22 6.43     a         z
8  -14.09 6.70     a         z
9  -14.84 6.80     a         z
10 -13.76 6.40     a         z
11 -14.10 5.79     a         z
12 -13.92 6.16     a         z
13 -12.11 6.07     a         z
14 -14.48 6.48     a         z
15 -10.33 6.45     a         z
16 -10.85 7.01     a         z
17 -11.75 6.62     a         z
18 -11.64 6.78     a         z
19 -11.60 4.97     a         z
20 -11.91 7.86     a         z
21 -11.34 6.68     a         z
22  -9.80 7.78     a         z
23 -11.19 7.43     a         z
24 -10.74 6.85     a         z
25 -10.57 7.16     a         z
26 -10.29 5.71     a         z
27 -11.21 7.80     a         z
28 -11.69 7.85     a         z
29 -10.89 7.62     a         z
30 -10.18 4.73     b         z
31  -9.20 6.63     b         z
32 -14.28 5.47     b         z
33 -12.80 4.21     b         z
34 -14.87 5.72     b         z
35 -12.43 6.13     b         z
36 -12.60 5.02     b         z
37 -11.06 5.06     b         z
38 -10.65 5.63     b         z
39 -13.69 2.95     c         z
40 -13.19 2.41     c         z
41 -14.41 3.11     c         z
42 -11.47 4.88     d         z
43 -11.73 4.79     d         z
44 -12.62 5.59     d         z
45 -11.57 3.63     e         z
46 -11.79 2.75     e         z
47 -12.67 4.93     f         z
48 -12.50 5.09     f         z
49 -13.18 5.11     f         z
50 -12.71 5.36     f         z
51 -10.24 5.49     f         z
52 -10.10 4.46     f         z
53 -10.54 4.09     f         z
54 -10.46 4.22     f         z
55 -11.05 4.78     f         z
56 -11.08 4.67     f         z
57 -11.64 4.19     f         z
58 -11.61 4.82     f         z
59 -11.22 4.50     f         z
60  -6.00 1.29     g         z
61  -7.30 4.08     g         z
62  -7.30 1.68     g         z
63  -6.90 1.87     g         z
64  -8.10 1.13     g         z
65  -5.10 2.54     g         z
66  -6.90 2.38     g         z
67  -6.50 4.00     g         z
68  -7.10 3.60     g         z
69  -5.30 2.31     g         z
70  -7.30 3.97     g         z
71  -4.20 1.03     g         z
72 -10.50 2.85     g         z
73  -8.68 2.61     g         z
74  -8.04 3.06     g         z
75 -14.33 2.13     e         z
76 -12.05 2.21     e         z
77 -12.06 2.45     e         z
78 -13.18 2.84     e         z
79 -12.26 1.35     e         z
80 -13.14 3.01     e         z
81 -14.20 3.55     e         z
82 -13.56 3.36     e         z
83 -11.98 2.93     e         z
84 -14.49 2.68     e         z
85 -14.45 3.00     e         z
86 -15.08 2.32     e         z

Here is my code. Note: df1 and df2 hold combinations of the "community" and "group" variables in my data (written as community.group), because that is how the groups are named in the siber.test object I made.

siber.test = createSiberObject(si.test)

df1 <- data.frame(c("z.a","z.b","z.c","z.d","z.e","z.f","z.g"))
df2 <- data.frame(c("z.a","z.b"))

mylist <- list() #create an empty list
for (i in 1:nrow(df1)) {
  for (j in 1:nrow(df2)) {
    
    ij.overlap <- maxLikOverlap(df1[i,],df2[j,], siber.test, 
                                p.interval = NULL, n = 100)
    ij.over=ij.overlap[3]/(ij.overlap[2]+ij.overlap[1]-ij.overlap[3])
    vec=c(df1[i,],df2[j,],ij.overlap,ij.over)
    mylist[[i]]=vec
  }
}
df=do.call("rbind",mylist)

Right now, the code only gives me the rows for the last value of df2 (z.b), see below:


                 area.1              area.2             overlap                overlap               
[1,] "z.a" "z.b" "4.93603972039955"  "4.89919820826708" "0.0841731887902768"   "0.00863220489615416" 
[2,] "z.b" "z.b" "4.89919820826708"  "4.89919820826708" "4.89919820826767"     "1.00000000000024"    
[3,] "z.c" "z.b" "0.559725304862695" "4.89919820826708" "9.54097911787244e-18" "1.74777666236292e-18"
[4,] "z.d" "z.b" "0.522204858582882" "4.89919820826708" "0.45393111227755"     "0.0913807096303221"  
[5,] "z.e" "z.b" "2.49971788496032"  "4.89919820826708" "4.85722573273506e-17" "6.56478012662033e-18"
[6,] "z.f" "z.b" "1.38774348908464"  "4.89919820826708" "0.742263994748963"    "0.133869637616724"   
[7,] "z.g" "z.b" "5.21007280975346"  "4.89919820826708" "0"                    "0"     

Instead, this is what I want to get:

                 area.1              area.2             overlap                overlap               
[1,] "z.a" "z.a" "4.93603972039955"  "4.93603972039955" "4.93603972040026"     "1.00000000000029"    
[2,] "z.b" "z.a" "4.89919820826708"  "4.93603972039955" "0.0841731887902741"   "0.00863220489615387" 
[3,] "z.c" "z.a" "0.559725304862695" "4.93603972039955" "8.67361737988404e-18" "1.57823657671211e-18"
[4,] "z.d" "z.a" "0.522204858582882" "4.93603972039955" "6.93889390390723e-18" "1.27126840937583e-18"
[5,] "z.e" "z.a" "2.49971788496032"  "4.93603972039955" "2.77555756156289e-17" "3.73271656887014e-18"
[6,] "z.f" "z.a" "1.38774348908464"  "4.93603972039955" "3.64291929955129e-17" "5.76066442962145e-18"
[7,] "z.g" "z.a" "5.21007280975346"  "4.93603972039955" "0"                    "0"   
[8,] "z.a" "z.b" "4.93603972039955"  "4.89919820826708" "0.0841731887902768"   "0.00863220489615416" 
[9,] "z.b" "z.b" "4.89919820826708"  "4.89919820826708" "4.89919820826767"     "1.00000000000024"    
[10,] "z.c" "z.b" "0.559725304862695" "4.89919820826708" "9.54097911787244e-18" "1.74777666236292e-18"
[11,] "z.d" "z.b" "0.522204858582882" "4.89919820826708" "0.45393111227755"     "0.0913807096303221"  
[12,] "z.e" "z.b" "2.49971788496032"  "4.89919820826708" "4.85722573273506e-17" "6.56478012662033e-18"
[13,] "z.f" "z.b" "1.38774348908464"  "4.89919820826708" "0.742263994748963"    "0.133869637616724"   
[14,] "z.g" "z.b" "5.21007280975346"  "4.89919820826708" "0"                    "0"             

After several helpful comments below, I followed the recommendation to build a data frame on each iteration and append it to mylist instead. Note that I had to wrap ij.overlap and ij.over in the transpose function t() inside the data.frame() call for tmpframe: ij.overlap holds 3 values, and without t() each value would end up in its own row instead of its own column.

siber.test = createSiberObject(si.test)

df1 <- data.frame(c("z.a","z.b","z.c","z.d","z.e","z.f","z.g"))
df2 <- data.frame(c("z.a","z.b"))

mylist <- list() #create an empty list
for (i in 1:nrow(df1)) {
  for (j in 1:nrow(df2)) {
    
    ij.overlap <- maxLikOverlap(df1[i,],df2[j,], siber.test, 
                                p.interval = NULL, n = 100)
    ij.over=ij.overlap[3]/(ij.overlap[2]+ij.overlap[1]-ij.overlap[3])
    tmpframe <- data.frame(df1[i,],df2[j,],t(ij.overlap),t(ij.over))
    mylist <- c(mylist, list(tmpframe))
  }
}
df=do.call("rbind",mylist)
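The t() step described above can be illustrated with a minimal base-R sketch. The values in `ov` are invented for illustration; only the names (`area.1`, `area.2`, `overlap`) mirror the maxLikOverlap() output shown earlier, and no SIBER call is made.

```r
# Invented values; only the names mirror the maxLikOverlap() output above.
ov <- c(area.1 = 4.94, area.2 = 4.90, overlap = 0.08)

# Without t(): the vector becomes a single column spread over 3 rows.
long <- data.frame(pair1 = "z.a", pair2 = "z.b", ov)

# With t(): a 1 x 3 matrix, so each named value gets its own column.
wide <- data.frame(pair1 = "z.a", pair2 = "z.b", t(ov))

print(long)
print(wide)
```

The one-row shape of `wide` is what makes the later do.call("rbind", mylist) produce one row per pair.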

Answer 1

Score: 2

I don't have SIBER, but perhaps this will work: it first creates every combination of the values in df1 and df2, then runs the function on each pair.

We can use mapply (to replace your two for loops); here it returns a matrix (which needs to be transposed, easy enough), which is fine because the return value from maxLikOverlap is a numeric vector of length 3.

eg <- expand.grid(
  a = c("z.a","z.b","z.c","z.d","z.e","z.f","z.g"),
  b = c("z.a","z.b"),
  stringsAsFactors = FALSE)
eg
#      a   b
# 1  z.a z.a
# 2  z.b z.a
# 3  z.c z.a
# 4  z.d z.a
# 5  z.e z.a
# 6  z.f z.a
# 7  z.g z.a
# 8  z.a z.b
# 9  z.b z.b
# 10 z.c z.b
# 11 z.d z.b
# 12 z.e z.b
# 13 z.f z.b
# 14 z.g z.b

res <- t(mapply(function(a, b) maxLikOverlap(a, b, siber.test, p.interval = NULL, n = 100), 
                eg$a, eg$b))

From here, you can call cbind(eg, data.frame(res)), and you'll have columns X1 through X3 (or the names of the vector returned by maxLikOverlap, if it is named), which you can rename as you prefer.
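A runnable sketch of the same mapply/t/cbind pipeline, assuming a hypothetical `fake_overlap()` as a stand-in for maxLikOverlap() so that it runs without SIBER. The stand-in only mimics the shape of the result (a named numeric vector of length 3); its numbers are made up. `USE.NAMES = FALSE` is an addition here, not part of the answer's code.

```r
# Hypothetical stand-in for SIBER's maxLikOverlap(): same shape of result
# (a named numeric vector of length 3), invented values.
fake_overlap <- function(e1, e2) {
  c(area.1 = 1, area.2 = 2, overlap = as.numeric(e1 == e2))
}

eg <- expand.grid(
  a = c("z.a", "z.b", "z.c", "z.d", "z.e", "z.f", "z.g"),
  b = c("z.a", "z.b"),
  stringsAsFactors = FALSE)

# mapply() returns a 3 x 14 matrix (one column per pair); t() turns it into
# 14 x 3. USE.NAMES = FALSE stops mapply() from naming the results after eg$a,
# which contains duplicates.
res <- t(mapply(fake_overlap, eg$a, eg$b, USE.NAMES = FALSE))

# One row per pair, with the pair labels next to the three values.
out <- cbind(eg, data.frame(res))
print(head(out))
```

With the real function, the three value columns would take whatever names maxLikOverlap() gives its return vector.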

mapply calls the function (its first argument) once for each pair of elements from the vectors/lists you provide after it; arguments that stay the same in every call can be fixed inside the function, as above, or passed via mapply's MoreArgs. For instance, it "unrolls" to

maxLikOverlap("z.a", "z.a", siber.test, p.interval = NULL, n = 100)
maxLikOverlap("z.b", "z.a", siber.test, p.interval = NULL, n = 100)
maxLikOverlap("z.c", "z.a", siber.test, p.interval = NULL, n = 100)
...

and returns the results as a vector/array/matrix (if all return values have the same length) or a list (otherwise).
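A quick illustration of those simplification rules with plain arithmetic functions (nothing SIBER-specific):

```r
# Length-1 results are simplified to a plain vector.
v <- mapply(function(x, y) x + y, 1:3, 4:6)

# Equal-length results (3 values per call) become a matrix with one column per
# call, which is why res above is wrapped in t().
m <- mapply(function(x, y) c(x, y, x + y), 1:3, 4:6)

# Results of different lengths cannot be simplified, so a list comes back.
l <- mapply(function(x, y) seq_len(x + y), 1:2, 0:1)

print(v)
print(m)
str(l)
```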


Where your code is breaking:

  • vec=c(df1[i,],df2[j,],ij.overlap,ij.overlap.95,ij.over,ij.over.95)
    is concatenating strings (from df1[i,] and df2[j,]) with the numeric return value in ij.overlap; run c("A", 1) to see what happens to the 1 (it becomes the string "1"). You should not do this. If you want to combine them into one row, I suggest something like

    tmpframe <- data.frame(arg1=df1[i,], arg2=df2[j,], area.1=ij.overlap[1], ...) # fill in rest
    
  • mylist[[i]]=vec overwrites the result of the previous j iteration each time through the inner loop, because the list index depends only on i; I don't think this is intentional. Instead, you might try mylist <- c(mylist, list(tmpframe)), and then, after your loops, call do.call("rbind", mylist).
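A self-contained sketch of both points, again assuming a hypothetical `fake_overlap()` in place of maxLikOverlap() (invented values, same length-3 named shape). It shows the coercion from c(), the overwriting from `mylist[[i]]`, and the append-then-rbind fix. Putting the second vector in the outer loop is a choice made here so the rows come out in the order of the desired output in the question.

```r
# Part 1: c() coerces mixed inputs to a common type, here character.
mixed <- c("A", 1)
print(mixed)

# Hypothetical stand-in for SIBER's maxLikOverlap(): it only mimics the shape
# of the result (a named numeric vector of length 3), not the real calculation.
fake_overlap <- function(e1, e2) {
  c(area.1 = 1, area.2 = 2, overlap = as.numeric(e1 == e2))
}

ell1 <- c("z.a", "z.b", "z.c")
ell2 <- c("z.a", "z.b")

# Part 2: indexing the list by i alone lets each j iteration overwrite the last.
bad <- list()
for (i in seq_along(ell1)) {
  for (j in seq_along(ell2)) {
    bad[[i]] <- c(ell1[i], ell2[j])
  }
}
print(length(bad))  # one entry per ell1 value, each paired with the last ell2 value

# Part 3: append a one-row data frame per (i, j) pair, then rbind once at the end.
good <- list()
for (j in seq_along(ell2)) {
  for (i in seq_along(ell1)) {
    ov <- fake_overlap(ell1[i], ell2[j])
    tmpframe <- data.frame(arg1 = ell1[i], arg2 = ell2[j], t(ov),
                           stringsAsFactors = FALSE)
    good <- c(good, list(tmpframe))
  }
}
result <- do.call("rbind", good)
print(result)
```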

Answer 2

Score: 0

Instead of using nested for-loops, you could:

  1. Create two ellipse vectors of the same length that you can iterate over simultaneously (df1, df2 in your code).
  2. Iterate over the two ellipse vectors (using purrr::map2(), or base::mapply()) to obtain the results for each ellipse combination.
  3. Row bind (dplyr::bind_rows()) the results and add the ellipse vectors as columns.
library(tidyverse)
library(GGally)
library(SIBER)

# I got an error with the data you shared; is it maybe truncated? Instead, I'm using
# the demo data provided in the `SIBER` package.
data(demo.siber.data)

# Modify the demo data to be more similar to your data.
si.test <- 
  demo.siber.data |> 
  mutate(group = factor(group, labels = letters[1:3]),
         community = factor(community, labels = rev(letters)[1:2]))

# The code below should work for your data as is.

siber.test &lt;- createSiberObject(si.test)


# This extracts all group, and community combinations found in the data. Make sure to adjust
# this only to include the combinations you are looking for.
ellipses1 <- 
  distinct(si.test, group, community) |> 
  mutate(ellipse = paste(community, group, sep = ".")) |> 
  pull(ellipse)

# This will create the second ellipse vector.
ellipses2 &lt;- rep_len(ellipses1[1:2], length(ellipses1))

# Using `purrr::map2()` we can simultaneously iterate over ellipses1 and ellipses2.
# Each iteration feeds one element of each vector into `maxLikOverlap()`.
# `bind_rows()` combines everything into a data.frame/tibble. `mutate()` adds
# the ellipse vectors. `relocate()` places the ellipse columns before the
# value columns.

map2(
  ellipses1,
  ellipses2,
  maxLikOverlap,
  siber.object = siber.test,
  p.interval = NULL,
  n = 100
) |> 
  bind_rows() |> 
  mutate(elipse1 = ellipses1,
         elipse2 = ellipses2) |> 
  relocate(elipse1, elipse2, .before = area.1)
#> # A tibble: 6 × 5
#>   elipse1 elipse2 area.1 area.2 overlap
#>   <chr>   <chr>    <dbl>  <dbl>   <dbl>
#> 1 z.a     z.a      5.99    5.99    5.99
#> 2 z.b     z.b      3.37    3.37    3.37
#> 3 z.c     z.a      5.31    5.99    0   
#> 4 y.a     z.b      0.893   3.37    0   
#> 5 y.b     z.a      3.58    5.99    0   
#> 6 y.c     z.b      0.459   3.37    0
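For readers without the tidyverse, a base-R sketch of the same steps, assuming a small made-up data frame with only the `group` and `community` columns and the hypothetical `fake_overlap()` stand-in for maxLikOverlap(). `Map()` plays the role of purrr::map2(). Note that recycling with rep_len() pairs each ellipse with one partner only; if every combination is needed (14 rows for the question's data), the expand.grid() approach from Answer 1 produces those pairs instead.

```r
# Made-up rows in the shape of the question's data (iso columns omitted,
# since only group and community are needed to build the labels).
si <- data.frame(
  group     = c("a", "a", "b", "c", "c"),
  community = c("z", "z", "z", "z", "z"),
  stringsAsFactors = FALSE)

# "community.group" labels, one per distinct combination (like ellipses1).
ellipses1 <- unique(paste(si$community, si$group, sep = "."))

# Second vector recycled to the same length (like ellipses2).
ellipses2 <- rep_len(ellipses1[1:2], length(ellipses1))

# Hypothetical stand-in for maxLikOverlap(): invented values, length-3 shape.
fake_overlap <- function(e1, e2) {
  c(area.1 = 1, area.2 = 2, overlap = as.numeric(e1 == e2))
}

# Map() iterates over both vectors in parallel; rbind() stacks the results.
rows <- do.call(rbind, Map(fake_overlap, ellipses1, ellipses2))
out <- data.frame(ellipse1 = ellipses1, ellipse2 = ellipses2, rows,
                  row.names = NULL, stringsAsFactors = FALSE)
print(out)
```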

huangapple
  • This article was published on 5 August 2023 at 01:10:52
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76837947.html