Issue outputting data generated from a user-defined function that uses sapply()

Question

I am trying to summarize data generated by looping over a data set with sapply(), and I am not sure why I can't access the data frame that is generated. I'm setting up a Monte Carlo simulation: I have a function that sets up parameters and produces an estimate, and I want to apply it a set number of times to each data point in a data set. To do that, I call replicate() on my function inside another function that uses sapply(). It appears to work, but I can't access the data to describe the resulting distributions of estimates; the data frame is printed rather than output. That's nice, but I now need to calculate means and confidence intervals from the estimates, and probably plot some of them.

I'm not super experienced with R, but here is basically what I'm trying to do:

repapply <- function(iter, datapoint){
  estimates <- sapply(datapoint, function(datapoint){
    replicate(n=iter, expr=generate_data(datapoint))
  })
  estimatesdf <- data.frame(estimates)
  return(estimatesdf)
}
#test:
repapply(1000, measure)

Could anyone explain why there isn't any output dataframe? It prints the information below:

X1                                                   X2
1 476.454335, 6.240725, 4.433396, 24.017384, 36.900104 594.890067, 2.310075, 7.210158, 21.379092, 30.256849
X3                                                   X4
1 359.167706, 5.817891, 7.276368, 20.776742, 23.539489 459.848602, 3.826445, 4.319803, 23.774576, 52.130509
X5                                                   X6
1 504.624220, 8.159456, 4.110860, 23.805009, 42.983076 578.252014, 6.749054, 5.880862, 23.312351, 42.320465
X7                                                   X8
1 427.196750, 7.458934, 3.295953, 24.764725, 45.647360 284.724297, 5.234101, 6.481678, 20.159478, 42.160186
X9                                                  X10
1 307.605356, 4.386591, 5.562230, 22.711697, 3.675961 418.109465, 5.618156, 3.135784, 24.503502, 34.891379

 ...

Answer 1

Score: 1

Welcome to SO!

Short Answer:

You have to assign the result of repapply(1000, measure) to an object. In other words, you have to give it a name. For instance:

df <- repapply(1000, measure)
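
Once the result has a name, it can be summarized like any other data frame. The sketch below is only an illustration: generate_data() and measure here are hypothetical stand-ins (a single noisy draw per call and three made-up data points), since the asker's real versions are not shown, and the intervals are plain 2.5% and 97.5% percentiles of the simulated estimates, which may or may not be the kind of interval the asker wants.

```r
# Hypothetical stand-in for the asker's generate_data():
# returns one noisy estimate per call.
generate_data <- function(datapoint) {
  rnorm(1, mean = datapoint, sd = 1)
}

repapply <- function(iter, datapoint) {
  # One column per data point, one row per replicate.
  estimates <- sapply(datapoint, function(d) {
    replicate(n = iter, expr = generate_data(d))
  })
  data.frame(estimates)
}

set.seed(42)
measure <- c(10, 20, 30)        # hypothetical input data

df <- repapply(1000, measure)   # stored under the name df, not just printed

# Mean of the simulated estimates for each data point.
col_means <- colMeans(df)

# 2.5% and 97.5% percentiles of the estimates for each data point.
intervals <- sapply(df, quantile, probs = c(0.025, 0.975))

col_means
intervals
```

With a stored df, calls such as hist(df$X1) work in the same way for plotting.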

Why:

Objects you create inside a function are local to that function's environment and disappear when it returns. So return(estimatesdf) hands back the data.frame itself, not the name estimatesdf. When you call repapply(1000, measure) at the top level without assigning the result, R prints the returned value and then discards it. For the same reason, you could compress the last two lines of your function into return(data.frame(estimates)) and get the same result.
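
A minimal sketch of that behaviour, using a made-up function make_df() and a made-up local name local_df rather than the asker's code:

```r
make_df <- function() {
  local_df <- data.frame(x = 1:3)  # local_df exists only inside this call
  local_df                         # the last expression is the return value
}

make_df()          # at the top level the value is printed, then discarded

out <- make_df()   # the same value, now bound to the name out

# Check whether the local name leaked out of the function.
local_visible <- exists("local_df")
local_visible
```

The name used inside the function never appears in the calling environment; only the value travels back, and it is kept only if it is assigned.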


Alternatively:

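A function can also change an object that already exists outside its own scope, but in R only through the superassignment operator <<-. A regular <- inside the function always creates a local object and leaves a same-named outer object untouched, so defining estimatesdf outside the function (e.g., by setting it equal to 0) and dropping the return() call is not enough on its own. If the function instead ran estimatesdf <<- data.frame(estimates), then repapply(1000, measure) would set the outer estimatesdf to the desired data.frame. Relying on that kind of side effect is generally discouraged; returning the value and assigning it, as in the short answer, is the more usual approach.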
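
A small sketch of how R treats assignment inside a function, for anyone who wants to check this locally. The function names assign_local() and assign_super() are invented for the illustration; estimatesdf follows the question.

```r
estimatesdf <- 0   # pre-existing object outside any function

assign_local <- function() {
  # Regular assignment: creates a new local estimatesdf,
  # the outer one is left alone.
  estimatesdf <- data.frame(x = 1:3)
  invisible(NULL)
}

assign_local()
after_local <- estimatesdf   # copy of the outer object after the call

assign_super <- function() {
  # Superassignment: rebinds estimatesdf in the enclosing environment.
  estimatesdf <<- data.frame(x = 1:3)
  invisible(NULL)
}

assign_super()

is.data.frame(after_local)   # outer object as it was after assign_local()
is.data.frame(estimatesdf)   # outer object after assign_super()
```

Only the `<<-` version changes the outer object, which is why the return-and-assign pattern tends to be easier to reason about.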

huangapple
  • Posted on May 25, 2023, 03:57:50
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76327006.html