How to get the response (predict) of each tree from a "party" random forest (cforest)?

Question

I trained a random forest with party::cforest (n_trees trees) for a regression problem (continuous response).
When using predict(type = "response"), what one gets is only the mean of all n_trees responses.
How do I get the response of each individual tree (that is, n_trees responses)?
Thank you very much! I've been trying for weeks and I'm still clueless!

I also tried training the forest with partykit, but I still cannot find a way of getting all responses.
In the documentation there is an example with a quantile function. I tried getting the median of all responses (if I can't get all answers explicitly, I thought I could at least get some statistics from them) with function(y, w) median(y), but that gives me the same value for all data points. So I didn't really understand how FUN is supposed to work in partykit's predict().

I also tried predict(type = "prob"), as suggested in other posts for classification random forests, but with that I got the error "cannot compute empirical distribution function with non-integer weights".

So I remain clueless.
Thank you for any help!

Answer 1

Score: 0

The ntree individual predictions are actually not computed within cforest(). Instead the predictions of the forest are computed as weighted means of the original responses, where the weights depend on the new data points.
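
To make the "weighted mean of the original responses" idea concrete, here is a minimal base R sketch. The responses y and the weights w are made-up numbers for illustration only; in a real forest the weights would, roughly speaking, reflect how often each training observation falls into the same terminal node as the new data point across the trees.

```r
# Illustrative numbers only (not taken from the cars example).
y <- c(10, 20, 30, 40)   # observed training responses
w <- c(2, 1, 0, 1)       # hypothetical weights for one new data point

# Forest-style prediction: weighted mean of the original responses
pred <- sum(w * y) / sum(w)

# Same result with base R's weighted.mean()
stopifnot(isTRUE(all.equal(pred, weighted.mean(y, w))))
pred
```

This may also explain why function(y, w) median(y) returns the same value for every data point: if the function ignores w, nothing in it depends on the new observation.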

However, you can set up the ntree individual trees and compute the predictions yourself. All the necessary information is in the cforest object.

Let's consider the following simple example for the cars data using a forest with only 10 trees:

library("partykit")
set.seed(1)
cf <- cforest(dist ~ speed, data = cars, ntree = 10)

Then you can obtain the predictions for two new data points:

nd <- data.frame(speed = c(10, 20))
predict(cf, newdata = nd)
##        1        2
## 22.65411 63.11666

Now to replicate this we can also set up the 10 individual trees from the forest. For this we use the constparty class as also returned by ctree():

ct <- lapply(seq_along(cf$nodes), function(i) as.constparty(
  party(cf$nodes[[i]], data = cf$data, terms = cf$terms,
    fitted = data.frame(
      `(response)` = cf$fitted[["(response)"]],
      `(weights)` = cf$weights[[i]],
      check.names = FALSE))
))

You can then apply the predict() method to each of the 10 constparty trees to obtain the 10 individual predictions, and check that their mean reproduces the forest prediction:

p <- sapply(ct, predict, newdata = nd)
dim(p)
## [1]  2 10
rowMeans(p)
##        1        2 
## 22.65411 63.11666 

But now you can also inspect the full 2 x 10 matrix p with the predictions from all individual trees.
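
Once such a matrix is available (rows = new data points, columns = trees), per-point statistics across trees, such as the median the question asks about, can be computed row by row. The following is only a sketch: summarise_trees() is a hypothetical helper name, and p_demo is a small made-up stand-in so that the snippet runs without partykit; with the real forest you would pass p instead.

```r
# Hypothetical helper: summarise per-tree predictions for each data point.
# 'p' is expected to be a numeric matrix with one row per new data point
# and one column per tree.
summarise_trees <- function(p) {
  cbind(
    mean   = rowMeans(p),
    median = apply(p, 1, median),
    sd     = apply(p, 1, sd),
    q10    = unname(apply(p, 1, quantile, probs = 0.1)),
    q90    = unname(apply(p, 1, quantile, probs = 0.9))
  )
}

# Made-up stand-in for p: 2 data points, 3 trees
p_demo <- matrix(c(20, 25, 22,
                   60, 66, 63), nrow = 2, byrow = TRUE)

s <- summarise_trees(p_demo)
s
```

A single tree's predictions are simply one column of the matrix, e.g. p[, 3] for the third tree.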

Posted by huangapple on 2023-08-05 01:42:38
Link to this article (please keep it when reposting): https://go.coder-hub.com/76838122.html