Finding the precision, recall and the f1 in R
Question
I want to run several models in a loop and then store the performance metrics in a table. I do not want to use the confusionMatrix function in caret, but I still want to compute the precision, recall and F1 and store those in a table. Please assist; edits to the code are welcome.
My attempt is below.
library(MASS) #will load our biopsy data
library(caret)
data("biopsy")
biopsy$ID <- NULL
names(biopsy) <- c('clump thickness','uniformity cell size','uniformity cell shape',
'marginal adhesion','single epithelial cell size','bare nuclei',
'bland chromatin','normal nuclei','mitosis','class')
sum(is.na(biopsy))
biopsy <- na.omit(biopsy)
sum(is.na(biopsy))
head(biopsy, 5)
set.seed(123)
inTraining <- createDataPartition(biopsy$class, p = .75, list = FALSE)
training <- biopsy[inTraining,]
testing <- biopsy[-inTraining,]
# Run algorithms using 10-fold cross validation
control <- trainControl(method="repeatedcv", number=10, repeats = 5, verboseIter = FALSE, classProbs = TRUE)
# CHANGING THE CHARACTERS INTO FACTORS VARIABLES
training <- as.data.frame(unclass(training),
stringsAsFactors = TRUE)
# CHANGING THE CHARACTERS INTO FACTORS VARIABLES
testing <- as.data.frame(unclass(testing),
stringsAsFactors = TRUE)
models <- c("svmRadial", "rf")
results_table <- data.frame(models = models, stringsAsFactors = FALSE)
for (i in models){
model_train <- train(class ~ ., data = training, method = i,
trControl = control, metric = "Accuracy")
predictions <- predict(model_train, newdata = testing)
precision_ <- posPredValue(predictions, testing)
recall_ <- sensitivity(predictions, testing)
f1 <- (2 * precision_ * recall_) / (precision_ + recall_)
# put that in the results table
results_table[i, "Precision"] <- precision_
results_table[i, "Recall"] <- recall_
results_table[i, "F1score"] <- f1
}
However, I get this error:
Error in posPredValue.default(predictions, testing) : inputs must be factors
I do not know where I went wrong, and any edits to my code are welcome.
I know that I could get precision, recall and F1 by just using the code below (B); however, this is a tutorial question where I am required not to use that approach:
(B)
for (i in models){
model_train <- train(class ~ ., data = training, method = i,
trControl = control, metric = "Accuracy")
predictions <- predict(model_train, newdata = testing)
print(confusionMatrix(predictions, testing$class, mode = "prec_recall"))
}
Answer 1
Score: 1
A few things need to happen.
1. You have to change the function calls for posPredValue and sensitivity. Both expect a factor of reference labels, but testing is the whole data frame. For both, change testing to testing$class.
2. For results_table, i is a character string such as "rf", not a row number, so you're assigning results_table["rf", "Precision"] <- precision_ (this makes a new row, where the row name is "rf").
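To see why the second point matters, here is a small sketch on a toy data frame. The object names bad and good and the value 0.97 are made up for illustration. Indexing by a character string that is not an existing row name appends a row, while matching on the models column updates the row that is already there.

```r
# Toy table with the same shape as results_table in the question
results_table <- data.frame(models = c("svmRadial", "rf"), stringsAsFactors = FALSE)

# Row index is the string "rf": no row has that name, so a new row is appended
bad <- results_table
bad["rf", "Precision"] <- 0.97
print(bad)
nrow(bad)

# Row index is a logical match on the models column: the existing row is updated
good <- results_table
good[good$models == "rf", "Precision"] <- 0.97
print(good)
nrow(good)
```

In bad, the metrics end up in an extra row whose models value is NA, which is what the original loop produced.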
Here is your for loop, with the changes to the function calls mentioned in 1) and a modification to address the issue in 2).
for (i in models){
  model_train <- train(class ~ ., data = training, method = i,
                       trControl = control, metric = "Accuracy")
  assign("fit", model_train)  # keeps the most recent fitted model available as fit
  predictions <- predict(model_train, newdata = testing)
  # compare the predictions with the reference factor, not the whole data frame
  precision_ <- posPredValue(predictions, testing$class)
  recall_ <- sensitivity(predictions, testing$class)
  f1 <- (2 * precision_ * recall_) / (precision_ + recall_)
  # put that in the results table, in the row whose models value matches i
  results_table[results_table$models %in% i, "Precision"] <- precision_
  results_table[results_table$models %in% i, "Recall"] <- recall_
  results_table[results_table$models %in% i, "F1score"] <- f1
}
This is what it looks like for me.
results_table
#      models Precision    Recall   F1score
# 1 svmRadial 0.9722222 0.9459459 0.9589041
# 2        rf 0.9732143 0.9819820 0.9775785
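If you would rather not rely on posPredValue and sensitivity either, the three metrics can be computed from simple counts in base R. This is only a sketch under assumptions: the helper prf and the toy labels below are hypothetical and are not part of caret or of the biopsy results. As far as I know, caret's functions treat the first level of the reference factor as the positive class by default (benign for biopsy), so the positive argument here defaults to the same thing.

```r
# Toy predictions and reference labels (made-up values, not the biopsy results)
lv   <- c("benign", "malignant")
pred <- factor(c("benign", "benign", "benign", "malignant", "malignant", "benign"), levels = lv)
ref  <- factor(c("benign", "benign", "malignant", "malignant", "malignant", "benign"), levels = lv)

# Hypothetical helper: precision, recall and F1 for one positive class
prf <- function(pred, ref, positive = levels(ref)[1]) {
  tp <- sum(pred == positive & ref == positive)  # true positives
  fp <- sum(pred == positive & ref != positive)  # false positives
  fn <- sum(pred != positive & ref == positive)  # false negatives
  precision <- tp / (tp + fp)
  recall    <- tp / (tp + fn)
  f1        <- 2 * precision * recall / (precision + recall)
  c(Precision = precision, Recall = recall, F1score = f1)
}

prf(pred, ref)                          # positive class: first level ("benign")
prf(pred, ref, positive = "malignant")  # same metrics for the other class
```

Inside the loop you could then call prf(predictions, testing$class) and copy its three elements into the matching row of results_table. Which class counts as positive changes all three numbers, so it is worth stating it explicitly.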