How to make a learning curve

Question


I am getting an error while making a learning curve from binary data.

library(dplyr)
library(tidyr)
library(tidyverse)
library(tidytext)
library(ggplot2)

# Data frame

structure(list(ID = c(32L, 32L, 32L, 32L, 32L, 32L, 32L, 32L,
33L, 33L, 33L, 33L, 33L, 33L, 33L, 33L, 34L, 34L, 34L, 34L, 34L,
34L, 34L, 34L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 36L, 43L, 43L,
43L, 43L, 43L, 43L, 43L, 43L, 47L, 47L, 47L, 47L, 47L, 47L, 47L,
47L, 55L, 55L, 55L, 55L, 55L, 55L, 55L, 55L, 56L, 56L, 56L, 56L,
56L, 56L, 56L, 56L, 57L, 57L, 57L, 57L, 57L, 57L, 57L, 57L, 59L,
59L, 59L, 59L, 59L, 59L, 59L, 59L, 69L, 69L, 69L, 69L, 69L, 69L,
69L, 69L, 71L, 71L, 71L, 71L, 71L, 71L, 71L, 71L, 72L, 72L, 72L,
72L, 72L, 72L, 72L, 72L, 79L, 79L, 79L, 79L, 79L, 79L, 79L, 79L,
80L, 80L, 80L, 80L, 80L, 80L, 80L, 80L, 81L, 81L, 81L, 81L, 81L,
81L, 81L, 81L, 82L, 82L, 82L, 82L, 82L, 82L, 82L, 82L, 83L, 83L,
83L, 83L, 83L, 83L, 83L, 83L, 84L, 84L, 84L, 84L, 84L, 84L, 84L,
84L, 91L, 91L, 91L, 91L, 91L, 91L, 91L, 91L, 92L, 92L, 92L, 92L,
92L, 92L, 92L, 92L, 123L, 123L, 123L, 123L, 123L, 123L, 123L,
123L, 124L, 124L, 124L, 124L, 124L, 124L, 124L, 124L, 125L, 125L,
125L, 125L, 125L, 125L, 125L, 125L, 126L, 126L, 126L, 126L, 126L,
126L, 126L, 126L, 127L, 127L, 127L, 127L, 127L, 127L, 127L, 127L,
128L, 128L, 128L, 128L, 128L, 128L, 128L, 128L, 137L, 137L, 137L,
137L, 137L, 137L, 137L, 137L, 138L, 138L, 138L, 138L, 138L, 138L,
138L, 138L, 139L, 139L, 139L, 139L, 139L, 139L, 139L, 139L, 140L,
140L, 140L, 140L, 140L, 140L, 140L, 140L, 147L, 147L, 147L, 147L,
147L, 147L, 147L, 147L, 148L, 148L, 148L, 148L, 148L, 148L, 148L,
148L, 149L, 149L, 149L, 149L, 149L, 149L, 149L, 149L, 150L, 150L,
150L, 150L, 150L, 150L, 150L, 150L, 151L, 151L, 151L, 151L, 151L,
151L, 151L, 151L, 152L, 152L, 152L, 152L, 152L, 152L, 152L, 152L,
159L, 159L, 159L, 159L, 159L, 159L, 159L, 159L, 160L, 160L, 160L,
160L, 160L, 160L, 160L, 160L), Measurement = c(1L, 2L, 3L, 4L,
5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L,
5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L,
5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L,
5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L,
5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L,
5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L,
5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L,
5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L,
5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L,
5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L,
5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L,
5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L,
5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L,
5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L,
5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L,
5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L,
5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L,
5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L,
5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L,
5L, 6L, 7L, 8L), Value = c(4L, 2L, 0L, 0L, 0L, 0L, 0L, 0L, 2L,
1L, 2L, 0L, 0L, 0L, 0L, 0L, 3L, 2L, 2L, 0L, 0L, 0L, 0L, 0L, 3L,
1L, 1L, 0L, 0L, 0L, 0L, 0L, 3L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 1L,
1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 9L, 14L, 14L, 0L, 0L, 0L, 0L,
1L, 5L, 5L, 7L, 5L, 8L, 0L, 0L, 1L, 1L, 1L, 10L, 4L, 4L, 6L,
0L, 1L, 1L, 5L, 3L, 5L, 0L, 0L, 0L, 1L, 3L, 2L, 3L, 1L, 2L, 0L,
0L, 2L, 1L, 1L, 5L, 2L, 9L, 8L, 8L, 4L, 3L, 2L, 5L, 3L, 2L, 4L,
0L, 4L, 2L, 0L, 0L, 0L, 0L, 0L, 0L, 13L, 2L, 4L, 1L, 9L, 3L,
5L, 9L, 5L, 1L, 6L, 1L, 7L, 18L, 14L, 15L, 9L, 3L, 3L, 9L, 2L,
11L, 9L, 13L, 1L, 4L, 1L, 1L, 6L, 2L, 6L, 8L, 1L, 1L, 6L, 1L,
3L, 4L, 3L, 10L, 5L, 2L, 3L, 5L, 6L, 3L, 3L, 0L, 7L, 1L, 5L,
2L, 7L, 9L, 13L, 14L, 4L, 3L, 4L, 2L, 0L, 0L, 0L, 0L, 7L, 1L,
5L, 0L, 0L, 0L, 0L, 0L, 3L, 3L, 6L, 7L, 7L, 4L, 6L, 4L, 2L, 1L,
5L, 4L, 0L, 0L, 0L, 0L, 3L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L,
0L, 0L, 0L, 0L, 0L, 0L, 6L, 6L, 5L, 3L, 9L, 20L, 8L, 10L, 4L,
3L, 2L, 2L, 4L, 5L, 0L, 0L, 11L, 5L, 3L, 4L, 7L, 1L, 0L, 0L,
10L, 1L, 2L, 5L, 0L, 0L, 0L, 0L, 1L, 5L, 4L, 2L, 8L, 8L, 6L,
0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 0L,
0L, 3L, 1L, 2L, 2L, 0L, 0L, 0L, 0L, 7L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 3L, 1L, 1L, 4L, 2L, 3L, 0L,
0L, 1L, 6L, 2L, 0L, 0L, 0L, 0L, 0L), Success = c(1, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1,
0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0,
1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0,
0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0,
0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1,
1, 1, 0, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1,
1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0,
0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0,
0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1,
1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0,
0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0,
0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0)), row.names = c(NA,
-312L), class = "data.frame")
Then

# Read the file

individual_df <- read.csv("C:/Users/ASUS/OneDrive/Desktop/Working_R_Sheets/output_list - Second_copy2.csv")

Then I plotted the data:

ggplot(individual_df, aes(x = as.factor(Measurement), y = Value, group = ID, color = factor(ID))) +
  geom_point(shape = 21, size = 5, fill = "white") +
  geom_line(size = 1) +
  labs(x = "Measurement", y = "Value", color = "ID") +
  scale_color_discrete(name = "ID")

Then I got this nice plot:

[Image: plot of Value against Measurement, one line per ID]

Then

# Add a new column "Success": a Value of 3 or above counts as a success

individual_df$Success <- ifelse(individual_df$Value >= 3, 1, 0)

# Fit logistic regression model

model <- glm(Success ~ Measurement, data = individual_df, family = binomial)
summary(model)

# Create a new data frame for prediction

prediction_data <- data.frame(
  Measurement = seq(min(individual_df$Measurement), max(individual_df$Measurement), length.out = 100)
)

# Predict probabilities

prediction_data$fit <- predict(model, newdata = prediction_data, type = "response")

# Create the plot

ggplot() +
  geom_point(data = prediction_data, aes(x = Measurement, y = fit, fill = as.factor(ID)), shape = 21, size = 4) +
  geom_line(data = prediction_data, aes(x = Measurement, y = fit), color = "blue", size = 1) +
  labs(x = "Measurement", y = "Success", title = "Logistic Regression") +
  theme_minimal()

I am getting an error here:

Error in `geom_point()`:
! Problem while computing aesthetics.
ℹ Error occurred in the 1st layer.
Caused by error:
! object 'ID' not found
Run `rlang::last_trace()` to see where the error occurred

I want a plot like this:

[Image: example of the desired plot]

Here is the data frame that I imported into R:

text

Thank you.

Answer 1

Score: 0


The model you are using does not take ID into account, but instead assumes that the log odds of success varies linearly according to Measurement. If you want predictions for each ID, you will need to include ID in your model, presumably as a random effect.
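To make this concrete, here is a minimal sketch in base R. The data frame `toy_df` and its values are made up for illustration and only stand in for `individual_df`. It fits the same kind of model (no ID term) and shows that the prediction grid built from it has no `ID` column, which is presumably why ggplot reports that `ID` cannot be found in that layer.

```r
# Hypothetical toy data standing in for individual_df (values are made up)
toy_df <- data.frame(
  ID          = rep(c(32L, 33L, 34L), each = 4),
  Measurement = rep(1:4, times = 3),
  Success     = c(1, 0, 0, 0,  0, 1, 0, 1,  1, 1, 0, 1)
)

# Same model form as in the question: Measurement only, no ID term
model <- glm(Success ~ Measurement, data = toy_df, family = binomial)

# Prediction grid built from Measurement alone
prediction_data <- data.frame(Measurement = seq(1, 4, length.out = 10))
prediction_data$fit <- predict(model, newdata = prediction_data, type = "response")

names(prediction_data)            # "Measurement" "fit"
"ID" %in% names(prediction_data)  # FALSE
```

Because the model has a single slope and intercept, it yields one curve for everyone; there is nothing per-ID to colour by unless ID enters the model and the prediction grid.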

The following code downloads your data directly from source, carries out a mixed-effects logistic regression, then plots predictions across the range for each ID:

library(tidyverse)
library(lme4)

df <- 'https://raw.githubusercontent.com/conda-suman07/' %>%
  paste0('Learn_Python/7aae4cbb522dacb854bb0b0adc2779eb62c3d1e9') %>%
  paste0('/output_list%20-%20Second_copy2.csv') %>%
  read.csv() %>%
  mutate(ID = factor(ID)) %>%
  # Use the Value column of the piped data, not a separate individual_df
  mutate(Success = ifelse(Value >= 3, 1, 0))

# Random intercept and random Measurement slope for each ID
mod <- glmer(Success ~ (Measurement | ID), family = binomial, data = df)

expand.grid(ID = unique(df$ID), Measurement = 0:7) %>%
  mutate(Probability = predict(mod, ., type = 'response')) %>%
  ggplot(aes(x = Measurement, y = Probability, color = ID, group = ID)) +
  geom_line() +
  geom_point(shape = 21, fill = 'white', size = 2.5)

[Image: predicted probability against Measurement, one line per ID]
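The `expand.grid()` step in the code above is what produces one prediction row for every combination of ID and Measurement. A small sketch of that step on its own, using three hypothetical IDs rather than the full data:

```r
# Hypothetical IDs and Measurement range, chosen only for illustration
ids  <- factor(c(32, 33, 34))
grid <- expand.grid(ID = ids, Measurement = 0:7)

nrow(grid)       # 3 IDs x 8 Measurements = 24 rows
table(grid$ID)   # each ID appears 8 times
head(grid, 4)    # first rows: ID varies fastest, Measurement is 0
```

Passing a grid like this to `predict()` is what lets each ID get its own line in the plot.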

It is important that you check the assumptions of your model here, and what it actually means. This model states that the log odds of success is linearly related to Measurement, but that the effect can be positive or negative, depending on the individual. I'm not sure your raw data supports the use of such a model. To me, it looks as though there is a large drop in values between Measurement 0 and 1, then a gradual increase as Measurement increases above 1. Getting a model to represent your data is very context specific, and might need the input of a statistician if you are unsure how to proceed.
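One simple way to start checking the linearity assumption is to compare the observed proportion of successes at each Measurement with what a straight line on the log-odds scale would imply. The sketch below uses base R and a made-up data frame (`toy_df` and its values are hypothetical, chosen to mimic a drop followed by a gradual rise); it is an illustration, not an analysis of the real data.

```r
# Hypothetical values: high at Measurement 0, a drop at 1, then a gradual rise
toy_df <- data.frame(
  Measurement = rep(0:3, each = 4),
  Value       = c(5, 4, 3, 1,  0, 1, 3, 0,  2, 3, 1, 4,  3, 5, 2, 4)
)
toy_df$Success <- as.integer(toy_df$Value >= 3)

# Observed proportion of successes and its log odds at each Measurement
observed <- aggregate(Success ~ Measurement, data = toy_df, FUN = mean)
observed$LogOdds <- qlogis(observed$Success)

# Fitted probabilities from a model that is linear in Measurement
fit <- glm(Success ~ Measurement, data = toy_df, family = binomial)
observed$Fitted <- predict(fit, newdata = observed, type = "response")

observed
```

If the observed log odds fall and then rise like this, a single linear term in Measurement cannot follow them, and the fitted column will smooth over the pattern.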

huangapple
  • Published on May 29, 2023 at 21:53:23
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76357954.html