
Why does R tell me the two factor levels are "0" and "1" but then list 1s and 2s and plot them as 1 and 2?

Question


I'm not doing anything fancy: I just turn my data of 0s and 1s into a factor with as.factor() or factor(), which do the same thing here.

```r
data$Class <- as.factor(as.numeric(data$Class))
str(data$Class)
data$Class
```

Output:

```
Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 2 2 2 ...
[1] 0 0 0 0 0 0 0 1 1 1 1 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
```

This seems contradictory. My naive assumption is that it is a bug.

I tried making a new variable:

```r
data$Class2 <- ifelse(data$Class == '1', 1, 0)
data$Class2 <- factor(data$Class2)
data$Class2
```

Then I also tried forcing the levels and labels explicitly:

```r
factor(data$Class, levels = c("0","1"), labels = c("0","1"))
```

These all produce the same outcome.


Answer 1

Score: 5


You're (understandably) confused by the differences between the underlying representation of the object and the way it is printed.

```r
x <- factor(c(0,0,1,1,1))
```

Underneath, this is stored as a set of integer codes starting from 1 and a set of levels corresponding to each code: 1 = "0", 2 = "1" in this case.
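You can see both parts directly by asking for the codes and the levels separately. This is a minimal base-R sketch; the names `codes` and `labels` are just illustrative variables, not part of the original answer:

```r
x <- factor(c(0, 0, 1, 1, 1))

## the storage type is integer: these are the codes, not the original values
typeof(x)               ## "integer"
codes <- as.integer(x)  ## 1 1 2 2 2
labels <- levels(x)     ## "0" "1" (levels are always character)

## printing effectively looks each code up in the levels vector
labels[codes]           ## "0" "0" "1" "1" "1"

## unclass() drops the factor class and shows the raw integer vector,
## with the levels kept as an attribute
unclass(x)
```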

str(x) tells you about the structure of the object:

```r
str(x)
## Factor w/ 2 levels "0","1": 1 1 2 2 2
```

This is printing out the class of the object (a "factor with two levels, '0' and '1'") and its underlying integer values (1,1,2,2,2).
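Those integer codes are also what you get when the factor is coerced to numbers, for example with `as.numeric()`, which is why a plot of the coerced values puts the points at 1 and 2. A short sketch of the difference, using the two conversion idioms given in R FAQ 7.10 ("How do I convert factors to numeric?") to recover the original 0/1 values:

```r
x <- factor(c(0, 0, 1, 1, 1))

## coercing the factor directly returns the integer codes, not the labels
as.numeric(x)                ## 1 1 2 2 2

## converting via the labels recovers the original 0/1 values
as.numeric(as.character(x))  ## 0 0 1 1 1
as.numeric(levels(x))[x]     ## 0 0 1 1 1 (converts each level once, then indexes by code)
```

The second idiom works because indexing a vector with a factor uses the factor's integer codes.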

Using print shows the level corresponding to each element, without quotation marks, followed by a list of the levels:

```r
print(x)
## [1] 0 0 1 1 1
## Levels: 0 1
```

Using print(., quote = TRUE) might be less confusing:

```r
print(x, quote = TRUE)
## [1] "0" "0" "1" "1" "1"
## Levels: "0" "1"
```

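While glm is happy to take a factor as a response variable, you have to do a bit more work to plot it, because converting the factor to numbers for plotting gives the integer codes (1 and 2) rather than the labels. For example: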

```r
mm <- transform(mtcars, am = factor(am)) ## 0-1 levels
ff <- glm(am ~ mpg, family = binomial, data = mm)
## plot: suppress axes so we can add our own labels
plot(as.numeric(am) ~ mpg, data = mm, axes = FALSE, ylab = "am")
axis(side = 1)
## add class labels to y-axis
axis(side = 2, at = 1:2, labels = levels(mm$am))
pframe <- data.frame(mpg = seq(min(mm$mpg), max(mm$mpg), length.out = 51))
## find predicted probability *and add 1 to line up with axis*
pframe$am <- 1 + predict(ff, newdata = pframe, type = "response")
with(pframe, lines(mpg, am))
```
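If you would rather keep the y-axis on the 0/1 scale of the original data, one alternative is to convert the factor back through its labels so the fitted probabilities line up without the `+ 1` shift. This is a sketch of that variant, not the answer's own code; `am01` and `prob` are illustrative names. For a binomial glm with a factor response, the first level counts as failure and the other level as success, so the predictions are probabilities of `am == "1"`:

```r
## same model as above: a binomial glm with a factor response
mm <- transform(mtcars, am = factor(am))
ff <- glm(am ~ mpg, family = binomial, data = mm)

## recover the 0/1 labels instead of the 1/2 integer codes
am01 <- as.numeric(as.character(mm$am))

## plot on the 0-1 scale, replacing the default y-axis with the level labels
plot(mm$mpg, am01, xlab = "mpg", ylab = "am", yaxt = "n")
axis(side = 2, at = 0:1, labels = levels(mm$am))

## predicted probabilities of the second level ("1") are already on the 0-1 scale
pframe <- data.frame(mpg = seq(min(mm$mpg), max(mm$mpg), length.out = 51))
pframe$prob <- predict(ff, newdata = pframe, type = "response")
lines(pframe$mpg, pframe$prob)
```

Both versions draw the same fitted curve; they differ only in whether the points sit at the codes (1, 2) or at the labels (0, 1).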

huangapple
  • Published by huangapple on June 6, 2023 at 07:54:41
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76410644.html