Why does R tell me the two factor levels are "0" and "1", but then list 1s and 2s and plot them as 1 and 2?
Question
I'm not doing anything fancy; I just turn my data of 0s and 1s into a factor with as.factor() or factor(), which do the same thing:
data$Class <- as.factor(as.numeric(data$Class))
str(data$Class)
data$Class
Output:
Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 2 2 2 ...
[1] 0 0 0 0 0 0 0 1 1 1 1 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
This seems contradictory.
My naive assumption is that it is a bug.
I tried making a new variable:
data$Class2 <- ifelse(data$Class == '1', 1, 0)
data$Class2 <- factor(data$Class2)
data$Class2
Then I also tried forcing the levels and labels explicitly:
factor(data$Class, levels = c("0","1"), labels = c("0","1"))
These all produce the same outcome.
Answer 1
Score: 5
You're (understandably) confused by the differences between the underlying representation of the object and the way it is printed.
x <- factor(c(0,0,1,1,1))
Underneath, this is stored as a vector of integer codes starting from 1, plus a set of levels, one per code: in this case 1 = "0" and 2 = "1".
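A quick way to see both parts of that representation is to pull out the integer codes and the levels separately; unclass() drops the factor class and shows the codes with the levels attached as an attribute. A minimal sketch using the same x:

```r
x <- factor(c(0, 0, 1, 1, 1))

# The integer codes stored underneath (they start at 1, not 0)
codes <- as.integer(x)
print(codes)        # 1 1 2 2 2

# The level labels those codes point into
labs <- levels(x)
print(labs)         # "0" "1"

# Looking up each code in the levels recovers the printed labels
print(labs[codes])  # "0" "0" "1" "1" "1"

# unclass() removes the "factor" class, exposing the codes plus a "levels" attribute
print(unclass(x))
```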
str() tells you about the structure of the object:
str(x)
Factor w/ 2 levels "0","1": 1 1 2 2 2
This prints the class of the object (a "factor with two levels, '0' and '1'") and its underlying integer codes (1, 1, 2, 2, 2).
Using print() shows the level label of each element, without quotation marks, followed by the list of levels:
print(x)
[1] 0 0 1 1 1
Levels: 0 1
Using print(., quote = TRUE) might be less confusing:
print(x, quote=TRUE)
[1] "0" "0" "1" "1" "1"
Levels: "0" "1"
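The same split explains why the 0s and 1s get plotted as 1 and 2: as.numeric() applied to a factor returns the integer codes, not the labels. To get the original numbers back you have to go through the labels, either via as.character() or by converting the levels once and indexing them with the factor (the idiom suggested in the R FAQ). A short sketch:

```r
x <- factor(c(0, 0, 1, 1, 1))

# as.numeric() on a factor returns the underlying codes: 1 1 2 2 2
as_codes <- as.numeric(x)

# Going through the labels recovers the original 0/1 values: 0 0 1 1 1
as_values <- as.numeric(as.character(x))

# Equivalent: convert the (few) levels once, then index them by the factor's codes
as_values2 <- as.numeric(levels(x))[x]

print(as_codes)
print(as_values)
print(identical(as_values, as_values2))
```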
While glm() is happy to take a factor as a response variable (with family = binomial it models the probability of the second level), you have to do more work to plot one, e.g.:
mm <- transform(mtcars, am = factor(am)) ## 0-1 levels
ff <- glm(am ~ mpg, family = binomial, data = mm)
## plot: suppress axes so we can add our own labels
plot(as.numeric(am) ~ mpg, data = mm, axes = FALSE, ylab = "am")
axis(side = 1)
## add class labels to y-axis
axis(side = 2, at = 1:2, labels = levels(mm$am))
pframe <- data.frame(mpg = seq(min(mm$mpg), max(mm$mpg), length.out = 51))
## find predicted probability *and add 1 to line up with axis*
pframe$am <- 1 + predict(ff, newdata = pframe, type = "response")
with(pframe, lines(mpg, am))
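The +1 shift above is needed only because the points are drawn at the factor codes (1 and 2). As an alternative, assuming you are happy to plot the response on its original 0/1 scale, you can convert it back to numbers for plotting only; the fitted probabilities then line up without any shift. This is a sketch of a different approach, not the one above, and the column names am01 and prob are just illustrative:

```r
mm <- transform(mtcars, am = factor(am))  # 0/1 response stored as a factor
ff <- glm(am ~ mpg, family = binomial, data = mm)

# Hypothetical helper column: the 0/1 values recovered from the labels, not the 1/2 codes
mm$am01 <- as.numeric(as.character(mm$am))

# Suppress only the y axis, then label it with the factor levels at y = 0 and 1
plot(am01 ~ mpg, data = mm, ylab = "am", yaxt = "n")
axis(side = 2, at = 0:1, labels = levels(mm$am))

pframe <- data.frame(mpg = seq(min(mm$mpg), max(mm$mpg), length.out = 51))
# Predicted probability of the second level ("1"), already on the 0-1 scale
pframe$prob <- predict(ff, newdata = pframe, type = "response")
with(pframe, lines(mpg, prob))
```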