June 6, 2023 07:54:41

Why does R tell me the two factor levels are "0" and "1", but then list 1s and 2s and plot them as 1 and 2?

Question

I do nothing fancy, just turn my data of 0s and 1s into factors with as.factor() and factor(), which do the same thing.

data$Class <- as.factor(as.numeric(data$Class))
str(data$Class)
data$Class
Output:
Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 2 2 2 ...
   [1] 0 0 0 0 0 0 0 1 1 1 1 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0

This seems contradictory.
My naive assumption is it is a bug.

I tried making a new variable

data$Class2 <- ifelse(data$Class == '1', 1, 0)
data$Class2 <- factor(data$Class2)
data$Class2

Then I also tried forcing it by doing it with

factor(data$Class, levels = c("0","1"), labels = c("0","1"))

These all produce the same outcome.

Answer 1

Score: 5

You're (understandably) confused by the differences between the underlying representation of the object and the way it is printed.

x <- factor(c(0,0,1,1,1))

Underneath, this is stored as a set of integer codes starting from 1 and a set of levels corresponding to each code: 1 = "0", 2 = "1" in this case.

str(x) tells you about the structure of the object:

str(x)
 Factor w/ 2 levels "0","1": 1 1 2 2 2

This is printing out the class of the object (a "factor with two levels, '0' and '1'") and its underlying integer values (1,1,2,2,2).
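
The two pieces described above can be pulled apart directly. The following is a minimal sketch rather than part of the original answer; the variable names codes, labs, and shown are only illustrative.

```r
# Build the same factor as in the answer
x <- factor(c(0, 0, 1, 1, 1))

codes <- as.integer(x)   # underlying integer codes: 1 1 2 2 2
labs  <- levels(x)       # level labels, stored as character: "0" "1"

# Indexing the labels by the codes reproduces what print(x) displays
shown <- labs[codes]     # "0" "0" "1" "1" "1"

# unclass() drops the factor class, exposing the codes and the levels attribute
unclass(x)
```

So the 1s and 2s in the str() output are positions in the level list, not the data values themselves.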

Using print prints the level corresponding to each element, without quotation marks, followed by a list of the levels:

print(x)
[1] 0 0 1 1 1
Levels: 0 1

Using print(., quote = TRUE) might be less confusing:

print(x, quote=TRUE)
[1] "0" "0" "1" "1" "1"
Levels: "0" "1"

While glm is happy to take a factor as a response variable, you have to do more work to plot them, e.g.

mm <- transform(mtcars, am = factor(am))  ## 0-1 levels
ff <- glm(am ~ mpg, family = binomial, data = mm)
## plot: suppress axes so we can add our own labels
## (as.numeric() on a factor returns the integer codes 1 and 2)
plot(as.numeric(am) ~ mpg, data = mm, axes = FALSE, ylab = "am")
axis(side = 1)
## add class labels to y-axis
axis(side = 2, at = 1:2, labels = levels(mm$am))
pframe <- data.frame(mpg = seq(min(mm$mpg), max(mm$mpg), length.out = 51))
## find predicted probability *and add 1 to line up with axis*
pframe$am <- 1 + predict(ff, newdata = pframe, type = "response")
with(pframe, lines(mpg, am))
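
If the original 0/1 values are wanted rather than the codes, one possible approach (a sketch, not something the answer above prescribes) is to convert through the level labels. The names y and y2 are hypothetical.

```r
x <- factor(c(0, 0, 1, 1, 1))

as.numeric(x)                      # integer codes, not the data: 1 1 2 2 2

# Go through the character labels to recover the original values
y <- as.numeric(as.character(x))   # 0 0 1 1 1

# Equivalent alternative: convert the levels once, then index them by the codes
y2 <- as.numeric(levels(x))[x]     # 0 0 1 1 1
```

With the response back on the 0-1 scale, predicted probabilities from predict(..., type = "response") could be drawn on the same axis without adding 1.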
