
Confusion with the output of the function str

Question

The data set birth.csv, collected at Baystate Medical Center, Springfield, USA, during 1986, contains nine variables: low, age, lwt, race, smoke, ptl, ht, ui and ftv.

After I imported the CSV file with read.csv() and a colClasses specification, the output of str() didn't match that of head(). For example, the first 6 values of the column low should be 0, but the sample printed by str() shows them as 1:

'data.frame':	189 obs. of  9 variables:
 $ low  : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...  # shouldn't they be 0 0 0 0... instead?
 $ age  : num  19 33 20 21 18 21 22 17 29 26 ...
 $ lwt  : num  182 155 105 108 107 124 118 103 123 113 ...
 $ race : Factor w/ 3 levels "1","2","3": 2 3 1 1 1 3 1 3 1 1 ...
 $ smoke: Factor w/ 2 levels "0","1": 1 1 2 2 2 1 1 1 2 2 ...
 $ ptl  : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
 $ ht   : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
 $ ui   : Factor w/ 2 levels "0","1": 2 1 1 2 2 1 1 1 1 1 ...
 $ ftv  : Factor w/ 3 levels "0","1","2": 1 3 2 3 1 1 2 2 2 1 ...

A data.frame: 6 × 9
  low   age   lwt   race  smoke ptl   ht    ui    ftv
  <fct> <dbl> <dbl> <fct> <fct> <fct> <fct> <fct> <fct>
1 0     19    182   2     0     0     0     1     0
2 0     33    155   3     0     0     0     0     2
3 0     20    105   1     1     0     0     0     1
4 0     21    108   1     1     0     0     1     2
5 0     18    107   1     1     0     0     1     0
6 0     21    124   3     0     0     0     0     0

Could someone please explain what happened? If I built a logistic model for that imported dataset, would the result be wrong?
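The mismatch can be reproduced in a few lines. This is a minimal sketch, not the original import: it assumes a small hypothetical inline sample in place of the real birth.csv (only the columns low, age and race) and assumes the column classes shown. read.csv(text = ...) simply reads the CSV from a string.

```r
# Hypothetical stand-in for a few rows of birth.csv (not the real file)
csv_text <- "low,age,race
0,19,2
0,33,3
0,20,1
1,21,1"

# Read with explicit column classes, as described in the question (classes assumed)
birth <- read.csv(text = csv_text,
                  colClasses = c("factor", "numeric", "factor"))

head(birth)  # shows the labels: low is 0 0 0 1
str(birth)   # shows the integer codes: low is 1 1 1 2
```

Both calls describe the same data; they only differ in whether they display the factor labels or the codes behind them.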

Answer 1

Score: 1

Factors (categorical variables, shown as <fct> under the column names in your head() output) are stored internally in R as integers, with 1 being the first level (or category), 2 the second level, and so on, along with a lookup table that maps the integer values to their labels (levels).
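The storage scheme can be inspected directly with the base R functions typeof(), levels() and unclass(). The factor low below is a hypothetical example whose labels, like those in the question, are the strings "0" and "1":

```r
# A factor whose labels happen to look like numbers
low <- factor(c("0", "0", "1", "0"))

typeof(low)   # "integer": the values are stored as integer codes
levels(low)   # "0" "1": the lookup table, code 1 -> "0", code 2 -> "1"
unclass(low)  # the raw codes 1 1 2 1, with the levels kept as an attribute
```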

str() shows a few of the levels and then the integer values. Most other functions print the labels, not the integer values.
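A short comparison of which base R functions report labels and which report codes, again on a hypothetical factor with labels "0" and "1". The last two conversions show how to recover the original numbers; as.numeric(levels(f))[f] is the idiom recommended in R's own ?factor help page.

```r
low <- factor(c("0", "0", "1", "0"))

as.character(low)               # labels: "0" "0" "1" "0"
table(low)                      # counts per label: 3 for "0", 1 for "1"
as.numeric(low)                 # internal codes 1 1 2 1, not the original values
as.numeric(as.character(low))   # original values 0 0 1 0
as.numeric(levels(low))[low]    # also 0 0 1 0, converting each level only once
str(low)                        # Factor w/ 2 levels "0","1": 1 1 2 1
```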

It's extra confusing in your case because your labels are integers stored as characters, starting at "0". For a clearer example, let's look at a factor with letters as the labels:

x = factor(c("a", "b", "a", "c"))
x
# [1] a b a c
# Levels: a b c
str(x)
# Factor w/ 3 levels "a","b","c": 1 2 1 3
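On the second part of the question: the integer codes are only the internal representation, so a model fitted to the imported factors is not distorted by them. With glm(family = binomial), a factor response treats its first level (here "0") as failure and all other levels as success, so the model describes the probability that low is "1". The sketch below uses a small made-up data frame, not the birth data, and only illustrates that a factor response and an explicit numeric 0/1 response give the same coefficients:

```r
# Hypothetical toy data (not the Baystate birth data)
toy <- data.frame(
  low = factor(c("0", "1", "0", "1", "0", "0", "1", "0")),
  age = c(19, 33, 20, 21, 18, 25, 17, 29)
)

# Factor response: first level "0" = failure, "1" = success
fit_factor <- glm(low ~ age, family = binomial, data = toy)

# The same response recoded explicitly as numeric 0/1
toy$low_num <- as.numeric(as.character(toy$low))
fit_numeric <- glm(low_num ~ age, family = binomial, data = toy)

# Both fits should report the same coefficients
all.equal(coef(fit_factor), coef(fit_numeric))
```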

huangapple
  • This article was posted by huangapple on July 10, 2023, 12:51:05
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76650739.html