
Confusion with the output of the function str

Question


The data set birth.csv, collected at Baystate Medical Center (Springfield, USA) during 1986, has the following format:

[Image: format of the birth.csv data set]

After I imported the csv file (using read.csv() with a colClasses specification), the output of str() didn't match that of head(). For example, the first 6 values of the column low were supposed to be 0, but the sample output generated by str() showed them as 1:

```
'data.frame': 189 obs. of 9 variables:
 $ low  : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ... # shouldn't they be 0 0 0 0... instead?
 $ age  : num 19 33 20 21 18 21 22 17 29 26 ...
 $ lwt  : num 182 155 105 108 107 124 118 103 123 113 ...
 $ race : Factor w/ 3 levels "1","2","3": 2 3 1 1 1 3 1 3 1 1 ...
 $ smoke: Factor w/ 2 levels "0","1": 1 1 2 2 2 1 1 1 2 2 ...
 $ ptl  : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
 $ ht   : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
 $ ui   : Factor w/ 2 levels "0","1": 2 1 1 2 2 1 1 1 1 1 ...
 $ ftv  : Factor w/ 3 levels "0","1","2": 1 3 2 3 1 1 2 2 2 1 ...

A data.frame: 6 × 9
  low   age   lwt   race  smoke ptl   ht    ui    ftv
  <fct> <dbl> <dbl> <fct> <fct> <fct> <fct> <fct> <fct>
1 0     19    182   2     0     0     0     1     0
2 0     33    155   3     0     0     0     0     2
3 0     20    105   1     1     0     0     0     1
4 0     21    108   1     1     0     0     1     2
5 0     18    107   1     1     0     0     1     0
6 0     21    124   3     0     0     0     0     0
```

Could someone please explain what happened? If I built a logistic model for that imported dataset, would the result be wrong?
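The same behaviour can be reproduced without the original file. This is a minimal sketch that assumes a hypothetical three-row excerpt of birth.csv containing only the low and age columns. The column classes are passed to read.csv() through colClasses, as described above.

```r
# Hypothetical three-row excerpt of birth.csv (only the low and age columns)
csv_text <- "low,age
0,19
0,33
1,20"

# Read low as a factor and age as numeric via colClasses
d <- read.csv(text = csv_text, colClasses = c("factor", "numeric"))

str(d)   # low is listed as: Factor w/ 2 levels "0","1": 1 1 2
head(d)  # low is printed with its labels: 0 0 1

levels(d$low)        # "0" "1"
as.integer(d$low)    # 1 1 2  (internal codes, not the values in the file)
as.character(d$low)  # "0" "0" "1"  (the labels, i.e. the values in the file)
```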

Answer 1

Score: 1


Factors (categorical variables, <fct> in the tibble column class labels) in R are stored internally as integers, with 1 being the first level (or category), 2 the second level, etc., along with a lookup table mapping the integer values to their labels/levels.
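That storage model can be inspected directly with base R. This is a small sketch using a hypothetical factor with the same "0"/"1" labels as the low column. It shows the integer codes, the "levels" attribute that serves as the lookup table, and how indexing the levels by the codes recovers the labels.

```r
# A hypothetical factor built from character labels
f <- factor(c("0", "0", "1", "0"))

typeof(f)   # "integer": the underlying storage is an integer vector
levels(f)   # "0" "1": the lookup table, sorted, so "0" gets code 1
unclass(f)  # the bare codes 1 1 2 1, plus attr(,"levels") "0" "1"

# The lookup: each integer code indexes into the levels vector
levels(f)[as.integer(f)]  # "0" "0" "1" "0", the same as as.character(f)
```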

str() shows a few of the levels and then the integer values (the internal codes). Most other functions, such as print() and head(), show the labels, not the integer values.

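It's extra confusing in your case because your labels are (character-class) integers starting at 0. The label "0" is stored as code 1 and the label "1" as code 2, so str() shows 1 where head() shows 0. For a somewhat clearer example, let's look at a factor with letters as the labels: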

```r
x = factor(c("a", "b", "a", "c"))
x
# [1] a b a c
# Levels: a b c
str(x)
# Factor w/ 3 levels "a","b","c": 1 2 1 3
```
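On the logistic-model part of the question, here is a hedged sketch using made-up data rather than birth.csv. The codes only cause trouble if you convert a factor with as.numeric(), because that returns the codes (1/2) and not the labels (0/1). The usual fix is to go through the labels with as.numeric(as.character(...)). glm() with family = binomial accepts a factor response directly. R's documentation for the binomial family states that the first level is treated as failure and all other levels as success. With levels "0" and "1", the fit therefore matches the one on the numeric 0/1 outcome.

```r
# Hypothetical 0/1 outcome and predictor (not the birth.csv data)
y_chr <- c("0", "0", "1", "0", "1", "0", "1", "1")
pred  <- 1:8
y_fct <- factor(y_chr)  # levels "0", "1"

# Pitfall: as.numeric() on a factor returns the codes, not the labels
as.numeric(y_fct)  # 1 1 2 1 2 1 2 2
# Converting via the labels recovers the original 0/1 values
y_num <- as.numeric(as.character(y_fct))  # 0 0 1 0 1 0 1 1

# Binomial glm with a factor response: first level ("0") = failure
fit_fct <- glm(y_fct ~ pred, family = binomial)
fit_num <- glm(y_num ~ pred, family = binomial)

all.equal(coef(fit_fct), coef(fit_num))  # TRUE: same coefficients
```

If the first level were "1" instead of "0" (for example after relevel()), the model would predict the other outcome. It is worth checking levels() on the response before fitting.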

huangapple
  • Published on 10 July 2023 at 12:51:05
  • Please keep the link to this article when reposting: https://go.coder-hub.com/76650739.html