How to convert all numeric columns stored as character to numeric in a data frame?

Question

I have a data frame with hundreds of columns. Some of them contain only numeric values but are stored as character. I need to convert every column whose values are all numbers to numeric (the data may also contain NAs).

Example data frame:

    df <- data.frame(id = c("R1", "R2", "R3", "R4", "R5"),
                     name = c("A", "B", "C", "D", "E"),
                     age = c("24", "NA", "55", "19", "40"),
                     sbp = c(174, 125, 180, NA, 130),
                     dbp = c("106", "67", "109", "NA", "87"))
    > print(df, row.names = FALSE)
     id name age sbp dbp
     R1    A  24 174 106
     R2    B  NA 125  67
     R3    C  55 180 109
     R4    D  19  NA  NA
     R5    E  40 130  87

These columns should be numeric:

    > df$age
    [1] "24" "NA" "55" "19" "40"
    > df$dbp
    [1] "106" "67"  "109" "NA"  "87"

I tried as.numeric(), but it also converts the genuinely character variables (id, name, etc.) to numeric, which turns them into NAs.

    > sapply(df, as.numeric)
         id name age sbp dbp
    [1,] NA   NA  24 174 106
    [2,] NA   NA  NA 125  67
    [3,] NA   NA  55 180 109
    [4,] NA   NA  19  NA  NA
    [5,] NA   NA  40 130  87
    > lapply(df, as.numeric)
    $id
    [1] NA NA NA NA NA
    $name
    [1] NA NA NA NA NA
    $age
    [1] 24 NA 55 19 40
    $sbp
    [1] 174 125 180  NA 130
    $dbp
    [1] 106  67 109  NA  87

What I need is to skip the genuinely character columns (id, name, etc.) while looping through the data frame. Any help is much appreciated!

Answer 1

Score: 3

Try type.convert():

    df2 <- type.convert(df, as.is = TRUE)

Result:

    > df2
      id name age sbp dbp
    1 R1    A  24 174 106
    2 R2    B  NA 125  67
    3 R3    C  55 180 109
    4 R4    D  19  NA  NA
    5 R5    E  40 130  87
    ## check column classes
    > sapply(df2, class)
             id        name         age         sbp         dbp
    "character" "character"   "integer"   "integer"   "integer"

Note that the as.is argument controls whether character columns are converted to factors: with as.is = FALSE, the first two columns would have been converted to factors.
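To see the effect of as.is side by side, here is a minimal sketch that rebuilds the example data frame from the question and compares the resulting column classes under both settings. The names classes_as_is and classes_factor are only illustrative and do not come from the answer.

```r
# Example data from the question
df <- data.frame(
  id   = c("R1", "R2", "R3", "R4", "R5"),
  name = c("A", "B", "C", "D", "E"),
  age  = c("24", "NA", "55", "19", "40"),
  sbp  = c(174, 125, 180, NA, 130),
  dbp  = c("106", "67", "109", "NA", "87")
)

# as.is = TRUE: columns that cannot be read as numbers stay character
classes_as_is <- sapply(type.convert(df, as.is = TRUE), class)

# as.is = FALSE: those same columns are turned into factors instead
classes_factor <- sapply(type.convert(df, as.is = FALSE), class)

print(classes_as_is)
print(classes_factor)
```

Only the classes of id and name should differ between the two results; the number-like columns are converted either way.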

Answer 2

Score: 0

This is another option. It also returns a data frame:

    library(dplyr)  # provides bind_cols()

    # keep columns 1-2 unchanged, convert columns 3-5 to numeric
    df[1:2] |> bind_cols(sapply(df[3:5], as.numeric))
    # id name age sbp dbp
    # R1    A  24 174 106
    # R2    B  NA 125  67
    # R3    C  55 180 109
    # R4    D  19  NA  NA
    # R5    E  40 130  87
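The snippet above hard-codes the column positions, which is impractical with hundreds of columns. As a rough base R sketch of the "detect, then convert" idea from the question: convert_numeric_like() and its na_strings parameter are hypothetical names introduced here, not part of either answer or of any package. It assumes a column counts as numeric when every non-missing value can be parsed by as.numeric().

```r
# Hypothetical helper (an assumption, not from the answers): convert a
# character column only if all of its non-missing values parse as numbers.
convert_numeric_like <- function(data, na_strings = c("NA", "")) {
  is_numeric_like <- function(x) {
    if (!is.character(x)) return(FALSE)
    # drop real NAs and strings that merely stand for missing values
    values <- x[!is.na(x) & !(x %in% na_strings)]
    if (length(values) == 0) return(FALSE)
    # a column qualifies only if no remaining value fails to parse
    !anyNA(suppressWarnings(as.numeric(values)))
  }

  to_convert <- vapply(data, is_numeric_like, logical(1))
  for (col in names(data)[to_convert]) {
    x <- data[[col]]
    x[x %in% na_strings] <- NA   # turn "NA" strings into real NAs
    data[[col]] <- as.numeric(x)
  }
  data
}

# Example data from the question
df <- data.frame(
  id   = c("R1", "R2", "R3", "R4", "R5"),
  name = c("A", "B", "C", "D", "E"),
  age  = c("24", "NA", "55", "19", "40"),
  sbp  = c(174, 125, 180, NA, 130),
  dbp  = c("106", "67", "109", "NA", "87"),
  stringsAsFactors = FALSE
)

df_converted <- convert_numeric_like(df)
print(sapply(df_converted, class))
```

Unlike type.convert(), this sketch always produces double ("numeric") columns rather than integer ones, and it leaves columns that are already numeric, such as sbp, untouched.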

huangapple
  • Posted on January 9, 2023, 19:26:16
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75056602.html