How to convert all numeric columns stored as character to numeric in a dataframe?

Question

I have a data frame with hundreds of columns, some of which contain only numeric values but are stored as character. I need to convert every column whose values are all numbers to numeric (the data may also contain NAs).

Example dataframe:

df <- data.frame(id = c("R1","R2","R3","R4","R5"), name=c("A","B","C","D","E"), age=c("24", "NA", "55", "19", "40"), sbp=c(174, 125, 180, NA, 130), dbp=c("106", "67", "109", "NA", "87"))

> print(df, row.names = FALSE)
 id name age sbp dbp
 R1    A  24 174 106
 R2    B  NA 125  67
 R3    C  55 180 109
 R4    D  19  NA  NA
 R5    E  40 130  87

These columns should be numeric:

> df$age
[1] "24" "NA" "55" "19" "40"
> df$dbp
[1] "106" "67"  "109" "NA"  "87"

I applied as.numeric(), but it also converted the genuinely character variables (id, name, etc.) to numeric, which produced NAs.

> sapply(df,as.numeric)
     id name age sbp dbp
[1,] NA   NA  24 174 106
[2,] NA   NA  NA 125  67
[3,] NA   NA  55 180 109
[4,] NA   NA  19  NA  NA
[5,] NA   NA  40 130  87

> lapply(df,as.numeric)
$id
[1] NA NA NA NA NA

$name
[1] NA NA NA NA NA

$age
[1] 24 NA 55 19 40

$sbp
[1] 174 125 180  NA 130

$dbp
[1] 106  67 109  NA  87

What I need is to skip the genuinely character columns (id, name, etc.) while looping through the data frame. Any help is much appreciated!

Answer 1

Score: 3

Try type.convert():

df2 <- type.convert(df, as.is = TRUE)

Result:

#> df2
  id name age sbp dbp
1 R1    A  24 174 106
2 R2    B  NA 125  67
3 R3    C  55 180 109
4 R4    D  19  NA  NA
5 R5    E  40 130  87

## check column classes
#> sapply(df2, class)
         id        name         age         sbp         dbp 
"character" "character"   "integer"   "integer"   "integer" 

Note that the as.is argument controls whether character columns are converted to factors: with as.is = FALSE, the first two columns would have been converted to factors.
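To illustrate the effect of as.is, here is a minimal sketch that runs type.convert() on the example data with both settings. The names df_chr and df_fct are chosen here for illustration only, and stringsAsFactors = FALSE is added as an assumption so the example behaves the same on R versions before 4.0.

```r
# Example data from the question
df <- data.frame(
  id   = c("R1", "R2", "R3", "R4", "R5"),
  name = c("A", "B", "C", "D", "E"),
  age  = c("24", "NA", "55", "19", "40"),
  sbp  = c(174, 125, 180, NA, 130),
  dbp  = c("106", "67", "109", "NA", "87"),
  stringsAsFactors = FALSE
)

# as.is = TRUE: columns that cannot be read as numbers stay character
df_chr <- type.convert(df, as.is = TRUE)

# as.is = FALSE: columns that cannot be read as numbers become factors
df_fct <- type.convert(df, as.is = FALSE)

# Compare the resulting column classes
sapply(df_chr, class)
sapply(df_fct, class)
```

In both cases the string "NA" is read as a missing value, because "NA" is the default for the na.strings argument of type.convert().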

Answer 2

Score: 0

This also works, and it again returns a data frame (bind_cols() comes from dplyr):

library(dplyr)  # provides bind_cols()
df[1:2] |> bind_cols(sapply(df[3:5], as.numeric))
# id name age sbp dbp
# R1    A  24 174 106
# R2    B  NA 125  67
# R3    C  55 180 109
# R4    D  19  NA  NA
# R5    E  40 130  87
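With hundreds of columns, hard-coding positions such as 3:5 may not be practical. As an alternative, here is a minimal base R sketch that detects which character columns contain only numbers (or missing values) and converts just those. The helper is_numeric_like() and the names to_convert and df2 are hypothetical, introduced here for illustration, and treating the literal string "NA" as missing is an assumption based on the example data.

```r
# Example data from the question
df <- data.frame(
  id   = c("R1", "R2", "R3", "R4", "R5"),
  name = c("A", "B", "C", "D", "E"),
  age  = c("24", "NA", "55", "19", "40"),
  sbp  = c(174, 125, 180, NA, 130),
  dbp  = c("106", "67", "109", "NA", "87"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE if x is a character vector in which every
# value is a number, a real NA, or the literal string "NA"
is_numeric_like <- function(x) {
  if (!is.character(x)) return(FALSE)
  x[x %in% "NA"] <- NA                       # treat the string "NA" as missing
  converted <- suppressWarnings(as.numeric(x))
  !any(is.na(converted) & !is.na(x))         # TRUE if no value failed to parse
}

# Flag the character columns that are safe to convert
to_convert <- vapply(df, is_numeric_like, logical(1))

# Convert only the flagged columns; all others are left untouched
df2 <- df
df2[to_convert] <- lapply(df2[to_convert], function(x) {
  x[x %in% "NA"] <- NA
  as.numeric(x)
})

sapply(df2, class)
```

Unlike type.convert(), this sketch always produces double columns and leaves columns that are already numeric (such as sbp) as they are.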

huangapple
  • Posted on 2023-01-09 19:26:16
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75056602.html