How to convert all numeric columns stored as character to numeric in a dataframe?
Question
I have a data frame with hundreds of columns, some of which contain only numeric values but are stored as character. I need to convert every column whose values are all numbers to numeric (the data may also contain NAs).
Example dataframe:
df <- data.frame(id = c("R1","R2","R3","R4","R5"), name=c("A","B","C","D","E"), age=c("24", "NA", "55", "19", "40"), sbp=c(174, 125, 180, NA, 130), dbp=c("106", "67", "109", "NA", "87"))
> print(df, row.names = F)
id name age sbp dbp
R1 A 24 174 106
R2 B NA 125 67
R3 C 55 180 109
R4 D 19 NA NA
R5 E 40 130 87
These columns should be numeric.
> df$age
[1] "24" "NA" "55" "19" "40"
> df$dbp
[1] "106" "67" "109" "NA" "87"
I applied as.numeric(), but it also coerced the genuinely character variables (id, name, etc.) to numeric, which produced NAs in those columns.
> sapply(df,as.numeric)
id name age sbp dbp
[1,] NA NA 24 174 106
[2,] NA NA NA 125 67
[3,] NA NA 55 180 109
[4,] NA NA 19 NA NA
[5,] NA NA 40 130 87
> lapply(df,as.numeric)
$id
[1] NA NA NA NA NA
$name
[1] NA NA NA NA NA
$age
[1] 24 NA 55 19 40
$sbp
[1] 174 125 180 NA 130
$dbp
[1] 106 67 109 NA 87
What I need is to skip the genuinely character columns (id, name, etc.) while looping through the data frame. Any help is much appreciated!
Answer 1
Score: 3
Try type.convert():
df2 <- type.convert(df, as.is = TRUE)
Result:
#> df2
id name age sbp dbp
1 R1 A 24 174 106
2 R2 B NA 125 67
3 R3 C 55 180 109
4 R4 D 19 NA NA
5 R5 E 40 130 87
## check column classes
#> sapply(df2, class)
id name age sbp dbp
"character" "character" "integer" "integer" "integer"
Note that the as.is argument controls whether character columns are converted to factors: with as.is = FALSE, the first two columns would have become factors.
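To make the effect of as.is concrete, here is a small sketch that runs type.convert() both ways on the question's example data and compares the resulting column classes. The variable names keep_chr and to_fct are illustrative and not part of the original answer.

```r
# Same example data as in the question
df <- data.frame(
  id   = c("R1", "R2", "R3", "R4", "R5"),
  name = c("A", "B", "C", "D", "E"),
  age  = c("24", "NA", "55", "19", "40"),
  sbp  = c(174, 125, 180, NA, 130),
  dbp  = c("106", "67", "109", "NA", "87")
)

keep_chr <- type.convert(df, as.is = TRUE)   # non-numeric columns stay character
to_fct   <- type.convert(df, as.is = FALSE)  # non-numeric columns become factors

# Compare the classes column by column
sapply(keep_chr, class)
sapply(to_fct, class)

# The string "NA" is read as a missing value (default na.strings = "NA")
is.na(keep_chr$age)
```

In both calls the number-like columns are converted; only the treatment of id and name differs.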
Answer 2
Score: 0
This is also possible, using bind_cols() from dplyr with the column positions written out by hand. It again returns a data frame.
df[1:2] |> bind_cols(sapply(df[3:5], as.numeric))
# id name age sbp dbp
# R1 A 24 174 106
# R2 B NA 125 67
# R3 C 55 180 109
# R4 D 19 NA NA
# R5 E 40 130 87
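The snippet above hard-codes positions 1:2 and 3:5, which is awkward with the hundreds of columns mentioned in the question. One possible base R sketch that detects the columns instead is shown below. The helper is_numeric_like is a hypothetical name, and it assumes that the literal string "NA" stands for a missing value and that anything as.numeric() can parse counts as a number.

```r
# Same example data as in the question
df <- data.frame(
  id   = c("R1", "R2", "R3", "R4", "R5"),
  name = c("A", "B", "C", "D", "E"),
  age  = c("24", "NA", "55", "19", "40"),
  sbp  = c(174, 125, 180, NA, 130),
  dbp  = c("106", "67", "109", "NA", "87")
)

# Hypothetical helper: TRUE when x is a character vector whose
# non-missing entries all parse as numbers
is_numeric_like <- function(x) {
  if (!is.character(x)) return(FALSE)
  x[x %in% "NA"] <- NA                       # assumption: "NA" means missing
  parsed <- suppressWarnings(as.numeric(x))
  # reject the column if parsing created an NA that was not already there
  !any(is.na(parsed) & !is.na(x))
}

# Flag the columns to convert, then convert only those
to_convert <- vapply(df, is_numeric_like, logical(1))
df[to_convert] <- lapply(df[to_convert],
                         function(x) suppressWarnings(as.numeric(x)))

sapply(df, class)
```

Unlike type.convert(), this leaves columns that are already numeric (such as sbp) untouched and yields doubles rather than integers. A character column that is entirely missing would also be converted, so that case may need separate handling.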