Turn percentages to decimals in a column that contains both
Question
I'm cleaning a data frame and one of the columns contains percentage values, decimal values, and blank/NA values. I've read this data in from a CSV file and it's been read in as a character field:
value |
---|
15% |
20.5% |
NA |
0.17 |
0.356 |
I want to turn all the percentage values into decimals so that it becomes:
value |
---|
0.15 |
0.205 |
NA |
0.17 |
0.356 |
I've tried to use case_when and grepl to detect when a row contains a '%', remove the character, and then divide by 100, but I'm getting an error.
df <- df %>%
mutate(value = case_when(
is.na(value) ~ NA, # to keep the NAs
grepl("%", value, fixed = TRUE) ~ as.numeric(gsub("%", "", value))/100, # to fix the %s
.default = value # to keep the decimal values
)
)
The error I get is:
Error in `mutate()`:
! Problem while computing `value = case_when(...)`.
Caused by error in `case_when()`:
! Case 3 (`is.na(value) ~ NA`) must be a two-sided formula, not a
character vector.
I don't have to use case_when, so I will accept answers that achieve the same goal in a different way.
Thanks
Answer 1
Score: 1
Your problem is that .default = value is returning characters while the rest of your conditions return numeric values. A column is an atomic vector, so all of its values must have the same type. To fix your code you need to do:
.default = as.numeric(value)
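Put together, the corrected pipeline might look like the sketch below. It assumes dplyr 1.1.0 or later (for the .default argument) and rebuilds the sample data inline. NA_real_ and suppressWarnings() are additions for illustration, not part of the original code: case_when() evaluates every right-hand side on the whole column, so as.numeric() also sees strings like "15%" and would otherwise warn.

```r
library(dplyr)

# Sample data rebuilt from the question (assumed to match the real CSV column)
df <- data.frame(value = c("15%", "20.5%", NA, "0.17", "0.356"))

fixed <- df %>%
  mutate(value = case_when(
    is.na(value) ~ NA_real_,                              # keep the NAs, typed as double
    grepl("%", value, fixed = TRUE) ~
      as.numeric(gsub("%", "", value, fixed = TRUE)) / 100,  # strip % and rescale
    .default = suppressWarnings(as.numeric(value))        # plain decimals
  ))

fixed$value
```

Every branch now returns a double, so case_when() has a single common type to work with.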
Explanation
is.na(value) might not be doing anything. You see NA, but R just sees a string "NA", which is not the same. Try running is.na("NA"); is.na(NA). Many routines that read CSVs will auto-detect these string values and replace them with NA. Just an FYI.
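A quick way to check which situation you are in, as a minimal sketch in base R (the vector raw and the inline CSV text are made up for illustration):

```r
# A literal string "NA" is not a missing value
is.na("NA")
is.na(NA)

# A character vector holding the string "NA"
raw <- c("15%", "NA", "0.17")
any(is.na(raw))            # no true missing values yet

# Convert the literal string to a real NA
raw[raw == "NA"] <- NA
is.na(raw)

# read.csv() does this conversion by default (na.strings = "NA")
na_demo <- read.csv(text = "value\n15%\nNA\n0.17", stringsAsFactors = FALSE)
is.na(na_demo$value)
```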
If your NA is a true NA, then .default will return "0.17" "0.356". Again, you can see these are numbers, but they are, in fact, characters to R. You cannot mix types in vectors and data frame columns. R has a hierarchy for coercing types that can be dangerous. So here, instead of coercing it, it just throws an error.
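To see the coercion hierarchy in action, here is a small base R sketch (the variable names are illustrative only). Base R's c() silently promotes everything to the most general type, which is the behaviour that case_when() refuses to apply for you:

```r
# Mixing a number and a string: everything becomes character
mixed <- c(0.15, "0.17")
class(mixed)
mixed

# Mixing a logical and a number: everything becomes numeric
flags <- c(TRUE, 2)
class(flags)
flags
```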
Otherwise, here is an alternative:
library(dplyr)
df %>%
mutate(value = ifelse(grepl("%", value), readr::parse_number(value) / 100, as.numeric(value)))
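Note: both your solution and mine might throw a warning message like
NAs introduced by coercion
This is because as.numeric("NA") tries to convert the string to a number and, when it cannot, returns NA with a warning. The same happens for strings such as "15%", because both ifelse() and case_when() evaluate every branch on the whole column before picking values.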
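If you want to confirm where the warning comes from, or silence it once you know it is harmless, something like this sketch works in base R (x and warned are illustrative names):

```r
x <- c("15%", "NA", "0.17")

warned <- FALSE
out <- withCallingHandlers(
  as.numeric(x),
  warning = function(w) {
    warned <<- TRUE                  # record that a coercion warning fired
    invokeRestart("muffleWarning")   # and keep it from being printed
  }
)

warned
out
```

Both "15%" and "NA" fail to parse, so both positions come back as NA.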
Output
value
1 0.150
2 0.205
3 NA
4 0.170
5 0.356
Answer 2
Score: 1
One approach in base R without a mutate/ifelse statement:
df$newvalue <- as.numeric(gsub("%", "", df$value))
df$newvalue[grepl("%", df$value)] <- df$newvalue[grepl("%", df$value)] / 100
Output
# value newvalue
#1 15% 0.150
#2 20.5% 0.205
#3 <NA> NA
#4 0.17 0.170
#5 0.356 0.356
Data
df <- read.table(text = "value
15%
20.5%
NA
0.17
0.356", header = TRUE)
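If you need this in more than one place, the same base R idea can be wrapped in a small helper. This is only a sketch: pct_to_decimal is a hypothetical name, and the trimws() call is an added assumption that stray whitespace should be ignored.

```r
# Hypothetical helper: convert a character vector of mixed
# percentages ("15%") and decimals ("0.17") to numeric decimals.
pct_to_decimal <- function(x) {
  x <- trimws(as.character(x))
  is_pct <- !is.na(x) & grepl("%", x, fixed = TRUE)  # which entries were percentages
  out <- as.numeric(sub("%", "", x, fixed = TRUE))   # strip the sign, then convert
  out[is_pct] <- out[is_pct] / 100                   # rescale only the percentages
  out
}

values <- c("15%", "20.5%", NA, "0.17", "0.356")
result <- pct_to_decimal(values)
result
```

Because the % is removed before as.numeric() runs, this version does not trigger the coercion warning for the sample data, and true NA values pass through unchanged.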