Turn percentages to decimals in a column that contains both

Question

I'm cleaning a data frame and one of the columns contains percentage values, decimal values, and blank/NA values. I've read this data in from a CSV file and it's been read in as a character field:

value
15%
20.5%
NA
0.17
0.356

I want to turn all the percentage values into decimals so that it becomes:

value
0.15
0.205
NA
0.17
0.356

I've tried to use case_when and grepl to detect when a row contains a '%', remove the character, and then divide by 100, but I'm getting an error.

df <- df %>%
  mutate(value = case_when(
    is.na(value) ~ NA,              # to keep the NAs
    grepl("%", value, fixed = TRUE) ~ as.numeric(gsub("%", "", value))/100,              # to fix the %s
    .default = value              # to keep the decimal values
    )
  )

The error I get is:

Error in `mutate()`:
! Problem while computing `value = case_when(...)`.
Caused by error in `case_when()`:
! Case 3 (`is.na(value) ~ NA`) must be a two-sided formula, not a
  character vector.

I don't have to use case_when so will accept answers that achieve the same goal but in a different way.

Thanks

Answer 1

Score: 1

Your problem is that .default = value returns character values while the rest of your conditions return numeric values. A column is an atomic vector, so all of its values must share one type. To fix your code, convert the fallback too:

.default = as.numeric(value)
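
Put together, the corrected pipeline might look like the sketch below. The sample column is rebuilt from the question, which is an assumption about the real data frame. The fallback is written as `TRUE ~ ...` rather than `.default`, because `.default` is only available from dplyr 1.1.0 onward and the installed version is not stated; the "must be a two-sided formula" wording in the error may point to an older version, though that is a guess. `NA_real_` keeps the first branch numeric as well.

```r
library(dplyr)

# Sample data rebuilt from the question (an assumption about the real data frame).
df <- data.frame(value = c("15%", "20.5%", NA, "0.17", "0.356"),
                 stringsAsFactors = FALSE)

fixed <- df %>%
  mutate(value = case_when(
    is.na(value) ~ NA_real_,                                 # keep the NAs, typed as numeric
    grepl("%", value, fixed = TRUE) ~
      as.numeric(gsub("%", "", value, fixed = TRUE)) / 100,  # strip % and rescale
    # Every branch is evaluated on the whole column, so "15%" reaches
    # as.numeric() here too; that coercion warning is silenced on purpose.
    TRUE ~ suppressWarnings(as.numeric(value))               # plain decimals
  ))

fixed$value
```

With dplyr 1.1.0 or later, the last branch can equally be written as `.default = as.numeric(value)`, as suggested above.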

Explanation

is.na(value) might not be doing anything. You see NA, but R just sees a string "NA", which is not the same. Try running is.na("NA"); is.na(NA). Many routines that read CSVs will auto-detect these string values and replace them with NA. Just an FYI.
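
The difference between the text "NA" and a real missing value can be checked directly in base R. The vector below is a made-up stand-in for the question's column, assuming the CSV reader left the literal text in place; the last lines show `read.csv()` doing the conversion on import through its `na.strings` argument.

```r
# A literal string "NA" is not a missing value; a real NA is.
is.na("NA")   # FALSE
is.na(NA)     # TRUE

# Hypothetical character column holding the text "NA" rather than a missing value.
value <- c("15%", "20.5%", "NA", "0.17", "0.356")
n_missing_before <- sum(is.na(value))   # 0: nothing is treated as missing yet

# Replace the literal text with a true missing value.
value[value %in% "NA"] <- NA
n_missing_after <- sum(is.na(value))    # 1

# read.csv() performs this conversion on import via na.strings.
csv_df <- read.csv(text = "value\n15%\nNA\n0.17",
                   na.strings = c("NA", ""),
                   stringsAsFactors = FALSE)
is.na(csv_df$value)                     # FALSE TRUE FALSE
```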

If your NA is a true NA, then .default will return "0.17" "0.356". Again, you can see these are numbers, but to R they are character strings. You cannot mix types within a vector or a data frame column. R has a hierarchy for coercing types that can be dangerous, so here, instead of coercing, case_when() throws an error.
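
The coercion hierarchy is easy to see in base R with a tiny made-up vector (the values are illustrative only): combining a number with a string quietly produces a character vector, which is the kind of silent conversion case_when() refuses to perform.

```r
# Combining a number with a string silently turns everything into character.
mixed <- c(0.15, "0.17")
class(mixed)               # "character"

# After an explicit conversion the vector holds one numeric type throughout.
converted <- as.numeric(mixed)
class(converted)           # "numeric"
converted * 100            # arithmetic works again
```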


Otherwise, here is an alternative:

library(dplyr)

df %>%
  mutate(value = ifelse(grepl("%", value), readr::parse_number(value) / 100, as.numeric(value)))

Note: both your solution and mine might throw a warning message like

NAs introduced by coercion

This is because as.numeric("NA") will try to convert this value to a number, and when it cannot, it will coerce it to NA.

Output

  value
1 0.150
2 0.205
3    NA
4 0.170
5 0.356
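
To see which entries trigger the coercion warning, one option is to compare missingness before and after conversion. In the ifelse() alternative, as.numeric(value) is evaluated on the whole column, so strings such as "15%" reach it as well. The vector below mirrors the question's data, and the name `culprits` is only illustrative.

```r
value <- c("15%", "20.5%", NA, "0.17", "0.356")

# Convert everything at once, silencing the warning just for this check.
parsed <- suppressWarnings(as.numeric(value))

# Present on input but NA after conversion: these strings caused the warning.
culprits <- value[!is.na(value) & is.na(parsed)]
culprits   # "15%" "20.5%"
```

Since only the percentage strings show up here, the warning is harmless for this data: those rows take their value from the other branch.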

Answer 2

Score: 1

One approach in base R without a mutate/ifelse statement:
df$newvalue <- as.numeric(gsub("%", "", df$value))
df$newvalue[grepl("%", df$value)] <- df$newvalue[grepl("%", df$value)] / 100

Output

#  value newvalue
#1   15%    0.150
#2 20.5%    0.205
#3  <NA>       NA
#4  0.17    0.170
#5 0.356    0.356

Data

df <- read.table(text = "value
15%
20.5%
NA
0.17
0.356", header = TRUE)
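
If the same cleanup is needed for several columns, the base R approach could be wrapped in a small function. This is only a sketch: `pct_to_decimal` is a hypothetical name, not part of any package, and it assumes every non-missing entry is either a plain decimal or a number followed by "%".

```r
# Hypothetical helper: convert a vector of mixed "15%" / "0.17" entries
# to numeric decimals. Missing values stay NA.
pct_to_decimal <- function(x) {
  x <- trimws(as.character(x))
  is_pct <- grepl("%", x, fixed = TRUE)              # FALSE for NA entries
  num <- as.numeric(sub("%", "", x, fixed = TRUE))   # drop the sign, then convert once
  num[is_pct] <- num[is_pct] / 100                   # rescale only the percentages
  num
}

df <- read.table(text = "value
15%
20.5%
NA
0.17
0.356", header = TRUE)

df$newvalue <- pct_to_decimal(df$value)
df$newvalue
```

Because the "%" sign is removed before the single call to as.numeric(), the percentage strings themselves no longer produce a coercion warning.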

Posted by huangapple on June 9, 2023, 02:37:32
Link: https://go.coder-hub.com/76434804.html