June 9, 2023, 02:37:32

Turn percentages to decimals in a column that contains both

Question

I'm cleaning a data frame and one of the columns contains percentage values, decimal values, and blank/NA values. I've read this data in from a CSV file and it's been read in as a character field:

value
15%
20.5%
NA
0.17
0.356

I want to turn all the percentage values into decimals so that it becomes:

value
0.15
0.205
NA
0.17
0.356

I've tried to use case_when and grepl to detect when a row contains a '%', remove that character, and then divide by 100, but I'm getting an error.

df <- df %>%
  mutate(value = case_when(
    is.na(value) ~ NA,              # to keep the NAs
    grepl("%", value, fixed = TRUE) ~ as.numeric(gsub("%", "", value))/100,              # to fix the %s
    .default = value              # to keep the decimal values
    )
  )

The error I get is:

Error in `mutate()`:
! Problem while computing `value = case_when(...)`.
Caused by error in `case_when()`:
! Case 3 (`is.na(value) ~ NA`) must be a two-sided formula, not a
  character vector.

I don't have to use case_when so will accept answers that achieve the same goal but in a different way.

Thanks

Answer 1

Score: 1

Your problem is that .default = value returns character values while the rest of your conditions return numeric values. A column is an atomic vector, so all of its elements must have the same type. To fix your code, you need:

.default = as.numeric(value)
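For reference, here is a minimal sketch of the whole corrected pipeline, not necessarily the only way to write it. It assumes dplyr 1.1.0 or later, which is the release that introduced the .default argument, and it rebuilds the question's data as a character column with a true NA. The "must be a two-sided formula" wording in the question's error may indicate an older dplyr, where .default is not recognized; in that case TRUE ~ as.numeric(value) can be used as the last case instead.

```r
library(dplyr)  # assumption: dplyr >= 1.1.0, which introduced `.default`

# Example data mirroring the question: a character column with a true NA
df <- data.frame(value = c("15%", "20.5%", NA, "0.17", "0.356"))

df <- df %>%
  mutate(value = case_when(
    is.na(value) ~ NA_real_,            # keep the NAs, typed as double
    grepl("%", value, fixed = TRUE) ~
      as.numeric(gsub("%", "", value)) / 100,   # "15%" becomes 0.15
    .default = as.numeric(value)        # plain decimals, converted to numeric
  ))

print(df)
```

Every branch now returns a double, so case_when has nothing to reconcile. Expect a coercion warning from as.numeric(value), because it is evaluated on the whole column, including the "%" strings.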

Explanation

is.na(value) might not be doing anything. You see NA, but R just sees a string "NA", which is not the same. Try running is.na("NA"); is.na(NA). Many routines that read CSVs will auto-detect these string values and replace them with NA. Just an FYI.
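A quick way to check which situation you are in is sketched below in base R. The small CSV text and the vector x are made up for illustration; read.csv is used only because it treats the string "NA" as missing by default (na.strings = "NA").

```r
is.na("NA")   # FALSE: a two-character string, not a missing value
is.na(NA)     # TRUE: a real missing value

# read.csv() converts the text "NA" into a true NA while reading
df <- read.csv(text = "value\n15%\nNA\n0.17")
is.na(df$value)

# If a column still holds literal "NA" strings, they can be replaced by hand
x <- c("15%", "NA", "0.17")   # hypothetical vector with a string "NA"
x[x == "NA"] <- NA
is.na(x)
```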

If your NA is a true NA, then .default will return "0.17" "0.356". Again, you can see these are numbers, but to R they are character strings. You cannot mix types in vectors or data frame columns. Base R has a coercion hierarchy (logical, then integer, then double, then character) that silently converts everything to the most general type, which can be dangerous. So here, instead of coercing, case_when just throws an error.
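The silent coercion can be seen in a couple of lines of base R. This is only an illustration with made-up values; the names mixed and loose are not from the question.

```r
# Combining a number with a string turns the whole vector into character
mixed <- c(0.15, "0.17")
class(mixed)

# Base ifelse() coerces in the same silent way when its branches differ in type
loose <- ifelse(c(TRUE, FALSE), 0.15, "0.17")
class(loose)
```

Both results are character vectors, which is the outcome the stricter case_when refuses to produce.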

Otherwise, here is an alternative:

library(dplyr)
df %>%
  mutate(value = ifelse(grepl("%", value), readr::parse_number(value) / 100, as.numeric(value)))

Note: both your solution and mine might throw a warning message like

NAs introduced by coercion

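This is because ifelse and case_when evaluate each branch on the whole column, so as.numeric(value) is also applied to strings such as "15%". These cannot be parsed as numbers, so they are coerced to NA with a warning. Those NAs are then discarded in favor of the percentage branch, so the result is unaffected.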

Output

  value
1 0.150
2 0.205
3    NA
4 0.170
5 0.356
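If the coercion warning is unwanted, one option is to strip the "%" before converting, so that as.numeric never meets a string it cannot parse. The following is a base R sketch on a plain vector; the names is_pct, num, and result are illustrative, not part of the answer above.

```r
value <- c("15%", "20.5%", NA, "0.17", "0.356")

# Remember which entries were percentages (grepl() returns FALSE for NA)
is_pct <- grepl("%", value, fixed = TRUE)

# Strip "%" first, so every non-missing string is a valid number
num <- as.numeric(sub("%", "", value, fixed = TRUE))

# Divide only the former percentages by 100
result <- ifelse(is_pct, num / 100, num)
result
```

Inside a data frame, the same three steps can be applied to df$value or wrapped in mutate().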

Answer 2

Score: 1

One approach in base R, without a mutate/ifelse statement:

# Strip "%" and convert to numeric, then divide the former percentages by 100
df$newvalue <- as.numeric(gsub("%", "", df$value))
df$newvalue[grepl("%", df$value)] <- df$newvalue[grepl("%", df$value)] / 100

Output

#  value newvalue
#1   15%    0.150
#2 20.5%    0.205
#3  <NA>       NA
#4  0.17    0.170
#5 0.356    0.356

Data

df <- read.table(text = "value
15%
20.5%
NA
0.17
0.356", header = TRUE)
