How can I extract only the USD values from a column in an R data table that lists salaries in crore?
Question
I have a table with data for the ongoing IPL 2023, which includes a list of the players' salaries. The salaries are written in forms such as "₹6.75 crore (US$850,000)", "₹50 lakh (US$63,000)", or "₹16 crore (US$2.0 million)". I want to extract only the USD value (if it's listed in millions, I want the full number, i.e. 2.0 million should be 2,000,000).
ipl_2023_salaries <- read_excel(file_path)
usd_values <- str_extract_all(ipl_2023_salaries$Salary, "\\(US\\$([0-9.,]+)\\)")
usd_values <- sapply(usd_values, function(x) ifelse(length(x) > 0, gsub("[^0-9.]", "", x), NA))
ipl_2023_salaries$Salary <- as.numeric(usd_values)
print(ipl_2023_salaries)
I expected this code to give me the output I desired, but instead every salary value is now NA.
Answer 1
Score: 1
Split strings at `$`, pass 2nd part to `readr::parse_number()` and if there's a match for *"million"*, multiply by *1e6*:
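``` r
library(stringr)  # str_split_i() requires stringr >= 1.5.0
library(readr)

salary <- c("₹6.75 crore (US$850,000)", "₹50 lakh (US$63,000)", "₹16 crore (US$2.0 million)")

# Take the piece after "$", parse its leading number, and scale by 1e6 when "million" follows
parse_number(str_split_i(salary, "\\$", 2)) * ifelse(str_detect(salary, "\\$.*million"), 1e6, 1)
#> [1] 850000 63000 2000000
```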
<sup>Created on 2023-05-28 with reprex v2.0.2</sup>
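
To apply the same idea to the table from the question, something like the following might work. This is only a sketch: `usd_from_salary` is a hypothetical helper name, the small data frame stands in for the result of `read_excel(file_path)`, and the `Player` and `Salary_USD` columns are assumptions made for illustration.

``` r
library(stringr)
library(readr)

# Hypothetical helper (not part of the original answer):
# converts salary strings such as "₹16 crore (US$2.0 million)" to numeric USD
usd_from_salary <- function(x) {
  # Piece of each string that follows the first "$"
  usd_part <- str_split_i(x, "\\$", 2)
  # Scale by 1e6 only when the dollar amount is given in millions
  multiplier <- ifelse(str_detect(usd_part, "million"), 1e6, 1)
  parse_number(usd_part) * multiplier
}

# Stand-in for the table read from Excel; only a character Salary column is assumed
ipl_2023_salaries <- data.frame(
  Player = c("A", "B", "C"),
  Salary = c("₹6.75 crore (US$850,000)", "₹50 lakh (US$63,000)", "₹16 crore (US$2.0 million)")
)

# Store the result in a new column so the original strings stay available for checking
ipl_2023_salaries$Salary_USD <- usd_from_salary(ipl_2023_salaries$Salary)
print(ipl_2023_salaries)
```

Writing to a separate column rather than overwriting `Salary` makes it easier to spot rows where the parsing did not behave as expected.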