How can I mutate data based on whether a string in R contains a specific value?
Question
For merging purposes, I collapsed the rows of an existing dataset (structured by country, year, and party) as shown below. I am not interested in the values for individual parties, only in whether a particular kind of party was part of the government in a given year:
library(dplyr)
vparty <- vparty %>%
  group_by(country_name, year) %>%
  summarise(across(everything(), ~toString(.)))
Now I have a data frame looking like this:
country_name | year | apingov |
---|---|---|
Country1 | year1 | NA, NA, NA, 1, NA |
Country1 | year2 | NA, NA, NA, 0, NA |
Country2 | year1 | NA, 1, NA, NA, NA |
Country2 | year2 | NA, NA, NA, NA, 0 |
What I want to do now is mutate apingov depending on whether the string contains a "1" or a "0". (A string can never contain both 1 and 0, and the NAs do not matter when a 1 or 0 is present.) For a string containing "1", the value should be 1; for a string containing "0", it should be 0; when there is neither "1" nor "0", the value should be NA.
I tried different solutions I found here. However, none of them really worked for my specific case.
Answer 1
Score: 2
You can use parse_number from the readr package for this. I have added an extra row to demonstrate the behaviour when every value in apingov is NA.
library(dplyr)
library(readr)  # provides parse_number()
vparty %>% mutate(apingov = parse_number(apingov))
If the strings may contain numbers other than "1" or "0", parse_number will return that number instead of NA. In that case, use grepl inside case_when.
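vparty %>%
  mutate(apingov = case_when(
    grepl("\\b1\\b", apingov) ~ 1,  # a standalone 1, not part of 10 or 21
    grepl("\\b0\\b", apingov) ~ 0,  # a standalone 0
    .default = NA                   # neither found; .default needs dplyr >= 1.1.0
  ))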
Output
  country_name  year apingov
1     Country1 year1       1
2     Country1 year2       0
3     Country2 year1       1
4     Country2 year2       0
5     Country2 year2      NA
Data
vparty <- structure(list(
  country_name = c("Country1", "Country1", "Country2", "Country2", "Country2"),
  year = c("year1", "year2", "year1", "year2", "year2"),
  apingov = c("NA, NA, NA, 1, NA", "NA, NA, NA, 0, NA", "NA, 1, NA, NA, NA",
              "NA, NA, NA, NA, 0", "NA, NA, NA, NA, NA")
), class = "data.frame", row.names = c(NA, -5L))
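To see why the choice between the two approaches matters, here is a minimal sketch that runs both on a few made-up strings. The vector x is hypothetical and not part of the original data; it assumes readr and dplyr are installed. parse_number returns the first number it finds, whatever that number is, while the pattern-based version only reacts to a standalone 1 or 0.

```r
library(readr)
library(dplyr)

# Hypothetical strings for illustration only (not from the original data)
x <- c("NA, NA, 1, NA", "NA, 0, NA", "NA, 2, NA", "NA, 10, NA", "NA, NA, NA")

# parse_number() extracts the first number in each string, so 2 and 10 pass
# through unchanged. The all-NA string cannot be parsed and becomes NA with a
# parsing warning, which is silenced here.
parsed <- suppressWarnings(parse_number(x))

# The pattern-based version keeps only a standalone 1 or 0; everything else,
# including 2 and 10, becomes NA.
flagged <- case_when(
  grepl("\\b1\\b", x) ~ 1,
  grepl("\\b0\\b", x) ~ 0,
  TRUE ~ NA_real_
)

data.frame(x, parsed, flagged)
```

If the column is known to hold only 0, 1, and NA, both approaches should agree and parse_number is the shorter option.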
Answer 2
Score: 1
I would recommend parse_number. Here is a version with str_detect and case_when:
library(dplyr)
library(stringr)
vparty %>%
  mutate(apingov = case_when(
    str_detect(apingov, "1") ~ 1,
    str_detect(apingov, "0") ~ 0,
    TRUE ~ NA_real_
  ))
  country_name  year apingov
1     Country1 year1       1
2     Country1 year2       0
3     Country2 year1       1
4     Country2 year2       0
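If the party-level data is still available, it may be simpler to skip the string step and compute the flag while collapsing. The sketch below is only an illustration under assumptions: the parties data frame and the first_non_na helper are hypothetical, apingov is assumed to be numeric before collapsing, and each country-year group is assumed to hold at most one non-missing value.

```r
library(dplyr)

# Hypothetical party-level data; the values are made up for illustration
parties <- data.frame(
  country_name = c("Country1", "Country1", "Country1", "Country2", "Country2"),
  year         = c("year1", "year1", "year2", "year1", "year1"),
  apingov      = c(NA, 1, 0, NA, NA)
)

# Hypothetical helper: return the first non-missing value in a group,
# or NA when every value in the group is missing
first_non_na <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) NA_real_ else x[1]
}

collapsed <- parties %>%
  group_by(country_name, year) %>%
  summarise(apingov = first_non_na(apingov), .groups = "drop")

collapsed
```

This keeps apingov numeric throughout, so no parsing or pattern matching is needed afterwards.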