How can I mutate data based on whether a string in R contains a specific value?

Question

For merging purposes, I summarized the rows of an existing dataset (structured by country, year, and party) as follows. I am not interested in the values for individual parties, only in whether a particular kind of party was part of the government in a given year:

library(dplyr)

# Collapse the party rows of each country-year into one comma-separated string per column
vparty <- vparty %>%
  group_by(country_name, year) %>%
  summarise(across(everything(), ~toString(.)))

Now I have a data frame looking like this:

country_name  year   apingov
Country1      year1  NA, NA, NA, 1, NA
Country1      year2  NA, NA, NA, 0, NA
Country2      year1  NA, 1, NA, NA, NA
Country2      year2  NA, NA, NA, NA, 0

What I want to do now is to mutate apingov depending on whether the string contains a "1" or a "0". (A string cannot contain both 1 and 0, and the NAs do not matter when a 1 or 0 is present.) If the string contains 1, the value should be 1; if it contains 0, the value should be 0; if it contains neither, the value should be NA.

I tried several solutions I found here, but none of them worked for my specific case.
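The rule described in the question can be sketched in base R with nested ifelse() and grepl(). This is only a minimal sketch: it assumes, as stated above, that 1 and 0 never appear in the same string, and `recode_apingov()` is a hypothetical helper name, not part of any package:

```r
# Sample strings in the shape produced by toString() in the question
apingov <- c("NA, NA, NA, 1, NA", "NA, NA, NA, 0, NA",
             "NA, 1, NA, NA, NA", "NA, NA, NA, NA, 0",
             "NA, NA, NA, NA, NA")

# Hypothetical helper: contains "1" -> 1, else contains "0" -> 0, else NA.
# fixed = TRUE treats the pattern as a literal substring, not a regex.
recode_apingov <- function(x) {
  ifelse(grepl("1", x, fixed = TRUE), 1,
         ifelse(grepl("0", x, fixed = TRUE), 0, NA_real_))
}

result <- recode_apingov(apingov)
print(result)
```

Inside a dplyr pipeline the same helper could be called as `mutate(apingov = recode_apingov(apingov))`.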

Answer 1

Score: 2

You can use parse_number() from the readr package for that. I have added an extra row to demonstrate the behaviour when all apingov values are NA.

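library(dplyr)
library(readr)  # parse_number() comes from readr, not dplyr

# parse_number() keeps the first number in each string; strings with no number become NA
vparty %>% mutate(apingov = parse_number(apingov))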
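readr documents parse_number() as dropping any non-numeric characters before or after the first number. The base R sketch below only approximates that idea with regexpr() and regmatches() (it is not readr's implementation) to show why a string holding some other number, such as the made-up "NA, 2, NA", comes back as that number rather than NA:

```r
apingov <- c("NA, NA, NA, 1, NA", "NA, NA, NA, 0, NA",
             "NA, NA, NA, NA, NA", "NA, 2, NA")

# Start position of the first run of digits in each string (-1 if there is none)
m <- regexpr("[0-9]+", apingov)
has_number <- m != -1

# Fill in the matched digits as numbers; strings without digits stay NA
first_number <- rep(NA_real_, length(apingov))
first_number[has_number] <- as.numeric(regmatches(apingov, m))
print(first_number)
```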

If apingov can contain numbers other than 1 or 0, parse_number() will return those numbers instead of NA. In that case, use grepl() inside case_when():

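# Conditions are checked in order: "1" first, then "0"; everything else becomes NA.
# Returning 1/0 (not "1"/"0") keeps apingov numeric. `.default` needs dplyr >= 1.1.0.
vparty %>% 
  mutate(apingov = case_when(grepl("1", apingov) ~ 1,
                             grepl("0", apingov) ~ 0,
                             .default = NA))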
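Because grepl("1", apingov) matches a substring, it would also flag strings containing values such as "10" or "21". If that can happen, one option, sketched here in base R with made-up sample strings, is to split each string back into its elements and compare whole elements:

```r
apingov <- c("NA, NA, NA, 1, NA", "NA, 10, NA", "NA, NA, 0", "NA, NA, NA")

# Split each string on the ", " separator that toString() inserts
elements <- strsplit(apingov, ", ", fixed = TRUE)

# Compare whole elements, so "10" is not mistaken for "1"
result <- vapply(elements, function(e) {
  if ("1" %in% e) 1 else if ("0" %in% e) 0 else NA_real_
}, numeric(1))
print(result)
```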

Output

  country_name  year apingov
1     Country1 year1       1
2     Country1 year2       0
3     Country2 year1       1
4     Country2 year2       0
5     Country2 year2      NA

Data

vparty <- structure(list(country_name = c("Country1", "Country1", "Country2", 
"Country2", "Country2"), year = c("year1", "year2", "year1", 
"year2", "year2"), apingov = c("NA, NA, NA, 1, NA", "NA, NA, NA, 0, NA", 
"NA, 1, NA, NA, NA", "NA, NA, NA, NA, 0", "NA, NA, NA, NA, NA"
)), class = "data.frame", row.names = c(NA, -5L))

Answer 2

Score: 1

I would recommend parse_number(). Here is a version with str_detect() and case_when():

library(dplyr)
library(stringr)

vparty %>%
  mutate(apingov = case_when(
    str_detect(apingov, "1") ~ 1,
    str_detect(apingov, "0") ~ 0,
    TRUE ~ NA_real_
  ))
  
  country_name  year apingov
1     Country1 year1       1
2     Country1 year2       0
3     Country2 year1       1
4     Country2 year2       0
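If the party-level apingov values are still available as numbers (1, 0 or NA), the same flag could also be computed per country-year before anything is collapsed into a string. A minimal base R sketch of that idea, handling a single column; the data frame `parties` and the helper `collapse_flag()` are hypothetical:

```r
# Hypothetical party-level rows (illustration only, not the real vparty data)
parties <- data.frame(
  country_name = c("Country1", "Country1", "Country2", "Country2"),
  year         = c("year1", "year1", "year1", "year1"),
  apingov      = c(NA, 1, NA, NA)
)

# Hypothetical helper: 1 if any party has 1, else 0 if any has 0, else NA
collapse_flag <- function(x) {
  if (any(x == 1, na.rm = TRUE)) {
    1
  } else if (any(x == 0, na.rm = TRUE)) {
    0
  } else {
    NA_real_
  }
}

# na.action = na.pass keeps rows whose apingov is NA, so all-NA groups
# still appear in the result (the default na.omit would drop them)
result <- aggregate(apingov ~ country_name + year, data = parties,
                    FUN = collapse_flag, na.action = na.pass)
print(result)
```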

huangapple
  • Posted on 2023-05-26 15:13:40
  • Please keep this link when reposting: https://go.coder-hub.com/76338435.html