How can I mutate data based on whether a string in R contains a specific value?

Question

For merging purposes, I summarized the rows of an existing dataset (structured by country, year, and party) as follows. I am not interested in the values of individual parties, only in whether a particular kind of party was part of the government in a given year:

    library(dplyr)

    vparty <- vparty %>%
      group_by(country_name, year) %>%
      summarise(across(everything(), ~toString(.)))

Now I have a data frame looking like this:

    country_name  year   apingov
    Country1      year1  NA, NA, NA, 1, NA
    Country1      year2  NA, NA, NA, 0, NA
    Country2      year1  NA, 1, NA, NA, NA
    Country2      year2  NA, NA, NA, NA, 0

What I want to do now is mutate apingov depending on whether the string contains a "1" or a "0". (A string cannot contain both 1 and 0, and the NAs do not matter when a 1 or 0 is present.) If the string contains 1, the value should be 1; if it contains 0, the value should be 0; if it contains neither, the value should be NA.

I tried different solutions I found here. However, none of them really worked for my specific case.
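
For context on where these strings come from, here is a minimal base R sketch of what toString does to one group. The vector apingov_by_party is made up for illustration and is not taken from the actual dataset.

```r
# Made-up values for one country-year group: one entry per party
apingov_by_party <- c(NA, NA, NA, 1, NA)

# toString() pastes the values together, so each NA becomes the text "NA"
collapsed <- toString(apingov_by_party)
print(collapsed)
```

Because the NAs end up as plain text inside the string, the column can no longer be checked with is.na, which is why a string-based test is needed afterwards.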

Answer 1

Score: 2

You can use parse_number from the readr package for that. I have added an extra row to demonstrate the behaviour when every apingov value is NA.

    library(dplyr)
    library(readr)  # provides parse_number()

    # Extract the first number found in each string
    vparty %>% mutate(apingov = parse_number(apingov))
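
To illustrate what parse_number does with strings like these, here is a small sketch on a standalone vector. The vector x is made up for illustration; its second element deliberately holds a value other than 0 or 1.

```r
library(readr)

# Made-up strings for illustration; the second one holds a value other than 0 or 1
x <- c("NA, NA, NA, 1, NA", "NA, 7, NA")

# parse_number() keeps the first number it finds in each string
parsed <- parse_number(x)
print(parsed)
```

The second element comes back as 7 rather than NA, so this approach is only safe when 0 and 1 are the only numbers that can occur.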

If the strings can contain numbers other than "1" or "0", parse_number will return that number rather than NA. If that is a concern, use grepl inside case_when instead:

    # .default requires dplyr >= 1.1.0; the result is a character column
    vparty %>%
      mutate(apingov = case_when(grepl("1", apingov) ~ "1",
                                 grepl("0", apingov) ~ "0",
                                 .default = NA))

Output

      country_name  year apingov
    1     Country1 year1       1
    2     Country1 year2       0
    3     Country2 year1       1
    4     Country2 year2       0
    5     Country2 year2      NA

Data

    vparty <- structure(list(
      country_name = c("Country1", "Country1", "Country2", "Country2", "Country2"),
      year = c("year1", "year2", "year1", "year2", "year2"),
      apingov = c("NA, NA, NA, 1, NA", "NA, NA, NA, 0, NA",
                  "NA, 1, NA, NA, NA", "NA, NA, NA, NA, 0",
                  "NA, NA, NA, NA, NA")
    ), class = "data.frame", row.names = c(NA, -5L))
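
One caveat with matching on "1" and "0" as substrings: a value such as 10 contains both characters and would be classified as 1. If that can happen in your data, one possible variant is to split each string and compare whole entries. The sketch below uses base R only; collapse_flag is a hypothetical helper name and the input vector is made up for illustration.

```r
# Hypothetical helper (not from the answers above): collapse one
# comma-separated string such as "NA, NA, 1" into 1, 0, or NA
collapse_flag <- function(x) {
  # Split on commas and strip the surrounding spaces from each entry
  tokens <- trimws(strsplit(x, ",", fixed = TRUE)[[1]])
  if ("1" %in% tokens) {
    1
  } else if ("0" %in% tokens) {
    0
  } else {
    NA_real_
  }
}

# Made-up input; the third string holds 10, which should not count as 1 or 0
apingov <- c("NA, NA, NA, 1, NA", "NA, NA, NA, 0, NA", "NA, 10, NA", "NA, NA")
result <- vapply(apingov, collapse_flag, numeric(1), USE.NAMES = FALSE)
print(result)
```

Inside a dplyr pipeline, the same helper could be applied with mutate(apingov = vapply(apingov, collapse_flag, numeric(1), USE.NAMES = FALSE)).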

Answer 2

Score: 1

I would recommend parse_number. Here is a version with str_detect and case_when:

    library(dplyr)
    library(stringr)

    vparty %>%
      mutate(apingov = case_when(
        str_detect(apingov, "1") ~ 1,
        str_detect(apingov, "0") ~ 0,
        TRUE ~ NA_real_
      ))

Output

      country_name  year apingov
    1     Country1 year1       1
    2     Country1 year2       0
    3     Country2 year1       1
    4     Country2 year2       0
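
For readers who would rather not load dplyr and stringr, the same rule can probably be written in base R with nested ifelse calls. This is only a sketch of an equivalent, not code from the answer; the vector below reuses the five example strings from the first answer's data.

```r
# The five example strings from the first answer's data
apingov <- c("NA, NA, NA, 1, NA", "NA, NA, NA, 0, NA",
             "NA, 1, NA, NA, NA", "NA, NA, NA, NA, 0",
             "NA, NA, NA, NA, NA")

# 1 if the string contains "1", else 0 if it contains "0", else NA
flag <- ifelse(grepl("1", apingov, fixed = TRUE), 1,
               ifelse(grepl("0", apingov, fixed = TRUE), 0, NA_real_))
print(flag)
```

On a data frame this would be assigned with vparty$apingov <- flag. Like the case_when version, it returns a numeric column.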

huangapple
  • Posted on May 26, 2023, 15:13:40
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76338435.html