May 29, 2023 03:34:28

Best way to exclude any of multiple characters in a string column when filtering a dataframe in R

Question

I have a dataframe with string and number data that I need to filter.

library(dplyr)
df <- data.frame(Name  = c("101a,102a", "101b,102b,103b", "103c", "102d,103d", "101,103"),
                 Value = c("2", "3", "4", "5", "6"))

The letter suffixes in Name can only be a, b, c or d. I want to find the best way to filter out all rows where Name contains any letter other than the one provided in the filter, while keeping rows where Name contains no letters at all. When filtering with "a", I want to remove all rows that contain "b", "c" or "d", keeping the first and last rows:

Name                Value
"101a,102a"         "2"
"101,103"           "6"

I can probably do this with a chain of if/else statements:

# Keep only rows whose Name contains none of the other three letters
if (char == "a") {
  df <- filter(df, !grepl("b", Name) & !grepl("c", Name) & !grepl("d", Name))
} else if (char == "b") {
  df <- filter(df, !grepl("a", Name) & !grepl("c", Name) & !grepl("d", Name))
} else if (char == "c") {
  df <- filter(df, !grepl("a", Name) & !grepl("b", Name) & !grepl("d", Name))
} else if (char == "d") {
  df <- filter(df, !grepl("a", Name) & !grepl("b", Name) & !grepl("c", Name))
}

But I was hoping someone could help me find shorter, more efficient code. I'm looking for code that essentially does this:

"remove char from 'a,b,c,d' and keep only the rows where Name does not contain any of the remaining chars".

I tried:

abcd <- c("a", "b", "c", "d")
df <- filter(df, !Name %in% abcd[!abcd == char])

but %in% uses match(), which requires an exact match of the whole string, so I tried

df <- filter(!grepl(paste(abcd[!abcd == char], collapse = "|"), Name))

but I can't get the right syntax. I think I need some help creating the

(!grepl("a", Name) & !grepl("b", Name) & !grepl("c", Name))

part on the fly.
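The condition the question wants to build on the fly (one `!grepl()` per excluded letter, all of which must hold) can be generated from a vector of letters instead of being written out by hand. A minimal base R sketch, assuming only the sample data from the question: `lapply()` produces one `!grepl()` result per excluded letter and `Reduce()` folds them together with `&`. The names `others`, `no_match` and `keep_rows` are illustrative, not from the question.

```r
# Sample data from the question
df <- data.frame(Name  = c("101a,102a", "101b,102b,103b", "103c", "102d,103d", "101,103"),
                 Value = c("2", "3", "4", "5", "6"))

abcd <- c("a", "b", "c", "d")
char <- "a"

# Letters that must NOT appear in Name (illustrative helper name)
others <- setdiff(abcd, char)

# One logical vector per letter: TRUE where Name does not contain that letter
no_match <- lapply(others, function(ch) !grepl(ch, df$Name, fixed = TRUE))

# Combine them with & on the fly, equivalent to
# !grepl("b", ...) & !grepl("c", ...) & !grepl("d", ...)
keep_rows <- Reduce(`&`, no_match)

df[keep_rows, ]
```

With `char <- "a"` this keeps rows 1 and 5. Because `fixed = TRUE` treats each entry literally, the same sketch also works if the excluded items are longer strings rather than single letters.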

Answer 1

Score: 1

If you only ever filter on one letter at a time, you could use a function like this.

library(dplyr)
df <- data.frame(Name  = c("101a,102a", "101b,102b,103b", "103c", "102d,103d", "101,103"),
                 Value = c("2", "3", "4", "5", "6")
)

keep <- function(df, keep = c("a", "b", "c", "d")){
     # Collapse the letters into one regex ("a|b|c|d"), since grepl()
     # only uses the first element of a vector pattern
     df[grepl(paste(keep, collapse = "|"), df$Name),]
}

> keep(df, "a")
       Name Value
1 101a,102a     2

> keep(df, "b")
            Name Value
2 101b,102b,103b     3
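Note that `keep()` returns only the rows that contain the letter, so `"101,103"` is dropped, whereas the question also wants to keep rows without any letters. A minimal base R sketch of a variant that keeps both, assuming (as in the sample data) that each row uses at most one letter; `keep_or_plain()` is a hypothetical name, not part of the answer.

```r
df <- data.frame(Name  = c("101a,102a", "101b,102b,103b", "103c", "102d,103d", "101,103"),
                 Value = c("2", "3", "4", "5", "6"))

# Hypothetical variant: keep rows that contain the given letter
# OR contain no letters at all (such as "101,103").
# Assumes a row never mixes letters, e.g. no "101a,102b".
keep_or_plain <- function(df, letter) {
  has_letter <- grepl(letter, df$Name, fixed = TRUE)
  no_letters <- !grepl("[[:alpha:]]", df$Name)
  df[has_letter | no_letters, ]
}

keep_or_plain(df, "a")
keep_or_plain(df, "b")
```

For `"a"` this returns rows 1 and 5, and for `"b"` rows 2 and 5.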

Answer 2

Score: 1

Use paste() to create a regex that matches every letter except the one you want. Then filter, negating the result of grepl().

suppressPackageStartupMessages(
  library(dplyr)
)

df <- data.frame(Name  = c("101a,102a", "101b,102b,103b", "103c", "102d,103d", "101,103"),
                 Value = c("2", "3", "4", "5", "6"))

abcd <- c("a", "b", "c", "d")

char <- "a"

discard <- paste(abcd[abcd != char], collapse = "|")
filter(df, !grepl(discard, Name))
#>        Name Value
#> 1 101a,102a     2
#> 2   101,103     6

Created on 2023-05-28 with reprex v2.0.2

A base R equivalent uses grep() with invert = TRUE, which returns the indices of the rows that do not match:

char <- "a"

discard <- paste(abcd[abcd != char], collapse = "|")
df[grep(discard, df$Name, invert = TRUE), ]
#>        Name Value
#> 1 101a,102a     2
#> 5   101,103     6
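To see what `paste()` builds here: for `char <- "a"` the pattern `discard` is `"b|c|d"`, an alternation that matches any row containing b, c or d. When every excluded item is a single character, a character class such as `"[bcd]"` is an equivalent pattern. A small base R sketch comparing the two; `discard_alt` and `discard_class` are illustrative names, not from the answer.

```r
df <- data.frame(Name  = c("101a,102a", "101b,102b,103b", "103c", "102d,103d", "101,103"),
                 Value = c("2", "3", "4", "5", "6"))
abcd <- c("a", "b", "c", "d")
char <- "a"

# Alternation, built the same way as in the answer: "b|c|d"
discard_alt <- paste(abcd[abcd != char], collapse = "|")

# Character class "[bcd]"; only equivalent because every item is one letter
discard_class <- paste0("[", paste(setdiff(abcd, char), collapse = ""), "]")

# Both patterns flag exactly the same rows
same <- identical(grepl(discard_alt, df$Name), grepl(discard_class, df$Name))

# Keep the rows that match neither pattern
res <- df[!grepl(discard_class, df$Name), ]
res
```

Either pattern would need escaping if the excluded items ever contained regex metacharacters such as `.` or `|`.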


Answer 3

Score: 1

Here is a solution that brings the data into long format with separate_rows():

library(tidyverse)

df %>%
  separate_rows(Name) %>%
  mutate(x = str_extract(Name, "[A-Za-z]"),
         Name = parse_number(Name)) %>%
  filter(x == "a" | is.na(x)) %>%
  mutate(Name = ifelse(!is.na(x), paste0(Name, x), Name)) %>%
  summarise(Name = toString(Name), .by = Value)

  Value Name
  <chr> <chr>
1 2     101a, 102a
2 6     101, 103
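The same per-entry logic can be sketched in base R without reshaping, for readers not using the tidyverse: split each `Name` on commas, keep the entries that contain the chosen letter or no letter at all, and re-join them with `", "` as `toString()` does. This is an illustrative sketch, not part of the answer; the names `parts`, `kept` and `res` are hypothetical, and unlike the `summarise()` step it keeps the original column order and row names.

```r
df <- data.frame(Name  = c("101a,102a", "101b,102b,103b", "103c", "102d,103d", "101,103"),
                 Value = c("2", "3", "4", "5", "6"))
char <- "a"

# Split each Name into its comma-separated entries
parts <- strsplit(as.character(df$Name), ",", fixed = TRUE)

# Keep entries that contain the chosen letter or contain no letter at all
kept <- lapply(parts, function(p) p[grepl(char, p, fixed = TRUE) | !grepl("[[:alpha:]]", p)])

# Re-join the surviving entries with ", " (the separator toString() uses)
df$Name <- vapply(kept, paste, character(1), collapse = ", ")

# Drop rows where no entry survived
res <- df[lengths(kept) > 0, ]
res
```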
