February 24, 2023 01:06:50

Remove part of string with multiple occurrences inside cell

Question

I have the following dataframe:

bla = data.frame(mycol = "bla_v2_2072|ID:61462952|;bla_v2_0113|ID:61460993|")

and I want to remove everything after the first '|', but the cell actually contains two substrings separated by ';', and I need to do this for each of them.

Now, I tried

gsub("\\|.*", "", bla$mycol)

which gives me bla_v2_2072, but what I expect is

bla_v2_2072;bla_v2_0113
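
The attempt stops after the first substring because `.*` is greedy and matches any character, including `;`, so its single match runs from the first `|` to the end of the cell. A minimal sketch of the difference, assuming the `bla` data frame above; the bounded `[^;]*` pattern is an illustrative alternative, not taken from any of the answers below:

```r
bla <- data.frame(mycol = "bla_v2_2072|ID:61462952|;bla_v2_0113|ID:61460993|",
                  stringsAsFactors = FALSE)

# Greedy: ".*" also consumes ";", so one match swallows the rest of the cell
greedy <- gsub("\\|.*", "", bla$mycol)

# Bounded: "[^;]*" stops before the next ";", so gsub() removes each "|..." tail separately
bounded <- gsub("\\|[^;]*", "", bla$mycol)

print(greedy)   # [1] "bla_v2_2072"
print(bounded)  # [1] "bla_v2_2072;bla_v2_0113"
```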

Answer 1

Score: 1

We may use

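library(dplyr)    # reframe() and .by need dplyr >= 1.1.0
library(tidyr)    # separate_longer_delim() needs tidyr >= 1.3.0
library(stringr)
bla %>% 
  mutate(rn = row_number()) %>%                       # remember each piece's original row
  separate_longer_delim(mycol, delim = ";") %>%       # one row per ";"-separated substring
  reframe(mycol = str_c(str_remove(mycol, "\\|.*"),   # drop everything from the first "|"
                        collapse = ";"),              # glue the pieces back together
          .by = 'rn') %>%                             # ...within each original row
  select(-rn)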

Output:

                   mycol
1 bla_v2_2072;bla_v2_0113

Or using base R

gsub("(\\w+)(\\|ID:\\d+\\|)", "\\1", bla$mycol)
[1] "bla_v2_2072;bla_v2_0113"
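
The base R line relies on gsub() replacing every match, not just the first one. A small comparison with sub(), using the same pattern and the `\\1` backreference (which keeps only the first capture group, the `\\w+` prefix), on the example string from the question:

```r
x <- "bla_v2_2072|ID:61462952|;bla_v2_0113|ID:61460993|"
pattern <- "(\\w+)(\\|ID:\\d+\\|)"

# sub() replaces only the first match, so the second "|ID:...|" block survives
first_only <- sub(pattern, "\\1", x)

# gsub() replaces every match, keeping just the captured prefix each time
all_matches <- gsub(pattern, "\\1", x)

print(first_only)   # [1] "bla_v2_2072;bla_v2_0113|ID:61460993|"
print(all_matches)  # [1] "bla_v2_2072;bla_v2_0113"
```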

Answer 2

Score: 0

Using gsub():

bla$mycol <- gsub("(\\|.*?(?=;))|(\\|[^;]*$)", "", bla$mycol, perl = TRUE)

Or, using the same regex pattern in the tidyverse:

library(dplyr)
library(stringr)
bla %>% 
  mutate(mycol = str_remove_all(mycol, "(\\|.*?(?=;))|(\\|[^;]*$)"))

Result:

                    mycol
1 bla_v2_2072;bla_v2_0113

Explanation:

"(\\|.*?(?=;))              # literal '|' and following characters up to the next ';'
              |             # or
               (\\|[^;]*$)" # literal '|' through end of string if there is no intervening ';'
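
To check what the pattern actually deletes, one option is to list its matches with gregexpr() and regmatches(). A minimal sketch on the example string from the question; as in the gsub() call, perl = TRUE is needed because the (?=;) lookahead is a Perl-style construct:

```r
x <- "bla_v2_2072|ID:61462952|;bla_v2_0113|ID:61460993|"
pattern <- "(\\|.*?(?=;))|(\\|[^;]*$)"

# Locate every match; the first alternative ends before ";", the second runs to end of string
m <- gregexpr(pattern, x, perl = TRUE)

# Extract the matched text, i.e. exactly the pieces gsub() would delete
removed <- regmatches(x, m)[[1]]
print(removed)
# [1] "|ID:61462952|" "|ID:61460993|"
```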

Answer 3

Score: 0

gsub("\\|[^|]+\\|", "", bla$mycol)
#> [1] "bla_v2_2072;bla_v2_0113"

Pattern explanation: an escaped "|", followed by one or more characters other than "|", followed by another "|".
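
This pattern assumes every ID block is closed by a trailing "|". Because `[^|]+` also matches ";", a missing closing pipe would let one match run into the next substring. A minimal sketch with a hypothetical malformed cell (not from the question), compared against the lookahead pattern from the gsub(..., perl = TRUE) answer above:

```r
# Hypothetical malformed cell: the first ID block is missing its closing "|"
y <- "bla_v2_2072|ID:61462952;bla_v2_0113|ID:61460993|"

# "[^|]+" can also match ";", so the first match crosses into the second substring
pipe_pair <- gsub("\\|[^|]+\\|", "", y)
print(pipe_pair)        # [1] "bla_v2_2072ID:61460993|"

# The lookahead pattern stops at ";", so each substring is trimmed independently
semicolon_aware <- gsub("(\\|.*?(?=;))|(\\|[^;]*$)", "", y, perl = TRUE)
print(semicolon_aware)  # [1] "bla_v2_2072;bla_v2_0113"
```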

Answer 4

Score: -1

You can first split the string on ";", then remove everything from "|" onward in each piece, and finally concatenate the pieces back together with paste0().

paste0(sub("\\|.*", "", unlist(strsplit(bla$mycol, split = ";"))), collapse = ";")
[1] "bla_v2_2072;bla_v2_0113"
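
Because unlist() flattens the split pieces of every row into one vector, this approach would merge all rows of a multi-row data frame into a single string. A minimal per-row sketch of the same split/remove/paste idea; the data frame `bla2` and its second row are hypothetical, added only for illustration:

```r
bla2 <- data.frame(
  mycol = c("bla_v2_2072|ID:61462952|;bla_v2_0113|ID:61460993|",
            "bla_v2_9999|ID:1|"),  # hypothetical second row
  stringsAsFactors = FALSE
)

# strsplit() returns one character vector per row; vapply() handles each row separately
bla2$mycol <- vapply(
  strsplit(bla2$mycol, ";", fixed = TRUE),
  function(parts) paste(sub("\\|.*", "", parts), collapse = ";"),  # trim, then rejoin
  character(1)
)

print(bla2$mycol)
# [1] "bla_v2_2072;bla_v2_0113" "bla_v2_9999"
```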
