June 26, 2023, 00:56:23

How can I extract a string from between last dash and second to last dash out of a column that contains lists of strings?

Question

I have some data and I want to make a new column with the string that is between the last dash and the second to last dash. But there is a twist! Some of my observations are "listed", and I want to get each target string out of the list items as well.

Example data here:

data <- data.frame(
  a = c("1500925OR3-29139-315012", 
        "1500925OR3-2-2913A-315012", 
        "c(\"1500925OR3-200B-315012\", \"1500925OR3-4-2919999-315012\")")
)

looks like:

                                                           a
1                                    1500925OR3-29139-315012
2                                  1500925OR3-2-2913A-315012
3 c("1500925OR3-200B-315012", "1500925OR3-4-2919999-315012")

I want data that looks like this:

        a_clean
1         29139
2         2913A
3 200B, 2919999

I've been trying to use regex, but I can't figure out how to get the string before the last dash. This pattern grabs everything after the last dash: `-[^-]*$`, but obviously that's not right.

Answer 1

Score: 3

Try this regex in `sub` and apply it to each element with `lapply`.

# '\\1' keeps only the first capture group: the text between the last two dashes
dat$b <- lapply(dat$a, \(x) sub('-?.*-(.*)-.*', '\\1', x, perl=TRUE))
dat
#                                                     a             b
# 1                             1500925OR3-29139-315012         29139
# 2                           1500925OR3-2-2913A-315012         2913A
# 3 1500925OR3-200B-315012, 1500925OR3-4-2919999-315012 200B, 2919999

You're talking about a "list" column, so I created one assuming that's what your real data looks like.

Data:

dat <- structure(list(a = list("1500925OR3-29139-315012", "1500925OR3-2-2913A-315012", 
    c("1500925OR3-200B-315012", "1500925OR3-4-2919999-315012"
    ))), row.names = c(NA, -3L), class = "data.frame")
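
If a plain character column is preferred over a list column (as in the desired output in the question), one possible sketch is to collapse each element with `toString` inside `vapply`. The column name `a_clean` comes from the question; the helper name `extract_middle` and its slightly stricter pattern are assumptions made for illustration, not part of the answer above.

```r
# List-column data, built the same way as in the answer above.
dat <- structure(
  list(a = list(
    "1500925OR3-29139-315012",
    "1500925OR3-2-2913A-315012",
    c("1500925OR3-200B-315012", "1500925OR3-4-2919999-315012")
  )),
  row.names = c(NA, -3L),
  class = "data.frame"
)

# Hypothetical helper: keep only the text between the last two dashes.
extract_middle <- function(x) sub("^.*-([^-]*)-[^-]*$", "\\1", x)

# vapply returns a plain character vector; toString joins multiple
# matches from one list element with ", ".
dat$a_clean <- vapply(
  dat$a,
  function(x) toString(extract_middle(x)),
  character(1)
)

dat$a_clean
```

This assumes every string contains at least two dashes; strings with fewer are returned unchanged by `sub`.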

Answer 2

Score: 2

A tidyverse approach:

library(dplyr)
library(tidyr)
library(stringr)  # provides str_extract
data %>%
  mutate(id = row_number()) %>% 
  separate_rows(a, sep = "\\s") %>% 
  mutate(b = str_extract(a, "(?<=-)[^-]*(?=-[^-]*$)")) %>% 
  summarise(a_clean = toString(b), .by=id) %>% 
  select(-id)

 a_clean      
  <chr>        
1 29139        
2 2913A        
3 200B, 2919999
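
The pattern `(?<=-)[^-]*(?=-[^-]*$)` can be read as: a run of non-dash characters that is preceded by a dash and followed by a dash plus a final dash-free tail. As a rough check, the same pattern can be tried in base R; this sketch assumes `regmatches` and `regexpr` with `perl = TRUE`, which is needed for lookarounds, and the variable names are made up for illustration.

```r
x <- c("1500925OR3-29139-315012", "1500925OR3-2-2913A-315012")

# Lookbehind (?<=-) and lookahead (?=-[^-]*$) assert the surrounding
# dashes without including them in the match.
pattern <- "(?<=-)[^-]*(?=-[^-]*$)"

# regexpr finds the first match per string; regmatches extracts it.
middle <- regmatches(x, regexpr(pattern, x, perl = TRUE))
middle
```

Note that `regmatches` silently drops strings with no match, so the result can be shorter than the input.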

Answer 3

Score: 2

Alternatively,

library(dplyr)
library(tidyr)
library(stringr)  # provides str_c

data.frame(
  a = c(
    "1500925OR3-29139-315012",
    "1500925OR3-2-2913A-315012",
    "c(\"1500925OR3-200B-315012\", \"1500925OR3-4-2919999-315012\")"
  ),
  b = c(1:3)
) %>%
  # one row per comma-separated item
  separate_rows(a, sep = '\\,') %>%
  # split on dashes; fill = 'left' pads short rows so col3 is always second to last
  separate(a, c('col1', 'col2', 'col3', 'col4'), sep = '\\-', fill = 'left') %>%
  group_by(b) %>%
  summarise(col3 = str_c(col3, collapse = ","))

# A tibble: 3 × 2
      b col3        
  <int> <chr>       
1     1 29139       
2     2 2913A       
3     3 200B,2919999
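
The split-on-dash idea can also be sketched in base R without fixing the number of columns: split each string on `-` and take the second-to-last piece. This is only an illustration under the assumption that every string has at least two dashes (otherwise the `vapply` call fails); the variable names are hypothetical.

```r
x <- c("1500925OR3-29139-315012", "1500925OR3-2-2913A-315012")

# fixed = TRUE treats "-" as a literal character, not a regex.
parts <- strsplit(x, "-", fixed = TRUE)

# Pick the second-to-last piece of each split string.
second_to_last <- vapply(parts, function(p) p[length(p) - 1], character(1))
second_to_last
```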
