Find first occurrence of a character in column of a data frame in R

Question

Struggling with string handling in R...

I've got a column of strings in an R data frame. Each one contains the "=" character once and only once. I'd like to know the position of the "=" character in each element of the column, as a step towards splitting the column into two separate columns (one for the bit before the "=" and one for the bit after the "="). Can anyone help please? I'm sure it's simple but I'm struggling to find the answer.

For example, if I have:

x <- data.frame(string = c("aa=1", "aa=2", "aa=3", "b=1", "b=2", "abc=5"))

I'd like a bit of code to return

(3, 3, 3, 2, 2, 4)

Thank you.

Answer 1

Score: 1

Here's one way to do it, using the stringr package:

library(stringr)
str_locate(x$string, "=")[, 1]

Answer 2

Score: 1

In base R you can do:

as.numeric(lapply(strsplit(as.character(x$string), ""), function(x) which(x == "=")))

[1] 3 3 3 2 2 4
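
A small variation on the line above, offered as a sketch only. It assumes that some strings might contain more than one "=" or none at all, which the question itself rules out. The helper name first_pos is made up for illustration and is not part of base R.

```r
x <- data.frame(string = c("aa=1", "aa=2", "aa=3", "b=1", "b=2", "abc=5"))

# first_pos is a hypothetical helper, not a base R function
first_pos <- function(strings, ch = "=") {
  # as.character() also covers the case where the column is a factor
  chars <- strsplit(as.character(strings), "")
  # which(...)[1] keeps only the first hit and is NA when ch never occurs
  vapply(chars, function(v) which(v == ch)[1], integer(1))
}

first_pos(x$string)
# [1] 3 3 3 2 2 4
```

Using vapply() instead of as.numeric(lapply(...)) means the call fails loudly if any element does not yield exactly one integer, rather than silently producing a list that cannot be coerced.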

Answer 3

Score: 1

You can use gregexpr:

unlist(lapply(gregexpr(pattern = '=', x$string), min))
[1] 3 3 3 2 2 4
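
To see why min is needed here, it may help to look at what gregexpr returns: one vector of match positions per string, or -1 when there is no match. The sketch below uses a made-up input vector s (not the question's data) that includes a string with two "=" characters and one with none.

```r
# s is illustrative input, chosen to show the multi-match and no-match cases
s <- c("aa=1", "a=b=c", "none")

# fixed = TRUE treats "=" as a literal string rather than a regular expression
hits <- gregexpr("=", s, fixed = TRUE)

# Strip the match attributes so only the positions remain
all_pos <- lapply(hits, as.integer)

# The smallest position in each element is the first occurrence
first <- vapply(all_pos, min, integer(1))
first
# [1]  3  2 -1
```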

Answer 4

Score: 1

To get the position of "=" you can use the regexpr function:

regexpr("=", x$string)
#[1] 3 3 3 2 2 4
#attr(,"match.length")
#[1] 1 1 1 1 1 1
#attr(,"useBytes")
#[1] TRUE
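
If the positions are wanted in order to split the column by hand, as the question describes, they can be fed to substr. This is only a sketch: the column names key and value are invented for illustration, and stringsAsFactors = FALSE is set explicitly so the code behaves the same on R versions before 4.0.

```r
x <- data.frame(string = c("aa=1", "aa=2", "aa=3", "b=1", "b=2", "abc=5"),
                stringsAsFactors = FALSE)

# as.integer() drops the match.length and useBytes attributes
pos <- as.integer(regexpr("=", x$string, fixed = TRUE))

# key and value are hypothetical column names
x$key   <- substr(x$string, 1, pos - 1)                 # everything before "="
x$value <- substr(x$string, pos + 1, nchar(x$string))   # everything after "="

x$key
# [1] "aa"  "aa"  "aa"  "b"   "b"   "abc"
x$value
# [1] "1" "2" "3" "1" "2" "5"
```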

However, as @Michael stated, if your goal is to split the string, you can use strsplit:

strsplit(x$string, "=")
#[[1]]
#[1] "aa" "1"
#
#[[2]]
#[1] "aa" "2"
#
#[[3]]
#[1] "aa" "3"
#
#[[4]]
#[1] "b" "1"
#
#[[5]]
#[1] "b" "2"
#
#[[6]]
#[1] "abc" "5"

Or combine it with do.call and rbind to stack the pieces into a two-column character matrix (wrap the result in as.data.frame() if you need a data frame):

do.call(rbind, strsplit(x$string, "="))
#     [,1]  [,2]
#[1,] "aa"  "1"
#[2,] "aa"  "2"
#[3,] "aa"  "3"
#[4,] "b"   "1"
#[5,] "b"   "2"
#[6,] "abc" "5"
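
One possible way to go from that matrix to a data frame with named, typed columns is sketched below. The names key and value are invented for illustration, and converting the second column with as.numeric assumes it always holds numbers, as it does in the question's example.

```r
x <- data.frame(string = c("aa=1", "aa=2", "aa=3", "b=1", "b=2", "abc=5"))

# as.character() guards against the column being a factor (the default before R 4.0)
parts <- do.call(rbind, strsplit(as.character(x$string), "=", fixed = TRUE))

# key and value are hypothetical column names
out <- data.frame(key   = parts[, 1],
                  value = as.numeric(parts[, 2]),
                  stringsAsFactors = FALSE)
out$value
# [1] 1 2 3 1 2 5
```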

Answer 5

Score: 1

Here is another solution to obtain a two-column result (a character matrix, since t() converts the data frame), the first column containing the characters before = and the second one containing the characters after =. You can do that without obtaining the positions of the = character.

library(stringr)  # not strictly needed here: strsplit() is base R

t(as.data.frame(strsplit(x$string, "=")))

#              [,1]  [,2]
#c..aa....1..  "aa"  "1"
#c..aa....2..  "aa"  "2"
#c..aa....3..  "aa"  "3"
#c..b....1..   "b"   "1"
#c..b....2..   "b"   "2"
#c..abc....5.. "abc" "5"
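
strsplit cuts at every "=", which is fine for the question's data. If a value could itself contain "=", one base R alternative is to split only at the first occurrence with regmatches and invert = TRUE. The sketch below uses a made-up vector s to show the difference; it is not taken from the answers above.

```r
# s is illustrative input; the last element contains two "=" characters
s <- c("aa=1", "b=2", "k=v=w")

# regexpr() finds only the first match; invert = TRUE returns the text around it
parts <- regmatches(s, regexpr("=", s, fixed = TRUE), invert = TRUE)

m <- do.call(rbind, parts)
m[, 1]
# [1] "aa" "b"  "k"
m[, 2]
# [1] "1"   "2"   "v=w"
```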

Answer 6

Score: 0

Some may find this more readable:

library(tidyverse)
x %>%
  mutate(
    number = string %>% str_extract('[:digit:]+'),
    text = string %>% str_extract('[:alpha:]+')
  ) %>%
  as_tibble()
# A tibble: 6 x 3
#   string number text
#   <fct>  <chr>  <chr>
# 1 aa=1   1      aa
# 2 aa=2   2      aa
# 3 aa=3   3      aa
# 4 b=1    1      b
# 5 b=2    2      b
# 6 abc=5  5      abc
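
For readers without the tidyverse installed, roughly the same extraction can be sketched in base R with regexpr and regmatches. Note one difference: regmatches drops elements that have no match, whereas str_extract returns NA for them, so this sketch assumes every string contains both letters and digits, as in the question's example.

```r
s <- c("aa=1", "aa=2", "aa=3", "b=1", "b=2", "abc=5")

# First run of letters and first run of digits in each string
text   <- regmatches(s, regexpr("[[:alpha:]]+", s))
number <- regmatches(s, regexpr("[[:digit:]]+", s))

res <- data.frame(string = s, number = number, text = text,
                  stringsAsFactors = FALSE)
res$text
# [1] "aa"  "aa"  "aa"  "b"   "b"   "abc"
res$number
# [1] "1" "2" "3" "1" "2" "5"
```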

huangapple
  • Posted on January 4, 2020, 00:46:19
  • When reposting, please keep this link: https://go.coder-hub.com/59582251.html