Find first occurrence of a character in column of a data frame in R

Question

Struggling with string handling in R...

I've got a column of strings in an R data frame. Each one contains the "=" character once and only once. I'd like to know the position of the "=" character in each element of the column, as a step to splitting the column into two separate columns (one for the bit before the "=" and one for the bit after the "="). Can anyone help, please? I'm sure it's simple, but I'm struggling to find the answer.

For example, if I have:

```r
x <- data.frame(string = c("aa=1", "aa=2", "aa=3", "b=1", "b=2", "abc=5"))
```

I'd like a bit of code to return

> (3, 3, 3, 2, 2, 4)

Thank you.

Answer 1

Score: 1

Here's one way to do it:

```r
library(stringr)
# str_locate() returns a matrix with "start" and "end" columns; [, 1] keeps the start positions
str_locate(x$string, "=")[, 1]
```

Answer 2

Score: 1

In base R, you can do:

```r
# Split each string into single characters, then find which one is "="
as.numeric(lapply(strsplit(as.character(x$string), ""), function(x) which(x == "=")))
#[1] 3 3 3 2 2 4
```
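
A minimal variant of the same idea, assuming (as the question states) that each string contains exactly one "=": `vapply()` with `integer(1)` insists on one integer per string, so a string with no "=" or with several raises an error instead of quietly producing a malformed result. The name `eq_pos` is only illustrative.

```r
x <- data.frame(string = c("aa=1", "aa=2", "aa=3", "b=1", "b=2", "abc=5"))

# Split each string into single characters and return the index of "=".
# vapply() requires exactly one integer per string, so input that breaks
# the "one '=' per string" assumption fails loudly.
eq_pos <- vapply(
  strsplit(as.character(x$string), ""),
  function(chars) which(chars == "="),
  integer(1)
)
eq_pos
```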

Answer 3

Score: 1

You can use gregexpr:

```r
# gregexpr() returns every match position per string; min() keeps the first one
unlist(lapply(gregexpr(pattern = '=', x$string), min))
#[1] 3 3 3 2 2 4
```
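
The `min()` step only matters when a string might contain more than one "=": `gregexpr()` returns all match positions for each string, whereas `regexpr()` returns just the first. A small check on a made-up string `"a=b=c"` (not from the question); `fixed = TRUE` treats the pattern as a literal string, which is not needed for "=" but is safer for characters such as "." or "|".

```r
s <- "a=b=c"  # hypothetical input containing two "=" characters

all_pos <- gregexpr("=", s, fixed = TRUE)[[1]]  # every match in the string: 2 and 4
first_via_min <- min(all_pos)                    # first match, as in the gregexpr() answer
first_via_regexpr <- as.vector(regexpr("=", s, fixed = TRUE))  # regexpr() reports only the first match

first_via_min      # 2
first_via_regexpr  # 2
```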

Answer 4

Score: 1

To get the position of "=", you can use the regexpr() function:

```r
regexpr("=", x$string)
#[1] 3 3 3 2 2 4
#attr(,"match.length")
#[1] 1 1 1 1 1 1
#attr(,"useBytes")
#[1] TRUE
```

However, as @Michael stated, if your goal is to split the string, you can use strsplit() directly:

```r
# as.character(): strsplit() rejects factors, and data.frame() stored strings as factors by default before R 4.0.0
strsplit(as.character(x$string), "=")
#[[1]]
#[1] "aa" "1"
#
#[[2]]
#[1] "aa" "2"
#
#[[3]]
#[1] "aa" "3"
#
#[[4]]
#[1] "b" "1"
#
#[[5]]
#[1] "b" "2"
#
#[[6]]
#[1] "abc" "5"
```

Or combine it with do.call() and rbind() to get a two-column character matrix (call as.data.frame() on the result if you need a data frame):

```r
# rbind() stacks the length-2 pieces row by row into a character matrix
do.call(rbind, strsplit(as.character(x$string), "="))
#     [,1]  [,2]
#[1,] "aa"  "1"
#[2,] "aa"  "2"
#[3,] "aa"  "3"
#[4,] "b"   "1"
#[5,] "b"   "2"
#[6,] "abc" "5"
```
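
To turn that matrix into an actual data frame with named columns, one option is sketched below; the column names `key` and `value`, and the conversion of the second part to integer, are assumptions based on this example's data.

```r
x <- data.frame(string = c("aa=1", "aa=2", "aa=3", "b=1", "b=2", "abc=5"))

parts <- do.call(rbind, strsplit(as.character(x$string), "="))  # 6 x 2 character matrix
parts_df <- as.data.frame(parts, stringsAsFactors = FALSE)
names(parts_df) <- c("key", "value")
parts_df$value <- as.integer(parts_df$value)  # the part after "=" is a number here
parts_df
```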

Answer 5

Score: 1

Here is another solution that produces a two-column character matrix (wrap it in as.data.frame() if you need a data frame), the first column containing the characters before = and the second containing the characters after =. You can do that without obtaining the positions of the = character.

```r
# as.character(): strsplit() rejects factors, the default string column type before R 4.0.0
t(as.data.frame(strsplit(as.character(x$string), "=")))
#              [,1]  [,2]
#c..aa....1..  "aa"  "1"
#c..aa....2..  "aa"  "2"
#c..aa....3..  "aa"  "3"
#c..b....1..   "b"   "1"
#c..b....2..   "b"   "2"
#c..abc....5.. "abc" "5"
```
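
Another base-R route, sketched under the assumption that every string has the form `key=value` with no quote or comment characters: `read.table()` can parse the column as if it were a "="-separated file, and its type guessing turns the numeric part into integers automatically. The column names `key` and `value` are illustrative.

```r
x <- data.frame(string = c("aa=1", "aa=2", "aa=3", "b=1", "b=2", "abc=5"))

# Treat each string as one line of a "="-delimited file.
parsed <- read.table(
  text = as.character(x$string),
  sep = "=",
  col.names = c("key", "value"),
  stringsAsFactors = FALSE
)
parsed
```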

Answer 6

Score: 0

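Some may find this more readable. Note that it extracts runs of letters and digits rather than splitting on "=", so it assumes the part before "=" contains only letters and the part after contains only digits: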

```r
library(tidyverse)
x %>%
  mutate(
    number = string %>% str_extract('[:digit:]+'),
    text = string %>% str_extract('[:alpha:]+')
  ) %>%
  as_tibble()
# # A tibble: 6 x 3
#   string number text
#   <fct>  <chr>  <chr>
# 1 aa=1   1      aa
# 2 aa=2   2      aa
# 3 aa=3   3      aa
# 4 b=1    1      b
# 5 b=2    2      b
# 6 abc=5  5      abc
```
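
If the part before "=" might contain digits (or the part after might contain letters), the `[:alpha:]`/`[:digit:]` patterns above would pick the wrong pieces. A base-R sketch that splits on "=" itself uses `sub()`: `"=.*"` deletes the "=" and everything after it, and `".*="` deletes everything up to and including it. This assumes exactly one "=" per string (with several, `".*="` would strip up to the last one); the mixed string `"x1=a2"` and the column names are hypothetical.

```r
x <- data.frame(string = c("aa=1", "abc=5", "x1=a2"))  # "x1=a2" is a made-up mixed case
s <- as.character(x$string)

x$before <- sub("=.*", "", s)  # drop "=" and everything after it
x$after  <- sub(".*=", "", s)  # drop everything up to and including "="
x
```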

huangapple
  • Posted on 4 January 2020 at 00:46:19
  • When reposting, please keep the link to this article: https://go.coder-hub.com/59582251.html