R: Extract alphabetical strings from inside quotation marks in a data frame

# Question

I have a data frame:

data.frame(string = c('["jewelry","tailor","Jewelry"]', '["apple","banana","orange"]'))

I want to split each string into three columns, keeping the two rows, so that the data frame looks like this:

data.frame(string1 = c('jewelry','apple'), string2=c('tailor','banana'), string3=c('jewelry','orange'))


# Answer 1
**Score**: 1

In `base R`, we can strip the brackets and double quotes with `gsub`, then let `read.csv` parse the remaining comma-separated text into a `data.frame`:

```R
read.csv(text = gsub('\\[|\\]|"', '', df1$string),
    header = FALSE, col.names = paste0('string', 1:3))
```

Output:

```
  string1 string2 string3
1 jewelry  tailor Jewelry
2   apple  banana  orange
```

Or using tidyverse:

```R
library(dplyr)
library(stringr)
library(tidyr)
df1 %>%
   mutate(string = str_remove_all(string, '\\[|\\]|"')) %>%
   separate_wider_delim(string, delim = ',',
     names = c('string1', 'string2', 'string3'))
```

Output:

```
# A tibble: 2 × 3
  string1 string2 string3
  <chr>   <chr>   <chr>  
1 jewelry tailor  Jewelry
2 apple   banana  orange 
```

Data:

```R
df1 <- data.frame(string = c('["jewelry","tailor","Jewelry"]', 
     '["apple","banana","orange"]'))
```
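The `gsub` pattern above also strips the double quotes, which works as long as no item contains a comma. As a hedged variation (not part of the original answer), one could remove only the outer brackets and let `read.csv` handle the quoting itself, since its default `quote` argument treats `"` as the quote character. The data frame `df_commas` and its `"salt, pepper"` item are made up for illustration.

```R
# Hypothetical input: the first item of the second row contains a comma.
df_commas <- data.frame(string = c('["jewelry","tailor","Jewelry"]',
                                   '["salt, pepper","banana","orange"]'))

# Remove only the leading "[" and trailing "]"; keep the double quotes.
csv_lines <- gsub('^\\[|\\]$', '', df_commas$string)

# read.csv unquotes each field and keeps commas that sit inside quotes.
out <- read.csv(text = csv_lines, header = FALSE,
                col.names = paste0('string', 1:3),
                stringsAsFactors = FALSE)
out
```

This still assumes exactly three items per row, as in the question.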

# Answer 2
**Score**: 0

An optional tidyverse approach, if you know the maximum number of columns beforehand and every row has that many values, so that there are no gaps affecting ordering or column assignment (in this example, a row with fewer than 3 values would be such a gap):

# dummy data
myDf <- data.frame(string = c('["jewelry","tailor","Jewelry"]', '["apple","banana","orange"]'))

library(dplyr)
library(tidyr)

myDf %>% 
    # select the column to split, the new column names, and a regex with capture groups (the parts in parentheses)
    tidyr::extract(string, into = c("a", "b", "c"), regex = '"(\\w*)","(\\w*)","(\\w*)"')

        a      b       c
1 jewelry tailor Jewelry
2   apple banana  orange
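The regex above needs exactly three quoted items in every row. For rows that may hold fewer, one possible base R sketch (a different technique from the answer, using `gregexpr` and `regmatches`) collects however many purely alphabetical items each row has and pads the rest with `NA`. The data frame `df2` and its two-item second row are hypothetical, and items containing digits, spaces or punctuation would not match this pattern.

```R
# Hypothetical input: the second row holds only two items.
df2 <- data.frame(string = c('["jewelry","tailor","Jewelry"]',
                             '["apple","banana"]'))

# For each row, collect every run of letters enclosed in double quotes.
m <- gregexpr('(?<=")[A-Za-z]+(?=")', df2$string, perl = TRUE)
items <- regmatches(df2$string, m)

# Pad shorter rows with NA so that all rows have the same length.
n_max <- max(lengths(items))
padded <- lapply(items, function(x) c(x, rep(NA_character_, n_max - length(x))))

# Bind the padded rows into a matrix, then convert to a data frame.
out <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
names(out) <- paste0("string", seq_len(n_max))
out
```

Padding on the right keeps the first item of every row in `string1`, which matches the ordering caveat mentioned in the answer.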

# Answer 3
**Score**: 0

This looks like a valid Python/JSON list.

Using reticulate:

library(tidyverse)

df1 %>%
  rowwise() %>%
  transmute(string = list(reticulate::py_eval(string))) %>%
  unnest_wider(string, names_sep = '')

#> # A tibble: 2 × 3
#>   string1 string2 string3
#>   <chr>   <chr>   <chr>  
#> 1 jewelry tailor  Jewelry
#> 2 apple   banana  orange

Using jsonlite:

a <- jsonlite::fromJSON(paste('[', paste(df1$string, collapse = ','), ']'))
setNames(data.frame(a), paste0('string', seq(ncol(a))))

#>   string1 string2 string3
#> 1 jewelry  tailor Jewelry
#> 2   apple  banana  orange

Or even:

d <- do.call(rbind, lapply(df1$string, jsonlite::fromJSON))
setNames(data.frame(d), paste0('string', seq(ncol(d))))

#>   string1 string2 string3
#> 1 jewelry  tailor Jewelry
#> 2   apple  banana  orange

# Answer 4
**Score**: 0

Using jsonlite to parse each string into a list column, then unnest_wider to spread it into columns:

library(dplyr)
library(tidyr)
library(purrr)
library(jsonlite)

df <- data.frame(string = c('["jewelry","tailor","Jewelry"]', 
                            '["apple","banana","orange"]')) 

df %>%  
  mutate(string = map(string, ~ parse_json(.x))) %>% 
  unnest_wider(string, names_sep = "")
#> # A tibble: 2 × 3
#>   string1 string2 string3
#>   <chr>   <chr>   <chr>  
#> 1 jewelry tailor  Jewelry
#> 2 apple   banana  orange

<sup>Created on 2023-02-13 with reprex v2.0.2</sup>
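One detail from the question: the desired output lists `string3` as `'jewelry'` in lowercase, while the approaches in this thread return `'Jewelry'` unchanged. If the lowercase form is intended (an assumption; it may simply be a typo in the question), a minimal base R sketch is to lowercase every column after splitting. The data frame `out` below is typed in by hand to stand in for the result of any of the approaches above.

```R
# Stand-in for the split result produced by any of the answers.
out <- data.frame(string1 = c("jewelry", "apple"),
                  string2 = c("tailor", "banana"),
                  string3 = c("Jewelry", "orange"),
                  stringsAsFactors = FALSE)

# Lowercase every column; assigning into out[] keeps the data frame structure.
out[] <- lapply(out, tolower)
out
```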

huangapple
  • Posted on 2023-02-14 01:25:08
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75439242.html