2023-02-14 01:25:08

R: Extract alphabetical string from inside of quotation mark in dataframe

Question

I have a data frame:

    data.frame(string = c('["jewelry","tailor","Jewelry"]', '["apple","banana","orange"]'))

I want to split it into three columns and two rows, so that the data frame looks like:

    data.frame(string1 = c('jewelry','apple'), string2=c('tailor','banana'), string3=c('jewelry','orange'))




# Answer 1
**Score**: 1

In `base R`, we could remove the brackets and quotes, and use `read.csv` to read the column into a `data.frame`:

```R
read.csv(text = gsub('\\[|\\]|"', '', df1$string),
    header = FALSE, col.names = paste0('string', 1:3))
```

-output

```
  string1 string2 string3
1 jewelry  tailor Jewelry
2   apple  banana  orange
```

Or using `tidyverse`:

```R
library(dplyr)
library(stringr)
library(tidyr)
df1 %>%
   mutate(string = str_remove_all(string, '\\[|\\]|"')) %>%
   separate_wider_delim(string, delim = ',',
     names = c('string1', 'string2', 'string3'))
```

-output

```
# A tibble: 2 × 3
  string1 string2 string3
  <chr>   <chr>   <chr>  
1 jewelry tailor  Jewelry
2 apple   banana  orange
```

data

```R
df1 <- data.frame(string = c('["jewelry","tailor","Jewelry"]', 
     '["apple","banana","orange"]'))
```
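Both approaches above are shown on rows that each hold exactly three elements. As a hedged illustration of the same extraction for ragged input, here is a minimal base R sketch that pulls out the quoted tokens with a regex and pads shorter rows with `NA`. The data frame `df_ragged` and the helper names `quoted`, `tokens`, `padded`, and `result` are made up for this example and are not part of the original answer.

```R
# Same shape as the question's data, but the second row has only two
# elements (an assumption made here to illustrate ragged input).
df_ragged <- data.frame(string = c('["jewelry","tailor","Jewelry"]',
                                   '["apple","banana"]'),
                        stringsAsFactors = FALSE)

# Pull out every double-quoted token, then drop the quotes themselves.
quoted <- regmatches(df_ragged$string, gregexpr('"[^"]*"', df_ragged$string))
tokens <- lapply(quoted, function(x) gsub('"', '', x, fixed = TRUE))

# Pad shorter rows with NA so that every row has the same length.
n_max  <- max(lengths(tokens))
padded <- lapply(tokens, function(x) c(x, rep(NA_character_, n_max - length(x))))

# Stack the rows into a character matrix and name the columns.
result <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
names(result) <- paste0('string', seq_len(n_max))
result
```

The number of columns is taken from the longest row, so it does not need to be known in advance.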


# Answer 2
**Score**: 0

An optional tidyverse approach, if you know the maximum number of columns beforehand and have no gaps that matter in terms of ordering or column assignment (in this example, rows with fewer than 3 columns):

    # dummy data
    myDf <- data.frame(string = c('["jewelry","tailor","Jewelry"]', '["apple","banana","orange"]'))

    library(dplyr)
    library(tidyr)

    myDf %>%
        # select the column to split, the new column names, and a regex with capture groups (the parts in parentheses)
        tidyr::extract(string, into = c("a", "b", "c"), regex = '"(\\w*)","(\\w*)","(\\w*)"')

            a      b       c
    1 jewelry tailor Jewelry
    2   apple banana  orange
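The capture-group regex above can also be applied without tidyr. A minimal sketch using `utils::strcapture` from base R, assuming the same dummy data and exactly three `\w`-only elements per row; the names `proto` and `out` are chosen here for illustration only:

```R
myDf <- data.frame(string = c('["jewelry","tailor","Jewelry"]',
                              '["apple","banana","orange"]'),
                   stringsAsFactors = FALSE)

# `proto` supplies the column names and types for the captured groups.
proto <- data.frame(a = character(), b = character(), c = character(),
                    stringsAsFactors = FALSE)

# One capture group per output column, same pattern as in the answer above.
out <- utils::strcapture('"(\\w*)","(\\w*)","(\\w*)"', myDf$string, proto)
out
```

As with `tidyr::extract`, the number of capture groups has to match the number of columns in `proto`, so this only fits when the column count is fixed.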


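# Answer 3
**Score**: 0

This looks like a valid Python/JSON list.

Using reticulate:

    library(tidyverse)

    df1 %>%
      rowwise() %>%
      transmute(string = list(reticulate::py_eval(string))) %>%
      unnest_wider(string, names_sep = '')

    #> # A tibble: 2 × 3
    #>   string1 string2 string3
    #>   <chr>   <chr>   <chr>  
    #> 1 jewelry tailor  Jewelry
    #> 2 apple   banana  orange

Using jsonlite:

    a <- jsonlite::fromJSON(paste('[', paste(df1$string, collapse = ','), ']'))
    setNames(data.frame(a), paste0('string', seq(ncol(a))))

    #>   string1 string2 string3
    #> 1 jewelry  tailor Jewelry
    #> 2   apple  banana  orange

Or even:

    d <- do.call(rbind, lapply(df1$string, jsonlite::fromJSON))
    setNames(data.frame(d), paste0('string', seq(ncol(d))))

    #>   string1 string2 string3
    #> 1 jewelry  tailor Jewelry
    #> 2   apple  banana  orange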


# Answer 4
**Score**: 0

Using jsonlite to build a list column, then `unnest_wider`:

    library(dplyr)
    library(tidyr)
    library(purrr)
    library(jsonlite)

    df <- data.frame(string = c('["jewelry","tailor","Jewelry"]', 
                                '["apple","banana","orange"]')) 

    df %>%  
      mutate(string = map(string, ~ parse_json(.x))) %>% 
      unnest_wider(string, names_sep = "")
    #> # A tibble: 2 × 3
    #>   string1 string2 string3
    #>   <chr>   <chr>   <chr>  
    #> 1 jewelry tailor  Jewelry
    #> 2 apple   banana  orange

<sup>Created on 2023-02-13 with reprex v2.0.2</sup>

