R: Extract alphabetical string from inside of quotation mark in dataframe
# Question
I have a data frame:
```R
data.frame(string = c('["jewelry","tailor","Jewelry"]', '["apple","banana","orange"]'))
```
I want to split the column into three columns and two rows, so that the data frame looks like:
```R
data.frame(string1 = c('jewelry','apple'), string2=c('tailor','banana'), string3=c('jewelry','orange'))
```
# Answer 1
**Score**: 1
In `base R`, we could remove the brackets and quotation marks, and use `read.csv` to read the column into a `data.frame`:
```R
read.csv(text = gsub('\\[|\\]|"', '', df1$string),
         header = FALSE, col.names = paste0('string', 1:3))
```
-output
```
  string1 string2 string3
1 jewelry  tailor Jewelry
2   apple  banana  orange
```
Or using `tidyverse`:
```R
library(dplyr)
library(stringr)
library(tidyr)
df1 %>%
  mutate(string = str_remove_all(string, '\\[|\\]|"')) %>%
  separate_wider_delim(string, delim = ',',
                       names = c('string1', 'string2', 'string3'))
```
-output
```
# A tibble: 2 × 3
  string1 string2 string3
  <chr>   <chr>   <chr>
1 jewelry tailor  Jewelry
2 apple   banana  orange
```
data
```R
df1 <- data.frame(string = c('["jewelry","tailor","Jewelry"]',
                             '["apple","banana","orange"]'))
```
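To make the base R route above more concrete, here is a minimal sketch that shows the intermediate text handed to `read.csv`. The variable names `cleaned` and `out` are only illustrative and do not appear in the original answer.

```r
df1 <- data.frame(string = c('["jewelry","tailor","Jewelry"]',
                             '["apple","banana","orange"]'))

# Step 1: delete every '[', ']' and '"', leaving plain comma-separated text.
cleaned <- gsub('\\[|\\]|"', '', df1$string)
print(cleaned)

# Step 2: read.csv() treats each element of `text` as one line of a CSV file,
# so each original row becomes one row of the result.
out <- read.csv(text = cleaned, header = FALSE,
                col.names = paste0("string", 1:3))
print(out)
```

Note that this relies on the values themselves containing no commas, brackets, or quotation marks; if they might, one of the JSON-based answers is likely the safer choice.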
# Answer 2
**Score**: 0
An optional tidyverse approach, if you know the maximum number of columns beforehand and have no gaps that matter in terms of ordering or column assignment (in this example, rows with fewer than 3 elements):
```R
# dummy data
myDf <- data.frame(string = c('["jewelry","tailor","Jewelry"]', '["apple","banana","orange"]'))
library(dplyr)
library(tidyr)
myDf %>%
  # select the column to split, the new column names, and a regex with
  # capture groups (the parts between parentheses)
  tidyr::extract(string, into = c("a", "b", "c"), regex = '"(\\w*)","(\\w*)","(\\w*)"')
```
```
        a      b       c
1 jewelry tailor Jewelry
2   apple banana  orange
```
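The answer above assumes every row holds exactly three elements. As a rough sketch of one way to cope with shorter rows (this is not part of the original answer), the base R code below collects the quoted tokens per row and pads the short rows with `NA`. The `ragged` data and the names `quoted`, `tokens`, `padded`, and `result` are hypothetical.

```r
# Hypothetical input: the second row has only two elements.
ragged <- data.frame(string = c('["jewelry","tailor","Jewelry"]',
                                '["apple","banana"]'),
                     stringsAsFactors = FALSE)

# Find every double-quoted token in each row, then drop the quote characters.
quoted <- regmatches(ragged$string, gregexpr('"[^"]*"', ragged$string))
tokens <- lapply(quoted, function(x) gsub('"', '', x, fixed = TRUE))

# Pad shorter rows with NA so every row has the same number of fields.
width <- max(lengths(tokens))
padded <- lapply(tokens, function(x) c(x, rep(NA_character_, width - length(x))))

# Bind the rows into a character matrix and name the columns string1, string2, ...
result <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
names(result) <- paste0("string", seq_len(width))
print(result)
```

Here missing values end up in the rightmost columns, which matches the "no gaps that matter" condition mentioned above; if a missing element could sit in the middle of a row, the position information is simply not recoverable from this format.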
# Answer 3
**Score**: 0
This looks like a valid Python/JSON list.
Using `reticulate`:
```R
library(tidyverse)
df1 %>%
  rowwise() %>%
  transmute(string = list(reticulate::py_eval(string))) %>%
  unnest_wider(string, names_sep = '')
#> # A tibble: 2 × 3
#>   string1 string2 string3
#>   <chr>   <chr>   <chr>
#> 1 jewelry tailor  Jewelry
#> 2 apple   banana  orange
```
Using `jsonlite`:
```R
# wrap all rows in one outer JSON array so fromJSON() returns a character matrix
a <- jsonlite::fromJSON(paste('[', paste(df1$string, collapse = ','), ']'))
setNames(data.frame(a), paste0('string', seq(ncol(a))))
#>   string1 string2 string3
#> 1 jewelry  tailor Jewelry
#> 2   apple  banana  orange
```
Or even:
```R
# parse each row separately, then stack the resulting vectors row by row
d <- do.call(rbind, lapply(df1$string, jsonlite::fromJSON))
setNames(data.frame(d), paste0('string', seq(ncol(d))))
#>   string1 string2 string3
#> 1 jewelry  tailor Jewelry
#> 2   apple  banana  orange
```
# Answer 4
**Score**: 0
Using `jsonlite` with a list column and `unnest_wider`:
```R
library(dplyr)
library(tidyr)
library(purrr)
library(jsonlite)
df <- data.frame(string = c('["jewelry","tailor","Jewelry"]',
                            '["apple","banana","orange"]'))
df %>%
  # parse each JSON string into a list, giving a list column
  mutate(string = map(string, ~ parse_json(.x))) %>%
  # spread the list elements into columns string1, string2, string3
  unnest_wider(string, names_sep = "")
#> # A tibble: 2 × 3
#>   string1 string2 string3
#>   <chr>   <chr>   <chr>
#> 1 jewelry tailor  Jewelry
#> 2 apple   banana  orange
```
<sup>Created on 2023-02-13 with reprex v2.0.2</sup>