Transforming list to data frame with related data points in same row
Question
I have a list that holds the field-of-research information for a single publication. I want to convert the list into a data.frame such that each 2-digit code is stored in a "division" column and each 4-digit code in a "group" column. When a division and a group share their first two digits, they should be stored in the same row. I apologize for the bad title.
my_list <- list(
  list(id = "80067", name = "3403 Macromolecular and Materials Chemistry"),
  list(id = "80011", name = "40 Engineering"),
  list(id = "80005", name = "34 Chemical Sciences")
)
Desired output:
data.frame(division = c("40 Engineering", "34 Chemical Sciences"),
           group = c(NA, "3403 Macromolecular and Materials Chemistry"))
Answer 1
Score: 5
First unlist() your my_list into a named vector, then enframe() it into a two-column data frame. Use filter() to keep only the rows whose name is "name", then assign group and prefix (the latter is used to place related entries in the same row) based on the digit pattern. Finally, reshape the structure from "long" to "wide".
library(tidyverse)
unlist(my_list) %>%
  enframe() %>%
  filter(name == "name") %>%
  mutate(group = ifelse(str_count(value, "\\d") == 4, "group", "division"),
         prefix = str_extract(value, "^\\d{2}"), .keep = "used") %>%
  pivot_wider(names_from = group, values_from = value)
Update: The above code can be simplified a bit by starting with bind_rows (inspired by @akrun's answer):
bind_rows(my_list) %>%
  mutate(group = ifelse(str_count(name, "\\d") == 4, "group", "division"),
         prefix = str_extract(name, "^\\d{2}"), .keep = "used") %>%
  pivot_wider(names_from = group, values_from = name)
Output:
# A tibble: 2 × 3
  prefix group                                       division
  <chr>  <chr>                                       <chr>
1 34     3403 Macromolecular and Materials Chemistry 34 Chemical Sciences
2 40     NA                                          40 Engineering
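If the goal is the two-column layout from the question, the helper column can be dropped at the end. The following is a minimal sketch, not part of the original answer. It assumes that both a "division" and a "group" entry occur at least once in the list (otherwise one of the columns would not exist after pivot_wider and select would fail). The name result is only for illustration.

```r
library(dplyr)
library(stringr)
library(tidyr)

my_list <- list(
  list(id = "80067", name = "3403 Macromolecular and Materials Chemistry"),
  list(id = "80011", name = "40 Engineering"),
  list(id = "80005", name = "34 Chemical Sciences")
)

result <- bind_rows(my_list) %>%
  # Label each entry by its digit count and record the shared two-digit prefix
  mutate(group = ifelse(str_count(name, "\\d") == 4, "group", "division"),
         prefix = str_extract(name, "^\\d{2}"), .keep = "used") %>%
  pivot_wider(names_from = group, values_from = name) %>%
  # Drop the helper column and put division first, as in the question
  select(division, group)

result
```

With the sample data the rows come out as 34 then 40, so the row order still differs from the desired output in the question.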
Answer 2
Score: 4
We could use bind_rows to create a data.frame and then group by the first two characters of name to extract the components:
library(stringr)
library(dplyr)
bind_rows(my_list) %>%
  group_by(grp = substr(name, 1, 2)) %>%
  summarise(group = str_extract(name, "^\\d{4}\\s+(.*)"),
            division = name, .groups = "keep") %>%
  filter(any(!is.na(group)) & !is.na(group) | n() == 1) %>%
  ungroup() %>%
  select(division, group)
Output:
# A tibble: 2 × 2
  division                                    group
  <chr>                                       <chr>
1 3403 Macromolecular and Materials Chemistry 3403 Macromolecular and Materials Chemistry
2 40 Engineering                              <NA>
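In the output above, the row for prefix 34 carries the four-digit entry in both columns, whereas the question asks for "34 Chemical Sciences" under division. The following is a minimal sketch of one possible variant, not the answerer's code. It assumes each two-digit prefix has at most one division entry and at most one group entry, since first() keeps only the first match. The name result is only for illustration.

```r
library(dplyr)
library(stringr)

my_list <- list(
  list(id = "80067", name = "3403 Macromolecular and Materials Chemistry"),
  list(id = "80011", name = "40 Engineering"),
  list(id = "80005", name = "34 Chemical Sciences")
)

result <- bind_rows(my_list) %>%
  group_by(grp = substr(name, 1, 2)) %>%
  summarise(
    # A two-digit code followed by whitespace marks a division
    division = first(name[str_detect(name, "^\\d{2}\\s")]),
    # A four-digit code followed by whitespace marks a group;
    # first() on an empty vector yields NA when no group exists
    group = first(name[str_detect(name, "^\\d{4}\\s")]),
    .groups = "drop"
  ) %>%
  select(division, group)

result
```

If a division can have several groups, the summarise step would need to return one row per group instead of using first().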