Transforming list to data frame with related data points in same row

Question

I have a list that represents the field-of-research information for a single publication. I want to convert the list into a data.frame in which each 2-digit code goes into a "division" column and each 4-digit code into a "group" column. When a group shares its first two digits with a division, the two should be stored in the same row. Apologies for the vague title.

my_list <- list(
  list(id = "80067", name = "3403 Macromolecular and Materials Chemistry"),
  list(id = "80011", name = "40 Engineering"),
  list(id = "80005", name = "34 Chemical Sciences")
)

Desired output:

data.frame(division = c("40 Engineering", "34 Chemical Sciences"), 
           group = c(NA, "3403 Macromolecular and Materials Chemistry"))

Answer 1

Score: 5

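First unlist my_list into a named vector, then enframe() it into a two-column data frame (name, value). filter() to keep only the entries named "name", dropping the ids. Then assign group (a value containing four digits is a "group", anything else a "division") and prefix (its first two digits, used to put related entries in the same row) based on the digit pattern. Finally, reshape the data from "long" to "wide" with pivot_wider().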

library(tidyverse)

unlist(my_list) %>%                # named character vector: id, name, id, name, ...
  enframe() %>%                    # two columns: name ("id"/"name") and value
  filter(name == "name") %>%       # keep the names, drop the ids
  mutate(group = ifelse(str_count(value, "\\d") == 4, "group", "division"),  # entry type
         prefix = str_extract(value, "^\\d{2}"), .keep = "used") %>%         # 2-digit row key
  pivot_wider(names_from = group, values_from = value)   # one row per prefix
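To see what pivot_wider() actually receives, the pipeline can be stopped one step early. A minimal sketch, assuming dplyr, stringr and tibble are installed (they are loaded individually here instead of tidyverse) and redefining my_list so the block runs on its own; `long` is just a hypothetical name for the intermediate result:

```r
library(dplyr)
library(stringr)
library(tibble)

my_list <- list(
  list(id = "80067", name = "3403 Macromolecular and Materials Chemistry"),
  list(id = "80011", name = "40 Engineering"),
  list(id = "80005", name = "34 Chemical Sciences")
)

# Same steps as the answer, stopping just before pivot_wider()
long <- unlist(my_list) %>%
  enframe() %>%                 # columns: name ("id"/"name"), value
  filter(name == "name") %>%    # keep the three names, drop the ids
  mutate(group = ifelse(str_count(value, "\\d") == 4, "group", "division"),
         prefix = str_extract(value, "^\\d{2}"), .keep = "used")

# Three rows, one per name, each tagged with its type and 2-digit prefix
long
```

Rows 1 and 3 share the prefix 34, which is why pivot_wider() collapses them into a single row, while 40 Engineering ends up alone with no group.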

Update: the unlist()/enframe()/filter() steps can be replaced by a single bind_rows() call at the beginning (inspired by @akrun's answer), which already puts id and name in separate columns:

bind_rows(my_list) %>% 
  mutate(group = ifelse(str_count(name, "\\d") == 4, "group", "division"), 
         prefix = str_extract(name, "^\\d{2}"), .keep = "used") %>% 
  pivot_wider(names_from = group, values_from = name)

Output

# A tibble: 2 × 3
  prefix group                                       division            
  <chr>  <chr>                                       <chr>               
1 34     3403 Macromolecular and Materials Chemistry 34 Chemical Sciences
2 40     NA                                          40 Engineering
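The result keeps the helper prefix column and orders the columns differently from the desired output in the question. Assuming only division and group are wanted, appending a select() to the simplified pipeline should drop prefix and put division first. A minimal sketch (my_list is redefined so the block runs on its own, and dplyr, tidyr and stringr are loaded individually instead of tidyverse):

```r
library(dplyr)
library(tidyr)
library(stringr)

my_list <- list(
  list(id = "80067", name = "3403 Macromolecular and Materials Chemistry"),
  list(id = "80011", name = "40 Engineering"),
  list(id = "80005", name = "34 Chemical Sciences")
)

res <- bind_rows(my_list) %>%
  mutate(group = ifelse(str_count(name, "\\d") == 4, "group", "division"),
         prefix = str_extract(name, "^\\d{2}"), .keep = "used") %>%
  pivot_wider(names_from = group, values_from = name) %>%
  # keep only the two columns from the question, division first
  select(division, group)

res
```

Rows still come out in the order their prefix first appears in my_list (34, then 40), which is the reverse of the row order in the desired output.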

Answer 2

Score: 4

We could use bind_rows() to create a data frame, then group by the first two characters of name to extract the components.

library(stringr)
library(dplyr)

bind_rows(my_list) %>%
  # TRUE for 4-digit group codes, FALSE for 2-digit divisions
  mutate(is_group = str_detect(name, "^\\d{4}\\s")) %>%
  # entries sharing their first two characters belong in the same row
  group_by(grp = substr(name, 1, 2)) %>%
  # one row per prefix; [1] yields NA when a prefix has no group
  summarise(division = name[!is_group][1],
            group = name[is_group][1]) %>%
  select(division, group)

Output

# A tibble: 2 × 2
  division             group
  <chr>                <chr>
1 34 Chemical Sciences 3403 Macromolecular and Materials Chemistry
2 40 Engineering       <NA>
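For comparison, the same match-by-prefix idea can be sketched in base R without any packages. This is not from either answer; it assumes every name starts with its code followed by a space, and that each two-digit prefix has at most one four-digit group (match() only picks the first one). The variable names nm, is_group, prefix and res are illustrative.

```r
my_list <- list(
  list(id = "80067", name = "3403 Macromolecular and Materials Chemistry"),
  list(id = "80011", name = "40 Engineering"),
  list(id = "80005", name = "34 Chemical Sciences")
)

# Pull the name field out of every inner list
nm <- vapply(my_list, `[[`, character(1), "name")

# A leading 4-digit code marks a group; anything else is a division
is_group <- grepl("^\\d{4}\\s", nm)

# The two-digit key shared by a division and its group
prefix <- substr(nm, 1, 2)

# One row per division; look up the group with the same prefix (NA if none)
res <- data.frame(
  division = nm[!is_group],
  group = nm[is_group][match(prefix[!is_group], prefix[is_group])]
)

res
```

Because rows follow the order of the divisions in my_list, this should reproduce the desired data.frame from the question, row order included.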



huangapple
  • This article was posted on 11 May 2023 at 14:43:15
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76224798.html