May 11, 2023 14:43:15

Transforming list to data frame with related data points in same row

Question

I have a list that represents the field-of-research information for a single publication. I want to convert the list into a data.frame such that each 2-digit code is stored in a "division" column and each 4-digit code in a "group" column. When a division and a group share their first two digits, they should be stored in the same row. I apologize for the bad title.

my_list <- list(
  list(id = "80067", name = "3403 Macromolecular and Materials Chemistry"),
  list(id = "80011", name = "40 Engineering"),
  list(id = "80005", name = "34 Chemical Sciences")
)

Desired output:

data.frame(division = c("40 Engineering", "34 Chemical Sciences"),
           group = c(NA, "3403 Macromolecular and Materials Chemistry"))

Answer 1

Score: 5

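First, unlist() your my_list into a named vector, then enframe() it into a two-column data frame (name and value). filter() keeps only the rows whose name is "name", dropping the id entries. mutate() then assigns group ("group" when the value contains four digits, "division" otherwise) and prefix (the first two digits, used to place related entries in the same row). Finally, pivot_wider() reshapes the data from "long" to "wide".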

library(tidyverse)

unlist(my_list) %>%
  enframe() %>%
  filter(name == "name") %>%
  mutate(group = ifelse(str_count(value, "\\d") == 4, "group", "division"),
         prefix = str_extract(value, "^\\d{2}"), .keep = "used") %>%
  pivot_wider(names_from = group, values_from = value)

Update: The above code can be simplified a bit by starting with bind_rows() (inspired by @akrun's answer):

bind_rows(my_list) %>%
  mutate(group = ifelse(str_count(name, "\\d") == 4, "group", "division"),
         prefix = str_extract(name, "^\\d{2}"), .keep = "used") %>%
  pivot_wider(names_from = group, values_from = name)

Output

# A tibble: 2 × 3
  prefix group                                       division
  <chr>  <chr>                                       <chr>
1 34     3403 Macromolecular and Materials Chemistry 34 Chemical Sciences
2 40     NA                                          40 Engineering
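
To make the classification step more concrete, here is a small sketch of what str_count() and str_extract() return for the three names in my_list. The names nms, n_digits, kind and prefix are only illustrative and are not part of the answer above.

```r
library(stringr)

# The three "name" values from my_list (nms is a hypothetical helper name).
nms <- c("3403 Macromolecular and Materials Chemistry",
         "40 Engineering",
         "34 Chemical Sciences")

# Number of digits in each string: four for a group code, two for a division.
n_digits <- str_count(nms, "\\d")

# "group" when the string contains exactly four digits, "division" otherwise.
kind <- ifelse(n_digits == 4, "group", "division")

# The leading two digits tie a group to its division.
prefix <- str_extract(nms, "^\\d{2}")

data.frame(kind, prefix, name = nms)
```

Note that str_count() counts every digit in the string, so a name that contains further digits would be misclassified. Testing the start of the string instead, for example with str_detect(nms, "^\\d{4}\\s"), is one way to guard against that.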

Answer 2

Score: 4

We could use bind_rows to create a data.frame and then group by the leading two-character substring to extract the components:

library(stringr)
library(dplyr)

bind_rows(my_list) %>%
  group_by(grp = substr(name, 1, 2)) %>%
  summarise(group = str_extract(name, "^\\d{4}\\s+(.*)"),
            division = name, .groups = "keep") %>%
  filter(any(!is.na(group)) & !is.na(group) | n() == 1) %>%
  ungroup() %>%
  select(division, group)

Output

# A tibble: 2 × 2
  division                                    group                                      
  <chr>                                       <chr>                                      
1 3403 Macromolecular and Materials Chemistry 3403 Macromolecular and Materials Chemistry
2 40 Engineering                              <NA>
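
In the output above, the division column of the first row holds the 4-digit entry rather than "34 Chemical Sciences". If the layout from the question is wanted, one possible variation is to pick the division and the group separately within each prefix. This is only a sketch: it assumes every name starts with either a 2-digit or a 4-digit code followed by a space, and it keeps only the first match per prefix. The name res is illustrative.

```r
library(dplyr)
library(stringr)

my_list <- list(
  list(id = "80067", name = "3403 Macromolecular and Materials Chemistry"),
  list(id = "80011", name = "40 Engineering"),
  list(id = "80005", name = "34 Chemical Sciences")
)

res <- bind_rows(my_list) %>%
  group_by(grp = substr(name, 1, 2)) %>%
  summarise(
    # Entry whose code is exactly two digits (NA if the prefix has none).
    division = name[str_detect(name, "^\\d{2}\\s")][1],
    # Entry whose code is exactly four digits (NA if the prefix has none).
    group = name[str_detect(name, "^\\d{4}\\s")][1],
    .groups = "drop"
  ) %>%
  select(division, group)

res
```

The rows come back sorted by prefix, so "34 Chemical Sciences" appears before "40 Engineering", which differs from the row order in the question's desired output.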
