2023-04-17 20:35:33

Split a column in a data frame in R

Question

I have a data frame that looks like this:

species	time	loc
Barbar,Barbar	9:30	1
Barbar	10:37	4
Barbar,Pippip	12:03	2
Barbar,Pippip,Hypsav	09:52	5
Pippip,Barbar	07:45	5
Barbar,Pippip	00:00	3

Basically, I would like to create new rows by splitting the species column whenever a single cell contains two or more tags.

For example, if I had this row:

species	time	loc
Barbar,Pippip,Hypsav	09:52	5

I would obtain these rows:

species	time	loc
Barbar	09:52	5
Pippip	09:52	5
Hypsav	09:52	5

So with the first data frame, I would obtain this result:

species	time	loc
Barbar	9:30	1
Barbar	9:30	1
Barbar	10:37	4
Barbar	12:03	2
Pippip	12:03	2
Barbar	09:52	5
Pippip	09:52	5
Hypsav	09:52	5
Pippip	07:45	5
Barbar	07:45	5
Barbar	00:00	3
Pippip	00:00	3

How can I get this result?

Answer 1

Score: 2

With unnest() from tidyr:

library(dplyr)
library(tidyr)
df %>%
  # strsplit() turns each string into a character vector, giving a list column
  mutate(species = strsplit(species, ",")) %>%
  # unnest() expands the list column into one row per element
  unnest(species)
# A tibble: 12 × 3
   species time    loc
   <chr>   <chr> <int>
 1 Barbar  9:30      1
 2 Barbar  9:30      1
 3 Barbar  10:37     4
 4 Barbar  12:03     2
 5 Pippip  12:03     2
 6 Barbar  09:52     5
 7 Pippip  09:52     5
 8 Hypsav  09:52     5
 9 Pippip  07:45     5
10 Barbar  07:45     5
11 Barbar  00:00     3
12 Pippip  00:00     3

Data

df <- structure(list(species = c("Barbar,Barbar", "Barbar", "Barbar,Pippip", 
"Barbar,Pippip,Hypsav", "Pippip,Barbar", "Barbar,Pippip"), time = c("9:30", 
"10:37", "12:03", "09:52", "07:45", "00:00"), loc = c(1L, 4L, 
2L, 5L, 5L, 3L)), class = "data.frame", row.names = c(NA, -6L))
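
The split-then-unnest pattern above can also be written as a single call. The following is a minimal sketch, assuming a shortened three-row copy of the question's data (the object name `long` is just an illustration). It uses tidyr's separate_rows(); note that in tidyr 1.3.0 and later this function is marked as superseded by separate_longer_delim(), so which one to prefer depends on your installed version.

```r
library(tidyr)

# Shortened copy of the question's data (illustrative only)
df <- data.frame(
  species = c("Barbar,Barbar", "Barbar", "Barbar,Pippip,Hypsav"),
  time = c("9:30", "10:37", "09:52"),
  loc = c(1L, 4L, 5L),
  stringsAsFactors = FALSE
)

# separate_rows() splits each delimited value and emits one row per piece,
# repeating the values of the other columns
long <- separate_rows(df, species, sep = ",")
print(long)
```

Each input row contributes as many output rows as it has comma-separated species, so the three rows here become six.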

Answer 2

Score: 1

Try the following, which uses a comma as the separator:

library(tidyverse)
# df is the data frame from the question
data_split <- df %>%
  # assumes at most three species per row; fill = "right" pads shorter rows with NA
  separate(species, into = c("species1", "species2", "species3"), sep = ",", fill = "right") %>%
  # stack the three helper columns back into a single species column
  pivot_longer(cols = starts_with("species"), values_to = "species") %>%
  filter(!is.na(species)) %>%
  select(-name)
print(data_split)
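
The approach above hard-codes three target columns, so it only fits data with at most three species per row. A possible variation, sketched here under the assumption of a shortened three-row copy of the question's data, derives the number of columns from the data instead (`n_max` and `long` are illustrative names, not part of the original answer).

```r
library(dplyr)
library(tidyr)

# Shortened copy of the question's data (illustrative only)
df <- data.frame(
  species = c("Barbar,Barbar", "Barbar", "Barbar,Pippip,Hypsav"),
  time = c("9:30", "10:37", "09:52"),
  loc = c(1L, 4L, 5L),
  stringsAsFactors = FALSE
)

# Largest number of comma-separated pieces found in any row
n_max <- max(lengths(strsplit(df$species, ",", fixed = TRUE)))

long <- df %>%
  # create species1 ... species<n_max>; shorter rows are padded with NA
  separate(species, into = paste0("species", seq_len(n_max)),
           sep = ",", fill = "right") %>%
  # stack the helper columns into one species column
  pivot_longer(cols = starts_with("species"), values_to = "species") %>%
  # drop the NA padding and the helper column name
  filter(!is.na(species)) %>%
  select(-name)
print(long)
```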

Answer 3

Score: 1

Alternatively, you can use the data.table approach:

library(data.table)
# convert df to a data.table
setDT(df)
# at first split the species and reassign it to the same column
# then unlist to distribute the species for every "loc" and "time"
df[, species := strsplit(x = species, split = ","), ][
  , .(species = unlist(species)), by = .(loc, time)]
#    loc  time species
# 1:   1  9:30  Barbar
# 2:   1  9:30  Barbar
# 3:   4 10:37  Barbar
# 4:   2 12:03  Barbar
# 5:   2 12:03  Pippip
# 6:   5 09:52  Barbar
# 7:   5 09:52  Pippip
# 8:   5 09:52  Hypsav
# 9:   5 07:45  Pippip
#10:   5 07:45  Barbar
#11:   3 00:00  Barbar
#12:   3 00:00  Pippip

Depending on your general workflow or data size you can evaluate what works best for you.
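
If your workflow avoids extra packages altogether, the same reshaping can be done in base R. This is a minimal sketch, assuming a shortened three-row copy of the question's data; `pieces` and `long` are illustrative names.

```r
# Shortened copy of the question's data (illustrative only)
df <- data.frame(
  species = c("Barbar,Barbar", "Barbar", "Barbar,Pippip,Hypsav"),
  time = c("9:30", "10:37", "09:52"),
  loc = c(1L, 4L, 5L),
  stringsAsFactors = FALSE
)

# One character vector of species per original row
pieces <- strsplit(df$species, ",", fixed = TRUE)

# Repeat each original row once per species it contains
long <- df[rep(seq_len(nrow(df)), lengths(pieces)), ]

# Overwrite the repeated strings with the individual species
long$species <- unlist(pieces)
rownames(long) <- NULL
print(long)
```

Here rep() with lengths(pieces) repeats row indices so that the other columns line up with the flattened species vector.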
