Split a column of a data frame into rows in R

Question

I have a data frame that looks like this:

species time loc
Barbar,Barbar 9:30 1
Barbar 10:37 4
Barbar,Pippip 12:03 2
Barbar,Pippip,Hypsav 09:52 5
Pippip,Barbar 07:45 5
Barbar,Pippip 00:00 3

Basically, I would like to create new rows by splitting the species column whenever a cell contains more than one tag.

For example, if I had this row:

species time loc
Barbar,Pippip,Hypsav 09:52 5

I would obtain these rows:

species time loc
Barbar 09:52 5
Pippip 09:52 5
Hypsav 09:52 5

So, from the first data frame, I would obtain this result:

species time loc
Barbar 9:30 1
Barbar 9:30 1
Barbar 10:37 4
Barbar 12:03 2
Pippip 12:03 2
Barbar 09:52 5
Pippip 09:52 5
Hypsav 09:52 5
Pippip 07:45 5
Barbar 07:45 5
Barbar 00:00 3
Pippip 00:00 3

What can I do to get this result?

Answer 1

Score: 2

Use strsplit() to split the strings, then unnest() to expand the resulting list column into rows:

library(dplyr)
library(tidyr)

df %>%
  mutate(species = strsplit(species, ",")) %>%
  unnest(species)
# A tibble: 12 × 3
   species time    loc
   <chr>   <chr> <int>
 1 Barbar  9:30      1
 2 Barbar  9:30      1
 3 Barbar  10:37     4
 4 Barbar  12:03     2
 5 Pippip  12:03     2
 6 Barbar  09:52     5
 7 Pippip  09:52     5
 8 Hypsav  09:52     5
 9 Pippip  07:45     5
10 Barbar  07:45     5
11 Barbar  00:00     3
12 Pippip  00:00     3

Data

df <- structure(list(species = c("Barbar,Barbar", "Barbar", "Barbar,Pippip", 
"Barbar,Pippip,Hypsav", "Pippip,Barbar", "Barbar,Pippip"), time = c("9:30", 
"10:37", "12:03", "09:52", "07:45", "00:00"), loc = c(1L, 4L, 
2L, 5L, 5L, 3L)), class = "data.frame", row.names = c(NA, -6L))
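
To make the mechanics of the strsplit() and unnest() pipeline more concrete, here is a minimal base R sketch of the same idea. It is not part of the original answer, and the names `parts`, `n`, and `long` are only illustrative. Each row is repeated once per tag it contains, and the species column is then overwritten with the flattened tags.

```r
# Example data from the question
df <- data.frame(
  species = c("Barbar,Barbar", "Barbar", "Barbar,Pippip",
              "Barbar,Pippip,Hypsav", "Pippip,Barbar", "Barbar,Pippip"),
  time = c("9:30", "10:37", "12:03", "09:52", "07:45", "00:00"),
  loc = c(1L, 4L, 2L, 5L, 5L, 3L),
  stringsAsFactors = FALSE
)

# Split every species string on commas; the result is a list of character vectors
parts <- strsplit(df$species, ",", fixed = TRUE)

# Number of tags found in each original row
n <- lengths(parts)

# Repeat each row once per tag, then replace species with the flattened tags
long <- df[rep(seq_len(nrow(df)), times = n), ]
long$species <- unlist(parts)
rownames(long) <- NULL

print(long)
```

Because whole rows are repeated, any additional columns in the data frame would be carried along without being named explicitly.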

Answer 2

Score: 1

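Try the following, which uses a comma as the separator. Note that it assumes no row contains more than three species.

library(tidyverse)

# df is the example data frame from the question (see "Data" in Answer 1)
data_split <- df %>%
  # split into at most three columns; fill = "right" pads shorter rows with NA
  separate(species, into = c("species1", "species2", "species3"), sep = ",", fill = "right") %>%
  # stack the three columns into a single species column
  pivot_longer(cols = starts_with("species"), values_to = "species") %>%
  # drop the NA padding and the helper column holding the old column names
  filter(!is.na(species)) %>%
  select(-name)

print(data_split)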
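
The `into` vector in this answer lists three column names by hand. As a hedged sketch (not part of the original answer), the names could instead be derived from the data, so that rows with more tags are not truncated. The variable names `n_max` and `into` are illustrative, and only base R is used here.

```r
# Species column from the question's example data
species <- c("Barbar,Barbar", "Barbar", "Barbar,Pippip",
             "Barbar,Pippip,Hypsav", "Pippip,Barbar", "Barbar,Pippip")

# Largest number of comma-separated tags found in any single cell
n_max <- max(lengths(strsplit(species, ",", fixed = TRUE)))

# One target column name per possible tag position
into <- paste0("species", seq_len(n_max))

print(into)
```

The resulting vector could then be supplied as `into = into` in the separate() call.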

Answer 3

Score: 1

Alternatively, you can use the data.table approach:

library(data.table)

# convert df to a data.table
setDT(df)

# first split the species strings and assign the resulting list back to the same column
# then unlist to distribute the species over every "loc" and "time" group
df[, species := strsplit(x = species, split = ",")][
  , .(species = unlist(species)), by = .(loc, time)]

#    loc  time species
# 1:   1  9:30  Barbar
# 2:   1  9:30  Barbar
# 3:   4 10:37  Barbar
# 4:   2 12:03  Barbar
# 5:   2 12:03  Pippip
# 6:   5 09:52  Barbar
# 7:   5 09:52  Pippip
# 8:   5 09:52  Hypsav
# 9:   5 07:45  Pippip
#10:   5 07:45  Barbar
#11:   3 00:00  Barbar
#12:   3 00:00  Pippip

Depending on your general workflow or data size, you can evaluate which approach works best for you.
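
Whichever approach is chosen, one quick sanity check is that the result has exactly one row per comma-separated tag in the input. The helper below is a hypothetical sketch, not taken from any of the answers. It assumes `original` still holds the unsplit character column, so keep a copy before running the data.table code above, which overwrites `species` in place.

```r
# Hypothetical helper: TRUE when `result` has one row per comma-separated
# tag in `original$species`
has_expected_rows <- function(original, result) {
  expected <- sum(lengths(strsplit(original$species, ",", fixed = TRUE)))
  nrow(result) == expected
}

# Small illustrative input and the long-format result it should produce
original <- data.frame(
  species = c("Barbar,Pippip", "Hypsav"),
  loc = c(2L, 5L),
  stringsAsFactors = FALSE
)
result <- data.frame(
  species = c("Barbar", "Pippip", "Hypsav"),
  loc = c(2L, 2L, 5L),
  stringsAsFactors = FALSE
)

print(has_expected_rows(original, result))
```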

huangapple
  • Posted on April 17, 2023 at 20:35:33
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76035216.html