Split a column in a data frame in R

Question

I have a data frame that looks like this:

    species               time   loc
    Barbar,Barbar         9:30   1
    Barbar                10:37  4
    Barbar,Pippip         12:03  2
    Barbar,Pippip,Hypsav  09:52  5
    Pippip,Barbar         07:45  5
    Barbar,Pippip         00:00  3

Basically, I would like to split the species column into new rows whenever a cell contains two or more tags.

For example, if I had this row:

    species               time   loc
    Barbar,Pippip,Hypsav  09:52  5

I would like to obtain these rows:

    species  time   loc
    Barbar   09:52  5
    Pippip   09:52  5
    Hypsav   09:52  5

So, starting from the first data frame, I would obtain this result:

    species  time   loc
    Barbar   9:30   1
    Barbar   9:30   1
    Barbar   10:37  4
    Barbar   12:03  2
    Pippip   12:03  2
    Barbar   09:52  5
    Pippip   09:52  5
    Hypsav   09:52  5
    Pippip   07:45  5
    Barbar   07:45  5
    Barbar   00:00  3
    Pippip   00:00  3

How can I get this result?

Answer 1

Score: 2

With tidyr's unnest():

    library(dplyr)
    library(tidyr)

    df %>%
      # split each cell into a character vector (this gives a list column)
      mutate(species = strsplit(species, ",")) %>%
      # then give every element of that vector its own row
      unnest(species)
    #> # A tibble: 12 × 3
    #>    species time    loc
    #>    <chr>   <chr> <int>
    #>  1 Barbar  9:30      1
    #>  2 Barbar  9:30      1
    #>  3 Barbar  10:37     4
    #>  4 Barbar  12:03     2
    #>  5 Pippip  12:03     2
    #>  6 Barbar  09:52     5
    #>  7 Pippip  09:52     5
    #>  8 Hypsav  09:52     5
    #>  9 Pippip  07:45     5
    #> 10 Barbar  07:45     5
    #> 11 Barbar  00:00     3
    #> 12 Pippip  00:00     3

Data

    df <- structure(list(species = c("Barbar,Barbar", "Barbar", "Barbar,Pippip",
      "Barbar,Pippip,Hypsav", "Pippip,Barbar", "Barbar,Pippip"), time = c("9:30",
      "10:37", "12:03", "09:52", "07:45", "00:00"), loc = c(1L, 4L,
      2L, 5L, 5L, 3L)), class = "data.frame", row.names = c(NA, -6L))
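The mutate() step works because strsplit() returns a list holding one character vector per cell, and unnest() then emits one row per vector element. A small base-R look at that intermediate value, using only the species strings from the data above (the names `species` and `parts` are just illustrative):

```r
species <- c("Barbar,Barbar", "Barbar", "Barbar,Pippip",
             "Barbar,Pippip,Hypsav", "Pippip,Barbar", "Barbar,Pippip")

# strsplit() returns a list: one character vector of tags per input string.
# fixed = TRUE treats "," as a literal separator rather than a regex.
parts <- strsplit(species, ",", fixed = TRUE)
parts[[4]]
#> [1] "Barbar" "Pippip" "Hypsav"

# lengths() counts the tags in each cell, i.e. how many rows unnest()
# produces for each original row
lengths(parts)
#> [1] 2 1 2 3 2 2
sum(lengths(parts))
#> [1] 12
```

The total of 12 matches the number of rows in the tibble printed above.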

Answer 2

Score: 1

Try the following, using a comma as the separator:

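    library(tidyverse)
    # Assumes the data frame is called df, as in the data given in Answer 1
    data_split <- df %>%
      # Split into at most three columns; fill = "right" pads cells with fewer
      # tags with NA without a warning. Tags beyond the third are dropped (with a warning).
      separate(species, into = c("species1", "species2", "species3"),
               sep = ",", fill = "right") %>%
      # One row per (original row, speciesN column) pair
      pivot_longer(cols = starts_with("species"), values_to = "species") %>%
      # Drop the NA padding, then the helper "name" column
      filter(!is.na(species)) %>%
      select(-name)
    print(data_split)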
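The three output columns in this answer are hard-coded, so a cell with four or more tags would lose the extra ones. One way to size `into` from the data itself, sketched in base R (the names `n_max` and `into_cols` are hypothetical):

```r
species <- c("Barbar,Barbar", "Barbar", "Barbar,Pippip",
             "Barbar,Pippip,Hypsav", "Pippip,Barbar", "Barbar,Pippip")

# Largest number of comma-separated tags found in any single cell
n_max <- max(lengths(strsplit(species, ",", fixed = TRUE)))

# Build "species1", "species2", ... up to n_max
into_cols <- paste0("species", seq_len(n_max))
print(into_cols)
#> [1] "species1" "species2" "species3"
```

In the pipeline you would compute this from `df$species` and pass `into = into_cols` to separate(); the rest of the code stays the same.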

Answer 3

Score: 1

Alternatively, you can use the data.table approach:

    library(data.table)
    # convert df to a data.table (in place)
    setDT(df)
    # first split the species and reassign it to the same column (a list column),
    # then unlist to distribute the species over every "loc" and "time"
    df[, species := strsplit(species, split = ",")][
      , .(species = unlist(species)), by = .(loc, time)]
    #     loc  time species
    #  1:   1  9:30  Barbar
    #  2:   1  9:30  Barbar
    #  3:   4 10:37  Barbar
    #  4:   2 12:03  Barbar
    #  5:   2 12:03  Pippip
    #  6:   5 09:52  Barbar
    #  7:   5 09:52  Pippip
    #  8:   5 09:52  Hypsav
    #  9:   5 07:45  Pippip
    # 10:   5 07:45  Barbar
    # 11:   3 00:00  Barbar
    # 12:   3 00:00  Pippip

Depending on your workflow and the size of your data, pick whichever approach suits you best.
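If you would rather not depend on any package, base R can do the same reshaping by repeating each row index once per tag. A minimal sketch; `split_rows()` is a hypothetical helper name, not an existing function:

```r
df <- data.frame(
  species = c("Barbar,Barbar", "Barbar", "Barbar,Pippip",
              "Barbar,Pippip,Hypsav", "Pippip,Barbar", "Barbar,Pippip"),
  time = c("9:30", "10:37", "12:03", "09:52", "07:45", "00:00"),
  loc = c(1L, 4L, 2L, 5L, 5L, 3L),
  stringsAsFactors = FALSE  # strsplit() needs character, not factor, input
)

# Hypothetical helper: one output row per separator-delimited tag in `col`
split_rows <- function(data, col, sep = ",") {
  parts <- strsplit(data[[col]], sep, fixed = TRUE)
  # Repeat each original row as many times as it has tags
  out <- data[rep(seq_len(nrow(data)), lengths(parts)), , drop = FALSE]
  # The flattened tags line up with the repeated rows, in order
  out[[col]] <- unlist(parts, use.names = FALSE)
  rownames(out) <- NULL
  out
}

res <- split_rows(df, "species")
print(res)
```

Rows come out in their original order, as in the expected result. For comparison, tidyr also offers separate_rows(df, species, sep = ",") and, from tidyr 1.3.0, separate_longer_delim(df, species, delim = ","), which do the same split in one call.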

  • Posted on 17 April 2023, 20:35:33
  • Please keep the link to this article when reposting: https://go.coder-hub.com/76035216.html