Split a column in a data frame in R
Question
I have a data frame that looks like this:
species | time | loc |
---|---|---|
Barbar,Barbar | 9:30 | 1 |
Barbar | 10:37 | 4 |
Barbar,Pippip | 12:03 | 2 |
Barbar,Pippip,Hypsav | 09:52 | 5 |
Pippip,Barbar | 07:45 | 5 |
Barbar,Pippip | 00:00 | 3 |
Basically, I would like to create new rows by splitting the species column whenever a cell contains more than one tag.
For example, if I had this row:
species | time | loc |
---|---|---|
Barbar,Pippip,Hypsav | 09:52 | 5 |
I would obtain these rows:
species | time | loc |
---|---|---|
Barbar | 09:52 | 5 |
Pippip | 09:52 | 5 |
Hypsav | 09:52 | 5 |
So with the first data frame, I would obtain this result:
species | time | loc |
---|---|---|
Barbar | 9:30 | 1 |
Barbar | 9:30 | 1 |
Barbar | 10:37 | 4 |
Barbar | 12:03 | 2 |
Pippip | 12:03 | 2 |
Barbar | 09:52 | 5 |
Pippip | 09:52 | 5 |
Hypsav | 09:52 | 5 |
Pippip | 07:45 | 5 |
Barbar | 07:45 | 5 |
Barbar | 00:00 | 3 |
Pippip | 00:00 | 3 |
How can I get this result?
Answer 1
Score: 2
With unnest() from tidyr, after splitting each string into a list element:
library(dplyr)
library(tidyr)
df %>%
  mutate(species = strsplit(species, ",")) %>%
  unnest(species)
# A tibble: 12 × 3
   species time    loc
   <chr>   <chr> <int>
 1 Barbar  9:30      1
 2 Barbar  9:30      1
 3 Barbar  10:37     4
 4 Barbar  12:03     2
 5 Pippip  12:03     2
 6 Barbar  09:52     5
 7 Pippip  09:52     5
 8 Hypsav  09:52     5
 9 Pippip  07:45     5
10 Barbar  07:45     5
11 Barbar  00:00     3
12 Pippip  00:00     3
Data
df <- structure(list(species = c("Barbar,Barbar", "Barbar", "Barbar,Pippip",
"Barbar,Pippip,Hypsav", "Pippip,Barbar", "Barbar,Pippip"), time = c("9:30",
"10:37", "12:03", "09:52", "07:45", "00:00"), loc = c(1L, 4L,
2L, 5L, 5L, 3L)), class = "data.frame", row.names = c(NA, -6L))
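If tidyr is not available, the same split-and-repeat idea can be sketched in base R. This is only an illustration, not part of the original answer; the name `long` is made up for the example, and the data frame is rebuilt from the question's data so the block runs on its own.

```r
# Example data copied from the question
df <- data.frame(
  species = c("Barbar,Barbar", "Barbar", "Barbar,Pippip",
              "Barbar,Pippip,Hypsav", "Pippip,Barbar", "Barbar,Pippip"),
  time = c("9:30", "10:37", "12:03", "09:52", "07:45", "00:00"),
  loc = c(1L, 4L, 2L, 5L, 5L, 3L),
  stringsAsFactors = FALSE
)

# Split each cell on commas: a list with one character vector per row
parts <- strsplit(df$species, ",", fixed = TRUE)

# Repeat every original row once per piece, then overwrite species
# with the individual pieces (unlist keeps the row-by-row order)
long <- df[rep(seq_len(nrow(df)), lengths(parts)), ]
long$species <- unlist(parts)
rownames(long) <- NULL  # drop the duplicated row names left by the indexing

print(long)
```

Here `lengths(parts)` gives the number of tags per row, so `rep()` duplicates the time and loc values exactly as many times as there are species in that cell.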
Answer 2
Score: 1
Try the following, which uses a comma (,) as the separator:
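library(tidyverse)
# `data` is the data frame from the question
data_split <- data %>%
  # split into at most three columns (the maximum number of tags in this data);
  # fill = "right" pads shorter rows with NA instead of warning
  separate(species, into = c("species1", "species2", "species3"), sep = ",", fill = "right") %>%
  # stack the three columns into a single species column
  pivot_longer(cols = starts_with("species"), values_to = "species") %>%
  # drop the NA padding and the helper column holding the old column names
  filter(!is.na(species)) %>%
  select(-name)
print(data_split)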
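The approach above needs the maximum number of tags per cell to be known in advance (three here). As a hedged alternative that is not part of the original answer, tidyr's separate_rows() splits a delimited column straight into rows, however many tags a cell holds. The small data frame and the name `long` below are illustrative only. In tidyr 1.3.0 and later, separate_rows() is documented as superseded by separate_longer_delim(), but it still works.

```r
library(tidyr)

# Small illustrative data frame in the same shape as the question's data
df <- data.frame(
  species = c("Barbar,Pippip,Hypsav", "Barbar"),
  time = c("09:52", "10:37"),
  loc = c(5L, 4L),
  stringsAsFactors = FALSE
)

# One output row per comma-separated tag; time and loc are repeated
long <- separate_rows(df, species, sep = ",")

print(long)
```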
Answer 3
Score: 1
Alternatively, you can use the data.table approach:
library(data.table)
# convert df to a data.table (by reference, without copying)
setDT(df)
# first split species into a list column and reassign it to the same column;
# then unlist within each (loc, time) group to get one row per species
df[, species := strsplit(x = species, split = ",")][
  , .(species = unlist(species)), by = .(loc, time)]
# loc time species
# 1: 1 9:30 Barbar
# 2: 1 9:30 Barbar
# 3: 4 10:37 Barbar
# 4: 2 12:03 Barbar
# 5: 2 12:03 Pippip
# 6: 5 09:52 Barbar
# 7: 5 09:52 Pippip
# 8: 5 09:52 Hypsav
# 9: 5 07:45 Pippip
#10: 5 07:45 Barbar
#11: 3 00:00 Barbar
#12: 3 00:00 Pippip
Depending on your general workflow or data size, you can evaluate which approach works best for you.