Keep only the last n columns produced by separate() when splitting on a delimiter
Question
I have a dataframe with the following factor variable:
> head(example.df)
path
1 C:/Users/My PC/pinkhipppos/tinyhorsefeet/location1/categoryA/eyoshdzjow_random_image.txt
(made up dirs).
I want to split it into separate columns on the delimiter "/".
I can do this using separate():
library(tidyverse)
example.df <- example.df %>%
  separate(path,
           into = c("dir",
                    "ok",
                    "hello",
                    "etc...",
                    "finally...",
                    "location",
                    "category",
                    "filename"),
           sep = "/")
However, I am only interested in the last two directories and the file name, that is, the last 3 results from the separate function, because the parent directories (those above location) may change. My desired output would be:
> head(example.df)
location category filename
1 location1 categoryA eyoshdzjow_random_image.txt
Reproducible example:
example.df <- as.data.frame(
  c("C:/Users/My PC/pinkhipppos/tinyhorsefeet/location1/categoryA/eyoshdzjow_random_image.txt",
    "C:/Users/My PC/pinkhipppos/tinyhorsefeet/location2/categoryB/jdugnbtudg_random_image.txt")
)
colnames(example.df) <- "path"
Answer 1
Score: 2
One way in base R is to split each string at "/" and select the last 3 elements of each resulting list element.
as.data.frame(t(sapply(strsplit(as.character(example.df$path), "/"), tail, 3)))
# V1 V2 V3
#1 location1 categoryA eyoshdzjow_random_image.txt
#2 location2 categoryB jdugnbtudg_random_image.txt
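The base R one-liner above leaves the columns named V1, V2, V3 and hard-codes 3. As a minimal sketch, the same idea can be wrapped in a small helper that takes n and the column names. The helper name last_n_parts and its arguments are hypothetical, not part of the original answer, and it assumes every path has at least n pieces.

```r
# Hypothetical helper (not from the original answer): keep the last n
# "/"-separated pieces of each path and name the resulting columns.
last_n_parts <- function(paths, n, col_names) {
  pieces <- strsplit(as.character(paths), "/", fixed = TRUE)
  # Assumption: every path has at least n pieces; otherwise the rows
  # would have different lengths and could not be bound into a matrix.
  stopifnot(all(lengths(pieces) >= n), length(col_names) == n)
  # rbind the last n pieces of each path into a (rows x n) character matrix
  out <- as.data.frame(do.call(rbind, lapply(pieces, tail, n)),
                       stringsAsFactors = FALSE)
  setNames(out, col_names)
}

example.df <- data.frame(
  path = c("C:/Users/My PC/pinkhipppos/tinyhorsefeet/location1/categoryA/eyoshdzjow_random_image.txt",
           "C:/Users/My PC/pinkhipppos/tinyhorsefeet/location2/categoryB/jdugnbtudg_random_image.txt")
)

result <- last_n_parts(example.df$path, 3, c("location", "category", "filename"))
```

Using do.call(rbind, ...) rather than t(sapply(...)) is a design choice in this sketch: it still returns one row per path when n is 1.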
Using tidyverse, we can get the data in long format, select the last 3 entries for each original row, and then get the data back in wide format.
library(tidyverse)
example.df %>%
  mutate(row = row_number()) %>%
  separate_rows(path, sep = "/") %>%
  group_by(row) %>%
  slice((n() - 2):n()) %>%
  mutate(cols = c('location', 'category', 'filename')) %>%
  pivot_wider(names_from = cols, values_from = path) %>%
  ungroup() %>%
  select(-row)
# A tibble: 2 x 3
# location category filename
# <chr> <chr> <chr>
#1 location1 categoryA eyoshdzjow_random_image.txt
#2 location2 categoryB jdugnbtudg_random_image.txt
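Or a similar concept to the base R approach, but using tidyverse. The pieces are named before unnest_wider() is called, because newer tidyr versions can refuse to unnest unnamed elements unless names_sep is supplied.
example.df %>%
  mutate(temp = map(str_split(path, "/"),
                    ~ set_names(tail(.x, 3), c("location", "category", "filename")))) %>%
  unnest_wider(temp) %>%
  select(-path)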
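Because the values here are file paths, another possible route (not part of the original answer) is base R's basename() and dirname(), which peel off the trailing path components without any explicit splitting. The sketch below assumes exactly the three trailing components from the question are wanted; the result name is illustrative.

```r
example.df <- data.frame(
  path = c("C:/Users/My PC/pinkhipppos/tinyhorsefeet/location1/categoryA/eyoshdzjow_random_image.txt",
           "C:/Users/My PC/pinkhipppos/tinyhorsefeet/location2/categoryB/jdugnbtudg_random_image.txt")
)

# as.character() guards against the column being a factor
paths <- as.character(example.df$path)

result <- data.frame(
  location = basename(dirname(dirname(paths))),  # two levels above the file
  category = basename(dirname(paths)),           # directory containing the file
  filename = basename(paths),                    # the file itself
  stringsAsFactors = FALSE
)
```

This reads well for a fixed, small number of trailing components, but it does not generalise to an arbitrary n as neatly as the split-and-tail approaches.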