Keep only the last n columns produced by separate() when splitting on a delimiter

Question

I have a dataframe with the following factor variable:

> head(example.df)
                                                                                      path
1 C:/Users/My PC/pinkhipppos/tinyhorsefeet/location1/categoryA/eyoshdzjow_random_image.txt

(The directories are made up.)

I want to split it into separate columns based on the delimiter "/".

I can do this using:

library(tidyverse)

example.df <- example.df %>% 
  separate(path,
           into=c("dir",
                  "ok",
                  "hello",
                  "etc...",
                  "finally...",
                  "location",
                  "category",
                  "filename"),
           sep="/")

However, I am only interested in the last two directories and the file name, i.e. the last 3 pieces produced by separate(), because the parent directories (those above location) may change. My desired output would be:

> head(example.df)
       location       category                       filename
1     location1      categoryA    eyoshdzjow_random_image.txt

Reproducible example:

example.df <- as.data.frame(
  c("C:/Users/My PC/pinkhipppos/tinyhorsefeet/location1/categoryA/eyoshdzjow_random_image.txt",
    "C:/Users/My PC/pinkhipppos/tinyhorsefeet/location2/categoryB/jdugnbtudg_random_image.txt")
)

colnames(example.df)<-"path"
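The question describes path as a factor, but since R 4.0.0 as.data.frame() and data.frame() default to stringsAsFactors = FALSE, so on current R the reproducible example above produces a character column instead. A minimal sketch that builds the same data with a named column and requests the factor explicitly, assuming you want to reproduce the original setup:

```r
# Same two paths as the reproducible example, with the column named up front.
# stringsAsFactors = TRUE reproduces the factor column described in the
# question; on R >= 4.0 the default would give a character column instead.
example.df <- data.frame(
  path = c("C:/Users/My PC/pinkhipppos/tinyhorsefeet/location1/categoryA/eyoshdzjow_random_image.txt",
           "C:/Users/My PC/pinkhipppos/tinyhorsefeet/location2/categoryB/jdugnbtudg_random_image.txt"),
  stringsAsFactors = TRUE
)

str(example.df)
```

The base R answer below calls as.character() on the column, so it works whether path is a factor or a character vector.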

Answer 1

Score: 2

One way in base R is to split each string at "/" and keep the last 3 elements of each resulting vector.

as.data.frame(t(sapply(strsplit(as.character(example.df$path), "/"), tail, 3)))

#         V1        V2                          V3
#1 location1 categoryA eyoshdzjow_random_image.txt
#2 location2 categoryB jdugnbtudg_random_image.txt
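The one-liner above returns generic column names (V1 to V3) and is hard-wired to three pieces. A minimal sketch that generalizes it to the last n pieces and names the columns, using a hypothetical helper keep_last_parts() that is not part of the answer or of any package. It swaps sapply() for vapply() so that a path with fewer than n pieces raises an error instead of sapply() silently simplifying to a list:

```r
# Hypothetical helper (not from the answer): keep the last n "/"-separated
# pieces of each path and name the resulting columns.
# Assumes every path has at least n pieces; vapply() errors otherwise.
keep_last_parts <- function(paths, n = 3,
                            col_names = c("location", "category", "filename")) {
  stopifnot(length(col_names) == n)
  parts <- strsplit(as.character(paths), "/", fixed = TRUE)
  # Exactly n character values per path: an n x length(paths) matrix
  # (or a plain vector when n == 1)
  last <- vapply(parts, function(p) tail(p, n), character(n))
  # Refill by row so each path becomes one row; this also handles n == 1
  out <- as.data.frame(matrix(last, ncol = n, byrow = TRUE),
                       stringsAsFactors = FALSE)
  names(out) <- col_names
  out
}

# Paths of different depth still line up, because pieces are taken from the end
keep_last_parts(c("C:/Users/My PC/pinkhipppos/tinyhorsefeet/location1/categoryA/eyoshdzjow_random_image.txt",
                  "D:/other/location2/categoryB/jdugnbtudg_random_image.txt"))
```

Counting from the end is what makes this robust to the changing parent directories mentioned in the question.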

Using the tidyverse, we can reshape the data to long format (one row per path piece), keep the last 3 entries for each original row, and reshape back to wide format.

library(tidyverse)

example.df %>%
  mutate(row = row_number()) %>%
  separate_rows(path, sep = "/") %>%
  group_by(row) %>%
  slice((n() - 2) : n()) %>%
  mutate(cols = c('location', 'category', 'filename')) %>%
  pivot_wider(names_from = cols, values_from = path) %>%
  ungroup() %>%
  select(-row)

# A tibble: 2 x 3
#  location  category  filename                   
#  <chr>     <chr>     <chr>                      
#1 location1 categoryA eyoshdzjow_random_image.txt
#2 location2 categoryB jdugnbtudg_random_image.txt
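The same long format, keep last 3, wide format pipeline can be sketched in base R without tidyverse dependencies. This is an illustration of the idea rather than part of the answer; it uses ave() to number each piece from the end of its own path and assumes every path has at least three pieces:

```r
paths <- c("C:/Users/My PC/pinkhipppos/tinyhorsefeet/location1/categoryA/eyoshdzjow_random_image.txt",
           "C:/Users/My PC/pinkhipppos/tinyhorsefeet/location2/categoryB/jdugnbtudg_random_image.txt")
parts <- strsplit(paths, "/", fixed = TRUE)

# Long format: one row per path piece, tagged with the index of its source row
long <- data.frame(row = rep(seq_along(parts), lengths(parts)),
                   piece = unlist(parts),
                   stringsAsFactors = FALSE)

# Position of each piece counted from the end of its own path (1 = file name)
long$from_end <- ave(long$row, long$row, FUN = function(r) rev(seq_along(r)))

# Keep the last 3 pieces per source row; rows stay in their original order
last3 <- long[long$from_end <= 3, ]

# Back to wide: each source row contributes exactly 3 consecutive pieces
wide <- as.data.frame(matrix(last3$piece, ncol = 3, byrow = TRUE),
                      stringsAsFactors = FALSE)
names(wide) <- c("location", "category", "filename")
wide
```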

Or, using the same idea as the base R version but with tidyverse functions:

example.df %>%
  mutate(temp = map(str_split(path, "/"), tail, 3)) %>%
  unnest_wider(temp, names_repair = ~paste0("dir", seq_along(.) - 1)) %>%
  select(-dir0)
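A different technique, not used in the answer, skips splitting altogether: a regular expression anchored at the end of the string with "$" captures the last three "/"-separated pieces, and base R's utils::strcapture() turns the capture groups into typed data frame columns. A sketch, assuming the pattern below suits your paths (for example, no trailing slash):

```r
paths <- c("C:/Users/My PC/pinkhipppos/tinyhorsefeet/location1/categoryA/eyoshdzjow_random_image.txt",
           "C:/Users/My PC/pinkhipppos/tinyhorsefeet/location2/categoryB/jdugnbtudg_random_image.txt")

# Three runs of non-"/" characters separated by "/", ending at end of string.
# proto supplies the column names and types; paths that do not match
# (fewer than three pieces, or a trailing "/") come back as NA rows.
last3 <- utils::strcapture("([^/]+)/([^/]+)/([^/]+)$", paths,
                           proto = data.frame(location = character(),
                                              category = character(),
                                              filename = character(),
                                              stringsAsFactors = FALSE))
last3
```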

huangapple
  • Posted on January 3, 2020 at 17:15:49
  • If you repost this article, please keep this link: https://go.coder-hub.com/59575839.html