Keep only the last n columns when separating by a delimiter

Question


I have a dataframe with the following factor variable:

    > head(example.df)
      path
    1 C:/Users/My PC/pinkhipppos/tinyhorsefeet/location1/categoryA/eyoshdzjow_random_image.txt

(The directories are made up.)

I want to split it into separate columns on the delimiter "/".

I can do this using separate():

    library(tidyverse)
    example.df <- example.df %>%
      separate(path,
               into = c("dir",
                        "ok",
                        "hello",
                        "etc...",
                        "finally...",
                        "location",
                        "category",
                        "filename"),
               sep = "/")

However, I am only interested in the last two directories and the file name, i.e. the last 3 results from separate(), because the parent directories (those above location) may change. My desired output would be:

    > head(example.df)
       location  category                    filename
    1 location1 categoryA eyoshdzjow_random_image.txt

Reproducible example:

    example.df <- as.data.frame(
      c("C:/Users/My PC/pinkhipppos/tinyhorsefeet/location1/categoryA/eyoshdzjow_random_image.txt",
        "C:/Users/My PC/pinkhipppos/tinyhorsefeet/location2/categoryB/jdugnbtudg_random_image.txt")
    )
    colnames(example.df) <- "path"

Answer 1

Score: 2


One way in base R is to split each string at "/" and take the last 3 elements of each resulting vector.

    as.data.frame(t(sapply(strsplit(as.character(example.df$path), "/"), tail, 3)))
    #         V1        V2                          V3
    #1 location1 categoryA eyoshdzjow_random_image.txt
    #2 location2 categoryB jdugnbtudg_random_image.txt
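
If you want the column names set in the same step, the base R idea above can be wrapped in a small function. This is only a sketch: last_n_parts and its arguments are names made up for illustration, not part of any package, and it assumes every path has at least n pieces.

```r
# Reproducible data from the question, stored as character for clarity
example.df <- data.frame(
  path = c("C:/Users/My PC/pinkhipppos/tinyhorsefeet/location1/categoryA/eyoshdzjow_random_image.txt",
           "C:/Users/My PC/pinkhipppos/tinyhorsefeet/location2/categoryB/jdugnbtudg_random_image.txt"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the last n sep-delimited pieces of each path
last_n_parts <- function(paths, n = 3, col_names = NULL, sep = "/") {
  # as.character() also covers the case where paths is a factor
  parts <- strsplit(as.character(paths), sep, fixed = TRUE)
  # vapply() stops with an error if any path yields fewer than n pieces
  pieces <- vapply(parts, function(p) tail(p, n), character(n))
  # pieces holds one path per column, so fill row by row to get one path per row
  out <- as.data.frame(matrix(pieces, ncol = n, byrow = TRUE),
                       stringsAsFactors = FALSE)
  if (!is.null(col_names)) names(out) <- col_names
  out
}

res <- last_n_parts(example.df$path, n = 3,
                    col_names = c("location", "category", "filename"))
print(res)
```

Using vapply() instead of sapply() is a deliberate choice here: it fails loudly when a path is shorter than expected rather than silently returning a list.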

Using tidyverse, we can reshape the data to long format, keep the last 3 entries for each original row, and then reshape back to wide format.

    library(tidyverse)
    example.df %>%
      mutate(row = row_number()) %>%
      separate_rows(path, sep = "/") %>%
      group_by(row) %>%
      slice((n() - 2) : n()) %>%
      mutate(cols = c('location', 'category', 'filename')) %>%
      pivot_wider(names_from = cols, values_from = path) %>%
      ungroup() %>%
      select(-row)
    # A tibble: 2 x 3
    #  location  category  filename
    #  <chr>     <chr>     <chr>
    #1 location1 categoryA eyoshdzjow_random_image.txt
    #2 location2 categoryB jdugnbtudg_random_image.txt

Or use the same idea as the base R approach, but with tidyverse functions:

    example.df %>%
      mutate(temp = map(str_split(path, "/"), tail, 3)) %>%
      unnest_wider(temp, names_repair = ~paste0("dir", seq_along(.) - 1)) %>%
      select(-dir0)
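
A variant of this last approach names the three pieces before unnesting, so the columns come out as location, category and filename directly. Treat it as a sketch rather than the definitive way: new_names and result are illustrative names, and the motivation is an assumption on my part, namely that newer tidyr releases may refuse to unnest unnamed elements unless names_sep is supplied.

```r
library(dplyr)
library(purrr)
library(stringr)
library(tidyr)

# Reproducible data from the question, stored as character for clarity
example.df <- data.frame(
  path = c("C:/Users/My PC/pinkhipppos/tinyhorsefeet/location1/categoryA/eyoshdzjow_random_image.txt",
           "C:/Users/My PC/pinkhipppos/tinyhorsefeet/location2/categoryB/jdugnbtudg_random_image.txt"),
  stringsAsFactors = FALSE
)

# Illustrative column names for the last three pieces
new_names <- c("location", "category", "filename")

result <- example.df %>%
  # Split each path, keep the last 3 pieces, and name them
  mutate(temp = map(str_split(path, "/"),
                    ~ set_names(tail(.x, 3), new_names))) %>%
  # Named elements become columns with those names
  unnest_wider(temp) %>%
  # Drop the original full path
  select(-path)

print(result)
```

Because the pieces are named up front, no names_repair function is needed and the original path column can be dropped by its real name.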

huangapple
  • Posted on January 3, 2020, 17:15:49
  • When reposting, please keep the link to this article: https://go.coder-hub.com/59575839.html