January 3, 2020 17:15:49

Keep only the last n columns produced by separate() when splitting on a delimiter

Question

I have a dataframe with the following factor variable:

> head(example.df)
                                                                                      path
1 C:/Users/My PC/pinkhipppos/tinyhorsefeet/location1/categoryA/eyoshdzjow_random_image.txt

(made up dirs).

I want to split it into separate columns based on the delimiter /.

I can do this using:

library(tidyverse)
example.df <- example.df %>% 
  separate(path,
           into=c("dir",
                  "ok",
                  "hello",
                  "etc...",
                  "finally...",
                  "location",
                  "category",
                  "filename"),
           sep="/")

However, I am only interested in the last two directories and the file name, that is, the last 3 results from the separate function, because the parent directories (those above location) may change. My desired output would be:

> head(example.df)
       location       category                       filename
1     location1      categoryA    eyoshdzjow_random_image.txt

Reproducible example:

example.df <- as.data.frame(
  c("C:/Users/My PC/pinkhipppos/tinyhorsefeet/location1/categoryA/eyoshdzjow_random_image.txt",
    "C:/Users/My PC/pinkhipppos/tinyhorsefeet/location2/categoryB/jdugnbtudg_random_image.txt")
)
colnames(example.df) <- "path"

Answer 1

Score: 2

One way in base R is to split each string at "/" and select the last 3 elements of each resulting vector.

as.data.frame(t(sapply(strsplit(as.character(example.df$path), "/"), tail, 3)))
#         V1        V2                          V3
#1 location1 categoryA eyoshdzjow_random_image.txt
#2 location2 categoryB jdugnbtudg_random_image.txt
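
The base R one-liner above leaves the columns named V1 to V3. As a minimal sketch of the same idea with named columns and a variable n, the helper below wraps the split-and-tail step. The function name `last_parts` and its arguments are hypothetical and not part of the original answer, and it assumes every path has at least n components.

```r
# Hypothetical helper (not from the original answer): keep the last n
# "/"-separated parts of each path and return them as named columns.
last_parts <- function(paths, n, col_names) {
  # as.character() covers the case where the column is a factor.
  parts <- strsplit(as.character(paths), "/", fixed = TRUE)
  # Assumption: every path has at least n parts; fail loudly otherwise.
  stopifnot(length(col_names) == n, all(lengths(parts) >= n))
  # One row per path, one column per kept component.
  out <- as.data.frame(do.call(rbind, lapply(parts, tail, n)),
                       stringsAsFactors = FALSE)
  names(out) <- col_names
  out
}

example.df <- data.frame(
  path = c("C:/Users/My PC/pinkhipppos/tinyhorsefeet/location1/categoryA/eyoshdzjow_random_image.txt",
           "C:/Users/My PC/pinkhipppos/tinyhorsefeet/location2/categoryB/jdugnbtudg_random_image.txt"),
  stringsAsFactors = FALSE
)

res <- last_parts(example.df$path, 3, c("location", "category", "filename"))
res
```

Because the parts are counted from the end, the result should not depend on how many parent directories precede location.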

Using tidyverse, we can reshape the data to long format, keep the last 3 entries for each row, and then reshape the data back to wide format.

library(tidyverse)
example.df %>%
  mutate(row = row_number()) %>%
  separate_rows(path, sep = "/") %>%
  group_by(row) %>%
  slice((n() - 2) : n()) %>%
  mutate(cols = c('location', 'category', 'filename')) %>%
  pivot_wider(names_from = cols, values_from = path) %>%
  ungroup() %>%
  select(-row)
# A tibble: 2 x 3
#  location  category  filename                   
#  <chr>     <chr>     <chr>                      
#1 location1 categoryA eyoshdzjow_random_image.txt
#2 location2 categoryB jdugnbtudg_random_image.txt

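Or use the same idea as the base R approach, but with tidyverse functions. (This was written against the tidyr version current in early 2020; newer tidyr releases may require names_sep in unnest_wider() when the list elements are unnamed.)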

example.df %>%
  mutate(temp = map(str_split(path, "/"), tail, 3)) %>%
  unnest_wider(temp, names_repair = ~paste0("dir", seq_along(.) - 1)) %>%
  select(-dir0)
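
Another option, not part of the original answer, is to skip the split entirely and capture the last three components with a regular expression anchored at the end of the string. The sketch below uses tidyr::extract() for this; the regex is my own, and it assumes the components themselves never contain "/". Paths with fewer than three components would not match and should come back as NA.

```r
library(tidyr)

example.df <- data.frame(
  path = c("C:/Users/My PC/pinkhipppos/tinyhorsefeet/location1/categoryA/eyoshdzjow_random_image.txt",
           "C:/Users/My PC/pinkhipppos/tinyhorsefeet/location2/categoryB/jdugnbtudg_random_image.txt"),
  stringsAsFactors = FALSE
)

# Each ([^/]+) captures one path component; the trailing $ anchors the
# match to the end of the string, so only the last three are captured.
# tidyr::extract() is called with its namespace to avoid a clash with
# magrittr::extract() when the full tidyverse is attached.
res <- tidyr::extract(example.df, path,
                      into = c("location", "category", "filename"),
                      regex = "([^/]+)/([^/]+)/([^/]+)$")
res
```

The original path column is dropped by default (remove = TRUE), which leaves only the three requested columns.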
