Posted on 2023-02-24 17:38:39

Read files in pairs in R using regular expression

Question

I am trying to list files in pairs in R, but my current approach is messy: I have to type every pair of file names by hand.

This is what I do at the moment:

files <- list(
  c("postgwas2hmp_Extr_Soil_C_Ratio_top1.txt", "manhattan_Extr_Soil_C_Ratio__top1.txt"),
  c("postgwas2hmp_Extr_Soil_P_Ratio_top1.txt", "manhattan_Extr_Soil_P_Ratio__top1.txt"),
  c("postgwas2hmp_Total_Soil_D_Ratio_top1.txt", "manhattan_Total_Soil_D_Ratio__top1.txt"),
  c("postgwas2hmp_Extr_Soil_E_Ratio_top1.txt", "manhattan_Extr_Soil_E_Ratio__top1.txt")
)

I then pass these files to an R function. This works fine, but is there a way to collect all of these files in pairs with a regular expression in a single line, something like this:

files <- list(
  c("postgwas2hmp_*//.txt$", "^manhattan_.*\\.txt$")
)

This second snippet does not work, but I would like something like it so that I do not have to list every file individually.

This is what I want the list to look like in the end:

str(files)
List of 4
 $ : chr [1:2] "postgwas2hmp_Extr_Soil_C_Ratio_top1.txt" "manhattan_Extr_Soil_C_Ratio__top1.txt"
 $ : chr [1:2] "postgwas2hmp_Extr_Soil_P_Ratio_top1.txt" "manhattan_Extr_Soil_P_Ratio__top1.txt"
 $ : chr [1:2] "postgwas2hmp_Total_Soil_D_Ratio_top1.txt" "manhattan_Total_Soil_D_Ratio__top1.txt"
 $ : chr [1:2] "postgwas2hmp_Extr_Soil_E_Ratio_top1.txt" "manhattan_Extr_Soil_E_Ratio__top1.txt"

Thanks,

Answer 1

Score: 1

If your files are all in the same directory, we can use list.files() with a regex pattern. This works as long as the files are in your working directory. For the example below, I put the following two files in my current working directory: "postgwas2hmp_Extr_Soil_C_Ratio_top1.txt" and "manhattan_Extr_Soil_C_Ratio__top1.txt". The result of list.files() is then:

list.files(pattern = "^(manhattan|postgwas2hmp)_.*\\.txt$")
#> [1] "manhattan_Extr_Soil_C_Ratio__top1.txt"
#> [2] "postgwas2hmp_Extr_Soil_C_Ratio_top1.txt"
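
The pattern can also be checked against plain strings, without touching the file system. The following is a minimal sketch; the two non-matching names are made up for illustration and do not come from the question.

```r
# Same pattern as passed to list.files() above
pattern <- "^(manhattan|postgwas2hmp)_.*\\.txt$"

candidates <- c(
  "manhattan_Extr_Soil_C_Ratio__top1.txt",
  "postgwas2hmp_Extr_Soil_C_Ratio_top1.txt",
  "notes_manhattan_Extr.txt",   # hypothetical: rejected because "^" anchors the prefix at the start
  "manhattan_Extr_Soil_C.csv"   # hypothetical: rejected because "\\.txt$" anchors the extension at the end
)

# Keep only the names the pattern accepts
matched <- candidates[grepl(pattern, candidates)]
matched
```

Only the first two names should be kept, which is the same filtering list.files() applies to the directory contents.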

To build a list in which each element holds the two files that share the same letter, we can use the following approach:

# All files that share the "Extr_Soil_" stem
x <- list.files(pattern = "^(manhattan|postgwas2hmp)_Extr_Soil_.*\\.txt$")
# Extract the letter that follows "Extr_Soil_" (second capture group)
letrs <- unique(gsub("^(manhattan|postgwas2hmp)_Extr_Soil_([A-Za-z]+).*", "\\2", x))
# For each letter, list the two files that contain it
lapply(letrs,
       \(l) list.files(pattern = paste0("^(manhattan|postgwas2hmp)_Extr_Soil_", l, "_.*\\.txt$"))
       )
#> [[1]]
#> [1] "manhattan_Extr_Soil_B_Ratio__top1.txt"
#> [2] "postgwas2hmp_Extr_Soil_B_Ratio_top1.txt"
#>
#> [[2]]
#> [1] "manhattan_Extr_Soil_C_Ratio__top1.txt"
#> [2] "postgwas2hmp_Extr_Soil_C_Ratio_top1.txt"

Created on 2023-02-24 by the reprex package (v2.0.1)
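
The file names in the question also include a "Total_Soil_" stem, which the "Extr_Soil_" pattern above does not cover. One possible generalisation is to pair files by whatever follows the prefix. This is only a sketch: pair_files() is a hypothetical helper, and it assumes the two names in a pair differ only in their prefix and in the number of consecutive underscores.

```r
# Hypothetical helper: group file names by the part that follows the prefix.
pair_files <- function(paths) {
  # Drop the prefix, then collapse repeated underscores so that
  # "Ratio__top1" and "Ratio_top1" produce the same key.
  key <- sub("^(manhattan|postgwas2hmp)_", "", basename(paths))
  key <- gsub("_+", "_", key)
  pairs <- split(paths, key)
  # Put the postgwas2hmp file first in every pair, as in the question.
  pairs <- lapply(pairs, function(p) {
    p[order(!grepl("^postgwas2hmp_", basename(p)))]
  })
  unname(pairs)
}

# File names from the question, used here instead of a real directory
demo <- c(
  "postgwas2hmp_Extr_Soil_C_Ratio_top1.txt", "manhattan_Extr_Soil_C_Ratio__top1.txt",
  "postgwas2hmp_Extr_Soil_P_Ratio_top1.txt", "manhattan_Extr_Soil_P_Ratio__top1.txt",
  "postgwas2hmp_Total_Soil_D_Ratio_top1.txt", "manhattan_Total_Soil_D_Ratio__top1.txt",
  "postgwas2hmp_Extr_Soil_E_Ratio_top1.txt", "manhattan_Extr_Soil_E_Ratio__top1.txt"
)

files <- pair_files(demo)
str(files)
```

With real files, the input would be something like pair_files(list.files(pattern = "^(manhattan|postgwas2hmp)_.*\\.txt$")). The pairs come back ordered by key, so the order may differ from a hand-written list.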

Answer 2

Score: 0

My approach would be to split the file names into their components, reassemble them, and then create the list. The final step uses pivot_wider() to put the two files of each pair side by side, and then strips the names to clean up the result. The name clean-up may not be needed.

library(tidyr)
library(dplyr)

# Components of the file names
topic <- c("postgwas2hmp_", "manhattan_")
id <- c("Extr_Soil_C", "Extr_Soil_P", "Total_Soil_D", "Extr_Soil_E")
end <- ".txt"

df <- crossing(topic, id, end) %>%
  mutate(
    # The manhattan files carry a double underscore before "top1"
    run = if_else(topic == "manhattan_", "_Ratio__top1", "_Ratio_top1"),
    files = paste0(topic, id, run, end)
  ) %>%
  # One row per id, one column per topic
  pivot_wider(id_cols = id, names_from = topic, values_from = files) %>%
  select(-id)

# Split row-wise into a list of unnamed character pairs
files <- df %>%
  unlist() %>%
  unname() %>%
  split(f = seq(nrow(df))) %>%
  unname()
files
[[1]]
[1] "manhattan_Extr_Soil_C_Ratio__top1.txt"    "postgwas2hmp_Extr_Soil_C_Ratio_top1.txt"
[[2]]
[1] "manhattan_Extr_Soil_E_Ratio__top1.txt"    "postgwas2hmp_Extr_Soil_E_Ratio_top1.txt"
[[3]]
[1] "manhattan_Extr_Soil_P_Ratio__top1.txt"    "postgwas2hmp_Extr_Soil_P_Ratio_top1.txt"
[[4]]
[1] "manhattan_Total_Soil_D_Ratio__top1.txt"    "postgwas2hmp_Total_Soil_D_Ratio_top1.txt"
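
If the identifiers are known in advance, the same pairs can probably also be assembled in base R without tidyr or dplyr. This is a minimal sketch under the naming scheme seen in the question (a single underscore before "top1" in the postgwas2hmp files, a double one in the manhattan files); the variable names are illustrative only.

```r
# Identifiers taken from the file names in the question
id <- c("Extr_Soil_C", "Extr_Soil_P", "Total_Soil_D", "Extr_Soil_E")

# Assumed naming scheme: the manhattan files have "__" before "top1"
postgwas  <- paste0("postgwas2hmp_", id, "_Ratio_top1.txt")
manhattan <- paste0("manhattan_", id, "_Ratio__top1.txt")

# Map() combines the i-th element of each vector; unname() drops the list names
files <- unname(Map(c, postgwas, manhattan))
str(files)

# Optional sanity check: which pairs are fully present in the working directory?
present <- vapply(files, function(p) all(file.exists(p)), logical(1))
```

This keeps the pairs in the order of id and puts the postgwas2hmp file first, which matches the layout requested in the question. Unlike the list.files() approach, it constructs names rather than discovering them, so the file.exists() check is worth keeping.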
