Read files in pairs in R using regular expressions

Question

I am trying to list files in pairs in R, but my approach is a bit messy because I have to type out all the file names manually, pair by pair.

Currently I am doing it like this:

files <- list(
  c("postgwas2hmp_Extr_Soil_C_Ratio_top1.txt", "manhattan_Extr_Soil_C_Ratio__top1.txt"),
  c("postgwas2hmp_Extr_Soil_P_Ratio_top1.txt", "manhattan_Extr_Soil_P_Ratio__top1.txt"),
  c("postgwas2hmp_Total_Soil_D_Ratio_top1.txt", "manhattan_Total_Soil_D_Ratio__top1.txt"),
  c("postgwas2hmp_Extr_Soil_E_Ratio_top1.txt", "manhattan_Extr_Soil_E_Ratio__top1.txt")
)

I then use these files in an R function. This works fine, but is there a way to read all these files in pairs with a regular expression in a single line, something like this:

files <- list(
  c("postgwas2hmp_*//.txt$", "^manhattan_.*\\.txt$")
)

This second snippet does not work, but I would like something similar so that I don't have to list every file individually.

This is what I want to end up with after building the list:

str(files)
List of 4
 $ : chr [1:2] "postgwas2hmp_Extr_Soil_C_Ratio_top1.txt" "manhattan_Extr_Soil_C_Ratio__top1.txt"
 $ : chr [1:2] "postgwas2hmp_Extr_Soil_P_Ratio_top1.txt" "manhattan_Extr_Soil_P_Ratio__top1.txt"
 $ : chr [1:2] "postgwas2hmp_Total_Soil_D_Ratio_top1.txt" "manhattan_Total_Soil_D_Ratio__top1.txt"
 $ : chr [1:2] "postgwas2hmp_Extr_Soil_E_Ratio_top1.txt" "manhattan_Extr_Soil_E_Ratio__top1.txt"

Thanks,

Answer 1

Score: 1

If your files are all in the same directory, you can use list.files() with a regex pattern. This should work as long as the files are in your working directory. For the example below, I put the following two files in my current working directory: "postgwas2hmp_Extr_Soil_C_Ratio_top1.txt" and "manhattan_Extr_Soil_C_Ratio__top1.txt". The result of list.files() is then:

list.files(pattern = "^(manhattan|postgwas2hmp)_.*\\.txt$")
#> [1] "manhattan_Extr_Soil_C_Ratio__top1.txt"  
#> [2] "postgwas2hmp_Extr_Soil_C_Ratio_top1.txt"
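To see what this pattern accepts without depending on the contents of a working directory, here is a minimal sketch that applies the same regex to a character vector with grep(). The vector `candidates` is made up for illustration (its last two entries are not from the question); list.files(pattern = ...) filters the names found in a directory with the same kind of regular expression.

```r
# Hypothetical names: the first two come from the question,
# the last two are invented to show what the pattern rejects.
candidates <- c(
  "postgwas2hmp_Extr_Soil_C_Ratio_top1.txt",
  "manhattan_Extr_Soil_C_Ratio__top1.txt",
  "notes_Extr_Soil_C.txt",                  # wrong prefix
  "manhattan_Extr_Soil_C_Ratio__top1.csv"   # wrong extension
)

# Same pattern as in the list.files() call
pattern <- "^(manhattan|postgwas2hmp)_.*\\.txt$"

# value = TRUE returns the matching strings rather than their indices
matched <- grep(pattern, candidates, value = TRUE)
print(matched)
```

Only the first two names should survive, because the anchors `^` and `$` force the prefix and the `.txt` extension to sit at the very start and end of the name.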

To generate a list in which each element holds the two files that share the same letter, we can use the following approach:

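x <- list.files(pattern = "^(manhattan|postgwas2hmp)_Extr_Soil_.*\\.txt$")

# Extract the letter after "Extr_Soil_" (second capture group).
# [A-Za-z] is used because [A-z] can also match "_" in some locales.
letrs <- unique(gsub("^(manhattan|postgwas2hmp)_Extr_Soil_([A-Za-z]+).*", "\\2", x))

# One list element per letter; \(x) is the lambda shorthand from R 4.1 onwards
lapply(letrs,
       \(x) list.files(pattern = paste0("^(manhattan|postgwas2hmp)_Extr_Soil_", x, "_.*\\.txt$"))
       )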

#> [[1]]
#> [1] "manhattan_Extr_Soil_B_Ratio__top1.txt"  
#> [2] "postgwas2hmp_Extr_Soil_B_Ratio_top1.txt"
#> 
#> [[2]]
#> [1] "manhattan_Extr_Soil_C_Ratio__top1.txt"  
#> [2] "postgwas2hmp_Extr_Soil_C_Ratio_top1.txt"

Created on 2023-02-24 by the reprex package (v2.0.1)
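A possible variation, offered as a sketch rather than as the method above: derive a pairing key from each name by stripping the prefix and the `_Ratio...top1.txt` tail, then split the names on that key. This also covers names such as `Total_Soil_D`. It assumes every file follows the naming scheme shown in the question, including the double underscore in the manhattan files. The vector `x` is hard-coded here in place of a list.files() call so the example runs anywhere, and the variable names `key`, `is_manhattan`, and `ord` are illustrative.

```r
# Stand-in for:
#   x <- list.files(pattern = "^(manhattan|postgwas2hmp)_.*\\.txt$")
x <- c(
  "manhattan_Extr_Soil_C_Ratio__top1.txt",
  "manhattan_Extr_Soil_P_Ratio__top1.txt",
  "manhattan_Total_Soil_D_Ratio__top1.txt",
  "postgwas2hmp_Extr_Soil_C_Ratio_top1.txt",
  "postgwas2hmp_Extr_Soil_P_Ratio_top1.txt",
  "postgwas2hmp_Total_Soil_D_Ratio_top1.txt"
)

# Key = the part between the prefix and "_Ratio", e.g. "Extr_Soil_C".
# "_+" accepts both the single and the double underscore before "top1".
key <- sub("^(manhattan|postgwas2hmp)_(.*)_Ratio_+top1\\.txt$", "\\2", x)

# Sort by key, with the postgwas2hmp file first inside each pair
is_manhattan <- grepl("^manhattan_", x)
ord <- order(key, is_manhattan)

# split() groups the names by key; unname() drops the key labels
files <- unname(split(x[ord], key[ord]))
str(files)
```

Each element of `files` is then a character vector of length two, in the same order as the hand-written list in the question. If a file has no partner, its element will have length one, which is easy to check with `lengths(files)`.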

Answer 2

Score: 0


My approach would be to split the file names into their components, reassemble them, and then build the list. The final step uses pivot_wider() followed by some cleanup of the names, which may not be strictly necessary.

library(tidyr)
library(dplyr)

# Components of the file names
topic <- c("postgwas2hmp_", "manhattan_")
id <- c("Extr_Soil_C", "Extr_Soil_P", "Total_Soil_D", "Extr_Soil_E")
run <- c("Ratio_top1")
end <- c(".txt")

# Build every topic/id combination, then put the two topics side by side.
# Note: the question's manhattan_ files have a double underscore before
# "top1", which this construction does not reproduce.
df <- crossing(topic, id, run, end) %>%
  # "_" separates id and run, giving ..._C_Ratio_top1.txt
  mutate(files = paste0(topic, id, "_", run, end)) %>%
  as_tibble() %>%
  pivot_wider(names_from = topic, values_from = files) %>%
  select(-c(id, run, end))

# Turn each row of the wide table into one unnamed character vector
df <- df %>%
  unlist() %>%
  unname() %>%
  split(f = seq(nrow(df))) %>%
  unname()

[[1]]
[1] "manhattan_Extr_Soil_C_Ratio_top1.txt"    "postgwas2hmp_Extr_Soil_C_Ratio_top1.txt"

[[2]]
[1] "manhattan_Extr_Soil_E_Ratio_top1.txt"    "postgwas2hmp_Extr_Soil_E_Ratio_top1.txt"

[[3]]
[1] "manhattan_Extr_Soil_P_Ratio_top1.txt"    "postgwas2hmp_Extr_Soil_P_Ratio_top1.txt"

[[4]]
[1] "manhattan_Total_Soil_D_Ratio_top1.txt"    "postgwas2hmp_Total_Soil_D_Ratio_top1.txt"
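The same component-based idea can also be sketched in base R, without tidyr or dplyr. This is an illustration under assumptions, not part of the answer above: the `suffix` vector is a hypothetical addition that gives each topic its own tail, so that the manhattan names get the double underscore they have in the question.

```r
# Components of the file names (ids copied from the question)
topic <- c("postgwas2hmp_", "manhattan_")
id <- c("Extr_Soil_C", "Extr_Soil_P", "Total_Soil_D", "Extr_Soil_E")

# Hypothetical helper vector: one tail per topic, in the same order as
# `topic`, because the manhattan files use "__top1" in the question
suffix <- c("_Ratio_top1.txt", "_Ratio__top1.txt")

# paste0() is vectorised over topic and suffix, so each id yields
# one pair: c(<postgwas2hmp file>, <manhattan file>)
files <- lapply(id, function(i) paste0(topic, i, suffix))
str(files)
```

Because the names are constructed rather than read from disk, it may be worth checking them with `file.exists(unlist(files))` before passing the list on.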

Posted by huangapple on 2023-02-24 17:38:39
Link: https://go.coder-hub.com/75554896.html