Read files in pairs in R using regular expression

Question

I am trying to list files in pairs in R, but my approach is messy: I have to type every file name manually, pair by pair.

Currently I am doing this:

    files <- list(
      c("postgwas2hmp_Extr_Soil_C_Ratio_top1.txt", "manhattan_Extr_Soil_C_Ratio__top1.txt"),
      c("postgwas2hmp_Extr_Soil_P_Ratio_top1.txt", "manhattan_Extr_Soil_P_Ratio__top1.txt"),
      c("postgwas2hmp_Total_Soil_D_Ratio_top1.txt", "manhattan_Total_Soil_D_Ratio__top1.txt"),
      c("postgwas2hmp_Extr_Soil_E_Ratio_top1.txt", "manhattan_Extr_Soil_E_Ratio__top1.txt")
    )

I then pass these files to an R function. This works fine, but is there a way to read all of these files in pairs using a regular expression, ideally in one line, something like this:

    files <- list(
      c("postgwas2hmp_*//.txt$", "^manhattan_.*\\.txt$")
    )

This second snippet does not work, but I want something like it so that I do not have to list every file individually.

This is what I want to end up with after building the list:

    str(files)
    List of 4
     $ : chr [1:2] "postgwas2hmp_Extr_Soil_C_Ratio_top1.txt" "manhattan_Extr_Soil_C_Ratio__top1.txt"
     $ : chr [1:2] "postgwas2hmp_Extr_Soil_P_Ratio_top1.txt" "manhattan_Extr_Soil_P_Ratio__top1.txt"
     $ : chr [1:2] "postgwas2hmp_Total_Soil_D_Ratio_top1.txt" "manhattan_Total_Soil_D_Ratio__top1.txt"
     $ : chr [1:2] "postgwas2hmp_Extr_Soil_E_Ratio_top1.txt" "manhattan_Extr_Soil_E_Ratio__top1.txt"

Thanks,

Answer 1

Score: 1

If your files are all in the same directory, we can use list.files() with a regex pattern. This should work, provided the files are in your working directory. Below, I put the following two files in my current working directory: "postgwas2hmp_Extr_Soil_C_Ratio_top1.txt" and "manhattan_Extr_Soil_C_Ratio__top1.txt". The result of list.files() is then:

    list.files(pattern = "^(manhattan|postgwas2hmp)_.*\\.txt$")
    #> [1] "manhattan_Extr_Soil_C_Ratio__top1.txt"
    #> [2] "postgwas2hmp_Extr_Soil_C_Ratio_top1.txt"
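To see what this pattern accepts without touching the file system, it can be tried on a plain character vector with grepl(). This is only a sketch: the last two names in `candidates` are made up for illustration and do not come from the question.

```r
# Hypothetical candidates: only the first two follow the question's naming scheme.
candidates <- c(
  "postgwas2hmp_Extr_Soil_C_Ratio_top1.txt",
  "manhattan_Extr_Soil_C_Ratio__top1.txt",
  "manhattan_Extr_Soil_C_Ratio__top1.csv",     # wrong extension
  "old_manhattan_Extr_Soil_C_Ratio__top1.txt"  # prefix is not at the start
)

# "^" anchors the prefix at the start, "\\." is a literal dot,
# and "$" anchors ".txt" at the end of the name.
pattern <- "^(manhattan|postgwas2hmp)_.*\\.txt$"
matches <- grepl(pattern, candidates)

# Keep only the names the pattern accepts
candidates[matches]
```

The same `pattern` string can be passed unchanged to list.files(pattern = ...), which applies it to the file names in the directory.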

To generate a list in which each element holds the two files that share the same letter, we can use the following approach:

    x <- list.files(pattern = "^(manhattan|postgwas2hmp)_Extr_Soil_.*\\.txt$")
    # Keep only the letter after "Extr_Soil_" (the second capture group)
    letrs <- unique(gsub("^(manhattan|postgwas2hmp)_Extr_Soil_([A-Za-z]+).*", "\\2", x))
    # For each letter, list the two files that contain it (\(x) needs R >= 4.1)
    lapply(letrs,
      \(x) list.files(pattern = paste0("^(manhattan|postgwas2hmp)_Extr_Soil_", x, "_.*\\.txt$"))
    )
    #> [[1]]
    #> [1] "manhattan_Extr_Soil_B_Ratio__top1.txt"
    #> [2] "postgwas2hmp_Extr_Soil_B_Ratio_top1.txt"
    #>
    #> [[2]]
    #> [1] "manhattan_Extr_Soil_C_Ratio__top1.txt"
    #> [2] "postgwas2hmp_Extr_Soil_C_Ratio_top1.txt"

Created on 2023-02-24 by the reprex package (v2.0.1)
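The approach above is tied to the "Extr_Soil_" prefix, so the "Total_Soil_D" files from the question would not be picked up. One possible generalisation, shown here only as a sketch, is to derive a pairing key by stripping the leading prefix and collapsing repeated underscores. The helper name `pair_files` and its `prefixes` argument are hypothetical, and the sketch assumes the two file families differ only in the prefix and in the number of underscores.

```r
# Hypothetical helper: group file names into pairs that share the same key.
pair_files <- function(files, prefixes = c("postgwas2hmp", "manhattan")) {
  prefix_rx <- paste0("^(", paste(prefixes, collapse = "|"), ")_")
  # Keep only .txt names that start with one of the prefixes
  files <- files[grepl(paste0(prefix_rx, ".*\\.txt$"), files)]
  # Key = name without the prefix, with runs of "_" collapsed to a single "_"
  key <- gsub("_+", "_", sub(prefix_rx, "", files))
  # Within each key, order the files as given in `prefixes`
  prefix <- sub("_.*$", "", files)
  ord <- order(key, match(prefix, prefixes))
  unname(split(files[ord], key[ord]))
}

# File names taken from the question; in practice this would be list.files()
all_names <- c(
  "manhattan_Extr_Soil_C_Ratio__top1.txt", "postgwas2hmp_Extr_Soil_C_Ratio_top1.txt",
  "manhattan_Extr_Soil_P_Ratio__top1.txt", "postgwas2hmp_Extr_Soil_P_Ratio_top1.txt",
  "manhattan_Total_Soil_D_Ratio__top1.txt", "postgwas2hmp_Total_Soil_D_Ratio_top1.txt",
  "manhattan_Extr_Soil_E_Ratio__top1.txt", "postgwas2hmp_Extr_Soil_E_Ratio_top1.txt"
)
paired <- pair_files(all_names)
str(paired)
```

The pairs come back sorted by key rather than in the order typed in the question, and a file without a partner would end up alone in its element, so checking lengths(paired) before using the result is a reasonable precaution.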

Answer 2

Score: 0

My approach would be to split the file names into their components, assemble them, and then create the list. The final step uses pivot_wider() followed by some tidying of the names, which may not be needed.

    library(tidyr)
    library(dplyr)

    topic <- c("postgwas2hmp_", "manhattan_")
    id <- c("Extr_Soil_C", "Extr_Soil_P", "Total_Soil_D", "Extr_Soil_E")
    run <- "Ratio_top1"
    end <- ".txt"

    # Build every topic/id combination; "_" joins id and run.
    # Note: the manhattan files in the question use "Ratio__top1" (double
    # underscore), so adjust the pieces if the names must match exactly.
    df <- crossing(topic, id, run, end) %>%
      mutate(files = paste0(topic, id, "_", run, end)) %>%
      as_tibble() %>%
      pivot_wider(names_from = topic, values_from = files) %>%
      select(-c(id, run, end))

    # One list element per row: the manhattan/postgwas2hmp pair for each id
    files <- df %>%
      unlist() %>%
      unname() %>%
      split(f = seq(nrow(df))) %>%
      unname()
    files
    #> [[1]]
    #> [1] "manhattan_Extr_Soil_C_Ratio_top1.txt"    "postgwas2hmp_Extr_Soil_C_Ratio_top1.txt"
    #> [[2]]
    #> [1] "manhattan_Extr_Soil_E_Ratio_top1.txt"    "postgwas2hmp_Extr_Soil_E_Ratio_top1.txt"
    #> [[3]]
    #> [1] "manhattan_Extr_Soil_P_Ratio_top1.txt"    "postgwas2hmp_Extr_Soil_P_Ratio_top1.txt"
    #> [[4]]
    #> [1] "manhattan_Total_Soil_D_Ratio_top1.txt"   "postgwas2hmp_Total_Soil_D_Ratio_top1.txt"
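If the identifiers are known up front, the same list can also be assembled in base R without tidyr or dplyr. The sketch below is an alternative, not the answer's method; it assumes the two name templates seen in the question ("_Ratio_top1.txt" for postgwas2hmp and "_Ratio__top1.txt" for manhattan), and the names `ids` and `built` are chosen here for illustration.

```r
# Identifiers in the order used in the question
ids <- c("Extr_Soil_C", "Extr_Soil_P", "Total_Soil_D", "Extr_Soil_E")

# One character vector of length 2 per identifier.
# Assumption: the manhattan files use a double underscore before "top1".
built <- lapply(ids, function(i) {
  c(paste0("postgwas2hmp_", i, "_Ratio_top1.txt"),
    paste0("manhattan_", i, "_Ratio__top1.txt"))
})

str(built)

# Optional sanity check against the working directory:
# names that were built but do not exist on disk
not_found <- unlist(built)[!file.exists(unlist(built))]
```

Because the names are constructed rather than discovered, the file.exists() check is what ties the result back to the files that are actually present.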

Posted by huangapple on 2023-02-24 17:38:39
When reposting, please keep the link to this article: https://go.coder-hub.com/75554896.html