Read files in pairs in R using a regular expression
Question
I am trying to list files in pairs in R, but my approach is messy: I have to type all the file names manually, pair by pair.
Currently I am doing this:
files <- list(
  c("postgwas2hmp_Extr_Soil_C_Ratio_top1.txt", "manhattan_Extr_Soil_C_Ratio__top1.txt"),
  c("postgwas2hmp_Extr_Soil_P_Ratio_top1.txt", "manhattan_Extr_Soil_P_Ratio__top1.txt"),
  c("postgwas2hmp_Total_Soil_D_Ratio_top1.txt", "manhattan_Total_Soil_D_Ratio__top1.txt"),
  c("postgwas2hmp_Extr_Soil_E_Ratio_top1.txt", "manhattan_Extr_Soil_E_Ratio__top1.txt")
)
I then pass these files to an R function. This works, but is there a way to read all these files in pairs using a regular expression in a single line, something like this:
files <- list(
c("postgwas2hmp_*//.txt$","^manhattan_.*\\.txt$")
)
This second snippet does not work, but I would like something like it so that I can avoid listing every file individually.
This is what I want to end up with after building the list:
str(files)
List of 4
$ : chr [1:2] "postgwas2hmp_Extr_Soil_C_Ratio_top1.txt" "manhattan_Extr_Soil_C_Ratio__top1.txt"
$ : chr [1:2] "postgwas2hmp_Extr_Soil_P_Ratio_top1.txt" "manhattan_Extr_Soil_P_Ratio__top1.txt"
$ : chr [1:2] "postgwas2hmp_Total_Soil_D_Ratio_top1.txt" "manhattan_Total_Soil_D_Ratio__top1.txt"
$ : chr [1:2] "postgwas2hmp_Extr_Soil_E_Ratio_top1.txt" "manhattan_Extr_Soil_E_Ratio__top1.txt"
Thanks,
Answer 1
Score: 1
If your files are all in the same directory, we can use list.files() and provide a regex pattern. This should work, given that the files are in your working directory. Below I put the following two files in my current working directory: "postgwas2hmp_Extr_Soil_C_Ratio_top1.txt" and "manhattan_Extr_Soil_C_Ratio__top1.txt". The result of list.files() is then as follows:
list.files(pattern = "^(manhattan|postgwas2hmp)_.*\\.txt$")
#> [1] "manhattan_Extr_Soil_C_Ratio__top1.txt"
#> [2] "postgwas2hmp_Extr_Soil_C_Ratio_top1.txt"
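The pattern can also be checked without touching the file system. The sketch below is only an illustration: it applies grepl() to a character vector holding the two file names above plus a hypothetical unrelated file ("notes.txt" is an assumption, not part of the original post), and compares the pattern from this answer with the one tried in the question.

```r
# File names from the post, plus one hypothetical unrelated file
candidates <- c(
  "postgwas2hmp_Extr_Soil_C_Ratio_top1.txt",
  "manhattan_Extr_Soil_C_Ratio__top1.txt",
  "notes.txt"  # hypothetical, added only for illustration
)

# Pattern from this answer: one of the two prefixes, anything, then ".txt"
good <- grepl("^(manhattan|postgwas2hmp)_.*\\.txt$", candidates)

# Pattern tried in the question: "_*" means "zero or more underscores"
# and "//" is two literal slashes, so no file name can match
bad <- grepl("postgwas2hmp_*//.txt$", candidates)

print(good)
print(bad)
```

In a regular expression the wildcard is ".*" rather than "*", and a literal dot is written "\\." inside an R string, which is likely why the attempt in the question matched nothing.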
To generate a list in which each element holds the two files that share the same letter, we can use the following approach:
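x <- list.files(pattern = "^(manhattan|postgwas2hmp)_Extr_Soil_.*\\.txt$")
# Extract the letter that follows "Extr_Soil_" (the second capture group)
letrs <- unique(gsub("^(manhattan|postgwas2hmp)_Extr_Soil_([A-Za-z]+).*", "\\2", x))
# For each letter, list the two files that contain it
lapply(letrs,
       \(x) list.files(pattern = paste0("^(manhattan|postgwas2hmp)_Extr_Soil_", x, "_.*\\.txt$"))
)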
#> [[1]]
#> [1] "manhattan_Extr_Soil_B_Ratio__top1.txt"
#> [2] "postgwas2hmp_Extr_Soil_B_Ratio_top1.txt"
#>
#> [[2]]
#> [1] "manhattan_Extr_Soil_C_Ratio__top1.txt"
#> [2] "postgwas2hmp_Extr_Soil_C_Ratio_top1.txt"
Created on 2023-02-24 by the reprex package (v2.0.1)
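The approach above is tied to the Extr_Soil_ prefix and calls list.files() once per letter. A possible alternative, sketched here under assumptions, derives a shared key from each file name and groups on it with split(). The file names from the question are hard-coded in place of a list.files() call, and pair_key() is a hypothetical helper introduced only for this illustration.

```r
# File names from the question; in practice these could come from
# list.files(pattern = "^(manhattan|postgwas2hmp)_.*\\.txt$")
files <- c(
  "postgwas2hmp_Extr_Soil_C_Ratio_top1.txt", "manhattan_Extr_Soil_C_Ratio__top1.txt",
  "postgwas2hmp_Extr_Soil_P_Ratio_top1.txt", "manhattan_Extr_Soil_P_Ratio__top1.txt",
  "postgwas2hmp_Total_Soil_D_Ratio_top1.txt", "manhattan_Total_Soil_D_Ratio__top1.txt",
  "postgwas2hmp_Extr_Soil_E_Ratio_top1.txt", "manhattan_Extr_Soil_E_Ratio__top1.txt"
)

# Hypothetical helper: drop the prefix and collapse repeated underscores,
# so both files of a pair reduce to the same key
pair_key <- function(x) {
  key <- sub("^(manhattan|postgwas2hmp)_", "", x)
  gsub("_+", "_", key)
}

# Sort by key, putting the postgwas2hmp file first within each pair
files <- files[order(pair_key(files), !startsWith(files, "postgwas2hmp_"))]

# One list element per key, each holding the two matching file names
pairs <- unname(split(files, pair_key(files)))
str(pairs)
```

The pairs come out ordered by key rather than in the order typed in the question. Collapsing repeated underscores is what lets the manhattan names, which have a double underscore before "top1", line up with their postgwas2hmp counterparts.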
Answer 2
Score: 0
My approach would be to split the file names into their components, reassemble them, and then create the list. The final step uses pivot_wider(), followed by some clean-up of the names, which may not be needed.
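library(tidyr)
library(dplyr)

# Components of the file names
# Note: id and run are pasted without a separator, so the generated names
# differ from the file names in the question
topic <- c("postgwas2hmp_", "manhattan_")
id <- c("Extr_Soil_C", "Extr_Soil_P", "Total_Soil_D", "Extr_Soil_E")
run <- c("Ratio_top1", "Ratio_top1")
end <- c(".txt")

# Build every combination, then spread the two topics into columns
df <- crossing(topic, id, run, end) %>%
  mutate(files = paste0(topic, id, run, end)) %>%
  as_tibble() %>%
  pivot_wider(names_from = topic, values_from = files) %>%
  select(-c(id, run, end))

# Turn each row into an unnamed character vector of length two
df <- df %>%
  unlist() %>%
  unname() %>%
  split(f = seq(nrow(df))) %>%
  unname()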
[[1]]
[1] "manhattan_Extr_Soil_CRatio_top1.txt" "postgwas2hmp_Extr_Soil_CRatio_top1.txt"
[[2]]
[1] "manhattan_Extr_Soil_ERatio_top1.txt" "postgwas2hmp_Extr_Soil_ERatio_top1.txt"
[[3]]
[1] "manhattan_Extr_Soil_PRatio_top1.txt" "postgwas2hmp_Extr_Soil_PRatio_top1.txt"
[[4]]
[1] "manhattan_Total_Soil_DRatio_top1.txt" "postgwas2hmp_Total_Soil_DRatio_top1.txt"
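For comparison, the same idea of assembling names from components can be sketched in base R. This is only an illustration under assumptions: the suffixes "_Ratio_top1.txt" and "_Ratio__top1.txt" are copied from the file names in the question, including the double underscore in the manhattan files, and nothing here checks that the files actually exist.

```r
# Identifiers taken from the file names in the question
id <- c("Extr_Soil_C", "Extr_Soil_P", "Total_Soil_D", "Extr_Soil_E")

# Build one pair per identifier; each prefix gets its own suffix because
# the manhattan files use a double underscore before "top1"
pairs <- lapply(id, function(i) {
  c(
    paste0("postgwas2hmp_", i, "_Ratio_top1.txt"),
    paste0("manhattan_", i, "_Ratio__top1.txt")
  )
})

str(pairs)
```

Because the names are generated rather than read from disk, a check such as file.exists(unlist(pairs)) before using them may be worthwhile.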