June 29, 2023 10:43:57


How can I give different names to files in a directory with a for loop in a bash script?

Question


I'm expecting to get 17 pairs of paired-end fastq files (34 files in total), so I want to write a bash script that runs my code on all the fastq files in a directory at once. How can I change the names of the input and output files on each pass through the loop, so that when it moves on to file_002, all names start with file_002 instead of file_001, and so on? Also, when merging the R1 and R2 reads, how can I make the loop merge only the corresponding files? For example, merge only file_001_R1 with file_001_R2, file_002_R1 with file_002_R2, file_003_R1 with file_003_R2, and so on.

for file in directory_name
do
    pear -f file_001_R1.fastq.gz -r file_001_R2.fastq.gz -o file_001.fastq
    cutadapt -g TGATAACAATTGGAGCAGCCTC...GGATCGACCAAGAACCAGCA -o file_001_barcode.fastq file_001.fastq
    cutadapt -g GTGTACAAATAATTGTCAAC...CTGTCTCTTATACACATCTC -o file_001_UMI.fastq file_001.fastq
    seqkit concat file_001_barcode.fastq file_001_UMI.fastq > file_001_concatenation.fastq
    seqkit rmdup -s file_001_concatenation.fastq -o file_001_unique_pairs.fastq
    seqkit subseq -r file_001_unique_pairs.fastq > file_001_unique_barcodes.fasta
    bowtie -q --suppress 1,2,4,6,7,8 -x ref_index file_001_unique_barcodes.fasta > file_001_barcodes_allignment.bowtie
    sort file_001_barcodes_allignment.bowtie | uniq -c > file_001_barcode_counts.txt
    awk 'BEGIN{print "Barcode,TF_variant,Code"}{print $3","$2","$1}' file_001_barcode_counts.txt > file_001_barcode_counts.csv
done

Answer 1

Score: 0


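You can use bash parameter expansion to capture the "file_001" part of the filename. ${file%_*} removes the shortest match of _* from the end of $file, i.e. everything from the last underscore onward, so ./file_001_R1.fastq.gz becomes ./file_001. For example: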

cd directory_name || exit 1
for file in ./*_R1.fastq.gz
do
    prefix="${file%_*}"    # ./file_001_R1.fastq.gz -> ./file_001
    # PEAR uses -o as a base name for its output files; the merged reads go to <base>.assembled.fastq
    pear -f "$file" -r "${prefix}_R2.fastq.gz" -o "$prefix"
    cutadapt -g TGATAACAATTGGAGCAGCCTC...GGATCGACCAAGAACCAGCA -o "${prefix}_barcode.fastq" "${prefix}.assembled.fastq"
    cutadapt -g GTGTACAAATAATTGTCAAC...CTGTCTCTTATACACATCTC -o "${prefix}_UMI.fastq" "${prefix}.assembled.fastq"
    seqkit concat "${prefix}_barcode.fastq" "${prefix}_UMI.fastq" > "${prefix}_concatenation.fastq"
    seqkit rmdup -s "${prefix}_concatenation.fastq" -o "${prefix}_unique_pairs.fastq"
    seqkit subseq -r "${prefix}_unique_pairs.fastq" > "${prefix}_unique_barcodes.fasta"
    bowtie -q --suppress 1,2,4,6,7,8 -x ref_index "${prefix}_unique_barcodes.fasta" > "${prefix}_barcodes_alignment.bowtie"
    sort "${prefix}_barcodes_alignment.bowtie" | uniq -c > "${prefix}_barcode_counts.txt"
    awk 'BEGIN{print "Barcode,TF_variant,Code"} {print $3 "," $2 "," $1}' "${prefix}_barcode_counts.txt" > "${prefix}_barcode_counts.csv"
done
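To see what these expansions produce without running the pipeline, they can be tried on their own. This is a small sketch using a hypothetical path, ./file_001_R1.fastq.gz; the last expansion shows stripping an explicit suffix instead, which does not depend on where the last underscore happens to be.

```bash
#!/usr/bin/env bash
# Hypothetical example path; only the expansions matter here.
file="./file_001_R1.fastq.gz"

prefix="${file%_*}"          # remove shortest trailing match of "_*"  -> ./file_001
r2="${prefix}_R2.fastq.gz"   # build the matching R2 name              -> ./file_001_R2.fastq.gz
base="${prefix##*/}"         # remove longest leading match of "*/"    -> file_001
alt="${file%_R1.fastq.gz}"   # strip an explicit suffix instead        -> ./file_001

printf '%s\n' "$prefix" "$r2" "$base" "$alt"
```

If sample names could contain further underscores after R1 (for example names ending in _R1_001.fastq.gz), ${file%_*} would only remove the final _001.fastq.gz part, so matching the exact suffix is the more predictable choice.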

Not sure whether your pipeline is appropriate/optimal though; you might want to ask for advice from the experts over at https://bioinformatics.stackexchange.com

Answer 2

Score: 0


I'm not entirely sure I understand your question, but you could use a bash script along these lines: loop through the R1 files, extract each file name without the _R1 suffix and extension, and merge only the corresponding R1 and R2 files.

# Set the directory where your files are located
directory="/path/to/directory"
# Loop through the R1 files in the directory
for file in "$directory"/*_R1.fastq.gz; do
    # Extract the sample name without the directory and the _R1.fastq.gz suffix
    filename=$(basename "$file" _R1.fastq.gz)
    # Set the R1 and R2 file names (keeping the directory so the paths resolve)
    r1_file="$directory/${filename}_R1.fastq.gz"
    r2_file="$directory/${filename}_R2.fastq.gz"
    # Set the output file name
    output_file="$directory/${filename}_merged.fastq"
    # Perform the merge operation using the corresponding R1 and R2 files
    # Replace this line with whatever code you want to run
done
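Before running the full pipeline, a dry run that only lists the pairs the loop would process, and flags any R1 file without an R2 mate, can catch naming problems early. The sketch below is a hypothetical helper (list_pairs is not part of either answer) and assumes the question's <sample>_R1.fastq.gz / <sample>_R2.fastq.gz naming.

```bash
#!/usr/bin/env bash
# Hypothetical dry-run helper: prints each R1/R2 pair found in a directory
# and reports R1 files whose R2 mate is missing (on stderr).
# Assumes <sample>_R1.fastq.gz / <sample>_R2.fastq.gz naming.
list_pairs() {
    local directory="$1" r1 r2 sample
    local pairs=0 missing=0
    # With nullglob, an unmatched pattern expands to nothing, so the loop is
    # skipped instead of running on the literal pattern.
    # Note: this setting stays enabled in the calling shell.
    shopt -s nullglob
    for r1 in "$directory"/*_R1.fastq.gz; do
        r2="${r1%_R1.fastq.gz}_R2.fastq.gz"   # mate is expected next to R1
        sample="${r1##*/}"                     # drop the directory part
        sample="${sample%_R1.fastq.gz}"        # drop the suffix, e.g. file_001
        if [[ -f "$r2" ]]; then
            printf 'pair\t%s\t%s\t%s\n' "$sample" "${r1##*/}" "${r2##*/}"
            pairs=$((pairs + 1))
        else
            printf 'missing R2 for %s\n' "$sample" >&2
            missing=$((missing + 1))
        fi
    done
    printf '%d pair(s), %d missing\n' "$pairs" "$missing"
}
```

Usage: list_pairs /path/to/directory. Once every sample prints a pair line and nothing is reported missing, the same "${r1%_R1.fastq.gz}" expansion can be reused inside the real loop to build all other file names.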
