Merge multiple files based on regex
Question
How do I merge pairs of files that share the same characters/digits before an underscore, using a for loop? I have many such files in the directory.
Input:
SRR9200887_1.fastq
SRR9200887_2.fastq
SRR9200888_1.fastq
SRR9200888_2.fastq
SRR9200889_1.fastq
SRR9200889_2.fastq
Expected output:
SRR9200887.fastq
SRR9200888.fastq
SRR9200889.fastq
My attempt:
for l in $(ls *.fastq | cut -d_ -f1 | sort |uniq); do cat ${l}*.fastq
Answer 1
Score: 5
With bash and its Parameter Expansion:
for i in *_1.fastq; do
    cat "${i%_*.fastq}_1.fastq" "${i%_*.fastq}_2.fastq" > "${i%_*.fastq}.fastq"
done
`${i%_*.fastq}` expands to the value of `$i` with the shortest trailing match of `_*.fastq` removed, i.e. without the last `_` and everything after it, e.g. `SRR9200887`.
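To see what the expansion does in isolation, here is a small sketch that only manipulates strings and touches no files. The second filename, `sample_A_1.fastq`, is made up for illustration; it shows how `%` (shortest match) differs from `%%` (longest match) when a name contains more than one underscore.

```shell
#!/usr/bin/env bash
# Illustrative filename only; no files are read or written.
i="SRR9200887_1.fastq"

# % removes the shortest suffix that matches the pattern _*.fastq
prefix="${i%_*.fastq}"
echo "$prefix"              # SRR9200887

# Names built from the prefix, as in the loop above
echo "${prefix}_2.fastq"    # SRR9200887_2.fastq
echo "${prefix}.fastq"      # SRR9200887.fastq

# Hypothetical name with two underscores:
# % cuts at the last underscore, %% cuts at the first one.
j="sample_A_1.fastq"
short_cut="${j%_*.fastq}"
long_cut="${j%%_*.fastq}"
echo "$short_cut"           # sample_A
echo "$long_cut"            # sample
```

For the file names in the question, which contain a single underscore, both forms give the same prefix.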
Answer 2
Score: 3
for f in *_*.fastq; do cat "$f" >> "${f%_*}.fastq"; done
Answer 3
Score: 2
To `cat` the files together, assuming you have matching "_1.fastq" and "_2.fastq" files for every "SRR", one potential option is:
SRR_array=(*_1.fastq)
for f in "${SRR_array[@]%%_*}"
do
    cat "$f"_1.fastq "$f"_2.fastq > "$f".fastq
done
If you wanted to delete the _1.fastq and _2.fastq files after merging them together:
SRR_array=(*_1.fastq)
for f in "${SRR_array[@]%%_*}"
do
    cat "$f"_1.fastq "$f"_2.fastq > "$f".fastq
    rm "$f"_1.fastq "$f"_2.fastq
done
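Deleting the inputs is only safe when the merge actually succeeded and both mates exist. The following is a minimal sketch of a more defensive variant, not part of the answer above: the existence check and the `&&` chaining are additions, and the scratch directory and file contents are invented so the example can run without touching real data.

```shell
#!/usr/bin/env bash
# Scratch directory with made-up sample files, so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'read1\n'  > SRR9200887_1.fastq
printf 'read2\n'  > SRR9200887_2.fastq
printf 'orphan\n' > SRR9200888_1.fastq   # deliberately has no _2 mate

for first in *_1.fastq; do
    prefix=${first%_1.fastq}
    second=${prefix}_2.fastq

    # Skip (and keep) inputs whose mate is missing.
    if [ ! -f "$second" ]; then
        echo "skipping $prefix: $second not found" >&2
        continue
    fi

    # rm runs only if cat exited successfully.
    cat "$first" "$second" > "$prefix.fastq" && rm -- "$first" "$second"
done

ls
```

After the run, the directory holds the merged `SRR9200887.fastq` and the untouched, unpaired `SRR9200888_1.fastq`.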
Answer 4
Score: 1
One bash idea:
while read -r pfx
do
    cat "${pfx}"_*.fastq >> "${pfx}".fastq
done < <(find . -name "*_*.fastq" | cut -d'_' -f1 | sort -u)
Tweaking OP's current code:
for l in $(ls -1 *_*.fastq | cut -d_ -f1 | sort | uniq)
do
    cat ${l}_*.fastq >> "${l}".fastq
done
Where:
- we look for files with a `_` in the name; if the script is run more than once this will ensure we don't pick up the previously concatenated files
- make sure `ls` lists one file per line (hence the `-1`)
- in this case `sort | uniq` could be replaced with `sort -u`
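The same idea can be sketched without parsing `ls` output, by letting the shell glob feed `printf` directly. This is one possible variant under stated assumptions, not the answer's own code: the function name `merge_all`, the `sed` pattern, and the sample files are all invented for illustration, file names are assumed to contain no newlines, and `>` is used instead of `>>` so that a second run rebuilds each output rather than appending to it.

```shell
#!/usr/bin/env bash
# Scratch directory with made-up sample files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
for n in 887 888; do
    printf '%s first\n'  "$n" > "SRR9200${n}_1.fastq"
    printf '%s second\n' "$n" > "SRR9200${n}_2.fastq"
done

# merge_all (hypothetical helper):
#   1. the glob *_*.fastq only matches names containing "_", so merged
#      outputs such as SRR9200887.fastq are never picked up as inputs;
#   2. sed strips the last "_" and everything after it;
#   3. sort -u leaves one line per prefix;
#   4. ">" truncates, so re-running does not duplicate content.
merge_all() {
    printf '%s\n' *_*.fastq |
        sed 's/_[^_]*$//' |
        sort -u |
        while IFS= read -r pfx; do
            cat "$pfx"_*.fastq > "$pfx.fastq"
        done
}

merge_all
merge_all   # a second run leaves the same result

wc -l SRR9200887.fastq
```

Note that the `sed` pattern cuts at the last underscore, whereas `cut -d_ -f1` cuts at the first; for the single-underscore names in the question the two agree.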
Answer 5
Score: 0
Using any awk (untested):
awk '
FNR==1 {
    out = FILENAME
    sub(/_[0-9]+/,"",out)
    if ( out != prev ) {
        close(prev)
        prev = out
    }
}
{ print > out }
' *_*.fastq
That will concatenate all files that share the same prefix (the part before `_N`), no matter how many such files there are, not just 2.
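Since the script is marked untested, a quick way to try it is a scratch directory with a few tiny files. The sketch below runs the awk program from this answer unchanged; the directory, file names, and contents are made up, and one prefix is given three parts to exercise the "more than 2" case.

```shell
#!/usr/bin/env bash
# Scratch directory with made-up sample files: three parts for one
# prefix, two for the other.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'a1\n' > SRR9200887_1.fastq
printf 'a2\n' > SRR9200887_2.fastq
printf 'a3\n' > SRR9200887_3.fastq
printf 'b1\n' > SRR9200888_1.fastq
printf 'b2\n' > SRR9200888_2.fastq

# The awk program from the answer, unchanged.
awk '
FNR==1 {
    out = FILENAME
    sub(/_[0-9]+/,"",out)
    if ( out != prev ) {
        close(prev)
        prev = out
    }
}
{ print > out }
' *_*.fastq

cat SRR9200887.fastq   # a1, a2, a3 on separate lines
```

Parts are concatenated in the order the shell expands the glob, which is lexical, so with ten or more parts `_10` would come before `_2` unless the numbers are zero-padded.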