April 4, 2023, 07:24:14

Merge multiple files based on regex

Question

How do I use a for loop to merge each pair of files whose names match up to the underscore? I have many such files in the directory.

Input:

SRR9200887_1.fastq
SRR9200887_2.fastq
SRR9200888_1.fastq
SRR9200888_2.fastq
SRR9200889_1.fastq
SRR9200889_2.fastq

Expected output:

SRR9200887.fastq
SRR9200888.fastq
SRR9200889.fastq

My attempt:

for l in $(ls *.fastq | cut -d_ -f1 | sort |uniq); do cat ${l}*.fastq

Answer 1

Score: 5

With bash and its Parameter Expansion:

for i in *_1.fastq; do
  cat "${i%_*.fastq}_1.fastq" "${i%_*.fastq}_2.fastq" > "${i%_*.fastq}.fastq";
done

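${i%_*.fastq} expands to the value of $i with the shortest suffix matching _*.fastq removed, that is, everything from the last underscore onward. For SRR9200887_1.fastq it yields SRR9200887.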
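
To see how the suffix-removal expansions behave, here is a minimal sketch. The file names are hypothetical, and sample_A_1.fastq is added only to show what happens when a name contains more than one underscore.

```shell
#!/usr/bin/env bash
# Hypothetical file names, used only to illustrate the expansions.
i="SRR9200887_1.fastq"
j="sample_A_1.fastq"

# % strips the shortest suffix matching _*.fastq (from the last underscore on).
prefix_i="${i%_*.fastq}"
prefix_j="${j%_*.fastq}"

# %% strips the longest suffix matching _* (from the first underscore on).
short_j="${j%%_*}"

printf '%s\n' "$prefix_i" "$prefix_j" "$short_j"
```

This prints SRR9200887, sample_A and sample, one per line. For the file names in the question both forms give the same prefix; they differ only when the prefix itself contains an underscore.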

Answer 2

Score: 3

for f in *_*.fastq; do cat "$f" >> "${f%_*}.fastq"; done

Answer 3

Score: 2

To concatenate the files, assuming every "SRR" prefix has a matching "_1.fastq" and "_2.fastq" file, one potential option is:

SRR_array=(*_1.fastq)
for f in "${SRR_array[@]%%_*}"
do
    cat "$f"_1.fastq "$f"_2.fastq > "$f".fastq
done

If you wanted to delete the _1.fastq and _2.fastq files after merging them together:

SRR_array=(*_1.fastq)
for f in "${SRR_array[@]%%_*}"
do
    cat "$f"_1.fastq "$f"_2.fastq > "$f".fastq
    rm "$f"_1.fastq "$f"_2.fastq
done
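
Because the loop above runs rm even if cat failed (for example, on a full disk or a missing mate file), a more cautious variant might delete the inputs only after a successful merge. The following is a sketch under that assumption, not the answerer's code; the scratch directory and sample contents are hypothetical.

```shell
#!/usr/bin/env bash
# Work in a scratch directory with hypothetical sample files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'read1\n' > SRR9200887_1.fastq
printf 'read2\n' > SRR9200887_2.fastq

for first in *_1.fastq; do
    prefix="${first%_1.fastq}"
    second="${prefix}_2.fastq"
    # Skip prefixes whose mate file is missing.
    [ -f "$second" ] || { echo "missing $second, skipping" >&2; continue; }
    # Remove the inputs only if the concatenation succeeded.
    cat "$first" "$second" > "${prefix}.fastq" && rm -- "$first" "$second"
done
```

The && makes rm conditional on the exit status of cat, so a failed merge leaves both input files in place.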

Answer 4

Score: 1

One bash idea:

while read -r pfx
do
    cat "${pfx}"_*.fastq >> "${pfx}".fastq
done < <(find . -name "*_*.fastq" | cut -d'_' -f1 | sort -u)

Tweaking OP's current code:

for l in $(ls -1 *_*.fastq | cut -d_ -f1 | sort | uniq)
do
    cat "${l}"_*.fastq >> "${l}".fastq
done

Where:

We look for files with a _ in the name; if the script is run more than once, this ensures we don't pick up the previously concatenated files as input.
The -1 makes sure ls lists one file per line.
In this case sort | uniq could be replaced with sort -u.
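
One caveat with the loops above: the *_*.fastq glob keeps earlier merged files out of the input, but >> still appends to an existing output, so a second run would double its contents. A rerun-safe variant might truncate with > instead. This is a sketch with hypothetical sample files; merge_all is a name invented for illustration.

```shell
#!/usr/bin/env bash
# Scratch directory with hypothetical sample files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'a\n' > SRR9200887_1.fastq
printf 'b\n' > SRR9200887_2.fastq

# merge_all is a hypothetical helper name used only in this sketch.
merge_all() {
    for first in *_1.fastq; do
        [ -e "$first" ] || continue   # the glob matched nothing
        prefix="${first%_1.fastq}"
        # > truncates the output, so a rerun rewrites it instead of appending.
        # The underscore in the glob keeps the merged file out of the input.
        cat "${prefix}"_*.fastq > "${prefix}.fastq"
    done
}

merge_all
merge_all   # the second run leaves the same result
line_count=$(wc -l < SRR9200887.fastq)
```

After both runs the merged file still holds two lines, one from each input.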

Answer 5

Score: 0

Using any awk (untested):

awk '
    FNR==1 {
        out = FILENAME
        sub(/_[0-9]+/,"",out)
        if ( out != prev ) {
            close(prev)
            prev = out
        }
    }
    { print > out }
' *_*.fastq

That will concatenate all files that share the same prefix, however many there are, not just 2.
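
Since the answer is marked untested, one way to check it is to run the same awk program on a few small files in a scratch directory. The three sample files and their contents below are hypothetical.

```shell
#!/usr/bin/env bash
# Scratch directory with three hypothetical files sharing one prefix.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'a\n' > SRR9200887_1.fastq
printf 'b\n' > SRR9200887_2.fastq
printf 'c\n' > SRR9200887_3.fastq

awk '
    FNR==1 {
        # Derive the output name by dropping the _N part of the input name.
        out = FILENAME
        sub(/_[0-9]+/, "", out)
        if ( out != prev ) {
            close(prev)   # release the previous output file
            prev = out
        }
    }
    { print > out }
' *_*.fastq

merged=$(cat SRR9200887.fastq)
printf '%s\n' "$merged"
```

The merged file contains the lines a, b and c in that order. The approach relies on the glob listing files with the same prefix next to each other, which holds for the names in the question.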
