How to edit multiple files in different directories in Linux?

Question

I have several directories in Linux, and each one contains a VCF file with the same name and the same first few lines. After those identical informative lines comes the data that I need. I want to write a command-line command that goes into each of these directories and edits the files so that the informative lines are removed and only the data is left.

When I tried this on a single file by itself, I used the following code:

find . -type f -name "File.vcf" -print0 |
    while IFS= read -r -d '' file; do
        awk 'substr($0,1,3)=="chr"' $file > "$(echo "$file" | cut -d'_' -f2)"_cleaned.vcf
    done

This works and leaves only the lines that start with chr, which is what I want. Next I tried to step this up and hit 7 birds with 1 stone using a single command, so I wrote the following code:

for i in "directory"; do
    cd /user/xxxxxxxx/$i |
        find . -type f -name "File.vcf" -print0 |
        while IFS= read -r -d '' file; do
            awk 'substr($0,1,3)=="chr"' $file > "$(echo "$file" | cut -d'_' -f2)"_cleaned.vcf
        done
    done

When I run this, the files are completely emptied out, and I don't understand why. I am still trying to get to grips with Linux and command-line tools, so any tips would be appreciated.

Answer 1

Score: 1

Without more information about the names, locations, and contents of the files you are targeting, it's hard to debug your specific problem. But I would refactor it to

find directories -type f -name "File.vcf" \
    -exec sh -c 'for file; do
        grep "^chr" "$file" >"${file%.vcf}_cleaned.vcf"
    done' sh {} +

which should be more robust, more portable, and somewhat more efficient (though I had to guess about the positions of underscores in the names of the found files; perhaps I guessed wrong).
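
To try this without touching real data, here is a minimal sketch that builds two hypothetical directories (dirA and dirB), each holding a made-up File.vcf, and runs the find command with -exec sh -c so that the for loop has a shell to run in. It then checks that grep "^chr" keeps exactly the same lines as the awk filter from the question. The directory names and file contents are assumptions for illustration only.

```shell
#!/bin/sh
# Minimal sketch: run the refactored find -exec command on throwaway data.
# The directory names dirA and dirB and the VCF contents are hypothetical.
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

# Create two sample directories that each contain a File.vcf with the same
# header lines followed by data lines that start with "chr".
for d in dirA dirB; do
    mkdir "$d"
    printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\nchr1\t100\t.\nchr2\t200\t.\n' >"$d/File.vcf"
done

# One find call searches both directories. "sh" fills $0 of the inner shell,
# the found paths become "$1", "$2", ..., and "for file; do" loops over "$@".
find dirA dirB -type f -name "File.vcf" \
    -exec sh -c 'for file; do
        grep "^chr" "$file" >"${file%.vcf}_cleaned.vcf"
    done' sh {} +

# Compare against the awk filter from the question: cmp reads awk's output
# from stdin ("-") and succeeds only if both are byte-for-byte identical.
for d in dirA dirB; do
    awk 'substr($0,1,3)=="chr"' "$d/File.vcf" | cmp -s - "$d/File_cleaned.vcf" &&
        echo "$d/File_cleaned.vcf matches the awk output"
done
```

Because the output name is derived with ${file%.vcf}, each cleaned file lands next to its original (dirA/File_cleaned.vcf and so on) instead of in whatever directory the command was started from.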

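As William Pursell comments, cd | find is neither useful nor correct. Each command in a pipeline runs in its own subshell, so the cd does not change the directory that find searches, and find does not read anything from the pipe anyway. Pass the list of directories you want to search to find directly; it accepts a list of starting directories before the predicates.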
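
A minimal sketch of the pipeline problem, run in a throwaway temporary directory with a hypothetical subdirectory named sub: the cd on the left of the pipe has no effect on the command on the right, which is what happened to find in the question's loop. The last two commands show the alternatives: hand the directories to find, or, if cd is really wanted, use (cd dir && command).

```shell
#!/bin/sh
# Minimal sketch of why "cd ... | find" misbehaves, using a throwaway temp
# directory with a hypothetical subdirectory named "sub".
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir "$work/sub"
cd "$work"

# Each command in a pipeline runs in its own subshell, so this cd only moves
# the left-hand subshell. The right-hand pwd (like find in the question)
# still runs in the starting directory, and cd writes nothing to the pipe.
cd sub | pwd

# The shell that ran the pipeline has not moved either.
pwd

# Alternative 1: give find the directories to search instead of using cd.
find sub -type d

# Alternative 2: if cd is really wanted, chain it with && inside a subshell,
# so the command runs only after a successful cd and the caller's directory
# is unchanged afterwards.
(cd sub && pwd)
```

Note also that for i in "directory" in the question loops exactly once, over the literal word directory; the loop needs the actual directory names listed after in.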

The parameter expansion ${file%.vcf} produces the value of the variable $file with a trailing .vcf trimmed off, if there is one. The suffix expression can be a pattern, but if you need the text between two underscores, you need two parameter expansions: one to remove the prefix up to the first underscore with ${file#*_}, and another, applied to that result, to remove the suffix from the last underscore in the style of ${file%_*}.
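
A minimal sketch of these expansions on a hypothetical file name containing two underscores, with the cut command from the question alongside for comparison. The values in the comments follow from the shortest-match rules of # and %, and from cut passing through lines that contain no delimiter when -s is not given.

```shell
#!/bin/sh
# Minimal sketch of the parameter expansions described above, applied to a
# hypothetical path whose file name contains two underscores.
file=./sample_S1_raw.vcf

echo "${file%.vcf}"   # prints ./sample_S1_raw (".vcf" suffix removed)
tmp=${file#*_}        # shortest prefix ending in "_" removed: S1_raw.vcf
echo "${tmp%_*}"      # prints S1 (suffix from the last "_" removed)

# For this name, the cut command from the question picks the same field ...
echo "$file" | cut -d'_' -f2    # prints S1
# ... but a path with no "_" is passed through whole by cut (no -s flag),
# so for ./File.vcf the question's output name becomes ./File.vcf_cleaned.vcf.
echo ./File.vcf | cut -d'_' -f2 # prints ./File.vcf
```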

huangapple
  • Published on May 20, 2023 21:03:40
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76295375.html