How to edit multiple files in different directories in Linux?
Question
I have several directories in Linux, each containing a VCF file with the same name and the same first few lines. After those identical informational lines comes the data that I need. I want to write a command that goes into each of these directories and edits the file so that the informational lines are removed and only the data is left.
When I tried this on a single file, I used the following code:
find . -type f -name "File.vcf" -print0 |
while IFS= read -r -d '' file; do
awk 'substr($0,1,3)=="chr"' $file > "$(echo "$file" | cut -d'_' -f2)"_cleaned.vcf
done
This works and gives me only the lines that start with chr, which is what I want. Next I tried to scale this up and hit seven birds with one stone in a single command, so I wrote the following code:
for i in "directory"; do
cd /user/xxxxxxxx/$i |
find . -type f -name "File.vcf" -print0 |
while IFS= read -r -d '' file; do
awk 'substr($0,1,3)=="chr"' $file > "$(echo "$file" | cut -d'_' -f2)"_cleaned.vcf
done
done
When I run this, the files are completely emptied out, and I don't understand why. I am still trying to grasp Linux and command-line tools, so if anyone has tips, I would be grateful.
Answer 1
Score: 1
Without more information about the names, locations, and contents of the files you are targeting, it's hard to debug your specific problem. But I would refactor to:
find directories -type f -name "File.vcf" \
    -exec sh -c 'for file; do
        grep "^chr" "$file" >"${file%.vcf}_cleaned.vcf"
    done' _ {} +
which should be more robust, more portable, and somewhat more efficient (though I had to guess about the positions of underscores in the names of the found files; perhaps I guessed wrong).
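To try the pattern without touching real data, here is a minimal sandbox sketch. It assumes the loop is run through an explicit `sh -c`, with `_` as a placeholder for `$0`. The directory names `dirA` and `dirB` and the file contents are made up for illustration.

```shell
#!/bin/sh
# Hypothetical sandbox: dirA and dirB stand in for the real directories.
tmp=$(mktemp -d)
for d in dirA dirB; do
    mkdir -p "$tmp/$d"
    # Two made-up header lines followed by two data lines.
    printf '##fileformat=VCFv4.2\n#CHROM\tPOS\nchr1\t100\nchr2\t200\n' > "$tmp/$d/File.vcf"
done

# find takes both directories directly; sh -c receives the found paths as
# positional parameters, and "for file" loops over them.
find "$tmp/dirA" "$tmp/dirB" -type f -name 'File.vcf' \
    -exec sh -c 'for file; do
        grep "^chr" "$file" > "${file%.vcf}_cleaned.vcf"
    done' _ {} +

# Count what ended up in the cleaned copies.
kept_a=$(grep -c '^chr' "$tmp/dirA/File_cleaned.vcf" || true)
headers_b=$(grep -c '^#' "$tmp/dirB/File_cleaned.vcf" || true)
echo "dirA data lines kept: $kept_a; dirB header lines left: $headers_b"

rm -rf "$tmp"
```

Each cleaned copy is written next to its source as File_cleaned.vcf, and the original File.vcf is left unchanged.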
Like William Pursell comments, `cd | find` is not useful or correct. Pass the list of directories you want to search to `find` directly; it accepts a list of directories to traverse before the predicates.
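One way to see why piping `cd` into another command does not help: each stage of a pipeline runs in its own subshell, so the directory change never reaches the next stage. The sketch below uses a hypothetical scratch directory named `dirA` and `pwd` in place of `find` to make the effect visible.

```shell
#!/bin/sh
# Hypothetical scratch directory, used only to have somewhere to cd into.
tmp=$(mktemp -d)
mkdir -p "$tmp/dirA"
start=$(pwd)

# cd is one stage of a pipeline here, so it changes directory only inside
# its own subshell; pwd in the next stage still reports the old directory.
after_pipe=$(cd "$tmp/dirA" | pwd)

# Running the two commands in sequence in the same subshell does work.
after_seq=$(cd "$tmp/dirA" && pwd)

echo "started in:    $start"
echo "after cd | :   $after_pipe"
echo "after cd && :  $after_seq"

rm -rf "$tmp"
```

Separately, a loop written as `for i in "directory"` iterates exactly once over the literal string, so listing the real directory names (or handing them to `find`) is needed either way.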
The parameter expansion `${file%.vcf}` produces the value of the variable `$file` with any suffix `.vcf` trimmed off. The suffix expression can be a pattern, but if you require the text between two underscores, you need two parameter expansions (one to remove a prefix with `${file#*_}` and another to remove a suffix with `${file%_*}`).
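A short sketch of these expansions on a made-up path; the directory name `batch_S1_run` is an assumption, since the question does not show where the underscores actually sit.

```shell
#!/bin/sh
# Hypothetical path; the real directory and file names may differ.
file='./batch_S1_run/File.vcf'

# Trim the .vcf suffix, as the answer's command does.
no_ext=${file%.vcf}

# Text between the first and last underscore, in two steps:
after_first=${file#*_}      # drop the shortest prefix ending in "_"
middle=${after_first%_*}    # drop the shortest suffix starting with "_"

echo "$no_ext"
echo "$after_first"
echo "$middle"
```

If the path contained more than two underscores, `middle` would keep everything between the first and the last one, so the patterns may need adjusting to the real names.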