Adding the file name to lines starting with ">" in multiple .faa files with PowerShell
Question
I have 100 FASTA files containing protein sequences, stored in a single directory. I need to add each file's name to every FASTA header (the lines starting with ">") it contains, and then merge all the files into a single .faa file.
I got the merging part working with the following PowerShell commands:
```powershell
# Change extensions from .faa to .txt (regex: a literal ".faa" at the end of the name)
gci -File | Rename-Item -NewName { $_.Name -replace '\.faa$', '.txt' }
# Actual merging (use the full path so files in subdirectories are found)
Get-ChildItem $directory -Include *.txt -Recurse | ForEach-Object { Get-Content -LiteralPath $_.FullName; "" } | Out-File $directory
# Change encoding so I can process the file further in R
Get-Content .\test.txt | Set-Content -Encoding utf8 test-utf8.txt
```
After that, I just change the extension back to .faa.
Each file stores multiple protein sequences. Each header should be changed like this:
>some_sequence -> >some_sequence file_name
This is my first time using PowerShell. How can I do this?
Best regards!
Answer 1 (score: 1)
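I assume you're looking for something like the following, which uses a [`switch`](https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.core/about/about_Switch) statement to process the individual files and modify their headers: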
```powershell
Get-ChildItem $directory -Filter *.faa -Recurse |
  ForEach-Object {
    $file = $_
    switch -Regex -File $file.FullName { # Process the file at hand.
      '^>'    { $_ + ' ' + $file.Name }  # header line -> append file name
      default { $_ }                     # pass through
    }
    '' # Empty line between the content from the individual files.
  } |
  Set-Content -Encoding utf8 test-utf8.txt
```
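To try the approach on throwaway data before pointing it at the real files, the same pipeline can be run against a temporary directory. This is a minimal sketch under stated assumptions: the sample files `a.faa`/`b.faa` and their sequences are made up, and the merged output is deliberately written outside the input directory so that `Get-ChildItem -Filter *.faa` cannot pick up the output file itself.

```powershell
# Hypothetical sample data in a fresh temp directory (not from the question).
$directory = Join-Path ([System.IO.Path]::GetTempPath()) ('faa-demo-' + [guid]::NewGuid())
New-Item -ItemType Directory -Path $directory | Out-Null
Set-Content -Path (Join-Path $directory 'a.faa') -Value '>seq1', 'MKV', '>seq2', 'GGA'
Set-Content -Path (Join-Path $directory 'b.faa') -Value '>seq3', 'PLT'

# Output path OUTSIDE the input directory, so the *.faa filter cannot match it.
$out = Join-Path ([System.IO.Path]::GetTempPath()) ('merged-' + [guid]::NewGuid() + '.faa')

Get-ChildItem $directory -Filter *.faa -Recurse |
  ForEach-Object {
    $file = $_
    switch -Regex -File $file.FullName {
      '^>'    { $_ + ' ' + $file.Name }  # header line -> append file name
      default { $_ }                     # sequence line -> pass through
    }
    ''                                   # blank line between files
  } |
  Set-Content -Encoding utf8 $out

# Read the merged result back, then remove the temporary input files.
$result = Get-Content -LiteralPath $out
Remove-Item -LiteralPath $directory -Recurse -Force
```

The order in which files appear in the output follows `Get-ChildItem`'s enumeration order. If the header should carry the name without the `.faa` extension, `$file.BaseName` can be used in place of `$file.Name`.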
Note:

- No need to rename the `.faa` files first.
- No need for intermediate files with modified headers: all content for the final output file can be streamed directly to a single [`Set-Content`](https://learn.microsoft.com/powershell/module/microsoft.powershell.management/set-content) call.
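On the encoding step for R: in Windows PowerShell 5.1, `Set-Content -Encoding utf8` writes UTF-8 *with* a BOM, whereas in PowerShell 7+ `utf8` means BOM-less UTF-8. If a BOM gets in the way when reading the file in R, one option (a sketch, not part of the original answer) is to collect the rewritten lines and write them with .NET's `[System.IO.File]::WriteAllLines`, which produces BOM-less UTF-8 in both editions. The sample file `x.faa` and the temp paths are hypothetical.

```powershell
# Hypothetical input: one small FASTA file in a fresh temp directory.
$directory = Join-Path ([System.IO.Path]::GetTempPath()) ('faa-demo-' + [guid]::NewGuid())
New-Item -ItemType Directory -Path $directory | Out-Null
Set-Content -Path (Join-Path $directory 'x.faa') -Value '>seqA', 'MKT'

# .NET resolves relative paths against the process working directory, not
# PowerShell's current location, so use a full path for the output file.
$out = Join-Path ([System.IO.Path]::GetTempPath()) ('merged-' + [guid]::NewGuid() + '.faa')

# Same header rewrite as above, but collect the lines in memory.
$lines = Get-ChildItem $directory -Filter *.faa -Recurse |
  ForEach-Object {
    $file = $_
    switch -Regex -File $file.FullName {
      '^>'    { $_ + ' ' + $file.Name }  # header line -> append file name
      default { $_ }                     # pass through
    }
    ''                                   # blank line between files
  }

# File.WriteAllLines(string, string[]) writes UTF-8 without a BOM.
[System.IO.File]::WriteAllLines($out, [string[]] $lines)

# Inspect the raw bytes: the first byte should be '>' (0x3E), not a BOM.
$bytes = [System.IO.File]::ReadAllBytes($out)
Remove-Item -LiteralPath $directory -Recurse -Force
```

Collecting all lines in memory is fine for a few hundred FASTA files; for much larger inputs a streaming writer would be the more economical choice.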