Adding file name to lines of characters starting with ">" for multiple .faa files with PowerShell

Question

I have 100 FASTA files containing protein sequences stored in a single directory. I need to add their respective file names to each of the FASTA headers (lines starting with ">") contained within them and subsequently merge them into a single .faa file.

I got the merging part going with the following PowerShell commands:

#Change extensions from .faa to .txt
gci -File | Rename-Item -NewName { $_.name -replace ".faa", ".txt" }

#Actual merging
Get-ChildItem $directory -include *.txt -rec | ForEach-Object {gc $_; ""} | out-file $directory

#Change encoding so I can process the file further in R
Get-Content .\test.txt | Set-Content -Encoding utf8 test-utf8.txt

After that I just change the extension back to .faa.

Each file stores multiple sequences of proteins. Each header should look like this:

>some_sequence -> >some_sequence file_name

This is my first contact with PowerShell, how can I do this?
Best regards!

Answer 1

Score: 1

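I assume you're looking for something like the following, which uses a [`switch`](https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.core/about/about_Switch) statement to process the individual files and modifies their headers: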

Get-ChildItem $directory -Filter *.faa -Recurse |
  ForEach-Object {
    $file = $_
    switch -Regex -File $file.FullName { # Process the file at hand.
      '^>' { $_ + ' ' + $file.Name } # header line -> append file name
      default { $_ } # pass through
    }
    ''  # Empty line between the content from the individual files.
  } |
  Set-Content -Encoding utf8 test-utf8.txt

Note:

  • No need to rename the .faa files first.
  • No need for intermediate files with modified headers - all content for the ultimate output file can be streamed directly to a single [`Set-Content`](https://learn.microsoft.com/powershell/module/microsoft.powershell.management/set-content) call.
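
To try the approach without touching real data, here is a minimal, self-contained sketch that builds two tiny sample files in a temporary directory and runs the same pipeline over them. The sandbox directory, the file names (sampleA.faa, sampleB.faa), the sequences and the output path are all made up for illustration. The Sort-Object step is an addition that only makes the merge order predictable. The output is written outside the input directory so that a later run with -Filter *.faa does not pick up the merged file as input.

```powershell
# Hypothetical sandbox: a temporary input directory with two made-up FASTA files.
$directory = Join-Path ([IO.Path]::GetTempPath()) ("faa-demo-" + [guid]::NewGuid())
$null = New-Item -ItemType Directory -Path $directory
Set-Content -Path (Join-Path $directory 'sampleA.faa') -Value '>seq1', 'MKV', '>seq2', 'MLA'
Set-Content -Path (Join-Path $directory 'sampleB.faa') -Value '>seq3', 'MTT'

# Assumed output location: next to, not inside, the input directory.
$outFile = "$directory-merged.faa"

Get-ChildItem $directory -Filter *.faa -Recurse |
  Sort-Object FullName |                 # deterministic order (not in the original answer)
  ForEach-Object {
    $file = $_
    switch -Regex -File $file.FullName {
      '^>'    { $_ + ' ' + $file.Name }  # header line: append the file name
      default { $_ }                     # sequence line: pass through unchanged
    }
    ''                                   # blank line between the files' contents
  } |
  Set-Content -Encoding utf8 $outFile

# Read the result back to inspect it.
$merged = Get-Content $outFile
$merged
```

If the extension should not appear in the headers, $file.BaseName can be used in place of $file.Name. Also note that in Windows PowerShell 5.1 -Encoding utf8 writes a byte-order mark, whereas PowerShell 7 and later write UTF-8 without one, which may matter when the file is read from R.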

Published on April 13, 2023 at 17:51:42. Source: https://go.coder-hub.com/76004046.html