May 11, 2023 06:29:03


Sed is not removing the pattern for a list of headers

Question


Greetings,

I have the following headers in a file with multiple DNA sequences:

>10 AC_000167.1
>11 AC_000168.1
>12 AC_000169.1
>MT NC_006853.1
>X AC_000187.1
>GPS_000341582.1 NW_003097887.1
>GPS_000341583.1 NW_003097888.1
>GPS_000341584.1 NW_003097889.1
>GPS_000341585.1 NW_003097890.1
>GPS_000341586.1 NW_003097891.1

I am using the following sed command to remove everything from the first whitespace onward:

sed -i 's/[^(>\d+?MT?X?GPS_\d+\.\d+)]\S..\d+\.\d+//g' newHeader.txt

The output should look like this:

>10
>11
>12
>MT
>X
>GPS_000341582.1
>GPS_000341583.1
>GPS_000341584.1
>GPS_000341585.1
>GPS_000341586.1

However, the command does not seem to work, and it gives no error. How can I fix this?

Answer 1

Score: 1


If the intent is to strip everything after the first space (including the space), but only on some specific lines, then
based on the non-working sed command you provided, this may be what you want:

# with a sed that supports -i and -E:
sed -i -E 's/^(>([0-9]+|MT|X|GPS_[0-9]+\.[0-9]+))[[:space:]].*/\1/' infile
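To see the corrected command's effect without touching a file, here is a minimal sketch that runs the same substitution on standard input. The function name strip_header_ids and the sample lines are illustrative assumptions, and it assumes a sed that accepts -E (GNU or BSD sed).

```shell
# Hypothetical helper: the corrected ERE substitution, reading stdin.
# Matching header lines are cut down to ">ID"; other lines pass through.
strip_header_ids() {
    sed -E 's/^(>([0-9]+|MT|X|GPS_[0-9]+\.[0-9]+))[[:space:]].*/\1/'
}

# Sample input: one header and one sequence line (illustrative data).
printf '%s\n' '>MT NC_006853.1' 'ACGTACGT' | strip_header_ids
# prints:
# >MT
# ACGTACGT
```

Lines that do not match the pattern, such as the sequence line, are printed unchanged.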

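By default (POSIX basic regular expressions), several sed metacharacters only take effect when preceded by a backslash, for example \( \) for grouping and \{ \} for repetition counts. With -E (extended regular expressions) the backslash is not needed: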

^ - match start of line
(...) - grouping
| - alternation (may not be understood without -E)
[list] - any single character from list
    - inside the brackets, [:space:] matches whitespace characters (space, tab, newline, etc.)
{min,max} - from min to max repetitions of preceding
* - zero or more of preceding
+ - one or more of preceding (not understood without -E)
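As an illustration of the backslash rule, the sketch below writes one substitution twice, once as a BRE (plain sed) and once as an ERE (sed -E); both should give the same result. The function names are hypothetical, and [0-9]\{1,\} / [0-9]{1,} are used to mean "one or more digits".

```shell
# Same substitution as a POSIX BRE (plain sed) and as an ERE (sed -E).
# In the BRE, the group and the repetition count need backslashes: \( \) \{ \}.
bre_strip() {
    sed 's/^\(>[0-9]\{1,\}\)[[:space:]].*/\1/'
}

# In the ERE, ( ) and { } are special without backslashes.
ere_strip() {
    sed -E 's/^(>[0-9]{1,})[[:space:]].*/\1/'
}

printf '%s\n' '>10 AC_000167.1' | bre_strip   # prints >10
printf '%s\n' '>10 AC_000167.1' | ere_strip   # prints >10
```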

Warning: Using -i is quite dangerous. Make sure you have backups of the original file in case something goes wrong.
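One way to get a backup automatically, assuming GNU or BSD sed, is to attach a suffix to -i: sed then keeps the unmodified file under that suffix. A sketch with a hypothetical helper name and a scratch file:

```shell
# Hypothetical helper: edit a file in place, keeping the original as FILE.bak.
# The suffix is attached directly to -i, a form both GNU and BSD sed accept.
edit_with_backup() {
    sed -i.bak -E 's/^(>([0-9]+|MT|X|GPS_[0-9]+\.[0-9]+))[[:space:]].*/\1/' "$1"
}

# Usage on a scratch file.
dir=$(mktemp -d)
printf '%s\n' '>X AC_000187.1' 'ACGT' > "$dir/infile"
edit_with_backup "$dir/infile"
cat "$dir/infile"       # >X, then ACGT
cat "$dir/infile.bak"   # the unmodified original
```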

Versions of sed that only support POSIX BRE do not support alternation (\|).
With these, one can test each alternative separately:

# with any POSIX sed:
sed '
    # if line matches, branch to label s
    /^\(>X\)[[:space:]].*/bs
    /^\(>MT\)[[:space:]].*/bs
    /^\(>[0-9]\{1,\}\)[[:space:]].*/bs
    /^\(>GPS_[0-9]\{1,\}\.[0-9]\{1,\}\)[[:space:]].*/bs

    # if we got here nothing matched,
    # so just print the line and start the next cycle
    b

    :s
    # empty regex reuses the previous one; \1 keeps the captured header
    s//\1/
' infile >tmpfile && mv tmpfile infile
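The empty-regex rule mentioned in the script's comments can be tried in isolation. In this sketch (hypothetical function name, sample input), an address regex captures a group with \(...\), and the following s command has an empty regex, so sed reuses the address regex and \1 refers to its group.

```shell
# Hypothetical helper: the address regex captures ">MT" in \(...\);
# the following s// has an empty regex, so sed reuses the address regex
# and \1 in the replacement refers to that captured group.
reuse_last_regex() {
    sed '/^\(>MT\)[[:space:]].*/s//\1/'
}

printf '%s\n' '>MT NC_006853.1' 'ACGT' | reuse_last_regex
# prints:
# >MT
# ACGT
```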

Answer 2

Score: 0


With sed:

$ sed -i -E 's/^([^ ]+) .*/\1/' file

The regular expression matches as follows:

Node	Explanation
`^`	the beginning of the string anchor
`(`	group and capture to \1:
`[^ ]+`	any character except space (1 or more times, matching as many as possible)
`)`	end of \1
' '	a literal space
`.*`	any character except \n (0 or more times, matching as many as possible)
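A possible variant, not part of the original answer: prefixing the substitution with a /^>/ address limits it to header lines, so a non-header line that happens to contain a space is left as-is. The function name and input are illustrative.

```shell
# Hypothetical helper: the answer's substitution, restricted by the /^>/
# address to lines that start with ">" (header lines).
strip_after_space() {
    sed -E '/^>/s/^([^ ]+) .*/\1/'
}

printf '%s\n' '>11 AC_000168.1' 'AC GT' | strip_after_space
# prints:
# >11
# AC GT
```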

With grep:

grep -oP '^>\S+' file
>10
>11
>12
>MT
>X
>GPS_000341582.1
>GPS_000341583.1
>GPS_000341584.1
>GPS_000341585.1
>GPS_000341586.1

The regular expression matches as follows:

Node	Explanation
`^`	the beginning of the string anchor
`>`	a literal >
`\S+`	non-whitespace (all but \n, \r, \t, \f, and " ") (1 or more times, matching as many as possible)
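-P (Perl-compatible regexes) is a GNU grep option that may not be available in every grep build. A sketch of a roughly equivalent ERE form, where [^[:space:]]+ plays the role of \S+; the function name is hypothetical.

```shell
# Hypothetical helper: ERE counterpart of grep -oP '^>\S+'.
# [^[:space:]]+ matches one or more non-whitespace characters, like \S+.
header_ids() {
    grep -oE '^>[^[:space:]]+'
}

printf '%s\n' '>12 AC_000169.1' 'ACGT' | header_ids
# prints only:
# >12
```

Because -o prints only the matched text, lines without a match (here the sequence line) produce no output at all.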

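If you want to edit in place (sponge, from moreutils, reads all of its input before writing the file):

grep -oP '^>\S+' file | sponge file

Note that grep -o prints only the matched header IDs, so any other lines in file, such as the sequence data, are dropped.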
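If moreutils is not installed, the usual temporary-file pattern gives a similar in-place result. This sketch assumes mktemp is available and swaps in the ERE form of the pattern instead of -P; the helper name and scratch file are hypothetical. As with the grep command above, only the header IDs are kept.

```shell
# Hypothetical helper: in-place edit without sponge.
# grep writes to a temporary file; the original is replaced only on success.
# As with grep -o in general, only the header IDs are kept.
keep_header_ids_in_place() {
    tmp=$(mktemp) || return 1
    if grep -oE '^>[^[:space:]]+' "$1" > "$tmp"; then
        # mv replaces the file, which then carries the temp file's
        # permissions (mktemp usually creates it as 0600).
        mv "$tmp" "$1"
    else
        rm -f "$tmp"
        return 1
    fi
}

# Usage on a scratch file.
dir=$(mktemp -d)
printf '%s\n' '>MT NC_006853.1' 'ACGT' '>X AC_000187.1' > "$dir/file"
keep_header_ids_in_place "$dir/file"
cat "$dir/file"   # >MT, then >X
```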
