How to extract the names from "name1 11/22 name2 33 / 44 name3 last3 55/66" in bash

Question

My input data is formatted as the pattern '<name> <a>/<b>'. The pattern may occur zero or more times on the same line, and there may be extra spaces around the '/'.

I expect to extract the names as

name1
name2
name3 last3

Here is my incorrect attempt:

echo "name1 11/22 name2 33 / 44 name3 last3 55/66" \
 | grep -o -E '\b[a-zA-Z][a-zA-Z0-9. ]+\b'

which extracts

name1 11
name2 33
name3 last3 55

The script should also handle an empty line by producing no output.


Answer 1

Score: 1

$ echo "name1 11/22 name2 33 / 44 name3 last3 55/66" | 
    awk -F' *[0-9]* */ *[0-9]* *' '{for(i=1;i<NF;i++) print $i}'
name1
name2
name3 last3

$ echo "name1 11/22 name2 33 / 44 name3 last3 55/66" | 
    awk -F' *[0-9]* */ *[0-9]* *' -v OFS='\n' '{NF-=1}1'
name1
name2
name3 last3
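
The question also asks that an empty line produce no output. The following is a minimal sketch of how the loop-based command above behaves in that case; the wrapper name `extract_names` is hypothetical and not part of the answer.

```shell
# extract_names is a hypothetical wrapper around the loop-based awk command.
# Each "<a>/<b>" chunk (with optional surrounding spaces) acts as the field
# separator, so every field except the last (empty) one is a name.
extract_names() {
    awk -F' *[0-9]* */ *[0-9]* *' '{ for (i = 1; i < NF; i++) print $i }'
}

# Sample line from the question.
echo "name1 11/22 name2 33 / 44 name3 last3 55/66" | extract_names

# Empty line: NF is 0, the loop body never runs, so nothing is printed.
echo "" | extract_names
```

The `{NF-=1}1` variant is less forgiving on empty lines: NF is already 0 there, and some awk implementations (GNU awk, for instance) refuse to set NF to a negative value, so the loop form is likely the safer choice for this requirement.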

Answer 2

Score: 0

With grep you could easily extract the <a>/<b> parts:

$ echo "name1 11/22 name2 33 / 44 name3 last3 55/66" |
  grep -oE '[^[:space:]]+[[:space:]]*/[[:space:]]*[^[:space:]]+'
11/22
33 / 44
55/66

But since you want to replace these parts with newlines rather than print them, sed or awk are probably better choices. Example with sed:

$ echo "name1 11/22 name2 33 / 44 name3 last3 55/66" |
  sed 's![[:space:]]*[^[:space:]][^[:space:]]*[[:space:]]*/[[:space:]]*[^[:space:]][^[:space:]]*[[:space:]]*!\n!g'
name1
name2
name3 last3

Or, with GNU sed:

$ echo "name1 11/22 name2 33 / 44 name3 last3 55/66" |
  sed -E 's!\s*\S+\s*/\s*\S+\s*!\n!g'
name1
name2
name3 last3

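Note that a newline is also added after the last name of a line, leading to empty lines in the output. If this is not acceptable, we can handle the last <a>/<b> part of a line separately by deleting it instead of replacing it with a newline. Example with GNU sed: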

$ echo "name1 11/22 name2 33 / 44 name3 last3 55/66" |
  sed -E 's!\s*\S+\s*/\s*\S+\s*$!!;s!\s*\S+\s*/\s*\S+\s*!\n!g'
name1
name2
name3 last3
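
The question also requires that an empty input line produce no output, whereas sed prints an empty pattern space as an empty line. A minimal sketch of one way to cover that case: the same two substitutions followed by a delete command. It assumes GNU sed (for `\s`, `\S`, and `\n` in the replacement), and the wrapper name `names_with_sed` is hypothetical.

```shell
# names_with_sed is a hypothetical wrapper; it assumes GNU sed.
names_with_sed() {
    # 1st command: delete the "<a>/<b>" chunk at the end of the line.
    # 2nd command: turn every remaining chunk into a newline.
    # 3rd command: drop lines that are empty (no names at all).
    sed -E \
        -e 's!\s*\S+\s*/\s*\S+\s*$!!' \
        -e 's!\s*\S+\s*/\s*\S+\s*!\n!g' \
        -e '/^$/d'
}

# Two data lines separated by an empty line; the empty line is dropped.
printf 'name1 11/22 name2 33 / 44\n\nname3 last3 55/66\n' | names_with_sed
```

Because `/^$/d` only matches a pattern space that is entirely empty, lines that contain names are unaffected.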

With awk we can define the field separator as the <a>/<b> parts you want to remove and print every field (except the last, empty one) on a separate line:

$ echo "name1 11/22 name2 33 / 44 name3 last3 55/66" |
  awk -F '[[:space:]]*[^[:space:]]+[[:space:]]*/[[:space:]]*[^[:space:]]+[[:space:]]*' '{for(i=1;i<NF;i++) print $i}'
name1
name2
name3 last3

Or, with GNU awk:

$ echo "name1 11/22 name2 33 / 44 name3 last3 55/66" |
  awk -F '\\s*\\S+\\s*/\\s*\\S+\\s*' '{for(i=1;i<NF;i++) print $i}'
name1
name2
name3 last3

Answer 3

Score: 0

If your `grep` supports the `-P` (PCRE) option, try:

```shell
echo "name1 11/22 name2 33 / 44 name3 last3 55/66" \
 | grep -oP '\b[a-zA-Z][a-zA-Z0-9. ]+\b(?=\s+\d+\s*/\s*\d+)'
```

Output:

name1
name2
name3 last3

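`(?=\s+\d+\s*/\s*\d+)` is a lookahead assertion that matches the following sequence:

  • one or more whitespace characters
  • one or more digits
  • zero or more whitespace characters
  • a slash character
  • zero or more whitespace characters
  • one or more digits

The text matched by the lookahead is not included in the output.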
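
Since `-P` is not available in every grep build, the lookahead approach can be wrapped with a probe and a fallback. This is only a sketch: the helper name `names_with_lookahead` is hypothetical, and the awk fallback matches the grep pattern only for input shaped like the sample line.

```shell
# names_with_lookahead is a hypothetical helper, not part of the answer.
names_with_lookahead() {
    # Probe for PCRE support with a tiny lookahead pattern.
    if printf 'x 1\n' | grep -qP 'x(?=\s)' 2>/dev/null; then
        # Exit status 1 from grep only means "no match" (e.g. an empty line),
        # so it is treated as success; other failures are passed through.
        grep -oP '\b[a-zA-Z][a-zA-Z0-9. ]+\b(?=\s+\d+\s*/\s*\d+)' || [ "$?" -eq 1 ]
    else
        # Fallback without lookahead: treat each "<a>/<b>" chunk as a separator.
        awk -F' *[0-9]* */ *[0-9]* *' '{ for (i = 1; i < NF; i++) print $i }'
    fi
}

echo "name1 11/22 name2 33 / 44 name3 last3 55/66" | names_with_lookahead
```

With the PCRE branch, the name is matched greedily and then backtracked until the lookahead succeeds, which is why the digits before the slash never end up in the output.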





Answer 4

Score: 0

Using GNU awk for multi-char `RS`, `RT`, and `\s`/`\S`, this might be what you want:

$ echo "name1 11/22 name2 33 / 44 name3 last3 55/66" |
    awk -v RS='\\s+\\S+\\s*/\\s*\\S+\\s*' 'RT'
name1
name2
name3 last3
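
To make the role of `RT` more visible, here is a small sketch that prints each record next to the separator text GNU awk stored for it. It assumes GNU awk is installed as `gawk`; the helper name `show_pairs` is hypothetical.

```shell
# show_pairs is a hypothetical helper; it requires GNU awk (gawk).
show_pairs() {
    gawk -v RS='\\s+\\S+\\s*/\\s*\\S+\\s*' '
        RT {
            sep = RT               # text that matched RS right after this record
            gsub(/\s/, "", sep)    # strip the whitespace inside the separator
            printf "%s\t%s\n", $0, sep
        }'
}
```

Piping the sample line into `show_pairs` prints each name followed by a tab and its `<a>/<b>` pair. A record that is not followed by an RS match (an empty line, for example) leaves `RT` empty, so the `RT` pattern skips it and nothing is printed.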





Posted by huangapple on July 20, 2023, 13:30:33
Please keep this link when reposting: https://go.coder-hub.com/76726920.html