How to extract the names from a "name1 11/22 name2 33 / 44 name3 last3 55/66" in bash

Question

My input data is formatted as the pattern '<name> <a>/<b>', which may occur zero or more times on the same line. There may also be extra spaces around the '/'.

I expect to extract the names as

    name1
    name2
    name3 last3

Here is my (wrong) code:

    echo "name1 11/22 name2 33 / 44 name3 last3 55/66" \
    | grep -o -E '\b[a-zA-Z][a-zA-Z0-9. ]+\b'

which extracts

    name1 11
    name2 33
    name3 last3 55

The script should also produce no output for an empty line.

Answer 1

Score: 1

    # Treat each "<a>/<b>" chunk (with optional surrounding spaces) as the field
    # separator and print every field except the trailing empty one.
    $ echo "name1 11/22 name2 33 / 44 name3 last3 55/66" |
      awk -F' *[0-9]* */ *[0-9]* *' '{for(i=1;i<NF;i++) print $i}'
    name1
    name2
    name3 last3

    # Same separator; drop the last field and rejoin the rest with newlines.
    $ echo "name1 11/22 name2 33 / 44 name3 last3 55/66" |
      awk -F' *[0-9]* */ *[0-9]* *' -v OFS='\n' '{NF-=1}1'
    name1
    name2
    name3 last3
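
The question also asks that an empty line produce no output. As a minimal sketch, assume the loop variant above is wrapped in a hypothetical helper named `extract_names` (the name is not from the original answer) that reads lines from standard input. An empty line has `NF` equal to 0, so the loop body never runs and nothing is printed. The `NF-=1` variant would instead try to set `NF` to -1 on such a line, which GNU awk, at least, rejects with a fatal error.

```shell
#!/usr/bin/env bash

# Hypothetical helper (name chosen for illustration): reads lines on stdin
# and prints one name per output line.
extract_names() {
  # Each "<a>/<b>" chunk, with optional surrounding spaces, is the field
  # separator, so the fields are the names plus one trailing empty field.
  awk -F' *[0-9]* */ *[0-9]* *' '{ for (i = 1; i < NF; i++) print $i }'
}

# Three input lines: two with data and an empty one in between.
printf '%s\n' 'name1 11/22 name2 33 / 44' '' 'name3 last3 55/66' | extract_names
```

The empty middle line contributes nothing to the output, so the three names appear on consecutive lines.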

Answer 2

Score: 0

With grep you could easily extract the <a>/<b> parts:

    $ echo "name1 11/22 name2 33 / 44 name3 last3 55/66" |
      grep -oE '[^[:space:]]+[[:space:]]*/[[:space:]]*[^[:space:]]+'
    11/22
    33 / 44
    55/66

But since what you want is not to print these parts but to replace them with newlines, sed or awk are probably better choices. Example with sed:

    $ echo "name1 11/22 name2 33 / 44 name3 last3 55/66" |
      sed 's![[:space:]]*[^[:space:]][^[:space:]]*[[:space:]]*/[[:space:]]*[^[:space:]][^[:space:]]*[[:space:]]*!\n!g'
    name1
    name2
    name3 last3

Or, with GNU sed:

    $ echo "name1 11/22 name2 33 / 44 name3 last3 55/66" |
      sed -E 's!\s*\S+\s*/\s*\S+\s*!\n!g'
    name1
    name2
    name3 last3

Note that a newline is also added after the last name of a line, leading to empty lines in the output. If this is not acceptable, we can process the last name of a line separately. Example with GNU sed:

    $ echo "name1 11/22 name2 33 / 44 name3 last3 55/66" |
      sed -E 's!\s*\S+\s*/\s*\S+\s*$!!;s!\s*\S+\s*/\s*\S+\s*!\n!g'
    name1
    name2
    name3 last3
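
Two details of the sed commands above depend on the sed implementation: `\s` and `\S` are GNU extensions, and `\n` in the replacement is not interpreted as a newline by every sed (BSD sed, for instance, emits a literal `n`). A more portable sketch, assuming a sed that accepts `-E` and wrapped in a hypothetical helper named `split_names`, uses POSIX character classes and a backslash followed by a real newline in the replacement. The final `/^$/d` command is an assumption about the desired behavior: it drops lines that end up empty, so an empty input line yields no output.

```shell
#!/usr/bin/env bash

# Hypothetical helper (name chosen for illustration): reads lines on stdin
# and prints one name per output line.
split_names() {
  # 1st command: delete the final "<a>/<b>" chunk so no trailing newline is added.
  # 2nd command: replace each remaining chunk with a newline, written as a
  #              backslash followed by a literal newline.
  # 3rd command: drop lines that are now empty (e.g. empty input lines).
  sed -E \
    -e 's![[:space:]]*[^[:space:]]+[[:space:]]*/[[:space:]]*[^[:space:]]+[[:space:]]*$!!' \
    -e 's![[:space:]]*[^[:space:]]+[[:space:]]*/[[:space:]]*[^[:space:]]+[[:space:]]*!\
!g' \
    -e '/^$/d'
}

printf '%s\n' 'name1 11/22 name2 33 / 44 name3 last3 55/66' '' | split_names
```

A line that contains a name but no `<a>/<b>` part would be printed unchanged by this sketch, which may or may not be what is wanted.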

With awk we can define the field separator as the <a>/<b> parts you want to remove and print each field (except the last, empty one) on a separate line:

    $ echo "name1 11/22 name2 33 / 44 name3 last3 55/66" |
      awk -F '[[:space:]]*[^[:space:]]+[[:space:]]*/[[:space:]]*[^[:space:]]+[[:space:]]*' '
      {for(i=1;i<NF;i++) print $i}'
    name1
    name2
    name3 last3

Or, with GNU awk:

    $ echo "name1 11/22 name2 33 / 44 name3 last3 55/66" |
      awk -F '\\s*\\S+\\s*/\\s*\\S+\\s*' '{for(i=1;i<NF;i++) print $i}'
    name1
    name2
    name3 last3

Answer 3

Score: 0

If your `grep` supports the `-P` (PCRE) option, would you please try:

```shell
echo "name1 11/22 name2 33 / 44 name3 last3 55/66" \
| grep -oP '\b[a-zA-Z][a-zA-Z0-9. ]+\b(?=\s+\d+\s*/\s*\d+)'
```

Output:

    name1
    name2
    name3 last3

`(?=\s+\d+\s*/\s*\d+)` is a lookahead assertion that matches a sequence of:

- one or more whitespace characters
- one or more digits
- zero or more whitespace characters
- a slash character
- zero or more whitespace characters
- one or more digits

The substring matched by the lookahead is not included in the output.
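
If the names are needed in a bash variable rather than on the terminal, the `grep -oP` output can be read into an array. The following is a minimal sketch, assuming bash 4 or later (for `mapfile`) and a `grep` built with PCRE support; the helper name `names_from_line` and the array names are hypothetical. Because the lookahead requires a following `<a>/<b>`, a trailing word without one (`baz` below) is not reported, and an empty line yields an empty array.

```shell
#!/usr/bin/env bash

# Hypothetical helper: prints the names found in its first argument, one per line.
names_from_line() {
  printf '%s\n' "$1" |
    grep -oP '\b[a-zA-Z][a-zA-Z0-9. ]+\b(?=\s+\d+\s*/\s*\d+)'
}

# Read the matches into an array; grep prints nothing (and exits non-zero)
# when there is no match, which leaves the array empty.
mapfile -t names < <(names_from_line 'name1 11/22 name2 33 / 44 name3 last3 55/66')
printf 'found %d names, last one: %s\n' "${#names[@]}" "${names[2]}"

# "baz" has no "<a>/<b>" after it, so only "foo bar" is captured.
mapfile -t trailing < <(names_from_line 'foo bar 1/2 baz')
mapfile -t none < <(names_from_line '')
printf 'trailing: %d, empty line: %d\n' "${#trailing[@]}" "${#none[@]}"
```

Reading into an array with `mapfile -t` keeps a multi-word name such as `name3 last3` as a single element, which word-splitting a command substitution would not.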
Answer 4

Score: 0

Using GNU awk for multi-char `RS`, `RT`, and `\s`/`\S`, this might be what you want:

    $ echo "name1 11/22 name2 33 / 44 name3 last3 55/66" |
      awk -v RS='\\s+\\S+\\s*/\\s*\\S+\\s*' 'RT'
    name1
    name2
    name3 last3
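
`RT` and a regex-valued `RS` are GNU awk extensions. Where only a POSIX awk is available, a similar effect can be sketched with `match()`, `RSTART`, and `RLENGTH`: print whatever precedes each separator, then continue with the remainder of the line. This is an illustration rather than the answerer's method; the helper name `names_before_ratios` is hypothetical, and the separator is assumed to be digits around the slash (as in the sample data) instead of `\S+`.

```shell
#!/usr/bin/env bash

# Hypothetical helper (name chosen for illustration): reads lines on stdin
# and prints one name per output line.
names_before_ratios() {
  awk '{
    rest = $0
    # Find the next " <a>/<b>" chunk; RSTART and RLENGTH describe the match.
    while (match(rest, / +[0-9]+ *\/ *[0-9]+ */)) {
      print substr(rest, 1, RSTART - 1)      # text before the chunk is a name
      rest = substr(rest, RSTART + RLENGTH)  # continue after the chunk
    }
  }'
}

printf '%s\n' 'name1 11/22 name2 33 / 44 name3 last3 55/66' '' | names_before_ratios
```

As with the `RT` test, text that is not followed by a separator is never printed, so an empty line produces no output.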

huangapple
  • Published on July 20, 2023 at 13:30:33
  • When reposting, please be sure to keep the link to this article: https://go.coder-hub.com/76726920.html