How to extract the names from a line like "name1 11/22 name2 33 / 44 name3 last3 55/66" in bash
Question
My input data follows the pattern '<name> <a>/<b>', which can occur zero or more times on the same line. There may also be extra spaces around the '/'.
I expect to extract the names as
name1
name2
name3 last3
Here is my attempt, which gives the wrong result:
echo "name1 11/22 name2 33 / 44 name3 last3 55/66" \
| grep -o -E '\b[a-zA-Z][a-zA-Z0-9. ]+\b'
It extracts
name1 11
name2 33
name3 last3 55
The script should also handle an empty line by producing no output.
Answer 1
Score: 1
$ echo "name1 11/22 name2 33 / 44 name3 last3 55/66" |
awk -F' *[0-9]* */ *[0-9]* *' '{for(i=1;i<NF;i++) print $i}'
name1
name2
name3 last3
$ echo "name1 11/22 name2 33 / 44 name3 last3 55/66" |
awk -F' *[0-9]* */ *[0-9]* *' -v OFS='\n' '{NF-=1}1'
name1
name2
name3 last3
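For reuse in a script, the first command can be wrapped in a function. This is only a sketch: `extract_names` is a hypothetical name, not part of the answer, and it assumes `<a>` and `<b>` are plain digits, as in the sample. The loop form is used because it prints nothing when a line has no fields, which covers the empty-line requirement from the question.

```shell
# Hypothetical helper (name is an assumption): reads lines on stdin and
# prints one name per line for every "<name> <a>/<b>" group it finds.
extract_names() {
  # The field separator is the whole "<a>/<b>" part plus surrounding spaces,
  # so the fields are the names, followed by one trailing empty field.
  awk -F' *[0-9]* */ *[0-9]* *' '{ for (i = 1; i < NF; i++) print $i }'
}

# Demo: the second input line is empty and contributes no output.
printf '%s\n' 'name1 11/22 name2 33 / 44 name3 last3 55/66' '' 'solo 1/2' | extract_names
```

With this input the demo should print `name1`, `name2`, `name3 last3` and `solo`, each on its own line.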
Answer 2
Score: 0
With `grep` you could easily extract the `<a>/<b>` parts:
$ echo "name1 11/22 name2 33 / 44 name3 last3 55/66" |
grep -oE '[^[:space:]]+[[:space:]]*/[[:space:]]*[^[:space:]]+'
11/22
33 / 44
55/66
But since what you want is not to print these parts but to replace them with newlines, `sed` or `awk` are probably better choices. Example with `sed`:
$ echo "name1 11/22 name2 33 / 44 name3 last3 55/66" |
sed 's![[:space:]]*[^[:space:]][^[:space:]]*[[:space:]]*/[[:space:]]*[^[:space:]][^[:space:]]*[[:space:]]*!\n!g'
name1
name2
name3 last3
Or, with GNU `sed`:
$ echo "name1 11/22 name2 33 / 44 name3 last3 55/66" |
sed -E 's!\s*\S+\s*/\s*\S+\s*!\n!g'
name1
name2
name3 last3
Note that a newline is also added after the last name of a line, leading to empty lines in the output. If this is not acceptable, we can handle the end of each line separately, deleting the trailing `<a>/<b>` part instead of replacing it with a newline. Example with GNU `sed`:
$ echo "name1 11/22 name2 33 / 44 name3 last3 55/66" |
sed -E 's!\s*\S+\s*/\s*\S+\s*$!!;s!\s*\S+\s*/\s*\S+\s*!\n!g'
name1
name2
name3 last3
With `awk` we can define the field separator as the `<a>/<b>` parts you want to remove and print each field (except the last, empty one) on a separate line:
$ echo "name1 11/22 name2 33 / 44 name3 last3 55/66" |
awk -F '[[:space:]]*[^[:space:]]+[[:space:]]*/[[:space:]]*[^[:space:]]+[[:space:]]*' '
{for(i=1;i<NF;i++) print $i}'
name1
name2
name3 last3
Or, with GNU `awk`:
$ echo "name1 11/22 name2 33 / 44 name3 last3 55/66" |
awk -F '\\s*\\S+\\s*/\\s*\\S+\\s*' '{for(i=1;i<NF;i++) print $i}'
name1
name2
name3 last3
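Two caveats about the `sed` variants above: `\n` in the replacement is understood by GNU sed, but other implementations (BSD/macOS sed, for instance) may not treat it as a newline, and an empty input line still comes out as an empty line. The following is a possible portable sketch, not part of the answer: `extract_names_sed` and the `grp` variable are hypothetical names. It uses a backslash followed by a literal newline in the replacement, and deletes lines that end up blank.

```shell
# Hypothetical helper (names are assumptions): sed-based extraction that
# avoids "\n" in the replacement and prints nothing for blank lines.
extract_names_sed() {
  # One "<a>/<b>" group with optional surrounding whitespace (POSIX BRE).
  grp='[[:space:]]*[^[:space:]][^[:space:]]*[[:space:]]*/[[:space:]]*[^[:space:]][^[:space:]]*[[:space:]]*'
  # 1) drop the group at the end of the line,
  # 2) turn every remaining group into a newline (backslash + real newline),
  # 3) delete the line if nothing but whitespace is left.
  sed -e "s|${grp}\$||" \
      -e "s|${grp}|\\
|g" \
      -e '/^[[:space:]]*$/d'
}

printf '%s\n' 'name1 11/22 name2 33 / 44 name3 last3 55/66' '' | extract_names_sed
```

`|` is used as the `s` delimiter here simply to avoid `!`, which interactive bash may treat as history expansion inside double quotes.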
Answer 3
Score: 0
If your `grep` supports the `-P` (PCRE) option, please try:
echo "name1 11/22 name2 33 / 44 name3 last3 55/66" \
| grep -oP '\b[a-zA-Z][a-zA-Z0-9. ]+\b(?=\s+\d+\s*/\s*\d+)'
Output:
name1
name2
name3 last3
`(?=\s+\d+\s*/\s*\d+)` is a lookahead assertion that matches the sequence:
- one or more whitespace characters
- one or more digits
- zero or more whitespace characters
- a slash character
- zero or more whitespace characters
- one or more digits
The substring matched by the lookahead is not included in the output.
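Since `-P` is not available in every `grep`, one way to act on that condition is to probe for it at run time. The sketch below is an assumption-laden illustration: `extract_names_grep` is a hypothetical name, and the fallback branch borrows the digit-based `awk` field separator from Answer 1, which gives the same result for the sample input but is not guaranteed to match the PCRE pattern on other data.

```shell
# Hypothetical helper (name is an assumption): use the PCRE lookahead when
# grep supports -P, otherwise fall back to an awk field-separator approach.
extract_names_grep() {
  # Probe: a grep without -P support exits with an error here.
  if printf 'x 1/2\n' | grep -qP 'x(?=\s)' 2>/dev/null; then
    # Match a name only when it is followed by "<digits>/<digits>".
    grep -oP '\b[a-zA-Z][a-zA-Z0-9. ]+\b(?=\s+\d+\s*/\s*\d+)'
  else
    awk -F' *[0-9]* */ *[0-9]* *' '{ for (i = 1; i < NF; i++) print $i }'
  fi
}

printf '%s\n' 'name1 11/22 name2 33 / 44 name3 last3 55/66' | extract_names_grep
```

Note that in the `grep` branch the function returns a non-zero status when a line contains no match (an empty line, for example), which matters if the calling script runs under `set -e`.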
Answer 4
Score: 0
Using GNU awk for multi-char `RS`, `RT`, and `\s`/`\S`, this might be what you want:
$ echo "name1 11/22 name2 33 / 44 name3 last3 55/66" |
awk -v RS='\\s+\\S+\\s*/\\s*\\S+\\s*' 'RT'
name1
name2
name3 last3