Regex in sed to keep the current line but remove the next line

Question

This is sample data:

ServerA
Value1 fh824rfz
Plan CustomA
ServerB
Value3 9fgjzxlo
Plan CustomD
ServerC
Value10 339fgh0l
Plan CustomE

This is the regex that works in VS Code:

(Value[0-9]{1,2} [0-9a-z]{8}\n)(.*)

Expected output:

ServerA
Value1 fh824rfz
ServerB
Value3 9fgjzxlo
ServerC
Value10 339fgh0l

But when I try regexes like these in sed, they don't work:

-E 's|(Value[0-9]{1,2} [0-9a-z]{8}\n)(.*)\n||g'
-re 's|(Value[0-9]{1,2} [0-9a-z]{8}\n)(.*)\n||g'
-zre 's|(Value[0-9]{1,2} [0-9a-z]{8}\n)(.*)\n||g'

How can I do this? I think the issue is with \n, because when I remove it the sample works (although the output is still not what I expect).

Answer 1

Score: 6

Here's one way to do it (checked with GNU sed, syntax might vary for other implementations):

$ sed -E '/Value[0-9]{1,2} [0-9a-z]{8}$/{n; d}' ip.txt
ServerA
Value1 fh824rfz
ServerB
Value3 9fgjzxlo
ServerC
Value10 339fgh0l

The n command prints the pattern space (unless auto-print is disabled with the -n option) and replaces it with the next line of input; d then deletes that line.
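
As a minimal sketch of the same idea, the block below rebuilds part of the question's sample input and runs the n; d command on it. The variable names sample and result are placeholders chosen for this illustration, and the semicolon before the closing brace is added on the assumption that some non-GNU sed implementations require it.

```shell
# Part of the sample input from the question (hypothetical variable name).
sample='ServerA
Value1 fh824rfz
Plan CustomA
ServerB
Value3 9fgjzxlo
Plan CustomD'

# On a matching line, n prints it and loads the next line; d then discards
# that next line, so the "Plan ..." lines never reach the output.
result=$(printf '%s\n' "$sample" | sed -E '/Value[0-9]{1,2} [0-9a-z]{8}$/{n;d;}')

printf '%s\n' "$result"
```

Only the Server and Value lines should remain; each line that follows a Value line is dropped.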

Answer 2

Score: 2

By default sed is line-based, so you cannot process several lines at once as you are trying to do in your first two attempts. To apply your strategy you can, for instance, append the next line to the current one (N command) and then process this two-line block:

sed -E '/Value[0-9]{1,2} [0-9a-z]{8}$/{N;s/\n.*//}'
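
To see what N does, the following sketch uses a two-line excerpt of the sample (the variable names pair, joined and trimmed are hypothetical). It first prints the joined pattern space with the l command, which shows the embedded newline as \n, and then applies the substitution from the command above.

```shell
# Two lines from the question's sample data.
pair='Value1 fh824rfz
Plan CustomA'

# N appends the next line to the pattern space; l prints it unambiguously,
# showing the embedded newline as \n and the end of the buffer as $.
joined=$(printf '%s\n' "$pair" | sed -n 'N;l')
printf '%s\n' "$joined"

# With both lines in the pattern space, s/\n.*// removes the newline and
# everything after it, i.e. the second line.
trimmed=$(printf '%s\n' "$pair" | sed -E '/Value[0-9]{1,2} [0-9a-z]{8}$/{N;s/\n.*//;}')
printf '%s\n' "$trimmed"
```

Note that the address is tested before N runs, so the $ in it still anchors to the end of the single Value line.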

Another solution, if you are using GNU sed and your file is not too large and does not contain NUL bytes (ASCII code 0), is to use the -z option to slurp the whole file as a single line with embedded newline characters (with -z, sed treats input lines as terminated by a NUL byte instead of a newline):

sed -Ez 's/(Value[0-9]{1,2} [0-9a-z]{8}\n)[^\n]*\n/\1/g'

This is almost what you tried in your third attempt, but with -z the (.*) group you used also matches newline characters, so the match runs from the first Value line to the end of the input and every following line is lost. Replacing (.*) with [^\n]* in your third attempt and keeping the first group in the replacement (\1) should work. Note also that the second group is not needed, because you want to suppress what it matches.

With GNU sed you can also use the multi-line mode of the s command (m modifier), in which the period does not match a newline:

sed -Ez 's/(Value[0-9]{1,2} [0-9a-z]{8}\n).*\n/\1/gm'

Since others have also proposed awk solutions, here is a simple one:

awk 's{s=0;next} /Value[0-9]{1,2} [0-9a-z]{8}$/{s=1} 1'

That is, if variable s is different from 0 (or the empty string) reset it to 0 and skip the current line (next). Else, if the current line matches your regex, set variable s to 1. Finally, print the current line (1).
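
The same flag logic can be written out with one rule per line, as in the sketch below. The names sample, filtered and skip are illustrative, and the pattern is deliberately simplified to + quantifiers on the assumption that the awk at hand might not support interval expressions such as {8}.

```shell
# Part of the sample input from the question (hypothetical variable name).
sample='ServerA
Value1 fh824rfz
Plan CustomA
ServerB
Value3 9fgjzxlo
Plan CustomD'

filtered=$(printf '%s\n' "$sample" | awk '
  skip { skip = 0; next }                 # previous line matched: drop this line
  /^Value[0-9]+ [0-9a-z]+$/ { skip = 1 }  # mark the following line for removal
  { print }                               # print every line that was not dropped
')

printf '%s\n' "$filtered"
```

Only one line is held in memory at a time, so this approach also suits large files.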

Answer 3

Score: 2

sed is usually not the best tool for anything that involves processing multiple input lines at a time.

Using any POSIX awk and only storing 1 line in memory at a time:

$ awk '!f; {f=(/^Value[0-9]{1,2} [0-9a-z]{8}$/)}' file
ServerA
Value1 fh824rfz
ServerB
Value3 9fgjzxlo
ServerC
Value10 339fgh0l

Answer 4

Score: 1

Using GNU awk:

$ awk '{where=match($0,"Value[0-9]{1,2} [0-9a-z]{8}"); if (where) {print; getline} else {print}}' file
ServerA
Value1 fh824rfz
ServerB
Value3 9fgjzxlo
ServerC
Value10 339fgh0l

Answer 5

Score: 1

With your shown samples, please try the following awk code. It uses awk's match function, which takes a regex, to extract the matched values.

awk -v RS="" '
{
  while(match($0,/Value[0-9]+ [0-9a-z]{8}\n[^\n]*/)){
    val=substr($0,RSTART,RLENGTH)
    split(val,arr,ORS)
    prevLine=substr($0,1,RSTART-1)
    gsub(/^\n+|\n+$/,"",prevLine)
    print prevLine ORS arr[1]
    $0=substr($0,RSTART+RLENGTH)
  }
}
'  Input_file

Answer 6

Score: 1

Wow, all the professors have gathered here. Can I also suggest a way? 😉

awk --posix '/Value[0-9]{1,2} [0-9a-z]{8}/ { print prev ORS $0 } { prev = $0 }' data.txt

or

awk --posix '/Value[0-9]{1,2} [0-9a-z]{8}/ { print prev; print $0 } { prev = $0 }' data.txt

Output:

ServerA
Value1 fh824rfz
ServerB
Value3 9fgjzxlo
ServerC
Value10 339fgh0l

Answer 7

Score: 1

$ grep -v -f <(
    grep -P -A 1 "Value[0-9]{1,2} [0-9a-z]{8}" file |
    grep -P -v "Value[0-9]{1,2} [0-9a-z]{8}"
) file
ServerA
Value1 fh824rfz
ServerB
Value3 9fgjzxlo
ServerC
Value10 339fgh0l

Answer 8

Score: 0

Save $0 in awk, then use a dummy getline to "remove the next line": its return code (1 on success) serves as the starting index for substr(), and the assignment overwrites the "next line" that getline just read with the saved current line:

jot 10 |
mawk '(__ = $_)~/[3-7]/ && $!NF = substr(__, getline)'

3
5
7

huangapple
  • Posted on July 24, 2023, 15:22:09
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76752205.html