regex in sed to keep current line but remove next line
Question
This is the sample data:
ServerA
Value1 fh824rfz
Plan CustomA
ServerB
Value3 9fgjzxlo
Plan CustomD
ServerC
Value10 339fgh0l
Plan CustomE
This is the regex that works in VS Code:
(Value[0-9]{1,2} [0-9a-z]{8}\n)(.*)
Expected output:
ServerA
Value1 fh824rfz
ServerB
Value3 9fgjzxlo
ServerC
Value10 339fgh0l
But in sed I'm trying regexes like these, and they don't work:
-E 's|(Value[0-9]{1,2} [0-9a-z]{8}\n)(.*)\n||g'
-re 's|(Value[0-9]{1,2} [0-9a-z]{8}\n)(.*)\n||g'
-zre 's|(Value[0-9]{1,2} [0-9a-z]{8}\n)(.*)\n||g'
How can I do this? I think the issue is with \n, because when I remove it the sample works (but it's still not the expected output).
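In its default mode sed reads one line per cycle and strips the trailing newline before matching, so a regex that ends in \n has nothing to match. A minimal sketch to observe this, assuming GNU sed and saving the sample as ip.txt (the file name is only an assumption, borrowed from Answer 1). The l command prints the pattern space unambiguously, with $ marking its end:

```shell
# Recreate the question's sample input (the file name ip.txt is an assumption).
printf '%s\n' ServerA 'Value1 fh824rfz' 'Plan CustomA' \
  ServerB 'Value3 9fgjzxlo' 'Plan CustomD' \
  ServerC 'Value10 339fgh0l' 'Plan CustomE' > ip.txt

# Show line 2 exactly as sed sees it: no newline, just the $ end marker.
sed -n '2l' ip.txt
# -> Value1 fh824rfz$

# A regex ending in \n can never match in line mode, so this prints nothing.
sed -n -E '/Value[0-9]{1,2} [0-9a-z]{8}\n/p' ip.txt
```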
Answer 1
Score: 6
Here's one way to do it (checked with GNU sed; syntax might vary for other implementations):
$ sed -E '/Value[0-9]{1,2} [0-9a-z]{8}$/{n; d}' ip.txt
ServerA
Value1 fh824rfz
ServerB
Value3 9fgjzxlo
ServerC
Value10 339fgh0l
The n command prints the pattern space (if auto-print is not disabled by the -n option) and replaces it with the next line; d then deletes it.
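The same command written out as a commented sed script, a sketch assuming GNU sed and the sample saved as ip.txt. Putting n and d on separate lines (or writing {n;d;}) is also the form non-GNU seds are generally more comfortable with, though that portability point is an assumption to check on your platform:

```shell
# Question's sample input, saved as ip.txt (assumed name).
printf '%s\n' ServerA 'Value1 fh824rfz' 'Plan CustomA' \
  ServerB 'Value3 9fgjzxlo' 'Plan CustomD' \
  ServerC 'Value10 339fgh0l' 'Plan CustomE' > ip.txt

sed -E '
  # Act only on lines that look like "ValueN xxxxxxxx".
  /Value[0-9]{1,2} [0-9a-z]{8}$/ {
    # n: auto-print the Value line, then load the next input line.
    n
    # d: delete that next line without printing it and start a new cycle.
    d
  }
' ip.txt
```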
Answer 2
Score: 2
By default sed is line-based, and you cannot process several lines at once as you are trying to do in your first two attempts. To apply your strategy you must, for instance, append the next line to the current one (N command) and then process this two-line block:
sed -E '/Value[0-9]{1,2} [0-9a-z]{8}$/{N;s/\n.*//}'
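To see what N hands to the s command, print the joined pattern space with l (a sketch assuming GNU sed and the sample in ip.txt). The embedded newline shows up as \n, which is why s/\n.*// cuts the appended line off again:

```shell
# Question's sample input, saved as ip.txt (assumed name).
printf '%s\n' ServerA 'Value1 fh824rfz' 'Plan CustomA' \
  ServerB 'Value3 9fgjzxlo' 'Plan CustomD' \
  ServerC 'Value10 339fgh0l' 'Plan CustomE' > ip.txt

# N appends "\n<next line>" to the pattern space; l shows it unambiguously.
sed -n -E '/Value[0-9]{1,2} [0-9a-z]{8}$/{N;l}' ip.txt
# -> Value1 fh824rfz\nPlan CustomA$   (and likewise for Value3 and Value10)

# s/\n.*// then removes the newline and the appended line from each block.
sed -E '/Value[0-9]{1,2} [0-9a-z]{8}$/{N;s/\n.*//}' ip.txt
```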
Another solution, if you are using GNU sed and your file is not too large and contains no NUL bytes (ASCII code 0), is to use the -z option to slurp the whole file as one single line with embedded newline characters (with the -z option sed considers that input lines are terminated by a NUL byte instead of a newline):
sed -Ez 's/(Value[0-9]{1,2} [0-9a-z]{8}\n)[^\n]*\n/\1/g'
This is almost what you tried in your 3rd attempt, but the (.*) group you used matches everything, including newline characters, so the match ran to the last newline of the file and only ServerA was left. Replacing (.*) with [^\n]* in your 3rd attempt (and writing the first group back with \1) should work. Note also that, since you want to suppress that line, there is no need to group it.
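A side-by-side sketch of the two -z commands (GNU sed only, sample assumed in ip.txt): the greedy (.*) form from the question versus the [^\n]* form with the first group written back via \1.

```shell
# GNU sed only (-z). Question's sample input, saved as ip.txt (assumed name).
printf '%s\n' ServerA 'Value1 fh824rfz' 'Plan CustomA' \
  ServerB 'Value3 9fgjzxlo' 'Plan CustomD' \
  ServerC 'Value10 339fgh0l' 'Plan CustomE' > ip.txt

# '.' also matches newlines in a -z record, so (.*)\n runs to the LAST newline
# of the file and the empty replacement wipes everything after ServerA.
sed -Ez 's/(Value[0-9]{1,2} [0-9a-z]{8}\n)(.*)\n//g' ip.txt
# -> ServerA

# [^\n]* cannot cross a newline, so only the following line is consumed,
# and \1 writes the captured Value line back.
sed -Ez 's/(Value[0-9]{1,2} [0-9a-z]{8}\n)[^\n]*\n/\1/g' ip.txt
```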
With GNU sed you can also use the multi-line mode of the s command (m modifier), in which the period does not match a newline:
sed -Ez 's/(Value[0-9]{1,2} [0-9a-z]{8}\n).*\n/\1/gm'
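A small made-up input (lines a, b, c) makes the effect of the multi-line modifier visible. This sketch assumes GNU sed, where the M (or m) flag is understood to stop . from matching a newline, as stated above; check the behaviour on your own sed version:

```shell
# GNU sed only. Made-up three-line input, read as a single -z record.
# Without M, '.' matches newlines too, so .* runs to the last newline.
printf 'a\nb\nc\n' | sed -z 's/a\n.*\n/X/'; echo   # echo restores the consumed newline
# -> X

# With M, '.' does not match a newline, so .* stops at the end of "b".
printf 'a\nb\nc\n' | sed -z 's/a\n.*\n/X/M'
# -> Xc
```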
As others also proposed awk solutions, here is a simple one:
awk 's{s=0;next} /Value[0-9]{1,2} [0-9a-z]{8}$/{s=1} 1'
That is, if variable s is different from 0 (or the empty string), reset it to 0 and skip the current line (next). Else, if the current line matches your regex, set variable s to 1. Finally, print the current line (1).
Answer 3
Score: 2
sed is usually not the best tool for anything that involves processing multiple input lines at a time.
Using any POSIX awk and only storing 1 line in memory at a time:
$ awk '!f; {f=(/^Value[0-9]{1,2} [0-9a-z]{8}$/)}' file
ServerA
Value1 fh824rfz
ServerB
Value3 9fgjzxlo
ServerC
Value10 339fgh0l
Answer 4
Score: 1
Using GNU awk:
$ awk '{where=match($0,"Value[0-9]{1,2} [0-9a-z]{8}"); if (where) {print; getline} else {print}}' file
ServerA
Value1 fh824rfz
ServerB
Value3 9fgjzxlo
ServerC
Value10 339fgh0l
Answer 5
Score: 1
With your shown samples, please try the following awk code. It uses awk's match function, with a regex, to get the matched values.
awk -v RS="" '
{
while(match($0,/Value[0-9]+ [0-9a-z]{8}\n[^\n]*/)){
val=substr($0,RSTART,RLENGTH)
split(val,arr,ORS)
prevLine=substr($0,1,RSTART-1)
gsub(/^\n+|\n+$/,"",prevLine)
print prevLine ORS arr[1]
$0=substr($0,RSTART+RLENGTH)
}
}
' Input_file
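This answer relies on paragraph mode: with RS set to the empty string, records are separated by blank lines, and since the sample has none, the whole file becomes a single record with embedded newlines. That is why the match() regex can contain \n, and why split(val, arr, ORS) works (ORS defaults to a newline). A quick sketch to check this, assuming the sample is in ip.txt:

```shell
# Question's sample input, saved as ip.txt (assumed name).
printf '%s\n' ServerA 'Value1 fh824rfz' 'Plan CustomA' \
  ServerB 'Value3 9fgjzxlo' 'Plan CustomD' \
  ServerC 'Value10 339fgh0l' 'Plan CustomE' > ip.txt

# Paragraph mode: the blank-line-free file arrives as ONE record.
awk -v RS="" '
  { split($0, lines, "\n"); print "line 2 of record " NR ": " lines[2] }
  END { print NR " record(s) in total" }
' ip.txt
# -> line 2 of record 1: Value1 fh824rfz
# -> 1 record(s) in total
```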
Answer 6
Score: 1
Wow, all the professors have gathered here. Can I also suggest a way? 😉
awk --posix '/Value[0-9]{1,2} [0-9a-z]{8}/ { print prev ORS $0 } { prev = $0 }' data.txt
#OR
awk --posix '/Value[0-9]{1,2} [0-9a-z]{8}/ { print prev; print $0 } { prev = $0 }' data.txt
Output:
ServerA
Value1 fh824rfz
ServerB
Value3 9fgjzxlo
ServerC
Value10 339fgh0l
Answer 7
Score: 1
$ grep -v -f <(
grep -P -A 1 "Value[0-9]{1,2} [0-9a-z]{8}" file |
grep -P -v "Value[0-9]{1,2} [0-9a-z]{8}"
) file
ServerA
Value1 fh824rfz
ServerB
Value3 9fgjzxlo
ServerC
Value10 339fgh0l
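A somewhat hardened sketch of the same idea, assuming GNU grep. grep -A 1 prints -- separator lines between non-adjacent groups (dropped here with --no-group-separator), and grep -f normally treats each collected line as a regex that may match anywhere in a line, so -F -x makes it a fixed string that must match a whole line. -E replaces -P since the pattern needs nothing Perl-specific, and a temporary file (hypothetical name next_lines.txt) replaces the bash-only <( ) process substitution. It still shares one limitation with the original: any line identical to a removed next line is removed wherever it occurs.

```shell
# Question's sample input, saved as ip.txt (assumed name).
printf '%s\n' ServerA 'Value1 fh824rfz' 'Plan CustomA' \
  ServerB 'Value3 9fgjzxlo' 'Plan CustomD' \
  ServerC 'Value10 339fgh0l' 'Plan CustomE' > ip.txt

re='Value[0-9]{1,2} [0-9a-z]{8}'

# Collect the line after each match (without the "--" group separators).
grep -E -A 1 --no-group-separator "$re" ip.txt | grep -Ev "$re" > next_lines.txt

# Drop every line that is exactly one of the collected lines.
grep -v -F -x -f next_lines.txt ip.txt
rm -f next_lines.txt
```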
Answer 8
Score: 0
Save $0 in awk, then use a dummy getline to "remove the next line" by leveraging its return code as substr()'s starting index, and use assignment to overwrite the "next line" with the current one:
jot 10 |
mawk '(__ = $_)~/[3-7]/ && $!NF = substr(__, getline)'
3
5
7
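A readable equivalent of the mawk one-liner, as a sketch. In the original, $_ is $0 because the unset variable _ evaluates to 0; getline returns 1 on success, so substr(__, getline) is the whole saved line; and $!NF is $0 for a non-empty line, so the assignment puts the saved line back into $0, and its value then serves as the pattern that prints it. Like the original, it prints only the matching lines. seq 10 is used here as an assumed stand-in for the BSD jot 10:

```shell
# seq 10 prints 1..10, like jot 10 on BSD systems.
seq 10 | awk '
  /[3-7]/ {        # only lines containing a digit 3-7 are considered
    saved = $0     # remember the current line
    getline        # the next line replaces $0 and is never printed
    print saved    # print the remembered line instead
  }
'
# -> 3
# -> 5
# -> 7
```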