Count the number of lines between two pattern matches in a file and delete the lines between them only if the count is more than 4
Question
I have a file that contains text like this:
Input file:
------start---------
first line
second line
third line
fourth line
------end-----------
xyx
pqr
------start---------
first line
second line
third line
fourth line
fifth line
sixth line
------end-----------
I want an output file in which all lines between the patterns "start" and "end" are deleted if there are more than 4 of them. If the count is less than or equal to 4, the lines should be left untouched.
The output should look like the following, with the lines between the two pattern matches deleted only where the count exceeds 4.
Expected output file:
------start---------
first line
second line
third line
fourth line
------end-----------
xyx
pqr
------start---------
------end-----------
I have used this sed command:
sed -i '/start/,/end/{//!d}' filename
It deletes the lines between the pattern matches, but it does so unconditionally: it does not leave the lines alone when the count between the matches is less than or equal to 4.
Answer 1
Score: 3
Using any awk and without reading all of the input into memory at once:
awk '
    inBlock {
        if ( /^-+end-+$/ ) {
            if ( gsub(/\n/,"&",rec) <= 4 ) {
                printf "%s", rec
            }
            inBlock = rec = ""
        }
        else {
            rec = rec $0 ORS
        }
    }
    !inBlock {
        print
    }
    /^-+start-+$/ {
        inBlock = 1
    }
' file
Output:
------start---------
first line
second line
third line
fourth line
------end-----------
xyx
pqr
------start---------
------end-----------
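If the threshold needs to change, the same streaming approach can take it as a parameter. The following is only a minimal sketch and not part of the original answer: the function name trim_blocks and the awk variables max and n are assumptions made for illustration, and it keeps a line counter instead of counting newlines with gsub().

```bash
# Hypothetical wrapper (not from the answer): trim_blocks MAX [FILE...]
# Empties every start/end block that holds more than MAX lines.
trim_blocks() {
    max=$1
    shift
    awk -v max="$max" '
        inBlock {
            if ( /^-+end-+$/ ) {
                # Block finished: emit the buffered lines only if short enough.
                if ( n <= max ) {
                    printf "%s", rec
                }
                inBlock = 0
                rec = ""
                n = 0
            }
            else {
                # Still inside the block: buffer the line and count it.
                rec = rec $0 ORS
                n++
                next
            }
        }
        { print }                        # marker lines and lines outside blocks
        /^-+start-+$/ { inBlock = 1 }
    ' "$@"
}
```

Under these assumptions, `trim_blocks 4 file` should give the same result as the command above, and it reads standard input when no file is named.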
Answer 2
Score: 2
awk \
    -v s1='------start---------' \
    -v s2='------end-----------' \
'
    # accumulate lines between markers
    between { data = data RS $0 }

    # found start marker
    $0==s1 {
        # reset state
        between = NR
        data = $0
    }

    # print anything outside markers
    !between;

    # found end marker
    $0==s2 {
        # print data if at most 6 lines total
        # (including the two marker lines),
        # else just print the markers
        print (NR < between+6) ? data : s1 RS s2

        # reset state
        between = 0
        data = ""
    }
' filename
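The NR arithmetic is easy to get wrong by one, so a quick boundary check can help. The sketch below is an illustration only: make_block and filter_blocks are hypothetical helper names, and filter_blocks simply wraps the program above so that it reads standard input. It feeds in one block with 4 body lines and one with 5, then counts the lines that come out.

```bash
# Hypothetical helper (not from the answer): print a block with N body lines.
make_block() {
    printf '%s\n' '------start---------'
    i=1
    while [ "$i" -le "$1" ]; do
        printf 'line %s\n' "$i"
        i=$((i + 1))
    done
    printf '%s\n' '------end-----------'
}

# The program from this answer, wrapped so that it reads standard input.
filter_blocks() {
    awk \
        -v s1='------start---------' \
        -v s2='------end-----------' \
    '
        between { data = data RS $0 }
        $0==s1  { between = NR; data = $0 }
        !between
        $0==s2  {
            print ((NR < between+6) ? data : s1 RS s2)
            between = 0
            data = ""
        }
    '
}

# Count the output lines for a block of 4 and a block of 5 body lines.
make_block 4 | filter_blocks | awk 'END { print NR }'   # prints 6 (block kept)
make_block 5 | filter_blocks | awk 'END { print NR }'   # prints 2 (markers only)
```

The same make_block input can be piped into any of the other answers to confirm that the cut-off sits between 4 and 5 lines.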
Answer 3
Score: 1
You can use perl instead of sed in a similar manner:
perl -0pe 's/------start---------\n\K(?:(?:(?!------(?:end|start)---------).)+?\n){5,}(?=------end-----------)//g' myfile
Here `perl -0pe` is an alternative to `sed -z`, but it supports PCRE-style regexes, including lookaheads and the match reset `\K`.
The regex itself means:
- `------start---------\n`: the start tag with its newline.
- `\K`: discards the previously matched sequence from the match while staying at the same cursor position. The more widely familiar alternative would be `(?<=------start---------\n)`, but `\K` has better performance.
- `(?:(?:(?!------(?:end|start)---------).)+?\n){5,}`: at least 5 lines that don't contain a start or end tag.
- `(?=------end-----------)`: a lookahead that checks that the matched sequence is followed by the end tag.
Demo of regex here.
Answer 4
Score: 0
Using GNU sed
$ sed -Ezi.bak 's/(-+start-+\n)(([^-\n]*\n){5,})(-+end-+\n)/\1\4/g' input_file
------start---------
first line
second line
third line
fourth line
------end-----------
xyx
pqr
------start---------
------end-----------
Answer 5
Score: 0
Perl has the same range operators as sed and awk, so you can do this in stream processing mode:
perl -ne 'push @l,$_ if /-+start-+/ .. /-+end-+/; print if $#l<0; if (/-+end-+/) { if ($#l<6) { print @l } else { print @l[0,-1] } @l=() }' input_file
Unrolled that looks like this:
# accumulate lines into array @l within range
push @l,$_ if /-+start-+/ .. /-+end-+/;
print if $#l<0;             # print if empty (outside range)
if (/-+end-+/) {            # upon end...
    if ($#l<6) {            # ...if 6 or fewer lines accumulated:
        print @l;           # print entire range
    } else {                # ...if more than 6 lines:
        print @l[0,-1];     # print first and last
    }
    @l=();                  # ...empty @l
}
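The range idea also works in awk for the counting half of the problem alone. The following is a rough sketch rather than part of the answer: count_block_lines is a hypothetical helper name, and it only reports how many lines sit between each pair of markers, which can be handy for checking a file before deleting anything.

```bash
# Hypothetical helper (not from the answer): print the number of lines
# found between each start/end marker pair, one count per block.
count_block_lines() {
    awk '
        /^-+start-+$/, /^-+end-+$/ {
            if ( /^-+start-+$/ ) { n = 0; next }     # opening marker: reset counter
            if ( /^-+end-+$/ )   { print n; next }   # closing marker: report count
            n++                                      # ordinary line inside the block
        }
    ' "$@"
}
```

Run against the input file from the question, this should print 4 for the first block and 6 for the second.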
Answer 6
Score: -1
- NF < 6 is used here since a kept block has 4 standard lines plus the ----end---- line.
- __ is the ON / OFF printing indicator flag.
echo '
------start---------
first line
second line
third line
fourth line
------end-----------
xyx
pqr
------start---------
first line
second line
third line
fourth line
fifth line
sixth line
------end-----------' |
mawk 'BEGIN { _+=_^= ORS = RS = "---" (FS = RS)
_^= ! (___ = _ += _+_) }
(__ = $NF ~ /^[-]+(start|end)[-]*$/) < _ ||
(!__ ? _ : (NF < ___ || $!NF = $NF)^(_ = !_)^_)'
------start---------
first line
second line
third line
fourth line
------end-----------
xyx
pqr
------start---------
------end-----------