Count the number of lines between two pattern matches in a file and delete the lines between them only if the count is more than 4


Question


I have a file that contains text like the following.

Input file:

------start---------
first line
second line
third line 
fourth line
------end-----------
xyx
pqr
------start---------
first line
second line
third line 
fourth line
fifth line
sixth line
------end-----------

I want an output file in which the lines between the "start" and "end" patterns are deleted only if there are more than 4 of them. If the count is 4 or fewer, the block should be left untouched.

The output should look like the following, with the lines between the two markers removed only for the block that contains more than 4 lines.

Expected output file:

------start---------
first line
second line
third line 
fourth line
------end-----------
xyx
pqr
------start---------
------end-----------

I have used this sed command:

sed -i '/start/,/end/{//!d}' filename

It deletes the lines between the pattern matches, but it does so unconditionally: it does not spare the blocks whose line count between the markers is 4 or fewer.
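
Before deleting anything, it can help to see how many lines each block actually holds. The following is a minimal sketch, assuming the markers are whole lines of dashes around "start" and "end" as in the sample; the function name count_block_lines is hypothetical and not part of the question.

```shell
# Hypothetical helper: print the number of lines found between each
# start/end marker pair, one count per block.
# Reads the files given as arguments, or stdin if none are given.
count_block_lines() {
    awk '
        /^-+end-+$/   { if (inBlock) print n; inBlock = 0; next }  # close block, report count
        inBlock       { n++ }                                      # line inside a block
        /^-+start-+$/ { inBlock = 1; n = 0 }                       # open a new block
    ' "$@"
}
```

Run as `count_block_lines filename` on the sample input, this prints 4 and then 6, so only the second block exceeds the limit.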

Answer 1

Score: 3

Using any awk and without reading all of the input into memory at once:

awk '
    inBlock {
        if ( /^-+end-+$/ ) {
            if ( gsub(/\n/,"&",rec) <= 4 ) {
                printf "%s", rec
            }
            inBlock = rec = ""
        }
        else {
            rec = rec $0 ORS
        }
    }
    !inBlock {
        print
    }
    /^-+start-+$/ {
        inBlock = 1
    }
' file


------start---------
first line
second line
third line
fourth line
------end-----------
xyx
pqr
------start---------
------end-----------
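
The same streaming idea can be sketched with the threshold passed in as a parameter and a plain counter instead of counting newlines with gsub. This is only an illustration under assumptions: the function name prune_blocks and the variable names max, n and rec are my own, and the END rule (which keeps the body of a block that never reaches an end marker) is a design choice not present in the answer.

```shell
# Hypothetical wrapper: drop the body of any start/end block that has
# more than $1 lines. Remaining arguments are input files (default: stdin).
prune_blocks() {
    max=$1; shift
    awk -v max="$max" '
        inBlock && /^-+end-+$/ {
            if (n <= max) printf "%s", rec   # short block: keep its body
            inBlock = 0                      # long block: body is dropped
        }
        inBlock { rec = rec $0 ORS; n++; next }         # buffer lines inside a block
        { print }                                       # markers and outside lines
        /^-+start-+$/ { inBlock = 1; rec = ""; n = 0 }  # start buffering
        END { if (inBlock) printf "%s", rec }           # assumption: keep unterminated block
    ' "$@"
}
```

For the sample input, `prune_blocks 4 file` should give the expected output shown in the question. Like the answer above, it buffers at most one block at a time.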

Answer 2

Score: 2

awk \
    -v s1='------start---------' \
    -v s2='------end-----------' \
'
    # accumulate lines between markers
    between { data = data RS $0 }

    # found start marker
    $0==s1 {
        # reset state
        between = NR
        data = $0
    }

    # print anything outside markers
    !between;

    # found end marker
    $0==s2 {
        # print data if the block spans at most 6 lines
        #     (including the two marker lines)
        # else just print the markers
        print ( (NR < between+6) ? data : s1 RS s2 )

        # reset state
        between = 0
        data = ""
    }
' filename

Answer 3

Score: 1


You can use perl instead of sed in a similar manner:

perl -0pe 's/------start---------\n\K(?:(?:(?!------(?:end|start)---------).)+?\n){5,}(?=------end-----------)//g' myfile

Here perl -0pe is an alternative to sed -z, but it supports Perl (PCRE-style) regexes, including lookaheads and the match-reset escape \K.

The regex itself means:

  • ------start---------\n - the start tag followed by a newline,
  • \K discards what has been matched so far from the final match while staying at the same cursor position, so the start tag is not replaced. An alternative is the more widely familiar lookbehind (?<=------start---------\n), but \K has better performance,
  • (?:(?:(?!------(?:end|start)---------).)+?\n){5,} - at least 5 lines that don't contain a start or end tag,
  • (?=------end-----------) - a lookahead that checks that the matched sequence is followed by the end tag.

Demo of regex here.

Answer 4

Score: 0


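Using GNU sed:

$ sed -Ezi.bak 's/(-+start-+\n)(([^-\n]*\n){5,})(-+end-+\n)/\1\4/g' input_file

Because of -i.bak the file is edited in place (the original is saved as input_file.bak); input_file then contains: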
------start---------
first line
second line
third line
fourth line
------end-----------
xyx
pqr
------start---------
------end-----------

Answer 5

Score: 0

Perl has the same range operators as sed and awk, so you can do this in stream processing mode:

perl -ne 'push @l,$_ if /-+start-+/ .. /-+end-+/; print if $#l<0; if (/-+end-+/) { if ($#l<6) { print @l } else { print @l[0,-1] } @l=() }' input_file

Unrolled that looks like this:

# accumulate lines into array @l within range
push @l,$_ if /-+start-+/ .. /-+end-+/;

print if $#l<0;       # print if empty (outside range)

if (/-+end-+/) {      # upon end...
   if ($#l<6) {       # ...if 6 or fewer lines accumulated:
      print @l;       # print entire range
   } else {           # ...if more than 6 lines:
      print @l[0,-1]; # print first and last
   }
   @l=();             # ...empty @l
}

Answer 6

Score: -1

  1. NF < 6 here since it's 4 standard lines + the ----end---- line.

  2. __ is the ON / OFF printing indicator flag.

echo '
------start---------
first line
second line
third line
fourth line
------end-----------
xyx
pqr
------start---------
first line
second line
third line
fourth line
fifth line
sixth line
------end-----------' |

mawk 'BEGIN { _+=_^= ORS = RS = "---" (FS = RS)
              _^= ! (___ = _ += _+_)            }
            (__ = $NF ~ /^[-]+(start|end)[-]*$/) < _ ||
           (!__ ? _ : (NF < ___ || $!NF = $NF)^(_ = !_)^_)'

------start---------
first line
second line
third line 
fourth line
------end-----------
xyx
pqr
------start---------
------end-----------

huangapple
  • Posted on May 29, 2023, at 01:48:47
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76352815.html