AWK: split a file every 50 occurrences of a string

Question

I have a beginner question about awk.

I am using the line below to split a file into c files, using 'MATCH' as my delimiter.

awk 'BEGIN{flag=0} /MATCH/{flag++;next} {print $0 > (flag ".txt")}' file

My file is very long, but it has the form shown below:

MATCH
a
b
c
d
MATCH
a
b

I want the awk line above to split my file every 50 'MATCH' occurrences. The current command creates a new file for each 'MATCH' occurrence. I am sure there is a simple way to achieve this, but I have not figured it out yet. I tried the line below, with no luck.

awk 'BEGIN{flag=0} /MATCH/{flag++ == 50;next} {print $0 > (flag ".txt")}' file

I appreciate the help and guidance.

Answer 1

Score: 3

Untested, using any awk:

awk '
    /MATCH/ && ( ( (++matchCnt) % 50 ) == 1 ) {
        close(out)
        out = (++outCnt) ".txt"
    }
    { print > out }
' file
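
To see the counter logic at work on a small input, here is a minimal sketch of the same command with the block size lifted into a variable. The `n` and `dir` variables, the `demo_mod` directory, and the generated sample input are assumptions made for illustration; they are not part of the answer above. Note that the `% n == 1` test only works when n is at least 2.

```shell
# Hypothetical sample: 7 MATCH blocks with two data lines each.
mkdir -p demo_mod
awk 'BEGIN { for (i = 1; i <= 7; i++) printf "MATCH\n%d.1\n%d.2\n", i, i }' > demo_mod/input

awk -v n=3 -v dir=demo_mod '
    # True on the 1st, (n+1)th, (2n+1)th ... MATCH line: start a new output file.
    /MATCH/ && ( ( (++matchCnt) % n ) == 1 ) {
        close(out)
        out = dir "/" (++outCnt) ".txt"
    }
    # Every line, MATCH lines included, goes to the current output file.
    { print > out }
' demo_mod/input
```

With 7 blocks and n=3, 1.txt and 2.txt each receive three MATCH blocks and 3.txt receives the remaining one. For the question as asked, n would be 50.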

Answer 2

Score: 3

Assumptions:

  • the number of lines in a MATCH block is not known beforehand
  • the number of lines in a MATCH block could vary
  • the MATCH lines are to be copied to the output files

Sample input with 9 MATCH blocks:

$ cat file
MATCH
1.1
1.2
MATCH
2.1
2.2
MATCH
3.1
3.2
MATCH
4.1
4.2
MATCH
5.1
5.2
MATCH
6.1
6.2
MATCH
7.1
7.2
MATCH
8.1
8.2
MATCH
9.1
9.2

One awk idea:

awk -v blkcnt=3 '                             # for OP case set blkcnt=50
BEGIN   { outfile= ++fcnt ".txt" }
/MATCH/ { if (++matchcnt > blkcnt) {
             close(outfile)
             outfile= ++fcnt ".txt"
             matchcnt=1
          }
       #  next                                # uncomment if the "MATCH" lines are *NOT* to be copied to the output files
        }
        { print $0 > outfile }
'  file

For blkcnt=3 this generates:

$ head -40 {1..3}.txt
==> 1.txt <==
MATCH
1.1
1.2
MATCH
2.1
2.2
MATCH
3.1
3.2

==> 2.txt <==
MATCH
4.1
4.2
MATCH
5.1
5.2
MATCH
6.1
6.2

==> 3.txt <==
MATCH
7.1
7.2
MATCH
8.1
8.2
MATCH
9.1
9.2

For blkcnt=4 this generates:

$ head -40 {1..3}.txt
==> 1.txt <==
MATCH
1.1
1.2
MATCH
2.1
2.2
MATCH
3.1
3.2
MATCH
4.1
4.2

==> 2.txt <==
MATCH
5.1
5.2
MATCH
6.1
6.2
MATCH
7.1
7.2
MATCH
8.1
8.2

==> 3.txt <==
MATCH
9.1
9.2
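
For the block size in the question, a quick sanity check might look like the sketch below. It runs the same awk program with blkcnt=50 on a generated input of 120 blocks and counts the MATCH lines per output file. The `dir` variable, the `demo_blk` directory, and the generated input are assumptions added for the demonstration.

```shell
# Hypothetical sample: 120 MATCH blocks with two data lines each.
mkdir -p demo_blk
awk 'BEGIN { for (i = 1; i <= 120; i++) printf "MATCH\n%d.1\n%d.2\n", i, i }' > demo_blk/input

awk -v blkcnt=50 -v dir=demo_blk '
BEGIN   { outfile = dir "/" (++fcnt) ".txt" }
/MATCH/ { if (++matchcnt > blkcnt) {          # block 51, 101, ... starts a new file
             close(outfile)
             outfile = dir "/" (++fcnt) ".txt"
             matchcnt = 1
          }
        }
        { print $0 > outfile }
' demo_blk/input

# Count the MATCH lines in each output file.
grep -c MATCH demo_blk/1.txt demo_blk/2.txt demo_blk/3.txt
```

The counts should come out as 50, 50, and 20, since the last file only receives the leftover blocks.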

Answer 3

Score: 1

If I've understood correctly, the first 50 blocks of a,b,c,d lines should be written to 1.txt, the next 50 to 2.txt, and so on.

This can be achieved by building the filename from the integer value of ((flag-1)/50) and adding 1 to it (assuming you want the file series to begin with 1 and not 0). The subtraction is needed because flag is already 1 while the first block is being read; with plain flag/50, the first file would receive only 49 blocks.

The BEGIN block can be removed, since awk variables that are used numerically before being assigned start out as 0.

Thus the following should achieve the desired output:

awk '/MATCH/{flag++;next} {print $0 >(int((flag-1)/50)+1 ".txt")}' file
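
A quick way to check the grouping is to run the same idea with a small block size. The sketch below uses 3 in place of 50; the `n` and `dir` variables, the `demo_int` directory, and the generated input are assumptions for illustration. The file index is computed from flag - 1 because flag is already 1 while the first block is read, so each file gets exactly n blocks.

```shell
# Hypothetical sample: 7 MATCH blocks with two data lines each.
mkdir -p demo_int
awk 'BEGIN { for (i = 1; i <= 7; i++) printf "MATCH\n%d.1\n%d.2\n", i, i }' > demo_int/input

awk -v n=3 -v dir=demo_int '
    /MATCH/ { flag++; next }     # count the block and drop the MATCH line itself
    {
        # Blocks 1..n map to file 1, blocks n+1..2n to file 2, and so on.
        out = dir "/" (int((flag - 1) / n) + 1) ".txt"
        print $0 > out
    }
' demo_int/input
```

Here 1.txt and 2.txt each hold the data lines of three blocks and 3.txt holds those of the seventh block, with no MATCH lines in any of them. This variant never closes its output files, so on a very long input an awk other than GNU awk may run into its limit on open files.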

Answer 4

Score: 0

While this isn't a complete solution, it shows how to capture each group of rows that belongs to one "MATCH". Once 50 groups have been counted off, they can be printed in one shot, bearing in mind that the trailing "MATCH" has to be trimmed off and saved for the next round.

nice jot 53 | mawk 'NR % 6 != 1 || ($!NF = "MATCH")^_' |
mawk '{ printf(" :: input row(s) = %8u\n ::" \
       " output row # = %8u\n "   \
       "-------------------\n %s%s " \
       "----END-NEW-ROW----\n\n", NF^!!NF, NR, $!(NF = NF), ORS)
}' RS='(^)?MATCH\r?\n' ORS='MATCH\n' FS='\n' OFS='\f'

 :: input row(s) =        1
 :: output row # =        1
 -------------------
 MATCH
 ----END-NEW-ROW----

 :: input row(s) =        6
 :: output row # =        2
 -------------------
 2
  3
   4
    5
     6
      MATCH
 ----END-NEW-ROW----

 :: input row(s) =        6
 :: output row # =        3
 -------------------
 8
  9
   10
     11
       12
         MATCH
 ----END-NEW-ROW----

 :: input row(s) =        6
 :: output row # =        4
 -------------------
 14
   15
     16
       17
         18
           MATCH
 ----END-NEW-ROW----

 :: input row(s) =        6
 :: output row # =        5
 -------------------
 20
   21
     22
       23
         24
           MATCH
 ----END-NEW-ROW----

 :: input row(s) =        6
 :: output row # =        6
 -------------------
 26
   27
     28
       29
         30
           MATCH
 ----END-NEW-ROW----

 :: input row(s) =        6
 :: output row # =        7
 -------------------
 32
   33
     34
       35
         36
           MATCH
 ----END-NEW-ROW----

 :: input row(s) =        6
 :: output row # =        8
 -------------------
 38
   39
     40
       41
         42
           MATCH
 ----END-NEW-ROW----

 :: input row(s) =        6
 :: output row # =        9
 -------------------
 44
   45
     46
       47
         48
           MATCH
 ----END-NEW-ROW----

 :: input row(s) =        5
 :: output row # =       10
 -------------------
 50
   51
     52
       53
         MATCH
 ----END-NEW-ROW----
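
One possible way to finish the idea of collecting groups and writing them out in one shot is sketched below. It is not the answer's method: it uses plain line-by-line awk and a string buffer rather than a regex RS, and the `n` and `dir` variables, the `flush_buf` function, the `demo_buf` directory, and the generated input are all hypothetical names chosen for this example.

```shell
# Hypothetical sample: 7 MATCH blocks with two data lines each.
mkdir -p demo_buf
awk 'BEGIN { for (i = 1; i <= 7; i++) printf "MATCH\n%d.1\n%d.2\n", i, i }' > demo_buf/input

awk -v n=3 -v dir=demo_buf '
    # Write everything buffered so far to the next numbered file, then reset.
    function flush_buf() {
        if (buf == "") return
        out = dir "/" (++fileCnt) ".txt"
        printf "%s", buf > out
        close(out)
        buf = ""
    }
    # The (n+1)th MATCH line closes the current group and opens the next one.
    /^MATCH$/ && (++blkCnt > n) { flush_buf(); blkCnt = 1 }
    { buf = buf $0 ORS }          # buffer every line, MATCH lines included
    END { flush_buf() }           # write the last, possibly shorter, group
' demo_buf/input
```

Each output file is opened, written, and closed once, at the cost of holding up to n blocks in memory at a time.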

huangapple
  • Posted on 2023-02-16 06:58:43
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75466211.html