2023-02-16 06:58:43

AWK split file every 50 occurrences of a string

Question

I have a beginner question about awk.

I am using the line below to split a file into c files, using 'MATCH' as my delimiter.

awk 'BEGIN{flag=0} /MATCH/{flag++;next} {print $0 > (flag ".txt")}' file

My file is very long, but it has the form shown below:

MATCH
a
b
c
d
MATCH
a
b

I want the above awk line to split my file every 50 'MATCH' occurrences. The current command creates a new file for each 'MATCH' occurrence. I am sure there is a simple way to achieve this, but I have not figured it out yet. I have tried the line below with no luck.

awk 'BEGIN{flag=0} /MATCH/{flag++ == 50;next} {print $0 > (flag ".txt")}' file

I appreciate the help and guidance.

Answer 1

Score: 3

Untested, using any awk:

awk '
    /MATCH/ && ( ( (++matchCnt) % 50 ) == 1 ) {
        close(out)
        out = (++outCnt) ".txt"
    }
    { print > out }
' file
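
Since the answer is marked untested, here is a small self-contained check of the same modulo idea. It is only a sketch: the block size is passed in through an illustrative variable n (3 instead of 50, to keep the sample short), and the sample data and the scratch directory are made up for the demonstration.

```shell
# Work in a scratch directory so the numbered output files do not clutter anything.
cd "$(mktemp -d)" || exit 1

# Made-up sample: 7 MATCH blocks with one data line each.
for i in 1 2 3 4 5 6 7; do
    printf 'MATCH\n%s\n' "line$i"
done > file

# n is an illustrative variable; use n=50 for the case in the question.
awk -v n=3 '
    # The 1st, (n+1)th, (2n+1)th, ... MATCH line starts a new output file.
    /MATCH/ && ( ( (++matchCnt) % n ) == 1 ) {
        close(out)
        out = (++outCnt) ".txt"
    }
    { print > out }
' file

# Count the MATCH lines that ended up in each output file.
grep -c MATCH 1.txt 2.txt 3.txt
```

With 7 blocks and n=3 the blocks should be distributed as 3, 3 and 1. Note that the input is assumed to start with a MATCH line; otherwise out is still empty when the first line is printed.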

Answer 2

Score: 3

Assumptions:

the number of lines in a MATCH block is not known beforehand
the number of lines in a MATCH block could vary
the MATCH lines are to be copied to the output files

Sample input with 9 MATCH blocks:

$ cat file
MATCH
1.1
1.2
MATCH
2.1
2.2
MATCH
3.1
3.2
MATCH
4.1
4.2
MATCH
5.1
5.2
MATCH
6.1
6.2
MATCH
7.1
7.2
MATCH
8.1
8.2
MATCH
9.1
9.2

One awk idea:

awk -v blkcnt=3 '                             # for OP case set blkcnt=50
BEGIN   { outfile= ++fcnt ".txt" }
/MATCH/ { if (++matchcnt > blkcnt) {
             close(outfile)
             outfile= ++fcnt ".txt"
             matchcnt=1
          }
       #  next                                # uncomment if the "MATCH" lines are *NOT* to be copied to the output files
        }
        { print $0 > outfile }
'  file

For blkcnt=3 this generates:

$ head -40 {1..3}.txt
==> 1.txt <==
MATCH
1.1
1.2
MATCH
2.1
2.2
MATCH
3.1
3.2

==> 2.txt <==
MATCH
4.1
4.2
MATCH
5.1
5.2
MATCH
6.1
6.2

==> 3.txt <==
MATCH
7.1
7.2
MATCH
8.1
8.2
MATCH
9.1
9.2

For blkcnt=4 this generates:

$ head -40 {1..3}.txt
==> 1.txt <==
MATCH
1.1
1.2
MATCH
2.1
2.2
MATCH
3.1
3.2
MATCH
4.1
4.2

==> 2.txt <==
MATCH
5.1
5.2
MATCH
6.1
6.2
MATCH
7.1
7.2
MATCH
8.1
8.2

==> 3.txt <==
MATCH
9.1
9.2
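
For the blkcnt=50 case from the question, a quick sanity check along the same lines might look like this. It is a sketch only: the 120 generated blocks and the scratch directory are made up for illustration, and the awk program is the one above with the optional next left out.

```shell
# Work in a scratch directory so the numbered output files do not clutter anything.
cd "$(mktemp -d)" || exit 1

# Generate 120 MATCH blocks of two data lines each (made-up sample data).
awk 'BEGIN { for (i = 1; i <= 120; i++) printf "MATCH\n%d.1\n%d.2\n", i, i }' > file

awk -v blkcnt=50 '
BEGIN   { outfile = ++fcnt ".txt" }
/MATCH/ { if (++matchcnt > blkcnt) {     # blocks 51, 101, ... start a new file
             close(outfile)
             outfile = ++fcnt ".txt"
             matchcnt = 1
          }
        }
        { print $0 > outfile }
' file

# Count the MATCH lines per output file.
grep -c MATCH [0-9]*.txt
```

With 120 blocks the counts should come out as 50, 50 and 20, and 2.txt should begin with the 51st block.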

Answer 3

Score: 1

If I've understood correctly, the first 50 blocks of a,b,c,d lines should be written to 1.txt, the next 50 to 2.txt and so on.

This can be achieved by building the filename from the integer value of ((flag-1)/50) and adding 1 to it (assuming you want the file series to begin with 1 and not 0). The counter is shifted by one because flag is already 1 while the first block is being read; with int(flag/50) the 50th block would already land in 2.txt.

The BEGIN block can be removed: an awk variable that has not been assigned a value evaluates to 0 when it is used numerically.

Thus the following should achieve the desired output:

awk '/MATCH/{flag++;next} {print $0 >(int((flag-1)/50)+1 ".txt")}' file
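
To see how the block counter maps to file numbers, the sketch below prints the index for a few values of flag, comparing int(flag/50)+1 with the shifted form int((flag-1)/50)+1. It is illustrative only: the variable names n, blocks and mapping are made up here, and nothing is written to disk.

```shell
# Capture the table in a shell variable, then print it.
mapping=$(awk 'BEGIN {
    n = 50                                   # blocks per output file
    split("1 49 50 51 100 101", blocks, " ") # block numbers to inspect
    for (i = 1; i <= 6; i++) {
        flag = blocks[i] + 0                 # flag = number of MATCH lines seen so far
        printf "block %3d: int(flag/n)+1 = %d, int((flag-1)/n)+1 = %d\n",
               flag, int(flag / n) + 1, int((flag - 1) / n) + 1
    }
}')
printf '%s\n' "$mapping"
```

The two expressions only disagree at multiples of n: for block 50 the first gives file 2 and the second gives file 1.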

Answer 4

Score: 0

While this isn't a complete solution, it shows how to capture each group of rows that belongs to a "MATCH" as a single record. Once you have counted off 50 such records, you can print them out in one shot, bearing in mind that the trailing "MATCH" needs to be trimmed off and saved for the next round.

nice jot 53 | mawk 'NR % 6 != 1 || ($!NF = "MATCH")^_' |
mawk '{ printf(" :: input row(s) = %8u\n ::" \
       " output row # = %8u\n "   \
       "-------------------\n %s%s " \
       "----END-NEW-ROW----\n\n", NF^!!NF, NR, $!(NF = NF), ORS)
}' RS='(^)?MATCH\r?\n' ORS='MATCH\n' FS='\n' OFS='\f'

 :: input row(s) =        1
 :: output row # =        1
 -------------------
 MATCH
 ----END-NEW-ROW----

 :: input row(s) =        6
 :: output row # =        2
 -------------------
 2
  3
   4
    5
     6
      MATCH
 ----END-NEW-ROW----

 :: input row(s) =        6
 :: output row # =        3
 -------------------
 8
  9
   10
     11
       12
         MATCH
 ----END-NEW-ROW----

 :: input row(s) =        6
 :: output row # =        4
 -------------------
 14
   15
     16
       17
         18
           MATCH
 ----END-NEW-ROW----

 :: input row(s) =        6
 :: output row # =        5
 -------------------
 20
   21
     22
       23
         24
           MATCH
 ----END-NEW-ROW----

 :: input row(s) =        6
 :: output row # =        6
 -------------------
 26
   27
     28
       29
         30
           MATCH
 ----END-NEW-ROW----

 :: input row(s) =        6
 :: output row # =        7
 -------------------
 32
   33
     34
       35
         36
           MATCH
 ----END-NEW-ROW----

 :: input row(s) =        6
 :: output row # =        8
 -------------------
 38
   39
     40
       41
         42
           MATCH
 ----END-NEW-ROW----

 :: input row(s) =        6
 :: output row # =        9
 -------------------
 44
   45
     46
       47
         48
           MATCH
 ----END-NEW-ROW----

 :: input row(s) =        5
 :: output row # =       10
 -------------------
 50
   51
     52
       53
         MATCH
 ----END-NEW-ROW----
