AWK split file every 50 occurrences of a string
Question
I have a beginner question about awk.
I am using the line below to split a file into c files, using 'MATCH' as my delimiter.
awk 'BEGIN{flag=0} /MATCH/{flag++;next} {print $0 > (flag ".txt")}' file
My file is very long, but it has the form shown below:
MATCH
a
b
c
d
MATCH
a
b
I want the awk line above to split my file every 50 'MATCH' occurrences. The current command creates a new file for each 'MATCH' occurrence. I am sure there is a simple way to achieve this, but I have not figured it out yet. I have tried the line below with no luck.
awk 'BEGIN{flag=0} /MATCH/{flag++ == 50;next} {print $0 > (flag ".txt")}' file
I appreciate the help and guidance.
Answer 1
Score: 3
Untested, using any awk:
awk '
/MATCH/ && ( ( (++matchCnt) % 50 ) == 1 ) {
close(out)
out = (++outCnt) ".txt"
}
{ print > out }
' file
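One way to try the modulo approach above on a small input. This is only a sketch: the temporary directory, the generated 7-block sample file, and the `n` variable standing in for the hard-coded 50 are assumptions for illustration and are not part of the original answer.

```shell
# Sketch only: sample data, temp directory and n=3 are illustrative assumptions.
cd "$(mktemp -d)" || exit 1

# Build 7 MATCH blocks with two data lines each.
for i in 1 2 3 4 5 6 7; do
    printf 'MATCH\n%s.1\n%s.2\n' "$i" "$i"
done > file

# Switch to a new output file on the 1st, 4th, 7th, ... MATCH line.
awk -v n=3 '
/MATCH/ && ( ( (++matchCnt) % n ) == 1 ) {
    close(out)                  # finish the previous file (harmless on the first pass)
    out = (++outCnt) ".txt"
}
{ print > out }                 # MATCH lines are copied too
' file

wc -l 1.txt 2.txt 3.txt
```

With this input, 1.txt and 2.txt should each hold three blocks and 3.txt the remaining one. Note that the `== 1` test assumes n is at least 2, since anything modulo 1 is always 0, and that the input starts with a MATCH line so that `out` is set before the first print.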
Answer 2
Score: 3
Assumptions:
- the number of lines in a MATCH block is not known beforehand
- the number of lines in a MATCH block could vary
- the MATCH lines are to be copied to the output files
Sample input with 9 MATCH blocks:
$ cat file
MATCH
1.1
1.2
MATCH
2.1
2.2
MATCH
3.1
3.2
MATCH
4.1
4.2
MATCH
5.1
5.2
MATCH
6.1
6.2
MATCH
7.1
7.2
MATCH
8.1
8.2
MATCH
9.1
9.2
One awk idea:
awk -v blkcnt=3 '                        # for OP case set blkcnt=50
BEGIN   { outfile= ++fcnt ".txt" }
/MATCH/ { if (++matchcnt > blkcnt) {
              close(outfile)
              outfile= ++fcnt ".txt"
              matchcnt=1
          }
          # next                         # uncomment if the "MATCH" lines are *NOT* to be copied to the output files
        }
        { print $0 > outfile }
' file
For blkcnt=3 this generates:
$ head -40 {1..3}.txt
==> 1.txt <==
MATCH
1.1
1.2
MATCH
2.1
2.2
MATCH
3.1
3.2
==> 2.txt <==
MATCH
4.1
4.2
MATCH
5.1
5.2
MATCH
6.1
6.2
==> 3.txt <==
MATCH
7.1
7.2
MATCH
8.1
8.2
MATCH
9.1
9.2
For blkcnt=4 this generates:
$ head -40 {1..3}.txt
==> 1.txt <==
MATCH
1.1
1.2
MATCH
2.1
2.2
MATCH
3.1
3.2
MATCH
4.1
4.2
==> 2.txt <==
MATCH
5.1
5.2
MATCH
6.1
6.2
MATCH
7.1
7.2
MATCH
8.1
8.2
==> 3.txt <==
MATCH
9.1
9.2
Answer 3
Score: 1
If I've understood correctly, the first 50 blocks of a,b,c,d lines should be written to 1.txt, the next 50 to 2.txt and so on.
This can be achieved by building the filename from the integer value of ((flag-1)/50) and adding 1 to it (assuming you want the file series to begin with 1 and not 0). Subtracting 1 first matters because flag is already 1 for the first block; without it, 1.txt would receive only 49 blocks.
The BEGIN block can be removed because an awk variable that has not been assigned a value evaluates to 0 when it is used numerically.
Thus the following should achieve the desired output:
awk '/MATCH/{flag++;next} {print $0 >(int((flag-1)/50)+1 ".txt")}' file
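A small sketch of the same filename arithmetic that can be run on a toy input. The temporary directory, the generated 7-block sample, the `n` variable in place of 50, and the `name`/`prev` bookkeeping are all assumptions added for illustration. This variant subtracts 1 from the counter before dividing so that the first file receives a full n blocks, and it closes each finished file, which may help on long inputs where some awk implementations limit the number of simultaneously open files.

```shell
# Sketch only: sample data, temp directory, n=3, name and prev are illustrative assumptions.
cd "$(mktemp -d)" || exit 1

# Build 7 MATCH blocks with two data lines each.
for i in 1 2 3 4 5 6 7; do
    printf 'MATCH\n%s.1\n%s.2\n' "$i" "$i"
done > file

awk -v n=3 '
/MATCH/ { flag++; next }                    # count blocks, drop the MATCH line itself
{
    name = (int((flag - 1) / n) + 1) ".txt" # blocks 1..n -> 1.txt, n+1..2n -> 2.txt, ...
    if (name != prev) {                     # moved on to a new output file
        if (prev != "") close(prev)         # release the finished one
        prev = name
    }
    print > name
}
' file

wc -l 1.txt 2.txt 3.txt
```

Here blocks 1 to 3 should land in 1.txt, blocks 4 to 6 in 2.txt, and block 7 in 3.txt, with the MATCH lines themselves left out.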
Answer 4
Score: 0
While this isn't a complete solution, it does showcase how to capture each group of rows together with its "MATCH" line. Once you have counted off 50 groups, you can print them out in one shot, bearing in mind that the trailing "MATCH" needs to be trimmed off and saved for the next round.
nice jot 53 | mawk 'NR % 6 != 1 || ($!NF = "MATCH")^_' |
> mawk '{ printf(" :: input row(s) = %8u\n ::"
> " output row # = %8u\n "
> "-------------------\n %s%s "
> "----END-NEW-ROW----\n\n", NF^!!NF, NR, $!(NF = NF), ORS)
>
> }' RS='(^)?MATCH\r?\n' ORS='MATCH\n' FS='\n' OFS='\f'
:: input row(s) = 1
:: output row # = 1
-------------------
MATCH
----END-NEW-ROW----
:: input row(s) = 6
:: output row # = 2
-------------------
2
3
4
5
6
MATCH
----END-NEW-ROW----
:: input row(s) = 6
:: output row # = 3
-------------------
8
9
10
11
12
MATCH
----END-NEW-ROW----
:: input row(s) = 6
:: output row # = 4
-------------------
14
15
16
17
18
MATCH
----END-NEW-ROW----
:: input row(s) = 6
:: output row # = 5
-------------------
20
21
22
23
24
MATCH
----END-NEW-ROW----
:: input row(s) = 6
:: output row # = 6
-------------------
26
27
28
29
30
MATCH
----END-NEW-ROW----
:: input row(s) = 6
:: output row # = 7
-------------------
32
33
34
35
36
MATCH
----END-NEW-ROW----
:: input row(s) = 6
:: output row # = 8
-------------------
38
39
40
41
42
MATCH
----END-NEW-ROW----
:: input row(s) = 6
:: output row # = 9
-------------------
44
45
46
47
48
MATCH
----END-NEW-ROW----
:: input row(s) = 5
:: output row # = 10
-------------------
50
51
52
53
MATCH
----END-NEW-ROW----