Extract n characters after a word in Linux, including across a newline
Question
I need to cut n characters after a specific word from a text file in Linux. The tricky part is that the n characters are spread across two lines. Almost all of the solutions given for similar scenarios only perform this kind of string extraction within a single line. For example, I have an entry like this in a text file:
I. 2023/06/02 17:57:58. Connection to server 'xxx' as user 'xxx' has been faded out (closed).
I. 2023/06/02 18:01:58. ...... connected to server 'xxx' as user 'xxx'.
I. 2023/06/02 18:30:02. Connection to server 'xxx' as user 'xxx' has been faded out (closed).
E. 2023/06/02 18:38:01. ERROR #9027 DIST(103 xxx.xxx) - /generic/gtr/tdext.c(719)
A begin with transaction id = 'c7f0f1' @ '00000000' P3 '0000000000dd' c '87' ^eWH '0000' WH 'dd' c '87' ^eW>T '0000000000' g '00000000' and local_qid = 0000000000000000000000000000
000000000000000000000000000000721fda00360005 was seen from site xxx.xxwhile a transaction was still active.
From this I would like to extract only the entries after "local_qid", i.e.
0000000000000000000000000000
000000000000000000000000000000721fda00360005
and display them on a single line. Extracting this into a new variable, or into a single line of a text file, is also fine.
Could someone please shed some light on whether this is possible?
P.S.: It is also possible that the whole set of numbers after local_qid will be on the same line.
Thanks in advance to all the experts in the group!!
I have tried sed, awk and grep, and almost all of them only work up to the end of the line where they find the match for the given word (local_qid). The output is either 000000000000000000000000000 or, if I use a sed/awk command to exclude local_qid, 000000000000000000000000000000721fda00360005.
Answer 1
Score: 1
Here's a simple awk solution:
awk '
/local_qid/{gsub(/.*local_qid = /,"");id=$0;next}
id!=""{print id,$1;id=""}
'
It first finds a line with "local_qid", extracts the id and stores it in a variable; then it proceeds to the next line. If the id variable contains a value, the id and the first field of the line are printed, and then the id variable is reset.
The second condition is only true on the line immediately following a "local_qid" line.
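For comparison, the same two-step logic can be sketched in Python with one assumed extension. As written, the awk script stores everything after "local_qid = ", so an id that sits wholly on one line would carry the trailing text with it. The sketch below keeps only the first whitespace-delimited token, and appends the next line's first field only when the id was the last field on its line. The name `extract_qids` and the lowercase-hex check are illustrative assumptions, not part of the original answer, and the two halves are joined without a separator.

```python
import re

MARKER = "local_qid = "
HEX = re.compile(r"[0-9a-f]+")  # assumption: ids consist of lowercase hex digits


def extract_qids(lines):
    """Yield each local_qid, joining a value that wraps onto the next line."""
    pending = None  # first part of an id that was the last field on its line
    for raw in lines:
        line = raw.rstrip("\n")
        if pending is not None:
            fields = line.split()
            first, pending = pending, None
            if fields and HEX.fullmatch(fields[0]):
                # The next line starts with more hex digits: treat it as
                # the continuation of the id.
                yield first + fields[0]
                continue
            yield first  # nothing to append; fall through and scan this line
        pos = line.find(MARKER)
        if pos == -1:
            continue
        tail = line[pos + len(MARKER):].split()
        if not tail:
            continue  # marker with no value on the same line: not handled here
        if len(tail) == 1:
            pending = tail[0]  # id ends the line, so it may continue below
        else:
            yield tail[0]  # id is followed by more text on the same line
    if pending is not None:
        yield pending
```

Passing an open file object (for example `extract_qids(open("data"))`) iterates line by line, much as awk does. One caveat of this heuristic: an id that ends its line and is followed by an unrelated line whose first word happens to be all hex characters would be joined incorrectly.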
Answer 2
Score: 1
Solution using TXR.
$ txr data.txr data
local_qid = 0000000000000000000000000000000000000000000000000000000000721fda00360005
local_qid = 0000000000000000000000000000000000000000000000000000000000721fda00360006
I augmented the data to have a wrapped and unwrapped instance of the line:
$ cat data
I. 2023/06/02 17:57:58. Connection to server 'xxx' as user 'xxx' has been faded out (closed).
I. 2023/06/02 18:01:58. ...... connected to server 'xxx' as user 'xxx'.
I. 2023/06/02 18:30:02. Connection to server 'xxx' as user 'xxx' has been faded out (closed).
E. 2023/06/02 18:38:01. ERROR #9027 DIST(103 xxx.xxx) - /generic/gtr/tdext.c(719)
A begin with transaction id = 'c7f0f1' @ '00000000' P3 '0000000000dd' c '87' ^eWH '0000' WH 'dd' c '87' ^eW>T '0000000000' g '00000000' and local_qid = 0000000000000000000000000000
000000000000000000000000000000721fda00360005 was seen from site xxx.xxwhile a transaction was still active.
I. 2023/06/03 17:57:58. Connection to server 'xxx' as user 'xxx' has been faded out (closed).
I. 2023/06/03 18:01:58. ...... connected to server 'xxx' as user 'xxx'.
I. 2023/06/03 18:30:02. Connection to server 'xxx' as user 'xxx' has been faded out (closed).
E. 2023/06/03 18:38:01. ERROR #9027 DIST(103 xxx.xxx) - /generic/gtr/tdext.c(719)
A begin with transaction id = 'c7f0f2' @ '00000000' P3 '0000000000dd' c '87' ^eWH '0000' WH 'dd' c '87' ^eW>T '0000000000' g '00000000' and local_qid = 0000000000000000000000000000000000000000000000000000000000721fda00360006 was seen from site xxx.xxwhile a transaction was still active.
The code uses the freeform directive with a two-line window. freeform helps to reduce the number of cases that line-oriented TXR matching code has to deal with, in situations where there is arbitrary line breaking and the like:
@(repeat)
@ (freeform "" 2)
A begin with @(skip) local_qid = @qid was seen @(skip)
@ (do (put-line `local_qid = @qid`))
@(end)
@(freeform "" 2) says, "match the following pattern(s) using an alternate version of the input stream in which two lines are combined into one, using nothing as the separator". After that we just match for the desired line with the local_qid as if it were one unwrapped line.
freeform has no permanent effect on the shape of the input; the alteration is only seen by pattern code under its scope.
freeform uses a lazy string data structure, so it can be used without a numeric limit argument in situations in which the number of lines that need unwrapping is not easily known in advance. (In those situations you have to be careful not to force an instantiation of the entire remainder of a large file as a single string: only match what is needed.) freeform has the property that material from the combined line that was not consumed by pattern matching magically turns back into individual lines again, so multi-line logic can continue in the same scope.
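The two-line window idea is not specific to TXR. As a rough analogue, and assuming an id never wraps across more than two lines, the Python sketch below joins each line with its successor using an empty separator and matches one regular expression against the joined text. The function name `find_qids` and the pattern are illustrative assumptions. The sketch does not reproduce freeform's lazy strings or its ability to hand unconsumed text back as separate lines.

```python
import re

# Same shape as the TXR pattern: "local_qid = <id> was seen".
# Assumption: ids consist of lowercase hex digits.
PATTERN = re.compile(r"local_qid = ([0-9a-f]+) was seen")


def find_qids(lines, window=2):
    """Return every local_qid, looking at `window` joined lines at a time."""
    lines = [ln.rstrip("\n") for ln in lines]
    qids = []
    for i, line in enumerate(lines):
        # Combine the current line with the following one(s), using
        # nothing as the separator.
        joined = "".join(lines[i:i + window])
        m = PATTERN.search(joined)
        # Accept only matches that start on line i, so an entry that
        # begins on the following line is not reported twice.
        if m and m.start() < len(line):
            qids.append(m.group(1))
    return qids
```

Called on the lines of the augmented data file, this would be expected to return the wrapped and the unwrapped id in file order. The `window` parameter plays the role of the numeric argument to freeform.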
Answer 3
Score: 0
One potential option using awk:
cat file.txt
I. 2023/06/02 17:57:58. Connection to server 'xxx' as user 'xxx' has been faded out (closed).
I. 2023/06/02 18:01:58. ...... connected to server 'xxx' as user 'xxx'.
I. 2023/06/02 18:30:02. Connection to server 'xxx' as user 'xxx' has been faded out (closed).
E. 2023/06/02 18:38:01. ERROR #9027 DIST(103 xxx.xxx) - /generic/gtr/tdext.c(719)
A begin with transaction id = 'c7f0f1' @ '00000000' P3 '0000000000dd' c '87' ^eWH '0000' WH 'dd' c '87' ^eW>T '0000000000' g '00000000' and local_qid = 0000000000000000000000000000
000000000000000000000000000000721fda00360005 was seen from site xxx.xxwhile a transaction was still active.
awk 'BEGIN{RS="\n\n"} {for (i=1; i<=NF; i++) {if ($i == "local_qid") {print $(i + 2) ($(i + 3) ~ /[0-9]{1,}/ ? $(i + 3) : "")}}}' file.txt
0000000000000000000000000000000000000000000000000000000000721fda00360005
This changes the record separator (RS) from a single newline ("\n", i.e. read in and process each line one by one) to two newlines (i.e. read everything up to the next "\n\n" as one record). You can then locate the field "local_qid" and print the value two fields after it (skipping the "="; this is the actual local_qid), and conditionally append the field after that if it contains at least one digit (i.e. if the local_qid extends onto the next line).
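One portability note: POSIX leaves the behavior of a multi-character RS unspecified, although gawk and mawk, for example, accept it. Where that matters, the same "read several lines as one record" idea can be sketched in Python by matching one regular expression against the whole text. The name `qids_from_text` and the lowercase-hex character class are illustrative assumptions. Unlike the awk test, which accepts any following field that contains a digit, this pattern requires the continuation to begin the next line and consist of hex characters.

```python
import re

# "local_qid = <hex>", optionally followed by a line break and more hex
# digits at the start of the next line. Assumption: ids are lowercase hex.
QID = re.compile(r"local_qid = ([0-9a-f]+)(?:[ \t]*\n([0-9a-f]+))?")


def qids_from_text(text):
    """Return every local_qid in `text`, rejoining ids split by a newline."""
    # findall yields (first, second) tuples; `second` is "" when the id
    # did not continue onto the next line.
    return [first + second for first, second in QID.findall(text)]
```

A typical call would read the file in one go, for example `qids_from_text(open("file.txt").read())`. That is fine for log files of moderate size but loads the entire file into memory.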