How to split a line on the " " delimiter while excluding strings enclosed in single quotes?

Question

This is my first post ever, so please forgive me if I missed any details.

PROBLEM STATEMENT:
I have a bunch of lines like this one in a file. The fields are separated by spaces.

'Temp.200.200B.Y2K & K-102 & P-503B.SP' (tp9012ga-bt102-734b-pqm4-kjk94kj10), PASSED, 2023-02-12T06:39:48Z, 2023-02-12T07:25:48.044Z, 1440] took 99ms including network delay.

I would like to keep what's inside the single quotes intact and split the rest into fields on the " " delimiter. The desired output is below.

'Temp.200.200B.Y2K & K-102 & P-503B.SP' (tp9012ga-bt102-734b-pqm4-kjk94kj10), 2023-02-12T06:39:48Z, 2023-02-12T07:25:48.044Z, 99

Keep in mind that the characters inside the single quotes vary widely, but they are always enclosed in single quotes.

I have tried cut with a space delimiter, but it also splits on the spaces inside the single-quoted string.
cut -d' ' -f1-6

Also, as you can see from my desired output, I want to drop some fields and some characters, such as the 'ms' in 99ms.
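
To illustrate the problem, here is a small sketch (the variable names line and first6 exist only for this illustration) showing that cut with a space delimiter spends five of the six requested fields on the quoted string alone:

```shell
# Sample line from the problem statement.
line="'Temp.200.200B.Y2K & K-102 & P-503B.SP' (tp9012ga-bt102-734b-pqm4-kjk94kj10), PASSED, 2023-02-12T06:39:48Z, 2023-02-12T07:25:48.044Z, 1440] took 99ms including network delay."

# Split on single spaces and keep fields 1-6.
first6=$(printf '%s\n' "$line" | cut -d' ' -f1-6)
printf '%s\n' "$first6"
# prints: 'Temp.200.200B.Y2K & K-102 & P-503B.SP' (tp9012ga-bt102-734b-pqm4-kjk94kj10),
```

Fields 1 to 5 are the pieces of the quoted string, so PASSED and the timestamps are never reached.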

Answer 1

Score: 1

> How to separate a line with " " delimiter but, excluding string
> encapsulated in the single quotes?

I would use GNU AWK for this task. Consider the following simple example, where file.txt contains

fields without quotes
'quoted field' 'another quoted field' 'yet another field'
mixed 'quoted field' unquoted

then

awk 'BEGIN{FPAT="\047[^\047]*\047|[^ ]*"}{print "1st field is",$1; print "2nd field is",$2; print "3rd field is",$3}' file.txt

gives output

1st field is fields
2nd field is without
3rd field is quotes
1st field is 'quoted field'
2nd field is 'another quoted field'
3rd field is 'yet another field'
1st field is mixed
2nd field is 'quoted field'
3rd field is unquoted

Explanation: I use FPAT to tell GNU AWK what constitutes a field, namely a single quote (since ' terminates the shell-quoted awk program, I write it as \047, the octal ASCII code of that character), followed by zero or more non-quote characters, followed by a single quote, OR (|) zero or more non-space characters. Disclaimer: this solution assumes the ' characters are perfectly balanced and that a quoted field never contains a ' that does not terminate it.

(tested in GNU Awk 5.0.1)
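
As a rough sketch of how the same field definition could be applied to the line from the question, the following uses a match() loop rather than FPAT, so that it also runs on awks that lack FPAT (with GNU AWK, FPAT itself does the splitting). The field numbers 1, 2, 4, 5 and 8 are read off the desired output in the question. The names line, out, q, re, rest and f are made up for this illustration, and + is used instead of * so that no empty fields are produced.

```shell
# Sample line from the question.
line="'Temp.200.200B.Y2K & K-102 & P-503B.SP' (tp9012ga-bt102-734b-pqm4-kjk94kj10), PASSED, 2023-02-12T06:39:48Z, 2023-02-12T07:25:48.044Z, 1440] took 99ms including network delay."

# q holds a single quote so the awk program itself needs none.
out=$(printf '%s\n' "$line" | awk -v q="'" '
{
    # A field is a quoted string or a run of non-space characters.
    re = q "[^" q "]*" q "|[^ ]+"
    n = 0
    rest = $0
    # Repeatedly pull the next field off the front of the remaining text.
    while (match(rest, re)) {
        f[++n] = substr(rest, RSTART, RLENGTH)
        rest = substr(rest, RSTART + RLENGTH)
    }
    sub(/ms$/, "", f[8])          # 99ms -> 99
    print f[1], f[2], f[4], f[5], f[8]
}')
printf '%s\n' "$out"
# prints: 'Temp.200.200B.Y2K & K-102 & P-503B.SP' (tp9012ga-bt102-734b-pqm4-kjk94kj10), 2023-02-12T06:39:48Z, 2023-02-12T07:25:48.044Z, 99
```

The same disclaimer applies: quotes must be balanced, and the field numbers only hold if every line has the layout of the sample.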

Answer 2

Score: 1

This might work for you (GNU sed):

sed -E 's/'\''[^'\'']*'\''|\S+/&\n/g
        s/.*/echo "&"|sed -n "1,2p;4,5p;8s#ms##p"/e
        s/\n//g' file

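Insert a newline after each field (a single-quoted string or a run of non-spaces), i.e. in front of each space delimiter.

Using the e (evaluate) flag of the substitution command, run a second invocation of sed that treats each field as a line.

In that second invocation, print or amend the wanted lines (fields): fields 1, 2, 4 and 5 are printed as they are, and field 8 is printed with ms removed.

Finally, remove the inserted newlines.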
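
The same one-field-per-line idea can also be sketched without sed's e flag. This is a different pipeline, not the command above: it assumes a grep that supports -o (GNU, BSD and BusyBox grep do), and the variable names line and out are illustrative.

```shell
# Sample line from the question.
line="'Temp.200.200B.Y2K & K-102 & P-503B.SP' (tp9012ga-bt102-734b-pqm4-kjk94kj10), PASSED, 2023-02-12T06:39:48Z, 2023-02-12T07:25:48.044Z, 1440] took 99ms including network delay."

# 1. grep -o prints each field (quoted string or run of non-spaces) on its own line.
# 2. sed keeps lines 1, 2, 4, 5 and prints line 8 with the trailing ms removed.
# 3. paste joins the surviving lines back together with single spaces.
out=$(printf '%s\n' "$line" |
    grep -oE "'[^']*'|[^ ]+" |
    sed -n '1,2p;4,5p;8s/ms$//p' |
    paste -sd' ' -)
printf '%s\n' "$out"
# prints: 'Temp.200.200B.Y2K & K-102 & P-503B.SP' (tp9012ga-bt102-734b-pqm4-kjk94kj10), 2023-02-12T06:39:48Z, 2023-02-12T07:25:48.044Z, 99
```

Because grep leaves the delimiters out of each field, the fields are rejoined with paste instead of deleting the newlines.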

Answer 3

Score: 0

Looking at the problem statement and the desired output, you may want to use , as the delimiter, with a combination of awk and sed.

I will simply echo the string from your PROBLEM STATEMENT here to show how it can be done.
I am assuming every line in your file has the same format (the characters inside the quotes can vary freely, as long as they do not include a ,).

echo "'Temp.200.200B.Y2K & K-102 & P-503B.SP' (tp9012ga-bt102-734b-pqm4-kjk94kj10), PASSED, 2023-02-12T06:39:48Z, 2023-02-12T07:25:48.044Z, 1440] took 99ms including network delay." | awk -F "," '{print $1,","$3","$4","$5}' | sed -e 's/ms .*//g' -e 's/[0-9]*] took //g'

The Output:

'Temp.200.200B.Y2K & K-102 & P-503B.SP' (tp9012ga-bt102-734b-pqm4-kjk94kj10) , 2023-02-12T06:39:48Z, 2023-02-12T07:25:48.044Z, 99

EDIT:
@Ed Morton - I tried your approach and you are right. It can be done using awk only as well. The command is given below.

echo "'Temp.200.200B.Y2K & K-102 & P-503B.SP' (tp9012ga-bt102-734b-pqm4-kjk94kj10), PASSED, 2023-02-12T06:39:48Z, 2023-02-12T07:25:48.044Z, 1440] took 99ms including network delay." | awk -F "," '{ gsub("[0-9]*] took ","",$5); gsub("ms .*","",$5); print $1,","$3","$4","$5}' 
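
Both commands above leave a stray space before the first comma, because print $1,"," puts the output field separator between $1 and the comma. A minimal variant sketch that joins the pieces by plain concatenation, and so reproduces the desired output exactly, could look like this (same assumption as above: no , inside the quoted string; the variable names line and out are illustrative):

```shell
# Sample line from the question.
line="'Temp.200.200B.Y2K & K-102 & P-503B.SP' (tp9012ga-bt102-734b-pqm4-kjk94kj10), PASSED, 2023-02-12T06:39:48Z, 2023-02-12T07:25:48.044Z, 1440] took 99ms including network delay."

out=$(printf '%s\n' "$line" | awk -F, '{
    # $5 is " 1440] took 99ms including network delay."
    sub(/^.*took /, "", $5)   # drop everything up to and including "took "
    sub(/ms.*$/, "", $5)      # drop "ms" and the rest of the line
    # $3 and $4 keep their leading space, so only $5 needs one added.
    print $1 "," $3 "," $4 ", " $5
}')
printf '%s\n' "$out"
# prints: 'Temp.200.200B.Y2K & K-102 & P-503B.SP' (tp9012ga-bt102-734b-pqm4-kjk94kj10), 2023-02-12T06:39:48Z, 2023-02-12T07:25:48.044Z, 99
```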

huangapple
  • Posted on February 19, 2023 at 14:01:24
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75498295.html