Remove preceding duplicate numbers in a file - bash
Question
In the text file below "BEFORE FILE", how would I remove the duplicate numbers to make it look like the "AFTER FILE" below? The "_PRODxxxx," where the x's are the numbers, will stay in that format.
BEFORE FILE
NET_SalesD_PROD1111,mexico
NET_Sales4_PROD22,newjersy
NET_SalesG_PROD333,bull
AFTER FILE
NET_SalesD_PROD1,mexico
NET_Sales4_PROD2,newjersy
NET_SalesG_PROD3,bull
I have tried using sed and a regex capture group like "PROD[1-9]{2,4}" but cannot get it to work.
Answer 1
Score: 5
Use a capture group to capture the first digit, and a back-reference to match repetitions of it. Then use the same back-reference in the replacement to produce just one of it.
sed -E 's/PROD([1-9])\1+,/PROD\1,/'
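As a quick check, here is a minimal sketch that feeds the sample lines from the question through a back-reference substitution of this kind. The function name `dedupe_prod` is made up for illustration, and back-references inside `-E` patterns are assumed to be available (GNU sed supports them; POSIX ERE does not guarantee them).

```shell
# dedupe_prod is a hypothetical helper name, not part of the original answer.
# ([1-9]) captures one digit, \1+ matches one or more further copies of it,
# and the replacement writes that digit back exactly once.
dedupe_prod() {
  sed -E 's/PROD([1-9])\1+,/PROD\1,/' "$@"
}

# Sample input taken from the BEFORE FILE in the question.
printf '%s\n' \
  'NET_SalesD_PROD1111,mexico' \
  'NET_Sales4_PROD22,newjersy' \
  'NET_SalesG_PROD333,bull' | dedupe_prod
```

A line that already has a single digit, such as `PROD1,`, does not match the pattern and passes through unchanged, which is the desired result anyway.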
Answer 2
Score: 2
1st solution: If you are OK with Perl, you can use a regex with capture groups, combining a greedy match and a lazy match with a back-reference to the captured digit, to get the required output.
perl -pe 's|^(.*_)(.*?)(\d)\3*(,.*)$|$1$2$3$4|' Input_file
2nd solution: A simpler substitution in Perl: a capture group plus a back-reference finds the repeated digit, and the match is replaced with that single digit followed by a comma.
perl -pe 's|([0-9])\1*,|$1,|' Input_file
Answer 3
Score: 0
Assumptions:
- all lines contain the string _PROD[0-9]+,
- we (effectively) want to keep the first digit that comes after _PROD
One sed approach:
$ sed -E 's/(_PROD[0-9])[0-9]*/\1/' x
NET_SalesD_PROD1,mexico
NET_Sales4_PROD2,newjersy
NET_SalesG_PROD3,bull
Where:
- (_PROD[0-9]) - (first) capture group matches on the string _PROD<single_digit>, followed by ...
- [0-9]* - zero or more digits
- \1 - replace the match with the (first) capture group
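The "keep the first digit" assumption matters when the digits after _PROD are not all the same. The following sketch compares this approach with a back-reference pattern on a made-up input line (`NET_SalesX_PROD1223,demo` is hypothetical and does not come from the question's data).

```shell
# Hypothetical input whose digits after _PROD are not all identical.
line='NET_SalesX_PROD1223,demo'

# Keep-first-digit approach: drops every digit after the first one.
keep_first=$(printf '%s\n' "$line" | sed -E 's/(_PROD[0-9])[0-9]*/\1/')

# Back-reference approach: only collapses a run of one repeated digit
# directly before the comma, so this line is left untouched.
collapse_run=$(printf '%s\n' "$line" | sed -E 's/PROD([1-9])\1+,/PROD\1,/')

printf 'keep first digit: %s\n' "$keep_first"
printf 'collapse run:     %s\n' "$collapse_run"
```

On the question's sample data both commands give the same result; they only diverge on mixed digits like the line above, so the choice depends on what such lines should become.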
Answer 4
Score: 0
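A long way around, if you want to do it with awk:
awk -v c="PROD" '{
    # split the line around "PROD": h1[1] is the prefix, h1[2] is the rest
    split($1, h1, c)
    # split the rest at the comma: h2[1] holds the digits, h2[2] the trailing field
    split(h1[2], h2, ",")
    # rebuild the line, keeping only the first digit
    print h1[1] c substr(h2[1], 1, 1) "," h2[2]
}'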