Remove preceding duplicate numbers in a file - bash

June 8, 2023 06:25:53

Question

In the text file below "BEFORE FILE", how would I remove the duplicate numbers to make it look like the "AFTER FILE" below? The "_PRODxxxx," where the x's are the numbers, will stay in that format.

BEFORE FILE

NET_SalesD_PROD1111,mexico
NET_Sales4_PROD22,newjersy
NET_SalesG_PROD333,bull

AFTER FILE

NET_SalesD_PROD1,mexico
NET_Sales4_PROD2,newjersy
NET_SalesG_PROD3,bull

I have tried using sed and a regex capture group like "PROD[1-9]{2,4}" but cannot get it to work.

Answer 1

Score: 5

Use a capture group to capture the first digit, and a back-reference to match repetitions of it. Then use the same back-reference in the replacement to produce just one of it.

sed -E 's/PROD([1-9])\1+,/PROD\1,/'
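As a quick check, here is a minimal sketch that runs this substitution, written with `\1` as the back-reference, over the sample lines from the question. The function name `collapse_prod` is invented for this illustration, and the sketch assumes a sed that accepts `-E` together with back-references (GNU sed and BSD sed both do).

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the substitution; reads stdin, writes stdout.
collapse_prod() {
    # ([1-9]) captures the first digit, \1+ matches one or more repeats of
    # that same digit up to the comma, and the replacement keeps one copy.
    sed -E 's/PROD([1-9])\1+,/PROD\1,/'
}

# Feed the BEFORE FILE lines from the question through the function.
collapse_prod <<'EOF'
NET_SalesD_PROD1111,mexico
NET_Sales4_PROD22,newjersy
NET_SalesG_PROD333,bull
EOF
```

Because `\1+` only matches repeats of the captured digit, a value with mixed digits such as `PROD1222,` would be left unchanged by this pattern.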

Answer 2

Score: 2

1st solution: If you are OK with using Perl, a regex with capture groups can do this, using a greedy match followed by a lazy match to produce the required output.

perl -pe 's|^(.*_)(.*?)(\d)\3*(,.*)$|$1$2$3$4|'  Input_file

2nd solution: A simple substitution in Perl, using a capture group to find a repeated digit and replace the whole run with a single copy of it followed by a ,.

perl -pe 's|([0-9])\1*,|$1,|'  Input_file

Answer 3

Score: 0

Assumptions:

all lines contain the string _PROD[0-9]+,
we (effectively) want to keep the first number that comes after _PROD

One sed approach:

$ sed -E 's/(_PROD[0-9])[0-9]*/\1/' x
NET_SalesD_PROD1,mexico
NET_Sales4_PROD2,newjersy
NET_SalesG_PROD3,bull

Where:

(_PROD[0-9]) - (first) capture group matches on the string _PROD<single_digit> followed by ...
[0-9]* - zero or more digits
\1 - replace the match with the (first) capture group
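
One consequence of the second assumption, sketched below: since the pattern keeps the first digit after _PROD and drops every digit that follows, a value with mixed digits is truncated as well. The function name `keep_first_digit` and the input line with `PROD1234` are invented for this illustration and do not come from the question.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the substitution, written with \1 in the replacement.
keep_first_digit() {
    # Capture _PROD plus one digit, match any further digits, keep only the capture.
    sed -E 's/(_PROD[0-9])[0-9]*/\1/'
}

# First line is from the question; the second is a made-up mixed-digit case.
printf '%s\n' \
    'NET_SalesD_PROD1111,mexico' \
    'NET_SalesX_PROD1234,peru' | keep_first_digit
# prints:
# NET_SalesD_PROD1,mexico
# NET_SalesX_PROD1,peru
```

If mixed-digit values must be left alone, a back-reference pattern that only matches repeats of the same digit may be a better fit.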

Answer 4

Score: 0

A longer way, if you want to do it in awk:

 awk -v c="PROD" '{
      split($1, h1, c)          # h1[1]: text before PROD, h1[2]: text after it
      split(h1[2], h2, ",")     # h2[1]: the digits, h2[2]: the field after the comma
      print h1[1] c substr(h2[1], 1, 1) "," h2[2]   # keep only the first digit
 }'
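
A minimal sketch of running the awk approach on the sample data, with the HTML entities decoded and `c` passed via `-v c=...`. The function name `awk_first_digit` is invented for this illustration. The sketch assumes each line contains no whitespace, so that `$1` holds the whole line, and at most one comma after the digits.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the awk program; reads stdin, writes stdout.
awk_first_digit() {
    awk -v c="PROD" '{
        split($1, h1, c)        # split the line around the literal PROD
        split(h1[2], h2, ",")   # separate the digits from the trailing field
        print h1[1] c substr(h2[1], 1, 1) "," h2[2]
    }'
}

# Feed the BEFORE FILE lines from the question through the function.
awk_first_digit <<'EOF'
NET_SalesD_PROD1111,mexico
NET_Sales4_PROD22,newjersy
NET_SalesG_PROD333,bull
EOF
```

Like the first-digit sed approach, this keeps only the first digit after PROD, whether or not the remaining digits are repeats of it.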


Posted by huangapple on June 8, 2023 06:25:53
When reposting, please retain the link to this article: https://go.coder-hub.com/76427464.html

awk
bash
sed
shell
