Sorting a secondary field in a text file


# Question


I have somehow managed to generate a bizarre results file which looks like this:

```
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv100_samples_f1.json,0.32955974842767294
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv10_samples_f1.json,0.1822943949711891
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv110_samples_f1.json,0.2645739910313901
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv120_samples_f1.json,0.29959514170040485
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv130_samples_f1.json,0.1982142857142857
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv140_samples_f1.json,0.21814006888633755
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv150_samples_f1.json,0.1887550200803213
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv160_samples_f1.json,0.2225237449118046
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv170_samples_f1.json,0.20413793103448277
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv180_samples_f1.json,0.17142857142857146
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv190_samples_f1.json,0.17335473515248798
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv200_samples_f1.json,0.09504950495049505
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv20_samples_f1.json,0.25921052631578945
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv30_samples_f1.json,0.21734357848518113
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv40_samples_f1.json,0.2786516853932584
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv50_samples_f1.json,0.23666462293071736
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv60_samples_f1.json,0.2426584234930448
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv70_samples_f1.json,0.25702811244979923
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv80_samples_f1.json,0.3188405797101449
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv90_samples_f1.json,0.21703089675960813
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/101_count0_4408_count1_593.csv100_samples_f1.json,0.5630689206762028
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/101_count0_4408_count1_593.csv10_samples_f1.json,0.4421855146124524
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/101_count0_4408_count1_593.csv110_samples_f1.json,0.5385074626865671
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/101_count0_4408_count1_593.csv120_samples_f1.json,0.48465266558966075
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/101_count0_4408_count1_593.csv130_samples_f1.json,0.6061946902654868
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/101_count0_4408_count1_593.csv140_samples_f1.json,0.5345060893098782
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/101_count0_4408_count1_593.csv150_samples_f1.json,0.5061946902654867
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/101_count0_4408_count1_593.csv160_samples_f1.json,0.4723032069970845
...
...
```

The problem is that I have focused on sorting by `raw1/100`, `raw1/101`, which has worked well, but I now realise that I also need to fix the order within each of those groups:

```
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv100_samples_f1.json,0.32955974842767294
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv10_samples_f1.json,0.1822943949711891
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv110_samples_f1.json,0.2645739910313901
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv120_samples_f1.json,0.29959514170040485
```

That is, I want it ordered as:

```
*.csv10_samples_f1.json
*.csv20_samples_f1.json
*.csv30_samples_f1.json
```

Currently I am doing this manually by pasting it into Google Sheets, which is disgusting. I'm unsure how to automate it, because I also need to preserve the `100`, `101`, `102` order.

Any tips would be great.
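The order in the file is consistent with a plain text (lexicographic) sort: comparing character by character, `csv100_...` comes before `csv10_...` because the digit `0` sorts before `_`. The sketch below reproduces this on three file names taken from the question; the function name `plain_sort_demo` is made up for illustration, and `LC_ALL=C` is used so the comparison is byte-wise regardless of locale. It shows the likely cause, not necessarily how the file was actually produced.

```shell
# Hypothetical demo: sort three file names from the question as plain text.
# LC_ALL=C forces byte-wise comparison, so the result does not depend on locale.
plain_sort_demo() {
    printf '%s\n' \
        '100_count0_4751_count1_250.csv20_samples_f1.json' \
        '100_count0_4751_count1_250.csv10_samples_f1.json' \
        '100_count0_4751_count1_250.csv100_samples_f1.json' |
    LC_ALL=C sort
}

plain_sort_demo
# Prints the csv100 name first, then csv10, then csv20: after the shared
# prefix "...csv10" the next byte is '0' (0x30) in one name and '_' (0x5F)
# in the other, and '0' sorts lower.
```

Sorting the embedded numbers with a numeric key (`sort -n`) instead of as text is what puts `10` before `100`.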


# Answer 1
**Score**: 2


It's just a guess given how much we don't know about your requirements, the lack of relevant test cases in your input, and the lack of expected output in your question, but this might be what you want, using GNU awk for `gensub()` with a Decorate-Sort-Undecorate idiom:

```shell
$ awk -F'[/.]' -v OFS='\t' '{
    print gensub(/[0-9].*/,"",1,$8),  gensub(/[^0-9]+/,"",1,$8),    \
          gensub(/[^0-9].*/,"",1,$9), gensub(/[0-9]+/,"",1,$9),     \
          gensub(/[0-9].*/,"",1,$10), gensub(/[^0-9]+/,"",1,$10)+0, \
          $0
}' file |
sort -k1,1 -k2,2n -k3,3n -k4,4 -k5,5 -k6,6n |
cut -f7-
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv10_samples_f1.json,0.1822943949711891
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv20_samples_f1.json,0.25921052631578945
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv30_samples_f1.json,0.21734357848518113
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv40_samples_f1.json,0.2786516853932584
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv50_samples_f1.json,0.23666462293071736
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv60_samples_f1.json,0.2426584234930448
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv70_samples_f1.json,0.25702811244979923
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv80_samples_f1.json,0.3188405797101449
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv90_samples_f1.json,0.21703089675960813
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv100_samples_f1.json,0.32955974842767294
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv110_samples_f1.json,0.2645739910313901
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv120_samples_f1.json,0.29959514170040485
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv130_samples_f1.json,0.1982142857142857
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv140_samples_f1.json,0.21814006888633755
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv150_samples_f1.json,0.1887550200803213
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv160_samples_f1.json,0.2225237449118046
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv170_samples_f1.json,0.20413793103448277
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv180_samples_f1.json,0.17142857142857146
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv190_samples_f1.json,0.17335473515248798
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv200_samples_f1.json,0.09504950495049505
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/101_count0_4408_count1_593.csv10_samples_f1.json,0.4421855146124524
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/101_count0_4408_count1_593.csv100_samples_f1.json,0.5630689206762028
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/101_count0_4408_count1_593.csv110_samples_f1.json,0.5385074626865671
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/101_count0_4408_count1_593.csv120_samples_f1.json,0.48465266558966075
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/101_count0_4408_count1_593.csv130_samples_f1.json,0.6061946902654868
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/101_count0_4408_count1_593.csv140_samples_f1.json,0.5345060893098782
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/101_count0_4408_count1_593.csv150_samples_f1.json,0.5061946902654867
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/101_count0_4408_count1_593.csv160_samples_f1.json,0.4723032069970845
```

The above assumes you want the first non-numeric part of each string you mentioned sorted alphabetically and the first numeric part sorted numerically, and that you don't care about any remaining parts of the strings (you can easily tweak the `gensub()`s, `sort`, and `cut` to add more substrings to sort on if necessary).
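If GNU awk is not available (`gensub()` is a gawk extension), the same Decorate-Sort-Undecorate idea can be sketched with POSIX `awk`, `sort` and `cut`. This sketch assumes, more narrowly than the answer, that the only keys that matter are the leading number of the file name (`100` in `raw1/100_count0_...`) and the number right after `.csv` (`10` in `.csv10_samples_f1.json`); `sort_results` is a hypothetical helper name.

```shell
# Hypothetical helper: sort_results reads lines on stdin and prints them sorted by
#   key 1: the leading number of the file name (100 in "raw1/100_count0_...")
#   key 2: the number right after ".csv" (10 in ".csv10_samples_f1.json")
# Decorate-Sort-Undecorate with POSIX awk, sort and cut only (no gawk gensub()).
sort_results() {
    awk -v OFS='\t' '{
        name = $0
        sub(/.*\//, "", name)     # keep only the file name: "100_count0_..."
        rest = $0
        sub(/.*\.csv/, "", rest)  # keep what follows ".csv": "10_samples_f1.json,..."
        # "+ 0" converts the leading digits of a string to a number: "100_..." -> 100
        print name + 0, rest + 0, $0   # decorate: key1 TAB key2 TAB original line
    }' |
    sort -t "$(printf '\t')" -k1,1n -k2,2n |   # sort numerically on both keys
    cut -f3-                                   # undecorate: drop the two key columns
}

# Usage ("file" is the results file, as in the answer):
# sort_results < file
```

Unlike the gawk version, this ignores everything except those two numbers (remaining ties are broken by `sort` comparing the whole line), so it only fits if the rest of each path follows the same pattern as in the question.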

huangapple
  • Posted on 2023-06-18 18:09:26
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76499999.html