Sorting a secondary field in a text file


Question

I have somehow managed to generate a bizarre results file that looks like this:

/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv100_samples_f1.json,0.32955974842767294
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv10_samples_f1.json,0.1822943949711891
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv110_samples_f1.json,0.2645739910313901
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv120_samples_f1.json,0.29959514170040485
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv130_samples_f1.json,0.1982142857142857
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv140_samples_f1.json,0.21814006888633755
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv150_samples_f1.json,0.1887550200803213
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv160_samples_f1.json,0.2225237449118046
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv170_samples_f1.json,0.20413793103448277
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv180_samples_f1.json,0.17142857142857146
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv190_samples_f1.json,0.17335473515248798
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv200_samples_f1.json,0.09504950495049505
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv20_samples_f1.json,0.25921052631578945
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv30_samples_f1.json,0.21734357848518113
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv40_samples_f1.json,0.2786516853932584
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv50_samples_f1.json,0.23666462293071736
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv60_samples_f1.json,0.2426584234930448
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv70_samples_f1.json,0.25702811244979923
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv80_samples_f1.json,0.3188405797101449
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv90_samples_f1.json,0.21703089675960813
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/101_count0_4408_count1_593.csv100_samples_f1.json,0.5630689206762028
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/101_count0_4408_count1_593.csv10_samples_f1.json,0.4421855146124524
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/101_count0_4408_count1_593.csv110_samples_f1.json,0.5385074626865671
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/101_count0_4408_count1_593.csv120_samples_f1.json,0.48465266558966075
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/101_count0_4408_count1_593.csv130_samples_f1.json,0.6061946902654868
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/101_count0_4408_count1_593.csv140_samples_f1.json,0.5345060893098782
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/101_count0_4408_count1_593.csv150_samples_f1.json,0.5061946902654867
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/101_count0_4408_count1_593.csv160_samples_f1.json,0.4723032069970845
...
...

The problem is that I have focused on sorting by `raw1/100`, `raw1/101`, which has worked well, but I now realise that I need to sort out this problem:

/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv100_samples_f1.json,0.32955974842767294
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv10_samples_f1.json,0.1822943949711891
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv110_samples_f1.json,0.2645739910313901
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv120_samples_f1.json,0.29959514170040485

That is, I want it ordered as:

*.csv10_samples_f1.json
*.csv20_samples_f1.json
*.csv30_samples_f1.json

Currently I am doing this manually and pasting it into Google Sheets, which is disgusting. I am unsure how I can automate this, because I also need to preserve the `100,101,102` order. Any tips would be great.
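For context, the order in the file matches what a plain byte-wise sort produces. In ASCII, `0` (0x30) sorts before `_` (0x5F), so `csv100_` lands before `csv10_`. The sketch below contrasts that with version sort, which compares runs of digits as numbers. It assumes GNU coreutils `sort` (for `-V`), and the names in it are shortened and hypothetical, built in the same shape as the lines above.

```shell
# Hypothetical sample: shortened names in the same shape as the results file.
sample() {
  printf '%s\n' \
    'raw1/100_x.csv20_samples_f1.json' \
    'raw1/100_x.csv100_samples_f1.json' \
    'raw1/100_x.csv10_samples_f1.json'
}

# Byte-wise order (C locale): csv100, csv10, csv20, the same pattern as the file above.
sample | LC_ALL=C sort

echo '---'

# GNU version sort compares digit runs numerically: csv10, csv20, csv100.
sample | sort -V
```

In the sample lines from the question, the only parts that differ are digit runs (`100`/`101` and the number after `.csv`), which is the case version sort is designed for. Lines that differ in other text may not sort as intended, so this is worth checking against the real file.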
# Answer 1
**Score**: 2


It's just a guess given how much we don't know about your requirements, the lack of relevant test cases in your input, and the lack of expected output in your question, but this might be what you want using GNU awk for gensub() with a Decorate-Sort-Undecorate idiom:

```shell
$ awk -F'[/.]' -v OFS='\t' '{
    # keys from $8 (raw1), $9 (100_count0_...) and $10 (csv10_samples_f1), then the whole line
    print gensub(/[0-9].*/,"",1,$8), gensub(/[^0-9]+/,"",1,$8), \
          gensub(/[^0-9].*/,"",1,$9), gensub(/[0-9]+/,"",1,$9), \
          gensub(/[0-9].*/,"",1,$10), gensub(/[^0-9]+/,"",1,$10)+0, \
          $0
}' file |
sort -k1,1 -k2,2n -k3,3n -k4,4 -k5,5 -k6,6n |
cut -f7-
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv10_samples_f1.json,0.1822943949711891
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv20_samples_f1.json,0.25921052631578945
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv30_samples_f1.json,0.21734357848518113
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv40_samples_f1.json,0.2786516853932584
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv50_samples_f1.json,0.23666462293071736
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv60_samples_f1.json,0.2426584234930448
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv70_samples_f1.json,0.25702811244979923
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv80_samples_f1.json,0.3188405797101449
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv90_samples_f1.json,0.21703089675960813
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv100_samples_f1.json,0.32955974842767294
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv110_samples_f1.json,0.2645739910313901
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv120_samples_f1.json,0.29959514170040485
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv130_samples_f1.json,0.1982142857142857
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv140_samples_f1.json,0.21814006888633755
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv150_samples_f1.json,0.1887550200803213
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv160_samples_f1.json,0.2225237449118046
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv170_samples_f1.json,0.20413793103448277
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv180_samples_f1.json,0.17142857142857146
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv190_samples_f1.json,0.17335473515248798
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv200_samples_f1.json,0.09504950495049505
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/101_count0_4408_count1_593.csv10_samples_f1.json,0.4421855146124524
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/101_count0_4408_count1_593.csv100_samples_f1.json,0.5630689206762028
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/101_count0_4408_count1_593.csv110_samples_f1.json,0.5385074626865671
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/101_count0_4408_count1_593.csv120_samples_f1.json,0.48465266558966075
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/101_count0_4408_count1_593.csv130_samples_f1.json,0.6061946902654868
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/101_count0_4408_count1_593.csv140_samples_f1.json,0.5345060893098782
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/101_count0_4408_count1_593.csv150_samples_f1.json,0.5061946902654867
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/101_count0_4408_count1_593.csv160_samples_f1.json,0.4723032069970845
```

The above assumes you want the first non-numeric part of each string you mentioned sorted alphabetically and the first numeric part sorted numerically and don't care about any remaining parts of the strings (you can easily tweak the gensub()s, sort, and cut to add more substrings to sort on if necessary).
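The answer relies on GNU awk for `gensub()`. If only a POSIX awk is available, the same Decorate-Sort-Undecorate idea can be sketched with `sub()` applied to copies of the line. This is an illustrative variant, not the answer's code. The function name `sort_results` is hypothetical. The sketch assumes every line sits in the same directory (as in the sample) and contains no tab characters, so it decorates each line with only two numeric keys: the leading file number (`100`, `101`, ...) and the number right after `.csv`.

```shell
# Hypothetical helper: POSIX-only Decorate-Sort-Undecorate.
# Assumes all lines share one directory and contain no tab characters.
sort_results() {
  awk -v OFS='\t' '{
    file = $0
    sub(/.*\//, "", file)   # basename: 100_count0_4751_count1_250.csv10_samples_f1.json,...
    sub(/_.*/, "", file)    # leading file number: 100
    n = $0
    sub(/.*\.csv/, "", n)   # text after ".csv": 10_samples_f1.json,...
    sub(/[^0-9].*/, "", n)  # leading digits: 10
    print file, n, $0       # decorate: two numeric keys, then the original line
  }' "$@" |
    sort -k1,1n -k2,2n |    # sort numerically on the two keys
    cut -f3-                # undecorate: drop the keys again
}
```

With the input file named `file` as in the answer, usage would be `sort_results file`. To break ties on the `raw1` directory as the answer does, add one more key for it.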

huangapple
  • This article was published on June 18, 2023 at 18:09:26
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76499999.html