Sorting on a secondary field in a text file
Question
<details>
<summary>English:</summary>
I have somehow managed to generate a bizarre results file that looks like this:
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv100_samples_f1.json,0.32955974842767294
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv10_samples_f1.json,0.1822943949711891
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv110_samples_f1.json,0.2645739910313901
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv120_samples_f1.json,0.29959514170040485
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv130_samples_f1.json,0.1982142857142857
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv140_samples_f1.json,0.21814006888633755
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv150_samples_f1.json,0.1887550200803213
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv160_samples_f1.json,0.2225237449118046
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv170_samples_f1.json,0.20413793103448277
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv180_samples_f1.json,0.17142857142857146
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv190_samples_f1.json,0.17335473515248798
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv200_samples_f1.json,0.09504950495049505
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv20_samples_f1.json,0.25921052631578945
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv30_samples_f1.json,0.21734357848518113
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv40_samples_f1.json,0.2786516853932584
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv50_samples_f1.json,0.23666462293071736
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv60_samples_f1.json,0.2426584234930448
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv70_samples_f1.json,0.25702811244979923
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv80_samples_f1.json,0.3188405797101449
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv90_samples_f1.json,0.21703089675960813
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/101_count0_4408_count1_593.csv100_samples_f1.json,0.5630689206762028
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/101_count0_4408_count1_593.csv10_samples_f1.json,0.4421855146124524
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/101_count0_4408_count1_593.csv110_samples_f1.json,0.5385074626865671
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/101_count0_4408_count1_593.csv120_samples_f1.json,0.48465266558966075
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/101_count0_4408_count1_593.csv130_samples_f1.json,0.6061946902654868
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/101_count0_4408_count1_593.csv140_samples_f1.json,0.5345060893098782
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/101_count0_4408_count1_593.csv150_samples_f1.json,0.5061946902654867
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/101_count0_4408_count1_593.csv160_samples_f1.json,0.4723032069970845
...
...
I focused on sorting by `raw1/100`, `raw1/101`, which has worked well, but I now realise that I also need to fix this ordering:
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv100_samples_f1.json,0.32955974842767294
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv10_samples_f1.json,0.1822943949711891
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv110_samples_f1.json,0.2645739910313901
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv120_samples_f1.json,0.29959514170040485
That is, I want it ordered as:
*.csv10_samples_f1.json
*.csv20_samples_f1.json
*.csv30_samples_f1.json
Currently I am doing this manually and pasting it into Google Sheets, which is disgusting.
I'm unsure how to automate this because I also need to preserve the `100, 101, 102` order.
Any tips would be great.
</details>
# Answer 1
**Score**: 2
It's just a guess, given how much we don't know about your requirements, the lack of relevant test cases in your input, and the lack of expected output in your question, but this might be what you want, using GNU awk for `gensub()` with a Decorate-Sort-Undecorate idiom:
```shell
$ awk -F'[/.]' -v OFS='\t' '{
    # Decorate: print six sort keys (text and number parts of $8, $9, $10), then the original line
    print gensub(/[0-9].*/,"",1,$8), gensub(/[^0-9]+/,"",1,$8), \
          gensub(/[^0-9].*/,"",1,$9), gensub(/[0-9]+/,"",1,$9), \
          gensub(/[0-9].*/,"",1,$10), gensub(/[^0-9]+/,"",1,$10)+0, \
          $0
}' file |
sort -k1,1 -k2,2n -k3,3n -k4,4 -k5,5 -k6,6n |
cut -f7-
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv10_samples_f1.json,0.1822943949711891
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv20_samples_f1.json,0.25921052631578945
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv30_samples_f1.json,0.21734357848518113
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv40_samples_f1.json,0.2786516853932584
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv50_samples_f1.json,0.23666462293071736
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv60_samples_f1.json,0.2426584234930448
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv70_samples_f1.json,0.25702811244979923
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv80_samples_f1.json,0.3188405797101449
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv90_samples_f1.json,0.21703089675960813
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv100_samples_f1.json,0.32955974842767294
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv110_samples_f1.json,0.2645739910313901
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv120_samples_f1.json,0.29959514170040485
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv130_samples_f1.json,0.1982142857142857
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv140_samples_f1.json,0.21814006888633755
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv150_samples_f1.json,0.1887550200803213
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv160_samples_f1.json,0.2225237449118046
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv170_samples_f1.json,0.20413793103448277
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv180_samples_f1.json,0.17142857142857146
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv190_samples_f1.json,0.17335473515248798
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/100_count0_4751_count1_250.csv200_samples_f1.json,0.09504950495049505
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/101_count0_4408_count1_593.csv10_samples_f1.json,0.4421855146124524
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/101_count0_4408_count1_593.csv100_samples_f1.json,0.5630689206762028
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/101_count0_4408_count1_593.csv110_samples_f1.json,0.5385074626865671
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/101_count0_4408_count1_593.csv120_samples_f1.json,0.48465266558966075
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/101_count0_4408_count1_593.csv130_samples_f1.json,0.6061946902654868
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/101_count0_4408_count1_593.csv140_samples_f1.json,0.5345060893098782
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/101_count0_4408_count1_593.csv150_samples_f1.json,0.5061946902654867
/home/john-rb/ramsve/ab_sft_housing_raw1/data/housing/raw1/101_count0_4408_count1_593.csv160_samples_f1.json,0.4723032069970845
```
The above assumes you want the first non-numeric part of each string you mentioned sorted alphabetically and the first numeric part sorted numerically, and that you don't care about any remaining parts of the strings (you can easily tweak the `gensub()`s, `sort`, and `cut` to add more substrings to sort on if necessary).
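
To illustrate the Decorate-Sort-Undecorate idea on its own, here is a minimal sketch that uses only POSIX `awk`, `sort`, and `cut` (no `gensub()`). It is not the command from the answer: it assumes only two keys matter, the number right after the last `/` and the number right after `.csv`, and that the input lines contain no tab characters. The function name `dsu_sort` and the shortened sample paths are hypothetical.

```shell
#!/bin/sh
# dsu_sort (hypothetical helper): reads lines on stdin and writes them sorted by
#   key 1: the number at the start of the file name (after the last "/")
#   key 2: the number that follows ".csv"
# Assumes the input lines contain no tab characters.
dsu_sort() {
    awk -v OFS='\t' '{
        name = $0
        sub(/.*\//, "", name)         # drop the directory part
        group = name + 0              # leading digits of the name, e.g. 100
        samples = name
        sub(/.*\.csv/, "", samples)   # keep what follows ".csv"
        print group, samples + 0, $0  # decorate: key1, key2, original line
    }' |
    sort -t "$(printf '\t')" -k1,1n -k2,2n |
    cut -f3-
}

# Demo on made-up, shortened lines
dsu_sort <<'EOF'
raw1/101_a.csv100_samples_f1.json,0.56
raw1/100_a.csv100_samples_f1.json,0.33
raw1/100_a.csv10_samples_f1.json,0.18
raw1/101_a.csv10_samples_f1.json,0.44
raw1/100_a.csv20_samples_f1.json,0.26
EOF
```

On the demo input, the `100` group comes first in the order `csv10`, `csv20`, `csv100`, followed by the `101` group in the order `csv10`, `csv100`. The `cut -f3-` step strips the two temporary keys, so each output line is identical to its input line.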