Using Perl to sort an array using values that occur after a RegEx or symbol
Question
I have an array:

    my @all = (
        q{<side.effect signif="life.threat">myocardial infarction</side.effect>},
        q{<side.effect signif="life.threat">hypersensitivity reactions</side.effect>},
        q{<side.effect signif="life.threat">lactic acidosis</side.effect>},
        q{<side.effect signif="most.freq">vomiting</side.effect>},
        q{<side.effect signif="most.freq">diarrhea</side.effect>},
    );
I want to sort the array by the text that follows the opening XML tag and its attribute (everything after `">`), to produce this output:

    <side.effect signif="most.freq">diarrhea</side.effect>
    <side.effect signif="life.threat">hypersensitivity reactions</side.effect>
    <side.effect signif="life.threat">lactic acidosis</side.effect>
    <side.effect signif="life.threat">myocardial infarction</side.effect>
    <side.effect signif="most.freq">vomiting</side.effect>
I cannot convert it to a hash, because duplicate keys would be collapsed and some of the tags would be lost.
I tried this, but it doesn't sort them:

    my @sorted_all = sort {
        my ($aa, $bb) = map { (split)[1] } $a, $b;
        $bb <=> $aa;
    } @all;
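The attempt above doesn't sort because of how the key is extracted and compared. With no arguments, `split` splits `$_` on whitespace, so `(split)[1]` yields a fragment such as `signif="life.threat">myocardial`, and `<=>` compares numerically: every such string numifies to 0 (with "isn't numeric" warnings under `use warnings`), so the comparison never tells two elements apart. A minimal core-Perl sketch, assuming every element has the one-line `<tag attr="...">text</tag>` shape shown above, extracts the text with a regex and compares it with the string operator `cmp`. The helper name `text_of` is hypothetical:

```perl
use strict;
use warnings;
use feature qw(say);

my @all = (
    q{<side.effect signif="life.threat">myocardial infarction</side.effect>},
    q{<side.effect signif="life.threat">hypersensitivity reactions</side.effect>},
    q{<side.effect signif="life.threat">lactic acidosis</side.effect>},
    q{<side.effect signif="most.freq">vomiting</side.effect>},
    q{<side.effect signif="most.freq">diarrhea</side.effect>},
);

# Hypothetical helper: return the text between the end of the opening tag
# ('">') and the start of the closing tag ('</'), or '' if the line doesn't match.
sub text_of {
    my ($line) = @_;
    return $line =~ /">(.+?)<\// ? $1 : '';
}

# cmp compares strings; <=> would numify both keys to 0.
my @sorted_all = sort { text_of($a) cmp text_of($b) } @all;

say for @sorted_all;
```

This re-runs the regex on both elements at every comparison, which is harmless for a handful of lines; computing each key only once is what the answer's approach addresses.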
Answer 1
Score: 5
Using [Sort::Key](https://metacpan.org/pod/Sort::Key):

    use strict;
    use warnings;
    use feature qw(say);

    use Sort::Key qw(keysort);

    my @all = (
        q{<side.effect signif="life.threat">myocardial infarctio</side.effect>},
        q{<side.effect signif="life.threat">hypersensitivity reations</side.effect>},
        q{<side.effect signif="life.threat">lactic acidosis</sid.effect>},
        q{<side.effect signif="most.freq">vomiting</side.effect>},
        q{<side.effect signif="most.freq">diarrhea</side.effect>},
    );

    # Sort key: the text between '">' (end of the opening tag) and the next '</'
    my @sorted = keysort { ( /">(.+?)<\// )[0] } @all;
    say for @sorted;

The library uses a [Schwartzian Transform](https://en.wikipedia.org/wiki/Schwartzian_transform) when needed: it computes the sort key for every item once, up front, instead of recomputing it at each pairwise comparison. I copied the input as given, typos and all.
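For comparison, the same idea can be written in core Perl without Sort::Key. This is a sketch of the classic Schwartzian Transform (map, sort, map), not necessarily how Sort::Key implements it internally; it uses the input as shown in the question and assumes a line that doesn't match the regex should get an empty key:

```perl
use strict;
use warnings;
use feature qw(say);

my @all = (
    q{<side.effect signif="life.threat">myocardial infarction</side.effect>},
    q{<side.effect signif="life.threat">hypersensitivity reactions</side.effect>},
    q{<side.effect signif="life.threat">lactic acidosis</side.effect>},
    q{<side.effect signif="most.freq">vomiting</side.effect>},
    q{<side.effect signif="most.freq">diarrhea</side.effect>},
);

# Read the pipeline bottom-up:
my @sorted =
    map  { $_->[1] }                              # 3. keep only the original line
    sort { $a->[0] cmp $b->[0] }                  # 2. compare the cached keys as strings
    map  { [ ( /">(.+?)<\// )[0] // '', $_ ] }    # 1. pair each line with its key, computed once
    @all;

say for @sorted;
```

The first `map` runs the regex once per element, so the cost of key extraction grows with the number of elements rather than with the number of comparisons.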
Using a regex to parse XML tags relies on this very specific input format. If the format may vary, use a proper XML parser such as [XML::LibXML](https://metacpan.org/pod/XML::LibXML). For example:
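
    use XML::LibXML;

    # Replaces the keysort line in the program above (@all and keysort() as before).
    my $parser = XML::LibXML->new;
    my @sorted = keysort {
        $parser->parse_string($_)              # parse each line as a one-element document
            ->findnodes('side.effect')->[0]    # NodeList in scalar context; take the first node
            ->textContent                      # its text content is the sort key
    } @all;
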
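For this code, see [XML::LibXML::Parser](https://metacpan.org/dist/XML-LibXML/view/lib/XML/LibXML/Parser.pod) and [XML::LibXML::Node](https://metacpan.org/release/SHLOMIF/XML-LibXML-2.0128/view/lib/XML/LibXML/Node.pod). The library comes with much more documentation; start from the main [XML::LibXML](https://metacpan.org/pod/XML::LibXML) page.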
_For this to work I had to correct the typo `sid.effect` in one node, so that the input is valid XML._