Using Perl to sort an array using values that occur after a RegEx or symbol

Question

I have an array:

    my @all = (
        q{<side.effect signif="life.threat">myocardial infarction</side.effect>},
        q{<side.effect signif="life.threat">hypersensitivity reactions</side.effect>},
        q{<side.effect signif="life.threat">lactic acidosis</side.effect>},
        q{<side.effect signif="most.freq">vomiting</side.effect>},
        q{<side.effect signif="most.freq">diarrhea</side.effect>},
    );

I want to sort the array on the values after the opening XML tags/attributes (">) to produce this output:

    <side.effect signif="most.freq">diarrhea</side.effect>
    <side.effect signif="life.threat">hypersensitivity reactions</side.effect>
    <side.effect signif="life.threat">lactic acidosis</side.effect>
    <side.effect signif="life.threat">myocardial infarction</side.effect>
    <side.effect signif="most.freq">vomiting</side.effect>

I cannot convert it to a hash, because the repeated tags would collide as keys and entries would be lost.
I tried this, but it doesn't sort them:

    my @sorted_all = sort {
        my ($aa, $bb) = map { (split)[1] } $a, $b;
        $bb <=> $aa;
    } @all;

Answer 1

Score: 5

Using [Sort::Key](https://metacpan.org/pod/Sort::Key):

    use strict;
    use warnings;
    use feature qw(say);

    use Sort::Key qw(keysort);

    my @all = (
        q{<side.effect signif="life.threat">myocardial infarctio</side.effect>},
        q{<side.effect signif="life.threat">hypersensitivity reations</side.effect>},
        q{<side.effect signif="life.threat">lactic acidosis</sid.effect>},
        q{<side.effect signif="most.freq">vomiting</side.effect>},
        q{<side.effect signif="most.freq">diarrhea</side.effect>},
    );

    my @sorted = keysort { ( /">(.+?)<\// )[0] } @all;

    say for @sorted;

The library uses a [Schwartzian Transform](https://en.wikipedia.org/wiki/Schwartzian_transform) when needed: it first builds the sort keys for all items, rather than recomputing them at each pairwise comparison. I copied the input as given, typos and all.
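For readers without Sort::Key installed, the same idea can be sketched in core Perl as a hand-written Schwartzian Transform. This is only an illustration, not how the library is implemented internally; the sample data below uses the spelling from the question, and the empty-string fallback for lines that do not match is an assumption added here. Note that the keys are text, so they are compared with `cmp`; the numeric `<=>` operator would treat every key as 0.

```perl
use strict;
use warnings;
use feature qw(say);

# Sample input, spelled as in the question.
my @all = (
    q{<side.effect signif="life.threat">myocardial infarction</side.effect>},
    q{<side.effect signif="life.threat">hypersensitivity reactions</side.effect>},
    q{<side.effect signif="life.threat">lactic acidosis</side.effect>},
    q{<side.effect signif="most.freq">vomiting</side.effect>},
    q{<side.effect signif="most.freq">diarrhea</side.effect>},
);

# Schwartzian Transform, read from the bottom up:
#   1. pair each item with its key (the text between "> and </),
#      falling back to '' when the pattern does not match (assumption),
#   2. sort the pairs by key using string comparison,
#   3. unwrap the pairs to get the original items back.
my @sorted =
    map  { $_->[0] }
    sort { $a->[1] cmp $b->[1] }
    map  { [ $_, /">(.+?)<\// ? $1 : '' ] }
    @all;

say for @sorted;
```

Each key is extracted exactly once per item, which is the saving the transform is meant to provide.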

Using a regex to parse an XML tag relies on this very specific input format. If the format is going to vary, please use a proper XML parser, like [XML::LibXML](https://metacpan.org/pod/XML::LibXML). For example:

    use XML::LibXML;

    my $parser = XML::LibXML->new;

    my @sorted = keysort {
        $parser->parse_string($_)
            ->findnodes('side.effect')->[0]
            ->textContent
    } @all;

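For this code see [XML::LibXML::Parser](https://metacpan.org/dist/XML-LibXML/view/lib/XML/LibXML/Parser.pod) and [XML::LibXML::Node](https://metacpan.org/release/SHLOMIF/XML-LibXML-2.0128/view/lib/XML/LibXML/Node.pod). The library comes with a lot more documentation; see the main document linked at first mention.

_For this to work I had to correct the typo `sid.effect` in one node so that it is valid XML._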

huangapple
  • Posted on July 11, 2023, 02:59:27
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76656588.html