Any way to make parsing & matching of substring faster with boost spirit than default parsing and matching

huangapple go评论64阅读模式
英文:

Any way to make parsing & matching of substring faster with boost spirit than default parsing and matching

问题

以下是您要翻译的代码部分:

#include <iostream>
#include <chrono>
#include <boost/spirit/home/x3.hpp>

bool isSubStrInStr(const std::string subStr, const std::string& str)
{
    auto begin = str.find_last_of('/');
    auto end = str.find_last_of('-');
    if (end != std::string::npos)
    {
        if (begin != std::string::npos)
        {
            if (str.substr((begin + 1), (end - begin - 1)) == subStr)
            {
                return true;
            }
        }
        else
        {
            if (str.substr(0, end) == subStr)
            {
                return true;
            }
        }
    }
    return false;
}

bool isSubStrInStrSpirit(const std::string& subStr, const std::string& str)
{
    std::vector<std::string> values;
    bool const result = boost::spirit::x3::parse(
        str.begin(), str.end(),
        +boost::spirit::x3::alnum % +(boost::spirit::x3::char_('-') >> boost::spirit::x3::int_ >> "/"),
        values
    );

    if (result && values.back() == subStr)
    {
        return true;
    }
    return false;
}

int main()
{
    std::string name = "TRINNDECK";
    std::vector<std::string> infoMaps{
        "SERVERTRINN-11/TRINNDECK-1/TRINNUSER-1",
        "SERVERMAGNUS-1/MAGNUSDECK-1/TRINNDECK-1",
    };

    std::cout << "WITH SPIRIT" << std::endl;
    auto start = std::chrono::steady_clock::now();
    for (auto& item : infoMaps)
    {
        std::cout << item << "  " << std::boolalpha << isSubStrInStrSpirit(name, item) << std::endl;
    }
    auto stop = std::chrono::steady_clock::now();
    auto duration = std::chrono::duration_cast<std::chrono::microseconds>(stop - start);
    std::cout << "Time taken: " << duration.count() << " microseconds." << std::endl;

    std::cout << "WITHOUT SPIRIT" << std::endl;
    start = std::chrono::steady_clock::now();
    for (auto& item : infoMaps)
    {
        std::cout << item << "  " << std::boolalpha << isSubStrInStr(name, item) << std::endl;
    }
    stop = std::chrono::steady_clock::now();
    duration = std::chrono::duration_cast<std::chrono::microseconds>(stop - start);
    std::cout << "Time taken: " << duration.count() << " microseconds." << std::endl;
    return 0;
}

希望这有所帮助!如果您需要更多翻译或其他帮助,请告诉我。

英文:

I am trying to replace an old piece of code which matches if substring is there in end of string with boost spirit. When i benchmark both implementation i see old implementation does it faster.

Live On Coliru

My questions are

  1. Is it wise to expect i would get faster parsing and matching when i replace old code with boost spirit.
  2. If (1.) is yes some suggestions on what i am doing wrong or what i shud do try.

logic explanation:
if substring is "TRINNDECK"
i need to check the last entry in "SERVERTRINN-11/TRINNDECK-1/TRINNUSER-1" has substring.
all entries have simillar format (STRING-INT/STRING-INT)
so in my example "SERVERMAGNUS-1/MAGNUSDECK-1/TRINNDECK-1" matches but not "SERVERTRINN-11/TRINNDECK-1/TRINNUSER-1"

#include &lt;iostream&gt;
#include &lt;chrono&gt;
#include &lt;boost/spirit/home/x3.hpp&gt;
bool isSubStrInStr(const std::string subStr, const std::string&amp; str)
{
auto begin = str.find_last_of(&#39;/&#39;);
auto end = str.find_last_of(&#39;-&#39;);
if (end != std::string::npos)
{
if (begin != std::string::npos)
{
if (str.substr((begin + 1), (end - begin - 1)) == subStr)
{
return true;
}
}
else
{
if (str.substr(0, end) == subStr)
{
return true;
}
}
}
return false;
}
bool isSubStrInStrSpirit(const std::string&amp; subStr, const std::string&amp; str)
{
std::vector&lt;std::string&gt; values;
bool const result = boost::spirit::x3::parse(
str.begin(), str.end(),
+boost::spirit::x3::alnum % +(boost::spirit::x3::char_(&#39;-&#39;) &gt;&gt; boost::spirit::x3::int_ &gt;&gt; &quot;/&quot;),
values
);
if (result &amp;&amp; values.back() == subStr)
{
return true;
}
return false;
}
int main()
{
std::string name = &quot;TRINNDECK&quot;;
std::vector&lt;std::string&gt; infoMaps{
&quot;SERVERTRINN-11/TRINNDECK-1/TRINNUSER-1&quot;,
&quot;SERVERMAGNUS-1/MAGNUSDECK-1/TRINNDECK-1&quot;,
};
std::cout &lt;&lt; &quot;WITH SPIRIT&quot; &lt;&lt; std::endl;
auto start = std::chrono::steady_clock::now();
for (auto&amp; item : infoMaps)
{
std::cout &lt;&lt; item &lt;&lt; &quot;  &quot; &lt;&lt; std::boolalpha &lt;&lt; isSubStrInStrSpirit(name, item) &lt;&lt; std::endl;
}
auto stop = std::chrono::steady_clock::now();
auto duration = std::chrono::duration_cast&lt;std::chrono::microseconds&gt;(stop - start);
std::cout &lt;&lt; &quot;Time taken: &quot; &lt;&lt; duration.count() &lt;&lt; &quot; microseconds.&quot; &lt;&lt; std::endl;
std::cout &lt;&lt; &quot;WITHOUT SPIRIT&quot; &lt;&lt; std::endl;
start = std::chrono::steady_clock::now();
for (auto&amp; item : infoMaps)
{
std::cout &lt;&lt; item &lt;&lt; &quot;  &quot; &lt;&lt; std::boolalpha &lt;&lt; isSubStrInStr(name, item) &lt;&lt; std::endl;
}
stop = std::chrono::steady_clock::now();
duration = std::chrono::duration_cast&lt;std::chrono::microseconds&gt;(stop - start);
std::cout &lt;&lt; &quot;Time taken: &quot; &lt;&lt; duration.count() &lt;&lt; &quot; microseconds.&quot; &lt;&lt; std::endl;
return 0;
}

答案1

得分: 2

I started optimizing, until I bothered to read your original implementation. Here goes my galaxy-brain version:

bool isSubStrInStrGalaxybrain(std::string_view expected, std::string_view input) {
    auto last = input.substr(input.find_last_of('/') + 1);
    return expected == last.substr(0, last.find('-'));
}

Everybody hates the string/string_view interface, but it is quite powerful if you find the sweet spot.

SPOILER: Take the rest of the original answer with a grain of salt because nothing is going to top this unless you wish to manually SSE4 optimize it (which the compiler might already do anyways).

OLD STUFF

I simplified the Spirit parser a bit to be more direct:

bool isSubStrInStrSpirit(std::string const& expected, std::string const& input) {
    namespace x3 = boost::spirit::x3;
    std::vector<std::string> values;

    bool result = parse(input.begin(), input.end(), +x3::alnum % +('-' >> x3::int_ >> '/'), values);
    return (result && values.back() == expected);
}

Let's look at this.

It does a lot more work than you require. In particular it:

  • Validates the integers and delimiters strictly for the entire input (even though it throws away the result).
  • Creates std::string entries in the vector for all of the "values" which you do not actually require.
  • You compile the parser expression each time around.

Besides your benchmarks measure console IO predominantly. Let's use a proper microbenchmark tool with statistical analysis:

NONIUS_BENCHMARK("Spirit", [](/*nonius::chronometer cm*/) {
    for (auto& item : infoMaps)
        isSubStrInStrSpirit(name, item);
});

NONIUS_BENCHMARK("Nospirit", [](/*nonius::chronometer cm*/) {
    for (auto& item : infoMaps)
        isSubStrInStr(name, item);
});

Output:

clock resolution: mean is 21.2206 ns (20480002 iterations)

benchmarking Spirit
collecting 100 samples, 33 iterations each, in estimated 2.1648 ms
mean: 490.892 ns, lb 479.67 ns, ub 509.147 ns, ci 0.95
std dev: 71.4656 ns, lb 49.6218 ns, ub 98.7286 ns, ci 0.95
found 21 outliers among 100 samples (21%)
variance is severely inflated by outliers

benchmarking Nospirit
collecting 100 samples, 315 iterations each, in estimated 2.1105 ms
mean: 60.2737 ns, lb 59.6758 ns, ub 61.197 ns, ci 0.95
std dev: 3.69908 ns, lb 2.68804 ns, ub 5.1282 ns, ci 0.95
found 21 outliers among 100 samples (21%)
variance is severely inflated by outliers

This confirms the expected slowness.

Fixing/Improving

  1. Avoid recompiling parser expression:
namespace x3 = boost::spirit::x3;
static const inline auto rule = +x3::alnum % +('-' >> x3::int_ >> '/');

bool isSubStrInStrSehe(std::string const& expected, std::string const& input) {
    std::vector<boost::iterator_range<char const*>> values;
    bool result = parse(input.begin(), input.end(), rule, values);
    return (result && values.back() == expected);
}

Minimal change Any way to make parsing & matching of substring faster with boost spirit than default parsing and matching Compilers do optimize X3 neatly.

  1. Avoid string construction:
bool isSubStrInStrSehe(std::string_view expected, std::string_view input) {
    std::vector<boost::iterator_range<char const*>> values;
    bool result = parse(input.begin(), input.end(), rule, values);
    return (result && values.back() == expected);
}

That's very simple and makes the function interface a lot more flexible to boot. Now we're seeing the difference!

  1. Avoid vector construction:

You only need the last value. Let's cut the vector and use a semantic action to remember the last matched value:

static const inline auto token = x3::raw[+x3::alnum];
static const inline auto trailer = '-' >> x3::int_ >> '/';

bool isSubStrInStrSehe(std::string_view expected, std::string_view input) {
    boost::iterator_range<char const*> last;
    auto assign = [&last](auto& ctx) { last = _attr(ctx); };

    bool result = parse(input.begin(), input.end(), token[assign] % +trailer);

    return (result && last == expected);
}

If you're like me, the result is disappointing.

英文:

I started optimizing, until I bothered to read your original implementation. Here goes my galaxy-brain version:

bool isSubStrInStrGalaxybrain(std::string_view expected, std::string_view input) {
auto last = input.substr(input.find_last_of(&#39;/&#39;) + 1);
return expected == last.substr(0, last.find(&#39;-&#39;));
}

Everybody hates the string/string_view interface, but it is quite powerful if you find the sweet spot.

SPOILER: Take the rest of the original answer with a grain of salt because nothing is going to top this unless you wish to manually SSE4 optimize it (which the compiler might already do anyways):

Any way to make parsing & matching of substring faster with boost spirit than default parsing and matching

OLD STUFF

I simplified the Spirit parser a bit to be more direct

bool isSubStrInStrSpirit(std::string const&amp; expected, std::string const&amp; input) {
namespace x3 = boost::spirit::x3;
std::vector&lt;std::string&gt; values;
bool result = parse(input.begin(), input.end(), +x3::alnum % +(&#39;-&#39; &gt;&gt; x3::int_ &gt;&gt; &#39;/&#39;), values);
return (result &amp;&amp; values.back() == expected);
}

Let's look at this.

It does a lot more work than you require. In particular it

  • validates the integers and delimiters strictly for the entire input (even though it throws away the result)
  • it creates std::string entries in the vector for all of the "values" which you donot actually require
  • you compile the parser expression each time around.

Besides your benchmarks measure console IO predominantly. Let's use a proper microbenchmark tool with statistical analysis:

NONIUS_BENCHMARK(&quot;Spirit&quot;, [](/*nonius::chronometer cm*/) {
for (auto&amp; item : infoMaps)
isSubStrInStrSpirit(name, item);
});
NONIUS_BENCHMARK(&quot;Nospirit&quot;, [](/*nonius::chronometer cm*/) {
for (auto&amp; item : infoMaps)
isSubStrInStr(name, item);
});

Output:

clock resolution: mean is 21.2206 ns (20480002 iterations)
benchmarking Spirit
collecting 100 samples, 33 iterations each, in estimated 2.1648 ms
mean: 490.892 ns, lb 479.67 ns, ub 509.147 ns, ci 0.95
std dev: 71.4656 ns, lb 49.6218 ns, ub 98.7286 ns, ci 0.95
found 21 outliers among 100 samples (21%)
variance is severely inflated by outliers
benchmarking Nospirit
collecting 100 samples, 315 iterations each, in estimated 2.1105 ms
mean: 60.2737 ns, lb 59.6758 ns, ub 61.197 ns, ci 0.95
std dev: 3.69908 ns, lb 2.68804 ns, ub 5.1282 ns, ci 0.95
found 21 outliers among 100 samples (21%)
variance is severely inflated by outliers

This confirms the expected slowness.

Fixing/Improving

  1. Avoid recompiling parser expression

    Let's get the rule compilation out of the hot-path:

    namespace x3 = boost::spirit::x3;
    static const inline auto rule = +x3::alnum % +(&#39;-&#39; &gt;&gt; x3::int_ &gt;&gt; &#39;/&#39;);
    bool isSubStrInStrSehe(std::string const&amp; expected, std::string const&amp; input) {
    std::vector&lt;std::string&gt; values;
    bool result = parse(input.begin(), input.end(), rule, values);
    return (result &amp;&amp; values.back() == expected);
    }
    

    Any way to make parsing & matching of substring faster with boost spirit than default parsing and matching

    Minimal change Any way to make parsing & matching of substring faster with boost spirit than default parsing and matching Compilers do optimize X3 neatly

  2. Avoid string construction

    bool isSubStrInStrSehe(std::string_view expected, std::string_view input) {
    std::vector&lt;boost::iterator_range&lt;char const*&gt; &gt; values;
    bool result = parse(input.begin(), input.end(), rule, values);
    return (result &amp;&amp; values.back() == expected);
    }
    

    That's very simple, and makes the function interface a lot more flexible to boot. Now we're seeing the difference! Any way to make parsing & matching of substring faster with boost spirit than default parsing and matching

  3. Avoid vector construction

    You only need the last value. Let's cut the vector, and use a semantic action to remember the last matched value:

    static const inline auto token     = x3::raw[+x3::alnum];
    static const inline auto trailer = &#39;-&#39; &gt;&gt; x3::int_ &gt;&gt; &#39;/&#39;;
    bool isSubStrInStrSehe(std::string_view expected, std::string_view input) {
    boost::iterator_range&lt;char const*&gt; last;
    auto assign = [&amp;last](auto&amp; ctx) { last = _attr(ctx); };
    bool result = parse(input.begin(), input.end(), token[assign] % +trailer);
    return (result &amp;&amp; last == expected);
    }
    

    If you're like me, the result is disappointing Any way to make parsing & matching of substring faster with boost spirit than default parsing and matching

huangapple
  • 本文由 发表于 2023年6月5日 19:05:47
  • 转载请务必保留本文链接:https://go.coder-hub.com/76405808.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定