2023年6月5日 19:05:47go评论81阅读模式

英文:

Any way to make parsing & matching of substring faster with boost spirit than default parsing and matching

问题

以下是您要翻译的代码部分：

#include <iostream>
#include <chrono>
#include <boost/spirit/home/x3.hpp>

bool isSubStrInStr(const std::string subStr, const std::string& str)
{
    auto begin = str.find_last_of('/');
    auto end = str.find_last_of('-');
    if (end != std::string::npos)
    {
        if (begin != std::string::npos)
        {
            if (str.substr((begin + 1), (end - begin - 1)) == subStr)
            {
                return true;
            }
        }
        else
        {
            if (str.substr(0, end) == subStr)
            {
                return true;
            }
        }
    }
    return false;
}

bool isSubStrInStrSpirit(const std::string& subStr, const std::string& str)
{
    std::vector<std::string> values;
    bool const result = boost::spirit::x3::parse(
        str.begin(), str.end(),
        +boost::spirit::x3::alnum % +(boost::spirit::x3::char_('-') >> boost::spirit::x3::int_ >> "/"),
        values
    );

    if (result && values.back() == subStr)
    {
        return true;
    }
    return false;
}

int main()
{
    std::string name = "TRINNDECK";
    std::vector<std::string> infoMaps{
        "SERVERTRINN-11/TRINNDECK-1/TRINNUSER-1",
        "SERVERMAGNUS-1/MAGNUSDECK-1/TRINNDECK-1",
    };

    std::cout << "WITH SPIRIT" << std::endl;
    auto start = std::chrono::steady_clock::now();
    for (auto& item : infoMaps)
    {
        std::cout << item << "  " << std::boolalpha << isSubStrInStrSpirit(name, item) << std::endl;
    }
    auto stop = std::chrono::steady_clock::now();
    auto duration = std::chrono::duration_cast<std::chrono::microseconds>(stop - start);
    std::cout << "Time taken: " << duration.count() << " microseconds." << std::endl;

    std::cout << "WITHOUT SPIRIT" << std::endl;
    start = std::chrono::steady_clock::now();
    for (auto& item : infoMaps)
    {
        std::cout << item << "  " << std::boolalpha << isSubStrInStr(name, item) << std::endl;
    }
    stop = std::chrono::steady_clock::now();
    duration = std::chrono::duration_cast<std::chrono::microseconds>(stop - start);
    std::cout << "Time taken: " << duration.count() << " microseconds." << std::endl;
    return 0;
}

希望这有所帮助！如果您需要更多翻译或其他帮助，请告诉我。

英文:

I am trying to replace an old piece of code which matches if substring is there in end of string with boost spirit. When i benchmark both implementation i see old implementation does it faster.

Live On Coliru

My questions are

Is it wise to expect i would get faster parsing and matching when i replace old code with boost spirit.
If (1.) is yes some suggestions on what i am doing wrong or what i shud do try.

logic explanation:
if substring is "TRINNDECK"
i need to check the last entry in "SERVERTRINN-11/TRINNDECK-1/TRINNUSER-1" has substring.
all entries have simillar format (STRING-INT/STRING-INT)
so in my example "SERVERMAGNUS-1/MAGNUSDECK-1/TRINNDECK-1" matches but not "SERVERTRINN-11/TRINNDECK-1/TRINNUSER-1"

#include &lt;iostream&gt;
#include &lt;chrono&gt;
#include &lt;boost/spirit/home/x3.hpp&gt;
bool isSubStrInStr(const std::string subStr, const std::string&amp; str)
{
auto begin = str.find_last_of(&#39;/&#39;);
auto end = str.find_last_of(&#39;-&#39;);
if (end != std::string::npos)
{
if (begin != std::string::npos)
{
if (str.substr((begin + 1), (end - begin - 1)) == subStr)
{
return true;
}
}
else
{
if (str.substr(0, end) == subStr)
{
return true;
}
}
}
return false;
}
bool isSubStrInStrSpirit(const std::string&amp; subStr, const std::string&amp; str)
{
std::vector&lt;std::string&gt; values;
bool const result = boost::spirit::x3::parse(
str.begin(), str.end(),
+boost::spirit::x3::alnum % +(boost::spirit::x3::char_(&#39;-&#39;) &gt;&gt; boost::spirit::x3::int_ &gt;&gt; &quot;/&quot;),
values
);
if (result &amp;&amp; values.back() == subStr)
{
return true;
}
return false;
}
int main()
{
std::string name = &quot;TRINNDECK&quot;;
std::vector&lt;std::string&gt; infoMaps{
&quot;SERVERTRINN-11/TRINNDECK-1/TRINNUSER-1&quot;,
&quot;SERVERMAGNUS-1/MAGNUSDECK-1/TRINNDECK-1&quot;,
};
std::cout &lt;&lt; &quot;WITH SPIRIT&quot; &lt;&lt; std::endl;
auto start = std::chrono::steady_clock::now();
for (auto&amp; item : infoMaps)
{
std::cout &lt;&lt; item &lt;&lt; &quot;  &quot; &lt;&lt; std::boolalpha &lt;&lt; isSubStrInStrSpirit(name, item) &lt;&lt; std::endl;
}
auto stop = std::chrono::steady_clock::now();
auto duration = std::chrono::duration_cast&lt;std::chrono::microseconds&gt;(stop - start);
std::cout &lt;&lt; &quot;Time taken: &quot; &lt;&lt; duration.count() &lt;&lt; &quot; microseconds.&quot; &lt;&lt; std::endl;
std::cout &lt;&lt; &quot;WITHOUT SPIRIT&quot; &lt;&lt; std::endl;
start = std::chrono::steady_clock::now();
for (auto&amp; item : infoMaps)
{
std::cout &lt;&lt; item &lt;&lt; &quot;  &quot; &lt;&lt; std::boolalpha &lt;&lt; isSubStrInStr(name, item) &lt;&lt; std::endl;
}
stop = std::chrono::steady_clock::now();
duration = std::chrono::duration_cast&lt;std::chrono::microseconds&gt;(stop - start);
std::cout &lt;&lt; &quot;Time taken: &quot; &lt;&lt; duration.count() &lt;&lt; &quot; microseconds.&quot; &lt;&lt; std::endl;
return 0;
}

答案1

得分: 2

I started optimizing, until I bothered to read your original implementation. Here goes my galaxy-brain version:

bool isSubStrInStrGalaxybrain(std::string_view expected, std::string_view input) {
    auto last = input.substr(input.find_last_of('/') + 1);
    return expected == last.substr(0, last.find('-'));
}

Everybody hates the string/string_view interface, but it is quite powerful if you find the sweet spot.

SPOILER: Take the rest of the original answer with a grain of salt because nothing is going to top this unless you wish to manually SSE4 optimize it (which the compiler might already do anyways).

OLD STUFF

I simplified the Spirit parser a bit to be more direct:

bool isSubStrInStrSpirit(std::string const& expected, std::string const& input) {
    namespace x3 = boost::spirit::x3;
    std::vector<std::string> values;

    bool result = parse(input.begin(), input.end(), +x3::alnum % +('-' >> x3::int_ >> '/'), values);
    return (result && values.back() == expected);
}

Let's look at this.

It does a lot more work than you require. In particular it:

Validates the integers and delimiters strictly for the entire input (even though it throws away the result).
Creates std::string entries in the vector for all of the "values" which you do not actually require.
You compile the parser expression each time around.

Besides your benchmarks measure console IO predominantly. Let's use a proper microbenchmark tool with statistical analysis:

NONIUS_BENCHMARK("Spirit", [](/*nonius::chronometer cm*/) {
    for (auto& item : infoMaps)
        isSubStrInStrSpirit(name, item);
});

NONIUS_BENCHMARK("Nospirit", [](/*nonius::chronometer cm*/) {
    for (auto& item : infoMaps)
        isSubStrInStr(name, item);
});

Output:

clock resolution: mean is 21.2206 ns (20480002 iterations)

benchmarking Spirit
collecting 100 samples, 33 iterations each, in estimated 2.1648 ms
mean: 490.892 ns, lb 479.67 ns, ub 509.147 ns, ci 0.95
std dev: 71.4656 ns, lb 49.6218 ns, ub 98.7286 ns, ci 0.95
found 21 outliers among 100 samples (21%)
variance is severely inflated by outliers

benchmarking Nospirit
collecting 100 samples, 315 iterations each, in estimated 2.1105 ms
mean: 60.2737 ns, lb 59.6758 ns, ub 61.197 ns, ci 0.95
std dev: 3.69908 ns, lb 2.68804 ns, ub 5.1282 ns, ci 0.95
found 21 outliers among 100 samples (21%)
variance is severely inflated by outliers

This confirms the expected slowness.

Fixing/Improving

Avoid recompiling parser expression:

namespace x3 = boost::spirit::x3;
static const inline auto rule = +x3::alnum % +('-' >> x3::int_ >> '/');

bool isSubStrInStrSehe(std::string const& expected, std::string const& input) {
    std::vector<boost::iterator_range<char const*>> values;
    bool result = parse(input.begin(), input.end(), rule, values);
    return (result && values.back() == expected);
}

Minimal change Compilers do optimize X3 neatly.

Avoid string construction:

bool isSubStrInStrSehe(std::string_view expected, std::string_view input) {
    std::vector<boost::iterator_range<char const*>> values;
    bool result = parse(input.begin(), input.end(), rule, values);
    return (result && values.back() == expected);
}

That's very simple and makes the function interface a lot more flexible to boot. Now we're seeing the difference!

Avoid vector construction:

You only need the last value. Let's cut the vector and use a semantic action to remember the last matched value:

static const inline auto token = x3::raw[+x3::alnum];
static const inline auto trailer = '-' >> x3::int_ >> '/';

bool isSubStrInStrSehe(std::string_view expected, std::string_view input) {
    boost::iterator_range<char const*> last;
    auto assign = [&last](auto& ctx) { last = _attr(ctx); };

    bool result = parse(input.begin(), input.end(), token[assign] % +trailer);

    return (result && last == expected);
}

If you're like me, the result is disappointing.

英文:

I started optimizing, until I bothered to read your original implementation. Here goes my galaxy-brain version:

bool isSubStrInStrGalaxybrain(std::string_view expected, std::string_view input) {
auto last = input.substr(input.find_last_of(&#39;/&#39;) + 1);
return expected == last.substr(0, last.find(&#39;-&#39;));
}

Everybody hates the string/string_view interface, but it is quite powerful if you find the sweet spot.

SPOILER: Take the rest of the original answer with a grain of salt because nothing is going to top this unless you wish to manually SSE4 optimize it (which the compiler might already do anyways):

OLD STUFF

I simplified the Spirit parser a bit to be more direct

bool isSubStrInStrSpirit(std::string const&amp; expected, std::string const&amp; input) {
namespace x3 = boost::spirit::x3;
std::vector&lt;std::string&gt; values;
bool result = parse(input.begin(), input.end(), +x3::alnum % +(&#39;-&#39; &gt;&gt; x3::int_ &gt;&gt; &#39;/&#39;), values);
return (result &amp;&amp; values.back() == expected);
}

Let's look at this.

It does a lot more work than you require. In particular it

validates the integers and delimiters strictly for the entire input (even though it throws away the result)
it creates std::string entries in the vector for all of the "values" which you donot actually require
you compile the parser expression each time around.

Besides your benchmarks measure console IO predominantly. Let's use a proper microbenchmark tool with statistical analysis:

NONIUS_BENCHMARK(&quot;Spirit&quot;, [](/*nonius::chronometer cm*/) {
for (auto&amp; item : infoMaps)
isSubStrInStrSpirit(name, item);
});
NONIUS_BENCHMARK(&quot;Nospirit&quot;, [](/*nonius::chronometer cm*/) {
for (auto&amp; item : infoMaps)
isSubStrInStr(name, item);
});

Output:

clock resolution: mean is 21.2206 ns (20480002 iterations)
benchmarking Spirit
collecting 100 samples, 33 iterations each, in estimated 2.1648 ms
mean: 490.892 ns, lb 479.67 ns, ub 509.147 ns, ci 0.95
std dev: 71.4656 ns, lb 49.6218 ns, ub 98.7286 ns, ci 0.95
found 21 outliers among 100 samples (21%)
variance is severely inflated by outliers
benchmarking Nospirit
collecting 100 samples, 315 iterations each, in estimated 2.1105 ms
mean: 60.2737 ns, lb 59.6758 ns, ub 61.197 ns, ci 0.95
std dev: 3.69908 ns, lb 2.68804 ns, ub 5.1282 ns, ci 0.95
found 21 outliers among 100 samples (21%)
variance is severely inflated by outliers

This confirms the expected slowness.

Fixing/Improving

Avoid recompiling parser expression

Let's get the rule compilation out of the hot-path:

namespace x3 = boost::spirit::x3;
static const inline auto rule = +x3::alnum % +(&#39;-&#39; &gt;&gt; x3::int_ &gt;&gt; &#39;/&#39;);
bool isSubStrInStrSehe(std::string const&amp; expected, std::string const&amp; input) {
std::vector&lt;std::string&gt; values;
bool result = parse(input.begin(), input.end(), rule, values);
return (result &amp;&amp; values.back() == expected);
}

Minimal change Compilers do optimize X3 neatly

Avoid string construction

bool isSubStrInStrSehe(std::string_view expected, std::string_view input) {
std::vector&lt;boost::iterator_range&lt;char const*&gt; &gt; values;
bool result = parse(input.begin(), input.end(), rule, values);
return (result &amp;&amp; values.back() == expected);
}

That's very simple, and makes the function interface a lot more flexible to boot. Now we're seeing the difference!

Avoid vector construction

You only need the last value. Let's cut the vector, and use a semantic action to remember the last matched value:

static const inline auto token     = x3::raw[+x3::alnum];
static const inline auto trailer = &#39;-&#39; &gt;&gt; x3::int_ &gt;&gt; &#39;/&#39;;
bool isSubStrInStrSehe(std::string_view expected, std::string_view input) {
boost::iterator_range&lt;char const*&gt; last;
auto assign = [&amp;last](auto&amp; ctx) { last = _attr(ctx); };
bool result = parse(input.begin(), input.end(), token[assign] % +trailer);
return (result &amp;&amp; last == expected);
}

If you're like me, the result is disappointing

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

Any way to make parsing & matching of substring faster with boost spirit than default parsing and matching

问题

答案1

OLD STUFF

Fixing/Improving

OLD STUFF

Fixing/Improving

Avoid recompiling parser expression

Avoid string construction

Avoid vector construction

将新精灵的位置设置为最后一个销毁的精灵的位置。

C++以十六进制值打印十进制值。

当添加新线程时的堆栈行为

Creating transparent parent window without using LWA_COLORKEY or effecting child windows

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

发表评论