英文:
Any way to make parsing & matching of substring faster with boost spirit than default parsing and matching
问题
以下是您要翻译的代码部分:
#include <iostream>
#include <chrono>
#include <boost/spirit/home/x3.hpp>
bool isSubStrInStr(const std::string subStr, const std::string& str)
{
auto begin = str.find_last_of('/');
auto end = str.find_last_of('-');
if (end != std::string::npos)
{
if (begin != std::string::npos)
{
if (str.substr((begin + 1), (end - begin - 1)) == subStr)
{
return true;
}
}
else
{
if (str.substr(0, end) == subStr)
{
return true;
}
}
}
return false;
}
bool isSubStrInStrSpirit(const std::string& subStr, const std::string& str)
{
std::vector<std::string> values;
bool const result = boost::spirit::x3::parse(
str.begin(), str.end(),
+boost::spirit::x3::alnum % +(boost::spirit::x3::char_('-') >> boost::spirit::x3::int_ >> "/"),
values
);
if (result && values.back() == subStr)
{
return true;
}
return false;
}
int main()
{
std::string name = "TRINNDECK";
std::vector<std::string> infoMaps{
"SERVERTRINN-11/TRINNDECK-1/TRINNUSER-1",
"SERVERMAGNUS-1/MAGNUSDECK-1/TRINNDECK-1",
};
std::cout << "WITH SPIRIT" << std::endl;
auto start = std::chrono::steady_clock::now();
for (auto& item : infoMaps)
{
std::cout << item << " " << std::boolalpha << isSubStrInStrSpirit(name, item) << std::endl;
}
auto stop = std::chrono::steady_clock::now();
auto duration = std::chrono::duration_cast<std::chrono::microseconds>(stop - start);
std::cout << "Time taken: " << duration.count() << " microseconds." << std::endl;
std::cout << "WITHOUT SPIRIT" << std::endl;
start = std::chrono::steady_clock::now();
for (auto& item : infoMaps)
{
std::cout << item << " " << std::boolalpha << isSubStrInStr(name, item) << std::endl;
}
stop = std::chrono::steady_clock::now();
duration = std::chrono::duration_cast<std::chrono::microseconds>(stop - start);
std::cout << "Time taken: " << duration.count() << " microseconds." << std::endl;
return 0;
}
希望这有所帮助!如果您需要更多翻译或其他帮助,请告诉我。
英文:
I am trying to replace an old piece of code which matches if substring is there in end of string with boost spirit. When i benchmark both implementation i see old implementation does it faster.
My questions are
- Is it wise to expect i would get faster parsing and matching when i replace old code with boost spirit.
- If (1.) is yes some suggestions on what i am doing wrong or what i shud do try.
logic explanation:
if substring is "TRINNDECK"
i need to check the last entry in "SERVERTRINN-11/TRINNDECK-1/TRINNUSER-1" has substring.
all entries have simillar format (STRING-INT/STRING-INT)
so in my example "SERVERMAGNUS-1/MAGNUSDECK-1/TRINNDECK-1" matches but not "SERVERTRINN-11/TRINNDECK-1/TRINNUSER-1"
#include <iostream>
#include <chrono>
#include <boost/spirit/home/x3.hpp>
bool isSubStrInStr(const std::string subStr, const std::string& str)
{
auto begin = str.find_last_of('/');
auto end = str.find_last_of('-');
if (end != std::string::npos)
{
if (begin != std::string::npos)
{
if (str.substr((begin + 1), (end - begin - 1)) == subStr)
{
return true;
}
}
else
{
if (str.substr(0, end) == subStr)
{
return true;
}
}
}
return false;
}
bool isSubStrInStrSpirit(const std::string& subStr, const std::string& str)
{
std::vector<std::string> values;
bool const result = boost::spirit::x3::parse(
str.begin(), str.end(),
+boost::spirit::x3::alnum % +(boost::spirit::x3::char_('-') >> boost::spirit::x3::int_ >> "/"),
values
);
if (result && values.back() == subStr)
{
return true;
}
return false;
}
int main()
{
std::string name = "TRINNDECK";
std::vector<std::string> infoMaps{
"SERVERTRINN-11/TRINNDECK-1/TRINNUSER-1",
"SERVERMAGNUS-1/MAGNUSDECK-1/TRINNDECK-1",
};
std::cout << "WITH SPIRIT" << std::endl;
auto start = std::chrono::steady_clock::now();
for (auto& item : infoMaps)
{
std::cout << item << " " << std::boolalpha << isSubStrInStrSpirit(name, item) << std::endl;
}
auto stop = std::chrono::steady_clock::now();
auto duration = std::chrono::duration_cast<std::chrono::microseconds>(stop - start);
std::cout << "Time taken: " << duration.count() << " microseconds." << std::endl;
std::cout << "WITHOUT SPIRIT" << std::endl;
start = std::chrono::steady_clock::now();
for (auto& item : infoMaps)
{
std::cout << item << " " << std::boolalpha << isSubStrInStr(name, item) << std::endl;
}
stop = std::chrono::steady_clock::now();
duration = std::chrono::duration_cast<std::chrono::microseconds>(stop - start);
std::cout << "Time taken: " << duration.count() << " microseconds." << std::endl;
return 0;
}
答案1
得分: 2
I started optimizing, until I bothered to read your original implementation. Here goes my galaxy-brain version:
bool isSubStrInStrGalaxybrain(std::string_view expected, std::string_view input) {
auto last = input.substr(input.find_last_of('/') + 1);
return expected == last.substr(0, last.find('-'));
}
Everybody hates the string/string_view interface, but it is quite powerful if you find the sweet spot.
SPOILER: Take the rest of the original answer with a grain of salt because nothing is going to top this unless you wish to manually SSE4 optimize it (which the compiler might already do anyways).
OLD STUFF
I simplified the Spirit parser a bit to be more direct:
bool isSubStrInStrSpirit(std::string const& expected, std::string const& input) {
namespace x3 = boost::spirit::x3;
std::vector<std::string> values;
bool result = parse(input.begin(), input.end(), +x3::alnum % +('-' >> x3::int_ >> '/'), values);
return (result && values.back() == expected);
}
Let's look at this.
It does a lot more work than you require. In particular it:
- Validates the integers and delimiters strictly for the entire input (even though it throws away the result).
- Creates
std::string
entries in the vector for all of the "values" which you do not actually require. - You compile the parser expression each time around.
Besides your benchmarks measure console IO predominantly. Let's use a proper microbenchmark tool with statistical analysis:
NONIUS_BENCHMARK("Spirit", [](/*nonius::chronometer cm*/) {
for (auto& item : infoMaps)
isSubStrInStrSpirit(name, item);
});
NONIUS_BENCHMARK("Nospirit", [](/*nonius::chronometer cm*/) {
for (auto& item : infoMaps)
isSubStrInStr(name, item);
});
Output:
clock resolution: mean is 21.2206 ns (20480002 iterations)
benchmarking Spirit
collecting 100 samples, 33 iterations each, in estimated 2.1648 ms
mean: 490.892 ns, lb 479.67 ns, ub 509.147 ns, ci 0.95
std dev: 71.4656 ns, lb 49.6218 ns, ub 98.7286 ns, ci 0.95
found 21 outliers among 100 samples (21%)
variance is severely inflated by outliers
benchmarking Nospirit
collecting 100 samples, 315 iterations each, in estimated 2.1105 ms
mean: 60.2737 ns, lb 59.6758 ns, ub 61.197 ns, ci 0.95
std dev: 3.69908 ns, lb 2.68804 ns, ub 5.1282 ns, ci 0.95
found 21 outliers among 100 samples (21%)
variance is severely inflated by outliers
This confirms the expected slowness.
Fixing/Improving
- Avoid recompiling parser expression:
namespace x3 = boost::spirit::x3;
static const inline auto rule = +x3::alnum % +('-' >> x3::int_ >> '/');
bool isSubStrInStrSehe(std::string const& expected, std::string const& input) {
std::vector<boost::iterator_range<char const*>> values;
bool result = parse(input.begin(), input.end(), rule, values);
return (result && values.back() == expected);
}
Minimal change Compilers do optimize X3 neatly.
- Avoid string construction:
bool isSubStrInStrSehe(std::string_view expected, std::string_view input) {
std::vector<boost::iterator_range<char const*>> values;
bool result = parse(input.begin(), input.end(), rule, values);
return (result && values.back() == expected);
}
That's very simple and makes the function interface a lot more flexible to boot. Now we're seeing the difference!
- Avoid vector construction:
You only need the last value. Let's cut the vector and use a semantic action to remember the last matched value:
static const inline auto token = x3::raw[+x3::alnum];
static const inline auto trailer = '-' >> x3::int_ >> '/';
bool isSubStrInStrSehe(std::string_view expected, std::string_view input) {
boost::iterator_range<char const*> last;
auto assign = [&last](auto& ctx) { last = _attr(ctx); };
bool result = parse(input.begin(), input.end(), token[assign] % +trailer);
return (result && last == expected);
}
If you're like me, the result is disappointing.
英文:
I started optimizing, until I bothered to read your original implementation. Here goes my galaxy-brain version:
bool isSubStrInStrGalaxybrain(std::string_view expected, std::string_view input) {
auto last = input.substr(input.find_last_of('/') + 1);
return expected == last.substr(0, last.find('-'));
}
Everybody hates the string/string_view interface, but it is quite powerful if you find the sweet spot.
SPOILER: Take the rest of the original answer with a grain of salt because nothing is going to top this unless you wish to manually SSE4 optimize it (which the compiler might already do anyways):
OLD STUFF
I simplified the Spirit parser a bit to be more direct
bool isSubStrInStrSpirit(std::string const& expected, std::string const& input) {
namespace x3 = boost::spirit::x3;
std::vector<std::string> values;
bool result = parse(input.begin(), input.end(), +x3::alnum % +('-' >> x3::int_ >> '/'), values);
return (result && values.back() == expected);
}
Let's look at this.
It does a lot more work than you require. In particular it
- validates the integers and delimiters strictly for the entire input (even though it throws away the result)
- it creates
std::string
entries in the vector for all of the "values" which you donot actually require - you compile the parser expression each time around.
Besides your benchmarks measure console IO predominantly. Let's use a proper microbenchmark tool with statistical analysis:
NONIUS_BENCHMARK("Spirit", [](/*nonius::chronometer cm*/) {
for (auto& item : infoMaps)
isSubStrInStrSpirit(name, item);
});
NONIUS_BENCHMARK("Nospirit", [](/*nonius::chronometer cm*/) {
for (auto& item : infoMaps)
isSubStrInStr(name, item);
});
Output:
clock resolution: mean is 21.2206 ns (20480002 iterations)
benchmarking Spirit
collecting 100 samples, 33 iterations each, in estimated 2.1648 ms
mean: 490.892 ns, lb 479.67 ns, ub 509.147 ns, ci 0.95
std dev: 71.4656 ns, lb 49.6218 ns, ub 98.7286 ns, ci 0.95
found 21 outliers among 100 samples (21%)
variance is severely inflated by outliers
benchmarking Nospirit
collecting 100 samples, 315 iterations each, in estimated 2.1105 ms
mean: 60.2737 ns, lb 59.6758 ns, ub 61.197 ns, ci 0.95
std dev: 3.69908 ns, lb 2.68804 ns, ub 5.1282 ns, ci 0.95
found 21 outliers among 100 samples (21%)
variance is severely inflated by outliers
This confirms the expected slowness.
Fixing/Improving
-
Avoid recompiling parser expression
Let's get the rule compilation out of the hot-path:
namespace x3 = boost::spirit::x3; static const inline auto rule = +x3::alnum % +('-' >> x3::int_ >> '/'); bool isSubStrInStrSehe(std::string const& expected, std::string const& input) { std::vector<std::string> values; bool result = parse(input.begin(), input.end(), rule, values); return (result && values.back() == expected); }
Minimal change Compilers do optimize X3 neatly
-
Avoid string construction
bool isSubStrInStrSehe(std::string_view expected, std::string_view input) { std::vector<boost::iterator_range<char const*> > values; bool result = parse(input.begin(), input.end(), rule, values); return (result && values.back() == expected); }
That's very simple, and makes the function interface a lot more flexible to boot. Now we're seeing the difference!
-
Avoid vector construction
You only need the last value. Let's cut the vector, and use a semantic action to remember the last matched value:
static const inline auto token = x3::raw[+x3::alnum]; static const inline auto trailer = '-' >> x3::int_ >> '/'; bool isSubStrInStrSehe(std::string_view expected, std::string_view input) { boost::iterator_range<char const*> last; auto assign = [&last](auto& ctx) { last = _attr(ctx); }; bool result = parse(input.begin(), input.end(), token[assign] % +trailer); return (result && last == expected); }
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论