How could one implement parsing a statement in boost::spirit, which in essence switches parsers?

Question

The language I'm trying to write a parser for has a statement, which in essence sets properties for the following text. These properties include

  • case sensitivity
  • format (including different comment styles)

The only way I can imagine implementing this is by switching to a different parser. I think this would require terminating the current parser successfully and returning, via its attribute, what to do with the rest of the unmatched input. How could one accomplish this?

Answer 1

Score: 1

Use semantic actions and the qi::lazy directive inside the statement parser to invoke the appropriate parsers based on the specified properties.

Answer 2

Score: 0

Switching to a different parser is one way.

The most notable pattern relating to this is the Nabialek Trick. This builds on the qi::lazy directive.

However, since you already mention multiple flags, that might not scale as it might lead to unnecessary duplication and/or combinatorial explosion.

I'd recommend using some parser state. You can do that using some semantic actions that hold your logic, but it would imply mutable state inside your parser which may hurt re-entrancy, thread-safety and re-usability. Those are pretty general drawbacks of semantic actions.

Instead, Qi offers local attributes, which sit inside the runtime parser context.

As an example, let's switch case-sensitivity:

// sample coming up, making dinner as well

Post-Dinner Update

As always time is a good teacher. I've tried my hand at actually making locals/inherited attributes work for re-entrancy, and it didn't work the way I remembered it.

So instead, let's embrace mutable state and put the option state right in the grammar instance. That way things stay at a feasible level of complexity, though you cannot always share parser instances.


// #define BOOST_SPIRIT_DEBUG
#include <boost/phoenix.hpp>
#include <boost/spirit/include/qi.hpp>
#include <iomanip>
#include <iostream>
#include <string_view>
namespace qi = boost::spirit::qi;
namespace px = boost::phoenix;
using namespace std::string_literals;

template <typename It> struct DemoParser : qi::grammar<It> {
    DemoParser() : DemoParser::base_type(start) {
        using namespace qi::labels;

        // shorthand mnemonics for accessing option state
        auto _case_option   = px::ref(case_opt);
        auto _strict_option = px::ref(strict_opt);
        qi::_r1_type kw_text; // another mnemonic, for the inherited attribute

        // handy phoenix actor (custom "directives")
        auto const _cs = qi::eps(_case_option == Sensitive);
        auto const _ci = qi::eps(_case_option == Insensitive);
     // auto const _sm = qi::eps(_strict_option == StrictOn);

        start = qi::skip(qi::space)[demo];

        demo = qi::eps[_case_option = Case::Sensitive]    // initialize
                      [_strict_option = Strict::StrictOn] // defaults?
            >> -(option | hello) % ';'                    //
            >> qi::eoi;

        option = kw("Option"s) >> (switch_case | switch_strict);
        hello                             //
            = _cs >> "Hello"              //
            | _ci >> qi::no_case["hello"] //
            ;

        _case_sym.add("sensitive", Case::Sensitive)("insensitive", Case::Insensitive);
        _strict_sym.add("on", Strict::StrictOn)("off", Strict::StrictOff);

        _case         = _cs >> _case_sym | _ci >> qi::no_case[_case_sym];
        _strict       = _cs >> _strict_sym | _ci >> qi::no_case[_strict_sym];
        switch_case   = kw("case"s) >> _case[_case_option = _1];
        switch_strict = kw("strict"s) >> _strict[_strict_option = _1];

        px::function c_str = [](std::string const& s) { return s.c_str(); };

        kw = (_cs >> qi::lit(c_str(kw_text))                 // case sensitive
              | _ci >> qi::no_case[qi::lit(c_str(kw_text))]) // case insensitive
            >> !qi::char_("a-zA-Z0-9._"); // lookahead assertion to avoid parsing partial identifiers

        BOOST_SPIRIT_DEBUG_NODES((start)(demo)(option)(hello)(switch_case)(switch_strict)(_case)(_strict)(kw))
    }

  private:
    qi::rule<It> start;

    enum Case { Sensitive, Insensitive } case_opt = Sensitive;
    enum Strict { StrictOff, StrictOn } strict_opt        = StrictOn;
    qi::symbols<char, Case>   _case_sym;
    qi::symbols<char, Strict> _strict_sym;

    using Skipper = qi::space_type;
    qi::rule<It, Skipper> demo, hello, option, switch_case, switch_strict;

    // lexeme
    qi::rule<It, Case()> _case;
    qi::rule<It, Strict()> _strict;
    qi::rule<It, std::string(std::string kw_text)> kw; // using inherited attribute
};

int main() {
    for (std::string_view input :
         {
             "",
             "bogus;", // FAIL
             "Hello;",
             "hello;",
             "Option case insensitive; heLlO;",
             "Option strict off;",
             "Option STRICT off;",
             "Option case insensitive; Option STRICT off;",
             "Option case insensitive; oPTION STRICT off;",
             "Option case insensitive; oPTION STRICT ON;",
             "Option case insensitive; HeLlO; OPTION CASE SENSitive ; HelLO;", // FAIL
             "Option case insensitive; HeLlO; OPTION CASE SENSitive ; Hello;",
         }) //
    {
        DemoParser<std::string_view::const_iterator> p; // mutable instance now
                                                        //
        bool ok = parse(begin(input), end(input), p);
        std::cout << quoted(input) << " -> " << (ok ? "PASS" : "FAIL") << std::endl;
    }
}

Printing the expected output for the test cases:

"" -> PASS
"bogus;" -> FAIL
"Hello;" -> PASS
"hello;" -> FAIL
"Option case insensitive; heLlO;" -> PASS
"Option strict off;" -> PASS
"Option STRICT off;" -> FAIL
"Option case insensitive; Option STRICT off;" -> PASS
"Option case insensitive; oPTION STRICT off;" -> PASS
"Option case insensitive; oPTION STRICT ON;" -> PASS
"Option case insensitive; HeLlO; OPTION CASE SENSitive ; HelLO;" -> FAIL
"Option case insensitive; HeLlO; OPTION CASE SENSitive ; Hello;" -> PASS

Improving Compile Times: X3

I honestly think that X3 is a bit more convenient for dynamically parameterizing or composing rules. It also compiles a lot faster, and it is easier to add debug side effects if desired:


// #define BOOST_SPIRIT_X3_DEBUG
#include <boost/spirit/home/x3.hpp>
#include <iomanip>
#include <iostream>
#include <string_view>
namespace x3 = boost::spirit::x3;
using namespace std::string_literals;

namespace DemoParser {
    enum Case { Insensitive, Sensitive };
    enum Strict { StrictOff, StrictOn };
    struct Options {
        enum Case   case_opt   = Sensitive;
        enum Strict strict_opt = StrictOn;
    };

    // custom "directives"
    auto const _cs = x3::eps[([](auto& ctx) { _pass(ctx) = get<Options>(ctx).case_opt == Sensitive; })];
    auto const _ci = x3::eps[([](auto& ctx) { _pass(ctx) = get<Options>(ctx).case_opt == Insensitive; })];
 // auto const _sm = x3::eps[([](auto& ctx) { _pass(ctx) = get<Options>(ctx).strict_opt == StrictOn; })];

    auto set_opt = [](auto member) {
        return [member](auto& ctx) {
            auto& opt = get<Options>(ctx).*member;
            x3::traits::move_to(_attr(ctx), opt); 
        };
    };

    static inline auto variable_case(auto p, char const* name = "variable_case") {
        using Attr = x3::traits::attribute_of<decltype(p), x3::unused_type, void>::type;
        return x3::rule<struct _, Attr, true>{name} = //
            (_cs >> x3::as_parser(p) |                //
             _ci >> x3::no_case[x3::as_parser(p)]);
    }

    static inline auto kw(char const* kw_text) {
        // using lookahead assertion to avoid parsing partial identifiers
        return x3::rule<struct kw, std::string>{kw_text} = x3::lexeme[ //
                   variable_case(x3::lit(kw_text), kw_text)            //
                   >> !x3::char_("a-zA-Z0-9._")                        //
        ];
    }

    auto _case_sym = x3::symbols<Case>{}.add("sensitive", Case::Sensitive)("insensitive", Case::Insensitive).sym;
    auto _strict_sym = x3::symbols<Strict>{}.add("on", Strict::StrictOn)("off", Strict::StrictOff).sym;

    auto switch_case   = kw("case") >> variable_case(_case_sym)[set_opt(&Options::case_opt)];
    auto switch_strict = kw("strict") >> variable_case(_strict_sym)[set_opt(&Options::strict_opt)];

    auto option = kw("Option") >> (switch_case | switch_strict);
    auto hello  = _cs >> "Hello"      //
        | _ci >> x3::no_case["hello"] //
        ;

    auto demo  = -(option | hello) % ';' >> x3::eoi;
    auto start = x3::skip(x3::space)[demo];
}

int main() {
    auto const p = DemoParser::start; // stateless parser
    using DemoParser::Options;

    for (std::string_view input :
         {
             "",
             "bogus;", // FAIL
             "Hello;",
             "hello;",
             "Option case insensitive; heLlO;",
             "Option strict off;",
             "Option STRICT off;",
             "Option case insensitive; Option STRICT off;",
             "Option case insensitive; oPTION STRICT off;",
             "Option case insensitive; oPTION STRICT ON;",
             "Option case insensitive; HeLlO; OPTION CASE SENSitive ; HelLO;", // FAIL
             "Option case insensitive; HeLlO; OPTION CASE SENSitive ; Hello;",
         }) //
    {
        Options opts;

        bool ok = parse(begin(input), end(input), x3::with<Options>(opts)[p]);
        std::cout << quoted(input) << " -> " << (ok ? "PASS" : "FAIL") << std::endl;
    }
}

Still printing the same test output:

"" -> PASS
"bogus;" -> FAIL
"Hello;" -> PASS
"hello;" -> FAIL
"Option case insensitive; heLlO;" -> PASS
"Option strict off;" -> PASS
"Option STRICT off;" -> FAIL
"Option case insensitive; Option STRICT off;" -> PASS
"Option case insensitive; oPTION STRICT off;" -> PASS
"Option case insensitive; oPTION STRICT ON;" -> PASS
"Option case insensitive; HeLlO; OPTION CASE SENSitive ; HelLO;" -> FAIL
"Option case insensitive; HeLlO; OPTION CASE SENSitive ; Hello;" -> PASS

I'm a bit out of time for the qi::lazy approach, so I'll refer to my existing examples on this site.

huangapple
  • Posted on June 26, 2023, 22:09:40
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76557481.html