C++ multiple regex extractions to an array from a DBC entry


Question

Hello, I would like to extract the parameters from the following string:
VAL_ 234 State1 123 "Description 1" 0 "Description 2 with \n new line" 90903489 "Big value and special characters &$§())!" ;

The desired matches are:

  • 234
  • State1
  • and then an array of unsigned integer/string pairs:
    • 123 "Description 1"
    • 0 "Description 2 with \n new line"
    • 90903489 "Big value and special characters &$§())!"

The array can be split in a second step if it is not possible to do so directly.
With the following regex I always get only the last element of the array, 90903489 "Big value and special characters &$§())!":

^VAL_ ([0-9]+) ([A-Za-z_][A-Za-z_0-9]*) ([0-9]*\\s\"[^\"]*\"\\s)+

Is there a way to extract all of the values?
I already found

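auto blah = std::string{"5001 | 5002 | 5003"};
// The regex must be a named object: the token-iterator constructors that
// take a temporary std::regex are deleted since C++14.
auto re = std::regex{R"(\d+)"};
auto values = std::vector<std::string>{
    std::sregex_token_iterator{blah.begin(), blah.end(), re},
    std::sregex_token_iterator{}};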

from this post, but it only returns the complete string. Is there a way to iterate over the submatches?

Answer 1

Score: 0

I'm not sure whether you have any specific requirements on how the matches need to be separated, but you can match either pattern with the following regular expression:

(?:^VAL_\s(\d+)\s(\w+)|\s(\d+\s".+?"))

Sample code:

#include <iostream>
#include <regex>
#include <string>

int main() {
    const std::string input{ R"(VAL_ 234 State1 123 "Description 1" 0 "Description 2 with \n new line" 90903489 "Big value and special characters &$§())!")" };
    // Alternative 1 captures the CAN id (group 1) and the signal name (group 2);
    // alternative 2 captures one value/description pair as a whole (group 3).
    const std::regex regex{ R"((?:^VAL_\s(\d+)\s(\w+)|\s(\d+\s".+?")))" };
    const std::sregex_iterator end{};
    for (auto it = std::sregex_iterator{ std::cbegin(input), std::cend(input), regex };
         it != end; ++it) {
        const auto& match = *it;
        if (match[1].matched) {
            std::cout << "Val match: " << match[1].str() << '\n';
        }
        if (match[2].matched) {
            std::cout << "State match: " << match[2].str() << '\n';
        }
        if (match[3].matched) {
            std::cout << "Etc match: " << match[3].str() << '\n';
        }
    }
}

Answer 2

Score: 0

Based on @rustyx's link, I created my own parser:

#include <iostream>
#include <string>
#include <vector>

enum VALToken {
    Identifier = 0,
    CANId,
    SignalName,
    Value,
    Description
};

struct ValueDescription {
    std::string value;
    std::string description;
};

int main() {
    const std::string s = R"(VAL_ 234 State1 123 "Description 1" 0 "Description 2 with \n new line" 90903489 "Big value and special characters &$§())!" ;)";

    auto state = Identifier;
    const char* a = s.data();
    std::string can_id;
    std::string signal_name;
    std::vector<ValueDescription> vds;
    ValueDescription vd;
    for (;;) {
        switch (state) {
        case Identifier: {
            if (*a != 'V')
                return 0;
            a++;
            if (*a != 'A')
                return 0;
            a++;
            if (*a != 'L')
                return 0;
            a++;
            if (*a != '_')
                return 0;
            a++;
            if (*a != ' ')
                return 0;
            a++; // skip whitespace
            state = CANId;
            break;
        }
        case CANId: {
            while (*a >= '0' && *a <= '9') {
                can_id += *a;
                a++;
            }
            if (can_id.empty())
                return 0;
            if (*a != ' ')
                return 0;
            a++; // skip whitespace
            state = SignalName;
            break;
        }
        case SignalName: {
            if ((*a >= 'a' && *a <= 'z') || (*a >= 'A' && *a <= 'Z') || *a == '_')
                signal_name += *a;
            else
                return 0;
            a++;
            while ((*a >= 'a' && *a <= 'z') || (*a >= 'A' && *a <= 'Z') || *a == '_' || (*a >= '0' && *a <= '9')) {
                signal_name += *a;
                a++;
            }
            if (*a != ' ')
                return 0;
            a++; // skip whitespace
            state = Value;
            break;
        }
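        case Value: {
            // ';' ends the VAL_ entry: print what was parsed and stop.
            if (*a == ';') {
                std::cout << "CAN id: " << can_id << '\n'
                          << "Signal: " << signal_name << '\n';
                for (const auto& entry : vds)
                    std::cout << "  " << entry.value << " \"" << entry.description << "\"\n";
                return 0;
            }
            std::string value_str;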
            while (*a >= '0' && *a <= '9') {
                value_str += *a;
                a++;
            }
            if (value_str.empty())
                return 0;

            if (*a != ' ')
                return 0;
            a++; // skip whitespace
            vd.value = value_str;
            state = Description;
            break;
        }
        case Description: {
            std::string desc;
            if (*a != '"')
                return 0;
            a++;
            while (*a != '"' && *a != 0) {
                desc += *a;
                a++;
            }
            if (*a == 0)
                return 0;
            a++;
            if (*a != ' ')
                return 0;
            a++; // skip whitespace

            vd.description = desc;
            vds.push_back(vd);

            state = Value;
            break;
        }
        }
    }

    return 0;
}

Answer 3

Score: 0

I would do a regex_match followed by a loop using an sregex_iterator.

[Demo]

#include <fmt/core.h>
#include <regex>
#include <string>

int main() {
    const std::string text{ "VAL_ 234 State1"
        " 123 \"Description 1\""
        " 0 \"Description 2 with \\n new line\""
        " 90903489 \"Big value and special characters &$§())!\""
    };
    // Group 1: CAN id, group 2: signal name, group 3: the remaining pairs.
    const std::regex pattern{ R"(VAL_ (\d+) (\w+)(.*))" };
    std::smatch matches{};
    if (std::regex_match(text, matches, pattern)) {
        fmt::print("{}\n{}\n", matches[1].str(), matches[2].str());

        // One value/description pair per match. The quotes are escaped so that
        // the pattern does not contain )" and end the raw string early.
        std::regex array_pattern{ R"(\s+(\d+)\s+\"([^"]+)\")" };
        auto array_text{ matches[3].str() };
        for (std::sregex_iterator it{ array_text.begin(), array_text.end(), array_pattern };
            it != std::sregex_iterator{};
            ++it) {

            std::smatch array_matches{ *it };
            fmt::print("\t'{}', '{}'\n", array_matches[1].str(), array_matches[2].str());
        }
    }
}

// Output:
//
// 234
// State1
//    '123', 'Description 1'
//    '0', 'Description 2 with \n new line'
//    '90903489', 'Big value and special characters &$§())!'

huangapple
  • Posted on 10 February 2023 at 04:29:51
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75404112.html