Extract a number from the middle of a multi-line string?

Question

I'm working on a C++ program and I need to read a piece of meta-data from a TIF file. The meta-data is a string that looks like the following:

<GDALMetadata>
  <Item name="BANDWIDTH"></Item>
  <Item name="CENTER_FILTER_WAVELENGTH"></Item>
  <Item name="DATA_SET_ID">&amp;quot;LRO-L-LOLA-4-GDR-V1.0&amp;quot;</Item>
  <Item name="FILTER_NAME"></Item>
  <Item name="INSTRUMENT_ID">&amp;quot;LOLA&amp;quot;</Item>
  <Item name="INSTRUMENT_NAME">&amp;quot;LUNAR ORBITER LASER ALTIMETER&amp;quot;</Item>
  <Item name="MISSION_NAME"></Item>
  <Item name="NOTE"></Item>
  <Item name="PRODUCER_INSTITUTION_NAME">&amp;quot;GODDARD SPACE FLIGHT CENTER&amp;quot;</Item>
  <Item name="PRODUCT_CREATION_TIME">2017-09-15</Item>
  <Item name="START_TIME">2009-07-13T17:33:17</Item>
  <Item name="STOP_TIME">2016-11-29T05:48:19</Item>
  <Item name="OFFSET" sample="0" role="offset">1737400</Item>
  <Item name="SCALE" sample="0" role="scale">0.5</Item>
</GDALMetadata>

I need to extract the scale value (which in this case is 0.5). My first attempt was to use regex as follows:

float scale = 1;
std::regex rgx("*<Item name=\"SCALE\"*>(.*?)</Item>*");
std::smatch match;
if (std::regex_search(metadata.begin(), metadata.end(), match, rgx)) {
    scale = static_cast<float>(std::atof(match.str().c_str()));
};

This did not work, and I'm unsure why. I'm very inexperienced with regex.

Obviously this looks like HTML, but as I only need this one specific field, I thought it would be simpler to extract it directly.

Answer 1

Score: 2

You can take the substring between the two delimiters role="scale"> and </Item>. First remove every </Item> that occurs before role="scale">, so that the first remaining </Item> is the one that closes the scale item; then extract the text between role="scale"> and </Item> using metadata.find() and metadata.substr(). Note that this modifies metadata.

#include <string>

float scale = 1;
// Remove every "</Item>" that precedes role="scale"> (this modifies metadata).
while(metadata.find("</Item>") < metadata.find("role=\"scale\">")){
  metadata.replace(metadata.find("</Item>"), 7, "");  // 7 == length of "</Item>"
}
if(metadata.find("role=\"scale\">") != std::string::npos && metadata.find("</Item>") != std::string::npos){
  // 13 == length of role="scale">
  scale = std::stof(metadata.substr(metadata.find("role=\"scale\">") + 13, metadata.find("</Item>") - metadata.find("role=\"scale\">") - 13));
}
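
As a supplementary sketch, the same delimiter search can be done without editing metadata, by starting the second find() at the position just past the opening delimiter. The helper names text_between and scale_from_metadata and the fallback parameter are hypothetical, introduced only for illustration, and std::optional assumes C++17.

```cpp
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical helper (not from the answer): returns the text between the
// first occurrence of `open` and the next occurrence of `close` after it.
std::optional<std::string> text_between(const std::string& text,
                                        const std::string& open,
                                        const std::string& close) {
    const std::string::size_type open_pos = text.find(open);
    if (open_pos == std::string::npos) return std::nullopt;
    const std::string::size_type value_pos = open_pos + open.size();
    // Searching from value_pos skips every earlier "</Item>" without
    // modifying the string.
    const std::string::size_type close_pos = text.find(close, value_pos);
    if (close_pos == std::string::npos) return std::nullopt;
    return text.substr(value_pos, close_pos - value_pos);
}

// Parses the content of the item carrying role="scale", or returns `fallback`
// when the item is missing or its content is not a number.
float scale_from_metadata(const std::string& metadata, float fallback = 1.0f) {
    const auto value = text_between(metadata, "role=\"scale\">", "</Item>");
    if (!value) return fallback;
    try {
        return std::stof(*value);
    } catch (const std::exception&) {  // std::invalid_argument or std::out_of_range
        return fallback;
    }
}
```

Calling scale_from_metadata(metadata) leaves metadata untouched, so the string can still be searched for other items such as OFFSET afterwards.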

Answer 2

Score: 2

Your regex string literal should be this instead:

"<Item name=\"SCALE\"[^>]*>(.*?)<\\/Item>"

In other words, drop the leading and trailing *; you don't need them. In a regex, * repeats the preceding element, so \"* matches zero or more quote characters rather than arbitrary text. After "SCALE", use [^>]* instead of a bare * to skip everything up to, but not including, the >. The pattern also escapes the / in </Item> (in the regex itself, not in the string literal); the default ECMAScript grammar of std::regex does not treat / as special, so that escape is optional but harmless.

That being said, match.str() returns the entire substring that matched the regex, not the value of the (.*?) group as you are expecting. std::atof() therefore receives a string that starts with <Item rather than with a number, and returns 0. To extract just the group value, use match[1].str() instead.

Lastly, consider using std::stof() instead of std::atof().
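
To illustrate the difference: std::atof() returns 0.0 when no conversion can be performed, which is indistinguishable from a genuine zero, whereas std::stof() throws std::invalid_argument or std::out_of_range. A minimal sketch of a wrapper that turns those exceptions into a boolean result follows; the name try_parse_float is hypothetical and not part of the standard library.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: converts with std::stof and reports failure through
// the return value. `out` is left unchanged when the conversion fails.
bool try_parse_float(const std::string& text, float& out) {
    try {
        out = std::stof(text);
        return true;
    } catch (const std::invalid_argument&) {  // text does not start with a number
        return false;
    } catch (const std::out_of_range&) {      // value does not fit in a float
        return false;
    }
}
```

With this, a default such as float scale = 1; survives a failed parse instead of being overwritten with 0.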

Try this:

float scale = 1;
std::regex rgx("<Item name=\"SCALE\"[^>]*>(.*?)<\\/Item>");
std::smatch match;       
if (std::regex_search(metadata.cbegin(), metadata.cend(), match, rgx)) {
    scale = std::stof(match[1].str());
}
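
For reuse, the snippet above could be wrapped in a function. The following is a sketch under a few assumptions: the name extract_scale is hypothetical, std::optional requires C++17, and a missing or non-numeric SCALE item is reported as an empty optional rather than as a default value.

```cpp
#include <optional>
#include <regex>
#include <stdexcept>
#include <string>

// Hypothetical wrapper around the pattern from this answer. Returns
// std::nullopt when there is no SCALE item or its content is not a number.
std::optional<float> extract_scale(const std::string& metadata) {
    static const std::regex rgx("<Item name=\"SCALE\"[^>]*>(.*?)<\\/Item>");
    std::smatch match;
    if (!std::regex_search(metadata, match, rgx)) return std::nullopt;
    // match[0] is the whole "<Item ...>0.5</Item>" text; match[1] is the
    // content captured by the (.*?) group.
    try {
        return std::stof(match[1].str());
    } catch (const std::exception&) {  // empty or non-numeric content
        return std::nullopt;
    }
}
```

A caller can then write float scale = extract_scale(metadata).value_or(1.0f); to keep the original default of 1.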

Answer 3

Score: 2

I believe that using std::regex assumes too much about the future format of the metadata, given that it is XML text. In XML, attributes can be reordered, and elements can contain line breaks or appear in a different order.
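
The concern can be made concrete with a small sketch. The two functions below are hypothetical illustrations: the first uses a pattern that expects name="SCALE" to be the first attribute, the second loosens it to allow other attributes before it. Both patterns and function names are assumptions for this example only.

```cpp
#include <regex>
#include <string>

// Pattern that assumes name="SCALE" directly follows "<Item ".
bool matches_name_first(const std::string& xml) {
    static const std::regex rgx("<Item name=\"SCALE\"[^>]*>([^<]*)</Item>");
    return std::regex_search(xml, rgx);
}

// Looser variant (an assumption for illustration): allows other attributes
// and line breaks before name="SCALE".
bool matches_name_anywhere(const std::string& xml) {
    static const std::regex rgx("<Item\\b[^>]*\\bname=\"SCALE\"[^>]*>([^<]*)</Item>");
    return std::regex_search(xml, rgx);
}
```

For the equivalent element <Item sample="0" role="scale" name="SCALE">0.5</Item>, the first function finds nothing while the second still matches. Even the looser pattern does not cover single-quoted attribute values or whitespace around the = sign, both of which XML permits, which is the argument for a real parser.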

I would lean towards using a library that can parse and handle XML, such as libxml2 or boost::property_tree.

The following example parses your metadata and prints the scale.

#include <string>
#include <sstream>
#include <iostream>
#include <boost/property_tree/ptree.hpp>
#include <boost/property_tree/xml_parser.hpp>

std::string metadata = R"(
<GDALMetadata>
  <Item name="BANDWIDTH"></Item>
  <Item name="CENTER_FILTER_WAVELENGTH"></Item>
  <Item name="DATA_SET_ID">&amp;quot;LRO-L-LOLA-4-GDR-V1.0&amp;quot;</Item>
  <Item name="FILTER_NAME"></Item>
  <Item name="INSTRUMENT_ID">&amp;quot;LOLA&amp;quot;</Item>
  <Item name="INSTRUMENT_NAME">&amp;quot;LUNAR ORBITER LASER ALTIMETER&amp;quot;</Item>
  <Item name="MISSION_NAME"></Item>
  <Item name="NOTE"></Item>
  <Item name="PRODUCER_INSTITUTION_NAME">&amp;quot;GODDARD SPACE FLIGHT CENTER&amp;quot;</Item>
  <Item name="PRODUCT_CREATION_TIME">2017-09-15</Item>
  <Item name="START_TIME">2009-07-13T17:33:17</Item>
  <Item name="STOP_TIME">2016-11-29T05:48:19</Item>
  <Item name="OFFSET" sample="0" role="offset">1737400</Item>
  <Item name="SCALE" sample="0" role="scale">0.5</Item>
</GDALMetadata>)";

using namespace boost::property_tree;

int main() {
    std::istringstream input( metadata );
    ptree tree;
    read_xml(input, tree);
    // Children of <GDALMetadata>; an empty tree if the element is missing.
    auto items = tree.get_child("GDALMetadata", ptree());
    for (const auto& f: items) {
        auto p = f.second;
        // Attributes are stored under the special "<xmlattr>" child.
        std::string name = p.get<std::string>("<xmlattr>.name", "");
        if ( name=="SCALE" ) {
            std::cout << "Scale: " << p.data() << std::endl;
        }
    }
}

This prints:
Scale: 0.5

Godbolt: https://godbolt.org/z/K94EW9YMf

Answer 4

Score: 1

I would do this the old-fashioned way: read lines until you find the one containing "SCALE", then parse that line in more detail:

#include <sstream>
#include <string>

std::istringstream metadata_stream(metadata_string);
std::string metadata_text_line;
bool found = false;
while (std::getline(metadata_stream, metadata_text_line))
{
    if (metadata_text_line.find("SCALE") != std::string::npos)
    {
        static const char key_text[] = "\"scale\">";
        std::string::size_type position = metadata_text_line.find(key_text);
        if (position != std::string::npos)
        {
            // sizeof counts the terminating '\0', hence the - 1U.
            std::string::size_type value_start_position = (position + sizeof(key_text) - 1U);
            // The value ends at the '<' that opens the closing tag.
            std::string::size_type end_position = metadata_text_line.find('<', value_start_position);
            std::string scale_text = metadata_text_line.substr(value_start_position,
                                                               end_position - value_start_position);
            //...
        }
    }
}

This code presents a general idea or solution; there may be issues with it.
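
One possible way to package the line-by-line idea as a function is sketched below. The name find_scale_text is hypothetical, std::optional assumes C++17, and the sketch assumes, as the answer does, that the opening tag, the value and the closing tag all sit on one line.

```cpp
#include <optional>
#include <sstream>
#include <string>

// Hypothetical completion of the line-by-line idea: on the first line that
// contains both SCALE and "scale">, returns the text between "scale"> and
// the next '<'.
std::optional<std::string> find_scale_text(const std::string& metadata_string) {
    static const std::string key_text = "\"scale\">";
    std::istringstream metadata_stream(metadata_string);
    std::string line;
    while (std::getline(metadata_stream, line)) {
        if (line.find("SCALE") == std::string::npos) continue;
        const std::string::size_type position = line.find(key_text);
        if (position == std::string::npos) continue;
        const std::string::size_type value_start = position + key_text.size();
        const std::string::size_type value_end = line.find('<', value_start);
        if (value_end == std::string::npos) continue;  // closing tag not on this line
        return line.substr(value_start, value_end - value_start);
    }
    return std::nullopt;
}
```

The returned text can then be converted with std::stof(), keeping the default scale when the optional is empty.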

huangapple
  • Published on 2023-06-02 07:02:16
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76386209.html