为gcc编译器设置std::map.end()的哨兵值

huangapple go评论67阅读模式
英文:

set a sentinel value for std::map.end() for gcc compiler

问题

以下是您要翻译的内容:

"Here I specifically only care about the GCC compiler and run time code efficiency.

Consider the following code try me

#include <iostream>
#include <map>

char Find(const std::map<int, char>& map, int key) {
    auto iter = map.find(key);
    if (iter == map.end()) 
        return 'X';
    return iter->second;
}

char Find2(const std::map<int, char>& map, int key) {
    return map.find(key)->second;
}

int main()
{
    // part 1
    std::map<int, char> x{{0,'0'}, {4,'4'}};
    std::cout << Find(x, 3) << std::endl;
    std::cout << Find(x, 4) << std::endl;
    std::cout << (int)Find2(x, 3) << std::endl; // returns 0
    std::cout << Find2(x, 4) << std::endl;

    // part 2: Find2 is a shortcut
    std::map<int, char> y(x);
    y.end()->second = 'X';
    std::cout << Find2(y, 3) << std::endl;
    std::cout << Find2(y, 4) << std::endl;
}

The part 2 also works for a GCC compiler I tested in Godbolt, despite it uses the end() in a weird way.

In GCC, does map allocate a node std::pair to represent the end? Will it be changing when elements are added/deleted? This is related to how the map's end() is really implemented, and I am curious to know it.

As many people pointed out, the C++ standard defines it as UB if a the end() is dereferenced.

However, according to this answer, which GCC seems to have implemented the map in a way that the end() is pointing to a root node. With this I think setting the root node's value to X here seems to be a valid operation. Does this mean that the above code should work for GCC?"

英文:

Here I specifically only care about the GCC compiler and run time code efficiency.

Consider the following code try me

#include <iostream>
#include <map>

char Find(const std::map<int, char>& map, int key) {
    auto iter = map.find(key);
    if (iter == map.end()) 
        return 'X';
    return iter->second;
}

char Find2(const std::map<int, char>& map, int key) {
    return map.find(key)->second;
}

int main()
{
    // part 1
    std::map<int, char> x{{0,'0'}, {4,'4'}};
    std::cout << Find(x, 3) << std::endl;
    std::cout << Find(x, 4) << std::endl;
    std::cout << (int)Find2(x, 3) << std::endl; // returns 0
    std::cout << Find2(x, 4) << std::endl;

    // part 2: Find2 is a shortcut
    std::map<int, char> y(x);
    y.end()->second = 'X';
    std::cout << Find2(y, 3) << std::endl;
    std::cout << Find2(y, 4) << std::endl;
}

The part 2 also works for a GCC compiler I tested in Godbolt, despite it uses the end() in a weird way.

In GCC, does map allocate a node std::pair to represent the end? Will it be changing when elements are added/deleted? This is related to how the map's end() is really implemented, and I am curious to know it.

As many people pointed out, the C++ standard defines it as UB if a the end() is dereferenced.

However, according to this answer, which GCC seems to have implemented the map in a way that the end() is pointing to a root node. With this I think setting the root node's value to X here seems to be a valid operation. Does this mean that the above code should work for GCC?

答案1

得分: 4

以下是您要翻译的内容:

自从我阅读了几种GCC实现的STL容器之后,我对end() sentinel的实现有了相当了解。由于问题涉及到std::map容器的end()如何实现,我会处理它,尽管这个想法几乎适用于所有STL基于节点的容器。

std::map接口的大部分,就像其他关联容器一样,都是围绕着一个在stl_tree.h文件中实现的_Rb_tree对象的包装,部分实现在tree.tcc文件中。而正是在头文件中可以找到基本节点_Rb_tree_node_base和节点_Rb_tree_node的实现。节点是基本节点的派生类,其中添加了存储成员。确切地说,自从C++11以来,存储成员不再是类型T,而是使用了std::aligned_storage类型的GCC版本来定义。

诀窍是使用基本节点的实例作为哨兵节点,它既代表了过去的最后一个元素 - 根据定义,哨兵节点是一个专门设计的节点,用作路径遍历终止器 - 也代表了前面的第一个元素。实质上,哨兵节点被解释为最后一个元素之后的元素和第一个元素之前的元素。

GCC的实现已经设计了std::map的哨兵节点,以使用其对父节点、左子节点和右子节点的指针来跟踪根元素、最左元素和最右元素。

原始实现如下:

_Base_ptr&
_M_root() _GLIBCXX_NOEXCEPT
{ return this->_M_impl._M_header._M_parent; }

_Base_ptr&
_M_leftmost() _GLIBCXX_NOEXCEPT
{ return this->_M_impl._M_header._M_left; }

_Base_ptr&
_M_rightmost() _GLIBCXX_NOEXCEPT
{ return this->_M_impl._M_header._M_right; }

然后,_M_header成员就是哨兵节点。使用这种方法来处理基于节点的容器有几个优点,但是std::forward_list除外,主要好处有以下两个:

  • 由于end()迭代器直接指向哨兵节点,因此它不是悬挂指针,即使容器为空也不会表现出特定的行为 - 在这种情况下,begin()end()都引用同一元素,即哨兵节点;
  • 由于哨兵节点用于表示过去的最后一个元素和前面的第一个元素,因此在开始之前和结束之后始终存在有效的节点,然后可以在相同的方式中执行插入、删除和拼接操作。

这种实现的一个好处是,如果迭代器end()被递增,它将会落在容器的开头。如前所述,std::forward_list容器没有其他基于节点的容器的相同特性,因为它的哨兵节点不正确地表示只有前面的第一个元素,由before_begin()迭代器指向,而std::nullptr用于表示过去的最后一个元素。因此,如果递增before_begin()迭代器,它将落在序列的开头,但如果递增end()迭代器,将导致未定义行为。

对于所有其他基于节点的容器,如std::list,以下代码始终有效:

std::list<int> l{0, 1, 2, 3, 4, 5, 6};
auto it{l.cend()};
if (++it == l.cbegin())
  std::cout << "Wow!\n";

从中您可能已经猜到,尝试解引用end()迭代器会导致未定义行为。

英文:

Since I had read several GCC implementations of STL containers, I have a fair understanding of how the end() sentinel is implemented.
Since the question refers to how end() is implemented for the std::map container, I will deal with it, although the idea applies to almost all STL node-based containers.

The majority of the std::map interface, like the other associative containers, is a wrapper around a _Rb_tree object, that is implemented in the stl_tree.h file and partly in the tree.tcc file. And it is precisely in the header file that the implementation of the base node, _Rb_tree_node_base, and the node,_Rb_tree_node, can be found. The node is a derived class of the base node, to which the storage member is added. To be precise, since C++11 the storage member is no more of type T, but is defined with the GCC version of the std::aligned_storage type.

The trick is to use an instance of the base node as a sentinel node, that represents both the past-the-last element - by definition, the sentinel node is a specifically designed node that is used as a path traversal terminator - and the before-the-first element. Essentially, the sentinel node is interpreted as both the element after the last element and the element before the first element.
The GCC implementation has designed the std::map's sentinel node in order to keep track of the root element, the leftmost element and the rightmost element using its pointers to the parent, the left child and the right child, respectively.

The original implementation is the following:

_Base_ptr&amp;
       _M_root() _GLIBCXX_NOEXCEPT
       { return this-&gt;_M_impl._M_header._M_parent; }

_Base_ptr&amp;
       _M_leftmost() _GLIBCXX_NOEXCEPT
       { return this-&gt;_M_impl._M_header._M_left; }

_Base_ptr&amp;
      _M_rightmost() _GLIBCXX_NOEXCEPT
      { return this-&gt;_M_impl._M_header._M_right; }

The _M_header member is then the sentinel node. There are several advantages to using this approach for node-based containers, but std::forward_list, and the main benefits are the following two:

  • since the end() iterator points directly to the sentinel node, it is not a dangling pointer and do not exhibits a particular behavior even if the container is empty - in this case, both begin() and end() refer to the same thing, the sentinel node;
  • since the sentinel node is used to represent both the past-the-last element and the before-the-first element, there will always be a valid node before the beginning and after the end and then insert, erase and splice operations can be performed at any position in the same way.

A nice side effect of this implementation is that if the iterator end() is incremented, it will land at the beginning of the container. As mentioned before, the std::forward_list container does not have the same features of the other node-based containers, because its sentinel node improperly represents only the before-the-first element, pointed by the before_begin() iterator, and the std::nullptr is used to represent the past-the-last element. Then, if the before_begin() iterator is incremented, it will land at the beginning of the sequence, but if the end() iterator is incremented, it will lead to UB.

For all other node-based containers, such as std::list, the following code is always valid:

std::list&lt;int&gt; l{0, 1, 2, 3, 4, 5, 6};
auto it{l.cend()};
if(++it == l.cbegin())
  std::cout &lt;&lt; &quot;Wow!\n&quot;;

You may have guessed from this that trying to dereference the end() iterator is UB.

huangapple
  • 本文由 发表于 2023年4月13日 16:47:51
  • 转载请务必保留本文链接:https://go.coder-hub.com/76003457.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定