英文:
set a sentinel value for std::map.end() for gcc compiler
问题
以下是您要翻译的内容:
"Here I specifically only care about the GCC compiler and run time code efficiency.
Consider the following code try me
#include <iostream>
#include <map>
char Find(const std::map<int, char>& map, int key) {
auto iter = map.find(key);
if (iter == map.end())
return 'X';
return iter->second;
}
char Find2(const std::map<int, char>& map, int key) {
return map.find(key)->second;
}
int main()
{
// part 1
std::map<int, char> x{{0,'0'}, {4,'4'}};
std::cout << Find(x, 3) << std::endl;
std::cout << Find(x, 4) << std::endl;
std::cout << (int)Find2(x, 3) << std::endl; // returns 0
std::cout << Find2(x, 4) << std::endl;
// part 2: Find2 is a shortcut
std::map<int, char> y(x);
y.end()->second = 'X';
std::cout << Find2(y, 3) << std::endl;
std::cout << Find2(y, 4) << std::endl;
}
The part 2 also works for a GCC compiler I tested in Godbolt, despite it uses the end()
in a weird way.
In GCC, does map allocate a node std::pair
to represent the end? Will it be changing when elements are added/deleted? This is related to how the map's end()
is really implemented, and I am curious to know it.
As many people pointed out, the C++ standard defines it as UB if a the end()
is dereferenced.
However, according to this answer, which GCC seems to have implemented the map in a way that the end()
is pointing to a root node. With this I think setting the root node's value to X
here seems to be a valid operation. Does this mean that the above code should work for GCC?"
英文:
Here I specifically only care about the GCC compiler and run time code efficiency.
Consider the following code try me
#include <iostream>
#include <map>
char Find(const std::map<int, char>& map, int key) {
auto iter = map.find(key);
if (iter == map.end())
return 'X';
return iter->second;
}
char Find2(const std::map<int, char>& map, int key) {
return map.find(key)->second;
}
int main()
{
// part 1
std::map<int, char> x{{0,'0'}, {4,'4'}};
std::cout << Find(x, 3) << std::endl;
std::cout << Find(x, 4) << std::endl;
std::cout << (int)Find2(x, 3) << std::endl; // returns 0
std::cout << Find2(x, 4) << std::endl;
// part 2: Find2 is a shortcut
std::map<int, char> y(x);
y.end()->second = 'X';
std::cout << Find2(y, 3) << std::endl;
std::cout << Find2(y, 4) << std::endl;
}
The part 2 also works for a GCC compiler I tested in Godbolt, despite it uses the end()
in a weird way.
In GCC, does map allocate a node std::pair
to represent the end? Will it be changing when elements are added/deleted? This is related to how the map's end()
is really implemented, and I am curious to know it.
As many people pointed out, the C++ standard defines it as UB if a the end()
is dereferenced.
However, according to this answer, which GCC seems to have implemented the map in a way that the end()
is pointing to a root node. With this I think setting the root node's value to X
here seems to be a valid operation. Does this mean that the above code should work for GCC?
答案1
得分: 4
以下是您要翻译的内容:
自从我阅读了几种GCC实现的STL容器之后,我对end()
sentinel的实现有了相当了解。由于问题涉及到std::map
容器的end()
如何实现,我会处理它,尽管这个想法几乎适用于所有STL基于节点的容器。
std::map
接口的大部分,就像其他关联容器一样,都是围绕着一个在stl_tree.h
文件中实现的_Rb_tree
对象的包装,部分实现在tree.tcc
文件中。而正是在头文件中可以找到基本节点_Rb_tree_node_base
和节点_Rb_tree_node
的实现。节点是基本节点的派生类,其中添加了存储成员。确切地说,自从C++11以来,存储成员不再是类型T
,而是使用了std::aligned_storage
类型的GCC版本来定义。
诀窍是使用基本节点的实例作为哨兵节点,它既代表了过去的最后一个元素 - 根据定义,哨兵节点是一个专门设计的节点,用作路径遍历终止器 - 也代表了前面的第一个元素。实质上,哨兵节点被解释为最后一个元素之后的元素和第一个元素之前的元素。
GCC的实现已经设计了std::map
的哨兵节点,以使用其对父节点、左子节点和右子节点的指针来跟踪根元素、最左元素和最右元素。
原始实现如下:
_Base_ptr&
_M_root() _GLIBCXX_NOEXCEPT
{ return this->_M_impl._M_header._M_parent; }
_Base_ptr&
_M_leftmost() _GLIBCXX_NOEXCEPT
{ return this->_M_impl._M_header._M_left; }
_Base_ptr&
_M_rightmost() _GLIBCXX_NOEXCEPT
{ return this->_M_impl._M_header._M_right; }
然后,_M_header
成员就是哨兵节点。使用这种方法来处理基于节点的容器有几个优点,但是std::forward_list
除外,主要好处有以下两个:
- 由于
end()
迭代器直接指向哨兵节点,因此它不是悬挂指针,即使容器为空也不会表现出特定的行为 - 在这种情况下,begin()
和end()
都引用同一元素,即哨兵节点; - 由于哨兵节点用于表示过去的最后一个元素和前面的第一个元素,因此在开始之前和结束之后始终存在有效的节点,然后可以在相同的方式中执行插入、删除和拼接操作。
这种实现的一个好处是,如果迭代器end()
被递增,它将会落在容器的开头。如前所述,std::forward_list
容器没有其他基于节点的容器的相同特性,因为它的哨兵节点不正确地表示只有前面的第一个元素,由before_begin()
迭代器指向,而std::nullptr
用于表示过去的最后一个元素。因此,如果递增before_begin()
迭代器,它将落在序列的开头,但如果递增end()
迭代器,将导致未定义行为。
对于所有其他基于节点的容器,如std::list
,以下代码始终有效:
std::list<int> l{0, 1, 2, 3, 4, 5, 6};
auto it{l.cend()};
if (++it == l.cbegin())
std::cout << "Wow!\n";
从中您可能已经猜到,尝试解引用end()
迭代器会导致未定义行为。
英文:
Since I had read several GCC implementations of STL containers, I have a fair understanding of how the end()
sentinel is implemented.
Since the question refers to how end()
is implemented for the std::map
container, I will deal with it, although the idea applies to almost all STL node-based containers.
The majority of the std::map
interface, like the other associative containers, is a wrapper around a _Rb_tree
object, that is implemented in the stl_tree.h
file and partly in the tree.tcc
file. And it is precisely in the header file that the implementation of the base node, _Rb_tree_node_base
, and the node,_Rb_tree_node
, can be found. The node is a derived class of the base node, to which the storage member is added. To be precise, since C++11 the storage member is no more of type T
, but is defined with the GCC version of the std::aligned_storage
type.
The trick is to use an instance of the base node as a sentinel node, that represents both the past-the-last element - by definition, the sentinel node is a specifically designed node that is used as a path traversal terminator - and the before-the-first element. Essentially, the sentinel node is interpreted as both the element after the last element and the element before the first element.
The GCC implementation has designed the std::map
's sentinel node in order to keep track of the root element, the leftmost element and the rightmost element using its pointers to the parent, the left child and the right child, respectively.
The original implementation is the following:
_Base_ptr&
_M_root() _GLIBCXX_NOEXCEPT
{ return this->_M_impl._M_header._M_parent; }
_Base_ptr&
_M_leftmost() _GLIBCXX_NOEXCEPT
{ return this->_M_impl._M_header._M_left; }
_Base_ptr&
_M_rightmost() _GLIBCXX_NOEXCEPT
{ return this->_M_impl._M_header._M_right; }
The _M_header
member is then the sentinel node. There are several advantages to using this approach for node-based containers, but std::forward_list
, and the main benefits are the following two:
- since the
end()
iterator points directly to the sentinel node, it is not a dangling pointer and do not exhibits a particular behavior even if the container is empty - in this case, bothbegin()
andend()
refer to the same thing, the sentinel node; - since the sentinel node is used to represent both the past-the-last element and the before-the-first element, there will always be a valid node before the beginning and after the end and then insert, erase and splice operations can be performed at any position in the same way.
A nice side effect of this implementation is that if the iterator end()
is incremented, it will land at the beginning of the container. As mentioned before, the std::forward_list
container does not have the same features of the other node-based containers, because its sentinel node improperly represents only the before-the-first element, pointed by the before_begin()
iterator, and the std::nullptr
is used to represent the past-the-last element. Then, if the before_begin()
iterator is incremented, it will land at the beginning of the sequence, but if the end()
iterator is incremented, it will lead to UB.
For all other node-based containers, such as std::list
, the following code is always valid:
std::list<int> l{0, 1, 2, 3, 4, 5, 6};
auto it{l.cend()};
if(++it == l.cbegin())
std::cout << "Wow!\n";
You may have guessed from this that trying to dereference the end()
iterator is UB.
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论