如何高效使用`std::sort`来处理具有运行时记录长度的不透明数据类型?

huangapple go评论108阅读模式
英文:

How to efficiently use `std::sort` for an opaque data type with a run-time record length?

问题

I want to use std::sort for an opaque data type with a run-time record length. Here is an answer to a similar question.

我想在运行时记录长度的不透明数据类型上使用std::sort。以下是一个类似问题的答案

Attempts

I have tried indirectly sorting record pointers and indices, but these methods have terrible cache locality because the comparisons need to access randomly distributed rows of the dataset.

我尝试间接地对记录指针和索引进行排序,但这些方法的缓存局部性很差,因为比较需要访问数据集中随机分布的行。

I also tried falling back to the C functions qsort which allows for run-time record length. This is better, but it requires a callback function for the comparison. This has two drawbacks, 1) performance suffers compared to using a lambda which can be inlined, and 2) the interface ergonomics are poor because you (probably) have to create a structure to hold a functor for the comparison.

我还尝试回退到允许运行时记录长度的C函数qsort。这更好,但它需要一个用于比较的回调函数。这有两个缺点,1) 性能不如使用可以内联的lambda函数,2) 接口人机工程学很差,因为你(可能)必须创建一个用于比较的函数对象的结构。

Possible Idea

The previously mentioned answer suggested a,

先前提到的答案建议了一个,

pod block pseudo reference iterator with a specialized swap,

but did not give any hints about how that might work.

但没有提供关于它如何工作的任何提示。

Summary

To summarize, I want a way to call std::sort passing in a lambda for the comparison for an array of opaque data types with run-time record length as the following code demonstrates,

总之,我想找到一种调用std::sort的方法,以便为具有运行时记录长度的不透明数据类型数组传递一个用于比较的lambda函数,如下面的代码所示,

  1. #include <iostream>
  2. #include <random>
  3. using std::cout, std::endl;
  4. int main(int argc, const char *argv[]) {
  5. // We have 100 records each with 50 bytes.
  6. int nrecords = 100, nbytes = 50;
  7. std::vector<uint8_t> data(nrecords * nbytes);
  8. // Generate random data for testing.
  9. std::uniform_int_distribution<uint8_t> d;
  10. std::mt19937_64 rng;
  11. std::generate(data.begin(), data.end(), [&]() { return d(rng); });
  12. int key_index = 20;
  13. auto cmp = [&](uint8_t *a, uint8_t *b) {
  14. int aval = *(int*)(a + key_index), bval = *(int*)(b + key_index);
  15. return aval < bval;
  16. };
  17. // How can I call std::sort with run-time record length?
  18. // std::sort(data.begin(), ...)
  19. return 0;
  20. }

Iterator Attempt

I attempted to write an iterator based on the previous suggestion, but I have not been able to get it to compile yet. My iterator-foo is sub-par and I feel like I am making a conceptual error.

我尝试根据之前的建议编写一个迭代器,但是我还没有能够成功编译它。我的iterator-foo不够优秀,我觉得我可能在概念上出现了错误。

Here is the code,

  1. #include <algorithm>
  2. #include <random>
  3. struct MemoryIterator {
  4. using iterator_category = std::forward_iterator_tag;
  5. using difference_type = std::ptrdiff_t;
  6. using value_type = uint8_t *;
  7. using pointer = value_type*;
  8. using reference = value_type&;
  9. MemoryIterator(value_type ptr, size_t element_size)
  10. : ptr_(ptr)
  11. , element_size_(element_size) {
  12. }
  13. reference operator*() {
  14. return ptr_;
  15. }
  16. MemoryIterator& operator++() {
  17. ptr_ += element_size_;
  18. return *this;
  19. }
  20. MemoryIterator& operator--() {
  21. ptr_ -= element_size_;
  22. return *this;
  23. }
  24. MemoryIterator operator++(int) {
  25. auto tmp = *this;
  26. ++(*this);
  27. return tmp;
  28. }
  29. MemoryIterator operator--(int) {
  30. auto tmp = *this;
  31. --(*this);
  32. return tmp;
  33. }
  34. MemoryIterator& operator+=(size_t n) {
  35. ptr_ += n * element_size_;
  36. return *this;
  37. }
  38. MemoryIterator operator+(size_t n) {
  39. auto r = *this;
  40. r += n;
  41. return r;
  42. }
  43. MemoryIterator& operator-=(size_t n) {
  44. ptr_ -= n * element_size_;
  45. return *this;
  46. }
  47. MemoryIterator operator-(size_t n) {
  48. auto r = *this;
  49. r -= n;
  50. return r;
  51. }
  52. friend bool operator==(const MemoryIterator& a, const MemoryIterator& b) {
  53. return a.ptr_ == b.ptr_;
  54. }
  55. friend difference_type operator-(const MemoryIterator& a, const MemoryIterator& b) {
  56. return a.ptr_ - b.ptr_;
  57. }
  58. friend void swap(MemoryIterator a, MemoryIterator b) {
  59. }
  60. private:
  61. value_type ptr_;
  62. size_t element_size_;
  63. };
  64. int main(int argc, const char *argv[]) {
  65. int nrecords = 100, nbytes = 50;
  66. std::vector<uint8_t> data(nrecords * nbytes);
  67. std::uniform_int_distribution<uint8_t> d;
  68. std::mt19937_64 rng;
  69. std::generate(data.begin(), data.end(), [&]() { return d(rng); });
  70. int key_index = 20;
  71. auto cmp = [&](uint8_t *a, uint8_t *b) {
  72. int aval = *(int*)(a + key_index), bval = *(int*)(b + key_index);
  73. return aval < bval;
  74. };
  75. MemoryIterator begin(data.data(), nbytes);
  76. MemoryIterator end(data.data() + data.size(), nbytes);
  77. std::sort(begin, end, cmp);
  78. return 0;
  79. }

It complains that it cannot compare two MemoryIterator's which is true -- it needs to use the comparison function.

它抱怨它无法比

英文:

Question

I want to use std::sort for an opaque data type with a run-time record length. Here is an answer to a similar question.

Attempts

I have tried indirectly sorting record pointers and indices, but these methods have terrible cache locality because the comparisons need to access randomly distributed rows of the dataset.

I also tried falling back to the C functions qsort which allows for run-time record length. This is better, but it requires a callback function for the comparison. This has two drawbacks, 1) performance suffers compared to using a lambda which can be inlined, and 2) the interface ergonomics are poor because you (probably) have to create a structure to hold a functor for the comparison.

Possible Idea

The previously mentioned answer suggested a,

> pod block pseudo reference iterator with a specialized swap,

but did not give any hints about how that might work.

Summary

To summarize, I want a way to call std::sort passing in a lambda for the comparison for an array of opaque data types with run-time record length as the following code demonstrates,

  1. #include <iostream>
  2. #include <random>
  3. using std::cout, std::endl;
  4. int main(int argc, const char *argv[]) {
  5. // We have 100 records each with 50 bytes.
  6. int nrecords = 100, nbytes = 50;
  7. std::vector<uint8_t> data(nrecords * nbytes);
  8. // Generate random data for testing.
  9. std::uniform_int_distribution<uint8_t> d;
  10. std::mt19937_64 rng;
  11. std::generate(data.begin(), data.end(), [&]() { return d(rng); });
  12. int key_index = 20;
  13. auto cmp = [&](uint8_t *a, uint8_t *b) {
  14. int aval = *(int*)(a + key_index), bval = *(int*)(b + key_index);
  15. return aval < bval;
  16. };
  17. // How can I call std::sort with run-time record length?
  18. // std::sort(data.begin(), ...)
  19. return 0;
  20. }

Iterator Attempt

I attempted to write an iterator based on the previous suggestion, but I have not been able to get it to compile yet. My iterator-foo is sub-par and I feel like I am making a conceptual error.

Here is the code,

  1. #include <algorithm>
  2. #include <random>
  3. struct MemoryIterator {
  4. using iterator_category = std::forward_iterator_tag;
  5. using difference_type = std::ptrdiff_t;
  6. using value_type = uint8_t *;
  7. using pointer = value_type*;
  8. using reference = value_type&;
  9. MemoryIterator(value_type ptr, size_t element_size)
  10. : ptr_(ptr)
  11. , element_size_(element_size) {
  12. }
  13. reference operator*() {
  14. return ptr_;
  15. }
  16. MemoryIterator& operator++() {
  17. ptr_ += element_size_;
  18. return *this;
  19. }
  20. MemoryIterator& operator--() {
  21. ptr_ -= element_size_;
  22. return *this;
  23. }
  24. MemoryIterator operator++(int) {
  25. auto tmp = *this;
  26. ++(*this);
  27. return tmp;
  28. }
  29. MemoryIterator operator--(int) {
  30. auto tmp = *this;
  31. --(*this);
  32. return tmp;
  33. }
  34. MemoryIterator& operator+=(size_t n) {
  35. ptr_ += n * element_size_;
  36. return *this;
  37. }
  38. MemoryIterator operator+(size_t n) {
  39. auto r = *this;
  40. r += n;
  41. return r;
  42. }
  43. MemoryIterator& operator-=(size_t n) {
  44. ptr_ -= n * element_size_;
  45. return *this;
  46. }
  47. MemoryIterator operator-(size_t n) {
  48. auto r = *this;
  49. r -= n;
  50. return r;
  51. }
  52. friend bool operator==(const MemoryIterator& a, const MemoryIterator& b) {
  53. return a.ptr_ == b.ptr_;
  54. }
  55. friend difference_type operator-(const MemoryIterator& a, const MemoryIterator& b) {
  56. return a.ptr_ - b.ptr_;
  57. }
  58. friend void swap(MemoryIterator a, MemoryIterator b) {
  59. }
  60. private:
  61. value_type ptr_;
  62. size_t element_size_;
  63. };
  64. int main(int argc, const char *argv[]) {
  65. int nrecords = 100, nbytes = 50;
  66. std::vector<uint8_t> data(nrecords * nbytes);
  67. std::uniform_int_distribution<uint8_t> d;
  68. std::mt19937_64 rng;
  69. std::generate(data.begin(), data.end(), [&]() { return d(rng); });
  70. int key_index = 20;
  71. auto cmp = [&](uint8_t *a, uint8_t *b) {
  72. int aval = *(int*)(a + key_index), bval = *(int*)(b + key_index);
  73. return aval < bval;
  74. };
  75. MemoryIterator begin(data.data(), nbytes);
  76. MemoryIterator end(data.data() + data.size(), nbytes);
  77. std::sort(begin, end, cmp);
  78. return 0;
  79. }

It complains that it cannot compare two MemoryIterator's which is true -- it needs to use the comparison function.

Compilation Error

  1. /opt/local/libexec/llvm-16/bin/../include/c++/v1/__algorithm/sort.h:533:21: error:
  2. invalid operands to binary expression ('MemoryIterator' and 'MemoryIterator')
  3. if (__i >= __j)
  4. ~~~ ^ ~~~
  5. /opt/local/libexec/llvm-16/bin/../include/c++/v1/__algorithm/sort.h:639:8: note:
  6. in instantiation of function template specialization
  7. 'std::__introsort<std::_ClassicAlgPolicy, (lambda at
  8. sort0.cpp:92:16) &, MemoryIterator>' requested here
  9. std::__introsort<_AlgPolicy, _Compare>(__first, __last, __comp, __depth_limit);
  10. ^
  11. /opt/local/libexec/llvm-16/bin/../include/c++/v1/__algorithm/sort.h:699:10: note:
  12. in instantiation of function template specialization 'std::__sort<(lambda at
  13. sort0.cpp:92:16) &, MemoryIterator>' requested here
  14. std::__sort<_WrappedComp>(std::__unwrap_iter(__first), std::__unwrap_iter(__last), _...

答案1

得分: 3

以下是翻译好的部分:

这里是使std::sort按照您的要求工作所需的 Ingredient:

代理引用

代理引用是用户定义的类型,尽可能模仿C++引用。通过它们,字节数组中的元素实际上是由std::sort移动的。

请注意,对于更健壮的实现,也许执行的操作不仅仅是sort,您还会提供类似于标准库容器中的iteratorconst_iteratorConstRecordReference

  1. struct RecordReference {
  2. explicit RecordReference(void* data, size_t element_size):
  3. _data(data), _size(element_size) {}
  4. RecordReference(Record&);
  5. RecordReference(RecordReference const& rhs):
  6. _data(rhs.data()), _size(rhs.size()) {}
  7. /// 因为这表示引用,所以赋值代表引用的值的复制,而不是引用的复制
  8. RecordReference& operator=(RecordReference const& rhs) {
  9. assert(size() == rhs.size());
  10. std::memcpy(data(), rhs.data(), size());
  11. return *this;
  12. }
  13. RecordReference& operator=(Record const& rhs);
  14. /// 还有`swap`通过引用进行交换
  15. friend void swap(RecordReference a, RecordReference b) {
  16. assert(a.size() == b.size());
  17. size_t size = a.size();
  18. // 如果您真的很注重标准兼容性,也许不要使用alloca
  19. auto* buffer = (void*)alloca(size);
  20. std::memcpy(buffer, a.data(), size);
  21. std::memcpy(a.data(), b.data(), size);
  22. std::memcpy(b.data(), buffer, size);
  23. }
  24. void* data() const { return _data; }
  25. size_t size() const { return _size; }
  26. private:
  27. void* _data;
  28. size_t _size;
  29. };
  30. /// 在Record的定义之后:
  31. RecordReference::RecordReference(Record& rhs):
  32. _data(rhs.data()), _size(rhs.size()) {}
  33. RecordReference& RecordReference::operator=(Record const& rhs) {
  34. assert(size() == rhs.size());
  35. std::memcpy(data(), rhs.data(), size());
  36. return *this;
  37. }

用于表示记录的类类型

令人讨厌的是,我们甚至需要这个。我们需要它是因为libstdc++的std::sort实现(可能是任何实现)会创建一些栈分配的值副本,因此我们需要为我们的迭代器的value_type提供一些合理的东西,实际上表示一个记录。

我们提供了内联缓冲区以避免堆分配,直到一个合理的大小。您可以调整InlineSize参数以适应您的记录的预期大小。

  1. struct Record {
  2. static constexpr size_t InlineSize = 56;
  3. Record(RecordReference ref): _size(ref.size()) {
  4. if (is_inline()) {
  5. std::memcpy(&inline_data, ref.data(), size());
  6. }
  7. else {
  8. heap_data = std::malloc(size());
  9. std::memcpy(heap_data, ref.data(), size());
  10. }
  11. }
  12. Record(Record&& rhs) noexcept: _size(rhs.size()) {
  13. if (is_inline()) {
  14. std::memcpy(&inline_data, &rhs.inline_data, size());
  15. }
  16. else {
  17. heap_data = rhs.heap_data;
  18. rhs.heap_data = nullptr;
  19. }
  20. }
  21. ~Record() {
  22. if (!is_inline()) {
  23. std::free(heap_data);
  24. }
  25. }
  26. void* data() { return (void*)((Record const*)this)->data(); }
  27. void const* data() const { return is_inline() ? inline_data : heap_data; }
  28. size_t size() const { return _size; }
  29. private:
  30. bool is_inline() const { return _size <= InlineSize; }
  31. size_t _size;
  32. union {
  33. char inline_data[InlineSize];
  34. void* heap_data;
  35. };
  36. };

然而,似乎至少libc++的std::sort实现一次只会创建一个此类型的对象。这意味着我们可以预先分配一个thread_local缓冲区在栈上保存该单个对象。这样,我们可以防止为任何记录大小分配堆,但这依赖于std::sort的一个实现细节,并且可能在将来的构建中创建运行时错误,因此我不建议这样做。此外,这使得这些类型对许多其他算法都无用。

最后是您的迭代器类

正如注释中所指出的,它需要定义关系比较运算符(以满足_随机访问迭代器_的要求,或者仅仅是因为sort逻辑上需要它们)。

它通过value_typereference typedefs公开我们的RecordRecordReference类型。

  1. struct MemoryIterator {
  2. using iterator_category = std::random_access_iterator_tag;
  3. using difference_type = std::ptrdiff_t;
  4. using value_type = Record;
  5. using pointer = void*;
  6. using reference = RecordReference;
  7. MemoryIterator(pointer ptr, size_t size)
  8. : ptr_((char*)ptr)
  9. , size_(size) {
  10. }
  11. reference operator*() const {
  12. return reference(ptr_, size_);
  13. }
  14. reference operator[](size_t index) const {
  15. return *(*this + index);
  16. }
  17. MemoryIterator& operator++() {
  18. ptr_ += size_;
  19. return *this;
  20. }
  21. MemoryIterator& operator--() {
  22. ptr_ -= size_;
  23. return *this;
  24. }
  25. MemoryIterator operator++(int) {
  26. auto tmp = *this;
  27. ++(*this);
  28. return tmp;
  29. }
  30. MemoryIterator operator--(int) {
  31. auto tmp = *this;
  32. --(*this);
  33. return tmp;
  34. }
  35. MemoryIterator& operator+=(size_t n) {
  36. ptr_ += n * size_;
  37. return *this;
  38. }
  39. MemoryIterator operator+(size_t n) const {
  40. auto r = *this;
  41. r += n;
  42. return r;
  43. }
  44. MemoryIterator& operator-=(size_t n) {
  45. ptr_ -= n * size_;
  46. return *this;
  47. }
  48. MemoryIterator operator-(size_t n) const {
  49. auto r = *this;
  50. r -= n;
  51. return r;
  52. }
  53. friend bool operator==(const MemoryIterator& a,
  54. <details>
  55. <summary>英文:</summary>
  56. Here are the ingredients we need to make `std::sort` work the way you desire:
  57. ## Proxy references
  58. Proxy references are user defined types mimicking C++ references (as much as possible). Through them the elements in the byte array are actually moved around by `std::sort`.
  59. Note that for a more robust implementation that perhaps does more than `sort`, you would also supply a `ConstRecordReference`, akin to `iterator` and `const_iterator` in standard library containers.

struct RecordReference {
explicit RecordReference(void* data, size_t element_size):
_data(data), _size(element_size) {}

  1. RecordReference(Record&amp;);
  2. RecordReference(RecordReference const&amp; rhs):
  3. _data(rhs.data()), _size(rhs.size()) {}
  4. /// Because this represents a reference, an assignment represents a copy of
  5. /// the referred-to value, not of the reference
  6. RecordReference&amp; operator=(RecordReference const&amp; rhs) {
  7. assert(size() == rhs.size());
  8. std::memcpy(data(), rhs.data(), size());
  9. return *this;
  10. }
  11. RecordReference&amp; operator=(Record const&amp; rhs);
  12. /// Also `swap` swaps &#39;through&#39; the reference
  13. friend void swap(RecordReference a, RecordReference b) {
  14. assert(a.size() == b.size());
  15. size_t size = a.size();
  16. // Perhaps don&#39;t use alloca if you&#39;re really serious
  17. // about standard conformance
  18. auto* buffer = (void*)alloca(size);
  19. std::memcpy(buffer, a.data(), size);
  20. std::memcpy(a.data(), b.data(), size);
  21. std::memcpy(b.data(), buffer, size);
  22. }
  23. void* data() const { return _data; }
  24. size_t size() const { return _size; }

private:
void* _data;
size_t _size;
};

/// After the definition of Record (see below):

RecordReference::RecordReference(Record& rhs):
_data(rhs.data()), _size(rhs.size()) {}

RecordReference& RecordReference::operator=(Record const& rhs) {
assert(size() == rhs.size());
std::memcpy(data(), rhs.data(), size());
return *this;
}

  1. ## Class type to represent your records
  2. It is rather annoying that we even need this. We need it because libstdc++&#39;s `std::sort` implementation (and probably any implementation) creates some stack allocated copies of our values, so we need to supply something sensible for our iterators `value_type` that actually represents a record.
  3. We provide an inline buffer to avoid heap allocations up to a reasonable size. You can tune the `InlineSize` parameter to the expected size of your records.

struct Record {
static constexpr size_t InlineSize = 56;

  1. Record(RecordReference ref): _size(ref.size()) {
  2. if (is_inline()) {
  3. std::memcpy(&amp;inline_data, ref.data(), size());
  4. }
  5. else {
  6. heap_data = std::malloc(size());
  7. std::memcpy(heap_data, ref.data(), size());
  8. }
  9. }
  10. Record(Record&amp;&amp; rhs) noexcept: _size(rhs.size()) {
  11. if (is_inline()) {
  12. std::memcpy(&amp;inline_data, &amp;rhs.inline_data, size());
  13. }
  14. else {
  15. heap_data = rhs.heap_data;
  16. rhs.heap_data = nullptr;
  17. }
  18. }
  19. ~Record() {
  20. if (!is_inline()) {
  21. std::free(heap_data);
  22. }
  23. }
  24. void* data() { return (void*)((Record const*)this)-&gt;data(); }
  25. void const* data() const { return is_inline() ? inline_data : heap_data; }
  26. size_t size() const { return _size; }

private:
bool is_inline() const { return _size <= InlineSize; }

  1. size_t _size;
  2. union {
  3. char inline_data[InlineSize];
  4. void* heap_data;
  5. };

};

  1. It seems however that at least libc++&#39;s `std::sort` implementation only ever creates one object of this type at a single time. This means that we could preallocate a `thread_local` buffer on the stack to hold that single object. This way we prevent heap allocations for any record size, at the expense of relying on an implementation detail of `std::sort` and potentially creating _runtime_ errors in future builds, so I would not recommend this. Also this renders the types useless for many other algorithms.
  2. ## And finally your iterator class
  3. As noted in the comments it needs to have relational comparison operators defined (to satisfy the _random access iterator_ requirements or just because `sort` logically needs them).
  4. It exposes our `Record` and `RecordReference` types via the `value_type` and `reference` typedefs.

struct MemoryIterator {
using iterator_category = std::random_access_iterator_tag;
using difference_type = std::ptrdiff_t;
using value_type = Record;
using pointer = void*;
using reference = RecordReference;

  1. MemoryIterator(pointer ptr, size_t size)
  2. : ptr_((char*)ptr)
  3. , size_(size) {
  4. }
  5. reference operator*() const {
  6. return reference(ptr_, size_);
  7. }
  8. reference operator[](size_t index) const {
  9. return *(*this + index);
  10. }
  11. MemoryIterator&amp; operator++() {
  12. ptr_ += size_;
  13. return *this;
  14. }
  15. MemoryIterator&amp; operator--() {
  16. ptr_ -= size_;
  17. return *this;
  18. }
  19. MemoryIterator operator++(int) {
  20. auto tmp = *this;
  21. ++(*this);
  22. return tmp;
  23. }
  24. MemoryIterator operator--(int) {
  25. auto tmp = *this;
  26. --(*this);
  27. return tmp;
  28. }
  29. MemoryIterator&amp; operator+=(size_t n) {
  30. ptr_ += n * size_;
  31. return *this;
  32. }
  33. MemoryIterator operator+(size_t n) const {
  34. auto r = *this;
  35. r += n;
  36. return r;
  37. }
  38. MemoryIterator&amp; operator-=(size_t n) {
  39. ptr_ -= n * size_;
  40. return *this;
  41. }
  42. MemoryIterator operator-(size_t n) const {
  43. auto r = *this;
  44. r -= n;
  45. return r;
  46. }
  47. friend bool operator==(const MemoryIterator&amp; a, const MemoryIterator&amp; b) {
  48. assert(a.size_ == b.size_);
  49. return a.ptr_ == b.ptr_;
  50. }
  51. friend std::strong_ordering operator&lt;=&gt;(const MemoryIterator&amp; a, const MemoryIterator&amp; b) {
  52. assert(a.size_ == b.size_);
  53. return a.ptr_ &lt;=&gt; b.ptr_;
  54. }
  55. friend difference_type operator-(const MemoryIterator&amp; a, const MemoryIterator&amp; b) {
  56. assert(a.size_ == b.size_);
  57. return (a.ptr_ - b.ptr_) / a.size_;
  58. }

private:
char* ptr_;
size_t size_;
};

  1. [Here you can find a live demo of the code.][1]
  2. [1]: https://godbolt.org/z/aah7z9zcv
  3. </details>
  4. # 答案2
  5. **得分**: 1
  6. 这里是一些看起来有效的简单概念验证代码。它通过构建指向原始数据中的“记录”的结构体的第二个向量,然后对其进行排序来实现。通过精心制作合适的拷贝构造函数和拷贝赋值运算符,可以说服`std::sort`执行正确的操作。我不知道这是否是最佳方法,代码肯定可以更好地重构。
  7. 好的,让我们开始:
  8. ```cpp
  9. #include <iostream>
  10. #include <cstring>
  11. #include <algorithm>
  12. #include <vector>
  13. using std::cout, std::endl;
  14. struct sortable_record
  15. {
  16. sortable_record(int record_size, uint8_t *data, uint8_t *swap_buf) :
  17. record_size(record_size), data(data), swap_buf(swap_buf) { }
  18. sortable_record(const sortable_record &other)
  19. {
  20. data = other.swap_buf;
  21. operator=(other);
  22. }
  23. sortable_record &operator=(const sortable_record &other)
  24. {
  25. record_size = other.record_size;
  26. swap_buf = other.swap_buf;
  27. memcpy(data, other.data, record_size);
  28. return *this;
  29. }
  30. int record_size = 0;
  31. uint8_t *data = nullptr;
  32. uint8_t *swap_buf = nullptr;
  33. };
  34. int main() {
  35. int nrecords = 10, nbytes = 50;
  36. std::vector<uint8_t> data(nrecords * nbytes);
  37. // 生成(简化的)测试数据。
  38. for (int i = 0; i < nrecords; ++i)
  39. {
  40. std::fill(&data[i * nbytes], &data[i * nbytes] + nbytes, 'z' - i);
  41. *(&data[i * nbytes] + nbytes - 1) = 0;
  42. }
  43. // 填充 sortable_record 的向量
  44. std::vector<uint8_t> swap_buf(nbytes);
  45. std::vector<sortable_record> sort_me;
  46. sort_me.reserve(nrecords);
  47. for (int i = 0; i < nrecords; ++i)
  48. sort_me.emplace_back(nbytes, &data[i * nbytes], swap_buf.data());
  49. int key_index = 20;
  50. auto cmp = [key_index](const sortable_record &a, const sortable_record &b) {
  51. uint8_t aval = a.data[key_index];
  52. uint8_t bval = b.data[key_index];
  53. return aval < bval;
  54. };
  55. std::sort(sort_me.begin(), sort_me.end(), cmp);
  56. for (int i = 0; i < nrecords; ++i)
  57. std::cout << (const char *)&data[i * nbytes] << endl;
  58. }

请注意,如果没有sort_me.reserve(nrecords);,会发生糟糕的事情,因此构建sort_me的代码应该全部封装在一个好的工厂函数中。

在线演示

英文:

Here is some simple proof-of-concept code that seems to work. It works by constructing a second vector of structs pointing at the 'records' in the original data and then sorts it.

std::sort can then be persuaded to do the right thing by crafting a suitable copy constructor and copy assignment operator. I don't know if this is optimal, and the code could certainly be factored better.

OK, here we go:

  1. #include &lt;iostream&gt;
  2. #include &lt;cstring&gt;
  3. #include &lt;algorithm&gt;
  4. using std::cout, std::endl;
  5. struct sortable_record
  6. {
  7. sortable_record (int record_size, uint8_t *data, uint8_t *swap_buf) :
  8. record_size (record_size), data (data), swap_buf (swap_buf) { }
  9. sortable_record (const sortable_record &amp;other)
  10. {
  11. data = other.swap_buf;
  12. operator= (other);
  13. }
  14. sortable_record &amp;operator= (const sortable_record &amp;other)
  15. {
  16. record_size = other.record_size;
  17. swap_buf = other.swap_buf;
  18. memcpy (data, other.data, record_size);
  19. return *this;
  20. }
  21. int record_size = 0;
  22. uint8_t *data = nullptr;
  23. uint8_t *swap_buf = nullptr;
  24. };
  25. int main() {
  26. int nrecords = 10, nbytes = 50;
  27. std::vector&lt;uint8_t&gt; data(nrecords * nbytes);
  28. // Generate (simplified!) data for testing.
  29. for (int i = 0; i &lt; nrecords; ++i)
  30. {
  31. std::fill (&amp;data [i * nbytes], &amp;data [i * nbytes] + nbytes, &#39;z&#39; - i);
  32. *(&amp;data [i * nbytes] + nbytes - 1) = 0;
  33. }
  34. // Populate vector of sortable_record&#39;s
  35. std::vector&lt;uint8_t&gt; swap_buf (nbytes);
  36. std::vector&lt;sortable_record&gt; sort_me;
  37. sort_me.reserve (nrecords);
  38. for (int i = 0; i &lt; nrecords; ++i)
  39. sort_me.emplace_back (nbytes, &amp;data [i * nbytes], swap_buf.data ());
  40. int key_index = 20;
  41. auto cmp = [key_index](const sortable_record &amp;a, const sortable_record &amp;b) {
  42. uint8_t aval = a.data [key_index];
  43. uint8_t bval = b.data [key_index];
  44. return aval &lt; bval;
  45. };
  46. std::sort (sort_me.begin(), sort_me.end(), cmp);
  47. for (int i = 0; i &lt; nrecords; ++i)
  48. std::cout &lt;&lt; (const char *) &amp;data [i * nbytes] &lt;&lt; endl;
  49. }

Note that bad things will happen without sort_me.reserve (nrecords);, so the code building sort_me should all be bundled up in a nice factory function.

Live demo

huangapple
  • 本文由 发表于 2023年5月28日 06:57:47
  • 转载请务必保留本文链接:https://go.coder-hub.com/76349341.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定