MapReduce word count using C++ transform_reduce() function with a parallel execution policy
Question
I have a string of words and I would like to count the number of occurrences of each word and store the results in a map. I want to use std::transform_reduce() in order to take advantage of its parallel execution policy and use it on a much larger dataset.
e.g.
std::string text = "apple orange banana apple apple orange";
std::istringstream iss(text);
// Put these words into a vector
// Define a vector (CTAD), use its range constructor and the std::istream_iterator as iterator
std::vector words(std::istream_iterator<std::string>(iss), {});
// Aim: Use transform_reduce() to populate an unordered_map that maps each word to its word count
std::unordered_map<std::string, int> wordCount;
// Count the occurrences using transform_reduce
std::transform_reduce(
    words.begin(),
    words.end(),
    // use the unordered_map wordCount somehow?
    // [](){ lambda for reduce to populate the unordered_map wordCount}
    // [](){ lambda for transforming the words vector of data to pairs: e.g. ("apple", 1) }
);
How would I achieve this in C++17 or later using transform_reduce()?
Note: ChatGPT had a stab at it, but its code didn't compile and I cannot see how to make it work.
Answer 1
Score: 3
I would build a std::ranges::subrange over the stream iterators and then use a range-based for loop.
#include <iostream>
#include <iterator>
#include <ranges>
#include <sstream>
#include <string>
#include <unordered_map>

int main()
{
    using StreamWordIter = std::istream_iterator<std::string>;

    std::string text = "apple orange banana apple apple orange";
    std::istringstream iss(text);

    std::unordered_map<std::string, int> wordCount;
    // std::ranges::subrange is a class template, so it cannot be aliased
    // without template arguments; name it directly and let CTAD deduce them.
    for (auto const& word : std::ranges::subrange(StreamWordIter{iss}, StreamWordIter{})) {
        ++wordCount[word];  // operator[] value-initializes a missing count to 0
    }

    for (auto const& [word, count] : wordCount) {
        std::cout << " Word: " << word << " => " << count << "\n";
    }
}
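If you do want to go through std::transform_reduce() as the question asks, one possible shape is to make the accumulator type the map itself. This is only a minimal sketch: the helper name countWords and the alias Counts are hypothetical, not part of the question or the standard library. The transform step turns each word into a one-entry map, and the reduce step merges two maps by adding their counts. That merge is associative and commutative, which std::transform_reduce() requires because it may regroup and reorder the operands.

```cpp
#include <numeric>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical alias: the accumulator is the word-count map itself.
using Counts = std::unordered_map<std::string, int>;

// Hypothetical helper: counts words with std::transform_reduce.
// Assumption: to run in parallel, include <execution> and pass
// std::execution::par as the first argument. Some toolchains (for example
// GCC's libstdc++) need a backend such as TBB linked for that to work.
Counts countWords(const std::vector<std::string>& words)
{
    return std::transform_reduce(
        words.begin(), words.end(),
        Counts{},  // initial value: an empty map
        // Reduce: merge two partial maps by summing the counts per word.
        [](Counts lhs, Counts rhs) {
            if (lhs.size() < rhs.size()) {
                std::swap(lhs, rhs);  // always fold the smaller map into the larger
            }
            for (const auto& [word, n] : rhs) {
                lhs[word] += n;
            }
            return lhs;
        },
        // Transform: each word becomes a one-entry map {word: 1}.
        [](const std::string& word) { return Counts{{word, 1}}; });
}
```

Calling countWords on the six words from the question gives apple 3, orange 2 and banana 1. Be aware that this builds a temporary map per word and copies entries on every merge, so it may well be slower than the plain loop above unless the dataset is large and the policy really runs in parallel; measure before committing to it.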