MapReduce word count using C++ transform_reduce() function with a parallel execution policy

Question

I have a string of words and I would like to count the number of occurrences of each word and store the results in a map. I want to use std::transform_reduce() in order to take advantage of its parallel processing option and use it on a much larger dataset.

e.g.

std::string text = "apple orange banana apple apple orange";

std::istringstream iss(text);

// Put these words into a vector
// Define a vector (CTAD), use its range constructor and the std::istream_iterator as iterator
std::vector words(std::istream_iterator<std::string>(iss), {});

// Aim: Use transform_reduce() to populate an unordered_map that maps each word to its word count

std::unordered_map<std::string, int> wordCount;

// Count the occurrences using transform_reduce
std::transform_reduce(
    words.begin(),
    words.end(),
    // use the unordered_map wordCount somehow?
    // [](){ lambda for reduce to populate the unordered_map wordCount}
    // [](){ lambda for transforming the words vector of data to pairs: e.g. ("apple", 1) }
);

How would I achieve this in C++17 or later using transform_reduce()?

Note: ChatGPT had a stab at it, but its code didn't compile and I cannot see how to make it work.

Answer 1

Score: 3

I would use a std::ranges::subrange and then a range-based for loop (this needs C++20).

#include <unordered_map>
#include <string>
#include <sstream>
#include <iostream>
#include <ranges>
#include <iterator>

int main()
{
    using StreamWordIter = std::istream_iterator<std::string>;

    std::string        text = "apple orange banana apple apple orange";
    std::istringstream iss(text);

    std::unordered_map<std::string, int> wordCount;
    // CTAD deduces subrange<StreamWordIter, StreamWordIter>; the class template
    // has to be named directly here, since it cannot be aliased without arguments.
    for (auto const& word: std::ranges::subrange(StreamWordIter{iss}, StreamWordIter{})) {
        ++wordCount[word];
    }

    for (auto const& [word, count]: wordCount) {
        std::cout << "  Word: " << word << " => " << count << "\n";
    }
}

huangapple
  • Posted on 2023-05-29 02:28:23
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76353012.html