2023-05-29 02:28:23

MapReduce word count using C++ transform_reduce() function with a parallel execution policy

Question

I have a string of words and I would like to count the number of occurrences of each word and store the results in a map. I want to use std::transform_reduce() in order to take advantage of its parallel processing option and use it on a much larger dataset.

e.g.

std::string text = "apple orange banana apple apple orange";

std::istringstream iss(text);

// Put these words into a vector
// Define a vector (CTAD), use its range constructor and the std::istream_iterator as iterator
std::vector words(std::istream_iterator<std::string>(iss), {});

// Aim: Use transform_reduce() to populate an unordered_map that maps each word to its word count

std::unordered_map<std::string, int> wordCount;

// Count the occurrences using transform_reduce
std::transform_reduce(
    words.begin(),
    words.end(),
    // use the unordered_map wordCount somehow?
    // [](){ lambda for reduce to populate the unordered_map wordCount}
    // [](){ lambda for transforming the words vector of data to pairs: e.g. ("apple", 1) }
);

How would I achieve this in C++17 or later using transform_reduce()?

Note: ChatGPT had a stab at it, but its code didn't compile and I cannot see how to make it work.

Answer 1

Score: 3

I would use a std::ranges::subrange (C++20), then a range-based for loop.

#include <unordered_map>
#include <string>
#include <sstream>
#include <iostream>
#include <ranges>
#include <iterator>

int main()
{
    using StreamWordIter = std::istream_iterator<std::string>;
    // subrange is a class template, so the alias has to name a specialization
    using SubRange       = std::ranges::subrange<StreamWordIter>;

    std::string        text = "apple orange banana apple apple orange";
    std::istringstream iss(text);

    // Count each word as it is read from the stream
    std::unordered_map<std::string, int> wordCount;
    for (auto const& word: SubRange(StreamWordIter{iss}, StreamWordIter{})) {
        ++wordCount[word];
    }

    // Print the counts (unordered_map iteration order is unspecified)
    for (auto const& [word, count]: wordCount) {
        std::cout << "  Word: " << word << " => " << count << "\n";
    }
}
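
For completeness, the reduction the question asks about can also be written with std::transform_reduce() itself. The following is a minimal sketch and not part of the original answer: countWords and the WordCount alias are hypothetical names introduced here for illustration. Each word is transformed into a one-entry map, and partial maps are merged by summing their counts, an operation that is associative and commutative, as the algorithm requires. It assumes C++17 and a standard library that implements the execution-policy overloads (with GCC this typically also means linking against TBB when a parallel policy is used).

```cpp
#include <execution>
#include <numeric>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical alias for the result type (not from the original post).
using WordCount = std::unordered_map<std::string, int>;

// Hypothetical helper: counts words with std::transform_reduce under the
// given execution policy (for example std::execution::par).
template <class ExecutionPolicy>
WordCount countWords(ExecutionPolicy&& policy,
                     const std::vector<std::string>& words)
{
    return std::transform_reduce(
        std::forward<ExecutionPolicy>(policy),
        words.begin(), words.end(),
        WordCount{},  // initial value: an empty map
        // Reduce step: merge two partial maps by adding the counts.
        // The order of merging does not affect the result.
        [](WordCount left, const WordCount& right) {
            for (const auto& [word, count] : right) {
                left[word] += count;
            }
            return left;
        },
        // Transform step: turn one word into a map holding {word, 1}.
        [](const std::string& word) { return WordCount{{word, 1}}; });
}
```

Calling countWords(std::execution::par, words) on the words vector from the question should yield apple 3, orange 2 and banana 1. Building and merging a map per word is costly, so on real data the plain loop above may well be faster; this would need measuring.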
