使用ranges-v3将平面表格转换为树状结构

huangapple go评论72阅读模式
英文:

converting a flat table to a tree structure using ranges-v3

问题

以下是翻译的代码部分:

// 用于唯一性比较的比较器
static const auto customComp = [](const auto& lhs, const auto& rhs) {
    return std::tie(lhs.filename, lhs.function, lhs.lineNum, lhs.type) <
        std::tie(rhs.filename, rhs.function, rhs.lineNum, rhs.type);
};

int main() {
    print("unsorted vector", structs);
    // 对结构进行排序
    actions::sort(structs, customComp);
    const auto outerComp = [](auto&& lhs, auto&& rhs) {
        return lhs.filename == rhs.filename;
    };
    const auto innerComp = [](auto&& lhs, auto&& rhs) {
        return lhs.function == rhs.function;
    };
    print("sorted vector", structs);
    std::cout << std::endl;
    // 按文件名将排序后的探测点列表分组
    for (const auto& sources : structs | views::chunk_by(outerComp)) {
        auto foo = sources.size();
        for (const auto& next : sources) {
            auto outcomes = 0;
            // 按函数名将探测点列表分组
            for (const auto& functions : sources | views::chunk_by(innerComp)) {
                for (const auto& probe : functions) {
                    outcomes += (probe.type == Type::Two) ? 2 : 1;
                    std::cout << std::format("{}\n", probe);
                }
            }
            std::cout << next.filename << " outcomes [" << outcomes << "]\n";
            break;
        }
        std::cout << "\n";
    }
}

请注意,这段代码已经根据您提供的信息进行了翻译。如果您需要进一步的帮助或有其他问题,请随时告诉我。

英文:

I have a flat representation of a tree shown in the table below.
The unsorted data, std::vector<MyStruct> is:

unsorted vector
(id)    (path)          (fn)    (line)  (extra)
1       /abc/file3.c    foo0    10      1
2       /abc/file3.c    foo0    15      2
3       /abc/file3.c    foo0    20      1
4       /abc/file3.c    foo1    30      1
5       /abc/file3.c    foo1    35      2
6       /abc/file3.c    foo1    40      1
7       /abc/file1.c    foo2    10      1
8       /abc/file1.c    foo2    15      2
9       /abc/file1.c    foo2    20      1
10      /abc/file3.c    baz1    70      1
11      /abc/file3.c    baz1    75      2
12      /abc/file3.c    baz1    80      1
13      /abc/file2.c    bat     10      1
14      /abc/file2.c    bat     15      2
15      /abc/file2.c    bat     17      2
16      /abc/file2.c    bat     20      1
17      /def/file2.c    baz     70      1
18      /def/file2.c    baz     71      1
19      /def/file2.c    baz     72      1
20      /def/file2.c    baz     73      1

The columns represent 'ID', 'path', 'function', 'linenumber' and 'extra'. The data in tree form is hierarchically ordered as path->funcion->lineNumber (each path contains multiple functions, which contains multiple lines of interest (probe points)).

Each row in this table is represented with this struct:

using Type = enum class Type : unsigned {
    One = 1,
    Two = 2
};

using MyStruct = struct MyStruct {
    unsigned id;
    std::string filename;
    std::string function;
    unsigned lineNum;
    Type type;
};

After sorting this data using the hierarchy described above (via the following comparator)

// comparator used for unique
static const auto customComp = [](const auto&amp; lhs, const auto&amp; rhs) {
    return std::tie(lhs.filename, lhs.function, lhs.lineNum, lhs.type) &lt;
        std::tie(rhs.filename, rhs.function, rhs.lineNum, rhs.type);
    };

We end up with the correctly ordered vector:

sorted vector
(id)    (path)          (fn)    (line)  (extra)
7       /abc/file1.c    foo2    10      1
8       /abc/file1.c    foo2    15      2
9       /abc/file1.c    foo2    20      1
13      /abc/file2.c    bat     10      1
14      /abc/file2.c    bat     15      2
15      /abc/file2.c    bat     17      2
16      /abc/file2.c    bat     20      1
10      /abc/file3.c    baz1    70      1
11      /abc/file3.c    baz1    75      2
12      /abc/file3.c    baz1    80      1
1       /abc/file3.c    foo0    10      1
2       /abc/file3.c    foo0    15      2
3       /abc/file3.c    foo0    20      1
4       /abc/file3.c    foo1    30      1
5       /abc/file3.c    foo1    35      2
6       /abc/file3.c    foo1    40      1
17      /def/file2.c    baz     70      1
18      /def/file2.c    baz     71      1
19      /def/file2.c    baz     72      1
20      /def/file2.c    baz     73      1

I need to parse this data using the new ranges or ranges-v3 API to efficiently recreate the tree structure from which the table originated. I specify ranges here firstly as I am learning my way through this complicated API, but also because the API seems to show a very efficient way of handling large data sets by lazy evaluation).

The following code works (which is also in godbolt), however it seems wrong. I am using a pair of nested ranges chunk_by loops to parse the data. I need to terminate the outer loop early by a break.

The main body of the code is here:

// comparator used for unique
static const auto customComp = [](const auto&amp; lhs, const auto&amp; rhs) {
    return std::tie(lhs.filename, lhs.function, lhs.lineNum, lhs.type) &lt;
        std::tie(rhs.filename, rhs.function, rhs.lineNum, rhs.type);
    };

int
main() {
    print(&quot;unsorted vector&quot;, structs);
    // split the sorted probes into chunks
    actions::sort(structs, customComp);
    const auto outerComp = [](auto&amp;&amp; lhs, auto&amp;&amp; rhs) {
            return lhs.filename == rhs.filename;
        };
    const auto innerComp = [](auto&amp;&amp; lhs, auto&amp;&amp; rhs) {
            return lhs.function == rhs.function;
        };
    print(&quot;sorted vector&quot;, structs);
    std::cout &lt;&lt; std::endl;
    // split sorted list of probes into chunks by filename
    for (const auto&amp; sources : structs | views::chunk_by(outerComp)) {
        auto foo = sources.size();
        for (const auto&amp; next : sources) {
            auto outcomes = 0;
            for (const auto&amp; functions : sources | views::chunk_by(innerComp)) {
                for (const auto&amp; probe : functions) {
                    outcomes += (probe.type == Type::Two) ? 2 : 1;
                    std::cout &lt;&lt; std::format(&quot;{}\n&quot;, probe);
                }
            }
            std::cout &lt;&lt; next.filename &lt;&lt; &quot; outcomes [&quot; &lt;&lt; outcomes &lt;&lt; &quot;]\n&quot;;
            break;
        }
        std::cout &lt;&lt; &quot;\n&quot;;
    }
}

Would it be possible to perform the sort and double chunking on a single for loop? I would ideally like to use the composition form of the ranges API to achieve the best result.

答案1

得分: 1

鉴于每个记录都包含执行此操作所需的所有信息,我只需创建表示树结构的东西,并将记录插入其中,而不是对记录进行排序,然后从排序后的记录中解析出范围。

#include &lt;iostream&gt;
#include &lt;sstream&gt;
#include &lt;algorithm&gt;
#include &lt;iterator&gt;
#include &lt;map&gt;
#include &lt;string&gt;

// 保持代码自包含,但在实际使用中,无疑要从文件或类似文件中读取原始数据。
char const *rawData = R&quot;(
1       /abc/file3.c    foo0    10      1
2       /abc/file3.c    foo0    15      2
3       /abc/file3.c    foo0    20      1
4       /abc/file3.c    foo1    30      1
5       /abc/file3.c    foo1    35      2
6       /abc/file3.c    foo1    40      1
7       /abc/file1.c    foo2    10      1
8       /abc/file1.c    foo2    15      2
9       /abc/file1.c    foo2    20      1
10      /abc/file3.c    baz1    70      1
11      /abc/file3.c    baz1    75      2
12      /abc/file3.c    baz1    80      1
13      /abc/file2.c    bat     10      1
14      /abc/file2.c    bat     15      2
15      /abc/file2.c    bat     17      2
16      /abc/file2.c    bat     20      1
17      /def/file2.c    baz     70      1
18      /def/file2.c    baz     71      1
19      /def/file2.c    baz     72      1
20      /def/file2.c    baz     73      1
)&quot;;

struct record {
    int id;
    std::string path;
    std::string fn;
    int lineNumber;
    int type;

    bool operator&lt;(record const &amp;rhs) const { 
        return std::tie(path, fn, lineNumber, type) &lt; std::tie(rhs.path, rhs.fn, rhs.lineNumber, rhs.type);
    }

    friend std::istream &amp;operator&gt;&gt;(std::istream &amp;is, record &amp;r) { 
        return is &gt;&gt; r.id &gt;&gt; r.path &gt;&gt; r.fn &gt;&gt; r.lineNumber &gt;&gt; r.type;
    }
    friend std::ostream &amp;operator&lt;&lt;(std::ostream &amp;os, record const &amp;r) { 
        return os &lt;&lt; r.id &lt;&lt; &quot;\t&quot; &lt;&lt; r.path &lt;&lt; &quot;\t&quot; &lt;&lt; r.fn &lt;&lt; &quot;\t&quot; &lt;&lt; r.lineNumber &lt;&lt; &quot;\t&quot; &lt;&lt; r.type;
    }
};

struct Probe {
    int line;
    int type;

    friend std::ostream &amp;operator&lt;&lt;(std::ostream &amp;os, Probe const &amp;p) { 
        return os &lt;&lt; &quot;\t\t&quot; &lt;&lt; p.line &lt;&lt; &quot; &quot; &lt;&lt; p.type;
    }
};

class FuncRec { 
    std::vector&lt;Probe&gt; probes;
public:

    void insert(record const &amp;rec) { 
        probes.push_back(Probe{rec.lineNumber, rec.type});
    }

    friend std::ostream &amp;operator&lt;&lt;(std::ostream &amp;os, FuncRec const &amp;f) { 
        for (auto const &amp;p : f.probes) {
            os &lt;&lt; p &lt;&lt; &quot;\n&quot;;
        }
        return os;
    }
};

class FileRec { 
    std::map&lt;std::string, FuncRec&gt; functions;
public:
    void insert(record const &amp;rec) { 
        functions[rec.fn].insert(rec);
    }

    friend std::ostream &amp;operator&lt;&lt;(std::ostream &amp;os, FileRec const &amp;f) {
        for (auto const &amp;f : f.functions) { 
            os &lt;&lt; &quot;\t&quot; &lt;&lt; f.first &lt;&lt; &quot;\n&quot;;
            os &lt;&lt; f.second;
        }
        return os;
    }
};

class Tree {
    std::map&lt;std::string, FileRec&gt; files;

    void insert(record const &amp;rec) { 
        files[rec.path].insert(rec);
    }

public:

    Tree(std::vector&lt;record&gt; const &amp;in) {
        for (auto const &amp;r : in)
            insert(r);
    }

    friend std::ostream &amp;operator&lt;&lt;(std::ostream &amp;os, Tree const &amp;t) {
        for (auto const &amp;f : t.files) { 
            os &lt;&lt; f.first &lt;&lt; &quot;\n&quot;;
            os &lt;&lt; f.second;
        }
        return os;
    }
};

int main() {
    std::stringstream infile(rawData);

    std::vector&lt;record&gt; recs { std::istream_iterator&lt;record&gt;(infile), {}};

    Tree tree{recs};

    std::cout &lt;&lt; &quot;Tree struture:\n&quot;;
    std::cout &lt;&lt; tree;

    // 如果您还想显示已排序的结构体:
    std::cout &lt;&lt; &quot;\nSorted records:\n&quot;;
    std::sort(recs.begin(), recs.end());
    for (auto const &amp;r : recs) {
        std::cout &lt;&lt; r &lt;&lt; &quot;\n&quot;;
    }
    std::cout &lt;&lt; &quot;\n&quot;;
}

这可能比实际需要的要复杂一些。例如,FuncRec 实际上没有什么用处。我们可以将 Probe 向量嵌入到其父类中(但我假设这是更复杂版本的简化版本,其中 FuncRec 可能会起到更多的作用)。

英文:

Given that each record contains all the information necessary to do so, I'd just create something to represent the tree structure, and insert records into it, rather than sort, and then parse out ranges from the sorted records.

#include &lt;iostream&gt;
#include &lt;sstream&gt;
#include &lt;algorithm&gt;
#include &lt;iterator&gt;
#include &lt;map&gt;
#include &lt;string&gt;

// Keep the code self-contained, though in real use you undoubtedly want to 
// read the raw data from a file, or something on that order.
char const *rawData = R&quot;(
1       /abc/file3.c    foo0    10      1
2       /abc/file3.c    foo0    15      2
3       /abc/file3.c    foo0    20      1
4       /abc/file3.c    foo1    30      1
5       /abc/file3.c    foo1    35      2
6       /abc/file3.c    foo1    40      1
7       /abc/file1.c    foo2    10      1
8       /abc/file1.c    foo2    15      2
9       /abc/file1.c    foo2    20      1
10      /abc/file3.c    baz1    70      1
11      /abc/file3.c    baz1    75      2
12      /abc/file3.c    baz1    80      1
13      /abc/file2.c    bat     10      1
14      /abc/file2.c    bat     15      2
15      /abc/file2.c    bat     17      2
16      /abc/file2.c    bat     20      1
17      /def/file2.c    baz     70      1
18      /def/file2.c    baz     71      1
19      /def/file2.c    baz     72      1
20      /def/file2.c    baz     73      1
)&quot;;

struct record {
    int id;
    std::string path;
    std::string fn;
    int lineNumber;
    int type;

    bool operator&lt;(record const &amp;rhs) const { 
        return std::tie(path, fn, lineNumber, type) &lt; std::tie(rhs.path, rhs.fn, rhs.lineNumber, rhs.type);
    }

    friend std::istream &amp;operator&gt;&gt;(std::istream &amp;is, record &amp;r) { 
        return is &gt;&gt; r.id &gt;&gt; r.path &gt;&gt; r.fn &gt;&gt; r.lineNumber &gt;&gt; r.type;
    }
    friend std::ostream &amp;operator&lt;&lt;(std::ostream &amp;os, record const &amp;r) { 
        return os &lt;&lt; r.id &lt;&lt; &quot;\t&quot; &lt;&lt; r.path &lt;&lt; &quot;\t&quot; &lt;&lt; r.fn &lt;&lt; &quot;\t&quot; &lt;&lt; r.lineNumber &lt;&lt; &quot;\t&quot; &lt;&lt; r.type;
    }
};

struct Probe {
    int line;
    int type;

    friend std::ostream &amp;operator&lt;&lt;(std::ostream &amp;os, Probe const &amp;p) { 
        return os &lt;&lt; &quot;\t\t&quot; &lt;&lt; p.line &lt;&lt; &quot; &quot; &lt;&lt; p.type;
    }
};

class FuncRec { 
    std::vector&lt;Probe&gt; probes;
public:

    void insert(record const &amp;rec) { 
        probes.push_back(Probe{rec.lineNumber, rec.type});
    }

    friend std::ostream &amp;operator&lt;&lt;(std::ostream &amp;os, FuncRec const &amp;f) { 
        for (auto const &amp;p : f.probes) {
            os &lt;&lt; p &lt;&lt; &quot;\n&quot;;
        }
        return os;
    }
};

class FileRec { 
    std::map&lt;std::string, FuncRec&gt; functions;
public:
    void insert(record const &amp;rec) { 
        functions[rec.fn].insert(rec);
    }

    friend std::ostream &amp;operator&lt;&lt;(std::ostream &amp;os, FileRec const &amp;f) {
        for (auto const &amp;f : f.functions) { 
            os &lt;&lt; &quot;\t&quot; &lt;&lt; f.first &lt;&lt; &quot;\n&quot;;
            os &lt;&lt; f.second;
        }
        return os;
    }
};

class Tree {
    std::map&lt;std::string, FileRec&gt; files;

    void insert(record const &amp;rec) { 
        files[rec.path].insert(rec);
    }

public:

    Tree(std::vector&lt;record&gt; const &amp;in) {
        for (auto const &amp;r : in)
            insert(r);
    }

    friend std::ostream &amp;operator&lt;&lt;(std::ostream &amp;os, Tree const &amp;t) {
        for (auto const &amp;f : t.files) { 
            os &lt;&lt; f.first &lt;&lt; &quot;\n&quot;;
            os &lt;&lt; f.second;
        }
        return os;
    }
};

int main() {
    std::stringstream infile(rawData);

    std::vector&lt;record&gt; recs { std::istream_iterator&lt;record&gt;(infile), {}};

    Tree tree{recs};

    std::cout &lt;&lt; &quot;Tree struture:\n&quot;;
    std::cout &lt;&lt; tree;

    // In case you also want to show sorted structs:
    std::cout &lt;&lt; &quot;\nSorted records:\n&quot;;
    std::sort(recs.begin(), recs.end());
    for (auto const &amp;r : recs) {
        std::cout &lt;&lt; r &lt;&lt; &quot;\n&quot;;
    }
    std::cout &lt;&lt; &quot;\n&quot;;

}

This is probably a bit more elaborate than really needed. For example, FuncRec doesn't really accomplish much. We could just embed the vector of Probes in its parent (but I'm assuming this is kind of a simplified version of something more elaborate, where FuncRec might serve more purpose.

huangapple
  • 本文由 发表于 2023年7月17日 21:56:25
  • 转载请务必保留本文链接:https://go.coder-hub.com/76705175.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定