How to create/initialize custom objects in C++ the fastest way possible?

Question

I have a function that creates a vector of Nodes and Edges from a vector of Profiles. It's working as intended, but this function is by far the slowest part of my full script. It takes around 150 milliseconds to run.

Machine(s):
Linux PC:

Intel i7-12700K, 128 GB DDR4 3200 MHz RAM, Ubuntu 20.04 LTS

Windows (WSL2) Laptop:

Ryzen 9 6900HX, 16 GB DDR5 RAM, Ubuntu 22.04 LTS (8 GB DDR5 RAM)

Both use gcc and CMake, compiling as C++17.

The header that is the base of my entire program (definitions.h):

#ifndef DEFINITIONS_H
#define DEFINITIONS_H

#include <string>
#include <vector>
#include <string_view>

#include <iostream>
#include <unordered_map>
#include <algorithm>
#include <utility>
#include <immintrin.h>
#include <chrono>

typedef std::string s;
typedef std::string_view stv;


struct Experience 
    {
        s from_date;
        s to_date;
        s position_title;
        float duration;
        s location;
        s institution_name;
        float salary;

        Experience(s from_date, s to_date, s position_title, float duration, s location, s institution_name, float salary)
            {
                this->from_date = from_date;
                this->to_date = to_date;
                this->position_title = position_title;
                this->duration = duration;
                this->location = location;
                this->institution_name = institution_name;
                this->salary = salary;
            }
        
        Experience()
            {
                this->from_date = "";
                this->to_date = "";
                this->position_title = "";
                this->duration = 0;
                this->location = "";
                this->institution_name = "";
                this->salary = 0;
            }
        
        friend std::ostream& operator<<(std::ostream& os, const Experience& e)
            {
                os << "from_date: " << e.from_date << std::endl;
                os << "to_date: " << e.to_date << std::endl;
                os << "position_title: " << e.position_title << std::endl;
                os << "duration: " << e.duration << std::endl;
                os << "location: " << e.location << std::endl;
                os << "institution_name: " << e.institution_name << std::endl;
                os << "salary: " << e.salary << std::endl;
                return os;
            }

    };
typedef std::vector<Experience> Experiences;

struct Profile
    {
        s linkedin_url;
        s name;
        Experiences experiences;
        std::vector<s> skills;

        Profile(s linkedin_url, s name, std::vector<s> skills, Experiences experiences)
            {
                this->linkedin_url = linkedin_url;
                this->name = name;
                this->skills = skills;
                this->experiences = experiences;
            }
        
        Profile()
            {
                this->linkedin_url = "";
                this->name = "";
                this->skills = {};
                this->experiences = {};
            }
        
        friend std::ostream& operator<<(std::ostream& os, const Profile& p)
            {
                os << "linkedin_url: " << p.linkedin_url << std::endl;
                os << "name: " << p.name << std::endl;
                os << "experiences: " << std::endl;
                for (auto e : p.experiences)
                    {
                        os << '\t' << e << std::endl;
                    }
                return os;
            }
    };

typedef std::vector<Profile> Profiles;

struct Node
    {
        s name;
        s position_title;
        s institution_name;
        s location;
        s industry;
        s linkedin_url;
        float duration;
        int company_size;
        float median_tenure;
        float salary;
        float headcount_growth;
        float current_experience_duration;

        Node(s t_name,s t_position_title, s t_institution_name, s t_location, s t_industry,s linkedin_url, float t_duration, float t_current_experience_duration, int t_company_size, float t_median_tenure, float t_salary, float t_headcount_growth )
            {
                this->name = t_name;
                this->position_title = t_position_title;
                this->institution_name = t_institution_name;
                this->location = t_location;
                this->industry = t_industry;
                this->linkedin_url = linkedin_url;
                this->duration = t_duration;
                this->current_experience_duration = t_current_experience_duration;
                this->company_size = t_company_size;
                this->median_tenure = t_median_tenure;
                this->salary = t_salary;
                this->headcount_growth = t_headcount_growth;
            }
        
        Node()
            {
                this->name = "";
                this->position_title = "";
                this->institution_name = "";
                this->location = "";
                this->industry = "";
                this->linkedin_url = "";
                this->duration = 0;
                this->current_experience_duration = 0;
                this->company_size = 0;
                this->median_tenure = 0;
                this->salary = 0;
                this->headcount_growth = 0;
            }

        bool operator==(const Node& other) const
            {
                if (this->name == other.name && this->position_title == other.position_title && this->institution_name == other.institution_name && this->location == other.location)
                    return true;
                else
                    return false;
            }
        
        friend std::ostream& operator<<(std::ostream& os, const Node& node)
            {
                os << "Name: " << node.name << std::endl;
                os << "Position Title: " << node.position_title << std::endl;
                os << "Institution Name: " << node.institution_name << std::endl;
                os << "Location: " << node.location << std::endl;
                os << "Industry: " << node.industry << std::endl;
                os << "Linkedin URL: " << node.linkedin_url << std::endl;
                os << "Duration: " << node.duration << std::endl;
                os << "Current Experience Duration " << node.current_experience_duration << std::endl;
                os << "Company Size: " << node.company_size << std::endl;
                os << "Median Tenure: " << node.median_tenure << std::endl;
                os << "Salary: " << node.salary << std::endl;
                os << "Headcount Growth: " << node.headcount_growth << std::endl;
                return os;
            }
        
    };

typedef std::vector<Node> Nodes;


struct Edge
    {
        Node source;
        Node target;
        s linkedin_url;
        float duration;
        int company_size;
        float median_tenure;
        float salary;
        float headcount_growth;

        Edge(Node t_source, Node t_target)
            {
                
                this->source = t_source;
                this->target = t_target;
                this->linkedin_url = this->target.linkedin_url;
                this->duration = this->target.duration;
                this->company_size = this->target.company_size;
                this->median_tenure = this->target.median_tenure;
                this->salary = this->target.salary;
                this->headcount_growth = this->target.headcount_growth;
            }
        
        Edge()
            {
                this->source = Node();
                this->target = Node();
                this->linkedin_url = "";
                this->duration = 0;
                this->company_size = 0;
                this->median_tenure = 0;
                this->salary = 0;
                this->headcount_growth = 0;
            }
        
        bool operator==(const Edge& other) const
            {
                if (this->source == other.source && this->target == other.target)
                    return true;
                else
                    return false;
            }
        
        friend std::ostream& operator<<(std::ostream& os, const Edge& edge)
            {
                os << "Source: " << edge.source << std::endl;
                os << "Target: " << edge.target << std::endl;
                os << "Linkedin URL: " << edge.linkedin_url << std::endl;
                os << "Duration: " << edge.duration << std::endl;
                os << "Company Size: " << edge.company_size << std::endl;
                os << "Median Tenure: " << edge.median_tenure << std::endl;
                os << "Salary: " << edge.salary << std::endl;
                os << "Headcount Growth: " << edge.headcount_growth << std::endl;
                return os;
            }
    };


typedef std::vector<Edge> Edges;


struct Company
    {
        s name;
        s industry;
        float headcount_growth;
        float median_tenure;
        int company_size;

        Company(s name, s industry, float headcount_growth, float median_tenure, int company_size)
            {
                this->name = name;
                this->industry = industry;
                this->headcount_growth = headcount_growth;
                this->median_tenure = median_tenure;
                this->company_size = company_size;
            }
        
        Company()
            {
                this->name = "";
                this->industry = "";
                this->headcount_growth = 0;
                this->median_tenure = 0;
                this->company_size = 0;
            }
        
        bool operator==(const Company& other) const
            {
                if (this->name == other.name)
                    {
                        return true;
                    }
                else
                    {
                        return false;
                    }
            }
        
        friend std::ostream& operator<<(std::ostream& os, const Company& company)
            {
                os << "Name: " << company.name << std::endl;
                os << "Industry: " << company.industry << std::endl;
                os << "Headcount Growth: " << company.headcount_growth << std::endl;
                os << "Median Tenure: " << company.median_tenure << std::endl;
                os << "Company Size: " << company.company_size << std::endl;
                return os;
            }

    };

typedef std::vector<Company> Companies;

#endif // DEFINITIONS_H    

The profiles.h header (irrelevant code not included):

#ifndef PROFILES_H
#define PROFILES_H

#include "definitions.h"

std::pair<Nodes,Edges> create_edges_and_nodes_from_profiles(Profiles & profiles, Companies & companies );

#endif // PROFILES_H

The profiles.cpp file:

std::pair<Nodes,Edges> create_edges_and_nodes_from_profiles(Profiles & profiles, Companies & companies)
    {
        using namespace std::literals;

        Nodes nodes;
        Edges edges;
        nodes.reserve(40'000);
        edges.reserve(40'000);

        bool use_profile;

        std::unordered_map<s,float> company_name_median_tenure_map;
        std::unordered_map<s,float> company_name_headcount_growth_map;
        std::unordered_map<s,int> company_name_company_size_map;
        std::unordered_map<s,s> company_name_industry_map;

        float current_duration;
        float t_company_size, t_median_tenure, t_headcount_growth, t_duration, t_salary;
        s t_name, t_position_title, t_location, t_institution_name, t_industry;

        #ifdef MONITOR
            auto before_map_createion = std::chrono::high_resolution_clock::now();
        #endif  

        for (Company & company : companies)
            {
                company_name_median_tenure_map[company.name] = company.median_tenure;
                company_name_headcount_growth_map[company.name] = company.headcount_growth;
                company_name_company_size_map[company.name] = company.company_size;
                company_name_industry_map[company.name] = company.industry;
            }
        
        #ifdef MONITOR
            auto after_map_creation = std::chrono::high_resolution_clock::now();

            auto temp = std::chrono::duration_cast<std::chrono::milliseconds>(after_map_creation-before_map_createion).count();

            std::cout<<temp<<std::endl;
        #endif

        for (Profile & profile : profiles)
            {
                Nodes nodes_temp;
                nodes_temp.reserve(profile.experiences.size());
                current_duration = 0.0;
                use_profile = false;

                for (Experience & experience : profile.experiences)
                    {
                        stv location = stv(experience.location);
                        
                        if (location.compare(""sv)!=0)
                            {
                                //if (location.find("United Kingdom"sv)!=stv::npos)
                                    
                                t_name = profile.name;
                                t_position_title = experience.position_title;
                                t_location = experience.location;

                                t_duration = experience.duration;

                                t_institution_name = experience.institution_name;
                                
                                t_company_size = company_name_company_size_map[t_institution_name];
                                t_median_tenure = company_name_median_tenure_map[t_institution_name];
                                t_headcount_growth = company_name_headcount_growth_map[t_institution_name];

                                t_salary = experience.salary;

                                t_industry = company_name_industry_map[t_institution_name];
                                
                                Node node_obj = Node(std::move(t_name),
                                                     std::move(t_position_title),
                                                     std::move(t_institution_name),
                                                     std::move(t_location),
                                                     std::move(t_industry),
                                                     profile.linkedin_url,
                                                     t_duration,
                                                     current_duration,
                                                     t_company_size,
                                                     t_median_tenure,
                                                     t_salary,
                                                     t_headcount_growth);

                                nodes_temp.push_back(std::move(node_obj));
                                
                                current_duration += t_duration;
                            }
                    }
                
                for (Node & node : nodes_temp)
                    {
                        if (node.location.compare(""sv)!=0)
                        {
                            use_profile = true;
                            break;
                        }
                    }
                
                if (!use_profile)
                    continue;
                
                current_duration = 0.0;

                for (Nodes::reverse_iterator rit=nodes_temp.rbegin();rit!=nodes_temp.rend();++rit)
                    {
                        rit->current_experience_duration = current_duration;
                        current_duration += rit->duration;
                    }
                
                if (nodes_temp.size()>1)
                    {
                        for (int i=0;i<nodes_temp.size()-1;i++)
                            {
                                //Edge edge = Edge(&nodes_temp[i+1],&nodes_temp[i]);
                                edges.emplace_back(std::move(nodes_temp[i+1]),std::move(nodes_temp[i]));
                            }
                    }
                
                for (auto & node : nodes_temp)
                    {
                        nodes.push_back(std::move(node));
                    }
            }

        return std::make_pair(std::move(nodes),std::move(edges));
    }

Now in my main driver:

#include "definitions.h"
#include "profiles.h"


int main()
  {
     // not shown (but assume this works and created the profile objects as shown in definitions.h)
     Profiles profiles = get_profiles(coll_profiles);
    // Make the same assumption as before 
     Companies companies = get_companies(coll_companies);

      #ifdef MONITOR
            
          auto before_creating_edges_and_nodes =  std::chrono::high_resolution_clock::now();
            
      #endif

        
        
      // Create nodes and edges from profiles and companies
      auto[nodes, edges] =  create_edges_and_nodes_from_profiles(profiles,companies);

      #ifdef MONITOR
          auto after_creating_edges_and_nodes =  std::chrono::high_resolution_clock::now();
          auto time_creating_edges_and_nodes = std::chrono::duration_cast<std::chrono::milliseconds>(after_creating_edges_and_nodes - before_creating_edges_and_nodes).count();

            
      #endif
      // lots of functions using Nodes and Edges 
      //...
      //...
      return 0;
  }

Note that I am not an expert in C++. The code shown above works, but it takes a significant amount of time (~150 milliseconds). The rest of the script combined (around 15 functions) takes less than 100 milliseconds.

My question is basically: how do I restructure the data and/or the initialization of the objects so that it is as fast as possible?

I am asking because I could not find out how to do this on Google, possibly because I don't know the right terminology.

Answer 1

Score: 1

Replace

std::unordered_map<s,float> company_name_median_tenure_map;
std::unordered_map<s,float> company_name_headcount_growth_map;
std::unordered_map<s,int> company_name_company_size_map;
std::unordered_map<s,s> company_name_industry_map;

with

struct company_data
{
    float tenure;
    float growth;
    int size;
    std::string industry;      // this should be an index into a table of industry names
};

std::unordered_map<std::string,company_data> company_map;
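
To show how the loop in the question might use this single map, here is a minimal sketch. The `Company` struct is a cut-down stand-in for the one in definitions.h, and `build_company_map` and `find_company` are hypothetical helper names, not part of the original code. One behavior difference is assumed to be acceptable: the question's `operator[]` lookups silently insert a default entry for an unknown company, whereas `find` leaves the map untouched and the caller has to handle the miss.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Cut-down stand-in for the question's Company struct (assumption: same field names).
struct Company {
    std::string name;
    std::string industry;
    float headcount_growth;
    float median_tenure;
    int company_size;
};

// The struct from the answer, with default values for unknown companies.
struct company_data {
    float tenure = 0.0f;
    float growth = 0.0f;
    int size = 0;
    std::string industry;
};

using CompanyMap = std::unordered_map<std::string, company_data>;

// Hypothetical helper: fill one map instead of four, one insertion per company.
CompanyMap build_company_map(const std::vector<Company>& companies) {
    CompanyMap company_map;
    company_map.reserve(companies.size());  // avoid rehashing while filling
    for (const Company& c : companies) {
        // Like operator[] followed by assignment, a later duplicate name overwrites.
        company_map.insert_or_assign(
            c.name,
            company_data{c.median_tenure, c.headcount_growth, c.company_size, c.industry});
    }
    return company_map;
}

// Hypothetical helper: a single hash lookup; returns nullptr if the name is unknown.
const company_data* find_company(const CompanyMap& company_map, const std::string& name) {
    auto it = company_map.find(name);
    return it == company_map.end() ? nullptr : &it->second;
}
```

Inside the experience loop, the four `operator[]` lookups would then become one `find_company(company_map, experience.institution_name)` call, so each institution name is hashed once instead of four times.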
huangapple
  • Published on June 25, 2023, 20:09:05
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76550312-2.html