How to create/initialize custom objects in C++ the fastest way possible?

Question

I have a function that creates a vector of Nodes and Edges from a vector of Profiles. It's working as intended, but this function is by far the slowest part of my full script. It takes around 150 milliseconds to run.

Machine(s):

Linux PC: Intel i7-12700K, 128 GB DDR4 3200 MHz RAM, Ubuntu 20.04 LTS

Windows (WSL2) laptop: Ryzen 9 6900HX, 16 GB DDR5 RAM, Ubuntu 22.04 LTS (8 GB DDR5 RAM)

Both use the gcc compiler, CMake, and C++17.

My header file, definitions.h, which is the base of my entire program:

    #ifndef DEFINITIONS_H
    #define DEFINITIONS_H
    #include <string>
    #include <vector>
    #include <string_view>
    #include <iostream>
    #include <unordered_map>
    #include <algorithm>
    #include <utility>
    #include <immintrin.h>
    #include <chrono>

    typedef std::string s;
    typedef std::string_view stv;

    struct Experience
    {
        s from_date;
        s to_date;
        s position_title;
        float duration;
        s location;
        s institution_name;
        float salary;

        Experience(s from_date, s to_date, s position_title, float duration, s location, s institution_name, float salary)
        {
            this->from_date = from_date;
            this->to_date = to_date;
            this->position_title = position_title;
            this->duration = duration;
            this->location = location;
            this->institution_name = institution_name;
            this->salary = salary;
        }

        Experience()
        {
            this->from_date = "";
            this->to_date = "";
            this->position_title = "";
            this->duration = 0;
            this->location = "";
            this->institution_name = "";
            this->salary = 0;
        }

        friend std::ostream& operator<<(std::ostream& os, const Experience& e)
        {
            os << "from_date: " << e.from_date << std::endl;
            os << "to_date: " << e.to_date << std::endl;
            os << "position_title: " << e.position_title << std::endl;
            os << "duration: " << e.duration << std::endl;
            os << "location: " << e.location << std::endl;
            os << "institution_name: " << e.institution_name << std::endl;
            os << "salary: " << e.salary << std::endl;
            return os;
        }
    };
    typedef std::vector<Experience> Experiences;

    struct Profile
    {
        s linkedin_url;
        s name;
        Experiences experiences;
        std::vector<s> skills;

        Profile(s linkedin_url, s name, std::vector<s> skills, Experiences experiences)
        {
            this->linkedin_url = linkedin_url;
            this->name = name;
            this->skills = skills;
            this->experiences = experiences;
        }

        Profile()
        {
            this->linkedin_url = "";
            this->name = "";
            this->skills = {};
            this->experiences = {};
        }

        friend std::ostream& operator<<(std::ostream& os, const Profile& p)
        {
            os << "linkedin_url: " << p.linkedin_url << std::endl;
            os << "name: " << p.name << std::endl;
            os << "experiences: " << std::endl;
            for (auto e : p.experiences)
            {
                os << '\t' << e << std::endl;
            }
            return os;
        }
    };
    typedef std::vector<Profile> Profiles;

    struct Node
    {
        s name;
        s position_title;
        s institution_name;
        s location;
        s industry;
        s linkedin_url;
        float duration;
        int company_size;
        float median_tenure;
        float salary;
        float headcount_growth;
        float current_experience_duration;

        Node(s t_name, s t_position_title, s t_institution_name, s t_location, s t_industry, s linkedin_url,
             float t_duration, float t_current_experience_duration, int t_company_size,
             float t_median_tenure, float t_salary, float t_headcount_growth)
        {
            this->name = t_name;
            this->position_title = t_position_title;
            this->institution_name = t_institution_name;
            this->location = t_location;
            this->industry = t_industry;
            this->linkedin_url = linkedin_url;
            this->duration = t_duration;
            this->current_experience_duration = t_current_experience_duration;
            this->company_size = t_company_size;
            this->median_tenure = t_median_tenure;
            this->salary = t_salary;
            this->headcount_growth = t_headcount_growth;
        }

        Node()
        {
            this->name = "";
            this->position_title = "";
            this->institution_name = "";
            this->location = "";
            this->industry = "";
            this->linkedin_url = "";
            this->duration = 0;
            this->current_experience_duration = 0;
            this->company_size = 0;
            this->median_tenure = 0;
            this->salary = 0;
            this->headcount_growth = 0;
        }

        bool operator==(const Node& other) const
        {
            if (this->name == other.name && this->position_title == other.position_title && this->institution_name == other.institution_name && this->location == other.location)
                return true;
            else
                return false;
        }

        friend std::ostream& operator<<(std::ostream& os, const Node& node)
        {
            os << "Name: " << node.name << std::endl;
            os << "Position Title: " << node.position_title << std::endl;
            os << "Institution Name: " << node.institution_name << std::endl;
            os << "Location: " << node.location << std::endl;
            os << "Industry: " << node.industry << std::endl;
            os << "Linkedin URL: " << node.linkedin_url << std::endl;
            os << "Duration: " << node.duration << std::endl;
            os << "Current Experience Duration " << node.current_experience_duration << std::endl;
            os << "Company Size: " << node.company_size << std::endl;
            os << "Median Tenure: " << node.median_tenure << std::endl;
            os << "Salary: " << node.salary << std::endl;
            os << "Headcount Growth: " << node.headcount_growth << std::endl;
            return os;
        }
    };
    typedef std::vector<Node> Nodes;

    struct Edge
    {
        Node source;
        Node target;
        s linkedin_url;
        float duration;
        int company_size;
        float median_tenure;
        float salary;
        float headcount_growth;

        Edge(Node t_source, Node t_target)
        {
            this->source = t_source;
            this->target = t_target;
            this->linkedin_url = this->target.linkedin_url;
            this->duration = this->target.duration;
            this->company_size = this->target.company_size;
            this->median_tenure = this->target.median_tenure;
            this->salary = this->target.salary;
            this->headcount_growth = this->target.headcount_growth;
        }

        Edge()
        {
            this->source = Node();
            this->target = Node();
            this->linkedin_url = "";
            this->duration = 0;
            this->company_size = 0;
            this->median_tenure = 0;
            this->salary = 0;
            this->headcount_growth = 0;
        }

        bool operator==(const Edge& other) const
        {
            if (this->source == other.source && this->target == other.target)
                return true;
            else
                return false;
        }

        friend std::ostream& operator<<(std::ostream& os, const Edge& edge)
        {
            os << "Source: " << edge.source << std::endl;
            os << "Target: " << edge.target << std::endl;
            os << "Linkedin URL: " << edge.linkedin_url << std::endl;
            os << "Duration: " << edge.duration << std::endl;
            os << "Company Size: " << edge.company_size << std::endl;
            os << "Median Tenure: " << edge.median_tenure << std::endl;
            os << "Salary: " << edge.salary << std::endl;
            os << "Headcount Growth: " << edge.headcount_growth << std::endl;
            return os;
        }
    };
    typedef std::vector<Edge> Edges;

    struct Company
    {
        s name;
        s industry;
        float headcount_growth;
        float median_tenure;
        int company_size;

        Company(s name, s industry, float headcount_growth, float median_tenure, int company_size)
        {
            this->name = name;
            this->industry = industry;
            this->headcount_growth = headcount_growth;
            this->median_tenure = median_tenure;
            this->company_size = company_size;
        }

        Company()
        {
            this->name = "";
            this->industry = "";
            this->headcount_growth = 0;
            this->median_tenure = 0;
            this->company_size = 0;
        }

        bool operator==(const Company& other) const
        {
            if (this->name == other.name)
            {
                return true;
            }
            else
            {
                return false;
            }
        }

        friend std::ostream& operator<<(std::ostream& os, const Company& company)
        {
            os << "Name: " << company.name << std::endl;
            os << "Industry: " << company.industry << std::endl;
            os << "Headcount Growth: " << company.headcount_growth << std::endl;
            os << "Median Tenure: " << company.median_tenure << std::endl;
            os << "Company Size: " << company.company_size << std::endl;
            return os;
        }
    };
    typedef std::vector<Company> Companies;
    #endif // DEFINITIONS_H

The profiles.h header file (irrelevant code not included):

    #ifndef PROFILES_H
    #define PROFILES_H
    #include "definitions.h"

    std::pair<Nodes,Edges> create_edges_and_nodes_from_profiles(Profiles & profiles, Companies & companies);
    #endif // PROFILES_H

The profiles.cpp file:

    #include "profiles.h"

    std::pair<Nodes,Edges> create_edges_and_nodes_from_profiles(Profiles & profiles, Companies & companies)
    {
        using namespace std::literals;
        Nodes nodes;
        Edges edges;
        nodes.reserve(40'000);
        edges.reserve(40'000);
        bool use_profile;
        std::unordered_map<s,float> company_name_median_tenure_map;
        std::unordered_map<s,float> company_name_headcount_growth_map;
        std::unordered_map<s,int> company_name_company_size_map;
        std::unordered_map<s,s> company_name_industry_map;
        float current_duration;
        float t_company_size, t_median_tenure, t_headcount_growth, t_duration, t_salary;
        s t_name, t_position_title, t_location, t_institution_name, t_industry;
    #ifdef MONITOR
        auto before_map_creation = std::chrono::high_resolution_clock::now();
    #endif
        // Build one lookup table per company attribute, keyed by company name
        for (Company & company : companies)
        {
            company_name_median_tenure_map[company.name] = company.median_tenure;
            company_name_headcount_growth_map[company.name] = company.headcount_growth;
            company_name_company_size_map[company.name] = company.company_size;
            company_name_industry_map[company.name] = company.industry;
        }
    #ifdef MONITOR
        auto after_map_creation = std::chrono::high_resolution_clock::now();
        auto temp = std::chrono::duration_cast<std::chrono::milliseconds>(after_map_creation-before_map_creation).count();
        std::cout<<temp<<std::endl;
    #endif
        for (Profile & profile : profiles)
        {
            Nodes nodes_temp;
            nodes_temp.reserve(profile.experiences.size());
            current_duration = 0.0;
            use_profile = false;
            // One node per experience that has a non-empty location
            for (Experience & experience : profile.experiences)
            {
                stv location = stv(experience.location);
                if (location.compare(""sv)!=0)
                {
                    //if (location.find("United Kingdom"sv)!=stv::npos)
                    t_name = profile.name;
                    t_position_title = experience.position_title;
                    t_location = experience.location;
                    t_duration = experience.duration;
                    t_institution_name = experience.institution_name;
                    t_company_size = company_name_company_size_map[t_institution_name];
                    t_median_tenure = company_name_median_tenure_map[t_institution_name];
                    t_headcount_growth = company_name_headcount_growth_map[t_institution_name];
                    t_salary = experience.salary;
                    t_industry = company_name_industry_map[t_institution_name];
                    Node node_obj = Node(std::move(t_name),
                                         std::move(t_position_title),
                                         std::move(t_institution_name),
                                         std::move(t_location),
                                         std::move(t_industry),
                                         profile.linkedin_url,
                                         t_duration,
                                         current_duration,
                                         t_company_size,
                                         t_median_tenure,
                                         t_salary,
                                         t_headcount_growth);
                    nodes_temp.push_back(std::move(node_obj));
                    current_duration += t_duration;
                }
            }
            for (Node & node : nodes_temp)
            {
                if (node.location.compare(""sv)!=0)
                {
                    use_profile = true;
                    break;
                }
            }
            if (!use_profile)
                continue;
            // Recompute the accumulated duration, walking from the last node to the first
            current_duration = 0.0;
            for (Nodes::reverse_iterator rit=nodes_temp.rbegin();rit!=nodes_temp.rend();++rit)
            {
                rit->current_experience_duration = current_duration;
                current_duration += rit->duration;
            }
            // Connect consecutive nodes of the same profile
            if (nodes_temp.size()>1)
            {
                for (int i=0;i<nodes_temp.size()-1;i++)
                {
                    //Edge edge = Edge(&nodes_temp[i+1],&nodes_temp[i]);
                    edges.emplace_back(std::move(nodes_temp[i+1]),std::move(nodes_temp[i]));
                }
            }
            for (auto & node : nodes_temp)
            {
                nodes.push_back(std::move(node));
            }
        }
        return std::make_pair(std::move(nodes),std::move(edges));
    }
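One thing worth double-checking in the function above: the edge loop passes `std::move(nodes_temp[i])` to `emplace_back`, yet the same element is used again on the next iteration and once more when it is pushed into `nodes`. A moved-from `std::string` is left in a valid but unspecified state (in practice usually empty), so later edges and the final `nodes` vector may not hold the original strings. The following is a minimal sketch of a safer ordering, not the original code: `Node` and `Edge` are hypothetical trimmed-down stand-ins for the structs in definitions.h, and `append_profile` is an assumed helper name.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical trimmed-down versions of Node and Edge from definitions.h.
struct Node {
    std::string name;
    float duration;
};

struct Edge {
    Node source;
    Node target;
};

// Appends one profile's nodes and the edges between consecutive nodes.
// Edges copy their endpoints; each node is then moved exactly once.
void append_profile(std::vector<Node>& nodes_temp,
                    std::vector<Node>& nodes,
                    std::vector<Edge>& edges) {
    // "i + 1 < size()" also covers the empty case without unsigned wraparound.
    for (std::size_t i = 0; i + 1 < nodes_temp.size(); ++i) {
        // Copy here: nodes_temp[i + 1] is still needed on the next iteration.
        edges.push_back(Edge{nodes_temp[i + 1], nodes_temp[i]});
    }
    for (Node& node : nodes_temp) {
        nodes.push_back(std::move(node));  // last use of this element
    }
}
```

The copies into `edges` cost time, so this trades some speed for correct contents. If the copies turn out to matter, one option (an assumption, not something the original code does) is to let an edge store the indices of its two nodes in `nodes` instead of two full `Node` objects.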

Now in my main driver:

    #include "definitions.h"
    #include "profiles.h"

    int main()
    {
        // not shown (but assume this works and created the profile objects as shown in definitions.h)
        Profiles profiles = get_profiles(coll_profiles);
        // Make the same assumption as before
        Companies companies = get_companies(coll_companies);
    #ifdef MONITOR
        auto before_creating_edges_and_nodes = std::chrono::high_resolution_clock::now();
    #endif
        // Create nodes and edges from profiles and companies
        auto[nodes, edges] = create_edges_and_nodes_from_profiles(profiles,companies);
    #ifdef MONITOR
        auto after_creating_edges_and_nodes = std::chrono::high_resolution_clock::now();
        auto time_creating_edges_and_nodes = std::chrono::duration_cast<std::chrono::milliseconds>(after_creating_edges_and_nodes - before_creating_edges_and_nodes).count();
    #endif
        // lots of functions using Nodes and Edges
        //...
        //...
        return 0;
    }

Note that I am not an expert in C++. The code shown above works but it takes a significant amount of time (~150 milliseconds). The rest of the script combined (around 15 functions) takes less than 100 milliseconds.

My question is basically: how do I restructure the data and/or the initialization of the objects so that it is as fast as possible?

I am asking because I could not find out how to do this on Google. It could be because I don't know the right terminology.
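On terminology, the usual search terms for this are "member initializer list" and "move semantics". The constructors in definitions.h take each string by value and then copy-assign it inside the body, so every string member is default-constructed first and copied afterwards. A minimal sketch of the alternative, assuming a simplified `Experience` with fewer fields than the real struct:

```cpp
#include <string>
#include <utility>

// Simplified stand-in for the Experience struct from definitions.h.
struct Experience {
    std::string from_date;
    std::string position_title;
    float duration = 0;  // default member initializers replace the
    float salary = 0;    // hand-written default constructor body

    Experience() = default;

    // Parameters are taken by value and moved into the members. The member
    // initializer list constructs each string once, instead of
    // default-constructing it and then assigning to it.
    Experience(std::string from, std::string title, float dur, float sal)
        : from_date(std::move(from)),
          position_title(std::move(title)),
          duration(dur),
          salary(sal) {}
};
```

With this shape a caller that passes a temporary or uses `std::move` pays for moves only, and a caller that passes a named string pays for one copy rather than two. Whether this is a measurable part of the 150 milliseconds would need to be confirmed with a profiler.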

Answer 1

Score: 1

Replace

    std::unordered_map<s,float> company_name_median_tenure_map;
    std::unordered_map<s,float> company_name_headcount_growth_map;
    std::unordered_map<s,int> company_name_company_size_map;
    std::unordered_map<s,s> company_name_industry_map;

with

    struct company_data
    {
        float tenure;
        float growth;
        int size;
        std::string industry; // this should be an index into a table of industry names
    };
    std::unordered_map<std::string,company_data> company_map;
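With a single map, each experience needs one hash lookup instead of four. The sketch below shows one way the table could be built and queried. It assumes a trimmed-down `Company` stand-in, and `CompanyMap`, `build_company_map`, and `lookup` are hypothetical names that do not appear in the original code.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Trimmed-down stand-in for the Company struct from definitions.h.
struct Company {
    std::string name;
    std::string industry;
    float headcount_growth;
    float median_tenure;
    int company_size;
};

// Per-company values stored under a single key, as suggested above.
struct company_data {
    float tenure;
    float growth;
    int size;
    std::string industry;
};

using CompanyMap = std::unordered_map<std::string, company_data>;

// Hypothetical helper: builds the lookup table once, hashing each name once.
CompanyMap build_company_map(const std::vector<Company>& companies) {
    CompanyMap company_map;
    company_map.reserve(companies.size());
    for (const Company& c : companies) {
        company_map[c.name] =
            company_data{c.median_tenure, c.headcount_growth, c.company_size, c.industry};
    }
    return company_map;
}

// Hypothetical helper: one hash lookup per experience instead of four.
// find() does not insert an entry for unknown names, unlike operator[].
const company_data* lookup(const CompanyMap& company_map, const std::string& name) {
    auto it = company_map.find(name);
    return it == company_map.end() ? nullptr : &it->second;
}
```

Note that the original code uses `operator[]`, which silently inserts a zero-valued entry for every institution name that is not in `companies`. With `find`, the caller has to decide what an unknown company means, for example by falling back to zeros when `lookup` returns `nullptr`.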
huangapple
  • Published on June 25, 2023 at 20:09:05
  • Source link: https://go.coder-hub.com/76550312-2.html