Deep Learning training slower on Google Cloud VM than Local PC.


# Question

I am trying to train an LSTM neural network using PyTorch. On my own computer the process is quite slow due to the complexity of the model and the size of the dataset. My initial thought was to move the training to a cloud server with more processing power to speed the process up and to avoid having my noisy computer running 24/7 in my living room. Unfortunately, each epoch takes around twice as long on the virtual machine as on my own computer.

The virtual machine I deployed was Google Cloud's 'Deep Learning VM' with the PyTorch 1.13 (CUDA 11.3) framework, 1 Nvidia V100 GPU and 4 vCPUs (with a total of 26 GB of memory), which is far more processing power than my own computer has. I am running my Python script through a Jupyter Notebook on the virtual machine; I don't know whether that affects the speed of the training.
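
(A minimal sanity check along these lines — an illustrative sketch, not part of the training script below — confirms which PyTorch/CUDA build, which GPU, and how many vCPUs the notebook kernel actually sees on each machine:)

    # Illustrative sanity check (not from the original script): confirm what
    # the Jupyter kernel actually sees on the VM vs. the local machine.
    import os
    import torch

    print(torch.__version__)          # PyTorch build, e.g. 1.13.x
    print(torch.version.cuda)         # CUDA version the build was compiled against
    print(torch.cuda.is_available())  # False would mean training silently runs on the CPU
    if torch.cuda.is_available():
        print(torch.cuda.get_device_name(0))  # should report the V100 on the VM
    print(os.cpu_count())             # number of vCPUs / logical cores visible to Python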

Any ideas on how to improve the speed of the training would be highly appreciated.

The Python Script I am executing is:

    import time
    import pandas as pd
    import torch
    import torch.nn as nn  # All neural network modules, nn.Linear, nn.Conv2d, BatchNorm, Loss functions
    import numpy as np
    from torch.utils.data import Dataset
    from torch.utils.data import DataLoader  # data management

    test_indicator = '_test'
    indi = '5d'

    l = list(range(0, 43, 1))
    l.remove(1)
    l.remove(3)
    l.remove(7)
    l.remove(6)
    if indi == '5d':
        l.remove(42)  # 42 removes cumret_20d_y and 41 removes cumret_5d_y
    else:
        l.remove(41)  # 42 removes cumret_20d_y and 41 removes cumret_5d_y
    print(l)

    epochs = 200
    lr = 0.01
    batch_size = 131072
    look_back = 21
    lstm_input_dim = 36
    Linear_output_dim = 3
    lstm_hidden_dim = 72
    Linear_hidden_dim1 = 24  # lstm_hidden_dim//1.5
    Linear_hidden_dim2 = 12  # lstm_hidden_dim//3
    Linear_hidden_dim3 = 6   # lstm_hidden_dim//6
    lstm_num_layers = 3

    path = r'ServerFolder/Data/crsp_train' + test_indicator + '.csv'

    # Define a PyTorch dataset for the stock data
    class StockDataset(Dataset):
        def __init__(self, path, look_back):
            self.look_back = look_back
            self.df = pd.read_csv(path, usecols=l)
            self.stocks = np.unique(self.df["PERMNO"])
            self.stock_data = {}
            # self.x = self.df.iloc[:, 1:-1].values
            # self.y = self.df.iloc[:, -1].values
            # Split the data by stock and store it in a dictionary
            for stock in self.stocks:
                stock_df = self.df[self.df["PERMNO"] == stock]
                stock_data = stock_df.values
                self.stock_data[stock] = stock_data

        def __len__(self):
            # Return the total number of sequences across all stocks
            return sum(len(self.stock_data[stock]) - self.look_back for stock in self.stocks)

        def __getitem__(self, idx):
            # Determine which stock and which sequence within the stock to use
            stock_idx = 0
            while idx >= len(self.stock_data[self.stocks[stock_idx]]) - self.look_back:
                idx -= len(self.stock_data[self.stocks[stock_idx]]) - self.look_back
                stock_idx += 1
            stock = self.stocks[stock_idx]
            start_idx = idx
            end_idx = idx + self.look_back
            # Get the input and target sequences for the current stock and sequence
            inputs = self.stock_data[stock][start_idx:end_idx, 1:-1]
            target = self.stock_data[stock][end_idx, -1]
            # Convert the numpy arrays to PyTorch tensors
            x = torch.tensor(inputs, dtype=torch.float32)
            y = torch.tensor(target, dtype=torch.long)
            return x, y

    # Create a dataset for the entire dataset
    dataset = StockDataset(path, look_back)
    # Create a data loader for the dataset
    loader_train = DataLoader(dataset, batch_size=batch_size)
    # print(loader_train)

    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    print(f'using {device} device')

    from ServerFolder.Code.NeuralNetworks.Models import LSTMModel
    model = LSTMModel(lstm_input_dim, lstm_hidden_dim, lstm_num_layers, Linear_output_dim).to(device)
    # from Code.NeuralNetworks.Models import LSTMModel2
    # model = LSTMModel2(lstm_input_dim, lstm_hidden_dim, Linear_hidden_dim1, Linear_hidden_dim2, Linear_hidden_dim3, Linear_output_dim, lstm_num_layers).to(device)
    print(model)

    loss_fn = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)

    test_stats = {
        'loss': [],
        "acc": []
    }
    train_stats = {
        'loss': [],
        "acc": []
    }

    def train(dataloader, model, loss_fn, optimizer, multi_acc):
        model.train()
        train_loss = 0
        train_acc = 0
        for i, (x, y) in enumerate(dataloader):
            x, y = x.to(device), y.to(device)
            y_hat = model(x)
            loss = loss_fn(y_hat, y)
            train_loss += loss.item()
            acc = multi_acc(y_hat, y)
            train_acc += acc.item()
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        num_batches = len(dataloader)
        train_loss = train_loss / num_batches
        train_acc = train_acc / num_batches
        train_stats['loss'].append(train_loss)
        train_stats['acc'].append(train_acc)
        # print(f'train RMSE: {train_loss}')
        print(
            f'Epoch {epoch + 1:03}: | Train Loss: {train_loss:.5f} | Train Acc: {train_acc:.3f}| ')

    def multi_acc(y_hat, y):
        y_pred_softmax = torch.log_softmax(y_hat, dim=1)
        _, y_pred_tags = torch.max(y_pred_softmax, dim=1)
        correct_pred = (y_pred_tags == y).float()
        acc = correct_pred.sum() / len(correct_pred)
        acc = acc * 100
        return acc

    for epoch in range(epochs):
        # print(f"Epoch {epoch+1}:")
        start_time = time.time()
        train(loader_train, model, loss_fn, optimizer, multi_acc)
        train_train_df = pd.DataFrame.from_dict(train_stats).rename(
            columns={"index": "epochs"})
        train_train_df.to_csv(f'ServerFolder/Results/Loss/train_data_{indi}_lstm.csv')
        torch.save(model.state_dict(),
                   f'ServerFolder/Results/Model/{indi}'
                   f'indicator' + f'{epoch+1}' + '_lstm.pth')
        print("--- %s seconds ---" % (time.time() - start_time))

# Answer 1

**Score**: 1
No, your local environment is much more powerful.

A Core i7 has 4 physical cores but 8 threads, i.e. 8 vCPUs. If you only have 4 vCPUs in the cloud, you have twice as many on your local machine.

In addition, the Core i7-7700 runs at 3.6 - 4.2 GHz (turbo). The maximum frequency in the cloud is about 2.4 GHz, or 3.5 GHz in turbo (details [here][1]), and that is in the best case. So the frequency of each local core is roughly 30% higher.

Finally, the GTX 1080 has 2560 CUDA cores. A tensor core is a matrix of 4 CUDA cores, which gives a total of 640 tensor cores on your computer, exactly the same as the V100 GPU.

-----

In summary, the GPUs are equivalent, and your computer's CPU has 100% more cores at roughly 30% higher frequency. It's not comparable!

[1]: https://cloud.google.com/compute/docs/cpu-platforms
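
(As a rough cross-check — an illustrative sketch, not part of the original answer — the figures being compared here can be read directly on each machine:)

    # Illustrative only: print the CPU/GPU figures compared in this answer,
    # on both the local machine and the cloud VM.
    import os
    import torch

    print("logical CPUs / vCPUs:", os.cpu_count())   # 8 on an i7-7700, 4 on the VM described above
    if torch.cuda.is_available():
        props = torch.cuda.get_device_properties(0)
        print("GPU:", props.name)
        print("streaming multiprocessors:", props.multi_processor_count)
        print("GPU memory (GiB):", round(props.total_memory / 1024 ** 3, 1))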
