Can looping over object instantiations cause a memory leak in Python?

Question

I'm running an agent-based model in Python 3.9 using object-oriented programming. The point of the model is to simulate a predator-prey population in a changing landscape. When I run multiple simulations in a for-loop, the runtime of a single simulation increases with each run. I suspect there is some sort of memory leak, but I haven't been able to find it.

Here is a sketch of my code:

    # Parameters
    n_deers = ...
    n_wolves = ...
    # etc.

    # Functions
    def some_function(arg):
        pass

    # Helper objects
    some_dict = ...

    # Classes
    class Deer:
        pass

    class Wolf:
        pass

    class Environment:
        def __init__(self):
            self.deers = [Deer(ID=i) for i in range(n_deers)]
            self.wolves = [Wolf(ID=i) for i in range(n_wolves)]
            self.data = pd.DataFrame()

        def simulation(self):
            pass

    # Simulations
    for i in range(100):
        environment = Environment()
        environment.simulation()
        environment.data.to_csv()

In words: I have global parameters, global functions, and a global dictionary that the class instances use. There is a class for each type of animal, and there is a class for the environment that generates a certain number of each animal inside the environment. The environment tracks these animals in a data frame during one run of simulation, in which the animals move, feed, reproduce, die, etc.

My fear is that the animal instances (around 7000 animals per full-length simulation) are somehow being carried along in memory from one simulation to the next. I don't have static class variables, which this article warns about: https://theorangeone.net/posts/static-vars/. But of course, this could be anything.

Do you have an idea what could be causing this? Any help is greatly appreciated.

EDIT

I have been able (it seems) to isolate the problem. It seems to originate from the animal movement. Here is a minimal reproducible example. As explanation: If I have the animals choose their next position at random from the adjacent cells, the problem does not seem to occur. Once I add memory, home ranges, and the function cell_choice(), the simulations take longer over time. On my machine, with this parametrization, the first simulation takes between 3 and 4 seconds, and the last between 10 and 11.

    # MINIMAL MOVEMENT MODEL

    # IMPORTS
    import random as rd
    import numpy as np
    import time
    import psutil

    # REPRODUCIBILITY
    rd.seed(42)

    # PARAMETERS
    landscape_size = 11
    n_deers = 100
    years = 10
    length_year = 360
    timesteps = years*length_year
    n_simulations = 20

    # HELPER FUNCTIONS AND OBJECTS

    # Landscape for first initialization
    mock_landscape = np.zeros((landscape_size, landscape_size))

    # Function to return a list of nxn cells around a given cell
    def range_finder(matrix, position, radius):
        adj = []
        lower = 0 - radius
        upper = 1 + radius
        for dx in range(lower, upper):
            for dy in range(lower, upper):
                rangeX = range(0, matrix.shape[0])  # Identifies X bounds
                rangeY = range(0, matrix.shape[1])  # Identifies Y bounds
                (newX, newY) = (position[0]+dx, position[1]+dy)  # Identifies adjacent cell
                if (newX in rangeX) and (newY in rangeY) and (dx, dy) != (0, 0):
                    adj.append((newX, newY))
        return adj

    # Nested dictionary that contains all sets of neighbors for all possible distances up to half the landscape size
    neighbor_dict = {d: {(i, j): range_finder(mock_landscape, (i, j), d)
                         for i in range(landscape_size) for j in range(landscape_size)}
                     for d in range(1, int(landscape_size/2)+1)}

    # Function that picks the cell in the home range that was visited longest ago
    def cell_choice(position, home_range, memory):
        # These are all the adjacent cells to the current position
        adjacent_cells = neighbor_dict[1][position]
        # This is the subset of the adjacent cells belonging to the home range
        possible_choices = [i for i in adjacent_cells if i in home_range]
        # This yields the "master" indices of those choices
        indeces = []
        for i in possible_choices:
            indeces.append(home_range.index(i))
        # This picks the index with the maximum value in the memory (i.e. visited longest ago)
        memory_values = [memory[i] for i in indeces]
        pick_index = indeces[memory_values.index(max(memory_values))]
        # Sets that value's memory to zero
        memory[pick_index] = 0
        # Adds one period to every other index
        other_indeces = [i for i in list(range(len(memory))) if i != pick_index]
        for i in other_indeces:
            memory[i] += 1
        # Returns the picked cell
        return home_range[pick_index]

    # CLASS DEFINITIONS
    class Deer:
        def __init__(self, ID):
            self.ID = ID
            self.position = (rd.randint(0, landscape_size-1), rd.randint(0, landscape_size-1))
            # Sets up a counter how long the deer has been in the cell
            self.time_spent_in_cell = 1
            # Defines a distance parameter that specifies the radius of the home range around the base
            self.movement_radius = 1
            # Defines an initial home range around the position
            self.home_range = neighbor_dict[self.movement_radius][self.position]
            self.home_range.append(self.position)
            # Sets up a list of counters how long ago cells in the home range have been visited
            self.memory = [float('inf')]*len(self.home_range)
            self.memory[self.home_range.index(self.position)] = 0

        def move(self):
            self.position = cell_choice(self.position, self.home_range, self.memory)

    class Environment:
        def __init__(self):
            self.landscape = np.zeros((landscape_size, landscape_size))
            self.deers = [Deer(ID=i) for i in range(n_deers)]

        def simulation(self):
            for timestep in range(timesteps):
                for deer in self.deers:
                    deer.move()

    # SIMULATIONS
    process = psutil.Process()
    times = []
    memory = []
    for i in range(1, n_simulations+1):
        print(i, " out of ", n_simulations)
        start_time = time.time()
        environment = Environment()
        environment.simulation()
        times.append(time.time() - start_time)
        memory.append(process.memory_info().rss)
    print(times)
    print(memory)

Answer 1

Score: 2

These lines in the constructor of Deer will be problematic:

    self.home_range = neighbor_dict[self.movement_radius][self.position]
    self.home_range.append(self.position)

The first line makes the name self.home_range refer to a list object in an inner dictionary of neighbor_dict (a list object originally returned from calling the range_finder function).

Then the second line mutates that list. This means that subsequent retrievals from neighbor_dict will get the latest version of that mutated list, not the value originally returned by range_finder.
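
A minimal sketch of that aliasing, using a small hypothetical dictionary `neighbors` in place of `neighbor_dict` (the names and values here are illustrative and not taken from the question):

```python
# Hypothetical stand-in for one inner dictionary of neighbor_dict.
neighbors = {(0, 0): [(0, 1), (1, 0), (1, 1)]}

# Plain assignment binds a second name to the same list object; nothing is copied.
home_range = neighbors[(0, 0)]
home_range.append((0, 0))

# The list stored in the dictionary has changed too, because it is the same object.
same_object = home_range is neighbors[(0, 0)]
stored_length = len(neighbors[(0, 0)])
print(same_object, stored_length)
```

The identity check with `is` is the quickest way to confirm that two names refer to one object instead of two equal ones.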

The growing size of these list objects will likely cause some slowdown, and it will also make your simulation results incorrect.
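
To make the growth visible, the sketch below repeats the two constructor lines for the same starting cell, which is what happens whenever a later deer (in the same or a later simulation) starts in a cell that was already used. `shared` and `make_home_range` are hypothetical names introduced only for this illustration.

```python
# Hypothetical stand-in for neighbor_dict[1]: one cell with three neighbors.
shared = {(0, 0): [(0, 1), (1, 0), (1, 1)]}

def make_home_range(position):
    """Mimic the two problematic constructor lines from the question."""
    home_range = shared[position]   # the same list object on every call
    home_range.append(position)     # mutates the list stored in the dictionary
    return home_range

# Record the list length seen by five successive "deer" starting in the same cell.
lengths = []
for _ in range(5):
    lengths.append(len(make_home_range((0, 0))))

print(lengths)  # each call sees a list one element longer than the previous one
```

In the question's code, each deer's `memory` list is sized from `home_range`, and `cell_choice` scans these lists with `in`, `index`, and a loop over every memory slot, so longer lists mean more work per move. Because `movement_radius` is 1, the mutated list is also the one `cell_choice` reads as `adjacent_cells`, so a cell ends up listed as its own neighbor.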

You should be able to fix this by making self.home_range refer to a copy of the list. One way to do that is:

    self.home_range = neighbor_dict[self.movement_radius][self.position].copy()

There are alternative ways to write the copy if you prefer; see the Stack Overflow question "How do I clone a list so that it doesn't change unexpectedly after assignment?".
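
For reference, here is a short sketch of three common spellings of a shallow list copy; the list `original` is a made-up example, not data from the question.

```python
original = [(0, 1), (1, 0), (1, 1)]

# Three equivalent ways to make a shallow copy of a list.
copy_a = original.copy()
copy_b = list(original)
copy_c = original[:]

# Appending to a copy leaves the original untouched.
copy_a.append((0, 0))
print(len(original), len(copy_a))
```

A shallow copy should be enough in this case because the list elements are tuples, which are immutable; nested mutable elements would call for `copy.deepcopy` instead.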

For a summary of how names refer to objects in Python, see also Ned Batchelder's "Facts and myths about Python names and values".

huangapple
  • Posted on June 29, 2023, 19:20:03
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76580544.html