Appending to a numpy array in for loop

huangapple go



I'm trying to create a Monte Carlo simulation of future stock prices using NumPy arrays.

My current approach is a for loop that fills an array, stock_price_array, with simulated stock prices.

Each stock price is generated by taking the previous stock price and multiplying it by (1 + annual return).

The annual returns are drawn randomly from a normal distribution and stored in the array annual_ret.

My problem is that although the stock_price values I print inside the loop appear to be correct, I simply cannot figure out how to append them to stock_price_array.

I've tried various things, including initializing stock_price_array with np.full instead of np.empty, changing where the array appears in the loop, and checking the size of the array.

I've read other Stack Overflow posts on similar topics but can't figure out what I'm doing wrong.

Thank you in advance for your help!



import numpy as np

annual_mean = .06
annual_stdev = .15
start_stock_price = 100

numYears = 3
numSimulations = 4
stock_price_array = np.empty(numYears)

# draw an annual return from a normal distribution; this annual return will be random
annual_ret = np.random.normal(annual_mean, annual_stdev, numSimulations)

for i in range(numYears):
    stock_price = np.multiply(start_stock_price, (1 + annual_ret[i]))
    np.append(stock_price_array, [stock_price])  # np.append returns a new array; here that result is never stored
    start_stock_price = stock_price


Score: 2





The first rule of NumPy: never iterate over your array yourself. Use NumPy functions that do the whole computation in batch. (They do iterate over the array internally, of course, but that loop runs in compiled code rather than as a Python loop, so it is much faster.)

No-for solution

For example, here you could do something like this:

np.cumprod(np.hstack([start_stock_price, annual_ret+1]))

It first builds an array made of the initial value followed by the growth factors (1 + annual return).
So if the initial value is 100 and the annual returns are 0.1, -0.1, 0.2, 0.2 (for example), hstack builds the array 100, 1.1, 0.9, 1.2, 1.2.

Then cumprod builds the cumulative product of those values:

100, 100×1.1=110, 100×1.1×0.9=110×0.9=99, 100×1.1×0.9×1.2=99×1.2=118.8, 100×1.1×0.9×1.2×1.2=118.8×1.2=142.56
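A minimal runnable sketch of the hstack/cumprod idea, using the illustrative rates from the text (0.1, -0.1, 0.2, 0.2) instead of random draws, so the result can be compared with the hand calculation:

```python
import numpy as np

start_stock_price = 100
# Illustrative rates from the example above, not random draws
annual_ret = np.array([0.1, -0.1, 0.2, 0.2])

# Step 1: prepend the initial price to the growth factors (1 + rate)
factors = np.hstack([start_stock_price, annual_ret + 1])
print(factors)  # contains 100, 1.1, 0.9, 1.2, 1.2

# Step 2: the running product gives the start price and the price after each year
prices = np.cumprod(factors)
print(prices)  # approximately 100, 110, 99, 118.8, 142.56 (floating-point rounding aside)
```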

Correcting your code

To answer your original question anyway (even though I strongly advise using a solution like the cumprod one above), you have two options:

  • Either you preallocate the array, as you did (stock_price_array = np.empty(numYears)). Then, instead of trying to append the new stock_price to stock_price_array, simply fill one of the slots that already exist, with stock_price_array[i] = stock_price.

  • Or you don't preallocate. Replace the np.empty line with stock_price_array = [], and at each step append the result and assign the returned array back: stock_price_array = np.append(stock_price_array, [stock_price]).
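Both options can be sketched as follows. This assumes the parameters from the question; the only deliberate changes are a seeded np.random.default_rng generator (so the run is reproducible) and the hypothetical name appended for the second option's array, so the two results can be compared side by side:

```python
import numpy as np

annual_mean = .06
annual_stdev = .15
start_stock_price = 100
numYears = 3

rng = np.random.default_rng(0)  # assumption: seeded only for reproducibility
annual_ret = rng.normal(annual_mean, annual_stdev, numYears)

# Option 1: preallocate once, then fill slot i on each iteration
stock_price_array = np.empty(numYears)
price = start_stock_price
for i in range(numYears):
    price = price * (1 + annual_ret[i])
    stock_price_array[i] = price

# Option 2: start from an empty list and rebind the array returned by np.append
appended = []
price = start_stock_price
for i in range(numYears):
    price = price * (1 + annual_ret[i])
    appended = np.append(appended, [price])  # a new array is created on every call

print(stock_price_array)
print(appended)
```

Both arrays hold the same prices; the preallocated version allocates once, whereas the np.append version builds a new array on every iteration.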

I strongly advise against the second option. Since you already know the final size of the array, it is much better to create it once: np.append creates a brand-new array and copies the input data into it on every call. It does not just extend the existing array (generally speaking, that is not possible anyway). That is also why your original loop had no effect: the new array returned by np.append was simply discarded.
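A short sketch of what np.append actually does; the array names here are illustrative only:

```python
import numpy as np

a = np.empty(3)
b = np.append(a, [42.0])  # builds a new array of length 4 and copies a into it

print(a.shape, b.shape)        # (3,) (4,): the original is untouched
print(b is a)                  # False: a distinct object
print(np.shares_memory(a, b))  # False: the data was copied, not shared

arr = np.zeros(2)
np.append(arr, [1.0])  # result discarded, as in the question's loop
print(arr.shape)       # (2,): arr itself never grows
```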

That said, I advise against both options, since I find the cumprod version preferable. for is the taboo word in NumPy, all the more so when the body of the loop creates a new array, as np.append does.


Since you mentioned Monte Carlo but showed code that computes only one result (you draw one set of annual returns and compute one series of future prices), I wonder whether that is really what you want.
In particular, numSimulations and numYears seem to play redundant roles in your code (and therefore in mine).
The only reason it doesn't throw an IndexError is that numSimulations is used only to decide how many annual_ret values you draw. Since numSimulations > numYears, you have more than enough annual returns to compute the result.
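To see the index error that the current parameters happen to avoid, here is a sketch of the question's loop with a hypothetical numSimulations smaller than numYears:

```python
import numpy as np

numYears = 3
numSimulations = 2  # hypothetical value: fewer drawn returns than years
rng = np.random.default_rng()
annual_ret = rng.normal(.06, .15, numSimulations)  # only 2 annual returns

price = 100
failed_at = None
for i in range(numYears):
    try:
        price = price * (1 + annual_ret[i])  # annual_ret[2] does not exist
    except IndexError:
        failed_at = i  # first year with no drawn return
        break

print("IndexError at year index", failed_at)  # year index 2
```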

Wasn't your original intention to rerun the multi-year simulation numSimulations times, to get numSimulations results?

In that case, you probably need numSimulations sets of numYears annual returns, that is, a 2D array. Likewise, you should compute numSimulations series of numYears prices.

If my guess is not completely off, what you really wanted was something along these lines:

import numpy as np

annual_ret = np.random.normal(annual_mean, annual_stdev, (numSimulations, numYears)) # 2D array of annual returns: 1 simulation per row, 1 year per column

t = np.pad(annual_ret+1, ((0,0), (1,0)), constant_values=start_stock_price) # Add 1 as we did earlier, and pad each simulation with the initial price (`start_stock_price`) at the beginning

res = np.cumprod(t, axis=1) # cumulative product along axis 1 (the years), for each row (each simulation)
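The snippet above relies on the variables defined in the question. A self-contained sketch of the same 2D approach, wrapped in a hypothetical helper simulate_paths and using a seeded np.random.default_rng generator instead of np.random.normal (both are assumptions for illustration):

```python
import numpy as np


def simulate_paths(start_price, mean, stdev, num_years, num_simulations, seed=None):
    """Simulate num_simulations price paths over num_years years.

    Returns an array of shape (num_simulations, num_years + 1): column 0 is the
    start price, column k is the simulated price after k years.
    """
    rng = np.random.default_rng(seed)
    # One row per simulation, one column per year
    annual_ret = rng.normal(mean, stdev, (num_simulations, num_years))
    # Prepend a column holding the start price to the growth factors (1 + return)
    factors = np.pad(annual_ret + 1, ((0, 0), (1, 0)), constant_values=start_price)
    # Running product along the years, independently for each simulation
    return np.cumprod(factors, axis=1)


paths = simulate_paths(100, .06, .15, num_years=3, num_simulations=4, seed=0)
print(paths.shape)           # (4, 4)
final_prices = paths[:, -1]  # one final price per simulation
print(final_prices.mean())
```

Each row of paths is one simulated price path, so statistics across simulations (for example the mean of the last column) are computed down axis 0; with only 4 simulations such statistics are of course very noisy.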

  • Posted on 6 February 2023 at 09:05:39



