Appending to a NumPy array in a for loop
Question
I'm trying to create a Monte Carlo simulation of future stock prices using NumPy arrays.
My current approach is: create a for loop that fills an array, `stock_price_array`, with simulated stock prices. Each stock price is generated by taking the last stock price and multiplying it by 1 + an annual return. The annual returns are drawn randomly from a normal distribution and stored in the array `annual_ret`.
My problem is that although the "stock price" values I print from my for loop appear to be correct, I simply cannot figure out how to append them to `stock_price_array`.
I've tried various methods, including initializing `stock_price_array` with `np.full` instead of `np.empty`, changing where the array appears in the for loop, and checking the size of the array.
I've read other Stack Overflow posts on similar topics but can't figure out what I'm doing wrong.
Thank you in advance for your help!
```python
import numpy as np

annual_mean = .06
annual_stdev = .15
start_stock_price = 100
numYears = 3
numSimulations = 4
stock_price_array = np.empty(numYears)
# draw an annual return from a normal distribution; this annual return will be random
annual_ret = np.random.normal(annual_mean, annual_stdev, numSimulations)
for i in range(numYears):
    stock_price = np.multiply(start_stock_price, (1 + annual_ret[i]))
    np.append(stock_price_array, [stock_price])  # intended to add stock_price to stock_price_array
    start_stock_price = stock_price
```
Answer 1
Score: 2
The first rule of NumPy is: never iterate over your array yourself. Use NumPy functions that do the whole computation in batch (they do iterate over the array internally, of course, but that iteration runs in compiled code rather than as a Python loop, so it is much faster).
No-for solution
For example, here you could do something like this:
```python
np.cumprod(np.hstack([start_stock_price, annual_ret+1]))
```
It first builds an array made of the initial value followed by the growth factors (1 + annual return).
So if the initial value is 100 and the annual returns are 0.1, -0.1, 0.2, 0.2 (for example), then `hstack` builds the array of values 100, 1.1, 0.9, 1.2, 1.2.
And `cumprod` then just builds the cumulative product of those:
100, 100×1.1=110, 100×1.1×0.9=110×0.9=99, 100×1.1×0.9×1.2=99×1.2=118.8, 100×1.1×0.9×1.2×1.2=118.8×1.2=142.56
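The worked numbers can be reproduced directly. This is a minimal sketch using the example returns 0.1, -0.1, 0.2, 0.2 from the text (not drawn at random); `factors` and `prices` are illustrative names.

```python
import numpy as np

# Example values from the text: start at 100, annual returns 10%, -10%, 20%, 20%
start_stock_price = 100
annual_ret = np.array([0.1, -0.1, 0.2, 0.2])

# hstack puts the start price in front of the growth factors (1 + return)
factors = np.hstack([start_stock_price, annual_ret + 1])

# cumprod multiplies them cumulatively: each entry is the price after that year
prices = np.cumprod(factors)
print(prices)  # approximately [100, 110, 99, 118.8, 142.56], up to float rounding
```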
Correcting your code
To answer your initial question anyway (even though I strongly advise you to use a solution like the `cumprod` one I've shown), you have two choices:
- Either you allocate an array in advance, as you did (your `stock_price_array = np.empty(numYears)`). Then, instead of trying to append the new `stock_price` to `stock_price_array`, you simply fill one of the empty slots that are already there, by doing `stock_price_array[i] = stock_price`.
- Or you don't. Then you replace the `np.empty` line with `stock_price_array = []`, and at each step you append the result to create a new `stock_price_array`, like this: `stock_price_array = np.append(stock_price_array, [stock_price])`. Note the assignment: `np.append` returns a new array rather than modifying its argument, which is why your original line, whose return value was discarded, had no visible effect.
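Both options can be checked against the loop-free version. This is a minimal sketch: the seeded `np.random.default_rng(0)` generator is an assumption added only to make the run reproducible, and `prealloc`, `appended` and `vectorized` are illustrative names.

```python
import numpy as np

annual_mean = .06
annual_stdev = .15
start_stock_price = 100
numYears = 3

# Assumption: a seeded generator, only so the run is reproducible
rng = np.random.default_rng(0)
annual_ret = rng.normal(annual_mean, annual_stdev, numYears)

# Option 1: preallocate, then fill slot i on each iteration
prealloc = np.empty(numYears)
price = start_stock_price
for i in range(numYears):
    price = price * (1 + annual_ret[i])
    prealloc[i] = price

# Option 2: start from an empty list and rebind the name to np.append's result
appended = []
price = start_stock_price
for i in range(numYears):
    price = price * (1 + annual_ret[i])
    appended = np.append(appended, [price])  # np.append returns a NEW array

# Both match the cumprod version, once its leading start price is dropped
vectorized = np.cumprod(np.hstack([start_stock_price, annual_ret + 1]))[1:]
print(np.allclose(prealloc, vectorized), np.allclose(appended, vectorized))  # True True
```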
I strongly advise against the second solution. Since you already know the final size of the array, it is much better to create it once, because `np.append` creates a brand-new array and then copies the input data into it. It does not just extend the existing array (generally speaking, we can't do that anyway).
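A quick way to see that `np.append` copies rather than extends is to compare its input and output. A minimal sketch; `a` and `b` are illustrative names:

```python
import numpy as np

a = np.zeros(3)
b = np.append(a, [1.0])

# The original array is untouched: np.append built a new, longer array
print(a.shape, b.shape)        # (3,) (4,)
print(np.shares_memory(a, b))  # False: b is a fresh copy, not an extension of a
```

Discarding `b` here, as the original loop does, leaves `a` exactly as it was.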
But, well, I advise against both solutions anyway, since I find mine (with `cumprod`) preferable. `for` is the taboo word in NumPy, and even more so when what is inside the `for` creates a new array, as `append` does.
Monte-Carlo
Since you mentioned Monte Carlo, but then showed code that computes only one result (you draw one set of annual returns and perform one computation of future values), I wonder whether that is really what you want.
In particular, I see that you have `numSimulations` and `numYears`, which appear to play redundant roles in your code (and therefore in mine).
The only reason it doesn't throw an index error is that `numSimulations` is used only to decide how many `annual_ret` values you draw, and since `numSimulations > numYears`, you have more than enough of them to compute the result.
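To make the point concrete, here is a minimal sketch of what happens if fewer returns are drawn than there are years; `numSimulations = 2` is a hypothetical value chosen to trigger the error, not one from the question.

```python
import numpy as np

numYears = 3
numSimulations = 2  # hypothetical: fewer draws than years

annual_ret = np.random.normal(.06, .15, numSimulations)  # only 2 returns drawn

try:
    for i in range(numYears):
        _ = 1 + annual_ret[i]  # i == 2 is out of bounds for a length-2 array
    raised = False
except IndexError:
    raised = True
print(raised)  # True
```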
Wasn't your initial intention to rerun the multi-year simulation `numSimulations` times, to get `numSimulations` results?
In that case, you probably need `numSimulations` sets of `numYears` annual returns, i.e. a 2D array. Likewise, you should be computing `numSimulations` series of `numYears` results.
If my guess is not completely off, I surmise that what you really wanted was something to the effect of:
```python
import numpy as np
# Assumes annual_mean, annual_stdev, start_stock_price, numYears and numSimulations as defined in the question
annual_ret = np.random.normal(annual_mean, annual_stdev, (numSimulations, numYears)) # 2d array of annual returns. 1 simulation per row, 1 year per column
t = np.pad(annual_ret+1, ((0,0), (1,0)), constant_values=start_stock_price) # Add 1 as we did earlier. And pad with an initial 100 (`start_stock_price`) at the beginning of each simulation
res = np.cumprod(t, axis=1) # cumulative multiplication. `axis=1` means that it is done along axis 1 (along years) for each row (for each simulation)
```
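For a self-contained check of the 2D version, here is a minimal sketch. The seeded `np.random.default_rng(42)` generator is an assumption for reproducibility, and summarizing the last column with a mean and 5th/95th percentiles is just one illustrative way to read the simulation results, not something from the original answer.

```python
import numpy as np

annual_mean = .06
annual_stdev = .15
start_stock_price = 100
numYears = 3
numSimulations = 4

rng = np.random.default_rng(42)  # assumption: seeded only for reproducibility
annual_ret = rng.normal(annual_mean, annual_stdev, (numSimulations, numYears))
t = np.pad(annual_ret + 1, ((0, 0), (1, 0)), constant_values=start_stock_price)
res = np.cumprod(t, axis=1)

print(res.shape)  # (4, 4): one row per simulation, start price + numYears prices
final_prices = res[:, -1]  # price at the end of the last year, per simulation
print(final_prices.mean(), np.percentile(final_prices, [5, 95]))
```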