Can multiple components edit parts of the same output in OpenMDAO?
Question
We have a large-ish problem in OpenMDAO where the set of variables exchanged via OpenMDAO would be much smaller if multiple different components could write to parts of the same variable name(s). Specifically:
- 4 different components contribute to the overall cost values. To simplify this, we could have a single `cost` variable of length 4, and each of these components would write to the corresponding index within that same cost variable.
- For another part of the code, there are multiple near-duplicate components that ideally populate different elements of a given vector with a physical parameter (voltage). In this case we would like to scale the vector length with problem size. The size would be known at run time, but it would be helpful not to need a huge list of variables that changes with this scale.
- To complete the picture, an earlier-to-run component creates the vector used as a common input for these, and they can easily index it.
- And downstream, another component aggregates the results, such that resorting to unique names per near-duplicate component might otherwise mean having to subscribe to a long and variable-length set of variable names.
Ideally, in both cases each component could stably write a different part of the same variable, and other components that use this variable as an input would simply get the most recently updated set of data. If this is not possible, a workaround for the fixed-size cost example (#1) could be to just name all of the pieces, but for the variable-length part (#2) it seems we would need to create some form of wrapper around the sub-problems so that it looks like a single component. But neither is quite as clean.
I have seen hints of this type of capability in the MPI support, but we are currently not planning to use MPI here, for simplicity and for cross-platform ease (e.g. MPI seems challenging under Windows, and we also need to run under OSX and Linux).
Is this possible? Suggestions?
Answer 1
Score: 3
As of OpenMDAO V3.26, the functionality you are asking for does not exist. In OpenMDAO you can use `src_indices` to specify a sub-set of the source array to map into a component input. However, you can only link a single source (output) to a single target (input). Here is how they work, with the help of a hastily drawn ASCII art diagram:
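```
src (output), length 7:   [0] [1] [2] [3] [4] [5] [6]
                           |           |   |       |
                           |   +-------+   |       |
                           |   |   +-------+       |
                           |   |   |   +-----------+
                           v   v   v   v
tgt (input), length 4:    [0] [1] [2] [3]
```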
The source array (output) has a length of 7. The target array (input) has a length of 4. So this connection would have a `src_indices` argument of `[0, 3, 4, 6]`. During a data transfer, OpenMDAO copies those indices into a contiguous array on the target side. The key here is that there is only one connection, and whatever the values of `src_indices` are on the source side of the connection, you get only a single contiguous array on the target side.
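In code, such a connection looks something like this (a minimal, self-contained sketch; the component and variable names are made up for illustration):

```python
import numpy as np
import openmdao.api as om

prob = om.Problem()

# hypothetical source with a length-7 output and target with a length-4 input
prob.model.add_subsystem('src', om.ExecComp('y = 2.0*x', y=np.zeros(7), x=np.zeros(7)))
prob.model.add_subsystem('tgt', om.ExecComp('z = sum(u)', u=np.zeros(4)))

# one output feeds one input; src_indices selects which source entries
# get copied (contiguously) into the target array
prob.model.connect('src.y', 'tgt.u', src_indices=[0, 3, 4, 6])

prob.setup()
prob.run_model()
```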
As of V3.26, this is all that OpenMDAO allows. It is theoretically possible to modify OpenMDAO to support a more complex connection scheme, but the devil is in the details, and as of May 2023 there is no current POEM that proposes it. The feature has been discussed before, though, particularly at the 2022 OpenMDAO workshop in Cleveland, OH. So if you are interested in the feature, I recommend you consider authoring a POEM. To help you, I'll quickly address a few key details you'd need to consider.
What you are asking for is to allow multiple sources (outputs) to connect to a single target (input). Even if OpenMDAO allowed this situation, the existing `src_indices` data would not be enough information to complete the connection. You would also need to consider where that data goes in the target array.
There are two options:
1. Force users to ALSO tell you where the data should go via an additional `tgt_indices` argument. This would let you specify any arbitrary mapping of source to target. However, it would potentially make error checking very tricky, and it leaves open the question of what to do if you somehow end up with unassigned indices in the target array. And what if two sources accidentally specify the same target index and collide? (A hypothetical sketch of this option is shown after this list.)

2. Keep all data pushed into the target array contiguous with respect to that connection. In this case, no target indices would be needed in general. However, you would then have the problem of knowing which order the data lands in within the target array. So you'd either have to add some kind of ordering information to the connection via a `seq_index` argument (which would basically be 1 for all normal connections that don't do any multi-source behavior), have the order be defined implicitly by the sequence in which the connections were defined, or come up with some other magic scheme that I haven't thought of.
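For concreteness, here is what the first option might look like at the call site. This is purely hypothetical: `tgt_indices` is not a real OpenMDAO argument, and the component names are made up.

```python
# HYPOTHETICAL API -- tgt_indices does not exist in OpenMDAO today.
# Several components each fill a different slice of the same length-4 input.
model.connect('cost_a.c', 'agg.cost', src_indices=[0], tgt_indices=[0])
model.connect('cost_b.c', 'agg.cost', src_indices=[0], tgt_indices=[1])
model.connect('cost_cd.c', 'agg.cost', src_indices=[0, 1], tgt_indices=[2, 3])
```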
Either way --- (1) or (2) --- you might also want to consider whether, inside the compute of the target component, you would need access to the sequence information at all. Components don't currently have any kind of source information inside their compute, but I imagine it might be needed in this case.
I bring all this up to emphasize that, despite being technically possible, the issue isn't as trivial as it might seem. I have also run into cases myself where it would have been useful, though, so I see the issue from both sides.
What do I suggest you do?
Having given that rather long-winded explanation above, my actual advice keys off these two points that you raised:
> To complete the picture, an earlier-to-run component creates the
> vector used as a common input for these, and they can easily index
> it.

> And downstream, another component aggregates the results, such
> that resorting to unique names per near-duplicate component might
> otherwise mean having to subscribe to a long and variable-length set
> of variable names.
Given that you have a vectorized component at the start of the calculation, and another one at the end, I can't easily understand why the middle calculation itself needs to be a series of separate components. Could it not be a vectorized component itself?
For the `cost` calculation, why not move all four calculations into a single component? It should mostly be a matter of consolidating the 4 different `compute` methods into a single one. Especially if these 4 components don't share any connections with each other, this should be completely trivial, even accounting for any analytic derivatives you've implemented.
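A minimal sketch of what that consolidation might look like. The variable names and the four stand-in formulas are made up; the point is simply one component writing all four entries of a single `cost` output:

```python
import numpy as np
import openmdao.api as om

class CombinedCost(om.ExplicitComponent):
    """Four formerly separate cost calculations merged into one compute."""

    def setup(self):
        self.add_input('x', shape=4)
        # one length-4 output instead of four scalar outputs
        self.add_output('cost', shape=4)

    def compute(self, inputs, outputs):
        x = inputs['x']
        # each line stands in for one of the original components' calculations
        outputs['cost'][0] = 2.0 * x[0]
        outputs['cost'][1] = x[1] ** 2
        outputs['cost'][2] = 3.0 * x[2] + 1.0
        outputs['cost'][3] = np.sin(x[3])
```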
For the `voltage` component, again I ask: why not make it a single vectorized component that handles the set of nearly identical calculations inside a single compute?
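Since the vector length is known at run time, it can be passed in as an option when the component is instantiated. Again a rough sketch with made-up names, assuming the per-element calculation is independent (which also makes the partials diagonal):

```python
import numpy as np
import openmdao.api as om

class VectorizedVoltage(om.ExplicitComponent):
    """One component computing all N voltages instead of N near-duplicates."""

    def initialize(self):
        # the problem size is known at run time, so make it an option
        self.options.declare('num_nodes', types=int)

    def setup(self):
        n = self.options['num_nodes']
        self.add_input('common', shape=n)
        self.add_output('voltage', shape=n)
        # the calculation is elementwise, so the Jacobian is diagonal
        ar = np.arange(n)
        self.declare_partials('voltage', 'common', rows=ar, cols=ar)

    def compute(self, inputs, outputs):
        # stand-in for the real near-duplicate calculation, applied elementwise
        outputs['voltage'] = 5.0 * inputs['common']

    def compute_partials(self, inputs, partials):
        n = self.options['num_nodes']
        partials['voltage', 'common'] = 5.0 * np.ones(n)


# usage: size chosen at run time
# prob.model.add_subsystem('volt', VectorizedVoltage(num_nodes=n))
```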
Fewer Components Are Generally Better
Even if some kind of `tgt_indices` functionality existed, I would still give the same advice. Generally speaking, OpenMDAO has some for loops that (very roughly speaking) are nested by first looping over components and then over the variables. So, as a rule of thumb, you are better off from a framework-overhead perspective with fewer vectorized components and fewer overall variables.