英文:
Is there a way to estimate the performance of this model?
问题
我不是一名工程师,我是一个PO。
这可能对一个简单的回答来说太复杂了,但还是试试看。
我正在尝试理解某些决策引擎设计的性能估计。该引擎将运行一系列的是/否问题,其中模型表示一个非平衡的二叉树,其中每个节点都是一个比较 - 对象.x >= 值?处理过程大致如下:
- 检索对象
- 从树的顶部开始
- 对象.x >= y吗?
3a. 是,下一个节点
3b. 否,下一个节点 - 重复步骤3直到分支的末端
- 决定结果
有一些框架可以很容易地解决这个问题,例如Drools。或者,它可以通过数据库中的模型来解决,我认为。
为了保持简单,这里有一些约束/假设:
- 分支的最大长度为16个节点
- 这不是一个二分搜索算法,因为树不平衡/从低到高排序,每个节点都是一个比较
- 所有进行比较所需的相关数据已经被检索
- 最多,用于比较节点的任何计算都是具有以下运算符之一的2个字段:+,-,/,*
- 使用Java/PostgreSQL编写
我该如何估计这个模型的性能?O表示法是什么?搜索所有16个节点需要多长时间?我离谱了吗?
谢谢!
我已经尝试查找Drools的性能估计,但没有找到我希望实现的信息。
英文:
I'm not an engineer, I'm a PO.
This might be too complex of a question for a simple answer, but here goes.
I’m trying to understand performance estimation for some designs of a decision engine. The engine would run on a series of yes/no questions, with a model that represents a non-balanced binary tree where every node is a comparison – is object.x >= value? The processing would be something like:
- Retrieve object
- Start at top of the tree
- Is object.x > y?
3a. Yes, next node
3b. No, next node - Repeat step 3 until the end of a branch
- Outcome decided
There are frameworks that can solve this pretty easily, i.e Drools. Or, it can be solved with a model in a database, I think.
To keep things simple, here are some constraints / assumptions:
- the maximum length of a branch is 16 nodes
- this is not a binary search algorithm as the tree is not balanced / lowest to highest
every node is a comparison - all relevant data to do the comparisons is already retrieved
- at most, any calculation for a comparison node is 2 fields with any of the following operands: +, -, /, *
- Written in java/postgres
How can I go about estimating the performance of this model? What’s the O notation? How long would a search of all 16 nodes take? Am I way off?
Thanks!
I have tried finding Drools performance estimates but could not find what I'm hoping to achieve.
答案1
得分: 1
没有办法以这种抽象的级别来估计模型的性能。正如我在评论中写的那样,绝对性能取决于许多机器、环境和实施特征。在这么多参数参与的情况下,准确预测绝对性能非常困难。
你特别问到了渐进复杂度分析(大O符号)。这旨在预测计算成本(时间、内存)随问题规模的增长情况,但是:
-
要从这里得到对绝对性能的估计,需要准备一个可工作的实施,并对一些问题规模进行性能抽样。
-
我看到唯一的缩放参数是通过决策引擎处理的对象数量,我没有理由认为你描述的决策引擎的性能会随着输入对象数量的增加而不是线性增加。
关于每个对象的决策性能,我通常期望评估16个基本值的简单比较会非常快。如果你在基本值上添加一些算术运算,甚至每个比较还有几个操作,也是如此。根据这些比较结果导航树也应该很快。然而,请注意,所有这些“快”都是相对的,并且有意模糊。
这里的变数是“检索对象”。由于你在谈论Postgres,我倾向于认为你可能是从数据库中检索你想要操作的对象,相对而言,这可能会非常慢。足够慢,以至于如果你必须将数据库检索时间包括在你想要估计的性能中,那么你就不需要考虑其他任何事情。
然而,鉴于这可能适合于某个更大的系统,也许你不必将对象检索的全部成本计算到决策过程中。也许将部分或全部成本归因于决策引擎之上游、下游或包括在内的某些其他过程会更合适。
至于任何关于绝对性能的问题,对于这类问题通常的回答是“你的测试结果如何?”。你可能能够推断测试结果适用于其他条件,但在这个特定问题中提出的任何内容都不能支持一个不同的基本答案。
英文:
> Is there a way to estimate the performance of this model?
Not at that level of abstraction, no. As I also wrote in comments, absolute performance depends on a wide range of machine, environmental, and implementation characteristics. It is extremely difficult to make accurate predictions about absolute performance when there are so many parameters in play.<sup>*</sup>
You asked specifically about asymptotic complexity analysis (big O). This is aimed at predicting how computational costs (time, memory) scale with the size of the problem, but
-
To get from there to estimates of absolute performance requires preparing a working implementation and sampling its performance for some problem sizes.
-
The only scaling parameter I see is the number of objects processed through your decision engine, and I see no reason to think that performance of a decision engine such as you describe would scale other than linearly with the the number of input objects.
With respect to per-object decision performance, I would ordinarily expect the evaluation of 16 simple comparisons of primitive values to be extremely fast. Same if you add a few arithmetic operations on primitives, even several per comparison. Navigation of the tree according to the results of those comparisons should also be fast. Note well, however, that all these "fasts" are relative and intentionally vague.
The wildcard here is "Retrieve object". Since you are talking about Postgres, I'm inclined to suppose that you may be retrieving the objects you want to operate on from a database, which you can rely on being very slow, comparatively speaking. Enough so that if you have to include database retrieval time in the performance you want to estimate then you don't need to consider anything else.
Inasmuch as this presumably fits into some larger system, however, it may be that you don't have to account the full cost of object retrieval against the decision process. Perhaps it is more appropriate to attribute some or all of that cost to some other procedure upstream of, downstream of, or encompassing the decision engine.
As for any question of absolute performance, the usual answer to such questions is "what did your tests say?". You may be able to extrapolate test results to other conditions, but nothing presented in this particular question would support a different basic answer here.
<sup>*</sup>Unless one is satisfied with extremely low precision. I'd be pretty comfortable in this case with "probably less than one minute", for example.
答案2
得分: 0
当然,我可以帮您处理这个!以下是一个简化的 Node.js 代码片段,模拟了描述的决策引擎逻辑,并提供了其性能的估算。此代码假定您已经获取了所需的对象和数据进行比较。
function performComparison(node, object) {
if (node.type === 'leaf') {
// 分支结束,返回结果
return node.outcome;
}
const leftValue = performComparison(node.left, object);
const rightValue = performComparison(node.right, object);
switch (node.operator) {
case '>':
return leftValue > rightValue;
case '>=':
return leftValue >= rightValue;
case '<':
return leftValue < rightValue;
case '<=':
return leftValue <= rightValue;
// 添加其他运算符的情况(+,-,/,*)
default:
throw new Error(`不支持的运算符:${node.operator}`);
}
}
// 示例决策树节点
const decisionTree = {
type: 'comparison',
operator: '>=',
left: { type: 'value', value: 'object.x' },
right: { type: 'constant', value: 10 },
};
// 示例对象数据
const object = { x: 15 };
const outcome = performComparison(decisionTree, object);
console.log('结果:', outcome);
请注意,此代码提供了决策引擎逻辑的基本实现,应根据您的特定用例和数据结构进行调整和扩展。
就性能和 O 表示法的估算而言,决策引擎的时间复杂性高度依赖于决策树的深度。在这种情况下,最大分支长度为 16 个节点,所以在最坏情况下,它必须遍历所有 16 个节点,导致时间复杂性为 O(16),实际上是常数时间。然而,请注意,实际性能可能会受到许多因素的影响,包括比较节点内的计算复杂性、硬件和数据库性能。要获得更准确的性能估算,可能需要实施基准测试机制,并使用代表性数据和场景测试引擎。
最后,如果您有兴趣考虑性能优化,可以考虑使用记忆化技术以避免冗余计算并缓存先前计算的结果。
英文:
Of course, I can help you with that! Below is a simplified Node.js code snippet that models the described decision engine logic and provides an estimate of its performance. This code assumes that you have already obtained the object and the data needed for comparison.
function performComparison(node, object) {
if (node.type === 'leaf') {
// End of branch, return outcome
return node.outcome;
}
const leftValue = performComparison(node.left, object);
const rightValue = performComparison(node.right, object);
switch (node.operator) {
case '>':
return leftValue > rightValue;
case '>=':
return leftValue >= rightValue;
case '<':
return leftValue < rightValue;
case '<=':
return leftValue <= rightValue;
// Add more cases for other operators (+, -, /, *)
default:
throw new Error(`Unsupported operator: ${node.operator}`);
}
}
// Example decision tree node
const decisionTree = {
type: 'comparison',
operator: '>=',
left: { type: 'value', value: 'object.x' },
right: { type: 'constant', value: 10 },
};
// Example object data
const object = { x: 15 };
const outcome = performComparison(decisionTree, object);
console.log('Outcome:', outcome);
Note that this code provides a basic implementation of the decision engine logic and should be adapted and extended for your specific use case and data structure.
In terms of performance and estimation of O notation, the time complexity of the decision engine is highly dependent on the depth of the decision tree. In this case the maximum branch length is 16 nodes, so in the worst case scenario he would have to traverse all 16 nodes, resulting in a time complexity of O(16) which is effectively constant time. Masu.
Note, however, that the actual performance may be affected by many factors, including computational complexity within the comparison node, hardware, and database performance. To get more accurate performance estimates, it may be necessary to implement a benchmarking mechanism and test the engine using representative data and scenarios.
Finally, if you are interested in considering performance optimizations, you can consider techniques such as memoization to avoid redundant computations and cache previously computed results.
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论