英文:
Is there a more efficient implementation for Pyomo models compared to the current intuitive one?
问题
I have translated the content you provided. Here it is:
A while ago I started trying to make a fair performance comparison of Pyomo and JuMP. I created a GitHub repository to share the code I use, and I am happy if anyone wants to contribute. As of now, I am trying to find a more efficient implementation for the Pyomo models than the intuitive one I came up with so far.
IJKLM
def pyomo(I, IJK, JKL, KLM, solve):
# ... (code omitted for brevity)
Supply Chain
def intuitive_pyomo(I, L, M, IJ, JK, IK, KL, LM, D, solve):
# ... (code omitted for brevity)
I measure performance in model generation time for two exemplary models I refer to as IJKLM and Supply Chain.
Models:
These are the results for increasing instance sizes:
Can anyone help improve Pyomo's performance?
英文:
A while ago I started trying to make a fair performance comparison of Pyomo and JuMP. I created a GitHub repository to share the code I use and I am happy if anyone wants to contribute. As of now, I am trying to find a more efficient implementation for the Pyomo models than the intuitive one i came up so far:
IJKLM
def pyomo(I, IJK, JKL, KLM, solve):
model = pyo.ConcreteModel()
model.I = pyo.Set(initialize=I)
x_list = [
(i, j, k, l, m) for (i, j, k) in IJK for l in JKL[j, k] for m in KLM[k, l]
]
constraint_dict_i = {
ii: ((i, j, k, l, m) for (i, j, k, l, m) in x_list if i == ii) for ii in I
}
model.x_list = pyo.Set(initialize=x_list)
model.c_dict_i = pyo.Set(model.I, initialize=constraint_dict_i)
model.z = pyo.Param(default=1)
model.x = pyo.Var(model.x_list, domain=pyo.NonNegativeReals)
model.OBJ = pyo.Objective(expr=model.z)
model.ei = pyo.Constraint(model.I, rule=ei_rule)
if solve:
opt = pyo.SolverFactory("gurobi")
opt.solve(model)
def ei_rule(model, i):
return sum(model.x[i, j, k, l, m] for i, j, k, l, m in model.c_dict_i[i]) >= 0
Supply Chain
def intuitive_pyomo(I, L, M, IJ, JK, IK, KL, LM, D, solve):
model = pyo.ConcreteModel()
model.I = pyo.Set(initialize=I)
model.L = pyo.Set(initialize=L)
model.M = pyo.Set(initialize=M)
model.IJ = pyo.Set(initialize=IJ)
model.JK = pyo.Set(initialize=JK)
model.IK = pyo.Set(initialize=IK)
model.KL = pyo.Set(initialize=KL)
model.LM = pyo.Set(initialize=LM)
model.f = pyo.Param(default=1)
model.d = pyo.Param(model.I, model.M, initialize=D)
model.x = pyo.Var(
[
(i, j, k)
for (i, j) in model.IJ
for (jj, k) in model.JK
if jj == j
for (ii, kk) in model.IK
if (ii == i) and (kk == k)
],
domain=pyo.NonNegativeReals,
)
model.y = pyo.Var(
[(i, k, l) for i in model.I for (k, l) in model.KL], domain=pyo.NonNegativeReals
)
model.z = pyo.Var(
[(i, l, m) for i in model.I for (l, m) in model.LM], domain=pyo.NonNegativeReals
)
model.OBJ = pyo.Objective(expr=model.f)
model.production = pyo.Constraint(model.IK, rule=intuitive_production_rule)
model.transport = pyo.Constraint(model.I, model.L, rule=intuitive_transport_rule)
model.demand = pyo.Constraint(model.I, model.M, rule=intuitive_demand_rule)
# model.write("int.lp")
if solve:
opt = pyo.SolverFactory("gurobi")
opt.solve(model)
def intuitive_production_rule(model, i, k):
lhs = [
model.x[i, j, k]
for (ii, j) in model.IJ
if ii == i
for (jj, kk) in model.JK
if (jj == j) and (kk == k)
]
rhs = [model.y[i, k, l] for (kk, l) in model.KL if kk == k]
if lhs or rhs:
return sum(lhs) >= sum(rhs)
else:
return pyo.Constraint.Skip
def intuitive_transport_rule(model, i, l):
lhs = [model.y[i, k, l] for (k, ll) in model.KL if ll == l]
rhs = [model.z[i, l, m] for (lll, m) in model.LM if lll == l]
if lhs or rhs:
return sum(lhs) >= sum(rhs)
else:
return pyo.Constraint.Skip
def intuitive_demand_rule(model, i, m):
return sum(model.z[i, l, m] for (l, mm) in model.LM if mm == m) >= model.d[i, m]
I measure performance in model generation time for two exemplary models I refer to as IJKLM and Supply Chain.
Models:
These are the results for increasing instance sizes:
Can anyone help to improve Pyomos performance?
答案1
得分: 3
以下是您提供的内容的中文翻译:
这里有一些关于供应链 Pyomo 模型的修改/想法,供您考虑,除了您所遇到的数据问题(范围不足和不可行性)。
-
在生产约束内部进行 lhs 集合的操作是昂贵的,如果数据是充分代表性的话,这个操作可能会更加昂贵。而且在当前构造中,您正在一个将执行
|model.IK|
次的函数内执行这个昂贵的操作。您可以从模型中单独提取数据,以预先计算model.x
的有效索引,就像在模型中显示的使用带索引集合或只是一个字典,如果您不想在模型中创建带索引的集合。这个相同的构造可以用来生成model.x
的域,就像显示的那样,以实现微小的速度提升(因为x
的域只计算一次),而不需要额外费用。 -
您可以将生产约束的数量减少到仅满足需求所需的数量(在您的构造中的
rhs
)。假设在生产模型中,您想要最小化x
,那么唯一需要的约束是存在rhs
的地方。请注意,我添加了一个小的打印语句,以显示在有需求但没有生产手段的地方的不可行性(数据缺陷)。
通过这两个调整,我在 3.8 秒的构建时间内得到了 |model.I|
的 7900 次运行。我认为这将随着更好的数据而改进(相对于原始情况)。
带有修改的您的代码:
import pyomo.environ as pyo
import logging
import timeit
import pandas as pd
import numpy as np
from collections import defaultdict
from itertools import chain
logging.getLogger("pyomo.core").setLevel(logging.ERROR)
########## 直观的 Pyomo ##########
def run_intuitive_pyomo(I, L, M, IJ, JK, IK, KL, LM, D, solve, repeats, number):
setup = {
"I": I,
"L": L,
"M": M,
"IJ": IJ,
"JK": JK,
"IK": IK,
"KL": KL,
"LM": LM,
"D": D,
"solve": solve,
"model_function": intuitive_pyomo,
}
r = timeit.repeat(
"model_function(I, L, M, IJ, JK, IK, KL, LM, D, solve)",
repeat=repeats,
number=number,
globals=setup,
)
result = pd.DataFrame(
{
"I": [len(I)],
"Language": ["直观的 Pyomo"],
"MinTime": [np.min(r)],
"MeanTime": [np.mean(r)],
"MedianTime": [np.median(r)],
}
)
return result
def intuitive_pyomo(I, L, M, IJ, JK, IK, KL, LM, D, solve):
# 数据整理
IJ_dict = defaultdict(set)
for i, j in IJ:
IJ_dict[i].add(j)
KJ_dict = defaultdict(set)
for j, k in JK:
KJ_dict[k].add(j)
# 制作一个 (i, k) : {(i, j, k) 元组} 的字典
IK_IJK = {(i, k): {(i, j, k) for j in IJ_dict.get(i, set()) & KJ_dict.get(k, set())}
for (i, k) in IK}
model = pyo.ConcreteModel()
model.I = pyo.Set(initialize=I)
model.L = pyo.Set(initialize=L)
model.M = pyo.Set(initialize=M)
model.IJ = pyo.Set(initialize=IJ)
model.JK = pyo.Set(initialize=JK)
model.IK = pyo.Set(initialize=IK)
model.KL = pyo.Set(initialize=KL)
model.LM = pyo.Set(initialize=LM)
model.IK_IJK = pyo.Set(IK_IJK.keys(), initialize=IK_IJK)
model.f = pyo.Param(default=1)
model.d = pyo.Param(model.I, model.M, initialize=D)
# x_idx = [
# (i, j, k)
# for (i, j) in model.IJ
# for (jj, k) in model.JK
# if jj == j
# for (ii, kk) in model.IK
# if (ii == i) and (kk == k)
# ]
x_idx_quick = list(chain(*IK_IJK.values()))
# assert set(x_idx) == set(x_idx_quick) # 检查一下,确保它们相同...
model.x = pyo.Var(x_idx_quick,
domain=pyo.NonNegativeReals,
)
print(f'模型.I的长度:{len(I)}')
print(f'模型.IK的长度:{len(IK)}')
print(f'模型.x的大小:{len(x_idx_quick)}')
model.y = pyo.Var(
[(i, k, l) for i in model.I for (k, l) in model.KL], domain=pyo.NonNegativeReals
)
model.z = pyo.Var(
[(i, l, m) for i in model.I for (l, m) in model.LM], domain=pyo.NonNegativeReals
)
model.OBJ = pyo.Objective(expr=model.f)
# model.write("int.lp")
if solve:
opt = pyo.SolverFactory("cbc")
opt.solve(model)
def intuitive_production_rule(model, i, k):
lhs = [model.x[i, j, k] for i, j, k in model.IK_IJK[i, k]]
rhs = [model.y[i, k, l] for (kk, l) in model.KL if kk == k]
# 显示数据不可行的情况...
# if rhs and not lhs:
# print(f'infeasible for (i, k): {i}, {k}')
if lhs and rhs:
return sum(lhs) >= sum(rhs)
else:
<details>
<summary>英文:</summary>
Here are a couple mods/notions for the Supply Chain Pyomo model for you to consider aside from the data issues that you have (under-representative scope and infeasibilities).
1. Making the lhs set inside the production constraint is expensive, and likely much more expensive if the data were representative. And in the current construct, you are executing that expensive operation inside of a function that is going to execute `|model.IK|` times. You can wrangle the data separately from the elements that you have to pre-compute the valid indices for `model.x` as shown with an indexed set in the model, or just a dictionary if you don't want to make the indexed set in the model. This same construct can be used to generate the domain for `model.x` as shown, for a tiny speedup (because the domain for `x` is only computed once) for free.
2. You can reduce the number of production constraints to only what is required by the demand (`rhs` in your construct). Presumably, in the production model, you would minimize `x`, so the only constraints needed are where there is a `rhs`. Note that I dropped in a little print statement to show infeasibilities where there is demand, but no means to produce (defect in the data).
With those 2 tweaks, I'm getting runs of `|model.I|` at 7900 in 3.8 seconds build time. I think that will improve (relative to original) with better data.
#### Your code w/ mods:
import pyomo.environ as pyo
import logging
import timeit
import pandas as pd
import numpy as np
from collections import defaultdict
from itertools import chain
logging.getLogger("pyomo.core").setLevel(logging.ERROR)
########## Intuitive Pyomo ##########
def run_intuitive_pyomo(I, L, M, IJ, JK, IK, KL, LM, D, solve, repeats, number):
setup = {
"I": I,
"L": L,
"M": M,
"IJ": IJ,
"JK": JK,
"IK": IK,
"KL": KL,
"LM": LM,
"D": D,
"solve": solve,
"model_function": intuitive_pyomo,
}
r = timeit.repeat(
"model_function(I, L, M, IJ, JK, IK, KL, LM, D, solve)",
repeat=repeats,
number=number,
globals=setup,
)
result = pd.DataFrame(
{
"I": [len(I)],
"Language": ["Intuitive Pyomo"],
"MinTime": [np.min(r)],
"MeanTime": [np.mean(r)],
"MedianTime": [np.median(r)],
}
)
return result
def intuitive_pyomo(I, L, M, IJ, JK, IK, KL, LM, D, solve):
# some data wrangling
IJ_dict = defaultdict(set)
for i, j in IJ: IJ_dict[i].add(j)
KJ_dict = defaultdict(set)
for j, k in JK: KJ_dict[k].add(j)
# make a dictionary of (i, k) : {(i, j, k) tuples}
IK_IJK = {(i, k): {(i, j, k) for j in IJ_dict.get(i, set()) & KJ_dict.get(k, set())}
for (i, k) in IK}
model = pyo.ConcreteModel()
model.I = pyo.Set(initialize=I)
model.L = pyo.Set(initialize=L)
model.M = pyo.Set(initialize=M)
model.IJ = pyo.Set(initialize=IJ)
model.JK = pyo.Set(initialize=JK)
model.IK = pyo.Set(initialize=IK)
model.KL = pyo.Set(initialize=KL)
model.LM = pyo.Set(initialize=LM)
model.IK_IJK = pyo.Set(IK_IJK.keys(), initialize=IK_IJK)
model.f = pyo.Param(default=1)
model.d = pyo.Param(model.I, model.M, initialize=D)
# x_idx = [
# (i, j, k)
# for (i, j) in model.IJ
# for (jj, k) in model.JK
# if jj == j
# for (ii, kk) in model.IK
# if (ii == i) and (kk == k)
# ]
x_idx_quick = list(chain(*IK_IJK.values()))
# assert set(x_idx) == set(x_idx_quick) # sanity check. Make sure it is same...
model.x = pyo.Var(x_idx_quick,
domain=pyo.NonNegativeReals,
)
print(f'length of model.I: {len(I)}')
print(f'length of modek.IK: {len(IK)}')
print(f'size of model.x: {len(x_idx_quick)}')
model.y = pyo.Var(
[(i, k, l) for i in model.I for (k, l) in model.KL], domain=pyo.NonNegativeReals
)
model.z = pyo.Var(
[(i, l, m) for i in model.I for (l, m) in model.LM], domain=pyo.NonNegativeReals
)
model.OBJ = pyo.Objective(expr=model.f)
# model.write("int.lp")
if solve:
opt = pyo.SolverFactory("cbc")
opt.solve(model)
def intuitive_production_rule(model, i, k):
lhs = [model.x[i, j, k] for i, j, k in model.IK_IJK[i, k]]
rhs = [model.y[i, k, l] for (kk, l) in model.KL if kk == k]
# show where the data is infeasible...
# if rhs and not lhs:
# print(f'infeasible for (i, k): {i}, {k}')
if lhs and rhs:
return sum(lhs) >= sum(rhs)
else:
return pyo.Constraint.Skip
def intuitive_transport_rule(model, i, l):
lhs = [model.y[i, k, l] for (k, ll) in model.KL if ll == l]
rhs = [model.z[i, l, m] for (lll, m) in model.LM if lll == l]
if lhs or rhs:
return sum(lhs) >= sum(rhs)
else:
return pyo.Constraint.Skip
def intuitive_demand_rule(model, i, m):
return sum(model.z[i, l, m] for (l, mm) in model.LM if mm == m) >= model.d[i, m]
model.production = pyo.Constraint(IK_IJK.keys(), rule=intuitive_production_rule)
print(f'created {len(model.production)} production constraints')
model.transport = pyo.Constraint(model.I, model.L, rule=intuitive_transport_rule)
model.demand = pyo.Constraint(model.I, model.M, rule=intuitive_demand_rule)
</details>
# 答案2
**得分**: 3
以下是要翻译的内容:
**IJLKM模型**
首先要提到的是要更新Pyomo。在Python 3.11下运行Pyomo 6.6.0,对于IJLKM模型(显示模型生成时间;Julia 1.9.0上的JuMP 1.11.1;RHEL7),会得到稍微不同的结果:
现在,要补充@AirSquid的回答,IJLKM模型与|I|呈线性关系,你应该期望Pyomo的运行时间也应该呈线性关系。它呈二次比例关系表明存在建模问题。首先要做的是找出问题出在哪里。Pyomo有一个内置工具,可以报告常见操作的时间信息,包括构建每个组件所需的时间。要启用它,请在创建模型之前调用以下代码:
```python
from pyomo.common.timing import report_timing
report_timing()
重新运行python main_IJKLM.py
得到如下结果:
0秒用于构建块ConcreteModel;1个总索引
0秒用于构建集合I;1个总索引
0.10秒用于构建集合x_list;1个总索引
5.62秒用于构建集合c_dict_i;共计3700个索引
0秒用于构建任意集合;1个总索引
0秒用于构建参数z;1个总索引
0.03秒用于构建变量x;共计69382个索引
0秒用于构建目标函数OBJ;1个总索引
0.05秒用于构建约束ei;共计3700个索引
0秒用于构建块ConcreteModel;1个总索引
0秒用于构建集合I;1个总索引
0.10秒用于构建集合x_list;1个总索引
5.71秒用于构建集合c_dict_i;共计3700个索引
0秒用于构建任意集合;1个总索引
0秒用于构建参数z;1个总索引
0.03秒用于构建变量x;共计69382个索引
0秒用于构建目标函数OBJ;1个总索引
0.05秒用于构建约束ei;共计3700个索引
Pyomo在5.82秒内完成了3700个索引。
事实上,大部分时间都用于创建c_dict_i
的IndexedSet。如果我们看一下,这就开始有意义了:
constraint_dict_i = {
ii: ((i, j, k, l, m) for (i, j, k, l, m) in x_list if i == ii) for ii in I
}
model.c_dict_i = pyo.Set(model.I, initialize=constraint_dict_i)
你正在使用生成器的字典来初始化IndexedSet,每个生成器都必须遍历整个x_list
来提取相关的子集(导致观察到二次行为)。我们最好的做法是在一次遍历中明确地创建字典,类似于以下方式:
constraint_dict_i = {i: [] for i in I}
for idx in x_list:
constraint_dict_i[idx[0]].append(idx)
最后,你的模型会因为c_dict_i
中的某些列表实际上是空的(导致一个平凡的约束)而生成错误。你的“直观Pyomo”模型跳过了这些约束,修改“Pyomo”模型中的ei_rule
将解决这个错误:
def ei_rule(model, i):
if not model.c_dict_i[i]:
return pyo.Constraint.Skip
return sum(model.x[idx] for idx in model.c_dict_i[i]) >= 0
(另外,请注意,如果你不打算操纵索引,你无需解包它们)
通过这些改变,我们现在看到了我们应该期望的线性扩展:
Supply Chain模型
我们可以将相同的技巧应用于供应链模型。从@AirSquid的回答开始,我们看到:
从report_timing()
中,我们可以看到40%的时间用于“transport”约束,30%用于“demand”约束,20%用于参数“d”,以及10%分布在其他地方。我们可以利用IndexedSet来避免重复迭代,通过定义三个新的IndexedSet:
L_M = {l: [] for l in model.L}
M_L = {m: [] for m in model.M}
for l, m in model.LM:
L_M[l].append(m)
M_L[m].append(l)
L_K = {l: [] for l in model.L}
for k, l in model.KL:
L_K[l].append(k)
model.L_M = pyo.Set(model.L, initialize=L_M)
model.M_L = pyo.Set(model.M, initialize=M_L)
model.L_K = pyo.Set(model.L, initialize=L_K)
以及重新编写约束规则:
def intuitive_transport_rule(model, i, l):
if len(model.L_K[l]) or len(model.L_M[l]):
return (
sum(model.y[i, k, l] for k in model.L_K[l]) >=
sum(model.z[i, l, m] for m in model.L_M[l])
)
return pyo.Constraint.Skip
def intuitive_demand_rule(model, i, m):
return sum(model.z[i, l, m] for l in model.M_L[m]) >= model.d[i, m]
虽然不像IJLKM模型那样戏剧性,但这仍然带来了约15%的改进。
英文:
IJKLM
The first thing to mention is to update Pyomo. Running on Pyomo 6.6.0 under Python 3.11 gives a slightly different result for the IJKLM model (showing the model generation time; JuMP 1.11.1 on Julia 1.9.0; RHEL7):
Now, to add to @AirSquid's answer, The IJKLM model scales linearly with |I|
, and you should expect that the Pyomo runtime should also scale linearly as well. The fact that it is scaling quadratically indicates a modeling problem. The first thing to do is find out where. Pyomo has a built-in tool to report the timing information for common operations, including the time it takes to construct each component. To enable that, call the following before creating a model:
from pyomo.common.timing import report_timing
report_timing()
Rerunning python main_IJKLM.py
gives:
0 seconds to construct Block ConcreteModel; 1 index total
0 seconds to construct Set I; 1 index total
0.10 seconds to construct Set x_list; 1 index total
5.62 seconds to construct Set c_dict_i; 3700 indices total
0 seconds to construct Set Any; 1 index total
0 seconds to construct Param z; 1 index total
0.03 seconds to construct Var x; 69382 indices total
0 seconds to construct Objective OBJ; 1 index total
0.05 seconds to construct Constraint ei; 3700 indices total
0 seconds to construct Block ConcreteModel; 1 index total
0 seconds to construct Set I; 1 index total
0.10 seconds to construct Set x_list; 1 index total
5.71 seconds to construct Set c_dict_i; 3700 indices total
0 seconds to construct Set Any; 1 index total
0 seconds to construct Param z; 1 index total
0.03 seconds to construct Var x; 69382 indices total
0 seconds to construct Objective OBJ; 1 index total
0.05 seconds to construct Constraint ei; 3700 indices total
Pyomo done 3700 in 5.82s
The bulk of the time is actually creating the c_dict_i
IndexedSet
. If we look at it, this starts to make sense:
constraint_dict_i = {
ii: ((i, j, k, l, m) for (i, j, k, l, m) in x_list if i == ii) for ii in I
}
model.c_dict_i = pyo.Set(model.I, initialize=constraint_dict_i)
You are initializing the IndexedSet
with a dictionary of generators, and each generator has to walk through the entire x_list
to extract the relevant subset (giving the observed quadratic behavior). We would be better off to explicitly create the dictionary of lists in a single pass with something like:
constraint_dict_i = {i: [] for i in I}
for idx in x_list:
constraint_dict_i[idx[0]].append(idx)
Finally, your model generates an error because some of the lists in c_dict_i
are actually empty (resulting in a trivial constraint). Your "Intuitive Pyomo" model skips those constraints, and modifying the ei_rule
in the "Pyomo" model will resolve that error:
def ei_rule(model, i):
if not model.c_dict_i[i]:
return pyo.Constraint.Skip
return sum(model.x[idx] for idx in model.c_dict_i[i]) >= 0
(As a sidebar, also note that you don't have to unpack the indicies if you are not going to manipulate them)
With these changes, we now see the linear scaling that we should expect:
As a final comment, the JuMP model would also benefit from a similar change to only iterate through the indexing set once.
Supply Chain
We can apply the same trick to the supply chain model. Starting from @AirSquid's answer, we see
From report_timing()
, we see that 40% of the time is in the transport
constraint, 30% in the demand
constraint, 20% in the d
Param, and 10% everywhere else. We can leverage IndexedSet
s to avoid repeated iteration by defining 3 new indexed sets:
L_M = {l: [] for l in model.L}
M_L = {m: [] for m in model.M}
for l, m in model.LM:
L_M[l].append(m)
M_L[m].append(l)
L_K = {l: [] for l in model.L}
for k, l in model.KL:
L_K[l].append(k)
model.L_M = pyo.Set(model.L, initialize=L_M)
model.M_L = pyo.Set(model.M, initialize=M_L)
model.L_K = pyo.Set(model.L, initialize=L_K)
and rewriting the constraint rules:
def intuitive_transport_rule(model, i, l):
if len(model.L_K[l]) or len(model.L_M[l]):
return (
sum(model.y[i, k, l] for k in model.L_K[l]) >=
sum(model.z[i, l, m] for m in model.L_M[l])
)
return pyo.Constraint.Skip
def intuitive_demand_rule(model, i, m):
return sum(model.z[i, l, m] for l in model.M_L[m]) >= model.d[i, m]
While not as dramatic as with the IJLKM model, this still gives a ~15% improvement:
答案3
得分: 0
感谢所有那些有用的提示和技巧。我遵循了@jsiirola在pyomo方面给出的建议,以及@Oscar Dowson在jump方面给出的建议,并想向所有感兴趣的人提供新的结果更新。
@AirSquid:感谢您指出这一点。供应链模型的数据生成是从IJKLM模型派生出来的,需要进行适应。我将尽快实施一个新的数据生成器。
英文:
Thanks for all those helpful tips and tricks. I followed the advice given by @jsiirola on the pyomo side and @Oscar Dowson on the jump side and want to give an update on the new results for anyone interested.
@AirSquid: Thanks for pointing this out. Data generation for the supply chain model is an artifact from the IJKLM model and needs to be adapted. I will implement a new data generator hopefully soonish.
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论