Scipy optimize: limiting number of non-zero variables
Question
My optimization problem has ~200 variables, but I would like to find a solution that only uses 5 of them. That is, 195 of them should be zero and the other 5 can be non-zero.
I have tried the following constraint, but the optimization algorithm seems to ignore it completely: it continues to use all 200 variables whether I include the constraint or not. Is there something I am missing, or can SLSQP just not handle this?
import pandas as pd
import numpy as np
from scipy.optimize import minimize, Bounds

tmp = pd.DataFrame()
tmp.insert(loc=0, column='pred', value=np.random.random(200))
tmp['x0'] = 0

def obj(x, df=tmp):
    return np.sum(-x * df['pred'].values)

def c2(x):
    return -(len(x[x != 0]) - 5)

sol = minimize(fun=obj, x0=tmp['x0'], method='SLSQP', bounds=Bounds(-10, 10), jac=False,
               constraints=({'type': 'ineq', 'fun': c2}),
               options={'maxiter': 1000})
When I run this it just sets everything to 10 and ignores c2.
Answer 1
Score: 3
slsqp (and indeed minimize) are not practical for such a problem, but you can still use scipy. Use a linear program instead, where all variables are binary assignments. Your objective is linear since it's just a dot product.
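A brief illustration of why the original approach fails (an added sketch, not from the original answer): c2 counts non-zero entries, so it is piecewise constant, and the finite-difference gradient that SLSQP builds from it is zero almost everywhere, leaving the solver with no usable signal. The linear-program reformulation below avoids this entirely.

import numpy as np

def c2(x):
    # cardinality constraint from the question: >= 0 only when at most 5 entries are non-zero
    return -(len(x[x != 0]) - 5)

x = np.full(200, 10.0)               # the solution SLSQP returns
step = np.sqrt(np.finfo(float).eps)  # a finite-difference step of the kind SLSQP uses
bumped = x.copy()
bumped[0] += step
print(c2(x), c2(bumped))             # both -195: the constraint looks locally flat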
import numpy as np
from numpy.random import default_rng
from scipy.optimize import linprog
rand = default_rng(seed=0)
pred = rand.random(200)
# We disregard the factor of 10. The maximum solution implies
# that all selected variables will be multiplied by 10.
result = linprog(
    c=-pred,        # maximize dot product of pred with x
    bounds=(0, 1),  # all selection variables binary
    integrality=np.ones_like(pred, dtype=bool),
    # Exactly 5 of the variables need to be used
    A_eq=np.ones((1, len(pred))), b_eq=[5],
)
print(result.message)
assert result.success
idx, = result.x.nonzero()
print('These indices were used:', idx)
print('These values were used:', pred[idx])
Optimization terminated successfully. (HiGHS Status 7: Optimal)
These indices were used: [ 26 77 94 171 194]
These values were used: [0.99720994 0.99509651 0.98119504 0.99491735 0.98194269]
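To connect this back to the question's formulation (a quick sanity check added here, not part of the original answer, reusing pred and idx from the snippet above): since pred is non-negative, every selected variable sits at its upper bound of 10.

x_full = np.zeros_like(pred)  # illustrative name, not from the original answer
x_full[idx] = 10              # selected variables at the upper bound of 10
print('Objective value in the original formulation:', np.sum(-x_full * pred))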
But really, since pred is non-negative, this is much simpler:
import numpy as np
from numpy.random import default_rng
rand = default_rng(seed=0)
pred = rand.random(200)
print('Use these values:', np.sort(pred)[-5:])
Use these values: [0.98119504 0.98194269 0.99491735 0.99509651 0.99720994]
The results are the same.
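If the indices are wanted as well (to compare with the linprog output above), a small variation of the same shortcut, sketched here as an addition rather than taken from the original answer, uses np.argsort:

import numpy as np
from numpy.random import default_rng

rand = default_rng(seed=0)
pred = rand.random(200)

# indices of the five largest entries, sorted to ease comparison with the
# non-zero indices reported by linprog
idx = np.sort(np.argsort(pred)[-5:])
print('Use these indices:', idx)
print('Use these values:', pred[idx])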