Scipy优化:限制非零变量数量

huangapple go评论111阅读模式
英文:

Scipy optimize: limiting number of non-zero variables

问题

我的优化问题有约200个变量,但我希望找到一个只使用其中5个变量的解决方案。也就是说,195个变量应该为零,而另外5个可以非零。

我尝试了以下约束条件,但似乎优化算法完全忽略它,因为它继续使用所有200个变量,无论我是否包括约束条件。我是否遗漏了什么,或者SLSQP无法处理这个问题?

  1. import pandas as pd
  2. import numpy as np
  3. from scipy.optimize import minimize, Bounds
  4. tmp = pd.DataFrame()
  5. tmp.insert(loc=0, column='pred', value=np.random.random(200))
  6. tmp['x0'] = 0
  7. def obj(x, df=tmp):
  8. return np.sum(-x * df['pred'].values)
  9. def c2(x):
  10. return -(len(x[x!=0])-5)
  11. sol = minimize(fun=obj, x0=tmp['x0'], method='SLSQP', bounds=Bounds(-10, 10), jac=False,
  12. constraints=({'type': 'ineq', 'fun': c2}),
  13. options={'maxiter': 1000})

当我运行这个代码时,它只是将所有变量设置为10,忽略了c2。

英文:

My optimization problem has ~200 variables but I would like to find a solution that only uses 5 of them. As in 195 should be zero and the other 5 can be non zero.

I have tried the following constraint, but it seems that the optimization algorithm completely ignores it, as it continues to use all 200 variables whether I include the constraint or not. Is there something I am missing or can SLSQP just not handle this?

  1. import pandas as pd
  2. import numpy as np
  3. from scipy.optimize import minimize, Bounds
  4. tmp = pd.DataFrame()
  5. tmp.insert(loc=0,column='pred', value = np.random.random(200))
  6. tmp['x0'] = 0
  7. def obj(x, df = tmp):
  8. return np.sum(-x * df['pred'].values)
  9. def c2(x):
  10. return -(len(x[x!=0])-5)
  11. sol = minimize(fun=obj,x0=tmp['x0'],method='SLSQP',bounds=Bounds(-10,10),jac=False,
  12. constraints=({'type': 'ineq', 'fun': c2}),
  13. options={'maxiter': 1000})

When I run this it just sets everything to 10 and ignores c2.

答案1

得分: 3

slsqp(以及minimize)对于这样的问题来说并不实际,但你仍然可以使用scipy。请改用线性规划,其中所有变量都是二进制赋值。由于你的目标是线性的,因为它只是一个点积。

  1. import numpy as np
  2. from numpy.random import default_rng
  3. from scipy.optimize import linprog
  4. rand = default_rng(seed=0)
  5. pred = rand.random(200)
  6. # 我们忽略了因子10。最大解意味着所有选择的变量将乘以10。
  7. result = linprog(
  8. c=-pred, # 最大化pred与x的点积
  9. bounds=(0, 1), # 所有选择变量都是二进制的
  10. integrality=np.ones_like(pred, dtype=bool),
  11. # 需要使用确切的5个变量
  12. A_eq=np.ones((1, len(pred))), b_eq=[5],
  13. )
  14. print(result.message)
  15. assert result.success
  16. idx, = result.x.nonzero()
  17. print('使用了这些索引:', idx)
  18. print('使用了这些值:', pred[idx])
  1. 成功终止优化。 (HiGHS状态 7: 最优)
  2. 使用了这些索引: [ 26 77 94 171 194]
  3. 使用了这些值: [0.99720994 0.99509651 0.98119504 0.99491735 0.98194269]

但实际上,由于pred是非负的,这更简单:

  1. import numpy np
  2. from numpy.random import default_rng
  3. rand = default_rng(seed=0)
  4. pred = rand.random(200)
  5. print('使用这些值:', np.sort(pred)[-5:])
  1. 使用这些值: [0.98119504 0.98194269 0.99491735 0.99509651 0.99720994]

结果是一样的。

英文:

slsqp (and indeed minimize) are not practical for such a problem, but you can still use scipy. Use a linear program instead where all variables are binary assignments. Your objective is linear since it's just a dot product.

  1. import numpy as np
  2. from numpy.random import default_rng
  3. from scipy.optimize import linprog
  4. rand = default_rng(seed=0)
  5. pred = rand.random(200)
  6. # We disregard the factor of 10. The maximum solution implies
  7. # that all selected variables will be multiplied by 10.
  8. result = linprog(
  9. c=-pred, # maximize dot product of pred with x
  10. bounds=(0, 1), # all selection variables binary
  11. integrality=np.ones_like(pred, dtype=bool),
  12. # Exactly 5 of the variables need to be used
  13. A_eq=np.ones((1, len(pred))), b_eq=[5],
  14. )
  15. print(result.message)
  16. assert result.success
  17. idx, = result.x.nonzero()
  18. print('These indices were used:', idx)
  19. print('These values were used:', pred[idx])
  1. Optimization terminated successfully. (HiGHS Status 7: Optimal)
  2. These indices were used: [ 26 77 94 171 194]
  3. These values were used: [0.99720994 0.99509651 0.98119504 0.99491735 0.98194269]

But really, since pred is non-negative, this is much more simple as

  1. import numpy as np
  2. from numpy.random import default_rng
  3. rand = default_rng(seed=0)
  4. pred = rand.random(200)
  5. print('Use these values:', np.sort(pred)[-5:])
  1. Use these values: [0.98119504 0.98194269 0.99491735 0.99509651 0.99720994]

The results are the same.

huangapple
  • 本文由 发表于 2023年6月8日 00:54:10
  • 转载请务必保留本文链接:https://go.coder-hub.com/76425545.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定