Finding the minima of a multivariable function through code

# Question
I have a multivariable function (9 variables), and I want to find where the function records a minimum value. The function is as follows:
It also has a few constraints:
I tried a brute force algorithm, but because of 9 variables, it has a really high time complexity, and will take a long time to run. Is there a much better way to do this?
# Answer 1

**Score**: 1
I'm going to steal code from [a previous answer][1] of mine.
First, a trick from [The Structure and Interpretation of Computer Programs][2] for computing numerical derivatives of functions of one variable.
```python
def derivative(fn, delta=0.001):
    def inner(x):
        return (fn(x + delta/2) - fn(x - delta/2)) / delta
    return inner
```
Here I'm using closures to make the concept straightforward. Pass in a function, get an approximate derivative function. If your language doesn't have closures you can do the same calculations, but may have to write some repetitive code for each function. Either way, we can compute a good approximation to a derivative without having to analytically work anything out.
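For example, a quick sanity check (my example, with the hypothetical name `d_square`):

```python
# Approximate the derivative of x^2, which is exactly 2x.
d_square = derivative(lambda x: x * x)

print(d_square(3))  # ~6.0; central differences are exact on quadratics
```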
Now we have a function `f(v)` where `v` is a vector. We'd like to compute an approximate gradient at a point. Well...
```python
def gradient(fn, point):
    grad = []
    for i in range(len(point)):
        # This will be the function of just varying this dimension.
        def fn_i(x):
            point2 = [_ for _ in point]
            point2[i] += x
            return fn(point2)
        grad.append(derivative(fn_i)(0))
    return grad
```
(You might want to turn this raw array into some sort of vector object. But that is a programming detail.)
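As a quick check (my example, not from the original answer), here it is on a function whose gradient is known analytically:

```python
# f(x, y) = x^2 + 3y^2 has gradient (2x, 6y), so at (1, 2) it is (2, 12).
def f(v):
    return v[0] ** 2 + 3 * v[1] ** 2

print(gradient(f, [1, 2]))  # approximately [2.0, 12.0]
```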
And here is basic gradient descent.
```python
def minimize(fn, start_point, step_size=1, finish_size=0.0000001, max_iter=1000000):
    point = start_point
    n = 0
    while finish_size < step_size and n < max_iter:
        n += 1
        g = gradient(fn, point)
        g2 = sum(x * x for x in g) ** 0.5
        if 0 == g2:
            return point
        else:
            # The direction in which we go.
            # A vector of length 1 pointing AWAY from the gradient.
            vec = [-x/g2 for x in g]
            start = fn(point)
            point2 = [step_size * vec[i] + point[i] for i in range(len(point))]
            end = fn(point2)
            while start < end:
                step_size /= 2
                point2 = [step_size * vec[i] + point[i] for i in range(len(point))]
                end = fn(point2)
            point = point2
    return point
```
Note that I have logic to avoid infinite loops, but it is possible to get stuck bouncing back and forth between points. If that is a concern, it can make sense to make each step have length `step_size * (0.9 + 0.2*random.random())`. Now it can't get stuck.
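Putting the pieces together on a small unconstrained problem (my example; `bowl` is a hypothetical test function):

```python
# Minimize (x - 3)^2 + (y + 1)^2, whose minimum is at (3, -1).
def bowl(v):
    return (v[0] - 3) ** 2 + (v[1] + 1) ** 2

print(minimize(bowl, [0, 0]))  # approximately [3.0, -1.0]
```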
Now how does this help? Well you're trying to minimize `f(v)` subject to a set of `k` constraints that can be written as:
```
constraint[0](v) = 0
constraint[1](v) = 0
...
constraint[k-1](v) = 0
```
where `constraint` is an array of functions, each of which is the difference of two terms that are supposed to be equal.
Our first question is how to solve for the constraints. To do that we minimize:
```python
sum(con(v) * con(v) for con in constraints)
```
When that is 0, all of our constraints are satisfied. And the minimization code above does a pretty good job of solving it.
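For instance (my sketch, not from the original answer), to land on the unit circle x² + y² = 1 while also requiring x = y:

```python
# Two constraints: x^2 + y^2 - 1 = 0 and x - y = 0.
constraints = [
    lambda v: v[0] ** 2 + v[1] ** 2 - 1,
    lambda v: v[0] - v[1],
]

def violation(v):
    return sum(con(v) * con(v) for con in constraints)

# Any point where violation(v) is ~0 satisfies every constraint;
# starting from (1, 0) this converges to roughly (0.707, 0.707).
print(minimize(violation, [1, 0]))
```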
OK, we now need to know how to satisfy our constraints, but how do we do gradient descent on the surface that they define? Here is our strategy.
1. Calculate the gradient of the function.
2. Project it onto the tangent plane of the surface.
3. Loop:
   - Try a step backwards along the tangent gradient (gradient = increasing, so go the other way).
   - Find the surface again.
   - If the value decreased: actually make the step and break out of the loop.
   - Else: cut the step size in half.
Which of these steps do we know how to do?
Well first, we know how to take a gradient - there is code for that above.
Now we know how to find `gradient(constraint[i], v)` for `i in range(k)`. We can use that to populate an array of vectors that are all at right angles to our surface. We'd like to turn that array into an [orthonormal basis][3] that spans the space at right angles to our surface. The idea here is like this untested code (which you'll need to do some work to make actually work).
```python
for i in range(len(vectors)):
    if epsilon < length(vectors[i]):
        vectors[i] = vectors[i] / length(vectors[i])
        for j in range(i + 1, len(vectors)):
            vectors[j] = vectors[j] - dot(vectors[i], vectors[j]) * vectors[i]
vectors = [v for v in vectors if epsilon < length(v)]
```
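Here is a self-contained version of the same idea over plain lists, with small `dot` and `length` helpers (my names, introduced so the sketch actually runs):

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def length(v):
    return dot(v, v) ** 0.5

def orthonormalize(vectors, epsilon=1e-9):
    # Modified Gram-Schmidt: normalize each vector, then remove its
    # component from all later vectors; drop near-zero leftovers.
    vectors = [list(v) for v in vectors]
    for i in range(len(vectors)):
        norm = length(vectors[i])
        if epsilon < norm:
            vectors[i] = [x / norm for x in vectors[i]]
            for j in range(i + 1, len(vectors)):
                proj = dot(vectors[i], vectors[j])
                vectors[j] = [b - proj * a
                              for a, b in zip(vectors[i], vectors[j])]
    return [v for v in vectors if epsilon < length(v)]

# Two constraint normals in 3D come out orthonormal:
basis = orthonormalize([[1, 1, 0], [1, 0, 1]])
print(dot(basis[0], basis[1]))  # ~0.0
```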
Now why did we want to do that? Very simply, because to project the gradient onto the tangent space we can simply remove the part at right angles to the surface, like this:
```python
grad_f = gradient(f, v)
for constraint in vectors:
    grad_f = grad_f - dot(constraint, grad_f) * constraint
```
And now that we have a gradient tangent to the surface, we can take a step along that direction in the opposite direction from the gradient. Because the surface bent and our step didn't, we're no longer on the surface, but we just find the surface again, and we've wound up taking a step along the surface towards a minimum.
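Tying the strategy together, here is a minimal sketch of the whole constrained descent loop (my wiring of the pieces above, using plain lists; `find_surface` just reuses `minimize` on the squared-constraint penalty):

```python
def find_surface(constraints, point):
    # Get back onto the constraint surface by minimizing the penalty.
    return minimize(lambda v: sum(con(v) * con(v) for con in constraints),
                    point)

def constrained_minimize(fn, constraints, start_point,
                         step_size=0.1, iters=200):
    point = find_surface(constraints, start_point)
    for _ in range(iters):
        # Normals to the surface, turned into an orthonormal set.
        normals = orthonormalize([gradient(con, point)
                                  for con in constraints])
        # Project the gradient of fn onto the tangent space.
        g = gradient(fn, point)
        for n_vec in normals:
            proj = dot(n_vec, g)
            g = [gi - proj * ni for gi, ni in zip(g, n_vec)]
        g2 = length(g)
        if g2 < 1e-12:
            break  # no tangent direction left to descend along
        step = step_size
        while 1e-12 < step:
            candidate = [p - step * gi / g2 for p, gi in zip(point, g)]
            candidate = find_surface(constraints, candidate)
            if fn(candidate) < fn(point):
                point = candidate
                break
            step /= 2
    return point
```

For instance, minimizing `lambda v: v[0] + v[1]` subject to just the circle constraint from the earlier sketch should walk along the circle to roughly `(-0.707, -0.707)`.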
In general you just select a bunch of points to start from, and try to minimize. This doesn't guarantee the absolute minimum. But in practice, in high dimensional space it is surprisingly hard to get stuck. So you probably do find a really good minimum. (Machine learning works in part because of this fact.)
Good luck.
[1]: https://stackoverflow.com/questions/75621538/how-to-adjust-groups-of-numbers-to-match-ratios/75623853#75623853
[2]: https://sarabander.github.io/sicp/html/
[3]: https://en.wikipedia.org/wiki/Orthonormal_basis
# Answer 2

**Score**: 1
I tackled your problem with the Excel solver. Before using the solver, it has to be activated as an Excel add-in.

Input table and results
----

(The German decimal separator is a comma, sorry for that!)
The solver was tasked to minimize an objective. The objective is the sum of the value `T` plus the sum of squared errors, where the six errors are the differences between the required and actual values of the constraint expressions. The sum of squared errors is multiplied by 10000 as a penalty factor to enforce small errors.
To abbreviate typing, I introduced names like `_c1 = cos(theta1)` and `_s6 = sin(theta6)` using the Excel name manager.
Note that the constraint expressions can be simplified quite a lot due to recurring subexpressions.
I am not sure if I typed in everything correctly, but the solving only takes a couple of seconds.
Second attempt with C#
----
Then I used the mathnet-numerics library to solve the minimization with the Nelder-Mead method. This is a derivative-free algorithm for multivariate optimization.
**My C# code:**
```csharp
// installed NuGet Package Math.NET Numerics
using System;
using MathNet.Numerics.LinearAlgebra;
using MathNet.Numerics.Optimization;

namespace akNelderMeadDemo
{
    public class SOFunction : IObjectiveFunction
    {
        const double Pi2 = Math.PI * 2;
        const double Grd60 = Pi2 / 6;
        const double Grd45 = Pi2 / 8;

        private int evalCount = 0;

        public SOFunction()
        {
            // we have nine parameters
            // this is also returned in InitialGuess(), see below
            Point = CreateVector.Dense<double>(9);
            // initial values taken from Excel
            Point[0] = 0;
            Point[1] = 0.541824973;
            Point[2] = 1.148728295;
            Point[3] = 0.125762746;
            Point[4] = 1.712425399;
            Point[5] = 0.262748905;
            Point[6] = 0.170976846;
            Point[7] = 0.170178244;
            Point[8] = 0.073184024;
        }

        public Vector<double> Point { get; set; }
        public double Value { get; set; }

        public bool IsGradientSupported => false;
        public Vector<double> Gradient => throw new NotImplementedException();
        public bool IsHessianSupported => false;
        public Matrix<double> Hessian => throw new NotImplementedException();

        public IObjectiveFunction CreateNew() => new SOFunction();

        private static double Cos(double x) => Math.Cos(x);
        private static double Sin(double x) => Math.Sin(x);
        private static void O(string s = "") => Console.WriteLine(s);

        public void EvaluateAt(Vector<double> point)
        {
            const int Penalty = 100000;
            point.CopyTo(Point);
            var t1 = Point[0];
            var t2 = Point[1];
            var t3 = Point[2];
            var t4 = Point[3];
            var t5 = Point[4];
            var t6 = Point[5];
            var l1 = Point[6];
            var l2 = Point[7];
            var l3 = Point[8];
            var c1 = Cos(t1);
            var s1 = Sin(t1);
            var c2 = Cos(t2);
            var s2 = Sin(t2);
            var c3 = Cos(t3);
            var s3 = Sin(t3);
            var c4 = Cos(t4);
            var s4 = Sin(t4);
            var c5 = Cos(t5);
            var s5 = Sin(t5);
            var c6 = Cos(t6);
            var s6 = Sin(t6);
            var T1 = 2 * l1 * (l1 * c1) + 2 * l2 * (l1 * c1 + l2 * c2 / 2) + l3 * (l1 * c1 + l2 * c2 + l3 * Cos(-Grd60) / 2);
            var T2 = 2 * l1 * (l1 * c3) + 2 * l2 * (l1 * c3 + l2 * c4 / 2) + l3 * (l1 * c3 + l2 * c4 + l3 * Cos(0) / 2);
            var T3 = 2 * l1 * (l1 * c5) + 2 * l2 * (l1 * c5 + l2 * c6 / 2) + l3 * (l1 * c5 + l2 * c6 + l3 * Cos(Grd45) / 2);
            var T = Math.Sqrt(T1 * T1 + T2 * T2 + T3 * T3);
            var e1 = (l1 * c1) + (l1 * c1 + l2 * c2 / 2) + (l1 * c1 + l2 * c2 + l3 * Cos(-Grd60) / 2) - 0.75;
            var e2 = (l1 * s1) + (l1 * s1 + l2 * s2 / 2) + (l1 * s1 + l2 * s2 + l3 * Sin(-Grd60) / 2) - 0.1;
            var e3 = (l1 * c3) + (l1 * c3 + l2 * c4 / 2) + (l1 * c3 + l2 * c4 + l3 * Cos(0) / 2) - 0.5;
            var e4 = (l1 * s3) + (l1 * s3 + l2 * s4 / 2) + (l1 * s3 + l2 * s4 + l3 * Sin(0) / 2) - 0.5;
            var e5 = (l1 * c5) + (l1 * c5 + l2 * c6 / 2) + (l1 * c5 + l2 * c6 + l3 * Cos(Grd45) / 2) - 0.2;
            var e6 = (l1 * s5) + (l1 * s5 + l2 * s6 / 2) + (l1 * s5 + l2 * s6 + l3 * Sin(Grd45) / 2) - 0.6;
            var errorSum = e1 * e1 + e2 * e2 + e3 * e3 + e4 * e4 + e5 * e5 + e6 * e6;
            Value = T + errorSum * Penalty;
        }

        public IObjectiveFunction Fork() => new SOFunction() { Point = Point, Value = Value };

        public Vector<double> InitialGuess() => Point;

        internal void ShowResult(MinimizationResult result)
        {
            var p = result.MinimizingPoint;
            O($"theta1 = {p[0]}");
            O($"theta2 = {p[1]}");
            O($"theta3 = {p[2]}");
            O($"theta4 = {p[3]}");
            O($"theta5 = {p[4]}");
            O($"theta6 = {p[5]}");
            O($"l1 = {p[6]}");
            O($"l2 = {p[7]}");
            O($"l3 = {p[8]}");
            O($"Value = {result.FunctionInfoAtMinimum.Value}");
            O($"Iterations = {result.Iterations}");
        }
    }

    internal class Minimizer
    {
        internal static void Minimize()
        {
            var nms = new NelderMeadSimplex(convergenceTolerance: 1e-15,
                                            maximumIterations: 1000000);
            var objectiveFunction = new SOFunction();
            Console.WriteLine("Minimizing ...");
            var initialGuess = objectiveFunction.InitialGuess();
            var result = nms.FindMinimum(objectiveFunction, initialGuess);
            if (result.ReasonForExit == ExitCondition.Converged)
            {
                objectiveFunction.ShowResult(result);
            }
            else
            {
                Console.WriteLine($"{result}");
            }
        }
    }
}
```
The method actually comes up with a solution after fractions of a second. But unfortunately, it does not look identical to what I've found before. The Excel version implicitly assumes all variables to be positive. That might be the difference. Rounding and accuracy thresholds might also be causing this.
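If you would rather cross-check in Python, the same penalty objective can be handed to SciPy's Nelder-Mead implementation (this is my transcription of the C# objective above, so treat it as a sketch):

```python
import numpy as np
from scipy.optimize import minimize

GRD60 = 2 * np.pi / 6   # 60 degrees
GRD45 = 2 * np.pi / 8   # 45 degrees
PENALTY = 100000

def objective(p):
    t1, t2, t3, t4, t5, t6, l1, l2, l3 = p
    c1, s1, c2, s2 = np.cos(t1), np.sin(t1), np.cos(t2), np.sin(t2)
    c3, s3, c4, s4 = np.cos(t3), np.sin(t3), np.cos(t4), np.sin(t4)
    c5, s5, c6, s6 = np.cos(t5), np.sin(t5), np.cos(t6), np.sin(t6)
    # The value T being minimized.
    T1 = 2*l1*(l1*c1) + 2*l2*(l1*c1 + l2*c2/2) + l3*(l1*c1 + l2*c2 + l3*np.cos(-GRD60)/2)
    T2 = 2*l1*(l1*c3) + 2*l2*(l1*c3 + l2*c4/2) + l3*(l1*c3 + l2*c4 + l3*np.cos(0)/2)
    T3 = 2*l1*(l1*c5) + 2*l2*(l1*c5 + l2*c6/2) + l3*(l1*c5 + l2*c6 + l3*np.cos(GRD45)/2)
    T = np.sqrt(T1*T1 + T2*T2 + T3*T3)
    # The six constraint errors, penalized quadratically.
    errors = [
        (l1*c1) + (l1*c1 + l2*c2/2) + (l1*c1 + l2*c2 + l3*np.cos(-GRD60)/2) - 0.75,
        (l1*s1) + (l1*s1 + l2*s2/2) + (l1*s1 + l2*s2 + l3*np.sin(-GRD60)/2) - 0.1,
        (l1*c3) + (l1*c3 + l2*c4/2) + (l1*c3 + l2*c4 + l3*np.cos(0)/2) - 0.5,
        (l1*s3) + (l1*s3 + l2*s4/2) + (l1*s3 + l2*s4 + l3*np.sin(0)/2) - 0.5,
        (l1*c5) + (l1*c5 + l2*c6/2) + (l1*c5 + l2*c6 + l3*np.cos(GRD45)/2) - 0.2,
        (l1*s5) + (l1*s5 + l2*s6/2) + (l1*s5 + l2*s6 + l3*np.sin(GRD45)/2) - 0.6,
    ]
    return T + PENALTY * sum(e * e for e in errors)

# Same starting point as the C# version.
x0 = [0, 0.541824973, 1.148728295, 0.125762746, 1.712425399,
      0.262748905, 0.170976846, 0.170178244, 0.073184024]
result = minimize(objective, x0, method="Nelder-Mead",
                  options={"maxiter": 1_000_000, "xatol": 1e-12, "fatol": 1e-12})
print(result.x, result.fun)
```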