Linear regression library for Go language

Question

I'm looking for a Go library that implements linear regression with MLE or LSE.
Has anyone seen one?

There is this stats library, but it doesn't seem to have what I need:
https://github.com/grd/statistics

Thanks!

Answer 1

Score: 4

Implementing an LSE (least-squares estimation) linear regression is fairly simple.
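
For reference, the closed-form least-squares estimates that the code below computes are, in LaTeX notation (with $n$ points and $\bar{x}$, $\bar{y}$ the sample means):

    m = \frac{n \sum_i x_i y_i - \sum_i x_i \sum_i y_i}{n \sum_i x_i^2 - \left( \sum_i x_i \right)^2},
    \qquad
    b = \bar{y} - m \bar{x}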

Here's an implementation in JavaScript - it should be trivial to port to Go.


Here's an (untested) port:

    package main

    import "fmt"

    // Point is a single (x, y) observation.
    type Point struct {
        X float64
        Y float64
    }

    // linearRegressionLSE fits y = m*x + b by least squares and returns
    // the fitted point (x, m*x+b) for each input point.
    func linearRegressionLSE(series []Point) []Point {
        q := len(series)
        if q == 0 {
            return nil
        }
        n := float64(q)
        sumX, sumY, sumXX, sumXY := 0.0, 0.0, 0.0, 0.0
        for _, p := range series {
            sumX += p.X
            sumY += p.Y
            sumXX += p.X * p.X
            sumXY += p.X * p.Y
        }
        m := (n*sumXY - sumX*sumY) / (n*sumXX - sumX*sumX)
        b := sumY/n - m*sumX/n
        r := make([]Point, q)
        for i, p := range series {
            r[i] = Point{p.X, p.X*m + b}
        }
        return r
    }

    func main() {
        // Example data (not in the original answer): points near y = 2x.
        series := []Point{{1, 2.1}, {2, 3.9}, {3, 6.2}, {4, 7.8}}
        for _, p := range linearRegressionLSE(series) {
            fmt.Printf("x=%.1f  fitted y=%.3f\n", p.X, p.Y)
        }
    }

Answer 2

Score: 4

I have implemented the following using gradient descent. It only gives the coefficients, but it takes any number of explanatory variables and is reasonably accurate:

    package main

    import "fmt"

    // calc_ols_params fits a linear model by batch gradient descent and
    // returns the coefficients. y is the response; x holds one row per
    // explanatory variable (include a row of 1s for the intercept);
    // alpha is the learning rate.
    func calc_ols_params(y []float64, x [][]float64, n_iterations int, alpha float64) []float64 {
        thetas := make([]float64, len(x))
        for i := 0; i < n_iterations; i++ {
            my_diffs := calc_diff(thetas, y, x)
            my_grad := calc_gradient(my_diffs, x)
            for j := 0; j < len(my_grad); j++ {
                thetas[j] += alpha * my_grad[j]
            }
        }
        return thetas
    }

    // calc_diff returns the residuals: each y[i] minus the prediction
    // made with the current thetas.
    func calc_diff(thetas []float64, y []float64, x [][]float64) []float64 {
        diffs := make([]float64, len(y))
        for i := 0; i < len(y); i++ {
            prediction := 0.0
            for j := 0; j < len(thetas); j++ {
                prediction += thetas[j] * x[j][i]
            }
            diffs[i] = y[i] - prediction
        }
        return diffs
    }

    // calc_gradient averages the residual-weighted inputs per variable,
    // giving the descent direction for the squared-error cost.
    func calc_gradient(diffs []float64, x [][]float64) []float64 {
        gradient := make([]float64, len(x))
        for i := 0; i < len(diffs); i++ {
            for j := 0; j < len(x); j++ {
                gradient[j] += diffs[i] * x[j][i]
            }
        }
        for i := 0; i < len(x); i++ {
            gradient[i] = gradient[i] / float64(len(diffs))
        }
        return gradient
    }

    func main() {
        y := []float64{3, 4, 5, 6, 7}
        x := [][]float64{{1, 1, 1, 1, 1}, {4, 3, 2, 1, 3}}
        thetas := calc_ols_params(y, x, 100000, 0.001)
        fmt.Println("Thetas : ", thetas)

        y_2 := []float64{1, 2, 3, 4, 3, 4, 5, 4, 5, 5, 4, 5, 4, 5, 4, 5, 6, 5, 4, 5, 4, 3, 4}
        x_2 := [][]float64{
            {1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1},
            {4, 2, 3, 4, 5, 4, 5, 6, 7, 4, 8, 9, 8, 8, 6, 6, 5, 5, 5, 5, 5, 5, 5},
            {4, 1, 2, 3, 4, 5, 6, 7, 5, 8, 7, 8, 7, 8, 7, 8, 7, 7, 7, 7, 7, 6, 5},
            {4, 1, 2, 5, 6, 7, 8, 9, 7, 8, 7, 8, 7, 7, 7, 7, 7, 7, 6, 6, 4, 4, 4},
        }
        thetas_2 := calc_ols_params(y_2, x_2, 100000, 0.001)
        fmt.Println("Thetas_2 : ", thetas_2)
    }
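
Each pass of the outer loop above performs the standard batch gradient-descent update for the squared-error cost; in LaTeX, with $n$ observations, learning rate $\alpha$, and $\hat{y}_i$ the current prediction,

    \theta_j \leftarrow \theta_j + \alpha \cdot \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right) x_{j,i}

which is exactly what calc_diff and calc_gradient compute together.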

Result:

    Thetas :  [6.999959251448524 -0.769216974483968]
    Thetas_2 :  [1.5694174539341945 -0.06169183063112409 0.2359981255871977 0.2424327101610395]

go playground
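
As a usage note, here is a minimal sketch of how the returned coefficients might be applied to a new observation. The predict helper is my illustration, not part of the original answer; the feature vector must be ordered like the rows of x, starting with the intercept's 1:

    // predict applies fitted coefficients to one observation's feature
    // vector (illustrative helper, not from the original answer).
    func predict(thetas, features []float64) float64 {
        yHat := 0.0
        for j := range thetas {
            yHat += thetas[j] * features[j]
        }
        return yHat
    }

For the first model above, predict(thetas, []float64{1, 2}) estimates y at x = 2.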

I checked my results with Python's pandas, and they were very close:

    In [24]: from pandas.stats.api import ols
    In [27]: x = [
        [4,2,3,4,5,4,5,6,7,4,8,9,8,8,6,6,5,5,5,5,5,5,5],
        [4,1,2,3,4,5,6,7,5,8,7,8,7,8,7,8,7,7,7,7,7,6,5],
        [4,1,2,5,6,7,8,9,7,8,7,8,7,7,7,7,7,7,6,6,4,4,4]
    ]
    In [28]: y = [1,2,3,4,3,4,5,4,5,5,4,5,4,5,4,5,6,5,4,5,4,3,4]
    In [29]: x.append(y)
    In [30]: df = pd.DataFrame(np.array(x).T, columns=['x1','x2','x3','y'])
    In [31]: ols(y=df['y'], x=df[['x1', 'x2', 'x3']])
    Out[31]:
    -------------------------Summary of Regression Analysis-------------------------
    Formula: Y ~ <x1> + <x2> + <x3> + <intercept>
    Number of Observations: 23
    Number of Degrees of Freedom: 4
    R-squared: 0.5348
    Adj R-squared: 0.4614
    Rmse: 0.8254
    F-stat (3, 19): 7.2813, p-value: 0.0019
    Degrees of Freedom: model 3, resid 19
    -----------------------Summary of Estimated Coefficients------------------------
          Variable       Coef    Std Err     t-stat    p-value    CI 2.5%   CI 97.5%
    --------------------------------------------------------------------------------
                x1    -0.0618     0.1446      -0.43     0.6741    -0.3453     0.2217
                x2     0.2360     0.1487       1.59     0.1290    -0.0554     0.5274
                x3     0.2424     0.1394       1.74     0.0983    -0.0309     0.5156
         intercept     1.5704     0.6331       2.48     0.0226     0.3296     2.8113
    ---------------------------------End of Summary---------------------------------

and

    In [34]: df_1 = pd.DataFrame(np.array([[3,4,5,6,7], [4,3,2,1,3]]).T, columns=['y', 'x'])
    In [35]: df_1
    Out[35]:
       y  x
    0  3  4
    1  4  3
    2  5  2
    3  6  1
    4  7  3
    [5 rows x 2 columns]
    In [36]: ols(y=df_1['y'], x=df_1['x'])
    Out[36]:
    -------------------------Summary of Regression Analysis-------------------------
    Formula: Y ~ <x> + <intercept>
    Number of Observations: 5
    Number of Degrees of Freedom: 2
    R-squared: 0.3077
    Adj R-squared: 0.0769
    Rmse: 1.5191
    F-stat (1, 3): 1.3333, p-value: 0.3318
    Degrees of Freedom: model 1, resid 3
    -----------------------Summary of Estimated Coefficients------------------------
          Variable       Coef    Std Err     t-stat    p-value    CI 2.5%   CI 97.5%
    --------------------------------------------------------------------------------
                 x    -0.7692     0.6662      -1.15     0.3318    -2.0749     0.5365
         intercept     7.0000     1.8605       3.76     0.0328     3.3534    10.6466
    ---------------------------------End of Summary---------------------------------

Answer 3

Score: 1

There's a project called gostat with a bayes package that should be able to do linear regression.

Unfortunately the documentation is somewhat lacking, so you'll probably have to read the code to learn how to use it. I dabbled with it a bit myself but haven't touched the bayes package.
