Python Sklearn multi-linear Regression for probabilities - normalize coefficients to 1

Question

The question is simple. I have three equations of the form OmegaMIX = beta1 * Omega1 + beta2 * Omega2 (one per data row below), and I want to find the best coefficients by linear regression with the intercept fixed at 0. I have working code, but beta1 and beta2 are probabilities, so they must satisfy beta1 + beta2 = 1, which is not the case in the output. How can this constraint be set?
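Restated in optimization form (with the nonnegativity matching positive=True in the code below), the goal is:

  minimize    sum_i ( OmegaMIX_i - beta1 * Omega1_i - beta2 * Omega2_i )^2
  subject to  beta1 + beta2 = 1,  beta1 >= 0,  beta2 >= 0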
This is the existing code:

  import pandas as pd
  from sklearn import linear_model

  inputfilename = 'JO.csv' # input
  df = pd.read_csv(inputfilename)
  x = df.drop('OmegaMIX', axis=1) # Reference temperature
  y = df['OmegaMIX'] # Multiparametric LIRs
  regr = linear_model.LinearRegression(positive=True, fit_intercept=False)
  regr.fit(x, y)
  print('\u03B21 = ', regr.coef_[0])
  print('\u03B22 = ', regr.coef_[1])
  print("Quality = ", regr.score(x, y, sample_weight=None))

and the output is:

  β1 = 0.33995522604783796
  β2 = 0.5794270911721245
  Quality = 0.9995968335914979
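
The fitted coefficients indeed violate the constraint; a quick check, reusing the regr object from the code above:

  print('\u03B21 + \u03B22 = ', regr.coef_.sum())  # ~0.9194, not 1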

The input file is:

  OmegaMIX,Omega1,Omega2
  2.70,4.43,2.09
  1.84,3.00,1.37
  0.50,1.19,0.17

Answer 1

Score: 1

I don't think this can be done in sklearn. I would rather use CVXPY, since it lets you control the constraints.

Here's an example:

  import pandas as pd
  import cvxpy as cp

  df = pd.read_csv('JO.csv')
  # Setting our matrix X and our vector y; cvxpy needs them to be numpy arrays
  X = df.drop('OmegaMIX', axis=1).values
  y = df['OmegaMIX'].values
  # Defining the variables of the problem
  beta1 = cp.Variable()
  beta2 = cp.Variable()
  # Setting the constraints
  constraints = [beta1 >= 0, beta2 >= 0, beta1 + beta2 == 1]
  # Defining the linear regression objective (sum of squared residuals)
  objective = cp.Minimize(cp.sum_squares(X[:, 0] * beta1 + X[:, 1] * beta2 - y))
  # Solving the problem
  problem = cp.Problem(objective, constraints)
  problem.solve()
  # Printing the betas
  print('\u03B21 =', beta1.value)
  print('\u03B22 =', beta2.value)
  print('\u03B21 + \u03B22 =', beta1.value + beta2.value)
  print("Quality =", problem.value)
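
Note that problem.value here is the minimized sum of squared residuals, so it is not directly comparable to sklearn's regr.score. If an R² figure is wanted, it can be computed from the fitted betas; a minimal sketch, assuming the CVXPY code above has already run:

  import numpy as np

  # Predictions of the constrained fit
  y_hat = X[:, 0] * beta1.value + X[:, 1] * beta2.value
  # R² = 1 - SS_res / SS_tot (the same definition sklearn's score uses)
  ss_res = np.sum((y - y_hat) ** 2)
  ss_tot = np.sum((y - y.mean()) ** 2)
  print('R² =', 1 - ss_res / ss_tot)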
