Does Polars have an API for least squares linear regression?


Question

Does the Polars library have an API for least squares linear regression?

I can't find one in the Polars API Reference.

If not, how can I implement an efficient least squares linear regression using only the Polars library?

import polars as pl

data = {
    'x': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
    'y': [22.0, 33.9, 44.8, 78.9, 44.3, 20.5, 30.5, 56.4, 92.3, 22.1, 88, 10.1]
}

df = pl.DataFrame(data)

Answer 1

Score: 4

You can run a least squares regression with a mix of Polars and NumPy.

However, since Polars is not a data science library, it would make more sense to use a library such as sklearn for this.

Here is an example of running a linear regression using Polars and NumPy:

import polars as pl
import numpy as np

# Create a sample dataset
data = {
    'X1': [1, 2, 3, 4, 5],
    'X2': [2, 4, 6, 8, 12],
    'Y':  [2, 4, 5, 4, 5]
}
df = pl.DataFrame(data)

# Separate X and Y (append a column of ones for the intercept)
X = df.select(
    'X1', 'X2',
    ones = pl.lit(1)
)
Y = df['Y']

# Compute the parameters via the normal equation: theta = (X^T X)^(-1) X^T Y
X_transpose = X.transpose()
X_transpose_dot_X = np.dot(X_transpose, X)
X_transpose_dot_X_inv = np.linalg.inv(X_transpose_dot_X)
X_transpose_dot_Y = np.dot(X_transpose, Y)
theta = np.dot(X_transpose_dot_X_inv, X_transpose_dot_Y)

df = df.with_columns(
    Y_pred = pl.lit(np.dot(X, theta))
)

print(df)
print(f"intercept: {theta[-1]}")
print(f"coef_x1: {theta[0]}")
print(f"coef_x2: {theta[1]}")

I hope this helps you run your linear regression analysis.


┌─────┬─────┬─────┬────────┐
│ X1  │ X2  │ Y   │ Y_pred │
│ --- │ --- │ --- │ ---    │
│ i64 │ i64 │ i64 │ f64    │
╞═════╪═════╪═════╪════════╡
│ 1   │ 2   │ 2   │ 2.7    │
│ 2   │ 4   │ 4   │ 3.4    │
│ 3   │ 6   │ 5   │ 4.1    │
│ 4   │ 8   │ 4   │ 4.8    │
│ 5   │ 12  │ 5   │ 5.0    │
└─────┴─────┴─────┴────────┘
intercept: 1.9999999999999947
coef_x1: 1.2000000000000357
coef_x2: -0.25000000000000533

huangapple
  • Published on 2023-05-10 21:37:26
  • Please retain this link when reposting: https://go.coder-hub.com/76219123.html