Question

Does polars have an API interface for least squares linear regression?

I can't find it in the Polars API Reference.
If not, how can I achieve efficient least squares linear regression using only the Polars library?
import polars as pl

data = {
    'x': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
    'y': [22.0, 33.9, 44.8, 78.9, 44.3, 20.5, 30.5, 56.4, 92.3, 22.1, 88, 10.1]
}
df = pl.DataFrame(data)
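For the single-predictor data above, one possible way to stay entirely within Polars is the closed-form solution slope = cov(x, y) / var(x) and intercept = mean(y) - slope * mean(x). A minimal sketch under that assumption, reusing the df defined above:

import polars as pl

# Closed-form simple linear regression using only Polars expressions.
# Reuses the df defined above; cov() and var() share the same ddof default.
fit = df.select(
    slope=pl.cov('x', 'y') / pl.col('x').var(),
    mean_x=pl.col('x').mean(),
    mean_y=pl.col('y').mean(),
).with_columns(
    intercept=pl.col('mean_y') - pl.col('slope') * pl.col('mean_x'),
)
print(fit.select('slope', 'intercept'))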
Answer 1

Score: 4
You can run a least squares regression with a mix of Polars and Numpy.
However, as Polars is not a data science library, I think it would make sense to use libraries such as sklearn for it.
Here is an example of running a linear regression using Polars and NumPy:
import polars as pl
import numpy as np

# Create a sample dataset
data = {
    'X1': [1, 2, 3, 4, 5],
    'X2': [2, 4, 6, 8, 12],
    'Y': [2, 4, 5, 4, 5]
}
df = pl.DataFrame(data)

# Separate X (with an added intercept column of ones) and Y
X = df.select(
    'X1', 'X2',
    ones=pl.lit(1)
)
Y = df['Y']

# Solve the normal equations: theta = (X^T X)^-1 X^T Y
X_transpose = X.transpose()
X_transpose_dot_X = np.dot(X_transpose, X)
X_transpose_dot_X_inv = np.linalg.inv(X_transpose_dot_X)
X_transpose_dot_Y = np.dot(X_transpose, Y)
theta = np.dot(X_transpose_dot_X_inv, X_transpose_dot_Y)

# Add the fitted values back to the DataFrame
df = df.with_columns(
    Y_pred=pl.lit(np.dot(X, theta))
)

print(df)
print(f"intercept: {theta[-1]}")
print(f"coef_x1: {theta[0]}")
print(f"coef_x2: {theta[1]}")
┌─────┬─────┬─────┬────────┐
│ X1 ┆ X2 ┆ Y ┆ Y_pred │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 ┆ f64 │
╞═════╪═════╪═════╪════════╡
│ 1 ┆ 2 ┆ 2 ┆ 2.7 │
│ 2 ┆ 4 ┆ 4 ┆ 3.4 │
│ 3 ┆ 6 ┆ 5 ┆ 4.1 │
│ 4 ┆ 8 ┆ 4 ┆ 4.8 │
│ 5 ┆ 12 ┆ 5 ┆ 5.0 │
└─────┴─────┴─────┴────────┘
intercept: 1.9999999999999947
coef_x1: 1.2000000000000357
coef_x2: -0.25000000000000533
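As an aside, the same fit can be obtained with np.linalg.lstsq, which avoids explicitly inverting X^T X and tends to be more numerically stable; a minimal sketch on the same sample data, with the same column names as above:

import polars as pl
import numpy as np

df = pl.DataFrame({
    'X1': [1, 2, 3, 4, 5],
    'X2': [2, 4, 6, 8, 12],
    'Y': [2, 4, 5, 4, 5],
})

# Design matrix with an added intercept column, converted to NumPy once.
X = df.select('X1', 'X2', ones=pl.lit(1)).to_numpy()
Y = df['Y'].to_numpy()

# Solve the least squares problem directly instead of forming (X^T X)^-1.
theta, residuals, rank, _ = np.linalg.lstsq(X, Y, rcond=None)

df = df.with_columns(Y_pred=pl.Series(X @ theta))
print(df)
print(f"intercept: {theta[-1]}")
print(f"coef_x1: {theta[0]}")
print(f"coef_x2: {theta[1]}")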