July 3, 2023, 04:05:12

Linear regression predictions are way off; I would like some pointers on what I should change

Question

I am using Python with the sklearn and pandas libraries for an ML project.
The dataset contains 71k games from Steam with their scores and playtime. I took it from Kaggle, where it is called "SteamGames (71k games)" by "MEXWELL", if you want to look at the data yourself.
My goal with this project is to predict the average time played. I have tried different feature combinations for my training set, such as score, price, and recommendations.
My predictions for the playtime are way off, sometimes even going negative.

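Here I create the DataFrame:

import pandas as pd
from sklearn.model_selection import train_test_split

games_data = pd.read_csv("games.csv")
games_data = games_data.dropna(axis=0)  # drop every row that has a missing value
train, test = train_test_split(games_data, test_size=0.2)
games_data.columns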

Here I select the target and the features for the train and test sets:

# Target: the column the model should predict
y_train = train['Average playtime forever']
# Features: the columns used as model inputs
game_features = ['Score rank', 'Recommendations']
X_train = train[game_features]
X_train.describe()
y_test = test['Average playtime forever']
X_test = test[game_features]

Here is the training:

from sklearn import linear_model

games_model = linear_model.LinearRegression()
games_model.fit(X_train, y_train)

And these are the predictions:

print("Making predictions for the following 5 games")
print(X_test.head())
print("The predictions are:\n")
print(games_model.predict(X_test.head()))
print("The actual values are:")
print(y_test.head())

I'm a beginner in ML and just tried making something small, hoping to learn from this project. I would be happy with even some directions or changes to my approach as a whole.

Answer 1

Score: 1

There is a famous saying in the field of machine learning: "A bad model can perform better than a great model if provided with high-quality data."

You have performed neither data cleaning nor data preprocessing. There may be several reasons why your model is not working: there may be outliers, and you have used very few features, so underfitting may be present.
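As a rough illustration of checking for outliers, here is a minimal sketch that flags extreme values with the common 1.5 × IQR rule. The helper name `iqr_outlier_mask` and the synthetic playtime values are assumptions made for this example; only the column name `Average playtime forever` comes from the question, and the real dataset may look quite different.

```python
import numpy as np
import pandas as pd


def iqr_outlier_mask(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Return True for values outside [Q1 - k*IQR, Q3 + k*IQR] (hypothetical helper)."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)


# Synthetic stand-in for the real column: mostly small playtimes
# plus five deliberately huge values appended at the end.
rng = np.random.default_rng(0)
typical = rng.exponential(scale=100, size=995)
extreme = np.array([50_000, 60_000, 75_000, 90_000, 120_000], dtype=float)
playtime = pd.Series(np.concatenate([typical, extreme]), name="Average playtime forever")

mask = iqr_outlier_mask(playtime)

# Summary statistics often reveal a heavy tail: compare the mean, the median (50%) and the max.
print(playtime.describe())
print("flagged as outliers:", int(mask.sum()), "of", len(playtime))
```

Whether flagged rows should be dropped, clipped, or kept is a judgment call that depends on the data; the mask only shows where to look.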

Perform some data cleaning, feature extraction, and data preprocessing, use more features, and try polynomial regression this time, since such complex relationships between many features cannot be represented by simple linear regression.
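To make the polynomial regression suggestion concrete, here is a small sketch that compares plain linear regression with a degree-2 polynomial pipeline in scikit-learn. The data is synthetic and deliberately quadratic, and the degree, the scaling step, and the choice of mean absolute error as the metric are all assumptions for illustration, not settings tuned for the Steam dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic data with a nonlinear relationship (an assumption for this sketch).
rng = np.random.default_rng(42)
X = rng.uniform(-3, 3, size=(500, 2))
y = 3 * X[:, 0] ** 2 + 2 * X[:, 0] * X[:, 1] + rng.normal(0, 0.5, size=500)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Baseline: a straight-line fit on the raw features.
linear = LinearRegression().fit(X_train, y_train)

# Polynomial regression: scale, expand to degree-2 terms, then fit a linear model.
poly = make_pipeline(
    StandardScaler(),
    PolynomialFeatures(degree=2),
    LinearRegression(),
).fit(X_train, y_train)

# Score on the whole test set instead of eyeballing five rows.
mae_linear = mean_absolute_error(y_test, linear.predict(X_test))
mae_poly = mean_absolute_error(y_test, poly.predict(X_test))
print(f"linear MAE:     {mae_linear:.2f}")
print(f"polynomial MAE: {mae_poly:.2f}")
```

On real data the polynomial model is not guaranteed to win, so comparing both on held-out data (or with cross-validation) is the safer way to decide.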

Good luck!
