April 17, 2023 16:35:40


How should I learn ML libraries like scikit-learn?

Question


I started learning ML a few days ago and have been following a Udemy course. But I don't feel comfortable following the tutorial blindly. For example, in the data preprocessing part, when splitting the dataset into training and test sets, they used a function called train_test_split from the scikit-learn library.

But when I went into the documentation, I couldn't find such a function. Then in the video they said it was in the model_selection module, but I couldn't find the function there either. Maybe I'd find it if I went through every single line of that module, but it's not really feasible to go through all of that. So my question is: is it normal or okay to learn like this, or am I doing something wrong?
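One way to find where a function lives without reading a module line by line is to ask Python itself. The sketch below assumes scikit-learn is installed; it lists the public names in `sklearn.model_selection` that mention "split", prints the (private) submodule the function is defined in, and shows the call signature of the installed version using the standard `inspect` module.

```python
import inspect

import sklearn.model_selection
from sklearn.model_selection import train_test_split

# Public names in sklearn.model_selection whose name mentions "split".
split_names = [name for name in dir(sklearn.model_selection)
               if not name.startswith("_") and "split" in name.lower()]
print(split_names)

# The submodule where the function is actually defined; the public
# sklearn.model_selection package re-exports it.
print(train_test_split.__module__)

# Call signature of the installed version, without opening the docs.
sig = inspect.signature(train_test_split)
print(sig)

# help(train_test_split) prints the full docstring: parameters, return
# value, and usage examples.
```

The output of `split_names` depends on the installed scikit-learn version, which is why no exact list is shown here.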

Answer 1

Score: 1


My suggestion is to use the scikit-learn documentation directly. There you can also find examples in which the related functions are used.

For example, the page for the train_test_split function, found here, explains its signature, arguments and outputs very well.

Function signature:

sklearn.model_selection.train_test_split(*arrays, test_size=None, train_size=None, random_state=None, shuffle=True, stratify=None)
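To make the size-related arguments in this signature concrete, here is a small sketch on a hypothetical 10-sample array (assuming NumPy and scikit-learn are installed). It shows the default `test_size` of 0.25 when neither size is given, a float `test_size` treated as a fraction versus an int treated as a count, `train_size` on its own, and the fact that each input array comes back as a (train, test) pair.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape((10, 2))  # 10 samples, 2 features each
y = np.arange(10)                   # 10 labels
w = np.ones(10)                     # hypothetical per-sample weights, for illustration

# No size given: test_size defaults to 0.25, i.e. ceil(0.25 * 10) = 3 test rows.
X_tr, X_te = train_test_split(X, random_state=0)
print(len(X_tr), len(X_te))  # 7 3

# Float test_size = fraction of the samples; int test_size = absolute count.
_, X_te_half = train_test_split(X, test_size=0.5, random_state=0)
_, X_te_two = train_test_split(X, test_size=2, random_state=0)
print(len(X_te_half), len(X_te_two))  # 5 2

# train_size alone: the test set gets whatever is left over.
X_tr_six, X_te_rest = train_test_split(X, train_size=6, random_state=0)
print(len(X_tr_six), len(X_te_rest))  # 6 4

# Every array passed in comes back as a (train, test) pair, in the same order.
parts = train_test_split(X, y, w, test_size=0.5, random_state=0)
print(len(parts))  # 6
```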

The same page also includes a basic usage example:

import numpy as np
from sklearn.model_selection import train_test_split
# toy dataset: 5 samples with 2 features each, and 5 labels
X, y = np.arange(10).reshape((5, 2)), range(5)
# shuffled split; random_state=42 makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
# without shuffling: passing a single array returns one (train, test) pair
y_train, y_test = train_test_split(y, shuffle=False)
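The `random_state` and `shuffle` arguments used in that example can be checked directly. This sketch (assuming NumPy and scikit-learn) calls the function twice with the same integer seed and confirms the outputs match, then shows that `shuffle=False` simply keeps the original row order, putting the first rows in the training set and the last rows in the test set.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(10).reshape((5, 2))
y = np.arange(5)

# Same integer random_state -> identical split on every call.
first = train_test_split(X, y, test_size=2, random_state=42)
second = train_test_split(X, y, test_size=2, random_state=42)
same = all(np.array_equal(a, b) for a, b in zip(first, second))
print(same)  # True

# shuffle=False: no randomness, rows stay in order (first 3 train, last 2 test).
y_train, y_test = train_test_split(y, test_size=2, shuffle=False)
print(y_train, y_test)  # [0 1 2] [3 4]
```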

At the bottom of the same documentation page you can also find many examples that use train_test_split, usually on toy problems.
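As a slightly less artificial case than the toy arrays, the sketch below splits the Iris dataset that ships with scikit-learn (`load_iris`, no download needed) and uses the `stratify` argument from the signature so that each class keeps the same proportion in the train and test sets. The 80/20 split and the seed are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)  # 150 samples, 4 features, 3 classes of 50

# stratify=y keeps the class proportions equal in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

print(X_train.shape, X_test.shape)  # (120, 4) (30, 4)
print(np.bincount(y_test))          # [10 10 10]
```

Without `stratify`, a random 30-sample test set could easily end up with unequal class counts, which matters more on small or imbalanced datasets.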

