July 27, 2023 21:48:12

How does this code for the BernoulliNB classifier work?

Question

Can someone please explain what this code does? It's from the section on the Bernoulli Naive Bayes classifier in the book "Introduction to Machine Learning with Python":

```python
counts = {}
for label in np.unique(y):
    # iterate over each class
    # count (sum) entries of 1 per feature
    counts[label] = X[y == label].sum(axis=0)
print("Feature counts:\n", counts)
```

I don't understand what happens on line 5 (`counts[label] = X[y == label].sum(axis=0)`).

Answer 1

Score: 0

Let's use an example.

```python
import numpy as np
X = np.array([[1, 10], [2, 20], [3, 30], [4, 40], [5, 50], [6, 60]])
y = np.array([11, 22, 33, 11, 22, 11])
```

`np.unique(y)` is `[11, 22, 33]`.

So `label` takes each of those values in turn.
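A quick check of that step: `np.unique` returns the distinct values of `y` in sorted order, so the loop visits the labels in ascending order no matter where each one first appears. The `shuffled` array below is a made-up example, added only to show the sorting.

```python
import numpy as np

y = np.array([11, 22, 33, 11, 22, 11])
labels = np.unique(y)  # distinct values of y, sorted ascending
print(labels)  # [11 22 33]

# Order of first appearance does not matter: the result is always sorted.
shuffled = np.array([33, 11, 22, 11])  # hypothetical labels, for illustration only
print(np.unique(shuffled))  # [11 22 33]

for label in labels:
    print(label)  # 11, then 22, then 33
```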

When `label` is 11:
`y == label` is `[11, 22, 33, 11, 22, 11] == 11`, which is `[True, False, False, True, False, True]`,
so `X[y == label]` is `X[[True, False, False, True, False, True]]`, i.e. a selection of rows 0, 3 and 5 of `X`: `[[1, 10], [4, 40], [6, 60]]`.
`.sum(axis=0)` sums that along axis 0 (down the rows), so `X[y == label].sum(axis=0)` is `[1+4+6, 10+40+60] = [11, 110]`,
so `counts[11] = [11, 110]`.
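The label-11 step can be reproduced piece by piece. This minimal sketch uses the same example `X` and `y` and only splits the expression into named intermediates; `mask`, `selected` and `col_sums` are illustrative names, not from the book.

```python
import numpy as np

X = np.array([[1, 10], [2, 20], [3, 30], [4, 40], [5, 50], [6, 60]])
y = np.array([11, 22, 33, 11, 22, 11])
label = 11

mask = y == label  # element-wise comparison: one boolean per row of X
print(mask)  # [ True False False  True False  True]

selected = X[mask]  # boolean indexing keeps rows 0, 3 and 5
print(selected)
# [[ 1 10]
#  [ 4 40]
#  [ 6 60]]

col_sums = selected.sum(axis=0)  # collapse the rows: one total per column
print(col_sums)  # [ 11 110]
```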

Likewise, when `label` is 22, `y == label` is `[False, True, False, False, True, False]`, so `X[y == label]` is `[[2, 20], [5, 50]]` and `X[y == label].sum(axis=0)` is `[7, 70]`, which is assigned to `counts[22]`.

And when `label` is 33, `y == label` is just `[False, False, True, False, False, False]`, so `X[y == label]` is `[[3, 30]]` and `X[y == label].sum(axis=0)` is `[3, 30]`, which is assigned to `counts[33]`.
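Running the loop from the question on the same example data should reproduce all three entries worked out above. A sketch, using the example arrays:

```python
import numpy as np

X = np.array([[1, 10], [2, 20], [3, 30], [4, 40], [5, 50], [6, 60]])
y = np.array([11, 22, 33, 11, 22, 11])

counts = {}
for label in np.unique(y):
    # rows of X whose label is `label`, summed column by column
    counts[label] = X[y == label].sum(axis=0)

for label, sums in counts.items():
    print(label, sums)
# 11 [ 11 110]
# 22 [ 7 70]
# 33 [ 3 30]
```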

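So at the end, if `X` holds k samples with m features each, and `y` holds the k class labels of those samples, chosen among n possible classes, then `counts` maps each of the n classes to m sums: for each feature, the sum of that feature's values over the samples belonging to that class. In the Bernoulli Naive Bayes setting the features are 0/1, so each sum is the number of samples of that class in which the feature equals 1, which is what the comment "count (sum) entries of 1 per feature" refers to.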
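With binary features, each column sum counts how many samples of a class have that feature set to 1. The sketch below assumes a small hypothetical 0/1 dataset (k = 4 samples, m = 4 features, n = 2 classes). It also shows an equivalent vectorized formulation, where a one-hot encoding of `y` multiplied by `X` gives the whole n-by-m table at once; that matrix-product version is an illustrative alternative, not the book's code.

```python
import numpy as np

# Hypothetical binary data: 4 samples, 4 features, 2 classes
X = np.array([[0, 1, 0, 1],
              [1, 0, 1, 1],
              [0, 0, 0, 1],
              [1, 0, 1, 0]])
y = np.array([0, 1, 0, 1])

# Loop from the question: with 0/1 features, each sum counts the 1s per feature
counts = {}
for label in np.unique(y):
    counts[label] = X[y == label].sum(axis=0)
print(counts[0])  # [0 1 0 2]
print(counts[1])  # [2 0 2 1]

# Equivalent vectorized form (illustrative alternative, not the book's code)
classes = np.unique(y)
one_hot = (y[:, None] == classes).astype(int)  # shape (k, n): 1 where a sample has that class
table = one_hot.T @ X  # shape (n, m): row c holds the per-feature sums for classes[c]
print(table)
# [[0 1 0 2]
#  [2 0 2 1]]
```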

Answer 2

Score: 0

Based on the short code snippet, it seems like the variable `X` is a data frame (a NumPy array behaves the same way here), and `X[y == label]` filters its rows: `y == label` is a boolean mask with one entry per row of `X`, and only the rows where `y` equals `label` are kept.

Proceeding to `.sum(axis=0)`: it adds up the selected rows, i.e. the rows whose label matches. Since the selection is two-dimensional, `axis=0` means the sum is taken along the first axis, i.e. vertically, giving one total per column (feature).
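To make "the first axis, i.e. vertically" concrete, here is a sketch comparing the axis options on a small 2-D selection (the rows `[[1, 10], [4, 40], [6, 60]]`, chosen for illustration):

```python
import numpy as np

selected = np.array([[1, 10], [4, 40], [6, 60]])  # rows of X for one label

print(selected.sum(axis=0))  # down the rows ("vertically"): one total per column -> [ 11 110]
print(selected.sum(axis=1))  # across each row: one total per sample -> [11 44 66]
print(selected.sum())        # no axis: grand total of every entry -> 121
```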

Finally, on the left-hand side of the equals sign, it stores this sum in the `counts` dictionary under the key `label`.
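Because `np.unique(y)` yields each label once, every key of `counts` is assigned exactly once and nothing accumulates across iterations. The same dictionary can be built with a dict comprehension; this is an equivalent rewrite for illustration, using small example arrays.

```python
import numpy as np

X = np.array([[1, 10], [2, 20], [3, 30], [4, 40], [5, 50], [6, 60]])
y = np.array([11, 22, 33, 11, 22, 11])

# One entry per distinct label; the value is the per-column sum for that label
counts = {label: X[y == label].sum(axis=0) for label in np.unique(y)}

print(len(counts))               # 3: one key per distinct label
print([int(k) for k in counts])  # [11, 22, 33]
print(counts[22].shape)          # (2,): one sum per column of X
```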

