How does this code for the BernoulliNB classifier work?

Question

Can someone please explain what this code does? It's from the book "Introduction to Machine Learning with Python", in the part on the Bernoulli Naive Bayes classifier:

counts = {}
for label in np.unique(y):
    # iterate over each class
    # count (sum) entries of 1 per feature
    counts[label] = X[y == label].sum(axis=0)
print("Feature counts:\n", counts)

I don't understand what happens on line 5.

Answer 1

Score: 0

Let's use an example.

import numpy as np
X=np.array([[1,10],[2,20], [3,30], [4,40], [5,50], [6,60]])
y=np.array([11,22,33,11,22,11])

np.unique(y) is [11, 22, 33]

So label will successively take each of those values.

When label is 11:
y == label is [11, 22, 33, 11, 22, 11] == 11, which is [True, False, False, True, False, True].
So X[y == label] is X[[True, False, False, True, False, True]], which selects rows 0, 3 and 5 of X: [[1, 10], [4, 40], [6, 60]].
sum(axis=0) sums that along axis 0 (down each column), so X[y == label].sum(axis=0) is [1+4+6, 10+40+60] = [11, 110].
So counts[11] = [11, 110].
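
The steps for label 11 can be checked directly in NumPy. This is a minimal sketch using the same X and y as above; the intermediate names mask, selected and col_sums are only illustrative and do not appear in the book's code.

```python
import numpy as np

X = np.array([[1, 10], [2, 20], [3, 30], [4, 40], [5, 50], [6, 60]])
y = np.array([11, 22, 33, 11, 22, 11])

label = 11
mask = y == label                 # element-wise comparison gives a boolean array
selected = X[mask]                # boolean indexing keeps the rows where mask is True
col_sums = selected.sum(axis=0)   # sum down the rows: one total per column

print(mask.tolist())      # [True, False, False, True, False, True]
print(selected.tolist())  # [[1, 10], [4, 40], [6, 60]]
print(col_sums.tolist())  # [11, 110]
```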

Likewise, when label is 22, y == label is [False, True, False, False, True, False], so X[y == label] is [[2, 20], [5, 50]] and X[y == label].sum(axis=0) is [7, 70], which is assigned to counts[22].

And when label is 33, y == label is just [False, False, True, False, False, False], so X[y == label] is [[3, 30]] and X[y == label].sum(axis=0) is [3, 30], which is assigned to counts[33].

So in the end, if X holds k rows of data and y holds the k class labels for those rows, chosen among n possible classes, counts contains one entry per class: the column-wise (per-feature) sums of the rows of X that belong to that class.
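
With binary features, which is what Bernoulli Naive Bayes expects, each of those sums is simply the number of 1s per feature within a class. The sketch below runs the loop from the question on a small binary dataset chosen purely for illustration (X_bin and y_bin are assumptions, not data given in the question).

```python
import numpy as np

# Illustrative binary data: 4 samples, 4 features, 2 classes (assumed values)
X_bin = np.array([[0, 1, 0, 1],
                  [1, 0, 1, 1],
                  [0, 0, 0, 1],
                  [1, 0, 1, 0]])
y_bin = np.array([0, 1, 0, 1])

counts = {}
for label in np.unique(y_bin):
    # rows of this class, summed column by column = number of 1s per feature
    counts[label] = X_bin[y_bin == label].sum(axis=0)

print(counts[0].tolist())  # [0, 1, 0, 2]
print(counts[1].tolist())  # [2, 0, 2, 1]
```

Here class 0 consists of rows 0 and 2, so feature 3 is set twice in that class, while class 1 consists of rows 1 and 3.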

Answer 2

Score: 0

Based on the short code snippet, it seems that the variable X is a data frame, and X[y == label] filters it, keeping only the rows where y matches the label.

The .sum() call then takes the sum over those rows. If X is a multi-dimensional array, axis=0 means the sum is taken along the first axis, i.e. vertically (down each column).

Finally, the left-hand side of the equals sign stores this sum in the counts dictionary, with the label as the key.
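
To see what the axis argument changes, here is a small sketch. It assumes X is a NumPy array (the snippet in the question does not show how X was created), and the variable names are only illustrative.

```python
import numpy as np

# The three rows selected for label 11 in the first answer's example
X = np.array([[1, 10], [4, 40], [6, 60]])

total = X.sum()              # no axis: sum of every element
per_column = X.sum(axis=0)   # axis=0: collapse the rows, one sum per column
per_row = X.sum(axis=1)      # axis=1: collapse the columns, one sum per row

print(total)                 # 121
print(per_column.tolist())   # [11, 110]
print(per_row.tolist())      # [11, 44, 66]
```

Only axis=0 gives one value per feature, which is what the loop in the question stores for each class.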

Posted by huangapple on 2023-07-27 21:48:12
Source: https://go.coder-hub.com/76780413.html