(vowpal wabbit) contextual bandit dealing with new context

Question


These last few days I've been trying to train a contextual bandit algorithm with Vowpal Wabbit, so I'm building some toy models to help me understand how the algorithm works.

So I imagined a setting with 4 possible actions, and I trained my model on two different contexts. Each context has only one optimal action among the 4 actions.

Here's how I did it:

from vowpalwabbit import pyvw

vw = pyvw.vw("--cb_explore 4 -q UA --epsilon 0.1")
vw.learn('1:-2:0.5 | 5')
vw.learn('3:2:0.5 | 5')
vw.learn('1:2:0.5 | 15')
vw.learn('3:-2:0.5 | 15')
vw.learn('4:2:0.5 | 5')
vw.learn('4:2:0.5 | 15')
vw.learn('2:2:0.5 | 5')
vw.learn('2:2:0.5 | 15')

So in my example, for the context whose feature equals 5 the optimal action is 1 (the only one with a negative cost, i.e. a reward), and for the other one the optimal action is 3.

When I predict on those two contexts there is no problem, since the algorithm has already met each of them and received a reward conditioning its choice.

But when I arrive with a new context, I expect the algorithm to give me the most relevant action, for example by taking into account the similarity of the context features.

So for example, if I give a feature equal to 29, I'm expecting to get action 3, since 29 is nearer to 15 than to 5.
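For reference, this is roughly how I query the model on an unseen context (a sketch; with --cb_explore, predict returns a probability distribution over the 4 actions):

probs = vw.predict('| 29')                  # list of 4 action probabilities
best_action = probs.index(max(probs)) + 1   # VW actions are 1-indexed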

Those are my questions right now.

Thanks!

Answer 1

Score: 2


The problem is in the way you've structured the feature. The input format for a feature is defined as name[:value], and if value is not supplied the default value is 1.0. So what you've supplied is a feature whose name is 5, or 15. Feature names are hashed and used to determine the index of the feature. So in your case feature 5 and feature 15 both have a value of 1.0 and are distinct features with different indices.
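Concretely, here is how those lines parse under the name[:value] rule (a sketch):

| 5                    ->  feature named "5",  value 1.0
| 15                   ->  feature named "15", value 1.0  (a different hash index)
| my_feature_name:5    ->  feature named "my_feature_name", value 5.0
| my_feature_name:15   ->  the same feature, value 15.0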

Therefore, to fix your problem you just need to give your features a name.

vw.learn('1:-2:0.5 | my_feature_name:5')
vw.learn('1:2:0.5 | my_feature_name:15')
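Putting it together, here is a minimal sketch of the corrected toy model (my_feature_name is just an illustrative name, and I've dropped -q UA, which does nothing here, as noted below):

from vowpalwabbit import pyvw

vw = pyvw.vw("--cb_explore 4 --epsilon 0.1")
# Same toy data as in the question, but with a named feature, so 5 and 15
# become two values of one feature rather than two unrelated features.
vw.learn('1:-2:0.5 | my_feature_name:5')
vw.learn('3:2:0.5 | my_feature_name:5')
vw.learn('1:2:0.5 | my_feature_name:15')
vw.learn('3:-2:0.5 | my_feature_name:15')
vw.learn('4:2:0.5 | my_feature_name:5')
vw.learn('4:2:0.5 | my_feature_name:15')
vw.learn('2:2:0.5 | my_feature_name:5')
vw.learn('2:2:0.5 | my_feature_name:15')

# An unseen value such as 29 now reuses the same learned weight,
# so the model can generalize along the feature's value:
probs = vw.predict('| my_feature_name:29')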

You can read more about the input format on the Vowpal Wabbit wiki's Input Format page.

Also, I'd like to point out that -q UA is not doing anything in your example, as you do not have namespaces. Namespaces are specified by placing them right next to the bar. The following example has two namespaces, A and B. (Note: if more than one character is used for a namespace, only the first character is used with -q.)

1:-2:0.5 |A my_feature_name:5 |B yet_another_feature:4

In this case if we supplied -q AB, then VW would create a new feature for each pair of features in A and B at runtime. This allows you to express more complicated interactions in the representation VW learns.
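To make that concrete, here is a sketch using the quadratic interaction (the namespaces A and B and the feature names are just examples):

vw = pyvw.vw("--cb_explore 4 -q AB --epsilon 0.1")
# -q AB crosses every feature in namespace A with every feature in
# namespace B, adding the product feature to each example on the fly.
vw.learn('1:-2:0.5 |A my_feature_name:5 |B yet_another_feature:4')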
