vowpalwabbit - Vowpal Wabbit: question on training contextual bandit on historical data

Reposted by: 行者123  Updated: 2023-12-03 16:49:02

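I know from this page that there is an option to train a contextual bandit VW model on historical contextual bandit data collected using some exploration policy: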

VW contains a contextual bandit module which allows you to optimize a predictor based on already collected contextual bandit data. In other words, the module does not implement exploration, it assumes it can only use the currently available data logged using an exploration policy.

This is done by specifying --cb and passing data formatted as action:cost:probability | features:

1:2:0.4 | a c  
3:0.5:0.2 | b d  
4:1.2:0.5 | a b c  
2:1:0.3 | b c  
3:1.5:0.7 | a d
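
To make the format concrete, here is a minimal Python sketch that splits one such line into its parts. The helper name parse_cb_line is hypothetical (it is not part of VW), and it only handles the simple, unnamespaced features shown above, not namespaces or explicit feature values.

```python
from typing import List, Tuple


def parse_cb_line(line: str) -> Tuple[int, float, float, List[str]]:
    """Split 'action:cost:probability | features' into its four parts.

    Hypothetical helper for illustration only; assumes exactly one '|'
    separator and plain whitespace-separated feature names.
    """
    label, _, feature_part = line.partition("|")
    action_str, cost_str, prob_str = label.strip().split(":")
    features = feature_part.split()  # also drops trailing spaces
    return int(action_str), float(cost_str), float(prob_str), features


if __name__ == "__main__":
    for raw in ["1:2:0.4 | a c  ", "3:0.5:0.2 | b d  "]:
        print(parse_cb_line(raw))
```

Here the first field is the action that was taken, the second is the cost observed for that action, and the third is the probability with which the logging policy chose it.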

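My question is: is there a way to leverage historical data that was not based on a contextual bandit policy, using --cb (or some other method) and some policy evaluation method? Let's say actions were chosen according to some deterministic, non-exploratory (edit: biased) heuristic. In this case, I would have the action and the cost, but I wouldn't have the probability (or it would be equal to 1).
I've tried a method where I use an exploratory approach and assume that the historical data is fully labelled (assigning a reward of zero to unknown rewards), but the PMF seems to collapse to zero over most actions.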

Best answer

My question is, is there a way to leverage historical data that was not based on a contextual bandit policy using --cb (or some other method) and some policy evaluation method? Let's say actions were chosen according to some deterministic, non-exploratory heuristic? In this case, I would have the action and the cost, but I wouldn't have the probability (or it would be equal to 1).

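Yes, set the probability to 1. With a degenerate logging policy there are no theoretical guarantees, but in practice this can be helpful for initialization. Going forward, you will want some nondeterminism in your logging policy, otherwise you will never be able to improve.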
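
As a rough illustration of this advice, the Python sketch below does two things: it writes historical records from a deterministic heuristic as --cb lines with the probability fixed at 1, and it shows one possible way to add randomness to future logging so that a real probability can be recorded. Both helper names (to_cb_line, epsilon_greedy_log) are hypothetical, actions are assumed to be numbered 1..num_actions, and epsilon-greedy is only an assumed choice of exploration scheme; the answer does not prescribe a specific one.

```python
import random
from typing import List, Tuple


def to_cb_line(action: int, cost: float, features: List[str],
               probability: float = 1.0) -> str:
    """Format one logged record as 'action:cost:probability | features'.

    The default probability of 1.0 matches a deterministic logging policy.
    """
    return f"{action}:{cost:g}:{probability:g} | {' '.join(features)}"


def epsilon_greedy_log(heuristic_action: int, num_actions: int,
                       epsilon: float, rng=random) -> Tuple[int, float]:
    """Choose an action epsilon-greedily around the heuristic's choice.

    Returns the chosen action and the probability with which this scheme
    picks it, which is the value to log alongside the observed cost.
    """
    if rng.random() < epsilon:
        action = rng.randint(1, num_actions)  # uniform over all actions
    else:
        action = heuristic_action
    probability = epsilon / num_actions
    if action == heuristic_action:
        probability += 1.0 - epsilon
    return action, probability


if __name__ == "__main__":
    # Historical, deterministic data: probability is set to 1.
    print(to_cb_line(3, 1.5, ["a", "d"]))
    # Future logging with some exploration (epsilon is an assumed value).
    chosen, prob = epsilon_greedy_log(heuristic_action=2, num_actions=4, epsilon=0.2)
    print(chosen, prob)
```

With epsilon set to 0 this reduces to the deterministic heuristic (every logged probability is 1), which is the degenerate case the answer warns about.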

I've tried a method where I use an exploratory approach and assume that the historical data is fully labelled (assign reward of zero for unknown rewards) but the PMF collapses to zero over most actions.

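If you actually have fully labeled historical data, you can use the warm start functionality. If you are only pretending to have fully labeled data, I'm not sure that is any better than setting the probability to 1.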

Regarding "vowpalwabbit - Vowpal Wabbit: question on training contextual bandit on historical data", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/61670224/
