Epsilon-Greedy and UCB ("upper confidence bound") for the multi-armed bandit (MAB) problem in reinforcement learning (RL)

Category: Other. Posted 2019-12-08 14:17:31

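You are a team coach. A match comes up unexpectedly, and three players you have never seen before have just been assigned to you. Only one of them can be on the field at a time, and you know nothing about their abilities, so you have to learn as you go. How do you use the limited playing time to work out which player is strongest, and then field that player more often so the team scores more points?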

Epsilon-Greedy

Suppose there are k arms (slots), and set ε to a small number in [0, 0.1].

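In short, epsilon-greedy picks the current best option (the "greedy" choice) with probability 1-ε, and with the small probability ε picks uniformly at random among all k options.

As a result, the greedy option is chosen with total probability (1-ε) + ε/k, and the other options are chosen with total probability (k-1)ε/k, that is, ε/k each.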
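The selection rule above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `epsilon_greedy_select` and `run_bandit`, the 0/1 (Bernoulli) reward model, and the incremental-mean update are all assumptions made for the example, and the success probabilities stand in for the unknown abilities of the three players.

```python
import random

def epsilon_greedy_select(estimates, epsilon, rng):
    """Pick an arm index: explore with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        # Explore: uniform over all k arms (the greedy arm included).
        return rng.randrange(len(estimates))
    # Exploit: arm with the highest estimated value (ties go to the lowest index).
    return max(range(len(estimates)), key=lambda a: estimates[a])

def run_bandit(success_probs, epsilon=0.1, steps=5000, seed=0):
    """Simulate a Bernoulli bandit (hypothetical reward model) under epsilon-greedy."""
    rng = random.Random(seed)
    k = len(success_probs)
    counts = [0] * k        # number of times each arm was played
    estimates = [0.0] * k   # running mean reward of each arm
    total_reward = 0
    for _ in range(steps):
        arm = epsilon_greedy_select(estimates, epsilon, rng)
        reward = 1 if rng.random() < success_probs[arm] else 0
        counts[arm] += 1
        # Incremental mean: new_mean = old_mean + (reward - old_mean) / n
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward
    return counts, estimates, total_reward

if __name__ == "__main__":
    # Three "players" with made-up scoring probabilities.
    counts, estimates, total = run_bandit([0.2, 0.5, 0.8])
    print(counts, [round(q, 2) for q in estimates], total)
```

With ε = 0 the rule never explores, so it can stay stuck on whichever arm looked best first; that is the reason for keeping ε small but positive.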

Despite its simplicity, epsilon-greedy often works as well as, or even better than, more sophisticated algorithms such as UCB.
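For comparison, here is a rough sketch of how UCB chooses an arm. The post does not spell out a formula, so this assumes one common textbook form: play each arm once, then pick the arm maximizing Q(a) + c·sqrt(ln t / N(a)), where Q(a) is the estimated value, N(a) the play count, and t the current step. The function name `ucb_select` and the default `c=2.0` are illustrative choices, not taken from the source.

```python
import math

def ucb_select(counts, estimates, t, c=2.0):
    """Pick an arm by upper confidence bound (assumed form: Q + c*sqrt(ln t / N))."""
    # Any arm that has never been played is tried first.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    # Estimated value plus an exploration bonus that shrinks as an arm is played more.
    scores = [q + c * math.sqrt(math.log(t) / n) for q, n in zip(estimates, counts)]
    return max(range(len(scores)), key=lambda a: scores[a])
```

Unlike epsilon-greedy, this rule has no random exploration step: rarely played arms get a larger bonus, so they are revisited until their estimates are trusted.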

For more information about A/B testing and Thompson sampling, see

https://towardsdatascience.com/solving-multiarmed-bandits-a-comparison-of-epsilon-greedy-and-thompson-sampling-d97167ca9a50

Reposted from www.cnblogs.com/yifan2015/p/12005552.html
