

Online Learning 4: Upper Confidence Bound (UCB) algorithm

  • 1 Introduction
    • 1.1 Key features
    • 1.2 Anytime concentration
    • 1.3 UCB intuition
  • 2 UCB algorithm
    • ***The Upper Confidence Bound Algorithm p102 (book)***
    • 2.1 Settings
    • 2.2 Algorithm
    • 2.3 Regret
  • 3 KL-UCB Bernoulli
    • 3.1 KL-divergence
    • 3.2 Algorithm
      • KL-UCB algorithm (notebook)
    • 3.3 Regret
      • UCB
      • KL-UCB
      • Comparison

1 Introduction

1.1 Key features

  1. All ETC-like algorithms (ETC, DB, ε-greedy, Elimination) separate exploitation samples from exploration samples. The number of exploration rounds is fixed (or has a fixed expectation).

  2. UCB characteristics:

    • The number of exploration rounds for each arm is dynamic; there is no separation between exploration and exploitation, i.e., all samples are used to improve the estimates.
    • Technical challenge: the empirical estimate of each arm depends on a random number of samples, so the usual sub-Gaussian concentration bounds do not apply directly. We need anytime concentration.

1.2 Anytime concentration

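The original figure is not reproduced here. As a reference point (a standard form, not necessarily the exact inequality from the figure): for i.i.d. 1-sub-Gaussian rewards with empirical mean $\hat\mu_s$ after $s$ samples, a union bound over the sample count gives a guarantee that holds simultaneously for all $s \le n$:

$$\Pr\left(\exists\, s \le n:\ \hat\mu_s - \mu \ge \sqrt{\frac{2\ln(n/\delta)}{s}}\right) \le \delta$$

This is the kind of bound that remains valid when the number of samples of an arm is itself random, which is exactly the situation UCB creates.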

1.3 UCB intuition

UCB relies on the principle of optimism: use the "most reasonable" optimistic estimate of each arm, and optimize your decision at each time step using these optimistic estimates.

Taking an optimistic view of the unknown local cuisine leads to exploration because without data, it could be amazing. After trying the new option a few times, you can update your statistics and make a more informed decision.

On the other hand, taking a pessimistic view of the new option discourages exploration, and you may suffer significant regret if the local options are delicious.

For bandits, the optimism principle means using the data observed so far to assign to each arm a value, called the upper confidence bound, that with high probability is an overestimate of the unknown mean.

The intuitive reason why this leads to sublinear regret is simple.

Assuming the upper confidence bound assigned to the optimal arm is indeed an overestimate, another arm can only be played if its upper confidence bound is larger than that of the optimal arm, which in turn is larger than the mean of the optimal arm. And yet this cannot happen too often, because the additional data provided by playing a suboptimal arm means that its upper confidence bound will eventually fall below that of the optimal arm.
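To make the optimism principle concrete, one standard form of the index (assuming 1-sub-Gaussian rewards and a confidence parameter $\delta$; the constants below are one common choice rather than the only one) is

$$\mathrm{UCB}_i(t) = \hat\mu_i(t-1) + \sqrt{\frac{2\ln\frac{1}{\delta}}{T_i(t-1)}},$$

where $\hat\mu_i(t-1)$ is the empirical mean of arm $i$ and $T_i(t-1)$ is the number of times it has been played before round $t$; the algorithm simply plays the arm with the largest index.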

2 UCB algorithm

The Upper Confidence Bound Algorithm p102 (book)

2.1 Settings

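The original settings figure is not reproduced here. The setting assumed in the rest of this section is the standard stochastic bandit from the book: $k$ arms with unknown means $\mu_1,\dots,\mu_k$ and 1-sub-Gaussian reward noise, optimal mean $\mu^* = \max_i \mu_i$, gaps $\Delta_i = \mu^* - \mu_i$, horizon $n$, and $T_i(n)$ the number of plays of arm $i$ up to round $n$. The regret over the horizon is

$$R_n = n\mu^* - \mathbb{E}\left[\sum_{t=1}^{n} X_t\right].$$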

2.2 Algorithm

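The original algorithm figure is not reproduced here. Below is a minimal Python sketch of a UCB(δ)-style policy using the index from Section 1.3; the Gaussian reward model, the arm means, and the choice $\delta = 1/n^2$ are illustrative assumptions, not taken from the original.

```python
import math
import random


def ucb(means, n, delta=None, sigma=1.0, seed=0):
    """Play a UCB(delta)-style policy for n rounds on a Gaussian bandit.

    The index of an arm is its empirical mean plus sqrt(2 ln(1/delta) / T_i).
    delta defaults to 1/n^2, the usual choice behind the log(n) regret bound.
    """
    rng = random.Random(seed)
    k = len(means)
    if delta is None:
        delta = 1.0 / n**2
    counts = [0] * k      # T_i(t): number of times arm i has been played
    sums = [0.0] * k      # running sum of rewards of arm i
    total_reward = 0.0

    for t in range(n):
        if t < k:
            arm = t       # play every arm once to initialise the estimates
        else:
            bonus = math.sqrt(2.0 * math.log(1.0 / delta))
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i] + bonus / math.sqrt(counts[i]),
            )
        reward = rng.gauss(means[arm], sigma)
        counts[arm] += 1
        sums[arm] += reward
        total_reward += reward

    regret = n * max(means) - total_reward
    return counts, regret


if __name__ == "__main__":
    # illustrative two-armed instance with gap Delta = 0.5
    pulls, regret = ucb([0.5, 0.0], n=10_000)
    print("pulls per arm:", pulls, "empirical regret:", round(regret, 1))
```

On a run like the one above, the suboptimal arm should be pulled only on the order of $\log n / \Delta^2$ times, which is the behaviour the regret analysis below quantifies.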

2.3 Regret

The growth rate is $\log n$, but the bound depends on $\Delta$.

$$\mathbb{E}[T_i(n)] \le \frac{\ln\frac{1}{\delta}}{\Delta_i^2}\bigl[1-(n+1)\delta\bigr] + n(n+1)\delta + 1$$

$$R_n(\pi,\nu) = \sum_{i=1}^{k}\Delta_i\,\mathbb{E}[T_i(n)]$$

The growth rate is $\sqrt{nk\log n}$, but the bound does not depend on $\Delta$.
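To see roughly how the two rates relate (a sketch with constants suppressed, under the usual choice $\delta = 1/n^2$ so that $\ln\frac{1}{\delta} = 2\ln n$): split the arms at a threshold $\Delta$, bound the total contribution of small-gap arms by $n\Delta$ and that of each large-gap arm by $O(\ln n/\Delta_i)$, then

$$R_n \;\lesssim\; n\Delta + \frac{k\ln n}{\Delta} \;\lesssim\; \sqrt{nk\ln n} \qquad\text{for } \Delta = \sqrt{\frac{k\ln n}{n}},$$

which is how the $\Delta$-free rate follows from the $\Delta$-dependent one.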

3 KL-UCB Bernoulli

3.1 KL-divergence

$$\mathrm{KL}(p,q) = d(p,q) = p\ln\frac{p}{q} + (1-p)\ln\frac{1-p}{1-q}$$
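For a sense of scale (an illustrative computation, not from the original): $d(0.5,\,0.6) = 0.5\ln\frac{0.5}{0.6} + 0.5\ln\frac{0.5}{0.4} = 0.5\ln\frac{25}{24} \approx 0.02$, so roughly $\ln(1/\delta)/d(p,q) \approx 50\,\ln(1/\delta)$ samples are needed to distinguish a coin of bias $0.5$ from one of bias $0.6$ at error level $\delta$.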

3.2 Algorithm

KL-UCB algorithm (notebook)
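The notebook itself is not reproduced here. Below is a minimal Python sketch of KL-UCB for Bernoulli rewards: the index of arm $i$ at round $t$ is the largest $q$ with $T_i\, d(\hat\mu_i, q) \le \ln t$, found by bisection (the exploration term $\ln t$ is one common choice; the notebook may use a slightly different one), and the arm means in the demo are illustrative.

```python
import math
import random


def kl_bernoulli(p, q, eps=1e-12):
    """Bernoulli KL divergence d(p, q) = p ln(p/q) + (1-p) ln((1-p)/(1-q))."""
    p = min(max(p, eps), 1.0 - eps)
    q = min(max(q, eps), 1.0 - eps)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def kl_ucb_index(mu_hat, pulls, t, precision=1e-6):
    """Largest q >= mu_hat with pulls * d(mu_hat, q) <= ln(t), by bisection."""
    level = math.log(max(t, 2)) / pulls
    lo, hi = mu_hat, 1.0
    while hi - lo > precision:
        mid = (lo + hi) / 2.0
        if kl_bernoulli(mu_hat, mid) <= level:
            lo = mid
        else:
            hi = mid
    return lo


def kl_ucb(means, n, seed=0):
    """Play KL-UCB for n rounds on a Bernoulli bandit with the given means."""
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k
    total_reward = 0.0

    for t in range(n):
        if t < k:
            arm = t   # initialise: play every arm once
        else:
            arm = max(
                range(k),
                key=lambda i: kl_ucb_index(sums[i] / counts[i], counts[i], t),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total_reward += reward

    return counts, n * max(means) - total_reward


if __name__ == "__main__":
    # illustrative instance with means close to 1, where KL-UCB shines
    print(kl_ucb([0.9, 0.8], n=10_000))
```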

3.3 Regret

UCB

$$\lim_{n\to\infty}\frac{R_n}{\ln n} \le \sum_{i:\Delta_i>0}\frac{1}{2\Delta_i}$$

KL-UCB

$$\lim_{n\to\infty}\frac{R_n}{\ln n} \le \sum_{i:\Delta_i>0}\frac{2\mu_i(1-\mu_i)}{\Delta_i}$$

Comparison

If the coin is exactly fair, so zeros and ones are equally likely, then KL-UCB gives no asymptotic improvement over UCB.

For a biased coin, however, $\mu$ can be much closer to zero or one, in which case $2\mu(1-\mu)$ is much smaller than $1/2$ and KL-UCB gives a large improvement over UCB. This is the appeal of the KL-UCB algorithm.
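A quick numeric check (illustrative values, not from the original): for $\mu_i = 0.5$ the KL-UCB constant is $\frac{2\mu_i(1-\mu_i)}{\Delta_i} = \frac{0.5}{\Delta_i}$, which matches UCB's $\frac{1}{2\Delta_i}$ exactly; for $\mu_i = 0.95$ it is $\frac{2\cdot 0.95\cdot 0.05}{\Delta_i} = \frac{0.095}{\Delta_i}$, roughly a five-fold improvement.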
