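I am 糊涂香氛, a blogger at 靠谱客. I came across this article on Upper-Confidence-Bound (UCB) Action Selection during recent development work, found it quite good, and am sharing it here in the hope that it can serve as a reference.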

Overview

Background

In the ε-greedy method, with probability ε we choose an action at random as exploration. This forces non-greedy actions to be tried, but indiscriminately: there is no preference for actions that are nearly greedy or whose estimates are particularly uncertain.
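For contrast with UCB, here is a minimal sketch of ε-greedy selection. It assumes the value estimates are held in a plain list `q_values`; the function name `epsilon_greedy` is hypothetical. The exploratory branch draws uniformly over all actions and never looks at how close an estimate is to the maximum or how uncertain it is.

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick an action index from the estimates q_values (hypothetical helper).

    With probability epsilon, explore by choosing uniformly among ALL
    actions, regardless of how good or how uncertain each estimate is.
    Otherwise exploit by choosing the greedy action.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    # Greedy choice: index of the largest estimate (ties go to the lowest index).
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

For example, `epsilon_greedy([0.1, 0.5, 0.49], epsilon=0.1)` exploits action 1 most of the time. When it explores, action 0 (estimate 0.1) is just as likely to be picked as the nearly greedy action 2 (estimate 0.49).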

Upper-Confidence-Bound

In order to take into account both how close their estimates are to being maximal and the uncertainties in those estimates, one effective way is to select actions according to:

$$A_t \doteq \underset{a}{\arg\max}\left[Q_t(a) + c\sqrt{\frac{\ln t}{N_t(a)}}\right]$$

  • $N_t(a)$ denotes the number of times that action $a$ has been selected prior to time $t$. If $N_t(a) = 0$, then $a$ is considered to be a maximizing action.
  • $c > 0$ controls the degree of exploration and determines the confidence level.
  • Using the natural logarithm $\ln t$ means that the increases get smaller over time but are unbounded, so all actions will eventually be selected. However, actions with lower value estimates, or that have already been selected frequently, will be selected with decreasing frequency over time.
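The selection rule and the three points above can be sketched in Python. This is a sketch under stated assumptions, not a reference implementation. The names `ucb_select` and `run_ucb` are hypothetical. The default `c=2.0` is an arbitrary choice. The testbed, where each reward is drawn from a normal distribution with mean `true_means[a]` and variance 1, is also an assumption. Estimates are updated as incremental sample averages.

```python
import math
import random

def ucb_select(q_values, counts, t, c=2.0):
    """Return argmax_a [Q_t(a) + c * sqrt(ln t / N_t(a))].

    q_values[a] is Q_t(a), counts[a] is N_t(a), and t >= 1 is the
    current time step. An action with N_t(a) == 0 is treated as
    maximizing, so every untried action is selected before any other.
    """
    for a, n in enumerate(counts):
        if n == 0:
            return a
    def score(a):
        # Value estimate plus the exploration (uncertainty) term.
        return q_values[a] + c * math.sqrt(math.log(t) / counts[a])
    return max(range(len(q_values)), key=score)

def run_ucb(true_means, steps, c=2.0, seed=0):
    """Hypothetical testbed: the reward for action a ~ Normal(true_means[a], 1)."""
    rng = random.Random(seed)
    k = len(true_means)
    q = [0.0] * k      # sample-average estimates Q_t(a)
    n = [0] * k        # selection counts N_t(a), i.e. selections before step t
    for t in range(1, steps + 1):
        a = ucb_select(q, n, t, c)
        reward = rng.gauss(true_means[a], 1.0)
        n[a] += 1
        q[a] += (reward - q[a]) / n[a]   # incremental mean update
    return q, n
```

Running `q, n = run_ucb([0.0, 1.0, 0.2], steps=2000)` tries every action at least once. After that, the action with the highest true mean should account for most of the selections, while the other two keep being tried, though ever less often.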

The idea of UCB action selection is that the square-root term $c\sqrt{\frac{\ln t}{N_t(a)}}$ is a measure of the uncertainty or variance in the estimate of $a$'s value. The quantity being maximized over is therefore a sort of upper bound on the possible true value of action $a$. Each time $a$ is selected, $N_t(a)$ increments, and because it appears in the denominator, the uncertainty term is reduced. Conversely, each time an action other than $a$ is selected, $t$ increases while $N_t(a)$ does not, so the uncertainty estimate for $a$ increases.
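The two effects described above can be checked numerically with the exploration term alone. In this sketch, `ucb_bonus` is a hypothetical helper and `c=2.0` is an arbitrary choice. The starting point (action $a$ tried twice by $t = 10$) is also just an illustration.

```python
import math

def ucb_bonus(t, n, c=2.0):
    """Exploration term c * sqrt(ln t / n) for an action tried n times before step t."""
    return c * math.sqrt(math.log(t) / n)

# Selecting action a advances t AND N_t(a): a's bonus shrinks.
print(ucb_bonus(t=10, n=2), ucb_bonus(t=11, n=3))
# Selecting some other action advances only t: a's bonus grows.
print(ucb_bonus(t=10, n=2), ucb_bonus(t=11, n=2))
```

In the first line the second value is smaller (about 1.79 versus 2.15). In the second line it is larger (about 2.19 versus 2.15). Because of the logarithm, the growth in the second case is slow.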

Pros & Cons

  1. UCB is more difficult than the ε-greedy method to extend beyond bandit problems.
  2. UCB has difficulty dealing with large state spaces and nonstationary problems…

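This content was contributed by users or collected from the web for learning and reference purposes; copyright belongs to the original author.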