Upper-Confidence-Bound (UCB) Action Selection
Background

In the ε-greedy method, we explore by choosing non-greedy actions at random, but indiscriminately, with no preference for those that are nearly greedy or particularly uncertain.

Upper-Confidence-Bound

In order to take into account both how close