Overview
Explore/exploit
Computer science about decision-making
This chapter discusses the choice between exploration and exploitation. We often face choices like “have dinner at the new restaurant, or go to my favorite one?” We treat such a question as a simple matter, but data scientists regard the choices we have to make in life as a mathematical riddle: the “multi-armed bandit”. It can be interpreted as choosing the most profitable one-armed bandit (slot machine) in a casino.
Riddle: multi-armed bandit
Actually, whether we choose to explore or to exploit depends heavily on how much time we have left. For instance, on the eve of leaving town, we will choose our favorite café without hesitation, since we have little time left to take risks. Even if we found a fantastic new one, we would have no time to come back and try it again. Therefore, the favorite, already well-known café is the wise choice. However, when we first arrive in a new city, we search for the new things around us, as we have plenty of time to try them.
Example for the riddle
The tradeoff between exploration and exploitation applies to more than cafés. It is a question that deserves consideration throughout our lives. Take the Hollywood films of recent decades as an example: for years the studios have been trying to find the balance between sequels and new series. They clearly take advantage of sequels, which come with a guaranteed fan base. In other words: “they’re pulling the arms of the best machines they’ve got before the casino turns them out” (Page 32, Lines 21~22)
Win-stay
To solve the riddle, “Win-stay” was first put forward by Herbert Robbins as a simple strategy: choose an arm at random and keep pulling it as long as it pays off. Then comes “Lose-shift”: once the arm fails to pay off, shift to another arm immediately. Building on his work, Richard Bellman developed an algorithm that calculates the solution to the problem when the number of choices is known in advance. The multi-armed bandit problem is still not perfectly solved, given the rash moves and unclear future of real life (in other words, we do not know how many opportunities we will have, so the problem actually remains open), but these algorithms do represent real progress.
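The Win-stay, Lose-shift rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each arm is modeled as paying off with a fixed probability, and the function name `win_stay_lose_shift`, the parameter `payoff_probs`, and the choice to shift to a random other arm are hypothetical details not taken from the book.

```python
import random


def win_stay_lose_shift(payoff_probs, pulls, rng):
    """Simulate Win-stay, Lose-shift and return the number of wins.

    payoff_probs: assumed win probability of each arm (hypothetical model).
    pulls: total number of pulls to simulate.
    rng: a random.Random instance, so runs can be reproduced.
    """
    n_arms = len(payoff_probs)
    arm = rng.randrange(n_arms)  # start with a randomly chosen arm
    wins = 0
    for _ in range(pulls):
        if rng.random() < payoff_probs[arm]:
            wins += 1  # win: stay on the same arm
        elif n_arms > 1:
            # lose: shift to a different arm (picked at random here)
            arm = rng.choice([a for a in range(n_arms) if a != arm])
    return wins


if __name__ == "__main__":
    wins = win_stay_lose_shift([0.3, 0.7], pulls=1000, rng=random.Random(0))
    print("wins out of 1000 pulls:", wins)
```

Note that the rule abandons even a very good arm after a single loss, which is one reason it is only a starting point.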
The Gittins index
Figure 1. Left: Gittins index with 90% of the payoff. Right: Gittins index with 90% of the payoff.
The values assume that a payoff next time is worth 90% of a payoff now.
The Gittins index, then, provides a formal, rigorous justification for preferring the unknown and provides a straightforward solution to the riddle. But the index rests on the idealized assumption that people know exactly how much the next payoff is worth. Besides, in daily life it is hard to calculate the index on the fly.
Upper Confidence Bound
Since the Gittins index is complicated to calculate, another way of thinking is to minimize the regret in one’s life. Herbert Robbins discovered three laws of regret:
“
1. First, assuming you’re not omniscient, your total amount of regret will probably never stop increasing, even if you pick the best possible strategy
2. Second, regret will increase at a slower rate if you pick the best strategy than if you pick others
3. [Third, the minimum possible regret is] regret that increases at a logarithmic rate with every pull of the handle
” (Page 38, Para. 3)
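The three laws can be stated a little more formally. The following is a sketch in the usual notation of the bandit literature, not notation from the book: the symbols $T$ (number of pulls), $\mu^{*}$ (expected payoff of the best arm), $\mu_{a_t}$ (expected payoff of the arm chosen at pull $t$) and $R_T$ (regret after $T$ pulls) are all assumptions introduced here.

```latex
% Regret: what always pulling the best arm would have earned in expectation,
% minus what the arms actually chosen earn in expectation.
R_T = T\,\mu^{*} - \sum_{t=1}^{T} \mu_{a_t}

% Laws 1 and 2: R_T keeps growing with T, but more slowly for better strategies.
% Law 3: for the best strategies the growth is logarithmic.
R_T^{\text{best}} = \Theta(\log T)
```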
Following these laws, scientists look for algorithms that minimize regret. The most popular among them are known as Upper Confidence Bound algorithms. Each estimate is assigned a “confidence interval” that reflects its error; as more data is obtained, the interval shrinks, and the algorithm simply picks the option whose interval has the highest top. It always chooses the arm that could reasonably perform best in the future, implementing a principle that has been dubbed “optimism in the face of uncertainty.” (Page 39, Line 8~9)
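The book does not give a formula, so the sketch below swaps in UCB1, one well-known member of the Upper Confidence Bound family, to show the idea of picking the arm whose interval has the highest top. The bonus term `sqrt(2 * ln(t) / n)`, the function name `ucb1`, and the modeling of each arm as paying 1 with a fixed probability are assumptions made for illustration.

```python
import math
import random


def ucb1(payoff_probs, pulls, rng):
    """Run UCB1 on simulated arms and return how often each arm was pulled.

    payoff_probs: assumed win probability of each arm (hypothetical model).
    pulls: total number of pulls; rng: a random.Random instance.
    """
    n_arms = len(payoff_probs)
    counts = [0] * n_arms    # number of pulls per arm
    totals = [0.0] * n_arms  # summed rewards per arm

    def upper_bound(arm, t):
        # observed mean plus a bonus that shrinks as the arm is pulled more
        mean = totals[arm] / counts[arm]
        return mean + math.sqrt(2 * math.log(t) / counts[arm])

    for t in range(1, pulls + 1):
        if t <= n_arms:
            arm = t - 1  # pull every arm once to initialize its estimate
        else:
            arm = max(range(n_arms), key=lambda a: upper_bound(a, t))
        reward = 1.0 if rng.random() < payoff_probs[arm] else 0.0
        counts[arm] += 1
        totals[arm] += reward
    return counts


if __name__ == "__main__":
    print(ucb1([0.1, 0.9], pulls=2000, rng=random.Random(0)))
```

An arm that has been tried only a few times keeps a wide interval, so it still gets occasional pulls; that is the “optimism” at work.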
A/B test
Although the detailed method is not introduced in the book, the A/B test is one of the most important tests for web pages, voting, etc. It works like this:
“A/B testing (bucket tests or split-run testing) is a controlled experiment with two variants, A and B.” (Wikipedia, A/B testing)
In other words, two designs with different characteristics or elements are shown to users in order to compare their behavior. One can be the product currently in service (control); the other is modified to some degree (treatment).
As mentioned in the text, Dan Siroker A/B tested the donate button for Obama’s election campaign, and huge companies such as Google and Amazon famously test their search algorithms and checkout flows. Internet companies routinely test different strategies to get a higher rate of return on their ads.
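One common way to judge whether the treatment really differs from the control is a two-proportion z-test on the conversion counts. Neither the book nor the quoted definition prescribes this test, so treat the sketch below as one possible analysis; the function name `two_proportion_z_test` and the example numbers are hypothetical.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare conversion rates of control (A) and treatment (B).

    conv_a, conv_b: number of users who converted in each group.
    n_a, n_b: number of users shown each variant.
    Returns (z, p_value), where p_value is two-sided.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)  # rate if A and B were identical
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("no variation in outcomes; the test is undefined")
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value


if __name__ == "__main__":
    # made-up numbers: 100 of 1000 convert on A, 150 of 1000 on B
    z, p = two_proportion_z_test(100, 1000, 150, 1000)
    print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests the difference between the two variants is unlikely to be chance alone. Unlike the bandit algorithms above, a fixed split keeps exploring at the same rate for the whole experiment.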
Such testing has even been introduced where human lives are at stake. A clinical trial design introduced by Marvin Zelen was applied to the study of extracorporeal membrane oxygenation (“ECMO”): “ECMO takes blood that’s heading for the lungs and routes it instead out of the body, where it is oxygenated by a machine and returned to the heart” (Page 42, Para. 5)
Newborn babies were split into different groups to receive either the conventional treatment or ECMO, in an attempt to prove the effectiveness of the new method. Although the trial may seem unethical, such trials have indeed advanced the field of medicine and helped “transform it from a field in which doctors had to persuade each other in ad hoc ways about every new treatment into one where they had clear guidelines about what sorts of evidence were and were not persuasive” (Page 44, Para. 1)
Restless reality
Although algorithms are available for making decisions, the world is restless and so is the future, which makes the riddle even harder to solve: the payoffs of the different arms change over time. Also, people tend to explore rather than exploit, even when the algorithm suggests that exploration should stop. All in all, what matters most is to try to figure out how much time you have left, and to seize it.