Algorithms to Live By: reading notes


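I am 忧郁小土豆, a blogger on 靠谱客. This article, collected during recent development work, covers my reading notes on Algorithms to Live By. I found it worthwhile and am sharing it here as a reference.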

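Overview

Recently I was invited to read a book:

Algorithms to Live By

The book introduces puzzles we meet in everyday life and the methods data scientists use to solve them.

I am starting this blog to record my thoughts as I read it.

So far I have read up to the second chapter, so I will start from there.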

Explore/exploit

The computer science of decision-making

This chapter discusses the choice between exploration and exploitation. We often face choices such as “have dinner at the new restaurant, or go to my favorite one?” We treat such a question as a simple matter, but data scientists regard the choices we make in life as a mathematical riddle: the “multi-armed bandit”. It can be pictured as choosing the most profitable slot machine (a “one-armed bandit”) among many in a casino.

Riddle: multi-armed bandit

In fact, whether we choose to explore or to exploit depends heavily on how much time we have left. For instance, on the eve of leaving town, we will choose our favorite café without hesitation, since there is little time left to take risks. Even if we found a fantastic new one, we would have no time to come back and enjoy it again. The favorite, already well-known café is therefore the wise choice. When we first arrive in a new city, however, we seek out the new things around us, because we have plenty of time to try them.

Example for the riddle

The tradeoff between exploration and exploitation applies to more than cafés. It is a question worth considering throughout our lives. Take the Hollywood films of recent decades as an example: for years the studios have been trying to balance sequels against new series. They lean on sequels because sequels come with a guaranteed fan base. In other words: “they’re pulling the arms of the best machines they’ve got before the casino turns them out” (Page 32, Line 21~22)

Win-stay

To solve the riddle, Herbert Robbins first put forward “Win-stay” as a simple strategy: choose an arm at random and keep pulling it as long as it pays off. Its counterpart is “Lose-shift”: once the arm fails to pay off, shift to another arm immediately. Building on his work, Richard Bellman developed an algorithm that calculates the solution when the number of choices is known in advance. These approaches are still imperfect: Win-stay, Lose-shift can move rashly, and in reality the future is unclear (in other words, we do not know how many opportunities we will have, so the problem remained unsolved). Still, they were real progress.
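The Win-stay, Lose-shift rule described above can be sketched as a small simulation. This is only an illustrative sketch, not code from the book: the function name `win_stay_lose_shift`, the payout probabilities, and the choice to shift to a uniformly random other arm are all assumptions made for the example.

```python
import random

def win_stay_lose_shift(payout_probs, pulls, seed=0):
    """Simulate Win-stay, Lose-shift on arms that pay out with fixed probabilities.

    Returns (total wins, number of pulls per arm).
    Assumption: on a loss we shift to a different arm chosen uniformly at random.
    """
    n_arms = len(payout_probs)
    if n_arms < 2:
        raise ValueError("need at least two arms to shift between")
    rng = random.Random(seed)
    arm = rng.randrange(n_arms)      # start on a random arm
    wins = 0
    counts = [0] * n_arms
    for _ in range(pulls):
        counts[arm] += 1
        if rng.random() < payout_probs[arm]:
            wins += 1                # win: stay on the same arm
        else:
            # lose: shift immediately to some other arm
            arm = rng.choice([a for a in range(n_arms) if a != arm])
    return wins, counts

# Hypothetical casino with three machines of unknown quality.
wins, counts = win_stay_lose_shift([0.3, 0.5, 0.7], pulls=1000)
print(wins, counts)
```

Because a single loss triggers a shift, the strategy abandons even a very good arm after one unlucky pull, which is the rashness mentioned above.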

The Gittins index

In the 1970s, John Gittins, a young mathematician, solved the riddle almost by coincidence when he was asked to optimize drug trials: to find the compound that is most effective against a disease as quickly as possible. Companies want to invest in discovering new drugs while keeping their profitable current product lines flourishing. To solve the problem, Gittins assumed that the value assigned to payoffs decreases geometrically: that is, each restaurant visit is worth a constant fraction of the last one. He devised the Gittins index using a simplified model framed in terms of a bribe. The index treats each arm as an independent object and yields a table based on how much the next payoff is worth, as shown below:

Figure 1. Left: Gittins index with 90% of the payoff. Right: Gittins index with 90% of the payoff

Given that the next payoff is worth 90% of the current one

The Gittins index, then, provides a formal, rigorous justification for preferring the unknown and a straightforward solution to the riddle. However, the index rests on the idealized assumption that people know exactly how much the next payoff is worth. Besides, in daily life it is hard to calculate the index on the fly.
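The geometric discounting that Gittins assumed can be made concrete with a few lines of code. This sketch does not compute the Gittins index itself; it only shows how a stream of payoffs is valued when each step is worth a constant fraction of the previous one. The function name `discounted_value` and the example payoffs are assumptions for illustration.

```python
def discounted_value(payoffs, discount=0.9):
    """Total value of a payoff stream where step t is weighted by discount**t."""
    return sum(p * discount ** t for t, p in enumerate(payoffs))

# Three visits that each pay 1: 1 + 0.9 + 0.81
print(discounted_value([1, 1, 1], discount=0.9))

# A very long stream of 1s approaches 1 / (1 - 0.9) = 10
print(discounted_value([1] * 1000, discount=0.9))
```

With a 90% discount, even an endless stream of payoffs has a finite total value, which is why a sure payoff now can outweigh a slightly better one later.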

Upper Confidence Bound

Since the Gittins index is complicated to calculate, another line of thinking is to minimize the regret in one’s life. Herbert Robbins discovered three laws of regret:

“

1. First, assuming you’re not omniscient, your total amount of regret will probably never stop increasing, even if you pick the best possible strategy

2. Second, regret will increase at a slower rate if you pick the best strategy than if you pick others

3. regret that increases at a logarithmic rate with every pull of the handle

” (Page 38, Para. 3)

Guided by these laws, scientists look for algorithms that minimize regret. The most popular among them is known as the Upper Confidence Bound. It assigns each option a “confidence interval” around its estimated value; as more data is obtained, the interval shrinks, and the algorithm simply picks the option whose interval has the highest top. It always chooses the arm that could reasonably perform best in the future, implementing a principle that has been dubbed “optimism in the face of uncertainty.” (Page 39, Line 8~9)
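The idea of picking the option whose interval has the highest top can be sketched as follows. The book does not spell out a formula; this example assumes the common UCB1 variant from the bandit literature, where the top of the interval is the observed mean plus sqrt(2 ln t / n). The function name `ucb1` and the payout probabilities are hypothetical.

```python
import math
import random

def ucb1(payout_probs, pulls, seed=0):
    """Run UCB1 on arms with fixed payout probabilities; return pulls per arm.

    Assumption: the upper bound is mean + sqrt(2 * ln(t) / n), the UCB1 rule.
    """
    rng = random.Random(seed)
    n_arms = len(payout_probs)
    counts = [0] * n_arms        # how often each arm was pulled
    totals = [0.0] * n_arms      # total reward collected from each arm

    def upper_bound(a, t):
        mean = totals[a] / counts[a]
        return mean + math.sqrt(2 * math.log(t) / counts[a])

    for t in range(1, pulls + 1):
        if t <= n_arms:
            arm = t - 1          # pull every arm once to get a first estimate
        else:
            # optimism in the face of uncertainty: highest upper bound wins
            arm = max(range(n_arms), key=lambda a: upper_bound(a, t))
        reward = 1.0 if rng.random() < payout_probs[arm] else 0.0
        counts[arm] += 1
        totals[arm] += reward
    return counts

print(ucb1([0.2, 0.5, 0.8], pulls=2000))
```

An arm that has been tried only a few times keeps a wide interval, so it still gets an occasional pull; as its count grows the bonus term shrinks and the observed mean dominates.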

A/B test

Although the book does not describe the method in detail, the A/B test is one of the most important tests for web pages, voting, and so on. It works like this:

“A/B testing (bucket tests or split-run testing) is a controlled experiment with two variants, A and B.” (Wikipedia, A/B testing)

In other words, two designs that differ in some characteristics or elements are shown to users in order to compare their behavior. One is the product currently in service (control); the other is modified to some degree (treatment).

As mentioned in the text, Dan Siroker A/B tested the donate button for Obama’s election campaign, and huge companies such as Google and Amazon famously test their search algorithms and checkout flows. Internet companies routinely test different strategies to achieve higher rates for their ads.
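One common way to compare a control and a treatment is a two-proportion z-test on their conversion rates. Neither the book nor these notes prescribe a specific statistic, so the test chosen here, the function name `two_proportion_z_test`, and the visitor numbers are all assumptions for illustration.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare conversion rates of control (A) and treatment (B).

    Returns (z statistic, two-sided p-value) using a pooled standard error.
    Assumes both groups are large and the pooled rate is strictly between 0 and 1.
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # two-sided p-value under the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical numbers: 100 of 1000 visitors convert on A, 150 of 1000 on B.
z, p = two_proportion_z_test(100, 1000, 150, 1000)
print(z, p)
```

A small p-value suggests that the difference between the two variants is unlikely to be chance alone, though in practice the sample size and the stopping rule should be fixed before the test begins.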

Such testing has even been applied where human lives are at stake. Marvin Zelen’s clinical trial design was applied in the study of extracorporeal membrane oxygenation, “ECMO”: “ECMO takes blood that’s heading for the lungs and routes it instead out of the body, where it is oxygenated by a machine and returned to the heart” (Page 42, Para. 5)

Newborn babies were split into groups that received either the conventional treatment or ECMO, in an attempt to prove the efficacy of the new method. Although the trial may seem unethical, it did advance the field of medicine and “transform it from a field in which doctors had to persuade each other in ad hoc ways about every new treatment into one where they had clear guidelines about what sorts of evidence were and were not persuasive” (Page 44, Para. 1)

Restless reality

Although algorithms exist for making decisions, the world is restless and so is the future, which makes the riddle even harder to solve: the payoffs of the different arms change over time. Also, people tend to keep exploring rather than exploiting, even when the algorithm suggests that exploration should stop. All in all, what matters most is to try to figure out how much time you have left, and to seize it.