multi-armed bandit

7 results


pages: 523 words: 143,139

Algorithms to Live By: The Computer Science of Human Decisions by Brian Christian, Tom Griffiths

4chan, Ada Lovelace, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, Albert Einstein, algorithmic bias, algorithmic trading, anthropic principle, asset allocation, autonomous vehicles, Bayesian statistics, behavioural economics, Berlin Wall, Big Tech, Bill Duvall, bitcoin, Boeing 747, Charles Babbage, cognitive load, Community Supported Agriculture, complexity theory, constrained optimization, cosmological principle, cryptocurrency, Danny Hillis, data science, David Heinemeier Hansson, David Sedaris, delayed gratification, dematerialisation, diversification, Donald Knuth, Donald Shoup, double helix, Dutch auction, Elon Musk, exponential backoff, fault tolerance, Fellow of the Royal Society, Firefox, first-price auction, Flash crash, Frederick Winslow Taylor, fulfillment center, Garrett Hardin, Geoffrey Hinton, George Akerlof, global supply chain, Google Chrome, heat death of the universe, Henri Poincaré, information retrieval, Internet Archive, Jeff Bezos, Johannes Kepler, John Nash: game theory, John von Neumann, Kickstarter, knapsack problem, Lao Tzu, Leonard Kleinrock, level 1 cache, linear programming, martingale, multi-armed bandit, Nash equilibrium, natural language processing, NP-complete, P = NP, packet switching, Pierre-Simon Laplace, power law, prediction markets, race to the bottom, RAND corporation, RFC: Request For Comment, Robert X Cringely, Sam Altman, scientific management, sealed-bid auction, second-price auction, self-driving car, Silicon Valley, Skype, sorting algorithm, spectrum auction, Stanford marshmallow experiment, Steve Jobs, stochastic process, Thomas Bayes, Thomas Malthus, Tragedy of the Commons, traveling salesman, Turing machine, urban planning, Vickrey auction, Vilfredo Pareto, Walter Mischel, Y Combinator, zero-sum game

He then introduced the mathematical form of the problem to his colleagues, and it ultimately became generalized to the multi-armed bandit. A comprehensive introduction to multi-armed bandits appears in Berry and Fristedt, Bandit Problems. Our focus in this chapter is on bandits where each arm either produces a payoff or doesn’t, with different probabilities but the same payoff amount on all arms. This is known as a Bernoulli bandit in the literature, as the probability distribution that describes a coin flip is called the Bernoulli distribution (after the seventeenth-century Swiss mathematician Jacob Bernoulli). Other kinds of multi-armed bandits are also possible, with unknown distributions of different kinds characterizing the payoffs from each arm.
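A minimal sketch of such a Bernoulli bandit, with payoff probabilities made up purely for illustration: each arm pays 1 with its own hidden probability and 0 otherwise.

```python
import random

class BernoulliBandit:
    """Each arm pays 1 with its own fixed (hidden) probability, else 0."""

    def __init__(self, probs):
        self.probs = probs  # true win probabilities, unknown to the player

    def pull(self, arm):
        # One pull of the chosen arm: payoff (1) or no payoff (0).
        return 1 if random.random() < self.probs[arm] else 0

# Three arms with different hidden payoff probabilities (illustrative values).
bandit = BernoulliBandit([0.3, 0.5, 0.6])
print([bandit.pull(arm) for arm in range(3)])
```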

best algorithms to use remain hotly contested: Chris Stucchio, for instance, penned a cutting article titled “Why Multi-armed Bandit Algorithms Are Superior to A/B Testing,” which was then countered by an equally cutting article called “Don’t Use Bandit Algorithms—They Probably Won’t Work for You”—also written by Chris Stucchio. See https://www.chrisstucchio.com/blog/2012/bandit_algorithms_vs_ab.html and https://www.chrisstucchio.com/blog/2015/dont_use_bandits.html. Stucchio’s 2012 post was written partly in reference to an article by Paras Chopra titled “Why Multi-armed Bandit Algorithm Is Not ‘Better’ than A/B Testing” (https://vwo.com/blog/multi-armed-bandit-algorithm/), which was itself written partly in reference to an article by Steve Hanov titled “20 lines of code that will beat A/B testing every time” (http://stevehanov.ca/blog/index.php?
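Hanov’s post argues for an epsilon-greedy bandit: serve the best-performing variant most of the time and a random variant a small fraction of the time. A rough sketch of that idea (the 10% exploration rate, counts, and names are illustrative, not taken from his code):

```python
import random

def epsilon_greedy_choice(successes, trials, epsilon=0.1):
    """Explore a random variant with probability epsilon; otherwise
    exploit the variant with the best observed success rate."""
    if random.random() < epsilon:
        return random.randrange(len(successes))  # explore
    # Unseen variants get rate 1.0 so each is tried at least once.
    rates = [s / t if t else 1.0 for s, t in zip(successes, trials)]
    return max(range(len(rates)), key=lambda i: rates[i])

# Running totals for two page variants: wins and impressions.
successes, trials = [12, 30], [100, 180]
print(epsilon_greedy_choice(successes, trials))
```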

Gittins (albeit many years before the first episode of Deal or No Deal aired) realized that the multi-armed bandit problem is no different. For every slot machine we know little or nothing about, there is some guaranteed payout rate which, if offered to us in lieu of that machine, will make us quite content never to pull its handle again. This number—which Gittins called the “dynamic allocation index,” and which the world now knows as the Gittins index—suggests an obvious strategy on the casino floor: always play the arm with the highest index.* In fact, the index strategy turned out to be more than a good approximation. It completely solves the multi-armed bandit with geometrically discounted payoffs.
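The strategy itself is easy to state in code once the indices are available: score each arm by its own index (which depends only on that arm’s record and the discount rate) and pull the highest-scoring arm. The sketch below uses a crude optimistic stand-in so it runs; it is not the real Gittins computation, which requires dynamic programming or published tables.

```python
def stand_in_index(wins, losses):
    # NOT the true Gittins index: a crude optimistic stand-in
    # (posterior mean plus a bonus for little-explored arms),
    # included only so the decision rule below is runnable.
    pulls = wins + losses
    return (wins + 1) / (pulls + 2) + 1.0 / (pulls + 1)

def best_arm(records, index=stand_in_index):
    """records: list of (wins, losses) per arm.
    The index strategy: always play the arm with the highest index."""
    return max(range(len(records)), key=lambda i: index(*records[i]))

# Three arms with different win/loss histories (illustrative values).
print(best_arm([(3, 5), (1, 0), (10, 10)]))
```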


pages: 543 words: 153,550

Model Thinker: What You Need to Know to Make Data Work for You by Scott E. Page

Airbnb, Albert Einstein, Alfred Russel Wallace, algorithmic trading, Alvin Roth, assortative mating, behavioural economics, Bernie Madoff, bitcoin, Black Swan, blockchain, business cycle, Capital in the Twenty-First Century by Thomas Piketty, Checklist Manifesto, computer age, corporate governance, correlation does not imply causation, cuban missile crisis, data science, deep learning, deliberate practice, discrete time, distributed ledger, Easter island, en.wikipedia.org, Estimating the Reproducibility of Psychological Science, Everything should be made as simple as possible, experimental economics, first-price auction, Flash crash, Ford Model T, Geoffrey West, Santa Fe Institute, germ theory of disease, Gini coefficient, Higgs boson, High speed trading, impulse control, income inequality, Isaac Newton, John von Neumann, Kenneth Rogoff, knowledge economy, knowledge worker, Long Term Capital Management, loss aversion, low skilled workers, Mark Zuckerberg, market design, meta-analysis, money market fund, multi-armed bandit, Nash equilibrium, natural language processing, Network effects, opioid epidemic / opioid crisis, p-value, Pareto efficiency, pattern recognition, Paul Erdős, Paul Samuelson, phenotype, Phillips curve, power law, pre–internet, prisoner's dilemma, race to the bottom, random walk, randomized controlled trial, Richard Feynman, Richard Thaler, Robert Solow, school choice, scientific management, sealed-bid auction, second-price auction, selection bias, six sigma, social graph, spectrum auction, statistical model, Stephen Hawking, Supply of New York City Cabdrivers, systems thinking, tacit knowledge, The Bell Curve by Richard Herrnstein and Charles Murray, The Great Moderation, the long tail, The Rise and Fall of American Growth, the rule of 72, the scientific method, The Spirit Level, the strength of weak ties, The Wisdom of Crowds, Thomas Malthus, Thorstein Veblen, Tragedy of the Commons, urban sprawl, value at risk, web application, winner-take-all economy, zero-sum game

To make sense of how opioids received approval and how the epidemic arose, we apply four models to generate some core intuitions as to how the crisis came to be. The first model, the multi-armed bandit model, explains why opioids were approved for use. When seeking drug approval, a pharmaceutical company runs clinical trials to demonstrate drug efficacy and a lack of deleterious side effects. We can model a clinical trial as a multi-armed bandit problem where one arm corresponds to prescribing the new drug and the other arm corresponds to a placebo or the existing drug.

A Model of Opioid Approval

Multi-Armed Bandit Model

To demonstrate their efficacy, opioids were tested against placebos. In clinical trials, patients were randomly assigned to take either opioids or a placebo.
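A toy sketch of that framing, with relief probabilities invented purely for illustration: each patient is one “pull,” one arm is the new drug, the other is the placebo, and each pull either produces relief (1) or not (0).

```python
import random

def run_trial(p_drug, p_placebo, n_patients=500):
    """Simulate a two-armed trial with equal random assignment and
    return the observed relief rate for each arm."""
    results = {"drug": [], "placebo": []}
    for _ in range(n_patients):
        arm = random.choice(["drug", "placebo"])              # random assignment
        p = p_drug if arm == "drug" else p_placebo
        results[arm].append(1 if random.random() < p else 0)  # relief or not
    return {arm: sum(obs) / len(obs) for arm, obs in results.items() if obs}

# Hypothetical relief probabilities, not clinical data.
print(run_trial(p_drug=0.6, p_placebo=0.3))
```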

According to the first model, charismatic CEOs who can convince well-connected employees can introduce new strategies that trump culture. According to the second model, culture trumps weak incentives but not strong ones.

27. Multi-Armed Bandit Problems

There’s one thing I’m really good at, and that’s hitting the ball over a net, in a box. I’m excellent. —Serena Williams

In this chapter, we add uncertainty to the problem of learning the best alternative to create a class of models known as multi-armed bandit problems. In bandit problems, rewards from alternatives are distributions rather than fixed amounts. Bandit problems apply to a wide variety of real-world situations.

Bayesian Multi-Armed Bandit Problems

In a Bayesian bandit problem, the decision-maker has prior beliefs over the reward distributions of the alternatives. Given these prior beliefs, a decision-maker can quantify the trade-off between exploration and exploitation and (in theory) make optimal decisions in each period. However, except for the simplest of bandit problems, determining the optimal action requires rather tedious calculations. In real-world applications, these exact calculations may be impractical, obliging decision-makers to rely on approximations.

Bayesian Multi-Armed Bandit Problems

A collection of alternatives {A, B, C, D, …, N} has associated reward distributions {f(A), f(B), f(C), f(D), …, f(N)}.
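One widely used approximation in practice (not necessarily the one Page has in mind) is Thompson sampling for Bernoulli rewards: keep a Beta posterior on each alternative’s success probability, draw one sample per alternative, and choose the alternative with the highest draw. A minimal sketch with illustrative counts:

```python
import random

def thompson_sample(wins, losses):
    """Draw from each alternative's Beta(wins+1, losses+1) posterior and
    pick the alternative with the largest sampled success probability."""
    draws = [random.betavariate(w + 1, l + 1) for w, l in zip(wins, losses)]
    return max(range(len(draws)), key=lambda i: draws[i])

# Running win/loss counts for three alternatives (illustrative values).
wins, losses = [10, 4, 0], [20, 6, 0]
print(thompson_sample(wins, losses))
```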


pages: 406 words: 109,794

Range: Why Generalists Triumph in a Specialized World by David Epstein

Airbnb, Albert Einstein, Apollo 11, Apple's 1984 Super Bowl advert, Atul Gawande, Checklist Manifesto, Claude Shannon: information theory, Clayton Christensen, clockwork universe, cognitive bias, correlation does not imply causation, Daniel Kahneman / Amos Tversky, deep learning, deliberate practice, Exxon Valdez, fail fast, Flynn Effect, Freestyle chess, functional fixedness, game design, Gene Kranz, Isaac Newton, Johannes Kepler, knowledge economy, language acquisition, lateral thinking, longitudinal study, Louis Pasteur, Mark Zuckerberg, medical residency, messenger bag, meta-analysis, Mikhail Gorbachev, multi-armed bandit, Nelson Mandela, Netflix Prize, pattern recognition, Paul Graham, precision agriculture, prediction markets, premature optimization, pre–internet, random walk, randomized controlled trial, retrograde motion, Richard Feynman, Richard Feynman: Challenger O-ring, Silicon Valley, Silicon Valley billionaire, Stanford marshmallow experiment, Steve Jobs, Steve Wozniak, Steven Pinker, sunk-cost fallacy, systems thinking, Walter Mischel, Watson beat the top human players on Jeopardy!, Y Combinator, young professional

That could be a problem of grit, or it could be a decision made in response to match quality information that could not have been gleaned without giving it a try. Carnegie Mellon economics and statistics professor Robert A. Miller modeled career matching—and a decision to attend a military academy is a major career choice—as a “multi-armed bandit process.” “One-armed bandit” is slang for a slot machine. A multi-armed bandit process refers to a hypothetical scenario: a single gambler is sitting in front of an entire row of slot machines; each machine has its own unique probability of reward with each pull; the gambler’s challenge is to test the different machines and try to figure out the best way to allocate their lever pulls to maximize rewards.
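A toy version of the gambler’s allocation problem, using a simple explore-then-commit rule (try every machine a few times, then put the remaining pulls on the best-looking one); the probabilities and pull counts are invented for illustration, and this is a generic sketch rather than Miller’s model itself.

```python
import random

def explore_then_commit(probs, explore_pulls=20, total_pulls=1000):
    """Pull every machine explore_pulls times, then spend all remaining
    pulls on the machine with the best observed payout rate."""
    n = len(probs)
    wins = [sum(random.random() < p for _ in range(explore_pulls)) for p in probs]
    best = max(range(n), key=lambda i: wins[i])
    remaining = total_pulls - n * explore_pulls
    total_payoff = sum(wins) + sum(random.random() < probs[best] for _ in range(remaining))
    return best, total_payoff

print(explore_then_commit([0.2, 0.45, 0.5]))
```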

Persevering through difficulty is a competitive advantage for any traveler of a long road, but he suggested that knowing when to quit is such a big strategic advantage that every single person, before undertaking an endeavor, should enumerate conditions under which they should quit. The important trick, he said, is staying attuned to whether switching is simply a failure of perseverance, or astute recognition that better matches are available. Beast Barracks is perfect for a multi-armed bandit approach to quitting. A group of high achievers, not one of whom has an iota of military experience, pulls the West Point “lever,” so to speak. That is, they begin a high-risk, high-reward program and from week one get a massive information signal about whether military discipline is for them.

The Grit Scale statement “I have been obsessed with a certain idea or project for a short time but later lost interest” is Van Gogh in a nutshell, at least up until the final few years of his life when he settled on his unique style and creatively erupted. Van Gogh was an example of match quality optimization, Robert Miller’s multi-armed bandit process come to life. He tested options with maniacal intensity and got the maximum information signal about his fit as quickly as possible, and then moved to something else and repeated, until he had zigzagged his way to a place no one else had ever been, and where he alone excelled. Van Gogh’s Grit Scale score, according to Naifeh’s assessment, was flush with hard work but low on sticking with every goal or project.


pages: 625 words: 167,349

The Alignment Problem: Machine Learning and Human Values by Brian Christian

Albert Einstein, algorithmic bias, Alignment Problem, AlphaGo, Amazon Mechanical Turk, artificial general intelligence, augmented reality, autonomous vehicles, backpropagation, butterfly effect, Cambridge Analytica, Cass Sunstein, Claude Shannon: information theory, computer vision, Computing Machinery and Intelligence, data science, deep learning, DeepMind, Donald Knuth, Douglas Hofstadter, effective altruism, Elaine Herzberg, Elon Musk, Frances Oldham Kelsey, game design, gamification, Geoffrey Hinton, Goodhart's law, Google Chrome, Google Glasses, Google X / Alphabet X, Gödel, Escher, Bach, Hans Moravec, hedonic treadmill, ImageNet competition, industrial robot, Internet Archive, John von Neumann, Joi Ito, Kenneth Arrow, language acquisition, longitudinal study, machine translation, mandatory minimum, mass incarceration, multi-armed bandit, natural language processing, Nick Bostrom, Norbert Wiener, Northpointe / Correctional Offender Management Profiling for Alternative Sanctions, OpenAI, Panopticon Jeremy Bentham, pattern recognition, Peter Singer: altruism, Peter Thiel, precautionary principle, premature optimization, RAND corporation, recommendation engine, Richard Feynman, Rodney Brooks, Saturday Night Live, selection bias, self-driving car, seminal paper, side project, Silicon Valley, Skinner box, sparse data, speech recognition, Stanislav Petrov, statistical model, Steve Jobs, strong AI, the map is not the territory, theory of mind, Tim Cook: Apple, W. E. B. Du Bois, Wayback Machine, zero-sum game

As Stuart Russell put it in his original 1998 paper, “Can we determine the reward function by observation during rather than after learning?” Russell, “Learning Agents for Uncertain Environments (Extended Abstract).” 31. This remains very much an open research question. For recent work, see Chan et al., “The Assistive Multi-Armed Bandit.” 32. Berkeley’s Smitha Milli and Anca Drăgan have explored this question: see Milli and Drăgan, “Literal or Pedagogic Human?” 33. Stefano Ermon, interview by Ariel Conn, Future of Life Institute, January 26, 2017, https://futureoflife.org/2017/01/26/stefano-ermon-interview/. 34. Roman Yampolskiy, interview by Ariel Conn, Future of Life Institute, January 18, 2017, https://futureoflife.org/2017/01/18/roman-yampolskiy-interview/. 35.

In Machine Learning: Proceedings of the Tenth International Conference, 41–48, 1993. Caswell, Estelle. “Color Film Was Built for White People. Here’s What It Did to Dark Skin.” Vox, September 18, 2015. https://www.vox.com/2015/9/18/9348821/photography-race-bias. Chan, Lawrence, Dylan Hadfield-Menell, Siddhartha Srinivasa, and Anca Drăgan. “The Assistive Multi-Armed Bandit.” In 2019 14th ACM/IEEE International Conference on Human-Robot Interaction (HRI), 354–63. IEEE, 2019. “A Chance to Fix Parole in New York.” New York Times, September 4, 2015. Chen, Dawn, Joshua C. Peterson, and Thomas L. Griffiths. “Evaluating Vector-Space Models of Analogy.” In Proceedings of the 39th Annual Conference of the Cognitive Science Society, 2017.


pages: 344 words: 96,020

Hacking Growth: How Today's Fastest-Growing Companies Drive Breakout Success by Sean Ellis, Morgan Brown

Airbnb, Amazon Web Services, barriers to entry, behavioural economics, Ben Horowitz, bounce rate, business intelligence, business process, content marketing, correlation does not imply causation, crowdsourcing, dark pattern, data science, DevOps, disruptive innovation, Elon Musk, game design, gamification, Google Glasses, growth hacking, Internet of things, inventory management, iterative process, Jeff Bezos, Khan Academy, Kickstarter, Lean Startup, Lyft, Mark Zuckerberg, market design, minimum viable product, multi-armed bandit, Network effects, Paul Graham, Peter Thiel, Ponzi scheme, recommendation engine, ride hailing / ride sharing, Salesforce, Sheryl Sandberg, side project, Silicon Valley, Silicon Valley startup, Skype, Snapchat, software as a service, Steve Jobs, Steve Jurvetson, subscription business, TED Talk, Travis Kalanick, Uber and Lyft, Uber for X, uber lyft, working poor, Y Combinator, young professional

Recall from the last chapter, for example, how the engineers on the Pinterest engagement growth team built the Copytune machine learning program in order to supercharge speed of experimentation with copy in 30 languages in numerous emails used to retain existing Pinterest users. This is an example of what’s called a multivariate test, which goes beyond comparing two alternatives to comparing every possible combination of each element of a message to find the highest-performing permutation. Or, take what’s called multi-armed bandit, a more sophisticated testing scheme used by companies to find winning results faster. We’ll introduce more types of tests that can be run and explain them in more detail later in the book. EXPERIMENTS WITHIN THE PRODUCT The more complicated tests that require significant engineering time are usually those that are changes to the products themselves.
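A small sketch of why “every possible combination” gets out of hand quickly, which is part of the appeal of bandit-style allocation over pairwise A/B tests; the message elements below are invented for illustration, not Pinterest’s Copytune internals.

```python
from itertools import product

# Hypothetical message elements; a multivariate test crosses every combination.
subject_lines = ["You have new Pins", "Ideas picked for you"]
greetings = ["Hi", "Hello again"]
buttons = ["See more", "Open Pinterest"]

variants = list(product(subject_lines, greetings, buttons))
print(len(variants))  # 2 x 2 x 2 = 8 permutations to test
for variant in variants[:3]:
    print(variant)
```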


pages: 416 words: 112,268

Human Compatible: Artificial Intelligence and the Problem of Control by Stuart Russell

3D printing, Ada Lovelace, AI winter, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, Alfred Russel Wallace, algorithmic bias, AlphaGo, Andrew Wiles, artificial general intelligence, Asilomar, Asilomar Conference on Recombinant DNA, augmented reality, autonomous vehicles, basic income, behavioural economics, Bletchley Park, blockchain, Boston Dynamics, brain emulation, Cass Sunstein, Charles Babbage, Claude Shannon: information theory, complexity theory, computer vision, Computing Machinery and Intelligence, connected car, CRISPR, crowdsourcing, Daniel Kahneman / Amos Tversky, data science, deep learning, deepfake, DeepMind, delayed gratification, Demis Hassabis, Elon Musk, en.wikipedia.org, Erik Brynjolfsson, Ernest Rutherford, fake news, Flash crash, full employment, future of work, Garrett Hardin, Geoffrey Hinton, Gerolamo Cardano, Goodhart's law, Hans Moravec, ImageNet competition, Intergovernmental Panel on Climate Change (IPCC), Internet of things, invention of the wheel, job automation, John Maynard Keynes: Economic Possibilities for our Grandchildren, John Maynard Keynes: technological unemployment, John Nash: game theory, John von Neumann, Kenneth Arrow, Kevin Kelly, Law of Accelerating Returns, luminiferous ether, machine readable, machine translation, Mark Zuckerberg, multi-armed bandit, Nash equilibrium, Nick Bostrom, Norbert Wiener, NP-complete, OpenAI, openstreetmap, P = NP, paperclip maximiser, Pareto efficiency, Paul Samuelson, Pierre-Simon Laplace, positional goods, probability theory / Blaise Pascal / Pierre de Fermat, profit maximization, RAND corporation, random walk, Ray Kurzweil, Recombinant DNA, recommendation engine, RFID, Richard Thaler, ride hailing / ride sharing, Robert Shiller, robotic process automation, Rodney Brooks, Second Machine Age, self-driving car, Shoshana Zuboff, Silicon Valley, smart cities, smart contracts, social intelligence, speech recognition, Stephen Hawking, Steven Pinker, superintelligent machines, surveillance capitalism, Thales of Miletus, The Future of Employment, The Theory of the Leisure Class by Thorstein Veblen, Thomas Bayes, Thorstein Veblen, Tragedy of the Commons, transport as a service, trolley problem, Turing machine, Turing test, universal basic income, uranium enrichment, vertical integration, Von Neumann architecture, Wall-E, warehouse robotics, Watson beat the top human players on Jeopardy!, web application, zero-sum game

Neither author refers to the early work of Harsanyi, “Games with incomplete information, Parts I–III,” or Cyert and de Groot, “Adaptive utility.” 41. An initial paper on helping humans who don’t know their own preferences and are learning about them: Lawrence Chan et al., “The assistive multi-armed bandit,” in Proceedings of the 14th ACM/IEEE International Conference on Human–Robot Interaction (HRI), ed. David Sirkin et al. (IEEE, 2019). 42. Eliezer Yudkowsky, in Coherent Extrapolated Volition (Singularity Institute, 2004), lumps all these aspects, as well as plain inconsistency, under the heading of muddle—a term that has not, unfortunately, caught on. 43.


pages: 665 words: 159,350

Shape: The Hidden Geometry of Information, Biology, Strategy, Democracy, and Everything Else by Jordan Ellenberg

Albert Einstein, AlphaGo, Andrew Wiles, autonomous vehicles, British Empire, Brownian motion, Charles Babbage, Claude Shannon: information theory, computer age, coronavirus, COVID-19, deep learning, DeepMind, Donald Knuth, Donald Trump, double entry bookkeeping, East Village, Edmond Halley, Edward Jenner, Elliott wave, Erdős number, facts on the ground, Fellow of the Royal Society, Geoffrey Hinton, germ theory of disease, global pandemic, government statistician, GPT-3, greed is good, Henri Poincaré, index card, index fund, Isaac Newton, Johannes Kepler, John Conway, John Nash: game theory, John Snow's cholera map, Louis Bachelier, machine translation, Mercator projection, Mercator projection distort size, especially Greenland and Africa, Milgram experiment, multi-armed bandit, Nate Silver, OpenAI, Paul Erdős, pets.com, pez dispenser, probability theory / Blaise Pascal / Pierre de Fermat, Ralph Nelson Elliott, random walk, Rubik’s Cube, self-driving car, side hustle, Snapchat, social distancing, social graph, transcontinental railway, urban renewal

One more thing I’d like to acknowledge is all the things it would have been great to write about in a book about geometry, but aren’t here, because I ran out of time and space. I had meant to write about “lumpers and splitters” and the theory of clustering; Judea Pearl and the use of directed acyclic graphs in the study of causality; navigation charts of the Marshall Islands; exploitation versus exploration and multi-armed bandits; binocular vision in praying mantis larvae; the maximum size of a subset of the N by N grid with no three points forming an isosceles triangle (let me know if you solve this actually); much more on dynamics, starting from Poincaré and going through billiards, Sinai, and Mirzakhani; much more on Descartes, who launched the unification of algebra and geometry and somehow ended up nearly absent here; and much more on Grothendieck, who took that unification much farther than Descartes could have dreamed and somehow ditto; catastrophe theory; the tree of life.