The Theory That Would Not Die: How Bayes' Rule Cracked the Enigma Code, Hunted Down Russian Submarines, and Emerged Triumphant From Two Centuries of Controversy
Sharon Bertsch McGrayne

Bayes was interred on April 15, which is often called the date of his death. The degraded condition of his vault may have contributed to the confusion. Second, the often-reproduced portrait of Thomas Bayes is almost assuredly of someone else named “T. Bayes.” The sketch first appeared in 1936 in History of Life Insurance in its Formative Years by Terence O’Donnell. However, the picture’s caption on page 335 says it is of “Rev. T. Bayes, Improver of the Columnar Method developed by Barrett,” and Barrett did not develop his method until 1810, a half-century after the death of “our” Rev. Thomas Bayes. Bellhouse (2004) first noticed that the portrait’s hairstyle is anachronistic. Sharon North, curator of Textiles and Fashion at the Victoria and Albert Museum, London, agrees: “The hairstyle in this portrait looks very 20th century. . . .

The Laws of Medicine: Field Notes From an Uncertain Science
Siddhartha Mukherjee

To Thomas Bayes (1702–1761), who saw uncertainty with such certainty “Are you planning to follow a career in Magical Laws, Miss Granger?” asked Scrimgeour. “No, I’m not,” retorted Hermione. “I’m hoping to do some good in the world!” J. K. Rowling The learned men of former ages employed a great part of their time and thoughts searching out the hidden causes of distemper, were curious in imagining the secret workmanship of nature and . . . putting all these fancies together, fashioned to themselves systems and hypotheses [that] diverted their enquiries from the true and advantageous knowledge of things.

…

It applies not only to medicine but to any other discipline that is predicated on predictions: economics or banking, gambling or astrology. The core logic holds true whether you are trying to forecast tomorrow’s weather or seeking to predict rises and falls in the stock market. It is a universal feature of all tests. .... The man responsible for this strange and illuminating idea was neither a doctor nor a scientist by trade. Born in Hertfordshire in 1702, Thomas Bayes was a clergyman and philosopher who served as the minister at the chapel in Tunbridge Wells, near London. He published only two significant papers in his lifetime—the first, a defense of God, and the second, a defense of Newton’s theory of calculus (it was a sign of the times that in 1732, a clergyman found no cognitive dissonance between these two efforts). His best-known work—on probability theory—was not published during his lifetime and was only rediscovered decades after his death.

The Drunkard's Walk: How Randomness Rules Our Lives
Leonard Mlodinow

The experiment was still in progress, he reported, and now he was suing his former employer, who had produced a psychiatrist willing to testify that he suffered from paranoia. One of the paranoid delusions the former employer’s psychiatrist pointed to was the student’s alleged invention of a fictitious eighteenth-century minister. In particular, the psychiatrist scoffed at the student’s claim that this minister was an amateur mathematician who had created in his spare moments a bizarre theory of probability. The minister’s name, according to the student, was Thomas Bayes. His theory, the student asserted, described how to assess the chances that some event would occur if some other event also occurred. What are the chances that a particular student would be the subject of a vast secret conspiracy of experimental psychologists? Admittedly not huge. But what if one’s wife speaks one’s thoughts before one can utter them and co-workers foretell your professional fate over drinks in casual conversation?

…

And he presented the court with a mumbo jumbo of formulas and calculations regarding his hypothesis, concluding that the additional evidence meant that the probability was 999,999 in 1 million that he was right about the conspiracy. The enemy psychiatrist claimed that this mathematician-minister and his theory were figments of the student’s schizophrenic imagination. The student asked the professor to help him refute that claim. The professor agreed. He had good reason, for Thomas Bayes, born in London in 1701, really was a minister, with a parish at Tunbridge Wells. He died in 1761 and was buried in a park in London called Bunhill Fields, in the same grave as his father, Joshua, also a minister. And he indeed did invent a theory of “conditional probability” to show how the theory of probability can be extended from independent events to events whose outcomes are connected.

…

The professor supplied a deposition explaining Bayes’s existence and his theory, though not supporting the specific and dubious calculations that his former student claimed proved his sanity. The sad part of this story is not just the middle-aged schizophrenic himself, but the medical and legal team on the other side. It is unfortunate that some people suffer from schizophrenia, but even though drugs can help to mediate the illness, they cannot battle ignorance. And ignorance of the ideas of Thomas Bayes, as we shall see, resides at the heart of many serious mistakes in both medical diagnosis and legal judgment. It is an ignorance that is rarely addressed during a doctor’s or a lawyer’s professional training. We also make Bayesian judgments in our daily lives. A film tells the story of an attorney who has a great job, a charming wife, and a wonderful family. He loves his wife and daughter, but still he feels that something is missing in his life.

Algorithms to Live By: The Computer Science of Human Decisions
Brian Christian,
Tom Griffiths

The story begins in eighteenth-century England, in a domain of inquiry irresistible to great mathematical minds of the time, even those of the clergy: gambling. Reasoning Backward with the Reverend Bayes If we be, therefore, engaged by arguments to put trust in past experience, and make it the standard of our future judgement, these arguments must be probable only. —DAVID HUME More than 250 years ago, the question of making predictions from small data weighed heavily on the mind of the Reverend Thomas Bayes, a Presbyterian minister in the charming spa town of Tunbridge Wells, England. If we buy ten tickets for a new and unfamiliar raffle, Bayes imagined, and five of them win prizes, then it seems relatively easy to estimate the raffle’s chances of a win: 5/10, or 50%. But what if instead we buy a single ticket and it wins a prize? Do we really imagine the probability of winning to be 1/1, or 100%?
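Bayes's raffle question has a well-known closed-form answer under one particular assumption: if every win rate between 0 and 1 is equally plausible before any tickets are bought (a uniform prior), then after w wins in n tickets the probability that the next ticket wins is (w + 1)/(n + 2). This is Laplace's rule of succession, swapped in here as an illustration rather than taken from the text; the function name is hypothetical.

```python
from fractions import Fraction


def rule_of_succession(wins: int, tickets: int) -> Fraction:
    """Laplace's rule of succession: P(next ticket wins) after `wins` of `tickets`,
    assuming a uniform prior over the raffle's unknown win rate."""
    if not 0 <= wins <= tickets:
        raise ValueError("wins must be between 0 and tickets")
    return Fraction(wins + 1, tickets + 2)


for wins, tickets in [(5, 10), (1, 1), (0, 0)]:
    naive = Fraction(wins, tickets) if tickets else None  # raw frequency, undefined with no data
    print(f"{wins}/{tickets} wins: naive={naive}, Laplace={rule_of_succession(wins, tickets)}")
```

Under this assumption, ten tickets with five winners still give 1/2, but a single winning ticket gives 2/3 rather than 100%, and with no tickets at all the estimate is 1/2.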

…

Shedler. “An Anomaly in Space-Time Characteristics of Certain Programs Running in a Paging Machine.” Communications of the ACM 12, no. 6 (1969): 349–353. Belew, Richard K. Finding Out About: A Cognitive Perspective on Search Engine Technology and the WWW. Cambridge, UK: Cambridge University Press, 2000. Bell, Aubrey F. G. In Portugal. New York: John Lane, 1912. Bellhouse, David R. “The Reverend Thomas Bayes, FRS: A Biography to Celebrate the Tercentenary of His Birth.” Statistical Science 19 (2004): 3–43. Bellman, Richard. Dynamic Programming. Princeton, NJ: Princeton University Press, 1957. ______. “A Problem in the Sequential Design of Experiments.” Sankhyā: The Indian Journal of Statistics 16 (1956): 221–229. Bellows, Meghan L., and J. D. Luc Peterson. “Finding an Optimal Seating Chart.” Annals of Improbable Research (2012).

The Signal and the Noise: Why So Many Predictions Fail-But Some Don't
Nate Silver

Finding patterns is easy in any kind of data-rich environment; that’s what mediocre gamblers do. The key is in determining whether the patterns represent noise or signal. But although there isn’t any one particular key to why Voulgaris might or might not bet on a given game, there is a particular type of thought process that helps govern his decisions. It is called Bayesian reasoning. The Improbable Legacy of Thomas Bayes Thomas Bayes was an English minister who was probably born in 1701—although it may have been 1702. Very little is certain about Bayes’s life, even though he lent his name to an entire branch of statistics and perhaps its most famous theorem. It is not even clear that anybody knows what Bayes looked like; the portrait of him that is commonly used in encyclopedia articles may have been misattributed.19 What is in relatively little dispute is that Bayes was born into a wealthy family, possibly in the southeastern English county of Hertfordshire.

…

On average, a team will go either over or under the total five games in a row about five times per season. That works out to 150 such streaks per season between the thirty NBA teams combined. 19. D. R. Bellhouse, “The Reverend Thomas Bayes FRS: A Biography to Celebrate the Tercentenary of His Birth,” Statistical Science, 19, 1, pp. 3–43; 2004. http://www2.isye.gatech.edu/~brani/isyebayes/bank/bayesbiog.pdf. 20. Bayes may also have been an Arian, meaning someone who followed the teachings of the early Christian leader Arius and who regarded Jesus Christ as the divine son of God rather than (as most Christians then and now believe) a direct manifestation of God. 21. Thomas Bayes, “Divine Benevolence: Or an Attempt to Prove That the Principal End of the Divine Providence and Government Is the Happiness of His Creatures.” http://archive.org/details/DivineBenevolenceOrAnAttemptToProveThatThe. 22.

…

There are many reasons for it—some having to do with our psychological biases, some having to do with common methodological errors, and some having to do with misaligned incentives. Close to the root of the problem, however, is a flawed type of statistical thinking that these researchers are applying. FIGURE 8-6: A GRAPHICAL REPRESENTATION OF FALSE POSITIVES When Statistics Backtracked from Bayes Perhaps the chief intellectual rival to Thomas Bayes—although he was born in 1890, almost 130 years after Bayes’s death—was an English statistician and biologist named Ronald Aylmer (R. A.) Fisher. Fisher was a much more colorful character than Bayes, almost in the English intellectual tradition of Christopher Hitchens. He was handsome but a slovenly dresser,42 always smoking his pipe or his cigarettes, constantly picking fights with his real and imagined rivals.

Mastering Pandas
Femi Anthony

The various topics that will be discussed are as follows:
- Introduction to Bayesian statistics
- Mathematical framework for Bayesian statistics
- Probability distributions
- Bayesian versus Frequentist statistics
- Introduction to PyMC and Monte Carlo simulation
- Illustration of Bayesian inference – Switchpoint detection
Introduction to Bayesian statistics
The field of Bayesian statistics is built on the work of Reverend Thomas Bayes, an 18th-century statistician, philosopher, and Presbyterian minister. His famous Bayes' theorem, which forms the theoretical underpinnings for Bayesian statistics, was published posthumously in 1763 as a solution to the problem of inverse probability. For more details on this topic, refer to http://en.wikipedia.org/wiki/Thomas_Bayes. Inverse probability problems were all the rage in the early 18th century and were often formulated as follows: Suppose you play a game with a friend. There are 10 green balls and 7 red balls in bag 1 and 4 green and 7 red balls in bag 2.
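The excerpt stops before the question is posed. A common version of this two-bag game asks: a bag is chosen at random and a green ball is drawn; which bag did it probably come from? The sketch below answers that version, assuming each bag is equally likely to be chosen (an assumption, not stated in the text); the function name is illustrative.

```python
from fractions import Fraction

# Bag contents from the text: (green, red)
bags = {"bag 1": (10, 7), "bag 2": (4, 7)}
# Assumption: the bag is chosen uniformly at random before the draw.
prior = {name: Fraction(1, len(bags)) for name in bags}


def posterior_given(color: str) -> dict:
    """P(bag | drawn color) via Bayes' theorem."""
    if color not in ("green", "red"):
        raise ValueError("color must be 'green' or 'red'")
    idx = 0 if color == "green" else 1
    # Prior times likelihood P(color | bag) for each bag
    unnormalized = {
        name: prior[name] * Fraction(counts[idx], sum(counts))
        for name, counts in bags.items()
    }
    evidence = sum(unnormalized.values())  # P(color), the normalizing constant
    return {name: p / evidence for name, p in unnormalized.items()}


post = posterior_given("green")
for name, p in post.items():
    print(f"{name}: {p} (≈{float(p):.3f})")
```

Under those assumptions, a green ball points to bag 1 with probability 55/89 (about 0.618) and to bag 2 with probability 34/89.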

Against the Gods: The Remarkable Story of Risk
Peter L. Bernstein

With that innocent-sounding assertion, Bernoulli explained why King Midas was an unhappy man, why people tend to be risk-averse, and why prices must fall if customers are to be persuaded to buy more. Bernoulli's statement stood as the dominant paradigm of rational behavior for the next 250 years and laid the groundwork for modern principles of investment management. Almost exactly one hundred years after the collaboration between Pascal and Fermat, a dissident English minister named Thomas Bayes made a striking advance in statistics by demonstrating how to make better-informed decisions by mathematically blending new information into old information. Bayes's theorem focuses on the frequent occasions when we have sound intuitive judgments about the probability of some event and want to understand how to alter those judgments as actual events unfold. All the tools we use today in risk management and in the analysis of decisions and choice, from the strict rationality of game theory to the challenges of chaos theory, stem from the developments that took place between 1654 and 1760, with only two exceptions: In 1875, Francis Galton, an amateur mathematician who was Charles Darwin's first cousin, discovered regression to the mean, which explains why pride goeth before a fall and why clouds tend to have silver linings.

…

In this scenario, the data are given (10 pins, 12 pins, 1 pin) and the probability is the unknown. Questions put in this manner form the subject matter of what is known as inverse probability: with 12 defective pins out of 100,000, what is the probability that the true average ratio of defectives to the total is 0.01%? One of the most effective treatments of such questions was proposed by a minister named Thomas Bayes, who was born in 1701 and lived in Kent. Bayes was a Nonconformist; he rejected most of the ceremonial rituals that the Church of England had retained from the Catholic Church after their separation in the time of Henry VIII. Not much is known about Bayes, even though he was a Fellow of the Royal Society. One otherwise dry and impersonal textbook in statistics went so far as to characterize him as "enigmatic."16 He published nothing in mathematics while he was alive and left only two works that were published after his death but received little attention when they appeared.
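The pin question can be answered numerically. This sketch assumes a uniform prior over the defect rate (the assumption Bayes himself adopted for his billiard-table problem) and approximates it on a grid; because a continuous rate has zero probability of equaling exactly 0.01%, it reports the probability that the rate is at most 0.01%. The grid bounds and step are arbitrary choices.

```python
import math

defects, inspected = 12, 100_000  # from the text: 12 defective pins out of 100,000

# Assumption: a uniform prior over the defect rate, approximated on a grid of
# candidate rates 0.0001%, 0.0002%, ..., 0.05%. The likelihood is negligible
# above 0.05%, so truncating the grid there loses essentially no mass.
step = 1e-6
rates = [i * step for i in range(1, 501)]

# Binomial log-likelihood at each rate (the binomial coefficient is the same
# for every rate, so it cancels when normalizing and is omitted).
log_like = [defects * math.log(r) + (inspected - defects) * math.log1p(-r) for r in rates]

# With a uniform prior, the posterior is proportional to the likelihood.
peak = max(log_like)
weights = [math.exp(ll - peak) for ll in log_like]  # shift by the max to avoid underflow
total = sum(weights)
posterior = [w / total for w in weights]

most_probable = rates[max(range(len(rates)), key=lambda i: posterior[i])]
p_at_most = sum(p for r, p in zip(rates, posterior) if r <= 1e-4 + 1e-12)
print(f"most probable defect rate: {most_probable:.4%}")
print(f"P(defect rate <= 0.01%) = {p_at_most:.2f}")
```

The most probable rate is the observed 0.012%, and under these assumptions the probability that the true rate is at most 0.01% comes out at roughly one in five.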

…

The most exciting feature of all the achievements mentioned in this chapter is the daring idea that uncertainty can be measured. Uncertainty means unknown probabilities; to reverse Hacking's description of certainty, we can say that something is uncertain when our information is correct and an event fails to happen, or when our information is incorrect and an event does happen. Jacob Bernoulli, Abraham de Moivre, and Thomas Bayes showed how to infer previously unknown probabilities from the empirical facts of reality. These accomplishments are impressive for the sheer mental agility demanded, and audacious for their bold attack on the unknown. When de Moivre invoked ORIGINAL DESIGN, he made no secret of his wonderment at his own accomplishments. He liked to turn such phrases; at another point, he writes, "If we blind not ourselves with metaphysical dust we shall be led by a short and obvious way, to the acknowledgment of the great MAKER and GOUVERNOUR of all."25 We are by now well into the eighteenth century, when the Enlightenment identified the search for knowledge as the highest form of human activity.

The Irrational Economist: Making Decisions in a Dangerous World
Erwann Michel-Kerjan,
Paul Slovic

This chapter explores a two-part conjecture: (1) After the occurrence of a virgin risk, people will overestimate the probability of another occurrence in the near future; (2) by contrast, after an experienced risk occurs, people will under-update their assessment of another event occurring soon. THE INABILITY TO USE BAYESIAN UPDATING IN EVERYDAY PRACTICE Risks are often posited to have an unknown true probability. The textbook model for how to proceed employs Bayes’ Rule (after eighteenth-century British mathematician Thomas Bayes), which shows mathematically how people should rationally change their existing beliefs about something in light of new evidence. Individuals use information available beforehand to form a so-called prior belief about the probability that an event will occur in a given period. New evidence about the risk is captured in something called a likelihood function, which expresses how plausible the evidence is given each possible value of the probability.
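A standard way to make the prior and the likelihood concrete for an event probability is the Beta-Binomial model: a Beta(a, b) prior combined with a binomial likelihood gives a Beta(a + events, b + non-events) posterior. The sketch below uses it as the rational benchmark the chapter refers to, with two hypothetical priors that share a 1% mean but differ in how firmly they are held; it is not a model of the chapter's conjecture about how people actually update.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaBelief:
    """Beta(a, b) belief about an unknown per-period probability of an event."""
    a: float
    b: float

    @property
    def mean(self) -> float:
        return self.a / (self.a + self.b)

    def update(self, events: int, periods: int) -> "BetaBelief":
        # Conjugate update: Beta prior x binomial likelihood -> Beta posterior
        return BetaBelief(self.a + events, self.b + periods - events)


# Hypothetical priors, both with a 1% mean: weakly held ("virgin" risk)
# versus strongly held ("experienced" risk).
virgin = BetaBelief(1, 99)
experienced = BetaBelief(10, 990)

for label, prior in [("virgin", virgin), ("experienced", experienced)]:
    post = prior.update(events=1, periods=1)
    print(f"{label}: prior mean {prior.mean:.4f} -> posterior mean {post.mean:.4f}")
```

After one occurrence in one period, the weakly held prior moves from 0.0100 to about 0.0198, while the strongly held one moves only to about 0.0110.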

…

American Enterprise Institute American International Group (AIG) American Psychiatric Association, homosexuality and Americans-in-London problem Amygdala (fig.) Anxiety Arrow, Ken Arthur Andersen Assets Asteroid and Comet Impact Hazards Group (NASA) Asteroid explosions, risk of At War with the Weather (Kunreuther and Michel-Kerjan) Attention deficit disorder Awareness, behavioral change and Bali Action Plan (2007) Bargaining games. See also Game Theory; Theory of Games; Ultimatum Games Batson, Daniel Bayes, Thomas Bayes’ Rule Bayesian updating Behavior acceptable awareness and collective decision making and descriptive models of individual learned managerial market motivating myopic neuroscience and rational social uncertainty/risk and Behavioral biases Behavioral data, linking (fig.) Behavioral explanations Behavioral research Behavioral science Beliefs Benefits concentrated extreme sharing uncertain Bhopal disaster Black Death Blair, Tony Bonds catastrophe municipal Bowman, Edward Brain emotional/rational parts of Brain activity unfair offers and (fig.)

Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die
Eric Siegel

To prepare for this battle, we armed PA with powerful weaponry. The predictions were generated from machine learning across 50 million learning cases, each depicting a micro-lesson from history of the form, “User Mary was shown ad A and she did click it” (a positive case) or “User John was shown ad B and he did not click it” (a negative case). The learning technology employed to pick the best ad for each user was a Naïve Bayes model. Reverend Thomas Bayes was an eighteenth-century mathematician, and the “Naïve” part means that we take a very smart man’s ideas and compromise them in a way that simplifies yet makes their application feasible, resulting in a practical method that’s often considered good enough at prediction, and scales to the task at hand. I went with this method for its relative simplicity, since in fact I needed to generate 291 such models, one for each ad.
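Siegel does not list the features behind his 291 models, so everything below (feature names, training cases, smoothing constant) is hypothetical. The sketch shows what one such per-ad Naive Bayes model could look like: it learns P(click) and P(feature value | click or no click) from counts and, as the "naïve" simplification, treats features as independent given the outcome.

```python
import math
from collections import Counter, defaultdict


class ClickNaiveBayes:
    """One Naive Bayes click model for a single ad (hypothetical sketch)."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha                        # Laplace smoothing pseudo-count
        self.class_counts = Counter()             # clicked? -> number of cases
        self.value_counts = defaultdict(Counter)  # (clicked?, feature) -> value counts
        self.seen_values = defaultdict(set)       # feature -> values seen in training

    def fit(self, cases):
        for features, clicked in cases:
            self.class_counts[clicked] += 1
            for name, value in features.items():
                self.value_counts[(clicked, name)][value] += 1
                self.seen_values[name].add(value)
        return self

    def prob_click(self, features) -> float:
        total = sum(self.class_counts.values())
        log_scores = {}
        for label in (True, False):
            n_label = self.class_counts[label]
            # log P(label) + sum of log P(value | label), assuming independence
            score = math.log((n_label + self.alpha) / (total + 2 * self.alpha))
            for name, value in features.items():
                k = len(self.seen_values[name]) + 1  # +1 reserves mass for unseen values
                count = self.value_counts[(label, name)][value]
                score += math.log((count + self.alpha) / (n_label + self.alpha * k))
            log_scores[label] = score
        # Convert the two log scores into P(click | features), guarding against underflow
        top = max(log_scores.values())
        weights = {label: math.exp(s - top) for label, s in log_scores.items()}
        return weights[True] / (weights[True] + weights[False])


# Invented training cases: (user features, did the user click?)
cases = [
    ({"device": "mobile", "hour": "evening"}, True),
    ({"device": "mobile", "hour": "morning"}, True),
    ({"device": "desktop", "hour": "morning"}, False),
    ({"device": "desktop", "hour": "evening"}, False),
    ({"device": "mobile", "hour": "evening"}, False),
]
model = ClickNaiveBayes().fit(cases)
print(round(model.prob_click({"device": "mobile", "hour": "evening"}), 3))
print(round(model.prob_click({"device": "desktop", "hour": "evening"}), 3))
```

With these made-up cases, a mobile user in the evening scores about 0.519 and a desktop user about 0.194. Picking the best ad for a user would then mean scoring that user with each ad's model and comparing the results.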

…

Apple Mac Apple Siri Argonne National Laboratory Arizona Petrified Forest National Park Arizona State University artificial intelligence (AI) about Amazon.com Mechanical Turk mind-reading technology possibility of, the Watson computer and Asimov, Isaac astronomy AT&T Research BellKor Netflix Prize teams Australia Austria automobile insurance crashes, predicting credit scores and accidents driver inattentiveness, predicting fraud predictions for Averitt aviation incidents Aviva Insurance (UK) AWK computer language B backtesting. See also test data Baesens, Ben bagging (bootstrap aggregating) Bangladesh Barbie dolls Bayes, Thomas (Bayes Network) Beane, Billy Beano Beaux, Alex behavioral predictors Bella Pictures BellKor BellKor Netflix Prize teams Ben Gurion University (Israel) Bernstein, Peter Berra, Yogi Big Bang Theory, The Big Bang theory Big Brother BigChaos team “big data” movement billing errors, predicting black box trading Black Swan, The (Taleb) blogs and blogging anxiety, predicting from entries collective intelligence and data glut and content in LiveJournal mood prediction research via nature of Blue Cross Blue Shield of Tennessee BMW BNSF Railway board games, predictive play of Bohr, Niels book titles, testing Bowie, David brain activity, predicting Brandeis, Louis Brasil Telecom (Oi) breast cancer, predicting Brecht, Bertolt Breiman, Leo Brigham Young University British Broadcasting Corporation (BBC) Brobst, Stephen Brooks, Mel Brynjolfsson, Eric buildings, predicting fault in Bullard, Ben burglaries, predicting business rules, decision trees and buying behavior, predicting C Cage, Nicolas Canadian Automobile Association Canadian Tire car crashes and harm, predicting CareerBuilder Carlin, George Carlson, Gretchen Carnegie Mellon University CART decision trees Castagno, Davide causality cell phone industry consumer behavior and dropped calls, predicting GPS data and location predicting Telenor (Norway) CellTel (African telecom) Central Tables.

Luciano Floridi

The question she is implicitly asking is: ‘what is the probability that A (= the email was infected), given the fact that B (= the email was blocked by the antivirus and placed in the quarantine folder) when, on average, 2% of my emails are actually infected and my antivirus is successful 95% of the time?’ Jill has just identified a way of acquiring (learning) the missing piece of information that will help her to adopt the right strategy: if the chance that some emails in the quarantine folder might not be infected is very low, she will check it only occasionally. How could she obtain such a missing piece of information? The answer is by using a Bayesian approach. Thomas Bayes (1702-1761) was a Presbyterian minister and English mathematician whose investigations into probability, published posthumously, led to what is now known as Bayes' theorem and a new branch of applications of probability theory. The theorem calculates the posterior probability of an event A given event B (that is, P(A|B)) on the basis of the prior probability of A (that is, P(A)). Basically, it tells us what sort of information can be retrodicted.
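Jill's question can be answered once one more assumption is made explicit. The sketch below reads "successful 95% of the time" as meaning the antivirus quarantines 95% of infected emails and also lets 95% of clean emails through; that reading, and the function name, are assumptions rather than details from the text.

```python
def posterior_infected(base_rate: float, sensitivity: float, specificity: float) -> float:
    """P(infected | quarantined) by Bayes' theorem."""
    true_pos = sensitivity * base_rate               # P(quarantined and infected)
    false_pos = (1 - specificity) * (1 - base_rate)  # P(quarantined and clean)
    return true_pos / (true_pos + false_pos)


# Figures from the text: 2% of emails infected, antivirus "successful 95% of the time".
# Assumption: the 95% applies both to catching infected mail and to passing clean mail.
p = posterior_infected(base_rate=0.02, sensitivity=0.95, specificity=0.95)
print(f"P(infected | quarantined) = {p:.3f}")  # → 0.279
```

Under that reading only about 28% of quarantined emails are actually infected: clean emails vastly outnumber infected ones, so the 5% of them that are wrongly quarantined fill most of the folder.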

Content: Selected Essays on Technology, Creativity, Copyright, and the Future of the Future
Cory Doctorow

The Future of Internet Immune Systems (Originally published on InformationWeek's Internet Evolution, November 19, 2007) Bunhill Cemetery is just down the road from my flat in London. It’s a handsome old boneyard, a former plague pit (“Bone hill” — as in, there are so many bones under there that the ground is actually kind of humped up into a hill). There are plenty of luminaries buried there — John “Pilgrim’s Progress” Bunyan, William Blake, Daniel Defoe, and assorted Cromwells. But my favorite tomb is that of Thomas Bayes, the 18th-century statistician for whom Bayesian filtering is named. Bayesian filtering is plenty useful. Here’s a simple example of how you might use a Bayesian filter. First, get a giant load of non-spam emails and feed them into a Bayesian program that counts how many times each word in their vocabulary appears, producing a statistical breakdown of the word-frequency in good emails. Then, point the filter at a giant load of spam (if you’re having a hard time getting a hold of one, I have plenty to spare), and count the words in it.
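Doctorow's description stops after the counting step. One simple way to turn the two word-frequency tables into a Bayesian signal is to compute P(spam | word) for each word, assuming spam and good mail are equally likely a priori and adding a pseudo-count so that a word seen in only one pile never gets probability exactly 0 or 1. The tiny corpora, the smoothing choice, and the function names are illustrative assumptions.

```python
import re
from collections import Counter


def word_frequencies(emails):
    """Count how many times each word appears across a corpus of emails."""
    counts = Counter()
    for text in emails:
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts, sum(counts.values())


def spamminess(word, good, spam, alpha=1.0):
    """P(spam | word) from the two frequency tables, assuming equal priors for
    spam and good mail; alpha smooths words seen in only one pile."""
    (g_counts, g_total), (s_counts, s_total) = good, spam
    vocab = len(set(g_counts) | set(s_counts))
    f_good = (g_counts[word] + alpha) / (g_total + alpha * vocab)  # P(word | good)
    f_spam = (s_counts[word] + alpha) / (s_total + alpha * vocab)  # P(word | spam)
    return f_spam / (f_spam + f_good)


good = word_frequencies(["lunch at noon?", "draft of the report attached", "see you at noon"])
spam = word_frequencies(["win cash now", "cheap meds win big", "claim your cash prize now"])
for w in ["cash", "noon", "the"]:
    print(w, round(spamminess(w, good, spam), 2))
```

In this toy corpus "cash" scores 0.75 and "noon" 0.25; a full filter would typically go on to combine the scores of all the words in a new message.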

Paul J. Nahin

The “general doctrine” does have a sort of plausibility to it: “if Y then not X” when “reversed” could be thought to imply “if X then not Y.” Boole argued that this is not so, using the ideas of the previous section, and showed that P(Ȳ | X) is given by a considerably more involved expression than simply “p.” What Boole did was not really original, as conditional probability had been studied a century before by the English philosopher and minister Thomas Bayes (1701–1761), whose work was published posthumously in 1764 in the Philosophical Transactions of the Royal Society of London, where it was then promptly forgotten for twenty years until the great French mathematician Pierre-Simon Laplace (1749–1827) endorsed Bayes’s results. What Boole did, then, with the following analysis, was remind his readers what the Reverend Bayes had done a hundred years before.
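A small numerical counterexample shows why the reversal fails. The joint distribution below is hypothetical: with it, "if Y then not X" holds 90% of the time, yet "if X then not Y" holds only 50% of the time.

```python
from fractions import Fraction as F

# Hypothetical joint distribution over two events; keys are (y, x), values sum to 1.
joint = {
    (True, True): F(5, 100),     # Y and X
    (True, False): F(45, 100),   # Y and not X
    (False, True): F(5, 100),    # not Y and X
    (False, False): F(45, 100),  # not Y and not X
}


def cond(event, given):
    """P(event | given), where both are predicates on (y, x)."""
    num = sum(p for k, p in joint.items() if event(*k) and given(*k))
    den = sum(p for k, p in joint.items() if given(*k))
    return num / den


p_notx_given_y = cond(lambda y, x: not x, lambda y, x: y)
p_noty_given_x = cond(lambda y, x: not y, lambda y, x: x)
print(p_notx_given_y, p_noty_given_x)  # → 9/10 1/2
```

The two conditional probabilities differ because each is normalized by a different event: P(Y) = 1/2 in the first case, but P(X) = 1/10 in the second.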

John H. Johnson

Ellen Davis, “Committing the ‘Gambler’s Fallacy’ May Be in the Cards, New Research Shows,” Texas A&M Health Science Center website, March 9, 2015, http://news.tamhsc.edu/?post=committing-the-gamblers-fallacy-may-be-in-the-cards-new-research-shows. Thanks to Ron Friedman for the find. 27. There’s another way of looking at this, known as Bayesian probability (after the eighteenth-century English mathematician Thomas Bayes). With Bayesian probability, you use the data gathered to update your initial beliefs after the fact. It’s the opposite of the way in which the gambler’s fallacy works. As one of John’s colleagues pointed out, it’s the difference between knowing that a coin is fair and learning about the coin. So, a Bayesian might flip a coin 10 times, get heads all 10 times, and adjust his probability to say that the coin was always more likely to land heads up.
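To make "learning about the coin" concrete, the sketch below compares just two hypotheses: the coin is fair, or (a hypothetical alternative chosen for simplicity) it is two-headed, with a 1% prior on the latter. Ten heads in a row multiply the odds in favor of the two-headed coin by 2^10 = 1024. The prior and the two-hypothesis setup are assumptions, not from the text.

```python
from fractions import Fraction


def posterior_two_headed(prior_two_headed: Fraction, heads_in_a_row: int) -> Fraction:
    """P(coin is two-headed | n heads in a row), fair coin vs. hypothetical two-headed coin."""
    like_fair = Fraction(1, 2) ** heads_in_a_row  # P(data | fair)
    like_two_headed = Fraction(1)                 # P(data | two-headed)
    num = prior_two_headed * like_two_headed
    return num / (num + (1 - prior_two_headed) * like_fair)


post = posterior_two_headed(Fraction(1, 100), 10)
# Predictive probability of heads on the next flip, averaging over both hypotheses
next_heads = post * 1 + (1 - post) * Fraction(1, 2)
print(post, float(post))
print(next_heads, float(next_heads))
```

Under these assumptions the posterior for the two-headed coin is 1024/1123 (about 0.91), and the probability that the next flip lands heads rises to 2147/2246 (about 0.96). A gambler's-fallacy reasoner would instead expect tails to be "due".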

The Crash Detectives
Christine Negroni

Some of the credit for finally finding the submerged airliner goes to Metron Scientific Solutions, a company staffed with pencil-wielding mathematicians who used probability, logic, and numbers to conclude that the likely resting place of the plane was a narrow slice of ocean that had already been checked. “A lack of success tells you about where it is not, and that contributes to knowledge,” said Larry Stone, chief scientist at Metron. Talk about having a positive point of view. The Metron method is based on Bayesian probability, the theory of eighteenth-century statistician and philosopher Thomas Bayes, whose first published work, Divine Benevolence, was equally optimistic because it attempted to prove that God wants us to be happy. Using Bayesian logic to look for missing airplanes, as interpreted by Metron, involves taking all kinds of input about the missing thing (even conflicting input) and assigning levels of certainty or uncertainty to each. Everything gets a weight, and everything gets revised as things change.
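Stone's remark that an unsuccessful search "tells you about where it is not" corresponds to a standard Bayesian search update. The sketch below assumes the search area is split into a few cells with prior probabilities, and that searching the right cell finds the wreck with some detection probability below 1; the cells, numbers, and function name are hypothetical and are not Metron's actual model.

```python
def update_after_failed_search(prior, searched, detection_prob):
    """Posterior over cells after searching one cell and finding nothing.

    prior: list of P(object in cell i), summing to 1.
    detection_prob: P(search finds the object | object is in the searched cell).
    """
    p_miss = 1 - prior[searched] * detection_prob  # P(the search comes up empty)
    post = [p / p_miss for p in prior]              # unsearched cells gain probability
    post[searched] = prior[searched] * (1 - detection_prob) / p_miss
    return post


cells = [0.5, 0.3, 0.2]  # hypothetical prior over three ocean regions
after = update_after_failed_search(cells, searched=0, detection_prob=0.8)
print([round(p, 3) for p in after])
```

Searching the 50% cell with an 80% detection chance and finding nothing leaves it at about 17% rather than zero, so an area that has already been checked can remain a live candidate.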

Ray Kurzweil

He plans to develop a system incorporating all human ideas.167 One application would be to inform policy makers of which ideas are held by which community. Bayesian Nets. Over the last decade a technique called Bayesian logic has created a robust mathematical foundation for combining thousands or even millions of such probabilistic rules in what are called "belief networks" or Bayesian nets. Originally devised by English mathematician Thomas Bayes and published posthumously in 1763, the approach is intended to determine the likelihood of future events based on similar occurrences in the past.168 Many expert systems based on Bayesian techniques gather data from experience in an ongoing fashion, thereby continually learning and improving their decision making. The most promising type of spam filters are based on this method. I personally use a spam filter called SpamBayes, which trains itself on e-mail that you have identified as either "spam" or "okay."169 You start out by presenting a folder of each to the filter.
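Kurzweil describes belief networks only in outline. As an illustration, the sketch below does exact inference by brute-force enumeration on a classic three-node toy network (rain and a sprinkler can each wet the grass). The network, its probability tables, and the function names are hypothetical, and practical Bayesian-net software uses far more efficient algorithms than enumerating every assignment.

```python
from itertools import product

# Hypothetical network: Rain -> WetGrass <- Sprinkler (Rain and Sprinkler independent here).
P_RAIN = 0.2
P_SPRINKLER = 0.3
P_WET = {(True, True): 0.99, (True, False): 0.9, (False, True): 0.8, (False, False): 0.0}


def joint(rain, sprinkler, wet):
    """P(rain, sprinkler, wet) as the product of each node's conditional probability."""
    p = (P_RAIN if rain else 1 - P_RAIN) * (P_SPRINKLER if sprinkler else 1 - P_SPRINKLER)
    pw = P_WET[(rain, sprinkler)]
    return p * (pw if wet else 1 - pw)


def p_rain_given(**evidence):
    """P(rain | evidence) by summing the joint over every consistent assignment."""
    num = den = 0.0
    for rain, sprinkler, wet in product([True, False], repeat=3):
        world = {"rain": rain, "sprinkler": sprinkler, "wet": wet}
        if any(world[k] != v for k, v in evidence.items()):
            continue
        p = joint(rain, sprinkler, wet)
        den += p
        if rain:
            num += p
    return num / den


print(round(p_rain_given(wet=True), 3))                  # wet grass raises belief in rain
print(round(p_rain_given(wet=True, sprinkler=True), 3))  # the sprinkler "explains it away"
```

With these made-up numbers, seeing wet grass lifts the probability of rain from 0.2 to about 0.491, and additionally learning that the sprinkler ran drops it back to about 0.236.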

…

Anthes, "Computerizing Common Sense," Computerworld, April 8, 2002, http://www.computerworld.com/news/2002/story/0,11280,69881,00.html. 167. Kristen Philipkoski, "Now Here's a Really Big Idea," Wired News, November 25, 2002, http://www.wired.com/news/technology/0,1282,56374,00.html, reporting on Darryl Macer, "The Next Challenge Is to Map the Human Mind," Nature 420 (November 14, 2002): 121; see also a description of the project at http://www.biol.tsukuba.ac.jp/~macer/index.html. 168. Thomas Bayes, "An Essay Towards Solving a Problem in the Doctrine of Chances," published in 1763, two years after his death in 1761. 169. SpamBayes spam filter, http://spambayes.sourceforge.net. 170. Lawrence R. Rabiner, "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition," Proceedings of the IEEE 77 (1989): 257–86. For a mathematical treatment of Markov models, see http://jedlik.phy.bme.hu/~gerjanos/HMM/node2.html. 171.

Nicholas Dunbar

Note that we ignore any mention of time, or the time value of money, in this example, which is the equivalent of setting the risk-free interest rate to zero. 6. One might argue that since the market values the loans at $800 million, the bank ought to write down the value of the equity investment to zero. However, accounting rules for loan books don’t require such recognitions to take place. 7. After de Moivre’s death, the refinement of mortality calculations was continued in London by Richard Price, friend of Thomas Bayes and Benjamin Franklin, and founding actuary of the Equitable Life Assurance Society. 8. Arturo Cifuentes and Gerard O’Connor, “The Binomial Expansion Method Applied to CBO/CLO Analysis,” Moody’s Investors Service special report, December 13, 1996. 9. Ibid. 10. For a detailed account of the invention of BISTRO, see Gillian Tett, Fool’s Gold: How the Bold Dream of a Small Tribe at J.P. Morgan Was Corrupted by Wall Street Greed and Unleashed a Catastrophe (New York: Free Press, 2009). 11.

Toby Segaran

In docclass.py, create a subclass of classifier called naivebayes, and create a docprob method that extracts the features (words) and multiplies all their probabilities together to get an overall probability:

    # Relies on getfeatures, weightedprob and fprob from the classifier base class in docclass.py
    class naivebayes(classifier):
        def docprob(self, item, cat):
            features = self.getfeatures(item)
            # Multiply the probabilities of all the features together
            p = 1
            for f in features:
                p *= self.weightedprob(f, cat, self.fprob)
            return p

You now know how to calculate Pr(Document | Category), but this isn't very useful by itself. In order to classify documents, you really need Pr(Category | Document). In other words, given a specific document, what's the probability that it fits into this category? Fortunately, a British mathematician named Thomas Bayes figured out how to do this about 250 years ago.
A Quick Introduction to Bayes' Theorem
Bayes' Theorem is a way of flipping around conditional probabilities. It's usually written as:

    Pr(A | B) = Pr(B | A) × Pr(A) / Pr(B)

In the example, this becomes:

    Pr(Category | Document) = Pr(Document | Category) × Pr(Category) / Pr(Document)

The previous section showed how to calculate Pr(Document | Category), but what about the other two values in the equation?
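Because Pr(Document) is the same for every category, one way to obtain it is to sum Pr(Document | Category) × Pr(Category) over the categories and use it only to normalize. The sketch below is a standalone illustration of that flip; the function name and the input numbers are hypothetical stand-ins for what a docprob-style method and the training counts might provide.

```python
def classify(doc_probs: dict, category_priors: dict) -> dict:
    """Turn Pr(Document | Category) into Pr(Category | Document) with Bayes' theorem.

    Pr(Document) is computed as the sum of Pr(Document | Category) * Pr(Category)
    over all categories, so the results sum to 1.
    """
    joint = {c: doc_probs[c] * category_priors[c] for c in doc_probs}
    pr_document = sum(joint.values())
    return {c: p / pr_document for c, p in joint.items()}


# Hypothetical inputs: Pr(Document | Category) and Pr(Category) for two categories
posterior = classify({"good": 0.0004, "bad": 0.0001}, {"good": 0.6, "bad": 0.4})
print(posterior)
```

Here the document is about six times likelier to be "good" than "bad" (6/7 versus 1/7). When only the ranking matters, the normalizing step can be skipped, since it divides every category by the same Pr(Document).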

Jonathan Aldred

This is a very broad question, so we shall focus on just one aspect of it, namely the practice of quantifying uncertainty and ignorance — inventing probabilities when there is no basis to do so. The practice is widespread among economists because many of them believe that, no matter how extreme the uncertainty, effective probabilities always exist. This view is termed ‘subjective Bayesianism’ (hereafter Bayesianism for short), from the Reverend Thomas Bayes, an 18th-century English mathematician.35 Its implications are startling. Bayesians believe there is no such thing as pure uncertainty in the sense I have defined it. They assert that we always use probabilities, consciously or otherwise, when outcomes are not certain. The issues are more clearly depicted in simple gambling games than messy real-world choices; the Ellsberg Paradox (see box opposite) is a classic illustration.

James Barrat

But by the time the tragedy unfolded, Holtzman told me, Good had retired. He was not in his office but at home, perhaps calculating the probability of God’s existence. According to Dr. Holtzman, sometime before he died, Good updated that probability from zero to point one. He did this because as a statistician, he was a long-term Bayesian. Named for the eighteenth-century mathematician and minister Thomas Bayes, Bayesian statistics’ main idea is that in calculating the probability of some statement, you can start with a personal belief. Then you update that belief as new evidence comes in that supports your statement or doesn’t. If Good’s original disbelief in God had remained 100 percent, no amount of data, not even God’s appearance, could change his mind. So, to be consistent with his Bayesian perspective, Good assigned a small positive probability to the existence of God to make sure he could learn from new data, if it arose.
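Good's reasoning about a prior of zero can be checked directly. In the update below, written with a likelihood ratio P(E | H) / P(E | not H), a prior of exactly 0 stays 0 no matter how strong the evidence, while a small positive prior can climb toward 1; this is often called Cromwell's rule. The evidence strength and number of updates are hypothetical.

```python
def update(prior: float, likelihood_ratio: float) -> float:
    """Posterior P(H | E) from prior P(H) and likelihood ratio P(E | H) / P(E | not H)."""
    return prior * likelihood_ratio / (prior * likelihood_ratio + (1 - prior))


def after_evidence(prior: float, likelihood_ratio: float = 10.0, pieces: int = 10) -> float:
    """Apply `pieces` independent updates, each with the same hypothetical likelihood ratio."""
    belief = prior
    for _ in range(pieces):
        belief = update(belief, likelihood_ratio)
    return belief


for prior in (0.0, 0.1):
    print(f"prior {prior}: posterior {after_evidence(prior):.6f}")
```

A prior of 0 ends at exactly 0, because every update multiplies it by the likelihood ratio; a prior of 0.1 ends within a hair of 1.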

Philip Tetlock,
Dan Gardner

If he says, “It’s to the left,” the likelihood of the first ball being on the right side of the table increases a little more. Keep repeating the process and you slowly narrow the range of the possible locations, zeroing in on the truth—although you will never eliminate uncertainty entirely.16 If you’ve taken Statistics 101, you may recall a version of this thought experiment was dreamt up by Thomas Bayes. A Presbyterian minister, educated in logic, Bayes was born in 1701, so he lived at the dawn of modern probability theory, a subject to which he contributed with “An Essay Towards Solving a Problem in the Doctrine of Chances.” That essay, in combination with the work of Bayes’ friend Richard Price, who published Bayes’ essay posthumously in 1763, and the insights of the great French mathematician Pierre-Simon Laplace, ultimately produced Bayes’ theorem.
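Bayes's billiard-table setup can be simulated directly. The sketch below places the first ball uniformly, rolls further balls, keeps only the simulated tables whose "left" reports match the observed count, and averages the surviving first-ball positions (rejection sampling). With a uniform starting position, the exact posterior mean is (k + 1)/(n + 2) for k "left" reports out of n; the counts, trial number, and seed are arbitrary choices.

```python
import random


def bayes_billiards(lefts: int, rolls: int, trials: int = 100_000, seed: int = 0) -> float:
    """Estimate E[first ball's position | `lefts` of `rolls` later balls stopped to its left].

    Positions run from 0 (left edge) to 1 (right edge).
    """
    rng = random.Random(seed)
    kept = []
    for _ in range(trials):
        first = rng.random()  # first ball lands uniformly on the table
        # A later ball stops to the left of the first one if its position is smaller
        observed = sum(rng.random() < first for _ in range(rolls))
        if observed == lefts:  # keep only tables consistent with the reports
            kept.append(first)
    if not kept:
        raise RuntimeError("no simulated table matched; increase trials")
    return sum(kept) / len(kept)


estimate = bayes_billiards(lefts=7, rolls=10)
print(round(estimate, 3), "vs exact", round((7 + 1) / (10 + 2), 3))
```

With 7 "left" reports out of 10, the estimate should land close to the exact 8/12 ≈ 0.667: the first ball is probably in the right-hand part of the table, though its exact position stays uncertain.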

Brian Christian

I walk out of the Brighton Centre, to the bracing sea air for a minute, and into a small, locally owned shoe store looking for a gift to bring back home to my girlfriend; the shopkeeper notices my accent; I tell her I’m from Seattle; she is a grunge fan; I comment on the music playing in the store; she says it’s Florence + the Machine; I tell her I like it and that she would probably like Feist … I walk into a tea and scone store called the Mock Turtle and order the British equivalent of coffee and a donut, except it comes with thirteen pieces of silverware and nine pieces of flatware; I am so in England, I think; an old man, probably in his eighties, is shakily eating a pastry the likes of which I’ve never seen; I ask him what it is; “coffee meringue,” he says and remarks on my accent; an hour later he is telling me about World War II, the exponentially increasing racial diversity of Britain, that House of Cards is a pretty accurate depiction of British politics, minus the murders, but that really I should watch Spooks; do you get Spooks on cable, he is asking me … I meet my old boss for dinner; and after a couple years of being his research assistant and occasionally co-author, and after a brief thought of becoming one of his Ph.D. students, after a year of our paths not really crossing, we negotiate whether our formerly collegial and hierarchical relationship, now that its context is removed, simply dries up or flourishes into a domain-general friendship; we are ordering appetizers and saying something about Wikipedia, something about Thomas Bayes, something about vegetarian dining … Laurels are of no use. If you de-anonymized yourself in the past, great. But that was that. And now, you begin again.
1. These logs would, three years later, be put on the IBM website, albeit in incomplete form and with so little fanfare that Kasparov himself wouldn’t find out about them until 2005.
Epilogue: The Unsung Beauty of the Glassware Cabinet
The Most Room-Like Room: The Cornell Box
The image-processing world, it turns out, has a close analogue to the Turing test, called “the Cornell box,” which is a small model of a room with one red wall and one green wall (the others are white) and two blocks sitting inside it.

Michael Lewis

The subject picked one of the bags at random and, without glancing inside the bag, began to pull chips out of it, one at a time. After extracting each chip, he’d give the psychologists his best guess of the odds that the bag he was holding was filled with mostly red, or mostly white, chips. The beauty of the experiment was that there was a correct answer to the question: What is the probability that I am holding the bag of mostly red chips? It was provided by a statistical formula called Bayes’s theorem (after Thomas Bayes, who, strangely, left the formula for others to discover in his papers after his death, in 1761). Bayes’s rule allowed you to calculate the true odds, after each new chip was pulled from it, that the book bag in question was the one with majority white, or majority red, chips. Before any chips had been withdrawn, those odds were 50:50—the bag in your hands was equally likely to be either majority red or majority white.
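The bag experiment can be scored chip by chip with Bayes's rule in odds form. This is a minimal Python sketch. It assumes the two bags are mirror images (75 percent red and 25 percent white versus the reverse) and that draws are independent, as with replacement. Neither detail is stated in the passage above. The function name is hypothetical.

```python
def prob_mostly_red(draws, p_red_in_red_bag=0.75, prior=0.5):
    """Posterior probability that the bag is the mostly-red one.

    draws: a string of 'R' (red chip) and 'W' (white chip).
    Assumes independent draws and mirror-image bag proportions.
    """
    p_red_in_white_bag = 1 - p_red_in_red_bag
    odds = prior / (1 - prior)  # 1.0 before any chips: 50:50
    for chip in draws:
        if chip == "R":
            odds *= p_red_in_red_bag / p_red_in_white_bag
        else:
            odds *= (1 - p_red_in_red_bag) / (1 - p_red_in_white_bag)
    return odds / (1 + odds)


print(prob_mostly_red(""))      # 0.5 before any chips are drawn
print(prob_mostly_red("RRWR"))  # about 0.9
```

With mirror-image bags, each red chip triples the odds and each white chip divides them by three. Only the difference between the red and white counts matters.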

William Poundstone

Smith holds or has applied for patents covering such exotica as a computer made out of DNA, theft-proof credit cards, a 3-D vision process, and a magnetic catapult that could be used for launching satellites. In December 2000, with the Supreme Court deciding a bitterly contested presidency, Smith completed an article purporting to demonstrate the superiority of a system that no one had taken seriously, range voting. He began with an idea for comparing the merits of different voting systems, using a measure called Bayesian regret. The "Bayes" part refers to eighteenth-century English mathematician Thomas Bayes, a pioneer of probability theory. "Bayesian regret" is a statistical term that Smith defines as "expected avoidable human unhappiness." In other words, Smith tried to gauge how voting systems fail the voters by electing candidates other than the one who would have resulted in the greatest overall satisfaction. To do this, he ran a large series of computer simulations of elections. In each of his simulations, virtual voters were assigned utilities (degrees of happiness, measured numerically) for simulated candidates.
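A simulation in the spirit of the one described can be sketched briefly. Smith's exact setup is not given here, so the following are assumptions:

- Each virtual voter's utility for each candidate is drawn uniformly from 0 to 1.
- Voters are honest.
- A plurality voter picks a favourite.
- A range voter scores candidates by rescaling their own utilities to run from 0 to 1.

Regret is the shortfall in summed utility between the best candidate and the one actually elected, averaged over many simulated elections. All function names are hypothetical.

```python
import random


def plurality_winner(utilities):
    """Each voter votes only for their favourite candidate."""
    n_cand = len(utilities[0])
    tally = [0] * n_cand
    for u in utilities:
        tally[max(range(n_cand), key=lambda c: u[c])] += 1
    return max(range(n_cand), key=lambda c: tally[c])


def range_winner(utilities):
    """Each voter scores every candidate from 0 to 1, rescaled from their utilities."""
    n_cand = len(utilities[0])
    totals = [0.0] * n_cand
    for u in utilities:
        lo, hi = min(u), max(u)
        for c in range(n_cand):
            totals[c] += (u[c] - lo) / (hi - lo) if hi > lo else 0.0
    return max(range(n_cand), key=lambda c: totals[c])


def bayesian_regret(method, n_elections=2000, n_voters=25, n_cand=4, seed=1):
    """Average shortfall in total utility versus the best possible candidate."""
    rng = random.Random(seed)  # same seed, so every method faces the same elections
    regret = 0.0
    for _ in range(n_elections):
        utilities = [[rng.random() for _ in range(n_cand)] for _ in range(n_voters)]
        social = [sum(u[c] for u in utilities) for c in range(n_cand)]
        regret += max(social) - social[method(utilities)]
    return regret / n_elections


print(bayesian_regret(plurality_winner), bayesian_regret(range_winner))
```

Because both methods see identical simulated electorates, the comparison is paired. In this toy model, honest range voting shows the lower regret.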

Cory Doctorow

No one could tell which of the Internet's packets were Xnet and which ones were just plain old banking and e-commerce and other encrypted communication. You couldn't find out who was tying the Xnet, let alone who was using the Xnet. But what about Dad's "Bayesian statistics"? I'd played with Bayesian math before. Darryl and I once tried to write our own better spam filter and when you filter spam, you need Bayesian math. Thomas Bayes was an 18th-century British mathematician that no one cared about until a couple hundred years after he died, when computer scientists realized that his technique for statistically analyzing mountains of data would be super-useful for the modern world's info-Himalayas. Here's some of how Bayesian stats work. Say you've got a bunch of spam. You take every word that's in the spam and count how many times it appears.
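The counting step can be sketched in Python with a few hypothetical messages. The sketch counts words in spam and in ordinary mail, then turns the counts into a per-word spam probability. Two assumptions are built in: spam and non-spam are treated as equally likely a priori, and add-one (Laplace) smoothing stops unseen words from getting a probability of zero. The names and messages are illustrative only.

```python
from collections import Counter


def word_counts(messages):
    """Count how many times each word appears across a list of messages."""
    counts = Counter()
    for text in messages:
        counts.update(text.lower().split())
    return counts


def spamminess(word, spam_counts, ham_counts):
    """P(spam | word) with add-one smoothing and equal prior odds (assumed)."""
    vocabulary = len(set(spam_counts) | set(ham_counts))
    p_word_spam = (spam_counts[word] + 1) / (sum(spam_counts.values()) + vocabulary)
    p_word_ham = (ham_counts[word] + 1) / (sum(ham_counts.values()) + vocabulary)
    return p_word_spam / (p_word_spam + p_word_ham)


spam = ["win cash now", "cash prize win"]      # hypothetical spam
ham = ["lunch at noon", "see you at lunch"]    # hypothetical ordinary mail
spam_counts, ham_counts = word_counts(spam), word_counts(ham)
print(spamminess("cash", spam_counts, ham_counts))   # above 0.5: spammy
print(spamminess("lunch", spam_counts, ham_counts))  # below 0.5: looks legitimate
```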

Bruce Frey

What about the accuracy of a negative result? Of the 9,102 women who will score negative on the screening, 12 actually have cancer. This is relatively small, roughly 1/10 of 1 percent, but the testing will miss those people altogether, and they will not receive treatment. Why It Works Medical screening accuracy uses a specific application of a generalized approach to conditional probability attributed to Thomas Bayes, a philosopher and mathematician in the 1700s. "If this, then what are the chances that..." is a conditional probability question. Bayes's approach to conditional probabilities was to look at the naturally occurring frequencies of events. The basic formula for estimating the chance that one has a disease if one has a positive test result is:

$$\text{Chance of disease given a positive result} = \frac{\text{number of true positives}}{\text{number of true positives} + \text{number of false positives}}$$

Expressed as conditional probabilities, the formula is:

$$P(\text{disease} \mid \text{positive}) = \frac{P(\text{positive} \mid \text{disease})\,P(\text{disease})}{P(\text{positive} \mid \text{disease})\,P(\text{disease}) + P(\text{positive} \mid \text{no disease})\,P(\text{no disease})}$$

To answer the all-important question in our breast cancer example ("If a woman scores a positive test result, how likely is she to have breast cancer?")
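The natural-frequency reasoning can be written as a small Python helper that works from the four cells of a screening table. Only the negative-side counts come from the passage: 12 cancers among 9,102 negative results, so 9,090 true negatives. The positive-side counts are hypothetical, chosen for illustration. The function name is also hypothetical.

```python
def screening_summary(true_pos, false_pos, false_neg, true_neg):
    """Natural-frequency view of a screening test's accuracy."""
    positives = true_pos + false_pos
    negatives = false_neg + true_neg
    return {
        # chance of disease given a positive result
        "p_disease_if_positive": true_pos / positives,
        # chance of disease despite a negative result
        "p_disease_if_negative": false_neg / negatives,
    }


# Negative counts from the passage; positive counts are hypothetical.
summary = screening_summary(true_pos=90, false_pos=810, false_neg=12, true_neg=9090)
print(summary["p_disease_if_negative"])  # about 0.0013, i.e. roughly 0.13 percent
print(summary["p_disease_if_positive"])  # 0.1 with these made-up positive counts
```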

Aaron Brown, Eric Kim

Second, we know nothing about the accuracy of this statement in particular; we only make a claim about the long-term accuracy of lots of statements. This is how we turn an event that has already happened—drawing nine red marbles out of 10—into a hypothetical coin-flip gambling game that can be repeated indefinitely. The main alternative to frequentist statistics today is the Bayesian view. It is named for Thomas Bayes, an eighteenth-century theorist, but it was Pierre-Simon Laplace who put forth the basic ideas. It was not until the twentieth century, however, that researchers, including Richard Cox and Bruno de Finetti, created the modern formulation. In the Bayesian view of the urn, you must have some prior belief about the number of red marbles in the urn. For example, you might believe that any number from 0 to 100 red marbles is equally likely.
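The Bayesian treatment of the urn can be computed directly. This is a minimal Python sketch under two assumptions: the urn holds 100 marbles, and the ten draws are made with replacement. It starts from the uniform prior described above (every count from 0 to 100 equally likely) and updates on nine red marbles out of ten. The function name is hypothetical.

```python
from math import comb


def urn_posterior(red_drawn=9, draws=10, urn_size=100):
    """Posterior over the number of red marbles, starting from a uniform prior."""
    prior = [1 / (urn_size + 1)] * (urn_size + 1)  # 0..urn_size equally likely
    likelihood = [
        comb(draws, red_drawn)
        * (r / urn_size) ** red_drawn
        * (1 - r / urn_size) ** (draws - red_drawn)
        for r in range(urn_size + 1)
    ]
    unnormalised = [p * l for p, l in zip(prior, likelihood)]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]


post = urn_posterior()
most_likely = max(range(len(post)), key=post.__getitem__)
print(most_likely)     # 90
print(sum(post[:50]))  # small: posterior chance of fewer than 50 red marbles
```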

Niall Ferguson

In 1738 the Swiss mathematician Daniel Bernoulli proposed that ‘The value of an item must not be based on its price, but rather on the utility that it yields’, and that the ‘utility resulting from any small increase in wealth will be inversely proportionate to the quantity of goods previously possessed’ - in other words $100 is worth more to someone on the median income than to a hedge fund manager. 6. Inference. In his ‘Essay Towards Solving a Problem in the Doctrine of Chances’ (published posthumously in 1764), Thomas Bayes set himself the following problem: ‘Given the number of times in which an unknown event has happened and failed; Required the chance that the probability of its happening in a single trial lies somewhere between any two degrees of probability that can be named.’ His resolution of the problem - ‘The probability of any event is the ratio between the value at which an expectation depending on the happening of the event ought to be computed, and the chance of the thing expected upon it’s [sic] happening’ - anticipates the modern formulation that expected utility is the probability of an event times the payoff received in case of that event.18 In short, it was not merchants but mathematicians who were the true progenitors of modern insurance.
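Bayes's problem, as quoted above, has a closed-form answer when the unknown probability starts uniformly distributed, the assumption Bayes himself made. After `happened` successes and `failed` failures, the posterior is Beta(happened + 1, failed + 1). For whole-number counts, its cumulative distribution reduces to a finite binomial sum. A minimal Python sketch with hypothetical function names:

```python
from math import comb


def beta_cdf_int(x, a, b):
    """CDF of Beta(a, b) at x for positive integers a, b (binomial-sum identity)."""
    m = a + b - 1
    return sum(comb(m, j) * x**j * (1 - x) ** (m - j) for j in range(a, m + 1))


def bayes_interval_probability(happened, failed, low, high):
    """Chance that the unknown probability lies between `low` and `high`.

    Assumes a uniform prior, so the posterior is Beta(happened + 1, failed + 1).
    """
    a, b = happened + 1, failed + 1
    return beta_cdf_int(high, a, b) - beta_cdf_int(low, a, b)


print(bayes_interval_probability(0, 0, 0.2, 0.5))  # about 0.3: no data, uniform
print(bayes_interval_probability(9, 1, 0.5, 1.0))  # about 0.994
```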

Carol Alexander

For instance, if we threw a fair die 600 times we would expect to get a five 100 times. Thus, because we observe that there is 1 chance in 6 of getting a five when a fair die is thrown, we say that the probability of this event is 1/6. But long before the relative frequentist theory came to dominate our approach to probability and statistics, a more general Bayesian approach to probability and statistics had been pioneered by Thomas Bayes (1702–1761). The classical approach is based on objective information culled from experimental observations, but Bayes allowed subjective assessments of probabilities to be made, calling these assessments the prior beliefs. In fact, the classical approach is just a simple case of Bayesian probability and statistics, where there is no subjective information and so the prior distribution is uniform.
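One concrete sense of the last claim: with a uniform prior, the most probable value of the unknown probability (the posterior mode) equals the observed relative frequency. The posterior mean, (k + 1)/(n + 2), differs slightly. This minimal Python sketch uses the passage's die example, 100 fives in 600 throws, and searches a grid. The function name and grid size are arbitrary choices.

```python
import math


def posterior_mode(successes, trials, grid_size=6001):
    """Most probable success probability under a uniform prior (grid search)."""
    best_p, best_log = None, -math.inf
    for i in range(1, grid_size - 1):  # skip p = 0 and p = 1
        p = i / (grid_size - 1)
        # A uniform prior adds the same constant everywhere, so it is omitted.
        log_post = successes * math.log(p) + (trials - successes) * math.log(1 - p)
        if log_post > best_log:
            best_p, best_log = p, log_post
    return best_p


print(posterior_mode(100, 600))  # about 0.1667, the relative frequency 1/6
```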

Cory Doctorow

Brings up the grass a treat, as you can see." He gestured at the rolling lawns to one side of the ancient, mossy, fenced-in headstones. "Nonconformist cemetery," he went on, leading me deeper. "Unconsecrated ground. Lots of interesting folks buried here. You got your writers: like John Bunyan who wrote Pilgrim's Progress. You got your philosophers, like Thomas Hardy. And some real maths geniuses, like old Thomas Bayes --" He pointed to a low, mossy tomb. "He invented a branch of statistics that got built into every spam filter, a couple hundred years after they buried him." He sat down on a bench. It was after mid-day now, and only a few people were eating lunch around us, none close enough to overhear us. "It's a grand life as a gentleman adventurer," he said. "Nothing to do all day but pluck choice morsels out of the bin and read the signboards the local historical society puts up in the graveyard."

Pedro Domingos

Once we know how to do all these things, we’ll be ready to learn the Bayesian way. For Bayesians, learning is “just” another application of Bayes’ theorem, with whole models as the hypotheses and the data as the evidence: as you see more data, some models become more likely and some less, until ideally one model stands out as the clear winner. Bayesians have invented fiendishly clever kinds of models. So let’s get started. Thomas Bayes was an eighteenth-century English clergyman who, without realizing it, became the center of a new religion. You may well ask how that could happen, until you notice that it happened to Jesus, too: Christianity as we know it was invented by Saint Paul, while Jesus saw himself as the pinnacle of the Jewish faith. Similarly, Bayesianism as we know it was invented by Pierre-Simon de Laplace, a Frenchman who was born five decades after Bayes.
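Treating "whole models as the hypotheses" can be illustrated with two hypothetical models of a coin. One says the coin is fair. The other says its bias is unknown and equally likely to be anything from 0 to 1. Each model is scored by how probable it makes a specific flip sequence, averaging over its own parameter where it has one. In this minimal Python sketch, 5 heads in 10 flips favours the fair model, while 9 in 10 or 90 in 100 favours the biased one more and more strongly. The function name is hypothetical.

```python
from math import comb


def model_posteriors(heads, flips, prior_fair=0.5):
    """Posterior probability of two whole models given a specific flip sequence."""
    like_fair = 0.5 ** flips
    # Averaging p^h (1-p)^(n-h) over a uniform prior on p gives 1 / ((n+1) * C(n, h)).
    like_biased = 1 / ((flips + 1) * comb(flips, heads))
    evidence = prior_fair * like_fair + (1 - prior_fair) * like_biased
    p_fair = prior_fair * like_fair / evidence
    return {"fair": p_fair, "unknown_bias": 1 - p_fair}


for heads, flips in [(5, 10), (9, 10), (90, 100)]:
    print(heads, flips, model_posteriors(heads, flips))
```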

Robert Wachter

This is the part of diagnostic reasoning that beginners find most vexing, since they lack the foundational knowledge to understand why their teacher focused so intently on one nugget of information and all but ignored others that, to the novice, seemed equally crucial. How do the great diagnosticians make such choices? We now recognize this as a relatively intuitive version of Bayes’ theorem. Developed by the eighteenth-century British theologian-turned-mathematician Thomas Bayes, this theorem (often ignored by students because it is taught to them with the dryness of a Passover matzo) is the linchpin of clinical reasoning. In essence, Bayes’ theorem says that any medical test must be interpreted from two perspectives. The first: How accurate is the test—that is, how often does it give right or wrong answers? The second: How likely is it that this patient has the disease the test is looking for?
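The two perspectives combine in a single formula. This minimal Python sketch holds a hypothetical test fixed at 90 percent sensitivity and 90 percent specificity and varies only the pretest probability. The same positive result means roughly an 8 percent chance of disease, a 50 percent chance, or a 90 percent chance, depending on how likely the disease was to begin with. The function name is hypothetical.

```python
def post_test_probability(pretest, sensitivity, specificity):
    """Probability of disease after a positive result (Bayes' theorem)."""
    true_pos = sensitivity * pretest
    false_pos = (1 - specificity) * (1 - pretest)
    return true_pos / (true_pos + false_pos)


for pretest in (0.01, 0.10, 0.50):
    print(pretest, round(post_test_probability(pretest, 0.90, 0.90), 3))
```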

Steven Drobny

We shrink my expected returns towards zero rather than towards some equilibrium model forecast, the latter of which is more appropriate given our macro focus. We then input that adjusted expected return into the Titanic funnel, which assesses how much it will lose in a variety of cataclysms, giving me the recommended position. It is important to note that I am not putting trades on just to achieve the Titanic loss number. I am just looking for mispricings.
Bayesian Methods
The term Bayesian refers to the work of Thomas Bayes, who proved a specific case of the now eponymous theorem, which was published in 1763, after his death in 1761. The Bayesian interpretation of probability can be seen as a form of logic that allows for analysis of uncertain statements. To evaluate the probability of a hypothesis, Bayes’ theorem compares probabilities before and after the existence of new data. Unlike other methods for analyzing hypotheses, which attempt to reject or accept a statement, the Bayesian view seeks to assign dynamic probabilities that depend on the existence of relevant information.
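"Shrinking expected returns towards zero" has a standard Bayesian reading, though the passage does not say this is the speaker's method. Put a normal prior centred at zero on the true expected return, then combine it with a noisy sample mean. In this minimal Python sketch, all parameter values are hypothetical. The estimate is pulled strongly towards zero when data are scarce and less so as observations accumulate. The function name is hypothetical.

```python
def shrink_toward_zero(sample_mean, sample_sd, n_obs, prior_sd):
    """Posterior mean of an expected return under a normal prior centred at zero."""
    noise_var = sample_sd ** 2 / n_obs  # variance of the sample mean
    weight = prior_sd ** 2 / (prior_sd ** 2 + noise_var)  # share given to the data
    return weight * sample_mean


print(shrink_toward_zero(0.10, 0.20, 4, 0.05))    # about 0.02: heavy shrinkage
print(shrink_toward_zero(0.10, 0.20, 100, 0.05))  # about 0.086: data dominate
```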

Daniel Kahneman

And if you believe that there is a 30% chance that candidate X will be elected president, and an 80% chance that he will be reelected if he wins the first time, then you must believe that the chances that he will be elected twice in a row are 24%. The relevant “rules” for cases such as the Tom W problem are provided by Bayesian statistics. This influential modern approach to statistics is named after an English minister of the eighteenth century, the Reverend Thomas Bayes, who is credited with the first major contribution to a large problem: the logic of how people should change their mind in the light of evidence. Bayes’s rule specifies how prior beliefs (in the examples of this chapter, base rates) should be combined with the diagnosticity of the evidence, the degree to which it favors the hypothesis over the alternative. For example, if you believe that 3% of graduate students are enrolled in computer science (the base rate), and you also believe that the description of Tom W is 4 times more likely for a graduate student in that field than in other fields, then Bayes’s rule says you must believe that the probability that Tom W is a computer scientist is now 11%.
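Both numbers in this passage can be reproduced. The election figure is the product rule: P(elected twice) = P(elected) × P(reelected | elected) = 0.30 × 0.80. The Tom W figure follows from Bayes's rule in odds form: prior odds 3:97, multiplied by a likelihood ratio of 4, give 12:97, or 12/109 ≈ 11%. A minimal Python sketch with a hypothetical function name:

```python
def bayes_odds(base_rate, likelihood_ratio):
    """Posterior probability from a base rate and a likelihood ratio (odds form)."""
    prior_odds = base_rate / (1 - base_rate)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)


print(0.30 * 0.80)           # elected twice in a row: about 0.24
print(bayes_odds(0.03, 4))   # Tom W is a computer scientist: about 0.11
```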

Nassim Nicholas Taleb

There were two main sources of technical knowledge and innovation in the nineteenth and early twentieth centuries: the hobbyist and the English rector, both of whom were generally in barbell situations. An extraordinary proportion of work came out of the rector, the English parish priest with no worries, erudition, a large or at least comfortable house, domestic help, a reliable supply of tea and scones with clotted cream, and an abundance of free time. And, of course, optionality. The enlightened amateur, that is. The Reverends Thomas Bayes (as in Bayesian probability) and Thomas Malthus (Malthusian overpopulation) are the most famous. But there are many more surprises, cataloged in Bill Bryson’s Home, in which the author found ten times more vicars and clergymen leaving recorded traces for posterity than scientists, physicists, economists, and even inventors. In addition to the previous two giants, I randomly list contributions by country clergymen: Rev.

Daniel J. Levitin

Very unlikely. So this shows we’re capable of using base rate information when events are extremely unlikely. It’s when they’re only mildly unlikely that our brains freeze up. Organizing our decisions requires that we combine the base rate information with other relevant diagnostic information. This type of reasoning was discovered in the eighteenth century by the mathematician and Presbyterian minister Thomas Bayes, and bears his name: Bayes’s rule. Bayes’s rule allows us to refine estimates. For example, we read that roughly half of marriages end in divorce. But we can refine that estimate if we have additional information, such as the age, religion, or location of the people involved, because the 50% figure holds only for the aggregate of all people. Some subpopulations of people have higher divorce rates than others.
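The point that an aggregate 50 percent can hide different subgroup rates is the law of total probability. Bayes's rule runs it in reverse. The subgroups and rates in this minimal Python sketch are entirely hypothetical, chosen only so that the aggregate comes out at 50 percent. Function names are hypothetical too.

```python
def overall_rate(groups):
    """Aggregate rate as a share-weighted average of subgroup rates."""
    return sum(share * rate for share, rate in groups.values())


def group_given_divorce(groups, name):
    """Bayes' rule: the fraction of all divorces that come from one subgroup."""
    share, rate = groups[name]
    return share * rate / overall_rate(groups)


# Hypothetical (population share, divorce rate) pairs, not real data.
groups = {"group_a": (0.4, 0.35), "group_b": (0.6, 0.60)}
print(overall_rate(groups))                    # about 0.50 in aggregate
print(groups["group_a"][1])                    # refined estimate for group_a: 0.35
print(group_given_divorce(groups, "group_a"))  # about 0.28 of divorces
```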

The Stuff of Thought: Language as a Window Into Human Nature
Steven Pinker

The world is a tissue of causes and effects that criss and cross in tangled patterns. The embarrassments for Hume’s two theories of causation (conjunction and counterfactuals) can be diagrammed as a family of networks in which the lines fan in or out or loop around, as in the diagram on the following page. One solution to the webbiness of causation is a technique in artificial intelligence called Causal Bayes Networks.120 (They are named for Thomas Bayes, whose eponymous theorem shows how to calculate the probability of some condition from its prior plausibility and the likelihood that it led to some observed symptoms.) A modeler chooses a set of variables (amount of coffee drunk, amount of exercise, presence of heart disease, and so on), draws arrows between causes and their effects, and labels each arrow with a number representing the strength of the causal influence (the increase or decrease in the likelihood of the effect, given the presence of the cause).
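A tiny network with the passage's variables shows how such a model answers questions in both directions. In this minimal Python sketch, arrows run from coffee and from exercise into heart disease. One assumption departs from the passage: instead of a single strength number per arrow, each effect gets a conditional probability table, the usual representation in Bayes networks. Every probability is hypothetical. Queries are answered by summing the joint distribution.

```python
from itertools import product

# Hypothetical probabilities, for illustration only.
P_COFFEE = 0.6    # P(heavy coffee drinking)
P_EXERCISE = 0.5  # P(regular exercise)
P_HEART = {       # P(heart disease | coffee, exercise)
    (True, True): 0.05,
    (True, False): 0.15,
    (False, True): 0.03,
    (False, False): 0.10,
}


def joint(coffee, exercise, heart):
    """Joint probability, factored along the arrows coffee -> heart <- exercise."""
    p = P_COFFEE if coffee else 1 - P_COFFEE
    p *= P_EXERCISE if exercise else 1 - P_EXERCISE
    p_h = P_HEART[(coffee, exercise)]
    return p * (p_h if heart else 1 - p_h)


def query(target, evidence):
    """P(target is True | evidence), by enumerating every combination of values."""
    names = ("coffee", "exercise", "heart")
    num = den = 0.0
    for values in product((True, False), repeat=3):
        world = dict(zip(names, values))
        if any(world[k] != v for k, v in evidence.items()):
            continue
        p = joint(*values)
        den += p
        if world[target]:
            num += p
    return num / den


print(query("heart", {}))                 # about 0.086
print(query("coffee", {"heart": True}))   # about 0.70, up from 0.6
```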

Jiawei Han, Micheline Kamber, Jian Pei

Naïve Bayesian classifiers assume that the effect of an attribute value on a given class is independent of the values of the other attributes. This assumption is called class-conditional independence. It is made to simplify the computations involved and, in this sense, is considered “naïve.” Section 8.3.1 reviews basic probability notation and Bayes’ theorem. In Section 8.3.2 you will learn how to do naïve Bayesian classification.
8.3.1. Bayes’ Theorem
Bayes’ theorem is named after Thomas Bayes, a nonconformist English clergyman who did early work in probability and decision theory during the 18th century. Let X be a data tuple. In Bayesian terms, X is considered “evidence.” As usual, it is described by measurements made on a set of n attributes. Let H be some hypothesis such as that the data tuple X belongs to a specified class C. For classification problems, we want to determine P(H|X), the probability that the hypothesis H holds given the “evidence” or observed data tuple X.
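Class-conditional independence lets the classifier score each class C as P(C) times the product of P(x_k | C) over the attributes. P(X) is the same for every class, so it is dropped. This minimal Python sketch uses six hypothetical training tuples with two categorical attributes (student, credit rating) and no smoothing. Function names and data are illustrative, not taken from the book.

```python
from collections import Counter, defaultdict


def train(rows):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(label for _, label in rows)
    value_counts = defaultdict(Counter)  # (attribute index, class) -> value counts
    for attributes, label in rows:
        for k, value in enumerate(attributes):
            value_counts[(k, label)][value] += 1
    return class_counts, value_counts


def classify(x, class_counts, value_counts):
    """Pick the class C maximising P(C) * prod_k P(x_k | C)."""
    total = sum(class_counts.values())
    scores = {}
    for label, n_c in class_counts.items():
        score = n_c / total  # prior P(C)
        for k, value in enumerate(x):
            score *= value_counts[(k, label)][value] / n_c  # P(x_k | C)
        scores[label] = score
    return max(scores, key=scores.get), scores


# Hypothetical tuples: ((student, credit_rating), buys)
rows = [
    (("yes", "fair"), "yes"), (("yes", "excellent"), "yes"),
    (("no", "fair"), "yes"), (("no", "excellent"), "no"),
    (("no", "fair"), "no"), (("yes", "fair"), "yes"),
]
model = train(rows)
print(classify(("no", "fair"), *model))   # "no" wins: 1/6 versus 1/8
print(classify(("yes", "fair"), *model))  # "yes" wins
```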