statistical model

291 results


Natural Language Processing with Python and spaCy by Yuli Vasiliev

Bayesian statistics, computer vision, data science, database schema, Easter island, en.wikipedia.org, loose coupling, natural language processing, Skype, statistical model

Statistical language modeling is vital to many natural language processing tasks, such as natural language generation and natural language understanding. For this reason, a statistical model lies at the heart of virtually any NLP application. Figure 1-4 provides a conceptual depiction of how an NLP application uses a statistical model.

Figure 1-4: A high-level conceptual view of an NLP application's architecture

The application interacts with spaCy's API, which abstracts the underlying statistical model. The statistical model contains information like word vectors and linguistic annotations. The linguistic annotations might include features such as part-of-speech tags and syntactic annotations. The statistical model also includes a set of machine learning algorithms that can extract the necessary pieces of information from the stored data.
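A minimal sketch of that layering in practice, assuming the small English model en_core_web_sm has already been installed (installation is covered just below); the sample sentence is made up for illustration:

    import spacy

    # Load a pretrained statistical model; spaCy's API hides its internals.
    nlp = spacy.load("en_core_web_sm")

    doc = nlp("The statistical model assigns annotations to every token.")
    for token in doc:
        # Part-of-speech tags and syntactic annotations come from the model.
        print(token.text, token.pos_, token.dep_, token.head.text)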

If you decide to upgrade your installed spaCy package to the latest version, you can do this using the following pip command:

$ pip install -U spacy

Installing Statistical Models for spaCy

The spaCy installation doesn't include statistical models that you'll need when you start using the library. The statistical models contain knowledge collected about the particular language from a set of sources. You must separately download and install each model you want to use. Several pretrained statistical models are available for different languages. For English, for example, the following models are available for download from spaCy's website: en_core_web_sm, en_core_web_md, en_core_web_lg, and en_vectors_web_lg.
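A sketch of the separate download step, using one of the model names listed above; spacy.cli.download is the programmatic counterpart of the python -m spacy download command and needs network access the first time it runs:

    import spacy
    from spacy.cli import download

    # Fetch one of the pretrained English models named above (run once).
    download("en_core_web_sm")

    # Afterwards the model can be loaded by name.
    nlp = spacy.load("en_core_web_sm")
    print(nlp.meta["name"], nlp.meta["version"])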

Figure 1-1 provides a high-level depiction of the model training stage.

Figure 1-1: Generating a statistical model with a machine learning algorithm using a large volume of text data as input

Your model processes large volumes of text data to understand which words share characteristics; then it creates word vectors for those words that reflect those shared characteristics. As you'll learn in "What Is a Statistical Model in NLP?" on page 8, such a word vector space is not the only component of a statistical model built for NLP. The actual structure is typically more complicated, providing a way to extract linguistic features for each word depending on the context in which it appears.


pages: 50 words: 13,399

The Elements of Data Analytic Style by Jeff Leek

correlation does not imply causation, data science, Netflix Prize, p-value, pattern recognition, Ronald Coase, statistical model, TED Talk

As an example, suppose you are analyzing data to identify a relationship between geography and income in a city, but all the data from suburban neighborhoods are missing.

6. Statistical modeling and inference

The central goal of statistical modeling is to use a small subsample of individuals to say something about a larger population. The reasons for taking this sample are often the cost or difficulty of measuring data on the whole population. The subsample is identified with probability (Figure 6.1: Probability is used to obtain a sample). Statistical modeling and inference are used to try to generalize what we see in the sample to the population. Inference involves two separate steps, first obtaining a best estimate for what we expect in the population (Figure 6.2).
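A small illustration of that sampling step, using made-up numbers: draw a probability sample from a synthetic "population" and use the sample mean plus a standard 95 per cent interval as the best estimate for the population value.

    import numpy as np

    rng = np.random.default_rng(0)

    # A synthetic population we normally could not measure in full.
    population = rng.normal(loc=50_000, scale=12_000, size=1_000_000)

    # Probability sampling: every individual has the same chance of selection.
    sample = rng.choice(population, size=500, replace=False)

    estimate = sample.mean()                        # best estimate for the population mean
    se = sample.std(ddof=1) / np.sqrt(len(sample))  # its standard error
    print(f"estimate: {estimate:,.0f}")
    print(f"95% interval: {estimate - 1.96*se:,.0f} to {estimate + 1.96*se:,.0f}")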

Before these steps, it is critical to tidy, check, and explore the data to identify dataset-specific conditions that may violate your model assumptions.

6.12.4 Assuming the statistical model fit is good

Once a statistical model is fit to data it is critical to evaluate how well the model describes the data. For example, with a linear regression analysis it is critical to plot the best fit line over the scatterplot of the original data, plot the residuals, and evaluate whether the estimates are reasonable. It is ok to fit only one statistical model to a data set to avoid data dredging, as long as you carefully report potential flaws with the model.

6.12.5 Drawing conclusions about the wrong population

When you perform inference, the goal is to make a claim about the larger population you have sampled from.
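A minimal sketch of the fit check described in 6.12.4 above, on synthetic data: fit a straight line, then look at the residuals rather than stopping at the fitted coefficients (plotting is left as a comment to keep the example dependency-free).

    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.uniform(0, 10, size=200)
    y = 2.0 + 0.5 * x + rng.normal(scale=1.0, size=200)  # synthetic data

    # Fit a simple linear regression y = a + b*x.
    b, a = np.polyfit(x, y, deg=1)
    residuals = y - (a + b * x)

    # Checks: are the estimates reasonable, and do the residuals look structureless?
    print(f"intercept={a:.2f}, slope={b:.2f}")
    print(f"residual mean={residuals.mean():.3f}, residual sd={residuals.std(ddof=1):.3f}")
    # e.g. plt.scatter(x, y); plt.plot(x, a + b*x); plt.scatter(x, residuals)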

Histograms and boxplots are good ways to check that the measurements you observe fall on the right scale.

4.10 Common mistakes

4.10.1 Failing to check the data at all

A common temptation in data analysis is to load the data and immediately leap to statistical modeling. Checking the data before analysis is a critical step in the process.

4.10.2 Encoding factors as quantitative numbers

If a scale is qualitative, but the variable is encoded as 1, 2, 3, etc., then statistical modeling functions may interpret this variable as a quantitative variable and incorrectly order the values.

4.10.3 Not making sufficient plots

A common mistake is to only make tabular summaries of the data when doing data checking.
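A short pandas sketch of the scale and encoding checks described above; the column names and values are hypothetical.

    import pandas as pd

    df = pd.DataFrame({
        "income": [32_000, 45_000, 51_000, 39_000, 1_000_000],  # last value looks suspect
        "region": [1, 2, 3, 2, 1],  # qualitative scale stored as numbers
    })

    # Quick scale check: summaries (or df["income"].hist() / boxplots) reveal outliers.
    print(df["income"].describe())

    # Re-encode the qualitative variable so models do not treat it as quantitative.
    df["region"] = df["region"].astype("category")
    print(df.dtypes)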


pages: 442 words: 94,734

The Art of Statistics: Learning From Data by David Spiegelhalter

Abraham Wald, algorithmic bias, Anthropocene, Antoine Gombaud: Chevalier de Méré, Bayesian statistics, Brexit referendum, Carmen Reinhart, Charles Babbage, complexity theory, computer vision, confounding variable, correlation coefficient, correlation does not imply causation, dark matter, data science, deep learning, DeepMind, Edmond Halley, Estimating the Reproducibility of Psychological Science, government statistician, Gregor Mendel, Hans Rosling, Higgs boson, Kenneth Rogoff, meta-analysis, Nate Silver, Netflix Prize, Northpointe / Correctional Offender Management Profiling for Alternative Sanctions, p-value, placebo effect, probability theory / Blaise Pascal / Pierre de Fermat, publication bias, randomized controlled trial, recommendation engine, replication crisis, self-driving car, seminal paper, sparse data, speech recognition, statistical model, sugar pill, systematic bias, TED Talk, The Design of Experiments, The Signal and the Noise by Nate Silver, The Wisdom of Crowds, Thomas Bayes, Thomas Malthus, Two Sigma

Even with the Bradford Hill criteria outlined above, statisticians are generally reluctant to attribute causation unless there has been an experiment, although computer scientist Judea Pearl and others have made great progress in setting out the principles for building causal regression models from observational data.2

Table 5.2 Correlations between heights of adult children and parent of the same gender, and gradients of the regression of the offspring's on the parent's height:
Mothers and daughters: Pearson correlation 0.31, regression gradient 0.33
Fathers and sons: Pearson correlation 0.39, regression gradient 0.45

Regression Lines Are Models

The regression line we fitted between fathers' and sons' heights is a very basic example of a statistical model. The US Federal Reserve define a model as a 'representation of some aspect of the world which is based on simplifying assumptions': essentially some phenomenon will be represented mathematically, generally embedded in computer software, in order to produce a simplified 'pretend' version of reality.3 Statistical models have two main components. First, a mathematical formula that expresses a deterministic, predictable component, for example the fitted straight line that enables us to make a prediction of a son's height from his father's.
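The two quantities in Table 5.2 can be computed directly; a sketch on made-up parent/offspring heights (not Galton's data) shows how the Pearson correlation and the regression gradient are related but distinct.

    import numpy as np

    rng = np.random.default_rng(2)
    father = rng.normal(175, 7, size=1_000)                  # hypothetical heights, cm
    son = 86 + 0.45 * father + rng.normal(0, 5, size=1_000)  # hypothetical relationship

    r = np.corrcoef(father, son)[0, 1]                       # Pearson correlation
    gradient = np.polyfit(father, son, deg=1)[0]             # slope of regression of son on father
    print(f"correlation={r:.2f}, regression gradient={gradient:.2f}")
    # gradient = r * sd(son) / sd(father), so the two differ unless the sds are equal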

We have data that can help us answer some of these questions, with which we have already done some exploratory plotting and drawn some informal conclusions about an appropriate statistical model. But we now come to a formal aspect of the Analysis part of the PPDAC cycle, generally known as hypothesis testing. What Is a ‘Hypothesis’? A hypothesis can be defined as a proposed explanation for a phenomenon. It is not the absolute truth, but a provisional, working assumption, perhaps best thought of as a potential suspect in a criminal case. When discussing regression in Chapter 5, we saw the claim that observation = deterministic model + residual error. This represents the idea that statistical models are mathematical representations of what we observe, which combine a deterministic component with a ‘stochastic’ component, the latter representing unpredictability or random ‘error’, generally expressed in terms of a probability distribution.

A two-sided test would be appropriate for a null hypothesis that a treatment effect, say, is exactly zero, and so both positive and negative estimates would lead to the null being rejected.

one-tailed and two-tailed P-values: those corresponding to one-sided and two-sided tests.

over-fitting: building a statistical model that is over-adapted to training data, so that its predictive ability starts to decline.

parameters: the unknown quantities in a statistical model, generally denoted with Greek letters.

Pearson correlation coefficient: for a set of n paired numbers, (x1, y1), (x2, y2) … (xn, yn), where x̄, sx are the sample mean and standard deviation of the xs, and ȳ, sy are the sample mean and standard deviation of the ys, the Pearson correlation coefficient is given by r = Σ (xi − x̄)(yi − ȳ) / ((n − 1) sx sy). Suppose xs and ys have both been standardized to Z-scores given by us and vs respectively, so that ui = (xi − x̄)/sx, and vi = (yi − ȳ)/sy.


pages: 404 words: 92,713

The Art of Statistics: How to Learn From Data by David Spiegelhalter

Abraham Wald, algorithmic bias, Antoine Gombaud: Chevalier de Méré, Bayesian statistics, Brexit referendum, Carmen Reinhart, Charles Babbage, complexity theory, computer vision, confounding variable, correlation coefficient, correlation does not imply causation, dark matter, data science, deep learning, DeepMind, Edmond Halley, Estimating the Reproducibility of Psychological Science, government statistician, Gregor Mendel, Hans Rosling, Higgs boson, Kenneth Rogoff, meta-analysis, Nate Silver, Netflix Prize, Northpointe / Correctional Offender Management Profiling for Alternative Sanctions, p-value, placebo effect, probability theory / Blaise Pascal / Pierre de Fermat, publication bias, randomized controlled trial, recommendation engine, replication crisis, self-driving car, seminal paper, sparse data, speech recognition, statistical model, sugar pill, systematic bias, TED Talk, The Design of Experiments, The Signal and the Noise by Nate Silver, The Wisdom of Crowds, Thomas Bayes, Thomas Malthus, Two Sigma

Even with the Bradford Hill criteria outlined above, statisticians are generally reluctant to attribute causation unless there has been an experiment, although computer scientist Judea Pearl and others have made great progress in setting out the principles for building causal regression models from observational data.2

Table 5.2 Correlations between heights of adult children and parent of the same gender, and gradients of the regression of the offspring's on the parent's height.

Regression Lines Are Models

The regression line we fitted between fathers' and sons' heights is a very basic example of a statistical model. The US Federal Reserve define a model as a 'representation of some aspect of the world which is based on simplifying assumptions': essentially some phenomenon will be represented mathematically, generally embedded in computer software, in order to produce a simplified 'pretend' version of reality.3 Statistical models have two main components. First, a mathematical formula that expresses a deterministic, predictable component, for example the fitted straight line that enables us to make a prediction of a son's height from his father's.

We have data that can help us answer some of these questions, with which we have already done some exploratory plotting and drawn some informal conclusions about an appropriate statistical model. But we now come to a formal aspect of the Analysis part of the PPDAC cycle, generally known as hypothesis testing. What Is a ‘Hypothesis’? A hypothesis can be defined as a proposed explanation for a phenomenon. It is not the absolute truth, but a provisional, working assumption, perhaps best thought of as a potential suspect in a criminal case. When discussing regression in Chapter 5, we saw the claim that observation = deterministic model + residual error. This represents the idea that statistical models are mathematical representations of what we observe, which combine a deterministic component with a ‘stochastic’ component, the latter representing unpredictability or random ‘error’, generally expressed in terms of a probability distribution.

A two-sided test would be appropriate for a null hypothesis that a treatment effect, say, is exactly zero, and so both positive and negative estimates would lead to the null being rejected.

one-tailed and two-tailed P-values: those corresponding to one-sided and two-sided tests.

over-fitting: building a statistical model that is over-adapted to training data, so that its predictive ability starts to decline.

parameters: the unknown quantities in a statistical model, generally denoted with Greek letters.

Pearson correlation coefficient: for a set of n paired numbers, (x1, y1), (x2, y2) … (xn, yn), where x̄, sx are the sample mean and standard deviation of the xs, and ȳ, sy are the sample mean and standard deviation of the ys, the Pearson correlation coefficient is given by r = Σ (xi − x̄)(yi − ȳ) / ((n − 1) sx sy). Suppose xs and ys have both been standardized to Z-scores given by us and vs respectively, so that ui = (xi − x̄)/sx, and vi = (yi − ȳ)/sy.


pages: 227 words: 62,177

Numbers Rule Your World: The Hidden Influence of Probability and Statistics on Everything You Do by Kaiser Fung

Alan Greenspan, American Society of Civil Engineers: Report Card, Andrew Wiles, behavioural economics, Bernie Madoff, Black Swan, business cycle, call centre, correlation does not imply causation, cross-subsidies, Daniel Kahneman / Amos Tversky, edge city, Emanuel Derman, facts on the ground, financial engineering, fixed income, Gary Taubes, John Snow's cholera map, low interest rates, moral hazard, p-value, pattern recognition, profit motive, Report Card for America’s Infrastructure, statistical model, the scientific method, traveling salesman

Figure C-1: Drawing a Line Between Natural and Doping Highs

Because the anti-doping laboratories face bad publicity for false positives (while false negatives are invisible unless the dopers confess), they calibrate the tests to minimize false accusations, which allows some athletes to get away with doping.

The Virtue of Being Wrong

The subject matter of statistics is variability, and statistical models are tools that examine why things vary. A disease outbreak model links causes to effects to tell us why some people fall ill while others do not; a credit-scoring model identifies correlated traits to describe which borrowers are likely to default on their loans and which will not. These two examples represent two valid modes of statistical modeling. George Box is justly celebrated for his remark "All models are false but some are useful." The mark of great statisticians is their confidence in the face of fallibility.

Highway engineers in Minnesota tell us why their favorite tactic to reduce congestion is a technology that forces commuters to wait more, while Disney engineers make the case that the most effective tool to reduce wait times does not actually reduce average wait times. Second, variability does not need to be explained by reasonable causes, despite our natural desire for a rational explanation of everything; statisticians are frequently just as happy to pore over patterns of correlation. In Chapter 2, we compare and contrast these two modes of statistical modeling by trailing disease detectives on the hunt for tainted spinach (causal models) and by prying open the black box that produces credit scores (correlational models). Surprisingly, these practitioners freely admit that their models are “wrong” in the sense that they do not perfectly describe the world around us; we explore how they justify what they do.

Their special talent is the educated guess, with emphasis on the adjective. The leaders of the pack are practical-minded people who rely on detailed observation, directed research, and data analysis. Their Achilles heel is the big I, when they let intuition lead them astray. This chapter celebrates two groups of statistical modelers who have made lasting, positive impacts on our lives. First, we meet the epidemiologists whose investigations explain the causes of disease. Later, we meet credit modelers who mark our fiscal reputation for banks, insurers, landlords, employers, and so on. By observing these scientists in action, we will learn how they have advanced the technical frontier and to what extent we can trust their handiwork. ~###~ In November 2006, the U.S.


pages: 523 words: 112,185

Doing Data Science: Straight Talk From the Frontline by Cathy O'Neil, Rachel Schutt

Amazon Mechanical Turk, augmented reality, Augustin-Louis Cauchy, barriers to entry, Bayesian statistics, bike sharing, bioinformatics, computer vision, confounding variable, correlation does not imply causation, crowdsourcing, data science, distributed generation, Dunning–Kruger effect, Edward Snowden, Emanuel Derman, fault tolerance, Filter Bubble, finite state, Firefox, game design, Google Glasses, index card, information retrieval, iterative process, John Harrison: Longitude, Khan Academy, Kickstarter, machine translation, Mars Rover, Nate Silver, natural language processing, Netflix Prize, p-value, pattern recognition, performance metric, personalized medicine, pull request, recommendation engine, rent-seeking, selection bias, Silicon Valley, speech recognition, statistical model, stochastic process, tacit knowledge, text mining, the scientific method, The Wisdom of Crowds, Watson beat the top human players on Jeopardy!, X Prize

He was using it to mean data models—the representation one is choosing to store one’s data, which is the realm of database managers—whereas she was talking about statistical models, which is what much of this book is about. One of Andrew Gelman’s blog posts on modeling was recently tweeted by people in the fashion industry, but that’s a different issue. Even if you’ve used the terms statistical model or mathematical model for years, is it even clear to yourself and to the people you’re talking to what you mean? What makes a model a model? Also, while we’re asking fundamental questions like this, what’s the difference between a statistical model and a machine learning algorithm? Before we dive deeply into that, let’s add a bit of context with this deliberately provocative Wired magazine piece, “The End of Theory: The Data Deluge Makes the Scientific Method Obsolete,” published in 2008 by Chris Anderson, then editor-in-chief.

In the case of proteins, a model of the protein backbone with side-chains by itself is removed from the laws of quantum mechanics that govern the behavior of the electrons, which ultimately dictate the structure and actions of proteins. In the case of a statistical model, we may have mistakenly excluded key variables, included irrelevant ones, or assumed a mathematical structure divorced from reality.

Statistical modeling

Before you get too involved with the data and start coding, it's useful to draw a picture of what you think the underlying process might be with your model. What comes first? What influences what? What causes what?

These two seem obviously different, so it seems the distinction should be obvious. Unfortunately, it isn't. For example, regression can be described as a statistical model as well as a machine learning algorithm. You'll waste your time trying to get people to discuss this with any precision. In some ways this is a historical artifact of statistics and computer science communities developing methods and techniques in parallel and using different words for the same methods. The consequence of this is that the distinction between machine learning and statistical modeling is muddy. Some methods (for example, k-means, discussed in the next section) we might call an algorithm because it's a series of computational steps used to cluster or classify objects—on the other hand, k-means can be reinterpreted as a special case of a Gaussian mixture model.
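A hedged sketch of that k-means/Gaussian-mixture point using scikit-learn on synthetic data: the same clustering task can be phrased as an algorithm (k-means) or as fitting a statistical model (a Gaussian mixture), and here the two give very similar groupings.

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(3)
    X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(5, 1, (100, 2))])  # two synthetic clusters

    km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

    # The same task viewed as a statistical model: a two-component Gaussian mixture.
    gmm_labels = GaussianMixture(n_components=2, random_state=0).fit_predict(X)

    # Account for arbitrary label switching when comparing the two groupings.
    agreement = max(np.mean(km_labels == gmm_labels), np.mean(km_labels != gmm_labels))
    print(f"label agreement: {agreement:.2%}")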


pages: 209 words: 13,138

Empirical Market Microstructure: The Institutions, Economics and Econometrics of Securities Trading by Joel Hasbrouck

Alvin Roth, barriers to entry, business cycle, conceptual framework, correlation coefficient, discrete time, disintermediation, distributed generation, experimental economics, financial intermediation, index arbitrage, information asymmetry, interest rate swap, inventory management, market clearing, market design, market friction, market microstructure, martingale, payment for order flow, power law, price discovery process, price discrimination, quantitative trading / quantitative finance, random walk, Richard Thaler, second-price auction, selection bias, short selling, statistical model, stochastic process, stochastic volatility, transaction costs, two-sided market, ultimatum game, zero-sum game

If we know that the structural model is the particular one described in section 9.2, we simply set vt so that qt = +1, set ut = 0 and forecast using equation (9.7). We do not usually know the structural model, however. Typically we're working from estimates of a statistical model (a VAR or VMA). This complicates specification of ε0. From the perspective of the VAR or VMA model of the trade and price data, the innovation vector and its variance are εt = (εp,t, εq,t)′ and Var(εt) = [σp², σp,q; σp,q, σq²] (9.15). The innovations in the statistical model are simply associated with the observed variables, and have no necessary structural interpretation. We can still set εq,t according to our contemplated trade (εq,t = +1), but how should we set εp,t?
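A small numeric sketch of the question being posed, with made-up variances; one common answer, not spelled out in this excerpt and treated here purely as an assumption, is to set εp,t to its linear projection on εq,t under the estimated covariance.

    import numpy as np

    # Hypothetical innovation variances and covariance from an estimated VAR/VMA.
    sigma_p2, sigma_q2, sigma_pq = 0.04, 1.0, 0.01

    omega = np.array([[sigma_p2, sigma_pq],
                      [sigma_pq, sigma_q2]])   # Var of (eps_p, eps_q), as in eq. (9.15)

    eps_q = 1.0                                # the contemplated buy trade
    eps_p = (sigma_pq / sigma_q2) * eps_q      # linear projection of eps_p on eps_q (an assumed choice)
    print(omega)
    print(f"eps_q={eps_q}, projected eps_p={eps_p}")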

The role they play and how they should be regulated are ongoing concerns of practical interest. 117 12 Limit Order Markets The worldwide proliferation of limit order markets (LOMs) clearly establishes a need for economic and statistical models of these mechanisms. This chapter discusses some approaches, but it should be admitted at the outset that no comprehensive and realistic models (either statistical or economic) exist. One might start with the view that a limit order, being a bid or offer, is simply a dealer quote by another name. The implication is that a limit order is exposed to asymmetric information risk and also must recover noninformational costs of trade. This view supports the application of the economic and statistical models described earlier to LOM, hybrid, and other nondealer markets.

To Lisa, who inspires these pages and much more.

Preface

This book is a study of the trading mechanisms in financial markets: the institutions, the economic principles underlying the institutions, and statistical models for analyzing the data they generate. The book is aimed at graduate and advanced undergraduate students in financial economics and practitioners who design or use order management systems. Most of the book presupposes only a basic familiarity with economics and statistics. I began writing this book because I perceived a need for treatment of empirical market microstructure that was unified, authoritative, and comprehensive.


pages: 257 words: 13,443

Statistical Arbitrage: Algorithmic Trading Insights and Techniques by Andrew Pole

algorithmic trading, Benoit Mandelbrot, constrained optimization, Dava Sobel, deal flow, financial engineering, George Santayana, Long Term Capital Management, Louis Pasteur, low interest rates, mandelbrot fractal, market clearing, market fundamentalism, merger arbitrage, pattern recognition, price discrimination, profit maximization, proprietary trading, quantitative trading / quantitative finance, risk tolerance, Sharpe ratio, statistical arbitrage, statistical model, stochastic volatility, systematic trading, transaction costs

Orders, up to a threshold labeled "visibility threshold," have less impact on large-volume days. Fitting a mathematical curve or statistical model to the order size–market impact data yields a tool for answering the question: How much will I have to pay to buy 10,000 shares of XYZ? Note that buy and sell responses may be different and may be dependent on whether the stock is moving up or down that day. Breaking down the raw (60-day) data set and analyzing up days and down days separately will illuminate that issue. More formally, one could define an encompassing statistical model including an indicator variable for up or down day and test the significance of the estimated coefficient.
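A sketch of the "more formal" version described, with hypothetical variable names, synthetic data, and statsmodels: regress impact on order size plus an up-day indicator and read off the indicator's p-value.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(4)
    size = rng.uniform(1_000, 50_000, 500)           # hypothetical order sizes
    up_day = rng.integers(0, 2, 500)                 # 1 = stock up that day, 0 = down
    impact_bps = 0.0004 * size + 3.0 * up_day + rng.normal(0, 5, 500)  # synthetic impact

    X = sm.add_constant(np.column_stack([size, up_day]))
    fit = sm.OLS(impact_bps, X).fit()
    print(fit.params)      # constant, size coefficient, up/down-day coefficient
    print(fit.pvalues[2])  # significance test for the up/down-day indicator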

Consideration of change is introduced from this first toe dipping into analysis, because temporal dynamics underpin the entirety of the project. Without the dynamic there is no arbitrage. In Chapter 3 we increase the depth and breadth of the analysis, expanding the modeling scope from simple observational rules1 for pairs to formal statistical models for more general portfolios. Several popular models for time series are described but detailed focus is on weighted moving averages at one extreme of complexity and factor analysis at another, these extremes serving to carry the message as clearly as we can make it. Pair spreads are referred to throughout the text serving, as already noted, as the simplest practical illustrator of the notions discussed.
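A compact sketch of the weighted-moving-average end of that spectrum, using pandas on a synthetic pair spread; the span and entry threshold are illustrative choices, not recommendations.

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(5)
    spread = pd.Series(np.cumsum(rng.normal(0, 0.1, 500)))  # synthetic pair spread

    ewma = spread.ewm(span=20).mean()                        # exponentially weighted moving average
    deviation = spread - ewma

    # Illustrative rule: flag days when the spread strays far from its weighted average.
    threshold = 2 * deviation.std()
    signal = np.where(deviation > threshold, -1, np.where(deviation < -threshold, 1, 0))
    print(pd.Series(signal).value_counts())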

Events in trading volume series provide information sometimes not identified (by turning point analysis) in price series. Volume patterns do not directly affect price spreads but volume spurts are a useful warning that a stock may be subject to unusual trading activity and that price development may therefore not be as characterized in statistical models that have been estimated on average recent historical price series. In historical analysis, flags of unusual activity are extremely important in the evaluation of, for example, simulation results.

Figure 2.8: Adjusted close price trace (General Motors) with 20 percent turning points identified (price axis in $, dates from 19970102 to 19980312).

Table 2.1 Event return summary for Chrysler–GM (criterion, number of events, return correlation):
daily: 332 events, return correlation 0.53
30% move: 22 events, return correlation 0.75
25% move: 26 events, return correlation 0.73
20% move: 33 events, return correlation 0.77
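A rough sketch of flagging 20 percent turning points in a price series, run here on a synthetic price path; the zig-zag style rule is a simplification and may differ from the book's turning point analysis.

    import numpy as np

    rng = np.random.default_rng(6)
    prices = 60 * np.exp(np.cumsum(rng.normal(0, 0.01, 500)))  # synthetic price path

    threshold = 0.20              # a 20 percent reversal defines a turning point
    turning_points = []
    extreme_idx, rising = 0, True  # assume the series starts in an up leg (a simplification)
    for i in range(1, len(prices)):
        move = prices[i] / prices[extreme_idx] - 1
        if rising and move <= -threshold:        # fell 20% from the running high: mark the high
            turning_points.append(extreme_idx)
            extreme_idx, rising = i, False
        elif not rising and move >= threshold:   # rose 20% from the running low: mark the low
            turning_points.append(extreme_idx)
            extreme_idx, rising = i, True
        elif (rising and prices[i] > prices[extreme_idx]) or (not rising and prices[i] < prices[extreme_idx]):
            extreme_idx = i                      # extend the current running extreme
    print("turning points at indices:", turning_points)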


Thinking with Data by Max Shron

business intelligence, Carmen Reinhart, confounding variable, correlation does not imply causation, data science, Growth in a Time of Debt, iterative process, Kenneth Rogoff, randomized controlled trial, Richard Feynman, statistical model, The Design of Experiments, the scientific method

This could be part of a solution to the first two needs, verifying that there is a strong relationship between public transit and the housing market, and trying to predict whether apartments are under- or overpriced. Digging into our experience, we know that graphs are just one way to express a relationship. Two others are models and maps. How might we capture the relevant relationships with a statistical model? A statistical model would be a way to relate some notion of transit access to some notion of apartment price, controlling for other factors. We can clarify our idea with a mockup. The mockup here would be a sentence interpreting the hypothetical output. Results from a model might have conclusions like, “In New York City, apartment prices fall by 5% for every block away from the A train, compared to similar apartments.”
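A hedged sketch of what such a model could look like with statsmodels; the variables (blocks_to_train, square footage, log price) are hypothetical stand-ins and the data are synthetic, not a real New York dataset.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(7)
    n = 1_000
    blocks_to_train = rng.integers(0, 15, n)  # hypothetical transit-access measure
    sqft = rng.normal(700, 150, n)            # hypothetical control variable
    log_price = 13.0 - 0.05 * blocks_to_train + 0.001 * sqft + rng.normal(0, 0.1, n)

    X = sm.add_constant(np.column_stack([blocks_to_train, sqft]))
    fit = sm.OLS(log_price, X).fit()

    # With log price on the left, the coefficient reads roughly as "percent change per block".
    print(f"~{100 * fit.params[1]:.1f}% change in price per extra block from the train, holding size fixed")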

Depending on the resolution of the map, this could potentially meet the first two needs (making a case for a connection and finding outliers) as well, through visual inspection. A map is easier to inspect, but harder to calibrate or interpret. Each has its strengths and weaknesses. A scatterplot is going to be easy to make once we have some data, but potentially misleading. The statistical model will collapse down a lot of variation in the data in order to arrive at a general, interpretable conclusion, potentially missing interesting patterns. The map is going to be limited in its ability to account for variables that aren’t spatial, and we may have a harder time interpreting the results.

It is rare that we can get an audience to understand something just from lists of facts. Transformations make data intelligible, allowing raw data to be incorporated into an argument. A transformation puts an interpretation on data by highlighting things that we take to be essential. Counting all of the sales in a month is a transformation, as is plotting a graph or fitting a statistical model of page visits against age, or making a map of every taxi pickup in a city. Returning to our transit example, if we just wanted to show that there is some relationship between transit access and apartment prices, a high-resolution map of apartment prices overlaid on a transit map would be reasonable evidence, as would a two-dimensional histogram or scatterplot of the right quantities.


pages: 327 words: 103,336

Everything Is Obvious: *Once You Know the Answer by Duncan J. Watts

"World Economic Forum" Davos, active measures, affirmative action, Albert Einstein, Amazon Mechanical Turk, AOL-Time Warner, Bear Stearns, behavioural economics, Black Swan, business cycle, butterfly effect, carbon credits, Carmen Reinhart, Cass Sunstein, clockwork universe, cognitive dissonance, coherent worldview, collapse of Lehman Brothers, complexity theory, correlation does not imply causation, crowdsourcing, death of newspapers, discovery of DNA, East Village, easy for humans, difficult for computers, edge city, en.wikipedia.org, Erik Brynjolfsson, framing effect, Future Shock, Geoffrey West, Santa Fe Institute, George Santayana, happiness index / gross national happiness, Herman Kahn, high batting average, hindsight bias, illegal immigration, industrial cluster, interest rate swap, invention of the printing press, invention of the telescope, invisible hand, Isaac Newton, Jane Jacobs, Jeff Bezos, Joseph Schumpeter, Kenneth Rogoff, lake wobegon effect, Laplace demon, Long Term Capital Management, loss aversion, medical malpractice, meta-analysis, Milgram experiment, natural language processing, Netflix Prize, Network effects, oil shock, packet switching, pattern recognition, performance metric, phenotype, Pierre-Simon Laplace, planetary scale, prediction markets, pre–internet, RAND corporation, random walk, RFID, school choice, Silicon Valley, social contagion, social intelligence, statistical model, Steve Ballmer, Steve Jobs, Steve Wozniak, supply-chain management, tacit knowledge, The Death and Life of Great American Cities, the scientific method, The Wisdom of Crowds, too big to fail, Toyota Production System, Tragedy of the Commons, ultimatum game, urban planning, Vincenzo Peruggia: Mona Lisa, Watson beat the top human players on Jeopardy!, X Prize

Next, we compared the performance of these two polls with the Vegas sports betting market—one of the oldest and most popular betting markets in the world—as well as with another prediction market, TradeSports. And finally, we compared the prediction of both the markets and the polls against two simple statistical models. The first model relied only on the historical probability that home teams win—which they do 58 percent of the time—while the second model also factored in the recent win-loss records of the two teams in question. In this way, we set up a six-way comparison between different prediction methods—two statistical models, two markets, and two polls.6 Given how different these methods were, what we found was surprising: All of them performed about the same.

One might think, therefore, that prediction markets, with their far greater capacity to factor in different sorts of information, would outperform simplistic statistical models by a much wider margin for baseball than they do for football. But that turns out not to be true either. We compared the predictions of the Las Vegas sports betting markets over nearly twenty thousand Major League baseball games played from 1999 to 2006 with a simple statistical model based again on home-team advantage and the recent win-loss records of the two teams. This time, the difference between the two was even smaller—in fact, the performance of the market and the model were indistinguishable.
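A hedged sketch of the kind of simple model being described: a logistic regression of home-team wins on home advantage (via the intercept) and the difference in recent win records, fitted to synthetic games rather than real NFL or MLB data.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(8)
    n = 5_000
    record_diff = rng.normal(0, 0.15, n)                        # home minus away recent win fraction (synthetic)
    p_home_win = 1 / (1 + np.exp(-(0.33 + 2.0 * record_diff)))  # ~58% home wins when records are equal
    home_win = rng.random(n) < p_home_win

    model = LogisticRegression().fit(record_diff.reshape(-1, 1), home_win)
    print("baseline home-win probability:", model.predict_proba([[0.0]])[0, 1])
    print("accuracy on the training games:", model.score(record_diff.reshape(-1, 1), home_win))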

Because AI researchers had to program every fact, rule, and learning process into their creations from scratch, and because their creations failed to behave as expected in obvious and often catastrophic ways—like driving off a cliff or trying to walk through a wall—the frame problem was impossible to ignore. Rather than trying to crack the problem, therefore, AI researchers took a different approach entirely—one that emphasized statistical models of data rather than thought processes. This approach, which nowadays is called machine learning, was far less intuitive than the original cognitive approach, but it has proved to be much more productive, leading to all kinds of impressive breakthroughs, from the almost magical ability of search engines to complete queries as you type them to building autonomous robot cars, and even a computer that can play Jeopardy!


pages: 829 words: 186,976

The Signal and the Noise: Why So Many Predictions Fail-But Some Don't by Nate Silver

airport security, Alan Greenspan, Alvin Toffler, An Inconvenient Truth, availability heuristic, Bayesian statistics, Bear Stearns, behavioural economics, Benoit Mandelbrot, Berlin Wall, Bernie Madoff, big-box store, Black Monday: stock market crash in 1987, Black Swan, Boeing 747, book value, Broken windows theory, business cycle, buy and hold, Carmen Reinhart, Charles Babbage, classic study, Claude Shannon: information theory, Climategate, Climatic Research Unit, cognitive dissonance, collapse of Lehman Brothers, collateralized debt obligation, complexity theory, computer age, correlation does not imply causation, Credit Default Swap, credit default swaps / collateralized debt obligations, cuban missile crisis, Daniel Kahneman / Amos Tversky, disinformation, diversification, Donald Trump, Edmond Halley, Edward Lorenz: Chaos theory, en.wikipedia.org, equity premium, Eugene Fama: efficient market hypothesis, everywhere but in the productivity statistics, fear of failure, Fellow of the Royal Society, Ford Model T, Freestyle chess, fudge factor, Future Shock, George Akerlof, global pandemic, Goodhart's law, haute cuisine, Henri Poincaré, high batting average, housing crisis, income per capita, index fund, information asymmetry, Intergovernmental Panel on Climate Change (IPCC), Internet Archive, invention of the printing press, invisible hand, Isaac Newton, James Watt: steam engine, Japanese asset price bubble, John Bogle, John Nash: game theory, John von Neumann, Kenneth Rogoff, knowledge economy, Laplace demon, locking in a profit, Loma Prieta earthquake, market bubble, Mikhail Gorbachev, Moneyball by Michael Lewis explains big data, Monroe Doctrine, mortgage debt, Nate Silver, negative equity, new economy, Norbert Wiener, Oklahoma City bombing, PageRank, pattern recognition, pets.com, Phillips curve, Pierre-Simon Laplace, Plato's cave, power law, prediction markets, Productivity paradox, proprietary trading, public intellectual, random walk, Richard Thaler, Robert Shiller, Robert Solow, Rodney Brooks, Ronald Reagan, Saturday Night Live, savings glut, security theater, short selling, SimCity, Skype, statistical model, Steven Pinker, The Great Moderation, The Market for Lemons, the scientific method, The Signal and the Noise by Nate Silver, The Wisdom of Crowds, Thomas Bayes, Thomas Kuhn: the structure of scientific revolutions, Timothy McVeigh, too big to fail, transaction costs, transfer pricing, University of East Anglia, Watson beat the top human players on Jeopardy!, Wayback Machine, wikimedia commons

Moreover, even the aggregate economic forecasts have been quite poor in any real-world sense, so there is plenty of room for progress. Most economists rely on their judgment to some degree when they make a forecast, rather than just take the output of a statistical model as is. Given how noisy the data is, this is probably helpful. A study62 by Stephen K. McNess, the former vice president of the Federal Reserve Bank of Boston, found that judgmental adjustments to statistical forecasting methods resulted in forecasts that were about 15 percent more accurate. The idea that a statistical model would be able to “solve” the problem of economic forecasting was somewhat in vogue during the 1970s and 1980s when computers came into wider use.

In these cases, it is much more likely that the fault lies with the forecaster’s model of the world and not with the world itself. In the instance of CDOs, the ratings agencies had no track record at all: these were new and highly novel securities, and the default rates claimed by S&P were not derived from historical data but instead were assumptions based on a faulty statistical model. Meanwhile, the magnitude of their error was enormous: AAA-rated CDOs were two hundred times more likely to default in practice than they were in theory. The ratings agencies’ shot at redemption would be to admit that the models had been flawed and the mistake had been theirs. But at the congressional hearing, they shirked responsibility and claimed to have been unlucky.

Barack Obama had led John McCain in almost every national poll since September 15, 2008, when the collapse of Lehman Brothers had ushered in the worst economic slump since the Great Depression. Obama also led in almost every poll of almost every swing state: in Ohio and Florida and Pennsylvania and New Hampshire—and even in a few states that Democrats don’t normally win, like Colorado and Virginia. Statistical models like the one I developed for FiveThirtyEight suggested that Obama had in excess of a 95 percent chance of winning the election. Betting markets were slightly more equivocal, but still had him as a 7 to 1 favorite.2 But McLaughlin’s first panelist, Pat Buchanan, dodged the question. “The undecideds will decide this weekend,” he remarked, drawing guffaws from the rest of the panel.
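A quick check on those odds, as an aside: a 7 to 1 favorite corresponds to an implied probability of 7/(7 + 1) = 0.875, or about 87.5 per cent, which is why the betting markets read as slightly more equivocal than the 95-plus per cent suggested by the statistical models.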


Data Mining: Concepts and Techniques: Concepts and Techniques by Jiawei Han, Micheline Kamber, Jian Pei

backpropagation, bioinformatics, business intelligence, business process, Claude Shannon: information theory, cloud computing, computer vision, correlation coefficient, cyber-physical system, database schema, discrete time, disinformation, distributed generation, finite state, industrial research laboratory, information retrieval, information security, iterative process, knowledge worker, linked data, machine readable, natural language processing, Netflix Prize, Occam's razor, pattern recognition, performance metric, phenotype, power law, random walk, recommendation engine, RFID, search costs, semantic web, seminal paper, sentiment analysis, sparse data, speech recognition, statistical model, stochastic process, supply-chain management, text mining, thinkpad, Thomas Bayes, web application

Data mining has an inherent connection with statistics. A statistical model is a set of mathematical functions that describe the behavior of the objects in a target class in terms of random variables and their associated probability distributions. Statistical models are widely used to model data and data classes. For example, in data mining tasks like data characterization and classification, statistical models of target classes can be built. In other words, such statistical models can be the outcome of a data mining task. Alternatively, data mining tasks can be built on top of statistical models. For example, we can use statistics to model noise and missing data values.

For each object y in region R, we can estimate the probability that this point fits the Gaussian distribution. Because this probability is very low, y is unlikely to have been generated by the Gaussian model, and thus is an outlier. The effectiveness of statistical methods highly depends on whether the assumptions made for the statistical model hold true for the given data. There are many kinds of statistical models. For example, the statistical models used in the methods may be parametric or nonparametric. Statistical methods for outlier detection are discussed in detail in Section 12.3.

Proximity-Based Methods

Proximity-based methods assume that an object is an outlier if the nearest neighbors of the object are far away in feature space, that is, the proximity of the object to its neighbors significantly deviates from the proximity of most of the other objects to their neighbors in the same data set.
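A small sketch of the parametric version of that idea with scipy: fit a Gaussian to synthetic measurements and flag the points whose density under the fitted model is very low; the 0.5 per cent cutoff is an illustrative modeling choice.

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(9)
    data = np.concatenate([rng.normal(10, 1, 500), [25.0]])  # synthetic region R, with one planted outlier

    mu, sigma = norm.fit(data)                 # fit the Gaussian model to the data
    density = norm.pdf(data, mu, sigma)

    cutoff = np.quantile(density, 0.005)       # "very low" density threshold
    outliers = data[density <= cutoff]
    print("flagged as outliers:", outliers)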

Then, when mining patterns in a large data set, the data mining process can use the model to help identify and handle noisy or missing values in the data. Statistics research develops tools for prediction and forecasting using data and statistical models. Statistical methods can be used to summarize or describe a collection of data. Basic statistical descriptions of data are introduced in Chapter 2. Statistics is useful for mining various patterns from data as well as for understanding the underlying mechanisms generating and affecting the patterns. Inferential statistics (or predictive statistics) models data in a way that accounts for randomness and uncertainty in the observations and is used to draw inferences about the process or population under investigation.


pages: 174 words: 56,405

Machine Translation by Thierry Poibeau

Alignment Problem, AlphaGo, AltaVista, augmented reality, call centre, Claude Shannon: information theory, cloud computing, combinatorial explosion, crowdsourcing, deep learning, DeepMind, easy for humans, difficult for computers, en.wikipedia.org, geopolitical risk, Google Glasses, information retrieval, Internet of things, language acquisition, machine readable, machine translation, Machine translation of "The spirit is willing, but the flesh is weak." to Russian and back, natural language processing, Necker cube, Norbert Wiener, RAND corporation, Robert Mercer, seminal paper, Skype, speech recognition, statistical model, technological singularity, Turing test, wikimedia commons

Introduction of Linguistic Information into Statistical Models Statistical translation models, despite their increasing complexity to better fit language specificities, have not solved all the difficulties encountered. In fact, bilingual corpora, even large ones, remain insufficient at times to properly cover rare or complex linguistic phenomena. One solution is to then integrate more information of a linguistic nature in the machine translation system to better represent the relations between words (syntax) and their meanings (semantics). Alignment Models Accounting for Syntax The statistical models described so far are all direct translation systems: they search for equivalences between the source language and the target language at word level, or, at best, they take into consideration sequences of words that are not necessarily linguistically coherent.

On the one hand, the analysis of existing translations and their generalization according to various linguistic strategies can be used as a reservoir of knowledge for future translations. This is known as example-based translation, because in this approach previous translations are considered examples for new translations. On the other hand, with the increasing amount of translations available on the Internet, it is now possible to directly design statistical models for machine translation. This approach, known as statistical machine translation, is the most popular today. Unlike a translation memory, which can be relatively small, automatic processing presumes the availability of an enormous amount of data. Robert Mercer, one of the pioneers of statistical translation,1 proclaimed: “There is no data like more data.”

This approach naturally takes into account the statistical nature of language, which means that the approach focuses on the most frequent patterns in a language and, despite its limitations, is able to produce acceptable translations for a significant number of simple sentences. In certain cases, statistical models can also identify idioms thanks to asymmetric alignments (one word from the source language aligned with several words from the target language, for example), which means they can also overcome the word-for-word limitation. In the following section, we will examine several lexical alignment models developed toward the end of the 1980s and the beginning of the 1990s.
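A toy sketch, in the spirit of those early lexical models, of estimating word-translation probabilities from co-occurrence counts in a tiny made-up parallel corpus; real IBM-style alignment models refine such counts iteratively with EM rather than stopping at one pass.

    from collections import defaultdict

    # A tiny, made-up sentence-aligned corpus (Spanish to English).
    corpus = [
        ("hola amigo", "hello friend"),
        ("hola", "hello"),
        ("mi amigo", "my friend"),
    ]

    counts = defaultdict(lambda: defaultdict(float))
    for src_sent, tgt_sent in corpus:
        for s in src_sent.split():
            for t in tgt_sent.split():
                counts[s][t] += 1  # co-occurrence counts across aligned sentence pairs

    # Normalize into a crude lexical translation table P(target | source).
    for s, row in counts.items():
        total = sum(row.values())
        print(s, {t: round(c / total, 2) for t, c in row.items()})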


pages: 204 words: 58,565

Keeping Up With the Quants: Your Guide to Understanding and Using Analytics by Thomas H. Davenport, Jinho Kim

behavioural economics, Black-Scholes formula, business intelligence, business process, call centre, computer age, correlation coefficient, correlation does not imply causation, Credit Default Swap, data science, en.wikipedia.org, feminist movement, Florence Nightingale: pie chart, forensic accounting, global supply chain, Gregor Mendel, Hans Rosling, hypertext link, invention of the telescope, inventory management, Jeff Bezos, Johannes Kepler, longitudinal study, margin call, Moneyball by Michael Lewis explains big data, Myron Scholes, Netflix Prize, p-value, performance metric, publish or perish, quantitative hedge fund, random walk, Renaissance Technologies, Robert Shiller, self-driving car, sentiment analysis, six sigma, Skype, statistical model, supply-chain management, TED Talk, text mining, the scientific method, Thomas Davenport

In big-data environments, where the data just keeps coming in large volumes, it may not always be possible for humans to create hypotheses before sifting through the data. In the context of placing digital ads on publishers’ sites, for example, decisions need to be made in thousandths of a second by automated decision systems, and the firms doing this work must generate several thousand statistical models per week. Clearly this type of analysis can’t involve a lot of human hypothesizing and reflection on results, and machine learning is absolutely necessary. But for the most part, we’d advise sticking to hypothesis-driven analysis and the steps and sequence in this book. The Modeling (Variable Selection) Step A model is a purposefully simplified representation of the phenomenon or problem.

Data analysis

* * *

Key Software Vendors for Different Analysis Types (listed alphabetically)

Reporting software: BOARD International; IBM Cognos; Information Builders WebFOCUS; Oracle Business Intelligence (including Hyperion); Microsoft Excel/SQL Server/SharePoint; MicroStrategy; Panorama; SAP BusinessObjects

Interactive visual analytics: QlikTech QlikView; Tableau; TIBCO Spotfire

Quantitative or statistical modeling: IBM SPSS; R (an open-source software package); SAS

* * *

While all of the listed reporting software vendors also have capabilities for graphical display, some vendors focus specifically on interactive visual analytics, or the use of visual representations of data and reporting.

Such tools are often used simply to graph data and for data discovery—understanding the distribution of the data, identifying outliers (data points with unexpected values) and visual relationships between variables. So we’ve listed these as a separate category. We’ve also listed key vendors of software for the other category of analysis, which we’ll call quantitative or statistical modeling. In that category, you’re trying to use statistics to understand the relationships between variables and to make inferences from your sample to a larger population. Predictive analytics, randomized testing, and the various forms of regression analysis are all forms of this type of modeling.


pages: 301 words: 89,076

The Globotics Upheaval: Globalisation, Robotics and the Future of Work by Richard Baldwin

agricultural Revolution, Airbnb, AlphaGo, AltaVista, Amazon Web Services, Apollo 11, augmented reality, autonomous vehicles, basic income, Big Tech, bread and circuses, business process, business process outsourcing, call centre, Capital in the Twenty-First Century by Thomas Piketty, Cass Sunstein, commoditize, computer vision, Corn Laws, correlation does not imply causation, Credit Default Swap, data science, David Ricardo: comparative advantage, declining real wages, deep learning, DeepMind, deindustrialization, deskilling, Donald Trump, Douglas Hofstadter, Downton Abbey, Elon Musk, Erik Brynjolfsson, facts on the ground, Fairchild Semiconductor, future of journalism, future of work, George Gilder, Google Glasses, Google Hangouts, Hans Moravec, hiring and firing, hype cycle, impulse control, income inequality, industrial robot, intangible asset, Internet of things, invisible hand, James Watt: steam engine, Jeff Bezos, job automation, Kevin Roose, knowledge worker, laissez-faire capitalism, Les Trente Glorieuses, low skilled workers, machine translation, Machine translation of "The spirit is willing, but the flesh is weak." to Russian and back, manufacturing employment, Mark Zuckerberg, mass immigration, mass incarceration, Metcalfe’s law, mirror neurons, new economy, optical character recognition, pattern recognition, Ponzi scheme, post-industrial society, post-work, profit motive, remote working, reshoring, ride hailing / ride sharing, Robert Gordon, Robert Metcalfe, robotic process automation, Ronald Reagan, Salesforce, San Francisco homelessness, Second Machine Age, self-driving car, side project, Silicon Valley, Skype, Snapchat, social intelligence, sovereign wealth fund, standardized shipping container, statistical model, Stephen Hawking, Steve Jobs, supply-chain management, systems thinking, TaskRabbit, telepresence, telepresence robot, telerobotics, Thomas Malthus, trade liberalization, universal basic income, warehouse automation

The chore is to identify which features of the digitalized speech data are most useful when making an educated guess as to the corresponding word. To tackle this chore, the computer scientists set up a “blank slate” statistical model. It is a blank slate in the sense that every feature of the speech data is allowed to be, in principle, an important feature in the guessing process. What they are looking for is how to weight each aspect of the speech data when trying to find the word it is associated with. The revolutionary thing about machine learning is that the scientists don’t fill in the blanks. They don’t write down the weights in the statistical model. Instead, they write a set of step-by-step instructions for how the computer should fill in the blanks itself.

That is to say, it identifies the features of the speech data that are useful in predicting the corresponding words. The scientists then make the statistical model take an exam. They feed it a fresh set of spoken words and ask it to predict the written words that they correspond to. This is called the “testing data set.” Usually, the model—which is also called an “algorithm”—is not good enough to be released “into the wild,” so the computer scientists do some sophisticated trial and error of their own by manually tweaking the computer program that is used to choose the weights. After what can be a long sequence of iterations like this, and after the statistical model has achieved a sufficiently high degree of accuracy, the new language model graduates to the next level.
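A compact sketch of the train/test routine being described, with a generic classifier standing in for the speech model and synthetic feature vectors standing in for digitalized speech data.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(10)
    X = rng.normal(size=(2_000, 20))               # synthetic "speech" feature vectors
    y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # synthetic labels standing in for words

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

    model = LogisticRegression().fit(X_train, y_train)  # the computer "fills in the weights" itself
    print("accuracy on the held-out testing data set:", model.score(X_test, y_test))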

We haven’t a clue as to how our elephant thinks—how we, for example, recognize a cat or keep our balance when running over hill and dale. A form of AI called “machine learning” solved the paradox by changing the way computers are programmed. With machine learning, humans help the computer (the “machine” part) estimate a very large statistical model that the computer then uses to guess the solution to a particular problem (the “learning” part). Thanks to mind-blowing advances in computing power and access to hallucinatory amounts of data, white-collar robots trained by machine learning routinely achieve human-level performance on specific guessing tasks, like recognizing speech.


pages: 276 words: 81,153

Outnumbered: From Facebook and Google to Fake News and Filter-Bubbles – the Algorithms That Control Our Lives by David Sumpter

affirmative action, algorithmic bias, AlphaGo, Bernie Sanders, Brexit referendum, Cambridge Analytica, classic study, cognitive load, Computing Machinery and Intelligence, correlation does not imply causation, crowdsourcing, data science, DeepMind, Demis Hassabis, disinformation, don't be evil, Donald Trump, Elon Musk, fake news, Filter Bubble, Geoffrey Hinton, Google Glasses, illegal immigration, James Webb Space Telescope, Jeff Bezos, job automation, Kenneth Arrow, Loebner Prize, Mark Zuckerberg, meta-analysis, Minecraft, Nate Silver, natural language processing, Nelson Mandela, Nick Bostrom, Northpointe / Correctional Offender Management Profiling for Alternative Sanctions, p-value, post-truth, power law, prediction markets, random walk, Ray Kurzweil, Robert Mercer, selection bias, self-driving car, Silicon Valley, Skype, Snapchat, social contagion, speech recognition, statistical model, Stephen Hawking, Steve Bannon, Steven Pinker, TED Talk, The Signal and the Noise by Nate Silver, traveling salesman, Turing test

Instead, Mona described to me a culture where colleagues judged each other on how advanced their mathematical techniques were. They believed there was a direct trade-off between the quality of statistical results and the ease with which they can be communicated. If FiveThirtyEight offered a purely statistical model of the polls then the socio-economic background of their statisticians wouldn’t be relevant. But they don’t offer a purely statistical model. Such a model would have come out strongly for Clinton. Instead, they use a combination of their skills as forecasters and the underlying numbers. Work environments consisting of people with the same background and ideas are typically less likely to perform as well on difficult tasks, such as academic research and running a successful business.12 It is difficult for a bunch of people who all have the same background to identify all of the complex factors involved in predicting the future.

Google’s search engine was making racist autocomplete suggestions; Twitterbots were spreading fake news; Stephen Hawking was worried about artificial intelligence; far-right groups were living in algorithmically created filter-bubbles; Facebook was measuring our personalities, and these were being exploited to target voters. One after another, the stories of the dangers of algorithms accumulated. Even the mathematicians’ ability to make predictions was called into question as statistical models got both Brexit and Trump wrong. Stories about the maths of football, love, weddings, graffiti and other fun things were suddenly replaced by the maths of sexism, hate, dystopia and embarrassing errors in opinion poll calculations. When I reread the scientific article on Banksy, a bit more carefully this time, I found that very little new evidence was presented about his identity.

CHAPTER TWO Make Some Noise After the mathematical unmasking of Banksy had sunk in, I realised that I had somehow missed the sheer scale of the change that algorithms were making to our society. But let me be clear. I certainly hadn’t missed the development of the mathematics. Machine learning, statistical models and artificial intelligence are all things I actively research and talk about with my colleagues every day. I read the latest articles and keep up to date with the biggest developments. But I was concentrating on the scientific side of things: looking at how the algorithms work in the abstract.


pages: 400 words: 94,847

Reinventing Discovery: The New Era of Networked Science by Michael Nielsen

Albert Einstein, augmented reality, barriers to entry, bioinformatics, Cass Sunstein, Climategate, Climatic Research Unit, conceptual framework, dark matter, discovery of DNA, Donald Knuth, double helix, Douglas Engelbart, Douglas Engelbart, Easter island, en.wikipedia.org, Erik Brynjolfsson, fault tolerance, Fellow of the Royal Society, Firefox, Free Software Foundation, Freestyle chess, Galaxy Zoo, Higgs boson, Internet Archive, invisible hand, Jane Jacobs, Jaron Lanier, Johannes Kepler, Kevin Kelly, Large Hadron Collider, machine readable, machine translation, Magellanic Cloud, means of production, medical residency, Nicholas Carr, P = NP, P vs NP, publish or perish, Richard Feynman, Richard Stallman, selection bias, semantic web, Silicon Valley, Silicon Valley startup, Simon Singh, Skype, slashdot, social intelligence, social web, statistical model, Stephen Hawking, Stewart Brand, subscription business, tacit knowledge, Ted Nelson, the Cathedral and the Bazaar, The Death and Life of Great American Cities, The Nature of the Firm, The Wisdom of Crowds, University of East Anglia, Vannevar Bush, Vernor Vinge, Wayback Machine, Yochai Benkler

At the least we should take seriously the idea that these statistical models express truths not found in more conventional explanations of language translation. Might it be that the statistical models contain more truth than our conventional theories of language, with their notions of verb, noun, and adjective, subjects and objects, and so on? Or perhaps the models contain a different kind of truth, in part complementary, and in part overlapping, with conventional theories of language? Maybe we could develop a better theory of language by combining the best insights from the conventional approach and the approach based on statistical modeling into a single, unified explanation?

The program would also examine the corpus to figure out how words moved around in the sentence, observing, for example, that “hola” and “hello” tend to be in the same parts of the sentence, while other words get moved around more. Repeating this for every pair of words in the Spanish and English languages, their program gradually built up a statistical model of translation—an immensely complex model, but nonetheless one that can be stored on a modern computer. I won’t describe the models they used in complete detail here, but the hola-hello example gives you the flavor. Once they had analyzed the corpus and built up their statistical model, they used that model to translate new texts. To translate a Spanish sentence, the idea was to find the English sentence that, according to the model, had the highest probability.
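
To make the hola–hello example concrete, here is a minimal sketch, in Python, of that kind of co-occurrence counting over an invented three-sentence parallel corpus. It is only meant to give the flavor: real systems of the sort described estimate alignment probabilities over millions of sentence pairs, which this raw tally does not attempt.

from collections import defaultdict

# Toy parallel corpus (invented); real systems use millions of sentence pairs.
corpus = [
    ("hola amigo", "hello friend"),
    ("hola mundo", "hello world"),
    ("adios amigo", "goodbye friend"),
]

pair_counts = defaultdict(float)
source_counts = defaultdict(float)

for spanish, english in corpus:
    for s in spanish.split():
        for e in english.split():
            # Count every co-occurrence; a real aligner would weight these
            # by estimated alignment probabilities rather than raw counts.
            pair_counts[(s, e)] += 1.0
            source_counts[s] += 1.0

# Maximum-likelihood estimate of p(english word | spanish word).
translation_prob = {pair: c / source_counts[pair[0]] for pair, c in pair_counts.items()}

print(translation_prob[("hola", "hello")])   # 0.5, higher than p("hola" -> "friend") = 0.25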

But it’s stimulating to speculate that nouns and verbs, subjects and objects, and all the other paraphernalia of language are really emergent properties whose existence can be deduced from statistical models of language. Today, we don’t yet know how to make such a deductive leap, but that doesn’t mean it’s not possible. What status should we give to complex explanations of this type? As the data web is built, it will become easier and easier for people to construct such explanations, and we’ll end up with statistical models of all kinds of complex phenomena. We’ll need to learn how to look into complex models such as the language models and extract emergent concepts such as verbs and nouns.


pages: 354 words: 26,550

High-Frequency Trading: A Practical Guide to Algorithmic Strategies and Trading Systems by Irene Aldridge

algorithmic trading, asset allocation, asset-backed security, automated trading system, backtesting, Black Swan, Brownian motion, business cycle, business process, buy and hold, capital asset pricing model, centralized clearinghouse, collapse of Lehman Brothers, collateralized debt obligation, collective bargaining, computerized trading, diversification, equity premium, fault tolerance, financial engineering, financial intermediation, fixed income, global macro, high net worth, implied volatility, index arbitrage, information asymmetry, interest rate swap, inventory management, Jim Simons, law of one price, Long Term Capital Management, Louis Bachelier, machine readable, margin call, market friction, market microstructure, martingale, Myron Scholes, New Journalism, p-value, paper trading, performance metric, Performance of Mutual Funds in the Period, pneumatic tube, profit motive, proprietary trading, purchasing power parity, quantitative trading / quantitative finance, random walk, Renaissance Technologies, risk free rate, risk tolerance, risk-adjusted returns, risk/return, Sharpe ratio, short selling, Small Order Execution System, statistical arbitrage, statistical model, stochastic process, stochastic volatility, systematic trading, tail risk, trade route, transaction costs, value at risk, yield curve, zero-sum game

Legal risk—the risk of litigation expenses. All current risk measurement approaches fall into four categories: statistical models, scalar models, scenario analysis, and causal modeling. Statistical models generate predictions about worst-case future conditions based on past information. The Value-at-Risk (VaR) methodology is the most common statistical risk measurement tool, discussed in detail in the sections that focus on market and liquidity risk estimation. Statistical models are the preferred methodology of risk estimation whenever statistical modeling is feasible. Scalar models establish the maximum foreseeable loss levels as percentages of business parameters, such as revenues, operating costs, and the like.
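
As a rough illustration of the Value-at-Risk idea mentioned above, here is a minimal historical-simulation sketch in Python. The synthetic P&L series, the 99% level, and the one-day horizon are all placeholder assumptions, not the book’s own VaR treatment.

import numpy as np

rng = np.random.default_rng(0)
# Hypothetical daily P&L history; in practice this comes from the trading book.
daily_pnl = rng.normal(loc=0.0, scale=10_000.0, size=750)

confidence = 0.99
# Historical-simulation VaR: the loss level not exceeded on `confidence` of past days.
var_99 = -np.quantile(daily_pnl, 1.0 - confidence)
print(f"1-day 99% VaR: {var_99:,.0f}")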

Yet, readers relying on software packages with preconfigured statistical procedures may find the level of detail presented here to be sufficient for quality analysis of trading opportunities. The depth of the statistical content should also be sufficient for readers to understand the models presented throughout the remainder of this book. Readers interested in a more thorough treatment of statistical models may refer to Tsay (2002); Campbell, Lo, and MacKinlay (1997); and Gouriéroux and Jasiak (2001). This chapter begins with a review of the fundamental statistical estimators, moves on to linear dependency identification methods and volatility modeling techniques, and concludes with standard nonlinear approaches for identifying and modeling trading opportunities.

These high-frequency strategies, which trade on the market movements surrounding news announcements, are collectively referred to as event arbitrage. This chapter investigates the mechanics of event arbitrage in the following order:
• Overview of the development process
• Generating a price forecast through statistical modeling of directional forecasts and point forecasts
• Applying event arbitrage to corporate announcements, industry news, and macroeconomic news
• Documented effects of events on foreign exchange, equities, fixed income, futures, emerging economies, commodities, and REIT markets
DEVELOPING EVENT ARBITRAGE TRADING STRATEGIES
Event arbitrage refers to the group of trading strategies that place trades on the basis of the markets’ reaction to events.


pages: 320 words: 33,385

Market Risk Analysis, Quantitative Methods in Finance by Carol Alexander

asset allocation, backtesting, barriers to entry, Brownian motion, capital asset pricing model, constrained optimization, credit crunch, Credit Default Swap, discounted cash flows, discrete time, diversification, diversified portfolio, en.wikipedia.org, financial engineering, fixed income, implied volatility, interest rate swap, low interest rates, market friction, market microstructure, p-value, performance metric, power law, proprietary trading, quantitative trading / quantitative finance, random walk, risk free rate, risk tolerance, risk-adjusted returns, risk/return, seminal paper, Sharpe ratio, statistical arbitrage, statistical model, stochastic process, stochastic volatility, systematic bias, Thomas Bayes, transaction costs, two and twenty, value at risk, volatility smile, Wiener process, yield curve, zero-sum game

Its most important market risk modelling applications are to: • multivariate GARCH modelling, • generating copulas, and • simulating asset prices.
I.3.5 INTRODUCTION TO STATISTICAL INFERENCE
A statistical model will predict well only if it is properly specified and its parameter estimates are robust, unbiased and efficient. Unbiased means that the expected value of the estimator is equal to the true model parameter and efficient means that the variance of the estimator is low, i.e. different samples give similar estimates. When we set up a statistical model the implicit assumption is that this is the ‘true’ model for the population. We estimate the model’s parameters from a sample and then use these estimates to infer the values of the ‘true’ population parameters.
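
A small simulation makes ‘unbiased’ and ‘efficient’ concrete: below, two estimators of a population mean are compared across many repeated samples; both are unbiased, but the sample mean has far lower variance. This is an illustrative sketch, not drawn from the text, and the population parameters are arbitrary.

import numpy as np

rng = np.random.default_rng(1)
samples = rng.normal(5.0, 2.0, size=(20_000, 30))   # 20,000 samples of size 30

est_mean = samples.mean(axis=1)    # estimator 1: the sample mean
est_first = samples[:, 0]          # estimator 2: just use the first observation

# Both estimators are unbiased (their averages sit near the true value 5.0),
# but the sample mean is far more efficient (much smaller variance across samples).
print("averages: ", est_mean.mean(), est_first.mean())
print("variances:", est_mean.var(), est_first.var())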

A case study in this chapter applies PCA to European equity indices, and several more case studies are given in subsequent volumes of Market Risk Analysis. A very good free downloadable Excel add-in has been used for these case studies and examples. Further details are given in the chapter. Chapter 3, Probability and Statistics, covers the probabilistic and statistical models that we use to analyse the evolution of financial asset prices or interest rates. Starting from the basic concepts of a random variable, a probability distribution, quantiles and population and sample moments, we then provide a catalogue of probability distributions. We describe the theoretical properties of each distribution and give examples of practical applications to finance.

We describe the theoretical properties of each distribution and give examples of practical applications to finance. Stable distributions and kernel estimates are also covered, because they have broad applications to financial risk management. The sections on statistical inference and maximum likelihood lay the foundations for Chapter 4. Finally, we focus on the continuous time and discrete time statistical models for the evolution of financial asset prices and returns, which are further developed in Volume III. Much of the material in Volume II rests on the Introduction to Linear Regression given in Chapter 4. Here we start from the basic, simple linear model, showing how to estimate and draw inferences on the parameters, and explaining the standard diagnostic tests for a regression model.


pages: 250 words: 64,011

Everydata: The Misinformation Hidden in the Little Data You Consume Every Day by John H. Johnson

Affordable Care Act / Obamacare, autism spectrum disorder, Black Swan, business intelligence, Carmen Reinhart, cognitive bias, correlation does not imply causation, Daniel Kahneman / Amos Tversky, data science, Donald Trump, en.wikipedia.org, Kenneth Rogoff, labor-force participation, lake wobegon effect, Long Term Capital Management, Mercator projection, Mercator projection distort size, especially Greenland and Africa, meta-analysis, Nate Silver, obamacare, p-value, PageRank, pattern recognition, publication bias, QR code, randomized controlled trial, risk-adjusted returns, Ronald Reagan, selection bias, statistical model, The Signal and the Noise by Nate Silver, Thomas Bayes, Tim Cook: Apple, wikimedia commons, Yogi Berra

You collect all the data on every wheat price in the history of humankind, and all the different factors that determine the price of wheat (temperature, feed prices, transportation costs, etc.). First, you need to develop a statistical model to determine what factors have affected the price of wheat in the past and how these various factors relate to one another mathematically. Then, based on that model, you predict the price of wheat for next year.14 The problem is that no matter how big your sample is (even if it’s the full population), and how accurate your statistical model is, there are still unknowns that can cause your forecast to be off: What if a railroad strike doubles the transportation costs?
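
As a minimal sketch of the kind of model being described, the lines below fit an ordinary least-squares regression of price on two hypothetical drivers and produce a point forecast. The variables, coefficients, and data are all invented; the point is simply that such a forecast is silent about shocks the model has never seen, like the railroad strike above.

import numpy as np

rng = np.random.default_rng(2)
n = 40                                    # forty past years of synthetic data
temperature = rng.normal(20.0, 2.0, n)
transport_cost = rng.normal(5.0, 1.0, n)
# An assumed "true" relationship plus noise the modeler never observes directly.
price = 50.0 + 1.5 * temperature + 3.0 * transport_cost + rng.normal(0.0, 4.0, n)

X = np.column_stack([np.ones(n), temperature, transport_cost])
beta, *_ = np.linalg.lstsq(X, price, rcond=None)   # fitted intercept and coefficients

# Forecast next year's price from assumed driver values.
next_year = np.array([1.0, 21.0, 5.5])
print("forecast:", next_year @ beta)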

As Hovenkamp said, “the plaintiff’s expert had ignored a clear ‘outlier’ in the data.”33 If that outlier data had been excluded—as it arguably should have been—then the results would have shown a clear increase in market share for Conwood. Instead, the conclusion—driven by an extreme observation—showed a decrease. If your conclusions change dramatically by excluding a data point, then that data point is a strong candidate to be an outlier. In a good statistical model, you would expect that you can drop a data point without seeing a substantive difference in the results. It’s something to think about when looking for outliers. ARE YOU BETTER THAN AVERAGE? The average American: Sleeps more than 8.7 hours per day34 Weighs approximately 181 pounds (195.5 pounds for men and 166.2 pounds for women)35 Drinks 20.8 gallons of beer per year36 Drives 13,476 miles per year (hopefully not after drinking all that beer)37 Showers six times a week, but only shampoos four times a week38 Has been at his or her current job 4.6 years39 So, are you better than average?
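
The drop-one-point check described here takes only a few lines: refit a simple model with each observation left out and see how much the estimate swings. In this sketch the data are synthetic and one extreme point is planted deliberately; a large swing from dropping it is what flags it as a candidate outlier.

import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(0.0, 10.0, 30)
y = 2.0 * x + rng.normal(0.0, 1.0, 30)
y[-1] += 40.0                      # one planted extreme observation

def slope(xs, ys):
    X = np.column_stack([np.ones(len(xs)), xs])
    return np.linalg.lstsq(X, ys, rcond=None)[0][1]

full_slope = slope(x, y)
# Refit with each point left out; the planted point produces the biggest swing.
swings = [abs(slope(np.delete(x, i), np.delete(y, i)) - full_slope) for i in range(len(x))]
print("largest swing from dropping a single point:", max(swings))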

(On its website, Visa even suggests that you tell your financial institution if you’ll be traveling, which can “help ensure that your card isn’t flagged for unusual activity.”18) This is a perfect example of a false positive—the credit card company predicted that the charges on your card were potentially fraudulent, but it was wrong. Events like this, which may not be accounted for in the statistical model, are potential sources of prediction error. Just as sampling error tells us about the uncertainty in our sample, prediction error is a way to measure uncertainty in the future, essentially by comparing the predicted results to the actual outcomes, once they occur.19 Prediction error is often measured using a prediction interval, which is the range in which we expect to see the next data point.
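
A minimal sketch of a prediction interval under a simple normal model, on synthetic data; the 1.96 multiplier is the usual approximate 95% value (a more careful version would use the t distribution), and none of the numbers refer to the credit card example itself.

import numpy as np

rng = np.random.default_rng(4)
observed = rng.normal(100.0, 15.0, size=60)    # e.g., sixty past observations (synthetic)

mean, sd = observed.mean(), observed.std(ddof=1)
z = 1.96                                       # approximate 95% multiplier
half_width = z * sd * np.sqrt(1.0 + 1.0 / len(observed))
# Interval in which we expect the *next* data point (not the mean) to fall.
print(f"95% prediction interval: ({mean - half_width:.1f}, {mean + half_width:.1f})")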


pages: 252 words: 71,176

Strength in Numbers: How Polls Work and Why We Need Them by G. Elliott Morris

affirmative action, call centre, Cambridge Analytica, commoditize, coronavirus, COVID-19, critical race theory, data science, Donald Trump, Francisco Pizarro, green new deal, lockdown, Moneyball by Michael Lewis explains big data, Nate Silver, random walk, Ronald Reagan, selection bias, Silicon Valley, Socratic dialogue, statistical model, Works Progress Administration

Although political pollsters would still have to conduct traditional RDD telephone polls to sample the attitudes of Americans who are not registered to vote and therefore do not show up in states’ voter files, any pre-election polling would ideally still be mixed with polls conducted off the voter file in order to adjust nonresponse biases among the voting population. But the world of public opinion research is far from ideal. First, not every polling outfit has access to a voter file. Subscriptions can be very expensive, often more expensive than the added cost of calling people who won’t respond to your poll. Reengineering statistical models to incorporate the new methods also takes time, which many firms do not have. Further, many pollsters clinging to RDD phone polls would not have the technical know-how to make the switch even if they tried; Hartman and her colleagues were in a league of their own when it came to their programming and statistical abilities.

Another analyst, extremely sharp but perhaps prone to dramatic swings, memorably declares “We’re fucked.” A senior member of the team excused himself, and I later found out that he proceeded immediately to the bathroom, in order to vomit.11 Ghitza’s thesis reports on a group of projects related to statistical modeling and political science. He had been hired by the Obama campaign roughly six weeks before the election to program a model that could predict how the election was unfolding throughout the day, based on the way turnout among different groups was looking. Ghitza was interested in answering two questions: First, “Could we measure deviations from expected turnout for different groups of the electorate in real time?”

David Shor, Ghitza’s colleague who created the campaign’s pre-election poll-based forecasts, later remarked, “That was the worst 12 hours of my life.”12 Ghitza was not hired by the Obama campaign to work on its voter file and polling operation, but he probably should have been. He had studied breakthrough statistical modeling during his doctoral work at Columbia, developing a lot of the methods his current employer, Catalist, uses to merge polls with voter files and model support for political candidates at the individual level. The hallmark method of his dissertation, “multilevel regression with post-stratification” (MRP), was thought up by his advisor, Andrew Gelman, in the 1990s.
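
As a toy illustration of the post-stratification half of MRP (the multilevel regression half, which produces the group-level estimates, is omitted here), group estimates of support are reweighted by each group’s assumed share of the target population. Every number below is invented.

# Post-stratification step only: reweight group-level support estimates by
# each group's share of the target population (e.g., from a voter file or census).
poll_support = {"18-29": 0.61, "30-44": 0.55, "45-64": 0.48, "65+": 0.42}
population_share = {"18-29": 0.20, "30-44": 0.25, "45-64": 0.35, "65+": 0.20}

estimate = sum(poll_support[g] * population_share[g] for g in poll_support)
print(f"post-stratified support: {estimate:.3f}")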


pages: 265 words: 74,000

The Numerati by Stephen Baker

Berlin Wall, Black Swan, business process, call centre, correlation does not imply causation, Drosophila, full employment, illegal immigration, index card, information security, Isaac Newton, job automation, job satisfaction, junk bonds, McMansion, Myron Scholes, natural language processing, off-the-grid, PageRank, personalized medicine, recommendation engine, RFID, Silicon Valley, Skype, statistical model, surveillance capitalism, Watson beat the top human players on Jeopardy!, workplace surveillance

And when he got his master's, he decided to look for a job "at places where they hire Ph.D.'s." He landed at Accenture, and now, at an age at which many of his classmates are just finishing their doctorate, he runs the analytics division from his perch in Chicago. Ghani leads me out of his office and toward the shopping cart. For statistical modeling, he explains, grocery shopping is one of the first retail industries to conquer. This is because we buy food constantly. For many of us, the supermarket functions as a chilly, Muzak-blaring annex to our pantries. (I would bet that millions of suburban Americans spend more time in supermarkets than in their formal living room.)

He thinks that over the next generation, many of us will surround ourselves with the kinds of networked gadgets he and his team are building and testing. These machines will busy themselves with far more than measuring people's pulse and counting the pills they take, which is what today's state-of-the-art monitors can do. Dishman sees sensors eventually recording and building statistical models of almost every aspect of our behavior. They'll track our pathways in the house, the rhythm of our gait. They'll diagram our thrashing in bed and chart our nightly trips to the bathroom—perhaps keeping tabs on how much time we spend in there. Some of these gadgets will even measure the pause before we recognize a familiar voice on the phone.

From that, they can calculate a 90 percent probability that toothbrush movement involves teeth cleaning. (They could factor in time variables, but there's more than enough complexity ahead, as we'll see.) Next they move to the broom and the teakettle, and they ask the same questions. The goal is to build a statistical model for each of us that will infer from a series of observations what we're most likely to be doing. The toothbrush was easy. For the most part, it sticks to only one job. But consider the kettle. What are the chances that it's being used for tea? Maybe a person uses it to make instant soup (which is more nutritious than tea but dangerously salty for people like my mother).
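
A toy Bayes update in the spirit of the inference described here, with invented priors and likelihoods rather than anything from the actual sensor models: given that the kettle sensor has fired, observing the teabag-cupboard sensor as well shifts probability toward tea-making.

# All numbers are invented for illustration.
prior = {"making tea": 0.5, "instant soup": 0.3, "other": 0.2}          # given: kettle in use
likelihood = {"making tea": 0.9, "instant soup": 0.2, "other": 0.1}     # P(teabag cupboard opened | activity)

evidence = sum(prior[a] * likelihood[a] for a in prior)
posterior = {a: prior[a] * likelihood[a] / evidence for a in prior}
print(posterior)   # "making tea" rises from 0.5 to roughly 0.85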


pages: 764 words: 261,694

The Elements of Statistical Learning (Springer Series in Statistics) by Trevor Hastie, Robert Tibshirani, Jerome Friedman

algorithmic bias, backpropagation, Bayesian statistics, bioinformatics, computer age, conceptual framework, correlation coefficient, data science, G4S, Geoffrey Hinton, greed is good, higher-order functions, linear programming, p-value, pattern recognition, random walk, selection bias, sparse data, speech recognition, statistical model, stochastic process, The Wisdom of Crowds

–Ian Hacking
Contents
Preface to the Second Edition
Preface to the First Edition
1 Introduction
2 Overview of Supervised Learning
2.1 Introduction
2.2 Variable Types and Terminology
2.3 Two Simple Approaches to Prediction: Least Squares and Nearest Neighbors
2.3.1 Linear Models and Least Squares
2.3.2 Nearest-Neighbor Methods
2.3.3 From Least Squares to Nearest Neighbors
2.4 Statistical Decision Theory
2.5 Local Methods in High Dimensions
2.6 Statistical Models, Supervised Learning and Function Approximation
2.6.1 A Statistical Model for the Joint Distribution Pr(X, Y)
2.6.2 Supervised Learning
2.6.3 Function Approximation
2.7 Structured Regression Models
2.7.1 Difficulty of the Problem

We will see that there is a whole spectrum of models between the rigid linear models and the extremely flexible 1-nearest-neighbor models, each with their own assumptions and biases, which have been proposed specifically to avoid the exponential growth in complexity of functions in high dimensions by drawing heavily on these assumptions. Before we delve more deeply, let us elaborate a bit on the concept of statistical models and see how they fit into the prediction framework.
2.6 Statistical Models, Supervised Learning and Function Approximation
Our goal is to find a useful approximation f̂(x) to the function f(x) that underlies the predictive relationship between the inputs and outputs. In the theoretical setting of Section 2.4, we saw that squared error loss leads us to the regression function f(x) = E(Y | X = x) for a quantitative response.

The class of nearest-neighbor methods can be viewed as direct estimates of this conditional expectation, but we have seen that they can fail in at least two ways: • if the dimension of the input space is high, the nearest neighbors need not be close to the target point, and can result in large errors; • if special structure is known to exist, this can be used to reduce both the bias and the variance of the estimates. We anticipate using other classes of models for f(x), in many cases specifically designed to overcome the dimensionality problems, and here we discuss a framework for incorporating them into the prediction problem.
2.6.1 A Statistical Model for the Joint Distribution Pr(X, Y)
Suppose in fact that our data arose from a statistical model Y = f(X) + ε, (2.29) where the random error ε has E(ε) = 0 and is independent of X. Note that for this model, f(x) = E(Y | X = x), and in fact the conditional distribution Pr(Y | X) depends on X only through the conditional mean f(x). The additive error model is a useful approximation to the truth.
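
A minimal sketch connecting the additive error model to the nearest-neighbor estimate of the conditional expectation: simulate Y = f(X) + ε and average the k nearest neighbors at a query point. The choice f(x) = sin(x), the noise level, and k are all illustrative assumptions.

import numpy as np

rng = np.random.default_rng(5)
n, k = 200, 15
X = rng.uniform(-3.0, 3.0, n)
f = np.sin                                  # stands in for the unknown regression function
Y = f(X) + rng.normal(0.0, 0.3, n)          # additive error model: Y = f(X) + eps, E(eps) = 0

x0 = 1.0
nearest = np.argsort(np.abs(X - x0))[:k]    # indices of the k nearest neighbors of x0
fhat = Y[nearest].mean()                    # k-NN estimate of E(Y | X = x0)
print(fhat, "vs true f(x0) =", f(x0))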


pages: 321

Finding Alphas: A Quantitative Approach to Building Trading Strategies by Igor Tulchinsky

algorithmic trading, asset allocation, automated trading system, backpropagation, backtesting, barriers to entry, behavioural economics, book value, business cycle, buy and hold, capital asset pricing model, constrained optimization, corporate governance, correlation coefficient, credit crunch, Credit Default Swap, currency risk, data science, deep learning, discounted cash flows, discrete time, diversification, diversified portfolio, Eugene Fama: efficient market hypothesis, financial engineering, financial intermediation, Flash crash, Geoffrey Hinton, implied volatility, index arbitrage, index fund, intangible asset, iterative process, Long Term Capital Management, loss aversion, low interest rates, machine readable, market design, market microstructure, merger arbitrage, natural language processing, passive investing, pattern recognition, performance metric, Performance of Mutual Funds in the Period, popular capitalism, prediction markets, price discovery process, profit motive, proprietary trading, quantitative trading / quantitative finance, random walk, Reminiscences of a Stock Operator, Renaissance Technologies, risk free rate, risk tolerance, risk-adjusted returns, risk/return, selection bias, sentiment analysis, shareholder value, Sharpe ratio, short selling, Silicon Valley, speech recognition, statistical arbitrage, statistical model, stochastic process, survivorship bias, systematic bias, systematic trading, text mining, transaction costs, Vanguard fund, yield curve

For example, in alpha research the task of predicting stock prices can be a good application of supervised learning, and the task of selecting stocks for inclusion in a portfolio is an application of unsupervised learning. [Figure 16.1: The most developed directions of machine learning. Unsupervised methods: clusterization algorithms. Supervised methods: statistical models, support vector machines, neural networks, deep learning algorithms, fuzzy logic, and ensemble methods (random forest, AdaBoost). The most popular are shown in black.]
Statistical Models
Models like naive Bayes, linear discriminant analysis, the hidden Markov model, and logistic regression are good for solving relatively simple problems that do not need high precision of classification or prediction.

Statistical Models
Models like naive Bayes, linear discriminant analysis, the hidden Markov model, and logistic regression are good for solving relatively simple problems that do not need high precision of classification or prediction. These methods are easy to implement and not too sensitive to missing data. The disadvantage is that each of these approaches presumes some specific data model. Trend analysis is an example of applications of statistical models in alpha research. In particular, a hidden Markov model is frequently utilized for that purpose, based on the belief that price movements of the stock market are not totally random. In a statistics framework, the hidden Markov model is a composition of two or more stochastic processes: a hidden Markov chain, which accounts for the temporal variability, and an observable process, which accounts for the spectral variability.
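
As a rough sketch of how such a hidden Markov model can be run over returns, the code below implements a two-regime forward filter with Gaussian observation densities. The transition matrix, regime means, and volatilities are invented, and the returns are simulated rather than real market data; an actual application would estimate these parameters (e.g., by Baum–Welch) instead of assuming them.

import numpy as np

# Two hidden regimes ("calm", "volatile") with invented parameters.
A = np.array([[0.95, 0.05],                 # regime transition probabilities
              [0.10, 0.90]])
pi = np.array([0.8, 0.2])                   # initial regime distribution
means = np.array([0.0005, -0.001])          # mean daily return in each regime
sds = np.array([0.005, 0.02])               # daily volatility in each regime

def gauss(x, mu, sd):
    return np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

rng = np.random.default_rng(6)
returns = rng.normal(0.0, 0.01, 250)        # synthetic daily returns

# Forward recursion: filtered probability of each regime given returns so far.
alpha = pi * gauss(returns[0], means, sds)
alpha /= alpha.sum()
for r in returns[1:]:
    alpha = (alpha @ A) * gauss(r, means, sds)
    alpha /= alpha.sum()

print("P(volatile regime | data so far):", alpha[1])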

(There will be a range of views on both of these horizons, but we can still use the implied causal relationship between the extreme weather and the commodity supply to narrow the range of candidates.) We can now test our idea by gathering data on historical weather forecasts and price changes for the major energy contracts, and testing for association between the two datasets, using a partial in-sample historical dataset. The next step is to fit a simple statistical model and test it for robustness, while varying the parameters in the fit. One good robustness test is to include a similar asset for comparison, where we expect the effect to be weaker. In the case of our weather alpha example, Brent crude oil would be a reasonable choice. Crude oil is a global market, so we would expect some spillover from a US supply disruption.


pages: 625 words: 167,349

The Alignment Problem: Machine Learning and Human Values by Brian Christian

Albert Einstein, algorithmic bias, Alignment Problem, AlphaGo, Amazon Mechanical Turk, artificial general intelligence, augmented reality, autonomous vehicles, backpropagation, butterfly effect, Cambridge Analytica, Cass Sunstein, Claude Shannon: information theory, computer vision, Computing Machinery and Intelligence, data science, deep learning, DeepMind, Donald Knuth, Douglas Hofstadter, effective altruism, Elaine Herzberg, Elon Musk, Frances Oldham Kelsey, game design, gamification, Geoffrey Hinton, Goodhart's law, Google Chrome, Google Glasses, Google X / Alphabet X, Gödel, Escher, Bach, Hans Moravec, hedonic treadmill, ImageNet competition, industrial robot, Internet Archive, John von Neumann, Joi Ito, Kenneth Arrow, language acquisition, longitudinal study, machine translation, mandatory minimum, mass incarceration, multi-armed bandit, natural language processing, Nick Bostrom, Norbert Wiener, Northpointe / Correctional Offender Management Profiling for Alternative Sanctions, OpenAI, Panopticon Jeremy Bentham, pattern recognition, Peter Singer: altruism, Peter Thiel, precautionary principle, premature optimization, RAND corporation, recommendation engine, Richard Feynman, Rodney Brooks, Saturday Night Live, selection bias, self-driving car, seminal paper, side project, Silicon Valley, Skinner box, sparse data, speech recognition, Stanislav Petrov, statistical model, Steve Jobs, strong AI, the map is not the territory, theory of mind, Tim Cook: Apple, W. E. B. Du Bois, Wayback Machine, zero-sum game

“He was on fire about reforming the criminal justice system,” says Brennan. “It was a job of a passion for Dave.” Brennan and Wells decided to team up.10 They called their company Northpointe. As the era of the personal computer dawned, the use of statistical models at all points in the criminal justice system, in jurisdictions large and small, exploded. In 1980, only four states were using statistical models to assist in parole decisions. By 1990, it was twelve states, and by 2000, it was twenty-six.11 Suddenly it began to seem strange not to use such models; as the Association of Paroling Authorities International’s 2003 Handbook for New Parole Board Members put it, “In this day and age, making parole decisions without benefit of a good, research-based risk assessment instrument clearly falls short of accepted best practice.”12 One of the most widely used tools of this new era had been developed by Brennan and Wells in 1998; they called it Correctional Offender Management Profiling for Alternative Sanctions—or COMPAS.13 COMPAS uses a simple statistical model based on a weighted linear combination of things like age, age at first arrest, and criminal history to predict whether an inmate, if released, would commit a violent or nonviolent crime within approximately one to three years.14 It also includes a broad set of survey questions to identify a defendant’s particular issues and needs—things like chemical dependency, lack of family support, and depression.
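
COMPAS itself is proprietary, so the following is only a generic sketch of what a weighted linear combination of things like age, age at first arrest, and criminal history can look like when passed through a logistic link. The feature names, weights, and intercept are invented and bear no relation to the real instrument.

import math

# Invented weights for a generic linear risk score; NOT the COMPAS model.
weights = {"age": -0.04, "age_at_first_arrest": -0.03, "prior_arrests": 0.25}
intercept = 0.5

def risk_score(person):
    linear = intercept + sum(weights[k] * person[k] for k in weights)
    return 1.0 / (1.0 + math.exp(-linear))      # logistic link -> score between 0 and 1

print(risk_score({"age": 35, "age_at_first_arrest": 19, "prior_arrests": 2}))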

Even in cases where the human decision makers were given the statistical prediction as yet another piece of data on which to make their decision, their decisions were still worse than just using the prediction itself.23 Other researchers tried the reverse tack: feeding the expert human judgments into a statistical model as input. They didn’t appear to add much.24 Conclusions like these, which have been supported by numerous studies since, should give us pause.25 For one, they seem to suggest that, whatever myriad issues we face in turning decision-making over to statistical models, human judgment alone is not a viable alternative. At the same time, perhaps complex, elaborate models really aren’t necessary to match or exceed this human baseline.

In recent years, alarm bells have gone off in two distinct communities. The first are those focused on the present-day ethical risks of technology. If a facial-recognition system is wildly inaccurate for people of one race or gender but not another, or if someone is denied bail because of a statistical model that has never been audited and that no one in the courtroom—including the judge, attorneys, and defendant—understands, this is a problem. Issues like these cannot be addressed within traditional disciplinary camps, but rather only through dialogue: between computer scientists, social scientists, lawyers, policy experts, ethicists.


The Ethical Algorithm: The Science of Socially Aware Algorithm Design by Michael Kearns, Aaron Roth

23andMe, affirmative action, algorithmic bias, algorithmic trading, Alignment Problem, Alvin Roth, backpropagation, Bayesian statistics, bitcoin, cloud computing, computer vision, crowdsourcing, data science, deep learning, DeepMind, Dr. Strangelove, Edward Snowden, Elon Musk, fake news, Filter Bubble, general-purpose programming language, Geoffrey Hinton, Google Chrome, ImageNet competition, Lyft, medical residency, Nash equilibrium, Netflix Prize, p-value, Pareto efficiency, performance metric, personalized medicine, pre–internet, profit motive, quantitative trading / quantitative finance, RAND corporation, recommendation engine, replication crisis, ride hailing / ride sharing, Robert Bork, Ronald Coase, self-driving car, short selling, sorting algorithm, sparse data, speech recognition, statistical model, Stephen Hawking, superintelligent machines, TED Talk, telemarketer, Turing machine, two-sided market, Vilfredo Pareto

A far more common type of machine learning is the supervised variety, where we wish to use data to make specific predictions that can later be verified or refuted by observing the truth—for example, using past meteorological data to predict whether it will rain tomorrow. The “supervision” that guides our learning is the feedback we get tomorrow, when either it rains or it doesn’t. And for much of the history of machine learning and statistical modeling, many applications, like this example, were focused on making predictions about nature or other large systems: predicting tomorrow’s weather, predicting whether the stock market will go up or down (and by how much), predicting congestion on roadways during rush hour, and the like. Even when humans were part of the system being modeled, the emphasis was on predicting aggregate, collective behaviors.

But if we go too far down the path toward individual fairness, other difficulties arise. In particular, if our model makes even a single mistake, then it can potentially be accused of unfairness toward that one individual, assuming it makes any loans at all. And anywhere we apply machine learning and statistical models to historical data, there are bound to be mistakes except in the most idealized settings. So we can ask for this sort of individual level of fairness, but if we do so naively, its applicability will be greatly constrained and its costs to accuracy are likely to be unpalatable; we’re simply asking for too much.

Sometimes decisions made using biased data or algorithms are the basis for further data collection, forming a pernicious feedback loop that can amplify discrimination over time. An example of this phenomenon comes from the domain of “predictive policing,” in which large metropolitan police departments use statistical models to forecast neighborhoods with higher crime rates, and then send larger forces of police officers there. The most popularly used algorithms are proprietary and secret, so there is debate about how these algorithms estimate crime rates, and concern that some police departments might be in part using arrest data.


pages: 294 words: 82,438

Simple Rules: How to Thrive in a Complex World by Donald Sull, Kathleen M. Eisenhardt

Affordable Care Act / Obamacare, Airbnb, Apollo 13, asset allocation, Atul Gawande, barriers to entry, Basel III, behavioural economics, Berlin Wall, carbon footprint, Checklist Manifesto, complexity theory, Craig Reynolds: boids flock, Credit Default Swap, Daniel Kahneman / Amos Tversky, democratizing finance, diversification, drone strike, en.wikipedia.org, European colonialism, Exxon Valdez, facts on the ground, Fall of the Berlin Wall, Glass-Steagall Act, Golden age of television, haute cuisine, invention of the printing press, Isaac Newton, Kickstarter, late fees, Lean Startup, Louis Pasteur, Lyft, machine translation, Moneyball by Michael Lewis explains big data, Nate Silver, Network effects, obamacare, Paul Graham, performance metric, price anchoring, RAND corporation, risk/return, Saturday Night Live, seminal paper, sharing economy, Silicon Valley, Startup school, statistical model, Steve Jobs, TaskRabbit, The Signal and the Noise by Nate Silver, transportation-network company, two-sided market, Wall-E, web application, Y Combinator, Zipcar

A simple rule—take the midpoint of the two most distant crime scenes—got police closer to the criminal than more sophisticated decision-making approaches. Another study compared a state-of-the-art statistical model and a simple rule to determine which did a better job of predicting whether past customers would purchase again. According to the simple rule, a customer was inactive if they had not purchased in x months (the number of months varies by industry). The simple rule did as well as the statistical model in predicting repeat purchases of online music, and beat it in the apparel and airline industries. Other research finds that simple rules match or beat more complicated models in assessing the likelihood that a house will be burgled and in forecasting which patients with chest pain are actually suffering from a heart attack.
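
The hiatus rule mentioned here is short enough to write out in full; the cutoff below is a placeholder, since the studies found that the best value of x varies by industry.

# Hiatus heuristic: a customer is predicted inactive if their last purchase
# was more than HIATUS_MONTHS ago. The 9-month cutoff is a placeholder.
HIATUS_MONTHS = 9

def predicted_active(months_since_last_purchase):
    return months_since_last_purchase <= HIATUS_MONTHS

print(predicted_active(4), predicted_active(14))   # True, False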

Statisticians have found: Professor Scott Armstrong of the Wharton School reviewed thirty-three studies comparing simple and complex statistical models used to forecast business and economic outcomes. He found no difference in forecasting accuracy in twenty-one of the studies. Sophisticated models did better in five studies, while simple models outperformed complex ones in seven cases. See J. Scott Armstrong, “Forecasting by Extrapolation: Conclusions from 25 Years of Research,” Interfaces 14 (1984): 52–66. Spyros Makridakis has hosted a series of competitions for statistical models over two decades, and consistently found that complex models fail to outperform simpler approaches.

And yet it works. One recent study of alternative investment approaches pitted the Markowitz model and three extensions of his approach against the 1/N rule, testing them on seven samples of data from the real world. This research ran a total of twenty-eight horseraces between the four state-of-the-art statistical models and the 1/N rule. With ten years of historical data to estimate risk, returns, and correlations, the 1/N rule outperformed the Markowitz equation and its extensions 79 percent of the time. The 1/N rule earned a positive return in every test, while the more complicated models lost money for investors more than half the time.
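
A compact sketch of that comparison: equal 1/N weights against plug-in mean-variance weights estimated from ten years of simulated monthly returns. The instability of the estimated weights, visible when the code is run, is the usual explanation for why the simple rule holds its own; the return-generating assumptions here are arbitrary.

import numpy as np

rng = np.random.default_rng(7)
n_assets, n_months = 5, 120                     # ten years of monthly returns (simulated)
returns = rng.normal(0.01, 0.05, size=(n_months, n_assets))

# 1/N rule: equal weight in every asset, no estimation at all.
w_equal = np.full(n_assets, 1.0 / n_assets)

# Plug-in mean-variance weights from estimated means and covariance (unconstrained).
mu = returns.mean(axis=0)
sigma = np.cov(returns, rowvar=False)
w_mv = np.linalg.solve(sigma, mu)
w_mv /= w_mv.sum()                              # scale so the weights sum to one

print("1/N weights:          ", w_equal)
print("mean-variance weights:", np.round(w_mv, 2))   # typically noisy and extreme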


pages: 370 words: 107,983

Rage Inside the Machine: The Prejudice of Algorithms, and How to Stop the Internet Making Bigots of Us All by Robert Elliott Smith

"World Economic Forum" Davos, Ada Lovelace, adjacent possible, affirmative action, AI winter, Alfred Russel Wallace, algorithmic bias, algorithmic management, AlphaGo, Amazon Mechanical Turk, animal electricity, autonomous vehicles, behavioural economics, Black Swan, Brexit referendum, British Empire, Cambridge Analytica, cellular automata, Charles Babbage, citizen journalism, Claude Shannon: information theory, combinatorial explosion, Computing Machinery and Intelligence, corporate personhood, correlation coefficient, crowdsourcing, Daniel Kahneman / Amos Tversky, data science, deep learning, DeepMind, desegregation, discovery of DNA, disinformation, Douglas Hofstadter, Elon Musk, fake news, Fellow of the Royal Society, feminist movement, Filter Bubble, Flash crash, Geoffrey Hinton, Gerolamo Cardano, gig economy, Gödel, Escher, Bach, invention of the wheel, invisible hand, Jacquard loom, Jacques de Vaucanson, John Harrison: Longitude, John von Neumann, Kenneth Arrow, Linda problem, low skilled workers, Mark Zuckerberg, mass immigration, meta-analysis, mutually assured destruction, natural language processing, new economy, Northpointe / Correctional Offender Management Profiling for Alternative Sanctions, On the Economy of Machinery and Manufactures, p-value, pattern recognition, Paul Samuelson, performance metric, Pierre-Simon Laplace, post-truth, precariat, profit maximization, profit motive, Silicon Valley, social intelligence, statistical model, Stephen Hawking, stochastic process, Stuart Kauffman, telemarketer, The Bell Curve by Richard Herrnstein and Charles Murray, The Future of Employment, the scientific method, The Wealth of Nations by Adam Smith, The Wisdom of Crowds, theory of mind, Thomas Bayes, Thomas Malthus, traveling salesman, Turing machine, Turing test, twin studies, Vilfredo Pareto, Von Neumann architecture, warehouse robotics, women in the workforce, Yochai Benkler

That act of faith remains largely hidden from everyone outside that community by a cloud of seemingly impenetrable mathematics. This obscures the dangers inherent in using statistics and probability as a basis for reasoning about people via algorithms. Statistical models, after all, aren’t unbiased, particularly when, as is the case for most algorithms today, they are motivated by the pursuit of profit. Just like expert systems, statistical models require a frame within which to operate, which is then populated by particular atoms. That frame and those atoms are subject to the same brittleness (limitations) and biases. On top of that, the probabilities drawn from these statistics, which become the grist for the statistical algorithmic mill, often aren’t what we think they are at all.

Unlike Wollstonecraft, Byron was a game-changing personality who challenged conventions and social mores and opened the door to a new Romantic Age. At least for men. The casual definition of outlier is ‘a person or thing situated away or detached from the main body or system,’ but in statistical modelling, it is ‘a data point on a graph or in a set of results that is very much bigger or smaller than the next nearest data point.’ In terms of algorithms, a statistical model is like the flattened and warped rugby ball, a shape that can be mathematically characterized by a few numbers, which can be in turn manipulated by an algorithm to fit data. In this sense, an outlier is a point that is far from the other points, the fluff on the data cloud which can’t easily be fitted inside the warped rugby ball.

However, there is another way to view the Bell Curve: not as a natural law, but as an artefact of trying to see complex and uncertain phenomena through the limiting lens of sampling and statistics. The CLT does not prove that everything follows a Bell Curve; it shows that when you sample in order to understand things that you can’t observe, you will always get a Bell Curve. That’s all. Despite this reality, faith in CLT and the Bell Curve still dominates in statistical modelling of all sorts of things today from presidential approval ratings to reoffending rates for criminals to educational success or failure, to whether jobs can be done by computers as well as people. What’s more, faith in this mathematical model inevitably led to its use in areas where it was ill-suited and inappropriate, such as Quetelet’s Theory of Probabilities as Applied to the Moral and Political Sciences.
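
A small simulation of the narrower claim being made about the central limit theorem: draws from a heavily skewed population still produce sample means that pile up symmetrically around the population mean. The exponential population and the sample size of 50 are arbitrary choices for illustration.

import numpy as np

rng = np.random.default_rng(8)
# A deliberately skewed, non-normal population: exponential waiting times with mean 1.
draws = rng.exponential(scale=1.0, size=(10_000, 50))
sample_means = draws.mean(axis=1)               # the mean of each sample of 50

# The raw draws are strongly skewed; the sample means are nearly symmetric,
# which is all the central limit theorem actually promises.
print("third central moment of raw draws:   ", round(float(np.mean((draws - 1.0) ** 3)), 3))
print("third central moment of sample means:", round(float(np.mean((sample_means - 1.0) ** 3)), 5))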


pages: 401 words: 109,892

The Great Reversal: How America Gave Up on Free Markets by Thomas Philippon

airline deregulation, Amazon Mechanical Turk, Amazon Web Services, Andrei Shleifer, barriers to entry, Big Tech, bitcoin, blockchain, book value, business cycle, business process, buy and hold, Cambridge Analytica, carbon tax, Carmen Reinhart, carried interest, central bank independence, commoditize, crack epidemic, cross-subsidies, disruptive innovation, Donald Trump, driverless car, Erik Brynjolfsson, eurozone crisis, financial deregulation, financial innovation, financial intermediation, flag carrier, Ford Model T, gig economy, Glass-Steagall Act, income inequality, income per capita, index fund, intangible asset, inventory management, Jean Tirole, Jeff Bezos, Kenneth Rogoff, labor-force participation, law of one price, liquidity trap, low cost airline, manufacturing employment, Mark Zuckerberg, market bubble, minimum wage unemployment, money market fund, moral hazard, natural language processing, Network effects, new economy, offshore financial centre, opioid epidemic / opioid crisis, Pareto efficiency, patent troll, Paul Samuelson, price discrimination, profit maximization, purchasing power parity, QWERTY keyboard, rent-seeking, ride hailing / ride sharing, risk-adjusted returns, Robert Bork, Robert Gordon, robo advisor, Ronald Reagan, search costs, Second Machine Age, self-driving car, Silicon Valley, Snapchat, spinning jenny, statistical model, Steve Jobs, stock buybacks, supply-chain management, Telecommunications Act of 1996, The Chicago School, the payments system, The Rise and Fall of American Growth, The Wealth of Nations by Adam Smith, too big to fail, total factor productivity, transaction costs, Travis Kalanick, vertical integration, Vilfredo Pareto, warehouse automation, zero-sum game

This pattern holds for the whole economy as well as within the manufacturing sector, where we can use more granular data (NAICS level 6, a term explained in the Appendix section on industry classification). The relationship is positive and significant over the 1997–2002 period but not after. In fact, the relationship appears to be negative, albeit noisy, in the 2007–2012 period.
Box 4.2. Statistical Models
Table 4.2 presents the results of five regressions, that is, five statistical models. The right half of the table considers the whole economy; the left half focuses on the manufacturing sector.
TABLE 4.2 Regression Results (dependent variable: productivity growth)
                             Manufacturing                 Whole economy
                         (1)       (2)       (3)         (4)        (5)
Years                   97–02     02–07     07–12       89–99      00–15
Census CR4 growth       0.13*     0.01      −0.13
                        [0.06]    [0.05]    [0.17]
Compustat CR4 growth                                    0.14*      −0.09
                                                        [0.06]     [0.07]
Data set & granularity        NAICS-6                        KLEMS
Year fixed effects        Y         Y         Y           Y          Y
Observations             469       466       299         92         138
R2                       0.03      0.00      0.02        0.07       0.09
Notes: Log changes in TFP and in top 4 concentration.
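
A sketch of the kind of regression reported in Table 4.2: productivity growth regressed on concentration growth with year fixed effects entered as dummy variables. The data below are simulated with an assumed coefficient of 0.13, so the code only mimics the structure of the estimation, not the authors’ data or results.

import numpy as np

rng = np.random.default_rng(9)
n_industries, years = 100, (1997, 2002)         # two cross-sections, as in column (1)
conc, tfp, year_flags = [], [], []
for year in years:
    conc_growth = rng.normal(0.0, 1.0, n_industries)
    tfp_growth = 0.13 * conc_growth + rng.normal(0.0, 1.0, n_industries)  # assumed effect
    conc.extend(conc_growth)
    tfp.extend(tfp_growth)
    year_flags.extend([1.0 if year == 2002 else 0.0] * n_industries)

conc, tfp, year_flags = map(np.array, (conc, tfp, year_flags))

# Year fixed effects enter as dummy variables alongside the concentration regressor.
X = np.column_stack([np.ones(len(tfp)), conc, year_flags])
beta, *_ = np.linalg.lstsq(X, tfp, rcond=None)
print("estimated coefficient on concentration growth:", round(float(beta[1]), 3))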

When BLS data collectors cannot obtain a price for an item in the CPI sample (for example, because the outlet has stopped selling it), they look for a replacement item that is closest to the missing one. The BLS then adjusts for changes in quality and specifications. It can use manufacturers’ cost data or hedonic regressions to compute quality adjustments. Hedonic regressions are statistical models to infer consumers’ willingness to pay for goods or services. When it cannot estimate an explicit quality adjustment, the BLS imputes the price change using the average price change of similar items in the same geographic area. Finally, the BLS has specific procedures to estimate the price of housing (rents and owners’ equivalent rents) and medical care.

To test this idea, Matias Covarrubias, Germán Gutiérrez, and I (2019) study the relationship between changes in concentration and changes in total factor productivity (TFP) across industries during the 1990s and 2000s. We use our trade-adjusted concentration measures to control for foreign competition and for exports. Box 4.2 and its table summarize our results and discuss the interpretation of the various numbers in statistical models. We find that the relationship between concentration and productivity growth has changed over the past twenty years. During the 1990s (1989–1999) this relationship was positive. Industries with larger increases in concentration were also industries with larger productivity gains. This is no longer the case.


pages: 545 words: 137,789

How Markets Fail: The Logic of Economic Calamities by John Cassidy

Abraham Wald, Alan Greenspan, Albert Einstein, An Inconvenient Truth, Andrei Shleifer, anti-communist, AOL-Time Warner, asset allocation, asset-backed security, availability heuristic, bank run, banking crisis, Bear Stearns, behavioural economics, Benoit Mandelbrot, Berlin Wall, Bernie Madoff, Black Monday: stock market crash in 1987, Black-Scholes formula, Blythe Masters, book value, Bretton Woods, British Empire, business cycle, capital asset pricing model, carbon tax, Carl Icahn, centralized clearinghouse, collateralized debt obligation, Columbine, conceptual framework, Corn Laws, corporate raider, correlation coefficient, credit crunch, Credit Default Swap, credit default swaps / collateralized debt obligations, crony capitalism, Daniel Kahneman / Amos Tversky, debt deflation, different worldview, diversification, Elliott wave, Eugene Fama: efficient market hypothesis, financial deregulation, financial engineering, financial innovation, Financial Instability Hypothesis, financial intermediation, full employment, Garrett Hardin, George Akerlof, Glass-Steagall Act, global supply chain, Gunnar Myrdal, Haight Ashbury, hiring and firing, Hyman Minsky, income per capita, incomplete markets, index fund, information asymmetry, Intergovernmental Panel on Climate Change (IPCC), invisible hand, John Nash: game theory, John von Neumann, Joseph Schumpeter, junk bonds, Kenneth Arrow, Kickstarter, laissez-faire capitalism, Landlord’s Game, liquidity trap, London Interbank Offered Rate, Long Term Capital Management, Louis Bachelier, low interest rates, mandelbrot fractal, margin call, market bubble, market clearing, mental accounting, Mikhail Gorbachev, military-industrial complex, Minsky moment, money market fund, Mont Pelerin Society, moral hazard, mortgage debt, Myron Scholes, Naomi Klein, negative equity, Network effects, Nick Leeson, Nixon triggered the end of the Bretton Woods system, Northern Rock, paradox of thrift, Pareto efficiency, Paul Samuelson, Phillips curve, Ponzi scheme, precautionary principle, price discrimination, price stability, principal–agent problem, profit maximization, proprietary trading, quantitative trading / quantitative finance, race to the bottom, Ralph Nader, RAND corporation, random walk, Renaissance Technologies, rent control, Richard Thaler, risk tolerance, risk-adjusted returns, road to serfdom, Robert Shiller, Robert Solow, Ronald Coase, Ronald Reagan, Savings and loan crisis, shareholder value, short selling, Silicon Valley, South Sea Bubble, sovereign wealth fund, statistical model, subprime mortgage crisis, tail risk, Tax Reform Act of 1986, technology bubble, The Chicago School, The Great Moderation, The Market for Lemons, The Wealth of Nations by Adam Smith, too big to fail, Tragedy of the Commons, transaction costs, Two Sigma, unorthodox policies, value at risk, Vanguard fund, Vilfredo Pareto, wealth creators, zero-sum game

Maybe because of shifts in psychology or government policy, there are periods when markets will settle into a rut, and other periods when they will be apt to gyrate in alarming fashion. This picture seems to jibe with reality, but it raises some tricky issues for quantitative finance. If the underlying reality of the markets is constantly changing, statistical models based on past data will be of limited use, at best, in determining what is likely to happen in the future. And firms and investors that rely on these models to manage risk may well be exposing themselves to danger. The economics profession didn’t exactly embrace Mandelbrot’s criticisms. As the 1970s proceeded, the use of quantitative techniques became increasingly common on Wall Street.

After listening to Vincent Reinhart, the head of the Fed’s Division of Monetary Affairs, suggest several ways the Fed could try to revive the economy if interest rate changes could no longer be used, he dismissed the discussion as “premature” and described the possibility of a prolonged deflation as “a very small probability event.” The discussion turned to the immediate issue of whether to keep the funds rate at 1.25 percent. Since the committee’s previous meeting, Congress had approved the Bush administration’s third set of tax cuts since 2001, which was expected to give spending a boost. The Fed’s own statistical model of the economy was predicting a vigorous upturn later in 2003, suggesting that further rate cuts would be unnecessary and that some policy tightening might even be needed. “But that forecast has a very low probability, as far as I’m concerned,” Greenspan said curtly. “It points to an outcome that would be delightful if it were to materialize, but it is not a prospect on which we should focus our policy at this point.”

Greenspan’s method of analysis was inductive: he ingested as many figures as he could, from as many sources as he could find, then tried to fit them together into a coherent pattern. When I visited Greenspan at his office one day in 2000, I discovered him knee-deep in figures. He explained that he was trying to revamp a forty-year-old statistical model that his consulting firm had used to estimate realized capital gains on home sales. What made Greenspan such an interesting and important figure is that his empiricism was accompanied by a fervent belief in the efficiency and morality of the free market system. The conclusion that untrammeled capitalism provides a uniquely productive method of organizing production Greenspan took from his own observations and his reading of Adam Smith.


Big Data at Work: Dispelling the Myths, Uncovering the Opportunities by Thomas H. Davenport

Automated Insights, autonomous vehicles, bioinformatics, business intelligence, business process, call centre, chief data officer, cloud computing, commoditize, data acquisition, data science, disruptive innovation, Edward Snowden, Erik Brynjolfsson, intermodal, Internet of things, Jeff Bezos, knowledge worker, lifelogging, Mark Zuckerberg, move fast and break things, Narrative Science, natural language processing, Netflix Prize, New Journalism, recommendation engine, RFID, self-driving car, sentiment analysis, Silicon Valley, smart grid, smart meter, social graph, sorting algorithm, statistical model, Tesla Model S, text mining, Thomas Davenport, three-martini lunch

For reasons not entirely understood (by anyone, I think), the results of big data analyses are often expressed in visual formats. Now, visual analytics have a lot of strengths: They are relatively easy for non-quantitative executives to interpret, and they get attention. The downside is that they are not generally well suited for expressing complex multivariate relationships and statistical models. Put in other terms, most visual displays of data are for descriptive analytics, rather than predictive or prescriptive ones. They can, however, show a lot of data at once, as figure 4-1 illustrates. It’s a display of the tweets and retweets on Twitter involving particular New York Times articles.5 I find—as with many other complex big data visualizations—this one difficult to decipher.

In effect, big data is not just a large volume of unstructured data, but also the technologies that make processing and analyzing it possible. Specific big data technologies analyze textual, video, and audio content. When big data is fast moving, technologies like machine learning allow for the rapid creation of statistical models that fit, optimize, and predict the data. This chapter is devoted to all of these big data technologies and the difference they make. The technologies addressed in the chapter are outlined in table 5-1. *I am indebted in this section to Jill Dyché, vice president of SAS Best Practices, who collaborated with me on this work and developed many of the frameworks in this section.

This makes it useful for analysts who are familiar with that query language. Business View The business view layer of the stack makes big data ready for further analysis. Depending on the big data application, additional processing via MapReduce or custom code might be used to construct an intermediate data structure, such as a statistical model, a flat file, a relational table, or a data cube. The resulting structure may be intended for additional analysis or to be queried by a traditional SQL-based query tool. Many vendors are moving to so-called “SQL on Hadoop” approaches, simply because SQL has been used in business for a couple of decades, and many people (and higher-level languages) know how to create SQL queries.


pages: 443 words: 51,804

Handbook of Modeling High-Frequency Data in Finance by Frederi G. Viens, Maria C. Mariani, Ionut Florescu

algorithmic trading, asset allocation, automated trading system, backtesting, Bear Stearns, Black-Scholes formula, book value, Brownian motion, business process, buy and hold, continuous integration, corporate governance, discrete time, distributed generation, fear index, financial engineering, fixed income, Flash crash, housing crisis, implied volatility, incomplete markets, linear programming, machine readable, mandelbrot fractal, market friction, market microstructure, martingale, Menlo Park, p-value, pattern recognition, performance metric, power law, principal–agent problem, random walk, risk free rate, risk tolerance, risk/return, short selling, statistical model, stochastic process, stochastic volatility, transaction costs, value at risk, volatility smile, Wiener process

Contents
Preface
Contributors
Part One: Analysis of Empirical Data
1 Estimation of NIG and VG Models for High Frequency Financial Data (José E. Figueroa-López, Steven R. Lancette, Kiseop Lee, and Yanhui Mi)
1.1 Introduction
1.2 The Statistical Models
1.3 Parametric Estimation Methods
1.4 Finite-Sample Performance via Simulations
1.5 Empirical Results
1.6 Conclusion
References
2 A Study of Persistence of Price Movement using High Frequency Financial Data (Dragos Bozdog, Ionuţ Florescu, Khaldoun Khashanah, and Jim Wang)
2.1 Introduction
2.2 Methodology
2.3 Results
2.4 Rare Events Distribution

The data was obtained from the NYSE TAQ database of 2005 trades via Wharton’s WRDS system. For the sake of clarity and space, we only present the results for Intel and defer a full analysis of other stocks for a future publication. We finish with a section of conclusions and further recommendations.
1.2 The Statistical Models
1.2.1 GENERALITIES OF EXPONENTIAL LÉVY MODELS
Before introducing the specific models we consider in this chapter, let us briefly motivate the application of Lévy processes in financial modeling. We refer the reader to the monographs of Cont & Tankov (2004) and Sato (1999) or the recent review papers Figueroa-López (2011) and Tankov (2011) for further information.

A geometric Brownian motion (also called the Black–Scholes model) postulates the following conditions about the price process (S_t)_{t≥0} of a risky asset: (1) the (log) return on the asset over a time period [t, t + h] of length h, that is, R_{t,t+h} := log(S_{t+h}/S_t), is Gaussian with mean μh and variance σ²h (independent of t); (2) log returns on disjoint time periods are mutually independent; (3) the price path t → S_t is continuous; that is, P(S_u → S_t as u → t, ∀ t) = 1. The previous assumptions can equivalently be stated in terms of the so-called log return process (X_t)_t, denoted henceforth as X_t := log S_t.
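The three conditions above are easy to mimic numerically. The following is a minimal sketch, not taken from the handbook, that simulates i.i.d. Gaussian log returns under assumed values of μ and σ (a daily step h = 1/252) and builds the corresponding price path:

import numpy as np

# Assumed, illustrative parameters for the geometric Brownian motion.
rng = np.random.default_rng(0)
mu, sigma, h, n = 0.05, 0.2, 1 / 252, 1000

# Condition (1)-(2): Gaussian, independent log returns with mean mu*h, variance sigma^2*h.
log_returns = rng.normal(mu * h, sigma * np.sqrt(h), size=n)

# Price path S_t built from the cumulative log returns, starting at 100.
S = 100 * np.exp(np.cumsum(log_returns))
print(S[:5].round(2))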


pages: 460 words: 122,556

The End of Wall Street by Roger Lowenstein

"World Economic Forum" Davos, Alan Greenspan, Asian financial crisis, asset-backed security, bank run, banking crisis, Bear Stearns, benefit corporation, Berlin Wall, Bernie Madoff, Black Monday: stock market crash in 1987, Black Swan, break the buck, Brownian motion, Carmen Reinhart, collateralized debt obligation, credit crunch, Credit Default Swap, credit default swaps / collateralized debt obligations, diversified portfolio, eurozone crisis, Fall of the Berlin Wall, fear of failure, financial deregulation, financial engineering, fixed income, geopolitical risk, Glass-Steagall Act, Greenspan put, high net worth, Hyman Minsky, interest rate derivative, invisible hand, junk bonds, Ken Thompson, Kenneth Rogoff, London Interbank Offered Rate, Long Term Capital Management, low interest rates, margin call, market bubble, Martin Wolf, Michael Milken, money market fund, moral hazard, mortgage debt, negative equity, Northern Rock, Ponzi scheme, profit motive, race to the bottom, risk tolerance, Ronald Reagan, Rubik’s Cube, Savings and loan crisis, savings glut, short selling, sovereign wealth fund, statistical model, the payments system, too big to fail, tulip mania, Y2K

See AIG bailouts Ben Bernanke and board of Warren Buffett and CDOs and collateral calls on compensation at corporate structure of credit default swaps and credit rating agencies and Jamie Dimon and diversity of holdings employees, number of Financial Products subsidiary Timothy Geithner and Goldman Sachs and insurance (credit default swap) premiums of JPMorgan Chase and lack of reserve for losses leadership changes Lehman Brothers and losses Moody’s and Morgan Stanley and New York Federal Reserve Bank and Hank Paulson and rescue of. See AIG bailouts revenue of shareholders statistical modeling of stock price of struggles of risk of systemic effects of failure of Texas and AIG bailouts amount of Ben Bernanke and board’s role in credit rating agencies and Federal Reserve and Timothy Geithner and Goldman Sachs and JPMorgan Chase and Lehman Brothers’ bankruptcy and New York state and Hank Paulson and reasons for harm to shareholders in Akers, John Alexander, Richard Allison, Herbert Ambac American Home Mortgages Andrukonis, David appraisers, real estate Archstone-Smith Trust Associates First Capital Atteberry, Thomas auto industry Bagehot, Walter bailouts.

See credit crisis volatility of credit crisis borrowers, lack of effects of fear of lending mortgages and reasons for spread of as unforeseen credit cycle credit default swaps AIG and Goldman Sachs and Morgan Stanley and credit rating agencies. See also specific agencies AIG and capital level determination by guessing by inadequacy of models of Lehman Brothers and Monte Carlo method of mortgage-backed securities and statistical modeling used by Credit Suisse Cribiore, Alberto Cummings, Christine Curl, Gregory Dallavecchia, Enrico Dannhauser, Stephen Darling, Alistair Dean Witter debt of financial firms U.S. reliance on of U.S. families defaults/delinquencies deflation deleveraging. See also specific firms del Missier, Jerry Democrats deposit insurance deregulation of banking system and derivatives of financial markets derivatives.

See home foreclosure(s) foreign investors France Frank, Barney Freddie Mac and Fannie Mae accounting problems of affordable housing and Alternative-A loans bailout of Ben Bernanke and capital raised by competitive threats to Congress and Countrywide Financial and Democrats and Federal Reserve and foreign investment in Alan Greenspan and as guarantor history of lack of regulation of leadership changes leverage losses mortgage bubble and as mortgage traders Hank Paulson and politics and predatory lending and reasons for failures of relocation to private sector Robert Rodriguez and shareholders solving financial crisis through statistical models of stock price of Treasury Department and free market Freidheim, Scott Friedman, Milton Fuld, Richard compensation of failure to pull back from mortgage-backed securities identification with Lehman Brothers Lehman Brothers’ bankruptcy and Lehman Brothers’ last days and long tenure of Hank Paulson and personality and character of Gamble, James (Jamie) GDP Geithner, Timothy AIG and bank debt guarantees and Bear Stearns bailout and career of China and Citigroup and financial crisis, response to Lehman Brothers and money markets and Morgan Stanley and in Obama administration Hank Paulson and TARP and Gelband, Michael General Electric General Motors Germany Glass-Steagall Act Glauber, Robert Golden West Savings and Loan Goldman Sachs AIG and as bank holding company Warren Buffett investment in capital raised by capital sought by compensation at credit default swaps and hedge funds and insurance (credit default swap) premiums of job losses at leverage of Merrill Lynch and Stanley O’Neal’s obsession with Hank Paulson and pull back from mortgage-backed securities short selling against stock price of Wachovia and Gorton, Gary government, U.S.


Evidence-Based Technical Analysis: Applying the Scientific Method and Statistical Inference to Trading Signals by David Aronson

Albert Einstein, Andrew Wiles, asset allocation, availability heuristic, backtesting, Black Swan, book value, butter production in bangladesh, buy and hold, capital asset pricing model, cognitive dissonance, compound rate of return, computerized trading, Daniel Kahneman / Amos Tversky, distributed generation, Elliott wave, en.wikipedia.org, equity risk premium, feminist movement, Great Leap Forward, hindsight bias, index fund, invention of the telescope, invisible hand, Long Term Capital Management, managed futures, mental accounting, meta-analysis, p-value, pattern recognition, Paul Samuelson, Ponzi scheme, price anchoring, price stability, quantitative trading / quantitative finance, Ralph Nelson Elliott, random walk, retrograde motion, revision control, risk free rate, risk tolerance, risk-adjusted returns, riskless arbitrage, Robert Shiller, Sharpe ratio, short selling, source of truth, statistical model, stocks for the long run, sugar pill, systematic trading, the scientific method, transfer pricing, unbiased observer, yield curve, Yogi Berra

It was a review of prior studies, known as a meta-analysis, which examined 20 studies that had compared the subjective diagnoses of psychologists and psychiatrists with those produced by linear statistical models. The studies covered the prediction of academic success, the likelihood of criminal recidivism, and predicting the outcomes of electrical shock therapy. In each case, the experts rendered a judgment by evaluating a multitude of variables in a subjective manner. “In all studies, the statistical model provided more accurate predictions or the two methods tied.”34 A subsequent study by Sawyer35 was a meta analysis of 45 studies. “Again, there was not a single study in which clinical global judgment was superior to the statistical prediction (termed ‘mechanical combination’ by Sawyer).”36 Sawyer’s investigation is noteworthy because he considered studies in which the human expert was allowed access to information that was not considered by the statistical model, and yet the model was still superior.

The prediction problems spanned nine different fields: (1) academic performance of graduate students, (2) life-expectancy of cancer patients, (3) changes in stock prices, (4) mental illness using personality tests, (5) grades and attitudes in a psychology course, (6) business failures using financial ratios, (7) students’ ratings of teaching effectiveness, (8) performance of life insurance sales personnel, and (9) IQ scores using Rorschach Tests. Note that the average correlation of the statistical model was 0.64 versus the expert average of 0.33. In terms of information content, which is measured by the correlation coefficient squared or r-squared, the model’s predictions were on average 3.76 times as informative as the experts’. Numerous additional studies comparing expert judgment to statistical models (rules) have confirmed these findings, forcing the conclusion that people do poorly when attempting to combine a multitude of variables to make predictions or judgments.
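The "times as informative" comparison is plain arithmetic on squared correlations. A quick check of the figure quoted above, assuming information content is measured by r-squared as the excerpt states:

# Average correlations reported in the excerpt.
model_r, expert_r = 0.64, 0.33

# Ratio of information content (r-squared of model vs. r-squared of experts).
print(round((model_r ** 2) / (expert_r ** 2), 2))  # about 3.76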

The average accuracy of the experts, as measured by the correlation coefficient between their prediction of violence and the actual manifestation of violence, was a poor 0.12. The single best expert had a score of 0.36. The predictions of a linear statistical model, using the same set of 19 inputs, achieved a correlation of 0.82. In this instance the model’s predictions were nearly 50 times more informative than the experts’. Meehl continued to expand his research of comparing experts and statistical models and in 1986 concluded that “There is no controversy in social science which shows such a large body of qualitatively diverse studies coming out so uniformly in the same direction as this one.


pages: 294 words: 77,356

Automating Inequality by Virginia Eubanks

autonomous vehicles, basic income, Black Lives Matter, business process, call centre, cognitive dissonance, collective bargaining, correlation does not imply causation, data science, deindustrialization, digital divide, disruptive innovation, Donald Trump, driverless car, Elon Musk, ending welfare as we know it, experimental subject, fake news, gentrification, housing crisis, Housing First, IBM and the Holocaust, income inequality, job automation, mandatory minimum, Mark Zuckerberg, mass incarceration, minimum wage unemployment, mortgage tax deduction, new economy, New Urbanism, payday loans, performance metric, Ronald Reagan, San Francisco homelessness, self-driving car, sparse data, statistical model, strikebreaker, underbanked, universal basic income, urban renewal, W. E. B. Du Bois, War on Poverty, warehouse automation, working poor, Works Progress Administration, young professional, zero-sum game

The electronic registry of the unhoused I studied in Los Angeles, called the coordinated entry system, was piloted seven years later. It deploys computerized algorithms to match unhoused people in its registry to the most appropriate available housing resources. The Allegheny Family Screening Tool, launched in August 2016, uses statistical modeling to provide hotline screeners with a predictive risk score that shapes the decision whether or not to open child abuse and neglect investigations. I started my reporting in each location by reaching out to organizations working closely with the families most directly impacted by these systems.

“[P]renatal risk assessments could be used to identify children at risk … while still in the womb.”3 On the other side of the world, Rhema Vaithianathan, associate professor of economics at the University of Auckland, was on a team developing just such a tool. As part of a larger program of welfare reforms led by conservative Paula Bennett, the New Zealand Ministry of Social Development (MSD) commissioned the Vaithianathan team to create a statistical model to sift information on parents interacting with the public benefits, child protective, and criminal justice systems to predict which children were most likely to be abused or neglected. Vaithianathan reached out to Putnam-Hornstein to collaborate. “It was such an exciting opportunity to partner with Rhema’s team around this potential real-time use of data to target children,” said Putnam-Hornstein.

It is an early adopter in a nationwide algorithmic experiment in child welfare: similar systems have been implemented recently in Florida, Los Angeles, New York City, Oklahoma, and Oregon. As this book goes to press, Cherna and Dalton continue to experiment with data analytics. The next iteration of the AFST will employ machine learning rather than traditional statistical modeling. They also plan to introduce a second predictive model, one that will not rely on reports to the hotline at all. Instead, the planned model “would be run on a daily or weekly basis on all babies born in Allegheny County the prior day or week,” according to a September 2017 email from Dalton.


pages: 263 words: 75,455

Quantitative Value: A Practitioner's Guide to Automating Intelligent Investment and Eliminating Behavioral Errors by Wesley R. Gray, Tobias E. Carlisle

activist fund / activist shareholder / activist investor, Alan Greenspan, Albert Einstein, Andrei Shleifer, asset allocation, Atul Gawande, backtesting, beat the dealer, Black Swan, book value, business cycle, butter production in bangladesh, buy and hold, capital asset pricing model, Checklist Manifesto, cognitive bias, compound rate of return, corporate governance, correlation coefficient, credit crunch, Daniel Kahneman / Amos Tversky, discounted cash flows, Edward Thorp, Eugene Fama: efficient market hypothesis, financial engineering, forensic accounting, Henry Singleton, hindsight bias, intangible asset, Jim Simons, Louis Bachelier, p-value, passive investing, performance metric, quantitative hedge fund, random walk, Richard Thaler, risk free rate, risk-adjusted returns, Robert Shiller, shareholder value, Sharpe ratio, short selling, statistical model, stock buybacks, survivorship bias, systematic trading, Teledyne, The Myth of the Rational Market, time value of money, transaction costs

We need some means to protect us from our cognitive biases, and the quantitative method is that means. It serves both to protect us from our own behavioral errors and to exploit the behavioral errors of others. The model need not be complex to achieve this end. In fact, the weight of evidence indicates that even simple statistical models outperform the best experts. It speaks to the diabolical nature of our faulty cognitive apparatus that those simple statistical models continue to outperform the best experts even when those same experts are given access to the models' output. This is as true for a value investor as it is for any other expert in any other field of endeavor. This book is aimed at value investors.

Tetlock's conclusion is that experts suffer from the same behavioral biases as the laymen. Tetlock's study fits within a much larger body of research that has consistently found that experts are as unreliable as the rest of us. A large number of studies have examined the records of experts against simple statistical model, and, in almost all cases, concluded that experts either underperform the models or can do no better. It's a compelling argument against human intuition and for the statistical approach, whether it's practiced by experts or nonexperts.37 Even Experts Make Behavioral Errors In many disciplines, simple quantitative models outperform the intuition of the best experts.

The model predicted O'Connor's vote correctly 70 percent of the time, while the experts' success rate was only 61 percent.41How can it be that simple models perform better than experienced clinical psychologists or renowned legal experts with access to detailed information about the cases? Are these results just flukes? No. In fact, the MMPI and Supreme Court decision examples are not even rare. There are an overwhelming number of studies and meta-analyses—studies of studies—that corroborate this phenomenon. In his book, Montier provides a diverse range of studies comparing statistical models and experts, ranging from the detection of brain damage, the interview process to admit students to university, the likelihood of a criminal to reoffend, the selection of “good” and “bad” vintages of Bordeaux wine, and the buying decisions of purchasing managers. Value Investors Have Cognitive Biases, Too Graham recognized early on that successful investing required emotional discipline.


pages: 1,829 words: 135,521

Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython by Wes McKinney

Bear Stearns, business process, data science, Debian, duck typing, Firefox, general-purpose programming language, Google Chrome, Guido van Rossum, index card, p-value, quantitative trading / quantitative finance, random walk, recommendation engine, sentiment analysis, side project, sorting algorithm, statistical model, Two Sigma, type inference

   ...: categories=['a', 'b'])

In [25]: data
Out[25]:
   x0    x1    y category
0   1  0.01 -1.5        a
1   2 -0.01  0.0        b
2   3  0.25  3.6        a
3   4 -4.10  1.3        a
4   5  0.00 -2.0        b

If we wanted to replace the 'category' column with dummy variables, we create dummy variables, drop the 'category' column, and then join the result:

In [26]: dummies = pd.get_dummies(data.category, prefix='category')

In [27]: data_with_dummies = data.drop('category', axis=1).join(dummies)

In [28]: data_with_dummies
Out[28]:
   x0    x1    y  category_a  category_b
0   1  0.01 -1.5           1           0
1   2 -0.01  0.0           0           1
2   3  0.25  3.6           1           0
3   4 -4.10  1.3           1           0
4   5  0.00 -2.0           0           1

There are some nuances to fitting certain statistical models with dummy variables. It may be simpler and less error-prone to use Patsy (the subject of the next section) when you have more than simple numeric columns. 13.2 Creating Model Descriptions with Patsy. Patsy is a Python library for describing statistical models (especially linear models) with a small string-based "formula syntax," which is inspired by (but not exactly the same as) the formula syntax used by the R and S statistical programming languages.
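For readers who want to see that formula syntax in action, here is a minimal sketch; the column names simply mirror the toy DataFrame above, and patsy.dmatrices is a standard entry point, though the book's own examples may differ:

import numpy as np
import pandas as pd
import patsy

data = pd.DataFrame({
    'x0': [1, 2, 3, 4, 5],
    'x1': [0.01, -0.01, 0.25, -4.10, 0.00],
    'y': [-1.5, 0.0, 3.6, 1.3, -2.0],
    'category': ['a', 'b', 'a', 'a', 'b'],
})

# 'y ~ x0 + x1 + category' describes a linear model with an intercept,
# the two numeric columns, and dummy-coded levels of 'category'.
y, X = patsy.dmatrices('y ~ x0 + x1 + category', data)
print(X.design_info.column_names)   # intercept, dummy-coded category level, numeric columns
print(np.asarray(X))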

This includes such submodules as: Regression models: Linear regression, generalized linear models, robust linear models, linear mixed effects models, etc. Analysis of variance (ANOVA) Time series analysis: AR, ARMA, ARIMA, VAR, and other models Nonparametric methods: Kernel density estimation, kernel regression Visualization of statistical model results statsmodels is more focused on statistical inference, providing uncertainty estimates and p-values for parameters. scikit-learn, by contrast, is more prediction-focused. As with scikit-learn, I will give a brief introduction to statsmodels and how to use it with NumPy and pandas. 1.4 Installation and Setup Since everyone uses Python for different applications, there is no single solution for setting up Python and required add-on packages.
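As a concrete, if generic, illustration of that inference focus (a sketch on invented data, not an example from the book), an OLS fit in statsmodels reports both coefficient estimates and their p-values:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2.0 * x + rng.normal(size=100)   # true slope of 2 plus noise

X = sm.add_constant(x)               # add an intercept column
results = sm.OLS(y, X).fit()         # ordinary least squares fit
print(results.params)                # point estimates
print(results.pvalues)               # the uncertainty/inference output mentioned above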

While readers may have many different end goals for their work, the tasks required generally fall into a number of different broad groups:

Interacting with the outside world: reading and writing with a variety of file formats and data stores.
Preparation: cleaning, munging, combining, normalizing, reshaping, slicing and dicing, and transforming data for analysis.
Transformation: applying mathematical and statistical operations to groups of datasets to derive new datasets (e.g., aggregating a large table by group variables).
Modeling and computation: connecting your data to statistical models, machine learning algorithms, or other computational tools.
Presentation: creating interactive or static graphical visualizations or textual summaries.

Code Examples. Most of the code examples in the book are shown with input and output as it would appear executed in the IPython shell or in Jupyter notebooks:

In [5]: CODE EXAMPLE
Out[5]: OUTPUT

When you see a code example like this, the intent is for you to type in the example code in the In block in your coding environment and execute it by pressing the Enter key (or Shift-Enter in Jupyter).


pages: 461 words: 128,421

The Myth of the Rational Market: A History of Risk, Reward, and Delusion on Wall Street by Justin Fox

"Friedman doctrine" OR "shareholder theory", Abraham Wald, activist fund / activist shareholder / activist investor, Alan Greenspan, Albert Einstein, Andrei Shleifer, AOL-Time Warner, asset allocation, asset-backed security, bank run, beat the dealer, behavioural economics, Benoit Mandelbrot, Big Tech, Black Monday: stock market crash in 1987, Black-Scholes formula, book value, Bretton Woods, Brownian motion, business cycle, buy and hold, capital asset pricing model, card file, Carl Icahn, Cass Sunstein, collateralized debt obligation, compensation consultant, complexity theory, corporate governance, corporate raider, Credit Default Swap, credit default swaps / collateralized debt obligations, Daniel Kahneman / Amos Tversky, David Ricardo: comparative advantage, democratizing finance, Dennis Tito, discovery of the americas, diversification, diversified portfolio, Dr. Strangelove, Edward Glaeser, Edward Thorp, endowment effect, equity risk premium, Eugene Fama: efficient market hypothesis, experimental economics, financial innovation, Financial Instability Hypothesis, fixed income, floating exchange rates, George Akerlof, Glass-Steagall Act, Henri Poincaré, Hyman Minsky, implied volatility, impulse control, index arbitrage, index card, index fund, information asymmetry, invisible hand, Isaac Newton, John Bogle, John Meriwether, John Nash: game theory, John von Neumann, joint-stock company, Joseph Schumpeter, junk bonds, Kenneth Arrow, libertarian paternalism, linear programming, Long Term Capital Management, Louis Bachelier, low interest rates, mandelbrot fractal, market bubble, market design, Michael Milken, Myron Scholes, New Journalism, Nikolai Kondratiev, Paul Lévy, Paul Samuelson, pension reform, performance metric, Ponzi scheme, power law, prediction markets, proprietary trading, prudent man rule, pushing on a string, quantitative trading / quantitative finance, Ralph Nader, RAND corporation, random walk, Richard Thaler, risk/return, road to serfdom, Robert Bork, Robert Shiller, rolodex, Ronald Reagan, seminal paper, shareholder value, Sharpe ratio, short selling, side project, Silicon Valley, Skinner box, Social Responsibility of Business Is to Increase Its Profits, South Sea Bubble, statistical model, stocks for the long run, tech worker, The Chicago School, The Myth of the Rational Market, The Predators' Ball, the scientific method, The Wealth of Nations by Adam Smith, The Wisdom of Crowds, Thomas Kuhn: the structure of scientific revolutions, Thomas L Friedman, Thorstein Veblen, Tobin tax, transaction costs, tulip mania, Two Sigma, Tyler Cowen, value at risk, Vanguard fund, Vilfredo Pareto, volatility smile, Yogi Berra

First, modeling financial risk is hard. Statistical models can never fully capture all things that can go wrong (or right). It was as physicist and random walk pioneer M. F. M. Osborne told his students at UC–Berkeley back in 1972: For everyday market events the bell curve works well. When it doesn’t, one needs to look outside the statistical models and make informed judgments about what’s driving the market and what the risks are. The derivatives business and other financial sectors on the rise in the 1980s and 1990s were dominated by young quants. These people knew how to work statistical models, but they lacked the market experience needed to make informed judgments.

Traditional ratios of loan-to-value and monthly payments to income gave way to credit scoring and purportedly precise gradations of default risk that turned out to be worse than useless. In the 1970s, Amos Tversky and Daniel Kahneman had argued that real-world decision makers didn’t follow the statistical models of John von Neumann and Oskar Morgenstern, but used simple heuristics—rules of thumb—instead. Now the mortgage lending industry was learning that heuristics worked much better than statistical models descended from the work of von Neumann and Morgenstern. Simple trumped complex. In 2005, Robert Shiller came out with a second edition of Irrational Exuberance that featured a new twenty-page chapter on “The Real Estate Market in Historical Perspective.”

These people knew how to work statistical models, but they lacked the market experience needed to make informed judgments. Meanwhile, those with the experience, wisdom, and authority to make informed judgments—the bosses—didn’t understand the statistical models. It’s possible that, as more quants rise into positions of high authority (1986 Columbia finance Ph.D. Vikram Pandit, who became CEO of Citigroup in 2007, was the first quant to run a major bank), this particular problem will become less pronounced. But the second obstacle to risk-free living through derivatives is much harder to get around. It’s the paradox that killed portfolio insurance—when enough people subscribe to a particular means of taming financial risk, then that in itself brings new risks.


pages: 301 words: 85,126

AIQ: How People and Machines Are Smarter Together by Nick Polson, James Scott

Abraham Wald, Air France Flight 447, Albert Einstein, algorithmic bias, Amazon Web Services, Atul Gawande, autonomous vehicles, availability heuristic, basic income, Bayesian statistics, Big Tech, Black Lives Matter, Bletchley Park, business cycle, Cepheid variable, Checklist Manifesto, cloud computing, combinatorial explosion, computer age, computer vision, Daniel Kahneman / Amos Tversky, data science, deep learning, DeepMind, Donald Trump, Douglas Hofstadter, Edward Charles Pickering, Elon Musk, epigenetics, fake news, Flash crash, Grace Hopper, Gödel, Escher, Bach, Hans Moravec, Harvard Computers: women astronomers, Higgs boson, index fund, information security, Isaac Newton, John von Neumann, late fees, low earth orbit, Lyft, machine translation, Magellanic Cloud, mass incarceration, Moneyball by Michael Lewis explains big data, Moravec's paradox, more computing power than Apollo, natural language processing, Netflix Prize, North Sea oil, Northpointe / Correctional Offender Management Profiling for Alternative Sanctions, p-value, pattern recognition, Pierre-Simon Laplace, ransomware, recommendation engine, Ronald Reagan, Salesforce, self-driving car, sentiment analysis, side project, Silicon Valley, Skype, smart cities, speech recognition, statistical model, survivorship bias, systems thinking, the scientific method, Thomas Bayes, Uber for X, uber lyft, universal basic income, Watson beat the top human players on Jeopardy!, young professional

People rely on billions of language facts, most of which they take for granted—like the knowledge that “drop your trousers” and “drop off your trousers” are used in very different situations, only one of which is at the dry cleaner’s. Knowledge like this is hard to codify in explicit rules, because there’s too much of it. Believe it or not, the best way we know to teach it to machines is to give them a giant hard drive full of examples of how people say stuff, and to let the machines sort it out on their own with a statistical model. This purely data-driven approach to language may seem naïve, and until recently we simply didn’t have enough data or fast-enough computers to make it work. Today, though, it works shockingly well. At its tech conference in 2017, for example, Google boldly announced that machines had now reached parity with humans at speech recognition, with a per-word dictation error rate of 4.9%—drastically better than the 20–30% error rates common as recently as 2013.

This is about 250 times more common than “whether report” (0.0000000652%), which is used mainly as a bad pun or an example of phonetic ambiguity. From the 1980s onward, NLP researchers began to recognize the value of this purely statistical information. Before, they’d been hand-building rules capable of describing how a given linguistic task should be performed. Now, these experts started training statistical models capable of predicting that a person would perform a task in a certain way. As a field, NLP shifted its focus from understanding to mimicry—from knowing how, to knowing that. These new models required lots of data. You fed the machine as many examples as you could find of how humans use language, and you programmed the machine to use the rules of probability to find patterns in those examples.
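A toy version of that "rules of probability" recipe, counting how often word pairs occur in example text, takes only a few lines; the miniature corpus below is invented purely for illustration:

from collections import Counter

corpus = ("check the weather report before you leave . "
          "the weather report said rain . "
          "whether report is a pun on weather report .").split()

# Count adjacent word pairs and turn the counts into relative frequencies.
bigrams = Counter(zip(corpus, corpus[1:]))
total = sum(bigrams.values())
for pair in [("weather", "report"), ("whether", "report")]:
    print(pair, bigrams[pair] / total)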

You may remember a time when people dialed 411 to look up a phone number for a local business, at a dollar or so per call. Google 411 lets you do the same thing for free, by dialing 1-800-GOOG-411. It was a useful service in an age before ubiquitous smartphones—and also a great way for Google to build up an enormous database of voice queries that would help train its statistical models for speech recognition. The system quietly shut down in 2010, presumably because Google had all the data it needed. Of course, there’s been an awful lot of Grace Hopper–style coding since 2007 to turn all that data into good prediction rules. So more than a decade later, what’s the result?


Know Thyself by Stephen M Fleming

Abraham Wald, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, AlphaGo, autism spectrum disorder, autonomous vehicles, availability heuristic, backpropagation, citation needed, computer vision, confounding variable, data science, deep learning, DeepMind, Demis Hassabis, Douglas Hofstadter, Dunning–Kruger effect, Elon Musk, Estimating the Reproducibility of Psychological Science, fake news, global pandemic, higher-order functions, index card, Jeff Bezos, l'esprit de l'escalier, Lao Tzu, lifelogging, longitudinal study, meta-analysis, mutually assured destruction, Network effects, patient HM, Pierre-Simon Laplace, power law, prediction markets, QWERTY keyboard, recommendation engine, replication crisis, self-driving car, side project, Skype, Stanislav Petrov, statistical model, theory of mind, Thomas Bayes, traumatic brain injury

By tweaking the settings of the scanner, rapid snapshots can also be taken every few seconds that track changes in blood oxygen levels in different parts of the brain (this is known as functional MRI, or fMRI). Because more vigorous neural firing uses up more oxygen, these changes in blood oxygen levels are useful markers of neural activity. The fMRI signal is very slow compared to the rapid firing of neurons, but, by applying statistical models to the signal, it is possible to reconstruct maps that highlight brain regions as being more or less active when people are doing particular tasks. If I put you in an fMRI scanner and asked you to think about yourself, it’s a safe bet that I would observe changes in activation in two key parts of the association cortex: the medial PFC and the medial parietal cortex (also known as the precuneus), which collectively are sometimes referred to as the cortical midline structures.

Metacognitive sensitivity is subtly but importantly different from metacognitive bias, which is the overall tendency to be more or less confident. While on average I might be overconfident, if I am still aware of each time I make an error (the Ds in the table), then I can still achieve a high level of metacognitive sensitivity. We can quantify people’s metacognitive sensitivity by fitting parameters from statistical models to people’s confidence ratings (with names such as meta-d’ and Φ). Ever more sophisticated models are being developed, but they ultimately all boil down to quantifying the extent to which our self-evaluations track whether we are actually right or wrong.4 What Makes One Person’s Metacognition Better than Another’s?

This kind of self-endorsement of our choices is a key aspect of decision-making, and it can have profound consequences for whether we decide to reverse or undo such decisions. Together with our colleagues Neil Garrett and Ray Dolan, Benedetto and I set out to investigate people’s self-awareness about their subjective choices in the lab. In order to apply the statistical models of metacognition that we encountered in Chapter 4, we needed to get people to make lots of choices, one after the other, and rate their confidence in choosing the best option—a proxy for whether they in fact wanted what they chose. We collected a set of British snacks, such as chocolate bars and crisps, and presented people with all possible pairs of items to choose between (hundreds of pairs in total).


Learn Algorithmic Trading by Sebastien Donadio

active measures, algorithmic trading, automated trading system, backtesting, Bayesian statistics, behavioural economics, buy and hold, buy low sell high, cryptocurrency, data science, deep learning, DevOps, en.wikipedia.org, fixed income, Flash crash, Guido van Rossum, latency arbitrage, locking in a profit, market fundamentalism, market microstructure, martingale, natural language processing, OpenAI, p-value, paper trading, performance metric, prediction markets, proprietary trading, quantitative trading / quantitative finance, random walk, risk tolerance, risk-adjusted returns, Sharpe ratio, short selling, sorting algorithm, statistical arbitrage, statistical model, stochastic process, survivorship bias, transaction costs, type inference, WebSocket, zero-sum game

In-sample versus out-of-sample data When building a statistical model, we use cross-validation to avoid overfitting. Cross-validation imposes a division of data into two or three different sets. One set will be used to create your model, while the other sets will be used to validate the model's accuracy. Because the model has not been created with the other datasets, we will have a better idea of its performance. When testing a trading strategy with historical data, it is important to use a portion of data for testing. In a statistical model, we call training data the initial data to create the model.
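A minimal sketch of that split for historical data, assuming a simple chronological cut rather than shuffled folds, since shuffling would leak future information into the training set:

import numpy as np

# Invented daily price history standing in for real market data.
prices = 100 + np.cumsum(np.random.default_rng(3).normal(size=500))

split = int(len(prices) * 0.7)
train, test = prices[:split], prices[split:]   # in-sample vs. out-of-sample portions
print(len(train), len(test))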

R is not significantly more recent than Python. It was released in 1995 by the two founders, Ross Ihaka and Robert Gentleman, while Python was released in 1991 by Guido Van Rossum. Today, R is mainly used by the academic and research world. Unlike many other languages, Python and R allows us to write a statistical model with a few lines of code. Because it is impossible to choose one over the other, since they both have their own advantages, they can easily be used in a complementary manner. Developers created a multitude of libraries capable of easily using one language in conjunction with the other without any difficulties.

The last step of the time series analysis is to forecast the time series. We have two possible scenarios: a strictly stationary series without dependencies among values, where we can use a regular linear regression to forecast values; or a series with dependencies among values, where we will be forced to use other statistical models. In this chapter, we chose to focus on the Auto-Regressive Integrated Moving Average (ARIMA) model. This model has three parameters: the autoregressive (AR) term (p), the lags of the dependent variable (for example, with p = 3 the predictors for x(t) are x(t-1), x(t-2), and x(t-3)); and the moving average (MA) term (q), the lags of the errors in prediction.
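A hedged sketch of fitting such a model follows; it assumes the statsmodels ARIMA implementation and invented data, so the book's own code may use a different API, and the order (p, d, q) = (3, 1, 1) is chosen only for illustration:

import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(7)
y = np.cumsum(0.1 + rng.normal(size=300))   # made-up trending series

model = ARIMA(y, order=(3, 1, 1))   # AR lags p=3, one difference, MA lag q=1
fitted = model.fit()
print(fitted.forecast(steps=5))     # forecast the next five values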


Analysis of Financial Time Series by Ruey S. Tsay

Asian financial crisis, asset allocation, backpropagation, Bayesian statistics, Black-Scholes formula, Brownian motion, business cycle, capital asset pricing model, compound rate of return, correlation coefficient, data acquisition, discrete time, financial engineering, frictionless, frictionless market, implied volatility, index arbitrage, inverted yield curve, Long Term Capital Management, market microstructure, martingale, p-value, pattern recognition, random walk, risk free rate, risk tolerance, short selling, statistical model, stochastic process, stochastic volatility, telemarketer, transaction costs, value at risk, volatility smile, Wiener process, yield curve

Stable Distribution The stable distributions are a natural generalization of normal in that they are stable under addition, which meets the need of continuously compounded returns rt . Furthermore, stable distributions are capable of capturing excess kurtosis shown by historical stock returns. However, non-normal stable distributions do not have a finite variance, which is in conflict with most finance theories. In addition, statistical modeling using non-normal stable distributions is difficult. An example of non-normal stable distributions is the Cauchy distribution, which is symmetric with respect to its median, but has infinite variance. Scale Mixture of Normal Distributions Recent studies of stock returns tend to use scale mixture or finite mixture of normal distributions.

Furthermore, the lag-ℓ autocovariance of r_t is

γ_ℓ = Cov(r_t, r_{t−ℓ}) = E[(Σ_{i=0}^∞ ψ_i a_{t−i})(Σ_{j=0}^∞ ψ_j a_{t−ℓ−j})] = E[Σ_{i,j=0}^∞ ψ_i ψ_j a_{t−i} a_{t−ℓ−j}] = Σ_{j=0}^∞ ψ_{j+ℓ} ψ_j E(a²_{t−ℓ−j}) = σ_a² Σ_{j=0}^∞ ψ_j ψ_{j+ℓ}.

Consequently, the ψ-weights are related to the autocorrelations of r_t as follows:

ρ_ℓ = γ_ℓ / γ_0 = (Σ_{i=0}^∞ ψ_i ψ_{i+ℓ}) / (1 + Σ_{i=1}^∞ ψ_i²),   ℓ ≥ 0,   (2.5)

where ψ_0 = 1. Linear time series models are econometric and statistical models used to describe the pattern of the ψ-weights of r_t.

2.4 SIMPLE AUTOREGRESSIVE MODELS. The fact that the monthly return r_t of the CRSP value-weighted index has a statistically significant lag-1 autocorrelation indicates that the lagged return r_{t−1} might be useful in predicting r_t. A simple model that makes use of such predictive power is

r_t = φ_0 + φ_1 r_{t−1} + a_t,   (2.6)

where {a_t} is assumed to be a white noise series with mean zero and variance σ_a².
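Equation (2.6) is straightforward to simulate. The sketch below is not from the book and uses arbitrary parameter values; it checks that the sample lag-1 autocorrelation of an AR(1) series comes out near φ_1:

import numpy as np

rng = np.random.default_rng(42)
phi0, phi1, sigma_a = 0.005, 0.1, 0.05   # arbitrary illustrative parameters

# Simulate r_t = phi0 + phi1 * r_{t-1} + a_t with Gaussian white-noise shocks.
r = np.zeros(5000)
for t in range(1, len(r)):
    r[t] = phi0 + phi1 * r[t - 1] + sigma_a * rng.normal()

lag1 = np.corrcoef(r[:-1], r[1:])[0, 1]  # should be close to phi1
print(round(lag1, 3))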

If we treat the random-walk model as a special AR(1) model, then the coefficient of p_{t−1} is unity, which does not satisfy the weak stationarity condition of an AR(1) model. A random-walk series is, therefore, not weakly stationary, and we call it a unit-root nonstationary time series. The random-walk model has been widely considered as a statistical model for the movement of logged stock prices. Under such a model, the stock price is not predictable or mean reverting. To see this, the 1-step ahead forecast of model (2.32) at the forecast origin h is p̂_h(1) = E(p_{h+1} | p_h, p_{h−1}, ...) = p_h, which is the log price of the stock at the forecast origin.


pages: 416 words: 39,022

Asset and Risk Management: Risk Oriented Finance by Louis Esch, Robert Kieffer, Thierry Lopez

asset allocation, Brownian motion, business continuity plan, business process, capital asset pricing model, computer age, corporate governance, discrete time, diversified portfolio, fixed income, implied volatility, index fund, interest rate derivative, iterative process, P = NP, p-value, random walk, risk free rate, risk/return, shareholder value, statistical model, stochastic process, transaction costs, value at risk, Wiener process, yield curve, zero-coupon bond

Table 6.3 Student distribution quantiles

ν        γ2      z0.95    z0.975   z0.99
5        6.00    2.601    3.319    4.344
10       1.00    2.026    2.491    3.090
15       0.55    1.883    2.289    2.795
20       0.38    1.818    2.199    2.665
25       0.29    1.781    2.148    2.591
30       0.23    1.757    2.114    2.543
40       0.17    1.728    2.074    2.486
60       0.11    1.700    2.034    2.431
120      0.05    1.672    1.997    2.378
normal   0       1.645    1.960    2.326

8 Blattberg R. and Gonedes N., A comparison of stable and student distributions as statistical models for stock prices, Journal of Business, Vol. 47, 1974, pp. 244–80. 9 Pearson E. S. and Hartley H. O., Biometrika Tables for Statisticians, Biometrika Trust, 1976, p. 146.

This clearly shows that when the normal law is used in place of the Student laws, the VaR parameter is underestimated unless the number of degrees of freedom is high.

Using pt presents the twofold advantage of:
• making the magnitudes of the various factors likely to be involved in evaluating an asset or portfolio relative;
• supplying a variable that has been shown to be capable of possessing certain distributional properties (normality or quasi-normality for returns on equities, for example).

1 Estimating quantiles is often a complex problem, especially for arguments close to 0 or 1. Interested readers should read Gilchrist W. G., Statistical Modelling with Quantile Functions, Chapman & Hall/CRC, 2000. 2 If the risk factor X is a share price, we are looking at the return on that share (see Section 3.1.1).

Figure 7.1 Estimating VaR (valuation models and historical data feed an estimation technique that produces the VaR).

Note: in most calculation methods, a different expression is taken into consideration: Δ*(t) = ln(X(t)/X(t − 1)). As we saw in Section 3.1.1, this is in fact very similar to Δ(t) and has the advantage that it can take on any real value and that the logarithmic return for several consecutive periods is the sum of the logarithmic return for each of those periods.

Δ(Δ(. . . Δ(yt) . . .)), that is, Δ applied r times, instead of yt (where Δ(yt) = yt − yt−1). We therefore use an ARIMA(p, r, q) procedure.16 If this procedure fails because of nonconstant volatility in the error term, it will be necessary to use the ARCH-GARCH or EGARCH models (Appendix 7).

B. The equation on the replicated positions. This equation may be estimated by a statistical model (such as SAS/OR procedure PROC NLP), using multiple regression with the constraints

Σ_{i = 3 months}^{15 years} αi = 1 and αi ≥ 0.

It is also possible to estimate the replicated positions (b) with the single constraint (by using the SAS/STAT procedure)

Σ_{i = 3 months}^{15 years} αi = 1.

In both cases, the duration of the demand product is a weighted average of the durations.
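The constrained estimation described above translates naturally into a small optimization problem. The sketch below is a rough Python analogue, not the book's SAS code, using invented data: minimize the regression's squared error subject to the weights summing to one and being non-negative:

import numpy as np
from scipy.optimize import minimize

# Invented data: columns of X stand for the reference maturities
# (3 months to 15 years); y stands for the demand product to replicate.
rng = np.random.default_rng(1)
X = rng.normal(size=(250, 6))
true_alpha = np.array([0.4, 0.3, 0.15, 0.1, 0.05, 0.0])
y = X @ true_alpha + 0.01 * rng.normal(size=250)

def sse(alpha):
    # Multiple-regression objective: sum of squared residuals.
    return np.sum((y - X @ alpha) ** 2)

constraints = [{"type": "eq", "fun": lambda a: np.sum(a) - 1.0}]  # weights sum to 1
bounds = [(0.0, None)] * X.shape[1]                               # alpha_i >= 0
result = minimize(sse, np.full(X.shape[1], 1 / X.shape[1]),
                  method="SLSQP", bounds=bounds, constraints=constraints)
print(result.x.round(3))   # estimated replicated positions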


Science Fictions: How Fraud, Bias, Negligence, and Hype Undermine the Search for Truth by Stuart Ritchie

Albert Einstein, anesthesia awareness, autism spectrum disorder, Bayesian statistics, Black Lives Matter, Carmen Reinhart, Cass Sunstein, Charles Babbage, citation needed, Climatic Research Unit, cognitive dissonance, complexity theory, coronavirus, correlation does not imply causation, COVID-19, crowdsourcing, data science, deindustrialization, Donald Trump, double helix, en.wikipedia.org, epigenetics, Estimating the Reproducibility of Psychological Science, fake news, Goodhart's law, Growth in a Time of Debt, Helicobacter pylori, Higgs boson, hype cycle, Kenneth Rogoff, l'esprit de l'escalier, Large Hadron Collider, meta-analysis, microbiome, Milgram experiment, mouse model, New Journalism, ocean acidification, p-value, phenotype, placebo effect, profit motive, publication bias, publish or perish, quantum entanglement, race to the bottom, randomized controlled trial, recommendation engine, rent-seeking, replication crisis, Richard Thaler, risk tolerance, Ronald Reagan, Scientific racism, selection bias, Silicon Valley, Silicon Valley startup, social distancing, Stanford prison experiment, statistical model, stem cell, Steven Pinker, TED Talk, Thomas Bayes, twin studies, Tyler Cowen, University of East Anglia, Wayback Machine

Or do you leave them in? Do you split the sample up into separate age groups, or by some other criterion? Do you merge observations from week one and week two and compare them to weeks three and four, or look at each week separately, or make some other grouping? Do you choose this particular statistical model, or that one? Precisely how many ‘control’ variables do you throw in? There aren’t objective answers to these kinds of questions. They depend on the specifics and context of what you’re researching, and on your perspective on statistics (which is, after all, a constantly evolving subject in itself): ask ten statisticians, and you might receive as many different answers.

– we’re looking for generalisable facts about the world (‘what is the link between taking antipsychotic drugs and schizophrenia symptoms in humans in general?’). Figure 3, below, illustrates overfitting. As you can see, we have a set of data: one measurement of rainfall is made each month across the space of a year. We want to draw a line through that data that describes what happens to rainfall over time: the line will be our statistical model of the data. And we want to use that line to predict how much rain will fall in each month next year. The laziest possible solution is just to try a straight line, as in graph 3A – but it looks almost nothing like the data: if we tried to use that line to predict the next year’s measurements, forecasting the exact same amount of rain for every month, we’d do a terribly inaccurate job.
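The same point can be made numerically. The sketch below uses twelve invented rainfall values (they are not the book's data): a straight line misses the seasonal shape, while an eleventh-degree polynomial reproduces every point exactly yet typically extrapolates erratically when asked about the following January:

import numpy as np
from numpy.polynomial import Polynomial

months = np.arange(12)
rain = np.array([81, 62, 70, 48, 51, 35, 40, 38, 55, 47, 72, 80], dtype=float)

straight = Polynomial.fit(months, rain, deg=1)   # too crude: one line for the whole year
wiggly = Polynomial.fit(months, rain, deg=11)    # overfitted: passes through every point

print(np.max(np.abs(wiggly(months) - rain)))     # near zero error on this year's data
print(straight(12), wiggly(12))                  # compare the two extrapolations for month 12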

For the American Statistical Association's consensus position on p-values, written surprisingly comprehensibly, see Ronald L. Wasserstein & Nicole A. Lazar, 'The ASA Statement on p-Values: Context, Process, and Purpose', The American Statistician 70, no. 2 (2 April 2016): pp. 129–33; https://doi.org/10.1080/00031305.2016.1154108. It defines the p-value like this: 'the probability under a specified statistical model that a statistical summary of the data (e.g., the sample mean difference between two compared groups) would be equal to or more extreme than its observed value': p. 131. 18. Why does the definition of the p-value ('how likely is it that pure noise would give you results like the ones you have, or ones with an even larger effect') have that 'or an even larger effect' clause in it?


pages: 233 words: 67,596

Competing on Analytics: The New Science of Winning by Thomas H. Davenport, Jeanne G. Harris

always be closing, Apollo 13, big data - Walmart - Pop Tarts, business intelligence, business logic, business process, call centre, commoditize, data acquisition, digital map, en.wikipedia.org, fulfillment center, global supply chain, Great Leap Forward, high net worth, if you build it, they will come, intangible asset, inventory management, iterative process, Jeff Bezos, job satisfaction, knapsack problem, late fees, linear programming, Moneyball by Michael Lewis explains big data, Netflix Prize, new economy, performance metric, personalized medicine, quantitative hedge fund, quantitative trading / quantitative finance, recommendation engine, RFID, search inside the book, shareholder value, six sigma, statistical model, supply-chain management, text mining, The future is already here, the long tail, the scientific method, traveling salesman, yield management

As more tangible benefits began to appear, the CEO’s commitment to competing on analytics grew. In his letter to shareholders, he described the growing importance of analytics and a new growth initiative to “outsmart and outthink” the competition. Analysts expanded their work to use propensity analysis and neural nets (an artificial intelligence technology incorporating nonlinear statistical modeling to identify patterns) to target and provide specialized services to clients with both personal and corporate relationships with the bank. They also began testing some analytically enabled new services for trust clients. Today, BankCo is well on its way to becoming an analytical competitor.

They can also be used to help streamline the flow of information or products—for example, they can help employees of health care organizations decide where to send donated organs according to criteria ranging from blood type to geographic limitations. Emerging Analytical Technologies These are some of the leading-edge technologies that will play a role in analytical applications over the next few years: Text categorization is the process of using statistical models or rules to rate a document’s relevance to a certain topic. For example, text categorization can be used to dynamically evaluate competitors’ product assortments on their Web sites. Genetic algorithms are a class of stochastic optimization methods that use principles found in natural genetic reproduction (crossover or mutations of DNA structures).

Commercially purchased analytical applications usually have an interface to be used by information workers, managers, and analysts. But for proprietary analyses, the presentation tools determine how different classes of individuals can use the data. For example, a statistician could directly access a statistical model, but most managers would hesitate to do so. A new generation of visual analytical tools—from new vendors such as Spotfire and Visual Sciences and from traditional analytics providers such as SAS—allow the manipulation of data and analyses through an intuitive visual interface. A manager, for example, could look at a plot of data, exclude outlier values, and compute a regression line that fits the data—all without any statistical skills.


pages: 338 words: 104,815

Nobody's Fool: Why We Get Taken in and What We Can Do About It by Daniel Simons, Christopher Chabris

Abraham Wald, Airbnb, artificial general intelligence, Bernie Madoff, bitcoin, Bitcoin "FTX", blockchain, Boston Dynamics, butterfly effect, call centre, Carmen Reinhart, Cass Sunstein, ChatGPT, Checklist Manifesto, choice architecture, computer vision, contact tracing, coronavirus, COVID-19, cryptocurrency, DALL-E, data science, disinformation, Donald Trump, Elon Musk, en.wikipedia.org, fake news, false flag, financial thriller, forensic accounting, framing effect, George Akerlof, global pandemic, index fund, information asymmetry, information security, Internet Archive, Jeffrey Epstein, Jim Simons, John von Neumann, Keith Raniere, Kenneth Rogoff, London Whale, lone genius, longitudinal study, loss aversion, Mark Zuckerberg, meta-analysis, moral panic, multilevel marketing, Nelson Mandela, pattern recognition, Pershing Square Capital Management, pets.com, placebo effect, Ponzi scheme, power law, publication bias, randomized controlled trial, replication crisis, risk tolerance, Robert Shiller, Ronald Reagan, Rubik’s Cube, Sam Bankman-Fried, Satoshi Nakamoto, Saturday Night Live, Sharpe ratio, short selling, side hustle, Silicon Valley, Silicon Valley startup, Skype, smart transportation, sovereign wealth fund, statistical model, stem cell, Steve Jobs, sunk-cost fallacy, survivorship bias, systematic bias, TED Talk, transcontinental railway, WikiLeaks, Y2K

The platform calculates separate ratings for games played under different time limits. For regular games, in which each player has ten or more minutes in total for all their moves, lazzir’s rating had gained 1,442 rating points in eleven days—after having been almost unchanged for the previous five years. According to the statistical model underpinning the rating system, that 1,442-point gain meant that the lazzir who beat Chris would have been over a 1,000-to-1 favorite to beat the lazzir of just two weeks earlier. No one in chess gets better so consistently over such a short time window; even the fictional Beth Harmon from The Queen’s Gambit had more setbacks in her meteoric rise to the top.
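For a rough sense of the scale behind that claim, here is back-of-the-envelope arithmetic using the classic Elo expected-score formula; Chess.com actually uses a Glicko-style system, so treat this only as an approximation:

# Win probability implied by a 1,442-point rating gap under the standard Elo formula.
diff = 1442
p_win = 1 / (1 + 10 ** (-diff / 400))
print(p_win, p_win / (1 - p_win))   # probability and odds, on the order of thousands to one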

Mysteriously, lazzir stopped playing on the site a couple of days later, and within months, his account was permanently closed for violating Chess.com’s “fair play” policy. The lazzir case is not an isolated one: Chess.com closes about eight hundred accounts every day for cheating, often because their behavior too closely matches statistical models of what a nonhuman entity would produce. An absence of noise, of the human tendency to make occasional blunders in complex situations, is a critical signal.14 COME ON, FEEL THE NOISE Most people and organizations think of noise in human behavior as a problem to eliminate. That’s the meaning of noise popularized by Daniel Kahneman, Olivier Sibony, and Cass Sunstein in their book Noise: problematic, unpredictable, or unjustified variability in performance between decisionmakers.

Lehrer, The Smarter Screen: Surprising Ways to Influence and Improve Online Behavior (New York: Portfolio, 2015), 127. The time-reversal heuristic was proposed in a blog post by Andrew Gelman, “The Time-Reversal Heuristic—a New Way to Think About a Published Finding That Is Followed Up by a Large, Preregistered Replication (in Context of Claims About Power Pose),” Statistical Modeling, Causal Inference, and Social Science, January 26, 2016 [https://statmodeling.stat.columbia.edu/2016/01/26/more-power-posing/]. 25. L. Magrath, and L. Weld, “Abusive Earnings Management and Early Warning Signs,” CPA Journal, August 2002, 50–54. Kenneth Lay’s indictment lays out the nature of the manipulation used to beat estimates [https://www.justice.gov/archive/opa/pr/2004/July/04_crm_470.htm]; he was convicted in 2006 [https://www.justice.gov/archive/opa/pr/2006/May/06_crm_328.html].


pages: 252 words: 72,473

Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy by Cathy O'Neil

Affordable Care Act / Obamacare, Alan Greenspan, algorithmic bias, Bernie Madoff, big data - Walmart - Pop Tarts, call centre, Cambridge Analytica, carried interest, cloud computing, collateralized debt obligation, correlation does not imply causation, Credit Default Swap, credit default swaps / collateralized debt obligations, crowdsourcing, data science, disinformation, electronic logging device, Emanuel Derman, financial engineering, Financial Modelers Manifesto, Glass-Steagall Act, housing crisis, I will remember that I didn’t make the world, and it doesn’t satisfy my equations, Ida Tarbell, illegal immigration, Internet of things, late fees, low interest rates, machine readable, mass incarceration, medical bankruptcy, Moneyball by Michael Lewis explains big data, new economy, obamacare, Occupy movement, offshore financial centre, payday loans, peer-to-peer lending, Peter Thiel, Ponzi scheme, prediction markets, price discrimination, quantitative hedge fund, Ralph Nader, RAND corporation, real-name policy, recommendation engine, Rubik’s Cube, Salesforce, Sharpe ratio, statistical model, tech worker, Tim Cook: Apple, too big to fail, Unsafe at Any Speed, Upton Sinclair, Watson beat the top human players on Jeopardy!, working poor

Their spectacular failure comes, instead, from what they chose not to count: tuition and fees. Student financing was left out of the model. This brings us to the crucial question we’ll confront time and again. What is the objective of the modeler? In this case, put yourself in the place of the editors at U.S. News in 1988. When they were building their first statistical model, how would they know when it worked? Well, it would start out with a lot more credibility if it reflected the established hierarchy. If Harvard, Stanford, Princeton, and Yale came out on top, it would seem to validate their model, replicating the informal models that they and their customers carried in their own heads.

A child places her finger on the stove, feels pain, and masters for the rest of her life the correlation between the hot metal and her throbbing hand. And she also picks up the word for it: burn. A machine learning program, by contrast, will often require millions or billions of data points to create its statistical models of cause and effect. But for the first time in history, those petabytes of data are now readily available, along with powerful computers to process them. And for many jobs, machine learning proves to be more flexible and nuanced than the traditional programs governed by rules. Language scientists, for example, spent decades, from the 1960s to the early years of this century, trying to teach computers how to read.

Probably not a model trained on such demographic and behavioral data. I should note that in the statistical universe proxies inhabit, they often work. More times than not, birds of a feather do fly together. Rich people buy cruises and BMWs. All too often, poor people need a payday loan. And since these statistical models appear to work much of the time, efficiency rises and profits surge. Investors double down on scientific systems that can place thousands of people into what appear to be the correct buckets. It’s the triumph of Big Data. And what about the person who is misunderstood and placed in the wrong bucket?


pages: 250 words: 79,360

Escape From Model Land: How Mathematical Models Can Lead Us Astray and What We Can Do About It by Erica Thompson

Alan Greenspan, Bayesian statistics, behavioural economics, Big Tech, Black Swan, butterfly effect, carbon tax, coronavirus, correlation does not imply causation, COVID-19, data is the new oil, data science, decarbonisation, DeepMind, Donald Trump, Drosophila, Emanuel Derman, Financial Modelers Manifesto, fudge factor, germ theory of disease, global pandemic, hindcast, I will remember that I didn’t make the world, and it doesn’t satisfy my equations, implied volatility, Intergovernmental Panel on Climate Change (IPCC), John von Neumann, junk bonds, Kim Stanley Robinson, lockdown, Long Term Capital Management, moral hazard, mouse model, Myron Scholes, Nate Silver, Neal Stephenson, negative emissions, paperclip maximiser, precautionary principle, RAND corporation, random walk, risk tolerance, selection bias, self-driving car, social distancing, Stanford marshmallow experiment, statistical model, systematic bias, tacit knowledge, tail risk, TED Talk, The Great Moderation, The Great Resignation, the scientific method, too big to fail, trolley problem, value at risk, volatility smile, Y2K

Such models can now write poetry, answer questions, compose articles and hold conversations. They do this by scraping a huge archive of text produced by humans – basically most of the content of the internet plus a lot of books, probably with obviously offensive words removed – and creating statistical models that link one word with the probability of the next word given a context. And they do it remarkably well, to the extent that it is occasionally difficult to tell whether text has been composed by a human or by a language model. Bender, Gebru and colleagues point out some of the problems with this.
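To make the word-prediction idea concrete, here is a minimal bigram sketch in Python. It is only an illustration of linking one word to the probability of the next given a context, not the architecture of any actual large language model, and the tiny corpus is invented.

    from collections import Counter, defaultdict

    # Toy corpus standing in for "a huge archive of text" (illustrative only).
    corpus = "the model predicts the next word given the previous word".split()

    # Count how often each word follows each context word.
    counts = defaultdict(Counter)
    for prev, nxt in zip(corpus, corpus[1:]):
        counts[prev][nxt] += 1

    def next_word_probs(prev):
        """Return P(next word | previous word) estimated from the corpus."""
        total = sum(counts[prev].values())
        return {word: c / total for word, c in counts[prev].items()}

    print(next_word_probs("the"))   # e.g. {'model': 0.33, 'next': 0.33, 'previous': 0.33}

Real language models condition on much longer contexts and learn the probabilities rather than counting them, but the basic object, a distribution over the next word, is the same.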

I think the most coherent argument here is that we often need to impose as much structure on the model as we can, to represent the areas in which we do genuinely have physical confidence, in order to avoid overfitting. If we are willing to calibrate everything with respect to data, then we will end up with a glorified statistical model overfitted to that data rather than something that reflects our expert judgement about the underlying mechanisms involved. If I can’t make a reasonable model without requiring that π=4 or without violating conservation of mass, then there must be something seriously wrong with my other assumptions.

Second, even if we had 200+ years of past data, we are unsure whether the conditions that generate flood losses have remained the same: perhaps flood barriers have been erected; perhaps a new development has been built on the flood plain; perhaps agricultural practices upstream have changed; perhaps extreme rainfall events have become more common. Our simple statistical model of an extreme flood will have to change to take all this into account. Given all of those factors, our calculation of a 1-in-200-year flood event will probably come with a considerable level of uncertainty, and that’s before we start worrying about correcting for returns on investment, inflation or other changes in valuation.


pages: 197 words: 35,256

NumPy Cookbook by Ivan Idris

business intelligence, cloud computing, computer vision, data science, Debian, en.wikipedia.org, Eratosthenes, mandelbrot fractal, p-value, power law, sorting algorithm, statistical model, transaction costs, web application

If not specified, first-order differences are computed.
log: Calculates the natural log of elements in a NumPy array.
sum: Sums the elements of a NumPy array.
dot: Does matrix multiplication for 2D arrays; calculates the inner product for 1D arrays.

Installing scikits-statsmodels
The scikits-statsmodels package focuses on statistical modeling. It can be integrated with NumPy and Pandas (more about Pandas later in this chapter).

How to do it...
Source and binaries can be downloaded from http://statsmodels.sourceforge.net/install.html. If you are installing from source, you need to run the following command:

    python setup.py install

If you are using setuptools, the command is:

    easy_install statsmodels

Performing a normality test with scikits-statsmodels
The scikits-statsmodels package has lots of statistical tests.

The data in the Dataset class of statsmodels follows a special format. Among others, this class has the endog and exog attributes. Statsmodels has a load function, which loads data as NumPy arrays. Instead, we used the load_pandas method, which loads data as Pandas objects. We did an OLS fit, basically giving us a statistical model for copper price and consumption.

Resampling time series data
In this tutorial, we will learn how to resample time series with Pandas.

How to do it...
We will download the daily price time series data for AAPL, and resample it to monthly data by computing the mean. We will accomplish this by creating a Pandas DataFrame, and calling its resample method.
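As a rough sketch of the two operations the excerpt describes, an OLS fit and a resample to monthly means, the following Python fragment uses statsmodels and pandas on made-up data; the series and values are placeholders, not the book's copper or AAPL data.

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    # Hypothetical price/consumption data (placeholders for the book's copper example).
    rng = np.random.default_rng(0)
    consumption = rng.normal(100, 10, 365)
    price = 2.0 * consumption + rng.normal(0, 5, 365)

    # OLS fit: price explained by consumption, with an intercept.
    model = sm.OLS(price, sm.add_constant(consumption)).fit()
    print(model.params)

    # Resample a daily series to monthly means, as described for the AAPL data.
    daily = pd.Series(price, index=pd.date_range("2020-01-01", periods=365, freq="D"))
    monthly = daily.resample("M").mean()
    print(monthly.head())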


pages: 396 words: 117,149

The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World by Pedro Domingos

Albert Einstein, Amazon Mechanical Turk, Arthur Eddington, backpropagation, basic income, Bayesian statistics, Benoit Mandelbrot, bioinformatics, Black Swan, Brownian motion, cellular automata, Charles Babbage, Claude Shannon: information theory, combinatorial explosion, computer vision, constrained optimization, correlation does not imply causation, creative destruction, crowdsourcing, Danny Hillis, data is not the new oil, data is the new oil, data science, deep learning, DeepMind, double helix, Douglas Hofstadter, driverless car, Erik Brynjolfsson, experimental subject, Filter Bubble, future of work, Geoffrey Hinton, global village, Google Glasses, Gödel, Escher, Bach, Hans Moravec, incognito mode, information retrieval, Jeff Hawkins, job automation, John Markoff, John Snow's cholera map, John von Neumann, Joseph Schumpeter, Kevin Kelly, large language model, lone genius, machine translation, mandelbrot fractal, Mark Zuckerberg, Moneyball by Michael Lewis explains big data, Narrative Science, Nate Silver, natural language processing, Netflix Prize, Network effects, Nick Bostrom, NP-complete, off grid, P = NP, PageRank, pattern recognition, phenotype, planetary scale, power law, pre–internet, random walk, Ray Kurzweil, recommendation engine, Richard Feynman, scientific worldview, Second Machine Age, self-driving car, Silicon Valley, social intelligence, speech recognition, Stanford marshmallow experiment, statistical model, Stephen Hawking, Steven Levy, Steven Pinker, superintelligent machines, the long tail, the scientific method, The Signal and the Noise by Nate Silver, theory of mind, Thomas Bayes, transaction costs, Turing machine, Turing test, Vernor Vinge, Watson beat the top human players on Jeopardy!, white flight, yottabyte, zero-sum game

we can ask, “What is the algorithm that produces this output?” We will soon see how to turn this insight into concrete learning algorithms. Some learners learn knowledge, and some learn skills. “All humans are mortal” is a piece of knowledge. Riding a bicycle is a skill. In machine learning, knowledge is often in the form of statistical models, because most knowledge is statistical: all humans are mortal, but only 4 percent are Americans. Skills are often in the form of procedures: if the road curves left, turn the wheel left; if a deer jumps in front of you, slam on the brakes. (Unfortunately, as of this writing Google’s self-driving cars still confuse windblown plastic bags with deer.)

If you can tell which e-mails are spam, you know which ones to delete. If you can tell how good a board position in chess is, you know which move to make (the one that leads to the best position). Machine learning takes many different forms and goes by many different names: pattern recognition, statistical modeling, data mining, knowledge discovery, predictive analytics, data science, adaptive systems, self-organizing systems, and more. Each of these is used by different communities and has different associations. Some have a long half-life, some less so. In this book I use the term machine learning to refer broadly to all of them.

They called this scheme the EM algorithm, where the E stands for expectation (inferring the expected probabilities) and the M for maximization (estimating the maximum-likelihood parameters). They also showed that many previous algorithms were special cases of EM. For example, to learn hidden Markov models, we alternate between inferring the hidden states and estimating the transition and observation probabilities based on them. Whenever we want to learn a statistical model but are missing some crucial information (e.g., the classes of the examples), we can use EM. This makes it one of the most popular algorithms in all of machine learning. You might have noticed a certain resemblance between k-means and EM, in that they both alternate between assigning entities to clusters and updating the clusters’ descriptions.
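A minimal sketch of that E and M alternation, fitting a two-component one-dimensional Gaussian mixture on synthetic data; the initial values and iteration count are arbitrary choices for illustration.

    import numpy as np

    rng = np.random.default_rng(0)
    # Synthetic data drawn from two Gaussians (the "missing information" is which one).
    x = np.concatenate([rng.normal(-2, 1, 300), rng.normal(3, 1, 200)])

    # Initial guesses for means, standard deviations, and mixing weights.
    mu = np.array([-1.0, 1.0])
    sigma = np.array([1.0, 1.0])
    weights = np.array([0.5, 0.5])

    def normal_pdf(v, m, s):
        return np.exp(-0.5 * ((v - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))

    for _ in range(50):
        # E step: expected probability that each point belongs to each component.
        resp = weights * normal_pdf(x[:, None], mu, sigma)
        resp /= resp.sum(axis=1, keepdims=True)
        # M step: maximum-likelihood parameters given those expected memberships.
        nk = resp.sum(axis=0)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
        weights = nk / len(x)

    print(mu, sigma, weights)   # should recover roughly means (-2, 3) and weights (0.6, 0.4)

Replacing the soft responsibilities with hard assignments to the nearest mean turns this loop into k-means, which is the resemblance the passage points out.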


pages: 481 words: 125,946

What to Think About Machines That Think: Today's Leading Thinkers on the Age of Machine Intelligence by John Brockman

Adam Curtis, agricultural Revolution, AI winter, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, algorithmic trading, Anthropocene, artificial general intelligence, augmented reality, autism spectrum disorder, autonomous vehicles, backpropagation, basic income, behavioural economics, bitcoin, blockchain, bread and circuses, Charles Babbage, clean water, cognitive dissonance, Colonization of Mars, complexity theory, computer age, computer vision, constrained optimization, corporate personhood, cosmological principle, cryptocurrency, cuban missile crisis, Danny Hillis, dark matter, data science, deep learning, DeepMind, Demis Hassabis, digital capitalism, digital divide, digital rights, discrete time, Douglas Engelbart, driverless car, Elon Musk, Emanuel Derman, endowment effect, epigenetics, Ernest Rutherford, experimental economics, financial engineering, Flash crash, friendly AI, functional fixedness, global pandemic, Google Glasses, Great Leap Forward, Hans Moravec, hive mind, Ian Bogost, income inequality, information trail, Internet of things, invention of writing, iterative process, James Webb Space Telescope, Jaron Lanier, job automation, Johannes Kepler, John Markoff, John von Neumann, Kevin Kelly, knowledge worker, Large Hadron Collider, lolcat, loose coupling, machine translation, microbiome, mirror neurons, Moneyball by Michael Lewis explains big data, Mustafa Suleyman, natural language processing, Network effects, Nick Bostrom, Norbert Wiener, paperclip maximiser, pattern recognition, Peter Singer: altruism, phenotype, planetary scale, Ray Kurzweil, Recombinant DNA, recommendation engine, Republic of Letters, RFID, Richard Thaler, Rory Sutherland, Satyajit Das, Search for Extraterrestrial Intelligence, self-driving car, sharing economy, Silicon Valley, Skype, smart contracts, social intelligence, speech recognition, statistical model, stem cell, Stephen Hawking, Steve Jobs, Steven Pinker, Stewart Brand, strong AI, Stuxnet, superintelligent machines, supervolcano, synthetic biology, systems thinking, tacit knowledge, TED Talk, the scientific method, The Wisdom of Crowds, theory of mind, Thorstein Veblen, too big to fail, Turing machine, Turing test, Von Neumann architecture, Watson beat the top human players on Jeopardy!, We are as Gods, Y2K

At the University of Chicago Booth School of Business, where I teach, recruiters devote endless hours to interviewing students on campus for potential jobs—a process that selects the few who will be invited to visit the employer, where they will undergo another extensive set of interviews. Yet research shows that interviews are nearly useless in predicting whether a job prospect will perform well on the job. Compared to a statistical model based on objective measures such as grades in courses relevant to the job in question, interviews primarily add noise and introduce the potential for prejudice. (Statistical models don’t favor any particular alma mater or ethnic background and cannot detect good looks.) These facts have been known for more than four decades, but hiring practices have barely budged. The reason is simple: Each of us just knows that if we are the one conducting an interview, we will learn a lot about the candidate.

There’s an algorithm for computing the optimal action for achieving a desired outcome, but it’s computationally expensive. Experiments have found that simple learning algorithms with lots of training data often outperform complex hand-crafted models. Today’s systems primarily provide value by learning better statistical models and performing statistical inference for classification and decision making. The next generation will be able to create and improve their own software and are likely to self-improve rapidly. In addition to improving productivity, AI and robotics are drivers for numerous military and economic arms races.

More disturbing to me is the stubborn reluctance in many segments of society to allow computers to take over tasks that simple models perform demonstrably better than humans. A literature pioneered by psychologists such as the late Robyn Dawes finds that virtually any routine decision-making task—detecting fraud, assessing the severity of a tumor, hiring employees—is done better by a simple statistical model than by a leading expert in the field. Let me offer just two illustrative examples, one from human-resource management and the other from the world of sports. First, let’s consider the embarrassing ubiquity of job interviews as an important, often the most important, determinant of who gets hired.


Text Analytics With Python: A Practical Real-World Approach to Gaining Actionable Insights From Your Data by Dipanjan Sarkar

bioinformatics, business intelligence, business logic, computer vision, continuous integration, data science, deep learning, Dr. Strangelove, en.wikipedia.org, functional programming, general-purpose programming language, Guido van Rossum, information retrieval, Internet of things, invention of the printing press, iterative process, language acquisition, machine readable, machine translation, natural language processing, out of africa, performance metric, premature optimization, recommendation engine, self-driving car, semantic web, sentiment analysis, speech recognition, statistical model, text mining, Turing test, web application

Even though we have a large number of machine learning and data analysis techniques at our disposal, most of them are tuned to work with numerical data. We therefore have to turn to natural language processing (NLP) and to specialized techniques, transformations, and algorithms to analyze text data, or more specifically natural language, which is quite different from the programming languages that machines understand easily. Remember that textual data, being highly unstructured, does not adhere to structured or regular syntax and patterns, so we cannot directly use mathematical or statistical models to analyze it. Before we dive into specific techniques and algorithms for analyzing textual data, this chapter goes over some of the main concepts and theoretical principles associated with the nature of text data. The primary intent is to familiarize you with the concepts and domains associated with natural language understanding, processing, and text analytics.

Note the emphasis here on a corpus of documents: the more diverse the set of documents you have, the more topics or concepts you can generate—unlike with a single document, where you will not get many topics or concepts if it discusses a single subject. Topic models are also often known as probabilistic statistical models, which use specific statistical techniques, including singular value decomposition and latent Dirichlet allocation, to discover connected latent semantic structures in text data that yield topics and concepts. They are used extensively in text analytics and even bioinformatics. Automated document summarization is the process of using a computer program or algorithm based on statistical and ML techniques to summarize a document or corpus of documents such that we obtain a short summary that captures all the essential concepts and themes of the original document or corpus.
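As an illustrative sketch (not the book's own code), topic-like structures can be pulled out of a tiny corpus with a TF-IDF matrix and truncated SVD from scikit-learn; the documents below are invented.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.decomposition import TruncatedSVD

    # A toy corpus; real topic modeling needs a much larger, more diverse corpus.
    docs = [
        "stock markets fell as interest rates rose",
        "central bank raised interest rates again",
        "the team won the football match last night",
        "the striker scored twice in the football game",
    ]

    tfidf = TfidfVectorizer(stop_words="english")
    X = tfidf.fit_transform(docs)

    # Truncated SVD on the TF-IDF matrix exposes latent semantic "topics".
    svd = TruncatedSVD(n_components=2, random_state=0)
    svd.fit(X)

    terms = tfidf.get_feature_names_out()
    for i, component in enumerate(svd.components_):
        top = component.argsort()[::-1][:4]
        print(f"topic {i}:", [terms[j] for j in top])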

The end result is still a document, but one reduced to a few sentences, depending on how long we want the summary to be. This is similar to a research paper with an abstract or an executive summary. The main objective of automated document summarization is to perform this summarization without human input beyond running the computer programs. Mathematical and statistical models help in building and automating the task of summarizing documents by observing their content and context. There are two broad approaches to automated document summarization:

Extraction-based techniques: These methods use mathematical and statistical concepts like SVD to extract a key subset of content from the original document such that this subset contains the core information and acts as the focal point of the entire document.


pages: 447 words: 104,258

Mathematics of the Financial Markets: Financial Instruments and Derivatives Modelling, Valuation and Risk Issues by Alain Ruttiens

algorithmic trading, asset allocation, asset-backed security, backtesting, banking crisis, Black Swan, Black-Scholes formula, Bob Litterman, book value, Brownian motion, capital asset pricing model, collateralized debt obligation, correlation coefficient, Credit Default Swap, credit default swaps / collateralized debt obligations, currency risk, delta neutral, discounted cash flows, discrete time, diversification, financial engineering, fixed income, implied volatility, interest rate derivative, interest rate swap, low interest rates, managed futures, margin call, market microstructure, martingale, p-value, passive investing, proprietary trading, quantitative trading / quantitative finance, random walk, risk free rate, risk/return, Satyajit Das, seminal paper, Sharpe ratio, short selling, statistical model, stochastic process, stochastic volatility, time value of money, transaction costs, value at risk, volatility smile, Wiener process, yield curve, zero-coupon bond

FABOZZI, The Mathematics of Financial Modeling and Investment Management, John Wiley & Sons, Inc., Hoboken, 2004, 800 p. Lawrence GALITZ, Financial Times Handbook of Financial Engineering, FT Press, 3rd ed. Scheduled on November 2011, 480 p. Philippe JORION, Financial Risk Manager Handbook, John Wiley & Sons, Inc., Hoboken, 5th ed., 2009, 752 p. Tze Leung LAI, Haipeng XING, Statistical Models and Methods for Financial Markets, Springer, 2008, 374 p. David RUPPERT, Statistics and Finance, An Introduction, Springer, 2004, 482 p. Dan STEFANICA, A Primer for the Mathematics of Financial Engineering, FE Press, 2011, 352 p. Robert STEINER, Mastering Financial Calculations, FT Prentice Hall, 1997, 400 p.

More generally, Jarrow has developed some general but very useful considerations about model risk in an article devoted to risk management models, though they are valid for any kind of (financial) mathematical model.17 In his article, Jarrow distinguishes between statistical and theoretical models: the former model the evolution of a market price or return based on historical data (a GARCH model, for example). What some fund or portfolio managers develop as “quantitative models” also belongs to the statistical category. Theoretical models, on the other hand, aim to establish some causality based on financial or economic reasoning, the Black–Scholes formula being an example. Both types of model rest on assumptions: Jarrow distinguishes between robust and non-robust assumptions, depending on the size of the impact when the assumption is slightly modified.

MAILLET (eds), Multi-Moment Asset Allocation and Pricing Models, John Wiley & Sons, Ltd, Chichester, 2006, 233 p. Ioannis KARATZAS, Steven E. SHREVE, Methods of Mathematical Finance, Springer, 2010, 430 p. Donna KLINE, Fundamentals of the Futures Market, McGraw-Hill, 2000, 256 p. Tze Leung LAI, Haipeng XING, Statistical Models and Methods for Financial Markets, Springer, 2008, 374 p. Raymond M. LEUTHOLD, Joan C. JUNKUS, Jean E. CORDIER, The Theory and Practice of Futures Markets, Stipes Publishing, 1999, 410 p. Bob LITTERMAN, Modern Investment Management – An Equilibrium Approach, John Wiley & Sons, Inc., Hoboken, 2003, 624 p.


pages: 518 words: 147,036

The Fissured Workplace by David Weil

"Friedman doctrine" OR "shareholder theory", accounting loophole / creative accounting, affirmative action, Affordable Care Act / Obamacare, banking crisis, barriers to entry, behavioural economics, business cycle, business process, buy and hold, call centre, Carmen Reinhart, Cass Sunstein, Clayton Christensen, clean water, collective bargaining, commoditize, company town, corporate governance, corporate raider, Corrections Corporation of America, Daniel Kahneman / Amos Tversky, David Ricardo: comparative advantage, declining real wages, employer provided health coverage, Frank Levy and Richard Murnane: The New Division of Labor, George Akerlof, global supply chain, global value chain, hiring and firing, income inequality, independent contractor, information asymmetry, intermodal, inventory management, Jane Jacobs, Kenneth Rogoff, law of one price, long term incentive plan, loss aversion, low skilled workers, minimum wage unemployment, moral hazard, Network effects, new economy, occupational segregation, Paul Samuelson, performance metric, pre–internet, price discrimination, principal–agent problem, Rana Plaza, Richard Florida, Richard Thaler, Ronald Coase, seminal paper, shareholder value, Silicon Valley, statistical model, Steve Jobs, supply-chain management, The Death and Life of Great American Cities, The Nature of the Firm, transaction costs, Triangle Shirtwaist Factory, ultimatum game, union organizing, vertical integration, women in the workforce, yield management

Using a statistical model to predict the factors that increase the likelihood of contracting out specific types of jobs, Abraham and Taylor demonstrate that the higher the typical wage for the workforce at an establishment, the more likely that establishment will contract out its janitorial work. They also show that establishments that do any contracting out of janitorial workers tend to shift out the function entirely.36 Wages and benefits for workers employed directly versus contracted out can be compared given the significant number of people in both groups. Using statistical models that control for both observed characteristics of the workers and the places in which they work, several studies directly compare the wages and benefits for these occupations.

That competition (and franchising only indirectly) might lead them to have higher incentives to not comply. Alternatively, company-owned outlets might be in locations with stronger consumer markets, higher-skilled workers, or lower crime rates, all of which might also be associated with compliance. To adequately account for these problems, statistical models that consider all of the potentially relevant factors, including franchise status, are generated to predict compliance levels. By doing so, the effect of franchising can be examined, holding other factors constant. This allows measurement of the impact on compliance of an outlet being run by a franchisee with otherwise identical features, as opposed to a company-owned outlet.

This narrative is based on Federal Mine Safety and Health Review Commission, Secretary of Labor MSHA v. Ember Contracting Corporation, Office of Administrative Law Judges, November 4, 2011. I am grateful to Greg Wagner for flagging this case and to Andrew Razov for additional research on it. 26. These estimates are based on quarterly mining data from 2000–2010. Using statistical modeling techniques, two different measures of traumatic injuries and a direct measure of fatality rates are associated with contracting status of the mine operator as well as other explanatory factors, including mining method, physical attributes of the mine, union status, size of operations, year, and location.


pages: 721 words: 197,134

Data Mining: Concepts, Models, Methods, and Algorithms by Mehmed Kantardzić

Albert Einstein, algorithmic bias, backpropagation, bioinformatics, business cycle, business intelligence, business process, butter production in bangladesh, combinatorial explosion, computer vision, conceptual framework, correlation coefficient, correlation does not imply causation, data acquisition, discrete time, El Camino Real, fault tolerance, finite state, Gini coefficient, information retrieval, Internet Archive, inventory management, iterative process, knowledge worker, linked data, loose coupling, Menlo Park, natural language processing, Netflix Prize, NP-complete, PageRank, pattern recognition, peer-to-peer, phenotype, random walk, RFID, semantic web, speech recognition, statistical model, Telecommunications Act of 1996, telemarketer, text mining, traveling salesman, web application

Throughout this book, we will simply call this subset of the population a data set, to eliminate confusion between the two definitions of sample: one (explained earlier) denoting the description of a single entity in the population, and the other (given here) referring to a subset of a population. From a given data set, we build a statistical model of the population that will help us to make inferences concerning that same population. If our inferences from the data set are to be valid, we must obtain samples that are representative of the population. Very often, we are tempted to choose a data set by selecting the most convenient members of the population.

Generalized linear regression models are currently the most frequently applied statistical techniques. They are used to describe the relationship between the trend of one variable and the values taken by several other variables. Modeling this type of relationship is often called linear regression. Fitting models is not the only task in statistical modeling. We often want to select one of several possible models as being the most appropriate. An objective method for choosing between different models is called ANOVA, and it is described in Section 5.5. The relationship that fits a set of data is characterized by a prediction model called a regression equation.
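A small sketch of choosing between two candidate regression models with an ANOVA comparison, using statsmodels on invented data; the variable names are placeholders.

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf
    from statsmodels.stats.anova import anova_lm

    rng = np.random.default_rng(1)
    df = pd.DataFrame({"x1": rng.normal(size=200), "x2": rng.normal(size=200)})
    df["y"] = 1.5 * df["x1"] + 0.5 * df["x2"] + rng.normal(size=200)

    # Two nested candidate models: y ~ x1 versus y ~ x1 + x2.
    small = smf.ols("y ~ x1", data=df).fit()
    large = smf.ols("y ~ x1 + x2", data=df).fit()

    # The ANOVA table compares them: a small p-value favors keeping x2 in the model.
    print(anova_lm(small, large))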

All these ideas are still in their infancy, and we expect that the next generation of text-mining techniques and tools will improve the quality of information and knowledge discovery from text. 11.7 LATENT SEMANTIC ANALYSIS (LSA) LSA is a method that was originally developed to improve the accuracy and effectiveness of IR techniques by focusing on semantic meaning of words across a series of usage contexts, as opposed to using simple string-matching operations. LSA is a way of partitioning free text using a statistical model of word usage that is similar to eigenvector decomposition and factor analysis. Rather than focusing on superficial features such as word frequency, this approach provides a quantitative measure of semantic similarities among documents based on a word’s context. Two major shortcomings to the use of term counts are synonyms and polysemes.


pages: 451 words: 103,606

Machine Learning for Hackers by Drew Conway, John Myles White

call centre, centre right, correlation does not imply causation, data science, Debian, Erdős number, Nate Silver, natural language processing, Netflix Prize, off-by-one error, p-value, pattern recognition, Paul Erdős, recommendation engine, social graph, SpamAssassin, statistical model, text mining, the scientific method, traveling salesman

Knowing the number of nonzero coefficients is useful because many people would like to be able to assert that only a few inputs really matter, and we can assert this more confidently if the model performs well even when assigning zero weight to many of the inputs. When the majority of the inputs to a statistical model are assigned zero coefficients, we say that the model is sparse. Developing tools for promoting sparsity in statistical models is a major topic in contemporary machine learning research. The second column, %Dev, is essentially the R2 for this model. For the top row, it’s 0% because you have a zero coefficient for the one input variable and therefore can’t get better performance than just using a constant intercept.
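The book's example uses R and glmnet; purely as an analogous sketch in Python, scikit-learn's Lasso shows the same idea of sparsity, with most coefficients driven exactly to zero (the data here are synthetic).

    import numpy as np
    from sklearn.linear_model import Lasso

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 50))                          # 50 candidate inputs
    y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(size=200)    # only two really matter

    lasso = Lasso(alpha=0.1).fit(X, y)

    # A sparse model: most inputs are assigned exactly zero weight.
    nonzero = np.sum(lasso.coef_ != 0)
    print(f"{nonzero} nonzero coefficients out of {lasso.coef_.size}")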

This is easiest to see in a residuals plot, as shown in panel C of Figure 6-1. In this plot, you can see all of the structure of the original data set, as none of the structure is captured by the default linear regression model. Using ggplot2’s geom_smooth function without any method argument, we can fit a more complex statistical model called a Generalized Additive Model (or GAM for short) that provides a smooth, nonlinear representation of the structure in our data:

    library(ggplot2)
    set.seed(1)
    x <- seq(-10, 10, by = 0.01)
    y <- 1 - x ^ 2 + rnorm(length(x), 0, 5)
    ggplot(data.frame(X = x, Y = y), aes(x = X, y = Y)) +
      geom_point() +
      geom_smooth(se = FALSE)

The result, shown in panel D of Figure 6-1, lets us immediately see that we want to fit a curved line instead of a straight line to this data set.


Quantitative Trading: How to Build Your Own Algorithmic Trading Business by Ernie Chan

algorithmic trading, asset allocation, automated trading system, backtesting, Bear Stearns, Black Monday: stock market crash in 1987, Black Swan, book value, Brownian motion, business continuity plan, buy and hold, classic study, compound rate of return, Edward Thorp, Elliott wave, endowment effect, financial engineering, fixed income, general-purpose programming language, index fund, Jim Simons, John Markoff, Long Term Capital Management, loss aversion, p-value, paper trading, price discovery process, proprietary trading, quantitative hedge fund, quantitative trading / quantitative finance, random walk, Ray Kurzweil, Renaissance Technologies, risk free rate, risk-adjusted returns, Sharpe ratio, short selling, statistical arbitrage, statistical model, survivorship bias, systematic trading, transaction costs

Data-Snooping Bias
In Chapter 2, I mentioned data-snooping bias—the danger that backtest performance is inflated relative to the future performance of the strategy because we have overoptimized the parameters of the model based on transient noise in the historical data. Data-snooping bias is pervasive in the business of building predictive statistical models from historical data, but it is especially serious in finance because of the limited amount of independent data we have. High-frequency data, while in abundant supply, is useful only for high-frequency models. And while we have stock market data stretching back to the early parts of the twentieth century, only data within the past 10 years are really suitable for building predictive models.

He also co-manages EXP Quantitative Investments, LLC and publishes the Quantitative Trading blog (epchan.blogspot.com), which is syndicated to multiple financial news services including www.tradingmarkets.com and Yahoo! Finance. He has been quoted by the New York Times and CIO magazine on quantitative hedge funds, and has appeared on CNBC’s Closing Bell. Ernie is an expert in developing statistical models and advanced computer algorithms to discover patterns and trends from large quantities of data. He was a researcher in computer science at IBM’s T. J. Watson Research Center, in data mining at Morgan Stanley, and in statistical arbitrage trading at Credit Suisse. He has also been a senior quantitative strategist and trader at various hedge funds, with sizes ranging from millions to billions of dollars.


pages: 442 words: 39,064

Why Stock Markets Crash: Critical Events in Complex Financial Systems by Didier Sornette

Alan Greenspan, Asian financial crisis, asset allocation, behavioural economics, Berlin Wall, Black Monday: stock market crash in 1987, Bretton Woods, Brownian motion, business cycle, buy and hold, buy the rumour, sell the news, capital asset pricing model, capital controls, continuous double auction, currency peg, Deng Xiaoping, discrete time, diversified portfolio, Elliott wave, Erdős number, experimental economics, financial engineering, financial innovation, floating exchange rates, frictionless, frictionless market, full employment, global village, implied volatility, index fund, information asymmetry, intangible asset, invisible hand, John von Neumann, joint-stock company, law of one price, Louis Bachelier, low interest rates, mandelbrot fractal, margin call, market bubble, market clearing, market design, market fundamentalism, mental accounting, moral hazard, Network effects, new economy, oil shock, open economy, pattern recognition, Paul Erdős, Paul Samuelson, power law, quantitative trading / quantitative finance, random walk, risk/return, Ronald Reagan, Schrödinger's Cat, selection bias, short selling, Silicon Valley, South Sea Bubble, statistical model, stochastic process, stocks for the long run, Tacoma Narrows Bridge, technological singularity, The Coming Technological Singularity, The Wealth of Nations by Adam Smith, Tobin tax, total factor productivity, transaction costs, tulip mania, VA Linux, Y2K, yield curve

For this purpose, I shall describe a new set of computational methods that are capable of searching and comparing patterns, simultaneously and iteratively, at multiple scales in hierarchical systems. I shall use these patterns to improve the understanding of the dynamical state before and after a financial crash and to enhance the statistical modeling of social hierarchical systems with the goal of developing reliable forecasting skills for these large-scale financial crashes. IS PREDICTION POSSIBLE? A WORKING HYPOTHESIS With the low of 3227 on April 17, 2000, identified as the end of the “crash,” the Nasdaq Composite index lost in five weeks over 37% of its all-time high of 5133 reached on March 10, 2000.

In reality, the three crashes occurred in less than one century. This result is a first indication that the exponential model may not apply to the large crashes. As an additional test, 10,000 so-called synthetic data sets, each covering a time span close to a century, hence adding up to about 1 million years, were generated using a standard statistical model used by the financial industry [46]. We use the model version GARCH(1,1) estimated from the true index with a Student distribution with four degrees of freedom. This model includes both the nonstationarity of volatilities (the amplitude of price variations) and the fat-tailed nature of the distribution of the price returns seen in Figure 2.7.
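As an illustrative sketch of generating one such synthetic series (not the authors' calibrated model), the following Python code simulates a GARCH(1,1) process with Student-t innovations with four degrees of freedom; the parameter values are made up.

    import numpy as np

    rng = np.random.default_rng(42)

    # Hypothetical GARCH(1,1) parameters (illustrative, not estimated from any index).
    omega, alpha, beta = 1e-6, 0.08, 0.90
    n = 5200   # roughly a century of weekly returns

    returns = np.zeros(n)
    sigma2 = np.full(n, omega / (1 - alpha - beta))   # start at the unconditional variance

    for t in range(1, n):
        # Conditional variance depends on last period's shock and last period's variance.
        sigma2[t] = omega + alpha * returns[t - 1] ** 2 + beta * sigma2[t - 1]
        # Fat-tailed innovations: Student-t with 4 degrees of freedom, scaled to unit variance.
        z = rng.standard_t(4) / np.sqrt(4 / (4 - 2))
        returns[t] = np.sqrt(sigma2[t]) * z

    print(returns.std(), returns.min(), returns.max())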

More recently, Feigenbaum has examined the first differences for the logarithm of the S&P 500 from 1980 to 1987 and finds that he cannot reject the log-periodic component at the 95% confidence level [127]: in plain words, this means that the probability that the log-periodic component results from chance is about or less than one in twenty. To test furthermore the solidity of the advanced log-periodic hypothesis, Johansen, Ledoit, and I [209] tested whether the null hypothesis that a standard statistical model of financial markets, called the GARCH(1,1) model with Student-distributed noise, could “explain” the presence of log-periodicity. In the 1,000 surrogate data sets of length 400 weeks generated using this GARCH(1,1) model with Student-distributed noise and analyzed as for the real crashes, only two 400-week windows qualified.


pages: 673 words: 164,804

Peer-to-Peer by Andy Oram

AltaVista, big-box store, c2.com, combinatorial explosion, commoditize, complexity theory, correlation coefficient, dark matter, Dennis Ritchie, fault tolerance, Free Software Foundation, Garrett Hardin, independent contractor, information retrieval, Kickstarter, Larry Wall, Marc Andreessen, moral hazard, Network effects, P = NP, P vs NP, p-value, packet switching, PalmPilot, peer-to-peer, peer-to-peer model, Ponzi scheme, power law, radical decentralization, rolodex, Ronald Coase, Search for Extraterrestrial Intelligence, semantic web, SETI@home, Silicon Valley, slashdot, statistical model, Tragedy of the Commons, UUNET, Vernor Vinge, web application, web of trust, Zimmermann PGP

If the seller does a lot of volume, she could have a higher reputation in this system than someone who trades perfectly but has less than three quarters the volume. Other reputation metrics can have high sensitivity to lies or losses of information. Other approaches to reputation are principled.[92] One of the approaches to reputation that I like is working from statistical models of behavior, in which reputation is an unbound model parameter to be determined from the feedback data, using Maximum Likelihood Estimation (MLE). MLE is a standard statistical technique: it chooses model parameters that maximize the likelihood of getting the sample data. The reputation calculation can also be performed with a Bayesian approach.

An entity’s reputation is an ideal to be estimated from the samples as measured by the different entities providing feedback points. An entity’s reputation is accompanied by an expression of the confidence or lack of confidence in the estimate. Our reputation calculator is a platform that accepts different statistical models of how entities might behave during the transaction and in providing feedback. For example, one simple model might assume that an entity’s performance rating follows a normal distribution (bell) curve with some average and standard deviation. To make things even simpler, one can assume that feedback is always given honestly and with no bias.
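Under the simple normal-feedback model just described, the maximum-likelihood reputation estimate is the sample mean of the feedback scores, and a standard error conveys how much confidence accompanies it. A minimal sketch with invented scores:

    import numpy as np

    # Hypothetical feedback scores for one entity (e.g., ratings on a 0-1 scale).
    scores = np.array([0.9, 0.8, 1.0, 0.7, 0.95, 0.85])

    # Under a normal model, the sample mean is the reputation estimate,
    # and the standard error expresses (lack of) confidence in it.
    reputation = scores.mean()
    std_error = scores.std(ddof=1) / np.sqrt(len(scores))

    print(f"reputation = {reputation:.2f} +/- {std_error:.2f}")

With more feedback points, the standard error shrinks and the estimate becomes more trustworthy, which is exactly the confidence statement the text calls for.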

Internet, Reputation reducing risk in transactions, Examples of using the Reputation Server referrals, Reputation systems risking by trying out new nodes, Bootstrapping separating into categories to defend against shilling, Scoring algorithms tracking mechanisms, Social solutions: Engineer polite behavior trust and, Trust in real life, and its lessons for computer networks using statistical models of behavior to calculate, Reputation metrics–Reputation metrics reputation domains, Reputation domains, entities, and multidimensional reputations–Reputation domains, entities, and multidimensional reputations weak vs. strong entities, Identity as an element of reputation Reputation Server, Reputation–Summary auction sites, supporting, Interdomain sharing benchmark sources, Credibility bootstrapping, obstacle to, Bootstrapping buyers and sellers benefit from, Examples of using the Reputation Server centralized vs. distributed, Central Reputation Server versus distributed Reputation Servers communicating with the marketplace, Interface to the marketplace credibility measures for sources, Credibility domains, Reputation domains, entities, and multidimensional reputations–Reputation domains, entities, and multidimensional reputations interdomain sharing, Interdomain sharing needs to know identity of entities, Identity as an element of reputation reducing risk with, Examples of using the Reputation Server references to bootstrap reputations, Bootstrapping reputation metrics, Reputation metrics–Reputation metrics scoring algorithms, Credibility scoring system, Scoring system identity properties influence, Identity as an element of reputation soliciting feedback from parties in transactions, Reputation, Interface to the marketplace reputation systems bootstrapping, Bootstrapping collecting ratings, Collecting ratings creating new identities, Bootstrapping early online, Early reputation systems online–System successes and failures evaluating the security of, Attacks and adversaries Free Haven, Reputation systems–Reputation systems needed to provide accountability, Peer-to-peer models and their impacts on accountability observing transaction flow, Collecting ratings open source development, A reputation system that resists pseudospoofing: Advogato–A reputation system that resists pseudospoofing: Advogato partially-automated (Slashdot), Who will moderate the moderators: Slashdot personalizing reputation searches, Personalizing reputation searches problems with, System successes and failures pseudospoofing, bad loophole in, Problems with pseudospoofing and possible defenses purpose of, Purposes of micropayments and reputation systems–Purposes of micropayments and reputation systems scores and ratings, Scoring systems–True decentralization usefulness, Reputations vs. micropayment schemes, Reputations Reputation Technologies, Inc.


pages: 336 words: 113,519

The Undoing Project: A Friendship That Changed Our Minds by Michael Lewis

Albert Einstein, availability heuristic, behavioural economics, Cass Sunstein, choice architecture, complexity theory, Daniel Kahneman / Amos Tversky, Donald Trump, Douglas Hofstadter, endowment effect, feminist movement, framing effect, hindsight bias, John von Neumann, Kenneth Arrow, Linda problem, loss aversion, medical residency, Menlo Park, Murray Gell-Mann, Nate Silver, New Journalism, Paul Samuelson, peak-end rule, Richard Thaler, Saturday Night Live, Skinner box, Stanford marshmallow experiment, statistical model, systematic bias, the new new thing, Thomas Bayes, Walter Mischel, Yom Kippur War

He helped hire new management, then helped to figure out how to price tickets, and, finally, inevitably, was asked to work on the problem of whom to select in the NBA draft. “How will that nineteen-year-old perform in the NBA?” was like “Where will the price of oil be in ten years?” A perfect answer didn’t exist, but statistics could get you to some answer that was at least a bit better than simply guessing. Morey already had a crude statistical model to evaluate amateur players. He’d built it on his own, just for fun. In 2003 the Celtics had encouraged him to use it to pick a player at the tail end of the draft—the 56th pick, when the players seldom amount to anything. And thus Brandon Hunter, an obscure power forward out of Ohio University, became the first player picked by an equation.* Two years later Morey got a call from a headhunter who said that the Houston Rockets were looking for a new general manager.

The closest he came to certainty was in his approach to making decisions. He never simply went with his first thought. He suggested a new definition of the nerd: a person who knows his own mind well enough to mistrust it. One of the first things Morey did after he arrived in Houston—and, to him, the most important—was to install his statistical model for predicting the future performance of basketball players. The model was also a tool for the acquisition of basketball knowledge. “Knowledge is literally prediction,” said Morey. “Knowledge is anything that increases your ability to predict the outcome. Literally everything you do you’re trying to predict the right thing.

The Indian was DeAndre Jordan all over again; he was, like most of the problems you faced in life, a puzzle, with pieces missing. The Houston Rockets would pass on him—and be shocked when the Dallas Mavericks took him in the second round of the NBA draft. Then again, you never knew.†† And that was the problem: You never knew. In Morey’s ten years of using his statistical model with the Houston Rockets, the players he’d drafted, after accounting for the draft slot in which they’d been taken, had performed better than the players drafted by three-quarters of the other NBA teams. His approach had been sufficiently effective that other NBA teams were adopting it. He could even pinpoint the moment when he felt, for the first time, imitated.


pages: 49 words: 12,968

Industrial Internet by Jon Bruner

air gap, autonomous vehicles, barriers to entry, Boeing 747, commoditize, computer vision, data acquisition, demand response, electricity market, en.wikipedia.org, factory automation, Google X / Alphabet X, industrial robot, Internet of things, job automation, loose coupling, natural language processing, performance metric, Silicon Valley, slashdot, smart grid, smart meter, statistical model, the Cathedral and the Bazaar, web application

“Imagine trying to operate a highway system if all you have are monthly traffic readings for a few spots on the road. But that’s what operating our power system was like.” The utility’s customers benefit, too — an example of the industrial internet creating value for every entity to which it’s connected. Fort Collins utility customers can see data on their electric usage through a Web portal that uses a statistical model to estimate how much electricity they’re using on heating, cooling, lighting and appliances. The site then draws building data from county records to recommend changes to insulation and other improvements that might save energy. Water meters measure usage every hour — frequent enough that officials will soon be able to dispatch inspection crews to houses whose vacationing owners might not know about a burst pipe.


pages: 1,088 words: 228,743

Expected Returns: An Investor's Guide to Harvesting Market Rewards by Antti Ilmanen

Alan Greenspan, Andrei Shleifer, asset allocation, asset-backed security, availability heuristic, backtesting, balance sheet recession, bank run, banking crisis, barriers to entry, behavioural economics, Bernie Madoff, Black Swan, Bob Litterman, bond market vigilante , book value, Bretton Woods, business cycle, buy and hold, buy low sell high, capital asset pricing model, capital controls, carbon credits, Carmen Reinhart, central bank independence, classic study, collateralized debt obligation, commoditize, commodity trading advisor, corporate governance, credit crunch, Credit Default Swap, credit default swaps / collateralized debt obligations, currency risk, deal flow, debt deflation, deglobalization, delta neutral, demand response, discounted cash flows, disintermediation, diversification, diversified portfolio, dividend-yielding stocks, equity premium, equity risk premium, Eugene Fama: efficient market hypothesis, fiat currency, financial deregulation, financial innovation, financial intermediation, fixed income, Flash crash, framing effect, frictionless, frictionless market, G4S, George Akerlof, global macro, global reserve currency, Google Earth, high net worth, hindsight bias, Hyman Minsky, implied volatility, income inequality, incomplete markets, index fund, inflation targeting, information asymmetry, interest rate swap, inverted yield curve, invisible hand, John Bogle, junk bonds, Kenneth Rogoff, laissez-faire capitalism, law of one price, London Interbank Offered Rate, Long Term Capital Management, loss aversion, low interest rates, managed futures, margin call, market bubble, market clearing, market friction, market fundamentalism, market microstructure, mental accounting, merger arbitrage, mittelstand, moral hazard, Myron Scholes, negative equity, New Journalism, oil shock, p-value, passive investing, Paul Samuelson, pension time bomb, performance metric, Phillips curve, Ponzi scheme, prediction markets, price anchoring, price stability, principal–agent problem, private sector deleveraging, proprietary trading, purchasing power parity, quantitative easing, quantitative trading / quantitative finance, random walk, reserve currency, Richard Thaler, risk free rate, risk tolerance, risk-adjusted returns, risk/return, riskless arbitrage, Robert Shiller, savings glut, search costs, selection bias, seminal paper, Sharpe ratio, short selling, sovereign wealth fund, statistical arbitrage, statistical model, stochastic volatility, stock buybacks, stocks for the long run, survivorship bias, systematic trading, tail risk, The Great Moderation, The Myth of the Rational Market, too big to fail, transaction costs, tulip mania, value at risk, volatility arbitrage, volatility smile, working-age population, Y2K, yield curve, zero-coupon bond, zero-sum game

Note, though, that most academic studies rely on such in-sample relations; econometricians simply assume that any observed statistical relation between predictors and subsequent market returns was already known to rational investors in real time. Practitioners who find this assumption unrealistic try to avoid in-sample bias by selecting and/or estimating statistical models repeatedly, using only data that were available at each point in time, so as to assess predictability in a quasi-out-of-sample sense, though they never completely succeed in doing so.

Table 8.6. Correlations with future excess returns of the S&P 500, 1962–2009. Sources: Haver Analytics, Robert Shiller’s website, Amit Goyal’s website, own calculations.
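A bare-bones sketch of that expanding-window procedure, re-estimating the model each period using only data available up to that point, on synthetic data; the predictor and returns are invented, not the series behind Table 8.6.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 240                              # e.g. 20 years of monthly observations
    signal = rng.normal(size=n)          # hypothetical valuation signal, observed at t
    # Next-period excess return depends (weakly) on the prior period's signal.
    ret = np.empty(n)
    ret[0] = rng.normal()
    ret[1:] = 0.1 * signal[:-1] + rng.normal(size=n - 1)

    forecasts, realized = [], []
    for t in range(60, n):
        # Fit the predictive regression using ONLY data available before t:
        # returns up to t-1 regressed on the signal lagged one period.
        X = np.column_stack([np.ones(t - 1), signal[: t - 1]])
        coef, *_ = np.linalg.lstsq(X, ret[1:t], rcond=None)
        forecasts.append(coef[0] + coef[1] * signal[t - 1])   # forecast of ret[t]
        realized.append(ret[t])

    print("quasi-out-of-sample correlation:",
          round(float(np.corrcoef(forecasts, realized)[0, 1]), 2))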

They treat default (or rating change) as a random event whose probability can be estimated from observed market prices in the context of an analytical model (or directly from historical default data). Useful indicators, besides equity volatility and leverage, include past equity returns, certain financial ratios, and proxies for the liquidity premium. This modeling approach is sort of a compromise between statistical models and theoretically purer structural models. Reduced-form models can naturally match market spreads better than structural models, but unconstrained indicator selection can make them overfitted to in-sample data.

Box 10.1. (wonkish) Risk-neutral and actual default probabilities
Under certain assumptions (continuous trading, a single-factor diffusion process), positions in risky assets can be perfectly hedged and thus should earn the riskless return.

However, there is some evidence of rising correlations across all quant strategies, presumably due to common positions among leveraged traders.

12.7 NOTES
[1] Like many others, I prefer to use economic intuition as one guard against data mining, but the virtues of such intuition can be overstated as our intuition is inevitably influenced by past experiences. Purely data-driven statistical approaches are even worse, but at least then statistical models can help assess the magnitude of data-mining bias.
[2] Here are some additional points on VMG:
—No trading costs or financing costs related to shorting are subtracted from VMG returns. This is typical for academic studies because such costs are trade specific and/or investor specific and, moreover, such data are not available over long histories.


pages: 467 words: 116,094

I Think You'll Find It's a Bit More Complicated Than That by Ben Goldacre

Aaron Swartz, call centre, conceptual framework, confounding variable, correlation does not imply causation, crowdsourcing, death of newspapers, Desert Island Discs, Dr. Strangelove, drug harm reduction, en.wikipedia.org, experimental subject, Firefox, Flynn Effect, Helicobacter pylori, jimmy wales, John Snow's cholera map, Loebner Prize, meta-analysis, moral panic, nocebo, placebo effect, publication bias, selection bias, selective serotonin reuptake inhibitor (SSRI), seminal paper, Simon Singh, social distancing, statistical model, stem cell, Stephen Fry, sugar pill, the scientific method, Turing test, two and twenty, WikiLeaks

Obviously, there are no out gay people in the eighteen-to-twenty-four group who came out at an age later than twenty-four; so the average age at which people in the eighteen-to-twenty-four group came out cannot possibly be greater than the average age of that group, and certainly it will be lower than, say, thirty-seven, the average age at which people in their sixties came out. For the same reason, it’s very likely indeed that the average age of coming out will increase as the average age of each age group rises. In fact, if we assume (in formal terms we could call this a ‘statistical model’) that at any time, all the people who are out have always come out at a uniform rate between the age of ten and their current age, you would get almost exactly the same figures (you’d get fifteen, twenty-three and thirty-five, instead of seventeen, twenty-one and thirty-seven). This is almost certainly why ‘the average coming-out age has fallen by over twenty years’: in fact you could say that Stonewall’s survey has found that on average, as people get older, they get older.
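To see where figures like those come from, here is the arithmetic of that toy model in Python. Under a uniform spread between age ten and the current age, the expected coming-out age is simply the midpoint; the representative current ages per group are my own illustrative assumptions, not Stonewall's data.

    # Toy model: people come out at an age spread uniformly between 10 and their
    # current age, so the expected coming-out age is the midpoint (10 + current_age) / 2.
    # The representative current age for each group is an illustrative assumption only.
    groups = {"18-24": 21, "middle group": 36, "sixties": 60}

    for label, age in groups.items():
        print(label, (10 + age) / 2)
    # 15.5, 23.0 and 35.0, close to the 15, 23 and 35 quoted in the text.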

The study concluded that compulsory cycle-helmet legislation may selectively reduce cycling in the second group. There are even more complex second-round effects if each individual cyclist’s safety is improved by increased cyclist density through ‘safety in numbers’, a phenomenon known as Smeed’s law. Statistical models for the overall impact of helmet habits are therefore inevitably complex and based on speculative assumptions. This complexity seems at odds with the current official BMA policy, which confidently calls for compulsory helmet legislation. Standing over all this methodological complexity is a layer of politics, culture and psychology.

A&E departments: randomised trials in 208; waiting times 73–5 abdominal aortic aneurysms (AAA) 18, 114 abortion; GPs and xviii, 89–91; Science and Technology Committee report on ‘scientific developments relating to the Abortion Act, 1967’ 196–201 academia, bad xviii–xix, 127–46; animal experiments, failures in research 136–8; brain-imaging studies report more positive findings than their numbers can support 131–4; journals, failures of academic 138–46; Medical Hypotheses: Aids denialism in 138–41; Medical Hypotheses: ‘Down Subjects and Oriental Population Share Several Specific Attitudes and Characteristics’ article 139, 141–3; Medical Hypotheses: masturbation as a treatment for nasal congestion articles 139, 143–6; misuse of statistics 129–31; retractions, academic literature and 134–6 academic journals: access to papers published in 32–4, 143; cherry-picking and 5–8; ‘citation classics’ and 9–10, 102–3, 173; commercial ghost writers and 25–6; data published in newspapers rather than 17–20; doctors and technical academic journals 214; ‘impact factor’ 143; number of 14, 17; peer review and 138–46 see also peer review; poor quality (‘crap’) 138–46; refusal to publish in 3–5; retractions and 134–6; statistical model errors in 129–31; studies of errors in papers published in 9–10, 129–31; summaries of important new research from 214–15; teaching and 214–15; youngest people to publish papers in 11–12 academic papers xvi; access to 32–4; cherry-picking from xvii, 5–8, 12, 174, 176–7, 192, 193, 252, 336, 349, 355; ‘citation classics’ 9–10, 102–3, 173; commercial ‘ghost writers’ and 25–6; investigative journalism work and 18; journalists linking work to 342, 344, 346; number of 14; peer review and see peer review; post-publication 4–5; press releases and xxi, 6, 29–31, 65, 66, 107–9, 119, 120, 121–2, 338–9, 340–2, 358–60; public relations and 358–60; publication bias 132–3, 136, 314, 315; references to other academic papers within allowing study of how ideas spread 26; refusal to publish in 3–5, 29–31; retractions and 134–6; studies of errors in 9–10, 129–31; titles of 297 Acousticom 366 acupuncture 39, 388 ADE 651 273–5 ADHD 40–2 Advertising Standards Authority (ASA) 252 Afghanistan 231; crop captures in xx, 221–4 Ahn, Professor Anna 341 Aids; antiretroviral drugs and 140, 185, 281, 284, 285; Big Pharma and 186; birth control, abortion and US Christian aid groups 185; Catholic Church fight against condom use and 183–4; cures for 12, 182–3, 185–6, 366; denialism 138–41, 182–3, 185–6, 263, 273, 281–6; drug users and 182, 183, 233–4; House of Numbers film 281–3; Medical Hypotheses, Aids denial in 138–41; needle-exchange programmes and 182, 183; number of deaths from 20, 186, 309; power of ideas and 182–7; Roger Coghill and ‘the Aids test’ 366; Spectator, Aids denialism at the xxi, 283–6; US Presidential Emergency Plan for Aids Relief 185 Aidstruth.org 139 al-Jabiri, Major General Jehad 274–5 alcohol: intravenous use of 233; lung cancer and 108–9; rape and consumption of 329, 330 ALLHAT trial 119 Alzheimer’s, smoking and 20–1 American Academy of Child and Adolescent Psychiatry 325 American Association on Mental Retardation 325 American Journal of Clinical Nutrition 344 American Medical Association 262 American Psychological Association 325 American Speech-Language-Hearing Association 325 anecdotes, illustrating data with 8, 118–22, 189, 248–9, 293 animal experiments 136–8 Annals of Internal Medicine 358 Annals of Thoracic Surgery 134 anti-depressants 18; recession linked to rise in prescriptions for 
xviii, 104–7; SSRI 18, 105 antiretroviral medications 140, 185, 281, 284, 285 aortic aneurysm repair, mortality rates in hospital after/during 18–20, 114 APGaylard 252 Appleby, John 19, 173 artificial intelligence xxii, 394–5 Asch, Solomon 15, 16 Asphalia 365 Associated Press 316 Astel, Professor Karl 22 ATSC 273 autism: educational interventions in 325; internet use and 3; MMR and 145, 347–55, 356–8 Autism Research Centre, Cambridge 348, 354 Bad Science (Goldacre) xvi, 104, 110n, 257, 346 Bad Science column see Guardian Ballas, Dr Dimitris 58 Barasi, Leo 96 Barden, Paul 101–4 Barnardo’s 394 Baron-Cohen, Professor Simon 349–51, 353–4 Batarim 305–6 BBC xxi; ‘bioresonance’ story and 277–8; Britain’s happiest places story and 56, 57; causes of avoidable death, overall coverage of 20; Down’s syndrome births increase story and 61–2; ‘EDF Survey Shows Support for Hinkley Power Station’ story and 95–6; psychological nature of libido problems story and 37; radiation from wi-fi networks story and 289–91, 293; recession and anti-depressant link, reports 105; Reform: The Value of Mathematics’ story and 196; ‘Threefold variation’ in UK bowel cancer rates’ story and 101–4; Wightman and 393, 394; ‘“Worrying’’ Jobless Rise Needs Urgent Action – Labour’ story and 59 Beating Bowel Cancer 101, 104 Becker muscular dystrophy 121 Bem Sex Role Inventory (BSRI) 45 Benedict XVI, Pope 183, 184 Benford’s law 54–6 bicycle helmets, the law and 110–13 big data xvii, xviii, 71–86; access to government data 75–7; care.data and risk of sharing medical records 77–86; magical way that patterns emerge from data 73–5 Big Pharma xvii, 324, 401 bin Laden, Osama 357 biologising xvii, 35–46; biological causes for psychological or behavioural conditions 40–2; brain imaging, reality of phenomena and 37–9; girls’ love of pink, evolution and 42–6 Biologist 6 BioSTAR 248 birth rate, UK 49–50 Bishop, Professor Dorothy 3, 6 bladder cancer 24–5, 342 Blair, Tony 357 Blakemore, Colin 138 blame, mistakes in medicine and 267–70 blind auditions, orchestras and xxi, 309–11 blinding, randomised testing and xviii, 12, 118, 124, 126, 133, 137–8, 292–3, 345 blood tests 117, 119–20, 282 blood-pressure drugs 119–20 Blundell, Professor John 337 BMA 112 Booth, Patricia 265 Boston Globe 39 bowel cancer 101–4 Boynton, Dr Petra 252 Brain Committee 230–1 Brain Gym 10–12 Brainiac: faking of science on xxii, 371–5 brain-imaging studies, positive findings in 131–4 breast cancer: abortion and 200–1; diet and 338–40; red wine and 267, 269; screening 113, 114, 115 breast enhancement cream xx, 254–7 Breuning, Stephen 135–6 The British Association for Applied Nutrition and Nutritional Therapy (BANT) 268–9 British Association of Nutritional Therapists 270 British Chiropractic Association (BCA) 250–4 British Dental Association 24 British Household Panel Survey 57 British Journal of Cancer: ‘What if Cancer Survival in Britain were the Same as in Europe: How Many Deaths are Avoidable?’


pages: 592 words: 125,186

The Science of Hate: How Prejudice Becomes Hate and What We Can Do to Stop It by Matthew Williams

3D printing, 4chan, affirmative action, agricultural Revolution, algorithmic bias, Black Lives Matter, Brexit referendum, Cambridge Analytica, citizen journalism, cognitive dissonance, coronavirus, COVID-19, dark matter, data science, deep learning, deindustrialization, desegregation, disinformation, Donald Trump, European colonialism, fake news, Ferguson, Missouri, Filter Bubble, gamification, George Floyd, global pandemic, illegal immigration, immigration reform, impulse control, income inequality, longitudinal study, low skilled workers, Mark Zuckerberg, meta-analysis, microaggression, Milgram experiment, Oklahoma City bombing, OpenAI, Overton Window, power law, selection bias, Snapchat, statistical model, The Turner Diaries, theory of mind, TikTok, twin studies, white flight

The combination of unemployed locals and an abundance of employed migrants, competing for scarce resources in a time of recession and cutbacks, creates a greater feeling of ‘us’ versus ‘them’. A lack of inter-cultural interactions and understanding between the local and migrant populations results in rising tensions. Combined with the galvanising effect of the referendum result, these factors create the perfect conditions for hate crime to flourish. In our analysis we used statistical models that take account of a number of factors known to have an effect on hate crimes. In each of the populations of the forty-three police force areas of England and Wales we measured the unemployment rate, average income, educational attainment, health deprivation, general crime rate, barriers to housing and services, quality of living, rate of migrant inflow, and Leave vote share.
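
The adjustment described here can be sketched as an area-level regression with controls. The snippet below is a minimal, hypothetical version using statsmodels on synthetic data; the variable names follow the factors listed above, but it is not the authors' actual model or data.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_areas = 43  # police force areas of England and Wales

areas = pd.DataFrame({
    "unemployment": rng.uniform(3, 9, n_areas),       # percent
    "avg_income": rng.uniform(20, 35, n_areas),        # GBP thousands
    "migrant_inflow": rng.uniform(1, 15, n_areas),     # per 1,000 residents
    "leave_share": rng.uniform(0.35, 0.70, n_areas),   # Leave vote share
})
# Synthetic outcome: hate crimes per 100,000 residents, loosely driven by the covariates
areas["hate_crime_rate"] = (
    5 + 0.8 * areas["unemployment"] + 10 * areas["leave_share"]
    - 0.2 * areas["avg_income"] + rng.normal(0, 1, n_areas)
)

model = smf.ols(
    "hate_crime_rate ~ unemployment + avg_income + migrant_inflow + leave_share",
    data=areas,
).fit()
print(model.summary())

Each coefficient is then read as the association between that factor and the hate-crime rate with the other factors held constant.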

This psychosocial criminological approach to behaviour can be especially useful in understanding crimes caused in part by the senses of grievance and frustration, such as terrorism and hate crime.1 ‘Instrumental’ crimes, such as burglary and theft, can often be understood as a product of wider social and economic forces. Economic downturns, cuts to state benefits, widespread unemployment, increases in school expulsions, income inequality and poor rental housing stock can all combine to explain much of the variance (the total amount that can be explained in a statistical model) in the propensity of someone to burgle a home or shoplift.*2 Their commission is often rational – ‘I have no money, it’s easier to get it illegitimately than legitimately, and the chances of getting caught are low.’ But these ‘big issue’ drivers do not explain so much of the variance in hate crimes.

. ** A macroeconomic panel regression technique was used by G. Edwards and S. Rushin (‘The Effect of President Trump’s Election on Hate Crimes’, SSRN, 18 January 2018) to rule out a wide range of the most likely alternative explanations for the dramatic increase in hate crimes in the fourth quarter of 2016. While a powerful statistical model, it cannot account for all possible explanations for the rise. To do so, a ‘true experiment’ is required in which one location at random is subjected to the ‘Trump effect’, while another control location is not. As the 2016 presidential election affected all US jurisdictions, there is simply no way of running a true experiment, meaning we cannot say with absolute certainty that Trump’s rise to power caused a rise in hate crimes.
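
A fixed-effects panel regression of the kind referenced can be sketched as follows; the jurisdictions, quarters and specification are invented for illustration and are not Edwards and Rushin's. The closing comment restates the identification problem the passage raises.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
rows = []
for state in [f"S{i:02d}" for i in range(20)]:   # hypothetical jurisdictions
    base = rng.uniform(5, 30)                    # jurisdiction-specific level
    for quarter in range(16):                    # sixteen quarters of data
        post = int(quarter >= 15)                # final quarter = post-election
        rows.append({
            "state": state,
            "quarter": quarter,
            "post_election": post,
            "hate_crimes": base + 3 * post + rng.normal(0, 2),
        })
panel = pd.DataFrame(rows)

# Jurisdiction fixed effects absorb stable differences between states.
fe = smf.ols("hate_crimes ~ post_election + C(state)", data=panel).fit()
print(fe.params["post_election"], fe.bse["post_election"])

# Because every jurisdiction is 'treated' at the same moment, the post-election
# dummy cannot be separated from any other nationwide change in that quarter,
# which is why no statistical model here fully substitutes for a true experiment.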


pages: 586 words: 186,548

Architects of Intelligence by Martin Ford

3D printing, agricultural Revolution, AI winter, algorithmic bias, Alignment Problem, AlphaGo, Apple II, artificial general intelligence, Asilomar, augmented reality, autonomous vehicles, backpropagation, barriers to entry, basic income, Baxter: Rethink Robotics, Bayesian statistics, Big Tech, bitcoin, Boeing 747, Boston Dynamics, business intelligence, business process, call centre, Cambridge Analytica, cloud computing, cognitive bias, Colonization of Mars, computer vision, Computing Machinery and Intelligence, correlation does not imply causation, CRISPR, crowdsourcing, DARPA: Urban Challenge, data science, deep learning, DeepMind, Demis Hassabis, deskilling, disruptive innovation, Donald Trump, Douglas Hofstadter, driverless car, Elon Musk, Erik Brynjolfsson, Ernest Rutherford, fake news, Fellow of the Royal Society, Flash crash, future of work, general purpose technology, Geoffrey Hinton, gig economy, Google X / Alphabet X, Gödel, Escher, Bach, Hans Moravec, Hans Rosling, hype cycle, ImageNet competition, income inequality, industrial research laboratory, industrial robot, information retrieval, job automation, John von Neumann, Large Hadron Collider, Law of Accelerating Returns, life extension, Loebner Prize, machine translation, Mark Zuckerberg, Mars Rover, means of production, Mitch Kapor, Mustafa Suleyman, natural language processing, new economy, Nick Bostrom, OpenAI, opioid epidemic / opioid crisis, optical character recognition, paperclip maximiser, pattern recognition, phenotype, Productivity paradox, radical life extension, Ray Kurzweil, recommendation engine, Robert Gordon, Rodney Brooks, Sam Altman, self-driving car, seminal paper, sensor fusion, sentiment analysis, Silicon Valley, smart cities, social intelligence, sparse data, speech recognition, statistical model, stealth mode startup, stem cell, Stephen Hawking, Steve Jobs, Steve Wozniak, Steven Pinker, strong AI, superintelligent machines, synthetic biology, systems thinking, Ted Kaczynski, TED Talk, The Rise and Fall of American Growth, theory of mind, Thomas Bayes, Travis Kalanick, Turing test, universal basic income, Wall-E, Watson beat the top human players on Jeopardy!, women in the workforce, working-age population, workplace surveillance , zero-sum game, Zipcar

We do find that people, including myself, have all kinds of speculations about the future, but as a scientist, I like to base my conclusions on the specific data that we’ve seen. And what we’ve seen is people using deep learning as high-capacity statistical models. High capacity is just some jargon that means that the model keeps getting better and better the more data you throw at it. Statistical models that at their core are based on matrices of numbers being multiplied, and added, and subtracted, and so on. They are a long way from something where you can see common sense or consciousness emerging. My feeling is that there’s no data to support these claims and if such data appears, I’ll be very excited, but I haven’t seen it yet.

From 1996 to 1999, he worked for Digital Equipment Corporation’s Western Research Lab in Palo Alto, where he worked on low-overhead profiling tools, design of profiling hardware for out-of-order microprocessors, and web-based information retrieval. From 1990 to 1991, Jeff worked for the World Health Organization’s Global Programme on AIDS, developing software to do statistical modeling, forecasting, and analysis of the HIV pandemic. In 2009, Jeff was elected to the National Academy of Engineering, and he was also named a Fellow of the Association for Computing Machinery (ACM) and a Fellow of the American Association for the Advancement of Sciences (AAAS). His areas of interest include large-scale distributed systems, performance monitoring, compression techniques, information retrieval, application of machine learning to search and other related problems, microprocessor architecture, compiler optimizations, and development of new products that organize existing information in new and interesting ways.

I went to Berkeley as a postdoc, and there I started to really think about how what I was doing was relevant to actual problems that people cared about, as opposed to just being mathematically elegant. That was the first time I started to get into machine learning. I then returned to Stanford as faculty in 1995 where I started to work on areas relating to statistical modeling and machine learning. I began studying applied problems where machine learning could really make a difference. I worked in computer vision, in robotics, and from 2000 on biology and health data. I also had an ongoing interest in technology-enabled education, which led to a lot of experimentation at Stanford into ways in which we could offer an enhanced learning experience.


pages: 619 words: 177,548

Power and Progress: Our Thousand-Year Struggle Over Technology and Prosperity by Daron Acemoglu, Simon Johnson

"Friedman doctrine" OR "shareholder theory", "World Economic Forum" Davos, 4chan, agricultural Revolution, AI winter, Airbnb, airline deregulation, algorithmic bias, algorithmic management, Alignment Problem, AlphaGo, An Inconvenient Truth, artificial general intelligence, augmented reality, basic income, Bellingcat, Bernie Sanders, Big Tech, Bletchley Park, blue-collar work, British Empire, carbon footprint, carbon tax, carried interest, centre right, Charles Babbage, ChatGPT, Clayton Christensen, clean water, cloud computing, collapse of Lehman Brothers, collective bargaining, computer age, Computer Lib, Computing Machinery and Intelligence, conceptual framework, contact tracing, Corn Laws, Cornelius Vanderbilt, coronavirus, corporate social responsibility, correlation does not imply causation, cotton gin, COVID-19, creative destruction, declining real wages, deep learning, DeepMind, deindustrialization, Demis Hassabis, Deng Xiaoping, deskilling, discovery of the americas, disinformation, Donald Trump, Douglas Engelbart, Douglas Engelbart, Edward Snowden, Elon Musk, en.wikipedia.org, energy transition, Erik Brynjolfsson, European colonialism, everywhere but in the productivity statistics, factory automation, facts on the ground, fake news, Filter Bubble, financial innovation, Ford Model T, Ford paid five dollars a day, fulfillment center, full employment, future of work, gender pay gap, general purpose technology, Geoffrey Hinton, global supply chain, Gordon Gekko, GPT-3, Grace Hopper, Hacker Ethic, Ida Tarbell, illegal immigration, income inequality, indoor plumbing, industrial robot, interchangeable parts, invisible hand, Isaac Newton, Jacques de Vaucanson, James Watt: steam engine, Jaron Lanier, Jeff Bezos, job automation, Johannes Kepler, John Markoff, John Maynard Keynes: Economic Possibilities for our Grandchildren, John Maynard Keynes: technological unemployment, Joseph-Marie Jacquard, Kenneth Arrow, Kevin Roose, Kickstarter, knowledge economy, labor-force participation, land reform, land tenure, Les Trente Glorieuses, low skilled workers, low-wage service sector, M-Pesa, manufacturing employment, Marc Andreessen, Mark Zuckerberg, megacity, mobile money, Mother of all demos, move fast and break things, natural language processing, Neolithic agricultural revolution, Norbert Wiener, NSO Group, offshore financial centre, OpenAI, PageRank, Panopticon Jeremy Bentham, paperclip maximiser, pattern recognition, Paul Graham, Peter Thiel, Productivity paradox, profit maximization, profit motive, QAnon, Ralph Nader, Ray Kurzweil, recommendation engine, ride hailing / ride sharing, Robert Bork, Robert Gordon, Robert Solow, robotic process automation, Ronald Reagan, scientific management, Second Machine Age, self-driving car, seminal paper, shareholder value, Sheryl Sandberg, Shoshana Zuboff, Silicon Valley, social intelligence, Social Responsibility of Business Is to Increase Its Profits, social web, South Sea Bubble, speech recognition, spice trade, statistical model, stem cell, Steve Jobs, Steve Wozniak, strikebreaker, subscription business, Suez canal 1869, Suez crisis 1956, supply-chain management, surveillance capitalism, tacit knowledge, tech billionaire, technoutopianism, Ted Nelson, TED Talk, The Future of Employment, The Rise and Fall of American Growth, The Structural Transformation of the Public Sphere, The Wealth of Nations by Adam Smith, theory of mind, Thomas Malthus, too big to fail, total factor productivity, trade route, transatlantic slave trade, trickle-down economics, 
Turing machine, Turing test, Twitter Arab Spring, Two Sigma, Tyler Cowen, Tyler Cowen: Great Stagnation, union organizing, universal basic income, Unsafe at Any Speed, Upton Sinclair, upwardly mobile, W. E. B. Du Bois, War on Poverty, WikiLeaks, wikimedia commons, working poor, working-age population

The modern approach bypasses the step of modeling or even understanding how humans make decisions. Instead, it relies on a large data set of humans making correct recognition decisions based on images. It then fits a statistical model to large data sets of image features to predict when humans say that there is a cat in the frame. It subsequently applies the estimated statistical model to new pictures to predict whether there is a cat there or not. Progress was made possible by faster computer processor speed, as well as new graphics processing units (GPUs), originally used to generate high-resolution graphics in video games, which proved to be a powerful tool for data crunching.
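
As a rough illustration of that recipe, the sketch below fits a scikit-learn logistic regression to labelled "image feature" vectors and then applies it to new inputs. The features and labels are randomly generated stand-ins rather than real image data, and the model is far simpler than the deep networks the passage refers to.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n_train, n_features = 1000, 64

X_train = rng.normal(size=(n_train, n_features))   # stand-ins for image features
true_w = rng.normal(size=n_features)
y_train = (X_train @ true_w + rng.normal(0, 1, n_train) > 0).astype(int)  # 1 = "cat"

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)  # fit to labelled examples

X_new = rng.normal(size=(5, n_features))   # "new pictures"
print(clf.predict(X_new))                  # predicted cat / not-cat
print(clf.predict_proba(X_new)[:, 1])      # estimated probability of "cat"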

There have also been major advances in data storage, reducing the cost of storing and accessing massive data sets, and improvements in the ability to perform large amounts of computation distributed across many devices, aided by rapid advances in microprocessors and cloud computing. Equally important has been progress in machine learning, especially “deep learning,” by using multilayer statistical models, such as neural networks. In traditional statistical analysis a researcher typically starts with a theory specifying a causal relationship. A hypothesis linking the valuation of the US stock market to interest rates is a simple example of such a causal relationship, and it naturally lends itself to statistical analysis for investigating whether it fits the data and for forecasting future movements.

To start with, these approaches will have difficulty with the situational nature of intelligence because the exact situation is difficult to define and codify. Another perennial challenge for statistical approaches is “overfitting,” which is typically defined as using more parameters than justified for fitting some empirical relationship. The concern is that overfitting will make a statistical model account for irrelevant aspects of the data and then lead to inaccurate predictions and conclusions. Statisticians have devised many methods to prevent overfitting—for example, developing algorithms on a different sample than the one in which they are deployed. Nevertheless, overfitting remains a thorn in the side of statistical approaches because it is fundamentally linked to the shortcomings of the current approach to AI: lack of a theory of the phenomena being modeled.
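
A minimal illustration of the holdout idea mentioned at the end of the passage: an over-parameterised fit matches its own sample closely but tends to do worse on data it has not seen. The data below are toy numbers invented for the example.

import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(-1, 1, 40)
y = 2 * x + rng.normal(0, 0.3, 40)   # the true relationship is a simple line

x_train, y_train = x[:20], y[:20]    # sample used to develop the model
x_test, y_test = x[20:], y[20:]      # sample standing in for deployment

for degree in (1, 9):                # a line versus an over-parameterised polynomial
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")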


pages: 183 words: 17,571

Broken Markets: A User's Guide to the Post-Finance Economy by Kevin Mellyn

Alan Greenspan, banking crisis, banks create money, Basel III, Bear Stearns, Bernie Madoff, Big bang: deregulation of the City of London, bond market vigilante , Bonfire of the Vanities, bonus culture, Bretton Woods, BRICs, British Empire, business cycle, buy and hold, call centre, Carmen Reinhart, central bank independence, centre right, cloud computing, collapse of Lehman Brothers, collateralized debt obligation, compensation consultant, corporate governance, corporate raider, creative destruction, credit crunch, crony capitalism, currency manipulation / currency intervention, currency risk, disintermediation, eurozone crisis, fiat currency, financial innovation, financial repression, floating exchange rates, Fractional reserve banking, Glass-Steagall Act, global reserve currency, global supply chain, Home mortgage interest deduction, index fund, information asymmetry, joint-stock company, Joseph Schumpeter, junk bonds, labor-force participation, light touch regulation, liquidity trap, London Interbank Offered Rate, low interest rates, market bubble, market clearing, Martin Wolf, means of production, Michael Milken, mobile money, Money creation, money market fund, moral hazard, mortgage debt, mortgage tax deduction, negative equity, Nixon triggered the end of the Bretton Woods system, Paul Volcker talking about ATMs, Ponzi scheme, profit motive, proprietary trading, prudent man rule, quantitative easing, Real Time Gross Settlement, regulatory arbitrage, reserve currency, rising living standards, Ronald Coase, Savings and loan crisis, seigniorage, shareholder value, Silicon Valley, SoftBank, Solyndra, statistical model, Steve Jobs, The Great Moderation, the payments system, Tobin tax, too big to fail, transaction costs, underbanked, Works Progress Administration, yield curve, Yogi Berra, zero-sum game

Moreover, distributing risk to large numbers of sophisticated institutions seemed safer than leaving it concentrated on the books of individual banks. Besides, even the Basel-process experts had become convinced that bank risk management had reached a new level of effectiveness through the use of sophisticated statistical models, and the Basel II rules that superseded Basel I especially allowed the largest and most sophisticated banks to use approved models to set their capital requirements. The fly in the ointment of market-centric finance was that it allowed an almost infinite expansion of credit in the economy, but creditworthy risks are by definition finite.

It is critical to understand that a credit score is only a measure of whether a consumer can service a certain amount of credit—that is, make timely interest and principal payments. It is not concerned with the ability to pay off debts over time. What it really measures is the probability that an individual will default. This is a statistical model–based determination, and as such is hostage to historical experience of the behavior of tens of millions of individuals. The factors that over time have proved most predictive include not only behavior—late or missed payments on any bill, not just a loan, signals potential default—but also circumstances.
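
The kind of default-probability model described can be sketched as a logistic regression on payment-behaviour features. Everything below, from feature names to coefficients, is invented for illustration; real scorecards are built on far richer histories.

import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)
n = 5000
borrowers = pd.DataFrame({
    "late_payments_12m": rng.poisson(0.5, n),
    "utilisation": rng.uniform(0, 1, n),      # share of available credit in use
    "accounts_open": rng.integers(1, 12, n),
    "years_of_history": rng.uniform(0, 25, n),
})
# Synthetic default outcomes driven mostly by behaviour
logit = (-3 + 1.2 * borrowers["late_payments_12m"] + 2.0 * borrowers["utilisation"]
         - 0.05 * borrowers["years_of_history"])
borrowers["defaulted"] = rng.binomial(1, 1 / (1 + np.exp(-logit)))

model = LogisticRegression(max_iter=1000).fit(
    borrowers.drop(columns="defaulted"), borrowers["defaulted"]
)
# Estimated default probability for a new applicant (hypothetical values)
applicant = pd.DataFrame([{"late_payments_12m": 1, "utilisation": 0.8,
                           "accounts_open": 4, "years_of_history": 3.0}])
print(model.predict_proba(applicant)[0, 1])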


pages: 238 words: 77,730

Final Jeopardy: Man vs. Machine and the Quest to Know Everything by Stephen Baker

23andMe, AI winter, Albert Einstein, artificial general intelligence, behavioural economics, business process, call centre, clean water, commoditize, computer age, Demis Hassabis, Frank Gehry, information retrieval, Iridium satellite, Isaac Newton, job automation, machine translation, pattern recognition, Ray Kurzweil, Silicon Valley, Silicon Valley startup, statistical model, The Soul of a New Machine, theory of mind, thinkpad, Turing test, Vernor Vinge, vertical integration, Wall-E, Watson beat the top human players on Jeopardy!

The Google team had fed millions of translated documents, many of them from the United Nations, into their computers and supplemented them with a multitude of natural-language text culled from the Web. This training set dwarfed their competitors’. Without knowing what the words meant, their computers had learned to associate certain strings of words in Arabic and Chinese with their English equivalents. Since they had so very many examples to learn from, these statistical models caught nuances that had long confounded machines. Using statistics, Google’s computers won hands down. “Just like that, they bypassed thirty years of work on machine translation,” said Ed Lazowska, the chairman of the computer science department at the University of Washington. The statisticians trounced the experts.

The human players were more complicated. Tesauro had to pull together statistics on the thousands of humans who had played Jeopardy: how often they buzzed in, their precision in different levels of clues, their betting patterns for Daily Doubles and Final Jeopardy. From these, the IBM team pieced together statistical models of two humans. Then they put them into action against the model of Watson. The games had none of the life or drama of Jeopardy—no suspense, no jokes, no jingle while the digital players came up with their Final Jeopardy responses. They were only simulations of the scoring dynamics of Jeopardy.


The Smartphone Society by Nicole Aschoff

"Susan Fowler" uber, 4chan, A Declaration of the Independence of Cyberspace, Airbnb, algorithmic bias, algorithmic management, Amazon Web Services, artificial general intelligence, autonomous vehicles, barriers to entry, Bay Area Rapid Transit, Bernie Sanders, Big Tech, Black Lives Matter, blockchain, carbon footprint, Carl Icahn, Cass Sunstein, citizen journalism, cloud computing, correlation does not imply causation, crony capitalism, crowdsourcing, cryptocurrency, data science, deep learning, DeepMind, degrowth, Demis Hassabis, deplatforming, deskilling, digital capitalism, digital divide, do what you love, don't be evil, Donald Trump, Downton Abbey, Edward Snowden, Elon Musk, Evgeny Morozov, fake news, feminist movement, Ferguson, Missouri, Filter Bubble, financial independence, future of work, gamification, gig economy, global value chain, Google Chrome, Google Earth, Googley, green new deal, housing crisis, income inequality, independent contractor, Jaron Lanier, Jeff Bezos, Jessica Bruder, job automation, John Perry Barlow, knowledge economy, late capitalism, low interest rates, Lyft, M-Pesa, Mark Zuckerberg, minimum wage unemployment, mobile money, moral panic, move fast and break things, Naomi Klein, Network effects, new economy, Nicholas Carr, Nomadland, occupational segregation, Occupy movement, off-the-grid, offshore financial centre, opioid epidemic / opioid crisis, PageRank, Patri Friedman, peer-to-peer, Peter Thiel, pets.com, planned obsolescence, quantitative easing, Ralph Waldo Emerson, RAND corporation, Ray Kurzweil, RFID, Richard Stallman, ride hailing / ride sharing, Rodney Brooks, Ronald Reagan, Salesforce, Second Machine Age, self-driving car, shareholder value, sharing economy, Sheryl Sandberg, Shoshana Zuboff, Sidewalk Labs, Silicon Valley, single-payer health, Skype, Snapchat, SoftBank, statistical model, Steve Bannon, Steve Jobs, surveillance capitalism, TaskRabbit, tech worker, technological determinism, TED Talk, the scientific method, The Structural Transformation of the Public Sphere, TikTok, transcontinental railway, transportation-network company, Travis Kalanick, Uber and Lyft, Uber for X, uber lyft, upwardly mobile, Vision Fund, W. E. B. Du Bois, wages for housework, warehouse robotics, WikiLeaks, women in the workforce, yottabyte

Federici, Caliban and the Witch, 97 50. James, Sex, Race, and Class, 45. 51. James, Sex, Race, and Class, 50. 52. Machine learning (often conflated with artificial intelligence) refers to a field of study whose goal is to learn statistical models directly from large datasets. Machine learning also refers to the software and algorithms that implement these statistical models and make predictions on new data. 53. Mayer-Schönberger and Cukier, Big Data, 93. 54. Levine, Surveillance Valley, 153. 55. Facebook’s financials can be found in its 2018 10-K filing for the Securities and Exchange Commission: https://www.sec.gov/Archives/edgar/data/1326801/000132680119000009/fb-12312018x10k.htm. 56.


pages: 305 words: 75,697

Cogs and Monsters: What Economics Is, and What It Should Be by Diane Coyle

3D printing, additive manufacturing, Airbnb, Al Roth, Alan Greenspan, algorithmic management, Amazon Web Services, autonomous vehicles, banking crisis, barriers to entry, behavioural economics, Big bang: deregulation of the City of London, biodiversity loss, bitcoin, Black Lives Matter, Boston Dynamics, Bretton Woods, Brexit referendum, business cycle, call centre, Carmen Reinhart, central bank independence, choice architecture, Chuck Templeton: OpenTable:, cloud computing, complexity theory, computer age, conceptual framework, congestion charging, constrained optimization, coronavirus, COVID-19, creative destruction, credit crunch, data science, DeepMind, deglobalization, deindustrialization, Diane Coyle, discounted cash flows, disintermediation, Donald Trump, Edward Glaeser, en.wikipedia.org, endogenous growth, endowment effect, Erik Brynjolfsson, eurozone crisis, everywhere but in the productivity statistics, Evgeny Morozov, experimental subject, financial deregulation, financial innovation, financial intermediation, Flash crash, framing effect, general purpose technology, George Akerlof, global supply chain, Goodhart's law, Google bus, haute cuisine, High speed trading, hockey-stick growth, Ida Tarbell, information asymmetry, intangible asset, Internet of things, invisible hand, Jaron Lanier, Jean Tirole, job automation, Joseph Schumpeter, Kenneth Arrow, Kenneth Rogoff, knowledge economy, knowledge worker, Les Trente Glorieuses, libertarian paternalism, linear programming, lockdown, Long Term Capital Management, loss aversion, low earth orbit, lump of labour, machine readable, market bubble, market design, Menlo Park, millennium bug, Modern Monetary Theory, Mont Pelerin Society, multi-sided market, Myron Scholes, Nash equilibrium, Nate Silver, Network effects, Occupy movement, Pareto efficiency, payday loans, payment for order flow, Phillips curve, post-industrial society, price mechanism, Productivity paradox, quantitative easing, randomized controlled trial, rent control, rent-seeking, ride hailing / ride sharing, road to serfdom, Robert Gordon, Robert Shiller, Robert Solow, Robinhood: mobile stock trading app, Ronald Coase, Ronald Reagan, San Francisco homelessness, savings glut, school vouchers, sharing economy, Silicon Valley, software is eating the world, spectrum auction, statistical model, Steven Pinker, tacit knowledge, The Chicago School, The Future of Employment, The Great Moderation, the map is not the territory, The Rise and Fall of American Growth, the scientific method, The Signal and the Noise by Nate Silver, the strength of weak ties, The Wealth of Nations by Adam Smith, total factor productivity, transaction costs, Uber for X, urban planning, winner-take-all economy, Winter of Discontent, women in the workforce, Y2K

Nate Silver writes in his bestseller The Signal and the Noise: The government produces data on literally 45,000 economic indicators each year. Private data providers track as many as four million statistics. The temptation that some economists succumb to is to put all this data into a blender and claim that the resulting gruel is haute cuisine. If you have a statistical model that seeks to explain eleven outputs but has to choose from among four million inputs to do so, many of the relationships it identifies are going to be spurious (Silver 2012). Econometricians, those economists specialising in applied statistics, know well the risk of over-fitting of economic models, the temptation to prefer inaccurate precision to the accurate imprecision that would more properly characterise noisy data.
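
The point about spurious relationships is easy to simulate: screen enough unrelated inputs against a single output and some will look strongly related purely by chance. A small sketch with made-up noise series:

import numpy as np

rng = np.random.default_rng(5)
n_obs, n_inputs = 60, 10_000   # e.g. sixty quarterly observations

output = rng.normal(size=n_obs)               # target series (pure noise)
inputs = rng.normal(size=(n_inputs, n_obs))   # candidate predictors (pure noise)

corrs = np.array([np.corrcoef(output, row)[0, 1] for row in inputs])
print("largest absolute correlation found:", np.abs(corrs).max())
print("inputs with |r| > 0.4:", int((np.abs(corrs) > 0.4).sum()))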

., 1988, The Free Economy and the Strong State: The Politics of Thatcherism, London, New York: Macmillan. Gawer, A., M. Cusumano, and D. B. Yoffie, 2019, The Business of Platforms: Strategy in the Age of Digital Competition, Innovation, and Power, New York: Harper Business, 2019. Gelman, A., 2013, ‘The Recursion of Pop-Econ’, Statistical Modeling, Causal Inference, and Social Science, 10 May, https://statmodeling.stat.columbia.edu/2013/05/10/the-recursion-of-pop-econ-or-of-trolling/. Gerlach, P., 2017, ‘The Games Economists Play: Why Economics Students Behave More Selfishly than Other Students’, PloS ONE, 12 (9), e0183814, https://doi.org/10.1371/journal.pone.0183814.


pages: 752 words: 131,533

Python for Data Analysis by Wes McKinney

Alignment Problem, backtesting, Bear Stearns, cognitive dissonance, crowdsourcing, data science, Debian, duck typing, Firefox, functional programming, Google Chrome, Guido van Rossum, index card, machine readable, random walk, recommendation engine, revision control, sentiment analysis, Sharpe ratio, side project, sorting algorithm, statistical model, type inference

Preparation: Cleaning, munging, combining, normalizing, reshaping, slicing and dicing, and transforming data for analysis.
Transformation: Applying mathematical and statistical operations to groups of data sets to derive new data sets. For example, aggregating a large table by group variables.
Modeling and computation: Connecting your data to statistical models, machine learning algorithms, or other computational tools.
Presentation: Creating interactive or static graphical visualizations or textual summaries.
In this chapter I will show you a few data sets and some things we can do with them. These examples are just intended to pique your interest and thus will only be explained at a high level.

To create a Panel, you can use a dict of DataFrame objects or a three-dimensional ndarray: import pandas.io.data as web pdata = pd.Panel(dict((stk, web.get_data_yahoo(stk, '1/1/2009', '6/1/2012')) for stk in ['AAPL', 'GOOG', 'MSFT', 'DELL'])) Each item (the analogue of columns in a DataFrame) in the Panel is a DataFrame: In [297]: pdata Out[297]: <class 'pandas.core.panel.Panel'> Dimensions: 4 (items) x 861 (major) x 6 (minor) Items: AAPL to MSFT Major axis: 2009-01-02 00:00:00 to 2012-06-01 00:00:00 Minor axis: Open to Adj Close In [298]: pdata = pdata.swapaxes('items', 'minor') In [299]: pdata['Adj Close'] Out[299]: <class 'pandas.core.frame.DataFrame'> DatetimeIndex: 861 entries, 2009-01-02 00:00:00 to 2012-06-01 00:00:00 Data columns: AAPL 861 non-null values DELL 861 non-null values GOOG 861 non-null values MSFT 861 non-null values dtypes: float64(4) ix-based label indexing generalizes to three dimensions, so we can select all data at a particular date or a range of dates like so: In [300]: pdata.ix[:, '6/1/2012', :] Out[300]: Open High Low Close Volume Adj Close AAPL 569.16 572.65 560.52 560.99 18606700 560.99 DELL 12.15 12.30 12.05 12.07 19396700 12.07 GOOG 571.79 572.65 568.35 570.98 3057900 570.98 MSFT 28.76 28.96 28.44 28.45 56634300 28.45 In [301]: pdata.ix['Adj Close', '5/22/2012':, :] Out[301]: AAPL DELL GOOG MSFT Date 2012-05-22 556.97 15.08 600.80 29.76 2012-05-23 570.56 12.49 609.46 29.11 2012-05-24 565.32 12.45 603.66 29.07 2012-05-25 562.29 12.46 591.53 29.06 2012-05-29 572.27 12.66 594.34 29.56 2012-05-30 579.17 12.56 588.23 29.34 2012-05-31 577.73 12.33 580.86 29.19 2012-06-01 560.99 12.07 570.98 28.45 An alternate way to represent panel data, especially for fitting statistical models, is in “stacked” DataFrame form: In [302]: stacked = pdata.ix[:, '5/30/2012':, :].to_frame() In [303]: stacked Out[303]: Open High Low Close Volume Adj Close major minor 2012-05-30 AAPL 569.20 579.99 566.56 579.17 18908200 579.17 DELL 12.59 12.70 12.46 12.56 19787800 12.56 GOOG 588.16 591.90 583.53 588.23 1906700 588.23 MSFT 29.35 29.48 29.12 29.34 41585500 29.34 2012-05-31 AAPL 580.74 581.50 571.46 577.73 17559800 577.73 DELL 12.53 12.54 12.33 12.33 19955500 12.33 GOOG 588.72 590.00 579.00 580.86 2968300 580.86 MSFT 29.30 29.42 28.94 29.19 39134000 29.19 2012-06-01 AAPL 569.16 572.65 560.52 560.99 18606700 560.99 DELL 12.15 12.30 12.05 12.07 19396700 12.07 GOOG 571.79 572.65 568.35 570.98 3057900 570.98 MSFT 28.76 28.96 28.44 28.45 56634300 28.45 DataFrame has a related to_panel method, the inverse of to_frame: In [304]: stacked.to_panel() Out[304]: <class 'pandas.core.panel.Panel'> Dimensions: 6 (items) x 3 (major) x 4 (minor) Items: Open to Adj Close Major axis: 2012-05-30 00:00:00 to 2012-06-01 00:00:00 Minor axis: AAPL to MSFT Chapter 6.

There are much more efficient sampling-without-replacement algorithms, but this is an easy strategy that uses readily available tools: In [183]: df.take(np.random.permutation(len(df))[:3]) Out[183]: 0 1 2 3 1 4 5 6 7 3 12 13 14 15 4 16 17 18 19 To generate a sample with replacement, the fastest way is to use np.random.randint to draw random integers: In [184]: bag = np.array([5, 7, -1, 6, 4]) In [185]: sampler = np.random.randint(0, len(bag), size=10) In [186]: sampler Out[186]: array([4, 4, 2, 2, 2, 0, 3, 0, 4, 1]) In [187]: draws = bag.take(sampler) In [188]: draws Out[188]: array([ 4, 4, -1, -1, -1, 5, 6, 5, 4, 7]) Computing Indicator/Dummy Variables Another type of transformation for statistical modeling or machine learning applications is converting a categorical variable into a “dummy” or “indicator” matrix. If a column in a DataFrame has k distinct values, you would derive a matrix or DataFrame containing k columns containing all 1’s and 0’s. pandas has a get_dummies function for doing this, though devising one yourself is not difficult.
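
For reference, pandas' get_dummies performs exactly the transformation described; a minimal example on a small categorical column:

import pandas as pd

df = pd.DataFrame({"key": ["b", "b", "a", "c", "a", "b"], "value": range(6)})
dummies = pd.get_dummies(df["key"], prefix="key")   # one indicator column per distinct value
print(df[["value"]].join(dummies))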


pages: 58 words: 18,747

The Rent Is Too Damn High: What to Do About It, and Why It Matters More Than You Think by Matthew Yglesias

Edward Glaeser, falling living standards, gentrification, Home mortgage interest deduction, income inequality, industrial robot, Jane Jacobs, land reform, mortgage tax deduction, New Urbanism, pets.com, rent control, rent-seeking, restrictive zoning, Robert Gordon, Robert Shiller, San Francisco homelessness, Saturday Night Live, Silicon Valley, statistical model, transcontinental railway, transit-oriented development, urban sprawl, white picket fence

That said, though automobiles are unquestionably a useful technology, they’re not teleportation devices and they haven’t abolished distance. Location still matters, and some land is more valuable than other land. Since land and structures are normally sold in a bundle, it’s difficult in many cases to get precise numbers on land prices as such. But researchers at the Federal Reserve Bank of New York used a statistical model based on prices paid for vacant lots and for structures that were torn down to be replaced by brand-new buildings and found that the price of land in the metro area is closely linked to its distance from the Empire State Building. [Chart 1: Land prices and distance of property from the Empire State Building; natural logarithm of land price per square foot plotted against distance in kilometers.] In general, the expensive land should be much more densely built upon than the cheap land.
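
The relationship summarised in the chart can be sketched as a log-linear fit of price on distance. The numbers below are invented for illustration and are not the New York Fed's data.

import numpy as np

rng = np.random.default_rng(6)
distance_km = rng.uniform(0, 40, 200)                           # distance from the landmark
log_price = 6.0 - 0.08 * distance_km + rng.normal(0, 0.4, 200)  # log price per square foot

slope, intercept = np.polyfit(distance_km, log_price, 1)
print(f"each extra kilometre changes log price by about {slope:.3f}")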


Statistics in a Nutshell by Sarah Boslaugh

Antoine Gombaud: Chevalier de Méré, Bayesian statistics, business climate, computer age, confounding variable, correlation coefficient, experimental subject, Florence Nightingale: pie chart, income per capita, iterative process, job satisfaction, labor-force participation, linear programming, longitudinal study, meta-analysis, p-value, pattern recognition, placebo effect, probability theory / Blaise Pascal / Pierre de Fermat, publication bias, purchasing power parity, randomized controlled trial, selection bias, six sigma, sparse data, statistical model, systematic bias, The Design of Experiments, the scientific method, Thomas Bayes, Two Sigma, Vilfredo Pareto

For instance, in the field of study and salary example, by using age as a continuous covariate, you are examining what the relationship between those two factors would be if all the subjects in your study were the same age. Another typical use of ANCOVA is to reduce the residual or error variance in a design. We know that one goal of statistical modeling is to explain variance in a data set and that we generally prefer models that can explain more variance, and have lower residual variance, than models that explain less. If we can reduce the residual variance by including one or more continuous covariates in our design, it might be easier to see the relationships between the factors of interest and the dependent variable.
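
A minimal ANCOVA sketch for the field-of-study and salary example, with age as the continuous covariate. The data are simulated, and the layout follows the standard statsmodels formula interface rather than any example from the book.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(7)
n = 300
fields = rng.choice(["arts", "science", "business"], size=n)
age = rng.uniform(25, 60, n)
field_effect = pd.Series(fields).map({"arts": 0, "science": 8, "business": 12})
salary = 30 + field_effect.to_numpy() + 0.6 * age + rng.normal(0, 5, n)  # in thousands

data = pd.DataFrame({"field": fields, "age": age, "salary": salary})
ancova = smf.ols("salary ~ C(field) + age", data=data).fit()
print(anova_lm(ancova, typ=2))   # the factor's effect, adjusted for the covariate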

For example, in the mid-1970s, models focused on variables derived from atmospheric conditions, whereas in the near future, models will be available that are based on atmospheric data combined with land surface, ocean and sea ice, sulphate and nonsulphate aerosol, carbon cycle, dynamic vegetation, and atmospheric chemistry data. By combining these additional sources of variation into a large-scale statistical model, predictions of weather activity of qualitatively different types have been made possible at different spatial and temporal scales. In this chapter, we will be working with multiple regression on a much smaller scale. This is not unrealistic from a real-world point of view; in fact, useful regression models may be built using a relatively small number of predictor variables (say, from 2 to 10), although the people building the model might consider far more predictors for inclusion before selecting those to keep in the final model.

Perhaps wine drinkers eat better diets than people who don’t drink at all, or perhaps they are able to drink wine because they are in better health. (Treatment for certain illnesses precludes alcohol consumption, for instance.) To try to eliminate these alternative explanations, researchers often collect data on a variety of factors other than the factor of primary interest and include the extra factors in the statistical model. Such variables, which are neither the outcome nor the main predictors of interest, are called control variables because they are included in the equation to control for their effect on the outcome. Variables such as age, gender, socioeconomic status, and race/ethnicity are often included in medical and social science studies, although they are not the variables of interest, because the researcher wants to know the effect of the main predictor variables on the outcome after the effects of these control variables have been accounted for.


pages: 360 words: 85,321

The Perfect Bet: How Science and Math Are Taking the Luck Out of Gambling by Adam Kucharski

Ada Lovelace, Albert Einstein, Antoine Gombaud: Chevalier de Méré, beat the dealer, behavioural economics, Benoit Mandelbrot, Bletchley Park, butterfly effect, call centre, Chance favours the prepared mind, Claude Shannon: information theory, collateralized debt obligation, Computing Machinery and Intelligence, correlation does not imply causation, diversification, Edward Lorenz: Chaos theory, Edward Thorp, Everything should be made as simple as possible, Flash crash, Gerolamo Cardano, Henri Poincaré, Hibernia Atlantic: Project Express, if you build it, they will come, invention of the telegraph, Isaac Newton, Johannes Kepler, John Nash: game theory, John von Neumann, locking in a profit, Louis Pasteur, Nash equilibrium, Norbert Wiener, p-value, performance metric, Pierre-Simon Laplace, probability theory / Blaise Pascal / Pierre de Fermat, quantitative trading / quantitative finance, random walk, Richard Feynman, Ronald Reagan, Rubik’s Cube, statistical model, The Design of Experiments, Watson beat the top human players on Jeopardy!, zero-sum game

The scales can tip one way or the other: whichever produces the combined prediction that lines up best with actual results. Strike the right balance, and good predictions can become profitable ones. WHEN WOODS AND BENTER arrived in Hong Kong, they did not meet with immediate success. While Benter spent the first year putting together the statistical model, Woods tried to make money exploiting the long-shot-favorite bias. They had come to Asia with a bankroll of $150,000; within two years, they’d lost it all. It didn’t help that investors weren’t interested in their strategy. “People had so little faith in the system that they would not have invested for 100 percent of the profits,” Woods later said.
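
One common way to blend a model's win probabilities with the probabilities implied by the betting market is to weight them on the log scale and renormalise, with the weights chosen on past results. The sketch below uses made-up numbers and is not Benter's actual formulation.

import numpy as np

model_probs = np.array([0.30, 0.25, 0.20, 0.15, 0.10])    # from the statistical model
public_probs = np.array([0.40, 0.20, 0.15, 0.15, 0.10])   # implied by the betting odds

alpha, beta = 0.7, 0.3                # hypothetical weights fitted to historical races
blend = model_probs**alpha * public_probs**beta
blend /= blend.sum()                  # renormalise so the probabilities sum to one
print(blend.round(3))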

Some of which provide clear hints about the future, while others just muddy the predictions. To pin down which factors are useful, syndicates need to collect reliable, repeated observations about races. Hong Kong was the closest Bill Benter could find to a laboratory setup, with the same horses racing on a regular basis on the same tracks in similar conditions. Using his statistical model, Benter identified factors that could lead to successful race predictions. He found that some came out as more important than others. In Benter’s early analysis, for example, the model said the number of races a horse had previously run was a crucial factor when making predictions. In fact, it was more important than almost any other factor.


pages: 309 words: 86,909

The Spirit Level: Why Greater Equality Makes Societies Stronger by Richard Wilkinson, Kate Pickett

"Hurricane Katrina" Superdome, basic income, Berlin Wall, classic study, clean water, Diane Coyle, epigenetics, experimental economics, experimental subject, Fall of the Berlin Wall, full employment, germ theory of disease, Gini coefficient, God and Mammon, impulse control, income inequality, Intergovernmental Panel on Climate Change (IPCC), knowledge economy, labor-force participation, land reform, longitudinal study, Louis Pasteur, meta-analysis, Milgram experiment, mirror neurons, moral panic, Murray Bookchin, offshore financial centre, phenotype, plutocrats, profit maximization, profit motive, Ralph Waldo Emerson, statistical model, The Chicago School, The Spirit Level, The Wealth of Nations by Adam Smith, Thorstein Veblen, ultimatum game, upwardly mobile, World Values Survey, zero-sum game

One factor is the strength of the relationship, which is shown by the steepness of the lines in Figures 4.1 and 4.2. People in Sweden are much more likely to trust each other than people in Portugal. Any alternative explanation would need to be just as strong, and in our own statistical models we find that neither poverty nor average standards of living can explain our findings. We also see a consistent association among both the United States and the developed countries. Earlier we described how Uslaner and Rothstein used a statistical model to show the ordering of inequality and trust: inequality affects trust, not the other way round. The relationships between inequality and women’s status and between inequality and foreign aid also add coherence and plausibility to our belief that inequality increases the social distance between different groups of people, making us less willing to see them as ‘us’ rather than ‘them’.


pages: 304 words: 80,965

What They Do With Your Money: How the Financial System Fails Us, and How to Fix It by Stephen Davis, Jon Lukomnik, David Pitt-Watson

activist fund / activist shareholder / activist investor, Admiral Zheng, banking crisis, Basel III, Bear Stearns, behavioural economics, Bernie Madoff, Black Swan, buy and hold, Carl Icahn, centralized clearinghouse, clean water, compensation consultant, computerized trading, corporate governance, correlation does not imply causation, credit crunch, Credit Default Swap, crowdsourcing, David Brooks, Dissolution of the Soviet Union, diversification, diversified portfolio, en.wikipedia.org, financial engineering, financial innovation, financial intermediation, fixed income, Flash crash, Glass-Steagall Act, income inequality, index fund, information asymmetry, invisible hand, John Bogle, Kenneth Arrow, Kickstarter, light touch regulation, London Whale, Long Term Capital Management, moral hazard, Myron Scholes, Northern Rock, passive investing, Paul Volcker talking about ATMs, payment for order flow, performance metric, Ponzi scheme, post-work, principal–agent problem, rent-seeking, Ronald Coase, seminal paper, shareholder value, Silicon Valley, South Sea Bubble, sovereign wealth fund, statistical model, Steve Jobs, the market place, The Wealth of Nations by Adam Smith, transaction costs, Upton Sinclair, value at risk, WikiLeaks

That is why the day you get married is so memorable. In fact, the elements of that day are not likely to be present in the sample of any of the previous 3,652 days.28 So how could the computer possibly calculate the likelihood of their recurring tomorrow, or next week? Similarly, in the financial world, if you feed a statistical model data that have come from a period where there has been no banking crisis, the model will predict that it is very unlikely you will have a banking crisis. When statisticians worked out that a financial crisis of the sort we witnessed in 2008 would occur once in billions of years, their judgment was based on years of data when there had not been such a crisis.29 It compounds the problem that people tend to simplify the outcome of risk models.

The compass that bankers and regulators were using worked well according to its own logic, but it was pointing in the wrong direction, and they steered the ship onto the rocks. History does not record whether the Queen was satisfied with the academics’ response. She might, however, have noted that this economic-statistical model had been found wanting before—in 1998, when the collapse of the hedge fund Long-Term Capital Management nearly took the financial system down with it. Ironically, its directors included the two people who had shared the Nobel Prize in Economics the previous year.20 The Queen might also have noted the glittering lineup of senior economists who, over the last century, have warned against excessive confidence in predictions made using models.


Data Wrangling With Python: Tips and Tools to Make Your Life Easier by Jacqueline Kazil

Amazon Web Services, bash_history, business logic, cloud computing, correlation coefficient, crowdsourcing, data acquisition, data science, database schema, Debian, en.wikipedia.org, Fairphone, Firefox, Global Witness, Google Chrome, Hacker News, job automation, machine readable, Nate Silver, natural language processing, pull request, Ronald Reagan, Ruby on Rails, selection bias, social web, statistical model, web application, WikiLeaks

Exception handling: Enables you to anticipate and manage Python exceptions with code. It’s always better to be specific and explicit, so you don’t disguise bugs with overly general exception catches.
numpy corrcoef: Uses statistical models like Pearson’s correlation to determine whether two parts of a dataset are related.
agate mad_outliers and stdev_outliers: Use statistical models and tools like standard deviations or mean average deviations to determine whether your dataset has specific outliers or data that “doesn’t fit.”
agate group_by and aggregate: Group your dataset on a particular attribute and run aggregation analysis to see if there are notable differences (or similarities) across groupings.
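
As a quick illustration of the corrcoef check in the list above, with made-up numbers standing in for two columns of a dataset:

import numpy as np

child_labor_rate = np.array([2.0, 5.5, 9.1, 12.3, 20.4, 25.8])
school_dropout_rate = np.array([1.1, 3.9, 8.0, 10.5, 16.7, 22.9])

r = np.corrcoef(child_labor_rate, school_dropout_rate)[0, 1]   # Pearson's r
print(f"Pearson correlation: {r:.2f}")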

This interactive displays different scenarios The Guardian staff researched and coded. Not every simulation turns out with the same outcome, allowing users to understand there is an element of chance, while still showing probability (i.e., less chance of infection with higher vaccination rates). This takes a highly politicized topic and brings out real-world scenarios using statistical models of outbreaks. Although interactives take more experience to build and often require a deeper coding skillset, they are a great tool, especially if you have frontend coding experience. As an example, for our child labor data we could build an interactive showing how many people in your local high school would have never graduated due to child labor rates if they lived in Chad.


pages: 88 words: 25,047

The Mathematics of Love: Patterns, Proofs, and the Search for the Ultimate Equation by Hannah Fry

Brownian motion, John Nash: game theory, linear programming, Nash equilibrium, Pareto efficiency, power law, recommendation engine, Skype, stable marriage problem, statistical model, TED Talk

Simulating Social Phenomena, edited by Rosaria Conte, Rainer Hegselmann, Pietro Terna, 419–36. Berlin: Springer Berlin Heidelberg, 1997. CHAPTER 8: HOW TO OPTIMIZE YOUR WEDDING Bellows, Meghan L. and J. D. Luc Peterson. ‘Finding an Optimal Seating Chart.’ Annals of Improbable Research, 2012. Alexander, R. A Statistically Modelled Wedding. (2014): http://www.bbc.co.uk/news/magazine-25980076. CHAPTER 9: HOW TO LIVE HAPPILY EVER AFTER Gottman, John M., James D. Murray, Catherine C. Swanson, Rebecca Tyson and Kristin R. Swanson. The Mathematics of Marriage: Dynamic Nonlinear Models. Cambridge, MA.: Basic Books, 2005.


pages: 346 words: 92,984

The Lucky Years: How to Thrive in the Brave New World of Health by David B. Agus

"World Economic Forum" Davos, active transport: walking or cycling, Affordable Care Act / Obamacare, Albert Einstein, Apollo 11, autism spectrum disorder, butterfly effect, clean water, cognitive dissonance, CRISPR, crowdsourcing, Danny Hillis, Drosophila, Edward Jenner, Edward Lorenz: Chaos theory, en.wikipedia.org, epigenetics, fake news, Kickstarter, Larry Ellison, longitudinal study, Marc Benioff, medical residency, meta-analysis, microbiome, microcredit, mouse model, Murray Gell-Mann, Neil Armstrong, New Journalism, nocebo, parabiotic, pattern recognition, personalized medicine, phenotype, placebo effect, publish or perish, randomized controlled trial, risk tolerance, Salesforce, statistical model, stem cell, Steve Jobs, Thomas Malthus, wikimedia commons

Tomasetti and Vogelstein were accused of focusing on rare cancers while leaving out several common cancers that indeed are largely preventable. The International Agency for Research on Cancer, the cancer arm of the World Health Organization, published a press release stating it “strongly disagrees” with the report. To arrive at their conclusion, Tomasetti and Vogelstein used a statistical model they developed based on known rates of cell division in thirty-one types of tissue. Stem cells were their main focal point. As a reminder, these are the small, specialized “mothership” cells in each organ or tissue that divide to replace cells that die or wear out. Only in recent years have researchers been able to conduct these kinds of studies due to advances in the understanding of stem-cell biology.

., “Baseline Selenium Status and Effects of Selenium and Vitamin E Supplementation on Prostate Cancer Risk,” Journal of the National Cancer Institute 106, no. 3 (March 2014): djt456, doi:10.1093/jnci/djt456, Epub February 22, 2014. 12. Johns Hopkins Medicine, “Bad Luck of Random Mutations Plays Predominant Role in Cancer, Study Shows—Statistical Modeling Links Cancer Risk with Number of Stem Cell Divisions,” news release, January 1, 2015, www.hopkinsmedicine.org/news/media/releases/bad_luck_of_random_mutations_plays_predominant_role_in_cancer_study_shows. 13. C. Tomasetti and B. Vogelstein, “Cancer Etiology. Variation in Cancer Risk Among Tissues Can Be Explained by the Number of Stem Cell Divisions,” Science 347, no. 6217 (January 2, 2015): 78–81, doi:10.1126/science.1260825. 14.


Data Mining the Web: Uncovering Patterns in Web Content, Structure, and Usage by Zdravko Markov, Daniel T. Larose

Firefox, information retrieval, Internet Archive, iterative process, natural language processing, pattern recognition, random walk, recommendation engine, semantic web, sparse data, speech recognition, statistical model, William of Occam

WHY THE BOOK IS NEEDED
The book provides the reader with:
- The models and techniques to uncover hidden nuggets of information in Web-based data
- Insight into how web mining algorithms really work
- The experience of actually performing web mining on real-world data sets
“WHITE-BOX” APPROACH: UNDERSTANDING THE UNDERLYING ALGORITHMIC AND MODEL STRUCTURES
The best way to avoid costly errors stemming from a blind black-box approach to data mining is to apply, instead, a white-box methodology, which emphasizes an understanding of the algorithmic and statistical model structures underlying the software. The book applies this white-box approach by:
- Walking the reader through various algorithms
- Providing examples of the operation of web mining algorithms on actual large data sets
- Testing the reader’s level of understanding of the concepts and algorithms
- Providing an opportunity for the reader to do some real web mining on large Web-based data sets
Algorithm Walk-Throughs
The book walks the reader through the operations and nuances of various algorithms, using small sample data sets, so that the reader gets a true appreciation of what is really going on inside an algorithm.

CHAPTER 4: EVALUATING CLUSTERING (Approaches to Evaluating Clustering; Similarity-Based Criterion Functions; Probabilistic Criterion Functions; MDL-Based Model and Feature Evaluation; Classes-to-Clusters Evaluation; Precision, Recall, and F-Measure; Entropy)
APPROACHES TO EVALUATING CLUSTERING
Clustering algorithms group documents by similarity or create statistical models based solely on the document representation, which in turn reflects document content. Then the criterion functions evaluate these models objectively (i.e., using only the document content). In contrast, when we label documents by topic we use additional knowledge, which is generally not explicitly available in document content and representation.
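
The classes-to-clusters idea mentioned here can be illustrated in a few lines: give each cluster its majority topic label and count how many documents that accounts for. The labels and cluster assignments below are made up for the example.

from collections import Counter

true_topics = ["sport", "sport", "politics", "politics", "politics", "tech", "tech", "sport"]
clusters = [0, 0, 1, 1, 0, 2, 2, 0]

total_majority = 0
for c in set(clusters):
    members = [t for t, k in zip(true_topics, clusters) if k == c]
    total_majority += Counter(members).most_common(1)[0][1]   # majority label count

print("classes-to-clusters accuracy:", total_majority / len(true_topics))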


pages: 340 words: 94,464

Randomistas: How Radical Researchers Changed Our World by Andrew Leigh

Albert Einstein, Amazon Mechanical Turk, Anton Chekhov, Atul Gawande, basic income, behavioural economics, Black Swan, correlation does not imply causation, crowdsourcing, data science, David Brooks, Donald Trump, ending welfare as we know it, Estimating the Reproducibility of Psychological Science, experimental economics, Flynn Effect, germ theory of disease, Ignaz Semmelweis: hand washing, Indoor air pollution, Isaac Newton, It's morning again in America, Kickstarter, longitudinal study, loss aversion, Lyft, Marshall McLuhan, meta-analysis, microcredit, Netflix Prize, nudge unit, offshore financial centre, p-value, Paradox of Choice, placebo effect, price mechanism, publication bias, RAND corporation, randomized controlled trial, recommendation engine, Richard Feynman, ride hailing / ride sharing, Robert Metcalfe, Ronald Reagan, Sheryl Sandberg, statistical model, Steven Pinker, sugar pill, TED Talk, uber lyft, universal basic income, War on Poverty

Critics mocked his ‘pocket handkerchief wheat plots’.4 But after trying hundreds of different breeding combinations, Farrer created a new ‘Federation Wheat’ based not on reputation or appearance, but on pure performance. Agricultural trials of this kind are often called ‘field experiments’, a term which some people also use to describe randomised trials in social science. Modern agricultural field experiments use spatial statistical models to divide up the plots.5 As in medicine and aid, the most significant agricultural randomised trials are now conducted across multiple countries. They are at the heart of much of our understanding of genetically modified crops, the impact of climate change on agriculture, and drought resistance

But all of these studies are limited by the assumptions that the methods required us to make. New developments in non-randomised econometrics – such as machine learning – are generally even more complicated than the older approaches.34 As economist Orley Ashenfelter notes, if an evaluator is predisposed to give a program the thumbs-up, statistical modelling ‘leaves too many ways for the researcher to fake it’.35 That’s why one leading econometrics text teaches non-random approaches by comparing each to the ‘experimental ideal’.36 Students are encouraged to ask the question: ‘If we could run a randomised experiment here, what would it look like?’


pages: 398 words: 86,855

Bad Data Handbook by Q. Ethan McCallum

Amazon Mechanical Turk, asset allocation, barriers to entry, Benoit Mandelbrot, business intelligence, cellular automata, chief data officer, Chuck Templeton: OpenTable:, cloud computing, cognitive dissonance, combinatorial explosion, commoditize, conceptual framework, data science, database schema, DevOps, en.wikipedia.org, Firefox, Flash crash, functional programming, Gini coefficient, hype cycle, illegal immigration, iterative process, labor-force participation, loose coupling, machine readable, natural language processing, Netflix Prize, One Laptop per Child (OLPC), power law, quantitative trading / quantitative finance, recommendation engine, selection bias, sentiment analysis, SQL injection, statistical model, supply-chain management, survivorship bias, text mining, too big to fail, web application

He has spent the past 15 years extracting information from messy data in fields ranging from intelligence to quantitative finance to social media. Richard Cotton is a data scientist with a background in chemical health and safety, and has worked extensively on tools to give non-technical users access to statistical models. He is the author of the R packages “assertive” for checking the state of your variables and “sig” to make sure your functions have a sensible API. He runs The Damned Liars statistics consultancy. Philipp K. Janert was born and raised in Germany. He obtained a Ph.D. in Theoretical Physics from the University of Washington in 1997 and has been working in the tech industry since, including four years at Amazon.com, where he initiated and led several projects to improve Amazon’s order fulfillment process.

As the first and second examples show, a scientist can spot faulty experimental setups, because of his or her ability to test the data for internal consistency and for agreement with known theories, and thereby prevent wrong conclusions and faulty analyses. What possibly could be more important to a scientist? And if that means taking a trip to the factory, I’ll be glad to go. Chapter 8. Blood, Sweat, and Urine, Richard Cotton. A Very Nerdy Body Swap Comedy. I spent six years working in the statistical modeling team at the UK’s Health and Safety Laboratory.[23] A large part of my job was working with the laboratory’s chemists, looking at occupational exposure to various nasty substances to see if an industry was adhering to safe limits. The laboratory gets sent tens of thousands of blood and urine samples each year (and sometimes more exotic fluids like sweat or saliva), and has its own team of occupational hygienists who visit companies and collect yet more samples.


pages: 339 words: 94,769

Possible Minds: Twenty-Five Ways of Looking at AI by John Brockman

AI winter, airport security, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, Alignment Problem, AlphaGo, artificial general intelligence, Asilomar, autonomous vehicles, basic income, Benoit Mandelbrot, Bill Joy: nanobots, Bletchley Park, Buckminster Fuller, cellular automata, Claude Shannon: information theory, Computing Machinery and Intelligence, CRISPR, Daniel Kahneman / Amos Tversky, Danny Hillis, data science, David Graeber, deep learning, DeepMind, Demis Hassabis, easy for humans, difficult for computers, Elon Musk, Eratosthenes, Ernest Rutherford, fake news, finite state, friendly AI, future of work, Geoffrey Hinton, Geoffrey West, Santa Fe Institute, gig economy, Hans Moravec, heat death of the universe, hype cycle, income inequality, industrial robot, information retrieval, invention of writing, it is difficult to get a man to understand something, when his salary depends on his not understanding it, James Watt: steam engine, Jeff Hawkins, Johannes Kepler, John Maynard Keynes: Economic Possibilities for our Grandchildren, John Maynard Keynes: technological unemployment, John von Neumann, Kevin Kelly, Kickstarter, Laplace demon, Large Hadron Collider, Loebner Prize, machine translation, market fundamentalism, Marshall McLuhan, Menlo Park, military-industrial complex, mirror neurons, Nick Bostrom, Norbert Wiener, OpenAI, optical character recognition, paperclip maximiser, pattern recognition, personalized medicine, Picturephone, profit maximization, profit motive, public intellectual, quantum cryptography, RAND corporation, random walk, Ray Kurzweil, Recombinant DNA, Richard Feynman, Rodney Brooks, self-driving car, sexual politics, Silicon Valley, Skype, social graph, speech recognition, statistical model, Stephen Hawking, Steven Pinker, Stewart Brand, strong AI, superintelligent machines, supervolcano, synthetic biology, systems thinking, technological determinism, technological singularity, technoutopianism, TED Talk, telemarketer, telerobotics, The future is already here, the long tail, the scientific method, theory of mind, trolley problem, Turing machine, Turing test, universal basic income, Upton Sinclair, Von Neumann architecture, Whole Earth Catalog, Y2K, you are the product, zero-sum game

., 49, 240–53 AI safety concerns, 242–43 background and overview of work of, 240–41 conventional computers versus bio-electronic hybrids, 246–48 equal rights, 248–49 ethical rules for intelligent machines, 243–44 free will of machines, and rights, 250–51 genetic red lines, 251–52 human manipulation of humans, 244–46, 252 humans versus nonhumans and hybrids, treatment of, 249–53 non-Homo intelligences, fair and safe treatment of, 247–48 rights for nonhumans and hybrids, 249–53 science versus religion, 243–44 self-consciousness of machines, and rights, 250–51 technical barriers/red lines, malleability of, 244–46 transhumans, rights of, 252–53 clinical (subjective) method of prediction, 233, 234–35 Colloquy of Mobiles (Pask), 259 Colossus: The Forbin Project (film), 242 competence of superintelligent AGI, 85 computational theory of mind, 102–3, 129–33, 222 computer learning systems Bayesian models, 226–28 cooperative inverse-reinforcement learning (CIRL), 30–31 deep learning (See deep learning) human learning, similarities to, 11 reality blueprint, need for, 16–17 statistical, model-blind mode of current, 16–17, 19 supervised learning, 148 unsupervised learning, 225 Computer Power and Human Reason (Weizenbaum), 48–49, 248 computer virus, 61 “Computing Machinery and Intelligence” (Turing), 43 conflicts among hybrid superintelligences, 174–75 controllable-agent designs, 31–32 control systems beyond human control (control problem) AI designed as tool and not as conscious agent, 46–48, 51–53 arguments against AI risk (See risk posed by AI, arguments against) Ashby’s Law and, 39, 179, 180 cognitive element in, xx–xxi Dyson on, 38–39, 40 Macy conferences, xx–xxi purpose imbued in machines and, 23–25 Ramakrishnan on, 183–86 risk of superhuman intelligence, arguments against, 25–29 Russell on templates for provably beneficial AI, 29–32 Tallinn on, 93–94 Wiener’s warning about, xviii–xix, xxvi, 4–5, 11–12, 22–23, 35, 93, 104, 172 Conway, John Horton, 263 cooperative inverse-reinforcement learning (CIRL), 30–31 coordination problem, 137, 138–41 corporate/AI scenario, in relation of machine superintelligences to hybrid superintelligences, 176 corporate superintelligences, 172–74 credit-assignment function, 196–200 AI and, 196–97 humans, applied to, 197–200 Crick, Francis, 58, 66 culture in evolution, selecting for, 198–99 curiosity, and AI risk denial, 96 Cybernetic Idea, xv cybernetics, xv–xxi, 3–7, 102–4, 153–54, 178–80, 194–95, 209–10, 256–57 “Cybernetic Sculpture” exhibition (Tsai), 258, 260–61 “Cybernetic Serendipity” exhibition (Reichardt), 258–59 Cybernetics (Wiener), xvi, xvii, 3, 5, 7, 56 “Cyborg Manifesto, A” (Haraway), 261 data gathering and exploitation, computation platforms used for, 61–63 Dawkins, Richard, 243 Declaration of Helsinki, 252 declarative design, 166–67 Deep Blue, 8, 184 Deep Dream, 211 deep learning, 184–85 bottom-up, 224–26 Pearl on lack of transparency in, and limitations of, 15–19 reinforcement learning, 128, 184–85, 225–26 unsupervised learning, 225 visualization programs, 211–13 Wiener’s foreshadowing of, 9 Deep-Mind, 184–85, 224, 225, 262–63 Deleuze, Gilles, 256 Dennett, Daniel C., xxv, 41–53, 120, 191 AI as “helpless by themselves,” 46–48 AI as tool, not colleagues, 46–48, 51–53 background and overview of work of, 41–42 dependence on new tools and loss of ability to thrive without them, 44–46 gap between today’s AI and public’s imagination of AI, 49 humanoid embellishment of AI, 49–50 intelligent tools versus artificial conscious agents, need for, 51–52 operators of AI 
systems, responsibilities of, 50–51 on Turing Test, 46–47 on Weizenbaum, 48–50 on Wiener, 43–45 Descartes, René, 191, 223 Desk Set (film), 270 Deutsch, David, 113–24 on AGI risks, 121–22 background and overview of work of, 113–14 creating AGIs, 122–24 developing AI with goals under unknown constraints, 119–21 innovation in prehistoric humans, lack of, 116–19 knowledge imitation of ancestral humans, understanding inherent in, 115–16 reward/punishment of AI, 120–21 Differential Analyzer, 163, 179–80 digital fabrication, 167–69 digital signal encoding, 180 dimensionality, 165–66 distributed Thompson sampling, 198 DNA molecule, 58 “Dollie Clone Series” (Hershman Leeson), 261, 262 Doubt and Certainty in Science (Young), xviii Dragan, Anca, 134–42 adding people to AI problem definition, 137–38 background and overview of work of, 134–35 coordination problem, 137, 138–41 mathematical definition of AI, 136 value-alignment problem, 137–38, 141–42 The Dreams of Reason: The Computer and the Rise of the Science of Complexity (Pagels), xxiii Drexler, Eric, 98 Dyson, Freeman, xxv, xxvi Dyson, George, xviii–xix, 33–40 analog and digital computation, distinguished, 35–37 background and overview of work of, 33–34 control, emergence of, 38–39 electronics, fundamental transitions in, 35 hybrid analog/digital systems, 37–38 on three laws of AI, 39–40 “Economic Possibilities for Our Grandchildren” (Keynes), 187 “Einstein, Gertrude Stein, Wittgenstein and Frankenstein” (Brockman), xxii emergence, 68–69 Emissaries trilogy (Cheng), 216–17 Empty Space, The (Brook), 213 environmental risk, AI risk as, 97–98 Eratosthenes, 19 Evans, Richard, 217 Ex Machina (film), 242 expert systems, 271 extreme wealth, 202–3 fabrication, 167–69 factor analysis, 225 Feigenbaum, Edward, xxiv Feynman, Richard, xxi–xxii Fifth Generation, xxiii–xxiv The Fifth Generation: Artificial Intelligence and Japan’s Computer Challenge to the World (Feigenbaum and McCorduck), xxiv Fodor, Jerry, 102 Ford Foundation, 202 Foresight and Understanding (Toulmin), 18–19 free will of machines, and rights, 250–51 Frege, Gottlob, 275–76 Galison, Peter, 231–39 background and overview of work of, 231–32 clinical versus objective method of prediction, 233–35 scientific objectivity, 235–39 Gates, Bill, 202 generative adversarial networks, 226 generative design, 166–67 Gershenfeld, Neil, 160–69 background and overview of work of, 160–61 boom-bust cycles in evolution of AI, 162–63 declarative design, 166–67 digital fabrication, 167–69 dimensionality problem, overcoming, 165–66 exponentially increasing amounts of date, processing of, 164–65 knowledge in AI systems, 164 scaling, and development of AI, 163–66 Ghahramani, Zoubin, 190 Gibson, William, 253 Go, 10, 150, 184–85 goal alignment.

., 222, 225 Sleepwalkers, The (Koestler), 153 Sloan Foundation, 202 social sampling, 198–99 software failure to advance in conjunction with increased processing power, 10 lack of standards of correctness and failure in engineering of, 60–61 Solomon, Arthur K., xvi–xvii “Some Moral and Technical Consequences of Automation” (Wiener), 23 Stapledon, Olaf, 75 state/AI scenario, in relation of machine superintelligences to hybrid superintelligences, 175–76 statistical, model-blind mode of learning, 16–17, 19 Steveni, Barbara, 218 Stewart, Potter, 247 Steyerl, Hito on AI visualization programs, 211–12 on artificial stupidity, 210–11 subjective method of prediction, 233, 234–35 subjugation fear in AI scenarios, 108–10 Superintelligence: Paths, Dangers, Strategies (Bostrom), 27 supervised learning, 148 surveillance state dystopias, 105–7 switch-it-off argument against AI risk, 25 Szilard, Leo, 26, 83 Tallinn, Jaan, 88–99 AI-risk message, 92–93 background and overview of work of, 88–89 calibrating AI-risk message, 96–98 deniers of AI-risk, motives of, 95–96 environmental risk, AI risk as, 97–98 Estonian dissidents, messages of, 91–92 evolution’s creation of planner and optimizer greater than itself, 93–94 growing awareness of AI risk, 98–99 technological singularity.


pages: 125 words: 27,675

Applied Text Analysis With Python: Enabling Language-Aware Data Products With Machine Learning by Benjamin Bengfort, Rebecca Bilbro, Tony Ojeda

data science, full text search, natural language processing, quantitative easing, sentiment analysis, statistical model, the long tail

In [Link to Come] we will explore classification models and applications, then in [Link to Come] we will take a look at clustering models, often called topic modeling in text analysis.
1 Kumar, A., McCann, R., Naughton, J., Patel, J. (2015) Model Selection Management Systems: The Next Frontier of Advanced Analytics
2 Wickham, H., Cooke, D., Hofmann, H. (2015) Visualizing statistical models: Removing the blindfold
3 https://arxiv.org/abs/1405.4053


pages: 317 words: 100,414

Superforecasting: The Art and Science of Prediction by Philip Tetlock, Dan Gardner

Affordable Care Act / Obamacare, Any sufficiently advanced technology is indistinguishable from magic, availability heuristic, behavioural economics, Black Swan, butterfly effect, buy and hold, cloud computing, cognitive load, cuban missile crisis, Daniel Kahneman / Amos Tversky, data science, desegregation, drone strike, Edward Lorenz: Chaos theory, forward guidance, Freestyle chess, fundamental attribution error, germ theory of disease, hindsight bias, How many piano tuners are there in Chicago?, index fund, Jane Jacobs, Jeff Bezos, Kenneth Arrow, Laplace demon, longitudinal study, Mikhail Gorbachev, Mohammed Bouazizi, Nash equilibrium, Nate Silver, Nelson Mandela, obamacare, operational security, pattern recognition, performance metric, Pierre-Simon Laplace, place-making, placebo effect, precautionary principle, prediction markets, quantitative easing, random walk, randomized controlled trial, Richard Feynman, Richard Thaler, Robert Shiller, Ronald Reagan, Saturday Night Live, scientific worldview, Silicon Valley, Skype, statistical model, stem cell, Steve Ballmer, Steve Jobs, Steven Pinker, tacit knowledge, tail risk, the scientific method, The Signal and the Noise by Nate Silver, The Wisdom of Crowds, Thomas Bayes, Watson beat the top human players on Jeopardy!

He also appreciated the absurdity of an academic committee on a mission to save the world. So I am 98% sure he was joking. And 99% sure his joke captures a basic truth about human judgment. Probability for the Stone Age Human beings have coped with uncertainty for as long as we have been recognizably human. And for almost all that time we didn’t have access to statistical models of uncertainty because they didn’t exist. It was remarkably late in history—arguably as late as the 1713 publication of Jakob Bernoulli’s Ars Conjectandi—before the best minds started to think seriously about probability. Before that, people had no choice but to rely on the tip-of-your-nose perspective.

For more details, visit www.goodjudgment.com. (1) Triage. Focus on questions where your hard work is likely to pay off. Don’t waste time either on easy “clocklike” questions (where simple rules of thumb can get you close to the right answer) or on impenetrable “cloud-like” questions (where even fancy statistical models can’t beat the dart-throwing chimp). Concentrate on questions in the Goldilocks zone of difficulty, where effort pays off the most. For instance, “Who will win the presidential election, twelve years out, in 2028?” is impossible to forecast now. Don’t even try. Could you have predicted in 1940 the winner of the election, twelve years out, in 1952?


pages: 347 words: 97,721

Only Humans Need Apply: Winners and Losers in the Age of Smart Machines by Thomas H. Davenport, Julia Kirby

"World Economic Forum" Davos, AI winter, Amazon Robotics, Andy Kessler, Apollo Guidance Computer, artificial general intelligence, asset allocation, Automated Insights, autonomous vehicles, basic income, Baxter: Rethink Robotics, behavioural economics, business intelligence, business process, call centre, carbon-based life, Clayton Christensen, clockwork universe, commoditize, conceptual framework, content marketing, dark matter, data science, David Brooks, deep learning, deliberate practice, deskilling, digital map, disruptive innovation, Douglas Engelbart, driverless car, Edward Lloyd's coffeehouse, Elon Musk, Erik Brynjolfsson, estate planning, financial engineering, fixed income, flying shuttle, follow your passion, Frank Levy and Richard Murnane: The New Division of Labor, Freestyle chess, game design, general-purpose programming language, global pandemic, Google Glasses, Hans Lippershey, haute cuisine, income inequality, independent contractor, index fund, industrial robot, information retrieval, intermodal, Internet of things, inventory management, Isaac Newton, job automation, John Markoff, John Maynard Keynes: Economic Possibilities for our Grandchildren, John Maynard Keynes: technological unemployment, Joi Ito, Khan Academy, Kiva Systems, knowledge worker, labor-force participation, lifelogging, longitudinal study, loss aversion, machine translation, Mark Zuckerberg, Narrative Science, natural language processing, Nick Bostrom, Norbert Wiener, nuclear winter, off-the-grid, pattern recognition, performance metric, Peter Thiel, precariat, quantitative trading / quantitative finance, Ray Kurzweil, Richard Feynman, risk tolerance, Robert Shiller, robo advisor, robotic process automation, Rodney Brooks, Second Machine Age, self-driving car, Silicon Valley, six sigma, Skype, social intelligence, speech recognition, spinning jenny, statistical model, Stephen Hawking, Steve Jobs, Steve Wozniak, strong AI, superintelligent machines, supply-chain management, tacit knowledge, tech worker, TED Talk, the long tail, transaction costs, Tyler Cowen, Tyler Cowen: Great Stagnation, Watson beat the top human players on Jeopardy!, Works Progress Administration, Zipcar

The term “artificial intelligence” alone, for example, has been used to describe such technologies as expert systems (collections of rules facilitating decisions in a specified domain, such as financial planning or knowing when a batch of soup is cooked), neural networks (a more mathematical approach to creating a model that fits a data set), machine learning (semiautomated statistical modeling to achieve the best-fitting model to data), natural language processing or NLP (in which computers make sense of human language in textual form), and so forth. Wikipedia lists at least ten branches of AI, and we have seen other sources that mention many more. To make sense of this army of machines and the direction in which it is marching, it helps to remember where it all started: with numerical analytics supporting and supported by human decision-makers.

This work required a broad range of sophisticated models including “neural network” models; some were vendor supplied; some were custom-built. Cathcart, who was an English major at Dartmouth College but also learned the BASIC computer language there from its creator, John Kemeny, knew his way around computer systems and statistical models. Most important, he knew when to trust them and when not to. The models and analyses began to exhibit significant problems. No matter how automated and sophisticated the models were, Cathcart realized that they were becoming less valid over time with changes in the economy and banking climate.


pages: 317 words: 106,130

The New Science of Asset Allocation: Risk Management in a Multi-Asset World by Thomas Schneeweis, Garry B. Crowder, Hossein Kazemi

asset allocation, backtesting, Bear Stearns, behavioural economics, Bernie Madoff, Black Swan, book value, business cycle, buy and hold, capital asset pricing model, collateralized debt obligation, commodity trading advisor, correlation coefficient, Credit Default Swap, credit default swaps / collateralized debt obligations, currency risk, diversification, diversified portfolio, financial engineering, fixed income, global macro, high net worth, implied volatility, index fund, interest rate swap, invisible hand, managed futures, market microstructure, merger arbitrage, moral hazard, Myron Scholes, passive investing, Richard Feynman, Richard Feynman: Challenger O-ring, risk free rate, risk tolerance, risk-adjusted returns, risk/return, search costs, selection bias, Sharpe ratio, short selling, statistical model, stocks for the long run, survivorship bias, systematic trading, technology bubble, the market place, Thomas Kuhn: the structure of scientific revolutions, transaction costs, value at risk, yield curve, zero-sum game

There are libraries of statistical books dedicated to the simple task of coming up with estimates of the parameters used in MPT. Here is the point: It is not simple. For example, (1) for what period is one estimating the parameters (week, month, year)? and (2) how constant are the estimates (e.g., do they change and, if they do, do we have statistical models that permit us to systematically reflect those changes?)? There are many more issues in parameter estimation, but probably the biggest is that when two assets exist with the same true expected return, standard deviation, and correlation but when the risk parameter is often estimated with error (e.g., standard deviation is larger or smaller than its true standard deviation), the procedure for determining the efficient frontier always picks the asset with the downward bias risk estimate (e.g., the lower estimated standard deviation) and the upward bias return estimate.

The primary issue, of course, remains how to create a comparably risky investable non-actively managed asset. Even when one believes in the use of ex ante equilibrium (e.g., CAPM) or arbitrage (e.g., APT) models of expected return, problems in empirically estimating the required parameters usually result in alpha being determined using statistical models based on the underlying theoretical model. As generally measured in a statistical sense, the term alpha is often derived from a linear regression in which the equation that relates an observed variable y (asset return) to some other factor x (market index) is written as: y = α + βx + ε The first term, α (alpha), represents the intercept; β (beta) represents the slope; and ε (epsilon) represents a random error term.
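To make the regression concrete, here is a minimal Python sketch, not taken from the book, that estimates alpha and beta by ordinary least squares; the two return series are hypothetical and purely illustrative.

from scipy import stats

# Hypothetical monthly return series, purely for illustration.
market_returns = [0.010, -0.020, 0.015, 0.007, -0.005, 0.020, 0.010, -0.010]  # x: market index
asset_returns = [0.012, -0.018, 0.020, 0.010, -0.002, 0.025, 0.008, -0.012]   # y: asset

# Ordinary least squares fit of y = alpha + beta * x + error.
fit = stats.linregress(market_returns, asset_returns)

print(f"beta (slope): {fit.slope:.4f}")
print(f"alpha (intercept): {fit.intercept:.4f}")
print(f"R-squared: {fit.rvalue ** 2:.4f}")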


pages: 311 words: 99,699

Fool's Gold: How the Bold Dream of a Small Tribe at J.P. Morgan Was Corrupted by Wall Street Greed and Unleashed a Catastrophe by Gillian Tett

"World Economic Forum" Davos, accounting loophole / creative accounting, Alan Greenspan, asset-backed security, bank run, banking crisis, Bear Stearns, Black-Scholes formula, Blythe Masters, book value, break the buck, Bretton Woods, business climate, business cycle, buy and hold, collateralized debt obligation, commoditize, creative destruction, credit crunch, Credit Default Swap, credit default swaps / collateralized debt obligations, diversification, easy for humans, difficult for computers, financial engineering, financial innovation, fixed income, Glass-Steagall Act, housing crisis, interest rate derivative, interest rate swap, inverted yield curve, junk bonds, Kickstarter, locking in a profit, Long Term Capital Management, low interest rates, McMansion, Michael Milken, money market fund, mortgage debt, North Sea oil, Northern Rock, Plato's cave, proprietary trading, Renaissance Technologies, risk free rate, risk tolerance, Robert Shiller, Satyajit Das, Savings and loan crisis, short selling, sovereign wealth fund, statistical model, tail risk, The Great Moderation, too big to fail, value at risk, yield curve

JPMorgan Chase, Deutsche Bank, and many other banks and funds suffered substantial losses. For a few weeks after the turmoil, the banking community engaged in soul-searching. At J.P. Morgan the traders stuck bananas on their desks as a jibe at the so-called F9 model monkeys, the mathematical wizards who had created such havoc. (The “monkeys” who wrote the statistical models tended to use the “F9” key on the computer when they performed their calculations, giving rise to the tag.) J.P. Morgan, Deutsche, and others conducted internal reviews that led them to introduce slight changes in their statistical systems. GLG Ltd., one large hedge fund, told its investors that it would use a wider set of data to analyze CDOs in the future.

Compared to Greenspan, Geithner was not just younger, but he also commanded far less clout and respect. As the decade wore on, though, he became privately uneasy about some of the trends in the credit world. From 2005 onwards, he started to call on bankers to prepare for so-called “fat tails,” a statistical term for extremely negative events that occur more often than the normal bell curve statistical models the banks’ risk assessment relied on so much implied. He commented in the spring of 2006: “A number of fundamental changes in the US financial system over the past twenty-five years appear to have rendered it able to withstand the stress of a broader array of shocks than was the case in the past.
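As a crude, hypothetical illustration of why “fat tails” matter, the Python sketch below compares the probability of an observation beyond a five-unit threshold under a standard normal curve with the same threshold under a heavier-tailed Student-t distribution; the choice of three degrees of freedom is arbitrary.

from scipy import stats

threshold = 5.0  # a "five-sigma"-sized move

# Probability of exceeding the threshold under a standard normal (thin-tailed) curve.
p_normal = stats.norm.sf(threshold)

# The same threshold under a Student-t with 3 degrees of freedom (much fatter tails).
p_fat_tailed = stats.t.sf(threshold, df=3)

print(f"Normal tail probability: {p_normal:.2e}")
print(f"Student-t (df=3) tail probability: {p_fat_tailed:.2e}")
print(f"The fat-tailed model makes the extreme event roughly {p_fat_tailed / p_normal:,.0f} times more likely.")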


Calling Bullshit: The Art of Scepticism in a Data-Driven World by Jevin D. West, Carl T. Bergstrom

airport security, algorithmic bias, AlphaGo, Amazon Mechanical Turk, Andrew Wiles, Anthropocene, autism spectrum disorder, bitcoin, Charles Babbage, cloud computing, computer vision, content marketing, correlation coefficient, correlation does not imply causation, crowdsourcing, cryptocurrency, data science, deep learning, deepfake, delayed gratification, disinformation, Dmitri Mendeleev, Donald Trump, Elon Musk, epigenetics, Estimating the Reproducibility of Psychological Science, experimental economics, fake news, Ford Model T, Goodhart's law, Helicobacter pylori, Higgs boson, invention of the printing press, John Markoff, Large Hadron Collider, longitudinal study, Lyft, machine translation, meta-analysis, new economy, nowcasting, opioid epidemic / opioid crisis, p-value, Pluto: dwarf planet, publication bias, RAND corporation, randomized controlled trial, replication crisis, ride hailing / ride sharing, Ronald Reagan, selection bias, self-driving car, Silicon Valley, Silicon Valley startup, social graph, Socratic dialogue, Stanford marshmallow experiment, statistical model, stem cell, superintelligent machines, systematic bias, tech bro, TED Talk, the long tail, the scientific method, theory of mind, Tim Cook: Apple, twin studies, Uber and Lyft, Uber for X, uber lyft, When a measure becomes a target

Modeling the changes in winning times, the authors predicted that women will outsprint men by the 2156 Olympic Games. It may be true that women will someday outsprint men, but this analysis does not provide a compelling argument. The authors’ conclusions were based on an overly simplistic statistical model. As shown above, the researchers fit a straight line through the times for women, and a separate straight line through the times for men. If you use this model to estimate future times, it predicts that women will outsprint men in the year 2156. In that year, the model predicts that women will finish the hundred-meter race in about 8.08 seconds and men will be shortly behind with times of about 8.10 seconds.

Using the same model, he extrapolated further into the future and came to the preposterous conclusion that late-millennium sprinters will run the hundred-meter dash in negative times. Clearly this can’t be true, so we should be skeptical of the paper’s other surprising results, such as the forecasted gender reversal in winning times. Another lesson here is to be careful about what kind of model is employed. A model may pass all the formal statistical model-fitting tests. But if it does not account for real biology—in this case, the physical limits to how fast any organism can run—we should be careful about what we conclude. BE MEMORABLE Functional magnetic resonance imaging (fMRI) allows neuroscientists to explore what brain regions are involved in what sorts of cognitive tasks.
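To see how such straight-line extrapolation goes wrong, here is a small Python sketch; the winning times below are made up for illustration and are not the data from the study discussed above.

import numpy as np

# Made-up Olympic 100m winning times (seconds); illustrative only.
years = np.array([1928.0, 1948.0, 1968.0, 1988.0, 2008.0])
women = np.array([12.20, 11.90, 11.10, 10.50, 10.80])
men = np.array([10.80, 10.30, 9.95, 9.92, 9.69])

# Fit a separate straight line (degree-1 polynomial) to each series.
women_line = np.polyfit(years, women, deg=1)
men_line = np.polyfit(years, men, deg=1)

for year in (2156, 2500, 3000):
    w = np.polyval(women_line, year)
    m = np.polyval(men_line, year)
    print(f"{year}: predicted women {w:6.2f} s, men {m:6.2f} s")
# Extrapolated far enough, the women's line crosses the men's and eventually
# both predict impossible times, because a straight line knows nothing about
# the physical limits on how fast a human can run.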


pages: 420 words: 100,811

We Are Data: Algorithms and the Making of Our Digital Selves by John Cheney-Lippold

algorithmic bias, bioinformatics, business logic, Cass Sunstein, centre right, computer vision, critical race theory, dark matter, data science, digital capitalism, drone strike, Edward Snowden, Evgeny Morozov, Filter Bubble, Google Chrome, Google Earth, Hans Moravec, Ian Bogost, informal economy, iterative process, James Bridle, Jaron Lanier, Julian Assange, Kevin Kelly, late capitalism, Laura Poitras, lifelogging, Lyft, machine readable, machine translation, Mark Zuckerberg, Marshall McLuhan, mass incarceration, Mercator projection, meta-analysis, Nick Bostrom, Norbert Wiener, offshore financial centre, pattern recognition, price discrimination, RAND corporation, Ray Kurzweil, Richard Thaler, ride hailing / ride sharing, Rosa Parks, Silicon Valley, Silicon Valley startup, Skype, Snapchat, software studies, statistical model, Steven Levy, technological singularity, technoutopianism, the scientific method, Thomas Bayes, Toyota Production System, Turing machine, uber lyft, web application, WikiLeaks, Zimmermann PGP

The example of Shimon offers us an excellent insight into the real cultural work that is being done by algorithmic processing. Shimon’s algorithmic ‘Coltrane’ and ‘Monk’ are new cultural forms that are innovative, a touch random, and ultimately removed from a doctrinaire politics of what jazz is supposed to be. It instead follows what ‘jazz’ is and can be according to the musical liberties taken by predictive statistical modeling. This is what anthropologist Eitan Wilf has called the “stylizing of styles”—or the way that learning algorithms challenge how we understand style as an aesthetic form.97 ‘Jazz’ is jazz but also not. As a measurable type, it’s something divergent, peculiarly so but also anthropologically so.

For example, inspired by Judith Butler’s theory of gender performance, some U.S. machine-learning researchers looked to unpack the intersectional essentialism implicit in closed, concretized a priori notions like ‘old’ and ‘man’:117 The increasing prevalence of online social media for informal communication has enabled large-scale statistical modeling of the connection between language style and social variables, such as gender, age, race, and geographical origin. Whether the goal of such research is to understand stylistic differences or to learn predictive models of “latent attributes,” there is often an implicit assumption that linguistic choices are associated with immutable and essential categories of people.


The Myth of Artificial Intelligence: Why Computers Can't Think the Way We Do by Erik J. Larson

AI winter, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, Albert Einstein, Alignment Problem, AlphaGo, Amazon Mechanical Turk, artificial general intelligence, autonomous vehicles, Big Tech, Black Swan, Bletchley Park, Boeing 737 MAX, business intelligence, Charles Babbage, Claude Shannon: information theory, Computing Machinery and Intelligence, conceptual framework, correlation does not imply causation, data science, deep learning, DeepMind, driverless car, Elon Musk, Ernest Rutherford, Filter Bubble, Geoffrey Hinton, Georg Cantor, Higgs boson, hive mind, ImageNet competition, information retrieval, invention of the printing press, invention of the wheel, Isaac Newton, Jaron Lanier, Jeff Hawkins, John von Neumann, Kevin Kelly, Large Hadron Collider, Law of Accelerating Returns, Lewis Mumford, Loebner Prize, machine readable, machine translation, Nate Silver, natural language processing, Nick Bostrom, Norbert Wiener, PageRank, PalmPilot, paperclip maximiser, pattern recognition, Peter Thiel, public intellectual, Ray Kurzweil, retrograde motion, self-driving car, semantic web, Silicon Valley, social intelligence, speech recognition, statistical model, Stephen Hawking, superintelligent machines, tacit knowledge, technological singularity, TED Talk, The Coming Technological Singularity, the long tail, the scientific method, The Signal and the Noise by Nate Silver, The Wisdom of Crowds, theory of mind, Turing machine, Turing test, Vernor Vinge, Watson beat the top human players on Jeopardy!, Yochai Benkler

Getting a misclassified photo on Facebook or a boring movie recommendation on Netflix may not get us into much trouble with reliance on data-driven induction, but driverless cars and other critical technologies certainly can. A growing number of AI scientists understand the issue. Oren Etzioni, head of the Allen Institute for Artificial Intelligence, calls machine learning and big data “high-capacity statistical models.”9 That’s impressive computer science, but it’s not general intelligence. Intelligent minds bring understanding to data, and can connect dots that lead to an appreciation of failure points and abnormalities. Data and data analysis aren’t enough. THE PROBLEM OF INFERENCE AS TRUST In an illuminating critique of induction as used for financial forecasting, former stock trader Nassim Nicholas Taleb divides statistical prediction problems into four quadrants, with the variables being, first, whether the decision to be made is simple (binary) or complex, and second, whether the randomness involved is “mediocre” or extreme.

Like the meandering line explaining the existing points on a scatter plot, the models turned out to have no predictive or scientific value. There are numerous such fiascoes involving earthquake prediction by geologists, as Silver points out, culminating in the now-famous failure of Russian mathematical geophysicist Vladimir Keilis-Borok to predict an earthquake in the Mojave Desert in 2004 using an “elaborate and opaque” statistical model that identified patterns from smaller earthquakes in particular regions, generalizing to larger ones. Keilis-Borok’s student David Bowman, who is now Chair of the Department of Geological Sciences at Cal State Fullerton, admitted in a rare bit of scientific humility that the Keilis-Borok model was simply overfit.
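To illustrate what “overfit” means in this context, here is a hypothetical Python sketch, unrelated to the Keilis-Borok model itself: a very flexible curve can explain every observed point while telling us almost nothing about points it has not seen.

import numpy as np

rng = np.random.default_rng(0)

# Ten hypothetical noisy observations of a simple linear trend, y = 2x + noise.
x_obs = np.linspace(0.0, 10.0, 10)
y_obs = 2.0 * x_obs + rng.normal(0.0, 2.0, size=x_obs.size)

# Points the model never saw, lying between the observations, with their true trend values.
x_new = np.linspace(0.5, 9.5, 10)
y_new_true = 2.0 * x_new

for degree in (1, 9):
    coeffs = np.polyfit(x_obs, y_obs, deg=degree)
    err_seen = np.mean((np.polyval(coeffs, x_obs) - y_obs) ** 2)
    err_unseen = np.mean((np.polyval(coeffs, x_new) - y_new_true) ** 2)
    print(f"degree {degree}: mean squared error on observed points {err_seen:10.3f}, "
          f"on unseen points {err_unseen:10.3f}")
# The degree-9 curve threads through all ten observed points (near-zero error),
# but that flexibility is spent explaining noise rather than the underlying trend.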


pages: 123 words: 32,382

Grouped: How Small Groups of Friends Are the Key to Influence on the Social Web by Paul Adams

Airbnb, Cass Sunstein, cognitive dissonance, content marketing, David Brooks, Dunbar number, information retrieval, invention of the telegraph, Jeff Hawkins, mirror neurons, planetary scale, race to the bottom, Richard Thaler, sentiment analysis, social web, statistical model, the strength of weak ties, The Wisdom of Crowds, web application, white flight

Research by Forrester found that cancer patients trust their local care physician more than world renowned cancer treatment centers, and in most cases, the patient had known their local care physician for years.16 We overrate the advice of experts Psychologist Philip Tetlock conducted numerous studies to test the accuracy of advice from experts in the fields of journalism and politics. He quantified over 82,000 predictions and found that the journalism experts tended to perform slightly worse than picking answers at random. Political experts didn’t fare much better. They slightly outperformed random chance, but did not perform as well as a basic statistical model. In fact, they actually performed slightly better at predicting things outside their area of expertise, and 80 percent of their predictions were wrong. Studies in finance also show that only 20 percent of investment bankers outperform the stock market.17 We overestimate what we know Sometimes we consider ourselves as experts, even though we don’t know as much as we think we know.


pages: 103 words: 32,131

Program Or Be Programmed: Ten Commands for a Digital Age by Douglas Rushkoff

Alan Greenspan, banking crisis, big-box store, citizen journalism, cloud computing, digital map, East Village, financial innovation, Firefox, Future Shock, hive mind, Howard Rheingold, invention of the printing press, Kevin Kelly, Marshall McLuhan, mirror neurons, peer-to-peer, public intellectual, Silicon Valley, statistical model, Stewart Brand, Ted Nelson, WikiLeaks

As baseball became a business, the fans took back baseball as a game—even if it had to happen on their computers. The effects didn’t stay in the computer. Leveraging the tremendous power of digital abstraction back to the real world, Billy Bean, coach of the Oakland Athletics, applied these same sorts of statistical modeling to players for another purpose: to assemble a roster for his own Major League team. Bean didn’t have the same salary budget as his counterparts in New York or Los Angeles, and he needed to find another way to assemble a winning combination. So he abstracted and modeled available players in order to build a better team that went from the bottom to the top of its division, and undermined the way that money had come to control the game.


pages: 502 words: 107,657

Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die by Eric Siegel

Alan Greenspan, Albert Einstein, algorithmic trading, Amazon Mechanical Turk, Apollo 11, Apple's 1984 Super Bowl advert, backtesting, Black Swan, book scanning, bounce rate, business intelligence, business process, butter production in bangladesh, call centre, Charles Lindbergh, commoditize, computer age, conceptual framework, correlation does not imply causation, crowdsourcing, dark matter, data is the new oil, data science, driverless car, en.wikipedia.org, Erik Brynjolfsson, Everything should be made as simple as possible, experimental subject, Google Glasses, happiness index / gross national happiness, information security, job satisfaction, Johann Wolfgang von Goethe, lifelogging, machine readable, Machine translation of "The spirit is willing, but the flesh is weak." to Russian and back, mass immigration, Moneyball by Michael Lewis explains big data, Nate Silver, natural language processing, Netflix Prize, Network effects, Norbert Wiener, personalized medicine, placebo effect, prediction markets, Ray Kurzweil, recommendation engine, risk-adjusted returns, Ronald Coase, Search for Extraterrestrial Intelligence, self-driving car, sentiment analysis, Shai Danziger, software as a service, SpaceShipOne, speech recognition, statistical model, Steven Levy, supply chain finance, text mining, the scientific method, The Signal and the Noise by Nate Silver, The Wisdom of Crowds, Thomas Bayes, Thomas Davenport, Turing test, Watson beat the top human players on Jeopardy!, X Prize, Yogi Berra, zero-sum game

GlaxoSmithKline (UK): Vladimir Anisimov, GlaxoSmithKline, “Predictive Analytic Patient Recruitment and Drug Supply Modelling in Clinical Trials,” Predictive Analytics World London Conference, November 30, 2011, London, UK. www.predictiveanalyticsworld.com/london/2011/agenda.php#day1–16. Vladimir V. Anisimov, “Statistical Modelling of Clinical Trials (Recruitment and Randomization),” Communications in Statistics—Theory and Methods 40, issue 19–20 (2011): 3684–3699. www.tandfonline.com/toc/lsta20/40/19–20. MultiCare Health System (four hospitals in Washington): Karen Minich-Pourshadi for HealthLeaders Media, “Hospital Data Mining Hits Paydirt,” HealthLeaders Media Online, November 29, 2010. www.healthleadersmedia.com/page-1/FIN-259479/Hospital-Data-Mining-Hits-Paydirt.

Johnson, Serena Lee, Frank Doherty, and Arthur Kressner (Consolidated Edison Company of New York), “Predicting Electricity Distribution Feeder Failures Using Machine Learning Susceptibility Analysis,” March 31, 2006. www.phillong.info/publications/GBAetal06_susc.pdf. This work has been partly supported by a research contract from Consolidated Edison. BNSF Railway: C. Tyler Dick, Christopher P. L. Barkan, Edward R. Chapman, and Mark P. Stehly, “Multivariate Statistical Model for Predicting Occurrence and Location of Broken Rails,” Transportation Research Board of the National Academies, January 26, 2007. http://trb.metapress.com/content/v2j6022171r41478/. See also: http://ict.uiuc.edu/railroad/cee/pdf/Dick_et_al_2003.pdf. TTX: Thanks to Mahesh Kumar at Tiger Analytics for this case study, “Predicting Wheel Failure Rate for Railcars.”


pages: 416 words: 108,370

Hit Makers: The Science of Popularity in an Age of Distraction by Derek Thompson

Airbnb, Albert Einstein, Alexey Pajitnov wrote Tetris, always be closing, augmented reality, Clayton Christensen, data science, Donald Trump, Downton Abbey, Ford Model T, full employment, game design, Golden age of television, Gordon Gekko, hindsight bias, hype cycle, indoor plumbing, industrial cluster, information trail, invention of the printing press, invention of the telegraph, Jeff Bezos, John Snow's cholera map, Kevin Roose, Kodak vs Instagram, linear programming, lock screen, Lyft, Marc Andreessen, Mark Zuckerberg, Marshall McLuhan, Mary Meeker, Menlo Park, Metcalfe’s law, Minecraft, Nate Silver, Network effects, Nicholas Carr, out of africa, planned obsolescence, power law, prosperity theology / prosperity gospel / gospel of success, randomized controlled trial, recommendation engine, Robert Gordon, Ronald Reagan, Savings and loan crisis, Silicon Valley, Skype, Snapchat, social contagion, statistical model, Steve Ballmer, Steve Jobs, Steven Levy, Steven Pinker, subscription business, TED Talk, telemarketer, the medium is the message, The Rise and Fall of American Growth, Tyler Cowen, Uber and Lyft, Uber for X, uber lyft, Vilfredo Pareto, Vincenzo Peruggia: Mona Lisa, women in the workforce

, Guys and Dolls, Lady and the Tramp, Strategic Air Command, Not as a Stranger, To Hell and Back, The Sea Chase, The Seven Year Itch, and The Tall Men. If you’ve heard of five of those twelve movies, you have me beat. And yet they were all more popular than the film that launched the bestselling rock song of all time. There is no statistical model in the world to forecast that the forgotten B-side of a middling record played over the credits of the thirteenth most popular movie of any year will automatically become the most popular rock-and-roll song of all time. The business of creativity is a game of chance—a complex, adaptive, semi-chaotic game with Bose-Einstein distribution dynamics and Pareto’s power law characteristics with dual-sided uncertainty.

killing 127 people in three days: Kathleen Tuthill, “John Snow and the Broad Street Pump,” Cricket 31, no. 3 (November 2003), reprinted by UCLA Department of Epidemiology, www.ph.ucla.edu/epi/snow/snowcricketarticle.html. “There were only ten deaths in houses”: John Snow, Medical Times and Gazette 9, September 23, 1854: 321–22, reprinted by UCLA Department of Epidemiology, www.ph.ucla.edu/epi/snow/choleraneargoldensquare.html. Note: Other accounts of Snow’s methodology, such as David Freedman’s paper “Statistical Models and Shoe Leather,” give more weight to Snow’s investigation of the water supply companies. A few years before the outbreak, one of London’s water suppliers had moved its intake point upstream from the main sewage discharge on the Thames, while another company kept its intake point downstream from the sewage.


Capital Ideas Evolving by Peter L. Bernstein

Albert Einstein, algorithmic trading, Andrei Shleifer, asset allocation, behavioural economics, Black Monday: stock market crash in 1987, Bob Litterman, book value, business cycle, buy and hold, buy low sell high, capital asset pricing model, commodity trading advisor, computerized trading, creative destruction, currency risk, Daniel Kahneman / Amos Tversky, David Ricardo: comparative advantage, diversification, diversified portfolio, endowment effect, equity premium, equity risk premium, Eugene Fama: efficient market hypothesis, financial engineering, financial innovation, fixed income, high net worth, hiring and firing, index fund, invisible hand, Isaac Newton, John Meriwether, John von Neumann, Joseph Schumpeter, Kenneth Arrow, London Interbank Offered Rate, Long Term Capital Management, loss aversion, Louis Bachelier, market bubble, mental accounting, money market fund, Myron Scholes, paper trading, passive investing, Paul Samuelson, Performance of Mutual Funds in the Period, price anchoring, price stability, random walk, Richard Thaler, risk free rate, risk tolerance, risk-adjusted returns, risk/return, Robert Shiller, seminal paper, Sharpe ratio, short selling, short squeeze, Silicon Valley, South Sea Bubble, statistical model, survivorship bias, systematic trading, tail risk, technology bubble, The Wealth of Nations by Adam Smith, transaction costs, yield curve, Yogi Berra, zero-sum game

* Unless otherwise specified, quotations are from personal interviews or correspondence. While he was at Bronx Science, Lo read The Foundation Trilogy by the science fiction writer Isaac Asimov. The story was about a mathematician who develops a theory of human behavior called “psychohistory.” Psychohistory can predict the future course of human events, but only when the population reaches a certain size because the predictions are based on statistical models. Lo was hooked. He found Asimov’s narrative to be plausible enough to become a reality some day, and he wanted to be the one to make it happen. Economics, especially game theory and mathematical economics, looked like the best way to get started. He made the decision in his second year at Yale to do just that.

At that moment, in the early 1980s, academics in the field of financial economics were still working out the full theoretical implications of Markowitz’s theory of portfolio selection, the Efficient Market Hypothesis, the Capital Asset Pricing Model, the options pricing model, and Modigliani and Miller’s iconoclastic ideas about corporate finance and the central role of arbitrage. That emphasis on theory made the bait even tastier for Lo. He saw the way clear to follow Asimov’s advice. By applying statistical models to the daily practice of finance in the real world, he would not only move the field of finance forward from its focus on theory, but even more enticing, he would also find the holy grail he was seeking in the first place: solutions to Asimov’s psychohistory. Progress was rapid. By 1988 he was an untenured professor at MIT, having turned down an offer of tenure to stay at Wharton.


pages: 719 words: 104,316

R Cookbook by Paul Teetor

Debian, en.wikipedia.org, p-value, quantitative trading / quantitative finance, statistical model

Solution
The factor function encodes your vector of discrete values into a factor:
> f <- factor(v) # v is a vector of strings or integers
If your vector contains only a subset of possible values and not the entire universe, then include a second argument that gives the possible levels of the factor:
> f <- factor(v, levels)
Discussion
In R, each possible value of a categorical variable is called a level. A vector of levels is called a factor. Factors fit very cleanly into the vector orientation of R, and they are used in powerful ways for processing data and building statistical models. Most of the time, converting your categorical data into a factor is a simple matter of calling the factor function, which identifies the distinct levels of the categorical data and packs them into a factor:
> f <- factor(c("Win","Win","Lose","Tie","Win","Lose"))
> f
[1] Win Win Lose Tie Win Lose
Levels: Lose Tie Win
Notice that when we printed the factor, f, R did not put quotes around the values.

See Also The help page for par lists the global graphics parameters; the chapter of R in a Nutshell on graphics includes the list with useful annotations. R Graphics contains extensive explanations of graphics parameters. Chapter 11. Linear Regression and ANOVA Introduction In statistics, modeling is where we get down to business. Models quantify the relationships between our variables. Models let us make predictions. A simple linear regression is the most basic model. It’s just two variables and is modeled as a linear relationship with an error term: yi = β0 + β1xi + εi We are given the data for x and y.


pages: 319 words: 106,772

Irrational Exuberance: With a New Preface by the Author by Robert J. Shiller

Alan Greenspan, Andrei Shleifer, asset allocation, banking crisis, benefit corporation, Benoit Mandelbrot, book value, business cycle, buy and hold, computer age, correlation does not imply causation, Daniel Kahneman / Amos Tversky, demographic transition, diversification, diversified portfolio, equity premium, Everybody Ought to Be Rich, experimental subject, hindsight bias, income per capita, index fund, Intergovernmental Panel on Climate Change (IPCC), Joseph Schumpeter, Long Term Capital Management, loss aversion, Mahbub ul Haq, mandelbrot fractal, market bubble, market design, market fundamentalism, Mexican peso crisis / tequila crisis, Milgram experiment, money market fund, moral hazard, new economy, open economy, pattern recognition, Phillips curve, Ponzi scheme, price anchoring, random walk, Richard Thaler, risk tolerance, Robert Shiller, Ronald Reagan, Small Order Execution System, spice trade, statistical model, stocks for the long run, Suez crisis 1956, survivorship bias, the market place, Tobin tax, transaction costs, tulip mania, uptick rule, urban decay, Y2K

Another argument advanced to explain why days of unusually large stock price movements have often not been found to coincide with important news is that a confluence of factors may cause a significant market change, even if the individual factors themselves are not particularly newsworthy. For example, suppose certain investors are informally using a particular statistical model that forecasts fundamental value using a number of economic indicators. If all or most of these particular indicators point the same way on a given day, even if no single one of them is of any substantive importance by itself, their combined effect will be noteworthy. Both of these interpretations of the tenuous relationship between news and market movements assume that the public is paying continuous attention to the news—reacting sensitively to the slightest clues about market fundamentals, constantly and carefully adding up all the disparate pieces of evidence.

Merton, with Terry Marsh, wrote an article in the American Economic Review in 1986 that argued against my results and concluded, ironically, that speculative markets were not too volatile.26 John Campbell and I wrote a number of papers attempting to put these claims of excess volatility on a more secure footing, and we developed statistical models to study the issue and deal with some of the problems emphasized by the critics.27 We felt that we had established in a fairly convincing way that stock markets do violate the efficient markets model. Our research has not completely settled the matter, however. There are just too many possible statistical issues that can be raised, and the sample provided by only a little over a century of data cannot prove anything conclusively.


pages: 350 words: 103,270

The Devil's Derivatives: The Untold Story of the Slick Traders and Hapless Regulators Who Almost Blew Up Wall Street . . . And Are Ready to Do It Again by Nicholas Dunbar

Alan Greenspan, asset-backed security, bank run, banking crisis, Basel III, Bear Stearns, behavioural economics, Black Swan, Black-Scholes formula, bonus culture, book value, break the buck, buy and hold, capital asset pricing model, Carmen Reinhart, Cass Sunstein, collateralized debt obligation, commoditize, Credit Default Swap, credit default swaps / collateralized debt obligations, currency risk, delayed gratification, diversification, Edmond Halley, facts on the ground, fear index, financial innovation, fixed income, George Akerlof, Glass-Steagall Act, Greenspan put, implied volatility, index fund, interest rate derivative, interest rate swap, Isaac Newton, John Meriwether, junk bonds, Kenneth Rogoff, Kickstarter, Long Term Capital Management, margin call, market bubble, money market fund, Myron Scholes, Nick Leeson, Northern Rock, offshore financial centre, Paul Samuelson, price mechanism, proprietary trading, regulatory arbitrage, rent-seeking, Richard Thaler, risk free rate, risk tolerance, risk/return, Ronald Reagan, Salesforce, Savings and loan crisis, seminal paper, shareholder value, short selling, statistical model, subprime mortgage crisis, The Chicago School, Thomas Bayes, time value of money, too big to fail, transaction costs, value at risk, Vanguard fund, yield curve, zero-sum game

The mattress had done its job—it had given international regulators the confidence to sign off as commercial banks built up their trading businesses. Betting—and Beating—the Spread Now return to the trading floor, to the people regulators and bank senior management need to police. Although they are taught to overcome risk aversion, traders continue to look for a mattress everywhere, in the form of “free lunches.” But do they use statistical modeling to identify a mattress, and make money? If you talk to traders, the answer tends to be no. Listen to the warning of a senior Morgan Stanley equities trader who I interviewed in 2009: “You can compare to theoretical or historic value. But these forms of trading are probably a bit dangerous.”

According to the Morgan Stanley trader, “You study the perception of the market: I buy this because the next tick will be on the upside, or I sell because the next tick will be on the downside. This is probably based on the observations of your peers and so on. If you look purely at the anticipation of the price, that’s a way to make money in trading.” One reason traders don’t tend to make outright bets on the basis of statistical modeling is that capital rules such as VAR discourage it. The capital required to be set aside by VAR scales up with the size of the positions and the degree of worst-case scenario projected by the statistics. For volatile markets like equities, that restriction takes a big bite out of potential profit since trading firms must borrow to invest.5 On the other hand, short-term, opportunistic trading (which might be less profitable) slips under the VAR radar because the positions never stay on the books for very long.


pages: 502 words: 107,510

Natural Language Annotation for Machine Learning by James Pustejovsky, Amber Stubbs

Amazon Mechanical Turk, bioinformatics, cloud computing, computer vision, crowdsourcing, easy for humans, difficult for computers, finite state, Free Software Foundation, game design, information retrieval, iterative process, language acquisition, machine readable, machine translation, natural language processing, pattern recognition, performance metric, power law, sentiment analysis, social web, sparse data, speech recognition, statistical model, text mining

The British National Corpus (BNC) is compiled and released as the largest corpus of English to date (100 million words). The Text Encoding Initiative (TEI) is established to develop and maintain a standard for the representation of texts in digital form. 2000s: As the World Wide Web grows, more data is available for statistical models for Machine Translation and other applications. The American National Corpus (ANC) project releases a 22-million-word subcorpus, and the Corpus of Contemporary American English (COCA) is released (400 million words). Google releases its Google N-gram Corpus of 1 trillion word tokens from public web pages.

We can identify two basic methods for sequence classification:
Feature-based classification: A sequence is transformed into a feature vector. The vector is then classified according to conventional classifier methods.
Model-based classification: An inherent model of the probability distribution of the sequence is built. HMMs and other statistical models are examples of this method.
Included in feature-based methods are n-gram models of sequences, where an n-gram is selected as a feature. Given a set of such n-grams, we can represent a sequence as a binary vector of the occurrence of the n-grams, or as a vector containing frequency counts of the n-grams.
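As a concrete sketch of the feature-based approach described above, the following Python snippet maps character sequences to count vectors over a fixed bigram vocabulary; the sequences and vocabulary are invented for illustration.

from collections import Counter

def ngrams(sequence, n=2):
    """Return the list of character n-grams occurring in a sequence."""
    return [sequence[i:i + n] for i in range(len(sequence) - n + 1)]

def to_count_vector(sequence, vocabulary, n=2):
    """Represent a sequence as frequency counts over a fixed n-gram vocabulary."""
    counts = Counter(ngrams(sequence, n))
    return [counts[gram] for gram in vocabulary]

# Invented sequences and a bigram vocabulary built from them, for illustration.
sequences = ["abcabc", "aabbaa", "cabcab"]
vocabulary = sorted({gram for seq in sequences for gram in ngrams(seq)})

print("vocabulary:", vocabulary)
for seq in sequences:
    print(seq, "->", to_count_vector(seq, vocabulary))
# Each sequence becomes a fixed-length vector that a conventional classifier
# could take as input; using 0/1 instead of counts gives the binary variant.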


pages: 353 words: 106,704

Choked: Life and Breath in the Age of Air Pollution by Beth Gardiner

barriers to entry, Boris Johnson, call centre, carbon footprint, clean water, connected car, Crossrail, deindustrialization, Donald Trump, Elon Musk, epigenetics, Exxon Valdez, failed state, Hyperloop, index card, Indoor air pollution, Mahatma Gandhi, megacity, meta-analysis, rolling blackouts, Ronald Reagan, self-driving car, Silicon Valley, Skype, statistical model, Steve Jobs, TED Talk, white picket fence

Tiny airborne particles known as PM2.5, so small they are thought to enter the bloodstream and penetrate vital organs, including the brain, were a far more potent danger. Nitrogen dioxide, one of a family of gases known as NOx, also had a powerful effect. In fact, it poured out of cars, trucks, and ships in such close synchronicity with PM2.5 that even Jim Gauderman’s statistical models couldn’t disentangle the two pollutants’ effects. That wasn’t all. In what may have been their most worrisome discovery, the team found the pollutants were wreaking harm even at levels long assumed to be safe. In the years to come, the implications of that uncomfortable finding would be felt far beyond the pages of prestigious scientific journals

These numbers are everywhere: more than a million and a half annual air pollution deaths each for China and India.10 Approaching a half million in Europe.11 Upward of a hundred thousand in America.12 None are arrived at by counting individual cases; like Walton’s, they’re all derived through complex statistical modeling. Even if you tried, David Spiegelhalter says, it would be impossible to compile a body-by-body tabulation, since pollution—unlike, say, a heart attack or stroke—is not a cause of death in the medical sense. It’s more akin to smoking, obesity, or inactivity, all risk factors that can hasten a death or make it more likely, either alone or as one of several contributing factors.


pages: 407 words: 104,622

The Man Who Solved the Market: How Jim Simons Launched the Quant Revolution by Gregory Zuckerman

affirmative action, Affordable Care Act / Obamacare, Alan Greenspan, Albert Einstein, Andrew Wiles, automated trading system, backtesting, Bayesian statistics, Bear Stearns, beat the dealer, behavioural economics, Benoit Mandelbrot, Berlin Wall, Bernie Madoff, Black Monday: stock market crash in 1987, blockchain, book value, Brownian motion, butter production in bangladesh, buy and hold, buy low sell high, Cambridge Analytica, Carl Icahn, Claude Shannon: information theory, computer age, computerized trading, Credit Default Swap, Daniel Kahneman / Amos Tversky, data science, diversified portfolio, Donald Trump, Edward Thorp, Elon Musk, Emanuel Derman, endowment effect, financial engineering, Flash crash, George Gilder, Gordon Gekko, illegal immigration, index card, index fund, Isaac Newton, Jim Simons, John Meriwether, John Nash: game theory, John von Neumann, junk bonds, Loma Prieta earthquake, Long Term Capital Management, loss aversion, Louis Bachelier, mandelbrot fractal, margin call, Mark Zuckerberg, Michael Milken, Monty Hall problem, More Guns, Less Crime, Myron Scholes, Naomi Klein, natural language processing, Neil Armstrong, obamacare, off-the-grid, p-value, pattern recognition, Peter Thiel, Ponzi scheme, prediction markets, proprietary trading, quantitative hedge fund, quantitative trading / quantitative finance, random walk, Renaissance Technologies, Richard Thaler, Robert Mercer, Ronald Reagan, self-driving car, Sharpe ratio, Silicon Valley, sovereign wealth fund, speech recognition, statistical arbitrage, statistical model, Steve Bannon, Steve Jobs, stochastic process, the scientific method, Thomas Bayes, transaction costs, Turing machine, Two Sigma

You can’t make any money in mathematics,” he sneered. The experience taught Patterson to distrust most moneymaking operations, even those that appeared legitimate—one reason why he was so skeptical of Simons years later. After graduate school, Patterson thrived as a cryptologist for the British government, building statistical models to unscramble intercepted messages and encrypt secret messages in a unit made famous during World War II when Alan Turing famously broke Germany’s encryption codes. Patterson harnessed the simple-yet-profound Bayes’ theorem of probability, which argues that, by updating one’s initial beliefs with new, objective information, one can arrive at improved understandings.
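A minimal illustration of that Bayesian updating, with invented numbers, might look like this:

```python
# A minimal illustration of the Bayesian updating described above: an initial
# belief revised after one piece of evidence. All numbers are invented.

def bayes_update(prior, p_evidence_if_true, p_evidence_if_false):
    """Posterior probability of a hypothesis after observing the evidence."""
    numerator = prior * p_evidence_if_true
    denominator = numerator + (1.0 - prior) * p_evidence_if_false
    return numerator / denominator

# Prior belief that a candidate decryption is correct: 1%. The observed letter
# pattern is 30 times more likely if the guess is correct than if it is not.
posterior = bayes_update(prior=0.01, p_evidence_if_true=0.30, p_evidence_if_false=0.01)
print(round(posterior, 3))  # ~0.233: the belief strengthens but remains uncertain
```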

“Pie” is more likely to follow the word “apple” in a sentence than words like “him” or “the,” for example. Similar probabilities also exist for pronunciation, the IBM crew argued. Their goal was to feed their computers with enough data of recorded speech and written text to develop a probabilistic, statistical model capable of predicting likely word sequences based on sequences of sounds. Their computer code wouldn’t necessarily understand what it was transcribing, but it would learn to transcribe language, nonetheless. In mathematical terms, Brown, Mercer, and the rest of Jelinek’s team viewed sounds as the output of a sequence in which each step along the way is random, yet dependent on the previous step—a hidden Markov model.
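A toy sketch of just the word-sequence component they describe (leaving out the acoustics and the hidden Markov machinery) could look like the following; the corpus is invented.

```python
# A toy version of the word-sequence idea only (no acoustics, no hidden Markov
# model): estimate from counted text how likely each word is to follow "apple",
# so "pie" scores higher than "him". The corpus is invented.
from collections import Counter, defaultdict

corpus = ("apple pie is sweet . she gave him apple pie . "
          "apple juice and apple pie").split()

bigram_counts = defaultdict(Counter)
for prev, word in zip(corpus, corpus[1:]):
    bigram_counts[prev][word] += 1

def next_word_prob(prev, word):
    total = sum(bigram_counts[prev].values())
    return bigram_counts[prev][word] / total if total else 0.0

print(next_word_prob("apple", "pie"))   # 0.75 in this toy corpus
print(next_word_prob("apple", "him"))   # 0.0
```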


pages: 456 words: 185,658

More Guns, Less Crime: Understanding Crime and Gun-Control Laws by John R. Lott

affirmative action, Columbine, crack epidemic, Donald Trump, Edward Glaeser, G4S, gun show loophole, income per capita, More Guns, Less Crime, Sam Peltzman, selection bias, statistical model, the medium is the message, transaction costs

As to the concern that other changes in law enforcement may have been occurring at the same time, the estimates account for changes in other gun-control laws and changes in law enforcement as measured by arrest and conviction rates as well as by prison terms. No previous study of crime has attempted to control for as many different factors that might explain changes in the crime rate. 3 Did I assume that there was an immediate and constant effect from these laws and that the effect should be the same everywhere? The “statistical models assumed: (1) an immediate and constant effect of shall-issue laws, and (2) similar effects across different states and counties.” (Webster, “Claims,” p. 2; see also Dan Black and Daniel Nagin, “Do ‘Right-to-Carry’ Laws Deter Violent Crime?” Journal of Legal Studies 27 [January 1998], p. 213.) One of the central arguments both in the original paper and in this book is that the size of the deterrent effect is related to the number of permits issued, and it takes many years before states reach their long-run level of permits.

A major reason for the larger effect on crime in the more urban counties was that in rural areas, permit requests already were being approved; hence it was in urban areas that the number of permitted concealed handguns increased the most. A week later, in response to a column that I published in the Omaha World-Herald,20 Mr. Webster modified this claim somewhat: Lott claims that his analysis did not assume an immediate and constant effect, but that is contrary to his published article, in which the vast majority of the statistical models assume such an effect. (Daniel W. Webster, “Concealed-Gun Research Flawed,” Omaha World-Herald, March 12, 1997; emphasis added.) When one does research, it is most appropriate to take the simplest specifications first and then gradually make things more complicated. The simplest way of doing this is to examine the mean crime rates before and after the change in a law.
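A minimal sketch of that simplest specification, on invented data, might look like this; the dummy-variable regression below is equivalent to comparing the two means.

```python
# A minimal sketch of the "simplest specification" described above: compare mean
# crime rates before and after a law change, equivalently a dummy-variable
# regression. The data are invented; the real analyses control for many factors.
import numpy as np

years = np.arange(1977, 1993)
law_year = 1985
post = (years >= law_year).astype(float)       # 1 after the law, 0 before
crime_rate = np.array([620, 615, 630, 625, 640, 635, 645, 650,    # before
                       600, 590, 585, 575, 570, 565, 560, 555.0]) # after

print("mean before:", crime_rate[post == 0].mean())
print("mean after: ", crime_rate[post == 1].mean())

# Dummy regression crime = a + b * post; b equals the before/after difference.
b, a = np.polyfit(post, crime_rate, 1)
print("estimated before/after effect:", round(b, 1))
```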

While he includes a chapter that contains replies to his critics, unfortunately he doesn’t directly respond to the key Black and Nagin finding that formal statistical tests reject his methods. The closest he gets to addressing this point is to acknowledge “the more serious possibility is that some other factor may have caused both the reduction in crime rates and the passage of the law to occur at the same time,” but then goes on to say that he has “presented over a thousand [statistical model] specifications” that reveal “an extremely consistent pattern” that right-to-carry laws reduce crime. Another view would be that a thousand versions of a demonstrably invalid analytical approach produce boxes full of invalid results. (Jens Ludwig, “Guns and Numbers,” Washington Monthly, June 1998, p. 51)76 We applied a number of specification tests suggested by James J.


pages: 161 words: 39,526

Applied Artificial Intelligence: A Handbook for Business Leaders by Mariya Yao, Adelyn Zhou, Marlene Jia

Airbnb, algorithmic bias, AlphaGo, Amazon Web Services, artificial general intelligence, autonomous vehicles, backpropagation, business intelligence, business process, call centre, chief data officer, cognitive load, computer vision, conceptual framework, data science, deep learning, DeepMind, en.wikipedia.org, fake news, future of work, Geoffrey Hinton, industrial robot, information security, Internet of things, iterative process, Jeff Bezos, job automation, machine translation, Marc Andreessen, natural language processing, new economy, OpenAI, pattern recognition, performance metric, price discrimination, randomized controlled trial, recommendation engine, robotic process automation, Salesforce, self-driving car, sentiment analysis, Silicon Valley, single source of truth, skunkworks, software is eating the world, source of truth, sparse data, speech recognition, statistical model, strong AI, subscription business, technological singularity, The future is already here

“We’ve had a lot of success hiring from career fairs that Galvanize organizes, where we present the unique challenges our company tackles in healthcare,” he adds.(57) Experienced Scientists and Researchers Hiring experienced data scientists and machine learning researchers requires a different approach. For these positions, employers typically look for a doctorate or extensive experience in machine learning, statistical modeling, or related fields. You will usually source these talented recruits through strategic networking, academic conferences, or blatant poaching. To this end, you can partner with universities or research departments and sponsor conferences to build your brand reputation. You can also host competitions on Kaggle or similar platforms.


pages: 147 words: 39,910

The Great Mental Models: General Thinking Concepts by Shane Parrish

Albert Einstein, anti-fragile, Atul Gawande, Barry Marshall: ulcers, bitcoin, Black Swan, colonial rule, correlation coefficient, correlation does not imply causation, cuban missile crisis, Daniel Kahneman / Amos Tversky, dark matter, delayed gratification, feminist movement, Garrett Hardin, if you see hoof prints, think horses—not zebras, index fund, Isaac Newton, Jane Jacobs, John Bogle, Linda problem, mandelbrot fractal, Pepsi Challenge, Philippa Foot, Pierre-Simon Laplace, Ponzi scheme, Richard Feynman, statistical model, stem cell, The Death and Life of Great American Cities, the map is not the territory, the scientific method, Thomas Bayes, Torches of Freedom, Tragedy of the Commons, trolley problem

“It became possible also to map out master plans for the statistical city, and people take these more seriously, for we are all accustomed to believe that maps and reality are necessarily related, or that if they are not, we can make them so by altering reality.” 12 Jacobs’ book is, in part, a cautionary tale of what can happen when faith in the model influences the decisions we make in the territory. When we try to fit complexity into the simplification. _ Jacobs demonstrated that mapping the interaction between people and sidewalks was an important factor in determining how to improve city safety. «In general, when building statistical models, we must not forget that the aim is to understand something about the real world. Or predict, choose an action, make a decision, summarize evidence, and so on, but always about the real world, not an abstract mathematical world: our models are not the reality. » David Hand13 Conclusion Maps have long been a part of human society.


pages: 492 words: 118,882

The Blockchain Alternative: Rethinking Macroeconomic Policy and Economic Theory by Kariappa Bheemaiah

"World Economic Forum" Davos, accounting loophole / creative accounting, Ada Lovelace, Adam Curtis, Airbnb, Alan Greenspan, algorithmic trading, asset allocation, autonomous vehicles, balance sheet recession, bank run, banks create money, Basel III, basic income, behavioural economics, Ben Bernanke: helicopter money, bitcoin, Bletchley Park, blockchain, Bretton Woods, Brexit referendum, business cycle, business process, call centre, capital controls, Capital in the Twenty-First Century by Thomas Piketty, cashless society, cellular automata, central bank independence, Charles Babbage, Claude Shannon: information theory, cloud computing, cognitive dissonance, collateralized debt obligation, commoditize, complexity theory, constrained optimization, corporate governance, credit crunch, Credit Default Swap, credit default swaps / collateralized debt obligations, cross-border payments, crowdsourcing, cryptocurrency, data science, David Graeber, deep learning, deskilling, Diane Coyle, discrete time, disruptive innovation, distributed ledger, diversification, double entry bookkeeping, Ethereum, ethereum blockchain, fiat currency, financial engineering, financial innovation, financial intermediation, Flash crash, floating exchange rates, Fractional reserve banking, full employment, George Akerlof, Glass-Steagall Act, Higgs boson, illegal immigration, income inequality, income per capita, inflation targeting, information asymmetry, interest rate derivative, inventory management, invisible hand, John Maynard Keynes: technological unemployment, John von Neumann, joint-stock company, Joseph Schumpeter, junk bonds, Kenneth Arrow, Kenneth Rogoff, Kevin Kelly, knowledge economy, large denomination, Large Hadron Collider, Lewis Mumford, liquidity trap, London Whale, low interest rates, low skilled workers, M-Pesa, machine readable, Marc Andreessen, market bubble, market fundamentalism, Mexican peso crisis / tequila crisis, Michael Milken, MITM: man-in-the-middle, Money creation, money market fund, money: store of value / unit of account / medium of exchange, mortgage debt, natural language processing, Network effects, new economy, Nikolai Kondratiev, offshore financial centre, packet switching, Pareto efficiency, pattern recognition, peer-to-peer lending, Ponzi scheme, power law, precariat, pre–internet, price mechanism, price stability, private sector deleveraging, profit maximization, QR code, quantitative easing, quantitative trading / quantitative finance, Ray Kurzweil, Real Time Gross Settlement, rent control, rent-seeking, robo advisor, Satoshi Nakamoto, Satyajit Das, Savings and loan crisis, savings glut, seigniorage, seminal paper, Silicon Valley, Skype, smart contracts, software as a service, software is eating the world, speech recognition, statistical model, Stephen Hawking, Stuart Kauffman, supply-chain management, technology bubble, The Chicago School, The Future of Employment, The Great Moderation, the market place, The Nature of the Firm, the payments system, the scientific method, The Wealth of Nations by Adam Smith, Thomas Kuhn: the structure of scientific revolutions, too big to fail, trade liberalization, transaction costs, Turing machine, Turing test, universal basic income, Vitalik Buterin, Von Neumann architecture, Washington Consensus

Thus, owing to their fundamental role in monetary policy decision making, it is important to understand the history, abilities and limitations of these models. Currently, most central banks, such as the Federal Reserve and the ECB,13 use two kinds of models to study and build forecasts about the economy (Axtell and Farmer, 2015). The first, statistical models, fit current aggregate data of variables such as GDP, interest rates, and unemployment to empirical data in order to predict/suggest what the near future holds. The second type of models (which are more widely used) are known as “Dynamic Stochastic General Equilibrium” (DSGE) models. These models are constructed on the basis that the economy would be at rest (i.e., static equilibrium) if it wasn’t being randomly perturbed by events from outside the economy.
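A minimal sketch of the first kind of model, on invented GDP growth figures, might be a simple autoregression; actual central-bank models are far richer.

```python
# A minimal sketch of the first kind of model described above: fit a simple
# autoregression to an aggregate series and use it to suggest the next value.
# The GDP growth figures are invented.
import numpy as np

gdp_growth = np.array([2.1, 2.4, 1.9, 2.2, 2.6, 2.3, 1.8, 2.0, 2.5, 2.2])

x, y = gdp_growth[:-1], gdp_growth[1:]      # this quarter vs. the next quarter
slope, intercept = np.polyfit(x, y, 1)      # AR(1): y_t = a + b * y_{t-1}

forecast = intercept + slope * gdp_growth[-1]
print(f"next-quarter growth forecast: {forecast:.2f}%")
```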

Buiter The Precariat: The New Dangerous Class (2011), Guy Standing Inventing the Future: Postcapitalism and a World Without Work (2015), Nick Srnicek and Alex Williams Raising the Floor: How a Universal Basic Income Can Renew Our Economy and Rebuild the American Dream (2016), Andy Stern Index A Aadhaar program Agent Based Computational Economics (ABCE) models complexity economists developments El Farol problem and minority games Kim-Markowitz Portfolio Insurers Model Santa Fe artificial stock market model Agent based modelling (ABM) aggregate behavioural trends axiomatisation, linearization and generalization black-boxing bottom-up approach challenge computational modelling paradigm conceptualizing, individual agents EBM enacting agent interaction environmental factors environment creation individual agent parameters and modelling decisions simulation designing specifying agent behaviour Alaska Anti-Money Laundering (AML) ARPANet Artificial Neural Networks (ANN) Atlantic model Automatic Speech Recognition (ASR) Autor-Levy-Murnane (ALM) B Bandits’ Club BankID system Basic Income Earth Network (BIEN) Bitnation Blockchain ARPANet break down points decentralized communication emails fiat currency functions Jiggery Pokery accounts malware protocols Satoshi skeleton keys smart contract TCP/IP protocol technological and financial innovation trade finance Blockchain-based regulatory framework (BRF) BlockVerify C Capitalism ALM hypotheses and SBTC Blockchain and CoCo canonical model cashlessenvironment See(Multiple currencies) categories classification definition of de-skilling process economic hypothesis education and training levels EMN fiat currency CBDC commercial banks debt-based money digital cash digital monetary framework fractional banking system framework ideas and methods non-bank private sector sovereign digital currency transition fiscal policy cashless environment central bank concept of control spending definition of exogenous and endogenous function fractional banking system Kelton, Stephanie near-zero interest rates policy instrument QE and QQE tendency ultra-low inflation helicopter drops business insider ceteris paribus Chatbots Chicago Plan comparative charts fractional banking keywords technology UBI higher-skilled workers ICT technology industry categories Jiggery Pokery accounts advantages bias information Blockchain CFTC digital environment Enron scandal limitations private/self-regulation public function regulatory framework tech-led firms lending and payments CAMELS evaluation consumers and SMEs cryptographic laws fundamental limitations governments ILP KYB process lending sector mobile banking payments industry regulatory pressures rehypothecation ripple protocol sectors share leveraging effect technology marketing money cashless system crime and taxation economy IRS money Seigniorage tax evasion markets and regulation market structure multiple currency mechanisms occupational categories ONET database policies economic landscape financialization monetary and fiscal policy money creation methods The Chicago Plan transformation probabilities regulation routine and non-routine routinization hypothesis Sarbanes-Oxley Act SBTC scalability issue skill-biased employment skills and technological advancement skills downgrading process trades See(Trade finance) UBI Alaska deployment Mincome, Canada Namibia Cashless system Cellular automata (CA) Central bank digital currency (CBDC) Centre for Economic Policy Research (CEPR) Chicago Plan Clearing House Interbank Payments System 
(CHIPS) Collateralised Debt Obligations (CDOs) Collateralized Loan Obligations (CLOs) Complexity economics agent challenges consequential decisions deterministic and axiomatized models dynamics education emergence exogenous and endogenous changes feedback loops information affects agents macroeconoic movements network science non-linearity path dependence power laws self-adapting individual agents technology andinvention See(Technology and invention) Walrasian approach Computing Congressional Research Service (CRS) Constant absolute risk aversion (CARA) Contingent convertible (CoCo) Credit Default Swaps (CDSs) CredyCo Cryptid Cryptographic law Currency mechanisms Current Account Switching System (CASS) D Data analysis techniques Debt and money broad and base money China’s productivity credit economic pressures export-led growth fractional banking See also((Fractional Reserve banking) GDP growth households junk bonds long-lasting effects private and public sectors problems pubilc and private level reaganomics real estate industry ripple effects security and ownership societal level UK DigID Digital trade documents (DOCS) Dodd-Frank Act Dynamic Stochastic General Equilibrium (DSGE) model E EBM SeeEquation based modelling (EBM) Economic entropy vs. economic equilibrium assemblages and adaptations complexity economics complexity theory DSGE based models EMH human uncertainty principle’ LHC machine-like system operating neuroscience findings reflexivity RET risk assessment scientific method technology and economy Economic flexibility Efficient markets hypothesis (EMH) eID system Electronic Discrete Variable Automatic Computer (EDVAC) Elliptical curve cryptography (ECC) EMH SeeEfficient Market Hypothesis (EMH) Equation based modelling (EBM) Equilibrium business-cycle models Equilibrium economic models contract theory contact incompleteness efficiency wages explicit contracts implicit contracts intellectual framework labor market flexibility menu cost risk sharing DSGE models Federal Reserve system implicit contracts macroeconomic models of business cycle NK models non-optimizing households principles RBC models RET ‘rigidity’ of wage and price change SIGE steady state equilibrium, economy structure Taylor rule FRB/US model Keynesian macroeconomic theory RBC models Romer’s analysis tests statistical models Estonian government European Migration Network (EMN) Exogenous and endogenous function Explicit contracts F Feedback loop Fiat currency CBDC commercial banks debt-based money digital cash digital monetary framework framework ideas and methods non-bank private sector sovereign digital currency transition Financialization de facto definition of eastern economic association enemy of my enemy is my friend FT slogans Palley, Thomas I.


pages: 541 words: 109,698

Mining the Social Web: Finding Needles in the Social Haystack by Matthew A. Russell

Andy Rubin, business logic, Climategate, cloud computing, crowdsourcing, data science, en.wikipedia.org, fault tolerance, Firefox, folksonomy, full text search, Georg Cantor, Google Earth, information retrieval, machine readable, Mark Zuckerberg, natural language processing, NP-complete, power law, Saturday Night Live, semantic web, Silicon Valley, slashdot, social graph, social web, sparse data, statistical model, Steve Jobs, supply-chain management, text mining, traveling salesman, Turing test, web application

For example, what would the precision, recall, and F1 score have been if your algorithm had identified “Mr. Green”, “Colonel”, “Mustard”, and “candlestick”? As somewhat of an aside, you might find it interesting to know that many of the most compelling technology stacks used by commercial businesses in the NLP space use advanced statistical models to process natural language according to supervised learning algorithms. A supervised learning algorithm is essentially an approach in which you provide training samples of the form [(input1, output1), (input2, output2), ..., (inputN, outputN)] to a model such that the model is able to predict the tuples with reasonable accuracy.
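One way to work that question through, under assumptions of our own (the passage does not supply the answer key), is to posit that the true entities were “Mr. Green” and “Colonel Mustard” and that matches must be exact:

```python
# Both the ground truth and the exact-match scoring rule below are our own
# assumptions, made only to show the arithmetic.
predicted = {"Mr. Green", "Colonel", "Mustard", "candlestick"}
truth = {"Mr. Green", "Colonel Mustard"}

tp = len(predicted & truth)                  # 1: only "Mr. Green" matches exactly
precision = tp / len(predicted)              # 1/4 = 0.25
recall = tp / len(truth)                     # 1/2 = 0.50
f1 = 2 * precision * recall / (precision + recall)

print(precision, recall, round(f1, 3))       # 0.25 0.5 0.333
```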

SocialGraph Node Mapper, Brief analysis of breadth-first techniques sorting, Sensible Sorting, Sorting Documents by Value documents by value, Sorting Documents by Value documents in CouchDB, Sensible Sorting split method, using to tokenize text, Data Hacking with NLTK, Before You Go Off and Try to Build a Search Engine… spreadsheets, visualizing Facebook network data, Visualizing with spreadsheets (the old-fashioned way) statistical models processing natural language, Quality of Analytics stemming verbs, Querying Buzz Data with TF-IDF stopwords, Data Hacking with NLTK, Analysis of Luhn’s Summarization Algorithm downloading NLTK stopword data, Data Hacking with NLTK filtering out before document summarization, Analysis of Luhn’s Summarization Algorithm streaming API (Twitter), Analyzing Tweets (One Entity at a Time) Strong Links API, The Infochimps “Strong Links” API, Interactive 3D Graph Visualization student’s t-score, How the Collocation Sausage Is Made: Contingency Tables and Scoring Functions subject-verb-object triples, Entity-Centric Analysis: A Deeper Understanding of the Data, Man Cannot Live on Facts Alone summarizing documents, Summarizing Documents, Analysis of Luhn’s Summarization Algorithm, Summarizing Documents, Analysis of Luhn’s Summarization Algorithm analysis of Luhn’s algorithm, Analysis of Luhn’s Summarization Algorithm Tim O’Reilly Radar blog post (example), Summarizing Documents summingReducer function, Frequency by date/time range, What entities are in Tim’s tweets?


pages: 302 words: 82,233

Beautiful security by Andy Oram, John Viega

Albert Einstein, Amazon Web Services, An Inconvenient Truth, Bletchley Park, business intelligence, business process, call centre, cloud computing, corporate governance, credit crunch, crowdsourcing, defense in depth, do well by doing good, Donald Davies, en.wikipedia.org, fault tolerance, Firefox, information security, loose coupling, Marc Andreessen, market design, MITM: man-in-the-middle, Monroe Doctrine, new economy, Nicholas Carr, Nick Leeson, Norbert Wiener, operational security, optical character recognition, packet switching, peer-to-peer, performance metric, pirate software, Robert Bork, Search for Extraterrestrial Intelligence, security theater, SETI@home, Silicon Valley, Skype, software as a service, SQL injection, statistical model, Steven Levy, the long tail, The Wisdom of Crowds, Upton Sinclair, web application, web of trust, zero day, Zimmermann PGP

Ashenfelter is a statistician at Princeton who loves wine but is perplexed by the pomp and circumstance around valuing and rating wine in much the same way I am perplexed by the pomp and circumstance surrounding risk management today. In the 1980s, wine critics dominated the market with predictions based on their own reputations, palate, and frankly very little more. Ashenfelter, in contrast, studied the Bordeaux region of France and developed a statistical model about the quality of wine. His model was based on the average rainfall in the winter before the growing season (the rain that makes the grapes plump) and the average sunshine during the growing season (the rays that make the grapes ripe), resulting in a simple formula: quality = 12.145 + (0.00117 * winter rainfall) + (0.0614 * average growing season temperature) − (0.00386 * harvest rainfall). Of course he was chastised and lampooned by the stuffy wine critics who dominated the industry, but after several years of producing valuable results, his methods are now widely accepted as providing important valuation criteria for wine.
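Plugging hypothetical inputs into the formula as quoted (the rainfall and temperature values and their units below are invented) shows how a warm, dry-harvest vintage scores higher:

```python
# Evaluating the formula as quoted above. The rainfall and temperature inputs
# (and their units) are hypothetical; only the coefficients come from the passage.
def wine_quality(winter_rainfall, growing_season_temp, harvest_rainfall):
    return (12.145
            + 0.00117 * winter_rainfall
            + 0.0614 * growing_season_temp
            - 0.00386 * harvest_rainfall)

# A warm vintage with a dry harvest scores higher than a cooler, wetter one.
print(wine_quality(600, 17.5, 100))   # ≈ 13.54
print(wine_quality(600, 16.0, 300))   # ≈ 12.67
```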

I believe the right elements are really coming together where technology can create better technology. Advances in technology have been used to both arm and disarm the planet, to empower and oppress populations, and to attack and defend the global community and all it will have become. The areas I’ve pulled together in this chapter—from business process management, number crunching and statistical modeling, visualization, and long-tail technology—provide fertile ground for security management systems in the future that archive today’s best efforts in the annals of history. At least I hope so, for I hate mediocrity with a passion and I think security management systems today are mediocre at best!


pages: 404 words: 43,442

The Art of R Programming by Norman Matloff

data science, Debian, discrete time, Donald Knuth, functional programming, general-purpose programming language, linked data, sorting algorithm, statistical model

This approach is used in the loop beginning at line 53. (Arguably, in this case, the increase in speed comes at the expense of readability of the code.) 9.1.7 Extended Example: A Procedure for Polynomial Regression As another example, consider a statistical regression setting with one predictor variable. Since any statistical model is merely an approximation, in principle, you can get better and better models by fitting polynomials of higher and higher degrees. However, at some point, this becomes overfitting, so that the prediction of new, future data actually deteriorates for degrees higher than some value. The class "polyreg" aims to deal with this issue.
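A small illustration of that overfitting point, in Python rather than the book's R and on simulated data, might look like this:

```python
# Higher-degree polynomials fit the training data ever more closely, but beyond
# some degree prediction on held-out data typically worsens. Data are simulated.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 40)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=x.size)

idx = rng.permutation(x.size)
train, test = idx[:30], idx[30:]

for degree in (1, 3, 6, 10):
    coefs = np.polyfit(x[train], y[train], degree)
    test_mse = np.mean((np.polyval(coefs, x[test]) - y[test]) ** 2)
    print(f"degree {degree:2d}: held-out MSE = {test_mse:.3f}")
```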

We’ll create a function called extractpums() to read in a PUMS file and create a data frame from its Person records. The user specifies the filename and lists fields to extract and names to assign to those fields. We also want to retain the household serial number. This is good to have because data for persons in the same household may be correlated and we may want to add that aspect to our statistical model. Also, the household data may provide important covariates. (In the latter case, we would want to retain the covariate data as well.) Before looking at the function code, let’s see what the function does. In this data set, gender is in column 23 and age in columns 25 and 26. In the example, our filename is pumsa.
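A small sketch of the same field extraction, again in Python rather than the book's R, with a fabricated record padded so the stated columns line up:

```python
# Gender in column 23 and age in columns 25-26 of a Person record, as the text
# describes. The sample record is fabricated; only the positions come from the text.
def extract_person_fields(line):
    gender = line[22]        # column 23, counting from 1 as the text does
    age = int(line[24:26])   # columns 25 and 26
    return gender, age

record = "P".ljust(22, "0") + "M" + "0" + "47"   # padded so fields land correctly
print(extract_person_fields(record))              # ('M', 47)
```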


pages: 370 words: 112,809

The Equality Machine: Harnessing Digital Technology for a Brighter, More Inclusive Future by Orly Lobel

2021 United States Capitol attack, 23andMe, Ada Lovelace, affirmative action, Airbnb, airport security, Albert Einstein, algorithmic bias, Amazon Mechanical Turk, augmented reality, barriers to entry, basic income, Big Tech, bioinformatics, Black Lives Matter, Boston Dynamics, Charles Babbage, choice architecture, computer vision, Computing Machinery and Intelligence, contact tracing, coronavirus, corporate social responsibility, correlation does not imply causation, COVID-19, crowdsourcing, data science, David Attenborough, David Heinemeier Hansson, deep learning, deepfake, digital divide, digital map, Elon Musk, emotional labour, equal pay for equal work, feminist movement, Filter Bubble, game design, gender pay gap, George Floyd, gig economy, glass ceiling, global pandemic, Google Chrome, Grace Hopper, income inequality, index fund, information asymmetry, Internet of things, invisible hand, it's over 9,000, iterative process, job automation, Lao Tzu, large language model, lockdown, machine readable, machine translation, Mark Zuckerberg, market bubble, microaggression, Moneyball by Michael Lewis explains big data, natural language processing, Netflix Prize, Network effects, Northpointe / Correctional Offender Management Profiling for Alternative Sanctions, occupational segregation, old-boy network, OpenAI, openstreetmap, paperclip maximiser, pattern recognition, performance metric, personalized medicine, price discrimination, publish or perish, QR code, randomized controlled trial, remote working, risk tolerance, robot derives from the Czech word robota Czech, meaning slave, Ronald Coase, Salesforce, self-driving car, sharing economy, Sheryl Sandberg, Silicon Valley, social distancing, social intelligence, speech recognition, statistical model, stem cell, Stephen Hawking, Steve Jobs, Steve Wozniak, surveillance capitalism, tech worker, TechCrunch disrupt, The Future of Employment, TikTok, Turing test, universal basic income, Wall-E, warehouse automation, women in the workforce, work culture , you are the product

And it can lower costs, increase the size of the pie, and accelerate the pace of progress. Malice or Competence: What We Fear For all the talk about the possibilities of AI and robotics, we’re really only at the embryonic stage of our grand machine-human integration. And AI means different things in different conversations. The most common use refers to machine learning—using statistical models to analyze large quantities of data. The next step from basic machine learning, referred to as deep learning, uses a multilayered architecture of networks, making connections and modeling patterns across data sets. AI can be understood as any machine—defined for our purposes as hardware running digital software—that mimics human behavior (i.e., human reactions).

We check boxes and upload images, and the algorithm learns how to direct us toward a successful connection. Online, we seem to be reduced to a menu of preselected choices. Despite Tinder’s recent announcement about forgoing automated scoring that takes ethnicity and socioeconomic status into account, many dating algorithms still use statistical models that allow them to classify users according to gender, race, sexuality, and other markers. At the same time, we can redefine our communities, seek love outside of our regular circles, and to some extent test the plasticity of our online identity beyond the rigid confines of the physical world.


pages: 133 words: 42,254

Big Data Analytics: Turning Big Data Into Big Money by Frank J. Ohlhorst

algorithmic trading, bioinformatics, business intelligence, business logic, business process, call centre, cloud computing, create, read, update, delete, data acquisition, data science, DevOps, extractivism, fault tolerance, information security, Large Hadron Collider, linked data, machine readable, natural language processing, Network effects, pattern recognition, performance metric, personalized medicine, RFID, sentiment analysis, six sigma, smart meter, statistical model, supply-chain management, warehouse automation, Watson beat the top human players on Jeopardy!, web application

CHALLENGES REMAIN Locating the right talent to analyze data is the biggest hurdle in building a team. Such talent is in high demand, and the need for data analysts and data scientists continues to grow at an almost exponential rate. Finding this talent means that organizations will have to focus on data science and hire statistical modelers and text data–mining professionals as well as people who specialize in sentiment analysis. Success with Big Data analytics requires solid data models, statistical predictive models, and text analytic models, since these will be the core applications needed to do Big Data. Locating the appropriate talent takes more than just a typical IT job placement; the skills required for a good return on investment are not simple and are not solely technology oriented.


pages: 428 words: 121,717

Warnings by Richard A. Clarke

"Hurricane Katrina" Superdome, active measures, Albert Einstein, algorithmic trading, anti-communist, artificial general intelligence, Asilomar, Asilomar Conference on Recombinant DNA, Bear Stearns, behavioural economics, Bernie Madoff, Black Monday: stock market crash in 1987, carbon tax, cognitive bias, collateralized debt obligation, complexity theory, corporate governance, CRISPR, cuban missile crisis, data acquisition, deep learning, DeepMind, discovery of penicillin, double helix, Elon Musk, failed state, financial thriller, fixed income, Flash crash, forensic accounting, friendly AI, Hacker News, Intergovernmental Panel on Climate Change (IPCC), Internet of things, James Watt: steam engine, Jeff Bezos, John Maynard Keynes: Economic Possibilities for our Grandchildren, knowledge worker, Maui Hawaii, megacity, Mikhail Gorbachev, money market fund, mouse model, Nate Silver, new economy, Nicholas Carr, Nick Bostrom, nuclear winter, OpenAI, pattern recognition, personalized medicine, phenotype, Ponzi scheme, Ray Kurzweil, Recombinant DNA, Richard Feynman, Richard Feynman: Challenger O-ring, risk tolerance, Ronald Reagan, Sam Altman, Search for Extraterrestrial Intelligence, self-driving car, Silicon Valley, smart grid, statistical model, Stephen Hawking, Stuxnet, subprime mortgage crisis, tacit knowledge, technological singularity, The Future of Employment, the scientific method, The Signal and the Noise by Nate Silver, Tunguska event, uranium enrichment, Vernor Vinge, WarGames: Global Thermonuclear War, Watson beat the top human players on Jeopardy!, women in the workforce, Y2K

The deeper they dig, the harder it gets to climb out and see what is happening outside, and the more tempting it becomes to keep on doing what they know how to do . . . uncovering new reasons why their initial inclination, usually too optimistic or pessimistic, was right.” Still, maddeningly, even the foxes, considered as a group, were only ever able to approximate the accuracy of simple statistical models that extrapolated trends. They did perform somewhat better than undergraduates subjected to the same exercises, and they outperformed the proverbial “chimp with a dart board,” but they didn’t come close to the predictive accuracy of formal statistical models. Later books have looked at Tetlock’s foundational results in some additional detail. Dan Gardner’s 2012 Future Babble draws on recent research in psychology, neuroscience, and behavioral economics to detail the biases and other cognitive processes that skew our judgment when we try to make predictions about the future.
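A minimal sketch of such a trend-extrapolating benchmark, on an invented series:

```python
# Fit a straight line to recent history and project it forward; this is the kind
# of simple statistical benchmark the forecasters were compared against.
import numpy as np

years = np.arange(2000, 2010)
indicator = np.array([3.1, 3.3, 3.2, 3.5, 3.6, 3.8, 3.7, 4.0, 4.1, 4.2])

slope, intercept = np.polyfit(years, indicator, 1)
print("extrapolated 2011 value:", round(intercept + slope * 2011, 2))
```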


pages: 1,164 words: 309,327

Trading and Exchanges: Market Microstructure for Practitioners by Larry Harris

active measures, Andrei Shleifer, AOL-Time Warner, asset allocation, automated trading system, barriers to entry, Bernie Madoff, Bob Litterman, book value, business cycle, buttonwood tree, buy and hold, compound rate of return, computerized trading, corporate governance, correlation coefficient, data acquisition, diversified portfolio, equity risk premium, fault tolerance, financial engineering, financial innovation, financial intermediation, fixed income, floating exchange rates, High speed trading, index arbitrage, index fund, information asymmetry, information retrieval, information security, interest rate swap, invention of the telegraph, job automation, junk bonds, law of one price, London Interbank Offered Rate, Long Term Capital Management, margin call, market bubble, market clearing, market design, market fragmentation, market friction, market microstructure, money market fund, Myron Scholes, National best bid and offer, Nick Leeson, open economy, passive investing, pattern recognition, payment for order flow, Ponzi scheme, post-materialism, price discovery process, price discrimination, principal–agent problem, profit motive, proprietary trading, race to the bottom, random walk, Reminiscences of a Stock Operator, rent-seeking, risk free rate, risk tolerance, risk-adjusted returns, search costs, selection bias, shareholder value, short selling, short squeeze, Small Order Execution System, speech recognition, statistical arbitrage, statistical model, survivorship bias, the market place, transaction costs, two-sided market, vertical integration, winner-take-all economy, yield curve, zero-coupon bond, zero-sum game

Arbitrageurs generally should be reluctant to trade against markets that quickly and efficiently aggregate new information because the prices in such markets tend to accurately reflect fundamental values. 17.3.2.3 Statistical Arbitrage Statistical arbitrageurs use factor models to generalize the pairs trading strategy to many instruments. Factor models are statistical models that represent instrument returns by a weighted sum of common factors plus an instrument-specific factor. The weights, called factor loadings, are unique for each instrument. The arbitrageur must estimate them. Either statistical arbitrageurs specify the factors, or they use statistical methods to identify the factors from returns data for many instruments.
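A minimal sketch of that decomposition, with invented loadings and factor returns:

```python
# Each instrument's return is its factor loadings times the common factor returns
# plus an instrument-specific component. All numbers are invented.
import numpy as np

factor_returns = np.array([0.010, -0.004])     # e.g. a market factor and a value factor
loadings = np.array([[1.1,  0.3],              # instrument A
                     [0.9, -0.5],              # instrument B
                     [1.0,  0.0]])             # instrument C
specific = np.array([0.002, -0.001, 0.000])    # idiosyncratic pieces

returns = loadings @ factor_returns + specific
print(returns)                                  # modeled returns for A, B, C
```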

The variance of a set of price changes is the average squared difference between the price change and the average price change. The standard deviation is the square root of the variance. The mean absolute deviation is the average absolute difference between the price change and the average price change. Statistical models are necessary to identify and estimate the two components of total volatility. These models exploit the primary distinguishing characteristics of the two types of volatility: Fundamental volatility consists of seemingly random price changes that do not revert, whereas transitory volatility consists of price changes that ultimately revert.
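Those three dispersion measures, computed for an invented set of daily price changes:

```python
# Variance, standard deviation, and mean absolute deviation as defined above.
import numpy as np

price_changes = np.array([0.5, -0.2, 0.1, -0.4, 0.3, 0.0, -0.1, 0.2])
deviations = price_changes - price_changes.mean()

variance = np.mean(deviations ** 2)
std_dev = np.sqrt(variance)
mean_abs_dev = np.mean(np.abs(deviations))

print(variance, std_dev, mean_abs_dev)
```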

Roll showed that we can estimate the latter term from the expected serial covariance. It is Cov(Δp_t, Δp_{t−1}) = −s²/4, where s is the bid/ask spread. Inverting this expression gives s = 2 √(−Cov(Δp_t, Δp_{t−1})). Roll’s serial covariance spread estimator substitutes the sample serial covariance for the expected serial covariance in this last expression. The simplest statistical model that can estimate these variance components is Roll’s serial covariance spread estimator model. Roll analyzed this simple model to create a simple serial covariance estimator of bid/ask spreads. The model assumes that fundamental values follow a random walk, and that observed prices are equal to fundamental value plus or minus half of the bid/ask spread.
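A minimal sketch of the estimator on simulated trade prices (constant fundamental value, so all movement is bid/ask bounce):

```python
# Infer the spread as s = 2 * sqrt(-cov) from the sample serial covariance of
# successive price changes. The trades are simulated, not real data.
import numpy as np

rng = np.random.default_rng(1)
true_spread = 0.05
mid = 20.0
sides = rng.choice([-1.0, 1.0], size=2000)     # buyer- vs. seller-initiated trades
prices = mid + sides * true_spread / 2

dp = np.diff(prices)
serial_cov = np.cov(dp[:-1], dp[1:])[0, 1]      # expected to be negative here
spread_estimate = 2 * np.sqrt(-serial_cov)
print(round(spread_estimate, 3))                # close to the true 0.05 spread
```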


Syntactic Structures by Noam Chomsky

finite state, language acquisition, P = NP, statistical model

We shall see, in fact, in § 7, that there are deep structural reasons for distinguishing (3) and (4) from (5) and (6); but before we are able to find an explanation for such facts as these we shall have to carry the theory of syntactic structure a good deal beyond its familiar limits. 2.4 Third, the notion "grammatical in English" cannot be identified in any way with the notion "high order of statistical approximation to English." It is fair to assume that neither sentence (1) nor (2) (nor indeed any part of these sentences) has ever occurred in an English discourse. Hence, in any statistical model for grammaticalness, these sentences will be ruled out on identical grounds as equally 'remote' from English. Yet (1), though nonsensical, is grammatical, while (2) is not. Presented with these sentences, a speaker of English will read (1) with a normal sentence intonation, but he will read (2) with a falling intonation on each word; in fact, with just the intonation pattern given to any sequence of unrelated words.


pages: 199 words: 47,154

Gnuplot Cookbook by Lee Phillips

bioinformatics, computer vision, functional programming, general-purpose programming language, pattern recognition, statistical model, web application

These new features include the use of Unicode characters, transparency, new graph positioning commands, plotting objects, internationalization, circle plots, interactive HTML5 canvas plotting, iteration in scripts, lua/tikz/LaTeX integration, cairo and SVG terminal drivers, and volatile data. What this book covers Chapter 1, Plotting Curves, Boxes, Points, and more, covers the basic usage of Gnuplot: how to make all kinds of 2D plots for statistics, modeling, finance, science, and more. Chapter 2, Annotating with Labels and Legends, explains how to add labels, arrows, and mathematical text to our plots. Chapter 3, Applying Colors and Styles, covers the basics of colors and styles in gnuplot, plus transparency, and plotting with points and objects.


pages: 444 words: 138,781

Evicted: Poverty and Profit in the American City by Matthew Desmond

affirmative action, Cass Sunstein, crack epidemic, Credit Default Swap, deindustrialization, desegregation, dumpster diving, ending welfare as we know it, fixed income, food desert, gentrification, ghettoisation, glass ceiling, Gunnar Myrdal, housing crisis, housing justice, informal economy, Jane Jacobs, jobless men, Kickstarter, late fees, Lewis Mumford, mass incarceration, New Urbanism, payday loans, price discrimination, profit motive, rent control, statistical model, superstar cities, The Chicago School, The Death and Life of Great American Cities, thinkpad, upwardly mobile, working poor, young professional

With Jonathan Mijs, I combined all eviction court records between January 17 and February 26, 2011 (the Milwaukee Eviction Court Study period) with information about aspects of tenants’ neighborhoods, procured after geocoding the addresses that appeared in the eviction records. Working with the Harvard Center for Geographic Analysis, I also calculated the distance (in drive miles and time) between tenants’ addresses and the courthouse. Then I constructed a statistical model that attempted to explain the likelihood of a tenant appearing in court based on aspects of that tenant’s case and her or his neighborhood. The model generated only null findings. How much a tenant owed a landlord, her commute time to the courthouse, her gender—none of these factors were significantly related to appearing in court.

All else equal, a 1 percent increase in the percentage of children in a neighborhood is predicted to increase a neighborhood’s evictions by almost 7 percent. These estimates are based on court-ordered eviction records that took place in Milwaukee County between January 1, 2010, and December 31, 2010. The statistical model evaluating the association between a neighborhood’s percentage of children and its number of evictions is a zero-inflated Poisson regression, which is described in detail in Matthew Desmond et al., “Evicting Children,” Social Forces 92 (2013): 303–27. 3. That misery could stick around. At least two years after their eviction, mothers like Arleen still experienced significantly higher rates of depression than their peers.
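A minimal sketch of this kind of count regression, using an ordinary Poisson GLM as a simplified stand-in for the study's zero-inflated Poisson, on simulated neighborhood data:

```python
# An ordinary Poisson GLM as a simplified stand-in for the study's zero-inflated
# Poisson. The neighborhood data are simulated, with the slope chosen so the toy
# result echoes the roughly 7 percent figure quoted above.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
pct_children = rng.uniform(5, 45, size=200)            # percent children, invented
evictions = rng.poisson(np.exp(0.5 + 0.065 * pct_children))

X = sm.add_constant(pct_children)
fit = sm.GLM(evictions, X, family=sm.families.Poisson()).fit()

coef = fit.params[1]
print(f"each 1-point rise in % children -> {100 * (np.exp(coef) - 1):.1f}% more evictions")
```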


pages: 624 words: 127,987

The Personal MBA: A World-Class Business Education in a Single Volume by Josh Kaufman

Albert Einstein, Alvin Toffler, Atul Gawande, Black Swan, Blue Ocean Strategy, business cycle, business process, buy low sell high, capital asset pricing model, Checklist Manifesto, cognitive bias, correlation does not imply causation, Credit Default Swap, Daniel Kahneman / Amos Tversky, David Heinemeier Hansson, David Ricardo: comparative advantage, Dean Kamen, delayed gratification, discounted cash flows, Donald Knuth, double entry bookkeeping, Douglas Hofstadter, Dunning–Kruger effect, en.wikipedia.org, Frederick Winslow Taylor, George Santayana, Gödel, Escher, Bach, high net worth, hindsight bias, index card, inventory management, iterative process, job satisfaction, Johann Wolfgang von Goethe, Kaizen: continuous improvement, Kevin Kelly, Kickstarter, Lao Tzu, lateral thinking, loose coupling, loss aversion, Marc Andreessen, market bubble, Network effects, Parkinson's law, Paul Buchheit, Paul Graham, place-making, premature optimization, Ralph Waldo Emerson, rent control, scientific management, side project, statistical model, stealth mode startup, Steve Jobs, Steve Wozniak, subscription business, systems thinking, telemarketer, the scientific method, time value of money, Toyota Production System, tulip mania, Upton Sinclair, Vilfredo Pareto, Walter Mischel, Y Combinator, Yogi Berra

MBA programs teach many worthless, outdated, even outright damaging concepts and practices—assuming your goal is to actually build a successful business and increase your net worth. Many of my MBA-holding readers and clients come to me after spending tens (sometimes hundreds) of thousands of dollars learning the ins and outs of complex financial formulas and statistical models, only to realize that their MBA program didn’t teach them how to start or improve a real, operating business. That’s a problem—graduating from business school does not guarantee having a useful working knowledge of business when you’re done, which is what you actually need to be successful. 3.

Over time, managers and executives began using statistics and analysis to forecast the future, relying on databases and spreadsheets in much the same way ancient seers relied on tea leaves and goat entrails. The world itself is no less unpredictable or uncertain: as in the olden days, the signs only “prove” the biases and desires of the soothsayer. The complexity of financial transactions and the statistical models those transactions relied upon continued to grow until few practitioners fully understood how they worked or respected their limits. As Wired revealed in a February 2009 article, “Recipe for Disaster: The Formula That Killed Wall Street,” the inherent limitations of deified financial formulas such as the Black-Scholes option pricing model, the Gaussian copula function, and the capital asset pricing model (CAPM) played a major role in the tech bubble of 2000 and the housing market and derivatives shenanigans behind the 2008 recession.


pages: 504 words: 139,137

Efficiently Inefficient: How Smart Money Invests and Market Prices Are Determined by Lasse Heje Pedersen

activist fund / activist shareholder / activist investor, Alan Greenspan, algorithmic trading, Andrei Shleifer, asset allocation, backtesting, bank run, banking crisis, barriers to entry, Bear Stearns, behavioural economics, Black-Scholes formula, book value, Brownian motion, business cycle, buy and hold, buy low sell high, buy the rumour, sell the news, capital asset pricing model, commodity trading advisor, conceptual framework, corporate governance, credit crunch, Credit Default Swap, currency peg, currency risk, David Ricardo: comparative advantage, declining real wages, discounted cash flows, diversification, diversified portfolio, Emanuel Derman, equity premium, equity risk premium, Eugene Fama: efficient market hypothesis, financial engineering, fixed income, Flash crash, floating exchange rates, frictionless, frictionless market, global macro, Gordon Gekko, implied volatility, index arbitrage, index fund, interest rate swap, junk bonds, late capitalism, law of one price, Long Term Capital Management, low interest rates, managed futures, margin call, market clearing, market design, market friction, Market Wizards by Jack D. Schwager, merger arbitrage, money market fund, mortgage debt, Myron Scholes, New Journalism, paper trading, passive investing, Phillips curve, price discovery process, price stability, proprietary trading, purchasing power parity, quantitative easing, quantitative trading / quantitative finance, random walk, Reminiscences of a Stock Operator, Renaissance Technologies, Richard Thaler, risk free rate, risk-adjusted returns, risk/return, Robert Shiller, selection bias, shareholder value, Sharpe ratio, short selling, short squeeze, SoftBank, sovereign wealth fund, statistical arbitrage, statistical model, stocks for the long run, stocks for the long term, survivorship bias, systematic trading, tail risk, technology bubble, time dilation, time value of money, total factor productivity, transaction costs, two and twenty, value at risk, Vanguard fund, yield curve, zero-coupon bond

For instance, volatility does not capture well the risk of selling out-of-the-money options, a strategy with small positive returns on most days but infrequent large crashes. To compute the volatility of a large portfolio, hedge funds need to account for correlations across assets, which can be accomplished by simulating the overall portfolio or by using a statistical model such as a factor model. Another measure of risk is value-at-risk (VaR), which attempts to capture tail risk (non-normality). The VaR measures the maximum loss with a certain confidence, as seen in figure 4.1 below. For example, the VaR is the most that you can lose with a 95% or 99% confidence.
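A minimal sketch of both measures, with invented weights, volatilities, and correlations, and assuming normally distributed, zero-mean returns for the VaR:

```python
# Portfolio volatility from a covariance matrix (so correlations are accounted
# for) and a parametric 95% VaR under a normality assumption. Inputs are invented.
import numpy as np

weights = np.array([0.5, 0.3, 0.2])
vols = np.array([0.20, 0.15, 0.10])              # individual asset volatilities
corr = np.array([[1.0, 0.3, 0.1],
                 [0.3, 1.0, 0.4],
                 [0.1, 0.4, 1.0]])
cov = np.outer(vols, vols) * corr

port_vol = np.sqrt(weights @ cov @ weights)
var_95 = 1.645 * port_vol                        # loss not exceeded with 95% confidence
print(round(port_vol, 4), round(var_95, 4))
```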

The intermediary makes money when the wave subsides. Then the flows and equilibrium pricing are in the same direction. LHP: Or you might even short at a nickel cheap? MS: You might. Trend following is based on understanding macro developments and what governments are doing. Or they are based on statistical models of price movements. A positive up price tends to result in a positive up price. Here, however, it is not possible to determine whether the trend will continue. LHP: Why do spreads tend to widen during some periods of stress? MS: Well, capital becomes more scarce, both physical capital and human capital, in the sense that there isn’t enough time for intermediaries to understand what is happening in chaotic times.


pages: 480 words: 138,041

The Book of Woe: The DSM and the Unmaking of Psychiatry by Gary Greenberg

addicted to oil, Albert Einstein, Asperger Syndrome, autism spectrum disorder, back-to-the-land, David Brooks, Edward Jenner, impulse control, invisible hand, Isaac Newton, John Snow's cholera map, Kickstarter, late capitalism, longitudinal study, Louis Pasteur, McMansion, meta-analysis, neurotypical, phenotype, placebo effect, random walk, selection bias, statistical model, theory of mind, Winter of Discontent

If he was going to revise the DSM, Frances told Pincus, then his goal would be stabilizing the system rather than trying to perfect it—or, as he put it to me, “loving the pet, even if it is a mutt5.” Frances thought there was a way to protect the system from both instability and pontificating: meta-analysis, a statistical method that, thanks to advances in computer technology and statistical modeling, had recently allowed statisticians to compile results from large numbers of studies by combining disparate data into common terms. The result was a statistical synthesis by which many different research projects could be treated as one large study. “We needed something that would leave it up to the tables rather than the people,” he told me, and meta-analysis was perfect for the job.
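A minimal sketch of one standard way of combining studies into one large study, a fixed-effect, inverse-variance-weighted pooling on invented study results (the task force's actual methods may have differed in detail):

```python
# Fixed-effect meta-analysis: pool effect sizes weighted by the inverse of their
# variances. The three studies are invented.
import numpy as np

effects = np.array([0.30, 0.45, 0.25])       # each study's estimated effect size
variances = np.array([0.04, 0.09, 0.02])     # each study's sampling variance

weights = 1.0 / variances
pooled_effect = np.sum(weights * effects) / np.sum(weights)
pooled_se = np.sqrt(1.0 / np.sum(weights))

print(round(pooled_effect, 3), round(pooled_se, 3))
```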

Kraemer seemed to be saying that the point wasn’t to sift through the wreckage and try to prevent another catastrophe but, evidently, to crash the plane and then announce that the destruction could have been a lot worse. To be honest, however, I wasn’t sure. She was not making all that much sense, or maybe I just didn’t grasp the complexities of statistical modeling. And besides, I was distracted by a memory of something Steve Hyman once wrote. Fixing the DSM, finding another paradigm, getting away from its reifications—this, he said, was like “repairing a plane while it is flying.” It was a suggestive analogy, I thought at the time, one that recognized the near impossibility of the task even as it indicated its high stakes—and the necessity of keeping the mechanics from swearing and banging too loudly, lest the passengers start asking for a quick landing and a voucher on another airline.


pages: 186 words: 49,251

The Automatic Customer: Creating a Subscription Business in Any Industry by John Warrillow

Airbnb, airport security, Amazon Web Services, asset allocation, barriers to entry, call centre, cloud computing, commoditize, David Heinemeier Hansson, discounted cash flows, Hacker Conference 1984, high net worth, Jeff Bezos, Network effects, passive income, rolodex, Salesforce, sharing economy, side project, Silicon Valley, Silicon Valley startup, software as a service, statistical model, Steve Jobs, Stewart Brand, subscription business, telemarketer, the long tail, time value of money, zero-sum game, Zipcar

You have taken on a risk in guaranteeing your customer’s roof replacement and need to be paid for placing that bet. The repair job could have cost you $3,000, and then you would have taken an underwriting loss of $1,800 ($1,200−$3,000). Calculating your risk is the primary challenge of running a peace-of-mind model company. Big insurance companies employ an army of actuaries who use statistical models to predict the likelihood of a claim being made. You don’t need to be quite so scientific. Instead, start by looking back at the last 20 roofs you’ve installed with a guarantee and figure out how many service calls you needed to make. That will give you a pretty good idea of the possible risk of offering a peace-of-mind subscription.
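The back-of-the-envelope risk calculation described here can be written out directly; the claim history and repair cost below are invented, and only the $1,200 figure comes from the passage:

```python
# Expected claim cost per guaranteed roof, based on a (hypothetical) history of
# service calls, compared against the amount collected for the guarantee.
past_roofs = 20
service_calls = 3                        # claims on the last 20 guaranteed roofs
avg_repair_cost = 900.0                  # average cost per service call

expected_claim_cost = (service_calls / past_roofs) * avg_repair_cost
subscription_price = 1_200.0
expected_margin = subscription_price - expected_claim_cost

print(expected_claim_cost, expected_margin)   # 135.0 expected cost, 1065.0 margin
```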


pages: 222 words: 53,317

Overcomplicated: Technology at the Limits of Comprehension by Samuel Arbesman

algorithmic trading, Anthropocene, Anton Chekhov, Apple II, Benoit Mandelbrot, Boeing 747, Chekhov's gun, citation needed, combinatorial explosion, Computing Machinery and Intelligence, Danny Hillis, data science, David Brooks, digital map, discovery of the americas, driverless car, en.wikipedia.org, Erik Brynjolfsson, Flash crash, friendly AI, game design, Google X / Alphabet X, Googley, Hans Moravec, HyperCard, Ian Bogost, Inbox Zero, Isaac Newton, iterative process, Kevin Kelly, machine translation, Machine translation of "The spirit is willing, but the flesh is weak." to Russian and back, mandelbrot fractal, Minecraft, Neal Stephenson, Netflix Prize, Nicholas Carr, Nick Bostrom, Parkinson's law, power law, Ray Kurzweil, recommendation engine, Richard Feynman, Richard Feynman: Challenger O-ring, Second Machine Age, self-driving car, SimCity, software studies, statistical model, Steve Jobs, Steve Wozniak, Steven Pinker, Stewart Brand, superintelligent machines, synthetic biology, systems thinking, the long tail, Therac-25, Tyler Cowen, Tyler Cowen: Great Stagnation, urban planning, Watson beat the top human players on Jeopardy!, Whole Earth Catalog, Y2K

say, 99.9 percent of the time: I made these numbers up for effect, but if any linguist wants to chat, please reach out! “based on millions of specific features”: Alon Halevy et al., “The Unreasonable Effectiveness of Data,” IEEE Intelligent Systems 24, no. 2 (2009): 8–12. In some ways, these statistical models are actually simpler than those that start from seemingly more elegant rules, because the latter end up being complicated by exceptions. sophisticated machine learning techniques: See Douglas Heaven, “Higher State of Mind,” New Scientist 219 (August 10, 2013), 32–35, available online (under the title “Not Like Us: Artificial Minds We Can’t Understand”): http://complex.elte.hu/~csabai/simulationLab/AI_08_August_2013_New_Scientist.pdf.


Beautiful Data: The Stories Behind Elegant Data Solutions by Toby Segaran, Jeff Hammerbacher

23andMe, airport security, Amazon Mechanical Turk, bioinformatics, Black Swan, business intelligence, card file, cloud computing, computer vision, correlation coefficient, correlation does not imply causation, crowdsourcing, Daniel Kahneman / Amos Tversky, DARPA: Urban Challenge, data acquisition, data science, database schema, double helix, en.wikipedia.org, epigenetics, fault tolerance, Firefox, Gregor Mendel, Hans Rosling, housing crisis, information retrieval, lake wobegon effect, Large Hadron Collider, longitudinal study, machine readable, machine translation, Mars Rover, natural language processing, openstreetmap, Paradox of Choice, power law, prediction markets, profit motive, semantic web, sentiment analysis, Simon Singh, social bookmarking, social graph, SPARQL, sparse data, speech recognition, statistical model, supply-chain management, systematic bias, TED Talk, text mining, the long tail, Vernor Vinge, web application

Although this is a fairly simple application, it highlights the distributed nature of the solution, combining open data with free visualization methods from multiple sources. More importantly, the distributed nature of the system and free accessibility of the data allow experts in different domains—experimentalists generating data, software developers creating interfaces, and computational modelers creating statistical models—to easily couple their expertise. The true promise of open data, open services, and the ecosystem that supports them is that this coupling can occur without requiring any formal collaboration. Researchers will find and use the data in ways that the generators of that data never considered. By doing this they add value to the original data set and strengthen the ecosystem around it, whether they are performing complementary experiments, doing new analyses, or providing new services that process the data.

We try to apply the following template: • “Figure X shows…” • “Each point (or line) in the graph represents…” • “The separate graphs indicate…” • “Before making this graph, we did…which didn’t work, because…” • “A natural extension would be…” We do not have a full theory of statistical graphics—our closest attempt is to link exploratory graphical displays to checking the fit of statistical models (Gelman 2003)—but we hope that this small bit of structure can help readers in their own efforts. We think of our graphs not as beautiful standalone artifacts but rather as tools to help us understand beautiful reality. We illustrate using examples from our own work, not because our graphs are particularly beautiful, but because in these cases we know the story behind each plot.


pages: 517 words: 147,591

Small Wars, Big Data: The Information Revolution in Modern Conflict by Eli Berman, Joseph H. Felter, Jacob N. Shapiro, Vestal Mcintyre

basic income, call centre, centre right, classic study, clean water, confounding variable, crowdsourcing, data science, demand response, drone strike, experimental economics, failed state, George Akerlof, Google Earth, guns versus butter model, HESCO bastion, income inequality, income per capita, information asymmetry, Internet of things, iterative process, land reform, mandatory minimum, minimum wage unemployment, moral hazard, natural language processing, operational security, RAND corporation, randomized controlled trial, Ronald Reagan, school vouchers, statistical model, the scientific method, trade route, Twitter Arab Spring, unemployed young men, WikiLeaks, World Values Survey

He found that experiencing an indiscriminate attack was associated with a more than 50 percent decrease in the rate of insurgent attacks in a village—which amounts to a 24.2 percent reduction relative to the average.59 Furthermore, the correlation between the destructiveness of the random shelling and subsequent insurgent violence from that village was either negative or statistically insignificant, depending on the exact statistical model.60 While it’s not clear how civilians subject to these attacks interpreted them, what is clear is that in this case objectively indiscriminate violence by the government reduced local insurgent activity. Both of these studies are of asymmetric conflicts, and while the settings differ in important ways, each provides evidence that is not obviously consistent with the model.

Looking at subsequent village council elections, villages that had the training centers installed were much more likely to have a candidate from the PMLN place in the top two positions. The odds of a PMLN candidate either winning or being runner-up rose by 10 to 20 percentage points (depending on the statistical model). While other studies have shown that provision of public goods can sway attitudes, the effect is not usually so large. Remember, the training was funded and was going to be provided anyway. On the other hand, villages where vouchers were distributed for training elsewhere—making them less useful to men and virtually unusable by women—saw no increased support for the PMLN.


pages: 566 words: 155,428

After the Music Stopped: The Financial Crisis, the Response, and the Work Ahead by Alan S. Blinder

Affordable Care Act / Obamacare, Alan Greenspan, asset-backed security, bank run, banking crisis, banks create money, Bear Stearns, book value, break the buck, Carmen Reinhart, central bank independence, collapse of Lehman Brothers, collateralized debt obligation, conceptual framework, corporate governance, Credit Default Swap, credit default swaps / collateralized debt obligations, currency risk, Detroit bankruptcy, diversification, double entry bookkeeping, eurozone crisis, facts on the ground, financial engineering, financial innovation, fixed income, friendly fire, full employment, Glass-Steagall Act, hiring and firing, housing crisis, Hyman Minsky, illegal immigration, inflation targeting, interest rate swap, Isaac Newton, junk bonds, Kenneth Rogoff, liquidity trap, London Interbank Offered Rate, Long Term Capital Management, low interest rates, market bubble, market clearing, market fundamentalism, McMansion, Minsky moment, money market fund, moral hazard, naked short selling, new economy, Nick Leeson, Northern Rock, Occupy movement, offshore financial centre, Paul Volcker talking about ATMs, price mechanism, proprietary trading, quantitative easing, Ralph Waldo Emerson, Robert Shiller, Robert Solow, Ronald Reagan, Savings and loan crisis, shareholder value, short selling, South Sea Bubble, statistical model, the payments system, time value of money, too big to fail, vertical integration, working-age population, yield curve, Yogi Berra

To date, there have been precious few studies of the broader effects of this grab bag of financial-market policies. The only one I know of that even attempts to estimate the macroeconomic impacts of the entire potpourri was published in July 2010 by Mark Zandi and me. Our methodology was pretty simple—and very standard. Take a statistical model of the U.S. economy—we used the Moody’s Analytics model—and simulate it both with and without the policies. The differences between the two simulations are then estimates of the effects of the policies. These estimates, of course, are only as good as the model, but ours were huge. By 2011, we estimated, real GDP was about 6 percent higher, the unemployment rate was nearly 3 percentage points lower, and 4.8 million more Americans were employed because of the financial-market policies (as compared with sticking with laissez-faire).
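
The with-and-without simulation logic is easy to see in miniature. The sketch below is a toy illustration only, not the Moody's Analytics model; the growth rates and the size of the policy boost are invented.

    # Illustrative sketch (not the Moody's Analytics model): simulate a toy
    # economy with and without a policy boost and treat the difference
    # between the two runs as the estimated policy effect.
    def simulate_gdp(quarters=12, baseline_growth=0.004, policy_boost=0.0):
        gdp = 100.0
        path = []
        for _ in range(quarters):
            gdp *= 1 + baseline_growth + policy_boost
            path.append(gdp)
        return path

    with_policy = simulate_gdp(policy_boost=0.003)
    without_policy = simulate_gdp(policy_boost=0.0)

    effect = [(w - wo) / wo * 100 for w, wo in zip(with_policy, without_policy)]
    print(f"Estimated GDP effect after 12 quarters: {effect[-1]:.1f} percent")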

The standard analysis of conventional monetary policy—what we teach in textbooks and what central bankers are raised on—is predicated, roughly speaking, on constant risk spreads. When the Federal Reserve lowers riskless interest rates, like those on federal funds and T-bills, riskier interest rates, like those on corporate lending and auto loans, are supposed to follow suit.* The history on which we economists base our statistical models looks like that. Figure 9.1 shows the behavior of the interest rates on 10-year Treasuries (the lower line) and Moody’s Baa corporate bonds (the upper line) over the period from January 1980 through June 2007, just before the crisis got started. The spread between these two rates is the vertical distance between the two lines, and the fact that they look roughly parallel means that the spread did not change much over those twenty-seven years.


pages: 207 words: 57,959

Little Bets: How Breakthrough Ideas Emerge From Small Discoveries by Peter Sims

Alan Greenspan, Amazon Web Services, Black Swan, Clayton Christensen, complexity theory, David Heinemeier Hansson, deliberate practice, discovery of penicillin, endowment effect, fail fast, fear of failure, Frank Gehry, Guggenheim Bilbao, Jeff Bezos, knowledge economy, lateral thinking, Lean Startup, longitudinal study, loss aversion, meta-analysis, PageRank, Richard Florida, Richard Thaler, Ruby on Rails, Salesforce, scientific management, Silicon Valley, statistical model, Steve Ballmer, Steve Jobs, Steve Wozniak, systems thinking, TED Talk, theory of mind, Toyota Production System, urban planning, Wall-E

One of the men in charge of U.S. strategy in the war for many years was Robert McNamara, secretary of defense under Presidents Kennedy and Johnson. McNamara was known for his enormous intellect, renowned for achievements at Ford Motors (where he was once president) and in government. Many considered him the best management mind of his era. During World War II, McNamara had gained acclaim for developing statistical models to optimize the destruction from bombing operations over Japan. The challenge of Vietnam, however, proved to be different in ways that exposed the limits of McNamara’s approach. McNamara assumed that increased bombing in Vietnam would reduce the Viet Cong resistance with some degree of proportionality, but it did not.


pages: 244 words: 58,247

The Gone Fishin' Portfolio: Get Wise, Get Wealthy...and Get on With Your Life by Alexander Green

Alan Greenspan, Albert Einstein, asset allocation, asset-backed security, backtesting, behavioural economics, borderless world, buy and hold, buy low sell high, cognitive dissonance, diversification, diversified portfolio, Elliott wave, endowment effect, Everybody Ought to Be Rich, financial independence, fixed income, framing effect, hedonic treadmill, high net worth, hindsight bias, impulse control, index fund, interest rate swap, Johann Wolfgang von Goethe, John Bogle, junk bonds, Long Term Capital Management, means of production, mental accounting, Michael Milken, money market fund, Paul Samuelson, Ponzi scheme, risk tolerance, risk-adjusted returns, short selling, statistical model, stocks for the long run, sunk-cost fallacy, transaction costs, Vanguard fund, yield curve

Or take Long Term Capital Management (LTCM). LTCM was a hedge fund created in 1994 with the help of two Nobel Prize-winning economists. The fund incorporated a complex mathematical model designed to profit from inefficiencies in world bond prices. The brilliant folks in charge of the fund used a statistical model that they believed eliminated risk from the investment process. And if you’ve eliminated risk, why not bet large? So they did, accumulating positions totaling $1.25 trillion. Of course, they hadn’t really eliminated risk. And when Russia defaulted on its sovereign debt in 1998, the fund blew up.


pages: 219 words: 63,495

50 Future Ideas You Really Need to Know by Richard Watson

23andMe, 3D printing, access to a mobile phone, Albert Einstein, Alvin Toffler, artificial general intelligence, augmented reality, autonomous vehicles, BRICs, Buckminster Fuller, call centre, carbon credits, Charles Babbage, clean water, cloud computing, collaborative consumption, computer age, computer vision, crowdsourcing, dark matter, dematerialisation, Dennis Tito, digital Maoism, digital map, digital nomad, driverless car, Elon Musk, energy security, Eyjafjallajökull, failed state, Ford Model T, future of work, Future Shock, gamification, Geoffrey West, Santa Fe Institute, germ theory of disease, global pandemic, happiness index / gross national happiness, Higgs boson, high-speed rail, hive mind, hydrogen economy, Internet of things, Jaron Lanier, life extension, Mark Shuttleworth, Marshall McLuhan, megacity, natural language processing, Neil Armstrong, Network effects, new economy, ocean acidification, oil shale / tar sands, pattern recognition, peak oil, personalized medicine, phenotype, precision agriculture, private spaceflight, profit maximization, RAND corporation, Ray Kurzweil, RFID, Richard Florida, Search for Extraterrestrial Intelligence, self-driving car, semantic web, Skype, smart cities, smart meter, smart transportation, space junk, statistical model, stem cell, Stephen Hawking, Steve Jobs, Steven Pinker, Stewart Brand, strong AI, Stuxnet, supervolcano, synthetic biology, tech billionaire, telepresence, The Wisdom of Crowds, Thomas Malthus, Turing test, urban decay, Vernor Vinge, Virgin Galactic, Watson beat the top human players on Jeopardy!, web application, women in the workforce, working-age population, young professional

One day, we may, for example, develop a tiny chip that can hold the full medical history of a person including any medical conditions, allergies, prescriptions and contact information (this is already planned in America). Digital vacuums Digital vacuuming refers to the practice of scooping up vast amounts of data then using mathematical and statistical models to determine content and possible linkages. The data itself can be anything from phone calls in historical or real time (the US company AT&T, for example, holds the records of 1.9 trillion telephone calls) to financial transactions, emails and Internet site visits. Applications could range from predicting future health risks to counterterrorism.


pages: 190 words: 62,941

Wild Ride: Inside Uber's Quest for World Domination by Adam Lashinsky

"Susan Fowler" uber, "World Economic Forum" Davos, Airbnb, always be closing, Amazon Web Services, asset light, autonomous vehicles, Ayatollah Khomeini, Benchmark Capital, business process, Chuck Templeton: OpenTable:, cognitive dissonance, corporate governance, DARPA: Urban Challenge, Didi Chuxing, Donald Trump, driverless car, Elon Musk, Erlich Bachman, gig economy, Golden Gate Park, Google X / Alphabet X, hustle culture, independent contractor, information retrieval, Jeff Bezos, John Zimmer (Lyft cofounder), Lyft, Marc Andreessen, Mark Zuckerberg, megacity, Menlo Park, multilevel marketing, new economy, pattern recognition, price mechanism, public intellectual, reality distortion field, ride hailing / ride sharing, Salesforce, San Francisco homelessness, Sand Hill Road, self-driving car, side hustle, Silicon Valley, Silicon Valley billionaire, Silicon Valley startup, Skype, Snapchat, South of Market, San Francisco, sovereign wealth fund, statistical model, Steve Jobs, super pumped, TaskRabbit, tech worker, Tony Hsieh, transportation-network company, Travis Kalanick, turn-by-turn navigation, Uber and Lyft, Uber for X, uber lyft, ubercab, young professional

Kalanick bragged about the advanced math that went into Uber’s calculation of when riders should expect their cars to show up. Uber’s “math department,” as he called it, included a computational statistician, a rocket scientist, and a nuclear physicist. They were running, he informed me, a Gaussian process emulation—a fancy statistical model—to improve on data available from Google’s mapping products. “Our estimates are far superior to Google’s,” Kalanick said. I was witnessing for the first time the cocksure Kalanick. I told him I had an idea for a market for Uber. I had recently sent a babysitter home in an Uber, a wonderful convenience because I could pay with my credit card from Uber’s app and then monitor the car’s progress on my phone to make sure the sitter got home safely.
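
As a rough illustration of the technique Kalanick names (not Uber's actual system), the sketch below fits a Gaussian process regression to invented pickup-time data with scikit-learn and reads off a prediction with an uncertainty band.

    # Minimal sketch of Gaussian process regression on toy arrival-time data
    # (not Uber's system; the feature and data values are made up).
    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, WhiteKernel

    # toy feature: distance to rider (km); target: observed minutes to arrive
    X = np.array([[0.5], [1.0], [2.0], [3.5], [5.0], [7.0]])
    y = np.array([2.0, 3.5, 6.0, 9.0, 12.5, 18.0])

    gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
    gp.fit(X, y)

    eta, sd = gp.predict(np.array([[4.0]]), return_std=True)
    print(f"Predicted pickup time: {eta[0]:.1f} min (+/- {sd[0]:.1f})")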


pages: 256 words: 60,620

Think Twice: Harnessing the Power of Counterintuition by Michael J. Mauboussin

affirmative action, Alan Greenspan, asset allocation, Atul Gawande, availability heuristic, Benoit Mandelbrot, Bernie Madoff, Black Swan, butter production in bangladesh, Cass Sunstein, choice architecture, Clayton Christensen, cognitive dissonance, collateralized debt obligation, Daniel Kahneman / Amos Tversky, deliberate practice, disruptive innovation, Edward Thorp, experimental economics, financial engineering, financial innovation, framing effect, fundamental attribution error, Geoffrey West, Santa Fe Institute, George Akerlof, hindsight bias, hiring and firing, information asymmetry, libertarian paternalism, Long Term Capital Management, loose coupling, loss aversion, mandelbrot fractal, Menlo Park, meta-analysis, money market fund, Murray Gell-Mann, Netflix Prize, pattern recognition, Performance of Mutual Funds in the Period, Philip Mirowski, placebo effect, Ponzi scheme, power law, prediction markets, presumed consent, Richard Thaler, Robert Shiller, statistical model, Steven Pinker, systems thinking, the long tail, The Wisdom of Crowds, ultimatum game, vertical integration

This mistake, I admit, is hard to swallow and is a direct affront to experts of all stripes. But it is also among the best documented findings in the social sciences. In 1954, Paul Meehl, a psychologist at the University of Minnesota, published a book that reviewed studies comparing the clinical judgment of experts (psychologists and psychiatrists) with linear statistical models. He made sure the analysis was done carefully so he could be confident that the comparisons were fair. In study after study, the statistical methods exceeded or matched the expert performance.16 More recently, Philip Tetlock, a psychologist at the University of California, Berkeley, completed an exhaustive study of expert predictions, including twenty-eight thousand forecasts made by three hundred experts hailing from sixty countries over fifteen years.


Logically Fallacious: The Ultimate Collection of Over 300 Logical Fallacies (Academic Edition) by Bo Bennett

Black Swan, book value, butterfly effect, clean water, cognitive bias, correlation does not imply causation, Donald Trump, equal pay for equal work, Neil Armstrong, Richard Feynman, side project, statistical model, sunk-cost fallacy, the scientific method

But if you cross the line, hopefully you are with people who care about you enough to tell you. Tip: People don’t like to be made to feel inferior. You need to know when showing tact and restraint is more important than being right. Ludic Fallacy ludus Description: Assuming flawless statistical models apply to situations where they actually don’t. This can result in overconfidence in probability theory or simply not knowing exactly where it applies, as opposed to chaotic situations or situations with external influences too subtle or numerous to predict. Example #1: The best example of this fallacy is presented by the person who coined this term, Nassim Nicholas Taleb in his 2007 book, The Black Swan.


Debtor Nation: The History of America in Red Ink (Politics and Society in Modern America) by Louis Hyman

Alan Greenspan, asset-backed security, bank run, barriers to entry, Bretton Woods, business cycle, business logic, card file, central bank independence, computer age, corporate governance, credit crunch, declining real wages, deindustrialization, diversified portfolio, financial independence, financial innovation, fixed income, Gini coefficient, Glass-Steagall Act, Home mortgage interest deduction, housing crisis, income inequality, invisible hand, It's morning again in America, late fees, London Interbank Offered Rate, low interest rates, market fundamentalism, means of production, mortgage debt, mortgage tax deduction, p-value, pattern recognition, post-Fordism, profit maximization, profit motive, risk/return, Ronald Reagan, Savings and loan crisis, Silicon Valley, statistical model, Tax Reform Act of 1986, technological determinism, technology bubble, the built environment, transaction costs, union organizing, white flight, women in the workforce, working poor, zero-sum game

In computer models, feminist credit advocates believed they had found the solution to discriminatory lending, ushering in the contemporary calculated credit regimes under which we live today. Yet removing such basic demographics from any model was not as straightforward as the authors of the ECOA had hoped, because of how all statistical models function, a point legislators seem not to have fully understood. The “objective” credit statistics that legislators had pined for during the early investigations of the Consumer Credit Protection Act could now exist, but with new difficulties that stemmed from using regressions and not human judgment to decide on loans.

The higher the level of education and income, the lower the effective interest rate paid, since such users tended more frequently to be non-revolvers.96 The researchers found that young, large, low-income families who could not save for major purchases, paid finance charges, while their opposite, older, smaller, high-income families who could save for major purchases, did not pay finance charges. Effectively the young and poor cardholders subsidized the convenience of the old and rich.97 And white.98 The new statistical models revealed that the second-best predictor of revolving debt, after a respondent’s own “self-evaluation of his or her ability to save,” was race.99 But what these models revealed was that the very group—African Americans—that the politicians wanted to increase credit access to, tended to revolve their credit more than otherwise similar white borrowers.


pages: 504 words: 89,238

Natural language processing with Python by Steven Bird, Ewan Klein, Edward Loper

bioinformatics, business intelligence, business logic, Computing Machinery and Intelligence, conceptual framework, Donald Knuth, duck typing, elephant in my pajamas, en.wikipedia.org, finite state, Firefox, functional programming, Guido van Rossum, higher-order functions, information retrieval, language acquisition, lolcat, machine translation, Menlo Park, natural language processing, P = NP, search inside the book, sparse data, speech recognition, statistical model, text mining, Turing test, W. E. B. Du Bois

Structure of the published TIMIT Corpus: The CD-ROM contains doc, train, and test directories at the top level; the train and test directories both have eight sub-directories, one per dialect region; each of these contains further subdirectories, one per speaker; the contents of the directory for female speaker aks0 are listed, showing 10 wav files accompanied by a text transcription, a word-aligned transcription, and a phonetic transcription. There is a split between training and testing sets, which gives away its intended use for developing and evaluating statistical models. Finally, notice that even though TIMIT is a speech corpus, its transcriptions and associated data are just text, and can be processed using programs just like any other text corpus. Therefore, many of the computational methods described in this book are applicable. Moreover, notice that all of the data types included in the TIMIT Corpus fall into the two basic categories of lexicon and text, which we will discuss later.

For example, one intermediate position is to assume that humans are innately endowed with analogical and memory-based learning methods (weak rationalism), and use these methods to identify meaningful patterns in their sensory language experience (empiricism). We have seen many examples of this methodology throughout this book. Statistical methods inform symbolic models anytime corpus statistics guide the selection of productions in a context-free grammar, i.e., “grammar engineering.” Symbolic methods inform statistical models anytime a corpus that was created using rule-based methods is used as a source of features for training a statistical language model, i.e., “grammatical inference.” The circle is closed. NLTK Roadmap The Natural Language Toolkit is a work in progress, and is being continually expanded as people contribute code.
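
In the spirit of that passage, the sketch below trains a small statistical model on a hand-annotated corpus using NLTK itself: a bigram part-of-speech tagger that backs off to a unigram tagger. It assumes the treebank sample has already been downloaded.

    # Sketch of training a simple statistical model on corpus data with NLTK:
    # a bigram POS tagger that backs off to a unigram tagger.
    # Assumes the treebank sample is available (nltk.download('treebank')).
    import nltk
    from nltk.corpus import treebank

    sents = treebank.tagged_sents()
    train, test = sents[:3000], sents[3000:]

    unigram = nltk.UnigramTagger(train)
    bigram = nltk.BigramTagger(train, backoff=unigram)

    print(f"Tagging accuracy: {bigram.evaluate(test):.3f}")
    print(bigram.tag(["The", "statistical", "model", "works", "."]))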


pages: 632 words: 166,729

Addiction by Design: Machine Gambling in Las Vegas by Natasha Dow Schüll

airport security, Albert Einstein, Build a better mousetrap, business intelligence, capital controls, cashless society, commoditize, corporate social responsibility, deindustrialization, dematerialisation, deskilling, emotional labour, Future Shock, game design, impulse control, information asymmetry, inventory management, iterative process, jitney, junk bonds, large denomination, late capitalism, late fees, longitudinal study, means of production, meta-analysis, Nash equilibrium, Panopticon Jeremy Bentham, Paradox of Choice, post-industrial society, postindustrial economy, profit motive, RFID, scientific management, Silicon Valley, Skinner box, Slavoj Žižek, statistical model, the built environment, yield curve, zero-sum game

To most profitably manage player relationships, the industry must determine the specific value of those relationships. “What is the relationship of a particular customer to you, and you to them? Is that customer profitable or not?” asked a Harrah’s executive at G2E in 2008.38 “What is the order of value of that player to me?” echoed Bally’s Rowe.39 Using statistical modeling, casinos “tier” players based on different parameters, assigning each a “customer value” or “theoretical player value”—a value, that is, based on the theoretical revenue they are likely to generate. On a panel called “Patron Rating: The New Definition of Customer Value,” one specialist shared his system for gauging patron worth, recommending that casinos give each customer a “recency score” (how recently he has visited), a “frequency score” (how often he visits), and a “monetary score” (how much he spends), and then create a personalized marketing algorithm out of these variables.40 “We want to maximize every relationship,” Harrah’s Richard Mirman told a journalist.41 Harrah’s statistical models for determining player value, similar to those used for predicting stocks’ future worth, are the most advanced in the industry.

The casino franchise, which maintains ninety different demographic segments for its customers, has determined that player value is most strongly associated with frequency of play, type of game played, and the number of coins played per spin or hand.
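
The recency/frequency/monetary scoring described above can be sketched in a few lines of pandas; the player data and the simple rank-based scoring below are invented for illustration and are not Harrah's model.

    # Toy sketch of recency/frequency/monetary ("RFM") scoring;
    # all player data and scoring rules are invented.
    import pandas as pd

    players = pd.DataFrame({
        "player_id": [1, 2, 3, 4],
        "days_since_last_visit": [3, 45, 10, 180],
        "visits_last_year": [52, 4, 20, 1],
        "total_spend": [12000, 300, 2500, 80],
    })

    # rank each dimension so that the "best" player gets the highest score
    players["recency_score"] = players["days_since_last_visit"].rank(ascending=False)
    players["frequency_score"] = players["visits_last_year"].rank(ascending=True)
    players["monetary_score"] = players["total_spend"].rank(ascending=True)

    players["player_value"] = players[
        ["recency_score", "frequency_score", "monetary_score"]
    ].sum(axis=1)
    print(players.sort_values("player_value", ascending=False))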


pages: 569 words: 165,510

There Is Nothing for You Here: Finding Opportunity in the Twenty-First Century by Fiona Hill

2021 United States Capitol attack, active measures, Affordable Care Act / Obamacare, algorithmic bias, barriers to entry, Berlin Wall, Bernie Sanders, Big Tech, Black Lives Matter, blue-collar work, Boris Johnson, Brexit referendum, British Empire, business climate, call centre, collective bargaining, company town, coronavirus, COVID-19, crony capitalism, cuban missile crisis, David Brooks, deindustrialization, desegregation, digital divide, disinformation, Dissolution of the Soviet Union, Donald Trump, Fall of the Berlin Wall, financial independence, first-past-the-post, food desert, gender pay gap, gentrification, George Floyd, glass ceiling, global pandemic, Great Leap Forward, housing crisis, illegal immigration, imposter syndrome, income inequality, indoor plumbing, industrial cluster, industrial research laboratory, informal economy, Jeff Bezos, Jeremy Corbyn, Kickstarter, knowledge economy, lockdown, low skilled workers, Lyft, Martin Wolf, mass immigration, meme stock, Mikhail Gorbachev, new economy, oil shock, opioid epidemic / opioid crisis, Own Your Own Home, Paris climate accords, pension reform, QAnon, ransomware, restrictive zoning, ride hailing / ride sharing, Right to Buy, Ronald Reagan, self-driving car, Silicon Valley, single-payer health, statistical model, Steve Bannon, The Chicago School, TikTok, transatlantic slave trade, Uber and Lyft, uber lyft, University of East Anglia, urban decay, urban planning, Washington Consensus, WikiLeaks, Winter of Discontent, women in the workforce, working poor, Yom Kippur War, young professional

The digital divide in this case was manifested not by inadequate technological hardware and bandwidth but rather by the ones and zeroes that flowed through it and the human biases that they channeled. With schools closed and students in lockdown to stem disease transmission, the spring A-level exams were canceled. The UK government’s national exams and assessments regulatory board, known by its awkward acronym, Ofqual, decided to use a standardized statistical model instead of the exam to determine students’ grades. Teachers were instructed to submit grade predictions, but the national exam board then adjusted these using the algorithm they had devised. This drew on the historic data of the school and the results of previous students taking the same subject-based exams.

If this had been the approach to A-levels in 1984, my friends and I would surely have fallen into that unfortunate category. Bishop Barrington Comprehensive School had only a few years of A-level results in a smattering of subjects. There would have been no “historic” data for Ofqual to plug into its statistical model. In French, I didn’t even have a teacher to offer a prediction. I had been studying on my own in the months leading up to the exam. I could hardly have written my own assessment and would probably have been assigned an “unclassified” grade. Reading about the debacle from afar, I felt white-hot with sympathetic rage reading the students’ stunned comments.


pages: 242 words: 68,019

Why Information Grows: The Evolution of Order, From Atoms to Economies by Cesar Hidalgo

Ada Lovelace, Albert Einstein, Arthur Eddington, assortative mating, business cycle, Claude Shannon: information theory, David Ricardo: comparative advantage, Douglas Hofstadter, Everything should be made as simple as possible, Ford Model T, frictionless, frictionless market, George Akerlof, Gödel, Escher, Bach, income inequality, income per capita, industrial cluster, information asymmetry, invention of the telegraph, invisible hand, Isaac Newton, James Watt: steam engine, Jane Jacobs, job satisfaction, John von Neumann, Joi Ito, New Economic Geography, Norbert Wiener, p-value, Paul Samuelson, phenotype, price mechanism, Richard Florida, Robert Solow, Ronald Coase, Rubik’s Cube, seminal paper, Silicon Valley, Simon Kuznets, Skype, statistical model, Steve Jobs, Steve Wozniak, Steven Pinker, Stuart Kauffman, tacit knowledge, The Market for Lemons, The Nature of the Firm, The Wealth of Nations by Adam Smith, total factor productivity, transaction costs, working-age population

GNP considers the goods and services produced by the citizens of a country, whether or not those goods are produced within the boundaries of the country. 5. Simon Kuznets, “Modern Economic Growth: Findings and Reflections,” American Economic Review 63, no. 3 (1973): 247–258. 6. Technically, total factor productivity is the residual or error term of the statistical model. Also, economists often refer to total factor productivity as technology, although this is a semantic deformation that is orthogonal to the definition of technology used by anyone who has ever developed a technology. In the language of economics, technology is the ability to do more—of anything—with the same cost.
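
A worked illustration of "productivity as residual," assuming the standard Cobb-Douglas accounting convention rather than any model specific to this book: whatever output is not explained by capital and labor is attributed to total factor productivity.

    # Illustration under a standard Cobb-Douglas assumption:
    # output Y = A * K**alpha * L**(1 - alpha); total factor productivity A
    # is whatever output is left unexplained by capital K and labor L.
    alpha = 0.3          # capital share, a conventional assumption
    Y = 1000.0           # observed output
    K = 2000.0           # capital stock
    L = 500.0            # labor input

    A = Y / (K ** alpha * L ** (1 - alpha))   # the "residual"
    print(f"Total factor productivity (residual): {A:.2f}")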


Exploring Everyday Things with R and Ruby by Sau Sheong Chang

Alfred Russel Wallace, bioinformatics, business process, butterfly effect, cloud computing, Craig Reynolds: boids flock, data science, Debian, duck typing, Edward Lorenz: Chaos theory, Gini coefficient, income inequality, invisible hand, p-value, price stability, Ruby on Rails, Skype, statistical model, stem cell, Stephen Hawking, text mining, The Wealth of Nations by Adam Smith, We are the 99%, web application, wikimedia commons

LOESS is not suitable for a large number of data points, however, because it scales on an O(n²) basis in memory, so instead we use the mgcv library and its gam method. We also send in the formula y~s(x), where s is the smoother function for GAM. GAM stands for generalized additive model, which is a statistical model used to describe how items of data relate to each other. In our case, we use GAM as an algorithm in the smoother to provide us with a reasonably good estimation of how a large number of data points can be visualized. In Figure 8-5, you can see that the population of roids fluctuates over time between two extremes caused by the oversupply and exhaustion of food, respectively.
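
The book's example is in R (mgcv's gam with the formula y~s(x)); a rough Python analogue, assuming the pyGAM library as a stand-in, looks like the sketch below.

    # Rough Python analogue (the book itself uses R's mgcv): fit a GAM-style
    # spline smoother y ~ s(x) with the pyGAM library (an assumed substitute).
    import numpy as np
    from pygam import LinearGAM, s

    rng = np.random.default_rng(1)
    x = np.linspace(0, 10, 500)
    y = np.sin(x) + rng.normal(0, 0.3, x.size)   # noisy, wavy "population" signal

    gam = LinearGAM(s(0)).fit(x.reshape(-1, 1), y)
    smooth = gam.predict(x.reshape(-1, 1))       # smoothed trend for plotting
    print(smooth[:5])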


pages: 305 words: 69,216

A Failure of Capitalism: The Crisis of '08 and the Descent Into Depression by Richard A. Posner

Alan Greenspan, Andrei Shleifer, banking crisis, Bear Stearns, Bernie Madoff, business cycle, collateralized debt obligation, collective bargaining, compensation consultant, corporate governance, credit crunch, Credit Default Swap, credit default swaps / collateralized debt obligations, debt deflation, diversified portfolio, equity premium, financial deregulation, financial intermediation, Glass-Steagall Act, Home mortgage interest deduction, illegal immigration, laissez-faire capitalism, Long Term Capital Management, low interest rates, market bubble, Money creation, money market fund, moral hazard, mortgage debt, Myron Scholes, oil shock, Ponzi scheme, price stability, profit maximization, proprietary trading, race to the bottom, reserve currency, risk tolerance, risk/return, Robert Shiller, savings glut, shareholder value, short selling, statistical model, subprime mortgage crisis, too big to fail, transaction costs, very high income

Quantitative models of risk—another fulfillment of Weber's prophecy that more and more activities would be brought under the rule of rationality— are also being blamed for the financial crisis. Suppose a trader is contemplating the purchase of a stock using largely borrowed money, so that if the stock falls even a little way the loss will be great. He might consult a statistical model that predicted, on the basis of the ups and downs of the stock in the preceding two years, the probability distribution of the stock's behavior over the next few days or weeks. The criticism is that the model would have based the prediction on market behavior during a period of rising stock values; the modeler should have gone back to the 1980s or earlier to get a fuller picture of the riskiness of the stock.
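
A minimal sketch of the kind of model being criticized: estimate the return distribution from roughly two years of daily prices (simulated here) and read off a parametric one-day value-at-risk. Because the window covers only a calm or rising market, the risk estimate inherits that bias.

    # Sketch only: estimate a return distribution from ~2 years of daily
    # prices and compute a one-day 99% value-at-risk. Prices are simulated;
    # a real trader would use actual history, which is the point of the critique.
    import numpy as np

    rng = np.random.default_rng(42)
    prices = 100 * np.cumprod(1 + rng.normal(0.0005, 0.01, 504))  # ~2 trading years

    returns = np.diff(prices) / prices[:-1]
    mu, sigma = returns.mean(), returns.std()

    z99 = 2.326                       # 99th-percentile z-score, normal assumption
    var_1day = -(mu - z99 * sigma)    # loss threshold as a fraction of the position
    print(f"One-day 99% VaR: {var_1day:.2%} of the position")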


pages: 204 words: 67,922

Elsewhere, U.S.A: How We Got From the Company Man, Family Dinners, and the Affluent Society to the Home Office, BlackBerry Moms,and Economic Anxiety by Dalton Conley

Alan Greenspan, assortative mating, call centre, clean water, commoditize, company town, dematerialisation, demographic transition, Edward Glaeser, extreme commuting, feminist movement, financial independence, Firefox, Frank Levy and Richard Murnane: The New Division of Labor, Home mortgage interest deduction, income inequality, informal economy, insecure affluence, It's morning again in America, Jane Jacobs, Joan Didion, John Maynard Keynes: Economic Possibilities for our Grandchildren, knowledge economy, knowledge worker, labor-force participation, late capitalism, low interest rates, low skilled workers, manufacturing employment, mass immigration, McMansion, Michael Shellenberger, mortgage tax deduction, new economy, off grid, oil shock, PageRank, Paradox of Choice, Ponzi scheme, positional goods, post-industrial society, post-materialism, principal–agent problem, recommendation engine, Richard Florida, rolodex, Ronald Reagan, Silicon Valley, Skype, statistical model, Ted Nordhaus, The Death and Life of Great American Cities, The Great Moderation, the long tail, the strength of weak ties, The Wealth of Nations by Adam Smith, Thomas Malthus, Thorstein Veblen, Tragedy of the Commons, transaction costs, women in the workforce, Yom Kippur War

Should they have gotten a discount since the first word of their brand is also the first word of American Airlines and thereby reinforces—albeit in a subtle way—the host company’s image? In order to know the value of the deal, they would have had to know how much the marketing campaign increases their business. Impossible. No focus group or statistical model will tell Amex how much worse or better their bottom line would have been in the absence of this marketing campaign. Ditto for the impact of billboards, product placement, and special promotions like airline mileage plans. There are simply too many other forces that come into play to be able to isolate the impact of a specific effort.


pages: 239 words: 70,206

Data-Ism: The Revolution Transforming Decision Making, Consumer Behavior, and Almost Everything Else by Steve Lohr

"World Economic Forum" Davos, 23andMe, Abraham Maslow, Affordable Care Act / Obamacare, Albert Einstein, Alvin Toffler, Bear Stearns, behavioural economics, big data - Walmart - Pop Tarts, bioinformatics, business cycle, business intelligence, call centre, Carl Icahn, classic study, cloud computing, computer age, conceptual framework, Credit Default Swap, crowdsourcing, Daniel Kahneman / Amos Tversky, Danny Hillis, data is the new oil, data science, David Brooks, driverless car, East Village, Edward Snowden, Emanuel Derman, Erik Brynjolfsson, everywhere but in the productivity statistics, financial engineering, Frederick Winslow Taylor, Future Shock, Google Glasses, Ida Tarbell, impulse control, income inequality, indoor plumbing, industrial robot, informal economy, Internet of things, invention of writing, Johannes Kepler, John Markoff, John von Neumann, lifelogging, machine translation, Mark Zuckerberg, market bubble, meta-analysis, money market fund, natural language processing, obamacare, pattern recognition, payday loans, personalized medicine, planned obsolescence, precision agriculture, pre–internet, Productivity paradox, RAND corporation, rising living standards, Robert Gordon, Robert Solow, Salesforce, scientific management, Second Machine Age, self-driving car, Silicon Valley, Silicon Valley startup, SimCity, six sigma, skunkworks, speech recognition, statistical model, Steve Jobs, Steven Levy, The Design of Experiments, the scientific method, Thomas Kuhn: the structure of scientific revolutions, Tony Fadell, unbanked and underbanked, underbanked, Von Neumann architecture, Watson beat the top human players on Jeopardy!, yottabyte

“The altered field,” he wrote, “will be called ‘data science.’” In his paper, Cleveland, who is now a professor of statistics and computer science at Purdue University, described the contours of this new field. Data science, he said, would touch all disciplines of study and require the development of new statistical models, new computing tools, and educational programs in schools and corporations. Cleveland’s vision of a new field is now rapidly gaining momentum. The federal government, universities, and foundations are funding data science initiatives. Nearly all of these efforts are multidisciplinary melting pots that seek to bring together teams of computer scientists, statisticians, and mathematicians with experts who bring piles of data and unanswered questions from biology, astronomy, business and finance, public health, and elsewhere.


Once the American Dream: Inner-Ring Suburbs of the Metropolitan United States by Bernadette Hanlon

big-box store, classic study, company town, correlation coefficient, deindustrialization, desegregation, edge city, feminist movement, gentrification, housing crisis, illegal immigration, informal economy, longitudinal study, low skilled workers, low-wage service sector, manufacturing employment, McMansion, New Urbanism, Silicon Valley, statistical model, streetcar suburb, The Chicago School, transit-oriented development, urban sprawl, white flight, working-age population, zero-sum game

In this study, he suggests the role of population growth was somewhat exaggerated but finds other characteristics much more pertinent. Aside from population growth, he includes the variables of suburban age, initial suburban status levels, the suburbs’ geographic locations, suburban racial makeup, and employment specialization in his statistical model. He finds (1979: 946) that suburban age, the percentage of black inhabitants, and employment specialization within a suburb affected its then-current status (in 1970) “inasmuch as they also affected earlier (1960) status levels.” He describes how a suburb’s initial, established “ecological niche” was a great determinant of its future status.


pages: 210 words: 65,833

This Is Not Normal: The Collapse of Liberal Britain by William Davies

Airbnb, basic income, Bernie Sanders, Big bang: deregulation of the City of London, Black Lives Matter, Boris Johnson, Cambridge Analytica, central bank independence, centre right, Chelsea Manning, coronavirus, corporate governance, COVID-19, credit crunch, data science, deindustrialization, disinformation, Dominic Cummings, Donald Trump, double entry bookkeeping, Edward Snowden, fake news, family office, Filter Bubble, Francis Fukuyama: the end of history, ghettoisation, gig economy, global pandemic, global village, illegal immigration, Internet of things, Jeremy Corbyn, late capitalism, Leo Hollis, liberal capitalism, loadsamoney, London Interbank Offered Rate, mass immigration, moral hazard, Neil Kinnock, Northern Rock, old-boy network, post-truth, postnationalism / post nation state, precariat, prediction markets, quantitative easing, recommendation engine, Robert Mercer, Ronald Reagan, sentiment analysis, sharing economy, Silicon Valley, Slavoj Žižek, statistical model, Steve Bannon, Steven Pinker, surveillance capitalism, technoutopianism, The Chicago School, Thorstein Veblen, transaction costs, universal basic income, W. E. B. Du Bois, web of trust, WikiLeaks, Yochai Benkler

In this new world, data is captured first, research questions come later. In the long term, the implications of this will likely be as profound as the invention of statistics was in the late seventeenth century. The rise of ‘big data’ provides far greater opportunities for quantitative analysis than any amount of polling or statistical modelling. But it is not just the quantity of data that is different. It represents an entirely different type of knowledge, accompanied by a new mode of expertise. First, there is no fixed scale of analysis (such as the nation), nor are there any settled categories (such as ‘unemployed’). These vast new data sets can be mined in search of patterns, trends, correlations and emergent moods, which becomes a way of tracking the identities people bestow upon themselves (via hashtags and tags) rather than imposing classifications on them.


pages: 227 words: 63,186

An Elegant Puzzle: Systems of Engineering Management by Will Larson

Ben Horowitz, Cass Sunstein, Clayton Christensen, data science, DevOps, en.wikipedia.org, fault tolerance, functional programming, Google Earth, hive mind, Innovator's Dilemma, iterative process, Kanban, Kickstarter, Kubernetes, loose coupling, microservices, MITM: man-in-the-middle, no silver bullet, pull request, Richard Thaler, seminal paper, Sheryl Sandberg, Silicon Valley, statistical model, systems thinking, the long tail, web application

“Availability in Globally Distributed Storage Systems” This paper explores how to think about availability in replicated distributed systems, and is a useful starting point for those of us who are trying to determine the correct way to measure uptime for our storage layer or for any other sufficiently complex system. From the abstract: We characterize the availability properties of cloud storage systems based on an extensive one-year study of Google’s main storage infrastructure and present statistical models that enable further insight into the impact of multiple design choices, such as data placement and replication strategies. With these models we compare data availability under a variety of system parameters given the real patterns of failures observed in our fleet. Particularly interesting is the focus on correlated failures, building on the premise that users of distributed systems only experience the failure when multiple components have overlapping failures.
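
A toy Monte Carlo, not the paper's statistical model, makes the point about correlated failures: with three-way replication, a shared failure mode (say, a rack outage) dominates unavailability even when individual node failures are rare. All probabilities below are invented.

    # Toy Monte Carlo (not the paper's model): availability of 3-way replicated
    # data when node failures are independent vs. correlated (shared rack outage).
    import random

    def unavailable_fraction(trials=100_000, p_node=0.01, p_rack=0.0):
        lost = 0
        for _ in range(trials):
            rack_down = random.random() < p_rack        # correlated failure mode
            replicas_down = all(
                rack_down or random.random() < p_node   # each of 3 replicas
                for _ in range(3)
            )
            lost += replicas_down
        return lost / trials

    print("independent failures :", unavailable_fraction())
    print("with correlated rack :", unavailable_fraction(p_rack=0.002))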


pages: 757 words: 193,541

The Practice of Cloud System Administration: DevOps and SRE Practices for Web Services, Volume 2 by Thomas A. Limoncelli, Strata R. Chalup, Christina J. Hogan

active measures, Amazon Web Services, anti-pattern, barriers to entry, business process, cloud computing, commoditize, continuous integration, correlation coefficient, database schema, Debian, defense in depth, delayed gratification, DevOps, domain-specific language, en.wikipedia.org, fault tolerance, finite state, Firefox, functional programming, Google Glasses, information asymmetry, Infrastructure as a Service, intermodal, Internet of things, job automation, job satisfaction, Ken Thompson, Kickstarter, level 1 cache, load shedding, longitudinal study, loose coupling, machine readable, Malcom McLean invented shipping containers, Marc Andreessen, place-making, platform as a service, premature optimization, recommendation engine, revision control, risk tolerance, Salesforce, scientific management, seminal paper, side project, Silicon Valley, software as a service, sorting algorithm, standardized shipping container, statistical model, Steven Levy, supply-chain management, systems thinking, The future is already here, Toyota Production System, vertical integration, web application, Yogi Berra

Standard capacity planning is sufficient for small sites, sites that grow slowly, and sites with simple needs. It is insufficient for large, rapidly growing sites. They require more advanced techniques. Advanced capacity planning is based on core drivers, capacity limits of individual resources, and sophisticated data analysis such as correlation, regression analysis, and statistical models for forecasting. Regression analysis finds correlations between core drivers and resources. Forecasting uses past data to predict future needs. With sufficiently large sites, capacity planning is a full-time job, often done by project managers with technical backgrounds. Some organizations employ full-time statisticians to build complex models and dashboards that provide the information required by a project manager.
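
The regression-and-forecast step can be sketched briefly; the core driver (active users), the resource (storage), and every number below are invented for illustration.

    # Sketch of the regression step: relate a resource to a core driver with a
    # least-squares fit, then forecast the resource from a projected driver value.
    import numpy as np

    active_users = np.array([10_000, 20_000, 35_000, 50_000, 80_000])   # core driver
    storage_tb   = np.array([4.1, 8.3, 14.0, 20.5, 32.2])               # resource use

    slope, intercept = np.polyfit(active_users, storage_tb, 1)

    projected_users = 120_000          # from the growth forecast
    needed_tb = slope * projected_users + intercept
    print(f"Forecast storage need: {needed_tb:.1f} TB "
          f"(~{slope * 1000:.2f} TB per 1,000 users)")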

Capacity planning involves the technical work of understanding how many resources are needed per unit of growth, plus non-technical aspects such as budgeting, forecasting, and supply chain management. These topics are covered in Chapter 18. Sample Assessment Questions • How much capacity do you have now? • How much capacity do you expect to need three months from now? Twelve months from now? • Which statistical models do you use for determining future needs? • How do you load-test? • How much time does capacity planning take? What could be done to make it easier? • Are metrics collected automatically? • Are metrics available always or does their need initiate a process that collects them? • Is capacity planning the job of no one, everyone, a specific person, or a team of capacity planners?


pages: 666 words: 181,495

In the Plex: How Google Thinks, Works, and Shapes Our Lives by Steven Levy

"World Economic Forum" Davos, 23andMe, AltaVista, Andy Rubin, Anne Wojcicki, Apple's 1984 Super Bowl advert, autonomous vehicles, Bill Atkinson, book scanning, Brewster Kahle, Burning Man, business process, clean water, cloud computing, crowdsourcing, Dean Kamen, discounted cash flows, don't be evil, Donald Knuth, Douglas Engelbart, Douglas Engelbart, Dutch auction, El Camino Real, Evgeny Morozov, fault tolerance, Firefox, General Magic , Gerard Salton, Gerard Salton, Google bus, Google Chrome, Google Earth, Googley, high-speed rail, HyperCard, hypertext link, IBM and the Holocaust, informal economy, information retrieval, Internet Archive, Jeff Bezos, John Markoff, Ken Thompson, Kevin Kelly, Kickstarter, large language model, machine translation, Mark Zuckerberg, Menlo Park, one-China policy, optical character recognition, PageRank, PalmPilot, Paul Buchheit, Potemkin village, prediction markets, Project Xanadu, recommendation engine, risk tolerance, Rubik’s Cube, Sand Hill Road, Saturday Night Live, search inside the book, second-price auction, selection bias, Sheryl Sandberg, Silicon Valley, SimCity, skunkworks, Skype, slashdot, social graph, social software, social web, spectrum auction, speech recognition, statistical model, Steve Ballmer, Steve Jobs, Steven Levy, subscription business, Susan Wojcicki, Ted Nelson, telemarketer, The future is already here, the long tail, trade route, traveling salesman, turn-by-turn navigation, undersea cable, Vannevar Bush, web application, WikiLeaks, Y Combinator

Because Och and his colleagues knew they would have access to an unprecedented amount of data, they worked from the ground up to create a new translation system. “One of the things we did was to build very, very, very large language models, much larger than anyone has ever built in the history of mankind.” Then they began to train the system. To measure progress, they used a statistical model that, given a series of words, would predict the word that came next. Each time they doubled the amount of training data, they got a .5 percent boost in the metrics that measured success in the results. “So we just doubled it a bunch of times.” In order to get a reasonable translation, Och would say, you might feed something like a billion words to the model.
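
The metric is easy to illustrate at toy scale. The sketch below builds a bigram next-word predictor from a few words of text; it shows the task being measured, not anything like the scale of Google's models.

    # Minimal next-word model of the kind the metric describes (nothing like
    # the scale of Google's system): count bigrams and predict the most likely
    # continuation of a word.
    from collections import Counter, defaultdict

    corpus = ("the model predicts the next word and the model improves "
              "as the training data doubles").split()

    next_word = defaultdict(Counter)
    for w1, w2 in zip(corpus, corpus[1:]):
        next_word[w1][w2] += 1

    def predict(word):
        counts = next_word[word]
        return counts.most_common(1)[0][0] if counts else None

    print(predict("the"))    # -> 'model'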

“We are trying to understand the mechanisms behind the metrics,” says Qing Wu, a decision support analyst at Google. His specialty was forecasting. He could predict patterns of queries from season to season, in different parts of the day, and the climate. “We have the temperature data, we have the weather data, and we have the queries data so we can do correlation and statistical modeling.” To make sure that his predictions were on track, Qing Wu and his colleagues made use of dozens of onscreen dashboards with information flowing through them, a Bloomberg of the Googlesphere. “With a dashboard you can monitor the queries, the amount of money you make, how many advertisers we have, how many keywords they’re bidding on, what the ROI is for each advertiser.”


pages: 741 words: 179,454

Extreme Money: Masters of the Universe and the Cult of Risk by Satyajit Das

"RICO laws" OR "Racketeer Influenced and Corrupt Organizations", "there is no alternative" (TINA), "World Economic Forum" Davos, affirmative action, Alan Greenspan, Albert Einstein, algorithmic trading, Andy Kessler, AOL-Time Warner, Asian financial crisis, asset allocation, asset-backed security, bank run, banking crisis, banks create money, Basel III, Bear Stearns, behavioural economics, Benoit Mandelbrot, Berlin Wall, Bernie Madoff, Big bang: deregulation of the City of London, Black Swan, Bonfire of the Vanities, bonus culture, book value, Bretton Woods, BRICs, British Empire, business cycle, buy the rumour, sell the news, capital asset pricing model, carbon credits, Carl Icahn, Carmen Reinhart, carried interest, Celtic Tiger, clean water, cognitive dissonance, collapse of Lehman Brothers, collateralized debt obligation, corporate governance, corporate raider, creative destruction, credit crunch, Credit Default Swap, credit default swaps / collateralized debt obligations, currency risk, Daniel Kahneman / Amos Tversky, deal flow, debt deflation, Deng Xiaoping, deskilling, discrete time, diversification, diversified portfolio, Doomsday Clock, Dr. Strangelove, Dutch auction, Edward Thorp, Emanuel Derman, en.wikipedia.org, Eugene Fama: efficient market hypothesis, eurozone crisis, Everybody Ought to Be Rich, Fall of the Berlin Wall, financial engineering, financial independence, financial innovation, financial thriller, fixed income, foreign exchange controls, full employment, Glass-Steagall Act, global reserve currency, Goldman Sachs: Vampire Squid, Goodhart's law, Gordon Gekko, greed is good, Greenspan put, happiness index / gross national happiness, haute cuisine, Herman Kahn, high net worth, Hyman Minsky, index fund, information asymmetry, interest rate swap, invention of the wheel, invisible hand, Isaac Newton, James Carville said: "I would like to be reincarnated as the bond market. You can intimidate everybody.", job automation, Johann Wolfgang von Goethe, John Bogle, John Meriwether, joint-stock company, Jones Act, Joseph Schumpeter, junk bonds, Kenneth Arrow, Kenneth Rogoff, Kevin Kelly, laissez-faire capitalism, load shedding, locking in a profit, Long Term Capital Management, Louis Bachelier, low interest rates, margin call, market bubble, market fundamentalism, Market Wizards by Jack D. 
Schwager, Marshall McLuhan, Martin Wolf, mega-rich, merger arbitrage, Michael Milken, Mikhail Gorbachev, Milgram experiment, military-industrial complex, Minsky moment, money market fund, Mont Pelerin Society, moral hazard, mortgage debt, mortgage tax deduction, mutually assured destruction, Myron Scholes, Naomi Klein, National Debt Clock, negative equity, NetJets, Network effects, new economy, Nick Leeson, Nixon shock, Northern Rock, nuclear winter, oil shock, Own Your Own Home, Paul Samuelson, pets.com, Philip Mirowski, Phillips curve, planned obsolescence, plutocrats, Ponzi scheme, price anchoring, price stability, profit maximization, proprietary trading, public intellectual, quantitative easing, quantitative trading / quantitative finance, Ralph Nader, RAND corporation, random walk, Ray Kurzweil, regulatory arbitrage, Reminiscences of a Stock Operator, rent control, rent-seeking, reserve currency, Richard Feynman, Richard Thaler, Right to Buy, risk free rate, risk-adjusted returns, risk/return, road to serfdom, Robert Shiller, Rod Stewart played at Stephen Schwarzman birthday party, rolodex, Ronald Reagan, Ronald Reagan: Tear down this wall, Satyajit Das, savings glut, shareholder value, Sharpe ratio, short selling, short squeeze, Silicon Valley, six sigma, Slavoj Žižek, South Sea Bubble, special economic zone, statistical model, Stephen Hawking, Steve Jobs, stock buybacks, survivorship bias, tail risk, Teledyne, The Chicago School, The Great Moderation, the market place, the medium is the message, The Myth of the Rational Market, The Nature of the Firm, the new new thing, The Predators' Ball, The Theory of the Leisure Class by Thorstein Veblen, The Wealth of Nations by Adam Smith, Thorstein Veblen, too big to fail, trickle-down economics, Turing test, two and twenty, Upton Sinclair, value at risk, Yogi Berra, zero-coupon bond, zero-sum game

HE (home equity) and HELOC (home equity line of credit), borrowing against the equity in existing homes, became prevalent. Empowered by high-tech models, lenders loaned to less creditworthy borrowers, believing they could price any risk. Ben Bernanke shared his predecessor Alan Greenspan’s faith: “banks have become increasingly adept at predicting default risk by applying statistical models to data, such as credit scores.” Bernanke concluded that banks “have made substantial strides...in their ability to measure and manage risks.”13 Innovative affordability products included jumbo and super jumbo loans that did not conform to guidelines because of their size. More risky than prime but less risky than subprime, Alt A (Alternative A) mortgages were for borrowers who did not meet normal criteria.

Although Moody’s reversed the upgrades, all three banks collapsed in 2008. Unimpeded by insufficient disclosure, lack of information transparency, fraud, and improper accounting, traders anticipated these defaults, marking down bond prices well before rating downgrades. Rating structured securities required statistical models, mapping complex securities to historical patterns of default on normal bonds. With mortgage markets changing rapidly, this was like “using weather in Antarctica to forecast conditions in Hawaii.”17 Antarctica from 100 years ago! The agencies did not look at the underlying mortgages or loans in detail, relying instead on information from others.


pages: 238 words: 75,994

A Burglar's Guide to the City by Geoff Manaugh

A. Roger Ekirch, big-box store, card file, dark matter, Evgeny Morozov, game design, index card, megacity, megaproject, megastructure, Minecraft, off grid, Rubik’s Cube, SimCity, Skype, smart cities, statistical model, the built environment, urban planning

* The fundamental premise of the capture-house program is that police can successfully predict what sorts of buildings and internal spaces will attract not just any criminal but a specific burglar, the unique individual each particular capture house was built to target. This is because burglars unwittingly betray personal, as well as shared, patterns in their crimes; they often hit the same sorts of apartments and businesses over and over. But the urge to mathematize this, and to devise complex statistical models for when and where a burglar will strike next, can lead to all sorts of analytical absurdities. A great example of this comes from an article published in the criminology journal Crime, Law and Social Change back in 2011. Researchers from the Physics Engineering Department at Tsinghua University reported some eyebrow-raisingly specific data about the meteorological circumstances during which burglaries were most likely to occur in urban China.


pages: 267 words: 72,552

Reinventing Capitalism in the Age of Big Data by Viktor Mayer-Schönberger, Thomas Ramge

accounting loophole / creative accounting, Air France Flight 447, Airbnb, Alvin Roth, Apollo 11, Atul Gawande, augmented reality, banking crisis, basic income, Bayesian statistics, Bear Stearns, behavioural economics, bitcoin, blockchain, book value, Capital in the Twenty-First Century by Thomas Piketty, carbon footprint, Cass Sunstein, centralized clearinghouse, Checklist Manifesto, cloud computing, cognitive bias, cognitive load, conceptual framework, creative destruction, Daniel Kahneman / Amos Tversky, data science, Didi Chuxing, disruptive innovation, Donald Trump, double entry bookkeeping, Elon Musk, en.wikipedia.org, Erik Brynjolfsson, Evgeny Morozov, flying shuttle, Ford Model T, Ford paid five dollars a day, Frederick Winslow Taylor, fundamental attribution error, George Akerlof, gig economy, Google Glasses, Higgs boson, information asymmetry, interchangeable parts, invention of the telegraph, inventory management, invisible hand, James Watt: steam engine, Jeff Bezos, job automation, job satisfaction, joint-stock company, Joseph Schumpeter, Kickstarter, knowledge worker, labor-force participation, land reform, Large Hadron Collider, lone genius, low cost airline, low interest rates, Marc Andreessen, market bubble, market design, market fundamentalism, means of production, meta-analysis, Moneyball by Michael Lewis explains big data, multi-sided market, natural language processing, Neil Armstrong, Network effects, Nick Bostrom, Norbert Wiener, offshore financial centre, Parag Khanna, payday loans, peer-to-peer lending, Peter Thiel, Ponzi scheme, prediction markets, price anchoring, price mechanism, purchasing power parity, radical decentralization, random walk, recommendation engine, Richard Thaler, ride hailing / ride sharing, Robinhood: mobile stock trading app, Sam Altman, scientific management, Second Machine Age, self-driving car, Silicon Valley, Silicon Valley startup, six sigma, smart grid, smart meter, Snapchat, statistical model, Steve Jobs, subprime mortgage crisis, Suez canal 1869, tacit knowledge, technoutopianism, The Future of Employment, The Market for Lemons, The Nature of the Firm, transaction costs, universal basic income, vertical integration, William Langewiesche, Y Combinator

Divergences would be flagged and brought to the attention of factory directors, then to government decision makers sitting in a futuristic operations room. From there the officials would send directives back to the factories. Cybersyn was quite sophisticated for its time, employing a network approach to capturing and calculating economic activity and using Bayesian statistical models. Most important, it relied on feedback that would loop back into the decision-making processes. The system never became fully operational. Its communications network was in place and was used in the fall of 1972 to keep the country running when striking transportation workers blocked goods from entering Santiago.


Deep Work: Rules for Focused Success in a Distracted World by Cal Newport

8-hour work day, Albert Einstein, barriers to entry, behavioural economics, Bluma Zeigarnik, business climate, Cal Newport, Capital in the Twenty-First Century by Thomas Piketty, Clayton Christensen, David Brooks, David Heinemeier Hansson, deliberate practice, digital divide, disruptive innovation, do what you love, Donald Knuth, Donald Trump, Downton Abbey, en.wikipedia.org, Erik Brynjolfsson, Evgeny Morozov, experimental subject, follow your passion, Frank Gehry, Hacker News, Higgs boson, informal economy, information retrieval, Internet Archive, Jaron Lanier, knowledge worker, Mark Zuckerberg, Marshall McLuhan, Merlin Mann, Nate Silver, Neal Stephenson, new economy, Nicholas Carr, popular electronics, power law, remote working, Richard Feynman, Ruby on Rails, seminal paper, Silicon Valley, Silicon Valley startup, Snapchat, statistical model, the medium is the message, Tyler Cowen, Watson beat the top human players on Jeopardy!, web application, winner-take-all economy, work culture , zero-sum game

It turns out to be really difficult to answer a simple question such as: What’s the impact of our current e-mail habits on the bottom line? Cochran had to conduct a company-wide survey and gather statistics from the IT infrastructure. He also had to pull together salary data and information on typing and reading speed, and run the whole thing through a statistical model to spit out his final result. And even then, the outcome is fungible, as it’s not able to separate out, for example, how much value was produced by this frequent, expensive e-mail use to offset some of its cost. This example generalizes to most behaviors that potentially impede or improve deep work.
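The kind of calculation described here can be sketched in a few lines: salary data is converted into a cost per minute, and message volumes and handling times turn that into an annual figure. This is a minimal back-of-envelope sketch; every number in it is an invented placeholder rather than Cochran's data, and the real analysis drew on surveys and IT-infrastructure statistics.

    # Back-of-envelope version of the e-mail cost estimate described above.
    # All figures are invented placeholders, not Atlantic Media's actual data.
    employees = 600
    messages_per_person_per_day = 95      # sent + received
    minutes_per_message = 1.2             # blended reading/typing time
    avg_fully_loaded_salary = 90_000      # dollars per year
    working_days_per_year = 230
    minutes_per_day = 8 * 60

    cost_per_minute = avg_fully_loaded_salary / (working_days_per_year * minutes_per_day)
    annual_email_cost = (employees * messages_per_person_per_day
                         * minutes_per_message * cost_per_minute
                         * working_days_per_year)

    print(f"estimated annual cost of e-mail handling: ${annual_email_cost:,.0f}")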


pages: 579 words: 76,657

Data Science from Scratch: First Principles with Python by Joel Grus

backpropagation, confounding variable, correlation does not imply causation, data science, deep learning, Hacker News, higher-order functions, natural language processing, Netflix Prize, p-value, Paul Graham, recommendation engine, SpamAssassin, statistical model

Of course, she doesn’t want to write thousands of web pages, nor does she want to pay a horde of “content strategists” to do so. Instead she asks you whether you can somehow programmatically generate these web pages. To do this, we’ll need some way of modeling language. One approach is to start with a corpus of documents and learn a statistical model of language. In our case, we’ll start with Mike Loukides’s essay “What is data science?” As in Chapter 9, we’ll use requests and BeautifulSoup to retrieve the data. There are a couple of issues worth calling attention to. The first is that the apostrophes in the text are actually the Unicode character u"\u2019".
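A minimal sketch of the approach described (fetch the page, normalize the Unicode apostrophes, then learn a crude bigram model of the language) might look like the following. The URL is a placeholder, and a bigram random walk is only one simple way to model language; the book's own implementation may differ.

    import re
    import random
    from collections import defaultdict

    import requests
    from bs4 import BeautifulSoup

    # Hypothetical URL for illustration; the essay's real location may differ.
    URL = "https://www.oreilly.com/ideas/what-is-data-science"

    html = requests.get(URL).text
    text = BeautifulSoup(html, "html.parser").get_text()

    # Normalize the Unicode right single quote mentioned in the passage.
    text = text.replace("\u2019", "'")

    # Split into words and build bigram transitions: word -> possible next words.
    words = re.findall(r"[a-zA-Z']+", text.lower())
    transitions = defaultdict(list)
    for prev, nxt in zip(words, words[1:]):
        transitions[prev].append(nxt)

    def generate_sentence(start="data", length=15):
        """Random-walk the bigram table to produce a crude sentence."""
        current, out = start, [start]
        for _ in range(length - 1):
            followers = transitions.get(current)
            if not followers:
                break
            current = random.choice(followers)
            out.append(current)
        return " ".join(out)

    print(generate_sentence())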


pages: 589 words: 69,193

Mastering Pandas by Femi Anthony

Amazon Web Services, Bayesian statistics, correlation coefficient, correlation does not imply causation, data science, Debian, en.wikipedia.org, Internet of things, Large Hadron Collider, natural language processing, p-value, power law, random walk, side project, sparse data, statistical model, Thomas Bayes

This is called the posterior. P(D|H) is the probability of obtaining the data, considering our hypothesis. This is called the likelihood. Thus, Bayesian statistics amounts to applying Bayes rule, P(H|D) = P(D|H) P(H) / P(D), to solve problems in inferential statistics, with H representing our hypothesis and D the data. A Bayesian statistical model is cast in terms of parameters, and the uncertainty in these parameters is represented by probability distributions. This is different from the Frequentist approach, where the values are regarded as deterministic. An alternative representation is as follows: P(θ|D) = P(D|θ) P(θ) / P(D), where θ is our unknown parameter and D is our observed data. In Bayesian statistics, we make assumptions about the prior and use the likelihood to update to the posterior probability using the Bayes rule.
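A small numeric illustration of this update rule, with the posterior proportional to the likelihood times the prior, is sketched below using a made-up coin-bias example on a grid of parameter values; the example is illustrative and not taken from the book.

    import numpy as np

    # Grid of candidate values for the unknown parameter theta
    # (here: the probability that a coin lands heads).
    theta = np.linspace(0.01, 0.99, 99)

    # Prior: uncertainty about theta expressed as a probability distribution
    # (uniform here, purely for illustration).
    prior = np.ones_like(theta) / theta.size

    # Observed data D: 7 heads out of 10 tosses (made-up numbers).
    heads, tosses = 7, 10

    # Likelihood P(D | theta): how plausible the data is for each theta.
    likelihood = theta**heads * (1 - theta)**(tosses - heads)

    # Bayes rule: posterior proportional to likelihood * prior, then normalize.
    posterior = likelihood * prior
    posterior /= posterior.sum()

    print("posterior mean of theta:", np.sum(theta * posterior))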


pages: 291 words: 77,596

Total Recall: How the E-Memory Revolution Will Change Everything by Gordon Bell, Jim Gemmell

airport security, Albert Einstein, book scanning, cloud computing, Computing Machinery and Intelligence, conceptual framework, Douglas Engelbart, full text search, information retrieval, invention of writing, inventory management, Isaac Newton, Ivan Sutherland, John Markoff, language acquisition, lifelogging, Menlo Park, optical character recognition, pattern recognition, performance metric, RAND corporation, RFID, semantic web, Silicon Valley, Skype, social web, statistical model, Stephen Hawking, Steve Ballmer, Steve Bannon, Ted Nelson, telepresence, Turing test, Vannevar Bush, web application

“World Explorer: Visualizing Aggregate Data from Unstructured Text in Geo-Referenced Collections.” In Proceedings, Seventh ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL 07), June 2007. The Stuff I’ve Seen project did some experiments that showed how displaying milestones alongside a timeline may help orient the user. Horvitz et al. used statistical models to infer the probability that users will consider events to be memory landmarks. Ringel, M., E. Cutrell, S. T. Dumais, and E. Horvitz. 2003. “Milestones in Time: The Value of Landmarks in Retrieving Information from Personal Stores.” Proceedings of IFIP Interact 2003. Horvitz, Eric, Susan Dumais, and Paul Koch.


pages: 274 words: 75,846

The Filter Bubble: What the Internet Is Hiding From You by Eli Pariser

A Declaration of the Independence of Cyberspace, A Pattern Language, adjacent possible, Amazon Web Services, An Inconvenient Truth, Apple Newton, augmented reality, back-to-the-land, Black Swan, borderless world, Build a better mousetrap, Cass Sunstein, citizen journalism, cloud computing, cognitive dissonance, crowdsourcing, Danny Hillis, data acquisition, disintermediation, don't be evil, Filter Bubble, Flash crash, fundamental attribution error, Gabriella Coleman, global village, Haight Ashbury, Internet of things, Isaac Newton, Jaron Lanier, Jeff Bezos, jimmy wales, John Perry Barlow, Kevin Kelly, knowledge worker, Mark Zuckerberg, Marshall McLuhan, megacity, Metcalfe’s law, Netflix Prize, new economy, PageRank, Paradox of Choice, Patri Friedman, paypal mafia, Peter Thiel, power law, recommendation engine, RFID, Robert Metcalfe, sentiment analysis, shareholder value, Sheryl Sandberg, Silicon Valley, Silicon Valley startup, social graph, social software, social web, speech recognition, Startup school, statistical model, stem cell, Steve Jobs, Steven Levy, Stewart Brand, technoutopianism, Ted Nordhaus, The future is already here, the scientific method, urban planning, We are as Gods, Whole Earth Catalog, WikiLeaks, Y Combinator, Yochai Benkler

If Netflix shows me a romantic comedy and I like it, it’ll show me another one and begin to think of me as a romantic-comedy lover. But if it wants to get a good picture of who I really am, it should be constantly testing the hypothesis by showing me Blade Runner in an attempt to prove it wrong. Otherwise, I end up caught in a local maximum populated by Hugh Grant and Julia Roberts. The statistical models that make up the filter bubble write off the outliers. But in human life it’s the outliers who make things interesting and give us inspiration. And it’s the outliers who are the first signs of change. One of the best critiques of algorithmic prediction comes, remarkably, from the late-nineteenth-century Russian novelist Fyodor Dostoyevsky, whose Notes from Underground was a passionate critique of the utopian scientific rationalism of the day.


pages: 306 words: 78,893

After the New Economy: The Binge . . . And the Hangover That Won't Go Away by Doug Henwood

"World Economic Forum" Davos, accounting loophole / creative accounting, affirmative action, Alan Greenspan, AOL-Time Warner, Asian financial crisis, barriers to entry, Benchmark Capital, book value, borderless world, Branko Milanovic, Bretton Woods, business cycle, California energy crisis, capital controls, corporate governance, corporate raider, correlation coefficient, credit crunch, deindustrialization, dematerialisation, deskilling, digital divide, electricity market, emotional labour, ending welfare as we know it, feminist movement, fulfillment center, full employment, gender pay gap, George Gilder, glass ceiling, Glass-Steagall Act, Gordon Gekko, government statistician, greed is good, half of the world's population has never made a phone call, income inequality, indoor plumbing, intangible asset, Internet Archive, job satisfaction, joint-stock company, Kevin Kelly, labor-force participation, Larry Ellison, liquidationism / Banker’s doctrine / the Treasury view, low interest rates, manufacturing employment, Mary Meeker, means of production, Michael Milken, minimum wage unemployment, Naomi Klein, new economy, occupational segregation, PalmPilot, pets.com, post-work, profit maximization, purchasing power parity, race to the bottom, Ralph Nader, rewilding, Robert Gordon, Robert Shiller, Robert Solow, rolling blackouts, Ronald Reagan, shareholder value, Silicon Valley, Simon Kuznets, statistical model, stock buybacks, structural adjustment programs, tech worker, Telecommunications Act of 1996, telemarketer, The Bell Curve by Richard Herrnstein and Charles Murray, The Wealth of Nations by Adam Smith, total factor productivity, union organizing, War on Poverty, warehouse automation, women in the workforce, working poor, zero-sum game

Even classic statements of this skills argument, like that of Juhn, Murphy, and Pierce (1993), find that the standard proxies for skill like years of education and years of work experience (proxies being needed because skill is nearly impossible to define or measure) only explain part of the increase in polarization—less than half, in fact. Most of the increase remains unexplained by statistical models, a remainder that is typically attributed to "unobserved" attributes. That is, since conventional economists believe as a matter of faith that market rates of pay are fair compensation for a worker's productive contribution, any inexplicable anomalies in pay must be the result of things a boss can see that elude the academic's model.


pages: 279 words: 75,527

Collider by Paul Halpern

Albert Einstein, Albert Michelson, anthropic principle, cosmic microwave background, cosmological constant, dark matter, Dr. Strangelove, Ernest Rutherford, Gary Taubes, gravity well, Herman Kahn, Higgs boson, horn antenna, index card, Isaac Newton, Large Hadron Collider, Magellanic Cloud, pattern recognition, Plato's cave, Richard Feynman, Ronald Reagan, statistical model, Stephen Hawking, Strategic Defense Initiative, time dilation

Although this could represent an escaping graviton, more likely possibilities would need to be ruled out, such as the commonplace production of neutrinos. Unfortunately, even a hermetic detector such as ATLAS can’t account for the streams of lost neutrinos that pass unhindered through almost everything in nature—except by estimating the missing momentum and assuming it is all being transferred to neutrinos. Some physicists hope that statistical models of neutrino production would eventually prove sharp enough to indicate significant differences between the expected and actual pictures. Such discrepancies could prove that gravitons fled from collisions and ducked into regions beyond. Another potential means of establishing the existence of extra dimensions would be to look for the hypothetical phenomena called Kaluza-Klein excitations (named for Klein and an earlier unification pioneer, German mathematician Theodor Kaluza).


pages: 345 words: 75,660

Prediction Machines: The Simple Economics of Artificial Intelligence by Ajay Agrawal, Joshua Gans, Avi Goldfarb

Abraham Wald, Ada Lovelace, AI winter, Air France Flight 447, Airbus A320, algorithmic bias, AlphaGo, Amazon Picking Challenge, artificial general intelligence, autonomous vehicles, backpropagation, basic income, Bayesian statistics, Black Swan, blockchain, call centre, Capital in the Twenty-First Century by Thomas Piketty, Captain Sullenberger Hudson, carbon tax, Charles Babbage, classic study, collateralized debt obligation, computer age, creative destruction, Daniel Kahneman / Amos Tversky, data acquisition, data is the new oil, data science, deep learning, DeepMind, deskilling, disruptive innovation, driverless car, Elon Musk, en.wikipedia.org, Erik Brynjolfsson, everywhere but in the productivity statistics, financial engineering, fulfillment center, general purpose technology, Geoffrey Hinton, Google Glasses, high net worth, ImageNet competition, income inequality, information retrieval, inventory management, invisible hand, Jeff Hawkins, job automation, John Markoff, Joseph Schumpeter, Kevin Kelly, Lyft, Minecraft, Mitch Kapor, Moneyball by Michael Lewis explains big data, Nate Silver, new economy, Nick Bostrom, On the Economy of Machinery and Manufactures, OpenAI, paperclip maximiser, pattern recognition, performance metric, profit maximization, QWERTY keyboard, race to the bottom, randomized controlled trial, Ray Kurzweil, ride hailing / ride sharing, Robert Solow, Salesforce, Second Machine Age, self-driving car, shareholder value, Silicon Valley, statistical model, Stephen Hawking, Steve Jobs, Steve Jurvetson, Steven Levy, strong AI, The Future of Employment, the long tail, The Signal and the Noise by Nate Silver, Tim Cook: Apple, trolley problem, Turing test, Uber and Lyft, uber lyft, US Airways Flight 1549, Vernor Vinge, vertical integration, warehouse automation, warehouse robotics, Watson beat the top human players on Jeopardy!, William Langewiesche, Y Combinator, zero-sum game

For example, in a mobile phone churn model, researchers utilized data on hour-by-hour call records in addition to standard variables such as bill size and payment punctuality. The machine learning methods also got better at leveraging the data available. In the Duke competition, a key component of success was choosing which of the hundreds of available variables to include and choosing which statistical model to use. The best methods at the time, whether machine learning or classic regression, used a combination of intuition and statistical tests to select the variables and model. Now, machine learning methods, and especially deep learning methods, allow flexibility in the model and this means variables can combine with each other in unexpected ways.
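The contrast described here, a classic model fit to a few hand-picked variables versus a flexible learner given everything including hour-by-hour call counts, can be sketched on synthetic churn data. The feature names, data, and models below are illustrative assumptions, not the Duke competition's actual setup.

    import numpy as np
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import roc_auc_score

    rng = np.random.default_rng(0)
    n = 5000

    # Synthetic stand-ins for "hundreds of available variables":
    # bill size, payment punctuality, and hour-by-hour call counts.
    bill = rng.gamma(2.0, 30.0, n)
    late_payments = rng.poisson(1.0, n)
    hourly_calls = rng.poisson(2.0, (n, 24))

    # Churn depends on an interaction (late payers with little evening usage),
    # the kind of combination a flexible model can pick up on its own.
    logit = 0.02 * bill + 0.8 * late_payments - 0.3 * hourly_calls[:, 18:22].sum(axis=1)
    churn = (rng.random(n) < 1 / (1 + np.exp(-(logit - 2)))).astype(int)

    X_all = np.column_stack([bill, late_payments, hourly_calls])
    X_small = np.column_stack([bill, late_payments])  # hand-picked variables only

    Xa_tr, Xa_te, Xs_tr, Xs_te, y_tr, y_te = train_test_split(
        X_all, X_small, churn, test_size=0.3, random_state=0)

    classic = LogisticRegression(max_iter=1000).fit(Xs_tr, y_tr)
    flexible = GradientBoostingClassifier().fit(Xa_tr, y_tr)

    print("hand-picked variables, logistic regression AUC:",
          roc_auc_score(y_te, classic.predict_proba(Xs_te)[:, 1]))
    print("all variables, gradient boosting AUC:",
          roc_auc_score(y_te, flexible.predict_proba(Xa_te)[:, 1]))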


pages: 296 words: 78,631

Hello World: Being Human in the Age of Algorithms by Hannah Fry

23andMe, 3D printing, Air France Flight 447, Airbnb, airport security, algorithmic bias, algorithmic management, augmented reality, autonomous vehicles, backpropagation, Brixton riot, Cambridge Analytica, chief data officer, computer vision, crowdsourcing, DARPA: Urban Challenge, data science, deep learning, DeepMind, Douglas Hofstadter, driverless car, Elon Musk, fake news, Firefox, Geoffrey Hinton, Google Chrome, Gödel, Escher, Bach, Ignaz Semmelweis: hand washing, John Markoff, Mark Zuckerberg, meta-analysis, Northpointe / Correctional Offender Management Profiling for Alternative Sanctions, pattern recognition, Peter Thiel, RAND corporation, ransomware, recommendation engine, ride hailing / ride sharing, selection bias, self-driving car, Shai Danziger, Silicon Valley, Silicon Valley startup, Snapchat, sparse data, speech recognition, Stanislav Petrov, statistical model, Stephen Hawking, Steven Levy, systematic bias, TED Talk, Tesla Model S, The Wisdom of Crowds, Thomas Bayes, trolley problem, Watson beat the top human players on Jeopardy!, web of trust, William Langewiesche, you are the product

When all the inmates were eventually granted their release, and so were free to violate the terms of their parole if they chose to, Burgess had a chance to check how good his predictions were. From such a basic analysis, he managed to be remarkably accurate. Ninety-eight per cent of his low-risk group made a clean pass through their parole, while two-thirds of his high-risk group did not.17 Even crude statistical models, it turned out, could make better forecasts than the experts. But his work had its critics. Sceptical onlookers questioned how much the factors which reliably predicted parole success in one place at one time could apply elsewhere. (They had a point: I’m not sure the category ‘farm boy’ would be much help in predicting recidivism among modern inner-city criminals.)


pages: 267 words: 71,941

How to Predict the Unpredictable by William Poundstone

accounting loophole / creative accounting, Albert Einstein, Bernie Madoff, Brownian motion, business cycle, butter production in bangladesh, buy and hold, buy low sell high, call centre, centre right, Claude Shannon: information theory, computer age, crowdsourcing, Daniel Kahneman / Amos Tversky, Edward Thorp, Firefox, fixed income, forensic accounting, high net worth, index card, index fund, Jim Simons, John von Neumann, market bubble, money market fund, pattern recognition, Paul Samuelson, Ponzi scheme, power law, prediction markets, proprietary trading, random walk, Richard Thaler, risk-adjusted returns, Robert Shiller, Rubik’s Cube, statistical model, Steven Pinker, subprime mortgage crisis, transaction costs

There might be a happy medium, though, a range of probabilities where it does make sense to bet on a strong away team. That’s what the Bristol group found. To use this rule you need a good estimate of the probability of an away team win. Such estimates are not hard to come by on the web. You can also find spreadsheets or software that can be used as is or adapted to create your own statistical model. Note that the bookie odds are not proper estimates of the chances, as they factor in the commission and other tweaks. The researchers’ optimal rule was to bet on the away team when its chance of winning was between 44.7 and 71.5 percent. This is a selective rule. It applied to just twenty-two of the 194 matches in October 2007.
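Once you have an estimated away-win probability, from your own model or a public spreadsheet, the researchers' rule is a one-line filter. A minimal sketch with made-up probability estimates:

    def bet_on_away_team(p_away_win, lower=0.447, upper=0.715):
        """Apply the Bristol group's rule: back the away team only when its
        estimated win probability falls inside the optimal band."""
        return lower <= p_away_win <= upper

    # Made-up probability estimates for a handful of fixtures.
    estimates = {"Fixture A": 0.38, "Fixture B": 0.52, "Fixture C": 0.69, "Fixture D": 0.80}

    for match, p in estimates.items():
        action = "bet away" if bet_on_away_team(p) else "no bet"
        print(f"{match}: estimated away-win probability {p:.0%} -> {action}")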


pages: 229 words: 72,431

Shadow Work: The Unpaid, Unseen Jobs That Fill Your Day by Craig Lambert

airline deregulation, Asperger Syndrome, banking crisis, Barry Marshall: ulcers, big-box store, business cycle, carbon footprint, cashless society, Clayton Christensen, cognitive dissonance, collective bargaining, Community Supported Agriculture, corporate governance, crowdsourcing, data science, disintermediation, disruptive innovation, emotional labour, fake it until you make it, financial independence, Galaxy Zoo, ghettoisation, gig economy, global village, helicopter parent, IKEA effect, industrial robot, informal economy, Jeff Bezos, job automation, John Maynard Keynes: Economic Possibilities for our Grandchildren, Mark Zuckerberg, new economy, off-the-grid, pattern recognition, plutocrats, pneumatic tube, recommendation engine, Schrödinger's Cat, Silicon Valley, single-payer health, statistical model, the strength of weak ties, The Theory of the Leisure Class by Thorstein Veblen, Thorstein Veblen, Turing test, unpaid internship, Vanguard fund, Vilfredo Pareto, you are the product, zero-sum game, Zipcar

Algorithms are another tool that democratizes expertise, using the revolutionary power of data to outdo established authorities. For example, Theodore Ruger, then a law professor at Washington University in St. Louis, and three colleagues ran a contest to predict the outcome of Supreme Court cases on the 2002 docket. The four political scientists developed a statistical model based on six general case characteristics they extracted from previous trials; the model ignored information about specific laws and the facts of the actual cases. Their friendly contest pitted this model against the qualitative judgments of eighty-seven law professors, many of whom had clerked at the Court.


pages: 277 words: 80,703

Revolution at Point Zero: Housework, Reproduction, and Feminist Struggle by Silvia Federici

"World Economic Forum" Davos, Alan Greenspan, Community Supported Agriculture, declining real wages, equal pay for equal work, feminist movement, financial independence, fixed income, gentrification, global village, illegal immigration, informal economy, invisible hand, labor-force participation, land tenure, mass incarceration, means of production, microcredit, military-industrial complex, neoliberal agenda, new economy, Occupy movement, planetary scale, Scramble for Africa, statistical model, structural adjustment programs, the market place, tontine, trade liberalization, UNCLOS, wages for housework, Washington Consensus, women in the workforce, World Values Survey

At least since the Zapatistas, on December 31, 1993, took over the zócalo of San Cristóbal to protest legislation dissolving the ejidal lands of Mexico, the concept of the “commons” has gained popularity among the radical Left, internationally and in the United States, appearing as a ground of convergence among anarchists, Marxists/socialists, ecologists, and ecofeminists.1 There are important reasons why this apparently archaic idea has come to the center of political discussion in contemporary social movements. Two in particular stand out. On the one side, there has been the demise of the statist model of revolution that for decades has sapped the efforts of radical movements to build an alternative to capitalism. On the other, the neoliberal attempt to subordinate every form of life and knowledge to the logic of the market has heightened our awareness of the danger of living in a world in which we no longer have access to seas, trees, animals, and our fellow beings except through the cash-nexus.


Raw Data Is an Oxymoron by Lisa Gitelman

23andMe, collateralized debt obligation, computer age, continuous integration, crowdsourcing, disruptive innovation, Drosophila, Edmond Halley, Filter Bubble, Firefox, fixed income, folksonomy, Google Earth, Howard Rheingold, index card, informal economy, information security, Isaac Newton, Johann Wolfgang von Goethe, knowledge worker, Large Hadron Collider, liberal capitalism, lifelogging, longitudinal study, Louis Daguerre, Menlo Park, off-the-grid, optical character recognition, Panopticon Jeremy Bentham, peer-to-peer, RFID, Richard Thaler, Silicon Valley, social graph, software studies, statistical model, Stephen Hawking, Steven Pinker, text mining, time value of money, trade route, Turing machine, urban renewal, Vannevar Bush, WikiLeaks

Data storage of this scale, potentially measured in petabytes, would necessarily require sophisticated algorithmic querying in order to detect informational patterns. For David Gelernter, this type of data management would require “topsight,” a topdown perspective achieved through software modeling and the creation of microcosmic “mirror worlds,” in which raw data filters in from the bottom and the whole comes into focus through statistical modeling and rule and pattern extraction.36 The promise of topsight, in Gelernter’s terms, is a progression from annales to annalistes, from data collection that would satisfy a “neo-Victorian curatorial” drive to data analysis that calculates prediction scenarios and manages risk.37 What would be the locus of suspicion and paranoid fantasy (Poster calls it “database anxiety”) if not such an intricate and operationally efficient system, the aggregating capacity of which easily ups the ante on Thomas Pynchon’s paranoid realization that “everything is connected”?


pages: 251 words: 76,128

Borrow: The American Way of Debt by Louis Hyman

Alan Greenspan, asset-backed security, barriers to entry, big-box store, business cycle, cashless society, collateralized debt obligation, credit crunch, deindustrialization, deskilling, diversified portfolio, financial engineering, financial innovation, Ford Model T, Ford paid five dollars a day, Home mortgage interest deduction, housing crisis, income inequality, low interest rates, market bubble, McMansion, mortgage debt, mortgage tax deduction, Network effects, new economy, Paul Samuelson, plutocrats, price stability, Ronald Reagan, Savings and loan crisis, statistical model, Tax Reform Act of 1986, technology bubble, transaction costs, vertical integration, women in the workforce

As credit-rating agencies began to reassess the safety of the AAA mortgage-backed securities, insurance companies had to pony up greater quantities of collateral to guarantee the insurance policies on the bonds. The global credit market rested on a simple assumption: housing prices would always go up. Foreclosures would be randomly distributed, as the statistical models assumed. Yet as those models, and the companies that had created them, began to fail, a shudder ran through the corpus of global capitalism. The insurance giant AIG, which had hoped for so much profit in 1998, watched as its entire business—both traditional and new—went down, supported only by the U.S. government.


pages: 225 words: 11,355

Financial Market Meltdown: Everything You Need to Know to Understand and Survive the Global Credit Crisis by Kevin Mellyn

Alan Greenspan, asset-backed security, bank run, banking crisis, Bernie Madoff, bond market vigilante , bonus culture, Bretton Woods, business cycle, collateralized debt obligation, credit crunch, Credit Default Swap, credit default swaps / collateralized debt obligations, cuban missile crisis, deal flow, disintermediation, diversification, fiat currency, financial deregulation, financial engineering, financial innovation, financial intermediation, fixed income, foreign exchange controls, Francis Fukuyama: the end of history, George Santayana, global reserve currency, Greenspan put, Home mortgage interest deduction, inverted yield curve, Isaac Newton, joint-stock company, junk bonds, Kickstarter, liquidity trap, London Interbank Offered Rate, long peace, low interest rates, margin call, market clearing, mass immigration, Money creation, money market fund, moral hazard, mortgage tax deduction, Nixon triggered the end of the Bretton Woods system, Northern Rock, offshore financial centre, paradox of thrift, pattern recognition, pension reform, pets.com, Phillips curve, plutocrats, Ponzi scheme, profit maximization, proprietary trading, pushing on a string, reserve currency, risk tolerance, risk-adjusted returns, road to serfdom, Ronald Reagan, shareholder value, Silicon Valley, South Sea Bubble, statistical model, Suez canal 1869, systems thinking, tail risk, The Great Moderation, the long tail, the new new thing, the payments system, too big to fail, value at risk, very high income, War on Poverty, We are all Keynesians now, Y2K, yield curve

Like much of the ‘‘progress’’ of the last century, it was a matter of replacing common sense and tradition with science. The models produced using advanced statistics and computers were designed by brilliant minds from the best universities. At the Basle Committee, which set global standards for bank regulation to be followed by all major central banks, the use of statistical models to measure risk and reliance on the rating agencies were baked into the proposed rules for capital adequacy. The whole thing blew up not because of something obvious like greed. It failed because of the hubris, the fatal pride, of men and women who sincerely thought that they could build computer models that were capable of predicting risk and pricing it correctly.


pages: 322 words: 77,341

I.O.U.: Why Everyone Owes Everyone and No One Can Pay by John Lanchester

Alan Greenspan, asset-backed security, bank run, banking crisis, Bear Stearns, Berlin Wall, Bernie Madoff, Big bang: deregulation of the City of London, Black Monday: stock market crash in 1987, Black-Scholes formula, Blythe Masters, Celtic Tiger, collateralized debt obligation, credit crunch, Credit Default Swap, credit default swaps / collateralized debt obligations, currency risk, Daniel Kahneman / Amos Tversky, diversified portfolio, double entry bookkeeping, Exxon Valdez, Fall of the Berlin Wall, financial deregulation, financial engineering, financial innovation, fixed income, George Akerlof, Glass-Steagall Act, greed is good, Greenspan put, hedonic treadmill, hindsight bias, housing crisis, Hyman Minsky, intangible asset, interest rate swap, invisible hand, James Carville said: "I would like to be reincarnated as the bond market. You can intimidate everybody.", Jane Jacobs, John Maynard Keynes: Economic Possibilities for our Grandchildren, John Meriwether, junk bonds, Kickstarter, laissez-faire capitalism, light touch regulation, liquidity trap, Long Term Capital Management, loss aversion, low interest rates, Martin Wolf, money market fund, mortgage debt, mortgage tax deduction, mutually assured destruction, Myron Scholes, negative equity, new economy, Nick Leeson, Norman Mailer, Northern Rock, off-the-grid, Own Your Own Home, Ponzi scheme, quantitative easing, reserve currency, Right to Buy, risk-adjusted returns, Robert Shiller, Ronald Reagan, Savings and loan crisis, shareholder value, South Sea Bubble, statistical model, Tax Reform Act of 1986, The Great Moderation, the payments system, too big to fail, tulip mania, Tyler Cowen, value at risk

That means it should statistically have happened only once every 3 billion years. And it wasn’t the only one. The last decades have seen numerous 5-, 6-, and 7-sigma events. Those are supposed to happen, respectively, one day in every 13,932 years, one day in every 4,039,906 years, and one day in every 3,105,395,365 years. Yet no one concluded from this that the statistical models in use were wrong. The mathematical models simply didn’t work in a crisis. They worked when they worked, which was most of the time; but the whole point of them was to assess risk, and some risks by definition happen at the edges of known likelihoods. The strange thing is that this is strongly hinted at in the VAR model, as propounded by its more philosophically minded defenders such as Philippe Jorion: it marks the boundaries of the known world, up to the VAR break, and then writes “Here be Dragons.”
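Those one-day-in-N-years figures follow from a one-sided Gaussian tail probability combined with a trading calendar of roughly 250 to 252 days per year (an assumption inferred from the quoted numbers, not stated in the text). A short sketch that approximately reproduces them:

    from scipy.stats import norm

    TRADING_DAYS_PER_YEAR = 252  # assumption: the quoted figures match a trading calendar

    for sigma in (5, 6, 7):
        tail_prob = norm.sf(sigma)      # probability of a move beyond sigma standard deviations
        one_in_days = 1 / tail_prob     # expected number of days between such moves
        print(f"{sigma}-sigma: about one day in "
              f"{one_in_days / TRADING_DAYS_PER_YEAR:,.0f} years")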


pages: 373 words: 80,248

Empire of Illusion: The End of Literacy and the Triumph of Spectacle by Chris Hedges

Albert Einstein, AOL-Time Warner, Ayatollah Khomeini, Bear Stearns, Cal Newport, clean water, collective bargaining, corporate governance, creative destruction, Credit Default Swap, Glass-Steagall Act, haute couture, Herbert Marcuse, Honoré de Balzac, Howard Zinn, illegal immigration, income inequality, Joseph Schumpeter, Naomi Klein, offshore financial centre, Plato's cave, power law, Ralph Nader, Ronald Reagan, scientific management, Seymour Hersh, single-payer health, social intelligence, statistical model, uranium enrichment

He added that “much of Latin America, former Soviet Union states, and sub-Saharan Africa lack sufficient cash reserves, access to international aid or credit, or other coping mechanism.” “When those growth rates go down, my gut tells me that there are going to be problems coming out of that, and we’re looking for that,” he said. He referred to “statistical modeling” showing that “economic crises increase the risk of regime-threatening instability if they persist over a one- to two-year period.” Blair articulated the newest narrative of fear. As the economic unraveling accelerates, we will be told it is not the bearded Islamic extremists who threaten us most, although those in power will drag them out of the Halloween closet whenever they need to give us an exotic shock, but instead the domestic riffraff, environmentalists, anarchists, unions, right-wing militias, and enraged members of our dispossessed working class.


pages: 280 words: 79,029

Smart Money: How High-Stakes Financial Innovation Is Reshaping Our WorldÑFor the Better by Andrew Palmer

Affordable Care Act / Obamacare, Alan Greenspan, algorithmic trading, Andrei Shleifer, asset-backed security, availability heuristic, bank run, banking crisis, behavioural economics, Black Monday: stock market crash in 1987, Black-Scholes formula, bonus culture, break the buck, Bretton Woods, call centre, Carmen Reinhart, cloud computing, collapse of Lehman Brothers, collateralized debt obligation, computerized trading, corporate governance, credit crunch, Credit Default Swap, credit default swaps / collateralized debt obligations, Daniel Kahneman / Amos Tversky, David Graeber, diversification, diversified portfolio, Edmond Halley, Edward Glaeser, endogenous growth, Eugene Fama: efficient market hypothesis, eurozone crisis, family office, financial deregulation, financial engineering, financial innovation, fixed income, Flash crash, Google Glasses, Gordon Gekko, high net worth, housing crisis, Hyman Minsky, impact investing, implied volatility, income inequality, index fund, information asymmetry, Innovator's Dilemma, interest rate swap, Kenneth Rogoff, Kickstarter, late fees, London Interbank Offered Rate, Long Term Capital Management, longitudinal study, loss aversion, low interest rates, margin call, Mark Zuckerberg, McMansion, Minsky moment, money market fund, mortgage debt, mortgage tax deduction, Myron Scholes, negative equity, Network effects, Northern Rock, obamacare, payday loans, peer-to-peer lending, Peter Thiel, principal–agent problem, profit maximization, quantitative trading / quantitative finance, railway mania, randomized controlled trial, Richard Feynman, Richard Thaler, risk tolerance, risk-adjusted returns, Robert Shiller, Savings and loan crisis, short selling, Silicon Valley, Silicon Valley startup, Skype, South Sea Bubble, sovereign wealth fund, statistical model, subprime mortgage crisis, tail risk, Thales of Miletus, the long tail, transaction costs, Tunguska event, unbanked and underbanked, underbanked, Vanguard fund, web application

Public data from a couple of longitudinal studies showing the long-term relationship between education and income in the United States enabled him to build what he describes as “a simple multivariate regression model”—you know the sort, we’ve all built one—and work out the relationships between things such as test scores, degrees, and first jobs on later income. That model has since grown into something whizzier. An applicant’s education, SAT scores, work experience, and other details are pumped into a proprietary statistical model, which looks at people with comparable backgrounds and generates a prediction of that person’s personal income. Upstart now uses these data to underwrite loans to younger people—who often find it hard to raise money because of their limited credit histories. But the model was initially used to determine how much money an applicant could raise for each percentage point of future income they gave away.
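A sketch of "a simple multivariate regression model" in this spirit appears below; the inputs (SAT score, degree, years of experience) and the synthetic data are invented placeholders, not Upstart's actual variables or coefficients.

    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(1)
    n = 1000

    # Invented stand-ins for the kinds of inputs described: test scores,
    # highest degree, and years of work experience.
    sat = rng.normal(1200, 150, n)
    degree = rng.integers(0, 2, n)
    experience = rng.integers(0, 15, n)
    income = (20000 + 25 * sat + 12000 * degree + 1500 * experience
              + rng.normal(0, 8000, n))

    X = np.column_stack([sat, degree, experience])
    model = LinearRegression().fit(X, income)

    # Predicted income for a hypothetical applicant: SAT 1350, a degree, 2 years' work.
    applicant = np.array([[1350, 1, 2]])
    print("predicted income:", model.predict(applicant)[0])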


Hands-On Machine Learning With Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems by Aurelien Geron

AlphaGo, Amazon Mechanical Turk, Bayesian statistics, centre right, combinatorial explosion, constrained optimization, correlation coefficient, crowdsourcing, data science, deep learning, DeepMind, duck typing, en.wikipedia.org, Geoffrey Hinton, iterative process, Netflix Prize, NP-complete, optical character recognition, P = NP, p-value, pattern recognition, performance metric, recommendation engine, self-driving car, SpamAssassin, speech recognition, statistical model

They often end up selecting the same model, but when they differ, the model selected by the BIC tends to be simpler (fewer parameters) than the one selected by the AIC, but it does not fit the data quite as well (this is especially true for larger datasets). Likelihood function The terms “probability” and “likelihood” are often used interchangeably in the English language, but they have very different meanings in statistics: given a statistical model with some parameters θ, the word “probability” is used to describe how plausible a future outcome x is (knowing the parameter values θ), while the word “likelihood” is used to describe how plausible a particular set of parameter values θ are, after the outcome x is known. Consider a one-dimensional mixture model of two Gaussian distributions centered at -4 and +1.
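A small numeric illustration of the distinction, using the mixture of two Gaussians centered at -4 and +1 that the passage mentions; equal weights and unit variances are assumptions, since the passage does not specify them.

    import numpy as np
    from scipy.stats import norm

    def mixture_pdf(x, centers, weights, sigma=1.0):
        """Density of a Gaussian mixture with the given centers and weights."""
        return sum(w * norm.pdf(x, loc=c, scale=sigma)
                   for c, w in zip(centers, weights))

    # Probability: fix the parameters (centers at -4 and +1, equal weights,
    # unit variance -- assumed) and ask how plausible an outcome x is.
    theta = ([-4.0, 1.0], [0.5, 0.5])
    for x in (-4.0, 0.0, 2.5):
        print(f"p(x={x:+.1f} | theta) = {mixture_pdf(x, *theta):.4f}")

    # Likelihood: fix an observed x and ask how plausible different parameter
    # values are, e.g. sliding the second center while x = 2.5 stays fixed.
    x_obs = 2.5
    for second_center in (0.0, 1.0, 2.5):
        L = mixture_pdf(x_obs, [-4.0, second_center], [0.5, 0.5])
        print(f"likelihood of centers (-4, {second_center:+.1f}) given x={x_obs}: {L:.4f}")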


pages: 303 words: 74,206

GDP: The World’s Most Powerful Formula and Why It Must Now Change by Ehsan Masood

Alan Greenspan, anti-communist, bank run, banking crisis, biodiversity loss, Bob Geldof, Bretton Woods, centre right, clean water, colonial rule, coronavirus, COVID-19, Credit Default Swap, decarbonisation, deindustrialization, Diane Coyle, energy security, European colonialism, financial engineering, government statistician, happiness index / gross national happiness, income inequality, indoor plumbing, Intergovernmental Panel on Climate Change (IPCC), Isaac Newton, job satisfaction, Kickstarter, Mahbub ul Haq, mass immigration, means of production, Meghnad Desai, Mohammed Bouazizi, Robert Solow, Ronald Reagan, Sheryl Sandberg, Silicon Valley, Simon Kuznets, Skype, statistical model, the scientific method, The Spirit Level, Washington Consensus, wealth creators, zoonotic diseases

But they remain a minority and to some extent marginal voices. Given the explosion of data and the tools with which to manipulate data, the trend is completely in the other direction. Our world today is what Keynes feared it would become. Most scientists and economists rely heavily on numerical and statistical models. Pick a country—any country in the world—and its economy, as well as its financial systems, is likewise built on such models. Some of these models, such as GDP, are simplistic. Others, such as those used in banking, can be far more complex. In either case, there are few practitioners who now have the ability to explain, rationalize, or critique using non-mathematical language what they do and why they do it.


pages: 232 words: 72,483

Immortality, Inc. by Chip Walter

23andMe, Airbnb, Albert Einstein, Arthur D. Levinson, bioinformatics, Buckminster Fuller, cloud computing, CRISPR, data science, disintermediation, double helix, Elon Musk, Isaac Newton, Jeff Bezos, Larry Ellison, Law of Accelerating Returns, life extension, Menlo Park, microbiome, mouse model, pattern recognition, Peter Thiel, phenotype, radical life extension, Ray Kurzweil, Recombinant DNA, Rodney Brooks, self-driving car, Silicon Valley, Silicon Valley startup, Snapchat, South China Sea, SpaceShipOne, speech recognition, statistical model, stem cell, Stephen Hawking, Steve Jobs, TED Talk, Thomas Bayes, zero day

Melamud’s graphs showed that the longer people lived, the longer the list of diseases became: malfunctioning hearts, cancer, and Alzheimer’s being the three biggest killers. Whatever slowed those diseases and increased life span occurred thanks only to alpha’s whack-a-mole–style medicine. For fun, Melamud changed the statistical model for beta—the constant 8.5-year number that set the evolutionary life limit for humans at no more than 120 years. When that number was zeroed out, the calculations didn’t merely show an improvement; they blew everybody away. If the increase in beta was halted at age 30—a huge if to be sure—the median life span of that person would leap to 695 years!


pages: 804 words: 212,335

Revelation Space by Alastair Reynolds

game design, glass ceiling, gravity well, Kuiper Belt, planetary scale, random walk, statistical model, time dilation, VTOL

But if Sajaki's equipment was not the best, chances were good that he had excellent algorithms to distil memory traces. Over centuries, statistical models had studied patterns of memory storage in ten billion human minds, correlating structure against experience. Certain impressions tended to be reflected in similar neural structures — internal qualia — which were the functional blocks out of which more complex memories were assembled. Those qualia were never the same from mind to mind, except in very rare cases, but neither were they encoded in radically different ways, since nature would never deviate far from the minimum-energy route to a particular solution. The statistical models could identify those qualia patterns very efficiently, and then map the connections between them out of which memories were forged.


Mining of Massive Datasets by Jure Leskovec, Anand Rajaraman, Jeffrey David Ullman

cloud computing, crowdsourcing, en.wikipedia.org, first-price auction, G4S, information retrieval, John Snow's cholera map, Netflix Prize, NP-complete, PageRank, pattern recognition, power law, random walk, recommendation engine, second-price auction, sentiment analysis, social graph, statistical model, the long tail, web application

Originally, “data mining” or “data dredging” was a derogatory term referring to attempts to extract information that was not supported by the data. Section 1.2 illustrates the sort of errors one can make by trying to extract what really isn’t in the data. Today, “data mining” has taken on a positive meaning. Now, statisticians view data mining as the construction of a statistical model, that is, an underlying distribution from which the visible data is drawn. EXAMPLE 1.1 Suppose our data is a set of numbers. This data is much simpler than data that would be data-mined, but it will serve as an example. A statistician might decide that the data comes from a Gaussian distribution and use a formula to compute the most likely parameters of this Gaussian.
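For the Gaussian in Example 1.1, the "most likely parameters" have a closed form: the sample mean and the maximum-likelihood standard deviation (dividing by n rather than n - 1). A minimal sketch with made-up numbers:

    import numpy as np

    # A small made-up dataset standing in for "a set of numbers".
    data = np.array([2.1, 3.7, 1.9, 4.4, 2.8, 3.3, 2.5, 3.9])

    # Maximum-likelihood estimates for a Gaussian: the sample mean and the
    # standard deviation computed with ddof=0 (divide by n, not n - 1).
    mu_hat = data.mean()
    sigma_hat = data.std(ddof=0)

    print(f"fitted Gaussian: mean = {mu_hat:.3f}, std = {sigma_hat:.3f}")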



pages: 337 words: 86,320

Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are by Seth Stephens-Davidowitz

affirmative action, AltaVista, Amazon Mechanical Turk, Asian financial crisis, Bernie Sanders, big data - Walmart - Pop Tarts, Black Lives Matter, Cass Sunstein, computer vision, content marketing, correlation does not imply causation, crowdsourcing, Daniel Kahneman / Amos Tversky, data science, desegregation, Donald Trump, Edward Glaeser, Filter Bubble, game design, happiness index / gross national happiness, income inequality, Jeff Bezos, Jeff Seder, John Snow's cholera map, longitudinal study, Mark Zuckerberg, Nate Silver, Nick Bostrom, peer-to-peer lending, Peter Thiel, price discrimination, quantitative hedge fund, Ronald Reagan, Rosa Parks, sentiment analysis, Silicon Valley, statistical model, Steve Jobs, Steven Levy, Steven Pinker, TaskRabbit, The Signal and the Noise by Nate Silver, working poor

Crossley, “Validity of Responses to Survey Questions,” Public Opinion Quarterly 14, 1 (1950). 106 survey asked University of Maryland graduates: Frauke Kreuter, Stanley Presser, and Roger Tourangeau, “Social Desirability Bias in CATI, IVR, and Web Surveys,” Public Opinion Quarterly 72(5), 2008. 107 failure of the polls: For an article arguing that lying might be a problem in trying to predict support for Trump, see Thomas B. Edsall, “How Many People Support Trump but Don’t Want to Admit It?” New York Times, May 15, 2016, SR2. But for an argument that this was not a large factor, see Andrew Gelman, “Explanations for That Shocking 2% Shift,” Statistical Modeling, Causal Inference, and Social Science, November 9, 2016, http://andrewgelman.com/2016/11/09/explanations-shocking-2-shift/. 107 says Tourangeau: I interviewed Roger Tourangeau by phone on May 5, 2015. 107 so many people say they are above average: This is discussed in Adam Grant, Originals: How Non-Conformists Move the World (New York: Viking, 2016).


pages: 348 words: 83,490

More Than You Know: Finding Financial Wisdom in Unconventional Places (Updated and Expanded) by Michael J. Mauboussin

Alan Greenspan, Albert Einstein, Andrei Shleifer, Atul Gawande, availability heuristic, beat the dealer, behavioural economics, Benoit Mandelbrot, Black Swan, Brownian motion, butter production in bangladesh, buy and hold, capital asset pricing model, Clayton Christensen, clockwork universe, complexity theory, corporate governance, creative destruction, Daniel Kahneman / Amos Tversky, deliberate practice, demographic transition, discounted cash flows, disruptive innovation, diversification, diversified portfolio, dogs of the Dow, Drosophila, Edward Thorp, en.wikipedia.org, equity premium, equity risk premium, Eugene Fama: efficient market hypothesis, fixed income, framing effect, functional fixedness, hindsight bias, hiring and firing, Howard Rheingold, index fund, information asymmetry, intangible asset, invisible hand, Isaac Newton, Jeff Bezos, John Bogle, Kenneth Arrow, Laplace demon, Long Term Capital Management, loss aversion, mandelbrot fractal, margin call, market bubble, Menlo Park, mental accounting, Milgram experiment, Murray Gell-Mann, Nash equilibrium, new economy, Paul Samuelson, Performance of Mutual Funds in the Period, Pierre-Simon Laplace, power law, quantitative trading / quantitative finance, random walk, Reminiscences of a Stock Operator, Richard Florida, Richard Thaler, Robert Shiller, shareholder value, statistical model, Steven Pinker, stocks for the long run, Stuart Kauffman, survivorship bias, systems thinking, The Wisdom of Crowds, transaction costs, traveling salesman, value at risk, wealth creators, women in the workforce, zero-sum game

Psychologist Phil Tetlock asked nearly three hundred experts to make literally tens of thousands of predictions over nearly two decades. These were difficult predictions related to political and economic outcomes—similar to the types of problems investors tackle. The results were unimpressive. Expert forecasters improved little, if at all, on simple statistical models. Further, when Tetlock confronted the experts with their poor predicting acuity, they went about justifying their views just like everyone else does. Tetlock doesn’t describe in detail what happens when the expert opinions are aggregated, but his research certainly shows that ability, defined as expertise, does not lead to good predictions when the problems are hard.


Psychopathy: An Introduction to Biological Findings and Their Implications by Andrea L. Glenn, Adrian Raine

dark triade / dark tetrad, epigenetics, longitudinal study, loss aversion, meta-analysis, phenotype, randomized controlled trial, selection bias, selective serotonin reuptake inhibitor (SSRI), statistical model, theory of mind, trolley problem, twin studies

In behavioral genetics studies, the similarity of MZ twins on a given trait is compared to the similarity of DZ twins on that trait. If MZ twins are more similar than DZ twins, then it can be inferred that the trait being measured is at least partly due to genetic factors. Across large samples, statistical modeling techniques can determine the proportion of the variance in a particular trait or phenotype (in this case, psychopathy or a subcomponent of it) that is accounted for by genetic versus environmental factors. Genetic factors either can be additive or nonadditive. Additive means that genes summate to contribute to a phenotype.
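The studies described use structural-equation (ACE) modeling, but a much cruder classical shortcut, Falconer's formula, conveys the core idea that comparing MZ and DZ similarity yields a variance decomposition. The twin correlations below are invented for illustration.

    # Falconer's classical approximation (a simplification of the ACE models
    # used in the literature), with invented twin correlations for illustration.
    r_mz = 0.60  # similarity (correlation) of monozygotic twins on the trait
    r_dz = 0.35  # similarity of dizygotic twins, who share about half their genes

    heritability = 2 * (r_mz - r_dz)     # A: additive genetic variance
    shared_env = 2 * r_dz - r_mz         # C: shared environmental variance
    nonshared_env = 1 - r_mz             # E: nonshared environment plus error

    print(f"A (genetic): {heritability:.2f}, C (shared env): {shared_env:.2f}, "
          f"E (nonshared env): {nonshared_env:.2f}")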


pages: 322 words: 84,752

Pax Technica: How the Internet of Things May Set Us Free or Lock Us Up by Philip N. Howard

Aaron Swartz, Affordable Care Act / Obamacare, Berlin Wall, bitcoin, blood diamond, Bretton Woods, Brian Krebs, British Empire, butter production in bangladesh, call centre, Chelsea Manning, citizen journalism, Citizen Lab, clean water, cloud computing, corporate social responsibility, creative destruction, crowdsourcing, digital map, Edward Snowden, en.wikipedia.org, Evgeny Morozov, failed state, Fall of the Berlin Wall, feminist movement, Filter Bubble, Firefox, Francis Fukuyama: the end of history, Google Earth, Hacker News, Howard Rheingold, income inequality, informal economy, information security, Internet of things, John Perry Barlow, Julian Assange, Kibera, Kickstarter, land reform, M-Pesa, Marshall McLuhan, megacity, Mikhail Gorbachev, mobile money, Mohammed Bouazizi, national security letter, Nelson Mandela, Network effects, obamacare, Occupy movement, off-the-grid, packet switching, pension reform, prediction markets, sentiment analysis, Silicon Valley, Skype, spectrum auction, statistical model, Stuxnet, Tactical Technology Collective, technological determinism, trade route, Twitter Arab Spring, undersea cable, uranium enrichment, WikiLeaks, zero day

Important events and recognizable causal connections can’t be replicated or falsified. We can’t repeat the Arab Spring in some kind of experiment. We can’t test its negation—an Arab Spring that never happened, or an Arab Spring minus one key factor that resulted in a different outcome. We don’t have enough large datasets about Arab Spring–like events to run statistical models. That doesn’t mean we shouldn’t try to learn from the real events that happened. In fact, for many in the social sciences, tracing how real events unfolded is the best way to understand political change. The richest explanations of the fall of the Berlin Wall, for example, as sociologist Steve Pfaff crafts them, come from such process tracing.2 We do, however, know enough to make some educated guesses about what will happen next.


pages: 283 words: 81,163

How Capitalism Saved America: The Untold History of Our Country, From the Pilgrims to the Present by Thomas J. Dilorenzo

air traffic controllers' union, Alan Greenspan, banking crisis, British Empire, business cycle, California energy crisis, collective bargaining, Cornelius Vanderbilt, corporate governance, corporate social responsibility, electricity market, financial deregulation, Fractional reserve banking, Hernando de Soto, Ida Tarbell, income inequality, invisible hand, Joseph Schumpeter, laissez-faire capitalism, McDonald's hot coffee lawsuit, means of production, medical malpractice, Menlo Park, minimum wage unemployment, Money creation, Norman Mailer, plutocrats, price stability, profit maximization, profit motive, Ralph Nader, rent control, rent-seeking, Robert Bork, rolling blackouts, Ronald Coase, Ronald Reagan, scientific management, Silicon Valley, statistical model, Tax Reform Act of 1986, The Wealth of Nations by Adam Smith, transcontinental railway, union organizing, Upton Sinclair, vertical integration, W. E. B. Du Bois, wealth creators, working poor, Works Progress Administration, zero-sum game

Wages rose by a phenomenal 13.7 percent during the first three quarters of 1937 alone.46 The union/nonunion wage differential increased from 5 percent in 1933 to 23 percent by 1940.47 On top of this, the Social Security payroll and unemployment insurance taxes contributed to a rapid rise in government-mandated fringe benefits, from 2.4 percent of payrolls in 1936 to 5.1 percent just two years later. Economists Richard Vedder and Lowell Gallaway have determined the costs of all this misguided legislation, showing how most of the abnormal unemployment of the 1930s would have been avoided had it not been for the New Deal. Using a statistical model, Vedder and Gallaway concluded that by 1940 the unemployment rate was more than 8 percentage points higher than it would have been without the legislation-induced growth in unionism and government-mandated fringe-benefit costs imposed on employers.48 Their conclusion: “The Great Depression was very significantly prolonged in both its duration and its magnitude by the impact of New Deal programs.”49 In addition to fascistic labor policies and government-mandated wage and fringe-benefit increases that destroyed millions of jobs, the Second New Deal was responsible for economy-destroying tax increases and massive government spending on myriad government make-work programs.


pages: 345 words: 86,394

Frequently Asked Questions in Quantitative Finance by Paul Wilmott

Abraham Wald, Albert Einstein, asset allocation, beat the dealer, Black-Scholes formula, Brownian motion, butterfly effect, buy and hold, capital asset pricing model, collateralized debt obligation, Credit Default Swap, credit default swaps / collateralized debt obligations, currency risk, delta neutral, discrete time, diversified portfolio, Edward Thorp, Emanuel Derman, Eugene Fama: efficient market hypothesis, financial engineering, fixed income, fudge factor, implied volatility, incomplete markets, interest rate derivative, interest rate swap, iterative process, lateral thinking, London Interbank Offered Rate, Long Term Capital Management, Louis Bachelier, mandelbrot fractal, margin call, market bubble, martingale, Myron Scholes, Norbert Wiener, Paul Samuelson, power law, quantitative trading / quantitative finance, random walk, regulatory arbitrage, risk free rate, risk/return, Sharpe ratio, statistical arbitrage, statistical model, stochastic process, stochastic volatility, transaction costs, urban planning, value at risk, volatility arbitrage, volatility smile, Wiener process, yield curve, zero-coupon bond

This is hat B. The final hat’s numbers have mean of zero and standard deviation 10. This is hat C. You don’t know which hat is which. You pick a number out of one hat; it is −2.6. Which hat do you think it came from? MLE can help you answer this question. Long Answer: A large part of statistical modelling concerns finding model parameters. One popular way of doing this is Maximum Likelihood Estimation. The method is easily explained by a very simple example. You are attending a maths conference. You arrive by train at the city hosting the event. You take a taxi from the train station to the conference venue.
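The hat puzzle reduces to comparing the likelihood of -2.6 under each candidate Gaussian. A sketch follows; because the excerpt cuts off before hats A and B are described, their standard deviations (1 and 5) are assumptions.

    from scipy.stats import norm

    x = -2.6
    # Standard deviations: hat C is 10 per the text; 1 and 5 for hats A and B
    # are assumptions, since the excerpt omits them.
    hats = {"A": 1.0, "B": 5.0, "C": 10.0}

    likelihoods = {name: norm.pdf(x, loc=0.0, scale=sd) for name, sd in hats.items()}
    for name, L in likelihoods.items():
        print(f"hat {name} (std {hats[name]:>4}): likelihood of {x} = {L:.5f}")

    best = max(likelihoods, key=likelihoods.get)
    print("maximum likelihood choice: hat", best)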


When Free Markets Fail: Saving the Market When It Can't Save Itself (Wiley Corporate F&A) by Scott McCleskey

Alan Greenspan, Asian financial crisis, asset-backed security, bank run, barriers to entry, Bear Stearns, Bernie Madoff, break the buck, call centre, collateralized debt obligation, corporate governance, Credit Default Swap, credit default swaps / collateralized debt obligations, financial engineering, financial innovation, fixed income, Glass-Steagall Act, information asymmetry, invisible hand, Isaac Newton, iterative process, junk bonds, Long Term Capital Management, margin call, money market fund, moral hazard, mortgage debt, place-making, Ponzi scheme, prediction markets, proprietary trading, risk tolerance, Savings and loan crisis, shareholder value, statistical model, The Wealth of Nations by Adam Smith, time value of money, too big to fail, web of trust

The methodology is the description of what information to gather and how to process the information to arrive at a rating (what conditions would lead to an AAA rating, to an AA rating, etc.). The statistical models are the algorithms that predict the outcomes of various scenarios, such as what would happen to an airline if the price of oil rose to $100 per barrel. The analyst does his or her homework and comes up with the rating he or she believes is correct, but this is only the beginning of the process. 7 Gretchen Morgenson, ‘‘Debt Watchdogs: Tamed or Caught Napping?’’ New York Times, December 6, 2008. 8 Ibid. 9 Ibid.


pages: 302 words: 86,614

The Alpha Masters: Unlocking the Genius of the World's Top Hedge Funds by Maneet Ahuja, Myron Scholes, Mohamed El-Erian

"World Economic Forum" Davos, activist fund / activist shareholder / activist investor, Alan Greenspan, Asian financial crisis, asset allocation, asset-backed security, backtesting, Bear Stearns, Bernie Madoff, book value, Bretton Woods, business process, call centre, Carl Icahn, collapse of Lehman Brothers, collateralized debt obligation, computerized trading, corporate governance, credit crunch, Credit Default Swap, credit default swaps / collateralized debt obligations, diversification, Donald Trump, en.wikipedia.org, family office, financial engineering, fixed income, global macro, high net worth, high-speed rail, impact investing, interest rate derivative, Isaac Newton, Jim Simons, junk bonds, Long Term Capital Management, managed futures, Marc Andreessen, Mark Zuckerberg, merger arbitrage, Michael Milken, Myron Scholes, NetJets, oil shock, pattern recognition, Pershing Square Capital Management, Ponzi scheme, proprietary trading, quantitative easing, quantitative trading / quantitative finance, Renaissance Technologies, risk-adjusted returns, risk/return, rolodex, Savings and loan crisis, short selling, Silicon Valley, South Sea Bubble, statistical model, Steve Jobs, stock buybacks, systematic bias, systematic trading, tail risk, two and twenty, zero-sum game

Wong says that the one thing most people don’t understand about systematic trading is the trade-off between profit potential in the long term and the potential for short-term fluctuation and losses. “We are all about the long run,” he says. “It’s why I say, over and over, the trend is your friend.” “If you’re a macro trader and you basically have 20 positions, you better make sure that no more than two or three are wrong. But we base our positions on statistical models, and we take hundreds of positions. At any given time, a lot of them are going to be wrong, and we have to accept that. But in the long run, we’ll be more right than wrong.” Evidently—since 1990, AHL’s total returns have exceeded 1,000 percent. Still, AHL is hardly invulnerable. The financial crisis brought on a sharp reversal, and the firm remains vulnerable to the Fed-induced drop in market volatility.


pages: 290 words: 83,248

The Greed Merchants: How the Investment Banks Exploited the System by Philip Augar

Alan Greenspan, Andy Kessler, AOL-Time Warner, barriers to entry, Bear Stearns, Berlin Wall, Big bang: deregulation of the City of London, Bonfire of the Vanities, business cycle, buttonwood tree, buy and hold, capital asset pricing model, Carl Icahn, commoditize, corporate governance, corporate raider, crony capitalism, cross-subsidies, deal flow, equity risk premium, financial deregulation, financial engineering, financial innovation, fixed income, Glass-Steagall Act, Gordon Gekko, high net worth, information retrieval, interest rate derivative, invisible hand, John Meriwether, junk bonds, Long Term Capital Management, low interest rates, Martin Wolf, Michael Milken, new economy, Nick Leeson, offshore financial centre, pensions crisis, proprietary trading, regulatory arbitrage, risk free rate, Sand Hill Road, shareholder value, short selling, Silicon Valley, South Sea Bubble, statistical model, systematic bias, Telecommunications Act of 1996, The Chicago School, The Predators' Ball, The Wealth of Nations by Adam Smith, transaction costs, tulip mania, value at risk, yield curve

As we saw in Chapter 1, the deal was intended to create a world-class modern media company but instead led to the largest losses in corporate history. Bankers such as the legendary dealmaker Bruce Wasserstein are dismissive of the value-destroying arguments: ‘The problem with many academic studies is that they make questionable assumptions to squeeze untidy data points into a pristine statistical model.’32 But the weight of evidence from the late-twentieth-century merger wave seems to show that the handsome profits made by the selling shareholders were usually offset by subsequent losses for the acquirers. This would suggest that many mergers were not well thought out and were attempted by managers that lacked the skills and techniques to make them work.


pages: 280 words: 83,299

Empty Planet: The Shock of Global Population Decline by Darrell Bricker, John Ibbitson

"World Economic Forum" Davos, affirmative action, agricultural Revolution, Berlin Wall, Black Lives Matter, Brexit referendum, BRICs, British Empire, Columbian Exchange, commoditize, demographic dividend, demographic transition, Deng Xiaoping, Donald Trump, en.wikipedia.org, full employment, gender pay gap, gentrification, ghettoisation, glass ceiling, global reserve currency, Great Leap Forward, Gunnar Myrdal, Hans Rosling, Hernando de Soto, illegal immigration, income inequality, James Watt: steam engine, Jeff Bezos, John Snow's cholera map, Kibera, knowledge worker, labor-force participation, Mark Zuckerberg, megacity, New Urbanism, nuclear winter, off grid, offshore financial centre, out of africa, Potemkin village, purchasing power parity, reserve currency, Ronald Reagan, Silicon Valley, South China Sea, statistical model, Steve Jobs, Steven Pinker, The Wealth of Nations by Adam Smith, Thomas Malthus, transcontinental railway, upwardly mobile, urban planning, working-age population, young professional, zero-sum game

“If we continue at this pace, one day the next species we extinguish may be ourselves,” Bourne warned.77 But the biggest neo-Malthusian of them all is an institution, and a highly respected one at that. The United Nations Population Division, a critical component of the UN’s Department of Economic and Social Affairs, is almost as old as the UN itself, having existed in one form or another since 1946. Its principal goal is to develop statistical models that will accurately project the growth of the global population. The demographers and statisticians who work there are good at their jobs. In 1958, the division predicted that the global population would reach 6.28 billion by 2000. In fact, it was a bit lower, at 6.06 billion, about 200 million out—a difference small enough not to count.78 This was remarkably impressive, given that demographers at that time had highly inadequate data for Africa and China.


The Armchair Economist: Economics and Everyday Life by Steven E. Landsburg

Albert Einstein, Arthur Eddington, business cycle, diversified portfolio, Dutch auction, first-price auction, German hyperinflation, Golden Gate Park, information asymmetry, invisible hand, junk bonds, Kenneth Arrow, low interest rates, means of production, price discrimination, profit maximization, Ralph Nader, random walk, Ronald Coase, Sam Peltzman, Savings and loan crisis, sealed-bid auction, second-price auction, second-price sealed-bid, statistical model, the scientific method, Unsafe at Any Speed

The commissioner became obsessed with the need to discourage punting and called in his assistants for advice on how to cope with the problem. One of those assistants, a fresh M.B.A., breathlessly announced that he had taken courses from an economist who was a great expert on all aspects of the game and who had developed detailed statistical models to predict how teams behave. He proposed retaining the economist to study what makes teams punt. The commissioner summoned the economist, who went home with a large retainer check and a mandate to discover the causes of punting. Many hours later (he billed by the hour) the answer was at hand.


pages: 291 words: 81,703

Average Is Over: Powering America Beyond the Age of the Great Stagnation by Tyler Cowen

Amazon Mechanical Turk, behavioural economics, Black Swan, brain emulation, Brownian motion, business cycle, Cass Sunstein, Charles Babbage, choice architecture, complexity theory, computer age, computer vision, computerized trading, cosmological constant, crowdsourcing, dark matter, David Brooks, David Ricardo: comparative advantage, deliberate practice, driverless car, Drosophila, en.wikipedia.org, endowment effect, epigenetics, Erik Brynjolfsson, eurozone crisis, experimental economics, Flynn Effect, Freestyle chess, full employment, future of work, game design, Higgs boson, income inequality, industrial robot, informal economy, Isaac Newton, Johannes Kepler, John Markoff, Ken Thompson, Khan Academy, labor-force participation, Loebner Prize, low interest rates, low skilled workers, machine readable, manufacturing employment, Mark Zuckerberg, meta-analysis, microcredit, Myron Scholes, Narrative Science, Netflix Prize, Nicholas Carr, off-the-grid, P = NP, P vs NP, pattern recognition, Peter Thiel, randomized controlled trial, Ray Kurzweil, reshoring, Richard Florida, Richard Thaler, Ronald Reagan, Silicon Valley, Skype, statistical model, stem cell, Steve Jobs, Turing test, Tyler Cowen, Tyler Cowen: Great Stagnation, upwardly mobile, Yogi Berra

On the age dynamics for achievement for non-economists, see Benjamin F. Jones and Bruce A. Weinberg, “Age Dynamics in Scientific Creativity,” published online before print, PNAS, November 7, 2011, doi: 10.1073/pnas.1102895108. On data crunching pushing out theory, see the famous essay by Leo Breiman, “Statistical Modeling: The Two Cultures,” Statistical Science, 2001, 16(3): 199–231, including the comments on the piece as well. See also the recent piece by Betsey Stevenson and Justin Wolfers, “Business is Booming in Empirical Economics,” Bloomberg.com, August 6, 2012. And as mentioned earlier, see Daniel S.


pages: 561 words: 87,892

Losing Control: The Emerging Threats to Western Prosperity by Stephen D. King

"World Economic Forum" Davos, Admiral Zheng, Alan Greenspan, asset-backed security, barriers to entry, Berlin Wall, Bernie Madoff, Bretton Woods, BRICs, British Empire, business cycle, capital controls, Celtic Tiger, central bank independence, collateralized debt obligation, corporate governance, credit crunch, crony capitalism, currency manipulation / currency intervention, currency peg, David Ricardo: comparative advantage, demographic dividend, demographic transition, Deng Xiaoping, Diane Coyle, Fall of the Berlin Wall, financial deregulation, financial innovation, fixed income, foreign exchange controls, Francis Fukuyama: the end of history, full employment, G4S, George Akerlof, German hyperinflation, Gini coefficient, Great Leap Forward, guns versus butter model, hiring and firing, income inequality, income per capita, inflation targeting, invisible hand, Isaac Newton, junk bonds, knowledge economy, labour market flexibility, labour mobility, liberal capitalism, low interest rates, low skilled workers, market clearing, Martin Wolf, mass immigration, Meghnad Desai, Mexican peso crisis / tequila crisis, Naomi Klein, new economy, old age dependency ratio, Paul Samuelson, Ponzi scheme, price mechanism, price stability, purchasing power parity, rent-seeking, reserve currency, rising living standards, Ronald Reagan, Savings and loan crisis, savings glut, Silicon Valley, Simon Kuznets, sovereign wealth fund, spice trade, statistical model, technology bubble, The Great Moderation, The inhabitant of London could order by telephone, sipping his morning tea in bed, the various products of the whole earth, The Market for Lemons, The Wealth of Nations by Adam Smith, Thomas Malthus, trade route, transaction costs, Washington Consensus, We are all Keynesians now, women in the workforce, working-age population, Y2K, Yom Kippur War

WE’RE NOT ON OUR OWN In my twenty-five years as a professional economist, initially as a civil servant in Whitehall but, for the most part, as an employee of a major international bank, I’ve spent a good deal of time looking into the future. As the emerging nations first appeared on the economic radar screen, I began to realize I could talk about the future only by delving much further into the past. I wasn’t interested merely in the history incorporated into statistical models of the economy, a history which typically includes just a handful of years and therefore ignores almost all the interesting economic developments that have taken place over the last millennium. Instead, the history that mattered to me had to capture the long sweep of economic and political progress and all too frequent reversal.


pages: 294 words: 81,292

Our Final Invention: Artificial Intelligence and the End of the Human Era by James Barrat

AI winter, air gap, AltaVista, Amazon Web Services, artificial general intelligence, Asilomar, Automated Insights, Bayesian statistics, Bernie Madoff, Bill Joy: nanobots, Bletchley Park, brain emulation, California energy crisis, cellular automata, Chuck Templeton: OpenTable:, cloud computing, cognitive bias, commoditize, computer vision, Computing Machinery and Intelligence, cuban missile crisis, Daniel Kahneman / Amos Tversky, Danny Hillis, data acquisition, don't be evil, drone strike, dual-use technology, Extropian, finite state, Flash crash, friendly AI, friendly fire, Google Glasses, Google X / Alphabet X, Hacker News, Hans Moravec, Isaac Newton, Jaron Lanier, Jeff Hawkins, John Markoff, John von Neumann, Kevin Kelly, Law of Accelerating Returns, life extension, Loebner Prize, lone genius, machine translation, mutually assured destruction, natural language processing, Neil Armstrong, Nicholas Carr, Nick Bostrom, optical character recognition, PageRank, PalmPilot, paperclip maximiser, pattern recognition, Peter Thiel, precautionary principle, prisoner's dilemma, Ray Kurzweil, Recombinant DNA, Rodney Brooks, rolling blackouts, Search for Extraterrestrial Intelligence, self-driving car, semantic web, Silicon Valley, Singularitarianism, Skype, smart grid, speech recognition, statistical model, stealth mode startup, stem cell, Stephen Hawking, Steve Jobs, Steve Jurvetson, Steve Wozniak, strong AI, Stuxnet, subprime mortgage crisis, superintelligent machines, technological singularity, The Coming Technological Singularity, Thomas Bayes, traveling salesman, Turing machine, Turing test, Vernor Vinge, Watson beat the top human players on Jeopardy!, zero day

When I asked Jason Freidenfelds, from Google PR, he wrote: … it’s much too early for us to speculate about topics this far down the road. We’re generally more focused on practical machine learning technologies like machine vision, speech recognition, and machine translation, which essentially is about building statistical models to match patterns—nothing close to the “thinking machine” vision of AGI. But I think Page’s quotation sheds more light on Google’s attitudes than Freidenfelds’s. And it helps explain Google’s evolution from the visionary, insurrectionist company of the 1990s, with the much touted slogan DON’T BE EVIL, to today’s opaque, Orwellian, personal-data-aggregating behemoth.


pages: 245 words: 83,272

Artificial Unintelligence: How Computers Misunderstand the World by Meredith Broussard

"Susan Fowler" uber, 1960s counterculture, A Declaration of the Independence of Cyberspace, Ada Lovelace, AI winter, Airbnb, algorithmic bias, AlphaGo, Amazon Web Services, autonomous vehicles, availability heuristic, barriers to entry, Bernie Sanders, Big Tech, bitcoin, Buckminster Fuller, Charles Babbage, Chris Urmson, Clayton Christensen, cloud computing, cognitive bias, complexity theory, computer vision, Computing Machinery and Intelligence, crowdsourcing, Danny Hillis, DARPA: Urban Challenge, data science, deep learning, Dennis Ritchie, digital map, disruptive innovation, Donald Trump, Douglas Engelbart, driverless car, easy for humans, difficult for computers, Electric Kool-Aid Acid Test, Elon Musk, fake news, Firefox, gamification, gig economy, global supply chain, Google Glasses, Google X / Alphabet X, Greyball, Hacker Ethic, independent contractor, Jaron Lanier, Jeff Bezos, Jeremy Corbyn, John Perry Barlow, John von Neumann, Joi Ito, Joseph-Marie Jacquard, life extension, Lyft, machine translation, Mark Zuckerberg, mass incarceration, Minecraft, minimum viable product, Mother of all demos, move fast and break things, Nate Silver, natural language processing, Northpointe / Correctional Offender Management Profiling for Alternative Sanctions, One Laptop per Child (OLPC), opioid epidemic / opioid crisis, PageRank, Paradox of Choice, payday loans, paypal mafia, performance metric, Peter Thiel, price discrimination, Ray Kurzweil, ride hailing / ride sharing, Ross Ulbricht, Saturday Night Live, school choice, self-driving car, Silicon Valley, Silicon Valley billionaire, speech recognition, statistical model, Steve Jobs, Steven Levy, Stewart Brand, TechCrunch disrupt, Tesla Model S, the High Line, The Signal and the Noise by Nate Silver, theory of mind, traumatic brain injury, Travis Kalanick, trolley problem, Turing test, Uber for X, uber lyft, Watson beat the top human players on Jeopardy!, We are as Gods, Whole Earth Catalog, women in the workforce, work culture , yottabyte

We created the Survived column and got a number that we can call 97 percent accurate. We learned that fare is the most influential factor in a mathematical analysis of Titanic survivor data. This was narrow artificial intelligence. It was not anything to be scared of, nor was it leading us toward a global takeover by superintelligent computers. “These are just statistical models, the same as those that Google uses to play board games or that your phone uses to make predictions about what word you’re saying in order to transcribe your messages,” Carnegie Mellon professor and machine learning researcher Zachary Lipton told the Register about AI. “They are no more sentient than a bowl of noodles.”17 For a programmer, writing an algorithm is that easy.
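For readers curious what such an analysis looks like in practice, the following is a hedged Python sketch, not the book's actual code: fit a simple classifier to Titanic passenger data, score its accuracy, and inspect which feature carries the most weight. The file name titanic.csv, the chosen columns, and the random-forest model are all assumptions made for illustration.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Assumes a local "titanic.csv" with the usual Kaggle-style columns.
df = pd.read_csv("titanic.csv")
features = ["Fare", "Pclass", "Age", "SibSp", "Parch"]
X = df[features].fillna(df[features].median())   # fill missing values with column medians
y = df["Survived"]

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

# Accuracy on held-out passengers, and the relative importance of each feature.
print("accuracy:", model.score(X_test, y_test))
print(dict(zip(features, model.feature_importances_.round(3))))
```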


pages: 297 words: 84,447

The Star Builders: Nuclear Fusion and the Race to Power the Planet by Arthur Turrell

Albert Einstein, Arthur Eddington, autonomous vehicles, Boeing 747, Boris Johnson, carbon tax, coronavirus, COVID-19, data science, decarbonisation, deep learning, Donald Trump, Eddington experiment, energy security, energy transition, Ernest Rutherford, Extinction Rebellion, green new deal, Greta Thunberg, Higgs boson, Intergovernmental Panel on Climate Change (IPCC), ITER tokamak, Jeff Bezos, Kickstarter, Large Hadron Collider, lockdown, New Journalism, nuclear winter, Peter Thiel, planetary scale, precautionary principle, Project Plowshare, Silicon Valley, social distancing, sovereign wealth fund, statistical model, Stephen Hawking, Steve Bannon, TED Talk, The Rise and Fall of American Growth, Tunguska event

Clery, “Laser Fusion Reactor Approaches ‘Burning Plasma’ Milestone,” Science 370 (2020): 1019–20. 15. D. Clark et al., “Three-Dimensional Modeling and Hydrodynamic Scaling of National Ignition Facility Implosions,” Physics of Plasmas 26 (2019): 050601; V. Gopalaswamy et al., “Tripled Yield in Direct-Drive Laser Fusion Through Statistical Modelling,” Nature 565 (2019): 581–86. 16. K. Hahn et al., “Fusion-Neutron Measurements for Magnetized Liner Inertial Fusion Experiments on the Z Accelerator,” in Journal of Physics: Conference Series, vol. 717 (IOP Publishing, 2016), 012020. 17. O. Hurricane et al., “Approaching a Burning Plasma on the NIF,” Physics of Plasmas 26 (2019): 052704; P.


pages: 303 words: 84,023

Heads I Win, Tails I Win by Spencer Jakab

Alan Greenspan, Asian financial crisis, asset allocation, backtesting, Bear Stearns, behavioural economics, Black Monday: stock market crash in 1987, book value, business cycle, buy and hold, collapse of Lehman Brothers, correlation coefficient, crowdsourcing, Daniel Kahneman / Amos Tversky, diversification, dividend-yielding stocks, dogs of the Dow, Elliott wave, equity risk premium, estate planning, Eugene Fama: efficient market hypothesis, eurozone crisis, Everybody Ought to Be Rich, fear index, fixed income, geopolitical risk, government statistician, index fund, Isaac Newton, John Bogle, John Meriwether, Long Term Capital Management, low interest rates, Market Wizards by Jack D. Schwager, Mexican peso crisis / tequila crisis, money market fund, Myron Scholes, PalmPilot, passive investing, Paul Samuelson, pets.com, price anchoring, proprietary trading, Ralph Nelson Elliott, random walk, Reminiscences of a Stock Operator, risk tolerance, risk-adjusted returns, Robert Shiller, robo advisor, Savings and loan crisis, Sharpe ratio, short selling, Silicon Valley, South Sea Bubble, statistical model, Steve Jobs, subprime mortgage crisis, survivorship bias, technology bubble, transaction costs, two and twenty, VA Linux, Vanguard fund, zero-coupon bond, zero-sum game

We’re talking about less than one-tenth of one percent of all trading days during that span. Sure, the payoff from missing a major selloff would be huge. The very smartest people on Wall Street would give up a major bodily appendage to identify even one of those episodes, though, and there’s no evidence any of them has managed to do it with any consistency. Their statistical models aren’t even very good at predicting how bad those bad days will be once they arrive—potentially a fatal miscalculation for those using borrowed money to enhance returns. For example, the October 1987 stock market crash was what risk managers call a 21 standard deviation event. That’s a statistical definition and I won’t bore you with the math.


pages: 623 words: 448,848

Food Allergy: Adverse Reactions to Foods and Food Additives by Dean D. Metcalfe

active measures, Albert Einstein, autism spectrum disorder, bioinformatics, classic study, confounding variable, epigenetics, Helicobacter pylori, hygiene hypothesis, impulse control, life extension, longitudinal study, meta-analysis, mouse model, pattern recognition, phenotype, placebo effect, randomized controlled trial, Recombinant DNA, selection bias, statistical model, stem cell, twin studies, two and twenty

Furthermore, this approach allows for the possibility that almost 10% of patients allergic to that food will react to ingestion of that dose and this possibility may be considered as too high. Modeling of collective data from several studies is probably the preferred approach to determine the population-based threshold, although the best statistical model to use remains to be determined [8]. typical servings of these foods. Thus, it is tempting to speculate that those individuals with very low individual threshold doses would be less likely to outgrow their food allergy or would require a longer time period for that to occur. In at least one study [25], individuals with histories of severe food allergies had significantly lower individual threshold doses.

The knowledge of individual threshold doses would allow physicians to offer more complete advice to food-allergic patients in terms of their comparative vulnerability to hidden residues of allergenic foods. The clinical determination of large numbers of individual threshold doses would allow estimates of population-based thresholds using appropriate statistical modeling approaches. The food industry and regulatory agencies could also make effective use of information on population-based threshold doses to establish improved labeling regulations and practices and allergen control programs. References 1 Gern JE, Yang E, Evrard HM, et al. Allergic reactions to milk-contaminated “non-dairy” products.

Standardization of double-blind, placebo-controlled food challenges. Allergy 2001;56:75–7. 75 Caffarelli C, Petroccione T. False-negative food challenges in children with suspected food allergy. Lancet 2001;358:1871–2. 76 Sampson HA. Use of food-challenge tests in children. Lancet 2001;358:1832–3. 77 Briggs D, Aspinall L, Dickens A, Bindslev-Jensen C. Statistical model for assessing the proportion of subjects with subjective sensitisations in adverse reactions to foods. Allergy 2001;56:83–5. 78 Chinchilli VM, Fisher L, Craig TJ. Statistical issues in clinical trials that involve the double-blind, placebo-controlled food challenge. J Allergy Clin Immunol 2005;115:592–7. CHAPTER 21: IgE Tests: In Vitro Diagnosis (Kirsten Beyer). KEY CONCEPTS • The presence of food allergen-specific IgE determines the sensitization to a specific food.


pages: 285 words: 86,853

What Algorithms Want: Imagination in the Age of Computing by Ed Finn

Airbnb, Albert Einstein, algorithmic bias, algorithmic management, algorithmic trading, AlphaGo, Amazon Mechanical Turk, Amazon Web Services, bitcoin, blockchain, business logic, Charles Babbage, Chuck Templeton: OpenTable:, Claude Shannon: information theory, commoditize, Computing Machinery and Intelligence, Credit Default Swap, crowdsourcing, cryptocurrency, data science, DeepMind, disruptive innovation, Donald Knuth, Donald Shoup, Douglas Engelbart, Douglas Engelbart, Elon Musk, Evgeny Morozov, factory automation, fiat currency, Filter Bubble, Flash crash, game design, gamification, Google Glasses, Google X / Alphabet X, Hacker Conference 1984, High speed trading, hiring and firing, Ian Bogost, industrial research laboratory, invisible hand, Isaac Newton, iterative process, Jaron Lanier, Jeff Bezos, job automation, John Conway, John Markoff, Just-in-time delivery, Kickstarter, Kiva Systems, late fees, lifelogging, Loebner Prize, lolcat, Lyft, machine readable, Mother of all demos, Nate Silver, natural language processing, Neal Stephenson, Netflix Prize, new economy, Nicholas Carr, Nick Bostrom, Norbert Wiener, PageRank, peer-to-peer, Peter Thiel, power law, Ray Kurzweil, recommendation engine, Republic of Letters, ride hailing / ride sharing, Satoshi Nakamoto, self-driving car, sharing economy, Silicon Valley, Silicon Valley billionaire, Silicon Valley ideology, Silicon Valley startup, SimCity, Skinner box, Snow Crash, social graph, software studies, speech recognition, statistical model, Steve Jobs, Steven Levy, Stewart Brand, supply-chain management, tacit knowledge, TaskRabbit, technological singularity, technological solutionism, technoutopianism, the Cathedral and the Bazaar, The Coming Technological Singularity, the scientific method, The Signal and the Noise by Nate Silver, The Structural Transformation of the Public Sphere, The Wealth of Nations by Adam Smith, transaction costs, traveling salesman, Turing machine, Turing test, Uber and Lyft, Uber for X, uber lyft, urban planning, Vannevar Bush, Vernor Vinge, wage slave

The black box structure of Siri’s knowledge ontology obfuscated the category error the system made by excluding Planned Parenthood facilities. Fixing this glitch in the culture machine necessarily involves human intervention: behind the facade of the black box, engineers had to overrule baseline statistical models with exceptions and workarounds. There must be thousands of such exceptions, particularly for responses that mimic human affect. Siri and its various counterparts offer a vision of universal language computation, but in practice depend on an “effective” computation that requires constant tweaking and oversight.


pages: 297 words: 91,141

Market Sense and Nonsense by Jack D. Schwager

3Com Palm IPO, asset allocation, Bear Stearns, Bernie Madoff, Black Monday: stock market crash in 1987, Brownian motion, buy and hold, collateralized debt obligation, commodity trading advisor, computerized trading, conceptual framework, correlation coefficient, Credit Default Swap, credit default swaps / collateralized debt obligations, diversification, diversified portfolio, fixed income, global macro, high net worth, implied volatility, index arbitrage, index fund, Jim Simons, junk bonds, London Interbank Offered Rate, Long Term Capital Management, low interest rates, managed futures, margin call, market bubble, market fundamentalism, Market Wizards by Jack D. Schwager, merger arbitrage, negative equity, pattern recognition, performance metric, pets.com, Ponzi scheme, proprietary trading, quantitative trading / quantitative finance, random walk, risk free rate, risk tolerance, risk-adjusted returns, risk/return, Robert Shiller, selection bias, Sharpe ratio, short selling, statistical arbitrage, statistical model, subprime mortgage crisis, survivorship bias, tail risk, transaction costs, two-sided market, value at risk, yield curve

The premise underlying statistical arbitrage is that short-term imbalances in buy and sell orders cause temporary price distortions, which provide short-term trading opportunities. Statistical arbitrage is a mean-reversion strategy that seeks to sell excessive strength and buy excessive weakness based on statistical models that define when short-term price moves in individual equities are considered out of line relative to price moves in related equities. The origin of the strategy was a subset of statistical arbitrage called pairs trading. In pairs trading, the price ratios of closely related stocks are tracked (e.g., Ford and General Motors), and when the mathematical model indicates that one stock has gained too much versus the other (either by rising more or by declining less), it is sold and hedged by the purchase of the related equity in the pair.
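A minimal Python sketch of the pairs-trading logic described above, written for illustration only and not representing the model of any fund mentioned in the text: track the log price ratio of two related stocks and flag it when it drifts too far from its recent mean. The window length and z-score threshold are assumed values.

```python
import numpy as np

def pairs_signal(prices_a, prices_b, window=20, z_entry=2.0):
    """Return +1 (buy A / sell B), -1 (sell A / buy B), or 0 for the latest bar."""
    ratio = np.log(np.asarray(prices_a, dtype=float) / np.asarray(prices_b, dtype=float))
    recent = ratio[-window:]                       # rolling window of the log price ratio
    z = (ratio[-1] - recent.mean()) / recent.std(ddof=1)
    if z > z_entry:                                # A has gained too much relative to B
        return -1
    if z < -z_entry:                               # A has fallen too much relative to B
        return +1
    return 0

# Example with made-up prices: stock A rallies while B stays flat.
a = [100 + 0.3 * i for i in range(30)]
b = [50.0] * 30
print(pairs_signal(a, b, window=10, z_entry=1.5))
```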


pages: 291 words: 90,200

Networks of Outrage and Hope: Social Movements in the Internet Age by Manuel Castells

"World Economic Forum" Davos, access to a mobile phone, banking crisis, call centre, centre right, citizen journalism, cognitive dissonance, collective bargaining, conceptual framework, crowdsourcing, currency manipulation / currency intervention, disintermediation, en.wikipedia.org, Glass-Steagall Act, housing crisis, income inequality, microcredit, military-industrial complex, Mohammed Bouazizi, Occupy movement, offshore financial centre, Port of Oakland, social software, statistical model, Twitter Arab Spring, We are the 99%, web application, WikiLeaks, World Values Survey, young professional, zero-sum game

Particularly significant, before the Arab Spring, was the transformation of social involvement in Egypt and Bahrain with the help of ICT diffusion. In a stream of research conducted in 2011 and 2012 after the Arab uprisings, Howard and Hussain, using a series of quantitative and qualitative indicators, probed a multi-causal, statistical model of the processes and outcomes of the Arab uprisings by using fuzzy logic (Hussain and Howard 2012). They found that the extensive use of digital networks by a predominantly young population of demonstrators had a significant effect on the intensity and power of these movements, starting with a very active debate on social and political demands in the social media before the demonstrations’ onset.


pages: 345 words: 92,849

Equal Is Unfair: America's Misguided Fight Against Income Inequality by Don Watkins, Yaron Brook

3D printing, Affordable Care Act / Obamacare, Apple II, barriers to entry, Berlin Wall, Bernie Madoff, blue-collar work, business process, Capital in the Twenty-First Century by Thomas Piketty, Cass Sunstein, collective bargaining, colonial exploitation, Cornelius Vanderbilt, corporate governance, correlation does not imply causation, creative destruction, Credit Default Swap, crony capitalism, David Brooks, deskilling, Edward Glaeser, Elon Musk, en.wikipedia.org, financial deregulation, immigration reform, income inequality, indoor plumbing, inventory management, invisible hand, Isaac Newton, Jeff Bezos, Jony Ive, laissez-faire capitalism, Louis Pasteur, low skilled workers, means of production, minimum wage unemployment, Naomi Klein, new economy, obamacare, Peter Singer: altruism, Peter Thiel, profit motive, rent control, Ronald Reagan, Silicon Valley, Skype, Solyndra, statistical model, Steve Jobs, Steve Wozniak, The Spirit Level, too big to fail, trickle-down economics, Uber for X, urban renewal, War on Poverty, wealth creators, women in the workforce, working poor, zero-sum game

In these cases the question to ask is: “Assuming this is a problem, what is your solution?” Inevitably, the inequality critics’ answer will be that some form of force must be used to tear down the top by depriving them of the earned, and to prop up the bottom by giving them the unearned. But nothing can justify an injustice, nor can any statistical model erase the fact that all of the values human life requires are a product of the human mind, and that the human mind cannot function without freedom. Don’t concede that the inequality alarmists value equality. The egalitarians pose as defenders of equality. But there is no such thing as being for equality across the board: different types of equality conflict.


pages: 322 words: 88,197

Wonderland: How Play Made the Modern World by Steven Johnson

"hyperreality Baudrillard"~20 OR "Baudrillard hyperreality", Ada Lovelace, adjacent possible, Alfred Russel Wallace, Antoine Gombaud: Chevalier de Méré, Berlin Wall, bitcoin, Book of Ingenious Devices, Buckminster Fuller, Charles Babbage, Claude Shannon: information theory, Clayton Christensen, colonial exploitation, computer age, Computing Machinery and Intelligence, conceptual framework, cotton gin, crowdsourcing, cuban missile crisis, Drosophila, Edward Thorp, Fellow of the Royal Society, flying shuttle, game design, global village, Great Leap Forward, Hedy Lamarr / George Antheil, HyperCard, invention of air conditioning, invention of the printing press, invention of the telegraph, Islamic Golden Age, Jacquard loom, Jacques de Vaucanson, James Watt: steam engine, Jane Jacobs, John von Neumann, joint-stock company, Joseph-Marie Jacquard, land value tax, Landlord’s Game, Lewis Mumford, lone genius, mass immigration, megacity, Minecraft, moral panic, Murano, Venice glass, music of the spheres, Necker cube, New Urbanism, Oculus Rift, On the Economy of Machinery and Manufactures, pattern recognition, peer-to-peer, pets.com, placebo effect, pneumatic tube, probability theory / Blaise Pascal / Pierre de Fermat, profit motive, QWERTY keyboard, Ray Oldenburg, SimCity, spice trade, spinning jenny, statistical model, Steve Jobs, Steven Pinker, Stewart Brand, supply-chain management, talking drums, the built environment, The Great Good Place, the scientific method, The Structural Transformation of the Public Sphere, trade route, Turing machine, Turing test, Upton Sinclair, urban planning, vertical integration, Victor Gruen, Watson beat the top human players on Jeopardy!, white flight, white picket fence, Whole Earth Catalog, working poor, Wunderkammern

Probability theory served as a kind of conceptual fossil fuel for the modern world. It gave rise to the modern insurance industry, which for the first time could calculate with some predictive power the claims it could expect when insuring individuals or industries. Capital markets—for good and for bad—rely extensively on elaborate statistical models that predict future risk. “The pundits and pollsters who today tell us who is likely to win the next election make direct use of mathematical techniques developed by Pascal and Fermat,” the mathematician Keith Devlin writes. “In modern medicine, future-predictive statistical methods are used all the time to compare the benefits of various drugs and treatments with their risks.”


pages: 257 words: 94,168

Oil Panic and the Global Crisis: Predictions and Myths by Steven M. Gorelick

California gold rush, carbon footprint, energy security, energy transition, flex fuel, Ford Model T, income per capita, invention of the telephone, Jevons paradox, meta-analysis, North Sea oil, nowcasting, oil shale / tar sands, oil shock, peak oil, price elasticity of demand, price stability, profit motive, purchasing power parity, RAND corporation, statistical model, stock buybacks, Thomas Malthus

At a depth of over 5 miles, this find contains anywhere between 3 and 15 billion barrels and could comprise 11 percent of US production by 2013.107 In 2009, Chevron reported another deep-water discovery just 44 miles away that may yield 0.5 billion barrels and could be profitably produced at an oil price of $50 per barrel.108 The second insight from discovery trends is that an underlying premise of many statistical models of oil discovery is probably incorrect. This premise is that larger oil fields are found first, followed by the discovery of smaller fields. Large fields in geologically related proximity to one another are typically discovered first simply because they are the most easily detected targets.


pages: 323 words: 89,795

Food and Fuel: Solutions for the Future by Andrew Heintzman, Evan Solomon, Eric Schlosser

agricultural Revolution, Berlin Wall, big-box store, California energy crisis, clean water, Community Supported Agriculture, corporate social responsibility, David Brooks, deindustrialization, distributed generation, electricity market, energy security, Exxon Valdez, flex fuel, full employment, half of the world's population has never made a phone call, hydrogen economy, Kickstarter, land reform, megaproject, microcredit, Negawatt, Nelson Mandela, oil shale / tar sands, oil shock, peak oil, precautionary principle, RAND corporation, risk tolerance, Silicon Valley, social contagion, statistical model, Tragedy of the Commons, Upton Sinclair, uranium enrichment, vertical integration

Specifically, there were some indications that China’s catch reports were too high. For example, some of China’s major fish populations were declared overexploited decades ago. In 2001, Watson and Pauly published an eye-opening study in the journal Nature about the true status of our world’s fisheries. These researchers used a statistical model to compare China’s officially reported catches to those that would be expected, given oceanographic conditions and other factors. They determined that China’s actual catches were likely closer to one half their reported levels. The implications of China’s over-reporting are dramatic: instead of global catches increasing by 0.33 million tonnes per year since 1988, as reported by the FAO, catches have actually declined by 0.36 million tonnes per year.


pages: 335 words: 94,657

The Bogleheads' Guide to Investing by Taylor Larimore, Michael Leboeuf, Mel Lindauer

asset allocation, behavioural economics, book value, buy and hold, buy low sell high, corporate governance, correlation coefficient, Daniel Kahneman / Amos Tversky, diversification, diversified portfolio, Donald Trump, endowment effect, estate planning, financial engineering, financial independence, financial innovation, high net worth, index fund, John Bogle, junk bonds, late fees, Long Term Capital Management, loss aversion, Louis Bachelier, low interest rates, margin call, market bubble, mental accounting, money market fund, passive investing, Paul Samuelson, random walk, risk tolerance, risk/return, Sharpe ratio, statistical model, stocks for the long run, survivorship bias, the rule of 72, transaction costs, Vanguard fund, yield curve, zero-sum game

During a 15-year period when the S&P 500 had average annual returns of 15.3 percent, the Mensa Investment Club's performance averaged returns of only 2.5 percent. 3. In 1994, a hedge fund called Long Term Capital Management (LTCM) was created with the help of two Nobel Prize-winning economists. They believed they had a statistical model that could eliminate risk from investing. The fund was extremely leveraged. They controlled positions totaling $1.25 trillion, an amount equal to the annual budget of the U.S. government. After some spectacular early successes, a financial panic swept across Asia. In 1998, LTCM hemorrhaged and faced bankruptcy.


High-Frequency Trading by David Easley, Marcos López de Prado, Maureen O'Hara

algorithmic trading, asset allocation, backtesting, Bear Stearns, Brownian motion, capital asset pricing model, computer vision, continuous double auction, dark matter, discrete time, finite state, fixed income, Flash crash, High speed trading, index arbitrage, information asymmetry, interest rate swap, Large Hadron Collider, latency arbitrage, margin call, market design, market fragmentation, market fundamentalism, market microstructure, martingale, National best bid and offer, natural language processing, offshore financial centre, pattern recognition, power law, price discovery process, price discrimination, price stability, proprietary trading, quantitative trading / quantitative finance, random walk, Sharpe ratio, statistical arbitrage, statistical model, stochastic process, Tobin tax, transaction costs, two-sided market, yield curve

With variable market speed of trading we need to discretise the time interval [0, T] over n steps ∆t_i, and Equation 2.1 becomes X = Σ_{i=1}^{n} v_i ∆t_i (2.2). Therefore, executing a CLOCK or VWAP or POV strategy is a scheduling problem, ie, we are trying to enforce Equation 2.2 within each evaluation interval ∆t_i while targeting a predetermined T or a variable v_i. Controlling the speed of trading v_i is a non-trivial practical problem that requires statistical models for forecasting the market volume over short horizons, as well as local adjustments for tracking the target schedule (Markov et al 2011). These scheduling techniques are also used in later generation algorithms. Second generation algorithms: Second generation algorithms introduce the concepts of price impact and risk.
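As a rough illustration of the scheduling idea, and under stated assumptions rather than the authors' actual method, the Python sketch below allocates per-interval child-order quantities for a participation (POV) style strategy so that the cumulative executed quantity tracks X = Σ v_i ∆t_i against forecast market volumes. The forecast figures, participation rate, and order cap are invented.

```python
def pov_schedule(forecast_volumes, participation_rate, total_target):
    """Per-interval child-order quantities for a participation (POV) strategy."""
    executed = 0.0
    plan = []
    for i, _ in enumerate(forecast_volumes):
        # Desired cumulative quantity so far: a fixed fraction of forecast market volume.
        target_so_far = participation_rate * sum(forecast_volumes[: i + 1])
        # Trade the shortfall versus the schedule, without exceeding the overall order size.
        qty = max(0.0, min(total_target - executed, target_so_far - executed))
        executed += qty
        plan.append(qty)
    return plan

# Example: five intervals, forecast volumes in shares, 10% participation, 5,000-share order.
print(pov_schedule([12000, 8000, 15000, 9000, 11000], 0.10, 5000))
```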


Driverless: Intelligent Cars and the Road Ahead by Hod Lipson, Melba Kurman

AI winter, Air France Flight 447, AlphaGo, Amazon Mechanical Turk, autonomous vehicles, backpropagation, barriers to entry, butterfly effect, carbon footprint, Chris Urmson, cloud computing, computer vision, connected car, creative destruction, crowdsourcing, DARPA: Urban Challenge, deep learning, digital map, Donald Shoup, driverless car, Elon Musk, en.wikipedia.org, Erik Brynjolfsson, General Motors Futurama, Geoffrey Hinton, Google Earth, Google X / Alphabet X, Hans Moravec, high net worth, hive mind, ImageNet competition, income inequality, industrial robot, intermodal, Internet of things, Jeff Hawkins, job automation, Joseph Schumpeter, lone genius, Lyft, megacity, Network effects, New Urbanism, Oculus Rift, pattern recognition, performance metric, Philippa Foot, precision agriculture, RFID, ride hailing / ride sharing, Second Machine Age, self-driving car, Silicon Valley, smart cities, speech recognition, statistical model, Steve Jobs, technoutopianism, TED Talk, Tesla Model S, Travis Kalanick, trolley problem, Uber and Lyft, uber lyft, Unsafe at Any Speed, warehouse robotics

After a few thousand games more, the software began to play with what some observers might call “strategy.” Since most moves can lead to both a loss and a win, depending on subsequent moves, the database didn’t just record a win/lose outcome. Instead, it recorded the probability that each move would eventually lead to a win. In other words, the database was essentially a big statistical model. Figure 8.2 AI techniques used in driverless cars. Most robotic systems use a combination of techniques. Object recognition for real-time obstacle detection and traffic negotiation is the most challenging for AI (far left). As the software learned, it spent countless hours in “self-play,” amassing more gaming experience than any human could in a lifetime.
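A toy Python sketch of the kind of "big statistical model" described, purely illustrative and not the software discussed in the book: a table keyed by (state, move) that accumulates win counts from self-play and reports each move's estimated probability of leading to a win.

```python
from collections import defaultdict

stats = defaultdict(lambda: [0, 0])          # (state, move) -> [wins, games played]

def record(state, move, won):
    """Update the table after one self-play game that included this (state, move)."""
    entry = stats[(state, move)]
    entry[0] += 1 if won else 0
    entry[1] += 1

def win_probability(state, move):
    """Estimated probability that this move eventually leads to a win."""
    wins, games = stats[(state, move)]
    return wins / games if games else 0.5    # unseen moves default to 50/50

record("opening", "move A", True)
record("opening", "move A", False)
record("opening", "move B", True)
print(win_probability("opening", "move A"), win_probability("opening", "move B"))
```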


pages: 297 words: 95,518

Ten Technologies to Save the Planet: Energy Options for a Low-Carbon Future by Chris Goodall

barriers to entry, carbon footprint, carbon tax, congestion charging, decarbonisation, electricity market, energy security, Indoor air pollution, Intergovernmental Panel on Climate Change (IPCC), Kickstarter, land tenure, load shedding, New Urbanism, oil shock, profit maximization, Silicon Valley, smart grid, smart meter, statistical model, undersea cable

Every country in the world that relies on increasing amounts of wind, marine, or solar power will probably need to use all three of these mechanisms to align short-term supply and demand. In the U.S., this three-pronged approach is appropriately called the “smart grid.” The construction and operation of this new kind of grid are fascinating challenges to engineers and also to the mathematicians who will use statistical modeling to minimize the risk of not having enough power or, perhaps even more expensively, having grossly excessive power production for many hours a week. Elsewhere, the standard approach, which we might call the “twentieth-century model,” simply tries to predict changes in demand and then adjusts supply to meet these variations.


pages: 339 words: 88,732

The Second Machine Age: Work, Progress, and Prosperity in a Time of Brilliant Technologies by Erik Brynjolfsson, Andrew McAfee

2013 Report for America's Infrastructure - American Society of Civil Engineers - 19 March 2013, 3D printing, access to a mobile phone, additive manufacturing, Airbnb, Alan Greenspan, Albert Einstein, Amazon Mechanical Turk, Amazon Web Services, American Society of Civil Engineers: Report Card, Any sufficiently advanced technology is indistinguishable from magic, autonomous vehicles, barriers to entry, basic income, Baxter: Rethink Robotics, Boston Dynamics, British Empire, business cycle, business intelligence, business process, call centre, carbon tax, Charles Lindbergh, Chuck Templeton: OpenTable:, clean water, combinatorial explosion, computer age, computer vision, congestion charging, congestion pricing, corporate governance, cotton gin, creative destruction, crowdsourcing, data science, David Ricardo: comparative advantage, digital map, driverless car, employer provided health coverage, en.wikipedia.org, Erik Brynjolfsson, factory automation, Fairchild Semiconductor, falling living standards, Filter Bubble, first square of the chessboard / second half of the chessboard, Frank Levy and Richard Murnane: The New Division of Labor, Freestyle chess, full employment, G4S, game design, general purpose technology, global village, GPS: selective availability, Hans Moravec, happiness index / gross national happiness, illegal immigration, immigration reform, income inequality, income per capita, indoor plumbing, industrial robot, informal economy, intangible asset, inventory management, James Watt: steam engine, Jeff Bezos, Jevons paradox, jimmy wales, job automation, John Markoff, John Maynard Keynes: Economic Possibilities for our Grandchildren, John Maynard Keynes: technological unemployment, Joseph Schumpeter, Kevin Kelly, Khan Academy, Kiva Systems, knowledge worker, Kodak vs Instagram, law of one price, low skilled workers, Lyft, Mahatma Gandhi, manufacturing employment, Marc Andreessen, Mark Zuckerberg, Mars Rover, mass immigration, means of production, Narrative Science, Nate Silver, natural language processing, Network effects, new economy, New Urbanism, Nicholas Carr, Occupy movement, oil shale / tar sands, oil shock, One Laptop per Child (OLPC), pattern recognition, Paul Samuelson, payday loans, post-work, power law, price stability, Productivity paradox, profit maximization, Ralph Nader, Ray Kurzweil, recommendation engine, Report Card for America’s Infrastructure, Robert Gordon, Robert Solow, Rodney Brooks, Ronald Reagan, search costs, Second Machine Age, self-driving car, sharing economy, Silicon Valley, Simon Kuznets, six sigma, Skype, software patent, sovereign wealth fund, speech recognition, statistical model, Steve Jobs, Steven Pinker, Stuxnet, supply-chain management, TaskRabbit, technological singularity, telepresence, The Bell Curve by Richard Herrnstein and Charles Murray, the Cathedral and the Bazaar, the long tail, The Signal and the Noise by Nate Silver, The Wealth of Nations by Adam Smith, total factor productivity, transaction costs, Tyler Cowen, Tyler Cowen: Great Stagnation, Vernor Vinge, warehouse robotics, Watson beat the top human players on Jeopardy!, winner-take-all economy, Y2K

To test this hypothesis, Erik asked Google if he could access data about its search terms. He was told that he didn’t have to ask; the company made these data freely available over the Web. Erik and his doctoral student Lynn Wu, neither of whom was versed in the economics of housing, built a simple statistical model to look at the data utilizing the user-generated content of search terms made available by Google. Their model linked changes in search-term volume to later housing sales and price changes, predicting that if search terms like the ones above were on the increase today, then housing sales and prices in Phoenix would rise three months from now.
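A hedged sketch of the general approach described, not Brynjolfsson and Wu's actual model: regress later housing price changes on current changes in search-term volume with ordinary least squares, then use the fitted line to predict. All numbers below are invented for illustration.

```python
import numpy as np

# Invented data: quarterly change in housing-related search volume now,
# and change in house prices three months later (both as fractions).
search_change = np.array([0.02, 0.05, -0.01, 0.03, 0.04, -0.02, 0.01])
price_change = np.array([0.010, 0.018, 0.002, 0.012, 0.015, -0.003, 0.006])

# Ordinary least squares with an intercept: price_change = a + b * search_change + error
X = np.column_stack([np.ones_like(search_change), search_change])
(a, b), *_ = np.linalg.lstsq(X, price_change, rcond=None)
print(f"predicted price change if searches rise 3%: {a + b * 0.03:.3%}")
```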


pages: 353 words: 88,376

The Investopedia Guide to Wall Speak: The Terms You Need to Know to Talk Like Cramer, Think Like Soros, and Buy Like Buffett by Jack (edited By) Guinan

Albert Einstein, asset allocation, asset-backed security, book value, Brownian motion, business cycle, business process, buy and hold, capital asset pricing model, clean water, collateralized debt obligation, computerized markets, correlation coefficient, credit crunch, Credit Default Swap, credit default swaps / collateralized debt obligations, currency risk, discounted cash flows, diversification, diversified portfolio, dividend-yielding stocks, dogs of the Dow, equity premium, equity risk premium, fear index, financial engineering, fixed income, Glass-Steagall Act, implied volatility, index fund, intangible asset, interest rate swap, inventory management, inverted yield curve, junk bonds, London Interbank Offered Rate, low interest rates, margin call, money market fund, mortgage debt, Myron Scholes, passive investing, performance metric, risk free rate, risk tolerance, risk-adjusted returns, risk/return, shareholder value, Sharpe ratio, short selling, short squeeze, statistical model, time value of money, transaction costs, yield curve, zero-coupon bond

Related Terms: • Defined-Benefit Plan • Defined-Contribution Plan • Individual Retirement Account—IRA • Roth IRA • Tax Deferred. Quantitative Analysis: What Does Quantitative Analysis Mean? A business or financial analysis technique that is used to understand market behavior by employing complex mathematical and statistical modeling, measurement, and research. By assigning a numerical value to variables, quantitative analysts try to replicate reality in mathematical terms. Quantitative analysis helps measure performance evaluation or valuation of a financial instrument. It also can be used to predict real-world events such as changes in a share’s price.


The Data Revolution: Big Data, Open Data, Data Infrastructures and Their Consequences by Rob Kitchin

Bayesian statistics, business intelligence, business process, cellular automata, Celtic Tiger, cloud computing, collateralized debt obligation, conceptual framework, congestion charging, corporate governance, correlation does not imply causation, crowdsourcing, data science, discrete time, disruptive innovation, George Gilder, Google Earth, hype cycle, Infrastructure as a Service, Internet Archive, Internet of things, invisible hand, knowledge economy, Large Hadron Collider, late capitalism, lifelogging, linked data, longitudinal study, machine readable, Masdar, means of production, Nate Silver, natural language processing, openstreetmap, pattern recognition, platform as a service, recommendation engine, RFID, semantic web, sentiment analysis, SimCity, slashdot, smart cities, Smart Cities: Big Data, Civic Hackers, and the Quest for a New Utopia, smart grid, smart meter, software as a service, statistical model, supply-chain management, technological solutionism, the scientific method, The Signal and the Noise by Nate Silver, transaction costs

The difference between the humanities and social sciences in this respect is because the statistics used in the digital humanities are largely descriptive – identifying patterns and plotting them as counts, graphs, and maps. In contrast, the computational social sciences employ the scientific method, complementing descriptive statistics with inferential statistics that seek to identify causality. In other words, they are underpinned by an epistemology wherein the aim is to produce sophisticated statistical models that explain, simulate and predict human life. This is much more difficult to reconcile with post-positivist approaches. The defence then rests on the utility and value of the method and models, not on providing complementary analysis of a more expansive set of data. There are alternatives to this position, such as that adopted within critical GIS (Geographic Information Science) and radical statistics, and those who utilise mixed-method approaches, that either employ models and inferential statistics while being mindful of their shortcomings, or more commonly only utilise descriptive statistics that are complemented with small data studies.


pages: 312 words: 89,728

The End of My Addiction by Olivier Ameisen

Albert Einstein, epigenetics, fake it until you make it, meta-analysis, placebo effect, randomized controlled trial, selective serotonin reuptake inhibitor (SSRI), statistical model

., Sunde, N. et al. (2002) Evidence of tolerance to baclofen in treatment of severe spasticity with intrathecal baclofen. Clinical Neurology and Neurosurgery 104, 142–145. Pelc, I., Ansoms, C., Lehert, P. et al. (2002) The European NEAT program: an integrated approach using acamprosate and psychosocial support for the prevention of relapse in alcohol-dependent patients with a statistical modeling of therapy success prediction. Alcoholism: Clinical and Experimental Research 26, 1529–1538. Roberts, D. C. and Andrews, M. M. (1997) Baclofen suppression of cocaine self-administration: demonstration using a discrete trials procedure. Psychopharmacology (Berlin) 131, 271–277. Shoaib, M., Swanner, L.


pages: 342 words: 94,762

Wait: The Art and Science of Delay by Frank Partnoy

algorithmic trading, Atul Gawande, behavioural economics, Bernie Madoff, Black Swan, blood diamond, Cass Sunstein, Checklist Manifesto, cognitive bias, collapse of Lehman Brothers, collateralized debt obligation, computerized trading, corporate governance, cotton gin, Daniel Kahneman / Amos Tversky, delayed gratification, Flash crash, Frederick Winslow Taylor, George Akerlof, Google Earth, Hernando de Soto, High speed trading, impulse control, income inequality, information asymmetry, Isaac Newton, Long Term Capital Management, Menlo Park, mental accounting, meta-analysis, MITM: man-in-the-middle, Nick Leeson, paper trading, Paul Graham, payday loans, Pershing Square Capital Management, Ralph Nader, Richard Thaler, risk tolerance, Robert Shiller, Ronald Reagan, Saturday Night Live, scientific management, six sigma, social discount rate, Spread Networks laid a new fibre optics cable between New York and Chicago, Stanford marshmallow experiment, statistical model, Steve Jobs, systems thinking, The Market for Lemons, the scientific method, The Wealth of Nations by Adam Smith, upwardly mobile, Walter Mischel, work culture

It is worth noting that when economists attempt to describe human behavior using high-level math, it often doesn’t go particularly well. Because the math is complex, people are prone to rely on it without question. And the equations often are vulnerable to unrealistic assumptions. Most recently, the financial crisis was caused in part by overreliance on statistical models that didn’t take into account the chances of declines in housing prices. But that was just the most recent iteration: the collapse of Enron, the implosion of the hedge fund Long-Term Capital Management, the billions of dollars lost by rogue traders Kweku Adoboli, Jerome Kerviel, Nick Leeson, and others—all of these fiascos have, at their heart, a mistaken reliance on complex math.


The Fractalist by Benoit Mandelbrot

Albert Einstein, Benoit Mandelbrot, Brownian motion, business cycle, Claude Shannon: information theory, discrete time, double helix, financial engineering, Georg Cantor, Henri Poincaré, Honoré de Balzac, illegal immigration, Isaac Newton, iterative process, Johannes Kepler, John von Neumann, linear programming, Louis Bachelier, Louis Blériot, Louis Pasteur, machine translation, mandelbrot fractal, New Journalism, Norbert Wiener, Olbers’ paradox, Paul Lévy, power law, Richard Feynman, statistical model, urban renewal, Vilfredo Pareto

There were far too many big price jumps and falls. And the volatility kept shifting over time. Some years prices were stable, other years wild. “We’ve done all we can to make sense of these cotton prices. Everything changes, nothing is constant. This is a mess of the worst kind.” Nothing could make the data fit the existing statistical model, originally proposed in 1900, which assumed that each day’s price change was independent of the last and followed the mildly random pattern predicted by the bell curve. In short order, we made a deal: he’d let me see what I could do. He handed me cardboard boxes of computer punch cards recording the data.


pages: 408 words: 94,311

The Great Depression: A Diary by Benjamin Roth, James Ledbetter, Daniel B. Roth

bank run, banking crisis, book value, business cycle, buy and hold, California gold rush, classic study, collective bargaining, currency manipulation / currency intervention, deindustrialization, financial independence, Joseph Schumpeter, low interest rates, market fundamentalism, military-industrial complex, moral hazard, short selling, statistical model, strikebreaker, union organizing, urban renewal, Works Progress Administration

Rather, Roth’s diary is a reminder that our economic security, individually and collectively, always rests on a complex interaction of market forces, politics, consumer perception, and the impact of unforeseen (and sometimes unforeseeable) events. As in so many other areas, those offering predictions for the future or even detailed readings of the present are often wrong because of incomplete information, flawed statistical models, or hidden agendas. And even when they are right within a particular time frame, history often has other plans in mind. The Youngstown that Benjamin Roth knew and hoped to see revived—the booming steel town, where soot-choked skies meant prosperity—did in fact survive the Depression, thanks in large part to the military buildup during World War II, a major theme of this book’s final chapter.


Deep Value by Tobias E. Carlisle

activist fund / activist shareholder / activist investor, Andrei Shleifer, availability heuristic, backtesting, behavioural economics, book value, business cycle, buy and hold, Carl Icahn, corporate governance, corporate raider, creative destruction, Daniel Kahneman / Amos Tversky, discounted cash flows, financial engineering, fixed income, Henry Singleton, intangible asset, John Bogle, joint-stock company, low interest rates, margin call, passive investing, principal–agent problem, Richard Thaler, risk free rate, riskless arbitrage, Robert Shiller, Rory Sutherland, shareholder value, Sharpe ratio, South Sea Bubble, statistical model, Teledyne, The Myth of the Rational Market, The Wealth of Nations by Adam Smith, Tim Cook: Apple

Sell only if market price is equal to or greater than intrinsic value, or a better opportunity can be found, hold otherwise. Resistance to the application of statistical prediction rules in value investment runs deep. Many investors recoil at the thought of ceding control of investment decisions to a statistical model, believing that it would be better to use the output from the statistical prediction rule and retain the discretion to follow the rule’s output or not. There is some evidence to support this possibility. Traditional experts are shown to make better decisions when they are provided with the results of statistical prediction.


pages: 304 words: 90,084

Net Zero: How We Stop Causing Climate Change by Dieter Helm

3D printing, autonomous vehicles, Berlin Wall, biodiversity loss, blockchain, Boris Johnson, carbon credits, carbon footprint, carbon tax, clean water, congestion charging, coronavirus, COVID-19, CRISPR, decarbonisation, deindustrialization, demand response, Deng Xiaoping, Donald Trump, electricity market, Extinction Rebellion, fixed income, food miles, Ford Model T, Francis Fukuyama: the end of history, general purpose technology, Great Leap Forward, green new deal, Greta Thunberg, Haber-Bosch Process, high-speed rail, hydrogen economy, Intergovernmental Panel on Climate Change (IPCC), Internet of things, Jevons paradox, lockdown, market design, means of production, microplastics / micro fibres, North Sea oil, ocean acidification, off grid, off-the-grid, oil shale / tar sands, oil shock, peak oil, planetary scale, precautionary principle, price mechanism, quantitative easing, remote working, reshoring, rewilding, Ronald Reagan, smart meter, South China Sea, sovereign wealth fund, statistical model, systems thinking, Thomas Malthus

And that is what they are doing, just like the tobacco companies, the manufacturers of sugary drinks, arms manufacturers, construction companies that use cement and steel, farmers who use fertilisers and pesticides, and so on. Every company in the FT100 index is embedded in the fossil fuel economy. Those who say that this is what is wrong with the ‘capitalist model’ need to consider just what would happen if we jumped off now, rather than over a sensible transition, and why there is no effective carbon price. The statist model is, from a carbon perspective, much worse. It is the work of Saudi Aramco and Rosneft. Climate activists attack European and US politicians and company executives. They don’t dare take on Vladimir Putin, Xi Jinping and Mohammad bin Salman. Gluing yourself to the HQ of Shell or BP is easy: doing it in Moscow, Beijing or Riyadh is much tougher.


pages: 406 words: 88,977

How to Prevent the Next Pandemic by Bill Gates

augmented reality, call centre, computer vision, contact tracing, coronavirus, COVID-19, data science, demographic dividend, digital divide, digital map, disinformation, Edward Jenner, global pandemic, global supply chain, Hans Rosling, lockdown, Neal Stephenson, Picturephone, profit motive, QR code, remote working, social distancing, statistical model, TED Talk, women in the workforce, zero-sum game

Until fairly recently, the government there counted deaths by surveying small samples of the country every few years and then using the data to estimate nationwide mortality. In 2018, though, Mozambique began building what’s known as a “sample registration system,” which involves continuous surveillance in areas that are representative of the country as a whole. Data from these samples is fed into statistical models that make high-quality estimates about what’s going on throughout the nation. For the first time, Mozambique’s leaders can see accurate monthly reports on how many people died, how and where they died, and how old they were. Mozambique is also one of several countries that are deepening their understanding of child mortality by participating in a program called Child Health and Mortality Prevention Surveillance, or CHAMPS, a global network of public health agencies and other organizations.


pages: 286 words: 92,521

How Medicine Works and When It Doesn't: Learning Who to Trust to Get and Stay Healthy by F. Perry Wilson

Affordable Care Act / Obamacare, barriers to entry, Barry Marshall: ulcers, cognitive bias, Comet Ping Pong, confounding variable, coronavirus, correlation does not imply causation, COVID-19, data science, Donald Trump, fake news, Helicobacter pylori, Ignaz Semmelweis: hand washing, Louis Pasteur, medical malpractice, meta-analysis, multilevel marketing, opioid epidemic / opioid crisis, p-value, personalized medicine, profit motive, randomized controlled trial, risk tolerance, selection bias, statistical model, stem cell, sugar pill, the scientific method, Thomas Bayes

We may see a study that notes that Black people are twice as likely to develop diabetes as white people and, erroneously, concludes that it is due to some inherent unchangeable biology, when in fact this is a correlation induced by multiple third factors—confounders such as poor socioeconomic conditions, which are things we can change. This is why I’ve moved my lab away from using race as a variable in our statistical models. It’s not that there is no correlation between race and the kind of stuff I research (kidney disease outcomes). There is. But race is correlational, not causal. Better instead to focus on the real causal agents: racism (implicit and explicit) and societal inequality. While I don’t have a pill to fix those, I am fortunate enough to have a platform in which to urge everyone to recognize the causality of health inequality in this country and to move our government and ourselves toward addressing it.


Risk Management in Trading by Davis Edwards

Abraham Maslow, asset allocation, asset-backed security, backtesting, Bear Stearns, Black-Scholes formula, Brownian motion, business cycle, computerized trading, correlation coefficient, Credit Default Swap, discrete time, diversified portfolio, financial engineering, fixed income, Glass-Steagall Act, global macro, implied volatility, intangible asset, interest rate swap, iterative process, John Meriwether, junk bonds, London Whale, Long Term Capital Management, low interest rates, margin call, Myron Scholes, Nick Leeson, p-value, paper trading, pattern recognition, proprietary trading, random walk, risk free rate, risk tolerance, risk/return, selection bias, shareholder value, Sharpe ratio, short selling, statistical arbitrage, statistical model, stochastic process, systematic trading, time value of money, transaction costs, value at risk, Wiener process, zero-coupon bond

As a result, two equally qualified risk managers can come up with slightly different estimates for VAR. In addition, there are several common approaches to estimating VAR. These approaches can include using historical price movements, forward implied volatility from options markets, or a variety of statistical models. One common approach used to estimate VAR is to assume that percentage changes in price (called percent returns) are normally distributed. Historical data would then be used to estimate the size of a typical price move. This assumption used in the model (that percent returns are normally distributed and can be described by a single parameter called volatility) would give the model its name (this is called a parametric model).
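As a concrete illustration of the parametric approach described above (not code from the book), the following sketch assumes percent returns are normally distributed, estimates volatility from a synthetic return history, and scales the resulting quantile by position size; the confidence level, horizon, and all numbers are arbitrary assumptions.

```python
import numpy as np
from scipy.stats import norm

def parametric_var(returns, position_value, confidence=0.99, horizon_days=1):
    """Parametric (variance-covariance) VaR assuming normal percent returns."""
    sigma = np.std(returns, ddof=1)        # daily volatility estimated from history
    z = norm.ppf(1 - confidence)           # e.g., about -2.33 at 99% confidence
    # Loss quantile, scaled by sqrt(time) for a multi-day holding period
    return -z * sigma * np.sqrt(horizon_days) * position_value

# Hypothetical example: two years of simulated daily percent returns
rng = np.random.default_rng(0)
daily_returns = rng.normal(0.0, 0.012, size=504)
print(parametric_var(daily_returns, position_value=1_000_000))
# roughly 2.33 x daily sigma x position, i.e., a one-day 99% VaR estimate
```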


Lessons-Learned-in-Software-Testing-A-Context-Driven-Approach by Anson-QA

anti-pattern, Chuck Templeton: OpenTable:, finite state, framing effect, full employment, independent contractor, information retrieval, job automation, knowledge worker, lateral thinking, Ralph Nader, Richard Feynman, side project, Silicon Valley, statistical model, systems thinking, tacit knowledge, web application

If the open bug count is low near the desired end of the project, does this mean that the product is more stable, or that the test team is spending too much time writing reports, running regression tests (tests that rarely find new bugs), demonstrating the product at tradeshows, and doing other activities that aren't geared toward finding new bugs? We can't tell this from the bug counts. We are particularly unimpressed with statistical models of bug arrival rates (how many bugs will be found per unit time) as vehicles for managing projects because we see no reason to believe that the assumptions underlying the probability models have any correspondence to the realities of the project. Simmonds (2000) provides a clear, explicit statement of the assumptions of one such model.


pages: 370 words: 94,968

The Most Human Human: What Talking With Computers Teaches Us About What It Means to Be Alive by Brian Christian

"Friedman doctrine" OR "shareholder theory", 4chan, Ada Lovelace, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, Bertrand Russell: In Praise of Idleness, Blue Ocean Strategy, carbon footprint, cellular automata, Charles Babbage, Claude Shannon: information theory, cognitive dissonance, commoditize, complexity theory, Computing Machinery and Intelligence, crowdsourcing, David Heinemeier Hansson, Donald Trump, Douglas Hofstadter, George Akerlof, Gödel, Escher, Bach, high net worth, Isaac Newton, Jacques de Vaucanson, Jaron Lanier, job automation, Kaizen: continuous improvement, Ken Thompson, l'esprit de l'escalier, language acquisition, Loebner Prize, machine translation, Menlo Park, operational security, Ray Kurzweil, RFID, Richard Feynman, Ronald Reagan, SimCity, Skype, Social Responsibility of Business Is to Increase Its Profits, starchitect, statistical model, Stephen Hawking, Steve Jobs, Steven Pinker, Thales of Miletus, theory of mind, Thomas Bayes, Turing machine, Turing test, Von Neumann architecture, Watson beat the top human players on Jeopardy!, zero-sum game

UCSD’s computational linguist Roger Levy: “Programs have gotten relatively good at what is actually said. We can devise complex new expressions, if we intend new meanings, and we can understand those new meanings. This strikes me as a great way to break the Turing test [programs] and a great way to distinguish yourself as a human. I think that in my experience with statistical models of language, it’s the unboundedness of human language that’s really distinctive.”4 Dave Ackley offers very similar confederate advice: “I would make up words, because I would expect programs to be operating out of a dictionary.” My mind on deponents and attorneys, I think of drug culture, how dealers and buyers develop their own micro-patois, and how if any of these idiosyncratic reference systems started to become too standardized—if they use the well-known “snow” for cocaine, for instance—their text-message records and email records become much more legally vulnerable (i.e., have less room for deniability) than if the dealers and buyers are, like poets, ceaselessly inventing.


pages: 313 words: 101,403

My Life as a Quant: Reflections on Physics and Finance by Emanuel Derman

Bear Stearns, Berlin Wall, bioinformatics, Black-Scholes formula, book value, Brownian motion, buy and hold, capital asset pricing model, Claude Shannon: information theory, Dennis Ritchie, Donald Knuth, Emanuel Derman, financial engineering, fixed income, Gödel, Escher, Bach, haute couture, hiring and firing, implied volatility, interest rate derivative, Jeff Bezos, John Meriwether, John von Neumann, Ken Thompson, law of one price, linked data, Long Term Capital Management, moral hazard, Murray Gell-Mann, Myron Scholes, PalmPilot, Paul Samuelson, pre–internet, proprietary trading, publish or perish, quantitative trading / quantitative finance, Sharpe ratio, statistical arbitrage, statistical model, Stephen Hawking, Steve Jobs, stochastic volatility, technology bubble, the new new thing, transaction costs, volatility smile, Y2K, yield curve, zero-coupon bond, zero-sum game

We ran daily reports on the desk's inventory using both these models. Different clients preferred different metrics, depending on their sophistication and on the accounting rules and regulations to which they were subject. We also did some longer-term, client-focused research, developing improved statistical models for homeowner prepayments or programs for valuing the more exotic ARM-based structures that were growing in popularity. The traders on the desk used the option-adjusted spread model to decide how much to bid for newly available ARM pools. The calculation was arduous. Each pool consisted of a variety of mortgages with a range of coupons and a spectrum of servicing fees, and the option-adjusted spread was calculated by averaging over thousands of future scenarios, each one involving a month-by-month simulation of interest rates over hundreds of months.


pages: 323 words: 100,772

Prisoner's Dilemma: John Von Neumann, Game Theory, and the Puzzle of the Bomb by William Poundstone

90 percent rule, Albert Einstein, anti-communist, cuban missile crisis, Douglas Hofstadter, Dr. Strangelove, Frank Gehry, From Mathematics to the Technologies of Life and Death, Herman Kahn, Jacquard loom, John Nash: game theory, John von Neumann, Kenneth Arrow, means of production, Monroe Doctrine, mutually assured destruction, Nash equilibrium, Norbert Wiener, RAND corporation, Richard Feynman, seminal paper, statistical model, the market place, zero-sum game

However, the ALMOST TIT FOR TAT strategy, which throws in a test defection to see if it's dealing with ALL C, is not as good as plain TIT FOR TAT when paired with TIT FOR TAT. It's 1 point worse. Beating TIT FOR TAT is tougher than it looks. Axelrod's tournaments included sophisticated strategies designed to detect an exploitable opponent. Some created a constantly updated statistical model of their opponent's behavior. This allowed them to predict what the opponent strategy will do after cooperation and after defection, and to adjust their own choices accordingly. This sounds great. It does allow these strategies to exploit unresponsive strategies like ALL C and RANDOM. The trouble is, no one submitted an unresponsive strategy (other than the RANDOM strategy Axelrod included).


pages: 463 words: 105,197

Radical Markets: Uprooting Capitalism and Democracy for a Just Society by Eric Posner, E. Weyl

3D printing, activist fund / activist shareholder / activist investor, Affordable Care Act / Obamacare, Airbnb, Amazon Mechanical Turk, anti-communist, augmented reality, basic income, Berlin Wall, Bernie Sanders, Big Tech, Branko Milanovic, business process, buy and hold, carbon footprint, Cass Sunstein, Clayton Christensen, cloud computing, collective bargaining, commoditize, congestion pricing, Corn Laws, corporate governance, crowdsourcing, cryptocurrency, data science, deep learning, DeepMind, Donald Trump, Elon Musk, endowment effect, Erik Brynjolfsson, Ethereum, feminist movement, financial deregulation, Francis Fukuyama: the end of history, full employment, gamification, Garrett Hardin, George Akerlof, global macro, global supply chain, guest worker program, hydraulic fracturing, Hyperloop, illegal immigration, immigration reform, income inequality, income per capita, index fund, informal economy, information asymmetry, invisible hand, Jane Jacobs, Jaron Lanier, Jean Tirole, Jeremy Corbyn, Joseph Schumpeter, Kenneth Arrow, labor-force participation, laissez-faire capitalism, Landlord’s Game, liberal capitalism, low skilled workers, Lyft, market bubble, market design, market friction, market fundamentalism, mass immigration, negative equity, Network effects, obamacare, offshore financial centre, open borders, Pareto efficiency, passive investing, patent troll, Paul Samuelson, performance metric, plutocrats, pre–internet, radical decentralization, random walk, randomized controlled trial, Ray Kurzweil, recommendation engine, rent-seeking, Richard Thaler, ride hailing / ride sharing, risk tolerance, road to serfdom, Robert Shiller, Ronald Coase, Rory Sutherland, search costs, Second Machine Age, second-price auction, self-driving car, shareholder value, sharing economy, Silicon Valley, Skype, special economic zone, spectrum auction, speech recognition, statistical model, stem cell, telepresence, Thales and the olive presses, Thales of Miletus, The Death and Life of Great American Cities, The Future of Employment, The Market for Lemons, The Nature of the Firm, The Rise and Fall of American Growth, The Theory of the Leisure Class by Thorstein Veblen, The Wealth of Nations by Adam Smith, Thorstein Veblen, trade route, Tragedy of the Commons, transaction costs, trickle-down economics, Tyler Cowen, Uber and Lyft, uber lyft, universal basic income, urban planning, Vanguard fund, vertical integration, women in the workforce, Zipcar

The core idea of ML is that the world and the human minds that intelligently navigate it are more complicated and uncertain than any programmer can precisely formulate in a set of rules. Instead of attempting to characterize intelligence through a set of instructions that the computer will directly execute, ML devises algorithms that train often complicated and opaque statistical models to “learn” to classify or predict outcomes of interest, such as how creditworthy a borrower is or whether a photo contains a cat. The most famous example of an ML algorithm is a “neural network,” or neural net for short. Neural nets imitate the structure of the human brain rather than perform a standard statistical analysis.
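To make the classify-or-predict idea concrete, here is a minimal hypothetical sketch (not from the book) in which a small neural network learns a creditworthiness boundary from labelled examples rather than hand-written rules; the features, data, and library choice (scikit-learn) are all illustrative assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Hypothetical borrower data: two normalised features (income, debt load),
# with label 1 = repaid, 0 = defaulted. No classification rule is written by
# hand; a small neural net estimates the boundary from the labelled examples.
rng = np.random.default_rng(42)
X = rng.uniform(0, 1, size=(500, 2))
y = (X[:, 0] - X[:, 1] + rng.normal(0, 0.1, 500) > 0).astype(int)

clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=3000, random_state=0).fit(X, y)
print(clf.predict_proba([[0.8, 0.3]])[0, 1])  # predicted probability of repayment
```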


pages: 571 words: 105,054

Advances in Financial Machine Learning by Marcos Lopez de Prado

algorithmic trading, Amazon Web Services, asset allocation, backtesting, behavioural economics, bioinformatics, Brownian motion, business process, Claude Shannon: information theory, cloud computing, complexity theory, correlation coefficient, correlation does not imply causation, data science, diversification, diversified portfolio, en.wikipedia.org, financial engineering, fixed income, Flash crash, G4S, Higgs boson, implied volatility, information asymmetry, latency arbitrage, margin call, market fragmentation, market microstructure, martingale, NP-complete, P = NP, p-value, paper trading, pattern recognition, performance metric, profit maximization, quantitative trading / quantitative finance, RAND corporation, random walk, risk free rate, risk-adjusted returns, risk/return, selection bias, Sharpe ratio, short selling, Silicon Valley, smart cities, smart meter, statistical arbitrage, statistical model, stochastic process, survivorship bias, transaction costs, traveling salesman

Bubbles are formed in compressed (low entropy) markets.
18.8.2 Maximum Entropy Generation
In a series of papers, Fiedor [2014a, 2014b, 2014c] proposes to use Kontoyiannis [1997] to estimate the amount of entropy present in a price series. He argues that, out of the possible future outcomes, the one that maximizes entropy may be the most profitable, because it is the one that is least predictable by frequentist statistical models. It is the black swan scenario most likely to trigger stop losses, thus generating a feedback mechanism that will reinforce and exacerbate the move, resulting in runs in the signs of the returns time series.
18.8.3 Portfolio Concentration
Consider an NxN covariance matrix V, computed on returns.
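As a rough illustration of quantifying how predictable a return series is, the sketch below computes a simple plug-in (block-frequency) entropy estimate over the signs of synthetic returns. It is a deliberately simplified stand-in for the idea, not the Kontoyiannis estimator the excerpt cites, and the data are made up.

```python
import numpy as np
from collections import Counter

def plugin_entropy(symbols, word_len=3):
    """Plug-in entropy rate (bits per symbol) estimated from block frequencies."""
    words = ["".join(symbols[i:i + word_len]) for i in range(len(symbols) - word_len + 1)]
    counts = Counter(words)
    total = sum(counts.values())
    probs = np.array([c / total for c in counts.values()])
    return -np.sum(probs * np.log2(probs)) / word_len

# Synthetic prices; encode each price change as '1' (up) or '0' (down)
rng = np.random.default_rng(1)
prices = np.cumprod(1 + rng.normal(0, 0.01, 1000))
signs = ["1" if d > 0 else "0" for d in np.diff(prices)]
print(plugin_entropy(signs))  # close to 1 bit/symbol for a near-random walk
```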


pages: 227 words: 32,306

Using Open Source Platforms for Business Intelligence: Avoid Pitfalls and Maximize Roi by Lyndsay Wise

barriers to entry, business intelligence, business process, call centre, cloud computing, commoditize, different worldview, en.wikipedia.org, Just-in-time delivery, knowledge worker, Richard Stallman, Salesforce, software as a service, statistical model, supply-chain management, the market place

In the past, much risk management within BI remained within the realm of finance, insurance, and banking, but most organizations need to assess potential risk and help mitigate its effects on the organization. Within BI, this goes beyond information visibility and means using predictive modeling and other advanced statistical models to ensure that customers with accounts past due are not allowed to submit new orders unless it is known beforehand, or that insurance claims aren’t being submitted fraudulently. The National Health Care Anti-Fraud Association (NHCAA) estimates that in 2010, 3% of all health care spending or $68 billion is lost to health care fraud in the United States.2 This makes fraud detection in health care extremely important, especially when you consider that if you are paying for insurance in the United States, part of your insurance premiums are probably being paid to cover the instances of fraud that occur, making this relevant beyond health care insurance providers.


pages: 364 words: 102,926

What the F: What Swearing Reveals About Our Language, Our Brains, and Ourselves by Benjamin K. Bergen

correlation does not imply causation, information retrieval, intentional community, machine readable, Parler "social media", pre–internet, Ronald Reagan, seminal paper, statistical model, Steven Pinker, traumatic brain injury

So if you believe that exposure to violence in media could be a confounding factor—it correlates with exposure to profanity and could explain some amount of aggression—then you measure not only how much profanity but also how much violence children are exposed to. The two will probably correlate, but the key point is that you can measure exactly how much media violence correlates with child aggressiveness, and you can pull that apart in a statistical model from the amount that profanity exposure correlates with child aggressiveness. The authors of the Pediatrics study tried to do this. But to know that profanity exposure per se and not any of these other possible confounding factors is responsible for increased reports of aggressiveness, you’d need to do the same thing not just for exposure to media violence, as the authors did, but for every other possible confounding factor, which they did not.
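The "pulling apart" described here is typically done by putting both exposures into one regression. Below is a hypothetical sketch on simulated data (the variable names and effect sizes are invented) showing how including the confounder separates the two correlations.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 1000
violence = rng.normal(size=n)                      # exposure to media violence
profanity = 0.6 * violence + rng.normal(size=n)    # correlated with violence
aggression = 0.5 * violence + rng.normal(size=n)   # driven by violence only here

X = sm.add_constant(np.column_stack([profanity, violence]))
print(sm.OLS(aggression, X).fit().params)
# With violence in the model, profanity's coefficient is near zero;
# dropping violence would wrongly attribute its effect to profanity.
```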


pages: 364 words: 101,286

The Misbehavior of Markets: A Fractal View of Financial Turbulence by Benoit Mandelbrot, Richard L. Hudson

Alan Greenspan, Albert Einstein, asset allocation, Augustin-Louis Cauchy, behavioural economics, Benoit Mandelbrot, Big bang: deregulation of the City of London, Black Monday: stock market crash in 1987, Black-Scholes formula, British Empire, Brownian motion, business cycle, buy and hold, buy low sell high, capital asset pricing model, carbon-based life, discounted cash flows, diversification, double helix, Edward Lorenz: Chaos theory, electricity market, Elliott wave, equity premium, equity risk premium, Eugene Fama: efficient market hypothesis, Fellow of the Royal Society, financial engineering, full employment, Georg Cantor, Henri Poincaré, implied volatility, index fund, informal economy, invisible hand, John Meriwether, John von Neumann, Long Term Capital Management, Louis Bachelier, mandelbrot fractal, market bubble, market microstructure, Myron Scholes, new economy, paper trading, passive investing, Paul Lévy, Paul Samuelson, plutocrats, power law, price mechanism, quantitative trading / quantitative finance, Ralph Nelson Elliott, RAND corporation, random walk, risk free rate, risk tolerance, Robert Shiller, short selling, statistical arbitrage, statistical model, Steve Ballmer, stochastic volatility, transfer pricing, value at risk, Vilfredo Pareto, volatility smile

Econometrica 34, 1966 (Supplement): 152-153. Mandelbrot, Benoit B. 1970. Long-run interdependence in price records and other economic time series. Econometrica 38: 122-123. Mandelbrot, Benoit B. 1972. Possible refinement of the lognormal hypothesis concerning the distribution of energy dissipation in intermittent turbulence. Statistical Models and Turbulence. M. Rosenblatt and C. Van Atta, eds. Lecture Notes in Physics 12. New York: Springer, 333-351. • Reprint: Chapter N14 of Mandelbrot 1999a. Mandelbrot, Benoit B. 1974a. Intermittent turbulence in self-similar cascades; divergence of high moments and dimension of the carrier. Journal of Fluid Mechanics 62: 331-358. • Reprint: Chapter N15 of Mandelbrot 1999a.


pages: 364 words: 99,613

Servant Economy: Where America's Elite Is Sending the Middle Class by Jeff Faux

air traffic controllers' union, Alan Greenspan, back-to-the-land, Bear Stearns, benefit corporation, Bernie Sanders, Black Swan, Bretton Woods, BRICs, British Empire, business cycle, call centre, centre right, classic study, cognitive dissonance, collateralized debt obligation, collective bargaining, creative destruction, Credit Default Swap, credit default swaps / collateralized debt obligations, crony capitalism, currency manipulation / currency intervention, David Brooks, David Ricardo: comparative advantage, disruptive innovation, falling living standards, financial deregulation, financial innovation, full employment, Glass-Steagall Act, guns versus butter model, high-speed rail, hiring and firing, Howard Zinn, Hyman Minsky, illegal immigration, indoor plumbing, informal economy, invisible hand, John Maynard Keynes: Economic Possibilities for our Grandchildren, junk bonds, Kevin Roose, Kickstarter, lake wobegon effect, Long Term Capital Management, low interest rates, market fundamentalism, Martin Wolf, McMansion, medical malpractice, Michael Milken, military-industrial complex, Minsky moment, mortgage debt, Myron Scholes, Naomi Klein, new economy, oil shock, old-boy network, open immigration, Paul Samuelson, plutocrats, price mechanism, price stability, private military company, public intellectual, radical decentralization, Ralph Nader, reserve currency, rising living standards, Robert Shiller, rolodex, Ronald Reagan, Savings and loan crisis, school vouchers, Silicon Valley, single-payer health, Solyndra, South China Sea, statistical model, Steve Jobs, Suez crisis 1956, Thomas L Friedman, Thorstein Veblen, too big to fail, trade route, Triangle Shirtwaist Factory, union organizing, upwardly mobile, urban renewal, War on Poverty, We are the 99%, working poor, Yogi Berra, Yom Kippur War, you are the product

“Larry Summers and Michael Steele,” This Week with Christiane Amanpour, ABC News, February 8, 2009. 10. CNN Politics, Election Center, November 24, 2010, http://www.cnn.com/ELECTION/2010/results/polls.main. 11. Andrew Gelman, “Unsurprisingly, More People Are Worried about the Economy and Jobs Than about Deficit,” Statistical Modeling, Causal Inference, and Social Science, June 19, 2010, http://www.stat.columbia.edu/~cook/movabletype/archives/2010/06/unsurprisingly.html; Ryan Grim, “Mayberry Machiavellis: Obama Political Team Handcuffing Recovery,” Huffington Post, July 6, 2010, http://www.huffingtonpost.com/2010/07/06/mayberry-machiavellis-oba_n_636770.html. 12.


pages: 377 words: 97,144

Singularity Rising: Surviving and Thriving in a Smarter, Richer, and More Dangerous World by James D. Miller

23andMe, affirmative action, Albert Einstein, artificial general intelligence, Asperger Syndrome, barriers to entry, brain emulation, cloud computing, cognitive bias, correlation does not imply causation, crowdsourcing, Daniel Kahneman / Amos Tversky, David Brooks, David Ricardo: comparative advantage, Deng Xiaoping, en.wikipedia.org, feminist movement, Flynn Effect, friendly AI, hive mind, impulse control, indoor plumbing, invention of agriculture, Isaac Newton, John Gilmore, John von Neumann, knowledge worker, Larry Ellison, Long Term Capital Management, low interest rates, low skilled workers, Netflix Prize, neurotypical, Nick Bostrom, Norman Macrae, pattern recognition, Peter Thiel, phenotype, placebo effect, prisoner's dilemma, profit maximization, Ray Kurzweil, recommendation engine, reversible computing, Richard Feynman, Rodney Brooks, Silicon Valley, Singularitarianism, Skype, statistical model, Stephen Hawking, Steve Jobs, sugar pill, supervolcano, tech billionaire, technological singularity, The Coming Technological Singularity, the scientific method, Thomas Malthus, transaction costs, Turing test, twin studies, Vernor Vinge, Von Neumann architecture

Nobel Prize-winning economist James Heckman has written that “an entire literature has found” that cognitive abilities “significantly affect wages.”147 Of course, “cognitive abilities” aren’t necessarily the same thing as g or IQ. Recall that the theory behind g, and therefore IQ’s importance, is that a single variable can represent intelligence. To check whether a single measure of cognitive ability has predictive value, Heckman developed a statistical model testing whether one number essentially representing g and another representing noncognitive ability can explain most of the variations in wages.148 Heckman’s model shows that it could. Heckman, however, carefully points out that noncognitive traits such as “stick-to-it-iveness” are at least as important as cognitive traits in determining wages—meaning that a lazy worker with a high IQ won’t succeed at Microsoft or Goldman Sachs.


pages: 349 words: 98,868

Nervous States: Democracy and the Decline of Reason by William Davies

active measures, Affordable Care Act / Obamacare, Amazon Web Services, Anthropocene, bank run, banking crisis, basic income, Black Lives Matter, Brexit referendum, business cycle, Cambridge Analytica, Capital in the Twenty-First Century by Thomas Piketty, citizen journalism, Climategate, Climatic Research Unit, Colonization of Mars, continuation of politics by other means, creative destruction, credit crunch, data science, decarbonisation, deep learning, DeepMind, deindustrialization, digital divide, discovery of penicillin, Dominic Cummings, Donald Trump, drone strike, Elon Musk, failed state, fake news, Filter Bubble, first-past-the-post, Frank Gehry, gig economy, government statistician, housing crisis, income inequality, Isaac Newton, Jeff Bezos, Jeremy Corbyn, Johannes Kepler, Joseph Schumpeter, knowledge economy, loss aversion, low skilled workers, Mahatma Gandhi, Mark Zuckerberg, mass immigration, meta-analysis, Mont Pelerin Society, mutually assured destruction, Northern Rock, obamacare, Occupy movement, opioid epidemic / opioid crisis, Paris climate accords, pattern recognition, Peace of Westphalia, Peter Thiel, Philip Mirowski, planetary scale, post-industrial society, post-truth, quantitative easing, RAND corporation, Ray Kurzweil, Richard Florida, road to serfdom, Robert Mercer, Ronald Reagan, sentiment analysis, Silicon Valley, Silicon Valley billionaire, Silicon Valley startup, smart cities, Social Justice Warrior, statistical model, Steve Bannon, Steve Jobs, tacit knowledge, the scientific method, Turing machine, Uber for X, universal basic income, University of East Anglia, Valery Gerasimov, W. E. B. Du Bois, We are the 99%, WikiLeaks, women in the workforce, zero-sum game

One study conducted across Europe found that the experience of unemployment leads people to become less trusting in parliament, but more trusting in the police.3 The elites who are in trouble are the ones whose lineage begins in the seventeenth century: journalists, experts, officials. They are the ones whose task it was originally to create portraits, maps, statistical models of the world, that the rest of us were expected to accept, on the basis that they were unpolluted by personal feelings or bias. Social media has accelerated this declining credibility, but it is not the sole cause. This split reflects something about the role of speed in our politics. The work of government and of establishing facts can be slow and frustrating.


Data and the City by Rob Kitchin,Tracey P. Lauriault,Gavin McArdle

A Declaration of the Independence of Cyberspace, algorithmic management, bike sharing, bitcoin, blockchain, Bretton Woods, Chelsea Manning, citizen journalism, Claude Shannon: information theory, clean water, cloud computing, complexity theory, conceptual framework, corporate governance, correlation does not imply causation, create, read, update, delete, crowdsourcing, cryptocurrency, data science, dematerialisation, digital divide, digital map, digital rights, distributed ledger, Evgeny Morozov, fault tolerance, fiat currency, Filter Bubble, floating exchange rates, folksonomy, functional programming, global value chain, Google Earth, Hacker News, hive mind, information security, Internet of things, Kickstarter, knowledge economy, Lewis Mumford, lifelogging, linked data, loose coupling, machine readable, new economy, New Urbanism, Nicholas Carr, nowcasting, open economy, openstreetmap, OSI model, packet switching, pattern recognition, performance metric, place-making, power law, quantum entanglement, RAND corporation, RFID, Richard Florida, ride hailing / ride sharing, semantic web, sentiment analysis, sharing economy, Silicon Valley, Skype, smart cities, Smart Cities: Big Data, Civic Hackers, and the Quest for a New Utopia, smart contracts, smart grid, smart meter, social graph, software studies, statistical model, tacit knowledge, TaskRabbit, technological determinism, technological solutionism, text mining, The Chicago School, The Death and Life of Great American Cities, the long tail, the market place, the medium is the message, the scientific method, Toyota Production System, urban planning, urban sprawl, web application

Here, a visualization is not simply describing or displaying the data, but is used as a visual analytical tool to extract information, build visual models and explanation, and to guide further statistical analysis (Keim et al. 2010). Often several different types of visual graphics are used in conjunction with each other so that the data can be examined from more than one perspective simultaneously. In addition, data mining and statistical modelling, such as prediction, simulation and optimization, can be performed and outputted through visual interfaces and outputs (Thomas and Cook 2006). In the context of city dashboards, this epistemology is framed within the emerging field of urban informatics (Foth 2009) and urban science (Batty 2013).


pages: 304 words: 99,836

Why I Left Goldman Sachs: A Wall Street Story by Greg Smith

Alan Greenspan, always be closing, asset allocation, Bear Stearns, Black Swan, bonus culture, break the buck, collateralized debt obligation, corporate governance, Credit Default Swap, credit default swaps / collateralized debt obligations, delayed gratification, East Village, fear index, financial engineering, fixed income, Flash crash, glass ceiling, Glass-Steagall Act, Goldman Sachs: Vampire Squid, high net worth, information asymmetry, London Interbank Offered Rate, mega-rich, money market fund, new economy, Nick Leeson, proprietary trading, quantitative hedge fund, Renaissance Technologies, short selling, short squeeze, Silicon Valley, Skype, sovereign wealth fund, Stanford marshmallow experiment, statistical model, technology bubble, too big to fail

The problem was that these hedge funds were not anticipating “Black Swan” events, a term coined by Nassim Nicholas Taleb to explain once-in-a-thousand-year-type events that people do not expect and that models can’t predict. What we saw in 2008 and 2009 was a series of Black Swan events that the statistical models would have told you were not possible, according to history. Instead of the S&P 500 Index having average daily percentage swings of 1 percent, for a sustained period the market was swinging back and forth more than 5 percent per day—five times what was normal. No computer model could have predicted this.


pages: 419 words: 102,488

Chaos Engineering: System Resiliency in Practice by Casey Rosenthal, Nora Jones

Amazon Web Services, Asilomar, autonomous vehicles, barriers to entry, blockchain, business continuity plan, business intelligence, business logic, business process, cloud computing, cognitive load, complexity theory, continuous integration, cyber-physical system, database schema, DevOps, fail fast, fault tolerance, hindsight bias, human-factors engineering, information security, Kanban, Kubernetes, leftpad, linear programming, loose coupling, microservices, MITM: man-in-the-middle, no silver bullet, node package manager, operational security, OSI model, pull request, ransomware, risk tolerance, scientific management, Silicon Valley, six sigma, Skype, software as a service, statistical model, systems thinking, the scientific method, value engineering, WebSocket

Nevertheless, the process of “glueing” these pieces together can be thought of as establishing a set of abstract mappings. We describe this in a recent paper.5 Boolean formulae were just one possible way to build our models and it is easy to imagine others. Perhaps most appropriate to the fundamentally uncertain nature of distributed systems would be probabilistic models or trained statistical models such as deep neural networks. These are outside of my own area of expertise, but I would very much like to find collaborators and recruit graduate students who are interested in working on these problems! It was not my intent in this section to advertise the LDFI approach per se, but rather to provide an example that shows that the sort of end-to-end “intuition automation” for which I advocated in the sidebar is possible in practice.


Artificial Whiteness by Yarden Katz

affirmative action, AI winter, algorithmic bias, AlphaGo, Amazon Mechanical Turk, autonomous vehicles, benefit corporation, Black Lives Matter, blue-collar work, Californian Ideology, Cambridge Analytica, cellular automata, Charles Babbage, cloud computing, colonial rule, computer vision, conceptual framework, Danny Hillis, data science, David Graeber, deep learning, DeepMind, desegregation, Donald Trump, Dr. Strangelove, driverless car, Edward Snowden, Elon Musk, Erik Brynjolfsson, European colonialism, fake news, Ferguson, Missouri, general purpose technology, gentrification, Hans Moravec, housing crisis, income inequality, information retrieval, invisible hand, Jeff Bezos, Kevin Kelly, knowledge worker, machine readable, Mark Zuckerberg, mass incarceration, Menlo Park, military-industrial complex, Nate Silver, natural language processing, Nick Bostrom, Norbert Wiener, pattern recognition, phenotype, Philip Mirowski, RAND corporation, recommendation engine, rent control, Rodney Brooks, Ronald Reagan, Salesforce, Seymour Hersh, Shoshana Zuboff, Silicon Valley, Silicon Valley billionaire, Silicon Valley ideology, Skype, speech recognition, statistical model, Stephen Hawking, Stewart Brand, Strategic Defense Initiative, surveillance capitalism, talking drums, telemarketer, The Signal and the Noise by Nate Silver, W. E. B. Du Bois, Whole Earth Catalog, WikiLeaks

For discussion of the narratives that have been used in the past to explain the rise and fall of neural networks research, see Mikel Olazaran, “A Sociological Study of the Official History of the Perceptrons Controversy,” Social Studies of Science 26, no. 3 (1996): 611–59.     4.   As one example of many, consider the coverage of a scientific journal article that presented a statistical model which the authors claim recognizes emotions better than people: Carlos F. Benitez-Quiroz, Ramprakash Srinivasan, and Aleix M. Martinez, “Facial Color Is an Efficient Mechanism to Visually Transmit Emotion,” Proceedings of the National Academy of Sciences 115, no. 14 (2018): 3581–86. The article does not reference AI at all, but that is how it was described in the media.


pages: 599 words: 98,564

The Mutant Project: Inside the Global Race to Genetically Modify Humans by Eben Kirksey

23andMe, Abraham Maslow, Affordable Care Act / Obamacare, Albert Einstein, Bernie Sanders, bioinformatics, bitcoin, Black Lives Matter, blockchain, Buckminster Fuller, clean water, coronavirus, COVID-19, CRISPR, cryptocurrency, data acquisition, deep learning, Deng Xiaoping, Donald Trump, double helix, epigenetics, Ethereum, ethereum blockchain, experimental subject, fake news, gentrification, George Floyd, Jeff Bezos, lockdown, Mark Zuckerberg, megacity, microdosing, moral panic, move fast and break things, personalized medicine, phenotype, placebo effect, randomized controlled trial, Recombinant DNA, Shenzhen special economic zone , Shenzhen was a fishing village, Silicon Valley, Silicon Valley billionaire, Skype, special economic zone, statistical model, stem cell, surveillance capitalism, tech billionaire, technological determinism, upwardly mobile, urban planning, young professional

Jiankui He zoomed through his PhD, completing his dissertation in three and a half years—extremely fast, especially for someone who was still trying to perfect his English along the way. The dissertation was ambitious and interdisciplinary: a study of the “modularity, diversity, and stochasticity” of evolutionary processes over the last 4 billion years. He used statistical models and differential equations to study seemingly unrelated systems: the structure of animal bodies, the dynamics of global financial markets, emergent strains of the influenza virus, and—fatefully—the CRISPR molecule in bacteria. He defended his dissertation in December 2010, more than a year before Jennifer Doudna and Emmanuelle Charpentier demonstrated how to manipulate DNA with CRISPR.


pages: 289 words: 95,046

Chaos Kings: How Wall Street Traders Make Billions in the New Age of Crisis by Scott Patterson

"World Economic Forum" Davos, 2021 United States Capitol attack, 4chan, Alan Greenspan, Albert Einstein, asset allocation, backtesting, Bear Stearns, beat the dealer, behavioural economics, Benoit Mandelbrot, Bernie Madoff, Bernie Sanders, bitcoin, Bitcoin "FTX", Black Lives Matter, Black Monday: stock market crash in 1987, Black Swan, Black Swan Protection Protocol, Black-Scholes formula, blockchain, Bob Litterman, Boris Johnson, Brownian motion, butterfly effect, carbon footprint, carbon tax, Carl Icahn, centre right, clean tech, clean water, collapse of Lehman Brothers, Colonization of Mars, commodity super cycle, complexity theory, contact tracing, coronavirus, correlation does not imply causation, COVID-19, Credit Default Swap, cryptocurrency, Daniel Kahneman / Amos Tversky, decarbonisation, disinformation, diversification, Donald Trump, Doomsday Clock, Edward Lloyd's coffeehouse, effective altruism, Elliott wave, Elon Musk, energy transition, Eugene Fama: efficient market hypothesis, Extinction Rebellion, fear index, financial engineering, fixed income, Flash crash, Gail Bradbrook, George Floyd, global pandemic, global supply chain, Gordon Gekko, Greenspan put, Greta Thunberg, hindsight bias, index fund, interest rate derivative, Intergovernmental Panel on Climate Change (IPCC), Jeff Bezos, Jeffrey Epstein, Joan Didion, John von Neumann, junk bonds, Just-in-time delivery, lockdown, Long Term Capital Management, Louis Bachelier, mandelbrot fractal, Mark Spitznagel, Mark Zuckerberg, market fundamentalism, mass immigration, megacity, Mikhail Gorbachev, Mohammed Bouazizi, money market fund, moral hazard, Murray Gell-Mann, Nick Bostrom, off-the-grid, panic early, Pershing Square Capital Management, Peter Singer: altruism, Ponzi scheme, power law, precautionary principle, prediction markets, proprietary trading, public intellectual, QAnon, quantitative easing, quantitative hedge fund, quantitative trading / quantitative finance, Ralph Nader, Ralph Nelson Elliott, random walk, Renaissance Technologies, rewilding, Richard Thaler, risk/return, road to serfdom, Ronald Reagan, Ronald Reagan: Tear down this wall, Rory Sutherland, Rupert Read, Sam Bankman-Fried, Silicon Valley, six sigma, smart contracts, social distancing, sovereign wealth fund, statistical arbitrage, statistical model, stem cell, Stephen Hawking, Steve Jobs, Steven Pinker, Stewart Brand, systematic trading, tail risk, technoutopianism, The Chicago School, The Great Moderation, the scientific method, too big to fail, transaction costs, University of East Anglia, value at risk, Vanguard fund, We are as Gods, Whole Earth Catalog

A Dragon King, he explained, is a dynamic process that moves toward massive instability, known as a phase transition. As an example, he showed a slide of water heating to one hundred degrees Celsius—the boiling point. The bad news is that Dragon Kings occur much more frequently than traditional statistical models would imply. The good news, he said, is that this behavior can be predicted as a system approaches what he called bifurcation—the sudden shift in the phase transition, the leap from water to steam. “Close to bifurcation you have a window of visibility,” like a plane flying from clouds into the sunshine.


pages: 307 words: 101,998

IRL: Finding Realness, Meaning, and Belonging in Our Digital Lives by Chris Stedman

Albert Einstein, augmented reality, Bernie Sanders, Black Lives Matter, context collapse, COVID-19, deepfake, different worldview, digital map, Donald Trump, fake news, feminist movement, Ferguson, Missouri, Filter Bubble, financial independence, game design, gamification, gentrification, Google Earth, Jon Ronson, Kickstarter, longitudinal study, Mark Zuckerberg, Minecraft, move fast and break things, off-the-grid, Overton Window, pre–internet, profit motive, Ralph Waldo Emerson, sentiment analysis, Skype, Snapchat, statistical model, surveillance capitalism, technoutopianism, TikTok, urban planning, urban renewal

In an article for the journal Nature Human Behavior, “The Association between Adolescent Well-Being and Digital Technology Use,” researchers Amy Orben and Andrew K. Przybylski argue that the relationship between technology and well-being actually varies a great deal depending on how you set up the statistical model. When they test many of these different approaches, they conclude that the negative relationship between technology and well-being is really quite small and probably negligible. Yes, our digital tools make some of us unhappy, but it’s not correct to say that they always do, or that they must.


pages: 385 words: 111,113

Augmented: Life in the Smart Lane by Brett King

23andMe, 3D printing, additive manufacturing, Affordable Care Act / Obamacare, agricultural Revolution, Airbnb, Albert Einstein, Amazon Web Services, Any sufficiently advanced technology is indistinguishable from magic, Apollo 11, Apollo Guidance Computer, Apple II, artificial general intelligence, asset allocation, augmented reality, autonomous vehicles, barriers to entry, bitcoin, Bletchley Park, blockchain, Boston Dynamics, business intelligence, business process, call centre, chief data officer, Chris Urmson, Clayton Christensen, clean water, Computing Machinery and Intelligence, congestion charging, CRISPR, crowdsourcing, cryptocurrency, data science, deep learning, DeepMind, deskilling, different worldview, disruptive innovation, distributed generation, distributed ledger, double helix, drone strike, electricity market, Elon Musk, Erik Brynjolfsson, Fellow of the Royal Society, fiat currency, financial exclusion, Flash crash, Flynn Effect, Ford Model T, future of work, gamification, Geoffrey Hinton, gig economy, gigafactory, Google Glasses, Google X / Alphabet X, Hans Lippershey, high-speed rail, Hyperloop, income inequality, industrial robot, information asymmetry, Internet of things, invention of movable type, invention of the printing press, invention of the telephone, invention of the wheel, James Dyson, Jeff Bezos, job automation, job-hopping, John Markoff, John von Neumann, Kevin Kelly, Kickstarter, Kim Stanley Robinson, Kiva Systems, Kodak vs Instagram, Leonard Kleinrock, lifelogging, low earth orbit, low skilled workers, Lyft, M-Pesa, Mark Zuckerberg, Marshall McLuhan, megacity, Metcalfe’s law, Minecraft, mobile money, money market fund, more computing power than Apollo, Neal Stephenson, Neil Armstrong, Network effects, new economy, Nick Bostrom, obamacare, Occupy movement, Oculus Rift, off grid, off-the-grid, packet switching, pattern recognition, peer-to-peer, Ray Kurzweil, retail therapy, RFID, ride hailing / ride sharing, Robert Metcalfe, Salesforce, Satoshi Nakamoto, Second Machine Age, selective serotonin reuptake inhibitor (SSRI), self-driving car, sharing economy, Shoshana Zuboff, Silicon Valley, Silicon Valley startup, Skype, smart cities, smart grid, smart transportation, Snapchat, Snow Crash, social graph, software as a service, speech recognition, statistical model, stem cell, Stephen Hawking, Steve Jobs, Steve Wozniak, strong AI, synthetic biology, systems thinking, TaskRabbit, technological singularity, TED Talk, telemarketer, telepresence, telepresence robot, Tesla Model S, The future is already here, The Future of Employment, Tim Cook: Apple, trade route, Travis Kalanick, TSMC, Turing complete, Turing test, Twitter Arab Spring, uber lyft, undersea cable, urban sprawl, V2 rocket, warehouse automation, warehouse robotics, Watson beat the top human players on Jeopardy!, white picket fence, WikiLeaks, yottabyte

The skills and level of education required for each job were taken into consideration too. These features were weighted according to how automatable they were, and according to the engineering obstacles currently preventing automation or computerisation. The results were calculated with a common statistical modelling method. The outcome was clear. In the United States, more than 45 per cent of jobs could be automated within one to two decades. Table 2.3 shows a few jobs that are basically at 100 per cent risk of automation (I've highlighted a few of my favourites):8
Table 2.3: Some of the Jobs at Risk from Automation and AI
Telemarketers
Data Entry Professionals
Procurement Clerks
Title Examiners, Abstractors and Searchers
Timing Device Assemblers and Adjusters
Shipping, Receiving and Traffic Clerks
Sewers, Hand
Insurance Claims and Policy Processing Clerks
Milling and Planing Machine Setters, Operators
Mathematical Technicians
Brokerage Clerks
Credit Analysts
Insurance Underwriters
Order Clerks
Parts Salespersons
Watch Repairers
Loan Officers
Claims Adjusters, Examiners and Investigators
Cargo and Freight Agents
Insurance Appraisers, Auto Damage
Driver/Sales Workers
Tax Preparers
Umpires, Referees and Other Sports Officials
Radio Operators
Photographic Process Workers and Processing Machine Operators
Bank Tellers
Legal Secretaries
New Accounts Clerks
Etchers and Engravers
Bookkeeping, Accounting and Auditing Clerks
Library Technicians
Packaging and Filling Machine Operators
Inspectors, Testers, Sorters, Samplers and Weighing Technicians
One often voiced concern is that AI will create huge wealth for a limited few who own the technology, thus implying that the wealth gap will become even more acute.


pages: 446 words: 102,421

Network Security Through Data Analysis: Building Situational Awareness by Michael S Collins

business process, cloud computing, create, read, update, delete, data science, Firefox, functional programming, general-purpose programming language, index card, information security, Internet Archive, inventory management, iterative process, operational security, OSI model, p-value, Parkinson's law, peer-to-peer, slashdot, statistical model, zero day

An Introduction to R for Security Analysts R is an open source statistical analysis package developed initially by Ross Ihaka and Robert Gentleman of the University of Auckland. R was designed primarily by statisticians and data analysts, and is related to commercial statistical packages such as S and SPSS. R is a toolkit for exploratory data analysis; it provides statistical modeling and data manipulation capabilities, visualization, and a full-featured programming language. R fulfills a particular utility knife-like role for analysis. Analytic work requires some tool for creating and manipulating small ad hoc databases that summarize raw data. For example, hour summaries of traffic volume from a particular host broken down by services.
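The book's examples are written in R, which is not reproduced here; purely as an illustration of the kind of ad hoc summary described (hourly traffic volume from one host, broken down by service), here is a hypothetical pandas sketch with invented column names and data.

```python
import pandas as pd

# Hypothetical flow records: timestamp, source host, service, bytes transferred
flows = pd.DataFrame({
    "ts": pd.to_datetime(["2024-01-01 00:05", "2024-01-01 00:40", "2024-01-01 01:10"]),
    "src": ["10.0.0.5", "10.0.0.5", "10.0.0.5"],
    "service": ["http", "dns", "http"],
    "bytes": [12000, 300, 45000],
})

# Hourly traffic volume for one host, broken down by service
host = flows[flows["src"] == "10.0.0.5"]
hourly = (host.groupby([pd.Grouper(key="ts", freq="1h"), "service"])["bytes"]
              .sum()
              .unstack(fill_value=0))
print(hourly)  # one row per hour, one column per service
```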


pages: 411 words: 108,119

The Irrational Economist: Making Decisions in a Dangerous World by Erwann Michel-Kerjan, Paul Slovic

"World Economic Forum" Davos, Alan Greenspan, An Inconvenient Truth, Andrei Shleifer, availability heuristic, bank run, behavioural economics, Black Swan, business cycle, Cass Sunstein, classic study, clean water, cognitive dissonance, collateralized debt obligation, complexity theory, conceptual framework, corporate social responsibility, Credit Default Swap, credit default swaps / collateralized debt obligations, cross-subsidies, Daniel Kahneman / Amos Tversky, endowment effect, experimental economics, financial innovation, Fractional reserve banking, George Akerlof, hindsight bias, incomplete markets, information asymmetry, Intergovernmental Panel on Climate Change (IPCC), invisible hand, Isaac Newton, iterative process, Kenneth Arrow, Loma Prieta earthquake, London Interbank Offered Rate, market bubble, market clearing, money market fund, moral hazard, mortgage debt, Oklahoma City bombing, Pareto efficiency, Paul Samuelson, placebo effect, precautionary principle, price discrimination, price stability, RAND corporation, Richard Thaler, Robert Shiller, Robert Solow, Ronald Reagan, Savings and loan crisis, social discount rate, source of truth, statistical model, stochastic process, subprime mortgage crisis, The Wealth of Nations by Adam Smith, Thomas Bayes, Thomas Kuhn: the structure of scientific revolutions, too big to fail, transaction costs, ultimatum game, University of East Anglia, urban planning, Vilfredo Pareto

The particular danger, now both available and salient, is likely to be overestimated in the future. Second, and by contrast, we tend to raise our probability estimate insufficiently when an experienced risk occurs. Follow-up research should document these tendencies with many more examples, and in laboratory settings. If improved predictions are our goal, it should also provide rigorous statistical models of effective updating of virgin and experienced risks. Future inquiry should consider resembled risks as well. Evidence from both terrorist incidents and financial markets suggests that we have difficulty extrapolating from risks that, though varied, bear strong similarities. Behavioral biases such as these are difficult to counteract, but awareness of them is the first step.


pages: 688 words: 107,867

Python Data Analytics: With Pandas, NumPy, and Matplotlib by Fabio Nelli

Amazon Web Services, backpropagation, centre right, computer vision, data science, Debian, deep learning, DevOps, functional programming, Google Earth, Guido van Rossum, Internet of things, optical character recognition, pattern recognition, sentiment analysis, speech recognition, statistical model, web application

Other methods of data mining, such as decision trees and association rules, automatically extract important facts or rules from the data. These approaches can be used in parallel with data visualization to uncover relationships between the data. Predictive Modeling Predictive modeling is a process used in data analysis to create or choose a suitable statistical model to predict the probability of a result. After exploring the data, you have all the information needed to develop the mathematical model that encodes the relationship between the data. These models are useful for understanding the system under study, and in a specific way they are used for two main purposes.
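As a minimal illustration of choosing a statistical model and using it to predict the probability of a result, the sketch below fits a logistic regression to synthetic data and reports predicted probabilities on held-out points; the data and model choice are assumptions made for illustration, not the book's own example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic data: two measured features and a binary outcome to predict
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 2))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.5, 400) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression().fit(X_train, y_train)

print(model.predict_proba(X_test[:3])[:, 1])  # predicted probability of the outcome
print(model.score(X_test, y_test))            # accuracy on held-out data
```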


The Deep Learning Revolution (The MIT Press) by Terrence J. Sejnowski

AI winter, Albert Einstein, algorithmic bias, algorithmic trading, AlphaGo, Amazon Web Services, Any sufficiently advanced technology is indistinguishable from magic, augmented reality, autonomous vehicles, backpropagation, Baxter: Rethink Robotics, behavioural economics, bioinformatics, cellular automata, Claude Shannon: information theory, cloud computing, complexity theory, computer vision, conceptual framework, constrained optimization, Conway's Game of Life, correlation does not imply causation, crowdsourcing, Danny Hillis, data science, deep learning, DeepMind, delayed gratification, Demis Hassabis, Dennis Ritchie, discovery of DNA, Donald Trump, Douglas Engelbart, driverless car, Drosophila, Elon Musk, en.wikipedia.org, epigenetics, Flynn Effect, Frank Gehry, future of work, Geoffrey Hinton, Google Glasses, Google X / Alphabet X, Guggenheim Bilbao, Gödel, Escher, Bach, haute couture, Henri Poincaré, I think there is a world market for maybe five computers, industrial robot, informal economy, Internet of things, Isaac Newton, Jim Simons, John Conway, John Markoff, John von Neumann, language acquisition, Large Hadron Collider, machine readable, Mark Zuckerberg, Minecraft, natural language processing, Neil Armstrong, Netflix Prize, Norbert Wiener, OpenAI, orbital mechanics / astrodynamics, PageRank, pattern recognition, pneumatic tube, prediction markets, randomized controlled trial, Recombinant DNA, recommendation engine, Renaissance Technologies, Rodney Brooks, self-driving car, Silicon Valley, Silicon Valley startup, Socratic dialogue, speech recognition, statistical model, Stephen Hawking, Stuart Kauffman, theory of mind, Thomas Bayes, Thomas Kuhn: the structure of scientific revolutions, traveling salesman, Turing machine, Von Neumann architecture, Watson beat the top human players on Jeopardy!, world market for maybe five computers, X Prize, Yogi Berra

These success stories had a common trajectory. In the past, computers were slow and only able to explore toy models with just a few parameters. But these toy models generalized poorly to real-world data. When abundant data were available and computers were much faster, it became possible to create more complex statistical models and to extract more features and relationships between the features. Deep learning automates this process. Instead of having domain experts handcraft features for each application, deep learning can extract them from very large data sets. As computation replaces labor and continues to get cheaper, more labor-intensive cognitive tasks will be performed by computers.


pages: 398 words: 105,917

Bean Counters: The Triumph of the Accountants and How They Broke Capitalism by Richard Brooks

"World Economic Forum" Davos, accounting loophole / creative accounting, Alan Greenspan, asset-backed security, banking crisis, Bear Stearns, Big bang: deregulation of the City of London, blockchain, BRICs, British Empire, business process, Charles Babbage, cloud computing, collapse of Lehman Brothers, collateralized debt obligation, corporate governance, corporate raider, credit crunch, Credit Default Swap, credit default swaps / collateralized debt obligations, David Strachan, Deng Xiaoping, Donald Trump, double entry bookkeeping, Double Irish / Dutch Sandwich, energy security, Etonian, eurozone crisis, financial deregulation, financial engineering, Ford Model T, forensic accounting, Frederick Winslow Taylor, G4S, Glass-Steagall Act, high-speed rail, information security, intangible asset, Internet of things, James Watt: steam engine, Jeremy Corbyn, joint-stock company, joint-stock limited liability company, Joseph Schumpeter, junk bonds, light touch regulation, Long Term Capital Management, low cost airline, new economy, Northern Rock, offshore financial centre, oil shale / tar sands, On the Economy of Machinery and Manufactures, Ponzi scheme, post-oil, principal–agent problem, profit motive, race to the bottom, railway mania, regulatory arbitrage, risk/return, Ronald Reagan, Savings and loan crisis, savings glut, scientific management, short selling, Silicon Valley, South Sea Bubble, statistical model, supply-chain management, The Chicago School, too big to fail, transaction costs, transfer pricing, Upton Sinclair, WikiLeaks

The BBC’s then economics editor Robert Peston blogged that ‘some would say [there] is a flaw the size of Greater Manchester in its analysis – because KPMG is ignoring one of the fundamental causes of lacklustre growth in many parts of the UK, which is a shortage of skilled labour and of easily and readily developable land’.35 When a committee of MPs came to examine the report, academics lined up to rubbish it. ‘I don’t think the statistical work is reliable,’ said a professor of statistical modelling at Imperial College, London. ‘They [KPMG] apply this procedure which is essentially made up, which provides them with an estimate,’ added a professor of economic geography from the London School of Economics. ‘It is something that really shouldn’t be done in a situation where we are trying to inform public debate using statistical analysis.’36 Noting that HS2 ‘stands or falls on this piece of work’, the committee’s acerbic chairman Andrew Tyrie summoned the report’s authors.37 One exchange with KPMG’s Lewis Atter (a former Treasury civil servant) spoke volumes for the bean counters’ role in lumbering the taxpayer with monolithic projects: Tyrie: It [the £15bn a year economic projection] is a reasonable forecast of what we might hope to get from this project?


pages: 338 words: 106,936

The Physics of Wall Street: A Brief History of Predicting the Unpredictable by James Owen Weatherall

Alan Greenspan, Albert Einstein, algorithmic trading, Antoine Gombaud: Chevalier de Méré, Apollo 11, Asian financial crisis, bank run, Bear Stearns, beat the dealer, behavioural economics, Benoit Mandelbrot, Black Monday: stock market crash in 1987, Black Swan, Black-Scholes formula, Bonfire of the Vanities, book value, Bretton Woods, Brownian motion, business cycle, butterfly effect, buy and hold, capital asset pricing model, Carmen Reinhart, Claude Shannon: information theory, coastline paradox / Richardson effect, collateralized debt obligation, collective bargaining, currency risk, dark matter, Edward Lorenz: Chaos theory, Edward Thorp, Emanuel Derman, Eugene Fama: efficient market hypothesis, financial engineering, financial innovation, Financial Modelers Manifesto, fixed income, George Akerlof, Gerolamo Cardano, Henri Poincaré, invisible hand, Isaac Newton, iterative process, Jim Simons, John Nash: game theory, junk bonds, Kenneth Rogoff, Long Term Capital Management, Louis Bachelier, mandelbrot fractal, Market Wizards by Jack D. Schwager, martingale, Michael Milken, military-industrial complex, Myron Scholes, Neil Armstrong, new economy, Nixon triggered the end of the Bretton Woods system, Paul Lévy, Paul Samuelson, power law, prediction markets, probability theory / Blaise Pascal / Pierre de Fermat, quantitative trading / quantitative finance, random walk, Renaissance Technologies, risk free rate, risk-adjusted returns, Robert Gordon, Robert Shiller, Ronald Coase, Sharpe ratio, short selling, Silicon Valley, South Sea Bubble, statistical arbitrage, statistical model, stochastic process, Stuart Kauffman, The Chicago School, The Myth of the Rational Market, tulip mania, Vilfredo Pareto, volatility smile

Sornette, Didier, and Christian Vanneste. 1992. “Dynamics and Memory Effects in Rupture of Thermal Fuse.” Physical Review Letters 68: 612–15. — — — . 1994. “Dendrites and Fronts in a Model of Dynamical Rupture with Damage.” Physical Review E 50 (6, December): 4327–45. Sornette, D., C. Vanneste, and L. Knopoff. 1992. “Statistical Model of Earthquake Foreshocks.” Physical Review A 45: 8351–57. Sourd, Véronique, Le. 2008. “Hedge Fund Performance in 2007.” EDHEC Risk and Asset Management Research Centre. Spence, Joseph. 1820. Observations, Anecdotes, and Characters, of Books and Men. London: John Murray. Stewart, James B. 1992.


pages: 392 words: 108,745

Talk to Me: How Voice Computing Will Transform the Way We Live, Work, and Think by James Vlahos

Albert Einstein, AltaVista, Amazon Mechanical Turk, Amazon Web Services, augmented reality, Automated Insights, autonomous vehicles, backpropagation, Big Tech, Cambridge Analytica, Chuck Templeton: OpenTable:, cloud computing, Colossal Cave Adventure, computer age, deep learning, DeepMind, Donald Trump, Elon Musk, fake news, Geoffrey Hinton, information retrieval, Internet of things, Jacques de Vaucanson, Jeff Bezos, lateral thinking, Loebner Prize, machine readable, machine translation, Machine translation of "The spirit is willing, but the flesh is weak." to Russian and back, Mark Zuckerberg, Menlo Park, natural language processing, Neal Stephenson, Neil Armstrong, OpenAI, PageRank, pattern recognition, Ponzi scheme, randomized controlled trial, Ray Kurzweil, Ronald Reagan, Rubik’s Cube, self-driving car, sentiment analysis, Silicon Valley, Skype, Snapchat, speech recognition, statistical model, Steve Jobs, Steve Wozniak, Steven Levy, TechCrunch disrupt, Turing test, Watson beat the top human players on Jeopardy!

Because unit selection assembles snippets of actual human speech, the method has traditionally been the best way to concoct a natural-sounding voice. It’s like cooking with ingredients from the local farmers market. A second-tier method, called parametric synthesis, has historically been the speech industry’s Velveeta cheese. For it, audio engineers build statistical models of all of the various language sounds. Then they use the data to synthetically reproduce those sounds and concatenate them into full words and phrases. This approach typically produces a more robotic-sounding voice than a unit selection one. The advantage, though, is that engineers don’t need to spend eons recording someone like Bennett.


pages: 428 words: 103,544

The Data Detective: Ten Easy Rules to Make Sense of Statistics by Tim Harford

Abraham Wald, access to a mobile phone, Ada Lovelace, affirmative action, algorithmic bias, Automated Insights, banking crisis, basic income, behavioural economics, Black Lives Matter, Black Swan, Bretton Woods, British Empire, business cycle, Cambridge Analytica, Capital in the Twenty-First Century by Thomas Piketty, Cass Sunstein, Charles Babbage, clean water, collapse of Lehman Brothers, contact tracing, coronavirus, correlation does not imply causation, COVID-19, cuban missile crisis, Daniel Kahneman / Amos Tversky, data science, David Attenborough, Diane Coyle, disinformation, Donald Trump, Estimating the Reproducibility of Psychological Science, experimental subject, fake news, financial innovation, Florence Nightingale: pie chart, Gini coefficient, Great Leap Forward, Hans Rosling, high-speed rail, income inequality, Isaac Newton, Jeremy Corbyn, job automation, Kickstarter, life extension, meta-analysis, microcredit, Milgram experiment, moral panic, Netflix Prize, Northpointe / Correctional Offender Management Profiling for Alternative Sanctions, opioid epidemic / opioid crisis, Paul Samuelson, Phillips curve, publication bias, publish or perish, random walk, randomized controlled trial, recommendation engine, replication crisis, Richard Feynman, Richard Thaler, rolodex, Ronald Reagan, selection bias, sentiment analysis, Silicon Valley, sorting algorithm, sparse data, statistical model, stem cell, Stephen Hawking, Steve Bannon, Steven Pinker, survivorship bias, systematic bias, TED Talk, universal basic income, W. E. B. Du Bois, When a measure becomes a target

David Jackson and Gary Marx, “Data Mining Program Designed to Predict Child Abuse Proves Unreliable, DCFS Says,” Chicago Tribune, December 6, 2017; and Dan Hurley, “Can an Algorithm Tell When Kids Are in Danger?,” New York Times Magazine, January 2, 2018, https://www.nytimes.com/2018/01/02/magazine/can-an-algorithm-tell-when-kids-are-in-danger.html. 22. Hurley, “Can an Algorithm.” 23. Andrew Gelman, “Flaws in Stupid Horrible Algorithm Revealed Because It Made Numerical Predictions,” Statistical Modeling, Causal Inference, and Social Science (blog), July 3, 2018, https://statmodeling.stat.columbia.edu/2018/07/03/flaws-stupid-horrible-algorithm-revealed-made-numerical-predictions/. 24. Sabine Hossenfelder, “Blaise Pascal, Florin Périer, and the Puy de Dôme Experiment,” BackRe(Action) (blog), November 21, 2007, http://backreaction.blogspot.com/2007/11/blaise-pascal-florin-p-and-puy-de-d.html; and David Wootton, The Invention of Science: A New History of the Scientific Revolution (London: Allen Lane, 2015), chap. 8. 25.


pages: 461 words: 106,027

Zero to Sold: How to Start, Run, and Sell a Bootstrapped Business by Arvid Kahl

business logic, business process, centre right, Chuck Templeton: OpenTable:, cognitive load, content marketing, continuous integration, coronavirus, COVID-19, crowdsourcing, domain-specific language, financial independence, functional programming, Google Chrome, hockey-stick growth, if you build it, they will come, information asymmetry, information retrieval, inventory management, Jeff Bezos, job automation, Kanban, Kubernetes, machine readable, minimum viable product, Network effects, performance metric, post-work, premature optimization, risk tolerance, Ruby on Rails, sentiment analysis, side hustle, Silicon Valley, single source of truth, software as a service, solopreneur, source of truth, statistical model, subscription business, sunk-cost fallacy, supply-chain management, the long tail, trickle-down economics, value engineering, web application

Forecasting will allow you to explore several scenarios of where your business could go if you made certain decisions that are hard to reverse and would be very risky to attempt in reality: hiring a number of people, switching to another audience completely, or pivoting to another kind of product. It's business experimentation powered by statistical models that are at least less biased than your hopeful entrepreneurial perspective. It's a projection of your ambitions into the future. Being able to share this kind of projection will give your acquirer the confidence that you have thought about these things, and there is a statistically significant chance that the goals you have set may be reached in reality.


pages: 414 words: 109,622

Genius Makers: The Mavericks Who Brought A. I. To Google, Facebook, and the World by Cade Metz

AI winter, air gap, Airbnb, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, AlphaGo, Amazon Robotics, artificial general intelligence, Asilomar, autonomous vehicles, backpropagation, Big Tech, British Empire, Cambridge Analytica, carbon-based life, cloud computing, company town, computer age, computer vision, deep learning, deepfake, DeepMind, Demis Hassabis, digital map, Donald Trump, driverless car, drone strike, Elon Musk, fake news, Fellow of the Royal Society, Frank Gehry, game design, Geoffrey Hinton, Google Earth, Google X / Alphabet X, Googley, Internet Archive, Isaac Newton, Jeff Hawkins, Jeffrey Epstein, job automation, John Markoff, life extension, machine translation, Mark Zuckerberg, means of production, Menlo Park, move 37, move fast and break things, Mustafa Suleyman, new economy, Nick Bostrom, nuclear winter, OpenAI, PageRank, PalmPilot, pattern recognition, Paul Graham, paypal mafia, Peter Thiel, profit motive, Richard Feynman, ride hailing / ride sharing, Ronald Reagan, Rubik’s Cube, Sam Altman, Sand Hill Road, self-driving car, side project, Silicon Valley, Silicon Valley billionaire, Silicon Valley startup, Skype, speech recognition, statistical model, stem cell, Stephen Hawking, Steve Ballmer, Steven Levy, Steven Pinker, tech worker, telemarketer, The Future of Employment, Turing test, warehouse automation, warehouse robotics, Y Combinator

He later called it his “come-to-Jesus moment,” when he realized he had spent six years writing rules that were now obsolete. “My fifty-two-year-old body had one of those moments when I saw a future where I wasn’t involved,” he says. The world’s natural language researchers soon overhauled their approach, embracing the kind of statistical models unveiled that afternoon at the lab outside Seattle. This was just one of many mathematical methods that spread across the larger community of AI researchers in the 1990s and on into the 2000s, with names like “random forests,” “boosted trees,” and “support vector machines.” Researchers applied some to natural language understanding, others to speech recognition and image recognition.


pages: 363 words: 109,834

The Crux by Richard Rumelt

activist fund / activist shareholder / activist investor, air gap, Airbnb, AltaVista, AOL-Time Warner, Bayesian statistics, behavioural economics, biodiversity loss, Blue Ocean Strategy, Boeing 737 MAX, Boeing 747, Charles Lindbergh, Clayton Christensen, cloud computing, cognitive bias, commoditize, coronavirus, corporate raider, COVID-19, creative destruction, crossover SUV, Crossrail, deep learning, Deng Xiaoping, diversified portfolio, double entry bookkeeping, drop ship, Elon Musk, en.wikipedia.org, financial engineering, Ford Model T, Herman Kahn, income inequality, index card, Internet of things, Jeff Bezos, Just-in-time delivery, Larry Ellison, linear programming, lockdown, low cost airline, low earth orbit, Lyft, Marc Benioff, Mark Zuckerberg, Masayoshi Son, meta-analysis, Myron Scholes, natural language processing, Neil Armstrong, Network effects, packet switching, PageRank, performance metric, precision agriculture, RAND corporation, ride hailing / ride sharing, Salesforce, San Francisco homelessness, search costs, selection bias, self-driving car, shareholder value, sharing economy, Silicon Valley, Skype, Snapchat, social distancing, SoftBank, software as a service, statistical model, Steve Ballmer, Steve Jobs, stochastic process, Teledyne, telemarketer, TSMC, uber lyft, undersea cable, union organizing, vertical integration, WeWork

They contain a strong random element. Track your monthly spending on groceries. A blip upward does not mean your finances are out of control, and a downward blip does not signal coming starvation. However, to insert proper logic into their estimates of value, the analysts would need PhDs in advanced Bayesian statistical modeling and certainly would not use spreadsheets. By construction, their fairly primitive estimating tools grossly overreact to blips. A third problem is that the “true” value of a company is very hard to know. Fischer Black, coauthor of the famous 1973 Black-Scholes option-pricing formula, was a believer that market prices were unbiased estimates of true value.3 But, over drinks, he also told me that the “true” value of a company was anywhere from half to twice the current stock price.


pages: 368 words: 102,379

Pandemic, Inc.: Chasing the Capitalists and Thieves Who Got Rich While We Got Sick by J. David McSwane

Affordable Care Act / Obamacare, commoditize, coronavirus, COVID-19, disinformation, Donald Trump, Elon Musk, fake it until you make it, fake news, global pandemic, global supply chain, Internet Archive, lockdown, Lyft, Mark Zuckerberg, microaggression, military-industrial complex, obamacare, open economy, Ponzi scheme, race to the bottom, ransomware, remote working, ride hailing / ride sharing, shareholder value, side hustle, Silicon Valley, social distancing, statistical model, stem cell, Steve Bannon, stock buybacks, TaskRabbit, telemarketer, uber lyft, Y2K

Fintech entered the mainstream with the advent of companies like SoFi, which offered more favorable rates than banks for those looking to consolidate student loan debt. But the model found a niche—and billions in easy profit—in servicing small and struggling businesses that banks had overlooked or turned away. Through automation, data, and statistical models that help determine if applicants will repay a loan, fintechs removed much of the human work from the loan approval process. With less human involvement, it appears, came less racial bias. Researchers at New York University, for instance, found that businesses owned by Black people were 70 percent more likely to have gotten their PPP loan from fintech than a small bank.


pages: 918 words: 257,605

The Age of Surveillance Capitalism by Shoshana Zuboff

"World Economic Forum" Davos, algorithmic bias, Amazon Web Services, Andrew Keen, augmented reality, autonomous vehicles, barriers to entry, Bartolomé de las Casas, behavioural economics, Berlin Wall, Big Tech, bitcoin, blockchain, blue-collar work, book scanning, Broken windows theory, California gold rush, call centre, Cambridge Analytica, Capital in the Twenty-First Century by Thomas Piketty, Cass Sunstein, choice architecture, citizen journalism, Citizen Lab, classic study, cloud computing, collective bargaining, Computer Numeric Control, computer vision, connected car, context collapse, corporate governance, corporate personhood, creative destruction, cryptocurrency, data science, deep learning, digital capitalism, disinformation, dogs of the Dow, don't be evil, Donald Trump, Dr. Strangelove, driverless car, Easter island, Edward Snowden, en.wikipedia.org, Erik Brynjolfsson, Evgeny Morozov, facts on the ground, fake news, Ford Model T, Ford paid five dollars a day, future of work, game design, gamification, Google Earth, Google Glasses, Google X / Alphabet X, Herman Kahn, hive mind, Ian Bogost, impulse control, income inequality, information security, Internet of things, invention of the printing press, invisible hand, Jean Tirole, job automation, Johann Wolfgang von Goethe, John Markoff, John Maynard Keynes: Economic Possibilities for our Grandchildren, John Maynard Keynes: technological unemployment, Joseph Schumpeter, Kevin Kelly, Kevin Roose, knowledge economy, Lewis Mumford, linked data, longitudinal study, low skilled workers, Mark Zuckerberg, market bubble, means of production, multi-sided market, Naomi Klein, natural language processing, Network effects, new economy, Occupy movement, off grid, off-the-grid, PageRank, Panopticon Jeremy Bentham, pattern recognition, Paul Buchheit, performance metric, Philip Mirowski, precision agriculture, price mechanism, profit maximization, profit motive, public intellectual, recommendation engine, refrigerator car, RFID, Richard Thaler, ride hailing / ride sharing, Robert Bork, Robert Mercer, Salesforce, Second Machine Age, self-driving car, sentiment analysis, shareholder value, Sheryl Sandberg, Shoshana Zuboff, Sidewalk Labs, Silicon Valley, Silicon Valley ideology, Silicon Valley startup, slashdot, smart cities, Snapchat, social contagion, social distancing, social graph, social web, software as a service, speech recognition, statistical model, Steve Bannon, Steve Jobs, Steven Levy, structural adjustment programs, surveillance capitalism, technological determinism, TED Talk, The Future of Employment, The Wealth of Nations by Adam Smith, Tim Cook: Apple, two-sided market, union organizing, vertical integration, Watson beat the top human players on Jeopardy!, winner-take-all economy, Wolfgang Streeck, work culture , Yochai Benkler, you are the product

The company describes itself “at the forefront of innovation in machine intelligence,” a term in which it includes machine learning as well as “classical” algorithmic production, along with many computational operations that are often referred to with other terms such as “predictive analytics” or “artificial intelligence.” Among these operations Google cites its work on language translation, speech recognition, visual processing, ranking, statistical modeling, and prediction: “In all of those tasks and many others, we gather large volumes of direct or indirect evidence of relationships of interest, applying learning algorithms to understand and generalize.”9 These machine intelligence operations convert raw material into the firm’s highly profitable algorithmic products designed to predict the behavior of its users.

For individuals, the attraction is the possibility of a world where everything is arranged for your convenience—your health checkup is magically scheduled just as you begin to get sick, the bus comes just as you get to the bus stop, and there is never a line of waiting people at city hall. As these new abilities become refined by the use of more sophisticated statistical models and sensor capabilities, we could well see the creation of a quantitative, predictive science of human organizations and human society.38 III. The Principles of an Instrumentarian Society Pentland’s theory of instrumentarian society came to full flower in his 2014 book Social Physics, in which his tools and methods are integrated into an expansive vision of our futures in a data-driven instrumentarian society governed by computation.


pages: 409 words: 118,448

An Extraordinary Time: The End of the Postwar Boom and the Return of the Ordinary Economy by Marc Levinson

affirmative action, airline deregulation, Alan Greenspan, banking crisis, Big bang: deregulation of the City of London, Boycotts of Israel, Bretton Woods, business cycle, Capital in the Twenty-First Century by Thomas Piketty, car-free, Carmen Reinhart, central bank independence, centre right, clean water, deindustrialization, endogenous growth, falling living standards, financial deregulation, flag carrier, floating exchange rates, full employment, George Gilder, Gini coefficient, global supply chain, Great Leap Forward, guns versus butter model, high-speed rail, income inequality, income per capita, indoor plumbing, informal economy, intermodal, inverted yield curve, invisible hand, It's morning again in America, Kenneth Rogoff, knowledge economy, late capitalism, Les Trente Glorieuses, linear programming, low interest rates, manufacturing employment, Multi Fibre Arrangement, new economy, Nixon shock, Nixon triggered the end of the Bretton Woods system, North Sea oil, oil shock, Paul Samuelson, pension reform, Phillips curve, price stability, purchasing power parity, refrigerator car, Right to Buy, rising living standards, Robert Gordon, rolodex, Ronald Coase, Ronald Reagan, Simon Kuznets, statistical model, strikebreaker, structural adjustment programs, The Rise and Fall of American Growth, Thomas Malthus, total factor productivity, unorthodox policies, upwardly mobile, War on Poverty, Washington Consensus, Winter of Discontent, Wolfgang Streeck, women in the workforce, working-age population, yield curve, Yom Kippur War, zero-sum game

Although the influx of foreign capital set off a boom after 1986, job creation did not follow. Spain continued to have by far the highest unemployment rate in the industrial world. Its experience, like that of France, showed that the economic malaise afflicting the wealthy economies was beyond the reach of ideologically driven solutions. While the statist model had failed to revive growth, stimulate investment, and raise living standards in both France and Spain, more market-oriented policies had proven no more efficacious. Neither approach offered a realistic chance of bringing back the glorious years, which were beyond the ability of any government to restore.21 CHAPTER 13 Morning in America October 6, 1979, was a chilly Saturday in Washington.


pages: 403 words: 111,119

Doughnut Economics: Seven Ways to Think Like a 21st-Century Economist by Kate Raworth

"Friedman doctrine" OR "shareholder theory", 3D printing, Alan Greenspan, Alvin Toffler, Anthropocene, Asian financial crisis, bank run, basic income, battle of ideas, behavioural economics, benefit corporation, Berlin Wall, biodiversity loss, bitcoin, blockchain, Branko Milanovic, Bretton Woods, Buckminster Fuller, business cycle, call centre, Capital in the Twenty-First Century by Thomas Piketty, carbon tax, Cass Sunstein, choice architecture, circular economy, clean water, cognitive bias, collapse of Lehman Brothers, complexity theory, creative destruction, crowdsourcing, cryptocurrency, Daniel Kahneman / Amos Tversky, David Ricardo: comparative advantage, degrowth, dematerialisation, disruptive innovation, Douglas Engelbart, Douglas Engelbart, Easter island, en.wikipedia.org, energy transition, Erik Brynjolfsson, Ethereum, ethereum blockchain, Eugene Fama: efficient market hypothesis, experimental economics, Exxon Valdez, Fall of the Berlin Wall, financial deregulation, Financial Instability Hypothesis, full employment, Future Shock, Garrett Hardin, Glass-Steagall Act, global supply chain, global village, Henri Poincaré, hiring and firing, Howard Zinn, Hyman Minsky, income inequality, Intergovernmental Panel on Climate Change (IPCC), invention of writing, invisible hand, Isaac Newton, it is difficult to get a man to understand something, when his salary depends on his not understanding it, John Maynard Keynes: Economic Possibilities for our Grandchildren, Joseph Schumpeter, Kenneth Arrow, Kenneth Rogoff, Kickstarter, land reform, land value tax, Landlord’s Game, loss aversion, low interest rates, low skilled workers, M-Pesa, Mahatma Gandhi, market fundamentalism, Martin Wolf, means of production, megacity, Minsky moment, mobile money, Money creation, Mont Pelerin Society, Myron Scholes, neoliberal agenda, Network effects, Occupy movement, ocean acidification, off grid, offshore financial centre, oil shale / tar sands, out of africa, Paul Samuelson, peer-to-peer, planetary scale, price mechanism, quantitative easing, randomized controlled trial, retail therapy, Richard Thaler, Robert Solow, Ronald Reagan, Second Machine Age, secular stagnation, shareholder value, sharing economy, Silicon Valley, Simon Kuznets, smart cities, smart meter, Social Responsibility of Business Is to Increase Its Profits, South Sea Bubble, statistical model, Steve Ballmer, systems thinking, TED Talk, The Chicago School, The Great Moderation, the map is not the territory, the market place, The Spirit Level, The Wealth of Nations by Adam Smith, Thomas Malthus, Thorstein Veblen, too big to fail, Torches of Freedom, Tragedy of the Commons, trickle-down economics, ultimatum game, universal basic income, Upton Sinclair, Vilfredo Pareto, wikimedia commons

Given its uncanny resemblance to that famous inequality curve of Chapter 5, this new one was soon known as the Environmental Kuznets Curve. The Environmental Kuznets Curve, which suggests that growth will eventually fix the environmental problems that it creates. Having discovered another apparent economic law of motion, the economists could not resist the urge to use statistical modelling in order to identify the level of income at which the curve magically turned. For lead contamination in rivers, they found, pollution peaked and started to fall when national income reached $1,887 per person (measured in 1985 US dollars, the standard metric of the day). What about sulphur dioxide in the air?


pages: 437 words: 113,173

Age of Discovery: Navigating the Risks and Rewards of Our New Renaissance by Ian Goldin, Chris Kutarna

"World Economic Forum" Davos, 2013 Report for America's Infrastructure - American Society of Civil Engineers - 19 March 2013, 3D printing, Airbnb, Albert Einstein, AltaVista, Asian financial crisis, asset-backed security, autonomous vehicles, banking crisis, barriers to entry, battle of ideas, Bear Stearns, Berlin Wall, bioinformatics, bitcoin, Boeing 747, Bonfire of the Vanities, bread and circuses, carbon tax, clean water, collective bargaining, Colonization of Mars, Credit Default Swap, CRISPR, crowdsourcing, cryptocurrency, Dava Sobel, demographic dividend, Deng Xiaoping, digital divide, Doha Development Round, double helix, driverless car, Edward Snowden, Elon Musk, en.wikipedia.org, epigenetics, experimental economics, Eyjafjallajökull, failed state, Fall of the Berlin Wall, financial innovation, full employment, Galaxy Zoo, general purpose technology, Glass-Steagall Act, global pandemic, global supply chain, Higgs boson, Hyperloop, immigration reform, income inequality, indoor plumbing, industrial cluster, industrial robot, information retrieval, information security, Intergovernmental Panel on Climate Change (IPCC), intermodal, Internet of things, invention of the printing press, Isaac Newton, Islamic Golden Age, Johannes Kepler, Khan Academy, Kickstarter, Large Hadron Collider, low cost airline, low skilled workers, Lyft, Mahbub ul Haq, Malacca Straits, mass immigration, Max Levchin, megacity, Mikhail Gorbachev, moral hazard, Nelson Mandela, Network effects, New Urbanism, non-tariff barriers, Occupy movement, On the Revolutions of the Heavenly Spheres, open economy, Panamax, Paris climate accords, Pearl River Delta, personalized medicine, Peter Thiel, post-Panamax, profit motive, public intellectual, quantum cryptography, rent-seeking, reshoring, Robert Gordon, Robert Metcalfe, Search for Extraterrestrial Intelligence, Second Machine Age, self-driving car, Shenzhen was a fishing village, Silicon Valley, Silicon Valley startup, Skype, smart grid, Snapchat, special economic zone, spice trade, statistical model, Stephen Hawking, Steve Jobs, Stuxnet, synthetic biology, TED Talk, The Future of Employment, too big to fail, trade liberalization, trade route, transaction costs, transatlantic slave trade, uber lyft, undersea cable, uranium enrichment, We are the 99%, We wanted flying cars, instead we got 140 characters, working poor, working-age population, zero day

Sequencing machines arrived to automate many of the lab technicians’ decoding tasks. DNA copy machines were invented that could take a single DNA snippet of interest and make millions of copies overnight, which in turn enabled a new generation of faster sequencers designed to apply brute force to now-inexhaustible source material. Mathematicians developed new statistical models to puzzle out how to stitch any number of snippets back together into their correct order, and the “shotgun sequencing” technique (basically, blasting the entire genome into tens of thousands of very short segments) was born to take advantage of this new “sequence now, line up later” capability.
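The statistical assembly methods the excerpt alludes to are far more sophisticated than anything that fits here, but the core "stitch the snippets back together" idea can be shown with a toy, purely greedy overlap-merge; the fragments below are invented.

```python
# Toy sketch only: real shotgun assembly uses statistical models and far more
# sophisticated algorithms; this greedy overlap-merge just illustrates the
# "stitch the snippets back together" idea on made-up fragments.
def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of a that is a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(fragments: list[str]) -> str:
    frags = list(fragments)
    while len(frags) > 1:
        # Find the pair with the largest overlap and merge it.
        best = (0, 0, 1)  # (overlap length, index of left, index of right)
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        merged = frags[i] + frags[j][k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (i, j)] + [merged]
    return frags[0]

print(greedy_assemble(["GATTAC", "TTACAG", "ACAGGT"]))  # -> GATTACAGGT
```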


pages: 484 words: 120,507

The Last Lingua Franca: English Until the Return of Babel by Nicholas Ostler

barriers to entry, BRICs, British Empire, call centre, en.wikipedia.org, European colonialism, Internet Archive, invention of writing, Isaac Newton, language acquisition, machine translation, Machine translation of "The spirit is willing, but the flesh is weak." to Russian and back, mass immigration, Nelson Mandela, open economy, precautionary principle, Republic of Letters, Scramble for Africa, statistical model, trade route, upwardly mobile, Wayback Machine

In essence these resources are nothing other than large quantities of text (text corpora) or recorded speech (speech databases) in some form that is systematic and well documented enough to be tractable for digital analysis. From these files, it is possible to derive indices, glossaries, and thesauri, which can be the basis for dictionaries; it is also possible to derive statistical models of the languages, and (if they are multilingual files as, e.g., the official dossiers of the Canadian Parliament, the European Union, or some agency of the United Nations) models of equivalences among languages. These models are calculations of the conditional probability of sequences of sounds, or sequences of words, on the basis of past performance in all those recorded files.
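A minimal sketch of the kind of model the excerpt describes: estimating the conditional probability of a word given the previous word from a corpus. The toy corpus is invented; real systems add smoothing and train on vastly more data.

```python
# Minimal sketch of the idea above: estimate P(word | previous word) by
# relative frequency in a corpus. The toy corpus is invented.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ate".split()

bigram_counts = defaultdict(Counter)
for prev, word in zip(corpus, corpus[1:]):
    bigram_counts[prev][word] += 1

def p_next(word: str, prev: str) -> float:
    """P(word | prev) estimated from relative frequency."""
    total = sum(bigram_counts[prev].values())
    return bigram_counts[prev][word] / total if total else 0.0

print(p_next("cat", "the"))  # 2 of the 3 occurrences of "the" are followed by "cat"
```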


pages: 459 words: 118,959

Confidence Game: How a Hedge Fund Manager Called Wall Street's Bluff by Christine S. Richard

activist fund / activist shareholder / activist investor, Alan Greenspan, Asian financial crisis, asset-backed security, banking crisis, Bear Stearns, Bernie Madoff, Blythe Masters, book value, buy and hold, Carl Icahn, cognitive dissonance, collateralized debt obligation, corporate governance, corporate raider, credit crunch, Credit Default Swap, credit default swaps / collateralized debt obligations, diversification, Donald Trump, electricity market, family office, financial innovation, fixed income, forensic accounting, glass ceiling, Greenspan put, Long Term Capital Management, market bubble, money market fund, moral hazard, old-boy network, Pershing Square Capital Management, Ponzi scheme, profit motive, Savings and loan crisis, short selling, short squeeze, statistical model, stock buybacks, subprime mortgage crisis, white flight, zero-sum game

Although an earthquake in California doesn’t increase the chance of an earthquake occurring in Florida, bond defaults tend to be contagious and closely correlated in times of economic stress. That makes CDOs, which mingle various types of loans across different geographic regions, vulnerable to the same pressures. In fact, the whole bond-insurance industry might be vulnerable to faulty statistical models that rely on the past to predict the future, Ackman argued in the report. These models estimated that MBIA faced just a 1-in-10,000 chance of confronting a scenario that would leave it unable to meet all its claims. Yet historical data-based models considered the 1987 stock market crash an event so improbable that it would be expected to happen only once in a trillion years, Ackman explained.
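The excerpt's point can be made concrete with a back-of-the-envelope calculation. The 1987 crash is often described as a move on the order of 20 standard deviations under a normal model of daily returns; that figure, and the 252-trading-day year, are illustrative assumptions here, and under them the implied waiting time dwarfs even the trillion-year figure quoted above.

```python
# Back-of-the-envelope sketch of the excerpt's point; the 20-sigma figure is
# an illustrative assumption, not a number taken from the book.
from scipy.stats import norm

sigma = 20                      # assumed size of the move, in standard deviations
p_daily = norm.sf(sigma)        # probability of a one-day move this far down
trading_days_per_year = 252

expected_years_between_events = 1 / (p_daily * trading_days_per_year)
print(f"P(one-day move) ~ {p_daily:.2e}")
print(f"Expected wait   ~ {expected_years_between_events:.2e} years")
```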


Super Thinking: The Big Book of Mental Models by Gabriel Weinberg, Lauren McCann

Abraham Maslow, Abraham Wald, affirmative action, Affordable Care Act / Obamacare, Airbnb, Albert Einstein, anti-pattern, Anton Chekhov, Apollo 13, Apple Newton, autonomous vehicles, bank run, barriers to entry, Bayesian statistics, Bernie Madoff, Bernie Sanders, Black Swan, Broken windows theory, business process, butterfly effect, Cal Newport, Clayton Christensen, cognitive dissonance, commoditize, correlation does not imply causation, crowdsourcing, Daniel Kahneman / Amos Tversky, dark pattern, David Attenborough, delayed gratification, deliberate practice, discounted cash flows, disruptive innovation, Donald Trump, Douglas Hofstadter, Dunning–Kruger effect, Edward Lorenz: Chaos theory, Edward Snowden, effective altruism, Elon Musk, en.wikipedia.org, experimental subject, fake news, fear of failure, feminist movement, Filter Bubble, framing effect, friendly fire, fundamental attribution error, Goodhart's law, Gödel, Escher, Bach, heat death of the universe, hindsight bias, housing crisis, if you see hoof prints, think horses—not zebras, Ignaz Semmelweis: hand washing, illegal immigration, imposter syndrome, incognito mode, income inequality, information asymmetry, Isaac Newton, Jeff Bezos, John Nash: game theory, karōshi / gwarosa / guolaosi, lateral thinking, loss aversion, Louis Pasteur, LuLaRoe, Lyft, mail merge, Mark Zuckerberg, meta-analysis, Metcalfe’s law, Milgram experiment, minimum viable product, moral hazard, mutually assured destruction, Nash equilibrium, Network effects, nocebo, nuclear winter, offshore financial centre, p-value, Paradox of Choice, Parkinson's law, Paul Graham, peak oil, Peter Thiel, phenotype, Pierre-Simon Laplace, placebo effect, Potemkin village, power law, precautionary principle, prediction markets, premature optimization, price anchoring, principal–agent problem, publication bias, recommendation engine, remote working, replication crisis, Richard Feynman, Richard Feynman: Challenger O-ring, Richard Thaler, ride hailing / ride sharing, Robert Metcalfe, Ronald Coase, Ronald Reagan, Salesforce, school choice, Schrödinger's Cat, selection bias, Shai Danziger, side project, Silicon Valley, Silicon Valley startup, speech recognition, statistical model, Steve Jobs, Steve Wozniak, Steven Pinker, Streisand effect, sunk-cost fallacy, survivorship bias, systems thinking, The future is already here, The last Blockbuster video rental store is in Bend, Oregon, The Present Situation in Quantum Mechanics, the scientific method, The Wisdom of Crowds, Thomas Kuhn: the structure of scientific revolutions, Tragedy of the Commons, transaction costs, uber lyft, ultimatum game, uranium enrichment, urban planning, vertical integration, Vilfredo Pareto, warehouse robotics, WarGames: Global Thermonuclear War, When a measure becomes a target, wikimedia commons

The app developers think that their app can improve this rate, helping more people fall asleep in less than ten minutes. The developers plan a study in a sleep lab to test their theory. The test group will use their app and the control group will just go to sleep without it. (A real study might have a slightly more complicated design, but this simple design will let us better explain the statistical models.) The statistical setup behind most experiments (including this one) starts with a hypothesis that there is no difference between the groups, called the null hypothesis. If the developers collect sufficient evidence to reject this hypothesis, then they will conclude that their app really does help people fall asleep faster.
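A hedged sketch of the statistical setup described above, not the book's actual study: the counts below are invented, and a two-proportion z-test stands in for whatever analysis the developers would actually run.

```python
# Invented counts; null hypothesis: the app group and the control group fall
# asleep within ten minutes at the same rate.
from statsmodels.stats.proportion import proportions_ztest

app_successes, app_n = 62, 100          # invented: slept within 10 min using the app
control_successes, control_n = 48, 100  # invented: control group

stat, p_value = proportions_ztest(
    count=[app_successes, control_successes],
    nobs=[app_n, control_n],
    alternative="larger",               # app group falls asleep faster
)
print(f"z = {stat:.2f}, p = {p_value:.3f}")
# A small p-value is evidence against the null hypothesis of no difference.
```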


pages: 402 words: 110,972

Nerds on Wall Street: Math, Machines and Wired Markets by David J. Leinweber

"World Economic Forum" Davos, AI winter, Alan Greenspan, algorithmic trading, AOL-Time Warner, Apollo 11, asset allocation, banking crisis, barriers to entry, Bear Stearns, Big bang: deregulation of the City of London, Bob Litterman, book value, business cycle, butter production in bangladesh, butterfly effect, buttonwood tree, buy and hold, buy low sell high, capital asset pricing model, Charles Babbage, citizen journalism, collateralized debt obligation, Cornelius Vanderbilt, corporate governance, Craig Reynolds: boids flock, creative destruction, credit crunch, Credit Default Swap, credit default swaps / collateralized debt obligations, Danny Hillis, demand response, disintermediation, distributed generation, diversification, diversified portfolio, electricity market, Emanuel Derman, en.wikipedia.org, experimental economics, fake news, financial engineering, financial innovation, fixed income, Ford Model T, Gordon Gekko, Hans Moravec, Herman Kahn, implied volatility, index arbitrage, index fund, information retrieval, intangible asset, Internet Archive, Ivan Sutherland, Jim Simons, John Bogle, John Nash: game theory, Kenneth Arrow, load shedding, Long Term Capital Management, machine readable, machine translation, Machine translation of "The spirit is willing, but the flesh is weak." to Russian and back, market fragmentation, market microstructure, Mars Rover, Metcalfe’s law, military-industrial complex, moral hazard, mutually assured destruction, Myron Scholes, natural language processing, negative equity, Network effects, optical character recognition, paper trading, passive investing, pez dispenser, phenotype, prediction markets, proprietary trading, quantitative hedge fund, quantitative trading / quantitative finance, QWERTY keyboard, RAND corporation, random walk, Ray Kurzweil, Reminiscences of a Stock Operator, Renaissance Technologies, risk free rate, risk tolerance, risk-adjusted returns, risk/return, Robert Metcalfe, Ronald Reagan, Rubik’s Cube, Savings and loan crisis, semantic web, Sharpe ratio, short selling, short squeeze, Silicon Valley, Small Order Execution System, smart grid, smart meter, social web, South Sea Bubble, statistical arbitrage, statistical model, Steve Jobs, Steven Levy, stock buybacks, Tacoma Narrows Bridge, the scientific method, The Wisdom of Crowds, time value of money, tontine, too big to fail, transaction costs, Turing machine, two and twenty, Upton Sinclair, value at risk, value engineering, Vernor Vinge, Wayback Machine, yield curve, Yogi Berra, your tax dollars at work

Fischer Black’s Quantitative Strategies Group at Goldman Sachs were algo pioneers. They were perhaps the first to use computers for actual trading, as well as for identifying trades. The early alpha seekers were the first combatants in the algo wars. Pairs trading, popular at the time, relied on statistical models. Finding stronger short-term correlations than the next guy had big rewards. Escalation beyond pairs to groups of related securities was inevitable. Parallel developments in futures markets opened the door to electronic index arbitrage trading. Automated market making was a valuable early algorithm.
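As a rough illustration of the kind of statistical model pairs trading relied on, here is a minimal sketch that tracks the spread between two related prices and flags unusually large deviations; the price series and thresholds are synthetic, not any firm's actual model.

```python
# Synthetic illustration of a pairs-trading style model: watch the spread
# between two correlated prices and flag when it strays from its recent mean.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
common = np.cumsum(rng.normal(size=500))            # shared driver
a = 100 + common + rng.normal(scale=0.5, size=500)  # stock A
b = 100 + common + rng.normal(scale=0.5, size=500)  # stock B

spread = pd.Series(a - b)
zscore = (spread - spread.rolling(30).mean()) / spread.rolling(30).std()

# Simplified rule: when the spread is unusually wide, bet on it closing.
signal = pd.Series(0, index=spread.index)
signal[zscore > 2] = -1   # spread too wide: short A, long B
signal[zscore < -2] = 1   # spread too narrow: long A, short B
print(signal.value_counts())
```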


pages: 401 words: 112,784

Hard Times: The Divisive Toll of the Economic Slump by Tom Clark, Anthony Heath

Affordable Care Act / Obamacare, Alan Greenspan, British Empire, business cycle, Carmen Reinhart, classic study, credit crunch, Daniel Kahneman / Amos Tversky, debt deflation, deindustrialization, Etonian, eurozone crisis, falling living standards, full employment, Gini coefficient, Greenspan put, growth hacking, hedonic treadmill, hiring and firing, income inequality, interest rate swap, invisible hand, It's morning again in America, John Maynard Keynes: Economic Possibilities for our Grandchildren, Kenneth Rogoff, labour market flexibility, low interest rates, low skilled workers, MITM: man-in-the-middle, mortgage debt, new economy, Northern Rock, obamacare, oil shock, plutocrats, price stability, quantitative easing, Right to Buy, Ronald Reagan, science of happiness, statistical model, The Wealth of Nations by Adam Smith, unconventional monetary instruments, War on Poverty, We are the 99%, women in the workforce, working poor

There is also steady downward-shifting between the categories in the frequency with which people claim to help. Analysis of CPS data presented at the SCHMI seminar in Sarasota, Florida, March 2012, by James Laurence and Chaeyoon Lim. 37. Data presented at the SCHMI seminar in Sarasota, Florida, March 2012, by James Laurence and Chaeyoon Lim. 38. All the statistical models – the results of which are reported in Table I of Lim and Laurence, ‘Doing good when times are bad’ – adjust for personal characteristics, including employment status, and yet the significant decline in volunteering remains. Factoring household income into the modelling, the authors report, yields results that are ‘almost identical’. 39.


pages: 372 words: 116,005

The Secret Barrister: Stories of the Law and How It's Broken by Secret Barrister

cognitive bias, Donald Trump, G4S, glass ceiling, haute cuisine, Intergovernmental Panel on Climate Change (IPCC), mandatory minimum, post-truth, race to the bottom, Schrödinger's Cat, statistical model

The analysis used seventeen broad ‘offence groups’, and compared defendants from different ethnic backgrounds within these groups. The groups each comprised a wide range of offences; for example ‘violence against the person’ included crimes ranging from common assault to murder, and drug offence categories did not distinguish between Class A and Class B offences, or between possession and supply. Furthermore, the statistical modelling did not take into account aggravating and mitigating features of the offences. Further analysis is therefore required into sentencing of specific offences, including aggravating and mitigating factors, before any meaningful comparisons might be drawn. 10. See, for example, M R Banaji and A G Greenwald, Blind Spot: Hidden Biases of Good People, Delacorte Press, 2013.


Financial Statement Analysis: A Practitioner's Guide by Martin S. Fridson, Fernando Alvarez

Bear Stearns, book value, business cycle, corporate governance, credit crunch, discounted cash flows, diversification, Donald Trump, double entry bookkeeping, Elon Musk, financial engineering, fixed income, information trail, intangible asset, interest rate derivative, interest rate swap, junk bonds, negative equity, new economy, offshore financial centre, postindustrial economy, profit maximization, profit motive, Richard Thaler, shareholder value, speech recognition, statistical model, stock buybacks, the long tail, time value of money, transaction costs, Y2K, zero-coupon bond

There is ample evidence, as well, of inefficiency in many large, bureaucratic organizations. The point, however, is not to debate whether big corporations are invincible or nimble, but to determine whether they meet their obligations with greater regularity, on average, than their pint-size peers. Statistical models of default risk confirm that they do. Therefore, the bond-rating agencies are following sound methodology when they create size-based peer groups. Line of business is another basis for defining a peer group. Because different industries have different financial characteristics, ratio comparisons across industry lines may not be valid.


Succeeding With AI: How to Make AI Work for Your Business by Veljko Krunic

AI winter, Albert Einstein, algorithmic trading, AlphaGo, Amazon Web Services, anti-fragile, anti-pattern, artificial general intelligence, autonomous vehicles, Bayesian statistics, bioinformatics, Black Swan, Boeing 737 MAX, business process, cloud computing, commoditize, computer vision, correlation coefficient, data is the new oil, data science, deep learning, DeepMind, en.wikipedia.org, fail fast, Gini coefficient, high net worth, information retrieval, Internet of things, iterative process, job automation, Lean Startup, license plate recognition, minimum viable product, natural language processing, recommendation engine, self-driving car, sentiment analysis, Silicon Valley, six sigma, smart cities, speech recognition, statistical model, strong AI, tail risk, The Design of Experiments, the scientific method, web application, zero-sum game

PID compares errors between current values and a desired value of some process variable for the system under control and applies the correction to that process variable based on proportional, integral, and derivative terms. PID controllers are widely used in various control systems. Quantitative analysis (QA)—According to Will Kenton [187]: Quantitative analysis (QA) is a technique that seeks to understand behavior by using mathematical and statistical modeling, measurement, and research. Quantitative analysts aim to represent a given reality in terms of a numerical value. Quantitative analyst (quant)—A practitioner of quantitative analysis [187]. Common business verticals in which quants work are trading and other financial services.
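The glossary entry above describes how a PID controller works; a minimal sketch under that same definition follows. The gains and the toy process dynamics are invented for illustration.

```python
# Minimal sketch matching the glossary description: a correction built from
# proportional, integral, and derivative terms of the error between the
# measured value and the setpoint. Gains and the toy "plant" are invented.
class PID:
    def __init__(self, kp: float, ki: float, kd: float, setpoint: float):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.setpoint = setpoint
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, measurement: float, dt: float) -> float:
        error = self.setpoint - measurement
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Toy first-order process: the variable drifts toward the applied correction.
pid = PID(kp=1.2, ki=0.3, kd=0.05, setpoint=50.0)
value = 20.0
for _ in range(400):
    control = pid.update(value, dt=0.1)
    value += 0.1 * (control - 0.5 * value)   # invented plant dynamics
print(round(value, 1))                        # converges toward the 50.0 setpoint
```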


Human Frontiers: The Future of Big Ideas in an Age of Small Thinking by Michael Bhaskar

"Margaret Hamilton" Apollo, 3D printing, additive manufacturing, AI winter, Albert Einstein, algorithmic trading, AlphaGo, Anthropocene, artificial general intelligence, augmented reality, autonomous vehicles, backpropagation, barriers to entry, basic income, behavioural economics, Benoit Mandelbrot, Berlin Wall, Big bang: deregulation of the City of London, Big Tech, Bletchley Park, blockchain, Boeing 747, brain emulation, Brexit referendum, call centre, carbon tax, charter city, citizen journalism, Claude Shannon: information theory, Clayton Christensen, clean tech, clean water, cognitive load, Columbian Exchange, coronavirus, cosmic microwave background, COVID-19, creative destruction, CRISPR, crony capitalism, cyber-physical system, dark matter, David Graeber, deep learning, DeepMind, deindustrialization, dematerialisation, Demis Hassabis, demographic dividend, Deng Xiaoping, deplatforming, discovery of penicillin, disruptive innovation, Donald Trump, double entry bookkeeping, Easter island, Edward Jenner, Edward Lorenz: Chaos theory, Elon Musk, en.wikipedia.org, endogenous growth, energy security, energy transition, epigenetics, Eratosthenes, Ernest Rutherford, Eroom's law, fail fast, false flag, Fellow of the Royal Society, flying shuttle, Ford Model T, Francis Fukuyama: the end of history, general purpose technology, germ theory of disease, glass ceiling, global pandemic, Goodhart's law, Google Glasses, Google X / Alphabet X, GPT-3, Haber-Bosch Process, hedonic treadmill, Herman Kahn, Higgs boson, hive mind, hype cycle, Hyperloop, Ignaz Semmelweis: hand washing, Innovator's Dilemma, intangible asset, interchangeable parts, Internet of things, invention of agriculture, invention of the printing press, invention of the steam engine, invention of the telegraph, invisible hand, Isaac Newton, ITER tokamak, James Watt: steam engine, James Webb Space Telescope, Jeff Bezos, jimmy wales, job automation, Johannes Kepler, John von Neumann, Joseph Schumpeter, Kenneth Arrow, Kevin Kelly, Kickstarter, knowledge economy, knowledge worker, Large Hadron Collider, liberation theology, lockdown, lone genius, loss aversion, Louis Pasteur, Mark Zuckerberg, Martin Wolf, megacity, megastructure, Menlo Park, Minecraft, minimum viable product, mittelstand, Modern Monetary Theory, Mont Pelerin Society, Murray Gell-Mann, Mustafa Suleyman, natural language processing, Neal Stephenson, nuclear winter, nudge unit, oil shale / tar sands, open economy, OpenAI, opioid epidemic / opioid crisis, PageRank, patent troll, Peter Thiel, plutocrats, post scarcity, post-truth, precautionary principle, public intellectual, publish or perish, purchasing power parity, quantum entanglement, Ray Kurzweil, remote working, rent-seeking, Republic of Letters, Richard Feynman, Robert Gordon, Robert Solow, secular stagnation, shareholder value, Silicon Valley, Silicon Valley ideology, Simon Kuznets, skunkworks, Slavoj Žižek, sovereign wealth fund, spinning jenny, statistical model, stem cell, Steve Jobs, Stuart Kauffman, synthetic biology, techlash, TED Talk, The Rise and Fall of American Growth, the scientific method, The Wealth of Nations by Adam Smith, Thomas Bayes, Thomas Kuhn: the structure of scientific revolutions, Thomas Malthus, TikTok, total factor productivity, transcontinental railway, Two Sigma, Tyler Cowen, Tyler Cowen: Great Stagnation, universal basic income, uranium enrichment, We wanted flying cars, instead we got 140 characters, When a measure becomes a target, X Prize, Y Combinator

Later his three-colour principle was vital for the invention of colour television. He made vaulting gains in the understanding of Saturn's rings, then one of the most intractable problems in planetary physics. Before moving to electromagnetism, Maxwell had theorised the radical idea of a field. His understanding of gases led towards the use in science of statistical models, a mathematical advance that paved the way for modern physics. Maxwell is pivotal here: after him, physics grew ever more abstract, conceptually reliant on the most sophisticated mathematical techniques. Maxwell understood that while some processes were inaccessible to direct human perception, statistical virtuosity could bridge the gap.


pages: 472 words: 117,093

Machine, Platform, Crowd: Harnessing Our Digital Future by Andrew McAfee, Erik Brynjolfsson

"World Economic Forum" Davos, 3D printing, additive manufacturing, AI winter, Airbnb, airline deregulation, airport security, Albert Einstein, algorithmic bias, AlphaGo, Amazon Mechanical Turk, Amazon Web Services, Andy Rubin, AOL-Time Warner, artificial general intelligence, asset light, augmented reality, autism spectrum disorder, autonomous vehicles, backpropagation, backtesting, barriers to entry, behavioural economics, bitcoin, blockchain, blood diamond, British Empire, business cycle, business process, carbon footprint, Cass Sunstein, centralized clearinghouse, Chris Urmson, cloud computing, cognitive bias, commoditize, complexity theory, computer age, creative destruction, CRISPR, crony capitalism, crowdsourcing, cryptocurrency, Daniel Kahneman / Amos Tversky, data science, Dean Kamen, deep learning, DeepMind, Demis Hassabis, discovery of DNA, disintermediation, disruptive innovation, distributed ledger, double helix, driverless car, Elon Musk, en.wikipedia.org, Erik Brynjolfsson, Ethereum, ethereum blockchain, everywhere but in the productivity statistics, Evgeny Morozov, fake news, family office, fiat currency, financial innovation, general purpose technology, Geoffrey Hinton, George Akerlof, global supply chain, Great Leap Forward, Gregor Mendel, Hernando de Soto, hive mind, independent contractor, information asymmetry, Internet of things, inventory management, iterative process, Jean Tirole, Jeff Bezos, Jim Simons, jimmy wales, John Markoff, joint-stock company, Joseph Schumpeter, Kickstarter, Kiva Systems, law of one price, longitudinal study, low interest rates, Lyft, Machine translation of "The spirit is willing, but the flesh is weak." to Russian and back, Marc Andreessen, Marc Benioff, Mark Zuckerberg, meta-analysis, Mitch Kapor, moral hazard, multi-sided market, Mustafa Suleyman, Myron Scholes, natural language processing, Network effects, new economy, Norbert Wiener, Oculus Rift, PageRank, pattern recognition, peer-to-peer lending, performance metric, plutocrats, precision agriculture, prediction markets, pre–internet, price stability, principal–agent problem, Project Xanadu, radical decentralization, Ray Kurzweil, Renaissance Technologies, Richard Stallman, ride hailing / ride sharing, risk tolerance, Robert Solow, Ronald Coase, Salesforce, Satoshi Nakamoto, Second Machine Age, self-driving car, sharing economy, Silicon Valley, Skype, slashdot, smart contracts, Snapchat, speech recognition, statistical model, Steve Ballmer, Steve Jobs, Steven Pinker, supply-chain management, synthetic biology, tacit knowledge, TaskRabbit, Ted Nelson, TED Talk, the Cathedral and the Bazaar, The Market for Lemons, The Nature of the Firm, the strength of weak ties, Thomas Davenport, Thomas L Friedman, too big to fail, transaction costs, transportation-network company, traveling salesman, Travis Kalanick, Two Sigma, two-sided market, Tyler Cowen, Uber and Lyft, Uber for X, uber lyft, ubercab, Vitalik Buterin, warehouse robotics, Watson beat the top human players on Jeopardy!, winner-take-all economy, yield management, zero day

In the 1980s, I judged fully automated recognition of connected speech (listening to connected conversational speech and writing down accurately what was said) to be too difficult for machines. . . . The speech engineers have accomplished it without even relying on any syntactic analysis: pure engineering, aided by statistical modeling based on gigantic amounts of raw data. . . . I not only didn’t think I would see this come about, I would have confidently bet against it.” A remark attributed to the legendary computer scientist Frederick Jelinek captures the reason behind the broad transition within the artificial intelligence community from rule-based to statistical approaches.


pages: 410 words: 119,823

Radical Technologies: The Design of Everyday Life by Adam Greenfield

3D printing, Airbnb, algorithmic bias, algorithmic management, AlphaGo, augmented reality, autonomous vehicles, bank run, barriers to entry, basic income, bitcoin, Black Lives Matter, blockchain, Boston Dynamics, business intelligence, business process, Californian Ideology, call centre, cellular automata, centralized clearinghouse, centre right, Chuck Templeton: OpenTable:, circular economy, cloud computing, Cody Wilson, collective bargaining, combinatorial explosion, Computer Numeric Control, computer vision, Conway's Game of Life, CRISPR, cryptocurrency, David Graeber, deep learning, DeepMind, dematerialisation, digital map, disruptive innovation, distributed ledger, driverless car, drone strike, Elon Musk, Ethereum, ethereum blockchain, facts on the ground, fiat currency, fulfillment center, gentrification, global supply chain, global village, Goodhart's law, Google Glasses, Herman Kahn, Ian Bogost, IBM and the Holocaust, industrial robot, informal economy, information retrieval, Internet of things, Jacob Silverman, James Watt: steam engine, Jane Jacobs, Jeff Bezos, Jeff Hawkins, job automation, jobs below the API, John Conway, John Markoff, John Maynard Keynes: Economic Possibilities for our Grandchildren, John Maynard Keynes: technological unemployment, John Perry Barlow, John von Neumann, joint-stock company, Kevin Kelly, Kickstarter, Kiva Systems, late capitalism, Leo Hollis, license plate recognition, lifelogging, M-Pesa, Mark Zuckerberg, means of production, megacity, megastructure, minimum viable product, money: store of value / unit of account / medium of exchange, natural language processing, Network effects, New Urbanism, Nick Bostrom, Occupy movement, Oculus Rift, off-the-grid, PalmPilot, Pareto efficiency, pattern recognition, Pearl River Delta, performance metric, Peter Eisenman, Peter Thiel, planetary scale, Ponzi scheme, post scarcity, post-work, printed gun, proprietary trading, RAND corporation, recommendation engine, RFID, rolodex, Rutger Bregman, Satoshi Nakamoto, self-driving car, sentiment analysis, shareholder value, sharing economy, Shenzhen special economic zone , Sidewalk Labs, Silicon Valley, smart cities, smart contracts, social intelligence, sorting algorithm, special economic zone, speech recognition, stakhanovite, statistical model, stem cell, technoutopianism, Tesla Model S, the built environment, The Death and Life of Great American Cities, The Future of Employment, Tony Fadell, transaction costs, Uber for X, undersea cable, universal basic income, urban planning, urban sprawl, vertical integration, Vitalik Buterin, warehouse robotics, When a measure becomes a target, Whole Earth Review, WikiLeaks, women in the workforce

This conceit helps us see that while our ability to act is invariably constrained by history, existing structures of power, and the operations of chance, we nevertheless have a degree of choice as to the kind of world we wish to bring into being. As it was originally developed by Royal Dutch Shell’s Long-Term Studies group,24 scenario planning emphasized quantification, and the creation of detailed statistical models. The scenarios that follow aren’t nearly as rigorous as all that. They are by no means a comprehensive survey of the possible futures available to us, nor is there anything particularly systematic about the way I’ve presented them. They are simply suggestive of the various choices we might plausibly make.


When Computers Can Think: The Artificial Intelligence Singularity by Anthony Berglas, William Black, Samantha Thalind, Max Scratchmann, Michelle Estes

3D printing, Abraham Maslow, AI winter, air gap, anthropic principle, artificial general intelligence, Asilomar, augmented reality, Automated Insights, autonomous vehicles, availability heuristic, backpropagation, blue-collar work, Boston Dynamics, brain emulation, call centre, cognitive bias, combinatorial explosion, computer vision, Computing Machinery and Intelligence, create, read, update, delete, cuban missile crisis, David Attenborough, DeepMind, disinformation, driverless car, Elon Musk, en.wikipedia.org, epigenetics, Ernest Rutherford, factory automation, feminist movement, finite state, Flynn Effect, friendly AI, general-purpose programming language, Google Glasses, Google X / Alphabet X, Gödel, Escher, Bach, Hans Moravec, industrial robot, Isaac Newton, job automation, John von Neumann, Law of Accelerating Returns, license plate recognition, Mahatma Gandhi, mandelbrot fractal, natural language processing, Nick Bostrom, Parkinson's law, patent troll, patient HM, pattern recognition, phenotype, ransomware, Ray Kurzweil, Recombinant DNA, self-driving car, semantic web, Silicon Valley, Singularitarianism, Skype, sorting algorithm, speech recognition, statistical model, stem cell, Stephen Hawking, Stuxnet, superintelligent machines, technological singularity, Thomas Malthus, Turing machine, Turing test, uranium enrichment, Von Neumann architecture, Watson beat the top human players on Jeopardy!, wikimedia commons, zero day

And like every other formalism, decision table conditions can be learnt from experience using various algorithms.

Regression: linear and exponential regression. Statisticians have used regression methods since the nineteenth century to fit a function to a set of data points. In the chart above, Excel was used to automatically fit two statistical models to the data represented by the red dots. The first is a simple straight line, while the second is a curved exponential function. In both cases the 14 data points are modelled by just two numbers that are shown on the chart. The R² value shows the sum-of-squares correlation between the models and the data, and shows that the exponential model is a better fit.
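To make the excerpt concrete, here is a minimal sketch of the same exercise in Python rather than Excel: fit a straight line and an exponential curve to a small set of points and compare their R² values. The 14 data points are invented, since the book's chart is not reproduced here.

```python
import numpy as np

# Hypothetical data standing in for the 14 red dots in the book's chart
# (the actual values are not given in the excerpt).
x = np.arange(1, 15, dtype=float)
y = np.array([2.1, 2.9, 4.2, 5.1, 6.8, 8.9, 11.2,
              14.1, 18.3, 22.9, 29.1, 36.8, 46.2, 58.4])

def r_squared(y, y_hat):
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1 - ss_res / ss_tot

# Linear model: y ~ a*x + b (two fitted numbers, as in the book's chart)
a, b = np.polyfit(x, y, 1)
linear_pred = a * x + b

# Exponential model: y ~ c*exp(k*x), fitted by linear regression on log(y)
k, log_c = np.polyfit(x, np.log(y), 1)
exp_pred = np.exp(log_c) * np.exp(k * x)

# With these made-up points the exponential curve fits better, echoing the excerpt.
print(f"linear:      y = {a:.2f}x + {b:.2f},  R^2 = {r_squared(y, linear_pred):.4f}")
print(f"exponential: y = {np.exp(log_c):.2f}e^({k:.3f}x),  R^2 = {r_squared(y, exp_pred):.4f}")
```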


pages: 387 words: 119,409

Work Rules!: Insights From Inside Google That Will Transform How You Live and Lead by Laszlo Bock

Abraham Maslow, Abraham Wald, Airbnb, Albert Einstein, AltaVista, Atul Gawande, behavioural economics, Black Swan, book scanning, Burning Man, call centre, Cass Sunstein, Checklist Manifesto, choice architecture, citizen journalism, clean water, cognitive load, company town, correlation coefficient, crowdsourcing, Daniel Kahneman / Amos Tversky, deliberate practice, en.wikipedia.org, experimental subject, Fairchild Semiconductor, Frederick Winslow Taylor, future of work, Google Earth, Google Glasses, Google Hangouts, Google X / Alphabet X, Googley, helicopter parent, immigration reform, Internet Archive, Kevin Roose, longitudinal study, Menlo Park, mental accounting, meta-analysis, Moneyball by Michael Lewis explains big data, nudge unit, PageRank, Paul Buchheit, power law, Ralph Waldo Emerson, Rana Plaza, random walk, Richard Thaler, Rubik’s Cube, self-driving car, shareholder value, Sheryl Sandberg, side project, Silicon Valley, six sigma, statistical model, Steve Ballmer, Steve Jobs, Steven Levy, Steven Pinker, survivorship bias, Susan Wojcicki, TaskRabbit, The Wisdom of Crowds, Tony Hsieh, Turing machine, Wayback Machine, winner-take-all economy, Y2K

It’s possible to measure the change in patient outcomes after a physician learns a new technique by recording recovery times, incidence of complications, and degree of vision improvement. It’s much more difficult to measure the impact of training on less structured jobs or more general skills. You can develop fantastically sophisticated statistical models to draw connections between training and outcomes, and at Google we often do. In fact, we often have to, just because our engineers won’t believe us otherwise! But for most organizations, there’s a shortcut. Skip the graduate-school math and just compare how identical groups perform after only one has received training.
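As a rough sketch of the shortcut the passage recommends, the snippet below compares two otherwise-identical groups, only one of which was trained, using a simple difference in means and a quick significance check; the scores and group sizes are invented for illustration.

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(7)

# Hypothetical post-training performance scores for two matched groups
# (numbers are made up; only one group received the training).
trained   = rng.normal(74, 10, 50)
untrained = rng.normal(70, 10, 50)

diff = trained.mean() - untrained.mean()
t_stat, p_value = ttest_ind(trained, untrained)

print(f"difference in means: {diff:.1f} points (t = {t_stat:.2f}, p = {p_value:.3f})")
```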


pages: 461 words: 125,845

This Machine Kills Secrets: Julian Assange, the Cypherpunks, and Their Fight to Empower Whistleblowers by Andy Greenberg

air gap, Apple II, Ayatollah Khomeini, Berlin Wall, Bill Gates: Altair 8800, Bletchley Park, Burning Man, Chelsea Manning, computerized markets, crowdsourcing, cryptocurrency, disinformation, domain-specific language, driverless car, drone strike, en.wikipedia.org, Evgeny Morozov, Fairchild Semiconductor, fault tolerance, hive mind, information security, Jacob Appelbaum, John Gilmore, John Perry Barlow, Julian Assange, Lewis Mumford, Mahatma Gandhi, military-industrial complex, Mitch Kapor, MITM: man-in-the-middle, Mohammed Bouazizi, Mondo 2000, Neal Stephenson, nuclear winter, offshore financial centre, operational security, PalmPilot, pattern recognition, profit motive, Ralph Nader, real-name policy, reality distortion field, Richard Stallman, Robert Hanssen: Double agent, Silicon Valley, Silicon Valley ideology, Skype, social graph, SQL injection, statistical model, stem cell, Steve Jobs, Steve Wozniak, Steven Levy, Teledyne, three-masted sailing ship, undersea cable, Vernor Vinge, We are Anonymous. We are Legion, We are the 99%, WikiLeaks, X Prize, Zimmermann PGP

So they’re planning on eventually integrating their submissions page directly into the home pages themselves, a trick that requires coaching their media partners on how to excise security bugs from the most complex portion of their sites. Once they have what the OpenLeaks engineer calls that “armored car” version of the partner sites set up, they plan to go even further than WikiLeaks, building more convincing cover traffic than has ever existed before, this unnamed engineer tells me. They’ve statistically modeled the timing and file size of uploads to WikiLeaks and have used it to spoof those submissions with high statistical accuracy. Most submissions to WikiLeaks were between 1.5 and 2 megabytes, for instance. Less than one percent are above 700 megabytes. Their cover traffic aims to follow exactly the same bell curve, making it theoretically indistinguishable from real submissions under the cover of SSL encryption, even when the user isn’t running Tor.
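A rough sketch of the kind of modeling described: fit a heavy-tailed distribution to the two reported facts (most uploads between 1.5 and 2 MB, under 1 percent above 700 MB) and sample fake submission sizes from it. The log-normal choice and its calibration are assumptions for illustration, not OpenLeaks's actual method.

```python
import numpy as np

rng = np.random.default_rng(42)

# Pin the median near the reported 1.5-2 MB range and the 99th percentile
# near 700 MB; a crude stand-in for whatever distribution was actually fitted.
median_mb = 1.75
p99_mb = 700
sigma = np.log(p99_mb / median_mb) / 2.326   # 2.326 is roughly the 99th-percentile z-score
mu = np.log(median_mb)

# Sample fake "submission" sizes to use as cover traffic.
cover_sizes_mb = rng.lognormal(mean=mu, sigma=sigma, size=10)
print(np.round(cover_sizes_mb, 2))
```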


pages: 755 words: 121,290

Statistics hacks by Bruce Frey

Bayesian statistics, Berlin Wall, correlation coefficient, Daniel Kahneman / Amos Tversky, distributed generation, en.wikipedia.org, feminist movement, G4S, game design, Hacker Ethic, index card, Linda problem, Milgram experiment, Monty Hall problem, p-value, place-making, reshoring, RFID, Search for Extraterrestrial Intelligence, SETI@home, Silicon Valley, statistical model, sugar pill, systematic bias, Thomas Bayes

Contributors The following people contributed their hacks, writing, and inspiration to this book: Joseph Adler is the author of Baseball Hacks (O'Reilly), and a researcher in the Advanced Product Development Group at VeriSign, focusing on problems in user authentication, managed security services, and RFID security. Joe has years of experience analyzing data, building statistical models, and formulating business strategies as an employee and consultant for companies including DoubleClick, American Express, and Dun & Bradstreet. He is a graduate of the Massachusetts Institute of Technology with an Sc.B. and an M.Eng. in computer science and computer engineering. Joe is an unapologetic Yankees fan, but he appreciates any good baseball game.


pages: 421 words: 125,417

Common Wealth: Economics for a Crowded Planet by Jeffrey Sachs

agricultural Revolution, air freight, Anthropocene, back-to-the-land, biodiversity loss, British Empire, business process, carbon credits, carbon footprint, carbon tax, clean water, colonial rule, corporate social responsibility, correlation does not imply causation, creative destruction, demographic transition, Diane Coyle, digital divide, Edward Glaeser, energy security, failed state, Garrett Hardin, Gini coefficient, global pandemic, Global Witness, Haber-Bosch Process, impact investing, income inequality, income per capita, Intergovernmental Panel on Climate Change (IPCC), intermodal, invention of agriculture, invention of the steam engine, invisible hand, Joseph Schumpeter, knowledge worker, labor-force participation, low skilled workers, mass immigration, microcredit, ocean acidification, oil shale / tar sands, old age dependency ratio, peak oil, profit maximization, profit motive, purchasing power parity, road to serfdom, Ronald Reagan, Simon Kuznets, Skype, statistical model, The Wealth of Nations by Adam Smith, Thomas Malthus, trade route, Tragedy of the Commons, transaction costs, unemployed young men, War on Poverty, women in the workforce, working-age population, zoonotic diseases

One test of this is the cross-country evidence on economic growth. We can examine whether countries with high fertility rates indeed have lower growth rates of income per person. The standard tests have been carried out by the leaders of empirical growth modeling, Robert Barro and Xavier Sala-i-Martin. Their statistical model accounts for each country’s average annual growth rate of income per person according to various characteristics of the country, including the level of income per person, the average educational attainment, the life expectancy, an indicator of the “rule of law,” and other variables, including the total fertility rate.
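As a hedged illustration of what such a cross-country growth regression looks like, the sketch below fits ordinary least squares to synthetic data with the kinds of regressors the passage lists; the numbers are invented and the specification is only loosely modeled on Barro and Sala-i-Martin's work, not taken from it.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 120  # hypothetical number of countries

# Synthetic country characteristics (purely illustrative).
log_income  = rng.normal(9.0, 1.0, n)    # log income per person
schooling   = rng.normal(7.0, 2.5, n)    # average years of education
life_expect = rng.normal(65, 8, n)
rule_of_law = rng.uniform(0, 1, n)
fertility   = rng.normal(3.5, 1.5, n)    # total fertility rate

# "True" relationship used to generate the fake outcome, including a
# negative fertility effect and a noise term.
growth = (6.0 - 0.4 * log_income + 0.10 * schooling + 0.03 * life_expect
          + 1.0 * rule_of_law - 0.30 * fertility + rng.normal(0, 1.0, n))

X = sm.add_constant(np.column_stack(
    [log_income, schooling, life_expect, rule_of_law, fertility]))
model = sm.OLS(growth, X).fit()
print(model.params)   # the last coefficient recovers the fertility effect
```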


pages: 561 words: 120,899

The Theory That Would Not Die: How Bayes' Rule Cracked the Enigma Code, Hunted Down Russian Submarines, and Emerged Triumphant From Two Centuries of Controversy by Sharon Bertsch McGrayne

Abraham Wald, Alan Greenspan, Bayesian statistics, bioinformatics, Bletchley Park, British Empire, classic study, Claude Shannon: information theory, Daniel Kahneman / Amos Tversky, data science, double helix, Dr. Strangelove, driverless car, Edmond Halley, Fellow of the Royal Society, full text search, government statistician, Henri Poincaré, Higgs boson, industrial research laboratory, Isaac Newton, Johannes Kepler, John Markoff, John Nash: game theory, John von Neumann, linear programming, longitudinal study, machine readable, machine translation, meta-analysis, Nate Silver, p-value, Pierre-Simon Laplace, placebo effect, prediction markets, RAND corporation, recommendation engine, Renaissance Technologies, Richard Feynman, Richard Feynman: Challenger O-ring, Robert Mercer, Ronald Reagan, seminal paper, speech recognition, statistical model, stochastic process, Suez canal 1869, Teledyne, the long tail, Thomas Bayes, Thomas Kuhn: the structure of scientific revolutions, traveling salesman, Turing machine, Turing test, uranium enrichment, We are all Keynesians now, Yom Kippur War

Cochran WG, Mosteller F, Tukey JW. (1954) Statistical Problems of the Kinsey Report on Sexual Behavior in the Human Male. American Statistical Association. Converse, Jean M. (1987) Survey Research in the United States: Roots and Emergence 1890–1960. University of California Press. Fienberg SE, Hoaglin DC, eds. (2006) Selected Papers of Frederick Mosteller. Springer. Fienberg SE et al., eds. (1990) A Statistical Model: Frederick Mosteller’s Contributions to Statistics, Science and Public Policy. Springer-Verlag. Hedley-Whyte J. (2007) Frederick Mosteller (1916–2006): Mentoring, A Memoir. International Journal of Technology Assessment in Health Care (23) 152–54. Ingelfinger, Joseph, et al. (1987) Biostatistics in Clinical Medicine.


pages: 388 words: 125,472

The Establishment: And How They Get Away With It by Owen Jones

anti-communist, Asian financial crisis, autism spectrum disorder, bank run, battle of ideas, Big bang: deregulation of the City of London, bonus culture, Boris Johnson, Bretton Woods, British Empire, call centre, capital controls, Capital in the Twenty-First Century by Thomas Piketty, centre right, citizen journalism, collapse of Lehman Brothers, collective bargaining, disinformation, don't be evil, Edward Snowden, Etonian, eurozone crisis, falling living standards, Francis Fukuyama: the end of history, full employment, G4S, glass ceiling, hiring and firing, housing crisis, inflation targeting, Intergovernmental Panel on Climate Change (IPCC), investor state dispute settlement, James Dyson, Jon Ronson, laissez-faire capitalism, land bank, light touch regulation, low interest rates, market fundamentalism, mass immigration, Monroe Doctrine, Mont Pelerin Society, moral hazard, Neil Kinnock, night-watchman state, Nixon triggered the end of the Bretton Woods system, Northern Rock, Occupy movement, offshore financial centre, old-boy network, open borders, Overton Window, plutocrats, popular capitalism, post-war consensus, profit motive, quantitative easing, race to the bottom, rent control, road to serfdom, Ronald Reagan, shareholder value, short selling, sovereign wealth fund, stakhanovite, statistical model, subprime mortgage crisis, Suez crisis 1956, The Wealth of Nations by Adam Smith, transfer pricing, Tyler Cowen, union organizing, unpaid internship, Washington Consensus, We are all Keynesians now, wealth creators, Winter of Discontent

Rather than flogging off the banks that were bailed out by the taxpayer, government could turn these institutions into publicly owned regional investment banks, helping to rebuild local economies across Britain. They would have specific mandates, such as supporting small businesses currently being starved of loans, as well as helping to reshape the economy and encouraging the new industrial strategy. Again, this does not mean entirely replicating a top-down statist model. British taxpayers bailed out the banks. The old American revolutionary slogan was ‘no taxation without representation’, and the same principle should apply to finance. We, the taxpayers, should have democratic representation on the boards of the banks we have saved, helping to ensure that these same banks are responsive to the needs of consumers and communities.


pages: 401 words: 119,488

Smarter Faster Better: The Secrets of Being Productive in Life and Business by Charles Duhigg

Air France Flight 447, Asperger Syndrome, Atul Gawande, behavioural economics, Black Swan, cognitive dissonance, Daniel Kahneman / Amos Tversky, data science, David Brooks, digital map, epigenetics, Erik Brynjolfsson, framing effect, high-speed rail, hiring and firing, index card, John von Neumann, knowledge worker, Lean Startup, Malcom McLean invented shipping containers, meta-analysis, new economy, power law, Saturday Night Live, Silicon Valley, Silicon Valley startup, statistical model, Steve Jobs, the scientific method, the strength of weak ties, theory of mind, Toyota Production System, William Langewiesche, Yom Kippur War

equally successful group In comments sent in response to fact-checking questions, a Google spokeswoman wrote: “We wanted to test many group norms that we thought might be important. But at the testing phase we didn’t know that the how was going to be more important than the who. When we started running the statistical models, it became clear that not only were the norms more important in our models but that 5 themes stood out from the rest.” Boston hospitals Amy C. Edmondson, “Learning from Mistakes Is Easier Said than Done: Group and Organizational Influences on the Detection and Correction of Human Error,” The Journal of Applied Behavioral Science 32, no. 1 (1996): 5–28; Druskat and Wolff, “Group Emotional Intelligence,” 132–55; David W.


pages: 497 words: 123,718

A Game as Old as Empire: The Secret World of Economic Hit Men and the Web of Global Corruption by Steven Hiatt; John Perkins

"RICO laws" OR "Racketeer Influenced and Corrupt Organizations", "World Economic Forum" Davos, accelerated depreciation, addicted to oil, airline deregulation, Andrei Shleifer, Asian financial crisis, Berlin Wall, big-box store, Bob Geldof, book value, Bretton Woods, British Empire, capital controls, centre right, clean water, colonial rule, corporate governance, corporate personhood, deglobalization, deindustrialization, disinformation, Doha Development Round, energy security, European colonialism, export processing zone, financial deregulation, financial independence, full employment, global village, high net worth, land bank, land reform, large denomination, liberal capitalism, Long Term Capital Management, Mexican peso crisis / tequila crisis, Mikhail Gorbachev, military-industrial complex, moral hazard, Naomi Klein, new economy, North Sea oil, offshore financial centre, oil shock, Ponzi scheme, race to the bottom, reserve currency, Ronald Reagan, Scramble for Africa, Seymour Hersh, statistical model, structural adjustment programs, Suez crisis 1956, Tax Reform Act of 1986, too big to fail, trade liberalization, transatlantic slave trade, transfer pricing, union organizing, Washington Consensus, working-age population, Yom Kippur War

In his book Globalization and Its Discontents, Stiglitz writes: To make its [the IMF’s] programs seem to work, to make the numbers “add up,” economic forecasts have to be adjusted. Many users of these numbers do not realize that they are not like ordinary forecasts; in these instances GDP forecasts are not based on a sophisticated statistical model, or even on the best estimates of those who know the economy well, but are merely the numbers that have been negotiated as part of an IMF program. …1 Globalization, as it has been advocated, often seems to replace the old dictatorships of national elites with new dictatorships of international finance….


pages: 320 words: 87,853

The Black Box Society: The Secret Algorithms That Control Money and Information by Frank Pasquale

Adam Curtis, Affordable Care Act / Obamacare, Alan Greenspan, algorithmic trading, Amazon Mechanical Turk, American Legislative Exchange Council, asset-backed security, Atul Gawande, bank run, barriers to entry, basic income, Bear Stearns, Berlin Wall, Bernie Madoff, Black Swan, bonus culture, Brian Krebs, business cycle, business logic, call centre, Capital in the Twenty-First Century by Thomas Piketty, Chelsea Manning, Chuck Templeton: OpenTable:, cloud computing, collateralized debt obligation, computerized markets, corporate governance, Credit Default Swap, credit default swaps / collateralized debt obligations, crowdsourcing, cryptocurrency, data science, Debian, digital rights, don't be evil, drone strike, Edward Snowden, en.wikipedia.org, Evgeny Morozov, Fall of the Berlin Wall, Filter Bubble, financial engineering, financial innovation, financial thriller, fixed income, Flash crash, folksonomy, full employment, Gabriella Coleman, Goldman Sachs: Vampire Squid, Google Earth, Hernando de Soto, High speed trading, hiring and firing, housing crisis, Ian Bogost, informal economy, information asymmetry, information retrieval, information security, interest rate swap, Internet of things, invisible hand, Jaron Lanier, Jeff Bezos, job automation, John Bogle, Julian Assange, Kevin Kelly, Kevin Roose, knowledge worker, Kodak vs Instagram, kremlinology, late fees, London Interbank Offered Rate, London Whale, machine readable, Marc Andreessen, Mark Zuckerberg, Michael Milken, mobile money, moral hazard, new economy, Nicholas Carr, offshore financial centre, PageRank, pattern recognition, Philip Mirowski, precariat, profit maximization, profit motive, public intellectual, quantitative easing, race to the bottom, reality distortion field, recommendation engine, regulatory arbitrage, risk-adjusted returns, Satyajit Das, Savings and loan crisis, search engine result page, shareholder value, Silicon Valley, Snapchat, social intelligence, Spread Networks laid a new fibre optics cable between New York and Chicago, statistical arbitrage, statistical model, Steven Levy, technological solutionism, the scientific method, too big to fail, transaction costs, two-sided market, universal basic income, Upton Sinclair, value at risk, vertical integration, WikiLeaks, Yochai Benkler, zero-sum game

It might seem risky to give any one household a loan; the breadwinner might fall ill, they might declare bankruptcy, they may hit the lottery and pay off the loan tomorrow (denying the investor a steady stream of interest payments). It’s hard to predict what will happen to any given family. But statistical models can much better predict the likelihood of defaults happening in, say, a group of 1,000 families. They “know” that, in the data used, rarely do, say, more than thirty in 1,000 borrowers default. This statistical analysis, programmed in proprietary software, was one “green light” for massive investments in the mortgage market.21 That sounds simple, but as finance automation took off, such deals tended to get hedged around by contingencies, for instance about possible refinancings or defaults.
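A minimal sketch of the pooled-default arithmetic the passage alludes to, assuming (purely for illustration) a 2 percent default probability per borrower and, crucially, independence across borrowers.

```python
from scipy.stats import binom

# Illustrative numbers (not from the book): 1,000 borrowers, each defaulting
# independently with probability 2%.
n, p = 1000, 0.02

expected_defaults = n * p               # 20 defaults on average
prob_more_than_30 = binom.sf(30, n, p)  # P(X > 30), i.e. 31 or more defaults

print(f"expected defaults: {expected_defaults:.0f}")
print(f"P(more than 30 of 1,000 default) = {prob_more_than_30:.2%}")
# The pooled risk looks tiny only because defaults are assumed independent.
# If one shock (e.g. falling house prices) hits all borrowers at once, this
# binomial model badly understates the tail risk.
```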


pages: 415 words: 125,089

Against the Gods: The Remarkable Story of Risk by Peter L. Bernstein

Alan Greenspan, Albert Einstein, Alvin Roth, Andrew Wiles, Antoine Gombaud: Chevalier de Méré, Bayesian statistics, behavioural economics, Big bang: deregulation of the City of London, Bretton Woods, business cycle, buttonwood tree, buy and hold, capital asset pricing model, cognitive dissonance, computerized trading, Daniel Kahneman / Amos Tversky, diversified portfolio, double entry bookkeeping, Edmond Halley, Edward Lloyd's coffeehouse, endowment effect, experimental economics, fear of failure, Fellow of the Royal Society, Fermat's Last Theorem, financial deregulation, financial engineering, financial innovation, full employment, Great Leap Forward, index fund, invention of movable type, Isaac Newton, John Nash: game theory, John von Neumann, Kenneth Arrow, linear programming, loss aversion, Louis Bachelier, mental accounting, moral hazard, Myron Scholes, Nash equilibrium, Norman Macrae, Paul Samuelson, Philip Mirowski, Post-Keynesian economics, probability theory / Blaise Pascal / Pierre de Fermat, prudent man rule, random walk, Richard Thaler, Robert Shiller, Robert Solow, spectrum auction, statistical model, stocks for the long run, The Bell Curve by Richard Herrnstein and Charles Murray, The Wealth of Nations by Adam Smith, Thomas Bayes, trade route, transaction costs, tulip mania, Vanguard fund, zero-sum game

Richard Thaler started thinking about these problems in the early 1970s, while working on his doctoral dissertation at the University of Rochester, an institution known for its emphasis on rational theory. His subject was the value of a human life, and he was trying to prove that the correct measure of that value is the amount people would be willing to pay to save a life. After studying risky occupations like mining and logging, he decided to take a break from the demanding statistical modeling he was doing and began to ask people what value they would put on their own lives. He started by asking two questions. First, how much would you be willing to pay to eliminate a one-in-a-thousand chance of immediate death? And how much would you have to be paid to accept a one-in-a-thousand chance of immediate death?


pages: 391 words: 123,597

Targeted: The Cambridge Analytica Whistleblower's Inside Story of How Big Data, Trump, and Facebook Broke Democracy and How It Can Happen Again by Brittany Kaiser

"World Economic Forum" Davos, Albert Einstein, Amazon Mechanical Turk, Asian financial crisis, Bernie Sanders, Big Tech, bitcoin, blockchain, Boris Johnson, Brexit referendum, Burning Man, call centre, Cambridge Analytica, Carl Icahn, centre right, Chelsea Manning, clean water, cognitive dissonance, crony capitalism, dark pattern, data science, disinformation, Dominic Cummings, Donald Trump, Edward Snowden, Etonian, fake news, haute couture, illegal immigration, Julian Assange, Mark Zuckerberg, Menlo Park, Nelson Mandela, off grid, open borders, public intellectual, Renaissance Technologies, Robert Mercer, rolodex, Russian election interference, sentiment analysis, Sheryl Sandberg, Silicon Valley, Silicon Valley startup, Skype, Snapchat, statistical model, Steve Bannon, subprime mortgage crisis, TED Talk, the High Line, the scientific method, WeWork, WikiLeaks, you are the product, young professional

Rospars and his team at Blue State described themselves as pioneers who understood that “people don’t just vote on Election Day—they vote every day with their wallets, with their time, with their clicks and posts and tweets.”2 Other senior-level members of the Obama for America analytics team founded BlueLabs in 2013.3 Daniel Porter had been director of statistical modeling on the 2012 campaign, “the first in the history of presidential politics to use persuasion modeling” to identify swing voters. Sophie Schmidt’s father, Eric, founded Civis in 2013, the same year that Sophie interned at CA. Civis’s mission was to “democratize data science so organizations can stop guessing and make decisions based on numbers and scientific fact.”


pages: 400 words: 121,988

Trading at the Speed of Light: How Ultrafast Algorithms Are Transforming Financial Markets by Donald MacKenzie

algorithmic trading, automated trading system, banking crisis, barriers to entry, bitcoin, blockchain, Bonfire of the Vanities, Bretton Woods, Cambridge Analytica, centralized clearinghouse, Claude Shannon: information theory, coronavirus, COVID-19, cryptocurrency, disintermediation, diversification, en.wikipedia.org, Ethereum, ethereum blockchain, family office, financial intermediation, fixed income, Flash crash, Google Earth, Hacker Ethic, Hibernia Atlantic: Project Express, interest rate derivative, interest rate swap, inventory management, Jim Simons, level 1 cache, light touch regulation, linked data, lockdown, low earth orbit, machine readable, market design, market microstructure, Martin Wolf, proprietary trading, Renaissance Technologies, Satoshi Nakamoto, Small Order Execution System, Spread Networks laid a new fibre optics cable between New York and Chicago, statistical arbitrage, statistical model, Steven Levy, The Great Moderation, transaction costs, UUNET, zero-sum game

As described in chapter 1, these are computer programs that investors can use to break up big orders into small parts and execute them automatically (Whitcomb interview 2). Instinet did not adopt Whitcomb’s suggestion. However, one of his former students, James Hawkes, who taught statistics at the College of Charleston, ran a small firm, Quant Systems, which sold software for statistical analysis. Whitcomb and Hawkes had earlier collaborated on a statistical model to predict the outcomes of horse races. Their equation displayed some predictive power, but because of bookmakers’ large “vigs,” or “takes” (the profits they earn by setting odds unfavorable to the gambler), it did not earn Hawkes and Whitcomb money (Whitcomb interviews 1 and 2). Hawkes, though, also traded stock options, and had installed a satellite dish on the roof of his garage to receive a share-price datafeed.


pages: 945 words: 292,893

Seveneves by Neal Stephenson

Apollo 13, Biosphere 2, clean water, Colonization of Mars, Danny Hillis, digital map, double helix, epigenetics, fault tolerance, Fellow of the Royal Society, Filipino sailors, gravity well, hydroponic farming, Isaac Newton, Jeff Bezos, kremlinology, Kuiper Belt, low earth orbit, machine readable, microbiome, military-industrial complex, Neal Stephenson, orbital mechanics / astrodynamics, phenotype, Potemkin village, pre–internet, random walk, remote working, selection bias, side project, Silicon Valley, Skype, Snow Crash, space junk, statistical model, Stewart Brand, supervolcano, tech billionaire, TED Talk, the scientific method, Tunguska event, VTOL, zero day, éminence grise

Or would it split up into two or more distinct swarms that would try different things? Arguments could be made for all of the above scenarios and many more, depending on what actually happened in the Hard Rain. Since the Earth had never before been bombarded by a vast barrage of lunar fragments, there was no way to predict what it was going to be like. Statistical models had been occupying much of Doob’s time because they had a big influence on which scenarios might be most worth preparing for. To take a simplistic example, if the moon could be relied on to disassemble itself into pea-sized rocks, then the best strategy was to remain in place and not worry too much about maneuvering.

A clutter of faint noise and clouds on the optical telescope gave them data about the density of objects too small and numerous to resolve. All of it fed into the plan. Doob looked tired, and nodded off frequently, and hadn’t eaten a square meal since the last perigee, but he pulled himself together when he was needed and fed any new information into a statistical model, prepared long in advance, that would enable them to maximize their chances by ditching Amalthea and doing the big final burn at just the right times. But as he kept warning Ivy and Zeke, the time was coming soon when they would become so embroiled in the particulars of which rock was coming from which direction that it wouldn’t be a statistical exercise anymore.


pages: 419 words: 130,627

Last Man Standing: The Ascent of Jamie Dimon and JPMorgan Chase by Duff McDonald

"World Economic Forum" Davos, Alan Greenspan, AOL-Time Warner, bank run, Bear Stearns, Blythe Masters, Bonfire of the Vanities, book value, business logic, centralized clearinghouse, collateralized debt obligation, conceptual framework, corporate governance, credit crunch, Credit Default Swap, credit default swaps / collateralized debt obligations, Exxon Valdez, financial innovation, fixed income, G4S, Glass-Steagall Act, Greenspan put, housing crisis, interest rate swap, Jeff Bezos, John Meriwether, junk bonds, Kickstarter, laissez-faire capitalism, Long Term Capital Management, margin call, market bubble, Michael Milken, money market fund, moral hazard, negative equity, Nelson Mandela, Northern Rock, profit motive, proprietary trading, Renaissance Technologies, risk/return, Rod Stewart played at Stephen Schwarzman birthday party, Saturday Night Live, sovereign wealth fund, statistical model, Steve Ballmer, Steve Jobs, technology bubble, The Chicago School, too big to fail, Vanguard fund, zero-coupon bond, zero-sum game

Ralph Cioffi of Bear Stearns wasn’t the only one putting his equity at risk by loading up on debt; all of Wall Street was in on the scheme. Warren Buffett thinks Dimon separated himself from the pack by relying on his own judgment and not becoming slave to the software that tried to simplify all of banking into a mathematical equation. “Too many people overemphasize the power of these statistical models,” he says. “But not Jamie. The CEO of any of these firms has to be the chief risk officer. At Berkshire Hathaway, it’s my number one job. I have to be correlating the chance of an earthquake in California not only causing a big insurance loss, but also the effect on Wells Fargo’s earnings, or the availability of money tomorrow.


pages: 500 words: 145,005

Misbehaving: The Making of Behavioral Economics by Richard H. Thaler

3Com Palm IPO, Alan Greenspan, Albert Einstein, Alvin Roth, Amazon Mechanical Turk, Andrei Shleifer, Apple's 1984 Super Bowl advert, Atul Gawande, behavioural economics, Berlin Wall, Bernie Madoff, Black-Scholes formula, book value, business cycle, capital asset pricing model, Cass Sunstein, Checklist Manifesto, choice architecture, clean water, cognitive dissonance, conceptual framework, constrained optimization, Daniel Kahneman / Amos Tversky, delayed gratification, diversification, diversified portfolio, Edward Glaeser, endowment effect, equity premium, equity risk premium, Eugene Fama: efficient market hypothesis, experimental economics, Fall of the Berlin Wall, George Akerlof, hindsight bias, Home mortgage interest deduction, impulse control, index fund, information asymmetry, invisible hand, Jean Tirole, John Nash: game theory, John von Neumann, Kenneth Arrow, Kickstarter, late fees, law of one price, libertarian paternalism, Long Term Capital Management, loss aversion, low interest rates, market clearing, Mason jar, mental accounting, meta-analysis, money market fund, More Guns, Less Crime, mortgage debt, Myron Scholes, Nash equilibrium, Nate Silver, New Journalism, nudge unit, PalmPilot, Paul Samuelson, payday loans, Ponzi scheme, Post-Keynesian economics, presumed consent, pre–internet, principal–agent problem, prisoner's dilemma, profit maximization, random walk, randomized controlled trial, Richard Thaler, risk free rate, Robert Shiller, Robert Solow, Ronald Coase, Silicon Valley, South Sea Bubble, Stanford marshmallow experiment, statistical model, Steve Jobs, sunk-cost fallacy, Supply of New York City Cabdrivers, systematic bias, technology bubble, The Chicago School, The Myth of the Rational Market, The Signal and the Noise by Nate Silver, The Wealth of Nations by Adam Smith, Thomas Kuhn: the structure of scientific revolutions, transaction costs, ultimatum game, Vilfredo Pareto, Walter Mischel, zero-sum game

Yet, although he received a Nobel Prize in economics, unfortunately I think it is fair to say that he had little impact on the economics profession.* I believe many economists ignored Simon because it was too easy to brush aside bounded rationality as a “true but unimportant” concept. Economists were fine with the idea that their models were imprecise and that the predictions of those models would contain error. In the statistical models used by economists, this is handled simply by adding what is called an “error” term to the equation. Suppose you try to predict the height that a child will reach at adulthood using the height of both parents as predictors. This model will do a decent job since tall parents tend to have tall children, but the model will not be perfectly accurate, which is what the error term is meant to capture.
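A short sketch of the height example with an explicit error term: the heights are simulated, so the "true" coefficients below are assumptions made for illustration, and the residuals play the role of the error the model cannot explain.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Hypothetical data: mid-parent height (cm) and adult child height (cm).
# The added noise is exactly the "error" term the text describes.
midparent = rng.normal(172, 6, n)
child = 25 + 0.85 * midparent + rng.normal(0, 5, n)

# Fit the statistical model: child = b0 + b1 * midparent + error
b1, b0 = np.polyfit(midparent, child, 1)
residuals = child - (b0 + b1 * midparent)

print(f"fitted: child = {b0:.1f} + {b1:.2f} * midparent + error")
print(f"residual std (unexplained error): {residuals.std():.1f} cm")
```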


Making Globalization Work by Joseph E. Stiglitz

"World Economic Forum" Davos, affirmative action, Alan Greenspan, Andrei Shleifer, Asian financial crisis, banking crisis, barriers to entry, benefit corporation, Berlin Wall, blood diamond, business process, capital controls, carbon tax, central bank independence, corporate governance, corporate social responsibility, currency manipulation / currency intervention, Doha Development Round, Exxon Valdez, Fall of the Berlin Wall, Firefox, full employment, Garrett Hardin, Gini coefficient, global reserve currency, Global Witness, Great Leap Forward, Gunnar Myrdal, happiness index / gross national happiness, illegal immigration, income inequality, income per capita, incomplete markets, Indoor air pollution, informal economy, information asymmetry, Intergovernmental Panel on Climate Change (IPCC), inventory management, invisible hand, John Markoff, Jones Act, Kenneth Arrow, Kenneth Rogoff, low interest rates, low skilled workers, manufacturing employment, market fundamentalism, Martin Wolf, microcredit, moral hazard, negative emissions, new economy, North Sea oil, offshore financial centre, oil rush, open borders, open economy, price stability, profit maximization, purchasing power parity, quantitative trading / quantitative finance, race to the bottom, reserve currency, rising living standards, risk tolerance, Seymour Hersh, Silicon Valley, special drawing rights, statistical model, the market place, The Wealth of Nations by Adam Smith, Thomas L Friedman, trade liberalization, Tragedy of the Commons, trickle-down economics, union organizing, Washington Consensus, zero-sum game

A claims board could establish, for instance, the magnitude of the damage suffered by each individual and provide compensation on that basis. A separate tribunal could establish the extent of the corporation’s culpability, whether it took actions which caused harm—say, as a result of inappropriate environmental policies—and then assess, using a statistical model, appropriate penalties. Additional punitive damages might be assessed to provide further deterrence or in response to particularly outrageous behavior.

Chapter Eight
1. The ruble fell from R6.28 to the dollar before the crisis to R23 to the dollar in January 1999. 2. Argentina abandoned its long-standing foreign exchange regime, in which the peso was convertible to the dollar on a one-to-one basis, in December 2001.


The Science of Language by Noam Chomsky

Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, Alfred Russel Wallace, backpropagation, British Empire, Brownian motion, Computing Machinery and Intelligence, dark matter, Drosophila, epigenetics, finite state, Great Leap Forward, Howard Zinn, language acquisition, phenotype, public intellectual, statistical model, stem cell, Steven Pinker, Stuart Kauffman, theory of mind, trolley problem

Chapter 2 Page 20, On biology as more than selectional evolution Kauffman, D’Arcy Thompson, and Turing (in his work on morphogenesis) all emphasize that there is a lot more to evolution and development than can be explained by Darwinian (or neo-Darwinian) selection. (In fact, Darwin himself acknowledged as much, although this is often forgotten.) Each uses mathematics in studying biological systems in different ways. Some of Kauffman's more surprising suggestions concern self-organizing systems and the use of statistical modeling in trying to get a grip on how timing of gene protein expression can influence cell specialization during growth. Page 22, On Plato's Problem and its explanation The term “I-language” is explained – along with “I-belief” and “I-concept” – in Appendix I. For discussion, see Chomsky (1986) and (2000).


The Trade Lifecycle: Behind the Scenes of the Trading Process (The Wiley Finance Series) by Robert P. Baker

asset-backed security, bank run, banking crisis, Basel III, Black-Scholes formula, book value, Brownian motion, business continuity plan, business logic, business process, collapse of Lehman Brothers, corporate governance, credit crunch, Credit Default Swap, diversification, financial engineering, fixed income, functional programming, global macro, hiring and firing, implied volatility, interest rate derivative, interest rate swap, locking in a profit, London Interbank Offered Rate, low interest rates, margin call, market clearing, millennium bug, place-making, prediction markets, proprietary trading, short selling, statistical model, stochastic process, the market place, the payments system, time value of money, too big to fail, transaction costs, value at risk, Wiener process, yield curve, zero-coupon bond

Then in our fixed bond we have the EUR cashflows as 0.97951 × 0.05 × 10,000,000 × 0.8086 = 396,015.9 and 0.95983 × 1.05 × 10,000,000 × 0.7904 = 7,965,821, giving an NPV of EUR 8,361,837.

Unknown cashflows. In many cases cashflows are not known with certainty. An option is an example of a product that has an unknown cashflow. To value the option we have to use some sort of statistical model that predicts the likely price of the underlying instrument on the exercise date, and from there we can calculate the value of the option. Let’s examine a simple case where the underlying price could only be one of a discrete set of possibilities. Suppose Table 26.5 shows an option is struck at 0.9 and the probability of certain prices.
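Since Table 26.5 is not reproduced in the excerpt, the sketch below uses invented prices, probabilities, and a discount factor to show the calculation described: the option value is the probability-weighted payoff over the discrete set of possible underlying prices, discounted back to today.

```python
# Illustrative scenario table (Table 26.5 is not in the excerpt, so these
# prices and probabilities are made up for the example).
strike = 0.9
scenarios = [            # (underlying price at exercise, probability)
    (0.80, 0.10),
    (0.85, 0.20),
    (0.90, 0.30),
    (0.95, 0.25),
    (1.00, 0.15),
]
assert abs(sum(p for _, p in scenarios) - 1.0) < 1e-9

discount_factor = 0.98   # illustrative discount factor to the exercise date

# Value of a call = expected payoff under the model, discounted to today.
expected_payoff = sum(p * max(price - strike, 0.0) for price, p in scenarios)
value = discount_factor * expected_payoff

print(f"expected payoff: {expected_payoff:.4f}")
print(f"option value:    {value:.4f}")
```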


pages: 484 words: 136,735

Capitalism 4.0: The Birth of a New Economy in the Aftermath of Crisis by Anatole Kaletsky

"World Economic Forum" Davos, Alan Greenspan, bank run, banking crisis, Bear Stearns, behavioural economics, Benoit Mandelbrot, Berlin Wall, Black Swan, bond market vigilante , bonus culture, Bretton Woods, BRICs, business cycle, buy and hold, Carmen Reinhart, classic study, cognitive dissonance, collapse of Lehman Brothers, Corn Laws, correlation does not imply causation, creative destruction, credit crunch, currency manipulation / currency intervention, currency risk, David Ricardo: comparative advantage, deglobalization, Deng Xiaoping, eat what you kill, Edward Glaeser, electricity market, Eugene Fama: efficient market hypothesis, eurozone crisis, experimental economics, F. W. de Klerk, failed state, Fall of the Berlin Wall, financial deregulation, financial innovation, Financial Instability Hypothesis, floating exchange rates, foreign exchange controls, full employment, geopolitical risk, George Akerlof, global rebalancing, Goodhart's law, Great Leap Forward, Hyman Minsky, income inequality, information asymmetry, invisible hand, Isaac Newton, Joseph Schumpeter, Kenneth Arrow, Kenneth Rogoff, Kickstarter, laissez-faire capitalism, long and variable lags, Long Term Capital Management, low interest rates, mandelbrot fractal, market design, market fundamentalism, Martin Wolf, military-industrial complex, Minsky moment, Modern Monetary Theory, Money creation, money market fund, moral hazard, mortgage debt, Nelson Mandela, new economy, Nixon triggered the end of the Bretton Woods system, Northern Rock, offshore financial centre, oil shock, paradox of thrift, Pareto efficiency, Paul Samuelson, Paul Volcker talking about ATMs, peak oil, pets.com, Ponzi scheme, post-industrial society, price stability, profit maximization, profit motive, quantitative easing, Ralph Waldo Emerson, random walk, rent-seeking, reserve currency, rising living standards, Robert Shiller, Robert Solow, Ronald Reagan, Savings and loan crisis, seminal paper, shareholder value, short selling, South Sea Bubble, sovereign wealth fund, special drawing rights, statistical model, systems thinking, The Chicago School, The Great Moderation, The inhabitant of London could order by telephone, sipping his morning tea in bed, the various products of the whole earth, The Wealth of Nations by Adam Smith, Thomas Kuhn: the structure of scientific revolutions, too big to fail, Vilfredo Pareto, Washington Consensus, zero-sum game

Mandelbrot’s research program undermined most of the mathematical assumptions of modern portfolio theory, which is the basis for the conventional risk models used by regulators, credit-rating agencies, and unsophisticated financial institutions. Mandelbrot’s analysis, presented to nonspecialist readers in his 2004 book (Mis)behavior of Markets, shows with mathematical certainty that these standard statistical models based on neoclassical definitions of efficient markets and rational expectations among investors cannot be true. Had these models been valid, events such as the 1987 stock market crash and the 1998 hedge fund crisis would not have occurred even once in the fifteen billion years since the creation of the universe.9 In fact, four such extreme events occurred in just two weeks after the Lehman bankruptcy.
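A back-of-the-envelope illustration of the point: under a normal (Gaussian) model, a daily move of the size seen in 1987 should essentially never be observed. The 20-sigma figure and the 252 trading days per year are assumptions for this sketch, not numbers taken from the book.

```python
from scipy.stats import norm

# How often should a 20-sigma daily move occur if returns were normal?
sigmas = 20
p_per_day = norm.sf(sigmas)                  # one-tailed probability
trading_days_per_year = 252
expected_wait_years = 1 / (p_per_day * trading_days_per_year)

print(f"P(one-day move > {sigmas} sigma) = {p_per_day:.3e}")
print(f"expected wait: about {expected_wait_years:.3e} years")
# The universe is roughly 1.4e10 years old, which is why Mandelbrot argued
# that observed crashes falsify the Gaussian model.
```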


pages: 503 words: 131,064

Liars and Outliers: How Security Holds Society Together by Bruce Schneier

Abraham Maslow, airport security, Alvin Toffler, barriers to entry, behavioural economics, benefit corporation, Berlin Wall, Bernie Madoff, Bernie Sanders, Brian Krebs, Broken windows theory, carried interest, Cass Sunstein, Chelsea Manning, commoditize, corporate governance, crack epidemic, credit crunch, CRISPR, crowdsourcing, cuban missile crisis, Daniel Kahneman / Amos Tversky, David Graeber, desegregation, don't be evil, Double Irish / Dutch Sandwich, Douglas Hofstadter, Dunbar number, experimental economics, Fall of the Berlin Wall, financial deregulation, Future Shock, Garrett Hardin, George Akerlof, hydraulic fracturing, impulse control, income inequality, information security, invention of agriculture, invention of gunpowder, iterative process, Jean Tirole, John Bogle, John Nash: game theory, joint-stock company, Julian Assange, language acquisition, longitudinal study, mass incarceration, meta-analysis, microcredit, mirror neurons, moral hazard, Multics, mutually assured destruction, Nate Silver, Network effects, Nick Leeson, off-the-grid, offshore financial centre, Oklahoma City bombing, patent troll, phenotype, pre–internet, principal–agent problem, prisoner's dilemma, profit maximization, profit motive, race to the bottom, Ralph Waldo Emerson, RAND corporation, Recombinant DNA, rent-seeking, RFID, Richard Thaler, risk tolerance, Ronald Coase, security theater, shareholder value, slashdot, statistical model, Steven Pinker, Stuxnet, technological singularity, The Market for Lemons, The Nature of the Firm, The Spirit Level, The Wealth of Nations by Adam Smith, The Wisdom of Crowds, theory of mind, Timothy McVeigh, too big to fail, traffic fines, Tragedy of the Commons, transaction costs, ultimatum game, UNCLOS, union organizing, Vernor Vinge, WikiLeaks, World Values Survey, Y2K, Yochai Benkler, zero-sum game

Majumdar (2006), “Two-Stage Credit Card Fraud Detection Using Sequence Alignment,” Information Systems Security, Lecture Notes in Computer Science, Springer-Verlag, 4332:260–75. predictive policing programs Martin B. Short, Maria R. D'Orsogna, Virginia B. Pasour, George E. Tita, P. Jeffrey Brantingham, Andrea L. Bertozzi, and Lincoln B. Chayes (2008), “A Statistical Model of Criminal Behavior,” Mathematical Models and Methods in Applied Sciences, 18 (Supplement):1249–67. Beth Pearsall (2010), “Predictive Policing: The Future of Law Enforcement?” NIJ Journal, 266:16–9. Nancy Murray (2011), “Profiling in the Age of Total Information Awareness,” Race & Class, 51:3–24.


The Book of Why: The New Science of Cause and Effect by Judea Pearl, Dana Mackenzie

affirmative action, Albert Einstein, AlphaGo, Asilomar, Bayesian statistics, computer age, computer vision, Computing Machinery and Intelligence, confounding variable, correlation coefficient, correlation does not imply causation, Daniel Kahneman / Amos Tversky, data science, deep learning, DeepMind, driverless car, Edmond Halley, Elon Musk, en.wikipedia.org, experimental subject, Great Leap Forward, Gregor Mendel, Isaac Newton, iterative process, John Snow's cholera map, Loebner Prize, loose coupling, Louis Pasteur, Menlo Park, Monty Hall problem, pattern recognition, Paul Erdős, personalized medicine, Pierre-Simon Laplace, placebo effect, Plato's cave, prisoner's dilemma, probability theory / Blaise Pascal / Pierre de Fermat, randomized controlled trial, Recombinant DNA, selection bias, self-driving car, seminal paper, Silicon Valley, speech recognition, statistical model, Stephen Hawking, Steve Jobs, strong AI, The Design of Experiments, the scientific method, Thomas Bayes, Turing test

John Snow, the Broad Street pump, and modern epidemiology. International Journal of Epidemiology 12: 393–396. Cox, D., and Wermuth, N. (2015). Design and interpretation of studies: Relevant concepts from the past and some extensions. Observational Studies 1. Available at: https://arxiv.org/pdf/1505.02452.pdf. Freedman, D. (2010). Statistical Models and Causal Inference: A Dialogue with the Social Sciences. Cambridge University Press, New York, NY. Glynn, A., and Kashin, K. (2018). Front-door versus back-door adjustment with unmeasured confounding: Bias formulas for front-door and hybrid adjustments. Journal of the American Statistical Association.


How I Became a Quant: Insights From 25 of Wall Street's Elite by Richard R. Lindsey, Barry Schachter

Albert Einstein, algorithmic trading, Andrew Wiles, Antoine Gombaud: Chevalier de Méré, asset allocation, asset-backed security, backtesting, bank run, banking crisis, Bear Stearns, Black-Scholes formula, Bob Litterman, Bonfire of the Vanities, book value, Bretton Woods, Brownian motion, business cycle, business process, butter production in bangladesh, buy and hold, buy low sell high, capital asset pricing model, centre right, collateralized debt obligation, commoditize, computerized markets, corporate governance, correlation coefficient, creative destruction, Credit Default Swap, credit default swaps / collateralized debt obligations, currency manipulation / currency intervention, currency risk, discounted cash flows, disintermediation, diversification, Donald Knuth, Edward Thorp, Emanuel Derman, en.wikipedia.org, Eugene Fama: efficient market hypothesis, financial engineering, financial innovation, fixed income, full employment, George Akerlof, global macro, Gordon Gekko, hiring and firing, implied volatility, index fund, interest rate derivative, interest rate swap, Ivan Sutherland, John Bogle, John von Neumann, junk bonds, linear programming, Loma Prieta earthquake, Long Term Capital Management, machine readable, margin call, market friction, market microstructure, martingale, merger arbitrage, Michael Milken, Myron Scholes, Nick Leeson, P = NP, pattern recognition, Paul Samuelson, pensions crisis, performance metric, prediction markets, profit maximization, proprietary trading, purchasing power parity, quantitative trading / quantitative finance, QWERTY keyboard, RAND corporation, random walk, Ray Kurzweil, Reminiscences of a Stock Operator, Richard Feynman, Richard Stallman, risk free rate, risk-adjusted returns, risk/return, seminal paper, shareholder value, Sharpe ratio, short selling, Silicon Valley, six sigma, sorting algorithm, statistical arbitrage, statistical model, stem cell, Steven Levy, stochastic process, subscription business, systematic trading, technology bubble, The Great Moderation, the scientific method, too big to fail, trade route, transaction costs, transfer pricing, value at risk, volatility smile, Wiener process, yield curve, young professional

At his invitation, we presented our findings on complexity and disentangling at the CFA Institute’s 1988 conference on continuing education. We also later presented them to the Institute for Quantitative Research in Finance (“Q Group”). Integrating the Investment Process Our research laid the groundwork for our investment approach. Statistical modeling and disentangling of a wide range of stocks and numerous fundamental, behavioral, and economic factors results in a multidimensional security selection system capable of maximizing the number of insights that can be exploited while capturing the intricacies of stock price behavior. This, in turn, allows for construction of portfolios that can achieve consistency of performance through numerous exposures to a large number of precisely defined profit opportunities.


pages: 349 words: 134,041

Traders, Guns & Money: Knowns and Unknowns in the Dazzling World of Derivatives by Satyajit Das

accounting loophole / creative accounting, Alan Greenspan, Albert Einstein, Asian financial crisis, asset-backed security, Bear Stearns, beat the dealer, Black Swan, Black-Scholes formula, Bretton Woods, BRICs, Brownian motion, business logic, business process, buy and hold, buy low sell high, call centre, capital asset pricing model, collateralized debt obligation, commoditize, complexity theory, computerized trading, corporate governance, corporate raider, Credit Default Swap, credit default swaps / collateralized debt obligations, cuban missile crisis, currency peg, currency risk, disinformation, disintermediation, diversification, diversified portfolio, Edward Thorp, Eugene Fama: efficient market hypothesis, Everything should be made as simple as possible, financial engineering, financial innovation, fixed income, Glass-Steagall Act, Haight Ashbury, high net worth, implied volatility, index arbitrage, index card, index fund, interest rate derivative, interest rate swap, Isaac Newton, job satisfaction, John Bogle, John Meriwether, junk bonds, locking in a profit, Long Term Capital Management, low interest rates, mandelbrot fractal, margin call, market bubble, Marshall McLuhan, mass affluent, mega-rich, merger arbitrage, Mexican peso crisis / tequila crisis, money market fund, moral hazard, mutually assured destruction, Myron Scholes, new economy, New Journalism, Nick Leeson, Nixon triggered the end of the Bretton Woods system, offshore financial centre, oil shock, Parkinson's law, placebo effect, Ponzi scheme, proprietary trading, purchasing power parity, quantitative trading / quantitative finance, random walk, regulatory arbitrage, Right to Buy, risk free rate, risk-adjusted returns, risk/return, Salesforce, Satyajit Das, shareholder value, short selling, short squeeze, South Sea Bubble, statistical model, technology bubble, the medium is the message, the new new thing, time value of money, too big to fail, transaction costs, value at risk, Vanguard fund, volatility smile, yield curve, Yogi Berra, zero-coupon bond

The back office has a large, diverse cast. Risk managers are employed to ensure that the risk taken by traders is within specified limits. They ensure that the firm does not self-destruct as a result of some trader betting the bank on the correlation between the lunar cycle and the $/yen exchange rate. Risk managers use elaborate statistical models to keep tabs on the traders. Like double and triple agents, risk managers spy on the traders, each other and even themselves. Lawyers are employed to ensure that hopefully legally binding contracts are signed. Compliance officers ensure that the firm does not break any laws or at least is not caught breaking any laws.


pages: 303 words: 67,891

Advances in Artificial General Intelligence: Concepts, Architectures and Algorithms: Proceedings of the Agi Workshop 2006 by Ben Goertzel, Pei Wang

AI winter, artificial general intelligence, backpropagation, bioinformatics, brain emulation, classic study, combinatorial explosion, complexity theory, computer vision, Computing Machinery and Intelligence, conceptual framework, correlation coefficient, epigenetics, friendly AI, functional programming, G4S, higher-order functions, information retrieval, Isaac Newton, Jeff Hawkins, John Conway, Loebner Prize, Menlo Park, natural language processing, Nick Bostrom, Occam's razor, p-value, pattern recognition, performance metric, precautionary principle, Ray Kurzweil, Rodney Brooks, semantic web, statistical model, strong AI, theory of mind, traveling salesman, Turing machine, Turing test, Von Neumann architecture, Y2K

Commons and colleagues have also proposed a task-based model which provides a framework for explaining stage discrepancies across tasks and for generating new stages based on classification of observed logical behaviors. [32] promotes a statistical conception of stage, which provides a good bridge between task-based and stage-based models of development, as statistical modeling allows for stages to be roughly defined and analyzed based on collections of task behaviors. [29] postulates the existence of a postformal stage by observing elevated levels of abstraction which, they argue, are not manifested in formal thought. [33] observes a postformal stage when subjects become capable of analyzing and coordinating complex logical systems with each other, creating metatheoretical supersystems.


pages: 486 words: 132,784

Inventors at Work: The Minds and Motivation Behind Modern Inventions by Brett Stern

Apple II, augmented reality, autonomous vehicles, bioinformatics, Build a better mousetrap, business process, cloud computing, computer vision, cyber-physical system, distributed generation, driverless car, game design, Grace Hopper, human-factors engineering, Richard Feynman, Silicon Valley, skunkworks, Skype, smart transportation, speech recognition, statistical model, stealth mode startup, Steve Jobs, Steve Wozniak, the market place, value engineering, Yogi Berra

When we were doing our work then, computers really weren’t around. They were in the university. The math statistics group at Corning had a big IBM mainframe computer that could tackle really difficult problems, but modeling capabilities just didn’t exist. I had grown up with computers that were basically doing the statistical modeling of molecular spectrum. So, I was reasonably familiar with doing this and eventually got the first computer in the lab. I was actually taking data off the optical bench that was in my lab directly into a computer. Would I have used 3D modeling if the capability had existed then? Sure, you use whatever tool is available to you.


A Dominant Character by Samanth Subramanian

affirmative action, Alfred Russel Wallace, Arthur Eddington, British Empire, CRISPR, double helix, Drosophila, Eddington experiment, epigenetics, Etonian, Fellow of the Royal Society, Gregor Mendel, Gunnar Myrdal, Louis Pasteur, peak oil, phenotype, statistical model, strikebreaker, Suez canal 1869, the scientific method, Thomas Malthus, Tim Cook: Apple

His work spoke to their deepest instinct for self-preservation. Haldane himself wasn’t a scientist in this mold. He published often and widely. In the 1930s alone, he wrote a paper on the link between quantum mechanics and philosophy, another on an economic theory of price fluctuations, several on statistical models, and a paper each on the cosmology of space-time and the future of warfare. (After the zoologist Karl von Frisch showed that bees communicate with each other through intricate dances, Haldane recalled Aristotle’s description of bee waggles and, in the Journal of Hellenic Studies, inspected it in the light of modern science.


pages: 475 words: 134,707

The Hype Machine: How Social Media Disrupts Our Elections, Our Economy, and Our Health--And How We Must Adapt by Sinan Aral

Airbnb, Albert Einstein, algorithmic bias, AlphaGo, Any sufficiently advanced technology is indistinguishable from magic, AOL-Time Warner, augmented reality, behavioural economics, Bernie Sanders, Big Tech, bitcoin, Black Lives Matter, Cambridge Analytica, carbon footprint, Cass Sunstein, computer vision, contact tracing, coronavirus, correlation does not imply causation, COVID-19, crowdsourcing, cryptocurrency, data science, death of newspapers, deep learning, deepfake, digital divide, digital nomad, disinformation, disintermediation, Donald Trump, Drosophila, Edward Snowden, Elon Musk, en.wikipedia.org, end-to-end encryption, Erik Brynjolfsson, experimental subject, facts on the ground, fake news, Filter Bubble, George Floyd, global pandemic, hive mind, illegal immigration, income inequality, Kickstarter, knowledge worker, lockdown, longitudinal study, low skilled workers, Lyft, Mahatma Gandhi, Mark Zuckerberg, Menlo Park, meta-analysis, Metcalfe’s law, mobile money, move fast and break things, multi-sided market, Nate Silver, natural language processing, Neal Stephenson, Network effects, performance metric, phenotype, recommendation engine, Robert Bork, Robert Shiller, Russian election interference, Second Machine Age, seminal paper, sentiment analysis, shareholder value, Sheryl Sandberg, skunkworks, Snapchat, social contagion, social distancing, social graph, social intelligence, social software, social web, statistical model, stem cell, Stephen Hawking, Steve Bannon, Steve Jobs, Steve Jurvetson, surveillance capitalism, Susan Wojcicki, Telecommunications Act of 1996, The Chicago School, the strength of weak ties, The Wisdom of Crowds, theory of mind, TikTok, Tim Cook: Apple, Uber and Lyft, uber lyft, WikiLeaks, work culture , Yogi Berra

In the fall of 2001, while Mark Zuckerberg was still in high school at Phillips Exeter Academy, three years before he founded Facebook at Harvard, I was a PhD student down the street at MIT, sitting in the reading room at Dewey Library studying for two very different classes: Econometrics I, taught by the world-renowned statistician Jerry Hausman, and The Sociology of Strategy, taught by the then-rising-star sociologist Ezra Zuckerman, who is now the dean of faculty at MIT’s Sloan School of Management. Ezra’s class was heavily focused on social networks, while Jerry’s class introduced us to “BLUE” estimators—the theory of what generates the best linear unbiased statistical models. I had my statistics textbook in one hand and a stack of papers on networks in the other. As I read the statistics text, I saw that it repeated one main assumption of classical statistics over and over again—the assumption that all the observations in the data we were analyzing (the people, firms, or countries) were “independent and identically distributed (or IID).”
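As a small, hedged illustration of the "BLUE" idea referenced here: under the classical assumptions (including IID errors), ordinary least squares is the best linear unbiased estimator, and it can be computed directly from the normal equations. The data below are simulated purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulated data satisfying the classical assumptions: IID errors.
n = 1000
x = rng.normal(size=n)
eps = rng.normal(scale=0.5, size=n)   # independent, identically distributed errors
y = 1.0 + 2.0 * x + eps

# Ordinary least squares via the normal equations: beta = (X'X)^(-1) X'y.
X = np.column_stack([np.ones(n), x])
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)   # approximately [1.0, 2.0]
```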


Machine Learning Design Patterns: Solutions to Common Challenges in Data Preparation, Model Building, and MLOps by Valliappa Lakshmanan, Sara Robinson, Michael Munn

A Pattern Language, Airbnb, algorithmic trading, automated trading system, business intelligence, business logic, business process, combinatorial explosion, computer vision, continuous integration, COVID-19, data science, deep learning, DevOps, discrete time, en.wikipedia.org, Hacker News, industrial research laboratory, iterative process, Kubernetes, machine translation, microservices, mobile money, natural language processing, Netflix Prize, optical character recognition, pattern recognition, performance metric, recommendation engine, ride hailing / ride sharing, selection bias, self-driving car, sentiment analysis, speech recognition, statistical model, the payments system, web application

Technically, a 2-element feature vector is enough to provide a unique mapping for a vocabulary of size 3:

Categorical input   Numeric feature
English             [0.0, 0.0]
Chinese             [1.0, 0.0]
German              [0.0, 1.0]

This is called dummy coding. Because dummy coding is a more compact representation, it is preferred in statistical models that perform better when the inputs are linearly independent. Modern machine learning algorithms, though, don’t require their inputs to be linearly independent and use methods such as L1 regularization to prune redundant inputs. The additional degree of freedom allows the framework to transparently handle a missing input in production as all zeros:

Categorical input   Numeric feature
English             [1.0, 0.0, 0.0]
Chinese             [0.0, 1.0, 0.0]
German              [0.0, 0.0, 1.0]
(missing)           [0.0, 0.0, 0.0]

Therefore, many machine learning frameworks often support only one-hot encoding.
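To make the two encodings concrete, here is a minimal sketch (not from the book) using pandas. The column name "language" and the tiny vocabulary are hypothetical; get_dummies with drop_first=True stands in for dummy coding, while the default call produces the one-hot representation.

```python
import pandas as pd

# Hypothetical categorical column with a three-word vocabulary.
df = pd.DataFrame({"language": ["English", "Chinese", "German", "Chinese"]})

# One-hot encoding: one indicator column per category.
one_hot = pd.get_dummies(df["language"], prefix="lang", dtype=float)

# Dummy coding: drop the first category, leaving a more compact,
# linearly independent representation (the dropped category maps to all zeros).
dummy = pd.get_dummies(df["language"], prefix="lang", drop_first=True, dtype=float)

print(one_hot)
print(dummy)
```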


pages: 502 words: 132,062

Ways of Being: Beyond Human Intelligence by James Bridle

Ada Lovelace, Airbnb, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, Anthropocene, Any sufficiently advanced technology is indistinguishable from magic, autonomous vehicles, behavioural economics, Benoit Mandelbrot, Berlin Wall, Big Tech, Black Lives Matter, blockchain, Californian Ideology, Cambridge Analytica, carbon tax, Charles Babbage, cloud computing, coastline paradox / Richardson effect, Computing Machinery and Intelligence, corporate personhood, COVID-19, cryptocurrency, DeepMind, Donald Trump, Douglas Hofstadter, Elon Musk, experimental subject, factory automation, fake news, friendly AI, gig economy, global pandemic, Gödel, Escher, Bach, impulse control, James Bridle, James Webb Space Telescope, John von Neumann, Kickstarter, Kim Stanley Robinson, language acquisition, life extension, mandelbrot fractal, Marshall McLuhan, microbiome, music of the spheres, negative emissions, Nick Bostrom, Norbert Wiener, paperclip maximiser, pattern recognition, peer-to-peer, planetary scale, RAND corporation, random walk, recommendation engine, self-driving car, SETI@home, shareholder value, Silicon Valley, Silicon Valley ideology, speech recognition, statistical model, surveillance capitalism, techno-determinism, technological determinism, technoutopianism, the long tail, the scientific method, The Soul of a New Machine, theory of mind, traveling salesman, trolley problem, Turing complete, Turing machine, Turing test, UNCLOS, undersea cable, urban planning, Von Neumann architecture, wikimedia commons, zero-sum game

Because of the different ways that different animals react to natural phenomena, according to their size, speed and species, the ICARUS team found it necessary to use particularly complex forms of analysis to pick up on the differences in the data generated from different tags at different times – a welter of subtle and subtly variable signals. To do this, they turned to statistical models developed for financial econometrics: software designed to generate wealth by picking up on subtle signals in stock markets and investment patterns. I like to think of this as a kind of rehabilitation: penitent banking algorithms retiring from the City to start a new life in the countryside, and helping to remediate the Earth.


pages: 636 words: 140,406

The Case Against Education: Why the Education System Is a Waste of Time and Money by Bryan Caplan

affirmative action, Affordable Care Act / Obamacare, assortative mating, behavioural economics, conceptual framework, correlation does not imply causation, deliberate practice, deskilling, disruptive innovation, do what you love, driverless car, en.wikipedia.org, endogenous growth, experimental subject, fear of failure, Flynn Effect, future of work, George Akerlof, ghettoisation, hive mind, job satisfaction, Kenneth Arrow, Khan Academy, labor-force participation, longitudinal study, low interest rates, low skilled workers, market bubble, mass incarceration, meta-analysis, Peter Thiel, price discrimination, profit maximization, publication bias, risk tolerance, Robert Gordon, Ronald Coase, school choice, selection bias, Silicon Valley, statistical model, Steven Pinker, The Bell Curve by Richard Herrnstein and Charles Murray, the scientific method, The Wisdom of Crowds, trickle-down economics, twin studies, Tyler Cowen, unpaid internship, upwardly mobile, women in the workforce, yield curve, zero-sum game

The neglected master’s. Evidence on the master’s degree is sparse. Estimates of the sheepskin effect are scarce and vary widely, so I stipulate that the master’s sheepskin breakdown matches the bachelor’s. Completion rates for the master’s are lower than the bachelor’s. But I failed to locate any statistical models that estimate how master’s completion varies by prior academic performance. While broad outlines are not in doubt, I also located no solid evidence on how, correcting for student ability, the master’s payoff varies by discipline. Sins of omission. To keep my write-up manageable, I gloss over three major credentials: the associate degree, the professional degree, and the Ph.D.


pages: 570 words: 158,139

Overbooked: The Exploding Business of Travel and Tourism by Elizabeth Becker

airport security, Asian financial crisis, barriers to entry, Berlin Wall, BRICs, car-free, carbon footprint, clean water, collective bargaining, colonial rule, computer age, corporate governance, Costa Concordia, Deng Xiaoping, European colonialism, Exxon Valdez, Fall of the Berlin Wall, Frank Gehry, global village, Global Witness, Great Leap Forward, happiness index / gross national happiness, haute cuisine, high-speed rail, indoor plumbing, Kickstarter, Masdar, Murano, Venice glass, open borders, out of africa, race to the bottom, Ralph Nader, Scramble for Africa, Silicon Valley, statistical model, sustainable-tourism, the market place, union organizing, urban renewal, wage slave, young professional, éminence grise

That was the data that was missing. If the new council could measure how much money tourists spent, the industry would know how much it contributed to national economies as well as the global marketplace. From there they could begin flexing their muscles. The WTTC teamed with the Wharton School to produce a statistical model that a region or country could use to measure income from tourism. The statisticians defined the industry by categories: accommodation services; food and beverage services; passenger transport; travel agencies, tour operators and tourist guide services; cultural services; recreation and other entertainment services and a final miscellaneous category that included financial and insurance services.


pages: 582 words: 160,693

The Sovereign Individual: How to Survive and Thrive During the Collapse of the Welfare State by James Dale Davidson, William Rees-Mogg

affirmative action, agricultural Revolution, Alan Greenspan, Alvin Toffler, bank run, barriers to entry, Berlin Wall, borderless world, British Empire, California gold rush, classic study, clean water, colonial rule, Columbine, compound rate of return, creative destruction, Danny Hillis, debt deflation, ending welfare as we know it, epigenetics, Fall of the Berlin Wall, falling living standards, feminist movement, financial independence, Francis Fukuyama: the end of history, full employment, George Gilder, Hernando de Soto, illegal immigration, income inequality, independent contractor, informal economy, information retrieval, Isaac Newton, John Perry Barlow, Kevin Kelly, market clearing, Martin Wolf, Menlo Park, money: store of value / unit of account / medium of exchange, new economy, New Urbanism, Norman Macrae, offshore financial centre, Parkinson's law, pattern recognition, phenotype, price mechanism, profit maximization, rent-seeking, reserve currency, road to serfdom, Ronald Coase, Sam Peltzman, school vouchers, seigniorage, Silicon Valley, spice trade, statistical model, telepresence, The Nature of the Firm, the scientific method, The Wealth of Nations by Adam Smith, Thomas L Friedman, Thomas Malthus, trade route, transaction costs, Turing machine, union organizing, very high income, Vilfredo Pareto

And by no means, however, are all of Morris's fingers pointed at Bill Clinton. His wife comes in for some critical attention as well. For example, consider this excerpt from Morris's account of Hillary Clinton's miraculous commodity trading: "In 1995 economists at Auburn and North Florida Universities ran a sophisticated computer statistical model of the First Lady's trades for publication in the Journal of Economics and Statistics, using all the available records as well as market data from the Wall Street Journal. The probability of Hillary Rodham's having made her trades legitimately, they calculated, was less than one in 250,000,000." 22 Morris musters many incriminating details about the drug-running and money-laundering operation that prospered in Arkansas under Clinton.


pages: 473 words: 154,182

Moby-Duck: The True Story of 28,800 Bath Toys Lost at Sea and of the Beachcombers, Oceanographers, Environmentalists, and Fools, Including the Author, Who Went in Search of Them by Donovan Hohn

An Inconvenient Truth, carbon footprint, clean water, collective bargaining, dark matter, Deng Xiaoping, disinformation, Exxon Valdez, Filipino sailors, Garrett Hardin, Google Earth, hindcast, illegal immigration, indoor plumbing, intermodal, Isaac Newton, means of production, microbiome, Neil Armstrong, ocean acidification, off-the-grid, Panamax, Pearl River Delta, planned obsolescence, post-Panamax, profit motive, Skype, standardized shipping container, statistical model, the long tail, Thorstein Veblen, Tragedy of the Commons, traveling salesman

As it collides with the continental shelf and then with the freshwater gushing out of the rainforests of the coastal mountains, and then with the coast, the North Pacific Drift loses its coherence, crazies, sends out fractal meanders and eddies and tendrils that tease the four voyagers apart. We don’t know for certain what happens next, but statistical models suggest that at least one of the four voyagers I’m imagining—the frog, let’s pretend—will turn south, carried by an eddy or a meander into the California Current, which will likely deliver it, after many months, into the North Pacific Subtropical Gyre. You may now forget about the frog. We already know its story—how, as it disintegrates, it will contribute a few tablespoons of plastic to the Garbage Patch, or to Hawaii’s Plastic Beach, or to the dinner of an albatross, or to a sample collected in the codpiece of Charlie Moore’s manta trawl.


pages: 470 words: 144,455

Secrets and Lies: Digital Security in a Networked World by Bruce Schneier

Ayatollah Khomeini, barriers to entry, Bletchley Park, business process, butterfly effect, cashless society, Columbine, defense in depth, double entry bookkeeping, drop ship, fault tolerance, game design, IFF: identification friend or foe, information security, John Gilmore, John von Neumann, knapsack problem, macro virus, Mary Meeker, MITM: man-in-the-middle, moral panic, Morris worm, Multics, multilevel marketing, mutually assured destruction, PalmPilot, pez dispenser, pirate software, profit motive, Richard Feynman, risk tolerance, Russell Brand, Silicon Valley, Simon Singh, slashdot, statistical model, Steve Ballmer, Steven Levy, systems thinking, the payments system, Timothy McVeigh, Y2K, Yogi Berra

Just as antivirus software needs to be constantly updated with new signatures, this type of IDS needs a constantly updated database of attack signatures. It’s unclear whether such a database can ever keep up with the hacker tools. The other IDS paradigm is anomaly detection. The IDS does some statistical modeling of your network and figures out what is normal. Then, if anything abnormal happens, it sounds an alarm. This kind of thing can be done with rules (the system knows what’s normal and flags anything else), statistics (the system figures out statistically what’s normal and flags anything else), or with artificial-intelligence techniques.
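As a minimal sketch of the statistical (anomaly-detection) approach described here, and not of any particular IDS product, one can model "normal" traffic from a baseline window and flag observations that deviate by more than a few standard deviations. The metric, the sample values, and the threshold below are hypothetical.

```python
import statistics

def build_baseline(samples):
    """Learn what 'normal' looks like from a baseline window of a traffic metric
    (for example, connections per minute)."""
    return statistics.mean(samples), statistics.stdev(samples)

def is_anomalous(value, mean, stdev, threshold=3.0):
    """Flag values more than `threshold` standard deviations from the baseline mean."""
    if stdev == 0:
        return value != mean
    return abs(value - mean) / stdev > threshold

# Hypothetical baseline of connections per minute, then a spike to test.
baseline = [42, 38, 45, 40, 41, 39, 44, 43, 40, 42]
mean, stdev = build_baseline(baseline)
print(is_anomalous(41, mean, stdev))   # False: within the normal range
print(is_anomalous(400, mean, stdev))  # True: sounds the alarm
```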


Beginning R: The Statistical Programming Language by Mark Gardener

correlation coefficient, distributed generation, natural language processing, New Urbanism, p-value, statistical model

The bats data yielded a significant interaction term in the two-way ANOVA. Look at this further. Make a graphic of the data and then follow up with a post-hoc analysis. Draw a graph of the interaction.

What You Learned in This Chapter

Formula syntax (response ~ predictor): The formula syntax enables you to specify complex statistical models. Usually the response variables go on the left and predictor variables go on the right. The syntax can also be used in more simple situations and for graphics.

Stacking samples (stack()): In more complex analyses, the data need to be in a layout where each column is a separate item; that is, a column for the response variable and a column for each predictor variable.
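The same formula convention exists outside R. As a hedged illustration, the sketch below uses Python's statsmodels, whose formula interface accepts the same response ~ predictor syntax, with pandas melt playing roughly the role of stack() for reshaping into one response column plus a predictor column. The column names and values are hypothetical.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical "wide" data: one column per sample.
wide = pd.DataFrame({"siteA": [4.1, 3.8, 4.5], "siteB": [5.2, 5.0, 4.9]})

# Reshape into one response column and one predictor column
# (the analogue of R's stack()).
long = wide.melt(var_name="site", value_name="abundance")

# R-style formula: response on the left, predictor on the right.
model = smf.ols("abundance ~ site", data=long).fit()
print(model.params)
```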


Sorting Things Out: Classification and Its Consequences (Inside Technology) by Geoffrey C. Bowker

affirmative action, business process, classic study, corporate governance, Drosophila, government statistician, information retrieval, loose coupling, Menlo Park, Mitch Kapor, natural language processing, Occam's razor, QWERTY keyboard, Scientific racism, scientific worldview, sexual politics, statistical model, Stephen Hawking, Stewart Brand, tacit knowledge, the built environment, the medium is the message, the strength of weak ties, transaction costs, William of Occam

As noted in the case of New Zealand above, its need for information is effectively infinite. Below, for example, is a wish list from 1985 for a national medical information system in the United States: The system must capture more data than just the names of lesions and diseases and the therapeutic procedures used to correct them to meet these needs. In a statistical model proposed by Kerr White, all factors affecting health are incorporated: genetic and biological; environmental, behavioral, psychological, and social conditions which precipitate health problems' complaints, symptoms, and diseases which prompt people to seek medical care; and evaluation of severity and functional capacity, including impairment and handicaps.


Globalists: The End of Empire and the Birth of Neoliberalism by Quinn Slobodian

"World Economic Forum" Davos, Alan Greenspan, Asian financial crisis, Berlin Wall, bilateral investment treaty, borderless world, Bretton Woods, British Empire, business cycle, capital controls, central bank independence, classic study, collective bargaining, David Ricardo: comparative advantage, Deng Xiaoping, desegregation, Dissolution of the Soviet Union, Doha Development Round, eurozone crisis, Fall of the Berlin Wall, floating exchange rates, full employment, Garrett Hardin, Greenspan put, Gunnar Myrdal, Hernando de Soto, invisible hand, liberal capitalism, liberal world order, Mahbub ul Haq, market fundamentalism, Martin Wolf, Mercator projection, Mont Pelerin Society, Norbert Wiener, offshore financial centre, oil shock, open economy, pattern recognition, Paul Samuelson, Pearl River Delta, Philip Mirowski, power law, price mechanism, public intellectual, quantitative easing, random walk, rent control, rent-seeking, road to serfdom, Ronald Reagan, special economic zone, statistical model, Suez crisis 1956, systems thinking, tacit knowledge, The Chicago School, the market place, The Wealth of Nations by Adam Smith, theory of mind, Thomas L Friedman, trade liberalization, urban renewal, Washington Consensus, Wolfgang Streeck, zero-sum game

The four-person team eventually expanded. New members include two other experts and active League economists—Meade, an architect of GATT who had also played a key role in formulating Britain’s postwar full-employment policies, and the Dutch econometrician Jan Tinbergen, who created the first macroeconomic statistical model of a national economy while at the League. They were joined by Roberto Campos, a Brazilian economist who had been one of his nation’s delegates at Bretton Woods and the head of the Brazilian Development Bank, whose U.S.-friendly policies had earned him the nickname “Bob Fields.”95 Another former League economist, Hans Staehle, had helped assemble the group.


pages: 560 words: 158,238

Fifty Degrees Below by Kim Stanley Robinson

airport security, bioinformatics, bread and circuses, Burning Man, carbon credits, carbon tax, clean water, DeepMind, Donner party, full employment, Intergovernmental Panel on Climate Change (IPCC), invisible hand, iterative process, Kim Stanley Robinson, means of production, minimum wage unemployment, North Sea oil, off-the-grid, Ralph Waldo Emerson, Richard Feynman, statistical model, Stephen Hawking, the scientific method

So actually, to have the idea of something broached without any subsequent repercussion is actually a kind of, what. A kind of inoculation for an event you don’t want investigated.” “Jesus. So how does it work, do you know?” “Not the technical details, no. I know they target certain counties in swing states. They use various statistical models and decision-tree algorithms to pick which ones, and how much to intervene.” “I’d like to see this algorithm.” “Yes, I thought you might.” She reached into her purse, pulled out a data disk in a paper sleeve. She handed it to him. “This is it.” “Whoah,” Frank said, staring at it. “And so . . .


pages: 589 words: 147,053

The Age of Em: Work, Love and Life When Robots Rule the Earth by Robin Hanson

8-hour work day, artificial general intelligence, augmented reality, Berlin Wall, bitcoin, blockchain, brain emulation, business cycle, business process, Clayton Christensen, cloud computing, correlation does not imply causation, creative destruction, deep learning, demographic transition, Erik Brynjolfsson, Ethereum, ethereum blockchain, experimental subject, fault tolerance, financial intermediation, Flynn Effect, Future Shock, Herman Kahn, hindsight bias, information asymmetry, job automation, job satisfaction, John Markoff, Just-in-time delivery, lone genius, Machinery of Freedom by David Friedman, market design, megaproject, meta-analysis, Nash equilibrium, new economy, Nick Bostrom, pneumatic tube, power law, prediction markets, quantum cryptography, rent control, rent-seeking, reversible computing, risk tolerance, Silicon Valley, smart contracts, social distancing, statistical model, stem cell, Thomas Malthus, trade route, Turing test, Tyler Cowen, Vernor Vinge, William MacAskill

Such a database would hardly be possible if the differing jobs within each of these 974 categories were not very similar. In fact, a factor analysis of 226 of these descriptors finds that the top four factors account for 75% of the variance in these descriptors, and the top 15 factors account for 91% of this variance (Lee 2011). Also, statistical models to predict the income and performance of workers usually have at most only a few dozen parameters. These analyses have mostly been about post-skill types, that is, about how workers differ after they have been trained to do particular tasks. Pre-skill types should vary even less than do post-skill types.
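For readers unfamiliar with the variance claim above, a hedged sketch: factor-analytic or principal-component methods report how much of the total variance the leading factors capture, for instance via scikit-learn's explained_variance_ratio_. The matrix below is a random placeholder, not the real occupational descriptors, so its printed ratios will not match the figures quoted in the text.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)

# Placeholder matrix: rows are occupations, columns are descriptors.
# (The real analysis covered 974 occupations and 226 descriptors.)
ratings = rng.normal(size=(974, 226))

pca = PCA(n_components=15).fit(ratings)
print("variance explained by top 4 factors:",
      pca.explained_variance_ratio_[:4].sum())
print("variance explained by top 15 factors:",
      pca.explained_variance_ratio_.sum())
```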


pages: 444 words: 151,136

Endless Money: The Moral Hazards of Socialism by William Baker, Addison Wiggin

Alan Greenspan, Andy Kessler, asset allocation, backtesting, bank run, banking crisis, Bear Stearns, Berlin Wall, Bernie Madoff, Black Swan, bond market vigilante , book value, Branko Milanovic, bread and circuses, break the buck, Bretton Woods, BRICs, business climate, business cycle, capital asset pricing model, carbon tax, commoditize, corporate governance, correlation does not imply causation, credit crunch, Credit Default Swap, crony capitalism, cuban missile crisis, currency manipulation / currency intervention, debt deflation, Elliott wave, en.wikipedia.org, Fall of the Berlin Wall, feminist movement, fiat currency, fixed income, floating exchange rates, foreign exchange controls, Fractional reserve banking, full employment, German hyperinflation, Great Leap Forward, housing crisis, income inequality, index fund, inflation targeting, Joseph Schumpeter, Kickstarter, laissez-faire capitalism, land bank, land reform, liquidity trap, Long Term Capital Management, lost cosmonauts, low interest rates, McMansion, mega-rich, military-industrial complex, Money creation, money market fund, moral hazard, mortgage tax deduction, naked short selling, negative equity, offshore financial centre, Ponzi scheme, price stability, proprietary trading, pushing on a string, quantitative easing, RAND corporation, rent control, rent stabilization, reserve currency, risk free rate, riskless arbitrage, Ronald Reagan, Savings and loan crisis, school vouchers, seigniorage, short selling, Silicon Valley, six sigma, statistical arbitrage, statistical model, Steve Jobs, stocks for the long run, Tax Reform Act of 1986, The Great Moderation, the scientific method, time value of money, too big to fail, Two Sigma, upwardly mobile, War on Poverty, Yogi Berra, young professional

Being hedged, the fund loses its correlation with the overall market. But due to the mathematics, its sensitivity to the change in covariance of its positions is magnified fourfold. Probably half of all statistical arbitrage funds that deployed this strategy have moved on to greener pastures. But the use of value-at-risk statistical models to control exposure in hedge funds or even for large pension funds that allocate between different asset types continues, and it is virtually a mandatory exercise for institutional managers. There is hardly a large pension plan that has not developed a PowerPoint presentation that boasts it realigned its investments to increase excess return (alpha) and also reduced risk (variance).
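As context for the value-at-risk models mentioned here, this is a minimal, hedged sketch of the simplest (historical-simulation) variant: take a window of past portfolio returns and read off a lower percentile as the loss threshold. The return series and portfolio value below are simulated placeholders, not data from any fund.

```python
import numpy as np

rng = np.random.default_rng(7)

# Simulated daily portfolio returns (placeholder for a real P&L history).
returns = rng.normal(loc=0.0003, scale=0.01, size=750)
portfolio_value = 100_000_000  # hypothetical portfolio size

def historical_var(returns, value, confidence=0.99):
    """One-day value at risk by historical simulation: the loss at the
    (1 - confidence) percentile of past returns."""
    worst_return = np.percentile(returns, (1 - confidence) * 100)
    return -worst_return * value

print(f"1-day 99% VaR: {historical_var(returns, portfolio_value):,.0f}")
```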


pages: 543 words: 153,550

Model Thinker: What You Need to Know to Make Data Work for You by Scott E. Page

Airbnb, Albert Einstein, Alfred Russel Wallace, algorithmic trading, Alvin Roth, assortative mating, behavioural economics, Bernie Madoff, bitcoin, Black Swan, blockchain, business cycle, Capital in the Twenty-First Century by Thomas Piketty, Checklist Manifesto, computer age, corporate governance, correlation does not imply causation, cuban missile crisis, data science, deep learning, deliberate practice, discrete time, distributed ledger, Easter island, en.wikipedia.org, Estimating the Reproducibility of Psychological Science, Everything should be made as simple as possible, experimental economics, first-price auction, Flash crash, Ford Model T, Geoffrey West, Santa Fe Institute, germ theory of disease, Gini coefficient, Higgs boson, High speed trading, impulse control, income inequality, Isaac Newton, John von Neumann, Kenneth Rogoff, knowledge economy, knowledge worker, Long Term Capital Management, loss aversion, low skilled workers, Mark Zuckerberg, market design, meta-analysis, money market fund, multi-armed bandit, Nash equilibrium, natural language processing, Network effects, opioid epidemic / opioid crisis, p-value, Pareto efficiency, pattern recognition, Paul Erdős, Paul Samuelson, phenotype, Phillips curve, power law, pre–internet, prisoner's dilemma, race to the bottom, random walk, randomized controlled trial, Richard Feynman, Richard Thaler, Robert Solow, school choice, scientific management, sealed-bid auction, second-price auction, selection bias, six sigma, social graph, spectrum auction, statistical model, Stephen Hawking, Supply of New York City Cabdrivers, systems thinking, tacit knowledge, The Bell Curve by Richard Herrnstein and Charles Murray, The Great Moderation, the long tail, The Rise and Fall of American Growth, the rule of 72, the scientific method, The Spirit Level, the strength of weak ties, The Wisdom of Crowds, Thomas Malthus, Thorstein Veblen, Tragedy of the Commons, urban sprawl, value at risk, web application, winner-take-all economy, zero-sum game

A successful auction design had to be immune to strategic manipulation, generate efficient outcomes, and be comprehensible to participants. The economists used game theory models to analyze whether features could be exploited by strategic bidders, computer simulation models to compare the efficiency of various designs, and statistical models to choose parameters for experiments with real people. The final design, a multiple-round auction that allowed participants to back out of bids and prohibited sitting out early periods to mask intentions, proved successful. Over the past thirty years, the FCC has raised nearly $60 billion using this type of auction.10 REDCAPE: Communicate By creating a common representation, models improve communication.


pages: 661 words: 156,009

Your Computer Is on Fire by Thomas S. Mullaney, Benjamin Peters, Mar Hicks, Kavita Philip

"Susan Fowler" uber, 2013 Report for America's Infrastructure - American Society of Civil Engineers - 19 March 2013, A Declaration of the Independence of Cyberspace, affirmative action, Airbnb, algorithmic bias, AlphaGo, AltaVista, Amazon Mechanical Turk, Amazon Web Services, American Society of Civil Engineers: Report Card, An Inconvenient Truth, Asilomar, autonomous vehicles, Big Tech, bitcoin, Bletchley Park, blockchain, Boeing 737 MAX, book value, British Empire, business cycle, business process, Californian Ideology, call centre, Cambridge Analytica, carbon footprint, Charles Babbage, cloud computing, collective bargaining, computer age, computer vision, connected car, corporate governance, corporate social responsibility, COVID-19, creative destruction, cryptocurrency, dark matter, data science, Dennis Ritchie, deskilling, digital divide, digital map, don't be evil, Donald Davies, Donald Trump, Edward Snowden, en.wikipedia.org, European colonialism, fake news, financial innovation, Ford Model T, fulfillment center, game design, gentrification, George Floyd, glass ceiling, global pandemic, global supply chain, Grace Hopper, hiring and firing, IBM and the Holocaust, industrial robot, informal economy, Internet Archive, Internet of things, Jeff Bezos, job automation, John Perry Barlow, Julian Assange, Ken Thompson, Kevin Kelly, Kickstarter, knowledge economy, Landlord’s Game, Lewis Mumford, low-wage service sector, M-Pesa, Mark Zuckerberg, mass incarceration, Menlo Park, meta-analysis, mobile money, moral panic, move fast and break things, Multics, mutually assured destruction, natural language processing, Neal Stephenson, new economy, Norbert Wiener, off-the-grid, old-boy network, On the Economy of Machinery and Manufactures, One Laptop per Child (OLPC), packet switching, pattern recognition, Paul Graham, pink-collar, pneumatic tube, postindustrial economy, profit motive, public intellectual, QWERTY keyboard, Ray Kurzweil, Reflections on Trusting Trust, Report Card for America’s Infrastructure, Salesforce, sentiment analysis, Sheryl Sandberg, Silicon Valley, Silicon Valley ideology, smart cities, Snapchat, speech recognition, SQL injection, statistical model, Steve Jobs, Stewart Brand, tacit knowledge, tech worker, techlash, technoutopianism, telepresence, the built environment, the map is not the territory, Thomas L Friedman, TikTok, Triangle Shirtwaist Factory, undersea cable, union organizing, vertical integration, warehouse robotics, WikiLeaks, wikimedia commons, women in the workforce, Y2K

It’s important, dare I say imperative, that policy makers think through the implications of what it will mean when this kind of predictive software is embedded in decision-making robotics, like artificial police officers or military personnel that will be programmed to make potentially life-or-death decisions based on statistical modeling and a recognition of certain patterns or behaviors in targeted populations. Crawford and Shultz warn that the use of predictive modeling through gathering data on the public also poses a serious threat to privacy; they argue for new frameworks of “data due process” that would allow individuals a right to appeal the use of their data profiles.17 This could include where a person moves about, how one is captured in modeling technologies, and use of surveillance data for use in big data projects for behavioral predictive modeling such as in predictive policing software: Moreover, the predictions that these policing algorithms make—that particular geographic areas are more likely to have crime—will surely produce more arrests in those areas by directing police to patrol them.


pages: 598 words: 150,801

Snakes and Ladders: The Great British Social Mobility Myth by Selina Todd

assortative mating, Bletchley Park, Boris Johnson, collective bargaining, conceptual framework, coronavirus, COVID-19, deindustrialization, deskilling, DIY culture, emotional labour, Etonian, fear of failure, feminist movement, financial independence, full employment, Gini coefficient, greed is good, housing crisis, income inequality, Jeremy Corbyn, Kickstarter, Mahatma Gandhi, manufacturing employment, meritocracy, Nick Leeson, offshore financial centre, old-boy network, profit motive, rent control, Right to Buy, school choice, social distancing, statistical model, The Home Computer Revolution, The Spirit Level, traveling salesman, unpaid internship, upwardly mobile, urban sprawl, women in the workforce, Yom Kippur War, young professional

Most significantly, upward mobility rose dramatically after the Second World War when all these countries increased room at the top, by investing public money in job creation and welfare measures like free education.5 Britain did not offer fewer opportunities to be upwardly mobile than societies that are popularly assumed to be less class-bound, such as the United States. Since the 1980s, upward mobility has declined in both Britain and the USA, due to the destruction of many secure, reasonably well-paid jobs and the decimation of welfare provision and social security.6 My focus on Britain reflects the aim of this book. This is not to construct a statistical ‘model’ of social mobility that enables measurements of and between large populations – as many valuable studies have already done. Rather, I explore the historically specific circumstances that made it possible and desirable for some people to climb the ladder, and caused others to slide down it.


pages: 553 words: 153,028

The Vortex: A True Story of History's Deadliest Storm, an Unspeakable War, and Liberation by Scott Carney, Jason Miklian

anti-communist, back-to-the-land, Bob Geldof, British Empire, clean water, cuban missile crisis, Donald Trump, en.wikipedia.org, hive mind, index card, Kickstarter, Live Aid, low earth orbit, Mahatma Gandhi, mutually assured destruction, Neil Armstrong, rolodex, South China Sea, statistical model

He adapted storm-surge models into NHC hurricane forecasts for the first time, and designed a system to simply and directly inform the public what was coming, how seriously they should take it, and what to do. Frank’s work formed the foundation of United States hurricane action plans that have warned the American people for over fifty years. Frank helped move the NHC from what seemed like an alchemy-based organization to a hard-science paradise. They used statistical modeling in hurricane tracking for the first time, and bought a secret weapon: a state-of-the-art mainframe with a brand-new terminal interface. The eight-thousand-pound machine took up an entire room, and the quantum leap in computational power meant that the NHC could forecast forty-eight or even seventy-two hours out instead of just twelve.


pages: 595 words: 143,394

Rigged: How the Media, Big Tech, and the Democrats Seized Our Elections by Mollie Hemingway

2021 United States Capitol attack, active measures, Affordable Care Act / Obamacare, Airbnb, Bernie Sanders, Big Tech, Black Lives Matter, coronavirus, corporate governance, COVID-19, critical race theory, defund the police, deplatforming, disinformation, Donald Trump, fake news, George Floyd, global pandemic, illegal immigration, inventory management, lab leak, lockdown, machine readable, Mahatma Gandhi, Mark Zuckerberg, military-industrial complex, obamacare, Oculus Rift, Paris climate accords, Ponzi scheme, power law, QR code, race to the bottom, Ronald Reagan, Silicon Valley, Snapchat, statistical model, tech billionaire, TikTok

Getting them to the voting booth seemed like a comparatively easy task. Facebook data was just the starting point. In 2012, Sasha Issenberg wrote The Victory Lab: The Secret Science of Winning Campaigns, which discusses in detail “cutting edge persuasion experiments, innovative ways to mobilize voters, and statistical models predicting the behavior of every voter in the country.”17 It soon became clear that campaigns were engaged in much more concerning behavior than merely adapting to the smartphone era. The Obama campaign released an app to the public to help Obama volunteers canvass their neighborhoods. The app contained detailed and intimate information about people’s political tendencies, such as partisan affiliation.


pages: 630 words: 174,171

Caliban's War by James S. A. Corey

clean water, gravity well, phenotype, sensible shoes, statistical model, systems thinking

There was a massive upwelling of elemental iron in the northern hemisphere that lasted fourteen hours. There has also been a series of volcanic eruptions. Since the planet doesn’t have any tectonic motion, we’re assuming the protomolecule is doing something in the mantle, but we can’t tell what. The brains put together a statistical model that shows the approximate energy output expected for the changes we’ve seen. It suggests that the overall level of activity is rising about three hundred percent per year over the last eighteen months.” The secretary-general nodded, his expression grave. It was almost as if he’d understood any part of what she’d said.


pages: 512 words: 165,704

Traffic: Why We Drive the Way We Do (And What It Says About Us) by Tom Vanderbilt

Albert Einstein, autonomous vehicles, availability heuristic, Berlin Wall, Boeing 747, call centre, cellular automata, Cesare Marchetti: Marchetti’s constant, cognitive dissonance, computer vision, congestion charging, congestion pricing, Daniel Kahneman / Amos Tversky, DARPA: Urban Challenge, Donald Shoup, endowment effect, extreme commuting, fundamental attribution error, Garrett Hardin, Google Earth, hedonic treadmill, Herman Kahn, hindsight bias, hive mind, human-factors engineering, if you build it, they will come, impulse control, income inequality, Induced demand, invisible hand, Isaac Newton, Jane Jacobs, John Nash: game theory, Kenneth Arrow, lake wobegon effect, loss aversion, megacity, Milgram experiment, Nash equilibrium, PalmPilot, power law, Sam Peltzman, Silicon Valley, SimCity, statistical model, the built environment, The Death and Life of Great American Cities, Timothy McVeigh, traffic fines, Tragedy of the Commons, traumatic brain injury, ultimatum game, urban planning, urban sprawl, women in the workforce, working poor

Michael Schreckenberg, the German physicist known as the “jam professor,” has worked with officials in North Rhine–Westphalia in Germany to provide real-time information, as well as “predictive” traffic forecasts. Like Inrix, if less extensively, they have assembled some 360,000 “fundamental diagrams,” or precise statistical models of the flow behavior of highway sections. They have a good idea of what happens on not only a “normal” day but on all the strange variations: weeks when a holiday falls on Wednesday, the first day there is ice on the road (most people, he notes, will not have yet put on winter tires), the first day of daylight savings time, when a normally light morning trip may occur in the dark.


pages: 574 words: 164,509

Superintelligence: Paths, Dangers, Strategies by Nick Bostrom

agricultural Revolution, AI winter, Albert Einstein, algorithmic trading, anthropic principle, Anthropocene, anti-communist, artificial general intelligence, autism spectrum disorder, autonomous vehicles, backpropagation, barriers to entry, Bayesian statistics, bioinformatics, brain emulation, cloud computing, combinatorial explosion, computer vision, Computing Machinery and Intelligence, cosmological constant, dark matter, DARPA: Urban Challenge, data acquisition, delayed gratification, Demis Hassabis, demographic transition, different worldview, Donald Knuth, Douglas Hofstadter, driverless car, Drosophila, Elon Musk, en.wikipedia.org, endogenous growth, epigenetics, fear of failure, Flash crash, Flynn Effect, friendly AI, general purpose technology, Geoffrey Hinton, Gödel, Escher, Bach, hallucination problem, Hans Moravec, income inequality, industrial robot, informal economy, information retrieval, interchangeable parts, iterative process, job automation, John Markoff, John von Neumann, knowledge worker, Large Hadron Collider, longitudinal study, machine translation, megaproject, Menlo Park, meta-analysis, mutually assured destruction, Nash equilibrium, Netflix Prize, new economy, Nick Bostrom, Norbert Wiener, NP-complete, nuclear winter, operational security, optical character recognition, paperclip maximiser, pattern recognition, performance metric, phenotype, prediction markets, price stability, principal–agent problem, race to the bottom, random walk, Ray Kurzweil, recommendation engine, reversible computing, search costs, social graph, speech recognition, Stanislav Petrov, statistical model, stem cell, Stephen Hawking, Strategic Defense Initiative, strong AI, superintelligent machines, supervolcano, synthetic biology, technological singularity, technoutopianism, The Coming Technological Singularity, The Nature of the Firm, Thomas Kuhn: the structure of scientific revolutions, time dilation, Tragedy of the Commons, transaction costs, trolley problem, Turing machine, Vernor Vinge, WarGames: Global Thermonuclear War, Watson beat the top human players on Jeopardy!, World Values Survey, zero-sum game

Optical character recognition of handwritten and typewritten text is routinely used in applications such as mail sorting and digitization of old documents.66 Machine translation remains imperfect but is good enough for many applications. Early systems used the GOFAI approach of hand-coded grammars that had to be developed by skilled linguists from the ground up for each language. Newer systems use statistical machine learning techniques that automatically build statistical models from observed usage patterns. The machine infers the parameters for these models by analyzing bilingual corpora. This approach dispenses with linguists: the programmers building these systems need not even speak the languages they are working with.67 Face recognition has improved sufficiently in recent years that it is now used at automated border crossings in Europe and Australia.
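A hedged toy illustration of the "infer parameters from bilingual corpora" idea (nothing like a production system) is to count co-occurrences of word pairs in aligned sentence pairs and normalize them into translation probabilities. The two-sentence corpus below is invented purely for illustration.

```python
from collections import Counter, defaultdict

# Invented aligned sentence pairs (source, target).
corpus = [
    ("the house", "das haus"),
    ("the book", "das buch"),
]

# Count how often each source word co-occurs with each target word.
cooccurrence = defaultdict(Counter)
for src, tgt in corpus:
    for s in src.split():
        for t in tgt.split():
            cooccurrence[s][t] += 1

# Normalize counts into crude translation probabilities p(t | s).
translation_prob = {
    s: {t: c / sum(counts.values()) for t, c in counts.items()}
    for s, counts in cooccurrence.items()
}
print(translation_prob["the"])   # 'das' comes out as the most probable translation
```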


Digital Accounting: The Effects of the Internet and Erp on Accounting by Ashutosh Deshmukh

accounting loophole / creative accounting, AltaVista, book value, business continuity plan, business intelligence, business logic, business process, call centre, computer age, conceptual framework, corporate governance, currency risk, data acquisition, disinformation, dumpster diving, fixed income, hypertext link, information security, interest rate swap, inventory management, iterative process, late fees, machine readable, money market fund, new economy, New Journalism, optical character recognition, packet switching, performance metric, profit maximization, semantic web, shareholder value, six sigma, statistical model, supply chain finance, supply-chain management, supply-chain management software, telemarketer, transaction costs, value at risk, vertical integration, warehouse automation, web application, Y2K

[Figure: business intelligence architecture. ERP system data passes through business intelligence tools for data extraction, transformation, and load into a business information warehouse; the warehouse, together with business intelligence and OLAP metadata, feeds reports (key performance measures, ad hoc queries), OLAP analysis (business logic, mathematical/statistical models, data mining), executive and management dashboards, executive information systems, and pre-packaged solutions such as planning and budgeting, consolidations, financial analytics, ABC/ABM, balanced scorecard, and corporate performance management.] These tools were soon superseded by specialized report writing tools and analytical tools, which have now evolved into a new category of Business Intelligence (BI) tools; Crystal Reports/Business Objects and Cognos are examples of leading software vendors in this area.


pages: 543 words: 157,991

All the Devils Are Here by Bethany McLean

Alan Greenspan, Asian financial crisis, asset-backed security, bank run, Bear Stearns, behavioural economics, Black-Scholes formula, Blythe Masters, break the buck, buy and hold, call centre, Carl Icahn, collateralized debt obligation, corporate governance, corporate raider, Credit Default Swap, credit default swaps / collateralized debt obligations, currency risk, diversification, Dr. Strangelove, Exxon Valdez, fear of failure, financial innovation, fixed income, Glass-Steagall Act, high net worth, Home mortgage interest deduction, interest rate swap, junk bonds, Ken Thompson, laissez-faire capitalism, Long Term Capital Management, low interest rates, margin call, market bubble, market fundamentalism, Maui Hawaii, Michael Milken, money market fund, moral hazard, mortgage debt, Northern Rock, Own Your Own Home, Ponzi scheme, proprietary trading, quantitative trading / quantitative finance, race to the bottom, risk/return, Ronald Reagan, Rosa Parks, Savings and loan crisis, shareholder value, short selling, South Sea Bubble, statistical model, stock buybacks, tail risk, Tax Reform Act of 1986, telemarketer, the long tail, too big to fail, value at risk, zero-sum game

Merrill did a number of these deals with Magnetar. The performance of these CDOs can be summed up in one word: horrible. The essence of the ProPublica allegation is that Magnetar, like Paulson, was betting that “its” CDOs would implode. Magnetar denies that this was its intent and claims that its strategy was based on a “mathematical statistical model.” The firm says it would have done well regardless of the direction of the market. It almost doesn’t matter. The triple-As did blow up. You didn’t have to be John Paulson, picking out the securities you were then going to short, to make a fortune in this trade. Given that the CDOs referenced poorly underwritten subprime mortgages, they had to blow up, almost by definition.


pages: 708 words: 176,708

The WikiLeaks Files: The World According to US Empire by Wikileaks

affirmative action, anti-communist, banking crisis, battle of ideas, Boycotts of Israel, Bretton Woods, British Empire, capital controls, central bank independence, Chelsea Manning, colonial exploitation, colonial rule, corporate social responsibility, credit crunch, cuban missile crisis, Deng Xiaoping, drone strike, Edward Snowden, energy security, energy transition, European colonialism, eurozone crisis, experimental subject, F. W. de Klerk, facts on the ground, failed state, financial innovation, Food sovereignty, Francis Fukuyama: the end of history, full employment, future of journalism, high net worth, invisible hand, Julian Assange, Kickstarter, liberal world order, Mikhail Gorbachev, millennium bug, Mohammed Bouazizi, Monroe Doctrine, Nelson Mandela, no-fly zone, Northern Rock, nuclear ambiguity, Philip Mirowski, post-war consensus, RAND corporation, Ronald Reagan, Seymour Hersh, Silicon Valley, South China Sea, statistical model, Strategic Defense Initiative, structural adjustment programs, too big to fail, trade liberalization, trade route, UNCLOS, UNCLOS, uranium enrichment, vertical integration, Washington Consensus, WikiLeaks, zero-sum game, éminence grise

More than this, however, the disruption to the old oligarchic rule represented by Allende, and the Pinochet regime’s relative autonomy from the business class, enabled the dictatorship to restructure industry in such a way as to displace the dominance of old mining and industrial capital. This was part of a global trend, as investors everywhere felt shackled by the old statist models of development. They demanded the reorganization of industry, the freeing up of the financial sector, and the opening of international markets. In place of the old economic model of “import substitution,” protecting and developing the nation’s industries to overcome dependence on imports, a new model of “export-led growth” was implemented, in which domestic consumption was suppressed so that goods could be more profitably exported abroad.103 The WikiLeaks documents, taken together with previous historical findings, show us a US government immensely relieved by the Pinochet coup, and desperate to work with the new regime.


pages: 687 words: 189,243

A Culture of Growth: The Origins of the Modern Economy by Joel Mokyr

Andrei Shleifer, barriers to entry, Berlin Wall, business cycle, classic study, clockwork universe, cognitive dissonance, Copley Medal, creative destruction, David Ricardo: comparative advantage, delayed gratification, deliberate practice, Deng Xiaoping, Edmond Halley, Edward Jenner, epigenetics, Fellow of the Royal Society, financial independence, flying shuttle, framing effect, germ theory of disease, Haber-Bosch Process, Herbert Marcuse, hindsight bias, income inequality, information asymmetry, invention of movable type, invention of the printing press, invisible hand, Isaac Newton, Jacquard loom, Jacques de Vaucanson, James Watt: steam engine, Johannes Kepler, John Harrison: Longitude, Joseph Schumpeter, knowledge economy, labor-force participation, land tenure, law of one price, Menlo Park, moveable type in China, new economy, phenotype, price stability, principal–agent problem, rent-seeking, Republic of Letters, Robert Solow, Ronald Reagan, seminal paper, South Sea Bubble, statistical model, survivorship bias, tacit knowledge, the market place, the strength of weak ties, The Structural Transformation of the Public Sphere, The Wealth of Nations by Adam Smith, transaction costs, ultimatum game, World Values Survey, Wunderkammern

The obvious reason is that social knowledge depends on specialization, simply because the set of total knowledge is far too large for a single mind to comprehend. Complex social and physical processes are often impossible for laypersons to comprehend, yet the information may be essential to guide certain important behaviors. Subtle statistical models and sophisticated experimentation may be needed to discriminate between important hypotheses about, say, the effects of certain foods on human health or the causes of crime. Especially for propositional knowledge (the knowledge underpinning techniques in use), authorities and the division of knowledge are indispensable because such knowledge can operate effectively only if a fine subdivision of knowledge through specialization is practiced.


pages: 667 words: 186,968

The Great Influenza: The Story of the Deadliest Pandemic in History by John M. Barry

Albert Einstein, Brownian motion, centralized clearinghouse, conceptual framework, coronavirus, discovery of penicillin, double helix, Edward Jenner, Fellow of the Royal Society, germ theory of disease, index card, Louis Pasteur, Marshall McLuhan, Mason jar, means of production, scientific management, seminal paper, statistical model, the medium is the message, the scientific method, traveling salesman, women in the workforce

The CDC based that range, however, on different estimates of the effectiveness and availability of a vaccine and of the age groups most vulnerable to the virus. It did not factor in the most important determinant of deaths: the lethality of the virus itself. The CDC simply figured virulence by computing an average from the last three pandemics, those in 1918, 1957, and 1968. Yet two of those three real pandemics fall outside the range of the statistical model. The 1968 pandemic was less lethal than the best case scenario, and the 1918 pandemic was more lethal than the worst case scenario. After adjusting for population growth, the 1918 virus killed four times as many as the CDC’s worst case scenario, and medical advances cannot now significantly mitigate the killing impact of a virus that lethal.


pages: 584 words: 187,436

More Money Than God: Hedge Funds and the Making of a New Elite by Sebastian Mallaby

Alan Greenspan, Andrei Shleifer, Asian financial crisis, asset-backed security, automated trading system, bank run, barriers to entry, Bear Stearns, Benoit Mandelbrot, Berlin Wall, Bernie Madoff, Big bang: deregulation of the City of London, Bonfire of the Vanities, book value, Bretton Woods, business cycle, buy and hold, capital controls, Carmen Reinhart, collapse of Lehman Brothers, collateralized debt obligation, computerized trading, corporate raider, Credit Default Swap, credit default swaps / collateralized debt obligations, crony capitalism, currency manipulation / currency intervention, currency peg, deal flow, do well by doing good, Elliott wave, Eugene Fama: efficient market hypothesis, failed state, Fall of the Berlin Wall, financial deregulation, financial engineering, financial innovation, financial intermediation, fixed income, full employment, German hyperinflation, High speed trading, index fund, Jim Simons, John Bogle, John Meriwether, junk bonds, Kenneth Rogoff, Kickstarter, Long Term Capital Management, low interest rates, machine translation, margin call, market bubble, market clearing, market fundamentalism, Market Wizards by Jack D. Schwager, Mary Meeker, merger arbitrage, Michael Milken, money market fund, moral hazard, Myron Scholes, natural language processing, Network effects, new economy, Nikolai Kondratiev, operational security, pattern recognition, Paul Samuelson, pre–internet, proprietary trading, public intellectual, quantitative hedge fund, quantitative trading / quantitative finance, random walk, Renaissance Technologies, Richard Thaler, risk-adjusted returns, risk/return, Robert Mercer, rolodex, Savings and loan crisis, Sharpe ratio, short selling, short squeeze, Silicon Valley, South Sea Bubble, sovereign wealth fund, statistical arbitrage, statistical model, survivorship bias, tail risk, technology bubble, The Great Moderation, The Myth of the Rational Market, the new new thing, too big to fail, transaction costs, two and twenty, uptick rule

See also Steven Drobny, Inside the House of Money: Top Hedge Fund Traders on Profiting in the Global Markets, (Hoboken, NJ: John Wiley & Sons, 2006), p. 174. 16. Wadhwani recalls, “Often it was the case that you were already using the input variables these guys were talking about, but you were perhaps using these input variables in a more naive way in your statistical model than the way they were actually using it.” Wadhwani interview. 17. Mahmood Pradhan, who worked with Wadhwani at Tudor, elaborates: “There are times when particular variables explain certain asset prices, and there are times when other things determine the price. So you need to understand when your model is working and when it isn’t.


pages: 651 words: 180,162

Antifragile: Things That Gain From Disorder by Nassim Nicholas Taleb

"World Economic Forum" Davos, Air France Flight 447, Alan Greenspan, Andrei Shleifer, anti-fragile, banking crisis, Benoit Mandelbrot, Berlin Wall, biodiversity loss, Black Swan, business cycle, caloric restriction, caloric restriction, Chuck Templeton: OpenTable:, commoditize, creative destruction, credit crunch, Daniel Kahneman / Amos Tversky, David Ricardo: comparative advantage, discrete time, double entry bookkeeping, Emanuel Derman, epigenetics, fail fast, financial engineering, financial independence, Flash crash, flying shuttle, Gary Taubes, George Santayana, Gini coefficient, Helicobacter pylori, Henri Poincaré, Higgs boson, high net worth, hygiene hypothesis, Ignaz Semmelweis: hand washing, informal economy, invention of the wheel, invisible hand, Isaac Newton, James Hargreaves, Jane Jacobs, Jim Simons, joint-stock company, joint-stock limited liability company, Joseph Schumpeter, Kenneth Arrow, knowledge economy, language acquisition, Lao Tzu, Long Term Capital Management, loss aversion, Louis Pasteur, mandelbrot fractal, Marc Andreessen, Mark Spitznagel, meta-analysis, microbiome, money market fund, moral hazard, mouse model, Myron Scholes, Norbert Wiener, pattern recognition, Paul Samuelson, placebo effect, Ponzi scheme, Post-Keynesian economics, power law, principal–agent problem, purchasing power parity, quantitative trading / quantitative finance, Ralph Nader, random walk, Ray Kurzweil, rent control, Republic of Letters, Ronald Reagan, Rory Sutherland, Rupert Read, selection bias, Silicon Valley, six sigma, spinning jenny, statistical model, Steve Jobs, Steven Pinker, Stewart Brand, stochastic process, stochastic volatility, synthetic biology, tacit knowledge, tail risk, Thales and the olive presses, Thales of Miletus, The Great Moderation, the new new thing, The Wealth of Nations by Adam Smith, Thomas Bayes, Thomas Malthus, too big to fail, transaction costs, urban planning, Vilfredo Pareto, Yogi Berra, Zipf's Law

Franklin, James, 2001, The Science of Conjecture: Evidence and Probability Before Pascal. Baltimore: Johns Hopkins University Press. Freedman, D. A., and D. B. Petitti, 2001, “Salt and Blood Pressure: Conventional Wisdom Reconsidered.” Evaluation Review 25(3): 267–287. Freedman, D., D. Collier, et al., 2010, Statistical Models and Causal Inference: A Dialogue with the Social Sciences. Cambridge: Cambridge University Press. Freeman, C., and L. Soete, 1997, The Economics of Industrial Innovation. London: Routledge. Freidson, Eliot, 1970, Profession of Medicine: A Study of the Sociology of Applied Knowledge. Chicago: University of Chicago Press.


pages: 652 words: 172,428

Aftershocks: Pandemic Politics and the End of the Old International Order by Colin Kahl, Thomas Wright

"World Economic Forum" Davos, 2021 United States Capitol attack, banking crisis, Berlin Wall, biodiversity loss, Black Lives Matter, Boris Johnson, British Empire, Carmen Reinhart, centre right, Charles Lindbergh, circular economy, citizen journalism, clean water, collapse of Lehman Brothers, colonial rule, contact tracing, contact tracing app, coronavirus, COVID-19, creative destruction, cuban missile crisis, deglobalization, digital rights, disinformation, Donald Trump, drone strike, eurozone crisis, failed state, fake news, Fall of the Berlin Wall, fear of failure, future of work, George Floyd, German hyperinflation, Gini coefficient, global pandemic, global supply chain, global value chain, income inequality, industrial robot, informal economy, Intergovernmental Panel on Climate Change (IPCC), Internet of things, it's over 9,000, job automation, junk bonds, Kibera, lab leak, liberal world order, lockdown, low interest rates, Mahatma Gandhi, Martin Wolf, mass immigration, megacity, mobile money, oil shale / tar sands, oil shock, one-China policy, open borders, open economy, Paris climate accords, public intellectual, Ronald Reagan, social distancing, South China Sea, spice trade, statistical model, subprime mortgage crisis, W. E. B. Du Bois, World Values Survey, zoonotic diseases

Research has consistently identified several indicators associated with higher levels of state fragility and civil strife, including poor health, low per capita income, economic vulnerability produced by dependence on oil and other natural resources, low levels of international trade, government discrimination, democratic backsliding, and instability in neighboring countries—and all of these were exacerbated by the pandemic. In July, for example, a group of conflict researchers at the University of Denver’s Korbel School of International Studies updated a statistical model of internal war to include the possible effects of COVID-19. Prior to the pandemic, their statistical simulation—which incorporated a wide array of human and social development indicators—predicted that the number of armed conflicts around the world would plateau or even decline starting in 2020 and continue on that path through the remainder of the decade.


pages: 701 words: 199,010

The Crisis of Crowding: Quant Copycats, Ugly Models, and the New Crash Normal by Ludwig B. Chincarini

affirmative action, Alan Greenspan, asset-backed security, automated trading system, bank run, banking crisis, Basel III, Bear Stearns, Bernie Madoff, Black-Scholes formula, Bob Litterman, business cycle, buttonwood tree, Carmen Reinhart, central bank independence, collapse of Lehman Brothers, collateralized debt obligation, collective bargaining, corporate governance, correlation coefficient, Credit Default Swap, credit default swaps / collateralized debt obligations, currency risk, delta neutral, discounted cash flows, diversification, diversified portfolio, family office, financial engineering, financial innovation, financial intermediation, fixed income, Flash crash, full employment, Gini coefficient, Glass-Steagall Act, global macro, high net worth, hindsight bias, housing crisis, implied volatility, income inequality, interest rate derivative, interest rate swap, John Meriwether, Kickstarter, liquidity trap, London Interbank Offered Rate, Long Term Capital Management, low interest rates, low skilled workers, managed futures, margin call, market design, market fundamentalism, merger arbitrage, Mexican peso crisis / tequila crisis, Mitch Kapor, money market fund, moral hazard, mortgage debt, Myron Scholes, National best bid and offer, negative equity, Northern Rock, Occupy movement, oil shock, price stability, proprietary trading, quantitative easing, quantitative hedge fund, quantitative trading / quantitative finance, Ralph Waldo Emerson, regulatory arbitrage, Renaissance Technologies, risk free rate, risk tolerance, risk-adjusted returns, Robert Shiller, Ronald Reagan, Sam Peltzman, Savings and loan crisis, Sharpe ratio, short selling, sovereign wealth fund, speech recognition, statistical arbitrage, statistical model, survivorship bias, systematic trading, tail risk, The Great Moderation, too big to fail, transaction costs, value at risk, yield curve, zero-coupon bond

TABLE 15.4 Annualized Returns of Hedge Fund Strategies and Major Indices
Notes
1. His stay as president of the large hedge fund Paloma Partners was short lived, and he eventually teamed up with LTCM alum Robert Shustak and the fund’s former controller, Bruce Wilson, to start Quantitative Alternatives LLC, in Rye Brook, New York. Their plan was to use statistical models for trading strategies much like those employed by LTCM. The fund never raised enough funds, and the three partners folded the operation at the end of 2008. Rosenfeld is now retired, but teaches part-time at MIT’s Sloan School of Management.
2. This phenomenon was discussed in Chapter 9.
3.


Cultural Backlash: Trump, Brexit, and Authoritarian Populism by Pippa Norris, Ronald Inglehart

affirmative action, Affordable Care Act / Obamacare, bank run, banking crisis, Berlin Wall, Bernie Sanders, Black Lives Matter, Boris Johnson, Brexit referendum, Cass Sunstein, centre right, classic study, cognitive dissonance, conceptual framework, declining real wages, desegregation, digital divide, Donald Trump, eurozone crisis, fake news, Fall of the Berlin Wall, feminist movement, first-past-the-post, illegal immigration, immigration reform, income inequality, It's morning again in America, Jeremy Corbyn, job automation, knowledge economy, labor-force participation, land reform, liberal world order, longitudinal study, low skilled workers, machine readable, mass immigration, meta-analysis, obamacare, open borders, open economy, opioid epidemic / opioid crisis, Paris climate accords, post-industrial society, post-materialism, precariat, purchasing power parity, rising living standards, Ronald Reagan, sexual politics, Silicon Valley, statistical model, stem cell, Steve Bannon, War on Poverty, white flight, winner-take-all economy, women in the workforce, working-age population, World Values Survey, zero-sum game

New York: Palgrave Macmillan. Golder, Matthew. 2003. ‘Explaining variation in the success of extreme right parties in Western Europe.’ Comparative Political Studies, 36(4): 432–466. 2016. ‘Far right parties in Europe.’ Annual Review of Political Science, 19: 477–497. Goldstein, Harvey. 1995. Multilevel Statistical Models. 3rd Edn. New York: Halstead Press. Golsan, Richard J. Ed. 1995. Fascism’s Return: Scandal, Revision and Ideology since 1980. Lincoln, NE: University of Nebraska Press. Goodhart, David. 2017. The Road to Somewhere: The Populist Revolt and the Future of Politics. London: Hurst & Company. Goodwin, Matthew J. 2006.


pages: 685 words: 203,949

The Organized Mind: Thinking Straight in the Age of Information Overload by Daniel J. Levitin

Abraham Maslow, airport security, Albert Einstein, Amazon Mechanical Turk, Anton Chekhov, autism spectrum disorder, Bayesian statistics, behavioural economics, big-box store, business process, call centre, Claude Shannon: information theory, cloud computing, cognitive bias, cognitive load, complexity theory, computer vision, conceptual framework, correlation does not imply causation, crowdsourcing, cuban missile crisis, Daniel Kahneman / Amos Tversky, data science, deep learning, delayed gratification, Donald Trump, en.wikipedia.org, epigenetics, Eratosthenes, Exxon Valdez, framing effect, friendly fire, fundamental attribution error, Golden Gate Park, Google Glasses, GPS: selective availability, haute cuisine, How many piano tuners are there in Chicago?, human-factors engineering, if you see hoof prints, think horses—not zebras, impulse control, index card, indoor plumbing, information retrieval, information security, invention of writing, iterative process, jimmy wales, job satisfaction, Kickstarter, language acquisition, Lewis Mumford, life extension, longitudinal study, meta-analysis, more computing power than Apollo, Network effects, new economy, Nicholas Carr, optical character recognition, Pareto efficiency, pattern recognition, phenotype, placebo effect, pre–internet, profit motive, randomized controlled trial, Rubik’s Cube, Salesforce, shared worldview, Sheryl Sandberg, Skype, Snapchat, social intelligence, statistical model, Steve Jobs, supply-chain management, the scientific method, The Wealth of Nations by Adam Smith, The Wisdom of Crowds, theory of mind, Thomas Bayes, traumatic brain injury, Turing test, Twitter Arab Spring, ultimatum game, Wayback Machine, zero-sum game

Disk failures in the real world: What does an MTTF of 1,000,000 hours mean to you? Proceedings of the 5th USENIX Conference on File and Storage Technologies (FAST), Mountain View, CA. Retrieved from http://www.pdl.cmu.edu/ftp/Failure/failure-fast07.pdf See also: He, Z., Yang, H., & Xie, M. (2012, October). Statistical modeling and analysis of hard disk drives (HDDs) failure. Institute of Electrical and Electronics Engineers APMRC, pp. 1–2. suffer a disk failure within two years Vishwanath, K. V., & Nagappan, N. (2010). Characterizing cloud computing hardware reliability. In Proceedings of the 1st ACM symposium on cloud computing.


pages: 691 words: 203,236

Whiteshift: Populism, Immigration and the Future of White Majorities by Eric Kaufmann

4chan, Abraham Maslow, affirmative action, Amazon Mechanical Turk, anti-communist, anti-globalists, augmented reality, battle of ideas, behavioural economics, Berlin Wall, Bernie Sanders, Boris Johnson, Brexit referendum, British Empire, centre right, Chelsea Manning, cognitive dissonance, complexity theory, corporate governance, correlation does not imply causation, critical race theory, crowdsourcing, Daniel Kahneman / Amos Tversky, data science, David Brooks, deindustrialization, demographic transition, Donald Trump, Elon Musk, en.wikipedia.org, facts on the ground, failed state, fake news, Fall of the Berlin Wall, first-past-the-post, Francis Fukuyama: the end of history, gentrification, Great Leap Forward, Haight Ashbury, Herbert Marcuse, illegal immigration, immigration reform, imperial preference, income inequality, it's over 9,000, Jeremy Corbyn, knowledge economy, knowledge worker, liberal capitalism, longitudinal study, Lyft, mass immigration, meta-analysis, microaggression, moral panic, Nate Silver, New Urbanism, Norman Mailer, open borders, open immigration, opioid epidemic / opioid crisis, Overton Window, phenotype, postnationalism / post nation state, Ralph Waldo Emerson, Republic of Letters, Ronald Reagan, Scientific racism, Silicon Valley, Social Justice Warrior, statistical model, Steve Bannon, Steven Pinker, the built environment, the scientific method, The Wisdom of Crowds, transcontinental railway, twin studies, uber lyft, upwardly mobile, urban sprawl, W. E. B. Du Bois, Washington Consensus, white flight, working-age population, World Values Survey, young professional

Naturally there are exceptions like Brixton in London or Brooklyn, New York, where gentrification has taken place. This shows up as the line of dots on the left side of the American graph where there is a spike of places that were less than 10 per cent white in 2000 but had rapid white growth in the 2000s. Still, the overwhelming story, which the statistical models tell, is one in which whites are moving towards the most heavily white neighbourhoods. An identical pattern can be found in Stockholm neighbourhoods in the 1990s, and appears to hold within many American cities.31 We see it as well in urban British Columbia and Ontario, Canada, in figure 9.5.


pages: 741 words: 199,502

Human Diversity: The Biology of Gender, Race, and Class by Charles Murray

23andMe, affirmative action, Albert Einstein, Alfred Russel Wallace, Asperger Syndrome, assortative mating, autism spectrum disorder, basic income, behavioural economics, bioinformatics, Cass Sunstein, correlation coefficient, CRISPR, Daniel Kahneman / Amos Tversky, dark triade / dark tetrad, domesticated silver fox, double helix, Drosophila, emotional labour, epigenetics, equal pay for equal work, European colonialism, feminist movement, glass ceiling, Gregor Mendel, Gunnar Myrdal, income inequality, Kenneth Arrow, labor-force participation, longitudinal study, meritocracy, meta-analysis, nudge theory, out of africa, p-value, phenotype, public intellectual, publication bias, quantitative hedge fund, randomized controlled trial, Recombinant DNA, replication crisis, Richard Thaler, risk tolerance, school vouchers, Scientific racism, selective serotonin reuptake inhibitor (SSRI), Silicon Valley, Skinner box, social intelligence, Social Justice Warrior, statistical model, Steven Pinker, The Bell Curve by Richard Herrnstein and Charles Murray, the scientific method, The Wealth of Nations by Adam Smith, theory of mind, Thomas Kuhn: the structure of scientific revolutions, twin studies, universal basic income, working-age population

“Evolutionary Framework for Identifying Sex-and Species-Specific Vulnerabilities in Brain Development and Functions.” Journal of Neuroscience Research 95 (1–2): 355–61. Geddes, Patrick, and J. Arthur Thomson. 1889. The Evolution of Sex. New York: Humboldt Publishing. Gelman, Andrew. 2018. “You Need 16 Times the Sample Size to Estimate an Interaction Than to Estimate a Main Effect.” Statistical Modeling, Causal Inference, and Social Science (March 15). Geschwind, Norman, and Albert M. Galaburda. 1985. “Cerebral Lateralization, Biological Mechanisms, Associations, and Pathology: I. A Hypothesis and a Program for Research.” Archive of Neurology 42 (5): 428–59. Giedd, Jay N., Armin Raznahan, Aaron Alexander-Bloch et al. 2014.


pages: 1,409 words: 205,237

Architecting Modern Data Platforms: A Guide to Enterprise Hadoop at Scale by Jan Kunigk, Ian Buss, Paul Wilkinson, Lars George

Amazon Web Services, barriers to entry, bitcoin, business intelligence, business logic, business process, cloud computing, commoditize, computer vision, continuous integration, create, read, update, delete, data science, database schema, Debian, deep learning, DevOps, domain-specific language, fault tolerance, Firefox, FOSDEM, functional programming, Google Chrome, Induced demand, information security, Infrastructure as a Service, Internet of things, job automation, Kickstarter, Kubernetes, level 1 cache, loose coupling, microservices, natural language processing, Network effects, platform as a service, single source of truth, source of truth, statistical model, vertical integration, web application

Another common challenge when transitioning analytics problems to the Hadoop realm is that analysts need to master the various data formats that are used in Hadoop, since, for example, the previously dominant model of cubing data is almost never used in Hadoop. Data scientists typically also need extensive experience with SQL as a tool to drill down into the datasets that they require to build statistical models, via SparkSQL, Hive, or Impala. Machine learning and deep learning Simply speaking, machine learning is where the rubber of big data analytics hits the road. While certainly a hyped term, machine learning goes beyond classic statistics, with more advanced algorithms that predict an outcome by learning from the data—often without explicitly being programmed.
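
A rough illustration of that kind of SQL drill-down via SparkSQL is sketched below. The table and column names (an "events" table with user_id and event_date columns) are assumptions made for the example, not anything from the book; an equivalent query could be issued through Hive or Impala.

    from pyspark.sql import SparkSession

    # Hypothetical drill-down: aggregate a registered "events" table with Spark SQL.
    spark = SparkSession.builder.appName("drill-down-example").getOrCreate()
    summary = spark.sql("""
        SELECT user_id, COUNT(*) AS n_events
        FROM events                      -- assumed table in the metastore
        WHERE event_date >= '2024-01-01'
        GROUP BY user_id
        ORDER BY n_events DESC
        LIMIT 20
    """)
    summary.show()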


pages: 1,294 words: 210,361

The Emperor of All Maladies: A Biography of Cancer by Siddhartha Mukherjee

Apollo 11, Barry Marshall: ulcers, belling the cat, conceptual framework, discovery of penicillin, experimental subject, government statistician, Great Leap Forward, Gregor Mendel, Helicobacter pylori, iterative process, Joan Didion, life extension, longitudinal study, Louis Pasteur, medical residency, meta-analysis, mouse model, New Journalism, phenotype, Plato's cave, randomized controlled trial, Recombinant DNA, Robert Mercer, scientific mainstream, Silicon Valley, social contagion, social web, statistical model, stem cell, women in the workforce, Year of Magical Thinking, éminence grise

Whose victory was this—a victory of prevention or of therapeutic intervention?* Berry’s answer was a long-due emollient to a field beset by squabbles between the advocates of prevention and the proponents of chemotherapy. When Berry assessed the effect of each intervention independently using statistical models, it was a satisfying tie: both cancer prevention and chemotherapy had diminished breast cancer mortality equally—12 percent for mammography and 12 percent for chemotherapy, adding up to the observed 24 percent reduction in mortality. “No one,” as Berry said, paraphrasing the Bible, “had labored in vain.”


pages: 843 words: 223,858

The Rise of the Network Society by Manuel Castells

air traffic controllers' union, Alan Greenspan, Apple II, Asian financial crisis, barriers to entry, Big bang: deregulation of the City of London, Bob Noyce, borderless world, British Empire, business cycle, capital controls, classic study, complexity theory, computer age, Computer Lib, computerized trading, content marketing, creative destruction, Credit Default Swap, declining real wages, deindustrialization, delayed gratification, dematerialisation, deskilling, digital capitalism, digital divide, disintermediation, double helix, Douglas Engelbart, edge city, experimental subject, export processing zone, Fairchild Semiconductor, financial deregulation, financial independence, floating exchange rates, future of work, gentrification, global village, Gunnar Myrdal, Hacker Ethic, hiring and firing, Howard Rheingold, illegal immigration, income inequality, independent contractor, Induced demand, industrial robot, informal economy, information retrieval, intermodal, invention of the steam engine, invention of the telephone, inventory management, Ivan Sutherland, James Watt: steam engine, job automation, job-hopping, John Markoff, John Perry Barlow, Kanban, knowledge economy, knowledge worker, labor-force participation, laissez-faire capitalism, Leonard Kleinrock, longitudinal study, low skilled workers, manufacturing employment, Marc Andreessen, Marshall McLuhan, means of production, megacity, Menlo Park, military-industrial complex, moral panic, new economy, New Urbanism, offshore financial centre, oil shock, open economy, packet switching, Pearl River Delta, peer-to-peer, planetary scale, popular capitalism, popular electronics, post-Fordism, post-industrial society, Post-Keynesian economics, postindustrial economy, prediction markets, Productivity paradox, profit maximization, purchasing power parity, RAND corporation, Recombinant DNA, Robert Gordon, Robert Metcalfe, Robert Solow, seminal paper, Shenzhen special economic zone , Shoshana Zuboff, Silicon Valley, Silicon Valley startup, social software, South China Sea, South of Market, San Francisco, special economic zone, spinning jenny, statistical model, Steve Jobs, Steve Wozniak, Strategic Defense Initiative, tacit knowledge, technological determinism, Ted Nelson, the built environment, the medium is the message, the new new thing, The Wealth of Nations by Adam Smith, Thomas Kuhn: the structure of scientific revolutions, total factor productivity, trade liberalization, transaction costs, urban renewal, urban sprawl, vertical integration, work culture , zero-sum game

What matters for our research purposes are two teachings from this fundamental experience of interrupted technological development: on the one hand, the state can be, and has been in history, in China and elsewhere, a leading force for technological innovation; on the other hand, precisely because of this, when the state reverses its interest in technological development, or becomes unable to perform it under new conditions, a statist model of innovation leads to stagnation, because of the sterilization of society’s autonomous innovative energy to create and apply technology. That the Chinese state could, centuries later, build anew an advanced technological basis, in nuclear technology, missiles, satellite launching, and electronics,13 demonstrates again the emptiness of a predominantly cultural interpretation of technological development and backwardness: the same culture may induce very different technological trajectories depending on the pattern of relationships between state and society.


Engineering Security by Peter Gutmann

active measures, address space layout randomization, air gap, algorithmic trading, Amazon Web Services, Asperger Syndrome, bank run, barriers to entry, bitcoin, Brian Krebs, business process, call centre, card file, cloud computing, cognitive bias, cognitive dissonance, cognitive load, combinatorial explosion, Credit Default Swap, crowdsourcing, cryptocurrency, Daniel Kahneman / Amos Tversky, Debian, domain-specific language, Donald Davies, Donald Knuth, double helix, Dr. Strangelove, Dunning–Kruger effect, en.wikipedia.org, endowment effect, false flag, fault tolerance, Firefox, fundamental attribution error, George Akerlof, glass ceiling, GnuPG, Google Chrome, Hacker News, information security, iterative process, Jacob Appelbaum, Jane Jacobs, Jeff Bezos, John Conway, John Gilmore, John Markoff, John von Neumann, Ken Thompson, Kickstarter, lake wobegon effect, Laplace demon, linear programming, litecoin, load shedding, MITM: man-in-the-middle, Multics, Network effects, nocebo, operational security, Paradox of Choice, Parkinson's law, pattern recognition, peer-to-peer, Pierre-Simon Laplace, place-making, post-materialism, QR code, quantum cryptography, race to the bottom, random walk, recommendation engine, RFID, risk tolerance, Robert Metcalfe, rolling blackouts, Ruby on Rails, Sapir-Whorf hypothesis, Satoshi Nakamoto, security theater, semantic web, seminal paper, Skype, slashdot, smart meter, social intelligence, speech recognition, SQL injection, statistical model, Steve Jobs, Steven Pinker, Stuxnet, sunk-cost fallacy, supply-chain attack, telemarketer, text mining, the built environment, The Death and Life of Great American Cities, The Market for Lemons, the payments system, Therac-25, too big to fail, Tragedy of the Commons, Turing complete, Turing machine, Turing test, Wayback Machine, web application, web of trust, x509 certificate, Y2K, zero day, Zimmermann PGP

Purdue professor Gene Spafford thinks that this may have its origins in work done with a standalone US Department of Defence (DoD) mainframe system for which the administrators calculated that their mainframe could brute-force a password in x days, and so a period slightly less than this was set as the password-change interval [79]. Like the ubiquitous “Kilroy was here”, there are various other explanations floating around for the origins of this requirement, but in truth no-one really knows for sure where it came from. In fact, the conclusion of the sole documented statistical modelling of password change, carried out in late 2006, is that changing passwords doesn’t really matter (the analysis takes a number of different variables into account rather than just someone’s estimate of what a DoD mainframe may have done in the 1960s; for the full details see the original article) [80]. Even if we don’t know where the password-change requirement really originated, we do know the effect that it has.

This means that the chance of compromise for a certificate with a lifetime of one year is 0.002%. With a rather longer five-year lifetime it’s 0.01%, and with a ten-year lifetime it’s 0.02% (remember that this is a simplified model used to illustrate a point, since in practice it’s possible to argue endlessly over the sort of statistical model that you’d use for key compromise and we have next to no actual data on when actual key compromises occur since they’re so infrequent). In any case though, in those ten years of using the same key how many security holes and breaches do you think will be found in the web site that don’t involve the site’s private key?
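
The arithmetic behind those figures is just a compromise probability that scales linearly with certificate lifetime. A minimal sketch of that simplified model, using the excerpt's illustrative 0.002%-per-year rate (not an empirical estimate):

    # Simplified model from the text: probability of key compromise grows
    # linearly with certificate lifetime at roughly 0.002% per year.
    ANNUAL_COMPROMISE_RATE = 0.00002   # 0.002% expressed as a fraction

    def compromise_probability(lifetime_years):
        return ANNUAL_COMPROMISE_RATE * lifetime_years

    for years in (1, 5, 10):
        print(f"{years:>2} year(s): {compromise_probability(years):.3%}")
    # Prints 0.002%, 0.010%, and 0.020%, matching the figures quoted above.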


Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems by Martin Kleppmann

active measures, Amazon Web Services, billion-dollar mistake, bitcoin, blockchain, business intelligence, business logic, business process, c2.com, cloud computing, collaborative editing, commoditize, conceptual framework, cryptocurrency, data science, database schema, deep learning, DevOps, distributed ledger, Donald Knuth, Edward Snowden, end-to-end encryption, Ethereum, ethereum blockchain, exponential backoff, fake news, fault tolerance, finite state, Flash crash, Free Software Foundation, full text search, functional programming, general-purpose programming language, Hacker News, informal economy, information retrieval, Internet of things, iterative process, John von Neumann, Ken Thompson, Kubernetes, Large Hadron Collider, level 1 cache, loose coupling, machine readable, machine translation, Marc Andreessen, microservices, natural language processing, Network effects, no silver bullet, operational security, packet switching, peer-to-peer, performance metric, place-making, premature optimization, recommendation engine, Richard Feynman, self-driving car, semantic web, Shoshana Zuboff, social graph, social web, software as a service, software is eating the world, sorting algorithm, source of truth, SPARQL, speech recognition, SQL injection, statistical model, surveillance capitalism, systematic bias, systems thinking, Tragedy of the Commons, undersea cable, web application, WebSocket, wikimedia commons

Stream processing is similar, but it extends operators to allow managed, fault-tolerant state (see “Rebuilding state after a failure” on page 478). The principle of deterministic functions with well-defined inputs and outputs is not only good for fault tolerance (see “Idempotence” on page 478), but also simplifies reasoning about the dataflows in an organization [7]. No matter whether the derived data is a search index, a statistical model, or a cache, it is helpful to think in terms of data pipelines that derive one thing from another, pushing state changes in one system through functional application code and applying the effects to derived systems. In principle, derived data systems could be maintained synchronously, just like a relational database updates secondary indexes synchronously within the same transaction as writes to the table being indexed.
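
A minimal sketch of that pipeline idea (not code from the book): a deterministic, pure function derives an inverted index from a stream of document-update events, so the derived system can always be rebuilt by replaying the same event log. The event format and names are assumptions for illustration.

    from collections import defaultdict

    def apply_event(index, event):
        # event = {"doc_id": ..., "text": ...}; mutates the derived index.
        for word in event["text"].lower().split():
            index[word].add(event["doc_id"])

    def build_index(events):
        # Deterministic derivation: same events in, same index out.
        index = defaultdict(set)
        for event in events:
            apply_event(index, event)
        return index

    events = [
        {"doc_id": 1, "text": "statistical models of text"},
        {"doc_id": 2, "text": "derived data and models"},
    ]
    print(sorted(build_index(events)["models"]))   # -> [1, 2]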


pages: 761 words: 231,902

The Singularity Is Near: When Humans Transcend Biology by Ray Kurzweil

additive manufacturing, AI winter, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, Albert Einstein, anthropic principle, Any sufficiently advanced technology is indistinguishable from magic, artificial general intelligence, Asilomar, augmented reality, autonomous vehicles, backpropagation, Benoit Mandelbrot, Bill Joy: nanobots, bioinformatics, brain emulation, Brewster Kahle, Brownian motion, business cycle, business intelligence, c2.com, call centre, carbon-based life, cellular automata, Charles Babbage, Claude Shannon: information theory, complexity theory, conceptual framework, Conway's Game of Life, coronavirus, cosmological constant, cosmological principle, cuban missile crisis, data acquisition, Dava Sobel, David Brooks, Dean Kamen, digital divide, disintermediation, double helix, Douglas Hofstadter, en.wikipedia.org, epigenetics, factory automation, friendly AI, functional programming, George Gilder, Gödel, Escher, Bach, Hans Moravec, hype cycle, informal economy, information retrieval, information security, invention of the telephone, invention of the telescope, invention of writing, iterative process, Jaron Lanier, Jeff Bezos, job automation, job satisfaction, John von Neumann, Kevin Kelly, Law of Accelerating Returns, life extension, lifelogging, linked data, Loebner Prize, Louis Pasteur, mandelbrot fractal, Marshall McLuhan, Mikhail Gorbachev, Mitch Kapor, mouse model, Murray Gell-Mann, mutually assured destruction, natural language processing, Network effects, new economy, Nick Bostrom, Norbert Wiener, oil shale / tar sands, optical character recognition, PalmPilot, pattern recognition, phenotype, power law, precautionary principle, premature optimization, punch-card reader, quantum cryptography, quantum entanglement, radical life extension, randomized controlled trial, Ray Kurzweil, remote working, reversible computing, Richard Feynman, Robert Metcalfe, Rodney Brooks, scientific worldview, Search for Extraterrestrial Intelligence, selection bias, semantic web, seminal paper, Silicon Valley, Singularitarianism, speech recognition, statistical model, stem cell, Stephen Hawking, Stewart Brand, strong AI, Stuart Kauffman, superintelligent machines, technological singularity, Ted Kaczynski, telepresence, The Coming Technological Singularity, Thomas Bayes, transaction costs, Turing machine, Turing test, two and twenty, Vernor Vinge, Y2K, Yogi Berra

Franz Josef Och, a computer scientist at the University of Southern California, has developed a technique that can generate a new language-translation system between any pair of languages in a matter of hours or days.209 All he needs is a "Rosetta stone"—that is, text in one language and the translation of that text in the other language—although he needs millions of words of such translated text. Using a self-organizing technique, the system is able to develop its own statistical models of how text is translated from one language to the other and develops these models in both directions. This contrasts with other translation systems, in which linguists painstakingly code grammar rules with long lists of exceptions to each rule. Och's system recently received the highest score in a competition of translation systems conducted by the U.S.
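
A toy sketch of the underlying idea (emphatically not Och's system): estimate word-translation probabilities from co-occurrence counts over sentence-aligned text. Real statistical translation systems add hidden alignment models, EM training, and corpora of millions of words; this only shows the zeroth-order version.

    from collections import defaultdict

    def translation_table(parallel_pairs):
        # Count how often each source word co-occurs with each target word
        # in aligned sentence pairs, then normalize to get p(target | source).
        counts = defaultdict(lambda: defaultdict(float))
        for src, tgt in parallel_pairs:
            for s in src.split():
                for t in tgt.split():
                    counts[s][t] += 1.0
        table = {}
        for s, tgt_counts in counts.items():
            total = sum(tgt_counts.values())
            table[s] = {t: c / total for t, c in tgt_counts.items()}
        return table

    pairs = [("the house", "la maison"), ("the car", "la voiture")]
    print(translation_table(pairs)["the"])   # "la" gets the largest weight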


pages: 753 words: 233,306

Collapse by Jared Diamond

biodiversity loss, Biosphere 2, California energy crisis, classic study, clean water, colonial rule, correlation does not imply causation, cuban missile crisis, Donner party, Easter island, European colonialism, Exxon Valdez, Garrett Hardin, Great Leap Forward, illegal immigration, job satisfaction, low interest rates, means of production, Medieval Warm Period, megaproject, new economy, North Sea oil, Piper Alpha, polynesian navigation, prisoner's dilemma, South Sea Bubble, statistical model, Stewart Brand, Thomas Malthus, Timothy McVeigh, trade route, Tragedy of the Commons, transcontinental railway, unemployed young men

All eight of those variables make Easter susceptible to deforestation. Easter's volcanoes are of moderate age (probably 200,000 to 600,000 years); Easter's Poike Peninsula, its oldest volcano, was the first part of Easter to become deforested and exhibits the worst soil erosion today. Combining the effects of all those variables, Barry's and my statistical model predicted that Easter, Nihoa, and Necker should be the worst deforested Pacific islands. That agrees with what actually happened: Nihoa and Necker ended up with no human left alive and with only one tree species standing (Nihoa's palm), while Easter ended up with no tree species standing and with about 90% of its former population gone.


pages: 1,072 words: 237,186

How to Survive a Pandemic by Michael Greger, M.D., FACLM

"Hurricane Katrina" Superdome, Anthropocene, coronavirus, COVID-19, data science, double helix, Edward Jenner, friendly fire, global pandemic, global supply chain, global village, Helicobacter pylori, inventory management, Kickstarter, lockdown, mass immigration, megacity, meta-analysis, New Journalism, out of africa, Peace of Westphalia, phenotype, profit motive, RAND corporation, randomized controlled trial, Ronald Reagan, Saturday Night Live, social distancing, statistical model, stem cell, supply-chain management, the medium is the message, Westphalian system, Y2K, Yogi Berra, zoonotic diseases

Teenagers and health workers were found to be the prime violators of quarantine rules in Toronto during the SARS outbreak in 2003.805 Stating the obvious in regard to the difference between stopping the 1997 Hong Kong outbreak among chickens and stopping a human outbreak, experts have written, “Slaughter and quarantine of people is not an option.”806 “Even if it was possible to cordon off a city,” noted then Center for Biosecurity’s O’Toole, “that is not going to contain influenza.”807 Based on failed historical attempts along with contemporary statistical models,808 influenza experts are confident that efforts at quarantine “simply will not work.”809 Experts consider quarantine efforts “doomed to fail”810 because of the extreme contagiousness of influenza,811 which is a function of its incubation period and mode of transmission. SARS, in retrospect, was an easy virus to contain because people essentially became symptomatic before they became infectious.812 People showed signs of the disease before they could efficiently spread it, so tools like thermal image scanners at airports to detect fever or screening those with a cough could potentially stem the spread of the disease.813 The influenza virus, however, gets a head start.


pages: 1,237 words: 227,370

Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems by Martin Kleppmann

active measures, Amazon Web Services, billion-dollar mistake, bitcoin, blockchain, business intelligence, business logic, business process, c2.com, cloud computing, collaborative editing, commoditize, conceptual framework, cryptocurrency, data science, database schema, deep learning, DevOps, distributed ledger, Donald Knuth, Edward Snowden, end-to-end encryption, Ethereum, ethereum blockchain, exponential backoff, fake news, fault tolerance, finite state, Flash crash, Free Software Foundation, full text search, functional programming, general-purpose programming language, Hacker News, informal economy, information retrieval, Infrastructure as a Service, Internet of things, iterative process, John von Neumann, Ken Thompson, Kubernetes, Large Hadron Collider, level 1 cache, loose coupling, machine readable, machine translation, Marc Andreessen, microservices, natural language processing, Network effects, no silver bullet, operational security, packet switching, peer-to-peer, performance metric, place-making, premature optimization, recommendation engine, Richard Feynman, self-driving car, semantic web, Shoshana Zuboff, social graph, social web, software as a service, software is eating the world, sorting algorithm, source of truth, SPARQL, speech recognition, SQL injection, statistical model, surveillance capitalism, systematic bias, systems thinking, Tragedy of the Commons, undersea cable, web application, WebSocket, wikimedia commons

Stream processing is similar, but it extends operators to allow managed, fault-tolerant state (see “Rebuilding state after a failure”). The principle of deterministic functions with well-defined inputs and outputs is not only good for fault tolerance (see “Idempotence”), but also simplifies reasoning about the dataflows in an organization [7]. No matter whether the derived data is a search index, a statistical model, or a cache, it is helpful to think in terms of data pipelines that derive one thing from another, pushing state changes in one system through functional application code and applying the effects to derived systems. In principle, derived data systems could be maintained synchronously, just like a relational database updates secondary indexes synchronously within the same transaction as writes to the table being indexed.


pages: 944 words: 243,883

Private Empire: ExxonMobil and American Power by Steve Coll

addicted to oil, Alan Greenspan, An Inconvenient Truth, anti-communist, Atul Gawande, banking crisis, Benchmark Capital, Berlin Wall, call centre, carbon footprint, carbon tax, clean water, collapse of Lehman Brothers, company town, corporate governance, corporate social responsibility, decarbonisation, disinformation, energy security, European colonialism, Evgeny Morozov, Exxon Valdez, failed state, Fall of the Berlin Wall, financial engineering, Global Witness, Google Earth, Great Leap Forward, hydraulic fracturing, hydrogen economy, Ida Tarbell, illegal immigration, income inequality, industrial robot, Intergovernmental Panel on Climate Change (IPCC), inventory management, kremlinology, market fundamentalism, McMansion, medical malpractice, Mikhail Gorbachev, oil shale / tar sands, oil shock, peak oil, place-making, Ponzi scheme, precautionary principle, price mechanism, profit maximization, profit motive, Ronald Reagan, Saturday Night Live, Scramble for Africa, shareholder value, Silicon Valley, smart meter, statistical model, Steve Jobs, two and twenty, WikiLeaks

They would come from “all walks of life,” such as business, government, and the media, and they would be “aware of, and concerned about, the current debate and issues surrounding the world energy resources/use as well as climate change.” The ideal audience would be “open-minded,” as well as “information hungry” and “socially responsible.” The characteristics of the elites ExxonMobil sought to educate were derived in part from statistical modeling that Ken Cohen’s public affairs department had commissioned in the United States and Europe, to understand in greater depth the corporation’s reputation among opinion leaders. That model had allowed Cohen and his colleagues to forecast how elites would react to particular statements that ExxonMobil might make or actions it might take.


pages: 801 words: 242,104

Collapse: How Societies Choose to Fail or Succeed by Jared Diamond

biodiversity loss, Biosphere 2, California energy crisis, classic study, clean water, colonial rule, correlation does not imply causation, cuban missile crisis, Donner party, Easter island, European colonialism, Exxon Valdez, Garrett Hardin, Great Leap Forward, illegal immigration, job satisfaction, low interest rates, means of production, Medieval Warm Period, megaproject, new economy, North Sea oil, Piper Alpha, polynesian navigation, profit motive, South Sea Bubble, statistical model, Stewart Brand, Thomas Malthus, Timothy McVeigh, trade route, Tragedy of the Commons, transcontinental railway, unemployed young men

All eight of those variables make Easter susceptible to deforestation. Easter’s volcanoes are of moderate age (probably 200,000 to 600,000 years); Easter’s Poike Peninsula, its oldest volcano, was the first part of Easter to become deforested and exhibits the worst soil erosion today. Combining the effects of all those variables, Barry’s and my statistical model predicted that Easter, Nihoa, and Necker should be the worst deforested Pacific islands. That agrees with what actually happened: Nihoa and Necker ended up with no human left alive and with only one tree species standing (Nihoa’s palm), while Easter ended up with no tree species standing and with about 90% of its former population gone.


She Has Her Mother's Laugh by Carl Zimmer

23andMe, agricultural Revolution, Anthropocene, clean water, clockwatching, cloud computing, CRISPR, dark matter, data science, discovery of DNA, double helix, Drosophila, Easter island, Elon Musk, epigenetics, Fellow of the Royal Society, Flynn Effect, friendly fire, Gary Taubes, germ theory of disease, Gregor Mendel, Helicobacter pylori, Isaac Newton, James Webb Space Telescope, lolcat, longitudinal study, medical bankruptcy, meta-analysis, microbiome, moral panic, mouse model, New Journalism, out of africa, phenotype, Ralph Waldo Emerson, Recombinant DNA, Scientific racism, statistical model, stem cell, twin studies, W. E. B. Du Bois

The only way out of that paradox is to join some of those forks back together. In other words, your ancestors must have all been related to each other, either closely or distantly. The geometry of this heredity has long fascinated mathematicians, and in 1999 a Yale mathematician named Joseph Chang created the first statistical model of it. He found that it has an astonishing property. If you go back far enough in the history of a human population, you reach a point in time when all the individuals who have any descendants among living people are ancestors of all living people. To appreciate how weird this is, think again about Charlemagne.
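
A toy simulation of that kind of model (not Chang's actual construction): each individual in every generation draws two parents uniformly at random from the previous generation, and we step backwards until some ancestor covers the entire present-day population. The population size and random seed below are arbitrary choices for illustration; Chang's analysis predicts roughly log2(N) generations for a population of size N.

    import random

    def generations_to_common_ancestor(n, seed=0):
        # descendants[i] = set of present-day people descended from individual i
        rng = random.Random(seed)
        everyone = frozenset(range(n))
        descendants = [frozenset([i]) for i in range(n)]
        generation = 0
        while not any(d == everyone for d in descendants):
            generation += 1
            parents = [set() for _ in range(n)]
            for child_descendants in descendants:
                for p in rng.sample(range(n), 2):   # two distinct random parents
                    parents[p] |= child_descendants
            descendants = [frozenset(p) for p in parents]
        return generation

    print(generations_to_common_ancestor(1000))   # typically around 10-12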


The Dawn of Everything by David Graeber, David Wengrow

"World Economic Forum" Davos, agricultural Revolution, Anthropocene, Atahualpa, British Empire, Columbian Exchange, conceptual framework, cryptocurrency, David Graeber, degrowth, European colonialism, founder crops, Gini coefficient, global village, Hernando de Soto, Hobbesian trap, income inequality, invention of agriculture, invention of the steam engine, Isaac Newton, labour mobility, land tenure, Lewis Mumford, mass immigration, means of production, Murray Bookchin, new economy, New Urbanism, out of africa, public intellectual, Scientific racism, spice trade, spinning jenny, statistical model, Steven Pinker, theory of mind, trade route, Tragedy of the Commons, urban planning, urban renewal, urban sprawl

Indeed, this complex subsector of the coast, between the Eel River and the mouth of the Columbia River, posed significant problems of classification for scholars seeking to delineate the boundaries of those culture areas, and the issue of their affiliation remains contentious today. See Kroeber 1939; Jorgensen 1980; Donald 2003. 45. The historicity of First Nations oral narratives concerning ancient migrations and wars on the Northwest Coast has been the subject of an innovative study which combines archaeology with the statistical modelling of demographic shifts that can be scientifically dated back to periods well over a millennium into the past. Its authors conclude that the ‘Indigenous oral record has now been subjected to extremely rigorous testing. Our result – that the [in this case] Tsimshian oral record is correct (properly not disproved) in its accounting of events from over 1,000 years ago – is a major milestone in the evaluation of the validity of Indigenous oral traditions.’


pages: 1,042 words: 273,092

The Silk Roads: A New History of the World by Peter Frankopan

access to a mobile phone, Admiral Zheng, anti-communist, Ayatollah Khomeini, banking crisis, Bartolomé de las Casas, Berlin Wall, bread and circuses, British Empire, clean water, Columbian Exchange, credit crunch, cuban missile crisis, Deng Xiaoping, discovery of the americas, disinformation, drone strike, dual-use technology, energy security, European colonialism, failed state, financial innovation, Isaac Newton, land reform, Mahatma Gandhi, Malacca Straits, mass immigration, Mikhail Gorbachev, Murano, Venice glass, New Urbanism, no-fly zone, Ronald Reagan, sexual politics, South China Sea, spice trade, statistical model, Stuxnet, Suez crisis 1956, the built environment, the market place, The Wealth of Nations by Adam Smith, too big to fail, trade route, transcontinental railway, uranium enrichment, wealth creators, WikiLeaks, yield management, Yom Kippur War

Europe even began to export in the opposite direction too, flooding the market in the Middle East and causing a painful contraction that stood in direct contrast to the invigorated economy to the west.71 As recent research based on skeletal remains in graveyards in London demonstrates, the rise in wealth led to better diets and to better general health. Indeed, statistical modelling based on these results even suggests that one of the effects of the plague was a substantial improvement in life expectancy. London’s post-plague population was considerably healthier than it had been before the Black Death struck – raising life expectancy sharply.72 Economic and social development did not occur evenly across Europe.


pages: 1,758 words: 342,766

Code Complete (Developer Best Practices) by Steve McConnell

Ada Lovelace, Albert Einstein, Buckminster Fuller, business logic, call centre, classic study, continuous integration, data acquisition, database schema, don't repeat yourself, Donald Knuth, fault tolerance, General Magic , global macro, Grace Hopper, haute cuisine, if you see hoof prints, think horses—not zebras, index card, inventory management, iterative process, Larry Wall, loose coupling, Menlo Park, no silver bullet, off-by-one error, Perl 6, place-making, premature optimization, revision control, Sapir-Whorf hypothesis, seminal paper, slashdot, sorting algorithm, SQL injection, statistical model, Tacoma Narrows Bridge, the Cathedral and the Bazaar, the scientific method, Thomas Kuhn: the structure of scientific revolutions, Turing machine, web application

Chapter 5 of this book describes Humphrey's Probe method, which is a technique for estimating work at the individual developer level. Conte, S. D., H. E. Dunsmore, and V. Y. Shen. Software Engineering Metrics and Models. Menlo Park, CA: Benjamin/Cummings, 1986. Chapter 6 contains a good survey of estimation techniques, including a history of estimation, statistical models, theoretically based models, and composite models. The book also demonstrates the use of each estimation technique on a database of projects and compares the estimates to the projects' actual lengths. Gilb, Tom. Principles of Software Engineering Management. Wokingham, England: Addison-Wesley, 1988.


pages: 1,079 words: 321,718

Surfaces and Essences by Douglas Hofstadter, Emmanuel Sander

Abraham Maslow, affirmative action, Albert Einstein, Arthur Eddington, Benoit Mandelbrot, Brownian motion, Charles Babbage, cognitive dissonance, computer age, computer vision, dematerialisation, Donald Trump, Douglas Hofstadter, Eddington experiment, Ernest Rutherford, experimental subject, Flynn Effect, gentrification, Georg Cantor, Gerolamo Cardano, Golden Gate Park, haute couture, haute cuisine, Henri Poincaré, Isaac Newton, l'esprit de l'escalier, Louis Pasteur, machine translation, Mahatma Gandhi, mandelbrot fractal, Menlo Park, Norbert Wiener, place-making, Sapir-Whorf hypothesis, Silicon Valley, statistical model, Steve Jobs, Steve Wozniak, theory of mind, time dilation, upwardly mobile, urban sprawl, yellow journalism, zero-sum game

If he accepts the job, his salary and professional prestige will both take leaps, but on the other hand, the move would be a huge emotional upheaval for his entire family. A colleague whom he privately asks for advice reacts, “Hey, what’s with you? You’re one of the world’s experts on how decisions are made. Why are you coming to see me? You’re the one who invented super-sophisticated statistical models for making optimal decisions. Apply your own work to your dilemma; that’ll tell you what to do!” His friend looks at him straight in the eye and says, “Come off it, would you? This is serious!” The fact is that when we are faced with serious decisions, although we can certainly draw up a list of all sorts of outcomes, assigning them numerical weights that reflect their likelihoods of happening as well as the amount of pleasure they would bring us, on the basis of which we can then calculate the “optimal” choice, this is hardly the way that people who are in the throes of major decision-making generally proceed.


pages: 1,351 words: 385,579

The Better Angels of Our Nature: Why Violence Has Declined by Steven Pinker

1960s counterculture, affirmative action, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, Albert Einstein, availability heuristic, behavioural economics, Berlin Wall, Boeing 747, Bonfire of the Vanities, book value, bread and circuses, British Empire, Broken windows theory, business cycle, California gold rush, Cass Sunstein, citation needed, classic study, clean water, cognitive dissonance, colonial rule, Columbine, computer age, Computing Machinery and Intelligence, conceptual framework, confounding variable, correlation coefficient, correlation does not imply causation, crack epidemic, cuban missile crisis, Daniel Kahneman / Amos Tversky, David Brooks, delayed gratification, demographic transition, desegregation, Doomsday Clock, Douglas Hofstadter, Dr. Strangelove, Edward Glaeser, en.wikipedia.org, European colonialism, experimental subject, facts on the ground, failed state, first-past-the-post, Flynn Effect, food miles, Francis Fukuyama: the end of history, fudge factor, full employment, Garrett Hardin, George Santayana, ghettoisation, Gini coefficient, global village, Golden arches theory, Great Leap Forward, Henri Poincaré, Herbert Marcuse, Herman Kahn, high-speed rail, Hobbesian trap, humanitarian revolution, impulse control, income inequality, informal economy, Intergovernmental Panel on Climate Change (IPCC), invention of the printing press, Isaac Newton, lake wobegon effect, libertarian paternalism, long peace, longitudinal study, loss aversion, Marshall McLuhan, mass incarceration, McMansion, means of production, mental accounting, meta-analysis, Mikhail Gorbachev, mirror neurons, moral panic, mutually assured destruction, Nelson Mandela, nuclear taboo, Oklahoma City bombing, open economy, Peace of Westphalia, Peter Singer: altruism, power law, QWERTY keyboard, race to the bottom, Ralph Waldo Emerson, random walk, Republic of Letters, Richard Thaler, Ronald Reagan, Rosa Parks, Saturday Night Live, security theater, Skinner box, Skype, Slavoj Žižek, South China Sea, Stanford marshmallow experiment, Stanford prison experiment, statistical model, stem cell, Steven Levy, Steven Pinker, sunk-cost fallacy, technological determinism, The Bell Curve by Richard Herrnstein and Charles Murray, the long tail, The Wealth of Nations by Adam Smith, theory of mind, Timothy McVeigh, Tragedy of the Commons, transatlantic slave trade, trolley problem, Turing machine, twin studies, ultimatum game, uranium enrichment, Vilfredo Pareto, Walter Mischel, WarGames: Global Thermonuclear War, WikiLeaks, women in the workforce, zero-sum game

Combine exponentially growing damage with an exponentially shrinking chance of success, and you get a power law, with its disconcertingly thick tail. Given the presence of weapons of mass destruction in the real world, and religious fanatics willing to wreak untold damage for a higher cause, a lengthy conspiracy producing a horrendous death toll is within the realm of thinkable probabilities. A statistical model, of course, is not a crystal ball. Even if we could extrapolate the line of existing data points, the massive terrorist attacks in the tail are still extremely (albeit not astronomically) unlikely. More to the point, we can’t extrapolate it. In practice, as you get to the tail of a power-law distribution, the data points start to misbehave, scattering around the line or warping it downward to very low probabilities.
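
In symbols, the power-law logic the passage appeals to can be sketched as follows, with alpha and beta as illustrative rate constants rather than parameters estimated in the book. If a plot sustained for n steps inflicts damage that grows exponentially while its chance of coming off shrinks exponentially,

    D(n) = e^{\alpha n}, \qquad \Pr(\text{plot survives } n \text{ steps}) = e^{-\beta n},

then reaching a damage level of at least x requires n(x) = (\ln x)/\alpha steps, so

    \Pr(D \ge x) = e^{-\beta n(x)} = x^{-\beta/\alpha},

a power-law tail whose exponent \beta/\alpha can easily be small enough to produce the disconcertingly thick tail described above.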


pages: 889 words: 433,897

The Best of 2600: A Hacker Odyssey by Emmanuel Goldstein

affirmative action, Apple II, benefit corporation, call centre, disinformation, don't be evil, Firefox, game design, Hacker Ethic, hiring and firing, information retrieval, information security, John Markoff, John Perry Barlow, late fees, license plate recognition, Mitch Kapor, MITM: man-in-the-middle, Oklahoma City bombing, optical character recognition, OSI model, packet switching, pirate software, place-making, profit motive, QWERTY keyboard, RFID, Robert Hanssen: Double agent, rolodex, Ronald Reagan, satellite internet, Silicon Valley, Skype, spectrum auction, statistical model, Steve Jobs, Steve Wozniak, Steven Levy, Telecommunications Act of 1996, telemarketer, undersea cable, UUNET, Y2K

I started with only a few keywords and found myself with many more based on the keyword tool. But this was where more problems started to occur. I found that my keywords were being canceled way too easily and were not given a fair chance to perform. Like I said earlier, if the campaign was on a larger scale, then this statistics model may hold true. But for smaller campaigns it simply was more of a hassle. It also led to another problem that I found slightly ironic, which is that the keyword tool suggested words and phrases to me that I was later denied due to their ToS (Terms of Service) anyway. Why recommend them if you are not going to allow me to use them?


pages: 1,737 words: 491,616

Rationality: From AI to Zombies by Eliezer Yudkowsky

Albert Einstein, Alfred Russel Wallace, anthropic principle, anti-pattern, anti-work, antiwork, Arthur Eddington, artificial general intelligence, availability heuristic, backpropagation, Bayesian statistics, behavioural economics, Berlin Wall, Boeing 747, Build a better mousetrap, Cass Sunstein, cellular automata, Charles Babbage, cognitive bias, cognitive dissonance, correlation does not imply causation, cosmological constant, creative destruction, Daniel Kahneman / Amos Tversky, dematerialisation, different worldview, discovery of DNA, disinformation, Douglas Hofstadter, Drosophila, Eddington experiment, effective altruism, experimental subject, Extropian, friendly AI, fundamental attribution error, Great Leap Forward, Gödel, Escher, Bach, Hacker News, hindsight bias, index card, index fund, Isaac Newton, John Conway, John von Neumann, Large Hadron Collider, Long Term Capital Management, Louis Pasteur, mental accounting, meta-analysis, mirror neurons, money market fund, Monty Hall problem, Nash equilibrium, Necker cube, Nick Bostrom, NP-complete, One Laptop per Child (OLPC), P = NP, paperclip maximiser, pattern recognition, Paul Graham, peak-end rule, Peter Thiel, Pierre-Simon Laplace, placebo effect, planetary scale, prediction markets, random walk, Ray Kurzweil, reversible computing, Richard Feynman, risk tolerance, Rubik’s Cube, Saturday Night Live, Schrödinger's Cat, scientific mainstream, scientific worldview, sensible shoes, Silicon Valley, Silicon Valley startup, Singularitarianism, SpaceShipOne, speech recognition, statistical model, Steve Jurvetson, Steven Pinker, strong AI, sunk-cost fallacy, technological singularity, The Bell Curve by Richard Herrnstein and Charles Murray, the map is not the territory, the scientific method, Turing complete, Turing machine, Tyler Cowen, ultimatum game, X Prize, Y Combinator, zero-sum game

When there is some phenomenon A that we want to investigate, and an observation X that is evidence about A—for example, in the previous example, A is breast cancer and X is a positive mammography—Bayes’s Theorem tells us how we should update our probability of A, given the new evidence X. By this point, Bayes’s Theorem may seem blatantly obvious or even tautological, rather than exciting and new. If so, this introduction has entirely succeeded in its purpose. * * * Bayes’s Theorem describes what makes something “evidence” and how much evidence it is. Statistical models are judged by comparison to the Bayesian method because, in statistics, the Bayesian method is as good as it gets—the Bayesian method defines the maximum amount of mileage you can get out of a given piece of evidence, in the same way that thermodynamics defines the maximum amount of work you can get out of a temperature differential.
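
For reference, the update rule the passage describes, written in the excerpt's own notation (A for the hypothesis, such as breast cancer; X for the observation, such as a positive mammography), is the standard form of Bayes's Theorem:

    P(A \mid X) = \frac{P(X \mid A)\,P(A)}{P(X)}
                = \frac{P(X \mid A)\,P(A)}{P(X \mid A)\,P(A) + P(X \mid \neg A)\,P(\neg A)}.

The posterior is the prior P(A) reweighted by how much more probable the observation X is when A holds than when it does not; that likelihood ratio is what makes X count as evidence, and determines how much evidence it is.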