natural language processing

83 results


pages: 502 words: 107,510

Natural Language Annotation for Machine Learning by James Pustejovsky, Amber Stubbs

Amazon: amazon.com, amazon.co.uk, amazon.de, amazon.fr

Amazon Mechanical Turk, bioinformatics, cloud computing, computer vision, crowdsourcing, easy for humans, difficult for computers, finite state, game design, information retrieval, iterative process, natural language processing, pattern recognition, performance metric, sentiment analysis, social web, speech recognition, statistical model, text mining

Maximum Entropy Classifiers
maximum a posteriori (MAP) hypothesis, Naïve Bayes Learning
sentiment classification, Sentiment classification
Named Entities (NEs), The Annotation Development Cycle, Adding Named Entities, Inline Annotation, Example 3: Extent Annotations—Named Entities
  as extent tags, Example 3: Extent Annotations—Named Entities
  and inline tagging, Inline Annotation
  and models, Adding Named Entities
  Simple Named Entity Guidelines V6.5, Example 3: Extent Annotations—Named Entities
Narrative Containers, Narrative Containers–Narrative Containers
natural language processing, What Is Natural Language Processing?–What Is Natural Language Processing? (see NLP (natural language processing))
Natural Language Processing with Python (Bird, Klein, and Loper), What Is Natural Language Processing?, Collecting Data from the Internet, Training: Machine Learning, Gender Identification–Gender Identification
  gender identification problem in, Gender Identification–Gender Identification
NCSU, TempEval-2 system, TempEval-2: System Summaries
neg-content-term, Decision Tree Learning
Netflix, Film Genre Classification, Example 2: Multiple Labels—Film Genres
New York Times, Building the Corpus
NIST TREC Tracks, NLP Challenges
NLP (natural language processing), The Importance of Language Annotation–The Importance of Language Annotation, The Layers of Linguistic Description–The Layers of Linguistic Description, What Is Natural Language Processing?

In Proceedings of the 5th International Workshop on Semantic Evaluation.
Madnani, Nitin. 2007. “Getting Started on Natural Language Processing with Python.” ACM Crossroads 13(4). Updated version available at http://www.desilinguist.org/. Accessed May 16, 2012.
Madnani, Nitin, and Jimmy Lin. Natural Language Processing with Hadoop and Python. http://www.cloudera.com/blog/2010/03/natural-language-processing-with-hadoopand-python/. Posted March 16, 2010.
Mani, Inderjeet, Marc Verhagen, Ben Wellner, Chong Min Lee, and James Pustejovsky. 2006. Proceedings of Machine Learning of Temporal Relations. ACL 2006, Sydney, Australia.
Manning, Chris, and Hinrich Schütze. 1999. Foundations of Statistical Natural Language Processing. Cambridge, MA: The MIT Press.
Manning, Christopher D., Prabhakar Raghavan, and Hinrich Schütze. 2008.

Pragmatics
  The study of how the context of text affects the meaning of an expression, and what information is necessary to infer a hidden or presupposed meaning.
Text structure analysis
  The study of how narratives and other textual styles are constructed to make larger textual compositions.

Throughout this book we will present examples of annotation projects that make use of various combinations of the different concepts outlined in the preceding list.

What Is Natural Language Processing?

Natural Language Processing (NLP) is a field of computer science and engineering that has developed from the study of language and computational linguistics within the field of Artificial Intelligence. The goals of NLP are to design and build applications that facilitate human interaction with machines and other devices through the use of natural language. Some of the major areas of NLP include:

Question Answering Systems (QAS)
  Imagine being able to actually ask your computer or your phone what time your favorite restaurant in New York stops serving dinner on Friday nights.


pages: 504 words: 89,238

Natural language processing with Python by Steven Bird, Ewan Klein, Edward Loper

Amazon: amazon.com, amazon.co.uk, amazon.de, amazon.fr

bioinformatics, business intelligence, conceptual framework, Donald Knuth, elephant in my pajamas, en.wikipedia.org, finite state, Firefox, Guido van Rossum, information retrieval, Menlo Park, natural language processing, P = NP, search inside the book, speech recognition, statistical model, text mining, Turing test

Natural Language Processing with Python
Steven Bird, Ewan Klein, and Edward Loper
Beijing • Cambridge • Farnham • Köln • Sebastopol • Taipei • Tokyo

Natural Language Processing with Python by Steven Bird, Ewan Klein, and Edward Loper. Copyright © 2009 Steven Bird, Ewan Klein, and Edward Loper. All rights reserved. Printed in the United States of America. Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472. O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://my.safaribooksonline.com). For more information, contact our corporate/institutional sales department: (800) 998-9938 or corporate@oreilly.com.

Editor: Julie Steele
Production Editor: Loranah Dimant
Copyeditor: Genevieve d’Entremont
Proofreader: Loranah Dimant
Indexer: Ellen Troutman Zaig
Cover Designer: Karen Montgomery
Interior Designer: David Futato
Illustrator: Robert Romano

Printing History: June 2009: First Edition.

Managing Linguistic Data 407
  11.1 Corpus Structure: A Case Study 407
  11.2 The Life Cycle of a Corpus 412
  11.3 Acquiring Data 416
  11.4 Working with XML 425
  11.5 Working with Toolbox Data 431
  11.6 Describing Language Resources Using OLAC Metadata 435
  11.7 Summary 437
  11.8 Further Reading 437
  11.9 Exercises 438
Afterword: The Language Challenge 441
Bibliography 449
NLTK Index 459
General Index 463

Preface

This is a book about Natural Language Processing. By “natural language” we mean a language that is used for everyday communication by humans; languages such as English, Hindi, or Portuguese. In contrast to artificial languages such as programming languages and mathematical notations, natural languages have evolved as they pass from generation to generation, and are hard to pin down with explicit rules. We will take Natural Language Processing—or NLP for short—in a wide sense to cover any kind of computer manipulation of natural language. At one extreme, it could be as simple as counting word frequencies to compare different writing styles.

[Church and Patil, 1982] Kenneth Church and Ramesh Patil. Coping with syntactic ambiguity or how to put the block in the box on the table. American Journal of Computational Linguistics, 8:139–149, 1982.
[Cohen and Hunter, 2004] K. Bretonnel Cohen and Lawrence Hunter. Natural language processing and systems biology. In Werner Dubitzky and Francisco Azuaje, editors, Artificial Intelligence Methods and Tools for Systems Biology, pages 147–174. Springer Verlag, 2004.
[Cole, 1997] Ronald Cole, editor. Survey of the State of the Art in Human Language Technology. Studies in Natural Language Processing. Cambridge University Press, 1997.
[Copestake, 2002] Ann Copestake. Implementing Typed Feature Structure Grammars. CSLI Publications, Stanford, CA, 2002.
[Corbett, 2006] Greville G. Corbett. Agreement. Cambridge University Press, 2006.


pages: 541 words: 109,698

Mining the Social Web: Finding Needles in the Social Haystack by Matthew A. Russell

Amazon: amazon.com, amazon.co.uk, amazon.de, amazon.fr

Climategate, cloud computing, crowdsourcing, en.wikipedia.org, fault tolerance, Firefox, full text search, Georg Cantor, Google Earth, information retrieval, Mark Zuckerberg, natural language processing, NP-complete, profit motive, Saturday Night Live, semantic web, Silicon Valley, slashdot, social graph, social web, statistical model, Steve Jobs, supply-chain management, text mining, traveling salesman, Turing test, web application

(Chapter 8 introduces a fundamental paradigm shift away from the tools in this chapter and should make the differences more pronounced than they may seem if you haven’t read that material yet.) If you’d like to try applying the techniques from this chapter to the Web (in general), you might want to check out Scrapy, an easy-to-use and mature web scraping and crawling framework.

Chapter 8. Blogs et al.: Natural Language Processing (and Beyond)

This chapter is a modest attempt to introduce Natural Language Processing (NLP) and apply it to the unstructured data in blogs. In the spirit of the prior chapters, it attempts to present the minimal level of detail required to empower you with a solid general understanding of an inherently complex topic, while also providing enough of a technical drill-down that you’ll be able to immediately get to work mining some data.

plotting geo data via microform.at and Google Maps, Plotting geo data via microform.at and Google Maps
hRecipe, Slicing and Dicing Recipes (for the Health of It), Slicing and Dicing Recipes (for the Health of It)
hReview data for recipe reviews, Collecting Restaurant Reviews, Collecting Restaurant Reviews
popular, for embedding structured data into web pages, XFN and Friends
semantic markup, XFN and Friends
XFN, XFN and Friends, Exploring Social Connections with XFN, Brief analysis of breadth-first techniques
  using to explore social connections, Exploring Social Connections with XFN, Brief analysis of breadth-first techniques
multiquery (FQL), Slicing and dicing data with FQL
N
n-gram similarity, Common Similarity Metrics for Clustering
n-grams, Common Similarity Metrics for Clustering, Buzzing on Bigrams
  defined, Common Similarity Metrics for Clustering
n-squared problem, Motivation for Clustering
natural language processing, Frequency Analysis and Lexical Diversity (see NLP)
Natural Language Toolkit, Frequency Analysis and Lexical Diversity (see NLTK)
natural numbers, Elementary Set Operations
nested query (FQL), Slicing and dicing data with FQL
NetworkX, Installing Python Development Tools, Installing Python Development Tools, Extracting relationships from the tweets, Extracting relationships from the tweets, Constructing Friendship Graphs, Clique Detection and Analysis
  building graph describing retweet data, Extracting relationships from the tweets, Extracting relationships from the tweets
  exporting Redis friend/follower data to for graph analysis, Constructing Friendship Graphs
  finding cliques in Twitter friendship data, Clique Detection and Analysis
  installing, Installing Python Development Tools
  using to create graph of nodes and edges, Installing Python Development Tools
*nix (Linux/Unix) environment, Or Not to Read This Book?
NLP (natural language processing), Blogs et al.: Natural Language Processing (and Beyond), Closing Remarks, NLP: A Pareto-Like Introduction, A Brief Thought Exercise, A Typical NLP Pipeline with NLTK, Sentence Detection in Blogs with NLTK, Summarizing Documents, Analysis of Luhn’s Summarization Algorithm, Entity-Centric Analysis: A Deeper Understanding of the Data, Quality of Analytics
  entity-centric analysis, Entity-Centric Analysis: A Deeper Understanding of the Data, Quality of Analytics, Quality of Analytics
  quality of analytics, Quality of Analytics
  sentence detection in blogs with NLTK, Sentence Detection in Blogs with NLTK, Sentence Detection in Blogs with NLTK
  summarizing documents, Summarizing Documents, Analysis of Luhn’s Summarization Algorithm, Analysis of Luhn’s Summarization Algorithm
    analysis of Luhn’s algorithm, Analysis of Luhn’s Summarization Algorithm
  syntax and semantics, NLP: A Pareto-Like Introduction
  thought exercise, A Brief Thought Exercise
  typical NLP pipeline with NLTK, A Typical NLP Pipeline with NLTK, A Typical NLP Pipeline with NLTK
NLTK (Natural Language Toolkit), Frequency Analysis and Lexical Diversity, Frequency Analysis and Lexical Diversity, What are people talking about right now?

Although these are not difficult to compute, we’d be better off installing a tool that offers a built-in frequency distribution and many other tools for text analysis. The Natural Language Toolkit (NLTK) is a popular module we’ll use throughout this book: it delivers a vast number of tools for various kinds of text analytics, including the calculation of common metrics, information extraction, and natural language processing (NLP). Although NLTK isn’t necessarily state-of-the-art as compared to ongoing efforts in the commercial space and academia, it nonetheless provides a solid and broad foundation—especially if this is your first experience trying to process natural language. If your project is sufficiently sophisticated that the quality or efficiency that NLTK provides isn’t adequate for your needs, you have approximately three options, depending on the amount of time and money you are willing to put in: scour the open source space for a suitable alternative by running comparative experiments and benchmarks, churn through whitepapers and prototype your own toolkit, or license a commercial product.
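The frequency distribution the excerpt mentions is, at heart, a word-count table; NLTK's FreqDist is in fact a subclass of Python's collections.Counter, so the idea can be sketched with the standard library alone. The sample sentence below is invented for illustration:

```python
from collections import Counter

text = ("the quick brown fox jumps over the lazy dog "
        "the dog barks at the fox")

# Tokenize naively on whitespace; NLTK provides far more careful tokenizers.
tokens = text.split()

# A frequency distribution maps each token to how often it occurs.
freq = Counter(tokens)

print(freq.most_common(3))       # the three most frequent tokens
print(len(freq))                 # vocabulary size (distinct tokens)
print(len(freq) / len(tokens))   # lexical diversity, as in the NLTK book
```

With NLTK installed, `nltk.FreqDist(tokens)` supports the same `most_common` call plus extras such as `plot`.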


pages: 347 words: 97,721

Only Humans Need Apply: Winners and Losers in the Age of Smart Machines by Thomas H. Davenport, Julia Kirby

Amazon: amazon.com, amazon.co.uk, amazon.de, amazon.fr

AI winter, Andy Kessler, artificial general intelligence, asset allocation, Automated Insights, autonomous vehicles, basic income, Baxter: Rethink Robotics, business intelligence, business process, call centre, carbon-based life, Clayton Christensen, clockwork universe, commoditize, conceptual framework, dark matter, David Brooks, deliberate practice, deskilling, digital map, Douglas Engelbart, Edward Lloyd's coffeehouse, Elon Musk, Erik Brynjolfsson, estate planning, fixed income, follow your passion, Frank Levy and Richard Murnane: The New Division of Labor, Freestyle chess, game design, general-purpose programming language, Google Glasses, Hans Lippershey, haute cuisine, income inequality, index fund, industrial robot, information retrieval, intermodal, Internet of things, inventory management, Isaac Newton, job automation, John Markoff, John Maynard Keynes: Economic Possibilities for our Grandchildren, John Maynard Keynes: technological unemployment, Khan Academy, knowledge worker, labor-force participation, lifelogging, loss aversion, Mark Zuckerberg, Narrative Science, natural language processing, Norbert Wiener, nuclear winter, pattern recognition, performance metric, Peter Thiel, precariat, quantitative trading / quantitative finance, Ray Kurzweil, Richard Feynman, risk tolerance, Robert Shiller, Rodney Brooks, Second Machine Age, self-driving car, Silicon Valley, six sigma, Skype, speech recognition, spinning jenny, statistical model, Stephen Hawking, Steve Jobs, Steve Wozniak, strong AI, superintelligent machines, supply-chain management, transaction costs, Tyler Cowen: Great Stagnation, Watson beat the top human players on Jeopardy!, Works Progress Administration, Zipcar

Finally, people who are interested in programming in this context should be interested in and knowledgeable about some aspect of this field’s key movements: artificial intelligence, natural language processing (NLP), machine learning, deep-learning neural networks, statistical analysis and data mining, and so forth. If you have a basic grounding in computer science and programming, it is possible to develop a sufficient understanding of these automation-oriented tools well into your career. Today there are many online courses related to this field. Stanford professors, for example, have created online courses with companies like Coursera and Udacity in such highly relevant fields as machine learning, natural language processing, algorithms, and robotics. You have to be pretty motivated to finish such courses, but it can be done. And as the Watson jobs we list above suggest, there are also plenty of IT-oriented jobs that don’t just involve programming.

The friend is an independent consultant, so it was slightly surprising to learn, by being cc’d on an email, that he employed an assistant, “Amy.” He wrote:

Hi Amy,
Would you please send an invite for Tom and me for Friday 9/19 at 9:30 A.M. at Hi-Rise Cafe in Cambridge, MA. We will be meeting in person.
Thanks,
Judah

Curiosity getting the best of him, Tom looked up the company in Amy’s email extension, @x.ai. It turns out X.ai is a company that uses “natural language processing” software to interpret text and schedule meetings via email. “Amy,” in other words, is automated. Meanwhile, other tools such as email and voice mail, word processing, online travel sites, and Internet search applications have been chipping away at the rest of what used to be a secretarial job. Era Two automation doesn’t only affect office workers. It washes across the entire services-based economy that arose after massive productivity gains wiped out jobs in agriculture, then manufacturing.

Our observation is that the experts engaging in the current debate about knowledge work automation tend to fall into two camps—those who say we are heading inexorably toward permanent high levels of unemployment and those who are certain new job types will spring up to replace all the ones that go by the wayside—but that neither camp suggests to workers that there is much they can do personally about the situation. Our main mission in the next couple hundred pages is to persuade you, our knowledge worker reader, that you remain in charge of your destiny. You should be feeling a sense of agency and making decisions for yourself as to how you will deal with advancing automation. Over the past few years, even as every week brings news of some breakthrough in machine learning or natural language processing or visual image recognition, we’ve been learning from knowledge workers who are thriving. They’re redefining what it means to be more capable than computers, and doubling down on their very human strengths. As you’ll find in the chapters to come, these are not superhumans who can somehow process information more quickly than artificial intelligence or perform repetitive tasks as flawlessly as robots.


pages: 502 words: 107,657

Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die by Eric Siegel

Amazon: amazon.com, amazon.co.uk, amazon.de, amazon.fr

Albert Einstein, algorithmic trading, Amazon Mechanical Turk, Apple's 1984 Super Bowl advert, backtesting, Black Swan, book scanning, bounce rate, business intelligence, business process, call centre, commoditize, computer age, conceptual framework, correlation does not imply causation, crowdsourcing, dark matter, data is the new oil, en.wikipedia.org, Erik Brynjolfsson, Everything should be made as simple as possible, experimental subject, Google Glasses, happiness index / gross national happiness, job satisfaction, Johann Wolfgang von Goethe, lifelogging, Machine translation of "The spirit is willing, but the flesh is weak." to Russian and back, mass immigration, Moneyball by Michael Lewis explains big data, Nate Silver, natural language processing, Netflix Prize, Network effects, Norbert Wiener, personalized medicine, placebo effect, prediction markets, Ray Kurzweil, recommendation engine, risk-adjusted returns, Ronald Coase, Search for Extraterrestrial Intelligence, self-driving car, sentiment analysis, software as a service, speech recognition, statistical model, Steven Levy, text mining, the scientific method, The Signal and the Noise by Nate Silver, The Wisdom of Crowds, Thomas Bayes, Turing test, Watson beat the top human players on Jeopardy!, X Prize, Yogi Berra, zero-sum game

self-driving cars
spam filtering
Google Adwords
Google Flu Trends
Google Glass
Google Page Rank
government
  data storage by
  fraud detection for invoices
  PA for public access to data
GPS data
grades, predicting
Granger, Clive
grant awards, predicting
Greenspan, Alan
Grockit
Groundhog Day (film)
Grundhoefer, Michael
H
hackers, predicting
Halder, Gitali
HAL (intelligent computer)
Hansell, Saul
happiness, social effect and
Harbor Sweets
Harcourt, Bernard
Harrah’s Las Vegas
Harris, Jeanne
Harvard Medical School
Harvard University
Hastings, Reed
healthcare
  death predictions in
  health risks, predicting
  hospital admissions, predicting
  influenza, predicting
  medical research, predicting in
  medical treatments, risks for wrong predictions in
  medical treatments, testing persuasion in
  PA for
  personalized medicine, uplift modeling applications for
health insurance companies, PA for
Hebrew University
Heisenberg, Werner Karl
Helle, Eva
Helsinki Brain Research Centre
Hennessey, Kathleen
Heraclitus
Heritage Health Prize
Heritage Provider Network
Hewlett Foundation
Hewlett-Packard (HP)
  employee data used by
  financial savings and benefits of PA
  Global Business Services (GBS)
  quitting and Flight Risks, predicting
  sales leads, predicting
  turnover rates at
  warranty claims and fraud detection
High Anxiety (film)
HIV progression, predicting
HIV treatments, uplift modeling for
Hollifield, Stephen
Holmes, Sherlock
hormone replacement, coronary disease and
hospital admissions, predicting
Hotmail.com
House (TV show)
“How Companies Learn Your Secrets” (Duhigg)
Howe, Jeff
HP. See Hewlett-Packard (HP)
Hubbard, Douglas
human behavior
  collective intelligence
  consumer behavior insights
  emotions and mood prediction
  mistakes, predicting
  social effect and
human genome
human language
  inappropriate comments, predicting
  mood predictions and
  natural language processing (NLP)
  PA for
  persuasion and influence in
human resources. See employees and staff
I
IBM
  corporate roll-ups
  Deep Blue computer
  DeepQA project
Iambic
IBM
  AI mind-reading technology
  natural language processing research
  sales leads, predicting
  student performance PA contest
  T. J. Watson Research Center
  value of
  See also Watson computer
ID3
impact modeling. See uplift modeling
Imperium
incremental impact modeling. See uplift modeling
incremental response modeling. See uplift modeling
India
Indiana University
Induction Effect, The
induction vs. deduction
inductive bias
infidelity, predicting
Infinity Insurance
influence.

2001: A Space Odyssey’s smart and talkative computer, HAL, bears a legendary, disputed connection in nomenclature to IBM (just take each letter back one position in the alphabet); however, author Arthur C. Clarke has strenuously denied that this was intentional. Ask IBM researchers whether their question answering Watson system is anything like HAL, which goes famously rogue in the film, and they’ll quickly reroute your comparison toward the obedient computers of Star Trek. The field of research that develops technology to work with human language is natural language processing (NLP, aka computational linguistics). In commercial application, it’s known as text analytics. These fields develop analytical methods especially designed to operate across the written word. If data is all Earth’s water, textual data is the part known as “the ocean.” Often said to compose 80 percent of all data, it’s everything we the human race know that we’ve bothered to write down.

They were tackling the breadth of human language that stretches beyond the phrasing of each question to include a sea of textual sources, from which the answer to each question must be extracted. With this ambition, IBM had truly doubled down. I would have thought success impossible. After witnessing the world’s best researchers attempting to tackle the task through the 1990s (during which I spent six years in natural language processing research, as well as a summer at the same IBM Research center that bore Watson), I was ready to throw up my hands. Language is so tough that it seemed virtually impossible even to program a computer to answer questions within a limited domain of knowledge such as movies or wines. Yet IBM had taken on the unconstrained, open field of questions across any domain. Meeting this challenge would demonstrate such a great leap toward humanlike capabilities that it invokes the “I” word: intelligence.


pages: 579 words: 76,657

Data Science from Scratch: First Principles with Python by Joel Grus

Amazon: amazon.com, amazon.co.uk, amazon.de, amazon.fr

correlation does not imply causation, natural language processing, Netflix Prize, p-value, Paul Graham, recommendation engine, SpamAssassin, statistical model

Three bottom-up clusters using max distance

For Further Exploration

scikit-learn has an entire module, sklearn.cluster, that contains several clustering algorithms including KMeans and the Ward hierarchical clustering algorithm (which uses a different criterion for merging clusters than ours did). SciPy has two clustering modules: scipy.cluster.vq (which does k-means) and scipy.cluster.hierarchy (which has a variety of hierarchical clustering algorithms).

Chapter 20. Natural Language Processing

They have been at a great feast of languages, and stolen the scraps.
William Shakespeare

Natural language processing (NLP) refers to computational techniques involving language. It’s a broad field, but we’ll look at a few techniques, both simple and not simple.

Word Clouds

In Chapter 1, we computed word counts of users’ interests. One approach to visualizing words and counts is word clouds, which artistically lay out the words with sizes proportional to their counts.
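In the from-scratch spirit of this book, the k-means procedure behind sklearn.cluster.KMeans and scipy.cluster.vq can be sketched in plain Python. This is a minimal sketch, not the book's own implementation; the 2-D points and k=2 are invented toy data:

```python
import random

def kmeans(points, k, iterations=20, seed=0):
    """Minimal k-means: alternately assign each point to its nearest
    mean and recompute each mean, for a fixed number of iterations."""
    random.seed(seed)
    means = random.sample(points, k)  # initial centers: k distinct points
    for _ in range(iterations):
        # Assignment step: group each point with its closest mean.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda j: sum((a - b) ** 2 for a, b in zip(p, means[j])))
            clusters[i].append(p)
        # Update step: each mean becomes the centroid of its cluster.
        new_means = []
        for cluster, m in zip(clusters, means):
            if cluster:
                new_means.append(tuple(sum(c) / len(cluster)
                                       for c in zip(*cluster)))
            else:
                new_means.append(m)  # keep an empty cluster's old mean
        means = new_means
    return means, clusters

# Two well-separated blobs of toy 2-D points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
means, clusters = kmeans(points, k=2)
```

On data this clean, the two recovered means converge to the blob centroids, (1/3, 1/3) and (31/3, 31/3), regardless of which points are sampled as starting centers.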

modules (Python), Modules
multiple assignment (Python), Tuples
N
n-gram models, n-gram Models–n-gram Models
  bigram, n-gram Models
  trigrams, n-gram Models
n-grams, n-gram Models
Naive Bayes algorithm, Naive Bayes–For Further Exploration
  example, filtering spam, A Really Dumb Spam Filter–A More Sophisticated Spam Filter
  implementation, Implementation
natural language processing (NLP), Natural Language Processing–For Further Exploration
  grammars, Grammars–Grammars
  topic modeling, Topic Modeling–Topic Modeling
  topics of interest, finding, Topics of Interest
  word clouds, Word Clouds–Word Clouds
nearest neighbors classification, k-Nearest Neighbors–For Further Exploration
  curse of dimensionality, The Curse of Dimensionality–The Curse of Dimensionality
  example, favorite programming languages, Example: Favorite Languages–Example: Favorite Languages
  model, The Model
network analysis, Network Analysis–For Further Exploration
  betweenness centrality, Betweenness Centrality–Betweenness Centrality
  closeness centrality, Betweenness Centrality
  degree centrality, Finding Key Connectors, Betweenness Centrality
  directed graphs and PageRank, Directed Graphs and PageRank–Directed Graphs and PageRank
  eigenvector centrality, Eigenvector Centrality–Centrality
networks, Network Analysis
neural networks, Neural Networks–For Further Exploration
  backpropagation, Backpropagation
  example, defeating a CAPTCHA, Example: Defeating a CAPTCHA–Example: Defeating a CAPTCHA
  feed-forward, Feed-Forward Neural Networks
  perceptrons, Perceptrons
neurons, Neural Networks
NLP (see natural language processing)
nodes, Network Analysis
noise, Rescaling
  in machine learning, Overfitting and Underfitting
None (Python), Truthiness
normal distribution, The Normal Distribution
  and p-value computation, Example: Flipping a Coin
  central limit theorem and, The Central Limit Theorem
  in coin flip example, Example: Flipping a Coin
  standard, The Normal Distribution
normalized tables, JOIN
NoSQL databases, NoSQL
NotQuiteABase, Databases and SQL
null hypothesis, Statistical Hypothesis Testing
  testing in A/B test, Example: Running an A/B Test
NumPy, NumPy
O
one-sided tests, Example: Flipping a Coin
ORDER BY statement (SQL), ORDER BY
overfitting, Overfitting and Underfitting, The Bias-Variance Trade-off
P
p-hacking, P-hacking
p-values, Example: Flipping a Coin
PageRank algorithm, Directed Graphs and PageRank
paid accounts, predicting, Paid Accounts
pandas, For Further Exploration, For Further Exploration, pandas
parameterized models, What Is Machine Learning?



pages: 398 words: 86,855

Bad Data Handbook by Q. Ethan McCallum

Amazon: amazon.com, amazon.co.uk, amazon.de, amazon.fr

Amazon Mechanical Turk, asset allocation, barriers to entry, Benoit Mandelbrot, business intelligence, cellular automata, chief data officer, Chuck Templeton: OpenTable, cloud computing, cognitive dissonance, combinatorial explosion, commoditize, conceptual framework, database schema, en.wikipedia.org, Firefox, Flash crash, Gini coefficient, illegal immigration, iterative process, labor-force participation, loose coupling, natural language processing, Netflix Prize, quantitative trading / quantitative finance, recommendation engine, selection bias, sentiment analysis, statistical model, supply-chain management, survivorship bias, text mining, too big to fail, web application

Online business reviews are one of the major input signals we use to determine these classifications. Reviews can tell us the positive or negative sentiment of the reviewer, as well as what they specifically care about, such as quality of service, ambience, and value. When we aggregate reviews, we can learn what’s popular about the place and why people like or dislike it. We use many other signals besides reviews, but with the proper application of natural language processing,[9] reviews are a rich source of significant information. Getting Reviews To get reviews, we use APIs where possible, but most reviews are found using good old-fashioned web scraping. If you can use an API like CityGrid[10] to get the data you need, it will make your life much easier, because while scraping isn’t necessarily difficult, it can be very frustrating. Website HTML can change without notice, and only the simplest or most advanced scraping logic will remain unaffected.
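As a rough illustration of what simple scraping logic looks like, here is a hedged, standard-library-only sketch using html.parser. The markup and the "review" class name are invented for the example; real pages differ and, as the excerpt warns, can change without notice:

```python
from html.parser import HTMLParser

class ReviewExtractor(HTMLParser):
    """Collect the text of every <p class="review"> element."""
    def __init__(self):
        super().__init__()
        self.reviews = []
        self._in_review = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs.
        if tag == "p" and dict(attrs).get("class") == "review":
            self._in_review = True
            self.reviews.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_review = False

    def handle_data(self, data):
        if self._in_review:
            self.reviews[-1] += data  # text may arrive in chunks

page = """
<div><p class="review">Great food, friendly staff.</p>
<p class="blurb">Ad copy, not a review.</p>
<p class="review">Too noisy for my taste.</p></div>
"""

parser = ReviewExtractor()
parser.feed(page)
```

In practice you would fetch `page` over HTTP and likely reach for a framework like Scrapy, or a parser tolerant of broken markup, rather than hand-rolling this.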

For sentiment analysis, a feature set is a piece of text, like a review, and the possible labels can be pos for positive text, and neg for negative text. Such a sentiment classifier could be run over a business’s reviews in order to calculate an overall sentiment, and to make up for any missing rating information. Sentiment Classification NLTK,[12] Python’s Natural Language ToolKit, is a very useful programming library for doing natural language processing and text classification.[13] It also comes with many corpora that you can use for training and testing. One of these is the movie_reviews corpus,[14] and if you’re just learning how to do sentiment classification, this is a good corpus to start with. It is organized into two directories, pos and neg. In each directory is a set of files containing movie reviews, with every review separated by a blank line.
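A minimal sketch of the setup the passage describes, using NLTK's `NaiveBayesClassifier` with bag-of-words feature sets. A tiny hand-built corpus stands in for the movie_reviews pos/neg files here, since the real corpus requires a separate `nltk.download('movie_reviews')` step:

```python
import nltk

def word_features(text):
    # Bag-of-words feature set: every word present maps to True.
    return {word: True for word in text.lower().split()}

# Tiny hand-built corpus standing in for the movie_reviews pos/neg files.
train_set = [
    (word_features("a wonderful touching film"), "pos"),
    (word_features("great acting and a great story"), "pos"),
    (word_features("terrible boring waste of time"), "neg"),
    (word_features("awful plot and bad acting"), "neg"),
]

classifier = nltk.NaiveBayesClassifier.train(train_set)
print(classifier.classify(word_features("a great film")))
```

With the real corpus you would build the same `(features, label)` pairs from `movie_reviews.fileids('pos')` and `movie_reviews.fileids('neg')`, hold out a slice for testing, and check accuracy with `nltk.classify.accuracy`.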

If every other signal is mostly positive, then showing negative reviews is a disservice to our users and results in a poor experience. By choosing to show only positive reviews, the data, design, and user experience are all congruent, helping our users choose from the best options available based on their own preferences, without having to do any mental filtering of negative opinions. Lessons Learned One important lesson for machine learning and statistical natural language processing enthusiasts: it’s very important to train your own models on your own data. If I had used classifiers trained on the standard movie_reviews corpus, I would never have gotten these results. Movie reviews are simply different than local business reviews. In fact, it might be the case that you’d get even better results by segmenting businesses by type, and creating classifiers for each type of business.


pages: 294 words: 81,292

Our Final Invention: Artificial Intelligence and the End of the Human Era by James Barrat

Amazon: amazon.comamazon.co.ukamazon.deamazon.fr

3D printing, AI winter, Amazon Web Services, artificial general intelligence, Asilomar, Automated Insights, Bayesian statistics, Bernie Madoff, Bill Joy: nanobots, brain emulation, cellular automata, Chuck Templeton: OpenTable, cloud computing, cognitive bias, commoditize, computer vision, cuban missile crisis, Daniel Kahneman / Amos Tversky, Danny Hillis, data acquisition, don't be evil, drone strike, Extropian, finite state, Flash crash, friendly AI, friendly fire, Google Glasses, Google X / Alphabet X, Isaac Newton, Jaron Lanier, John Markoff, John von Neumann, Kevin Kelly, Law of Accelerating Returns, life extension, Loebner Prize, lone genius, mutually assured destruction, natural language processing, Nicholas Carr, optical character recognition, PageRank, pattern recognition, Peter Thiel, prisoner's dilemma, Ray Kurzweil, Rodney Brooks, Search for Extraterrestrial Intelligence, self-driving car, semantic web, Silicon Valley, Singularitarianism, Skype, smart grid, speech recognition, statistical model, stealth mode startup, stem cell, Stephen Hawking, Steve Jobs, Steve Wozniak, strong AI, Stuxnet, superintelligent machines, technological singularity, The Coming Technological Singularity, Thomas Bayes, traveling salesman, Turing machine, Turing test, Vernor Vinge, Watson beat the top human players on Jeopardy!, zero day

When I questioned him at an AGI conference, Google’s Director of Research Peter Norvig, coauthor of the classic AI textbook, Artificial Intelligence: A Modern Approach, said Google wasn’t looking into AGI. He compared the quest to NASA’s plan for manned interplanetary travel. It doesn’t have one. But it will continue to develop the component sciences of traveling in space—rocketry, robotics, astronomy, et cetera—and one day all the pieces will come together, and a shot at Mars will look feasible. Likewise, narrow AI projects do lots of intelligent jobs like search, voice recognition, natural language processing, visual perception, data mining, and much more. Separately they are well-funded, powerful tools, dramatically improving each year. Together they advance the computer sciences that will benefit AGI systems. However, Norvig told me, no AGI program for Google exists. But compare that statement to what his boss, Google cofounder Larry Page said at a London conference called Zeitgeist ’06: People always make the assumption that we’re done with search.

Cyc’s inference engine understands queries and generates answers from its vast knowledge database. Created by AI pioneer Douglas Lenat, Cyc is the largest AI project in history, and probably the best funded, with $50 million in grants from government agencies, including DARPA, since 1984. Cyc’s creators continue to improve its database and inference engine so it can better process “natural language,” or everyday written language. Once it has acquired a sufficient natural language processing (NLP) capability, its creators will start it reading, and comprehending, all the Web pages on the Internet. Another contender for most knowledgeable knowledge database is already doing that. Carnegie Mellon University’s NELL, the Never-Ending-Language-Learning system, knows more than 390,000 facts about the world. Operating 24/7, NELL—a beneficiary of DARPA funding—scans hundreds of millions of Web pages for patterns of text so it can learn even more.

Many know that DARPA (then called ARPA) funded the research that invented the Internet (initially called ARPANET), as well as the researchers who developed the now ubiquitous GUI, or Graphical User Interface, a version of which you probably see every time you use a computer or smart phone. But the agency was also a major backer of parallel processing hardware and software, distributed computing, computer vision, and natural language processing (NLP). These contributions to the foundations of computer science are as important to AI as the results-oriented funding that characterizes DARPA today. How is DARPA spending its money? A recent annual budget allocates $61.3 million to a category called Machine Learning, and $49.3 million to Cognitive Computing. But AI projects are also funded under Information and Communication Technology, $400.5 million, and Classified Programs, $107.2 million.

Big Data at Work: Dispelling the Myths, Uncovering the Opportunities by Thomas H. Davenport

Amazon: amazon.comamazon.co.ukamazon.deamazon.fr

Automated Insights, autonomous vehicles, bioinformatics, business intelligence, business process, call centre, chief data officer, cloud computing, commoditize, data acquisition, Edward Snowden, Erik Brynjolfsson, intermodal, Internet of things, Jeff Bezos, knowledge worker, lifelogging, Mark Zuckerberg, move fast and break things, move fast and break things, Narrative Science, natural language processing, Netflix Prize, New Journalism, recommendation engine, RFID, self-driving car, sentiment analysis, Silicon Valley, smart grid, smart meter, social graph, sorting algorithm, statistical model, Tesla Model S, text mining

Health Care In health care, for example, there will be much more structured data from the many electronic medical record systems that hospitals and outpatient clinics are installing. In addition, there have always been voluminous amounts of text in the clinical setting, primarily from physicians’ and nurses’ notes. This text can increasingly be captured and classified through the use of natural language processing technology. Insurance firms have huge amounts of medical claims data, but it’s not integrated with the data from healthcare providers. If all of that data could be integrated, categorized, and analyzed, we’d know a lot more about patient conditions. Image data from CAT scans and MRIs is another huge source; thus far doctors only look at it but don’t analyze it in any systematic fashion.

Many companies have used small data analytics to measure and analyze this important factor, but a lot of the data about how customers feel is unstructured—in particular, sitting in recorded voice files from customer calls to call centers. The level of customer satisfaction is increasingly important to health insurers because it is being monitored by state and federal government groups and published by organizations such as Consumers Union. In the past, that valuable data from calls couldn’t be analyzed. Now, however, United is turning it into text and then analyzing it with natural language processing software (a way to extract meaning from text). The analysis process can identify—though it’s not easy, given the vagaries of the English language—customers who use terms suggesting strong dissatisfaction. The insurer can then make some sort of intervention—perhaps a call exploring the source of the dissatisfaction. The decision is the same as in the past—how to identify a dissatisfied customer—but the tools are different.

In any case, many organizations that work with big data employ specialists in machine learning. Big data often involves the processing of unstructured data types like text, images, and video. It is probably impossible for a data scientist to be familiar with the analysis of all of these data types, but a knowledge of analytical approaches to one of them would be very useful. For example, natural language processing (NLP) is a set of approaches to extracting meaning from text. It may involve counting, classifying, translating, or otherwise analyzing words. It’s quite commonly used, for example, in understanding what customers are saying about a product or company. Virtually every large firm that is interested in big data should have someone available with NLP skills, but one or two experts will probably be sufficient.


pages: 348 words: 39,850

Data Scientists at Work by Sebastian Gutierrez

Amazon: amazon.comamazon.co.ukamazon.deamazon.fr

Albert Einstein, algorithmic trading, Bayesian statistics, bioinformatics, bitcoin, business intelligence, chief data officer, clean water, cloud computing, commoditize, computer vision, continuous integration, correlation does not imply causation, creative destruction, crowdsourcing, data is the new oil, DevOps, domain-specific language, Donald Knuth, follow your passion, full text search, informal economy, information retrieval, Infrastructure as a Service, Intergovernmental Panel on Climate Change (IPCC), inventory management, iterative process, lifelogging, linked data, Mark Zuckerberg, microbiome, Moneyball by Michael Lewis explains big data, move fast and break things, move fast and break things, natural language processing, Network effects, nuclear winter, optical character recognition, pattern recognition, Paul Graham, personalized medicine, Peter Thiel, pre–internet, quantitative hedge fund, quantitative trading / quantitative finance, recommendation engine, Renaissance Technologies, Richard Feynman, Richard Feynman, self-driving car, side project, Silicon Valley, Skype, software as a service, speech recognition, statistical model, Steve Jobs, stochastic process, technology bubble, text mining, the scientific method, web application

But ultimately I couldn’t resist working on problems with practical consequences, and that’s how I found myself specializing in information retrieval and data science more broadly. Gutierrez: How did you get interested in working with data? Tunkelang: One of the problems I worked on at IBM was visualizing semantic networks obtained by applying natural language processing algorithms to large document collections. Even though my focus was on the network visualization algorithms, I couldn’t help noticing that the natural language processing algorithms had their good moments and bad moments. And that there was only so much I could do with visualization algorithms if the raw data was noisy. Several years later, when I was at Endeca, I found myself working on terminology extraction and had to confront the noise problems personally.

For instance, when you’re reading a piece of text, if it’s English text, you have grammar for English, so you want a system on top of it that extracts the most likely interpretation that is part of the language. And what you’d like to be able to do is to train the system to simultaneously do the recognition and the segmentation, as well as provide the right input for the language model. We managed to figure out how to do this. Since then the technique has been reinvented in different forms multiple times for different contexts of natural language processing and for other things. There are models called CRF—conditional random fields—as well as structured perceptron, and then later things such as structured SVMs, which are very much in the same spirit except they’re not deep. Our system was deep. So the second half of that paper talks about how you do this. Sadly, it seems like very few people ever read the second half! I’m extremely proud of our work.

Chapter 10 | Anna Smith, Rent the Runway

A more informational and harder-to-work-with body measure data set is the review a person leaves after wearing the dress. Often, the customers write a long exposé about the dress and how it fit them. In these reviews, they offer advice to other people about the dress and why it might or might not fit them based on the size of the dress they wore and their body. This type of data is harder to use because we have to parse all that information out with natural language processing to try to expose the relevant details. Gutierrez: How can you ensure the accuracy of this very personal data? Smith: A really simple thing we’re doing is looking at the size someone says she is and the size of dress she actually wore. Additionally, we have a question that asks, “Is the dress true-to-fit?” Though the question is ambiguous, the answers give a first-order approximation of whether the wearer found the dress size large, small, or somewhere in between.


pages: 58 words: 12,386

Big Data Glossary by Pete Warden

Amazon: amazon.comamazon.co.ukamazon.deamazon.fr

business intelligence, crowdsourcing, fault tolerance, information retrieval, linked data, natural language processing, recommendation engine, web application

If you’re dealing with graph data, Tinkerpop will give you some high-level interfaces that can be much more convenient to deal with than raw graph databases. Chapter 7. NLP Natural language processing (NLP) is a subset of data processing that’s so crucial, it earned its own section. Its focus is taking messy, human-created text and extracting meaningful information. As you can imagine, this chaotic problem domain has spawned a large variety of approaches, with each tool most useful for particular kinds of text. There’s no magic bullet that will understand written information as well as a human, but if you’re prepared to adapt your use of the results to handle some errors and don’t expect miracles, you can pull out some powerful insights. Natural Language Toolkit The NLTK is a collection of Python modules and datasets that implement common natural language processing techniques. It offers the building blocks that you need to build more complex algorithms for specific problems.
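Two of those NLTK building blocks can be combined in a few lines: a tokenizer and a frequency distribution over the resulting tokens. The sentence below is an arbitrary example; `WhitespaceTokenizer` is used because it needs no separate corpus download:

```python
# Two NLTK building blocks used together: a tokenizer (no corpus
# download needed) and a frequency distribution over the tokens.
from nltk import FreqDist
from nltk.tokenize import WhitespaceTokenizer

text = "the quick brown fox jumps over the lazy dog"
tokens = WhitespaceTokenizer().tokenize(text)
freq = FreqDist(tokens)
print(freq.most_common(1))
```

More specialized problems swap in richer components—`word_tokenize`, stemmers, part-of-speech taggers—but the pattern of composing small pieces stays the same.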


pages: 309 words: 114,984

The Digital Doctor: Hope, Hype, and Harm at the Dawn of Medicine’s Computer Age by Robert Wachter

Amazon: amazon.comamazon.co.ukamazon.deamazon.fr

activist fund / activist shareholder / activist investor, Affordable Care Act / Obamacare, AI winter, Airbnb, Atul Gawande, Captain Sullenberger Hudson, Checklist Manifesto, Chuck Templeton: OpenTable, Clayton Christensen, collapse of Lehman Brothers, computer age, creative destruction, crowdsourcing, deskilling, en.wikipedia.org, Erik Brynjolfsson, everywhere but in the productivity statistics, Firefox, Frank Levy and Richard Murnane: The New Division of Labor, Google Glasses, Ignaz Semmelweis: hand washing, Internet of things, job satisfaction, Joseph Schumpeter, knowledge worker, lifelogging, medical malpractice, medical residency, Menlo Park, minimum viable product, natural language processing, Network effects, Nicholas Carr, obamacare, pattern recognition, peer-to-peer, personalized medicine, pets.com, Productivity paradox, Ralph Nader, RAND corporation, Second Machine Age, self-driving car, Silicon Valley, Silicon Valley startup, six sigma, Skype, Snapchat, software as a service, Steve Jobs, Steven Levy, the payments system, The Wisdom of Crowds, Thomas Bayes, Toyota Production System, Uber for X, US Airways Flight 1549, Watson beat the top human players on Jeopardy!, Yogi Berra

Even technophiles admit that the quest to replace doctors with computers—or even the more modest ambition of providing them with useful guidance at the point of care—has been overhyped and unproductive. But times have changed. The growing prevalence of electronic health records offers grist for the AI and big-data mills, grist that wasn’t available when the records were on paper. And in this, the Age of Watson, we have new techniques, like natural language processing and machine learning, at our disposal. Perhaps this is our “gradually, then suddenly” moment. The public worships dynamic, innovative surgeons like Michael DeBakey; passionate, insightful researchers like Jonas Salk; and telegenic show horses like Mehmet Oz. But we seldom hear about those doctors whom other physicians tend to hold in the highest esteem: the great medical diagnosticians.

As if this weren’t complicated enough for the poor IBM engineer gearing up to retool Watson from answering questions about “Potent Potables” to diagnosing sick patients, there’s more. While the EHR at least offers a fighting chance for computerized diagnosis (older medical AI programs, built in the pen-and-paper era, required busy physicians to write their notes and then reenter all the key data), parsing an electronic medical record is far from straightforward. Natural language processing is getting much better, but it still has real problems with negation (“the patient has no history of chest pain or cough”) and with family history (“there is a history of arthritis in the patient’s sister, but his mother is well”), to name just a couple of issues. Certain terms have multiple meanings: when written by a psychiatrist, the term depression is likely to refer to a mood disorder, while when it appears in a cardiologist’s note (“there was no evidence of ST-depression”) it probably refers to a dip in the EKG tracing that is often a clue to coronary disease.
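The negation problem can be shown in toy form: a naive keyword matcher flags “chest pain” in exactly the sentence quoted above, even though the note negates it. This is only an illustration of the failure mode, not how clinical NLP systems actually work:

```python
import re

# The example sentence quoted in the text above.
note = "The patient has no history of chest pain or cough."

# A naive keyword match wrongly concludes the patient has chest pain...
naive_hit = "chest pain" in note.lower()

# ...while even a crude negation-scope pattern catches the "no history of"
# phrase governing the mention.
negated = bool(re.search(r"no history of[^.]*chest pain", note.lower()))

print(naive_hit, negated)
```

Real systems need far more than one regular expression—negation cues vary, and their scope interacts with conjunctions and family-history phrasing—which is why the passage calls this a hard problem.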

The scruffies are the pragmatists, the hackers, the crazy ones; they believe that problems should be attacked through whatever means work, and that modeling the behavior of experts or the scientific truth of a situation isn’t all that important. IBM’s breakthrough was to figure out that a combination of neat and scruffy—programming in some of the core rules of the game, but then folding in the fruits of machine learning and natural language processing—could solve truly complicated problems. When he was asked about the difference between human thinking and Watson’s method, Eric Brown, who runs IBM’s Watson Technologies group, gave a careful answer (note the shout-out to the humans, the bit players who made it all possible): A lot of the way that Watson works is motivated by the way that humans analyze problems and go about trying to find solutions, especially when it comes to dealing with complex problems where there are a number of intermediate steps to get you to the final answer.


pages: 339 words: 88,732

The Second Machine Age: Work, Progress, and Prosperity in a Time of Brilliant Technologies by Erik Brynjolfsson, Andrew McAfee

Amazon: amazon.comamazon.co.ukamazon.deamazon.fr

2013 Report for America's Infrastructure - American Society of Civil Engineers - 19 March 2013, 3D printing, access to a mobile phone, additive manufacturing, Airbnb, Albert Einstein, Amazon Mechanical Turk, Amazon Web Services, American Society of Civil Engineers: Report Card, Any sufficiently advanced technology is indistinguishable from magic, autonomous vehicles, barriers to entry, basic income, Baxter: Rethink Robotics, British Empire, business intelligence, business process, call centre, Chuck Templeton: OpenTable, clean water, combinatorial explosion, computer age, computer vision, congestion charging, corporate governance, creative destruction, crowdsourcing, David Ricardo: comparative advantage, digital map, employer provided health coverage, en.wikipedia.org, Erik Brynjolfsson, factory automation, falling living standards, Filter Bubble, first square of the chessboard / second half of the chessboard, Frank Levy and Richard Murnane: The New Division of Labor, Freestyle chess, full employment, game design, global village, happiness index / gross national happiness, illegal immigration, immigration reform, income inequality, income per capita, indoor plumbing, industrial robot, informal economy, intangible asset, inventory management, James Watt: steam engine, Jeff Bezos, jimmy wales, job automation, John Markoff, John Maynard Keynes: Economic Possibilities for our Grandchildren, John Maynard Keynes: technological unemployment, Joseph Schumpeter, Kevin Kelly, Khan Academy, knowledge worker, Kodak vs Instagram, law of one price, low skilled workers, Lyft, Mahatma Gandhi, manufacturing employment, Marc Andreessen, Mark Zuckerberg, Mars Rover, mass immigration, means of production, Narrative Science, Nate Silver, natural language processing, Network effects, new economy, New Urbanism, Nicholas Carr, Occupy movement, oil shale / tar sands, oil shock, pattern recognition, Paul Samuelson, payday loans, price stability, Productivity paradox, profit maximization, 
Ralph Nader, Ray Kurzweil, recommendation engine, Report Card for America’s Infrastructure, Robert Gordon, Rodney Brooks, Ronald Reagan, Second Machine Age, self-driving car, sharing economy, Silicon Valley, Simon Kuznets, six sigma, Skype, software patent, sovereign wealth fund, speech recognition, statistical model, Steve Jobs, Steven Pinker, Stuxnet, supply-chain management, TaskRabbit, technological singularity, telepresence, The Bell Curve by Richard Herrnstein and Charles Murray, The Signal and the Noise by Nate Silver, The Wealth of Nations by Adam Smith, total factor productivity, transaction costs, Tyler Cowen: Great Stagnation, Vernor Vinge, Watson beat the top human players on Jeopardy!, winner-take-all economy, Y2K

A 2004 review of the previous half-century’s research in automatic speech recognition (a critical part of natural language processing) opened with the admission that “Human-level speech recognition has proved to be an elusive goal,” but less than a decade later major elements of that goal have been reached. Apple and other companies have made robust natural language processing technology available to hundreds of millions of people via their mobile phones.10 As noted by Tom Mitchell, who heads the machine-learning department at Carnegie Mellon University: “We’re at the beginning of a ten-year period where we’re going to transition from computers that can’t understand language to a point where computers can understand quite a bit about language.”11 Digital Fluency: The Babel Fish Goes to Work Natural language processing software is still far from perfect, and computers are not yet as good as people at complex communication, but they’re getting better all the time.

Their hundreds of person-years of accumulated experience and expertise seemed like an insurmountable advantage over a bunch of novices. They needn’t have worried. Many of the ‘novices’ drawn to the challenge outperformed all of the testing companies in the essay competition. The surprises continued when Kaggle investigated who the top performers were. In both competitions, none of the top three finishers had any previous significant experience with either essay grading or natural language processing. And in the second competition, none of the top three finishers had any formal training in artificial intelligence beyond a free online course offered by Stanford AI faculty and open to anyone in the world who wanted to take it. People all over the world did, and evidently they learned a lot. The top three individual finishers were from, respectively, the United States, Slovenia, and Singapore.

Thinking Machines, Available Now Machines that can complete cognitive tasks are even more important than machines that can accomplish physical ones. And thanks to modern AI we now have them. Our digital machines have escaped their narrow confines and started to demonstrate broad abilities in pattern recognition, complex communication, and other domains that used to be exclusively human. We’ve also recently seen great progress in natural language processing, machine learning (the ability of a computer to automatically refine its methods and improve its results as it gets more data), computer vision, simultaneous localization and mapping, and many of the other fundamental challenges of the discipline. We’re going to see artificial intelligence do more and more, and as this happens costs will go down, outcomes will improve, and our lives will get better.

The Economic Singularity: Artificial intelligence and the death of capitalism by Calum Chace

Amazon: amazon.comamazon.co.ukamazon.deamazon.fr

3D printing, additive manufacturing, agricultural Revolution, AI winter, Airbnb, artificial general intelligence, augmented reality, autonomous vehicles, banking crisis, basic income, Baxter: Rethink Robotics, Berlin Wall, Bernie Sanders, bitcoin, blockchain, call centre, Chris Urmson, congestion charging, credit crunch, David Ricardo: comparative advantage, Douglas Engelbart, Elon Musk, en.wikipedia.org, Erik Brynjolfsson, Flynn Effect, full employment, future of work, gender pay gap, gig economy, Google Glasses, Google X / Alphabet X, ImageNet competition, income inequality, industrial robot, Internet of things, invention of the telephone, invisible hand, James Watt: steam engine, Jaron Lanier, Jeff Bezos, job automation, John Markoff, John Maynard Keynes: technological unemployment, John von Neumann, Kevin Kelly, knowledge worker, lifelogging, lump of labour, Lyft, Marc Andreessen, Mark Zuckerberg, Martin Wolf, McJob, means of production, Milgram experiment, Narrative Science, natural language processing, new economy, Occupy movement, Oculus Rift, PageRank, pattern recognition, post scarcity, post-industrial society, precariat, prediction markets, QWERTY keyboard, railway mania, RAND corporation, Ray Kurzweil, RFID, Rodney Brooks, Satoshi Nakamoto, Second Machine Age, self-driving car, sharing economy, Silicon Valley, Skype, software is eating the world, speech recognition, Stephen Hawking, Steve Jobs, TaskRabbit, technological singularity, The Future of Employment, Thomas Malthus, transaction costs, Tyler Cowen: Great Stagnation, Uber for X, universal basic income, Vernor Vinge, working-age population, Y Combinator, young professional

This algorithm, while ingenious, was not itself an example of artificial intelligence. Over time, Google Search has become unquestionably AI-powered. In August 2013, Google executed a major update of its search function by introducing Hummingbird, which enables the service to respond appropriately to questions phrased in natural language, such as, “what's the quickest route to Australia?”[lxxix] It combines AI techniques of natural language processing with colossal information resources (including Google's own Knowledge Graph, and of course Wikipedia) to analyse the context of the search query and make the response more relevant. PageRank wasn't dropped, but instead became just one of the 200 or so techniques that are now deployed to provide answers. Like IBM Watson, this is an example of how AI systems are often agglomerations of numerous approaches.

[civ] The software was initially licensed for single machines only, so even very well resourced organisations weren’t able to replicate the functionality that Google enjoys, but the move was significant. In April 2016 that restriction was lifted.[cv] In October 2015, Facebook announced that it would follow suit by open sourcing the designs for Big Sur, the server which runs the company's latest AI algorithms.[cvi] Then in May 2016 Google open sourced a natural language processing programme playfully called Parsey McParseFace, and SyntaxNet, an associated software toolkit. Google claims that in the kinds of sentences it can be used with, Parsey’s accuracy is 94%, almost as good as the 95% score achieved by human linguists.[cvii] Open sourcing confers a number of advantages. One is a level of goodwill among the AI community. More importantly, researchers in academia and elsewhere will learn the systems, and be able to work closely with Google and Facebook – and indeed be hired by them.

It was the year when our media caught on to the idea that AI presents enormous opportunity and enormous risk. This was thanks in no small part to the publication the previous year of Nick Bostrom's book “Superintelligence”. It was also the year when cutting-edge AI systems used deep learning and other techniques to demonstrate human-level capabilities in image recognition, speech recognition and natural language processing. In hindsight, 2015 may well be seen as a tipping point. Machines don't have to make everybody unemployed to bring about an economic singularity. If a majority of people – or even just a large minority – can never get hired again, we will need a different type of economy. Furthermore, we don't have to be absolutely certain of this outcome to make it worthwhile to monitor developments and make contingency plans.


pages: 134 words: 29,488

Python Requests Essentials by Rakesh Vidya Chandra, Bala Subrahmanyam Varanasi

Amazon: amazon.comamazon.co.ukamazon.deamazon.fr

create, read, update, delete, en.wikipedia.org, MVC pattern, natural language processing, RFC: Request For Comment, RFID, supply-chain management, web application

Interacting with Social Media Using Requests

In this contemporary world, our lives are woven with a lot of interactions and collaborations with social media. The information that is available on the web is very valuable and it is being used by abundant resources. For instance, the news that is trending in the world can be spotted easily from a Twitter hashtag, and this can be achieved by interacting with the Twitter API. Using natural language processing, we can classify the emotion of a person by grabbing the Facebook status of an account. All this stuff can be accomplished easily with the help of Requests using the concerned APIs. Requests is a perfect module if we want to reach out to APIs frequently, as it supports pretty much everything, like caching, redirection, proxies, and so on. We will cover the following topics in this chapter:

• Interacting with Twitter
• Interacting with Facebook
• Interacting with reddit

API introduction Before diving into details, let us have a quick look at what exactly an Application Programming Interface (API) is.
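The shape of such an API call with Requests can be sketched without sending anything over the network, by building and inspecting a prepared request. The endpoint, parameters, and token below are hypothetical; real social-media APIs have their own URL schemes and require registered credentials:

```python
import requests

# Hypothetical endpoint, parameters, and token for illustration only;
# real social-media APIs require registered app credentials.
request = requests.Request(
    "GET",
    "https://api.example.com/search",
    params={"q": "#trending", "count": 10},
    headers={"Authorization": "Bearer <token>"},
)
prepared = request.prepare()  # builds the request without sending it
print(prepared.url)
```

Sending it for real is one more step—`requests.Session().send(prepared)`—or, more commonly, a single `requests.get(url, params=..., headers=...)` call.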

They contain information about the date of birth, gender, place, income, and so on, of the people of a country. Unstructured data In contrast to structured data, unstructured data either misses out on a standard format or stays unorganized even though a specific format is imposed on it. For this reason, it becomes difficult to deal with different parts of the data, and handling it turns into a tedious task. To handle unstructured data, different techniques such as text analytics, Natural Language Processing (NLP), and data mining are used. Images, scientific data, and text-heavy content (such as newspapers, health records, and so on) come under the unstructured data type. Semistructured data Semistructured data is a type of data that follows an irregular trend or has a structure which changes rapidly. This data can be self-describing: it uses tags and other markers to establish semantic relationships among the elements of the data.


pages: 118 words: 35,663

Smart Machines: IBM's Watson and the Era of Cognitive Computing (Columbia Business School Publishing) by John E. Kelly Iii

Amazon: amazon.comamazon.co.ukamazon.deamazon.fr

AI winter, call centre, carbon footprint, crowdsourcing, demand response, discovery of DNA, Erik Brynjolfsson, future of work, Geoffrey West, Santa Fe Institute, global supply chain, Internet of things, John von Neumann, Mars Rover, natural language processing, optical character recognition, pattern recognition, planetary scale, RAND corporation, RFID, Richard Feynman, smart grid, smart meter, speech recognition, Turing test, Von Neumann architecture, Watson beat the top human players on Jeopardy!

Together, we can drive the exploration and invention that will shape society, the economy, and business for the next fifty years. 1 A NEW ERA OF COMPUTING IBM’s Watson computer created a sensation when it bested two past grand champions on the TV quiz show Jeopardy! Tens of millions of people suddenly understood how “smart” a computer could be. This was no mere parlor trick; the scientists who designed Watson built upon decades of research in the fields of artificial intelligence and natural-language processing and produced a series of breakthroughs. Their ingenuity made it possible for a system to excel at a game that requires both encyclopedic knowledge and lightning-quick recall. In preparation for the match, the machine ingested millions of pages of information. On the TV show, first broadcast in February 2011, the system was able to search that vast storehouse in response to questions, size up its confidence level, and, when sufficiently confident, beat the humans to the buzzer.

As it acquires answers, it will build a collection of learned axioms that strengthen its command of given domains. Other improvements to Watson have come. People are now able to view the logic and evidence upon which Watson presents options. Watson is now able to digest not just textual information but also structured statistical data, such as electronic medical records. A different group at IBM is working on natural-language-processing technology that will allow people to engage in spoken conversations with Watson. At the highest level, many of the changes are aimed at moving Watson from answering specific questions to dealing with complex and incomplete problem scenarios—the way humans experience things. In fact, as people in particular professions and industries experiment with Watson, they find that the basic question-and-answer capabilities, while useful, are not the most valuable aspects of the systems.


pages: 255 words: 78,207

Web Scraping With Python: Collecting Data From the Modern Web by Ryan Mitchell

Amazon: amazon.comamazon.co.ukamazon.deamazon.fr

AltaVista, Amazon Web Services, cloud computing, en.wikipedia.org, Firefox, Guido van Rossum, meta analysis, meta-analysis, natural language processing, optical character recognition, random walk, self-driving car, Turing test, web application

Although you might not think that text analysis has anything to do with your project, understanding the concepts behind it can be extremely useful for all sorts of machine learning, as well as the more general ability to model real-world problems in probabilistic and algorithmic terms. Although many of the techniques described in this chapter can be applied to all or most languages, it’s okay for now to focus on natural language processing in English only. Tools such as Python’s Natural Language Toolkit, for example, focus on English. Fifty-six percent of the Internet is still in English (with German following at a mere 6%, according to http://w3techs.com/technologies/overview/content_language/all). But who knows? English’s hold on the majority of the Internet will almost certainly change in the future, and further updates may be necessary in the next few years. For instance, the Shazam music service can identify audio as containing a certain song recording, even if that audio contains ambient noise or distortion.
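The n-gram counting this chapter builds on can be sketched in a few lines of plain Python; a toy sentence stands in for scraped page text, and no NLTK is required:

```python
from collections import Counter

def ngrams(words, n):
    """Return all n-grams (as tuples) from a sequence of words."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

# Toy input standing in for scraped page text.
text = "the quick brown fox jumps over the lazy dog and the quick fox"
bigrams = ngrams(text.split(), 2)

# The most frequent 2-gram in this sample:
print(Counter(bigrams).most_common(1))  # → [(('the', 'quick'), 2)]
```

The same function yields trigrams with `n=3`; counting those frequencies is the starting point for the Markov text generators covered later in the chapter.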

I hope that the coverage here will inspire you to think beyond conventional web scraping, or at least give some initial direction about where to begin when undertaking a project that requires natural language analysis. There are many excellent resources on introductory language processing and Python’s Natural Language Toolkit. In particular, Steven Bird, Ewan Klein, and Edward Loper’s book Natural Language Processing with Python presents both a comprehensive and introductory approach to the topic. In addition, James Pustejovsky and Amber Stubbs’ Natural Language Annotation for Machine Learning provides a slightly more advanced theoretical guide. You’ll need a knowledge of Python to implement the lessons; the topics covered work perfectly with Python’s Natural Language Toolkit. 136 | Chapter 8: Reading and Writing Natural Languages CHAPTER 9 Crawling Through Forms and Logins One of the first questions that comes up when you start to move beyond the basics of web scraping is: “How do I access information behind a login screen?”

Hamidi, 227 intellectual property, 217-219 234 internal links crawling an entire site, 35-40 crawling with Scrapy, 45-48 traversing a single domain, 31-35 Internet about, 213-216 cautions downloading files from, 74 crawling across, 40-45 moving forward, 206 IP address blocking, avoiding, 199-200 ISO character sets, 96-98 is_displayed function, 186 Item object, 46, 48 items.py file, 46 | Index lambda expressions, 28, 74 legalities of web scraping, 217-230 lexicographical analysis with NLTK, 132-136 libraries bundling with projects, 7 OCR support, 161-164 logging with Scrapy, 48 logins about, 137 handling, 142-143 troubleshooting, 187 lxml library, 29 M machine learning, 135, 180 machine training, 135, 171-174 Markov text generators, 123-129 media files, storing, 71-74 Mersenne Twister algorithm, 34 methods (HTTP), 51 Microsoft SQL Server, 76 Microsoft Word, 102-105 MIME (Multipurpose Internet Mail Exten‐ sions) protocol, 90 MIMEText object, 90 MySQL about, 76 basic commands, 79-82 database techniques, 85-87 installing, 77-79 integrating with Python, 82-85 Wikipedia example, 87-89 N name attribute, 140 natural language processing about, 119 additional resources, 136 Markov models, 123-129 Natural Language Toolkit, 129-136 summarizing data, 120-123 Natural Language Toolkit (NLTK) about, 129 installation and setup, 129 lexicographical analysis, 132-136 statistical analysis, 130-132 NavigableString object, 18 navigating trees, 18-22 network connections about, 3-5 connecting reliably, 9-11 security considerations, 181 next_siblings() function, 21 ngrams module, 132 n-grams, 109-112, 120 NLTK (Natural Language Toolkit) about, 129 installation and setup, 129 lexicographical analysis, 132-136 statistical analysis, 130-132 NLTK Downloader interface, 130 NLTK module, 129 None object, 10 normalizing data, 112-113 NumPy library, 164 O OAuth authentication, 57 OCR (optical character recognition) about, 161 library support, 162-164 OpenRefine Expression Language (GREL), 116 
OpenRefine tool about, 114 cleaning data, 116-118 filtering data, 115-116 installing, 114 usage considerations, 114 optical character recognition (OCR) about, 161 library support, 162-164 Oracle DBMS, 76 OrderedDict object, 112 os module, 74 P page load times, 154, 182 parentheses (), 25 parents (tags), 20, 22 parsing HTML pages (see HTML parsing) parsing JSON, 63 patents, 217 pay-per-hour computing instances, 205 PDF files, 100-102 PDFMiner3K library, 101 Penn Treebank Project, 133 period (.), 25 Peters, Tim, 211 PhantomJS tool, 152-155, 203 PIL (Python Imaging Library), 162 Pillow library about, 162 processing well-formatted text, 165-169 pipe (|), 25 plus sign (+), 25 POST method (HTTP) about, 51 tracking requests, 140 troubleshooting, 186 variable names and, 138 viewing form parameters, 140 Index | 235 previous_siblings() function, 21 primary keys in tables, 85 programming languages, regular expressions and, 27 projects, bundling with libraries, 7 pseudorandom number generators, 34 PUT method (HTTP), 51 PyMySQL library, 82-85 PySocks module, 202 Python Imaging Library (PIL), 162 Python language, installing, 209-211 Q query time versus database size, 86 quotation marks ("), 17 R random number generators, 34 random seeds, 34 rate limits about, 52 Google APIs, 60 Twitter API, 55 reading documents document encoding, 93 Microsoft Word, 102-105 PDF files, 100 text files, 94-98 recursion limit, 38, 89 redirects, 44, 158 Referrer header, 179 RegexPal website, 24 regular expressions about, 22-27 BeautifulSoup example, 27 commonly used symbols, 25 programming languages and, 27 relational data, 77 remote hosting running from a website hosting account, 203 running from the cloud, 204 remote servers avoiding IP address blocking, 199-200 extensibility and, 200 portability and, 200 PySocks and, 202 Tor and, 201-202 Requests library 236 | Index about, 137 auth module, 144 installing, 138, 179 submitting forms, 138 tracking cookies, 142-143 requests module, 179-181 responses, 
API calls and, 52 Robots Exclusion Standard, 223 robots.txt file, 138, 167, 222-225, 229 S safe harbor protection, 219, 230 Scrapy library, 45-48 screenshots, 197 script tag, 147 search engine optimization (SEO), 222 searching text data, 135 security considerations copyright law and, 219 forms and, 183-186 handling cookies, 181 SELECT statement, 79, 81 Selenium library about, 143 elements and, 153, 194 executing JavaScript, 152-156 handling redirects, 158 security considerations, 185 testing example, 193-198 Tor support, 203 semicolon (;), 210 SEO (search engine optimization), 222 server-side processing handling redirects, 44, 158 scripting languages and, 147 sets, 67 siblings (tags), 21 Simple Mail Transfer Protocol (SMTP), 90 site maps, 36 Six Degrees of Wikipedia, 31-35 SMTP (Simple Mail Transfer Protocol), 90 smtplib package, 90 sorted function, 112 span tag, 15 Spitler, Daniel, 227 SQL Server (Microsoft), 76 square brackets [], 25 src attribute, 28, 72, 74 StaleElementReferenceException, 158 statistical analysis with NLTK, 130-132 storing data (see data management) StringIO object, 99 strings, regular expressions and, 22-28 stylesheets about, 14, 216 dynamic HTML and, 151 hidden fields and, 184 Surface Web, 36 trademarks, 218 traversing the Web (see web crawlers) tree navigation, 18-22 trespass to chattels, 219-220, 226 trigrams module, 132 try...finally statement, 85 Twitov app, 123 Twitter API, 55-59 T underscore (_), 17 undirected graph problems, 127 Unicode standard, 83, 95-98, 110 unit tests, 190, 197 United States v.


pages: 304 words: 82,395

Big Data: A Revolution That Will Transform How We Live, Work, and Think by Viktor Mayer-Schonberger, Kenneth Cukier

Amazon: amazon.comamazon.co.ukamazon.deamazon.fr

23andMe, Affordable Care Act / Obamacare, airport security, AltaVista, barriers to entry, Berlin Wall, big data - Walmart - Pop Tarts, Black Swan, book scanning, business intelligence, business process, call centre, cloud computing, computer age, correlation does not imply causation, dark matter, double entry bookkeeping, Eratosthenes, Erik Brynjolfsson, game design, IBM and the Holocaust, index card, informal economy, intangible asset, Internet of things, invention of the printing press, Jeff Bezos, lifelogging, Louis Pasteur, Mark Zuckerberg, Menlo Park, Moneyball by Michael Lewis explains big data, Nate Silver, natural language processing, Netflix Prize, Network effects, obamacare, optical character recognition, PageRank, performance metric, Peter Thiel, Post-materialism, post-materialism, random walk, recommendation engine, self-driving car, sentiment analysis, Silicon Valley, Silicon Valley startup, smart grid, smart meter, social graph, speech recognition, Steve Jobs, Steven Levy, the scientific method, The Signal and the Noise by Nate Silver, The Wealth of Nations by Adam Smith, Turing test, Watson beat the top human players on Jeopardy!

In fact, endgames when six or fewer pieces are left on the chessboard have been completely analyzed and all possible moves (N=all) have been represented in a massive table that when uncompressed fills more than a terabyte of data. This enables chess computers to play the endgame flawlessly. No human will ever be able to outplay the system. The degree to which more data trumps better algorithms has been powerfully demonstrated in the area of natural language processing: the way computers learn how to parse words as we use them in everyday speech. Around 2000, Microsoft researchers Michele Banko and Eric Brill were looking for a method to improve the grammar checker that is part of the company’s Word program. They weren’t sure whether it would be more useful to put their effort into improving existing algorithms, finding new techniques, or adding more sophisticated features.

The trillion-word corpus Google released in 2006 was compiled from the flotsam and jetsam of Internet content—“data in the wild,” so to speak. This was the “training set” by which the system could calculate the probability that, for example, one word in English follows another. It was a far cry from the grandfather in the field, the famous Brown Corpus of the 1960s, which totaled one million English words. Using the larger dataset enabled great strides in natural-language processing, upon which systems for tasks like voice recognition and computer translation are based. “Simple models and a lot of data trump more elaborate models based on less data,” wrote Google’s artificial-intelligence guru Peter Norvig and colleagues in a paper entitled “The Unreasonable Effectiveness of Data.” As Norvig and his co-authors explained, messiness was the key: “In some ways this corpus is a step backwards from the Brown Corpus: it’s taken from unfiltered Web pages and thus contains incomplete sentences, spelling errors, grammatical errors, and all sorts of other errors.
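The probability "that one word in English follows another" can be estimated directly from bigram counts. A minimal sketch on a toy corpus (the real training set was Google's trillion-word corpus, not a single sentence):

```python
from collections import Counter, defaultdict

# Toy corpus standing in for the trillion-word training set.
corpus = "the cat sat on the mat and the cat ate the fish".split()

# Count how often each word follows each other word.
following = defaultdict(Counter)
for word, nxt in zip(corpus, corpus[1:]):
    following[word][nxt] += 1

# Estimate P(next word | "the") as a relative frequency.
counts = following["the"]
total = sum(counts.values())
probs = {word: count / total for word, count in counts.items()}
print(probs)  # → {'cat': 0.5, 'mat': 0.25, 'fish': 0.25}
```

This is exactly the sense in which "a lot of data" helps: with more text, these relative-frequency estimates cover more word pairs and become less noisy, even with no change to the model.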

See imprecision MetaCrawler, [>] metadata: in datafication, [>]–[>] metric system, [>] Microsoft, [>], [>], [>] Amalga software, [>]–[>], [>] and data-valuation, [>] and language translation, [>] Word spell-checking system, [>]–[>] Minority Report [film], [>]–[>], [>] Moneyball [film], [>], [>]–[>], [>], [>] Moneyball (Lewis), [>] Moore’s Law, [>] Mydex, [>] nanotechnology: and qualitative changes, [>] Nash, Bruce, [>] nations: big data and competitive advantage among, [>]–[>] natural language processing, [>] navigation, marine: correlation analysis in, [>]–[>] Maury revolutionizes, [>]–[>], [>], [>], [>], [>], [>], [>], [>], [>], [>] Negroponte, Nicholas: Being Digital, [>] Netbot, [>] Netflix, [>] collaborative filtering at, [>] data-reuse by, [>] releases personal data, [>] Netherlands: comprehensive civil records in, [>]–[>] network analysis, [>] network theory, [>] big data in, [>]–[>] New York City: exploding manhole covers in, [>]–[>], [>]–[>], [>], [>] government data-reuse in, [>]–[>] New York Times, [>]–[>] Next Jump, [>] Neyman, Jerzy: on statistical sampling, [>] Ng, Andrew, [>] 1984 (Orwell), [>], [>] Norvig, Peter, [>] “The Unreasonable Effectiveness of Data,” [>] Nuance: fails to understand data-reuse, [>]–[>] numerical systems: history of, [>]–[>] Oakland Athletics, [>]–[>] Obama, Barack: on open data, [>] Och, Franz Josef, [>] Ohm, Paul: on privacy, [>] oil refining: big data in, [>] ombudsmen, [>] Omidyar, Pierre, [>] open data.


pages: 402 words: 110,972

Nerds on Wall Street: Math, Machines and Wired Markets by David J. Leinweber

Amazon: amazon.comamazon.co.ukamazon.deamazon.fr

AI winter, algorithmic trading, asset allocation, banking crisis, barriers to entry, Big bang: deregulation of the City of London, butterfly effect, buttonwood tree, buy low sell high, capital asset pricing model, citizen journalism, collateralized debt obligation, corporate governance, Craig Reynolds: boids flock, creative destruction, credit crunch, Credit Default Swap, credit default swaps / collateralized debt obligations, Danny Hillis, demand response, disintermediation, distributed generation, diversification, diversified portfolio, Emanuel Derman, en.wikipedia.org, experimental economics, financial innovation, fixed income, Gordon Gekko, implied volatility, index arbitrage, index fund, information retrieval, intangible asset, Internet Archive, John Nash: game theory, Kenneth Arrow, Khan Academy, load shedding, Long Term Capital Management, Machine translation of "The spirit is willing, but the flesh is weak." to Russian and back, market fragmentation, market microstructure, Mars Rover, Metcalfe’s law, moral hazard, mutually assured destruction, Myron Scholes, natural language processing, negative equity, Network effects, optical character recognition, paper trading, passive investing, pez dispenser, phenotype, prediction markets, quantitative hedge fund, quantitative trading / quantitative finance, QWERTY keyboard, RAND corporation, random walk, Ray Kurzweil, Renaissance Technologies, Richard Stallman, risk tolerance, risk-adjusted returns, risk/return, Robert Metcalfe, Ronald Reagan, Rubik’s Cube, semantic web, Sharpe ratio, short selling, Silicon Valley, Small Order Execution System, smart grid, smart meter, social web, South Sea Bubble, statistical arbitrage, statistical model, Steve Jobs, Steven Levy, Tacoma Narrows Bridge, the scientific method, The Wisdom of Crowds, time value of money, too big to fail, transaction costs, Turing machine, Upton Sinclair, value at risk, Vernor Vinge, yield curve, Yogi Berra, your tax dollars at work

No aspect of financial life is untouched: research, risk management, trading, and investor communication. We are much more adept at using structured and quantitative information on the Internet than textual and qualitative information. We are just starting to learn how to effectively use this kind of information. This area is driven by new Internet technologies such as XML (extensible markup language) and RSS (an XML dialect) and by advances in natural language processing. The new kid on the block, expected to take these ideas to new levels, is the Resource Description Framework (RDF), promoted by Web inventor Berners-Lee. RDF does for relationships between tagged data elements what the XML tagging itself did for moving from format HTML tags like “Bold” to meaningful XML tags like “Price.” Hits and Misses: Rational and Irrational Technology Exuberance Peter Bernstein’s book Capital Ideas (Free Press, 1993) tells the story of Bill Sharpe, who wandered Wall Street looking for enough computer time to run a simple capital asset pricing model (CAPM) portfolio optimization, while being regarded as something of a crackpot for doing so.

Karl Sims’s MIT video is here: www.youtube.com/watch?v=F0OHycypSG8. on genetically adaptive strategies and well funded, but vanished, and few of the principals are still keen on genetic algorithms. After sending the GA to the back of the breakthrough line in the previous chapter, in Chapter 9 we get to “The Text Frontier,” using IA, natural language processing, and Web technologies to extract and make sense of qualitative written information from news and a variety of disintermediated sources. In Chapter 6, “Stupid Data Miner Tricks,” we saw how you could fool yourself with data. When you collect data that people have put on the Web, they can try to fool you as well. Chapter 10 on Collective Intelligence and Chapter 11 on market manipulations include some remarkable and egregious examples.

What Gelernter’s thesis means for investing is that we can look inside that shoebox with a new set of technologies to develop a new form of research. Grabbing more and more data, and doing more and more searches, will quickly overwhelm us, leading to advanced cases of carpal tunnel syndrome and a shelf full of unread books with “Information Explosion” somewhere in the title. Collectively, the new alphabet soup of technologies—AI, IA, NLP, and IR (artificial intelligence, intelligence amplification, natural language processing, and information retrieval, for those with a bigger soup bowl)—provides a means to make sense of patterns in the data collected in enterprise and global search. These means are molecular search; the use of persistent software agents, so you don’t have to keep doing the same thing all the time; the semantic Web, using the information associated with data at the point of origin so there is less guessing about the meaning of what we find; and modern user interfaces and visualizations, so you can prioritize what you find and focus on the important and the valuable in a timely way.


pages: 713 words: 93,944

Seven Databases in Seven Weeks: A Guide to Modern Databases and the NoSQL Movement by Eric Redmond, Jim Wilson, Jim R. Wilson

Amazon: amazon.comamazon.co.ukamazon.deamazon.fr

Amazon Web Services, create, read, update, delete, data is the new oil, database schema, Debian, domain-specific language, en.wikipedia.org, fault tolerance, full text search, general-purpose programming language, linked data, MVC pattern, natural language processing, node package manager, random walk, recommendation engine, Ruby on Rails, Skype, social graph, web application

SELECT *
FROM movies
WHERE title % 'Avatre';

 title
---------
 Avatar

Trigrams are an excellent choice for accepting user input, without weighing them down with wildcard complexity. Full-Text Fun Next, we want to allow users to perform full-text searches based on matching words, even if they’re pluralized. If a user wants to search for certain words in a movie title but can remember only some of them, Postgres supports simple natural-language processing. TSVector and TSQuery Let’s look for a movie that contains the words night and day. This is a perfect job for text search using the @@ full-text query operator.

SELECT title
FROM movies
WHERE title @@ 'night & day';

             title
-------------------------------
 A Hard Day’s Night
 Six Days Seven Nights
 Long Day’s Journey Into Night

The query returns titles like A Hard Day’s Night, despite the word Day being in possessive form, and the two words are out of order in the query.

Compare these two vectors:

SELECT to_tsvector('english', 'A Hard Day''s Night');

        to_tsvector
----------------------------
 'day':3 'hard':2 'night':5

SELECT to_tsvector('simple', 'A Hard Day''s Night');

              to_tsvector
----------------------------------------
 'a':1 'day':3 'hard':2 'night':5 's':4

With simple, you can retrieve any movie containing the lexeme a. Other Languages Since Postgres is doing some natural-language processing here, it only makes sense that different configurations would be used for different languages. All of the installed configurations can be viewed with this command:

book=# \dF

Dictionaries are part of what Postgres uses to generate tsvector lexemes (along with stop words and other tokenizing rules we haven’t covered, called parsers and templates). You can view your system’s list here:

book=# \dFd

You can test any dictionary outright by calling the ts_lexize function.

This also explains why HBase is often employed at big companies to back logging and search systems. 4.1 Introducing HBase HBase is a column-oriented database that prides itself on consistency and scaling out. It is based on BigTable, a high-performance, proprietary database developed by Google and described in the 2006 white paper “Bigtable: A Distributed Storage System for Structured Data.”[26] Initially created for natural-language processing, HBase started life as a contrib package for Apache Hadoop. Since then, it has become a top-level Apache project. On the architecture front, HBase is designed to be fault tolerant. Hardware failures may be uncommon for individual machines, but in a large cluster, node failure is the norm. By using write-ahead logging and distributed configuration, HBase can quickly recover from individual server failures.


pages: 391 words: 105,382

Utopia Is Creepy: And Other Provocations by Nicholas Carr

Amazon: amazon.comamazon.co.ukamazon.deamazon.fr

Air France Flight 447, Airbnb, Airbus A320, AltaVista, Amazon Mechanical Turk, augmented reality, autonomous vehicles, Bernie Sanders, book scanning, Brewster Kahle, Buckminster Fuller, Burning Man, Captain Sullenberger Hudson, centralized clearinghouse, cloud computing, cognitive bias, collaborative consumption, computer age, corporate governance, crowdsourcing, Danny Hillis, deskilling, digital map, Donald Trump, Electric Kool-Aid Acid Test, Elon Musk, factory automation, failed state, feminist movement, Frederick Winslow Taylor, friendly fire, game design, global village, Google bus, Google Glasses, Google X / Alphabet X, Googley, hive mind, impulse control, indoor plumbing, interchangeable parts, Internet Archive, invention of movable type, invention of the steam engine, invisible hand, Isaac Newton, Jeff Bezos, jimmy wales, job automation, Kevin Kelly, lifelogging, low skilled workers, Marc Andreessen, Mark Zuckerberg, Marshall McLuhan, means of production, Menlo Park, mental accounting, natural language processing, Network effects, new economy, Nicholas Carr, Norman Mailer, off grid, oil shale / tar sands, Peter Thiel, Plutocrats, plutocrats, profit motive, Ralph Waldo Emerson, Ray Kurzweil, recommendation engine, Republic of Letters, robot derives from the Czech word robota Czech, meaning slave, Ronald Reagan, self-driving car, SETI@home, side project, Silicon Valley, Silicon Valley ideology, Singularitarianism, Snapchat, social graph, social web, speech recognition, Startup school, stem cell, Stephen Hawking, Steve Jobs, Steven Levy, technoutopianism, the medium is the message, theory of mind, Turing test, Whole Earth Catalog, Y Combinator

That’s much less the case now. Google’s conception of searching has changed since those early days, and that means our own idea of what it means to search is changing as well. Google’s goal is no longer to read the web. It’s to read us. Ray Kurzweil, the inventor and AI speculator, recently joined the company as a director of engineering. His general focus will be on machine learning and natural language processing. But his particular concern will entail reconfiguring the company’s search engine to focus not outwardly on the world but inwardly on the user. “I envision some years from now that the majority of search queries will be answered without you actually asking,” he recently explained. “It’ll just know this is something that you’re going to want to see.” This has actually been Google’s great aspiration for a while now.

They shape speech to the needs of the computer network—and the computer network’s owner. “The speaking of language is part of an activity, or of a form of life,” wrote Wittgenstein in Philosophical Investigations. If human language is bound up in living, if it is an expression of both sense and sensibility, then computers, being nonliving, having no sensibility, will have a very difficult time mastering “natural-language processing” beyond a certain rudimentary level. The best solution, if you have a need to get computers to “understand” human communication, may be to avoid the problem altogether. Instead of figuring out how to get computers to understand natural language, you get people to speak artificial language, the language of computers. A good way to start is with Like buttons and other standardized messaging protocols.

., 226 video games and, 94–97 Merholz, Peter, 21 Merleau-Ponty, Maurice, 300 Merton, Robert, 12–13 message-automation service, 167 Meyer, Stephenie, 50 Meyerowitz, Joanne, 338 microfilm, microphotography, 267 Microsoft, 108, 168, 205, 284 military technology, 331–32 Miller, Perry, xvii mindfulness, 162 Minima Moralia (Adorno), 153–54 mirrors, 138–39 Mitchell, Joni, 128 Mollie (video poker player), 218–19 monitoring: corporate control through, 163–65 of thoughts, 214–15 through wearable behavior-modification devices, 168–69 Montaigne, Michel de, 247, 249, 252, 254 Moore, Geoffrey, 209 Morlocks, 114, 186 “Morphological Basis of the Arm-to-Wing Transition, The” (Poore), 329–30 Morrison, Ewan, 288 Morrison, Jim, 126 Morse code, 34 “Most of It, The” (Frost), 145–46 motor skills, video games and, 93–94 “Mowing” (Frost), 296–300, 302, 304–5 MP3 players, 122, 123, 124, 216, 218, 293 multitasking, media, 96–97 Mumford, Lewis, 138–39, 235 Murdoch, Rupert and Wendi, 131 music: bundling of, 41–46 commercial use of, 244–45 copying and sharing technologies for, 121–26, 314 digital revolution in, 293–95 fidelity of, 124 listening vs. interface in, 216–18, 293 in participatory games, 71–72 streamed and curated, 207, 217–18 music piracy, 121–26 Musings on Human Metamorphoses (Leary), 171 Musk, Elon, 172 Musset, Alfred de, xxiii Muzak, 208, 244 MySpace, xvi, 10–11, 30–31 “Names of the Hare, The,” 201 nanotechnology, 69 Napster, 122, 123 narcissism, 138–39 Twitter and, 34–36 narrative emotions, 250 natural-language processing, 215 Negroponte, Nicholas, xx neobehavioralism, 212–13 Netflix, 92 neural networks, 136–37 neuroengineering, 332–33 New Critics, 249 News Feed, 320 news media, 318–20 newspapers: evolution of, 79, 237 online archives of, 47–48, 190–92 online vs. 
printed, 289 Newton, Isaac, 66 New York Public Library, 269 New York Times, 8, 71, 83, 133, 152–53, 195, 237, 283, 314, 342 erroneous information revived by, 47–48 on Twitter, 35 Nielsen Company, 80–81 Nietzsche, Friedrich, 126, 234–35, 237 Nightingale, Paul, 335 Nixon, Richard, 317 noise pollution, 243–46 Nook, 257 North of Boston (Frost), 297 nostalgia, 202, 204, 312 in music, 292–95 Now You See It (Davidson), 94 Oates, Warren, 203 Oatley, Keith, 248–50 Obama, Barack, 314 obsession, 218–19 OCLC, 276 “off grid,” 52 Olds, James, 235 O’Neill, Gerard, 171 One Infinite Loop, 76 Ong, Walter, 129 online aggregation, 192 On Photography (Sontag), xx open networks, profiteering from, 83–85 open-source projects, 5–7, 26 Oracle, 17 orchises, 305 O’Reilly, Tim, 3–5, 7 organ donation and transplantation, 115 ornithopters, 239 orphan books, 276, 277 Overture, 279–80 Owad, Tom, 256 Oxford Junior Dictionary, 201–2 Oxford University, library of, 269 Page, Larry, 23, 160, 172, 239, 268–69, 270, 279, 281–85 personal style of, 16–17, 281–82, 285 paint-by-number kits, 71–72 Paley, William, 43 Palfrey, John, 272–74, 277 Palmisano, Sam, 26 “pancake people,” 242 paper, invention and uses of, 286–89 Paper: An Elegy (Sansom), 287 Papert, Seymour, 134 Paradise within the Reach of All Men, The (Etzler), xvi–xvii paradox of time, 203–4 parenting: automation of, 181 of virtual child, 73–75 Parker, Sarah Jessica, 131 participation: “cognitive surplus” in, 59 as content and performance, 184 inclusionists vs. 
deletionists in, 18–20 internet, 28–29 isolation and, 35–36, 184 limits and flaws of, 5–7, 62 Paul, Rand, 314 Pendragon, Caliandras (avatar), 25 Pentland, Alex, 212–13 perception, spiritual awakening of, 300–301 personalization, 11 of ads, 168, 225, 264 isolation and, 29 loss of autonomy in, 264–66 manipulation through, 258–59 in message automation, 167 in searches, 145–46, 264–66 of streamed music, 207–9, 245 tailoring in, 92, 224 as threat to privacy, 255 Phenomenology of Perception (Merleau-Ponty), 300 Philosophical Investigations (Wittgenstein), 215 phonograph, phonograph records, 41–46, 133, 287 photography, technological advancement in, 311–12 Pichai, Sundar, 181 Pilgrims, 172 Pinterest, 119, 186 playlists, 314 PlayStation, 260 “poetic faith,” 251 poetry, 296–313 polarization, 7 politics, transformed by technology, 314–20 Politics (Aristotle), 307–8 Poore, Samuel O., 329–30 pop culture, fact-mongering in, 58–62 pop music, 44–45, 63–64, 224 copying technologies for, 121–26 dead idols of, 126 industrialization of, 208–9 as retrospective and revivalist, 292–95 positivism, 211 Potter, Dean, 341–42 power looms, 178 Presley, Elvis, 11, 126 Prim Revolution, 26 Principles of Psychology (James), 203 Principles of Scientific Management, The (Taylor), 238 printing press: consequences of, 102–3, 234, 240–41, 271 development of, 53, 286–87 privacy: devaluation of, 258 from electronic surveillance, 52 family cohesion vs., 229 free flow of information vs. 
right to, 190–94 internet threat to, 184, 255–59, 265, 285 safeguarding of, 258–59, 283 vanity vs., 107 proactive cognitive control, 96 Prochnik, George, 243–46 “Productivity Future Vision (2011),” 108–9 Project Gutenberg, 278 prosperity, technologies of, 118, 119–20 prosumerism, 64 protest movements, 61 Proust and the Squid (Wolf), 234 proximal clues, 303 public-domain books, 277–78 “public library,” debate over use of term, 272–74 punch-card tabulator, 188 punk music, 63–64 Quantified Self Global Conference, 163 Quantified Self (QS) movement, 163–65 Quarter-of-a-Second Rule, 205 racecars, 195, 196 radio: in education, 134 evolution of, 77, 79, 159, 288 as music medium, 45, 121–22, 207 political use of, 315–16, 317, 319 Radosh, Daniel, 71 Rapp, Jen, 341–42 reactive cognitive control, 96 Readers’ Guide to Periodical Literature, 91 reading: brain function in, 247–54, 289–90 and invention of paper, 286–87 monitoring of, 257 video gaming vs., 261–62 see also books reading skills, changes in, 232–34, 240–41 Read Write Web (blog), 30 Reagan, Ronald, 315 real world: digital media intrusion in, 127–30 perceived as boring and ugly, 157–58 as source of knowledge, 313 virtual world vs., xx–xxi, 36, 62, 127–30, 303–4 reconstructive surgery, 239 record albums: copying of, 121–22 jackets for, 122, 224 technology of, 41–46 Redding, Otis, 126 Red Light Center, 39 Reichelt, Franz, 341 Reid, Rob, 122–25 relativists, 20 religion: internet perceived as, 3–4, 238 for McLuhan, 105 technology viewed as, xvi–xvii Republic of Letters, 271 reputations, tarnishing of, 47–48, 190–94 Resident Evil, 260–61 resource sharing, 148–49 resurrection, 69–70, 126 retinal implants, 332 Retromania (Reynolds), 217, 292–95 Reuters, Adam, 26 Reuters’ SL bureau, 26 revivification machine, 69–70 Reynolds, Simon, 217–18, 292–95 Rice, Isaac, 244 Rice, Julia Barnett, 243–44 Richards, Keith, 42 “right to be forgotten” lawsuit, 190–94 Ritalin, 304 robots: control of, 303 creepy quality of, 108 human beings 
compared to, 242 human beings replaced by, 112, 174, 176, 195, 197, 306–7, 310 limitations of, 323 predictions about, xvii, 177, 331 replaced by humans, 323 threat from, 226, 309 Rogers, Roo, 83–84 Rolling Stones, 42–43 Roosevelt, Franklin, 315 Rosen, Nick, 52 Rubio, Marco, 314 Rumsey, Abby Smith, 325–27 Ryan, Amy, 273 Sandel, Michael J., 340 Sanders, Bernie, 314, 316 Sansom, Ian, 287 Savage, Jon, 63 scatology, 147 Schachter, Joshua, 195 Schivelbusch, Wolfgang, 229 Schmidt, Eric, 13, 16, 238, 239, 257, 284 Schneier, Bruce, 258–59 Schüll, Natasha Dow, 218 science fiction, 106, 115, 116, 150, 309, 335 scientific management, 164–65, 237–38 Scrapbook in American Life, The, 185 scrapbooks, social media compared to, 185–86 “Scrapbooks as Cultural Texts” (Katriel and Farrell), 186 scythes, 302, 304–6 search-engine-optimization (SEO), 47–48 search engines: allusions sought through, 86 blogging, 66–67 in centralization of internet, 66–69 changing use of, 284 customizing by, 264–66 erroneous or outdated stories revived by, 47–48, 190–94 in filtering, 91 placement of results by, 47–48, 68 searching vs., 144–46 targeting information through, 13–14 writing tailored to, 89 see also Google searching, ontological connotations of, 144–46 Seasteading Institute, 172 Second Life, 25–27 second nature, 179 self, technologies of the, 118, 119–20 self-actualization, 120, 340 monitoring and quantification of, 163–65 selfies, 224 self-knowledge, 297–99 self-reconstruction, 339 self-tracking, 163–65 Selinger, Evan, 153 serendipity, internet as engine of, 12–15 SETI@Home, 149 sexbots, 55 Sex Pistols, 63 sex-reassignment procedures, 337–38 sexuality, 10–11 virtual, 39 Shakur, Tupac, 126 sharecropping, as metaphor for social media, 30–31 Shelley, Percy Bysshe, 88 Shirky, Clay, 59–61, 90, 241 Shop Class as Soulcraft (Crawford), 265 Shuster, Brian, 39 sickles, 302 silence, 246 Silicon Valley: American culture transformed by, xv–xxii, 148, 155–59, 171–73, 181, 241, 257, 309 commercial interests 
of, 162, 172, 214–15 informality eschewed by, 197–98, 215 wealthy lifestyle of, 16–17, 195 Simonite, Tom, 136–37 simulation, see virtual world Singer, Peter, 267 Singularity, Singularitarians, 69, 147 sitcoms, 59 situational overload, 90–92 skimming, 233 “Slaves to the Smartphone,” 308–9 Slee, Tom, 61, 84 SLExchange, 26 slot machines, 218–19 smart bra, 168–69 smartphones, xix, 82, 136, 145, 150, 158, 168, 170, 183–84, 219, 274, 283, 287, 308–9, 315 Smith, Adam, 175, 177 Smith, William, 204 Snapchat, 166, 205, 225, 316 social activism, 61–62 social media, 224 biases reinforced by, 319–20 as deceptively reflective, 138–39 documenting one’s children on, 74–75 economic value of content on, 20–21, 53–54, 132 emotionalism of, 316–17 evolution of, xvi language altered by, 215 loom as metaphor for, 178 maintaining one’s microcelebrity on, 166–67 paradox of, 35–36, 159 personal information collected and monitored through, 257 politics transformed by, 314–20 scrapbooks compared to, 185–86 self-validation through, 36, 73 traditional media slow to adapt to, 316–19 as ubiquitous, 205 see also specific sites social organization, technologies of, 118, 119 Social Physics (Pentland), 213 Society for the Suppression of Unnecessary Noise, 243–44 sociology, technology and, 210–13 Socrates, 240 software: autonomous, 187–89 smart, 112–13 solitude, media intrusion on, 127–30, 253 Songza, 207 Sontag, Susan, xx SoundCloud, 217 sound-management devices, 245 soundscapes, 244–45 space travel, 115, 172 spam, 92 Sparrow, Betsy, 98 Special Operations Command, U.S., 332 speech recognition, 137 spermatic, as term applied to reading, 247, 248, 250, 254 Spinoza, Baruch, 300–301 Spotify, 293, 314 “Sprite Sips” (app), 54 Squarciafico, Hieronimo, 240–41 Srinivasan, Balaji, 172 Stanford Encyclopedia of Philosophy, 68 Starr, Karla, 217–18 Star Trek, 26, 32, 313 Stengel, Rick, 28 Stephenson, Neal, 116 Sterling, Bruce, 113 Stevens, Wallace, 158 Street View, 137, 283 Stroop test, 98–99 Strummer, Joe, 63–64 
Studies in Classic American Literature (Lawrence), xxiii Such Stuff as Dreams (Oatley), 248–49 suicide rate, 304 Sullenberger, Sully, 322 Sullivan, Andrew, xvi Sun Microsystems, 257 “surf cams,” 56–57 surfing, internet, 14–15 surveillance, 52, 163–65, 188–89 surveillance-personalization loop, 157 survival, technologies of, 118, 119 Swing, Edward, 95 Talking Heads, 136 talk radio, 319 Tan, Chade-Meng, 162 Tapscott, Don, 84 tattoos, 336–37, 340 Taylor, Frederick Winslow, 164, 237–38 Taylorism, 164, 238 Tebbel, John, 275 Technics and Civilization (Mumford), 138, 235 technology: agricultural, 305–6 American culture transformed by, xv–xxii, 148, 155–59, 174–77, 214–15, 229–30, 296–313, 329–42 apparatus vs. artifact in, 216–19 brain function affected by, 231–42 duality of, 240–41 election campaigns transformed by, 314–20 ethical hazards of, 304–11 evanescence and obsolescence of, 327 human aspiration and, 329–42 human beings eclipsed by, 108–9 language of, 201–2, 214–15 limits of, 341–42 master-slave metaphor for, 307–9 military, 331–32 need for critical thinking about, 311–13 opt-in society run by, 172–73 progress in, 77–78, 188–89, 229–30 risks of, 341–42 sociology and, 210–13 time perception affected by, 203–6 as tool of knowledge and perception, 299–304 as transcendent, 179–80 Technorati, 66 telegrams, 79 telegraph, Twitter compared to, 34 telephones, 103–4, 159, 288 television: age of, 60–62, 79, 93, 233 and attention disorders, 95 in education, 134 Facebook ads on, 155–56 introduction of, 103–4, 159, 288 news coverage on, 318 paying for, 224 political use of, 315–16, 317 technological adaptation of, 237 viewing habits for, 80–81 Teller, Astro, 195 textbooks, 290 texting, 34, 73, 75, 154, 186, 196, 205, 233 Thackeray, William, 318 “theory of mind,” 251–52 Thiel, Peter, 116–17, 172, 310 “Things That Connect Us, The” (ad campaign), 155–58 30 Days of Night (film), 50 Thompson, Clive, 232 thought-sharing, 214–15 “Three Princes of Serendip, The,” 12 Thurston, Baratunde, 
153–54 time: memory vs., 226 perception of, 203–6 Time, covers of, 28 Time Machine, The (Wells), 114 tools: blurred line between users and, 333 ethical choice and, 305 gaining knowledge and perception through, 299–304 hand vs. computer, 306 Home and Away blurred by, 159 human agency removed from, 77 innovation in, 118 media vs., 226 slave metaphor for, 307–8 symbiosis with, 101 Tosh, Peter, 126 Toyota Motor Company, 323 Toyota Prius, 16–17 train disasters, 323–24 transhumanism, 330–40 critics of, 339–40 transparency, downside of, 56–57 transsexuals, 337–38 Travels and Adventures of Serendipity, The (Merton and Barber), 12–13 Trends in Biochemistry (Nightingale and Martin), 335 TripAdvisor, 31 trolls, 315 Trump, Donald, 314–18 “Tuft of Flowers, A” (Frost), 305 tugboats, noise restrictions on, 243–44 Tumblr, 166, 185, 186 Turing, Alan, 236 Turing Test, 55, 137 Twain, Mark, 243 tweets, tweeting, 75, 131, 315, 319 language of, 34–36 theses in form of, 223–26 “tweetstorm,” xvii 20/20, 16 Twilight Saga, The (Meyer), 50 Twitter, 34–36, 64, 91, 119, 166, 186, 197, 205, 223, 224, 257, 284 political use of, 315, 317–20 2001: A Space Odyssey (film), 231, 242 Two-Lane Blacktop (film), 203 “Two Tramps in Mud Time” (Frost), 247–48 typewriters, writing skills and, 234–35, 237 Uber, 148 Ubisoft, 261 Understanding Media (McLuhan), 102–3, 106 underwearables, 168–69 unemployment: job displacement in, 164–65, 174, 310 in traditional media, 8 universal online library, 267–78 legal, commercial, and political obstacles to, 268–71, 274–78 universe, as memory, 326 Urban Dictionary, 145 utopia, predictions of, xvii–xviii, xx, 4, 108–9, 172–73 Uzanne, Octave, 286–87, 290 Vaidhyanathan, Siva, 277 vampires, internet giants compared to, 50–51 Vampires (game), 50 Vanguardia, La, 190–91 Van Kekerix, Marvin, 134 vice, virtual, 39–40 video games, 223, 245, 303 as addictive, 260–61 cognitive effects of, 93–97 crafting of, 261–62 violent, 260–62 videos, viewing of, 80–81 virtual child, tips for 
raising a, 73–75 virtual world, xviii commercial aspects of, 26–27 conflict enacted in, 25–27 language of, 201–2 “playlaborers” of, 113–14 psychological and physical health affected by, 304 real world vs., xx–xxi, 36, 62, 127–30 as restrictive, 303–4 vice in, 39–40 von Furstenberg, Diane, 131 Wales, Jimmy, 192 Wallerstein, Edward, 43–44 Wall Street, automation of, 187–88 Wall Street Journal, 8, 16, 86, 122, 163, 333 Walpole, Horace, 12 Walters, Barbara, 16 Ward, Adrian, 200 Warhol, Andy, 72 Warren, Earl, 255, 257 “Waste Land, The” (Eliot), 86, 87 Watson (IBM computer), 147 Wealth of Networks, The (Benkler), xviii “We Are the Web” (Kelly), xxi, 4, 8–9 Web 1.0, 3, 5, 9 Web 2.0, xvi, xvii, xxi, 33, 58 amorality of, 3–9, 10 culturally transformative power of, 28–29 Twitter and, 34–35 “web log,” 21 Wegner, Daniel, 98, 200 Weinberger, David, 41–45, 277 Weizenbaum, Joseph, 236 Wells, H.

The Data Revolution: Big Data, Open Data, Data Infrastructures and Their Consequences by Rob Kitchin


Bayesian statistics, business intelligence, business process, cellular automata, Celtic Tiger, cloud computing, collateralized debt obligation, conceptual framework, congestion charging, corporate governance, correlation does not imply causation, crowdsourcing, discrete time, George Gilder, Google Earth, Infrastructure as a Service, Internet Archive, Internet of things, invisible hand, knowledge economy, late capitalism, lifelogging, linked data, Masdar, means of production, Nate Silver, natural language processing, openstreetmap, pattern recognition, platform as a service, recommendation engine, RFID, semantic web, sentiment analysis, slashdot, smart cities, Smart Cities: Big Data, Civic Hackers, and the Quest for a New Utopia, smart grid, smart meter, software as a service, statistical model, supply-chain management, the scientific method, The Signal and the Noise by Nate Silver, transaction costs

It is premised on the notion that all massive datasets hold meaningful information that is non-random, valid, novel, useful and ultimately understandable (Han et al. 2011). As such, it uses supervised and unsupervised machine learning to detect, classify and segment meaningful relationships, associations and trends between variables. It does so using a range of techniques, including natural language processing, neural networks, decision trees, and statistical (non-parametric and parametric) methods. The choice of method varies with the type of data (structured, semi-structured or unstructured) and the purpose of the analysis (see Table 6.1). Source: Miller and Han (2009: 7). Most of the techniques listed in Table 6.1 relate to structured data as found in relational databases. For example, segmentation models might be applied to a retail database of customers and their purchases, grouping them into different profiles based on their characteristics and patterns of behaviour so that each group can be offered different services or offers.
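The customer segmentation described above is typically done with unsupervised clustering. A minimal sketch, assuming two hypothetical customer features (visits per month and average spend) and using a toy k-means routine rather than a production library:

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Naive k-means: partition points (tuples of floats) into k clusters."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its cluster.
        for i, c in enumerate(clusters):
            if c:
                centroids[i] = tuple(sum(dim) / len(c) for dim in zip(*c))
    return centroids, clusters

# Hypothetical customer records: (visits per month, average spend).
customers = [(2, 15), (3, 18), (2, 20), (25, 210), (30, 190), (28, 205)]
centroids, clusters = kmeans(customers, k=2)
```

On this toy data the routine separates occasional low-spend customers from frequent high-spend ones; a real retail segmentation would use many more behavioural variables and a library implementation (e.g. scikit-learn).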

In detecting associations, a variety of regression models might be used to compute correlations between variables and thus reveal hidden patterns that can be leveraged for commercial gain (for example, identifying which goods are bought together and reorganising a store to promote purchasing) (see Chapter 7). Unstructured data in the form of language, images and sounds raise particular data mining challenges. Natural language processing techniques seek to analyse human language as expressed through the written and spoken word, using semantics and taxonomies to recognise patterns and extract information from documents. Examples include entity extraction, which automatically extracts metadata from text by searching for particular types of text and phrasing, such as person names, locations, dates, specialised terms and product terminology; and entity relation extraction, which automatically identifies the relationships between semantic entities, linking them together (e.g., a person's name to a birth date or location, or an opinion to an item) (McCreary 2009).
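The pattern-based side of entity extraction can be sketched with simple regular expressions. The patterns below are illustrative assumptions, not a production grammar: real systems of the kind the text describes combine gazetteers, taxonomies and statistical models.

```python
import re

# Illustrative, assumed patterns for two entity types.
DATE = re.compile(
    r"\b\d{1,2} (?:January|February|March|April|May|June|July|"
    r"August|September|October|November|December) \d{4}\b"
)
PERSON = re.compile(r"\b(?:Mr|Ms|Dr|Prof)\.? [A-Z][a-z]+\b")

def extract_entities(text):
    """Return (entity_type, surface_form) pairs found by the patterns."""
    found = [("DATE", m) for m in DATE.findall(text)]
    found += [("PERSON", m) for m in PERSON.findall(text)]
    return found

text = "Dr Smith presented the results in Dublin on 12 March 2014."
print(extract_entities(text))
# → [('DATE', '12 March 2014'), ('PERSON', 'Dr Smith')]
```

Entity relation extraction would then link these extracted spans together (e.g. attaching the date to the person), which typically requires parsing sentence structure rather than matching surface patterns alone.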

Index A/B testing 112 abduction 133, 137, 138–139, 148 accountability 34, 44, 49, 55, 63, 66, 113, 116, 165, 171, 180 address e-mail 42 IP 8, 167, 171 place 8, 32, 42, 45, 52, 93, 171 Web 105 administration 17, 30, 34, 40, 42, 56, 64, 67, 87, 89, 114–115, 116, 124, 174, 180, 182 aggregation 8, 14, 101, 140, 169, 171 algorithm 5, 9, 21, 45, 76, 77, 83, 85, 89, 101, 102, 103, 106, 109, 111, 112, 118, 119, 122, 125, 127, 130, 131, 134, 136, 142, 146, 154, 160, 172, 177, 179, 181, 187 Amazon 72, 96, 131, 134 Anderson, C. 130, 135 Andrejevic, M. 133, 167, 178 animation 106, 107 anonymity 57, 63, 79, 90, 92, 116, 167, 170, 171, 172, 178 apophenia 158, 159 Application Programming Interfaces (APIs) 57, 95, 152, 154 apps 34, 59, 62, 64, 65, 78, 86, 89, 90, 95, 97, 125, 151, 170, 174, 177 archive 21, 22, 24, 25, 29–41, 48, 68, 95, 151, 153, 185 archiving 23, 29–31, 64, 65, 141 artificial intelligence 101, 103 Acxiom 43, 44 astronomy 34, 41, 72, 97 ATM 92, 116 audio 74, 77, 83 automatic meter reading (AMR) 89 automatic number plate recognition (ANPR) 85, 89 automation 32, 51, 83, 85, 87, 89–90, 98, 99, 102, 103, 118, 127, 136, 141, 146, 180 Ayasdi 132, 134 backup 29, 31, 40, 64, 163 barcode 74, 85, 92, Bates, J. 56, 61, 62, 182 Batty, M. 90, 111, 112, 140 Berry, D. 134, 141 bias 13, 14, 19, 28, 45, 101, 134–136, 153, 154, 155, 160 Big Brother 126, 180 big data xv, xvi, xvii, 2, 6, 13, 16, 20, 21, 27–29, 42, 46, 67–183, 186, 187, 188, 190, 191, 192 analysis 100–112 characteristics 27–29, 67–79 enablers 80–87 epistemology 128–148 ethical issues 165–183 etymology 67 organisational issues 160–163 rationale 113–127 sources 87–99 technical issues 149–160 biological sciences 128–129, 137 biometric data 8, 84, 115 DNA 8, 71, 84 face 85, 88, 105 fingerprints 8, 9, 84, 87, 88, 115 gait 85, 88 iris 8, 84, 88 bit-rot 20 blog 6, 95, 170 Bonferroni principle 159 born digital 32, 46, 141 Bowker, G. 2, 19, 20, 22, 24 Borgman, C. 2, 7, 10, 20, 30, 37, 40, 41 boyd, D. 
68, 75, 151, 152, 156, 158, 160, 182 Brooks, D. 130, 145 business 1, 16, 42, 45, 56, 61, 62, 67, 79, 110, 113–127, 130, 137, 149, 152, 161, 166, 172, 173, 187 calculative practices 115–116 Campbell’s Law 63, 127 camera 6, 81, 83, 87, 88, 89, 90, 107, 116, 124, 167, 178, 180 capitalism 15, 16, 21, 59, 61, 62, 86, 95, 114, 119–123, 126, 136, 161, 184, 186 capta 2 categorization 6, 8, 12, 19, 20, 102, 106, 176 causation 130, 132, 135, 147 CCTV 87, 88, 180 census 17, 18, 19, 22, 24, 27, 30, 43, 54, 68, 74, 75, 76, 77, 87, 102, 115, 157, 176 Centro De Operações Prefeitura Do Rio 124–125, 182 CERN 72, 82 citizen science 97–99, 155 citizens xvi, 45, 57, 58, 61, 63, 71, 88, 114, 115, 116, 126, 127, 165, 166, 167, 174, 176, 179, 187 citizenship 55, 115, 170, 174 classification 6, 10, 11, 23, 28, 104, 105, 157, 176 clickstream 43, 92, 94, 120, 122, 154, 176 clustering 103, 104, 105, 106, 110, 122 Codd, E. 31 competitiveness xvi, 16, 114, computation 2, 4, 5, 6, 29, 32, 68, 80, 81–82, 83, 84, 86, 98, 100, 101, 102, 110, 129, 136, 139–147, 181 computational social science xiv, 139–147, 152, 186 computing cloud xv, 81, 86 distributed xv, 37, 78, 81, 83, 98 mobile xv, 44, 78, 80, 81, 83, 85, 139 pervasive 81, 83–84, 98, 124 ubiquitous 80, 81, 83–84, 98, 100, 124, 126 confidence level 14, 37, 133, 153, 160 confidentiality 8, 169, 175 control creep 126, 166, 178–179 cookies 92, 119, 171 copyright 16, 30, 40, 49, 51, 54, 96 correlation 105, 110, 130, 131, 132, 135, 145, 147, 157, 159 cost xv, 6, 11, 16, 27, 31, 32, 37, 38, 39, 40, 44, 52, 54, 57, 58, 59, 61, 66, 80, 81, 83, 85, 93, 96, 100, 116, 117, 118, 120, 127, 150 Crawford, K. 68, 75, 135, 151, 152, 155, 156, 158, 160, 182 credit cards 8, 13, 42, 44, 45, 85, 92, 167, 171, 176 risk 42, 63, 75, 120, 176, 177 crime 55, 115, 116, 123, 175, 179 crowdsourcing 37, 73, 93, 96–97, 155, 160 Cukier, K. 
68, 71, 72, 91, 114, 128, 153, 154, 161, 174 customer relationship management (CRM) 42, 99, 117–118, 120, 122, 176 cyber-infrastructure 33, 34, 35, 41, 186 dashboard 106, 107, 108 data accuracy 12, 14, 110, 153, 154, 171 administrative 84–85, 89, 115, 116, 125, 150, 178 aggregators see data brokers amplification 8, 76, 99, 102, 167 analogue 1, 3, 32, 83, 88, 140, 141 analytics 42, 43, 63, 73, 80, 100–112, 116, 118, 119, 120, 124, 125, 129, 132, 134, 137, 139, 140, 145, 146, 149, 151, 159, 160, 161, 176, 179, 186, 191 archive see archive assemblage xvi, xvii, 2, 17, 22, 24–26, 66, 80, 83, 99, 117, 135, 139, 183, 184–192 attribute 4, 8–9, 31, 115, 150 auditing 33, 40, 64, 163 authenticity 12, 153 automated see automation bias see bias big see big data binary 1, 4, 32, 69 biometric see biometric data body 177–178, 187 boosterism xvi, 67, 127, 187, 192 brokers 42–45, 46, 57, 74, 75, 167, 183, 186, 187, 188, 191 calibration 13, 20 catalogue 32, 33, 35 clean 12, 40, 64, 86, 100, 101, 102, 152, 153, 154, 156 clearing house 33 commodity xvi, 4, 10, 12, 15, 16, 41, 42–45, 56, 161 commons 16, 42 consolidators see data brokers cooked 20, 21 corruption 19, 30 curation 9, 29, 30, 34, 36, 57, 141 definition 1, 2–4 deluge xv, 28, 73, 79, 100, 112, 130, 147, 149–151, 157, 168, 175 derived 1, 2, 3, 6–7, 8, 31, 32, 37, 42, 43, 44, 45, 62, 86, 178 deserts xvi, 28, 80, 147, 149–151, 161 determinism 45, 135 digital 1, 15, 31, 32, 67, 69, 71, 77, 82, 85, 86, 90, 137 directories 33, 35 dirty 29, 154, 163 dive 64–65, 188 documentation 20, 30, 31, 40, 64, 163 dredging 135, 147, 158, 159 dump 64, 150, 163 dynamic see dynamic data enrichment 102 error 13, 14, 44, 45, 101, 110, 153, 154, 156, 169, 175, 180 etymology 2–3, 67 exhaust 6–7, 29, 80, 90 fidelity 34, 40, 55, 79, 152–156 fishing see data dredging formats xvi, 3, 5, 6, 9, 22, 25, 30, 33, 34, 40, 51, 52, 54, 65, 77, 102, 153, 156, 157, 174 framing 12–26, 133–136, 185–188 gamed 154 holding 33, 35, 64 infrastructure xv, xvi, xvii, 2, 
21–24, 25, 27–47, 52, 64, 102, 112, 113, 128, 129, 136, 140, 143, 147, 148, 149, 150, 156, 160, 161, 162, 163, 166, 184, 185, 186, 188, 189, 190, 191, 192 integration 42, 149, 156–157 integrity 12, 30, 33, 34, 37, 40, 51, 154, 157, 171 interaction 43, 72, 75, 85, 92–93, 94, 111, 167 interoperability 9, 23, 24, 34, 40, 52, 64, 66, 156–157, 163, 184 interval 5, 110 licensing see licensing lineage 9, 152–156 linked see linked data lost 5, 30, 31, 39, 56, 150 markets xvi, 8, 15, 25, 42-45, 56, 59, 75, 167, 178 materiality see materiality meta see metadata mining 5, 77, 101, 103, 104–106, 109, 110, 112, 129, 132, 138, 159, 188 minimisation 45, 171, 178, 180 nominal 5, 110 ordinal 5, 110 open see open data ontology 12, 28, 54, 150 operational 3 ownership 16, 40, 96, 156, 166 preparation 40, 41, 54, 101–102 philosophy of 1, 2, 14, 17–21, 22, 25, 128–148, 185–188 policy 14, 23, 30, 33, 34, 37, 40, 48, 64, 160, 163, 170, 172, 173, 178 portals 24, 33, 34, 35 primary 3, 7–8, 9, 50, 90 preservation 30, 31, 34, 36, 39, 40, 64, 163 protection 15, 16, 17, 20, 23, 28, 40, 45, 62, 63, 64, 167, 168–174, 175, 178, 188 protocols 23, 25, 30, 34, 37 provenance 9, 30, 40, 79, 153, 156, 179 qualitative 4–5, 6, 14, 146, 191 quantitative 4–5, 14, 109, 127, 136, 144, 145, 191 quality 12, 13, 14, 34, 37, 40, 45, 52, 55, 57, 58, 64, 79, 102, 149, 151, 152–156, 157, 158 raw 1, 2, 6, 9, 20, 86, 185 ratio 5, 110 real-time 65, 68, 71, 73, 76, 88, 89, 91, 99, 102, 106, 107, 116, 118, 121, 124, 125, 139, 151, 181 reduction 5, 101–102 representative 4, 8, 13, 19, 21, 28 relational 3, 8, 28, 44, 68, 74–76, 79, 84, 85, 87, 88, 99, 100, 119, 140, 156, 166, 167, 184 reliability 12, 13–14, 52, 135, 155 resellers see data brokers resolution 7, 26, 27, 28, 68, 72, 73–74, 79, 84, 85, 89, 92, 133–134, 139, 140, 150, 180 reuse 7, 27, 29, 30, 31, 32, 39, 40, 41, 42, 46, 48, 49–50, 52, 56, 59, 61, 64, 102, 113, 163 scaled xvi, xvii 32, 100, 101, 112, 138, 149, 150, 163, 186 scarcity xv, xvi, 28, 80, 149–151, 161 
science xvi, 100–112, 130, 137–139, 148, 151, 158, 160–163, 164, 191 secondary 3, 7–8 security see security selection 101, 176 semi-structured 4, 5–6, 77, 100, 105 sensitive 15, 16, 45, 63, 64, 137, 151, 167, 168, 171, 173, 174 shadow 166–168, 177, 179, 180 sharing 9, 11, 20, 21, 23, 24, 27, 29–41, 48–66, 80, 82, 95, 113, 141, 151, 174, 186 small see small data social construction 19–24 spatial 17, 52, 63, 68, 73, 75, 84–85, 88–89 standards xvi, 9, 14, 19, 22, 23, 24, 25, 31, 33, 34, 38, 40, 52, 53, 64, 102, 153, 156, 157 storage see storage stranded 156 structures 4, 5–6, 12, 21, 23, 30, 31, 40, 51, 68, 77, 86, 103, 106, 156 structured 4, 5–6, 11, 32, 52, 68, 71, 75, 77, 79, 86, 88, 105, 112, 163 tertiary 7–8, 9, 27, 74 time-series 68, 102, 106, 110 transient 6–7, 72, 150 transactional 42, 43, 71, 72, 74, 75, 85, 92, 93–94, 120, 122, 131, 167, 175, 176, 177 uncertainty see uncertainty unstructured 4, 5–6, 32, 52, 68, 71, 75, 77, 86, 100, 105, 112, 140, 153, 157 validity 12, 40, 72, 102, 135, 138, 154, 156, 158 variety 26, 28, 43, 44, 46, 68, 77, 79, 86, 139, 140, 166, 184 velocity 26, 28, 29, 68, 76–77, 78, 79, 86, 88, 102, 106, 112. 
117, 140, 150, 153, 156, 184 veracity 13, 79, 102, 135, 152–156, 157, 163 volume 7, 26, 27, 28, 29, 32, 46, 67, 68, 69–72, 74, 76, 77, 78, 79, 86, 102, 106, 110, 125, 130, 135, 140, 141, 150, 156, 166, 184 volunteered 87, 93–98, 99, 155 databank 29, 34, 43 database NoSQL 6, 32, 77, 78, 86–87 relational 5, 6, 8, 32–33, 43, 74–75, 77, 78, 86, 100, 105 data-driven science 133, 137–139, 186 data-ism 130 datafication 181 dataveillance 15, 116, 126, 157, 166–168, 180, 181, 182, 184 decision tree 104, 111, 122, 159, deconstruction 24, 98, 126, 189–190 decontextualisation 22 deduction 132, 133, 134, 137, 138, 139, 148 deidentification 171, 172, 178 democracy 48, 55, 62, 63, 96, 117, 170 description 9, 101, 104, 109, 143, 147, 151, 190 designated community 30–31, 33, 46 digital devices 13, 25, 80, 81, 83, 84, 87, 90–91, 167, 174, 175 humanities xvi, 139–147, 152, 186 object identifier 8, 74 serendipity 134 discourse 15, 20, 55, 113–114, 117, 122, 127, 192 discursive regime 15, 20, 24, 56, 98, 113–114, 116, 123, 126, 127, 190 disruptive innovation xv, 68, 147, 184, 192 distributed computing xv, 37, 78, 81, 83, 98 sensors 124, 139, 160 storage 34, 37, 68, 78, 80, 81, 85–87, 97 division of labour 16 Dodge, M. 
2, 21, 68, 73, 74, 76, 83, 84, 85, 89, 90, 92, 93, 96, 113, 115, 116, 124, 154, 155, 167, 177, 178, 179, 180, 189 driver’s licence 45, 87, 171 drone 88, Dublin Core 9 dynamic data xv, xvi, 76–77, 86, 106, 112 pricing 16, 120, 123, 177 eBureau 43, 44 ecological fallacy 14, 102, 135, 149, 158–160 Economist, The 58, 67, 69, 70, 72, 128 efficiency 16, 38, 55, 56, 59, 66, 77, 93, 102, 111, 114, 116, 118, 119, 174, 176 e-mail 71, 72–73, 82, 85, 90, 93, 116, 174, 190 empiricism 129, 130–137, 141, 186 empowerment 61, 62–63, 93, 115, 126, 165 encryption 171, 175 Enlightenment 114 Enterprise Resource Planning (ERP) 99, 117, 120 entity extraction 105 epistemology 3, 12, 19, 73, 79, 112, 128–148, 149, 185, 186 Epsilon 43 ethics 12, 14–15, 16, 19, 26, 30, 31, 40, 41, 64, 73, 99, 128, 144, 151, 163, 165–183, 186 ethnography 78, 189, 190, 191 European Union 31, 38, 45, 49, 58, 59, 70, 157, 168, 173, 178 everyware 83 exhaustive 13, 27, 28, 68, 72–73, 79, 83, 88, 100, 110, 118, 133–134, 140, 150, 153, 166, 184 explanation 101, 109, 132, 133, 134, 137, 151 extensionality 67, 78, 140, 184 experiment 2, 3, 6, 34, 75, 78, 118, 129, 131, 137, 146, 150, 160 Facebook 6, 28, 43, 71, 72, 77, 78, 85, 94, 119, 154, 170 facts 3, 4, 9, 10, 52, 140, 159 Fair Information Practice Principles 170–171, 172 false positive 159 Federal Trade Commission (FTC) 45, 173 flexibility 27, 28, 68, 77–78, 79, 86, 140, 157, 184 Flickr 95, 170 Flightradar 107 Floridi, L. 3, 4, 9, 10, 11, 73, 112, 130, 151 Foucault, M. 16, 113, 114, 189 Fourth paradigm 129–139 Franks, B. 6, 111, 154 freedom of information 48 freemium service 60 funding 15, 28, 29, 31, 34, 37, 38, 40, 41, 46, 48, 52, 54–55, 56, 57–58, 59, 60, 61, 65, 67, 75, 119, 143, 189 geographic information systems 147 genealogy 98, 127, 189–190 Gitelman, L. 
2, 19, 20, 21, 22 Global Positioning System (GPS) 58, 59, 73, 85, 88, 90, 121, 154, 169 Google 32, 71, 73, 78, 86, 106, 109, 134, 170 governance 15, 21, 22, 23, 38, 40, 55, 63, 64, 66, 85, 87, 89, 117, 124, 126, 136, 168, 170, 178–182, 186, 187, 189 anticipatory 126, 166, 178–179 technocratic 126, 179–182 governmentality xvi, 15, 23, 25, 40, 87, 115, 127, 168, 185, 191 Gray, J. 129–130 Guardian, The 49 Gurstein, M. 52, 62, 63 hacking 45, 154, 174, 175 hackathon 64–65, 96, 97, 188, 191 Hadoop 87 hardware 32, 34, 40, 63, 78, 83, 84, 124, 143, 160 human resourcing 112, 160–163 hype cycle 67 hypothesis 129, 131, 132, 133, 137, 191 IBM 70, 123, 124, 143, 162, 182 identification 8, 44, 68, 73, 74, 77, 84–85, 87, 90, 92, 115, 169, 171, 172 ideology 4, 14, 25, 61, 113, 126, 128, 130, 134, 140, 144, 185, 190 immutable mobiles 22 independence 3, 19, 20, 24, 100 indexical 4, 8–9, 32, 44, 68, 73–74, 79, 81, 84–85, 88, 91, 98, 115, 150, 156, 167, 184 indicator 13, 62, 76, 102, 127 induction 133, 134, 137, 138, 148 information xvii, 1, 3, 4, 6, 9–12, 13, 23, 26, 31, 33, 42, 44, 45, 48, 53, 67, 70, 74, 75, 77, 92, 93, 94, 95, 96, 100, 101, 104, 105, 109, 110, 119, 125, 130, 138, 140, 151, 154, 158, 161, 168, 169, 171, 174, 175, 184, 192 amplification effect 76 freedom of 48 management 80, 100 overload xvi public sector 48 system 34, 65, 85, 117, 181 visualisation 109 information and communication technologies (ICTs) xvi, 37, 80, 83–84, 92, 93, 123, 124 Innocentive 96, 97 INSPIRE 157 instrumental rationality 181 internet 9, 32, 42, 49, 52, 53, 66, 70, 74, 80, 81, 82, 83, 86, 92, 94, 96, 116, 125, 167 of things xv, xvi, 71, 84, 92, 175 intellectual property rights xvi, 11, 12, 16, 25, 30, 31, 40, 41, 49, 50, 56, 62, 152, 166 Intelius 43, 44 intelligent transportation systems (ITS) 89, 124 interoperability 9, 23, 24, 34, 40, 52, 64, 66, 149, 156–157, 163, 184 interpellation 165, 180, 188 interviews 13, 15, 19, 78, 155, 190 Issenberg, S. 
75, 76, 78, 119 jurisdiction 17, 25, 51, 56, 57, 74, 114, 116 Kafka 180 knowledge xvii, 1, 3, 9–12, 19, 20, 22, 25, 48, 53, 55, 58, 63, 67, 93, 96, 110, 111, 118, 128, 130, 134, 136, 138, 142, 159, 160, 161, 162, 187, 192 contextual 48, 64, 132, 136–137, 143, 144, 187 discovery techniques 77, 138 driven science 139 economy 16, 38, 49 production of 16, 20, 21, 24, 26, 37, 41, 112, 117, 134, 137, 144, 184, 185 pyramid 9–10, 12, situated 16, 20, 28, 135, 137, 189 Latour, B. 22, 133 Lauriault, T.P. 15, 16, 17, 23, 24, 30, 31, 33, 37, 38, 40, 153 law of telecosm 82 legal issues xvi, 1, 23, 25, 30, 31, 115, 165–179, 182, 183, 187, 188 levels of measurement 4, 5 libraries 31, 32, 52, 71, 141, 142 licensing 14, 25, 40, 42, 48, 49, 51, 53, 57, 73, 96, 151 LIDAR 88, 89, 139 linked data xvii, 52–54, 66, 156 longitudinal study 13, 76, 140, 149, 150, 160 Lyon, D. 44, 74, 87, 167, 178, 180 machine learning 5, 6, 101, 102–104, 106, 111, 136, 188 readable 6, 52, 54, 81, 84–85, 90, 92, 98 vision 106 management 62, 88, 117–119, 120, 121, 124, 125, 131, 162, 181 Manovich, L. 141, 146, 152, 155 Manyika, J. 6, 16, 70, 71, 72, 104, 116, 118, 119, 120, 121, 122, 161 map 5, 22, 24, 34, 48, 54, 56, 73, 85, 88, 93, 96, 106, 107, 109, 115, 143, 144, 147, 154, 155–156, 157, 190 MapReduce 86, 87 marginal cost 11, 32, 57, 58, 59, 66, 151 marketing 8, 44, 58, 73, 117, 119, 120–123, 131, 176 marketisation 56, 61–62, 182 materiality 4, 19, 21, 24, 25, 66, 183, 185, 186, 189, 190 Mattern, S. 137, 181 Mayer-Schonberger, V. 68, 71, 72, 91, 114, 153, 154, 174 measurement 1, 3, 5, 6, 10, 12, 13, 15, 19, 23, 69, 97, 98, 115, 128, 166 metadata xvi, 1, 3, 4, 6, 8–9, 13, 22, 24, 29, 30, 31, 33, 35, 40, 43, 50, 54, 64, 71, 72, 74, 78, 85, 91, 93, 102, 105, 153, 155, 156 methodology 145, 158, 185 middleware 34 military intelligence 71, 116, 175 Miller, H.J. xvi, 27, 100, 101, 103, 104, 138, 139, 159 Minelli, M. 
101, 120, 137, 168, 170, 171, 172, 174, 176 mixed methods 147, 191 mobile apps 78 computing xv, 44, 78, 80, 81, 83, 85, 139 mapping 88 phones 76, 81, 83, 90, 93, 151, 168, 170, 175 storage 85 mode of production 16 model 7, 11, 12, 24, 32, 37, 44, 57, 72, 73, 101, 103, 105, 106, 109, 110–112, 119, 125, 129, 130, 131, 132, 133, 134, 137, 139, 140, 144, 145, 147, 158–159, 166, 181 agent-based model 111, business 30, 54, 57–60, 61, 95, 118, 119, 121 environmental 139, 166 meteorological 72 time-space 73 transportation 7 modernity 3 Moore’s Law 81, moral philosophy 14 Moretti, F. 141–142 museum 31, 32, 137 NASA 7 National Archives and Records Administration (NARA) 67 National Security Agency (NSA) 45, 116 natural language processing 104, 105 near-field communication 89, 91 neoliberalism 56, 61–62, 126, 182 neural networks 104, 105, 111 New Public Management 62, non-governmental organisations xvi, 43, 55, 56, 73, 117 non-excludable 11, 151 non-rivalrous 11, 57, 151 normality 100, 101 normative thinking 12, 15, 19, 66, 99, 127, 144, 182, 183, 187, 192 Obama, B. 
53, 75–76, 78, 118–119 objectivity 2, 17, 19, 20, 62, 135, 146, 185 observant participation 191 oligopticon 133, 167, 180 ontology 3, 12, 17–21, 22, 28, 54, 79, 128, 138, 150, 156, 177, 178, 184, 185 open data xv, xvi, xvii, 2, 12, 16, 21, 25, 48–66, 97, 114, 124, 128, 129, 140, 149, 151, 163, 164, 167, 186, 187, 188, 190, 191, 192 critique of 61–66 economics of 57–60 rationale 54–56 Open Definition 50 OpenGovData 50, 51 Open Knowledge Foundation 49, 52, 55, 58, 189, 190 open science 48, 72, 98 source 48, 56, 60, 87, 96 OpenStreetMap 73, 93, 96, 154, 155–156 optimisation 101, 104, 110–112, 120, 121, 122, 123 Ordnance Survey 54, 57 Organization for Economic Cooperation and Development (OECD) 49, 50, 59 overlearning 158, 159 panoptic 133, 167, 180 paradigm 112, 128–129, 130, 138, 147, 148, 186 participant observation 190, 191 participation 48, 49, 55, 66, 82, 94, 95, 96, 97–98, 126, 155, 165, 180 passport 8, 45, 84, 87, 88, 115 patent 13, 16, 41, 51 pattern recognition 101, 104–106, 134, 135 personally identifiable information 171 philanthropy 32, 38, 58 philosophy of science 112, 128–148, 185–188 phishing 174, 175 phone hacking 45 photography 6, 43, 71, 72, 74, 77, 86, 87, 88, 93, 94, 95, 105, 115, 116, 141, 155, 170 policing 80, 88, 116, 124, 125, 179 political economy xvi, 15–16, 25, 42–45, 182, 185, 188, 191 Pollock, R. 
49, 54, 56, 57 58, 59 positivism 129, 136–137, 140, 141, 144, 145, 147 post-positivism 140, 144, 147 positionality 135, 190 power/knowledge 16, 22 predictive modelling 4, 7, 12, 34, 44, 45, 76, 101, 103, 104, 110–112, 118, 119, 120, 125, 132, 140, 147, 168, 179 profiling 110–112, 175–178, 179, 180 prescription 101 pre-analytical 2, 3, 19, 20, 185 pre-analytics 101–102, 112 pre-factual 3, 4, 19, 185 PRISM 45, 116 privacy 15, 28, 30, 40, 45, 51, 57, 63, 64, 96, 117, 163, 165, 166, 168–174, 175, 178, 182, 187 privacy by design 45, 173, 174 probability 14, 110, 153, 158 productivity xvi, 16, 39, 55, 66, 92, 114, 118 profiling 12, 42–45, 74, 75, 110–112, 119, 166, 168, 175–178, 179, 180, 187 propriety rights 48, 49, 54, 57, 62 prosumption 93 public good 4, 12, 16, 42, 52, 56, 58, 79, 97 –private partnerships 56, 59 sector information (PSI) 12, 48, 54, 56, 59, 61, 62 quantified self 95 redlining 176, 182 reductionism 73, 136, 140, 142, 143, 145 regression 102, 104, 105, 110, 111, 122 regulation xvi, 15, 16, 23, 25, 40, 44, 46, 83, 85, 87, 89–90, 114, 115, 123, 124, 126, 168, 174, 178, 180, 181–182, 187, 192 research design 7, 13, 14, 77–78, 98, 137–138, 153, 158 Renaissance xvi, 129, 141 repository 29, 33, 34, 41 representativeness 13, 14, 19, 21 Resource Description Framework (RDF) 53, 54 remote sensing 73–74, 105 RFID 74, 85, 90, 91, 169 rhetorical 3, 4, 185 right to be forgotten 45, 172, 187 information (RTI) 48, 62 risk 16, 44, 58, 63, 118, 120, 123, 132, 158, 174, 176–177, 178, 179, 180 Rosenberg, D. 1, 3 Ruppert, E. 
22, 112, 157, 163, 187 sampling 13, 14, 27, 28, 46, 68, 72, 73, 77, 78, 88, 100, 101, 102, 120, 126, 133, 138, 139, 146, 149–150, 152, 153, 154, 156, 159 scale of economy 37 scanners 6, 25, 29, 32, 83, 85, 88, 89, 90, 91, 92, 175, 177, 180 science xvi, 1, 2, 3, 19, 20, 29, 31, 34, 37, 46, 65, 67, 71, 72, 73, 78, 79, 97, 98, 100, 101, 103, 111, 112, 128–139, 140, 147, 148, 150, 158, 161, 165, 166, 181, 184, 186 scientific method 129, 130, 133, 134, 136, 137–138, 140, 147, 148, 186 security data 28, 33, 34, 40, 45, 46, 51, 57, 126, 157, 166, 169, 171, 173, 174–175, 182, 187 national 42, 71, 88, 116–117, 172, 176, 178, 179 private 99, 115, 118, 151 social 8, 32, 45, 87, 115, 171 segmentation 104, 105, 110, 119, 120, 121, 122, 176 semantic information 9, 10, 11, 105, 157 Web 49, 52, 53, 66 sensors xv, 6, 7, 19, 20, 24, 25, 28, 34, 71, 76, 83, 84, 91–92, 95, 124, 139, 150, 160 sentiment analysis 105, 106, 121, Siegel, E. 103, 110, 111, 114, 120, 132, 158, 176, 179 signal 9, 151, 159 Silver, N. 136, 151, 158 simulation 4, 32, 37, 101, 104, 110–112, 119, 129, 133, 137, 139, 140 skills 37, 48, 52, 53, 57, 63, 94, 97, 98, 112, 149, 160–163, 164 small data 21, 27–47, 68, 72, 75, 76, 77, 79, 100, 103, 110, 112, 146, 147, 148, 150, 156, 160, 166, 184, 186, 188, 191 smart cards 90 cities 91, 92, 99, 124–125, 181–182 devices 83 metering 89, 123, 174 phones 81, 82, 83, 84, 90, 94, 107, 121, 155, 170, 174 SmartSantander 91 social computing xvi determinism 144 media xv, 13, 42, 43, 76, 78, 90, 93, 94–95, 96, 105, 119, 121, 140, 150, 151, 152, 154, 155, 160, 167, 176, 180 physics 144 security number 8, 32, 45, 87, 115, 171 sorting 126, 166, 168, 175–178, 182 sociotechnical systems 21–24, 47, 66, 183, 185, 188 software 6, 20, 32, 34, 40, 48, 53, 54, 56, 63, 80, 83, 84, 86, 88, 96, 132, 143, 160, 161, 163, 166, 170, 172, 175, 177, 180, 189 Solove, D. 
116, 120, 168, 169, 170, 172, 176, 178, 180 solutionism 181 sousveillance 95–96 spatial autocorrelation 146 data infrastructure 34, 35, 38 processes 136, 144 resolution 149 statistics 110 video 88 spatiality 17, 157 Star, S.L. 19, 20, 23, 24 stationarity 100 statistical agencies 8, 30, 34, 35, 115 geography 17, 74, 157 statistics 4, 8, 13, 14, 24, 48, 77, 100, 101, 102, 104, 105, 109–110, 111, 129, 132, 134, 135, 136, 140, 142, 143, 145, 147, 159 descriptive 4, 106, 109, 147 inferential 4, 110, 147 non-parametric 105, 110 parametric 105, 110 probabilistic 110 radical 147 spatial 110 storage 31–32, 68, 72, 73, 78, 80, 85–87, 88, 100, 118, 161, 171 analogue 85, 86 digital 85–87 media 20, 86 store loyalty cards 42, 45, 165 Sunlight Foundation 49 supervised learning 103 Supply Chain Management (SCM) 74, 99, 117–118, 119, 120, 121 surveillance 15, 71, 80, 83, 87–90, 95, 115, 116, 117, 123, 124, 151, 165, 167, 168, 169, 180 survey 6, 17, 19, 22, 28, 42, 68, 75, 77, 87, 115, 120 sustainability 16, 33, 34, 57, 58, 59, 61, 64–66, 87, 114, 123–124, 126, 155 synchronicity 14, 95, 102 technological handshake 84, 153 lock-in 166, 179–182 temporality 17, 21, 27, 28, 32, 37, 68, 75, 111, 114, 157, 160, 186 terrorism 116, 165, 179 territory 16, 38, 74, 85, 167 Tesco 71, 120 Thrift, N.
83, 113, 133, 167, 176 TopCoder 96 trading funds 54–55, 56, 57 transparency 19, 38, 44, 45, 48–49, 55, 61, 62, 63, 113, 115, 117, 118, 121, 126, 165, 173, 178, 180 trust 8, 30, 33, 34, 40, 44, 55, 84, 117, 152–156, 163, 175 trusted digital repository 33–34 Twitter 6, 71, 78, 94, 106, 107, 133, 143, 144, 146, 152, 154, 155, 170 uncertainty 10, 13, 14, 100, 102, 110, 156, 158 uneven development 16 Uniform Resource Identifiers (URIs) 53, 54 United Nations Development Programme (UNDP) 49 universalism 20, 23, 133, 140, 144, 154, 190 unsupervised learning 103 utility 1, 28, 53, 54, 55, 61, 63, 64–66, 100, 101, 114, 115, 134, 147, 163, 185 venture capital 25, 59 video 6, 43, 71, 74, 77, 83, 88, 90, 93, 94, 106, 141, 146, 170 visual analytics 106–109 visualisation 5, 10, 34, 77, 101, 102, 104, 106–109, 112, 125, 132, 141, 143 Walmart 28, 71, 99, 120 Web 2.0 81, 94–95 Weinberger, D. 9, 10, 11, 96, 97, 132, 133 White House 48 Wikipedia 93, 96, 106, 107, 143, 154, 155 Wired 69, 130 wisdom 9–12, 114, 161 XML 6, 53 Zikopoulos, P.C. 6, 16, 68, 70, 73, 76, 119, 151


pages: 49 words: 12,968

Industrial Internet by Jon Bruner

Amazon: amazon.comamazon.co.ukamazon.deamazon.fr

autonomous vehicles, barriers to entry, commoditize, computer vision, data acquisition, demand response, en.wikipedia.org, factory automation, Google X / Alphabet X, industrial robot, Internet of things, job automation, loose coupling, natural language processing, performance metric, Silicon Valley, slashdot, smart grid, smart meter, statistical model, web application

“Now you’ve got what you might call a rain API — two machines talking, mediated by a human being,” says Prasad. It could alert other cars to the presence of rain, perhaps switching on headlights automatically or changing the assumptions that nearby cars make about road traction. The human in this case becomes part of an API in situ — the software, integrated with hardware, is able to detect a strong signal from a human without relying on extractive tools like natural-language processing that are often used to divine human preferences. Connected to networks through easy procedural mechanisms like If This Then That (IFTTT)[29], human operators even at the consumer level can identify significant signals and make their machines react to them. “I’m a car guy, so I’m talking about cars, but imagine the number of machines out there that are being turned on and off. In each case, the fact that a human is turning it on and off tells you something very interesting; it’s human-annotated data,” says Prasad.
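The trigger-action mechanism Prasad describes — a human act becoming a machine-readable signal that other machines react to — can be sketched as a toy rule engine. The event and reaction names below are hypothetical illustrations, not IFTTT's actual API:

```python
# Toy trigger-action engine: a human act (switching the wipers on) becomes
# an annotated signal that other machines can subscribe to and react to.
class RuleEngine:
    def __init__(self):
        self.rules = []  # list of (trigger, action) pairs

    def on(self, trigger, action):
        """Register a rule: when trigger(event) is true, run action(event)."""
        self.rules.append((trigger, action))

    def emit(self, event):
        """Broadcast an event; return the reactions it fires, in rule order."""
        return [action(event) for trigger, action in self.rules if trigger(event)]

engine = RuleEngine()
# Hypothetical "rain API": a human turning the wipers on implies rain.
engine.on(lambda e: e == "wipers_on", lambda e: "headlights_on")
engine.on(lambda e: e == "wipers_on", lambda e: "warn_nearby_cars:low_traction")

print(engine.emit("wipers_on"))  # -> ['headlights_on', 'warn_nearby_cars:low_traction']
```

The point of the sketch is the one Prasad makes: no natural-language processing is needed, because the human action itself is the strong, already-annotated signal.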


pages: 71 words: 14,237

21 Recipes for Mining Twitter by Matthew A. Russell

Amazon: amazon.comamazon.co.ukamazon.deamazon.fr

en.wikipedia.org, Google Earth, natural language processing, NP-complete, social web, web application

Only the data needs to be written to a %s placeholder in the template. See Also http://labs.mudynamics.com/wp-content/uploads/2009/04/icouch.html, http://help.com/post/383276-anyone-knows-the-formula-for-font-s 1.12 Summarizing Link Targets Problem You want to summarize the text of a web page that’s indicated by a short URL in a tweet. Solution Extract the text from the web page, and then use a natural language processing (NLP) toolkit such as the Natural Language Toolkit (NLTK) to help you extract the most important sentences to create a machine-generated abstract. Discussion Summarizing web pages is a very powerful capability, and this is especially the case in the context of a tweet, where you have a lot of additional metadata (or “reactions”) about the page from one or more tweets. Summarizing web pages is a particularly hard and messy problem, but you can bootstrap a reasonable solution with less effort than you might think.
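The recipe's core idea — score each sentence by the frequency of its content words and keep the top few — can be sketched with the standard library alone; this is a minimal stand-in, where NLTK's tokenizers and stopword corpus would do each step more robustly:

```python
import re
from collections import Counter

# A tiny stopword list stands in for NLTK's stopwords corpus.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "it", "that", "with"}

def summarize(text, n=2):
    """Score each sentence by the corpus frequency of its content words,
    then return the n best sentences in their original order."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    tokens = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]
    freq = Counter(tokens)

    def score(sentence):
        return sum(freq[w] for w in re.findall(r"[a-z']+", sentence.lower())
                   if w not in STOPWORDS)

    top = set(sorted(sentences, key=score, reverse=True)[:n])
    return [s for s in sentences if s in top]
```

Sentences that share many high-frequency words with the rest of the page score highest, which is the "bootstrap" the recipe has in mind before reaching for heavier NLP machinery.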


pages: 265 words: 74,000

The Numerati by Stephen Baker

Amazon: amazon.comamazon.co.ukamazon.deamazon.fr

Berlin Wall, Black Swan, business process, call centre, correlation does not imply causation, Drosophila, full employment, illegal immigration, index card, Isaac Newton, job automation, job satisfaction, McMansion, Myron Scholes, natural language processing, PageRank, personalized medicine, recommendation engine, RFID, Silicon Valley, Skype, statistical model, Watson beat the top human players on Jeopardy!

I could come to lots of other conclusions about her passions, her love interests, and even what she likes to eat. This is all clear to me. But she's writing in my language. Practically every word makes sense. The bad news, from a data-mining perspective, is that it takes me a scandalous five minutes to read through her text. In that time, Umbria's computers work through 35,300 blog posts. This magic takes place within two domains of artificial intelligence: natural language processing and machine learning. The idea is simple enough. The machines churn through the words, using their statistical genius and formidable memory to make sense of them. To say that they "understand" the words is a stretch. It's like saying that a blind bat, which navigates by processing the geometry of sound waves, "sees" the open window it flies through. But no matter. If computers can draw correct conclusions from the words they plow through, they pass the language test.

See Social networks Names finding people by, [>], [>], [>], [>]–[>], [>] on phone prompts, [>] protection of, in data mining, [>] NASA, [>]–[>] National Cryptologic Museum, [>], [>], [>]–[>] National Science Foundation, [>] National Security Agency (NSA) data mining by, [>], [>]–[>] mathematicians working for, [>], [>], [>]–[>], [>]–[>], [>] social network interpretation by, [>], [>]–[>] Natural language processing, [>]–[>] "Negotiators" (personality type), [>]–[>], [>] Netflix, [>], [>], [>] "Neural network" programs, [>]–[>] Newton, Isaac, [>] New York Times, [>] Next Friend Analysis, [>]–[>], [>] Nicaragua, [>] Nicolov, Nicolas, [>]–[>], [>], [>]–[>] Nielsen BuzzMetrics (company), [>], [>] 9/11 terrorist attack, [>], [>]–[>], [>]–[>], [>], [>], [>] "Nodes" (in social networks), [>] "Noise," [>] No Place to Hide (O'Harrow), [>] NORA software, [>]–[>], [>] Norman (fistulated cow), [>]–[>], [>], [>] NSA.


pages: 239 words: 70,206

Data-Ism: The Revolution Transforming Decision Making, Consumer Behavior, and Almost Everything Else by Steve Lohr

Amazon: amazon.comamazon.co.ukamazon.deamazon.fr

23andMe, Affordable Care Act / Obamacare, Albert Einstein, big data - Walmart - Pop Tarts, bioinformatics, business intelligence, call centre, cloud computing, computer age, conceptual framework, Credit Default Swap, crowdsourcing, Daniel Kahneman / Amos Tversky, Danny Hillis, data is the new oil, David Brooks, East Village, Edward Snowden, Emanuel Derman, Erik Brynjolfsson, everywhere but in the productivity statistics, Frederick Winslow Taylor, Google Glasses, impulse control, income inequality, indoor plumbing, industrial robot, informal economy, Internet of things, invention of writing, John Markoff, John von Neumann, lifelogging, Mark Zuckerberg, market bubble, meta analysis, meta-analysis, money market fund, natural language processing, obamacare, pattern recognition, payday loans, personalized medicine, precision agriculture, pre–internet, Productivity paradox, RAND corporation, rising living standards, Robert Gordon, Second Machine Age, self-driving car, Silicon Valley, Silicon Valley startup, six sigma, skunkworks, speech recognition, statistical model, Steve Jobs, Steven Levy, The Design of Experiments, the scientific method, Thomas Kuhn: the structure of scientific revolutions, unbanked and underbanked, underbanked, Von Neumann architecture, Watson beat the top human players on Jeopardy!

Our notions of “knowledge,” “meaning,” and “understanding” don’t really apply to how this technology works. Humans understand things largely because of their experience of the real world. Computers lack that advantage. Advances in artificial intelligence mean that machines can increasingly see, read, listen, and speak, in their way. And a very different way, it is. As Frederick Jelinek, a pioneer in speech recognition and natural-language processing at IBM, once explained by way of analogy: “Airplanes don’t flap their wings.” To get a sense of how computers build knowledge, let’s look at Carnegie Mellon University’s Never-Ending Language Learning system, or NELL. Since 2010, NELL has been steadily scanning hundreds of millions of Web pages for text patterns that it uses to learn facts, more than 2.3 million so far, with an estimated accuracy of 87 percent.
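NELL's core loop — harvesting facts from recurring text patterns — can be sketched with a single "X is a Y" pattern. The pattern and corpus below are illustrative stand-ins; real NELL couples hundreds of learned patterns with consistency checks across millions of pages:

```python
import re

def extract_isa(text):
    """Harvest (instance, category) pairs from a single 'X is a Y' pattern."""
    return re.findall(r"\b([A-Z]\w+) is a (\w+)", text)

corpus = "Pittsburgh is a city. NELL is a system that reads. A dog barked."
print(extract_isa(corpus))  # -> [('Pittsburgh', 'city'), ('NELL', 'system')]
```

Each extracted pair is a candidate fact; NELL's estimated 87 percent accuracy reflects how noisy such pattern matches are before cross-pattern filtering.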

Decades ago, the main focus of artificial intelligence research was to develop knowledge rules and relationships to make so-called expert systems. But those systems proved extremely difficult to build. So knowledge systems gave way to the data-driven path: mine vast amounts of data to make predictions, based on statistical probabilities and patterns. Data-fueled artificial intelligence, Ferrucci says, has been “incredibly powerful” for tasks like natural-language processing—a central technology, for example, behind Google’s search and Watson’s question-answering. “But in a purely data-driven approach, there is no real understanding,” he says. “People are so enamored with the data-driven approach that they believe correlation is enough.” For a broad swath of commercial decisions, as we’ve seen, correlation is sufficient, as long as the outcome is a winner.


pages: 696 words: 143,736

The Age of Spiritual Machines: When Computers Exceed Human Intelligence by Ray Kurzweil

Amazon: amazon.comamazon.co.ukamazon.deamazon.fr

Ada Lovelace, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, Albert Einstein, Any sufficiently advanced technology is indistinguishable from magic, Buckminster Fuller, call centre, cellular automata, combinatorial explosion, complexity theory, computer age, computer vision, cosmological constant, cosmological principle, Danny Hillis, double helix, Douglas Hofstadter, Everything should be made as simple as possible, first square of the chessboard / second half of the chessboard, fudge factor, George Gilder, Gödel, Escher, Bach, I think there is a world market for maybe five computers, information retrieval, invention of movable type, Isaac Newton, iterative process, Jacquard loom, Jacquard loom, John Markoff, John von Neumann, Lao Tzu, Law of Accelerating Returns, mandelbrot fractal, Marshall McLuhan, Menlo Park, natural language processing, Norbert Wiener, optical character recognition, ought to be enough for anybody, pattern recognition, phenotype, Ralph Waldo Emerson, Ray Kurzweil, Richard Feynman, Richard Feynman, Robert Metcalfe, Schrödinger's Cat, Search for Extraterrestrial Intelligence, self-driving car, Silicon Valley, speech recognition, Steven Pinker, Stewart Brand, stochastic process, technological singularity, Ted Kaczynski, telepresence, the medium is the message, There's no reason for any individual to have a computer in his home - Ken Olsen, traveling salesman, Turing machine, Turing test, Whole Earth Review, Y2K

He cites the following sentence:“What number of products of products of products of products of products of products of products of products was the number of products of products of products of products of products of products of products of products?” as having 1,430 X 1,430 = 2,044,900 interpretations. 4 These and other theoretical aspects of computational linguistics are covered in Mary D. Harris, Introduction to Natural Language Processing (Reston, VA: Reston Publishing Co., 1985). CHAPTER 6: BUILDING NEW BRAINS ... 1 Hans Moravec is likely to make this argument in his 1998 book Robot: Mere Machine to Transcendent Mind (Oxford University Press; not yet available as of this writing). 2 One hundred fifty million calculations per second for a 1998 personal computer doubling twenty-seven times by the year 2025 (this assumes doubling both the number of components, and the speed of each component every two years) equals about 20 million billion calculations per second.
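The 1,430 parse count matches the eighth Catalan number, the sequence that counts full binary bracketings of an ambiguous chain of modifiers. A quick arithmetic check, offered as an aside rather than a reconstruction of Harris's derivation:

```python
from math import comb

def catalan(n):
    """n-th Catalan number: counts the binary bracketings behind such parse tallies."""
    return comb(2 * n, n) // (n + 1)

print(catalan(8))       # -> 1430
print(catalan(8) ** 2)  # -> 2044900, i.e. 1,430 x 1,430
```

Since each half of the quoted sentence contributes 1,430 independent readings, the two multiply to the 2,044,900 interpretations cited.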

New York: Dover Publications, 1961. ————. Ninth Bridgewater Treatise: A Fragment. London: Murray, 1838. Babbage, Henry Prevost. Babbage’s Calculating Engines: A Collection of Papers by Henry Prevost Babbage (Editor). Vol. 2. Los Angeles: Tomash, 1982. Bailey, James. After Thought: The Computer Challenge to Human Intelligence. New York: Basic Books, 1996. Bara, Bruno G. and Giovanni Guida. Computational Models of Natural Language Processing. Amsterdam: North Holland, 1984. Barnsley, Michael F. Fractals Everywhere. Boston: Academic Press Professional, 1993. Baron, Jonathan. Rationality and Intelligence. Cambridge: Cambridge University Press, 1985. Barrett, Paul H., ed. The Collected Papers of Charles Darwin. Vols. 1 and 2. Chicago: University of Chicago Press, 1977. Barrow, John. Theories of Everything. Oxford: Oxford University Press, 1991.

Global Mind Change: The New Age Revolution in the Way We Think. New York: Warner Books, 1988. Harmon, Paul and David King. Expert Systems: Artificial Intelligence in Business. New York: John Wiley and Sons, 1985. Harre, Rom, ed. American Behavioral Scientist: Computation and the Mind. Vol. 40, no. 6, May 1997. Harrington, Steven. Computer Graphics: A Programming Approach. New York: McGraw-Hill, 1987. Harris, Mary Dee. Introduction to Natural Language Processing. Reston, VA: Reston, 1985. Haugeland, John. Artificial Intelligence: The Very Idea. Cambridge, MA: MIT Press, 1985. ________, ed. Mind Design: Philosophy, Psychology, Artificial Intelligence. Cambridge, MA: MIT Press, 1981. ________, ed. Mind Design II: Philosophy, Psychology, Artificial Intelligence. Cambridge, MA: MIT Press, 1997. Hawking, Stephen W. A Brief History of Time: From the Big Bang to Black Holes.


pages: 303 words: 67,891

Advances in Artificial General Intelligence: Concepts, Architectures and Algorithms: Proceedings of the Agi Workshop 2006 by Ben Goertzel, Pei Wang

Amazon: amazon.comamazon.co.ukamazon.deamazon.fr

AI winter, artificial general intelligence, bioinformatics, brain emulation, combinatorial explosion, complexity theory, computer vision, conceptual framework, correlation coefficient, epigenetics, friendly AI, information retrieval, Isaac Newton, John Conway, Loebner Prize, Menlo Park, natural language processing, Occam's razor, p-value, pattern recognition, performance metric, Ray Kurzweil, Rodney Brooks, semantic web, statistical model, strong AI, theory of mind, traveling salesman, Turing machine, Turing test, Von Neumann architecture, Y2K

These terms have a many-to-many mapping to the terms directly used in the system’s “native language”, Narsese, and this mapping corresponds to a symbolize relation in Narsese. The truth-value of a symbolizing statement indicates the frequency and confidence for the word/phrase/sentence (in the natural language) to be used as the symbol of the term (in Narsese), according to the experience of the system. In the language understanding process, NARS will not have separate parsing and semantic mapping phases, as in many other natural language processing systems. Instead, for an input sentence, the recognition of its syntactic structure and the recognition of its semantic structure will be carried out hand-in-hand. The process will start by checking whether the sentence can be understood as a whole, as in the case of proverbs and idioms. If unsuccessful, the sentence will be divided recursively into phrases and words, whose sequential relations will be tentatively mapped into the structures of compound terms, with components corresponding to the individual phrases and words.
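The whole-first, split-on-failure process described above can be sketched as a short recursion. The lexicon and splitting heuristic below are invented for illustration — a stand-in for NARS's learned symbolize statements, not actual Narsese machinery:

```python
# Hypothetical lexicon mapping surface strings to (term, frequency, confidence)
# triples -- an illustrative stand-in for NARS's symbolize relations.
LEXICON = {
    "kick the bucket": ("die", 0.9, 0.8),  # idiom understood as a whole
    "kick": ("kick", 0.95, 0.9),
    "the bucket": ("bucket", 0.9, 0.9),
}

def understand(phrase):
    """Whole-phrase lookup first (proverbs, idioms); otherwise split recursively."""
    if phrase in LEXICON:
        return [LEXICON[phrase]]
    words = phrase.split()
    if len(words) == 1:
        return []  # unknown word: no symbolize relation in the system's experience
    for i in range(1, len(words)):  # try each split point in turn
        left = understand(" ".join(words[:i]))
        right = understand(" ".join(words[i:]))
        if left and right:
            return left + right
    return []

print(understand("kick the bucket"))  # -> [('die', 0.9, 0.8)]
```

Because the idiom is found as a whole, it is never decomposed into "kick" and "the bucket" — the syntactic split only happens when holistic understanding fails.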

However, due to the inevitable difference in experience, the system will not always be able to use a natural language as a native speaker. Even so, its proficiency in that language should be sufficient for many practical purposes. Being able to use any natural language is not a necessary condition for being intelligent. Since the aim of NARS is not to accurately duplicate human behaviors so as to pass the Turing Test [5], natural language processing is optional for the system. 3.3 Education NARS processes tasks using available knowledge, though the system is not designed with a ready-made knowledge base as a necessary part. Instead, all the knowledge, in principle, should come from the system’s experience. In other words, NARS as designed is like a baby that has great potential, but little instinct. P. Wang / From NARS to a Thinking Machine 85 For the system to serve any practical purpose, extensive education, or training, is needed, which means building a proper internal knowledge base (or call it belief network, long-term memory, etc.) by feeding the system with certain (initial) experience.

To gracefully incorporate heuristics not explicitly based on probability theory, in cases where probability theory, at its current state of development, does not provide adequate pragmatic solutions. To provide “scalable” reasoning, in the sense of being able to carry out inferences involving at least billions of premises. Of course, when the number of premises is fewer, more intensive and accurate reasoning may be carried out. To easily accept input from, and send output to, natural language processing software systems. PLN implements a wide array of first-order and higher-order inference rules including (but not limited to) deduction, Bayes’ Rule, unification, intensional and extensional inference, belief revision, induction, and abduction. Each rule comes with uncertain truth-value formulas, calculating the truth-value of the conclusion from the truth-values of the premises. Inference is controlled by highly flexible forward and backward chaining processes able to take feedback from external processes and thus behave adaptively.
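The shape of such a rule can be illustrated with a toy deduction step: chain A→B and B→C into A→C, multiplying strengths under a naive independence assumption and discounting confidence at each hop. This is a deliberately simplified sketch, not PLN's actual truth-value formulas:

```python
def deduce(ab, bc):
    """Chain A->B and B->C into A->C over (strength, confidence) truth values.
    Strengths multiply under a naive independence assumption; confidence is
    discounted at each inference step. A toy formula, not PLN's."""
    (s1, c1), (s2, c2) = ab, bc
    return (s1 * s2, c1 * c2)

# If A->B holds with strength 0.9 (confidence 0.9) and B->C with 0.8 (0.8),
# the derived A->C is weaker and less certain than either premise.
print(deduce((0.9, 0.9), (0.8, 0.8)))
```

The qualitative behavior is the important part: every chained step weakens both the strength and the confidence of the conclusion, which is what lets a forward or backward chainer prioritize short, well-supported derivations.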


pages: 481 words: 125,946

What to Think About Machines That Think: Today's Leading Thinkers on the Age of Machine Intelligence by John Brockman

Amazon: amazon.comamazon.co.ukamazon.deamazon.fr

3D printing, agricultural Revolution, AI winter, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, algorithmic trading, artificial general intelligence, augmented reality, autonomous vehicles, basic income, bitcoin, blockchain, clean water, cognitive dissonance, Colonization of Mars, complexity theory, computer age, computer vision, constrained optimization, corporate personhood, cosmological principle, cryptocurrency, cuban missile crisis, Danny Hillis, dark matter, discrete time, Douglas Engelbart, Elon Musk, Emanuel Derman, endowment effect, epigenetics, Ernest Rutherford, experimental economics, Flash crash, friendly AI, functional fixedness, Google Glasses, hive mind, income inequality, information trail, Internet of things, invention of writing, iterative process, Jaron Lanier, job automation, John Markoff, John von Neumann, Kevin Kelly, knowledge worker, loose coupling, microbiome, Moneyball by Michael Lewis explains big data, natural language processing, Network effects, Norbert Wiener, pattern recognition, Peter Singer: altruism, phenotype, planetary scale, Ray Kurzweil, recommendation engine, Republic of Letters, RFID, Richard Thaler, Rory Sutherland, Satyajit Das, Search for Extraterrestrial Intelligence, self-driving car, sharing economy, Silicon Valley, Skype, smart contracts, speech recognition, statistical model, stem cell, Stephen Hawking, Steve Jobs, Steven Pinker, Stewart Brand, strong AI, Stuxnet, superintelligent machines, supervolcano, the scientific method, The Wisdom of Crowds, theory of mind, Thorstein Veblen, too big to fail, Turing machine, Turing test, Von Neumann architecture, Watson beat the top human players on Jeopardy!, Y2K

After thirty years of research, a million-times improvement in computer power, and vast data sets from the Internet, we now know the answer to this question: Neural networks scaled up to twelve layers deep, with billions of connections, are outperforming the best algorithms in computer vision for object recognition and have revolutionized speech recognition. It’s rare for any algorithm to scale this well, which suggests that they may soon be able to solve even more difficult problems. Recent breakthroughs have been made that allow the application of deep learning to natural-language processing. Deep recurrent networks with short-term memory were trained to translate English sentences into French sentences at high levels of performance. Other deep-learning networks could create English captions for the content of images with surprising and sometimes amusing acumen. Supervised learning using deep networks is a step forward, but still far from achieving general intelligence. The functions they perform are analogous to some capabilities of the cerebral cortex, which has also been scaled up by evolution, but to solve complex cognitive problems the cortex interacts with many other brain regions.

Brain-machine interfaces continue to be improved, initially for physically impaired people but eventually to provide a seamless boundary between people and the monitoring network. And virtual-reality-style interfaces will continue to become more realistic and immersive. Why won’t a stand-alone sentient brain come sooner? The amazing progress in spoken-language recognition—unthinkable ten years ago—derives in large part from having access to huge amounts of data and huge amounts of storage and fast networks. The improvements we see in natural-language processing are based on mimicking what people do, not understanding or even simulating it. It’s not owing to breakthroughs in understanding human cognition or even significantly different algorithms. But eGaia is already partly here, at least in the developed world. This distributed nerve-center network, an interplay among the minds of people and their monitoring electronics, will give rise to a distributed technical-social mental system the likes of which has not been experienced before.

To be sure, there have been exponential advances in narrow-engineering applications of artificial intelligence, such as playing chess, calculating travel routes, or translating texts in rough fashion, but there’s been scarcely more than linear progress in five decades of working toward strong AI. For example, the different flavors of intelligent personal assistants available on your smartphone are only modestly better than Eliza, an early example of primitive natural-language processing from the mid-1960s. We still have no machine that can, for instance, read all that the Web has to say about war and plot a decent campaign, nor do we even have an open-ended AI system that can figure out how to write an essay to pass a freshman composition class or an eighth-grade science exam. Why so little progress, despite the spectacular increases in memory and CPU power? When Marvin Minsky and Gerald Sussman attempted the construction of a visual system in 1966, did they envision superclusters or gigabytes that would sit in your pocket?


pages: 742 words: 137,937

The Future of the Professions: How Technology Will Transform the Work of Human Experts by Richard Susskind, Daniel Susskind

Amazon: amazon.comamazon.co.ukamazon.deamazon.fr

23andMe, 3D printing, additive manufacturing, AI winter, Albert Einstein, Amazon Mechanical Turk, Amazon Web Services, Andrew Keen, Atul Gawande, Automated Insights, autonomous vehicles, Big bang: deregulation of the City of London, big data - Walmart - Pop Tarts, Bill Joy: nanobots, business process, business process outsourcing, Cass Sunstein, Checklist Manifesto, Clapham omnibus, Clayton Christensen, clean water, cloud computing, commoditize, computer age, Computer Numeric Control, computer vision, conceptual framework, corporate governance, creative destruction, crowdsourcing, Daniel Kahneman / Amos Tversky, death of newspapers, disintermediation, Douglas Hofstadter, en.wikipedia.org, Erik Brynjolfsson, Filter Bubble, Frank Levy and Richard Murnane: The New Division of Labor, full employment, future of work, Google Glasses, Google X / Alphabet X, Hacker Ethic, industrial robot, informal economy, information retrieval, interchangeable parts, Internet of things, Isaac Newton, James Hargreaves, John Maynard Keynes: Economic Possibilities for our Grandchildren, John Maynard Keynes: technological unemployment, Joseph Schumpeter, Khan Academy, knowledge economy, lifelogging, lump of labour, Marshall McLuhan, Metcalfe’s law, Narrative Science, natural language processing, Network effects, optical character recognition, Paul Samuelson, personalized medicine, pre–internet, Ray Kurzweil, Richard Feynman, Richard Feynman, Second Machine Age, self-driving car, semantic web, Shoshana Zuboff, Skype, social web, speech recognition, spinning jenny, strong AI, supply-chain management, telepresence, The Future of Employment, the market place, The Wealth of Nations by Adam Smith, The Wisdom of Crowds, transaction costs, Turing test, Watson beat the top human players on Jeopardy!, young professional

This is a computer system, effectively, answering questions on any topic under the sun, and doing so more accurately and quickly than the best human beings at this task. It is hard to overstate how impressive this is. For us, it represents the coming of the second wave of AI (section 4.9). Here is a system that undoubtedly performs tasks that we would normally think require human intelligence. The version of Watson that competed on Jeopardy! holds over 200 million pages of documents and implements a wide range of AI tools and techniques, including natural language processing, machine learning, speech synthesis, game-playing, information retrieval, intelligent search, knowledge processing and reasoning, and much more. This type of AI, we stress again, is radically different from the first wave of rule-based expert systems of the 1980s (see section 4.9). It is interesting to note, harking back again to the exponential growth of information technology, that the hardware on which Watson ran in 2011 was said to be about the size of the average bedroom.

This was an exciting time for AI, the heyday of what has since been called the era of GOFAI (good old-fashioned AI). The term ‘artificial intelligence’ was coined by John McCarthy in 1955, and in the thirty years or so that followed a wide range of systems, techniques, and technologies were brought under its umbrella (the terms used in the mid-1980s are included in parentheses): the processing and translation of natural language (natural language processing); the recognition of the spoken word (speech recognition); the playing of complex games such as chess (game-playing); the recognition of images and objects of the physical world (vision and perception); learning from examples and precedents (machine learning); computer programs that can themselves generate programs (automatic programming); the sophisticated education of human users (intelligent computer-aided instruction); the design and development of machines whose physical movements resembled those of human beings (robotics), and intelligent problem-solving and reasoning (intelligent knowledge-based systems or expert systems).103 Our project at the University of Oxford (1983–6) focused on theoretical and philosophical aspects of this last category—expert systems—as applied in the law.

We can imagine a day when machines will not just make coffee, but will write wonderful poetry, compose splendid symphonies, paint stunning landscapes, sing beautifully, and even dance with remarkable grace. We are likely to judge these contributions in two ways. On the one hand, we might take a view on their relative merits as machine-generated achievement, marvelling perhaps at the underpinning natural language processing or robotics. Our interest will be in comparing like with like—machine performance with machine performance. On the other hand, we might compare their output with the creative expressions of human beings. It may well be that we will concede that, in terms of outcomes, the machine is superior. Yet this will be to contrast apples with pears, so that this comparison may turn out to be wrong-headed.


The Blockchain Alternative: Rethinking Macroeconomic Policy and Economic Theory by Kariappa Bheemaiah

accounting loophole / creative accounting, Ada Lovelace, Airbnb, algorithmic trading, asset allocation, autonomous vehicles, balance sheet recession, bank run, banks create money, Basel III, basic income, Ben Bernanke: helicopter money, bitcoin, blockchain, Bretton Woods, business process, call centre, capital controls, Capital in the Twenty-First Century by Thomas Piketty, cashless society, cellular automata, central bank independence, Claude Shannon: information theory, cloud computing, cognitive dissonance, collateralized debt obligation, commoditize, complexity theory, constrained optimization, corporate governance, creative destruction, credit crunch, Credit Default Swap, credit default swaps / collateralized debt obligations, crowdsourcing, cryptocurrency, David Graeber, deskilling, Diane Coyle, discrete time, distributed ledger, diversification, double entry bookkeeping, ethereum blockchain, fiat currency, financial innovation, financial intermediation, Flash crash, floating exchange rates, Fractional reserve banking, full employment, George Akerlof, illegal immigration, income inequality, income per capita, inflation targeting, information asymmetry, interest rate derivative, inventory management, invisible hand, John Maynard Keynes: technological unemployment, John von Neumann, joint-stock company, Joseph Schumpeter, Kenneth Arrow, Kenneth Rogoff, Kevin Kelly, knowledge economy, labour market flexibility, large denomination, liquidity trap, London Whale, low skilled workers, M-Pesa, Marc Andreessen, market bubble, market fundamentalism, Mexican peso crisis / tequila crisis, money market fund, money: store of value / unit of account / medium of exchange, mortgage debt, natural language processing, Network effects, new economy, Nikolai Kondratiev, offshore financial centre, packet switching, Pareto efficiency, pattern recognition, peer-to-peer lending, Ponzi scheme, precariat, pre–internet, price mechanism, price stability, private sector deleveraging, 
profit maximization, QR code, quantitative easing, quantitative trading / quantitative finance, Ray Kurzweil, Real Time Gross Settlement, rent control, rent-seeking, Satoshi Nakamoto, Satyajit Das, savings glut, seigniorage, Silicon Valley, Skype, smart contracts, software as a service, software is eating the world, speech recognition, statistical model, Stephen Hawking, supply-chain management, technology bubble, The Chicago School, The Future of Employment, The Great Moderation, the market place, The Nature of the Firm, the payments system, the scientific method, The Wealth of Nations by Adam Smith, Thomas Kuhn: the structure of scientific revolutions, too big to fail, trade liberalization, transaction costs, Turing machine, Turing test, universal basic income, Von Neumann architecture, Washington Consensus

Advantages: easier and faster access to funds, less red tape, transparency, reputation awareness, and appropriate matching of risk based on client segment diversity. Risks: reputational risks (right to be forgotten), unestablished standards, regulation, and data privacy. 3. Investment Management Stance: Customer-facing Main technologies: Big Data, Machine Learning, Trading Algorithms, Social Media, Robo-Advisory, AI, Natural Language Processing (NLP), Cloud Computing. One of the most adverse outcomes of the crisis was its impact on wealth management: banks suffered a loss of trust, while potential clients now required higher amounts of capital in order to invest. As wages stagnated and employment slowed, it became increasingly difficult for new investors to invest smaller sums of money. Since 2008, a growing number of automated wealth management services (also known as robo-advisory) have arisen to provide low-cost, erudite alternatives to traditional wealth management.

So let’s look at one particular entity that is connected to all of these keywords and see how recent developments of this singular entity are linked to all the jargon being flung about today. The entity we will choose is Chatbots. A Chatbot is essentially a service, powered by rules and artificial intelligence (AI), that a user can interact with via a chat interface. The service could be anything ranging from functional to fun, and it could exist in any chat product (Facebook Messenger, Slack, Telegram, text messages, etc.). Recent advancements in Natural Language Processing (NLP) and Automatic Speech Recognition (ASR), coupled with crowdsourced data inputs and machine learning techniques, now allow AIs to not just understand groups of words but also submit a corresponding natural response to a grouping of words. That’s essentially the base definition of a conversation, except this conversation is with a “bot.” Does this mean that we’ll soon have technology that can pass the Turing test?
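The rules-plus-chat-interface idea described above can be sketched in a few lines. The toy bot below (all patterns and responses are invented for illustration; real services layer NLP, ASR, and machine-learned intent classifiers on top of such a skeleton) matches a user's message against regular expressions and returns a canned reply:

```python
import re

# A toy rule-based chatbot: each rule maps a regex over the user's
# message to a canned response. The first matching rule wins; a
# fallback response covers everything else.
RULES = [
    (re.compile(r"\b(hi|hello|hey)\b", re.I), "Hello! How can I help you?"),
    (re.compile(r"\bweather\b", re.I), "I can't see outside, but I hope it's sunny."),
    (re.compile(r"\b(bye|goodbye)\b", re.I), "Goodbye!"),
]

def reply(message: str) -> str:
    for pattern, response in RULES:
        if pattern.search(message):
            return response
    return "Sorry, I didn't understand that."

print(reply("Hi there"))
print(reply("What's the weather like?"))
print(reply("Tell me a joke"))
```

The gap between this skeleton and a convincing conversational partner is exactly where the NLP and machine-learning advances discussed above come in: replacing hand-written patterns with learned models of intent and response.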

See also Debt and money capitalism, 22 cash obsession, 2 CRS report, 2 currencies, 3 floating exchange, 3 functions, 3 gold and silver, 3 history of money, 3 history, 2 real commodities, 3 transfer of, 4 types of, 3 withdrawn, 4 shadow banking (see Shadow banking and systemic risk) utilitarian approach, 1 Multiple currencies, 130 Bitcoin Obituaries, 134 bitcoin price, 132 BTC/USD and USD/EUR volatility, 131 contractual money, 132 cryptocurrencies, 133 differences, 131 free banking, 135 Gresham’s law, 133 legal definition, 132 legal status, 132 private and government fiat, 134 private money, 130 quantitative model, 133 sovereign cash, 134 volatility, 131 N Namecoin blockchain, 77 Namibia, 147 Natural Language Processing (NLP), 140 NemID, 79 Neo-Keynesian models, 169 Neuroplasticity, 220–221 New Keynesian models (NK models), 169 O Occupational Information Network (ONET), 89 Office of Scientific Research and Development (OSRD), 218 OpenID protocol, 76 Originate, repackage and sell model, 29 Originate-to-distribute model, 29 P Paine, Thomas, 144 Palley, Thomas I., 28 Payment protection insurance (PPI), 32 Peer-to-peer (P2P), 46 Personal identification number (PIN), 79 Polycoin, 70 Popperian falsifiability, 163 Public Company Accounting Oversight Board (PCAOB), 153 Public-key certificate (PKC), 76 Public-key infrastructure (PKI), 76 Q Quantitative easing (QE), 138 Quantitative model, 133 R R3 CORDA™, 103 Rational expectations, 161–163 Rational expectations structural models, 221 Rational expectations theory (RET), 156 Rational expectations theory (RMT), 21 RBC models.


pages: 472 words: 117,093

Machine, Platform, Crowd: Harnessing Our Digital Future by Andrew McAfee, Erik Brynjolfsson

3D printing, additive manufacturing, AI winter, Airbnb, airline deregulation, airport security, Albert Einstein, Amazon Mechanical Turk, Amazon Web Services, artificial general intelligence, augmented reality, autonomous vehicles, backtesting, barriers to entry, bitcoin, blockchain, book scanning, British Empire, business process, carbon footprint, Cass Sunstein, centralized clearinghouse, Chris Urmson, cloud computing, cognitive bias, commoditize, complexity theory, computer age, creative destruction, crony capitalism, crowdsourcing, cryptocurrency, Daniel Kahneman / Amos Tversky, Dean Kamen, discovery of DNA, disintermediation, distributed ledger, double helix, Elon Musk, en.wikipedia.org, Erik Brynjolfsson, ethereum blockchain, everywhere but in the productivity statistics, family office, fiat currency, financial innovation, George Akerlof, global supply chain, Hernando de Soto, hive mind, information asymmetry, Internet of things, inventory management, iterative process, Jean Tirole, Jeff Bezos, jimmy wales, John Markoff, joint-stock company, Joseph Schumpeter, Kickstarter, law of one price, Lyft, Machine translation of "The spirit is willing, but the flesh is weak." 
to Russian and back, Marc Andreessen, Mark Zuckerberg, meta analysis, meta-analysis, moral hazard, multi-sided market, Myron Scholes, natural language processing, Network effects, new economy, Norbert Wiener, Oculus Rift, PageRank, pattern recognition, peer-to-peer lending, performance metric, Plutocrats, plutocrats, precision agriculture, prediction markets, pre–internet, price stability, principal–agent problem, Ray Kurzweil, Renaissance Technologies, Richard Stallman, ride hailing / ride sharing, risk tolerance, Ronald Coase, Satoshi Nakamoto, Second Machine Age, self-driving car, sharing economy, Silicon Valley, Skype, slashdot, smart contracts, Snapchat, speech recognition, statistical model, Steve Ballmer, Steve Jobs, Steven Pinker, supply-chain management, TaskRabbit, Ted Nelson, The Market for Lemons, The Nature of the Firm, Thomas L Friedman, too big to fail, transaction costs, transportation-network company, traveling salesman, two-sided market, Uber and Lyft, Uber for X, Watson beat the top human players on Jeopardy!, winner-take-all economy, yield management, zero day

Much of the work of customer service, for example, consists of listening to people to understand what they want, then providing an answer or service to them. Modern technologies can take over the latter of these activities once they learn the rules of an interaction. But the hardest part of customer service to automate has not been finding an answer, but rather the initial step: listening and understanding. Speech recognition and other aspects of natural language processing have been tremendously difficult problems in artificial intelligence since the dawn of the field, for all of the reasons described earlier in this chapter. The previously dominant symbolic approaches have not worked well at all, but newer ones based on deep learning are making progress so quickly that it has surprised even the experts. In October of 2016, a team from Microsoft Research announced that a neural network they had built had achieved “human parity in conversational speech recognition,” as the title of their paper put it.

depth=1&hl=en&prev=search&rurl=translate.google.com&sl=ja&sp=nmt4&u=http://www.fukoku-life.co.jp/about/news/download/20161226.pdf. 84 In October of 2016: Allison Linn, “Historic Achievement: Microsoft Researchers Reach Human Parity in Conversational Speech Recognition,” Microsoft (blog), October 18, 2016, http://blogs.microsoft.com/next/2016/10/18/historic-achievement-microsoft-researchers-reach-human-parity-conversational-speech-recognition/#sm.0001d0t49dx0veqdsh21cccecz0e3. 84 “I must confess that I never thought”: Mark Liberman, “Human Parity in Conversational Speech Recognition,” Language Log (blog), October 18, 2016, http://languagelog.ldc.upenn.edu/nll/?p=28894. 84 “Every time I fire a linguist”: Julia Hirschberg, “ ‘Every Time I Fire a Linguist, My Performance Goes Up,’ and Other Myths of the Statistical Natural Language Processing Revolution” (speech, 15th National Conference on Artificial Intelligence, Madison, WI, July 29, 1998). 84 “AI-first world”: Julie Bort, “Salesforce CEO Marc Benioff Just Made a Bold Prediction about the Future of Tech,” Business Insider, May 18, 2016, http://www.businessinsider.com/salesforce-ceo-i-see-an-ai-first-world-2016-5. 85 “Many businesses still make important decisions”: Marc Benioff, “On the Cusp of an AI Revolution,” Project Syndicate, September 13, 2016, https://www.project-syndicate.org/commentary/artificial-intelligence-revolution-by-marc-benioff-2016-09.

Bertram’s Mind, The” (AI-generated prose), 121 MySpace, 170–71 Naam, Ramez, 258n Nakamoto, Satoshi, 279–85, 287, 296–97, 306, 312 Nakamoto Institute, 304 Nappez, Francis, 190 Napster, 144–45 NASA, 15 Nasdaq, 290–91 National Association of Realtors, 39 National Enquirer, 132 National Institutes of Health, 253 National Library of Australia, 274 Naturalis Historia (Pliny the Elder), 246 natural language processing, 83–84 “Nature of the Firm, The” (Coase), 309–10 Navy, US, 72 negative prices, 216 Nelson, Ted, 33 Nelson, Theodore, 229 Nesbitt, Richard, 45 Netflix, 187 Netscape Navigator, 34 network effects, 140–42 defined, 140 diffusion of platforms and, 205–6 O2O platforms and, 193 size of network and, 217 Stripe and, 174 Uber’s market value and, 219 networks, Cambrian Explosion and, 96 neural networks, 73–74, 78 neurons, 72–73 Newell, Allen, 69 Newmark, Craig, 138 New Republic, 133 news aggregators, 139–40 News Corp, 170, 171 newspapers ad revenue, 130, 132, 139 publishing articles directly on Facebook, 165 Newsweek, 133 New York City Postmates in, 185 taxi medallion prices before and after Uber, 201 UberPool in, 9 New York Times, 73, 130, 152 Ng, Andrew, 75, 96, 121, 186 Nielsen BookScan, 293, 294 99Degrees Custom, 333–34 99designs, 261 Nixon, Richard, 280n Nokia, 167–68, 203 noncredentialism, 241–42 Norman, Robert, 273–74 nugget ice, 11–14 Nuomi, 192 Nupedia, 246–48 Obama, Barack, election of 2012, 48–51 occupancy rates, 221–22 oDesk, 188 Office of Personnel Management, US, 32 oil rigs, 100 on-demand economy, future of companies in, 320 online discussion groups, 229–30 online payment services, 171–74 online reviews, 208–10 O2O (online to offline) platforms, 185–98 business-to-business, 188–90 consumer-oriented, 186–88 defined, 186 as engines of liquidity, 192–96 globalization of, 190–92 interdisciplinary insights from data compiled by, 194 for leveraging assets, 196–97 and machine learning, 194 Opal (ice maker), 13–14 Open Agriculture Initiative, 272 openness (crowd 
collaboration principle), 241 open platforms curation and, 165 downsides, 164 importance of, 163–65 as key to success, 169 open-source software; See also Linux Android as, 166–67 development by crowd, 240–45 operating systems, crowd-developed, 240–45 Oracle, 204 O’Reilly, Tim, 242 organizational dysfunction, 257 Oruna, 291 Osindero, Simon, 76 Osterman, Paul, 322 Ostrom, Elinor, 313 outcomes, clear (crowd collaboration principle), 243 outsiders in automated investing, 270 experts vs., 252–75 overall evaluation criterion, 51 Overstock.com, 290 Owen, Ivan, 273, 274 Owen, Jennifer, 274n ownership, contracts and, 314–15 Page, Larry, 233 PageRank, 233 Pahlka, Jennifer, 163 Painting Fool, The, 117 Papa John’s Pizza, 286 Papert, Seymour, 73 “Paperwork Mine,” 32 Paris, France, terrorist attack (2015), 55 Parker, Geoffrey, 148 parole, 39–40 Parse.ly, 10 Paulos, John Allen, 233 payments platforms, 171–74 peer reviews, 208–10 peer-to-peer lending, 263 peer-to-peer platforms, 144–45, 298 Peloton, 177n Penthouse magazine, 132 People Express, 181n, 182 Perceptron, 72–74 Perceptrons: An Introduction to Computational Geometry (Minsky and Papert), 73 perishing/perishable inventory and O2O platforms, 186 and revenue management, 181–84 risks in managing, 180–81 personal drones, 98 perspectives, differing, 258–59 persuasion, 322 per-transaction fees, 172–73 Pew Research Center, 18 p53 protein, 116–17 photography, 131 physical environments, experimentation in development of, 62–63 Pindyck, Robert, 196n Pinker, Steven, 68n piracy, of recorded music, 144–45 Plaice, Sean, 184 plastics, transition from molds to 3D printing, 104–7 Platform Revolution (Parker, Van Alstyne, and Choudary), 148 platforms; See also specific platforms business advantages of, 205–11 characteristics of successful, 168–74 competition between, 166–68 and complements, 151–68 connecting online and offline experience, 177–98; See also O2O (online to offline) platforms consumer loyalty and, 210–11 defined, 14, 137 
diffusion of, 205 economics of “free, perfect, instant” information goods, 135–37 effect on incumbents, 137–48, 200–204 elasticity of demand, 216–18 future of companies based on, 319–20 importance of being open, 163–65; See also open platforms and information asymmetries, 206–10 limits to disruption of incumbents, 221–24 multisided markets, 217–18 music industry disruption, 143–48 network effect, 140–42 for nondigital goods/services, 178–85; See also O2O (online to offline) platforms and perishing inventory, 180–81 preference for lower prices by, 211–21 pricing elasticities, 212–13 product as counterpart to, 15 and product maker prices, 220–21 proliferation of, 142–48 replacement of assets with, 6–10 for revenue management, 181–84 supply/demand curves and, 153–57 and unbundling, 145–48 user experience as strategic element, 169–74 Playboy magazine, 133 Pliny the Elder, 246 Polanyi, Michael, 3 Polanyi’s Paradox and AlphaGo, 4 defined, 3 and difficulty of comparing human judgment to mathematical models, 42 and failure of symbolic machine learning, 71–72 and machine language, 82 and problems with centrally planned economies, 236 and System 1/System 2 relationship, 45 Postmates, 173, 184–85, 205 Postmates Plus Unlimited, 185 Postrel, Virginia, 90 Pratt, Gil, 94–95, 97, 103–4 prediction data-driven, 59–60 experimentation and, 61–63 statistical vs. clinical, 41 “superforecasters” and, 60–61 prediction markets, 237–39 premium brands, 210–11 presidential elections, 48–51 Priceline, 61–62, 223–24 price/pricing data-driven, 47; See also revenue management demand curves and, 154 elasticities, 212–13 loss of traditional companies’ power over, 210–11 in market economies, 237 and prediction markets, 238–39 product makers and platform prices, 220 supply curves and, 154–56 in two-sided networks, 213–16 Principia Mathematica (Whitehead and Russell), 69 print media, ad revenue and, 130, 132, 139 production costs, markets vs. 
companies, 313–14 productivity, 16 products as counterpart to platforms, 15 loss of profits to platform providers, 202–4 pairing free apps with, 163 platforms’ effect on, 200–225 threats from platform prices, 220–21 profitability Apple, 204 excessive use of revenue management and, 184 programming, origins of, 66–67 Project Dreamcatcher, 114 Project Xanadu, 33 proof of work, 282, 284, 286–87 prose, AI-generated, 121 Proserpio, Davide, 223 Prosper, 263 protein p53, 116–17 public service, 162–63 Pullman, David, 131 Pullum, Geoffrey, 84 quantitative investing firms (quants), 266–70 Quantopian, 267–70 Quinn, Kevin, 40–41 race cars, automated design for, 114–16 racism, 40, 51–52, 209–10 radio stations as complements to recorded music, 148 in late 1990s, 130 revenue declines (2000–2010), 135 Ramos, Ismael, 12 Raspbian, 244 rationalization, 45 Raymond, Eric, 259 real-options pricing, 196 reasoning, See System 1/System 2 reasoning rebundling, 146–47 recommendations, e-commerce, 47 recorded music industry in late 1990s, 130–31 declining sales (1999-2015), 134, 143 disruption by platforms, 143–48 Recording Industry Association of America (RIAA), 144 redlining, 46–47 Redmond, Michael, 2 reengineering, business process, 32–35 Reengineering the Corporation (Hammer and Champy), 32, 34–35, 37 regulation financial services, 202 Uber, 201–2, 208 Reichman, Shachar, 39 reinforcement learning, 77, 80 Renaissance Technologies, 266, 267 Rent the Runway, 186–88 Replicator 2 (3D printer), 273 reputational systems, 209–10 research and development (R&D), crowd-assisted, 11 Research in Motion (RIM), 168 residual rights of control, 315–18 “Resolution of the Bitcoin Experiment, The” (Hearn), 306 resource utilization rate, 196–97 restaurants, robotics in, 87–89, 93–94 retail; See also e-commerce MUEs and, 62–63 Stripe and, 171–74 retail warehouses, robotics in, 102–3 Rethinking the MBA: Business Education at a Crossroads (Datar, Garvin, and Cullen), 37 revenue, defined, 212 revenue management 
defined, 47 downsides of, 184–85 O2O platforms and, 193 platforms for, 181–84 platform user experience and, 211 problems with, 183–84 Rent the Runway and, 187 revenue-maximizing price, 212–13 revenue opportunities, as benefit of open platforms, 164 revenue sharing, Spotify, 147 reviews, online, 208–10 Ricardo, David, 279 ride services, See BlaBlaCar; Lyft; Uber ride-sharing, 196–97, 201 Rio Tinto, 100 Robohand, 274 robotics, 87–108 conditions for rapid expansion of, 94–98 DANCE elements, 95–98 for dull, dirty, dangerous, dear work, 99–101 future developments, 104–7 humans and, 101–4 in restaurant industry, 87–89 3D printing, 105–7 Rocky Mountain News, 132 Romney, Mitt, 48, 49 Roosevelt, Teddy, 23 Rosenblatt, Frank, 72, 73 Rovio, 159n Roy, Deb, 122 Rubin, Andy, 166 Ruger, Ted, 40–41 rule-based artificial intelligence, 69–72, 81, 84 Russell, Bertrand, 69 Sagalyn, Raphael, 293n Saloner, Garth, 141n Samsung and Android, 166 and Linux, 241, 244 sales and earnings deterioration, 203–4 San Francisco, California Airbnb in, 9 Craigslist in, 138 Eatsa in, 87 Napster case, 144 Postmates in, 185 Uber in, 201 Sanger, Larry, 246–48 Sato, Kaz, 80 Satoshi Nakamoto Institute, 304 scaling, cloud and, 195–96 Schiller, Phil, 152 Schumpeter, Joseph, 129, 264, 279, 330 Scott, Brian, 101–2 second machine age origins of, 16 phase one, 16 phase two, 17–18 secular trends, 93 security lanes, automated, 89 Sedol, Lee, 5–6 self-checkout kiosks, 90 self-driving automobiles, 17, 81–82 self-justification, 45 self-organization, 244 self-selection, 91–92 self-service, at McDonald’s, 92 self-teaching machines, 17 Seychelles Trading Company, 291 Shanghai Tower, 118 Shapiro, Carl, 141n Shaw, David, 266 Shaw, J.


pages: 410 words: 119,823

Radical Technologies: The Design of Everyday Life by Adam Greenfield

3D printing, Airbnb, augmented reality, autonomous vehicles, bank run, barriers to entry, basic income, bitcoin, blockchain, business intelligence, business process, call centre, cellular automata, centralized clearinghouse, centre right, Chuck Templeton: OpenTable, cloud computing, collective bargaining, combinatorial explosion, Computer Numeric Control, computer vision, Conway's Game of Life, cryptocurrency, David Graeber, dematerialisation, digital map, distributed ledger, drone strike, Elon Musk, ethereum blockchain, facts on the ground, fiat currency, global supply chain, global village, Google Glasses, IBM and the Holocaust, industrial robot, informal economy, information retrieval, Internet of things, James Watt: steam engine, Jane Jacobs, Jeff Bezos, job automation, John Conway, John Markoff, John Maynard Keynes: Economic Possibilities for our Grandchildren, John Maynard Keynes: technological unemployment, John von Neumann, joint-stock company, Kevin Kelly, Kickstarter, late capitalism, license plate recognition, lifelogging, M-Pesa, Mark Zuckerberg, means of production, megacity, megastructure, minimum viable product, money: store of value / unit of account / medium of exchange, natural language processing, Network effects, New Urbanism, Occupy movement, Oculus Rift, Pareto efficiency, pattern recognition, Pearl River Delta, performance metric, Peter Eisenman, Peter Thiel, planetary scale, Ponzi scheme, post scarcity, RAND corporation, recommendation engine, RFID, rolodex, Satoshi Nakamoto, self-driving car, sentiment analysis, shareholder value, sharing economy, Silicon Valley, smart cities, smart contracts, sorting algorithm, special economic zone, speech recognition, stakhanovite, statistical model, stem cell, technoutopianism, Tesla Model S, the built environment, The Death and Life of Great American Cities, The Future of Employment, transaction costs, Uber for X, universal basic income, urban planning, urban sprawl, Whole Earth Review, WikiLeaks, 
women in the workforce

At retail, “seamless” point-of-sale processes and the displacement of responsibility onto shoppers themselves via self-checkout slash the number of personnel it takes to run a storefront operation, though some staff will always be required to smooth out the inevitable fiascos; perhaps a few high-end boutiques performatively, conspicuously retain a significant floor presence. In customer service, appalling “cognitive agents” take the place of front-line staff.44 Equipped with speech recognition and natural-language processing capabilities, with synthetic virtual faces that unhesitatingly fold in every last kind of problematic assumption about gender and ethnicity, they’re so cheap that it’s hard to imagine demanding, hard-to-train human staff holding out against them for very long. Even in so-called high-touch fields like childcare and home-health assistance, jobs that might be done, and done well, by people with no other qualification face the prospect of elimination.

And this is true on many fronts. A test for machinic intelligence called the Winograd Schema, for example, asks candidate systems to resolve the problems of pronoun disambiguation that crop up constantly in everyday speech.11 Sentences of this type (“I plugged my phone into the wall because it needed to be recharged”) yield to common sense more or less immediately, but still tax the competence of the most advanced natural-language processing systems. Similarly, for all the swagger of their parent company, Uber’s nominally autonomous vehicles seem unable to cope with even so simple an element of the urban environment as a bike lane, swerving in front of cyclists on multiple occasions during the few days they were permitted to operate in San Francisco.12 In the light of results like this, fears that algorithmic systems might take over much of anything at all can easily seem wildly overblown.
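The Winograd Schema point can be made concrete with the classic pair from the schema literature: two sentences differing by a single word, where that word flips the pronoun's referent. A naive heuristic (sketched here purely for illustration) that resolves "it" to the closest preceding candidate noun must answer identically for both sentences, so it is necessarily wrong on one of them:

```python
# A Winograd schema pair: (sentence, distinguishing word, correct referent).
# The two sentences differ only in "big" vs. "small", yet the pronoun's
# referent flips from "trophy" to "suitcase".
SCHEMA = [
    ("The trophy doesn't fit in the suitcase because it is too big.", "big", "trophy"),
    ("The trophy doesn't fit in the suitcase because it is too small.", "small", "suitcase"),
]

def closest_noun_heuristic(sentence: str) -> str:
    # Resolve "it" to whichever candidate noun appears last before it.
    # This ignores the meaning of "big"/"small" entirely.
    before_it = sentence.split(" it ")[0]
    candidates = ["trophy", "suitcase"]
    return max(candidates, key=before_it.rfind)

# The heuristic gives the same answer for both sentences, so it is
# right on exactly one member of the pair.
for sentence, word, answer in SCHEMA:
    print(word, closest_noun_heuristic(sentence) == answer)
```

Resolving both sentences correctly requires the common-sense knowledge that big things don't fit into small ones, which is precisely what makes the schema a hard test for statistical systems.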

Some reflexes are apparently immune to mockery. 19.An image of the brochure can be found at i1.wp.com/bobsullivan.net/wp-content/uploads/2014/09/incident-prevention-close-up-tight.png 20.Richard Kelley et al., “Context-Based Bayesian Intent Recognition,” IEEE Transactions on Autonomous Mental Development, Volume 4, Number 3, September 2012. 21.Richard Socher et al., “Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank,” Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, Stroudsburg, PA, October 2013, pp. 1631–42. 22.Bob Sullivan, “Police Sold on Snaptrends, Software That Claims to Stop Crime Before It Starts,” bobsullivan.net, September 4, 2014. 23.Ibid. 24.Leo Mirani, “Millions of Facebook Users Have No Idea They’re Using the Internet,” Quartz, February 9, 2015. 25.Ellen Huet, “Server and Protect: Predictive Policing Firm PredPol Promises to Map Crime Before It Happens,” Forbes, February 11, 2015. 26.Ibid. 27.Robert L.


pages: 1,085 words: 219,144

Solr in Action by Trey Grainger, Timothy Potter

Amazon: amazon.comamazon.co.ukamazon.deamazon.fr

business intelligence, cloud computing, commoditize, conceptual framework, crowdsourcing, data acquisition, en.wikipedia.org, failed state, fault tolerance, finite state, full text search, glass ceiling, information retrieval, natural language processing, performance metric, premature optimization, recommendation engine, web application

log function LoggingHandler class logs, 2nd long queries <long> element LowerCaseFilter LowerCaseFilterFactory, 2nd, 3rd LRU (Least Recently Used) <lst> element, 2nd Lucene, 2nd lucene folder Lucene in Action <luceneMatchVersion> element LuceneQParserPlugin class lucene-solr/ folder LukeRequestHandler class, 2nd M map function MappingCharFilterFactory MapReduce master.replication.enabled parameter masterUrl parameter math functions <maxBufferedDocs> element maxdoc function maxMergeAtOnce parameter maxShardsPerNode parameter maxWarmingSearchers parameter <maxWarmingSearchers> element MBeans, 2nd mean reciprocal rank metric memcached memory RAM sorting and mentions, preserving in text mergeFactor parameter <mergeFactor> element MERGEINDEXES action <mergePolicy> element <mergeScheduler> element metadata microblog search application example, 2nd MinimalStem filter minimum match missing values, and sorting misspelled terms mm parameter MMapDirectory monitoring, external More Like This feature, 2nd, 3rd, 4th MoreLikeThisHandler class, 2nd ms function MS Office documents MS SQL Server multicore configuration multilingual search data-modeling features language identification dynamically assigning language analyzers dynamically mapping content overview update processors for language-specific field type configurations linguistic analysis scenarios field type for multiple languages multiple languages in one field separate fields per language separate indexes per language stemming dictionary-based (Hunspell) example KeywordMarkerFilterFactory language-specific analyzer chains vs. lemmatization StemmerOverrideFilterFactory multiselect faceting defined excludes keys multitenant search MultiTextField, 2nd MultiTextFieldAnalyzer MultiTextFieldLanguageIdentifierUpdate-Processor MultiTextFieldLanguageIdentifierUpdate-ProcessorFactory MultiTextFieldTokenizer, 2nd multiValued attribute murmur hash algorithm MySQL N Nagios, 2nd Natural Language Processing. See NLP. 
natural language, search using near real-time search. See NRT search. negated terms Nested query parser nesting function queries .NET Netflix newSearcher event n-grams NIOFSDirectory NLP (Natural Language Processing) node recovery process norm function normal commit Norwegian language NorwegianLightStemFilterFactory NoSQL (Not only SQL), 2nd, 3rd not function NOT operator, 2nd NRTCachingDirectory NRTCachingDirectoryFactory class numdocs function numeric fields overview precisionStep attribute numShards parameter, 2nd, 3rd Nutch O offsite backup for SolrCloud omitNorms attribute, 2nd, 3rd op parameter OpenOffice documents <openSearcher> element Optimize request, update handler optional terms, 2nd optimistic concurrency control OR operator, 2nd Oracle AS ord function outage types OutOfMemoryError P parameters dereferencing local params parameter substitutions <params> element parseArg() method parseFloat() method parseValueSource() method PatternReplaceCharFilterFactory, 2nd payload boosting PDF documents importing common formats indexing peer sync perception of relevancy permissions, document Persian language, 2nd persist parameter pf (phrase fields) parameters PHP, 2nd PHPResponseWriter class PHPSerializedResponseWriter class phrase searches, 2nd phrase slop parameters.

This probably makes you think of a person named John walking up to a particular kind of place: a financial institution. If the text instead read “After sailing for hours, John approached the bank,” you would likely be thinking about a person named John on a boat floating toward the shore. Both sentences state that “John approached the bank,” but the context plays a critical role in ensuring the text is properly understood. Due to advances in the field of Natural Language Processing (NLP), many important contextual clues can be identified in standard text. These can include identification of the language of unknown text, determination of the parts of speech, discovery or approximation of the root form of a word, understanding of synonyms and unimportant words, and discovery of relationships between words through their usage. You will notice that the best web search engines today go to great lengths to infer the meaning of your query.
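The "bank" example lends itself to a minimal sketch of Lesk-style word-sense disambiguation: score each candidate sense by the overlap between the surrounding words and a short gloss for that sense. The glosses below are hand-written for this illustration, not drawn from any real lexicon:

```python
# Minimal Lesk-style disambiguation of "bank": pick the sense whose
# gloss shares the most words with the sentence's context.
# Both glosses are invented for the example.
SENSES = {
    "financial institution": {"money", "deposit", "loan", "cash", "account", "teller"},
    "river shore": {"river", "sailing", "shore", "water", "boat", "fishing"},
}

def disambiguate(context: str) -> str:
    words = set(context.lower().replace(",", "").replace(".", "").split())
    # Ties resolve to the first sense listed; a real system would
    # weight overlaps and use far richer context.
    return max(SENSES, key=lambda sense: len(words & SENSES[sense]))

print(disambiguate("After sailing for hours, John approached the bank"))
print(disambiguate("John deposited some money at the bank"))
```

Even this crude overlap count captures the intuition in the passage: "sailing" pulls the interpretation toward the shore, while "money" pulls it toward the financial institution.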

Apache UIMA includes integration with many tools to extract knowledge from within your content, and Solr provides connectors for Apache UIMA, so these may be worth looking into if you need to build sophisticated content analysis capabilities into your search application. Other clustering and data classification techniques can also be used to enrich your data, which can lead to a search experience far superior to keyword searching alone. Although implementing most of these capabilities is beyond the scope of this book, Grant Ingersoll, Thomas Morton, and Andrew Farris provide a great overview of how to implement these kinds of natural language processing techniques in Taming Text: How to Find, Organize, and Manipulate It (Manning, 2013), including a chapter on building a question-and-answer system similar to some of the previous examples. What Solr does provide out of the box, however, are the building blocks for these kinds of systems. This includes dozens of language-specific stemmers, a synonym filter, a stop words filter and language-specific stop word lists, character/accent normalization, query correction (spell-check) capabilities, and a language identifier.
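The building blocks just listed (stemming, stop-word removal, synonym expansion) combine into an analysis chain that runs filters over a token stream in sequence. The toy pipeline below mimics that sequencing in plain Python; it is a sketch of the idea, not Solr's actual implementation, and the stop list, suffix rules, and synonym table are all invented for the example:

```python
# A toy analysis chain in the spirit of Solr's token filters, applied
# in order: lowercase -> stop-word removal -> stemming -> synonym expansion.
STOP_WORDS = {"the", "a", "an", "of", "and"}
SYNONYMS = {"quick": ["fast"]}

def stem(token: str) -> str:
    # Crude suffix stripping standing in for a real stemmer
    # (Solr ships genuine language-specific stemmers).
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def analyze(text: str):
    tokens = [t.lower() for t in text.split()]        # lowercase filter
    tokens = [t for t in tokens if t not in STOP_WORDS]  # stop filter
    tokens = [stem(t) for t in tokens]                # stemming filter
    out = []
    for t in tokens:                                  # synonym filter
        out.append(t)
        out.extend(SYNONYMS.get(t, []))
    return out

print(analyze("The quick searching of documents"))
```

Because the same chain is applied at both index and query time, "searching" and "searched" collapse to the same term, and a query for "quick" also matches documents containing "fast".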


pages: 308 words: 84,713

The Glass Cage: Automation and Us by Nicholas Carr

Amazon: amazon.comamazon.co.ukamazon.deamazon.fr

Airbnb, Airbus A320, Andy Kessler, Atul Gawande, autonomous vehicles, Bernard Ziegler, business process, call centre, Captain Sullenberger Hudson, Checklist Manifesto, cloud computing, computerized trading, David Brooks, deliberate practice, deskilling, digital map, Douglas Engelbart, drone strike, Elon Musk, Erik Brynjolfsson, Flash crash, Frank Gehry, Frank Levy and Richard Murnane: The New Division of Labor, Frederick Winslow Taylor, future of work, global supply chain, Google Glasses, Google Hangouts, High speed trading, indoor plumbing, industrial robot, Internet of things, Jacquard loom, James Watt: steam engine, job automation, John Maynard Keynes: Economic Possibilities for our Grandchildren, John Maynard Keynes: technological unemployment, Kevin Kelly, knowledge worker, Lyft, Marc Andreessen, Mark Zuckerberg, means of production, natural language processing, new economy, Nicholas Carr, Norbert Wiener, Oculus Rift, pattern recognition, Peter Thiel, place-making, plutocrats, profit motive, Ralph Waldo Emerson, RAND corporation, randomized controlled trial, Ray Kurzweil, recommendation engine, robot derives from the Czech word robota, meaning slave, Second Machine Age, self-driving car, Silicon Valley, Silicon Valley ideology, software is eating the world, Stephen Hawking, Steve Jobs, TaskRabbit, technoutopianism, The Wealth of Nations by Adam Smith, turn-by-turn navigation, US Airways Flight 1549, Watson beat the top human players on Jeopardy!, William Langewiesche

When doctors make diagnoses, they draw on their knowledge of a large body of specialized information, learned through years of rigorous education and apprenticeship as well as the ongoing study of medical journals and other relevant literature. Until recently, it was difficult, if not impossible, for computers to replicate such deep, specialized, and often tacit knowledge. But inexorable advances in processing speed, precipitous declines in data-storage and networking costs, and breakthroughs in artificial-intelligence methods such as natural language processing and pattern recognition have changed the equation. Computers have become much more adept at reviewing and interpreting vast amounts of text and other information. By spotting correlations in the data—traits or phenomena that tend to be found together or to occur simultaneously or sequentially—computers are often able to make accurate predictions, calculating, say, the probability that a patient displaying a set of symptoms has or will develop a particular disease or the odds that a patient with a certain disease will respond well to a particular drug or other treatment regimen.

., 48–51, 215, 216 computer as metaphor and model for, 119 drawing and, 143, 144 imaginative work of, 25 unconscious, 83–84 Mindell, David, 60, 61 Missionaries and Cannibals, 75, 180 miswanting, 15, 228 MIT, 174, 175 Mitchell, William J., 138 mobile phones, 132–33 Moore’s Law, 40 Morozov, Evgeny, 205, 225 Moser, Edvard, 134–35 Moser, May-Britt, 134 motivation, 14, 17, 124 “Mowing” (Frost), 211–16, 218, 221–22 Murnane, Richard, 9, 10 Musk, Elon, 8 Nadin, Mihai, 80 NASA, 50, 55, 58 National Safety Council, 208 National Transportation Safety Board (NTSB), 44 natural language processing, 113 nature, 217, 220 Nature, 155 Nature Neuroscience, 134–35 navigation systems, 59, 68–71, 217 see also GPS Navy, U.S., 189 Nazi Germany, 35, 157 nervous system, 9–10, 36, 220–21 Networks of Power (Hughes), 196 neural networks, 113–14 neural processing, 119n neuroergonomic systems, 165 neurological studies, 9 neuromorphic microchips, 114, 119n neurons, 57, 133–34, 150, 219 neuroscience, neuroscientists, 74, 133–37, 140, 149 New Division of Labor, The (Levy and Murnane), 9 Nimwegen, Christof van, 75–76, 180 Noble, David, 173–74 Norman, Donald, 161 Noyes, Jan, 54–55 NSA, 120, 198 numerical control, 174–75 Oakeshott, Michael, 124 Obama, Barack, 94 Observer, 78–79 Oculus Rift, 201 Office of the Inspector General, 99 offices, 28, 108–9, 112, 222 automation complacency and, 69 Ofri, Danielle, 102 O’Keefe, John, 133–34 Old Dominion University, 91 “On Things Relating to the Surgery” (Hippocrates), 158 oracle machine, 119–20 “Outsourced Brain, The” (Brooks), 128 Pallasmaa, Juhani, 145 Parameswaran, Ashwin, 115 Parameters, 191 parametric design, 140–41 parametricism, 140–41 “Parametricism Manifesto” (Schumacher), 141 Parasuraman, Raja, 54, 67, 71, 166, 176 Parry, William Edward, 125 pattern recognition, 57, 58, 81, 83, 113 Pavlov, Ivan, 88 Pebble, 201 Pediatrics, 97 perception, 8, 121, 130, 131, 132, 133, 144, 148–51, 201, 214–18, 220, 226, 230 performance, Yerkes-Dodson law and, 96 
Phenomenology of Perception (Merleau-Ponty), 216 philosophers, 119, 143, 144, 148–51, 186, 224 photography, film vs. digital, 230 Piano, Renzo, 138, 141–42 pilots, 1, 2, 32, 43–63, 91, 153 attentional tunneling and, 200–201 capability of the plane vs., 60–61, 154 death of, 53 erosion of expertise of, 54–58, 62–63 human- vs. technology-centered automation and, 168–70, 172–73 income of, 59–60 see also autopilot place, 131–34, 137, 251n place cells, 133–34, 136, 219 Plato, 148 Player Piano (Vonnegut), 39 poetry, 211–16, 218, 221–22 Poirier, Richard, 214, 215 Politics (Aristotle), 224 Popular Science, 48 Post, Wiley, 48, 50, 53, 57, 62, 82, 169 power, 21, 37, 65, 151, 175, 204, 217 practice, 82–83 Predator drone, 188 premature fixation, 145 presence, power of, 200 Priestley, Joseph, 160 Prius, 6, 13, 154–55 privacy, 206 probability, 113–24 procedural (tacit) knowledge, 9–11, 83, 105, 113, 144 productivity, 18, 22, 29, 30, 37, 106, 160, 173, 175, 181, 218 professional work, incursion of computers into, 115 profit motive, 17 profits, 18, 22, 28, 30, 33, 95, 159, 171, 172–73, 175 progress, 21, 26, 29, 37, 40, 65, 196, 214 acceleration of, 26 scientific, 31, 123 social, 159–60, 228 progress (continued) technological, 29, 31, 34, 35, 48–49, 108–9, 159, 160, 161, 173, 174, 222, 223–24, 226, 228, 230 utopian vision of, 25, 26 prosperity, 20, 21, 107 proximal cues, 219–20 psychologists, psychology, 9, 11, 15, 54, 103, 119, 149, 158–59 animal studies, 87–92 cognitive, 72–76, 81, 129–30 psychomotor skills, 56, 57–58, 81, 120 quality of experience, 14–15 Race against the Machine (Brynjolfsson and McAfee), 28–29 RAND Corporation, 93–98 “Rationalism in Politics” (Oakeshott), 124 Rattner, Justin, 203 reading, learning of, 82 Reaper drone, 188 reasoning, reason, 120, 121, 124, 151 recession, 27, 28, 30, 32 Red Dead Redemption, 177–78 “Relation of Strength of Stimulus to Rapidity of Habit-Formation, The” (Yerkes and Dodson), 89 Renslow, Marvin, 43–44 Revit, 146, 147 Rifkin, Jeremy, 28 
Robert, David, 45, 169–70 Robert Frost (Poirier), 214 Roberts, J.

Data Mining the Web: Uncovering Patterns in Web Content, Structure, and Usage by Zdravko Markov, Daniel T. Larose

Amazon: amazon.comamazon.co.ukamazon.deamazon.fr

Firefox, information retrieval, Internet Archive, iterative process, natural language processing, pattern recognition, random walk, recommendation engine, semantic web, speech recognition, statistical model, William of Occam

The basic idea is to use a set of words (or terms) that the user specifies and retrieve documents that include (or do not include) those words. This is the keyword search approach, well known from the area of information retrieval (IR). In web search, further IR techniques are used to avoid terms that are too general or too specific, to take into account term distribution throughout the entire body of documents, and to explore document similarity. Natural language processing approaches are also used to analyze term context or lexical information, or to combine several terms into phrases. After retrieving a set of documents ranked by their degree of matching the keyword query, they are further ranked by importance (popularity, authority), usually based on the web link structure. All these approaches are discussed further later in the book. Topic Directories Web pages are organized into hierarchical structures that reflect their meaning.
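
The link-structure ranking mentioned above can be sketched, in the PageRank style, as repeated redistribution of score along hyperlinks until the scores settle. This is an illustrative power-iteration sketch over a made-up three-page link graph, not any search engine's production algorithm (dangling pages, for instance, are handled naively here).

```python
# PageRank-style importance ranking by power iteration over a tiny,
# invented link graph: links maps each page to the pages it links to.

def pagerank(links, damping=0.85, iterations=50):
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new = {p: (1 - damping) / len(pages) for p in pages}
        for p, outs in links.items():
            share = rank[p] / len(outs) if outs else 0.0
            for q in outs:
                new[q] += damping * share
        rank = new
    return rank

links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(links)
print(max(ranks, key=ranks.get))  # c — it attracts the most links
```

The ranks form a probability distribution (they sum to 1 when no page is dangling), which matches the random-walk interpretation of link-based importance.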

To have content-based access to these documents, we organize them in libraries, bibliography systems, and by other means. This process takes a lot of time and effort because it is done by people. There are attempts to use computers for this purpose, but the problem is that content-based access assumes understanding the meaning of documents, something that is still a research question, studied in the area of artificial intelligence and natural language processing in particular. One may argue that natural language texts are structured, which is true as far as the language syntax (grammatical structure) is concerned. However, the transition to meaning still requires semantic structuring or understanding. There exists a solution that avoids the problem of meaning but still provides some types of content-based access to unstructured data. This is the keyword search approach known from the area of information retrieval (IR).
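
A minimal sketch of that IR-style keyword search: score each document by term frequency weighted by inverse document frequency (TF-IDF), so that terms appearing in nearly every document count for little. The documents and query below are invented for illustration.

```python
# TF-IDF keyword scoring over a toy document collection: a term's
# weight is tf * log(N / df), discounting terms common to many docs.

import math

def tfidf_scores(query_terms, docs):
    """Return a score per document name for the given query terms."""
    n = len(docs)
    tokenized = {name: text.lower().split() for name, text in docs.items()}
    df = {t: sum(1 for words in tokenized.values() if t in words)
          for t in query_terms}
    return {
        name: sum(words.count(t) * math.log(n / df[t])
                  for t in query_terms if df[t])
        for name, words in tokenized.items()
    }

docs = {
    "d1": "web mining finds patterns in web content",
    "d2": "gardening tips for the spring",
    "d3": "data mining and machine learning",
}
scores = tfidf_scores(["web", "mining"], docs)
print(max(scores, key=scores.get))  # d1
```

Here "mining" occurs in two of the three documents, so its weight is low; "web", occurring only in d1, dominates the ranking.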


pages: 470 words: 109,589

Apache Solr 3 Enterprise Search Server by Unknown

Amazon: amazon.comamazon.co.ukamazon.deamazon.fr

bioinformatics, continuous integration, database schema, en.wikipedia.org, fault tolerance, Firefox, full text search, information retrieval, Internet Archive, natural language processing, performance metric, platform as a service, Ruby on Rails, web application

SignatureUpdateProcessorFactory: This generates a hash ID value based on other field values you specify. If you want to de-duplicate your data (that is, you don't want to accidentally add the same data twice) then this will do that for you. For further information see http://wiki.apache.org/solr/Deduplication. UIMAUpdateProcessorFactory: This hands the document off to the Unstructured Information Management Architecture (UIMA), a Solr contrib module that enhances the document through natural language processing (NLP) techniques. For further information see http://wiki.apache.org/solr/SolrUIMA. Although it's nice to see an NLP integration option in Solr, beware that NLP processing tends to be computationally expensive. Instead of using UIMA in this way, consider performing this processing external to Solr and caching the results to avoid re-computation as you adjust your indexing process. LogUpdateProcessorFactory: This is the one responsible for writing the log messages you see when an update occurs.
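
The signature-based de-duplication idea can be sketched outside Solr like this: hash the chosen field values into a signature and drop any document whose signature has already been seen. This is only an illustration of the concept, not Solr's actual signature implementation or configuration.

```python
# Signature-based de-duplication sketch: documents with identical
# values in the chosen fields hash to the same signature and are
# indexed only once.

import hashlib

def signature(doc, fields):
    raw = "|".join(str(doc.get(f, "")) for f in fields)
    return hashlib.md5(raw.encode("utf-8")).hexdigest()

def deduplicate(docs, fields):
    seen, unique = set(), []
    for doc in docs:
        sig = signature(doc, fields)
        if sig not in seen:
            seen.add(sig)
            unique.append(doc)
    return unique

docs = [
    {"id": 1, "title": "Solr", "body": "search server"},
    {"id": 2, "title": "Solr", "body": "search server"},  # duplicate content
]
print(len(deduplicate(docs, ["title", "body"])))  # 1
```

Note the `id` field is deliberately excluded from the signature, so documents that differ only in their ID still collapse to one.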

Indexing locations You need raw location data in the form of a latitude and longitude to take advantage of Solr's geospatial capabilities. If you have named locations (for example, "Boston, MA") then the data needs to be resolved to latitudes and longitudes using a gazetteer like Geonames—http://www.geonames.org. If all you have is free-form natural language text without the locations identified, then you'll have to perform a more difficult task that uses Natural Language Processing techniques to find the named locations. These approaches are outside the scope of this book. The principal field type in Solr for geospatial search is LatLonType, which stores a single latitude-longitude pair. Under the hood, this field type copies the latitude and longitude into a pair of indexed fields using the provided field name suffix. In the following excerpt taken from Solr's example schema, given the field name store, there will be two additional fields named store_0_coordinate and store_1_coordinate, which you'll see in Solr's schema browser.
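
The under-the-hood behavior just described can be mimicked in a few lines: split the "lat,lon" string and emit the two coordinate fields using the suffix naming pattern quoted above. This is a sketch of the observable field layout, not Solr's internal code.

```python
# Mimic LatLonType's field splitting: a "lat,lon" value in field
# `store` yields store_0_coordinate and store_1_coordinate.

def index_latlon(field_name, value, suffix="_coordinate"):
    lat, lon = (float(part) for part in value.split(","))
    return {
        field_name: value,
        f"{field_name}_0{suffix}": lat,
        f"{field_name}_1{suffix}": lon,
    }

doc = index_latlon("store", "45.17614,-93.87341")
print(doc["store_0_coordinate"], doc["store_1_coordinate"])
# 45.17614 -93.87341
```

Splitting the pair into two plain numeric fields is what lets Solr run ordinary range queries per axis when filtering by bounding box.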

Beginning R: The Statistical Programming Language by Mark Gardener

Amazon: amazon.comamazon.co.ukamazon.deamazon.fr

correlation coefficient, distributed generation, natural language processing, New Urbanism, p-value, statistical model

Table 1-1: Task Views and Their Uses

Bayesian: Bayesian Inference
ChemPhys: Chemometrics and Computational Physics
ClinicalTrials: Clinical Trial Design, Monitoring, and Analysis
Cluster: Cluster Analysis & Finite Mixture Models
Distributions: Probability Distributions
Econometrics: Computational Econometrics
Environmetrics: Analysis of Ecological and Environmental Data
ExperimentalDesign: Design of Experiments (DoE) & Analysis of Experimental Data
Finance: Empirical Finance
Genetics: Statistical Genetics
Graphics: Graphic Displays & Dynamic Graphics & Graphic Devices & Visualization
gR: gRaphical Models in R
HighPerformanceComputing: High-Performance and Parallel Computing with R
MachineLearning: Machine Learning & Statistical Learning
MedicalImaging: Medical Image Analysis
Multivariate: Multivariate Statistics
NaturalLanguageProcessing: Natural Language Processing
OfficialStatistics: Official Statistics & Survey Methodology
Optimization: Optimization and Mathematical Programming
Pharmacokinetics: Analysis of Pharmacokinetic Data
Phylogenetics: Phylogenetics, Especially Comparative Methods
Psychometrics: Psychometric Models and Methods
ReproducibleResearch: Reproducible Research
Robust: Robust Statistical Methods
SocialSciences: Statistics for the Social Sciences
Spatial: Analysis of Spatial Data
Survival: Survival Analysis
TimeSeries: Time Series Analysis

Alternatively, you can search the Internet for your topic and you will likely find quite a few hits that mention appropriate R packages.


pages: 588 words: 131,025

The Patient Will See You Now: The Future of Medicine Is in Your Hands by Eric Topol

Amazon: amazon.comamazon.co.ukamazon.deamazon.fr

23andMe, 3D printing, Affordable Care Act / Obamacare, Anne Wojcicki, Atul Gawande, augmented reality, bioinformatics, call centre, Clayton Christensen, clean water, cloud computing, commoditize, computer vision, conceptual framework, connected car, correlation does not imply causation, creative destruction, crowdsourcing, dark matter, data acquisition, disintermediation, don't be evil, Edward Snowden, Elon Musk, en.wikipedia.org, Erik Brynjolfsson, Firefox, global village, Google Glasses, Google X / Alphabet X, Ignaz Semmelweis: hand washing, information asymmetry, interchangeable parts, Internet of things, Isaac Newton, job automation, Joseph Schumpeter, Julian Assange, Kevin Kelly, license plate recognition, lifelogging, Lyft, Mark Zuckerberg, Marshall McLuhan, meta analysis, meta-analysis, microbiome, Nate Silver, natural language processing, Network effects, Nicholas Carr, obamacare, pattern recognition, personalized medicine, phenotype, placebo effect, RAND corporation, randomized controlled trial, Second Machine Age, self-driving car, Silicon Valley, Skype, smart cities, Smart Cities: Big Data, Civic Hackers, and the Quest for a New Utopia, Snapchat, social graph, speech recognition, stealth mode startup, Steve Jobs, the scientific method, The Signal and the Noise by Nate Silver, The Wealth of Nations by Adam Smith, Turing test, Uber for X, Watson beat the top human players on Jeopardy!, X Prize

When Sanofi and Regeneron were looking to expedite recruitment of patients with high cholesterol for their new, experimental drug alirocumab, an antibody against the PCSK9 protein, they turned to the American College of Cardiology registry.108 Another approach, developed by researchers at Case Western Reserve University, is a software tool known as “Trial Prospector,” which delves into clinical data systems to match patients with clinical trials.109 It combines artificial intelligence and natural language processing to automate the patient screening and enrollment process, often a rate-limiting step in developing new drugs. Automated clinical trial matching programs for specific conditions, such as the Alzheimer’s Association Trialmatch,107 are proliferating. Data mining to facilitate clinical trial recruitment is offered by a number of companies, such as Blue Chip Marketing Worldwide and Acurian.110 Ben Goldacre, the acclaimed author and one of the leading independent critics and innovators in pharma research, set up the tool “RandomiseMe,” which makes it “easy to run randomized clinical trials on yourself and your friends.”111 So although clinical trial participation is remarkably rare today, there are efforts on multiple fronts to change that in the future.

Cultural change is exceedingly difficult, but given the other forces in the iMedicine galaxy, especially the health care economic crisis that has engendered desperation, it may be possible to accomplish. An aggressive commitment to the education and training of practicing physicians to foster their use of the new tools would not only empower their patients, but also themselves. Eliminating the enormous burden of electronic charting or use of scribes by an all-out effort for natural language processing of voice during a visit would indeed be liberating. It’s long overdue for physicians and health professionals to be constantly cognizant of actual costs, eliminate unnecessary tests and procedures,75a and engage in exquisite electronic communication, which includes e-mail, and sharing notes and all data. If financial incentives are needed, they may be well worth the investment. Data Scientists Government and recalcitrant doctors are major potential impediments, but the biggest bottleneck to advancing the field is unquestionably dealing with data.


pages: 413 words: 119,587

Machines of Loving Grace: The Quest for Common Ground Between Humans and Robots by John Markoff

Amazon: amazon.comamazon.co.ukamazon.deamazon.fr

A Declaration of the Independence of Cyberspace, AI winter, airport security, Apple II, artificial general intelligence, Asilomar, augmented reality, autonomous vehicles, basic income, Baxter: Rethink Robotics, Bill Duvall, bioinformatics, Brewster Kahle, Burning Man, call centre, cellular automata, Chris Urmson, Claude Shannon: information theory, Clayton Christensen, clean water, cloud computing, collective bargaining, computer age, computer vision, crowdsourcing, Danny Hillis, DARPA: Urban Challenge, data acquisition, Dean Kamen, deskilling, don't be evil, Douglas Engelbart, Douglas Hofstadter, Dynabook, Edward Snowden, Elon Musk, Erik Brynjolfsson, factory automation, From Mathematics to the Technologies of Life and Death, future of work, Galaxy Zoo, Google Glasses, Google X / Alphabet X, Grace Hopper, Gunnar Myrdal, Gödel, Escher, Bach, Hacker Ethic, haute couture, hive mind, hypertext link, indoor plumbing, industrial robot, information retrieval, Internet Archive, Internet of things, invention of the wheel, Jacques de Vaucanson, Jaron Lanier, Jeff Bezos, job automation, John Conway, John Markoff, John Maynard Keynes: Economic Possibilities for our Grandchildren, John Maynard Keynes: technological unemployment, John von Neumann, Kevin Kelly, knowledge worker, Kodak vs Instagram, labor-force participation, loose coupling, Marc Andreessen, Mark Zuckerberg, Marshall McLuhan, medical residency, Menlo Park, Mother of all demos, natural language processing, new economy, Norbert Wiener, PageRank, pattern recognition, pre–internet, RAND corporation, Ray Kurzweil, Richard Stallman, Robert Gordon, Rodney Brooks, Sand Hill Road, Second Machine Age, self-driving car, semantic web, shareholder value, side project, Silicon Valley, Silicon Valley startup, Singularitarianism, skunkworks, Skype, social software, speech recognition, stealth mode startup, Stephen Hawking, Steve Ballmer, Steve Jobs, Steve Wozniak, Steven Levy, Stewart Brand, strong AI, superintelligent machines, technological singularity, Ted Nelson, telemarketer, telepresence, telepresence robot, Tenerife airport disaster, The Coming Technological Singularity, the medium is the message, Thorstein Veblen, Turing test, Vannevar Bush, Vernor Vinge, Watson beat the top human players on Jeopardy!, Whole Earth Catalog, William Shockley: the traitorous eight, zero-sum game

Because it was faster to cast an erroneous line than correct it, typesetters would “run down” the rest of the line with easy-to-type nonsense, later removing the entire line after it had cooled down, or, if they forgot, hoping a proofreader would catch it.9 He wasn’t concerned at the time about any ethical implications involved in building a natural language processing system that could “understand” and respond in a virtual world. In SHRDLU, “understanding” meant that the program analyzed the structure of the typed questions and attempted to answer them and respond to the commands. It was an early effort at disambiguation, a thorny problem for natural language processing even today. For example, in the sentence “he put the glass on the table and it broke,” does “it” refer to the glass or the table? Without more context, neither a human nor an AI program could decide. Winograd’s system used its general knowledge of the microworld to answer and respond to various questions.


pages: 396 words: 117,149

The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World by Pedro Domingos

Amazon: amazon.comamazon.co.ukamazon.deamazon.fr

3D printing, Albert Einstein, Amazon Mechanical Turk, Arthur Eddington, basic income, Bayesian statistics, Benoit Mandelbrot, bioinformatics, Black Swan, Brownian motion, cellular automata, Claude Shannon: information theory, combinatorial explosion, computer vision, constrained optimization, correlation does not imply causation, creative destruction, crowdsourcing, Danny Hillis, data is the new oil, double helix, Douglas Hofstadter, Erik Brynjolfsson, experimental subject, Filter Bubble, future of work, global village, Google Glasses, Gödel, Escher, Bach, information retrieval, job automation, John Markoff, John Snow's cholera map, John von Neumann, Joseph Schumpeter, Kevin Kelly, lone genius, mandelbrot fractal, Mark Zuckerberg, Moneyball by Michael Lewis explains big data, Narrative Science, Nate Silver, natural language processing, Netflix Prize, Network effects, NP-complete, off grid, P = NP, PageRank, pattern recognition, phenotype, planetary scale, pre–internet, random walk, Ray Kurzweil, recommendation engine, Richard Feynman, Second Machine Age, self-driving car, Silicon Valley, speech recognition, statistical model, Stephen Hawking, Steven Levy, Steven Pinker, superintelligent machines, the scientific method, The Signal and the Noise by Nate Silver, theory of mind, Thomas Bayes, transaction costs, Turing machine, Turing test, Vernor Vinge, Watson beat the top human players on Jeopardy!, white flight, zero-sum game

A semantic network is a set of concepts (like planets and stars) and relations among those concepts (planets orbit stars). Alchemy learned over a million such patterns from facts extracted from the web (e.g., Earth orbits the sun). It discovered concepts like planet all by itself. The version we used was more advanced than the basic one I’ve described here, but the essential ideas are the same. Various research groups have used Alchemy or their own MLN implementations to solve problems in natural language processing, computer vision, activity recognition, social network analysis, molecular biology, and many other areas. Despite its successes, Alchemy has some significant shortcomings. It does not yet scale to truly big data, and someone without a PhD in machine learning will find it hard to use. Because of these problems, it’s not yet ready for prime time. But let’s see what we can do about them.

“Relevance weighting of search terms,”* by Stephen Robertson and Karen Sparck Jones (Journal of the American Society for Information Science, 1976), explains the use of Naïve Bayes–like methods in information retrieval. “First links in the Markov chain,” by Brian Hayes (American Scientist, 2013), recounts Markov’s invention of the eponymous chains. “Large language models in machine translation,”* by Thorsten Brants et al. (Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, 2007), explains how Google Translate works. “The PageRank citation ranking: Bringing order to the Web,”* by Larry Page, Sergey Brin, Rajeev Motwani, and Terry Winograd (Stanford University technical report, 1998), describes the PageRank algorithm and its interpretation as a random walk over the web. Statistical Language Learning,* by Eugene Charniak (MIT Press, 1996), explains how hidden Markov models work.


pages: 163 words: 42,402

Machine Learning for Email by Drew Conway, John Myles White

Amazon: amazon.comamazon.co.ukamazon.deamazon.fr

call centre, correlation does not imply causation, Debian, natural language processing, Netflix Prize, pattern recognition, recommendation engine, SpamAssassin, text mining

Moreover, because we calculate conditional probabilities using products, if we assigned a zero probability to terms not in our training data, elementary arithmetic tells us that we would calculate zero as the probability of most messages, since we would be multiplying all the other probabilities by zero every time we encountered an unknown term. This would cause catastrophic results for our classifier, as many, or even all, messages would be incorrectly assigned a zero probability to be either spam or ham. Researchers have come up with many clever ways of trying to get around this problem, such as drawing a random probability from some distribution or using natural language processing (NLP) techniques to estimate the “spamminess” of a term given its context. For our purposes, we will use a very simple rule: assign a very small probability to terms that are not in the training set. This is, in fact, a common way of dealing with missing terms in simple text classifiers, and for our purposes it will serve just fine. In this exercise, by default we will set this probability to 0.0001%, or one-ten-thousandth of a percent, which is sufficiently small for this data set.
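
The rule described above can be sketched directly: when scoring a message, terms missing from the training data receive the fixed floor probability (0.0001% = 1e-6) instead of zero, so one unseen word cannot zero out the whole product. Working in log space, as below, is the standard way to compute such products without underflow; the toy term probabilities are invented for illustration.

```python
# Naive Bayes term scoring with a floor probability for unseen terms,
# matching the chapter's rule of thumb. Log-probabilities are summed
# instead of multiplying raw probabilities.

import math

UNKNOWN_PROB = 1e-6  # the default for terms absent from training data

def log_likelihood(terms, term_probs):
    return sum(math.log(term_probs.get(t, UNKNOWN_PROB)) for t in terms)

# Toy conditional probabilities learned from (hypothetical) training data.
spam_probs = {"viagra": 0.2, "free": 0.1}
ham_probs = {"meeting": 0.3, "report": 0.2}

msg = ["free", "viagra", "zzzunseen"]  # contains a term never seen in training
label = "spam" if log_likelihood(msg, spam_probs) > log_likelihood(msg, ham_probs) else "ham"
print(label)  # spam
```

Without the floor, the unseen term "zzzunseen" would force both likelihoods to zero (log of zero is undefined) and the classifier could not decide at all.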


pages: 133 words: 42,254

Big Data Analytics: Turning Big Data Into Big Money by Frank J. Ohlhorst

Amazon: amazon.comamazon.co.ukamazon.deamazon.fr

algorithmic trading, bioinformatics, business intelligence, business process, call centre, cloud computing, create, read, update, delete, data acquisition, DevOps, fault tolerance, linked data, natural language processing, Network effects, pattern recognition, performance metric, personalized medicine, RFID, sentiment analysis, six sigma, smart meter, statistical model, supply-chain management, Watson beat the top human players on Jeopardy!, web application

In addition, the underlying system can resolve references by inferring new triples from the existing records using a rules set. This is a powerful alternative to joining relational tables to resolve references in a typical RDBMS, while also offering a more expressive way to model data than a key value store. One of the most powerful aspects of semantic technology comes from the world of linguistics and natural language processing, also known as entity extraction. This is a powerful mechanism to extract information from unstructured data and combine it with transactional data, enabling deep analytics by bringing these worlds closer together. Another method that brings structure to the unstructured is the text analytics tool, which is improving daily as scientists come up with new ways of making algorithms understand written text more accurately.
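
The triple-inference idea mentioned above can be illustrated with a toy forward-chaining sketch: a single hand-written transitivity rule over "locatedIn" triples is applied repeatedly until no new triples appear. The facts and the rule are invented for illustration; real semantic stores use standardized rule languages and far more efficient evaluation.

```python
# Toy rule-based inference over subject-predicate-object triples:
# forward chaining of one transitivity rule to a fixpoint.

def infer(triples):
    triples = set(triples)
    changed = True
    while changed:
        changed = False
        for (a, p1, b) in list(triples):
            for (c, p2, d) in list(triples):
                if p1 == p2 == "locatedIn" and b == c:
                    new = (a, "locatedIn", d)
                    if new not in triples:
                        triples.add(new)
                        changed = True
    return triples

facts = {
    ("Acme", "locatedIn", "Boston"),
    ("Boston", "locatedIn", "Massachusetts"),
    ("Massachusetts", "locatedIn", "USA"),
}
print(("Acme", "locatedIn", "USA") in infer(facts))  # True
```

The inferred triple was never stored explicitly; it emerges from the rule, which is the "powerful alternative to joining relational tables" the passage refers to.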


pages: 219 words: 63,495

50 Future Ideas You Really Need to Know by Richard Watson

Amazon: amazon.comamazon.co.ukamazon.deamazon.fr

23andMe, 3D printing, access to a mobile phone, Albert Einstein, artificial general intelligence, augmented reality, autonomous vehicles, BRICs, Buckminster Fuller, call centre, clean water, cloud computing, collaborative consumption, computer age, computer vision, crowdsourcing, dark matter, dematerialisation, digital Maoism, digital map, Elon Musk, energy security, failed state, future of work, Geoffrey West, Santa Fe Institute, germ theory of disease, happiness index / gross national happiness, hive mind, hydrogen economy, Internet of things, Jaron Lanier, life extension, Mark Shuttleworth, Marshall McLuhan, megacity, natural language processing, Network effects, new economy, oil shale / tar sands, pattern recognition, peak oil, personalized medicine, phenotype, precision agriculture, profit maximization, RAND corporation, Ray Kurzweil, RFID, Richard Florida, Search for Extraterrestrial Intelligence, self-driving car, semantic web, Skype, smart cities, smart meter, smart transportation, statistical model, stem cell, Stephen Hawking, Steve Jobs, Steven Pinker, Stewart Brand, strong AI, Stuxnet, supervolcano, telepresence, The Wisdom of Crowds, Thomas Malthus, Turing test, urban decay, Vernor Vinge, Watson beat the top human players on Jeopardy!, web application, women in the workforce, working-age population, young professional

the condensed idea Thought control timeline 2000 Electrode arrays implanted into owl monkeys 2001 Technology allows a monkey to operate a robotic arm via thought control 2006 Teenager plays Space Invaders using brain signals 2008 Scientists manage to extract images from a person’s mind 2009 Brain–Twitter interface 2017 Voice control replaces 70 percent of keyboards 2026 Google patents neural interface 33 Avatar assistants Computer-based avatars are virtual recreations of real or fictional characters used in forms of computer gaming and in virtual online communities. In the near future they will become common as intelligent digital assistants or personal agents, controlled by forms of artificial intelligence such as natural language processing and accessed via mobile or fixed devices. “Everything is backward now, like out there is the true world, and in here is the dream.” Jake Sully in the movie Avatar Apple’s iPhone 4S offers a tantalizing glimpse of the future in the form of Siri, an application that allows users to employ normal language to send messages or ask questions. But this is a very basic technology compared with what’s to come.


pages: 247 words: 71,698

Avogadro Corp by William Hertling

Amazon: amazon.comamazon.co.ukamazon.deamazon.fr

Any sufficiently advanced technology is indistinguishable from magic, cloud computing, crowdsourcing, Hacker Ethic, hive mind, invisible hand, natural language processing, Netflix Prize, private military company, Ray Kurzweil, recommendation engine, Richard Stallman, Ruby on Rails, technological singularity, Turing test, web application

David noticed that Rebecca Smith was standing in the doorway listening to the presentation. In a sharp tailored suit, and with her reputation hovering about her like an invisible aura, the Avogadro CEO made for an imposing presence. Only her warm smile left a welcoming space in which an ordinary guy like David could stand. She nodded to David as she came in and took her seat at the head of the table. Kenneth asked, “But what you’re describing, how does it work? Natural language processing ability of computers doesn’t even come close to being able to understand the semantics of human language. Have you had some miracle breakthrough?” “At the heart of how this works is the field of recommendation algorithms,” David explained. “Sean hired me not because I knew anything about language analysis but because I was a leading competitor in the Netflix competition. Netflix recommends movies that you’d enjoy watching.


pages: 589 words: 69,193

Mastering Pandas by Femi Anthony

Amazon: amazon.comamazon.co.ukamazon.deamazon.fr

Amazon Web Services, Bayesian statistics, correlation coefficient, correlation does not imply causation, Debian, en.wikipedia.org, Internet of things, natural language processing, p-value, random walk, side project, statistical model, Thomas Bayes

Among the characteristics that make Python popular for data science are its very user-friendly (human-readable) syntax, the fact that it is interpreted rather than compiled (leading to faster development time), and its very comprehensive library for parsing and analyzing data, as well as its capacity for doing numerical and statistical computations. Python has libraries that provide a complete toolkit for data science and analysis. The major ones are as follows:

NumPy: General-purpose array functionality with emphasis on numeric computation
SciPy: Numerical computing
Matplotlib: Graphics
pandas: Series and data frames (1D and 2D array-like types)
Scikit-Learn: Machine learning
NLTK: Natural language processing
Statstool: Statistical analysis

For this book, we will be focusing on the fourth library in the preceding list, pandas. What is pandas? pandas is a high-performance open source library for data analysis in Python developed by Wes McKinney in 2008. Over the years, it has become the de facto standard library for data analysis using Python. The tool has seen great adoption, with a large community behind it (220+ contributors and 9,000+ commits as of 03/2014), rapid iteration, and features and enhancements continuously added.
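
As a tiny illustration of the Series and DataFrame types mentioned in the list above (the 1D and 2D array-like types pandas provides), assuming pandas is installed:

```python
# A Series is a labeled 1D array; a DataFrame is a 2D table of
# named columns, each of which is itself a Series.

import pandas as pd

prices = pd.Series([101.5, 102.3, 99.8], name="close")
frame = pd.DataFrame({"close": prices, "volume": [1200, 900, 1500]})

print(frame.shape)            # (3, 2)
print(frame["close"].mean())
```

Column access returns a Series, so aggregate methods like `mean()` apply directly; this column-oriented style is what the rest of the book builds on.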


pages: 237 words: 64,411

Humans Need Not Apply: A Guide to Wealth and Work in the Age of Artificial Intelligence by Jerry Kaplan

Affordable Care Act / Obamacare, Amazon Web Services, asset allocation, autonomous vehicles, bank run, bitcoin, Bob Noyce, Brian Krebs, buy low sell high, Capital in the Twenty-First Century by Thomas Piketty, combinatorial explosion, computer vision, corporate governance, crowdsourcing, en.wikipedia.org, Erik Brynjolfsson, estate planning, Flash crash, Gini coefficient, Goldman Sachs: Vampire Squid, haute couture, hiring and firing, income inequality, index card, industrial robot, information asymmetry, invention of agriculture, Jaron Lanier, Jeff Bezos, job automation, John Markoff, John Maynard Keynes: Economic Possibilities for our Grandchildren, Loebner Prize, Mark Zuckerberg, mortgage debt, natural language processing, Own Your Own Home, pattern recognition, Satoshi Nakamoto, school choice, Schrödinger's Cat, Second Machine Age, self-driving car, sentiment analysis, Silicon Valley, Silicon Valley startup, Skype, software as a service, The Chicago School, The Future of Employment, Turing test, Watson beat the top human players on Jeopardy!, winner-take-all economy, women in the workforce, working poor, Works Progress Administration

Jason Brewster, the company’s CEO, estimates that FairDocument reduces the time required to complete a straightforward estate plan from several hours to as little as fifteen to thirty minutes, not to mention that his company is doing the prospecting for new clients and delivering them to the attorneys. A more sophisticated example of synthetic intellects encroaching on legal expertise is the startup Judicata.34 The company uses machine learning and natural language processing techniques to convert ordinary text—such as legal principles or specific cases— into structured information that can be used for finding relevant case law. For instance, it could find all cases in which a male Hispanic gay employee successfully sued for wrongful termination by reading the actual text of court decisions, saving countless hours in a law library or using a more traditional electronic search tool.


pages: 268 words: 75,850

The Formula: How Algorithms Solve All Our Problems-And Create More by Luke Dormehl

3D printing, algorithmic trading, Any sufficiently advanced technology is indistinguishable from magic, augmented reality, big data - Walmart - Pop Tarts, call centre, Cass Sunstein, Clayton Christensen, commoditize, computer age, death of newspapers, deferred acceptance, Edward Lorenz: Chaos theory, Erik Brynjolfsson, Filter Bubble, Flash crash, Florence Nightingale: pie chart, Frank Levy and Richard Murnane: The New Division of Labor, Google Earth, Google Glasses, High speed trading, Internet Archive, Isaac Newton, Jaron Lanier, Jeff Bezos, job automation, John Markoff, Kevin Kelly, Kodak vs Instagram, lifelogging, Marshall McLuhan, means of production, Nate Silver, natural language processing, Netflix Prize, pattern recognition, price discrimination, recommendation engine, Richard Thaler, Rosa Parks, self-driving car, sentiment analysis, Silicon Valley, Silicon Valley startup, Slavoj Žižek, social graph, speech recognition, Steve Jobs, Steven Levy, Steven Pinker, Stewart Brand, the scientific method, The Signal and the Noise by Nate Silver, upwardly mobile, Wall-E, Watson beat the top human players on Jeopardy!, Y Combinator

If such a tool was to be implemented within a future edition of MS Word or Google Docs, it is not inconceivable that users may one day finish typing a document and hit a single button—at which point it is auto-checked for spelling, punctuation, formatting and truthfulness. Already there is widespread use of algorithms in academia for sifting through submitted work and pulling up passages that may or may not be plagiarized. These will only become more widespread as natural language processing becomes more intuitive and able to move beyond simple passage comparison to detailed content and idea analysis. There is no one-size-fits-all answer to how best to deal with algorithms. In some cases, increased transparency would appear to be the answer. Where algorithms are used to enforce laws, for instance, releasing the source code to the general public would both protect against the dangers of unchecked government policy-making and make it possible to determine how specific decisions have been reached.
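The "simple passage comparison" mentioned above is often implemented as word n-gram overlap. A toy sketch, with invented sentences; trigrams plus Jaccard similarity are one common baseline, not the algorithm of any particular plagiarism checker:

```python
def ngrams(text, n=3):
    """Return the set of word n-grams in a text (case-insensitive)."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def overlap_score(doc_a, doc_b, n=3):
    """Jaccard similarity of the two documents' n-gram sets."""
    a, b = ngrams(doc_a, n), ngrams(doc_b, n)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

source = "the quick brown fox jumps over the lazy dog"
copied = "the quick brown fox jumps over a sleeping cat"
score = overlap_score(source, copied)  # the shared opening phrase raises the score
```

Scores near 1.0 flag near-verbatim copying; moving "beyond simple passage comparison," as the excerpt puts it, would require semantic rather than surface similarity.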


pages: 224 words: 13,238

Electronic and Algorithmic Trading Technology: The Complete Guide by Kendall Kim

algorithmic trading, automated trading system, backtesting, commoditize, computerized trading, corporate governance, Credit Default Swap, diversification, en.wikipedia.org, family office, financial innovation, fixed income, index arbitrage, index fund, interest rate swap, linked data, market fragmentation, money market fund, natural language processing, quantitative trading / quantitative finance, random walk, risk tolerance, risk-adjusted returns, short selling, statistical arbitrage, Steven Levy, transaction costs, yield curve

At the moment, big strategic decisions such as which shares to buy or sell are made by human traders; algorithmic programs are then given the power to decide how to buy or sell shares, with the aim of hiding the client’s intentions. Executing algorithms are designed to be stealthy and create as little volatility as possible. The fact that they are designed to reduce the market impact of trades should in fact have a stabilizing effect in equity markets. Some day, advances in natural language processing and statistical analysis might lead to algorithms capable of analyzing news feeds, deciding which shares to buy and sell, and devising their own strategies. Broker dealers, software vendors, and now investment institutions are entering the algorithmic arms race. Since there are so many possible trading strategies, it is doubtful that there will turn out to be one single trading algorithm that outperforms all others.


pages: 265 words: 69,310

What's Yours Is Mine: Against the Sharing Economy by Tom Slee

4chan, Airbnb, Amazon Mechanical Turk, asset-backed security, barriers to entry, Berlin Wall, big-box store, bitcoin, blockchain, citizen journalism, collaborative consumption, congestion charging, Credit Default Swap, crowdsourcing, data acquisition, David Brooks, don't be evil, gig economy, Hacker Ethic, income inequality, informal economy, invisible hand, Jacob Appelbaum, Jane Jacobs, Jeff Bezos, Khan Academy, Kibera, Kickstarter, license plate recognition, Lyft, Marc Andreessen, Mark Zuckerberg, move fast and break things, natural language processing, Netflix Prize, Network effects, new economy, Occupy movement, openstreetmap, Paul Graham, peer-to-peer, peer-to-peer lending, Peter Thiel, pre–internet, principal–agent problem, profit motive, race to the bottom, Ray Kurzweil, recommendation engine, rent control, ride hailing / ride sharing, sharing economy, Silicon Valley, Snapchat, software is eating the world, South of Market, San Francisco, TaskRabbit, The Nature of the Firm, Thomas L Friedman, transportation-network company, Uber and Lyft, Uber for X, ultimatum game, urban planning, WikiLeaks, winner-take-all economy, Y Combinator, Zipcar

In July 2014 Airbnb tried to encourage more critical reviews by holding back the publication of reviews until both parties had submitted a review of the other; neither the company nor researchers with access to the company’s data have commented on the success of the change. In another experiment, Airbnb staff are working with external researchers to test whether offering a reward to encourage reviews has any effect on the number of critical reviews that guests ­provide.24 Other efforts are trying to squeeze more critical information from what is already there. Airbnb is using natural language processing to parse critical comments from review texts.25 Researchers have shown that taking missing reviews into account can give a much more effective measure of seller quality.26 The problem with such efforts is that, if systems were changed so that missing reviews or passive-aggressive text comments were known to be recorded (and so became, implicitly, a negative review) customer behavior may change to avoid the threat of a negative (non-) review in return.


pages: 222 words: 70,132

Move Fast and Break Things: How Facebook, Google, and Amazon Cornered Culture and Undermined Democracy by Jonathan Taplin

1960s counterculture, 3D printing, affirmative action, Affordable Care Act / Obamacare, Airbnb, Amazon Mechanical Turk, American Legislative Exchange Council, Apple's 1984 Super Bowl advert, back-to-the-land, barriers to entry, basic income, battle of ideas, big data - Walmart - Pop Tarts, bitcoin, Brewster Kahle, Buckminster Fuller, Burning Man, Clayton Christensen, commoditize, creative destruction, crony capitalism, crowdsourcing, data is the new oil, David Brooks, David Graeber, don't be evil, Donald Trump, Douglas Engelbart, Dynabook, Edward Snowden, Elon Musk, equal pay for equal work, Erik Brynjolfsson, future of journalism, future of work, George Akerlof, George Gilder, Google bus, Hacker Ethic, Howard Rheingold, income inequality, informal economy, information asymmetry, information retrieval, Internet Archive, Internet of things, invisible hand, Jaron Lanier, Jeff Bezos, job automation, John Markoff, John Maynard Keynes: technological unemployment, John von Neumann, Joseph Schumpeter, Kevin Kelly, Kickstarter, labor-force participation, life extension, Marc Andreessen, Mark Zuckerberg, Menlo Park, Metcalfe’s law, Mother of all demos, move fast and break things, natural language processing, Network effects, new economy, Norbert Wiener, offshore financial centre, packet switching, Paul Graham, Peter Thiel, plutocrats, pre–internet, Ray Kurzweil, recommendation engine, rent-seeking, revision control, Robert Bork, Robert Gordon, Robert Metcalfe, Ronald Reagan, Sand Hill Road, secular stagnation, self-driving car, sharing economy, Silicon Valley, Silicon Valley ideology, smart grid, Snapchat, software is eating the world, Steve Jobs, Stewart Brand, technoutopianism, The Chicago School, The Market for Lemons, Tim Cook: Apple, trade route, transfer pricing, trickle-down economics, Tyler Cowen: Great Stagnation, universal basic income, unpaid internship, We wanted flying cars, instead we got 140 characters, web application, Whole Earth Catalog, winner-take-all economy, women in the workforce, Y Combinator

During the 2016 presidential campaign, Donald Trump regularly boasted about his ten million Twitter followers, even though (according to the site StatusPeople, which tracks how many Twitter accounts are bots, how many are inactive, and how many are real) only 21 percent of Trump’s Twitter followers are real, active users on the platform. Hillary Clinton didn’t fare much better, with only 30 percent of her followers classified as real. During the 2012 presidential race, the Annenberg Innovation Lab studied Twitter and politics, and what we found was pretty disturbing. We created a natural-language-processing computer model that read every tweet about every candidate and sorted them by sentiment. At the beginning I loved reading the dashboard of the twenty most positive and negative tweets of the previous hour. But within weeks the incredible amount of racist tweets directed at our president became too painful to look at. The anonymity that Twitter provides is a shield that brings out the worst in humans.
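The Annenberg model's internals aren't given in the excerpt; as a sketch of the underlying idea (score each tweet, then sort the stream by sentiment), here is a toy lexicon-based scorer. The word lists and tweets are invented, and a production system would use a trained classifier rather than fixed word sets:

```python
# Toy sentiment lexicon (hypothetical); real systems learn weights from data.
POSITIVE = {"great", "love", "win", "good"}
NEGATIVE = {"bad", "hate", "lose", "awful"}

def sentiment(tweet):
    """+1 for each positive word, -1 for each negative word."""
    words = tweet.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

tweets = ["I love this candidate", "awful debate, just bad", "great win tonight"]
ranked = sorted(tweets, key=sentiment, reverse=True)  # most positive first
```

Taking the head and tail of `ranked` gives the "twenty most positive and negative tweets" style dashboard the passage describes.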

Data Mining: Concepts and Techniques by Jiawei Han, Micheline Kamber, Jian Pei

bioinformatics, business intelligence, business process, Claude Shannon: information theory, cloud computing, computer vision, correlation coefficient, cyber-physical system, database schema, discrete time, distributed generation, finite state, information retrieval, iterative process, knowledge worker, linked data, natural language processing, Netflix Prize, Occam's razor, pattern recognition, performance metric, phenotype, random walk, recommendation engine, RFID, semantic web, sentiment analysis, speech recognition, statistical model, stochastic process, supply-chain management, text mining, thinkpad, Thomas Bayes, web application

Mining knowledge in cube space can substantially enhance the power and flexibility of data mining. ■ Data mining—an interdisciplinary effort: The power of data mining can be substantially enhanced by integrating new methods from multiple disciplines. For example, to mine data with natural language text, it makes sense to fuse data mining methods with methods of information retrieval and natural language processing. As another example, consider the mining of software bugs in large programs. This form of mining, known as bug mining, benefits from the incorporation of software engineering knowledge into the data mining process. ■ Boosting the power of discovery in a networked environment: Most data objects reside in a linked or interconnected environment, whether it be the Web, database relations, files, or documents.

Figure 7.12 (“Semantic annotation of a frequent pattern”) shows an example of a semantic annotation for the pattern “{frequent, pattern}.” This dictionary-like annotation provides semantic information related to “{frequent, pattern},” consisting of its strongest context indicators, the most representative data transactions, and the most semantically similar patterns. This kind of semantic annotation is similar to natural language processing. The semantics of a word can be inferred from its context, and words sharing similar contexts tend to be semantically similar. The context indicators and the representative transactions provide a view of the context of the pattern from different angles to help users understand the pattern. The semantically similar patterns provide a more direct connection between the pattern and any other patterns already known to the users.
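The claim that "words sharing similar contexts tend to be semantically similar" is the distributional hypothesis, and it can be sketched with raw context counts and cosine similarity. The corpus below is a tiny invented example; real systems use large corpora, stop-word handling, and weighting schemes such as PMI:

```python
from collections import Counter
from math import sqrt

corpus = [
    "the cat chased the mouse",
    "the dog chased the ball",
    "the cat ate the mouse",
    "the dog ate the bone",
]

def context_vector(word, sentences, window=2):
    """Count the words occurring within `window` positions of `word`."""
    ctx = Counter()
    for sent in sentences:
        tokens = sent.split()
        for i, tok in enumerate(tokens):
            if tok == word:
                for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                    if j != i:
                        ctx[tokens[j]] += 1
    return ctx

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

cat, dog, mouse = (context_vector(w, corpus) for w in ("cat", "dog", "mouse"))
# "cat" and "dog" occur in identical contexts in this corpus, so they score highest.
```

The same machinery, applied to patterns instead of words, underlies the "semantically similar patterns" in the annotation.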

Artificial Intelligence (AAAI’10) Atlanta, GA. (July 2010), pp. 1671–1675. [RH01] Raman, V.; Hellerstein, J.M., Potter's wheel: An interactive data cleaning system, In: Proc. 2001 Int. Conf. Very Large Data Bases (VLDB’01) Rome, Italy. (Sept. 2001), pp. 381–390. [RH07] Rosenberg, A.; Hirschberg, J., V-measure: A conditional entropy-based external cluster evaluation measure, In: Proc. 2007 Joint Conf. Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL’07) Prague, Czech Republic. (June 2007), pp. 410–420. [RHS01] Roddick, J.F.; Hornsby, K.; Spiliopoulou, M., An updated bibliography of temporal, spatial, and spatio-temporal data mining research, In: (Editors: Roddick, J.F.; Hornsby, K.) Lecture Notes in Computer Science 2007 (2001) Springer, New York, pp. 147–163; TSDM 2000. [RHW86] Rumelhart, D.E.; Hinton, G.E.; Williams, R.J., Learning internal representations by error propagation, In: (Editors: Rumelhart, D.E.; McClelland, J.L.)


pages: 327 words: 103,336

Everything Is Obvious: *Once You Know the Answer by Duncan J. Watts

active measures, affirmative action, Albert Einstein, Amazon Mechanical Turk, Black Swan, butterfly effect, Carmen Reinhart, Cass Sunstein, clockwork universe, cognitive dissonance, collapse of Lehman Brothers, complexity theory, correlation does not imply causation, crowdsourcing, death of newspapers, discovery of DNA, East Village, easy for humans, difficult for computers, edge city, en.wikipedia.org, Erik Brynjolfsson, framing effect, Geoffrey West, Santa Fe Institute, George Santayana, happiness index / gross national happiness, high batting average, hindsight bias, illegal immigration, industrial cluster, interest rate swap, invention of the printing press, invention of the telescope, invisible hand, Isaac Newton, Jane Jacobs, Jeff Bezos, Joseph Schumpeter, Kenneth Rogoff, lake wobegon effect, Long Term Capital Management, loss aversion, medical malpractice, meta analysis, meta-analysis, Milgram experiment, natural language processing, Netflix Prize, Network effects, oil shock, packet switching, pattern recognition, performance metric, phenotype, Pierre-Simon Laplace, planetary scale, prediction markets, pre–internet, RAND corporation, random walk, RFID, school choice, Silicon Valley, statistical model, Steve Ballmer, Steve Jobs, Steve Wozniak, supply-chain management, The Death and Life of Great American Cities, the scientific method, The Wisdom of Crowds, too big to fail, Toyota Production System, ultimatum game, urban planning, Vincenzo Peruggia: Mona Lisa, Watson beat the top human players on Jeopardy!, X Prize

Small, Michael, Pengliang L. Shi, and Chi Kong Tse. 2004. “Plausible Models for Propagation of the SARS Virus.” IEICE Transactions on Fundamentals of Electronics Communications and Computer Sciences E87A (9):2379–86. Snow, Rion, Brendan O’Connor, Daniel Jurafsky, and Andrew Y. Ng. 2008. “Cheap and Fast—But Is It Good? Evaluating Non-Expert Annotations for Natural Language Tasks.” In Empirical Methods in Natural Language Processing. Honolulu, Hawaii: Association for Computational Linguistics. Somers, Margaret R. 1998. “ ‘We’re No Angels’: Realism, Rational Choice, and Relationality in Social Science.” American Journal of Sociology 104 (3):722–84. Sorkin, Andrew Ross (ed). 2008. “Steve & Barry’s Files for Bankruptcy.” New York Times, July 9. Sorkin, Andrew Ross. 2009a. Too Big to Fail: The Inside Story of How Wall Street and Washington Fought to Save the Financial System from Crisis—and Themselves.


pages: 353 words: 104,146

European Founders at Work by Pedro Gairifo Santos

business intelligence, cloud computing, crowdsourcing, fear of failure, full text search, information retrieval, inventory management, iterative process, Jeff Bezos, Lean Startup, Mark Zuckerberg, natural language processing, pattern recognition, pre–internet, recommendation engine, Richard Stallman, Silicon Valley, Skype, slashdot, Steve Jobs, Steve Wozniak, subscription business, technology bubble, web application, Y Combinator

Since I finished university, I've been with two start-ups. The first start-up I was with was a mobile internet start-up based in Stockholm, where I was the first employee on the business side. So I became VP of product management there and part of my job was to find complementary code to fit in with our product, essentially. I came across code that Peter Halacsy had done. Back then he was doing research in natural language processing and we were in need of that. This company also had a development office in Cluj, Romania. When you go to Cluj from Stockholm, you fly via Budapest. My parents are from Hungary actually. When I went to Cluj, I would stop for a day in Budapest and say hi. And that's what I did. I figured since I'm in Budapest I should try to actually meet this person who had done this interesting code. So I hunted him down and managed to meet him.


pages: 193 words: 98,671

The Inmates Are Running the Asylum by Alan Cooper

Albert Einstein, delayed gratification, Donald Trump, Howard Rheingold, informal economy, iterative process, Jeff Bezos, Menlo Park, natural language processing, new economy, pets.com, Robert X Cringely, Silicon Valley, Silicon Valley startup, skunkworks, Steve Jobs, Steven Pinker, telemarketer, urban planning

Microsoft, in particular, is touting this false panacea. Microsoft says that interfaces will be easy to use as soon as it can perfect voice recognition and handwriting recognition. I think this is silly. Each new technology merely makes it possible to frustrate users with faster and more-powerful systems. A key to better interaction is to reduce the uncertainty between computers and users. Natural-language processing can never do that because meanings are so vague in human conversation. So much of our communication is based on nuance, gesture, and inflection that although it might be a year or two before computers can recognize our words, it might be decades—if ever—before computers can effectively interpret our meaning. Voice-recognition technology will certainly prove to be useful for many products.


pages: 382 words: 92,138

The Entrepreneurial State: Debunking Public vs. Private Sector Myths by Mariana Mazzucato

Apple II, banking crisis, barriers to entry, Bretton Woods, California gold rush, call centre, carbon footprint, Carmen Reinhart, cleantech, computer age, creative destruction, credit crunch, David Ricardo: comparative advantage, demand response, deskilling, endogenous growth, energy security, energy transition, eurozone crisis, everywhere but in the productivity statistics, Financial Instability Hypothesis, full employment, G4S, Growth in a Time of Debt, Hyman Minsky, incomplete markets, information retrieval, intangible asset, invisible hand, Joseph Schumpeter, Kenneth Rogoff, knowledge economy, knowledge worker, natural language processing, new economy, offshore financial centre, Philip Mirowski, popular electronics, profit maximization, Ralph Nader, renewable energy credits, rent-seeking, ride hailing / ride sharing, risk tolerance, shareholder value, Silicon Valley, Silicon Valley ideology, smart grid, Steve Jobs, Steve Wozniak, The Wealth of Nations by Adam Smith, Tim Cook: Apple, too big to fail, total factor productivity, trickle-down economics, Washington Consensus, William Shockley: the traitorous eight

This technology, as well as the infrastructure of the system, would have been impossible without the government taking the initiative and making the necessary financial commitment for such a highly complex system. Apple’s latest iPhone feature is a virtual personal assistant known as SIRI. And, like most of the other key technological features in Apple’s iOS products, SIRI has its roots in federal funding and research. SIRI is an artificial intelligence program consisting of machine learning, natural language processing and a Web search algorithm (Roush 2010). In 2000, DARPA asked the Stanford Research Institute (SRI) to take the lead on a project to develop a sort of ‘virtual office assistant’ to assist military personnel. SRI was put in charge of coordinating the ‘Cognitive Assistant that Learns and Organizes’ (CALO) project which included 20 universities all over the US collaborating to develop the necessary technology base.


pages: 292 words: 85,151

Exponential Organizations: Why New Organizations Are Ten Times Better, Faster, and Cheaper Than Yours (And What to Do About It) by Salim Ismail, Yuri van Geest

23andMe, 3D printing, Airbnb, Amazon Mechanical Turk, Amazon Web Services, augmented reality, autonomous vehicles, Baxter: Rethink Robotics, bioinformatics, bitcoin, Black Swan, blockchain, Burning Man, business intelligence, business process, call centre, chief data officer, Chris Wanstrath, Clayton Christensen, clean water, cloud computing, cognitive bias, collaborative consumption, collaborative economy, commoditize, corporate social responsibility, cross-subsidies, crowdsourcing, cryptocurrency, dark matter, Dean Kamen, dematerialisation, discounted cash flows, distributed ledger, Edward Snowden, Elon Musk, en.wikipedia.org, ethereum blockchain, Galaxy Zoo, game design, Google Glasses, Google Hangouts, Google X / Alphabet X, gravity well, hiring and firing, Hyperloop, industrial robot, Innovator's Dilemma, intangible asset, Internet of things, Iridium satellite, Isaac Newton, Jeff Bezos, Kevin Kelly, Kickstarter, knowledge worker, Kodak vs Instagram, Law of Accelerating Returns, Lean Startup, life extension, lifelogging, loose coupling, loss aversion, Lyft, Marc Andreessen, Mark Zuckerberg, market design, means of production, minimum viable product, natural language processing, Netflix Prize, Network effects, new economy, Oculus Rift, offshore financial centre, p-value, PageRank, pattern recognition, Paul Graham, peer-to-peer, peer-to-peer model, Peter H. Diamandis: Planetary Resources, Peter Thiel, prediction markets, profit motive, publish or perish, Ray Kurzweil, recommendation engine, RFID, ride hailing / ride sharing, risk tolerance, Ronald Coase, Second Machine Age, self-driving car, sharing economy, Silicon Valley, skunkworks, Skype, smart contracts, Snapchat, social software, software is eating the world, speech recognition, stealth mode startup, Stephen Hawking, Steve Jobs, subscription business, supply-chain management, TaskRabbit, telepresence, telepresence robot, Tony Hsieh, transaction costs, Tyler Cowen: Great Stagnation, urban planning, WikiLeaks, winner-take-all economy, X Prize, Y Combinator, zero-sum game

And just as inevitably, within two weeks, complete newcomers to the field trounce their best results. For example, the Hewlett Foundation sponsored a 2012 competition to develop an automated scoring algorithm for student-written essays. Of the 155 teams competing, three were awarded a total of $100,000 in prize money. What was particularly interesting was the fact that none of the winners had prior experience with natural language processing (NLP). Nonetheless, they beat the experts, many of them with decades of experience in NLP under their belts. This can’t help but impact the current status quo. Raymond McCauley, Biotechnology & Bioinformatics Chair at Singularity University, has noticed that “When people want a biotech job in Silicon Valley, they hide their PhDs to avoid being seen as a narrow specialist.” So, if experts are suspect, where should we turn instead?


pages: 314 words: 101,034

Every Patient Tells a Story by Lisa Sanders

data acquisition, discovery of penicillin, high batting average, index card, medical residency, meta analysis, meta-analysis, natural language processing, pattern recognition, randomized controlled trial, Ronald Reagan

Doctors using the diagnostic tool that Britto and Maude named Isabel can enter information using either key findings (like GIDEON) or whole-text entries, such as clinical descriptions that are cut-and-pasted from another program. Isabel also uses a novel search strategy to identify candidate diagnoses from the clinical findings. The program includes a thesaurus that facilitates recognition of a wide range of terms describing each finding. The program then uses natural language processing and search algorithms to compare these terms to those used in a selected reference library. For internal medicine cases, the library includes six key textbooks and forty-six major journals in general and subspecialty medicine and toxicology. The search domain and results are filtered to take into account the patient’s age, sex, geographic location, pregnancy status, and other clinical parameters that are either selected by the clinician or automatically entered if the system is integrated with the clinician’s electronic medical record.
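A sketch of the general mechanism the passage describes: a thesaurus normalizes synonymous terms before the findings are matched against a reference library. All terms, mappings, and diagnoses below are hypothetical toy data, not Isabel's actual vocabulary or algorithm:

```python
# Hypothetical thesaurus: surface terms -> canonical findings.
THESAURUS = {
    "shortness of breath": "dyspnea",
    "sob": "dyspnea",
    "high temperature": "fever",
    "pyrexia": "fever",
}

# Hypothetical mini reference library: diagnosis -> canonical findings.
LIBRARY = {
    "pneumonia": {"fever", "cough", "dyspnea"},
    "asthma": {"dyspnea", "wheezing"},
    "influenza": {"fever", "myalgia", "cough"},
}

def normalize(findings):
    """Map each entered term to its canonical form via the thesaurus."""
    return {THESAURUS.get(f.lower(), f.lower()) for f in findings}

def candidate_diagnoses(findings):
    """Rank diagnoses by how many of the patient's findings they match."""
    norm = normalize(findings)
    scores = {dx: len(norm & feats) for dx, feats in LIBRARY.items()}
    return sorted((dx for dx, s in scores.items() if s > 0),
                  key=lambda dx: -scores[dx])

ranked = candidate_diagnoses(["SOB", "high temperature", "cough"])
```

The thesaurus step is what lets "SOB" and "shortness of breath" retrieve the same candidates; the filtering by age, sex, and other parameters described in the passage would further restrict `LIBRARY` before scoring.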


pages: 396 words: 107,814

Is That a Fish in Your Ear?: Translation and the Meaning of Everything by David Bellos

Clapham omnibus, Claude Shannon: information theory, Douglas Hofstadter, Etonian, European colonialism, haute cuisine, invention of the telephone, invention of writing, natural language processing, Republic of Letters, Sapir-Whorf hypothesis, speech recognition

But common sense appeals to our total experience of the nonlinguistic world as well as to our ability to find a way through the language maze: it is precisely the kind of fuzzy, vague, and informal knowledge that distinctive feature analysis seeks to overcome and replace. Despite the usefulness of binary decomposition for some kinds of linguistic description and (in far more complex form) in the “natural language processing” that computers can now perform, word meanings can never be fully specified by atomic distinctions alone. People are just too adept at using words to mean something else. Such quasi-mathematical computation of “meaning” is equally unable to solve an even more basic problem, which is how to identify the very units whose meaning is to be specified. To ask what a word means (and translators often are asked to say what this or that word means) is to suppose that you know what word you are asking about, and that in turn requires you to know what a word is.


pages: 352 words: 96,532

Where Wizards Stay Up Late: The Origins of the Internet by Katie Hafner, Matthew Lyon

air freight, Bill Duvall, computer age, conceptual framework, Donald Davies, Douglas Engelbart, fault tolerance, Hush-A-Phone, information retrieval, John Markoff, Kevin Kelly, Leonard Kleinrock, Marc Andreessen, Menlo Park, natural language processing, packet switching, RAND corporation, RFC: Request For Comment, Robert Metcalfe, Ronald Reagan, Silicon Valley, speech recognition, Steve Crocker, Steven Levy

Walden later served as Heart’s boss, and Barker had gone on to run one of BBN’s divisions. The most conspicuous exception to this was Crowther, who had remained a programmer. For years Heart had been Crowther’s champion, lobbying for the company to let Crowther just be Crowther and think up ingenious ideas in his own dreamy way. In the years following the IMP project, Crowther pursued some unusual ideas about natural language processing, and worked extensively on high-speed packet-switching technology. Severo Ornstein had left BBN in the 1970s for Xerox PARC, and while there he started Computer Professionals for Social Responsibility. When he retired from Xerox, he and his wife moved into one of the remotest corners of the San Francisco Bay Area. For years Ornstein stayed off the Net, and for years he eschewed e-mail.


pages: 343 words: 93,544

vN: The First Machine Dynasty (The Machine Dynasty Book 1) by Madeline Ashby

big-box store, iterative process, natural language processing, place-making, traveling salesman, urban planning

Her failsafe guaranteed that. The angel investor supporting the development of von Neumann humanoids was not a military contractor, or a tech firm, or even a design giant. It was a church. A global megachurch named New Eden Ministries, Inc, that believed firmly that the Rapture was coming any minute now. It collected donations, bought real estate, and put the proceeds into programmable matter, natural language processing, and affect detection – all for the benefit of the few pitiful humans regrettably left behind to deal with God's wrath. They would need companions, after all. Helpmeets. And those helpmeets couldn't ever hurt humans. That was the Horsemen's job. It all went to hell, of course. The pastor of New Eden Ministries, Jonah LeMarque, and many of his council members became the defendants in a class action suit brought by youth group members regarding the use of their bodies as models in a pornographic game.


pages: 371 words: 108,317

The Inevitable: Understanding the 12 Technological Forces That Will Shape Our Future by Kevin Kelly

3D printing, A Declaration of the Independence of Cyberspace, AI winter, Airbnb, Albert Einstein, Amazon Web Services, augmented reality, bank run, barriers to entry, Baxter: Rethink Robotics, bitcoin, blockchain, book scanning, Brewster Kahle, Burning Man, cloud computing, commoditize, computer age, connected car, crowdsourcing, dark matter, dematerialisation, Downton Abbey, Edward Snowden, Elon Musk, Filter Bubble, Freestyle chess, game design, Google Glasses, hive mind, Howard Rheingold, index card, indoor plumbing, industrial robot, Internet Archive, Internet of things, invention of movable type, invisible hand, Jaron Lanier, Jeff Bezos, job automation, John Markoff, Kevin Kelly, Kickstarter, lifelogging, linked data, Lyft, M-Pesa, Marc Andreessen, Marshall McLuhan, means of production, megacity, Minecraft, multi-sided market, natural language processing, Netflix Prize, Network effects, new economy, Nicholas Carr, old-boy network, peer-to-peer, peer-to-peer lending, personalized medicine, placebo effect, planetary scale, postindustrial economy, recommendation engine, RFID, ride hailing / ride sharing, Rodney Brooks, self-driving car, sharing economy, Silicon Valley, slashdot, Snapchat, social graph, social web, software is eating the world, speech recognition, Stephen Hawking, Steven Levy, Ted Nelson, the scientific method, transport as a service, two-sided market, Uber for X, Watson beat the top human players on Jeopardy!, Whole Earth Review, zero-sum game

in-house AI research teams: Reed Albergotti, “Zuckerberg, Musk Invest in Artificial-Intelligence Company,” Wall Street Journal, March 21, 2014. purchased AI companies since 2014: Derrick Harris, “Pinterest, Yahoo, Dropbox and the (Kind of) Quiet Content-as-Data Revolution,” Gigaom, January 6, 2014; Derrick Harris, “Twitter Acquires Deep Learning Startup Madbits,” Gigaom, July 29, 2014; Ingrid Lunden, “Intel Has Acquired Natural Language Processing Startup Indisys, Price ‘North’ of $26M, to Build Its AI Muscle,” TechCrunch, September 13, 2013; and Cooper Smith, “Social Networks Are Investing Big in Artificial Intelligence,” Business Insider, March 17, 2014. expanding 70 percent a year: Private analysis by Quid, Inc., 2014. taught an AI to learn to play: Volodymyr Mnih, Koray Kavukcuoglu, David Silver, et al., “Human-Level Control Through Deep Reinforcement Learning,” Nature 518, no. 7540 (2015): 529–33.


pages: 326 words: 103,170

The Seventh Sense: Power, Fortune, and Survival in the Age of Networks by Joshua Cooper Ramo

Amazon: amazon.comamazon.co.ukamazon.deamazon.fr

Airbnb, Albert Einstein, algorithmic trading, barriers to entry, Berlin Wall, bitcoin, British Empire, cloud computing, crowdsourcing, Danny Hillis, defense in depth, Deng Xiaoping, drone strike, Edward Snowden, Fall of the Berlin Wall, Firefox, Google Chrome, income inequality, Isaac Newton, Jeff Bezos, job automation, market bubble, Menlo Park, Metcalfe’s law, natural language processing, Network effects, Norbert Wiener, Oculus Rift, packet switching, Paul Graham, price stability, quantitative easing, RAND corporation, recommendation engine, Republic of Letters, Richard Feynman, Richard Feynman, road to serfdom, Robert Metcalfe, Sand Hill Road, secular stagnation, self-driving car, Silicon Valley, Skype, Snapchat, social web, sovereign wealth fund, Steve Jobs, Steve Wozniak, Stewart Brand, Stuxnet, superintelligent machines, technological singularity, The Coming Technological Singularity, The Wealth of Nations by Adam Smith, too big to fail, Vernor Vinge, zero day

CHAPTER SEVEN The New Caste In which we meet a powerful group defined, enabled, and enriched by their mastery of the networks. 1. In 1965, an MIT computer scientist named Joseph Weizenbaum found himself, somewhat unexpectedly, considering a problem with his computer and its users that he had not quite anticipated. Weizenbaum was in the midst of an experiment that started innocently enough. He’d written a program to perform what is now known as natural language processing, essentially a bit of code designed to translate what a human tells a machine into something the machine can actually work with. When someone asks a computer, What is the weather? the machine uses a special processing approach to turn that into an instruction set. Answering those sorts of queries demands a great deal of digital work before the computer can figure out what you mean and how to fill you in.
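The query-to-instruction translation described above can be sketched with a handful of keyword rules. This is a hypothetical illustration in Python, in the spirit of 1960s pattern matching rather than Weizenbaum's actual program; the rule table and function names are invented.

```python
import re

# Hypothetical keyword rules: each pattern maps a recognizable
# natural-language query to an instruction the machine can act on.
RULES = [
    (re.compile(r"\bweather\b", re.I), ("lookup", "weather_report")),
    (re.compile(r"\btime\b", re.I), ("lookup", "current_time")),
]

def parse_query(text):
    """Translate a natural-language question into an instruction tuple."""
    for pattern, instruction in RULES:
        if pattern.search(text):
            return instruction
    # Queries no rule understands fall through unhandled.
    return ("unknown", text)
```

The point of the sketch is how little "understanding" is involved: the program keys on surface patterns, which is exactly what surprised Weizenbaum about how readily users attributed comprehension to it.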


pages: 319 words: 90,965

The End of College: Creating the Future of Learning and the University of Everywhere by Kevin Carey

Amazon: amazon.comamazon.co.ukamazon.deamazon.fr

Albert Einstein, barriers to entry, Bayesian statistics, Berlin Wall, business intelligence, carbon-based life, Claude Shannon: information theory, complexity theory, David Heinemeier Hansson, declining real wages, deliberate practice, discrete time, double helix, Douglas Engelbart, Douglas Engelbart, Downton Abbey, Drosophila, Firefox, Frank Gehry, Google X / Alphabet X, informal economy, invention of the printing press, inventory management, John Markoff, Khan Academy, Kickstarter, low skilled workers, Lyft, Marc Andreessen, Mark Zuckerberg, meta analysis, meta-analysis, natural language processing, Network effects, open borders, pattern recognition, Peter Thiel, pez dispenser, ride hailing / ride sharing, Ronald Reagan, Ruby on Rails, Sand Hill Road, self-driving car, Silicon Valley, Silicon Valley startup, social web, South of Market, San Francisco, speech recognition, Steve Jobs, technoutopianism, transcontinental railway, Vannevar Bush

He and two coauthors recently name-checked a well-known article called “The Unreasonable Effectiveness of Mathematics in the Natural Sciences,” which “examines why so much of physics can be neatly explained with simple mathematical formulas such as F = ma or E = mc2. Meanwhile, sciences that involve human beings rather than elementary particles have proven more resistant to elegant mathematics.” “Perhaps when it comes to natural language processing and related fields,” they wrote, “we’re doomed to complex theories that will never have the elegance of physics equations. But if that’s so, we should stop acting as if our goal is to author extremely elegant theories, and instead embrace complexity and make use of the best ally we have: the unreasonable effectiveness of data.” Learning and human cognition are definitely among the “related fields.”


Python Geospatial Development - Second Edition by Erik Westra

capital controls, database schema, Firefox, Golden Gate Park, Google Earth, Mercator projection, natural language processing, openstreetmap, Silicon Valley, web application

Winwaed specialize in geospatial tools and applications including web applications, and operate the http://www.mapping-tools.com website for tools and add-ins for Microsoft's MapPoint product. Richard also manages the technical aspects of the EcoMapCostaRica.com project for the Biology Department at the University of Dallas. This includes the website, online field maps, field surveys, and the creation and comparison of panoramic photographs. Richard is also active in the field of natural language processing, especially with Python's NLTK package. Will Cadell is a principal consultant with Sparkgeo.com. He builds next generation web mapping applications, primarily using Google Maps, geoDjango, and PostGIS. He has worked in academia, government, and natural resources but now mainly consults for the start-up community in Silicon Valley. His passion has always been the implementation of geographic technology and with over a billion smart, mobile devices in the world it's a great time to be working on the geoweb.


Starstruck: The Business of Celebrity by Currid

barriers to entry, Bernie Madoff, Donald Trump, income inequality, index card, industrial cluster, labour mobility, Mark Zuckerberg, Metcalfe’s law, natural language processing, place-making, Ponzi scheme, post-industrial society, prediction markets, Renaissance Technologies, Richard Florida, Robert Metcalfe, rolodex, shareholder value, Silicon Valley, slashdot, transaction costs, upwardly mobile, urban decay, Vilfredo Pareto, winner-take-all economy

Step one collected meta-information from the pictures in the Getty database. We then stored the meta-information in a MS-SQL relational database. In step two we identified the individuals in each photo. Instead of studying the photos themselves, we studied the caption information associated with the photos and cataloged an aggregate collection of this data. In order to identify the photographed objects, we used natural language processing (NLP). SQL-implemented association rules enabled us to clean the data. Our cataloging process collected the following information: names and occupations of individuals in each picture, the event and date when the photo was taken (e.g., Actress Angelina Jolie at the Oscars, February 22, 2007). In step three we used the database information to build a list of events and the celebrities photographed at them.
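The caption-cataloging step can be illustrated with a small sketch, assuming captions follow the pattern of the quoted example ("Actress Angelina Jolie at the Oscars, February 22, 2007"). The regular expression and field names here are hypothetical; the authors' actual NLP pipeline and SQL association rules are not shown.

```python
import re

# Hypothetical caption shape: "<Occupation> <Name> at the <Event>, <Date>"
CAPTION = re.compile(
    r"(?P<occupation>[A-Z][a-z]+)\s+"
    r"(?P<name>(?:[A-Z][a-z]+\s?)+)\s+at the\s+"
    r"(?P<event>[^,]+),\s+(?P<date>.+)"
)

def catalog(caption):
    """Extract name, occupation, event, and date fields from one caption."""
    m = CAPTION.match(caption)
    return m.groupdict() if m else None
```

Each parsed record would then be inserted into the relational database, from which the event/celebrity lists in step three can be built with ordinary queries.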


pages: 1,076 words: 67,364

Haskell Programming from first principles by Christopher Allen, Julie Moronuki

Amazon: amazon.comamazon.co.ukamazon.deamazon.fr

c2.com, en.wikipedia.org, natural language processing, spaced repetition, Turing complete, Turing machine, type inference, web application, Y Combinator

We met on Twitter and quickly became friends. As anyone who has encountered Chris–probably in any medium, but certainly on Twitter–knows, it doesn’t take long before he starts urging you to learn Haskell. I told him I had no interest in programming. I told him nothing and nobody had ever been able to interest me in programming before. When Chris learned of my background in linguistics, he thought I might be interested in natural language processing and exhorted me to learn Haskell for that purpose. I remained unconvinced. Then he tried a different approach. He was spending a lot of time gathering and evaluating resources for teaching Haskell and refining his pedagogical techniques, and he convinced me to try to learn Haskell so that he could gain the experience of teaching a code-neophyte. Finally, with an “anything for science” attitude, I gave in.

• use a parsing library to cover the basics of parsing;
• demonstrate the awesome power of parser combinators;
• marshall and unmarshall some JSON data;
• talk about tokenization.

24.2 A few more words of introduction

In this chapter, we will not look too deeply into the types of the parsing libraries we’re using, learn every sort of parser there is, or artisanally handcraft all of our parsing functions ourselves. These are thoroughly considered decisions. Parsing is a huge field of research in its own right with connections that span natural language processing, linguistics, and programming language theory. Just this topic could easily fill a book in itself (in fact, it has). The underlying types and typeclasses of the libraries we’ll be using are complicated. To be sure, if you enjoy parsing and expect to do it a lot, those are things you’d want to learn; they are simply out of the scope of this book. This chapter takes a different approach than previous chapters.
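The core idea of parser combinators, building large parsers by composing small ones, fits in a few lines. This is a minimal sketch in Python rather than the Haskell libraries the book uses; the names `char`, `many1`, and `alt` are invented, though they mirror conventional combinator vocabulary.

```python
# A parser is a function: input string -> (value, remaining input) or None.
def char(c):
    """Parser that matches exactly one given character."""
    def parse(s):
        return (c, s[1:]) if s[:1] == c else None
    return parse

def many1(p):
    """Combinator: apply p one or more times, collecting results."""
    def parse(s):
        results = []
        out = p(s)
        while out is not None:
            value, s = out
            results.append(value)
            out = p(s)
        return (results, s) if results else None
    return parse

def alt(*ps):
    """Combinator: try each parser in order, return the first success."""
    def parse(s):
        for p in ps:
            out = p(s)
            if out is not None:
                return out
        return None
    return parse

# A digit parser built from alternatives; a number parser built from many1.
digit = alt(*[char(d) for d in "0123456789"])
number = many1(digit)
```

Real combinator libraries add backtracking control, error reporting, and richer result types, which is where the complicated typeclasses mentioned above come in.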


pages: 481 words: 121,669

The Invisible Web: Uncovering Information Sources Search Engines Can't See by Gary Price, Chris Sherman, Danny Sullivan

Amazon: amazon.comamazon.co.ukamazon.deamazon.fr

AltaVista, American Society of Civil Engineers: Report Card, bioinformatics, Brewster Kahle, business intelligence, dark matter, Donald Davies, Douglas Engelbart, Douglas Engelbart, full text search, HyperCard, hypertext link, information retrieval, Internet Archive, joint-stock company, knowledge worker, natural language processing, pre–internet, profit motive, publish or perish, search engine result page, side project, Silicon Valley, speech recognition, stealth mode startup, Ted Nelson, Vannevar Bush, web application

Using search engine structure to reduce the returned set of possible hits by specifying certain criteria such as Web page date, country of origin, or by using field searching to restrict the search to specific parts of Web pages.

metasearch engine. A search engine that simultaneously searches other search engines and aggregates the results into a single result list. Metasearch engines typically do not maintain their own indices of Web pages.

natural language. Entering a search query exactly as if the question were being written or spoken. Natural Language Processing (NLP) is a technique used by search engines to break up or “parse” the search into a query the engine can understand.

“on the fly.” Dynamic Web pages that are assembled in real time, as opposed to static HTML pages. An example could be your MyYahoo.Com page that contains the information (news, sports, weather, etc.) that you select. When you call for the page, it is built “on the fly” and sent to your browser.


pages: 483 words: 141,836

Red-Blooded Risk: The Secret History of Wall Street by Aaron Brown, Eric Kim

Amazon: amazon.comamazon.co.ukamazon.deamazon.fr

activist fund / activist shareholder / activist investor, Albert Einstein, algorithmic trading, Asian financial crisis, Atul Gawande, backtesting, Basel III, Bayesian statistics, beat the dealer, Benoit Mandelbrot, Bernie Madoff, Black Swan, capital asset pricing model, central bank independence, Checklist Manifesto, corporate governance, creative destruction, credit crunch, Credit Default Swap, disintermediation, distributed generation, diversification, diversified portfolio, Edward Thorp, Emanuel Derman, Eugene Fama: efficient market hypothesis, experimental subject, financial innovation, illegal immigration, implied volatility, index fund, Long Term Capital Management, loss aversion, margin call, market clearing, market fundamentalism, market microstructure, money market fund, money: store of value / unit of account / medium of exchange, moral hazard, Myron Scholes, natural language processing, open economy, Pierre-Simon Laplace, pre–internet, quantitative trading / quantitative finance, random walk, Richard Thaler, risk tolerance, risk-adjusted returns, risk/return, road to serfdom, Robert Shiller, Robert Shiller, shareholder value, Sharpe ratio, special drawing rights, statistical arbitrage, stochastic volatility, The Myth of the Rational Market, Thomas Bayes, too big to fail, transaction costs, value at risk, yield curve

This form can be abused by taking a popular simplification of a scientific idea and applying it thoughtlessly as a metaphor or fanatically as a new religion. At the other extreme, serious researchers sometimes try to apply the precise mechanisms or equations from one field to another. The first approach is almost never fruitful and often crazy. The second is called econophysics when applied to finance, even if the scientific field lending techniques is not physics. Borrowings from signal processing and natural language processing have been spectacularly successful in finance, but the jury is still out on whether other fields can produce worthwhile insights. Econophysics offers a lot of promise and has had some minor successes, but more frustration and failure than progress. I’m doing something different. The idea that risk can be analyzed using probability distributions and utility functions is embedded deeply in economics.


pages: 565 words: 151,129

The Zero Marginal Cost Society: The Internet of Things, the Collaborative Commons, and the Eclipse of Capitalism by Jeremy Rifkin

Amazon: amazon.comamazon.co.ukamazon.deamazon.fr

3D printing, active measures, additive manufacturing, Airbnb, autonomous vehicles, back-to-the-land, big-box store, bioinformatics, bitcoin, business process, Chris Urmson, clean water, cleantech, cloud computing, collaborative consumption, collaborative economy, Community Supported Agriculture, Computer Numeric Control, computer vision, crowdsourcing, demographic transition, distributed generation, en.wikipedia.org, Frederick Winslow Taylor, global supply chain, global village, Hacker Ethic, industrial robot, informal economy, Intergovernmental Panel on Climate Change (IPCC), intermodal, Internet of things, invisible hand, Isaac Newton, James Watt: steam engine, job automation, John Markoff, John Maynard Keynes: Economic Possibilities for our Grandchildren, John Maynard Keynes: technological unemployment, Julian Assange, Kickstarter, knowledge worker, labour mobility, Mahatma Gandhi, manufacturing employment, Mark Zuckerberg, market design, mass immigration, means of production, meta analysis, meta-analysis, natural language processing, new economy, New Urbanism, nuclear winter, Occupy movement, off grid, oil shale / tar sands, pattern recognition, peer-to-peer, peer-to-peer lending, personalized medicine, phenotype, planetary scale, price discrimination, profit motive, QR code, RAND corporation, randomized controlled trial, Ray Kurzweil, RFID, Richard Stallman, risk/return, Ronald Coase, search inside the book, self-driving car, shareholder value, sharing economy, Silicon Valley, Skype, smart cities, smart grid, smart meter, social web, software as a service, spectrum auction, Steve Jobs, Stewart Brand, the built environment, The Nature of the Firm, The Structural Transformation of the Public Sphere, The Wealth of Nations by Adam Smith, The Wisdom of Crowds, Thomas Kuhn: the structure of scientific revolutions, Thomas L Friedman, too big to fail, transaction costs, urban planning, Watson beat the top human players on Jeopardy!, web application, Whole Earth Catalog, Whole Earth Review, WikiLeaks, working poor, zero-sum game

The Big Ten Network uses algorithms to create original pieces posted just seconds after games, eliminating human copywriters.37 Artificial intelligence took a big leap into the future in 2011 when an IBM computer, Watson—named after IBM’s past chairman—took on Ken Jennings, who held the record of 74 wins on the popular TV show Jeopardy, and defeated him. The showdown, which netted a $1 million prize for IBM, blew away TV viewers as they watched their Jeopardy hero crumble in the presence of the “all-knowing” Watson. Watson is a cognitive system that is able to integrate “natural language processing, machine learning, and hypothesis generation and evaluation,” says its proud IBM parent, allowing it to think and respond to questions and problems.38 Watson is already being put to work. IBM Healthcare Analytics will use Watson to assist physicians in making quick and accurate diagnoses by analyzing Big Data stored in the electronic health records of millions of patients, as well as in medical journals.39 IBM’s plans for Watson go far beyond serving the specialized needs of the research industry and the back-office tasks of managing Big Data.


pages: 351 words: 123,876

Beautiful Testing: Leading Professionals Reveal How They Improve Software (Theory in Practice) by Adam Goucher, Tim Riley

Amazon: amazon.comamazon.co.ukamazon.deamazon.fr

Albert Einstein, barriers to entry, Black Swan, call centre, continuous integration, Debian, Donald Knuth, en.wikipedia.org, Firefox, Grace Hopper, index card, Isaac Newton, natural language processing, p-value, performance metric, revision control, six sigma, software as a service, software patent, the scientific method, Therac-25, Valgrind, web application

Isaac Clerencia is a software developer at eBox Technologies. Since 2001 he has been involved in several free software projects, including Debian and Battle for Wesnoth. He, along with other partners, founded Warp Networks in 2004. Warp Networks is the open source–oriented software company from which eBox Technologies was later spun off. Other interests of his are artificial intelligence and natural language processing. John D. Cook is a very applied mathematician. After receiving a Ph.D. from the University of Texas, he taught mathematics at Vanderbilt University. He then left academia to work as a software developer and consultant. He currently works as a research statistician at M. D. Anderson Cancer Center. His career has been a blend of research, software development, consulting, and management.


pages: 528 words: 146,459

Computer: A History of the Information Machine by Martin Campbell-Kelly, William Aspray, Nathan L. Ensmenger, Jeffrey R. Yost

Amazon: amazon.comamazon.co.ukamazon.deamazon.fr

Ada Lovelace, air freight, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, Apple's 1984 Super Bowl advert, barriers to entry, Bill Gates: Altair 8800, borderless world, Buckminster Fuller, Build a better mousetrap, Byte Shop, card file, cashless society, cloud computing, combinatorial explosion, computer age, deskilling, don't be evil, Donald Davies, Douglas Engelbart, Douglas Engelbart, Dynabook, fault tolerance, Fellow of the Royal Society, financial independence, Frederick Winslow Taylor, game design, garden city movement, Grace Hopper, informal economy, interchangeable parts, invention of the wheel, Jacquard loom, Jacquard loom, Jeff Bezos, jimmy wales, John Markoff, John von Neumann, light touch regulation, linked data, Marc Andreessen, Mark Zuckerberg, Marshall McLuhan, Menlo Park, natural language processing, Network effects, New Journalism, Norbert Wiener, Occupy movement, optical character recognition, packet switching, PageRank, pattern recognition, Pierre-Simon Laplace, pirate software, popular electronics, prediction markets, pre–internet, QWERTY keyboard, RAND corporation, Robert X Cringely, Silicon Valley, Silicon Valley startup, Steve Jobs, Steven Levy, Stewart Brand, Ted Nelson, the market place, Turing machine, Vannevar Bush, Von Neumann architecture, Whole Earth Catalog, William Shockley: the traitorous eight, women in the workforce, young professional

was already well established when two other Stanford University doctoral students, Larry Page and Sergey Brin, began work on the Stanford Digital Library Project (funded in part by the National Science Foundation)—research that would not only forever change the process of finding things on the Internet but also, in time, lead to an unprecedentedly successful web advertising model. Page became interested in a dissertation project on the mathematical properties of the web, and found strong support from his adviser Terry Winograd, a pioneer of artificial intelligence research on natural language processing. Using a “web crawler” to gather back-link data (that is, the websites that linked to a particular site), Page, now teamed up with Brin, created their “PageRank” algorithm based on back-links ranked by importance—the more prominent the linking site, the more influence it would have on the linked site’s page rank. They insightfully reasoned that this would provide the basis for more useful web searches than any existing tools and, moreover, that there would be no need to hire a corps of indexing staff.
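The back-link idea behind PageRank can be sketched as a simple power iteration: each page repeatedly passes its current rank to the pages it links to, so links from highly ranked pages count for more. This is a minimal illustrative version with a made-up three-page graph, not Page and Brin's implementation.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with uniform rank
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for p, outs in links.items():
            if outs:
                # A page shares its rank equally among its outgoing links.
                share = damping * rank[p] / len(outs)
                for q in outs:
                    new[q] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for q in pages:
                    new[q] += damping * rank[p] / n
        rank = new
    return rank

# Toy graph: "a" is linked to by both "b" and "c", so it ranks highest.
graph = {"a": ["b"], "b": ["a", "c"], "c": ["a"]}
ranks = pagerank(graph)
```

The insight credited to Page and Brin above is visible even at this scale: rank is determined by who links to you and how important those linkers are, with no human indexing staff involved.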


pages: 593 words: 118,995

Relevant Search: With Examples Using Elasticsearch and Solr by Doug Turnbull, John Berryman

Amazon: amazon.comamazon.co.ukamazon.deamazon.fr

commoditize, crowdsourcing, domain-specific language, finite state, fudge factor, full text search, information retrieval, natural language processing, premature optimization, recommendation engine, sentiment analysis

It usually requires seeing many bad examples to identify problematic patterns, and it’s often challenging to know what better results would look like without actually seeing them show up. Unfortunately, it’s often not until well after a search system is deployed into production that organizations begin to realize the gap between out-of-the-box relevancy defaults and true domain-driven, personalized matching. Not only that, but the skillsets needed to think about relevancy (domain expertise, feature engineering, machine learning, ontologies, user testing, natural language processing) are very different from those needed to build and maintain scalable infrastructure (distributed systems, data structures, performance and concurrency, hardware utilization, network calls and communication). The role of a relevance engineer is almost entirely lacking in many organizations, leaving so much potential untapped for building a search experience that truly delights users and significantly moves a company forward.


pages: 405 words: 117,219

In Our Own Image: Savior or Destroyer? The History and Future of Artificial Intelligence by George Zarkadakis

Amazon: amazon.comamazon.co.ukamazon.deamazon.fr

3D printing, Ada Lovelace, agricultural Revolution, Airbnb, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, anthropic principle, Asperger Syndrome, autonomous vehicles, barriers to entry, battle of ideas, Berlin Wall, bioinformatics, British Empire, business process, carbon-based life, cellular automata, Claude Shannon: information theory, combinatorial explosion, complexity theory, continuous integration, Conway's Game of Life, cosmological principle, dark matter, dematerialisation, double helix, Douglas Hofstadter, Edward Snowden, epigenetics, Flash crash, Google Glasses, Gödel, Escher, Bach, income inequality, index card, industrial robot, Internet of things, invention of agriculture, invention of the steam engine, invisible hand, Isaac Newton, Jacquard loom, Jacquard loom, Jacques de Vaucanson, James Watt: steam engine, job automation, John von Neumann, Joseph-Marie Jacquard, liberal capitalism, lifelogging, millennium bug, Moravec's paradox, natural language processing, Norbert Wiener, off grid, On the Economy of Machinery and Manufactures, packet switching, pattern recognition, Paul Erdős, post-industrial society, prediction markets, Ray Kurzweil, Rodney Brooks, Second Machine Age, self-driving car, Silicon Valley, speech recognition, stem cell, Stephen Hawking, Steven Pinker, strong AI, technological singularity, The Coming Technological Singularity, The Future of Employment, the scientific method, theory of mind, Turing complete, Turing machine, Turing test, Tyler Cowen: Great Stagnation, Vernor Vinge, Von Neumann architecture, Watson beat the top human players on Jeopardy!, Y2K

For the purpose of the TV quiz, the engineers at IBM loaded Watson with 200 million pages of data, including dictionaries, encyclopaedias and literary articles. Moreover, Watson communicated in natural language. You asked it a question, it understood it, and returned an answer. For this to happen, Watson’s designers exploited the whole arsenal of AI tools and techniques, including machine learning, natural language processing and knowledge representation. What the success of their creation demonstrated was that brute computing force could overcome the obstacles that the AI pioneers faced in the 1960s and early 1970s. Bigger, stronger, faster were very meaningful words when it came to increasing machine intelligence. Deep Blue, DARPA’s navigational challenge and Watson ushered AI to the fore of public awareness and debate.


pages: 479 words: 113,510

Fed Up: An Insider's Take on Why the Federal Reserve Is Bad for America by Danielle Dimartino Booth

Affordable Care Act / Obamacare, asset-backed security, bank run, barriers to entry, Basel III, Bernie Sanders, break the buck, Bretton Woods, central bank independence, collateralized debt obligation, corporate raider, creative destruction, credit crunch, Credit Default Swap, credit default swaps / collateralized debt obligations, diversification, Donald Trump, financial deregulation, financial innovation, fixed income, Flash crash, forward guidance, full employment, George Akerlof, greed is good, high net worth, housing crisis, income inequality, index fund, inflation targeting, interest rate swap, invisible hand, John Meriwether, Joseph Schumpeter, liquidity trap, London Whale, Long Term Capital Management, margin call, market bubble, Mexican peso crisis / tequila crisis, money market fund, moral hazard, Myron Scholes, natural language processing, negative equity, new economy, Northern Rock, obamacare, price stability, pushing on a string, quantitative easing, regulatory arbitrage, Robert Shiller, Robert Shiller, Ronald Reagan, selection bias, short selling, side project, Silicon Valley, The Great Moderation, The Wealth of Nations by Adam Smith, too big to fail, trickle-down economics, yield curve

It felt like election night to see which statement would emerge victorious. When I ran up to the break room to watch CNBC’s Steve Liesman read the FOMC statement, I was on tenterhooks, wondering which words had prevailed. I fully grasped the ridiculous pageantry because I knew the markets would parse every single word. A Fed “computational linguistics” study of FOMC statements released in 2015 concluded: “natural language processing can strip away false impressions and uncover hidden truths about complex communications such as those of the Federal Reserve.” The Street had it right all along. Depressingly, the option Fisher preferred was rarely the one that came out of Liesman’s mouth. The doves always seemed to have the upper hand. Hawks and doves had become very political. It wasn’t as much of an economic exercise as a political one, with a distinct irony.


pages: 523 words: 143,139

Algorithms to Live By: The Computer Science of Human Decisions by Brian Christian, Tom Griffiths

Amazon: amazon.comamazon.co.ukamazon.deamazon.fr

4chan, Ada Lovelace, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, Albert Einstein, algorithmic trading, anthropic principle, asset allocation, autonomous vehicles, Bayesian statistics, Berlin Wall, Bill Duvall, bitcoin, Community Supported Agriculture, complexity theory, constrained optimization, cosmological principle, cryptocurrency, Danny Hillis, David Heinemeier Hansson, delayed gratification, dematerialisation, diversification, Donald Knuth, double helix, Elon Musk, fault tolerance, Fellow of the Royal Society, Firefox, first-price auction, Flash crash, Frederick Winslow Taylor, George Akerlof, global supply chain, Google Chrome, Henri Poincaré, information retrieval, Internet Archive, Jeff Bezos, John Nash: game theory, John von Neumann, knapsack problem, Lao Tzu, Leonard Kleinrock, linear programming, martingale, Nash equilibrium, natural language processing, NP-complete, P = NP, packet switching, Pierre-Simon Laplace, prediction markets, race to the bottom, RAND corporation, RFC: Request For Comment, Robert X Cringely, sealed-bid auction, second-price auction, self-driving car, Silicon Valley, Skype, sorting algorithm, spectrum auction, Steve Jobs, stochastic process, Thomas Bayes, Thomas Malthus, traveling salesman, Turing machine, urban planning, Vickrey auction, Vilfredo Pareto, Walter Mischel, Y Combinator, zero-sum game

Benenson, Itzhak, Karel Martens, and Slava Birfir. “PARKAGENT: An Agent-Based Model of Parking in the City.” Computers, Environment and Urban Systems 32, no. 6 (2008): 431–439. Berezovsky, Boris, and Alexander V. Gnedin. Problems of Best Choice (in Russian). Moscow: Akademia Nauk, 1984. Berg-Kirkpatrick, Taylor, and Dan Klein. “Decipherment with a Million Random Restarts.” In Proceedings of the Conference on Empirical Methods in Natural Language Processing (2013): 874–878. Bernardo, Antonio E., and Ivo Welch. “On the Evolution of Overconfidence and Entrepreneurs.” Journal of Economics & Management Strategy 10, no. 3 (2001): 301–330. Berry, Donald A. “A Bernoulli Two-Armed Bandit.” Annals of Mathematical Statistics 43 (1972): 871–897. ______. “Comment: Ethics and ECMO.” Statistical Science 4 (1989): 306–310. Berry, Donald A., and Bert Fristed.

The Future of Technology by Tom Standage

Amazon: amazon.comamazon.co.ukamazon.deamazon.fr

air freight, barriers to entry, business process, business process outsourcing, call centre, Clayton Christensen, computer vision, connected car, corporate governance, creative destruction, disintermediation, distributed generation, double helix, experimental economics, full employment, hydrogen economy, industrial robot, informal economy, information asymmetry, interchangeable parts, job satisfaction, labour market flexibility, Marc Andreessen, market design, Menlo Park, millennium bug, moral hazard, natural language processing, Network effects, new economy, Nicholas Carr, optical character recognition, railway mania, rent-seeking, RFID, Silicon Valley, Silicon Valley ideology, Silicon Valley startup, six sigma, Skype, smart grid, software as a service, spectrum auction, speech recognition, stem cell, Steve Ballmer, technology bubble, telemarketer, transcontinental railway, Y2K

Similarly, Max Thiercy, head of development at Albert, a French firm that produces natural-language search software, also avoids the term AI. “I consider the term a bit obsolete,” he says. “It can make our customers frightened.” This seems odd, because the firm’s search technology uses a classic AI technique, applying multiple algorithms to the same data, and then evaluates the results to see which approach was most effective. Even so, the firm prefers to use such terms as “natural language processing” and “machine learning”. Perhaps the biggest change in AI’s fortunes is simply down to the change of date. The film A.I. was based on an idea by the late director, Stanley Kubrick, who also dealt with the topic in another film, 2001: A Space Odyssey, which was released in 1968. 2001 featured an intelligent computer called HAL 9000 with a hypnotic speaking voice. As well as understanding and speaking English, HAL could play chess and even learned to lip-read. HAL thus encapsulated the optimism of the 1960s that intelligent computers would be widespread by 2001.


pages: 416 words: 129,308

The One Device: The Secret History of the iPhone by Brian Merchant

Airbnb, Apple II, Apple's 1984 Super Bowl advert, citizen journalism, Claude Shannon: information theory, computer vision, conceptual framework, Douglas Engelbart, Dynabook, Edward Snowden, Elon Musk, Ford paid five dollars a day, Frank Gehry, global supply chain, Google Earth, Google Hangouts, Internet of things, Jacquard loom, John Gruber, John Markoff, Jony Ive, Lyft, M-Pesa, more computing power than Apollo, Mother of all demos, natural language processing, new economy, New Journalism, Norbert Wiener, offshore financial centre, oil shock, pattern recognition, peak oil, pirate software, profit motive, QWERTY keyboard, ride hailing / ride sharing, rolodex, Silicon Valley, Silicon Valley startup, skunkworks, Skype, Snapchat, special economic zone, speech recognition, stealth mode startup, Stephen Hawking, Steve Ballmer, Steve Jobs, Steve Wozniak, Steven Levy, Tim Cook: Apple, Turing test, Upton Sinclair, Vannevar Bush, zero day

Siri is really a constellation of features—speech-recognition software, a natural-language user interface, and an artificially intelligent personal assistant. When you ask Siri a question, here’s what happens: Your voice is digitized and transmitted to an Apple server in the Cloud while a local voice recognizer scans it right on your iPhone. Speech-recognition software translates your speech into text. Natural-language processing parses it. Siri consults what tech writer Steven Levy calls the iBrain—around 200 megabytes of data about your preferences, the way you speak, and other details. If your question can be answered by the phone itself (“Would you set my alarm for eight a.m.?”), the Cloud request is canceled. If Siri needs to pull data from the web (“Is it going to rain tomorrow?”), to the Cloud it goes, and the request is analyzed by another array of models and tools.
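The on-device-versus-cloud routing described above can be sketched as a toy dispatcher. This is a minimal illustration only: the intent labels and every function here are hypothetical stand-ins, not Apple APIs, and Siri's real internals are not public.

```python
# Toy sketch of the routing step described above. All names are
# hypothetical illustrations, not Apple APIs.

LOCAL_INTENTS = {"set_alarm"}  # requests the phone can answer by itself

def parse(text: str):
    """Stand-in for natural-language parsing: map text to an intent + slots."""
    if "alarm" in text.lower():
        return "set_alarm", {"time": "8 a.m."}
    return "get_weather", {"day": "tomorrow"}

def handle(text: str) -> str:
    intent, slots = parse(text)
    if intent in LOCAL_INTENTS:
        # Answerable on-device, so the parallel cloud request is cancelled.
        return f"local: {intent} at {slots['time']}"
    # Needs web data: forward the request to the cloud for deeper analysis.
    return f"cloud: {intent} for {slots['day']}"

print(handle("Would you set my alarm for eight a.m.?"))
print(handle("Is it going to rain tomorrow?"))
```

The cancellation branch exists because, as the passage notes, the local recognizer and the cloud request start in parallel; whichever path can answer wins.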

The Singularity Is Near: When Humans Transcend Biology by Ray Kurzweil

Amazon: amazon.com, amazon.co.uk, amazon.de, amazon.fr

additive manufacturing, AI winter, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, Albert Einstein, anthropic principle, Any sufficiently advanced technology is indistinguishable from magic, artificial general intelligence, Asilomar, augmented reality, autonomous vehicles, Benoit Mandelbrot, Bill Joy: nanobots, bioinformatics, brain emulation, Brewster Kahle, Brownian motion, business intelligence, c2.com, call centre, carbon-based life, cellular automata, Claude Shannon: information theory, complexity theory, conceptual framework, Conway's Game of Life, cosmological constant, cosmological principle, cuban missile crisis, data acquisition, Dava Sobel, David Brooks, Dean Kamen, disintermediation, double helix, Douglas Hofstadter, en.wikipedia.org, epigenetics, factory automation, friendly AI, George Gilder, Gödel, Escher, Bach, informal economy, information retrieval, invention of the telephone, invention of the telescope, invention of writing, Isaac Newton, iterative process, Jaron Lanier, Jeff Bezos, job automation, job satisfaction, John von Neumann, Kevin Kelly, Law of Accelerating Returns, life extension, lifelogging, linked data, Loebner Prize, Louis Pasteur, mandelbrot fractal, Mikhail Gorbachev, mouse model, Murray Gell-Mann, mutually assured destruction, natural language processing, Network effects, new economy, Norbert Wiener, oil shale / tar sands, optical character recognition, pattern recognition, phenotype, premature optimization, randomized controlled trial, Ray Kurzweil, remote working, reversible computing, Richard Feynman, Robert Metcalfe, Rodney Brooks, Search for Extraterrestrial Intelligence, selection bias, semantic web, Silicon Valley, Singularitarianism, speech recognition, statistical model, stem cell, Stephen Hawking, Stewart Brand, strong AI, superintelligent machines, technological singularity, Ted Kaczynski, telepresence, The Coming Technological Singularity, Thomas Bayes, transaction costs, Turing machine, Turing test, Vernor Vinge, Y2K, Yogi Berra

Dealing naturally with language is the most challenging task of all for artificial intelligence. No simple tricks, short of fully mastering the principles of human intelligence, will allow a computerized system to convincingly emulate human conversation, even if restricted to just text messages. This was Turing's enduring insight in designing his eponymous test based entirely on written language. Although not yet at human levels, natural-language-processing systems are making solid progress. Search engines have become so popular that "Google" has gone from a proper noun to a common verb, and its technology has revolutionized research and access to knowledge. Google and other search engines use AI-based statistical-learning methods and logical inference to determine the ranking of links. The most obvious failing of these search engines is their inability to understand the context of words.


pages: 584 words: 187,436

More Money Than God: Hedge Funds and the Making of a New Elite by Sebastian Mallaby

Amazon: amazon.com, amazon.co.uk, amazon.de, amazon.fr

Andrei Shleifer, Asian financial crisis, asset-backed security, automated trading system, bank run, barriers to entry, Benoit Mandelbrot, Berlin Wall, Bernie Madoff, Big bang: deregulation of the City of London, Bonfire of the Vanities, Bretton Woods, capital controls, Carmen Reinhart, collapse of Lehman Brothers, collateralized debt obligation, computerized trading, corporate raider, Credit Default Swap, credit default swaps / collateralized debt obligations, crony capitalism, currency manipulation / currency intervention, currency peg, Elliott wave, Eugene Fama: efficient market hypothesis, failed state, Fall of the Berlin Wall, financial deregulation, financial innovation, financial intermediation, fixed income, full employment, German hyperinflation, High speed trading, index fund, John Meriwether, Kenneth Rogoff, Long Term Capital Management, margin call, market bubble, market clearing, market fundamentalism, merger arbitrage, money market fund, moral hazard, Myron Scholes, natural language processing, Network effects, new economy, Nikolai Kondratiev, pattern recognition, Paul Samuelson, pre–internet, quantitative hedge fund, quantitative trading / quantitative finance, random walk, Renaissance Technologies, Richard Thaler, risk-adjusted returns, risk/return, rolodex, Sharpe ratio, short selling, Silicon Valley, South Sea Bubble, sovereign wealth fund, statistical arbitrage, statistical model, survivorship bias, technology bubble, The Great Moderation, The Myth of the Rational Market, the new new thing, too big to fail, transaction costs

Elwyn Berlekamp, interview with the author, July 24, 2008. It is also interesting that Brown and Mercer’s coauthors who followed them to Renaissance, Stephen and Vincent Della Pietra, explicitly presented their experience with statistical machine translation as relevant to finding order in other types of data, including financial data. See Adam L. Berger, Stephen A. Della Pietra, and Vincent J. Della Pietra, “A Maximum Entropy Approach to Natural Language Processing,” Computational Linguistics 22, no. 1 (March 1996): pp. 39–71. 30. To manage the potential linguistic chaos resulting from this permissiveness, neologisms had to be submitted to a review. Mercer interview. 31. The Russian employees were Pavel Volfbeyn and Alexander Belopolsky. The firm that they defected to was Millennium. They argued through their lawyer that their new system was not based on proprietary secrets from Renaissance.


pages: 677 words: 206,548

Future Crimes: Everything Is Connected, Everyone Is Vulnerable and What We Can Do About It by Marc Goodman

Amazon: amazon.com, amazon.co.uk, amazon.de, amazon.fr

23andMe, 3D printing, active measures, additive manufacturing, Affordable Care Act / Obamacare, Airbnb, airport security, Albert Einstein, algorithmic trading, artificial general intelligence, Asilomar, Asilomar Conference on Recombinant DNA, augmented reality, autonomous vehicles, Baxter: Rethink Robotics, Bill Joy: nanobots, bitcoin, Black Swan, blockchain, borderless world, Brian Krebs, business process, butterfly effect, call centre, Chelsea Manning, cloud computing, cognitive dissonance, computer vision, connected car, corporate governance, crowdsourcing, cryptocurrency, data acquisition, data is the new oil, Dean Kamen, disintermediation, don't be evil, double helix, Downton Abbey, drone strike, Edward Snowden, Elon Musk, Erik Brynjolfsson, Filter Bubble, Firefox, Flash crash, future of work, game design, Google Chrome, Google Earth, Google Glasses, Gordon Gekko, high net worth, High speed trading, hive mind, Howard Rheingold, hypertext link, illegal immigration, impulse control, industrial robot, Intergovernmental Panel on Climate Change (IPCC), Internet of things, Jaron Lanier, Jeff Bezos, job automation, John Harrison: Longitude, John Markoff, Jony Ive, Julian Assange, Kevin Kelly, Khan Academy, Kickstarter, knowledge worker, Kuwabatake Sanjuro: assassination market, Law of Accelerating Returns, Lean Startup, license plate recognition, lifelogging, litecoin, M-Pesa, Mark Zuckerberg, Marshall McLuhan, Menlo Park, Metcalfe’s law, mobile money, more computing power than Apollo, move fast and break things, Nate Silver, national security letter, natural language processing, obamacare, Occupy movement, Oculus Rift, off grid, offshore financial centre, optical character recognition, Parag Khanna, pattern recognition, peer-to-peer, personalized medicine, Peter H. Diamandis: Planetary Resources, Peter Thiel, pre–internet, RAND corporation, ransomware, Ray Kurzweil, refrigerator car, RFID, ride hailing / ride sharing, Rodney Brooks, Satoshi Nakamoto, Second Machine Age, security theater, self-driving car, shareholder value, Silicon Valley, Silicon Valley startup, Skype, smart cities, smart grid, smart meter, Snapchat, social graph, software as a service, speech recognition, stealth mode startup, Stephen Hawking, Steve Jobs, Steve Wozniak, strong AI, Stuxnet, supply-chain management, technological singularity, telepresence, telepresence robot, Tesla Model S, The Future of Employment, The Wisdom of Crowds, Tim Cook: Apple, trade route, uranium enrichment, Wall-E, Watson beat the top human players on Jeopardy!, Wave and Pay, We are Anonymous. We are Legion, web application, Westphalian system, WikiLeaks, Y Combinator, zero day

When Watson Turns to a Life of Crime Artificial intelligence will reach human levels by around 2029. Follow that out further to, say, 2045, we will have multiplied the intelligence, the human biological machine intelligence of our civilization a billion-fold. RAY KURZWEIL In 2011, we all watched with awe when IBM’s Watson supercomputer beat the world champions on the television game show Jeopardy! Using artificial intelligence and natural language processing, Watson digested over 200 million pages of structured and unstructured data, which it processed at a rate of eighty teraflops—that’s eighty trillion operations per second. In doing so, it handily defeated Ken Jennings, a human Jeopardy! contestant who had won seventy-four games in a row. Jennings was gracious in his defeat, noting, “I, for one, welcome our new computer overlords.” He might want to rethink that.


pages: 661 words: 187,613

The Language Instinct: How the Mind Creates Language by Steven Pinker

Amazon: amazon.com, amazon.co.uk, amazon.de, amazon.fr

Albert Einstein, cloud computing, David Attenborough, double helix, Drosophila, elephant in my pajamas, finite state, illegal immigration, Loebner Prize, mass immigration, Maui Hawaii, meta-analysis, natural language processing, out of africa, P = NP, phenotype, rolodex, Ronald Reagan, Sapir-Whorf hypothesis, Saturday Night Live, speech recognition, Steven Pinker, theory of mind, transatlantic slave trade, Turing machine, Turing test, Yogi Berra

Preface to the Dictionary. Reprinted in E. L. McAdam, Jr., and G. Milne (Eds.), 1964, Samuel Johnson’s Dictionary: A modern selection. New York: Pantheon. Joos, M. (Ed.) 1957. Readings in linguistics: The development of descriptive linguistics in America since 1925. Washington, D.C.: American Council of Learned Societies. Jordan, M. I., & Rosenbaum, D. 1989. Action. In Posner, 1989. Joshi, A. K. 1991. Natural language processing. Science, 253, 1242–1249. Kaplan, R. 1972. Augmented transition networks as psychological models of sentence comprehension. Artificial Intelligence, 3, 77–100. Kaplan, S. 1992. Environmental preference in a knowledge-seeking, knowledge-using organism. In Barkow, Cosmides, & Tooby, 1992. Kasher, A. (Ed.) 1991. The Chomskyan turn. Cambridge, Mass.: Blackwell. Katzner, K. 1977. The languages of the world.


Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems by Martin Kleppmann (O’Reilly, 2017)

active measures, Amazon Web Services, bitcoin, blockchain, business intelligence, business process, c2.com, cloud computing, collaborative editing, commoditize, conceptual framework, cryptocurrency, database schema, DevOps, distributed ledger, Donald Knuth, Edward Snowden, ethereum blockchain, fault tolerance, finite state, Flash crash, full text search, general-purpose programming language, informal economy, information retrieval, Internet of things, iterative process, John von Neumann, loose coupling, Marc Andreessen, natural language processing, Network effects, packet switching, peer-to-peer, performance metric, place-making, premature optimization, recommendation engine, Richard Feynman, self-driving car, semantic web, Shoshana Zuboff, social graph, social web, software as a service, software is eating the world, sorting algorithm, source of truth, SPARQL, speech recognition, statistical model, web application, WebSocket, wikimedia commons

For example:

• A secondary index is a kind of derived dataset with a straightforward transformation function: for each row or document in the base table, it picks out the values in the columns or fields being indexed, and sorts by those values (assuming a B-tree or SSTable index, which are sorted by key, as discussed in Chapter 3).

• A full-text search index is created by applying various natural language processing functions such as language detection, word segmentation, stemming or lemmatization, spelling correction, and synonym identification, followed by building a data structure for efficient lookups (such as an inverted index).

• In a machine learning system, we can consider the model as being derived from the training data by applying various feature extraction and statistical analysis functions.
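The full-text-index case can be made concrete with a minimal sketch: lowercase word segmentation plus a naive plural-stripping stemmer (a crude stand-in for real stemming, lemmatization, and spelling correction) feeding an inverted index that maps each term to the set of documents containing it.

```python
# Minimal derived-dataset sketch: a toy full-text index built from a base
# collection of documents. Not a production tokenizer or stemmer.
import re
from collections import defaultdict

def tokenize(text):
    """Word segmentation plus crude normalization (naive plural stripping)."""
    for word in re.findall(r"[a-z]+", text.lower()):
        yield word[:-1] if word.endswith("s") else word

def build_index(docs):
    """Derive an inverted index (term -> set of doc ids) from the base data."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

docs = {1: "Cats chase mice", 2: "a mouse ran", 3: "cat naps"}
index = build_index(docs)
print(sorted(index["cat"]))  # documents 1 and 3 both mention cats
```

As with the secondary index above, the whole structure can be recomputed from the base documents at any time, which is what makes it a derived dataset rather than a source of truth.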