information retrieval

156 results


Understanding Search Engines: Mathematical Modeling and Text Retrieval by Michael W. Berry, Murray Browne

information retrieval, machine readable, PageRank

The work of William Frakes and Ricardo Baeza-Yates (eds.), Information Retrieval: Data Structures & Algorithms, a 1992 collection of journal articles on various related topics, Gerald Kowalski's (1997) Information Retrieval Systems: Theory and Implementation, a broad overview of information retrieval systems, and Ricardo Baeza-Yates and Berthier Ribeiro-Neto's (1999) Modern Information Retrieval, a computer-science perspective of information retrieval, are all fine textbooks on the topic, but understandably they lack the gritty details of the mathematical computations needed to build more successful search engines.

The work of William Frakes and Ricardo Baeza-Yates (eds.), Information Retrieval: Data Structures & Algorithms, a 1992 collection of journal articles on various related topics, and Gerald Kowalski's (1997) Information Retrieval Systems: Theory and Implementation, a broad overview of information retrieval systems, are fine textbooks on the topic, but both understandably lack the gritty details of the mathematical computations needed to build more successful search engines. With this in mind, USE does not provide an overview of information retrieval systems but prefers to assume a supplementary role to the aforementioned books.

Further Reading but it does address the subject of IR (indexing, queries, and index construction), albeit from a unique compression perspective. One of the first books that covers various information retrieval topics was actually a collection of survey papers edited by William B. Frakes and Ricardo Baeza-Yates. Their 1992 book [30], Information Retrieval: Data Structures & Algorithms, contains several seminal works in this area, including the use of signature-based text retrieval methods by Christos Faloutsos and the development of ranking algorithms by Donna Harman. Ricardo Baeza-Yates and Berthier Ribeiro-Neto's [2] Modern Information Retrieval is another collection of well-integrated research articles from various authors with a computer-science perspective of information retrieval.

9.2 Computational Methods and Software
Two SIAM Review articles (Berry, Dumais, and O'Brien in 1995 [8] and Berry, Drmac, and Jessup in 1999 [7]) demonstrate the use of linear algebra for vector space IR models such as LSI.
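The linear-algebra machinery behind vector space models such as LSI can be illustrated in a few lines. The sketch below is my own minimal example, not code from the cited articles: it builds a tiny, invented term-document matrix, truncates its SVD to two latent concepts, and ranks documents against a query folded into the same concept space.

import numpy as np

# Toy term-document matrix: rows are terms, columns are documents (counts invented).
A = np.array([
    [2, 0, 1, 0],   # "retrieval"
    [1, 1, 0, 0],   # "index"
    [0, 2, 1, 1],   # "ranking"
    [0, 0, 1, 2],   # "query"
], dtype=float)

# Latent semantic indexing: keep only the k largest singular values/vectors.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
Uk, sk, Vtk = U[:, :k], s[:k], Vt[:k, :]

# Document vectors in the k-dimensional concept space.
doc_vectors = (np.diag(sk) @ Vtk).T

# Fold a query containing "retrieval" and "query" into the concept space,
# then rank documents by cosine similarity.
q = np.array([1, 0, 0, 1], dtype=float)
q_k = (q @ Uk) / sk
scores = doc_vectors @ q_k / (
    np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q_k) + 1e-12)
print(sorted(enumerate(scores), key=lambda pair: -pair[1]))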


pages: 298 words: 43,745

Understanding Sponsored Search: Core Elements of Keyword Advertising by Jim Jansen

AltaVista, AOL-Time Warner, barriers to entry, behavioural economics, Black Swan, bounce rate, business intelligence, butterfly effect, call centre, Claude Shannon: information theory, complexity theory, content marketing, correlation does not imply causation, data science, en.wikipedia.org, first-price auction, folksonomy, Future Shock, information asymmetry, information retrieval, intangible asset, inventory management, life extension, linear programming, longitudinal study, machine translation, megacity, Nash equilibrium, Network effects, PageRank, place-making, power law, price mechanism, psychological pricing, random walk, Schrödinger's Cat, sealed-bid auction, search costs, search engine result page, second-price auction, second-price sealed-bid, sentiment analysis, social bookmarking, social web, software as a service, stochastic process, tacit knowledge, telemarketer, the market place, The Present Situation in Quantum Mechanics, the scientific method, The Wisdom of Crowds, Vickrey auction, Vilfredo Pareto, yield management

Journal of the American Society for Information Science and Technology, vol. 56(6), pp. 559–570. [46] Belkin, N. J. 1993. “Interaction with Texts: Information Retrieval as Information-Seeking Behavior.” In Information retrieval ’93. Von der Modellierung zur Anwendung. Konstanz, Germany: Universitaetsverlag Konstanz, pp. 55–66. [47] Saracevic, T. 1997. “Extension and Application of the Stratified Model of Information Retrieval Interaction.” In the Annual Meeting of the American Society for Information Science, Washington, DC, pp. 313–327. [48] Saracevic, T. 1996. “Modeling Interaction in Information Retrieval (IR): A Review and Proposal.” In the 59th American Society for Information Science Annual Meeting, Baltimore, MD, pp. 3–9. [49] Belkin, N., Cool, C., Croft, W.

and Callan, J. 1993. “The Effect of Multiple Query Representations on Information Retrieval Systems.” In 16th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 339–346. [50] Belkin, N., Cool, C., Kelly, D., Lee, H.-J., Muresan, G., Tang, M.-C., and Yuan, X.-J. 2003. “Query Length in Interactive Information Retrieval.” In 26th Annual International ACM Conference on Research and Development in Information Retrieval, Toronto, Canada, pp. 205–212. [51] Cronen-Townsend, S., Zhou, Y., and Croft, W.

Information overload: refers to the difficulty a person can have understanding an issue and making decisions that can be caused by the presence of too much information (see Chapter 5 customers). Information retrieval: a field of study related to information extraction. Information retrieval is about developing systems to effectively index and search vast amounts of data (Source: SearchEngineDictionary.com) (see Chapter 3 keywords). Information scent: cues related to the desired outcome (see Chapter 3 keywords). Information searching: refers to people’s interaction with information-retrieval systems, ranging from adopting search strategy to judging the relevance of information retrieved (see Chapter 3 keywords). Insertion: actual placement of an ad in a document, as recorded by the ad server (Source: IAB) (see Chapter 2 model).


Data Mining the Web: Uncovering Patterns in Web Content, Structure, and Usage by Zdravko Markov, Daniel T. Larose

Firefox, information retrieval, Internet Archive, iterative process, natural language processing, pattern recognition, random walk, recommendation engine, semantic web, sparse data, speech recognition, statistical model, William of Occam

To do this we look into the technology for text analysis and search developed earlier in the area of information retrieval and extended recently with ranking methods based on web hyperlink structure. All that may be seen as a preprocessing step in the overall process of data mining the web content, which provides the input to machine learning methods for extracting knowledge from hypertext data, discussed in the second part of the book.

CHAPTER 1: INFORMATION RETRIEVAL AND WEB SEARCH

WEB CHALLENGES
As originally proposed by Tim Berners-Lee [1], the Web was intended to improve the management of general information about accelerators and experiments at CERN.

This idea was implemented in one of the first search engines, the World Wide Web Worm system [4], and later used by Lycos and Google. This allows search engines to increase their indices with pages that have never been crawled, are unavailable, or include nontextual content that cannot be indexed, such as images and programs. As reported by Brin and Page [5] in 1998, Google indexed 24 million pages and over 259 million anchors.

EVALUATING SEARCH QUALITY
Information retrieval systems do not have formal semantics (such as that of databases), and consequently, the query and the set of documents retrieved (the response of the IR system) cannot be mapped one to one.
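Because the mapping between a query and a "correct" answer set is not formally defined, search quality is evaluated empirically against human relevance judgments, typically with precision, recall, and the F-measure. The following is a minimal sketch of that calculation for a single query; the document identifiers and judged sets are invented for illustration and do not come from the book.

# Documents judged relevant by human assessors for one query (hypothetical).
relevant = {"d1", "d3", "d5", "d8"}
# Documents actually returned by the engine, in rank order (hypothetical).
retrieved = ["d3", "d2", "d5", "d9", "d1"]

hits = [d for d in retrieved if d in relevant]
precision = len(hits) / len(retrieved)   # share of retrieved documents that are relevant
recall = len(hits) / len(relevant)       # share of relevant documents that were retrieved
f_measure = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
print(f"P={precision:.2f} R={recall:.2f} F={f_measure:.2f}")   # P=0.60 R=0.75 F=0.67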

For my children Teodora, Kalin, and Svetoslav – Z.M. For my children Chantal, Ellyriane, Tristan, and Ravel – D.T.L.

CONTENTS
PART I: WEB STRUCTURE MINING
1. Information Retrieval and Web Search: Web Challenges (Web Search Engines, Topic Directories, Semantic Web); Crawling the Web (Web Basics, Web Crawlers); Indexing and Keyword Search (Document Representation, Implementation Considerations, Relevance Ranking, Advanced Text Search, Using the HTML Structure in Keyword Search); Evaluating Search Quality; Similarity Search (Cosine Similarity, Jaccard Similarity, Document Resemblance); References; Exercises
2. Hyperlink-Based Ranking: Introduction; Social Networks Analysis; PageRank; Authorities and Hubs; Link-Based Similarity Search; Enhanced Techniques for Page Ranking; References; Exercises
PART II: WEB CONTENT MINING
3. Clustering: Introduction; Hierarchical Agglomerative Clustering; k-Means Clustering; Probability-Based Clustering (Finite Mixture Problem, Classification Problem, Clustering Problem); Collaborative Filtering (Recommender Systems); References; Exercises
4. Evaluating Clustering: Approaches to Evaluating Clustering; Similarity-Based Criterion Functions; Probabilistic Criterion Functions; MDL-Based Model and Feature Evaluation (Minimum Description Length Principle, MDL-Based Model Evaluation, Feature Selection); Classes-to-Clusters Evaluation; Precision, Recall, and F-Measure; Entropy; References; Exercises
5. Classification: General Setting and Evaluation Techniques; Nearest-Neighbor Algorithm; Feature Selection; Naive Bayes Algorithm; Numerical Approaches; Relational Learning; References; Exercises
PART III: WEB USAGE MINING
6. Introduction to Web Usage Mining: Definition of Web Usage Mining; Cross-Industry Standard Process for Data Mining; Clickstream Analysis; Web Server Log Files (Remote Host Field, Date/Time Field, HTTP Request Field, Status Code Field, Transfer Volume (Bytes) Field, Common Log Format, Identification Field, Authuser Field, Extended Common Log Format, Referrer Field, User Agent Field, Example of a Web Log Record, Microsoft IIS Log Format, Auxiliary Information); References; Exercises
7. Preprocessing for Web Usage Mining: Need for Preprocessing the Data; Data Cleaning and Filtering; Page Extension Exploration and Filtering; De-Spidering the Web Log File; User Identification; Session Identification; Path Completion; Directories and the Basket Transformation; Further Data Preprocessing Steps; References; Exercises
8. Exploratory Data Analysis for Web Usage Mining: Introduction; Number of Visit Actions; Session Duration; Relationship between Visit Actions and Session Duration; Average Time per Page; Duration for Individual Pages; References; Exercises
9. Modeling for Web Usage Mining: Clustering, Association, and Classification: Introduction; Modeling Methodology; Definition of Clustering; The BIRCH Clustering Algorithm; Affinity Analysis and the A Priori Algorithm; Discretizing the Numerical Variables: Binning; Applying the A Priori Algorithm to the CCSU Web Log Data; Classification and Regression Trees; The C4.5 Algorithm; References; Exercises
INDEX

PREFACE: DEFINING DATA MINING THE WEB
By data mining the Web, we refer to the application of data mining methodologies, techniques, and models to the variety of data forms, structures, and usage patterns that comprise the World Wide Web.


pages: 593 words: 118,995

Relevant Search: With Examples Using Elasticsearch and Solr by Doug Turnbull, John Berryman

business logic, cognitive load, commoditize, crowdsourcing, data science, domain-specific language, Dr. Strangelove, fail fast, finite state, fudge factor, full text search, heat death of the universe, information retrieval, machine readable, natural language processing, premature optimization, recommendation engine, sentiment analysis, the long tail

In reality, there is a discipline behind relevance: the academic field of information retrieval. It has generally accepted practices to improve relevance broadly across many domains. But you’ve seen that what’s relevant depends a great deal on your application. Given that, as we introduce information retrieval, think about how its general findings can be used to solve your narrower relevance problem.[2] 2 For an introduction to the field of information retrieval, we highly recommend the classic text Introduction to Information Retrieval by Christopher D. Manning et al. (Cambridge University Press, 2008); see http://nlp.stanford.edu/IR-book/. 1.3.1.

That information will solve your problem, and you’ll move on. In information retrieval, relevance is defined as the practice of returning search results that most satisfy the user’s information needs. Further, classic information retrieval focuses on text ranking. Many findings in information retrieval try to measure how likely a given article is going to be relevant to a user’s text search. You’ll learn about several of these invaluable methods throughout this book—as many of these findings are implemented in open source search engines. To discover better text-searching methods, information retrieval researchers benchmark different strategies by using test collections of articles.

Example of making a relevance judgment for the query “Rambo” in Quepid, a judgment list management application Using judgment lists, researchers aim to measure whether changes to text relevance calculations improve the overall relevance of the results across every test collection. To classic information retrieval, a solution that improves a dozen text-heavy test collections 1% overall is a success. Rather than focusing on one particular problem in depth, information retrieval focuses on solving search for a broad set of problems. 1.3.2. Can we use information retrieval to solve relevance? You’ve already seen there’s no silver bullet. But information retrieval does seem to systematically create relevance solutions. So ask yourself: Do these insights apply to your application?
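To make the judgment-list idea concrete, here is a minimal sketch of how a mean precision-at-k score could be computed over a tiny test collection. The queries, document identifiers, and judgments are invented; this is not Quepid's or the book's code, just an illustration of the benchmarking loop.

# Hypothetical judgment list: query -> {doc_id: 1 if judged relevant, else 0}.
judgments = {
    "rambo": {"m1": 1, "m2": 0, "m3": 1, "m4": 1},
    "rocky": {"m5": 1, "m6": 1, "m7": 0},
}

# Hypothetical ranked results from the engine under test.
results = {
    "rambo": ["m3", "m2", "m1"],
    "rocky": ["m7", "m5", "m6"],
}

def precision_at_k(query, ranked, k=3):
    judged = judgments[query]
    return sum(judged.get(doc, 0) for doc in ranked[:k]) / k

scores = [precision_at_k(q, ranked) for q, ranked in results.items()]
print(sum(scores) / len(scores))   # mean P@3 over the collection: (2/3 + 2/3) / 2

A change to the relevance calculation "wins" if it raises this average across the whole test collection, which is exactly the broad, one-percent-overall improvement described above.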


Data Mining: Concepts and Techniques by Jiawei Han, Micheline Kamber, Jian Pei

backpropagation, bioinformatics, business intelligence, business process, Claude Shannon: information theory, cloud computing, computer vision, correlation coefficient, cyber-physical system, database schema, discrete time, disinformation, distributed generation, finite state, industrial research laboratory, information retrieval, information security, iterative process, knowledge worker, linked data, machine readable, natural language processing, Netflix Prize, Occam's razor, pattern recognition, performance metric, phenotype, power law, random walk, recommendation engine, RFID, search costs, semantic web, seminal paper, sentiment analysis, sparse data, speech recognition, statistical model, stochastic process, supply-chain management, text mining, thinkpad, Thomas Bayes, web application

Others include Machine Learning (ML), Pattern Recognition (PR), Artificial Intelligence Journal (AI), IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), and Cognitive Science. Textbooks and reference books on information retrieval include Introduction to Information Retrieval by Manning, Raghavan, and Schütze [MRS08]; Information Retrieval: Implementing and Evaluating Search Engines by Büttcher, Clarke, and Cormack [BCC10]; Search Engines: Information Retrieval in Practice by Croft, Metzler, and Strohman [CMS09]; Modern Information Retrieval: The Concepts and Technology Behind Search by Baeza-Yates and Ribeiro-Neto [BYRN11]; and Information Retrieval: Algorithms and Heuristics by Grossman and Frieder [GR04]. Information retrieval research is published in the proceedings of several information retrieval and Web search and mining conferences, including the International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), the International World Wide Web Conference (WWW), the ACM International Conference on Web Search and Data Mining (WSDM), the ACM Conference on Information and Knowledge Management (CIKM), the European Conference on Information Retrieval (ECIR), the Text Retrieval Conference (TREC), and the ACM/IEEE Joint Conference on Digital Libraries (JCDL).

The data cube model not only facilitates OLAP in multidimensional databases but also promotes multidimensional data mining (see Section 1.3.2). 1.5.4. Information Retrieval Information retrieval (IR) is the science of searching for documents or information in documents. Documents can be text or multimedia, and may reside on the Web. The differences between traditional information retrieval and database systems are twofold: Information retrieval assumes that (1) the data under search are unstructured; and (2) the queries are formed mainly by keywords, which do not have complex structures (unlike SQL queries in database systems). The typical approaches in information retrieval adopt probabilistic models. For example, a text document can be regarded as a bag of words, that is, a multiset of words appearing in the document.
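The bag-of-words idea is literally a multiset of the document's tokens, which Python's collections.Counter captures directly. A minimal sketch (the sample sentence is invented):

from collections import Counter

document = "information retrieval searches unstructured text for information"
bag_of_words = Counter(document.lower().split())
print(bag_of_words)
# Counter({'information': 2, 'retrieval': 1, 'searches': 1, 'unstructured': 1, 'text': 1, 'for': 1})

Keyword queries are then matched against these multisets, for example by scoring how often (and how distinctively) each query term appears in each document.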

Information retrieval research is published in the proceedings of several information retrieval and Web search and mining conferences, including the International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), the International World Wide Web Conference (WWW), the ACM International Conference on Web Search and Data Mining (WSDM), the ACM Conference on Information and Knowledge Management (CIKM), the European Conference on Information Retrieval (ECIR), the Text Retrieval Conference (TREC), and the ACM/IEEE Joint Conference on Digital Libraries (JCDL). Other sources of publication include major information retrieval, information systems, and Web journals, such as Journal of Information Retrieval, ACM Transactions on Information Systems (TOIS), Information Processing and Management, Knowledge and Information Systems (KAIS), and IEEE Transactions on Knowledge and Data Engineering (TKDE). 2.


Text Analytics With Python: A Practical Real-World Approach to Gaining Actionable Insights From Your Data by Dipanjan Sarkar

bioinformatics, business intelligence, business logic, computer vision, continuous integration, data science, deep learning, Dr. Strangelove, en.wikipedia.org, functional programming, general-purpose programming language, Guido van Rossum, information retrieval, Internet of things, invention of the printing press, iterative process, language acquisition, machine readable, machine translation, natural language processing, out of africa, performance metric, premature optimization, recommendation engine, self-driving car, semantic web, sentiment analysis, speech recognition, statistical model, text mining, Turing test, web application

Important Concepts Our main objectives in this chapter are to understand text similarity and clustering. Before moving on to the actual techniques and algorithms, this section will discuss some important concepts related to information retrieval, document similarity measures, and machine learning. Even though some of these concepts might be familiar to you from the previous chapters, all of them will be useful to us as we gradually journey through this chapter. Without further ado, let’s get started. Information Retrieval (IR) Information retrieval (IR) is the process of retrieving or fetching relevant sources of information from a corpus or set of entities that hold information based on some demand.

I recommend using gensim’s hellinger() function, available in the gensim.matutils module (which uses the same logic as our preceding function) when building large-scale systems for analyzing similarity.

Okapi BM25 Ranking
There are several techniques that are quite popular in information retrieval and search engines, including PageRank and Okapi BM25. The acronym BM stands for best matching. This technique is also known as BM25, but for the sake of completeness I refer to it as Okapi BM25: although the concepts behind the BM25 function were originally merely theoretical, City University, London, built the Okapi Information Retrieval system in the 1980s–90s, which implemented this technique to retrieve documents on actual real-world data. This technique can also be called a framework or model based on probabilistic relevancy and was developed by several people in the 1970s–80s, including computer scientists S.
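As a rough illustration of the scoring idea behind Okapi BM25 (my own sketch, not the book's code), the function below implements one common form of the BM25 formula over a tiny invented corpus; k1 and b are the usual free parameters, and the idf variant shown is a smoothed one, so real systems such as Lucene or gensim may differ in detail.

import math
from collections import Counter

corpus = [
    "the okapi information retrieval system".split(),
    "bm25 ranks documents for a keyword query".split(),
    "probabilistic relevance models in information retrieval".split(),
]
N = len(corpus)
avgdl = sum(len(doc) for doc in corpus) / N
df = Counter(term for doc in corpus for term in set(doc))   # document frequencies

def bm25_score(query_terms, doc, k1=1.5, b=0.75):
    tf = Counter(doc)
    score = 0.0
    for term in query_terms:
        if term not in tf:
            continue
        idf = math.log((N - df[term] + 0.5) / (df[term] + 0.5) + 1)      # smoothed idf
        norm = tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(doc) / avgdl))
        score += idf * norm
    return score

query = "information retrieval".split()
ranking = sorted(range(N), key=lambda i: bm25_score(query, corpus[i]), reverse=True)
print(ranking)   # documents ordered by BM25 score for the query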

Automated Text Classification; Text Classification Blueprint; Text Normalization; Feature Extraction; Bag of Words Model; TF-IDF Model; Advanced Word Vectorization Models; Classification Algorithms; Multinomial Naïve Bayes; Support Vector Machines; Evaluating Classification Models; Building a Multi-Class Classification System; Applications and Uses; Summary
Chapter 5: Text Summarization
Text Summarization and Information Extraction; Important Concepts; Documents; Text Normalization; Feature Extraction; Feature Matrix; Singular Value Decomposition; Text Normalization; Feature Extraction; Keyphrase Extraction; Collocations; Weighted Tag-Based Phrase Extraction; Topic Modeling; Latent Semantic Indexing; Latent Dirichlet Allocation; Non-negative Matrix Factorization; Extracting Topics from Product Reviews; Automated Document Summarization; Latent Semantic Analysis; TextRank; Summarizing a Product Description; Summary
Chapter 6: Text Similarity and Clustering
Important Concepts; Information Retrieval (IR); Feature Engineering; Similarity Measures; Unsupervised Machine Learning Algorithms; Text Normalization; Feature Extraction; Text Similarity; Analyzing Term Similarity; Hamming Distance; Manhattan Distance; Euclidean Distance; Levenshtein Edit Distance; Cosine Distance and Similarity; Analyzing Document Similarity; Cosine Similarity; Hellinger-Bhattacharya Distance; Okapi BM25 Ranking; Document Clustering; Clustering Greatest Movies of All Time; K-means Clustering; Affinity Propagation; Ward's Agglomerative Hierarchical Clustering; Summary
Chapter 7: Semantic and Sentiment Analysis
Semantic Analysis; Exploring WordNet; Understanding Synsets; Analyzing Lexical Semantic Relations; Word Sense Disambiguation; Named Entity Recognition; Analyzing Semantic Representations; Propositional Logic; First Order Logic; Sentiment Analysis; Sentiment Analysis of IMDb Movie Reviews; Setting Up Dependencies; Preparing Datasets; Supervised Machine Learning Technique; Unsupervised Lexicon-based Techniques; Comparing Model Performances; Summary
Index
Contents at a Glance: About the Author; About the Technical Reviewer; Acknowledgments; Introduction; Chapter 1: Natural Language Basics; Chapter 2: Python Refresher; Chapter 3: Processing and Understanding Text; Chapter 4: Text Classification; Chapter 5: Text Summarization; Chapter 6: Text Similarity and Clustering; Chapter 7: Semantic and Sentiment Analysis; Index; About the Author and About the Technical Reviewer

About the Author: Dipanjan Sarkar is a data scientist at Intel, the world's largest silicon company, which is on a mission to make the world more connected and productive.


pages: 263 words: 75,610

Delete: The Virtue of Forgetting in the Digital Age by Viktor Mayer-Schönberger

digital divide, en.wikipedia.org, Erik Brynjolfsson, Firefox, full text search, George Akerlof, information asymmetry, information retrieval, information security, information trail, Internet Archive, invention of movable type, invention of the printing press, John Markoff, Joi Ito, lifelogging, moveable type in China, Network effects, packet switching, Panopticon Jeremy Bentham, pattern recognition, power law, RFID, slashdot, Steve Jobs, Steven Levy, systematic bias, The Market for Lemons, The Structural Transformation of the Public Sphere, Vannevar Bush, Yochai Benkler

See information dossiers Dutch citizen registry, 141, 157–58 DVD, 64–65, 145 eBay, 93, 95 Ecommerce, 131 Egypt, 32 Eisenstein, Elizabeth, 37, 38 e-mails: preservation of, 69 entropy, 22 epics, 25, 26, 27 European Human Rights Convention, 110 European Union Privacy Directive, 158–59, 160 exit, 99 Expedia.com, 8 expiration dates for information, 171–95, 198–99 binary nature of, 192–93 imperfection of, 194–95 negotiating, 185–89, 187 persistence of, 183–85 societal preferences for, 182–83 external memory, limitations of, 34 Facebook, 2, 3, 84, 86, 197 Feldmar, Andrew, 3–4, 5, 104–5, 109, 111, 197 Felten, Edward, 151–52, 188 fiber-optic cables, 80–81 fidelity, 60 filing systems, 74 film, 47 fingerprints, 78 First Amendment, 110 Flash memory, 63 Flickr, 84, 102, 124 flight reservation, 8 Foer, Joshua, 21 forgetting: cost of, 68, 91, 92 human, 19–20, 114–17 central importance of, 13, 21 societal, 13 forgiving, 197 Foucault, Michel, 11, 112 free-riding, 133 Friedman, Lawrence, 106 Gandy, Oscar, 11, 165 Gasser, Urs, 3, 130 “Goblin edits,” 62 Google, 2, 6–8, 70–71, 84, 103, 104, 109, 130–31, 175–78, 179, 186, 197 governmental decision-making, 94 GPS, 9 Graham, Mary, 94 Gutenberg, 37–38 hard disks, 62–63 hieroglyphs, 32 Hilton, Paris, 86 history: omnipresence of, 125 Hotmail, 69 human cognition, 154–57 “Imagined Communities,” 43 index, 73–74, 90 full-text, 76–77 information: abstract, 17 biometric, 9 bundling of, 82–83 control over, 85–87, 91, 97–112, 135–36, 140, 167–68, 181–82 deniability of, 87 decontextualization of, 78, 89–90, 142 economics of, 82–83 incompleteness of, 156 interpretation of, 96 leakages of, 105, 133–34 legally mandated retention of, 160–61 lifespan of, 172 markets for, 145–46 misuse of, 140 peer-to-peer sharing of, 84, 86 processors of, 175–78 production cost of, 82–83 property of, 143 quality of, 96–97 recall of, 18–19 recombining of, 61–62, 85, 88–90 recontextualization of, 89–90 retrieval of, 72–79 risk of collecting, 158 role of, 85 self-disclosure of, 4 sharing of, 3, 84–85 total amount of, 52 information control: relational concepts of, 153 information dossiers, 104 digital, 123–25 information ecology, 157–63 information power, 112 differences in, 107, 133, 187, 191, 192 information privacy, 100, 108, 135, 174, 181–82 effectiveness of rights to, 135–36, 139–40, 143–44 enforcement of right to, 139–40 purpose limitation principle in, 136, 138, 159 rights to, 134–44 information retrieval. See information: retrieval of information sharing: default of, 88 information storage: capacity, 66 cheap, 62–72 corporate, 68–69 density of, 71 economics of, 68 increase in, 71–72 magnetic, 62–64 optical, 64–65 relative cost of, 65–66 sequential nature of analog, 75 informational self-determination, 137 relational dimension of, 170 intellectual property (IP), 144, 146, 150, 174 Internet, 79 “future proof,” 59–60 peer-production and, 131–32 Internet archives, 4 Islam: printing in, 40 Ito, Joi, 126 Johnson, Deborah, 14 Keohane, Robert, 98 Kodak, Eastman, 45–46 Korea: printing in, 40 language, 23–28 Lasica, J.

The likely medium-term outcome is that storage capacity will continue to double and storage costs to halve about every eighteen to twenty-four months, leaving us with an abundance of cheap digital storage.

Easy Retrieval
Remembering is more than committing information to memory. It includes the ability to retrieve that information later easily and at will. As humans, we are all too familiar with the challenges of information retrieval from our brain’s long-term memory. External analog memory, like books, holds huge amounts of information, but finding a particular piece of information in it is difficult and time-consuming. Much of the latent value of stored information remains trapped, unlikely to be utilized. Even though we may have stored it, analog information that cannot be retrieved easily in practical terms is no different from having been forgotten.

In contrast, retrieval from digital memory is vastly easier, cheaper, and swifter: a few words in the search box, a click, and within a few seconds a list of matching information is retrieved and presented in neatly formatted lists. Such trouble-free retrieval greatly enhances the value of information. To be sure, humans have always tried to make information retrieval easier and less cumbersome, but they faced significant hurdles. Take written information. The switch from tablets and scrolls to bound books helped in keeping information together, and certainly improved accessibility, but it did not revolutionize retrieval. Similarly, libraries helped amass information, but didn’t do as much in tracking it down.


pages: 290 words: 73,000

Algorithms of Oppression: How Search Engines Reinforce Racism by Safiya Umoja Noble

A Declaration of the Independence of Cyberspace, affirmative action, Airbnb, algorithmic bias, Alvin Toffler, Black Lives Matter, borderless world, cloud computing, conceptual framework, critical race theory, crowdsourcing, data science, desegregation, digital divide, disinformation, Donald Trump, Edward Snowden, fake news, Filter Bubble, Firefox, Future Shock, Gabriella Coleman, gamification, Google Earth, Google Glasses, housing crisis, illegal immigration, immigration reform, information retrieval, information security, Internet Archive, Jaron Lanier, John Perry Barlow, military-industrial complex, Mitch Kapor, Naomi Klein, new economy, Northpointe / Correctional Offender Management Profiling for Alternative Sanctions, PageRank, performance metric, phenotype, profit motive, Silicon Valley, Silicon Valley ideology, Snapchat, the long tail, Tim Cook: Apple, union organizing, women in the workforce, work culture , yellow journalism

Saracevic notes that “the domain of information science is the transmission of the universe of human knowledge in recorded form, centering on manipulation (representation, organization, and retrieval) of information, rather than knowing information.”43 This foregrounds the ways that representations in search engines are decontextualized in one specific type of information-retrieval process, particularly for groups whose images, identities, and social histories are framed through forms of systemic domination. Although there is a long, broad, and historical context for addressing categorizations, the impact of learning from these traditions has not yet been fully realized.44 Attention to “the universe of human knowledge” is suggestive for contextualizing information-retrieval practices this way, leading to inquiries into the ways current information-retrieval practices on the web, via commercial search engines, make some types of information available and suppress others.

., not working at the level of code) to engage in sharing links to and from websites.31 Research shows that users typically use very few search terms when seeking information in a search engine and rarely use advanced search queries, as most queries are different from traditional offline information-seeking behavior.32 This front-end behavior of users appears to be simplistic; however, the information retrieval systems are complex, and the formulation of users’ queries involves cognitive and emotional processes that are not necessarily reflected in the system design.33 In essence, while users use the simplest queries they can in a search box because of the way interfaces are designed, this does not always reflect how search terms are mapped against more complex thought patterns and concepts that users have about a topic. This disjunction between, on the one hand, users’ queries and their real questions and, on the other, information retrieval systems makes understanding the complex linkages between the content of the results that appear in a search and their import as expressions of power and social relations of critical importance.

For this reason, it is important to study the social context of those who are organizing information and the potential impacts of the judgments inherent in informational organization processes. Information must be treated in a context; “it involves motivation or intentionality, and therefore it is connected to the expansive social context or horizon, such as culture, work, or problem-at-hand,” and this is fundamental to the origins of information science and to information retrieval.42 Information retrieval as a practice has become a highly commercialized industry, predicated on federally funded experiments and research initiatives, leading to the formation of profitable ventures such as Yahoo! and Google, and a focus on information relevance continues to be of importance to the field.


pages: 291 words: 77,596

Total Recall: How the E-Memory Revolution Will Change Everything by Gordon Bell, Jim Gemmell

airport security, Albert Einstein, book scanning, cloud computing, Computing Machinery and Intelligence, conceptual framework, Douglas Engelbart, full text search, information retrieval, invention of writing, inventory management, Isaac Newton, Ivan Sutherland, John Markoff, language acquisition, lifelogging, Menlo Park, optical character recognition, pattern recognition, performance metric, RAND corporation, RFID, semantic web, Silicon Valley, Skype, social web, statistical model, Stephen Hawking, Steve Ballmer, Steve Bannon, Ted Nelson, telepresence, Turing test, Vannevar Bush, web application

Doherty, A., C. Gurrin, G. Jones, and A. F. Smeaton. “Retrieval of Similar Travel Routes Using GPS Tracklog Place Names.” SIGIR 2006—Conference on Research and Development in Information Retrieval, Workshop on Geographic Information Retrieval, Seattle, Washington, August 6-11, 2006. Gurrin, C., A. F. Smeaton, D. Byrne, N. O’Hare, G. Jones, and N. O’Connor. “An Examination of a Large Visual Lifelog.” AIRS 2008—Asia Information Retrieval Symposium, Harbin, China, January 16-18, 2008. Lavelle, B., D. Byrne, C. Gurrin, A. F. Smeaton, and G. Jones. “Bluetooth Familiarity: Methods of Calculation, Applications and Limitations.”

“Physical Context for Just-in-Time Information Retrieval.” IEEE Transactions on Computers 52, no. 8 (August): 1011-14. ———. 1997. “The Wearable Remembrance Agent: A System for Augmented Memory.” Special Issue on Wearable Computing, Personal Technologies Journal 1:218-24. Rhodes, Bradley J. “Margin Notes: Building a Contextually Aware Associative Memory” (html), to appear in The Proceedings of the International Conference on Intelligent User Interfaces (IUI ’00), New Orleans, Louisiana, January 9-12, 2000. Rhodes, Bradley, and Pattie Maes. 2000. “Just-in-Time Information Retrieval Agents.” Special issue on the MIT Media Laboratory, IBM Systems Journal 39, nos. 3 and 4: 685-704.

Eighth RIAO Conference—Large-Scale Semantic Access to Content (Text, Image, Video and Sound), Pittsburgh, Pennsylvania, May 30-June 1, 2007. Lee, Hyowon, Alan F. Smeaton, Noel E. O’Connor, and Gareth J. F. Jones. “Adaptive Visual Summary of LifeLog Photos for Personal Information Management.” AIR 2006—First International Workshop on Adaptive Information Retrieval, Glasgow, UK, October 14, 2006. O’Conaire, C., N. O’Connor, A. F. Smeaton, and G. Jones. “Organizing a Daily Visual Diary Using Multi-Feature Clustering.” SPIE Electronic Imaging—Multimedia Content Access: Algorithms and Systems (EI121), San Jose, California, January 28-February 1, 2007. Smeaton, A.


pages: 193 words: 19,478

Memory Machines: The Evolution of Hypertext by Belinda Barnet

augmented reality, Benoit Mandelbrot, Bill Duvall, British Empire, Buckminster Fuller, Charles Babbage, Claude Shannon: information theory, collateralized debt obligation, computer age, Computer Lib, conceptual framework, Douglas Engelbart, Douglas Engelbart, game design, hiring and firing, Howard Rheingold, HyperCard, hypertext link, Ian Bogost, information retrieval, Internet Archive, John Markoff, linked data, mandelbrot fractal, Marshall McLuhan, Menlo Park, nonsequential writing, Norbert Wiener, Project Xanadu, publish or perish, Robert Metcalfe, semantic web, seminal paper, Steve Jobs, Stewart Brand, technoutopianism, Ted Nelson, the scientific method, Vannevar Bush, wikimedia commons

He protested that he was doing neither ‘information retrieval’ nor ‘electrical engineering’, but a new thing somewhere in between, and that it should be recognized as a new field of research. In our interview he remembered that: After I’d given a talk at Stanford, [three angry guys] got me later outside at a table. They said, ‘All you’re talking about is information retrieval.’ I said no. They said, ‘YES, it is, we’re professionals and we know, so we’re telling you don’t know enough so stay out of it, ’cause goddamit, you’re bollocksing it all up. You’re in engineering, not information retrieval.’ (Engelbart 1999) Computers, in large part, were still seen as number crunchers, and computer engineers had no business talking about psychology and the human beings who used these machines.

As Engelbart told the author of this book in 1999, he was often told to mind his own business and keep off well-defined turf: After I’d given a talk at Stanford, [three angry guys] got me later outside at a table. They said, ‘All you’re talking about is information retrieval.’ I said no. They said, ‘YES, it is, we’re professionals and we know, so we’re telling you don’t know enough so stay out of it, ’cause goddamit, you’re bollocksing it all up. You’re in engineering, not information retrieval.’ (Engelbart 1999) My hero; the man who never knew too much about disciplinary confines, professional flocking rules and the mere retrieval of information; the man who straps bricks to pencils, who annoys the specialists, who insists on bollocksing up the computer world in all kinds of fascinating ways.

Gleick quotes a rather different assessment of Babbage from an early twentieth-century edition of the Dictionary of National Biography: Mathematician and scientific mechanician […] obtained government grant for making a calculating machine […] but the work of construction ceased, owing to disagreements with the engineer; offered the government an improved design, which was refused on grounds of expense […] Lucasian professor of mathematics, Cambridge, but delivered no lectures. (Cited in Gleick 2011, 121) In the words of the information retrievers, Babbage seems a resounding failure, no matter if he did (undeservedly, according to the insinuation) have Newton’s chair. Perhaps biography does not belong in dictionaries. Among other blessings that came to Babbage was one of the great friendships in intellectual history, with Augusta Ada King, Countess Lovelace.


pages: 666 words: 181,495

In the Plex: How Google Thinks, Works, and Shapes Our Lives by Steven Levy

"World Economic Forum" Davos, 23andMe, AltaVista, Andy Rubin, Anne Wojcicki, Apple's 1984 Super Bowl advert, autonomous vehicles, Bill Atkinson, book scanning, Brewster Kahle, Burning Man, business process, clean water, cloud computing, crowdsourcing, Dean Kamen, discounted cash flows, don't be evil, Donald Knuth, Douglas Engelbart, Douglas Engelbart, Dutch auction, El Camino Real, Evgeny Morozov, fault tolerance, Firefox, General Magic , Gerard Salton, Gerard Salton, Google bus, Google Chrome, Google Earth, Googley, high-speed rail, HyperCard, hypertext link, IBM and the Holocaust, informal economy, information retrieval, Internet Archive, Jeff Bezos, John Markoff, Ken Thompson, Kevin Kelly, Kickstarter, large language model, machine translation, Mark Zuckerberg, Menlo Park, one-China policy, optical character recognition, PageRank, PalmPilot, Paul Buchheit, Potemkin village, prediction markets, Project Xanadu, recommendation engine, risk tolerance, Rubik’s Cube, Sand Hill Road, Saturday Night Live, search inside the book, second-price auction, selection bias, Sheryl Sandberg, Silicon Valley, SimCity, skunkworks, Skype, slashdot, social graph, social software, social web, spectrum auction, speech recognition, statistical model, Steve Ballmer, Steve Jobs, Steven Levy, subscription business, Susan Wojcicki, Ted Nelson, telemarketer, The future is already here, the long tail, trade route, traveling salesman, turn-by-turn navigation, undersea cable, Vannevar Bush, web application, WikiLeaks, Y Combinator

AltaVista’s actual search quality techniques—what determined the ranking of results—were based on traditional information retrieval (IR) algorithms. Many of those algorithms arose from the work of one man, a refugee from Nazi Germany named Gerard Salton, who had come to America, got a PhD at Harvard, and moved to Cornell University, where he cofounded its computer science department. Searching through databases using the same commands you’d use with a human—“natural language” became the term of art—was Salton’s specialty. During the 1960s, Salton developed a system that was to become a model for information retrieval. It was called SMART, supposedly an acronym for “Salton’s Magical Retriever of Text.”

Page’s brother, nine years older, was already in Silicon Valley, working for an Internet start-up. Page chose to work in the department’s Human-Computer Interaction Group. The subject would stand Page in good stead in the future with respect to product development, even though it was not in the HCI domain to figure out a new model of information retrieval. On his desk and permeating his conversations was Apple interface guru Donald Norman’s classic tome The Psychology of Everyday Things, the bible of a religion whose first, and arguably only, commandment is “The user is always right.” (Other Norman disciples, such as Jeff Bezos at Amazon.com, were adopting this creed on the web.)

DEC had been built on the minicomputer, a once innovative category now rendered a dinosaur by the personal computer revolution. “DEC was very much living in the past,” says Monier. “But they had small groups of people who were very forward-thinking, experimenting with lots of toys.” One of those toys was the web. Monier himself was no expert in information retrieval but a big fan of data in the abstract. “To me, that was the secret—data,” he says. What the data was telling him was that if you had the right tools, it was possible to treat everything in the open web like a single document. Even at that early date, the basic building blocks of web search had been already set in stone.


pages: 1,085 words: 219,144

Solr in Action by Trey Grainger, Timothy Potter

business intelligence, cloud computing, commoditize, conceptual framework, crowdsourcing, data acquisition, data science, en.wikipedia.org, failed state, fault tolerance, finite state, full text search, functional programming, glass ceiling, information retrieval, machine readable, natural language processing, openstreetmap, performance metric, premature optimization, recommendation engine, web application

To begin, we need to know how Solr matches home listings in the index to queries entered by users, as this is the basis for all search applications. 1.2.1. Information retrieval engine Solr is built on Apache Lucene, a popular, Java-based, open source, information retrieval library. We’ll save a detailed discussion of what information retrieval is for chapter 3. For now, we’ll touch on the key concepts behind information retrieval, starting with the formal definition taken from one of the prominent academic texts on modern search concepts: Information retrieval (IR) is finding material (usually documents) of an unstructured nature (usually text) that satisfies an information need from within large collections (usually stored on computers).[1] 1 Christopher D.
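As a concrete (and hypothetical) illustration of what "finding material that satisfies an information need" looks like against Solr, a keyword query can be sent to the standard select request handler over HTTP. The core name "listings", the field names, and the local URL below are assumptions made up for this sketch, not details from the book.

import json
import urllib.parse
import urllib.request

# Query a hypothetical local Solr core named "listings" for matching home listings.
params = urllib.parse.urlencode({
    "q": "description:(three bedroom near park)",   # keyword query against a text field
    "fl": "id,address,price",                        # fields to return
    "rows": 10,
    "wt": "json",
})
url = "http://localhost:8983/solr/listings/select?" + params

with urllib.request.urlopen(url) as resp:
    results = json.load(resp)

for doc in results["response"]["docs"]:
    print(doc)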

IBSimilarity class ICUFoldingFilterFactory idf (inverse document frequency), 2nd, 3rd, 4th if function implicit routing importing documents common formats DIH ExtractingRequestHandler Nutch relational database data using JSON using SolrJ library using XML Inactive state incremental indexing indent parameter indexlog utility IndicNormalizationFilterFactory Indonesian language IndonesianStemFilterFactory information discovery use case information retrieval. See IR. installing Solr instanceDir parameter <int> element Integrated Development Environment. See IDE. IntelliJ IDEA internationalization. See multilingual search. Intersects operation invalidating cached objects invariants section inverse document frequency. See idf. inverted index ordering of terms overview IR (information retrieval) Irish language IrishLowerCaseFilterFactory, 2nd IsDisjointTo operation IsWithin operation Italian language ItalianLightStemFilterFactory J J2EE (Java 2 Platform, Enterprise Edition) Japanese language, 2nd JapaneseBaseFormFilterFactory JapaneseKatakanaStemFilterFactory JapaneseTokenizerFactory JAR files Java 2 Platform, Enterprise Edition.

Useful data import configurations; Index; List of Figures; List of Tables; List of Listings

Table of Contents
Copyright; Brief Table of Contents; Table of Contents; Foreword; Preface; Acknowledgments; About this Book
1. Meet Solr
Chapter 1. Introduction to Solr
1.1. Why do I need a search engine?
1.1.1. Managing text-centric data
1.1.2. Common search-engine use cases
1.2. What is Solr?
1.2.1. Information retrieval engine
1.2.2. Flexible schema management
1.2.3. Java web application
1.2.4. Multiple indexes in one server
1.2.5. Extendable (plugins)
1.2.6. Scalable
1.2.7. Fault-tolerant
1.3. Why Solr?
1.3.1. Solr for the software architect
1.3.2. Solr for the system administrator
1.3.3.


pages: 541 words: 109,698

Mining the Social Web: Finding Needles in the Social Haystack by Matthew A. Russell

Andy Rubin, business logic, Climategate, cloud computing, crowdsourcing, data science, en.wikipedia.org, fault tolerance, Firefox, folksonomy, full text search, Georg Cantor, Google Earth, information retrieval, machine readable, Mark Zuckerberg, natural language processing, NP-complete, power law, Saturday Night Live, semantic web, Silicon Valley, slashdot, social graph, social web, sparse data, statistical model, Steve Jobs, supply-chain management, text mining, traveling salesman, Turing test, web application

Text Mining Fundamentals Although rigorous approaches to natural language processing (NLP) that include such things as sentence segmentation, tokenization, word chunking, and entity detection are necessary in order to achieve the deepest possible understanding of textual data, it’s helpful to first introduce some fundamentals from Information Retrieval theory. The remainder of this chapter introduces some of its more foundational aspects, including TF-IDF, the cosine similarity metric, and some of the theory behind collocation detection. Chapter 8 provides a deeper discussion of NLP. Note If you want to dig deeper into IR theory, the full text of Introduction to Information Retrieval is available online and provides more information than you could ever want to know about the field. A Whiz-Bang Introduction to TF-IDF Information retrieval is an extensive field with many specialties.
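As a compact, self-contained sketch of the two ideas just mentioned, the code below computes TF-IDF weights and the cosine similarity between small invented documents. It is an illustration of the general technique, not the book's implementation, and it uses one common (unsmoothed) idf variant.

import math
from collections import Counter

docs = [
    "the cosine similarity of tf idf vectors",
    "tf idf weighs rare terms more heavily",
    "collocation detection finds frequent word pairs",
]
tokenized = [d.split() for d in docs]
N = len(tokenized)
df = Counter(t for doc in tokenized for t in set(doc))   # document frequency per term

def tf_idf(doc):
    tf = Counter(doc)
    return {t: (tf[t] / len(doc)) * math.log(N / df[t]) for t in tf}

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

vectors = [tf_idf(doc) for doc in tokenized]
print(cosine(vectors[0], vectors[1]))   # > 0: the documents share "tf" and "idf"
print(cosine(vectors[0], vectors[2]))   # 0.0: no terms in common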

I identity consolidation, Brief analysis of breadth-first techniques IDF (inverse document frequency), A Whiz-Bang Introduction to TF-IDF, A Whiz-Bang Introduction to TF-IDF (see also TF-IDF) calculation of, A Whiz-Bang Introduction to TF-IDF idf function, A Whiz-Bang Introduction to TF-IDF IETF OAuth 2.0 protocol, No, You Can’t Have My Password IMAP (Internet Message Access Protocol), Analyzing Your Own Mail Data, Accessing Gmail with OAuth, Fetching and Parsing Email Messages connecting to, using OAuth, Accessing Gmail with OAuth constructing an IMAP query, Fetching and Parsing Email Messages imaplib, Fetching and Parsing Email Messages ImportError, Installing Python Development Tools indexing function, JavaScript-based, couchdb-lucene: Full-Text Indexing and More inference, Open-World Versus Closed-World Assumptions, Inferencing About an Open World with FuXi application to machine knowledge, Inferencing About an Open World with FuXi in logic-based programming languages and RDF, Open-World Versus Closed-World Assumptions influence, measuring for Twitter users, Measuring Influence, Measuring Influence, Measuring Influence, Measuring Influence calculating Twitterer’s most popular followers, Measuring Influence crawling friends/followers connections, Measuring Influence Infochimps, Strong Links API, The Infochimps “Strong Links” API, Interactive 3D Graph Visualization information retrieval industry, Before You Go Off and Try to Build a Search Engine… information retrieval theory, Text Mining Fundamentals (see IR theory) intelligent clustering, Intelligent clustering enables compelling user experiences interactive 3D graph visualization, Interactive 3D Graph Visualization interactive 3D tag clouds for tweet entities co-occurring with #JustinBieber and #TeaParty, Visualizing Tweets with Tricked-Out Tag Clouds interpreter, Python (IPython), Closing Remarks intersection operations, Elementary Set Operations, How Much Overlap Exists Between the Entities of #TeaParty and #JustinBieber Tweets?

For comparative purposes, note that it’s certainly possible to perform text-based indexing by writing a simple mapping function that associates keywords and documents, like the one in Example 3-10.

Example 3-10. A mapper that tokenizes documents

def tokenizingMapper(doc):
    tokens = doc.split()
    for token in tokens:
        if isInteresting(token):  # Filter out stop words, etc.
            yield token, doc

However, you’ll quickly find that you need to do a lot more homework about basic Information Retrieval (IR) concepts if you want to establish a good scoring function to rank documents by relevance or anything beyond basic frequency analysis. Fortunately, the benefits of Lucene are many, and chances are good that you’ll want to use couchdb-lucene instead of writing your own mapping function for full-text indexing.


pages: 721 words: 197,134

Data Mining: Concepts, Models, Methods, and Algorithms by Mehmed Kantardzić

Albert Einstein, algorithmic bias, backpropagation, bioinformatics, business cycle, business intelligence, business process, butter production in bangladesh, combinatorial explosion, computer vision, conceptual framework, correlation coefficient, correlation does not imply causation, data acquisition, discrete time, El Camino Real, fault tolerance, finite state, Gini coefficient, information retrieval, Internet Archive, inventory management, iterative process, knowledge worker, linked data, loose coupling, Menlo Park, natural language processing, Netflix Prize, NP-complete, PageRank, pattern recognition, peer-to-peer, phenotype, random walk, RFID, semantic web, speech recognition, statistical model, Telecommunications Act of 1996, telemarketer, text mining, traveling salesman, web application

For readers interested in practical implementation of some clustering methods, the paper offers useful advice and a large spectrum of references. Miyamoto, S., Fuzzy Sets in Information Retrieval and Cluster Analysis, Kluwer Academic Publishers, Dordrecht, The Netherlands, 1990. This book offers an in-depth presentation and analysis of some clustering algorithms and reviews the possibilities of combining these techniques with fuzzy representation of data. Information retrieval, which, with the development of advanced Web-mining techniques, is becoming more important in the data-mining community, is also explained in the book.

10 ASSOCIATION RULES
Chapter Objectives
Explain the local modeling character of association-rule techniques.

Any researcher or practitioner in this field needs to be aware of these issues in order to successfully apply a particular methodology, to understand a method’s limitations, or to develop new techniques. This book is an attempt to present and discuss such issues and principles and then describe representative and popular methods originating from statistics, machine learning, computer graphics, data bases, information retrieval, neural networks, fuzzy logic, and evolutionary computation. In this book, we describe how best to prepare environments for performing data mining and discuss approaches that have proven to be critical in revealing important patterns, trends, and models in large data sets. It is our expectation that once a reader has completed this text, he or she will be able to initiate and perform basic activities in all phases of a data mining process successfully and effectively.

Among the various methods of supervised learning, the nearest neighbor classifier achieves consistently high performance, without a priori assumptions about the distributions from which the training examples are drawn. The reader may have noticed the similarity between the problem of finding nearest neighbors for a test sample and ad hoc retrieval methodologies. In standard information retrieval systems such as digital libraries or web search, we search for the documents (samples) with the highest similarity to the query document represented by a set of key words. Problems are similar, and often the proposed solutions are applicable in both disciplines. Decision boundaries in 1NN are concatenated segments of the Voronoi diagram as shown in Figure 4.28.
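The retrieval analogy can be made concrete with a minimal nearest-neighbor sketch: classifying a test sample amounts to "querying" the stored training samples and returning the label of the closest one. The toy feature vectors, labels, and the choice of Euclidean distance below are my own illustration, not an example from the book.

import math

# Toy labeled training samples: (feature vector, class label), invented for illustration.
training = [
    ((1.0, 1.2), "spam"),
    ((0.9, 0.8), "spam"),
    ((4.1, 3.9), "ham"),
    ((3.8, 4.2), "ham"),
]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def classify_1nn(sample):
    # Like an ad hoc retrieval query: find the single most similar stored sample.
    _, label = min(training, key=lambda pair: euclidean(pair[0], sample))
    return label

print(classify_1nn((1.1, 0.9)))   # -> spam
print(classify_1nn((4.0, 4.0)))   # -> ham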


pages: 481 words: 121,669

The Invisible Web: Uncovering Information Sources Search Engines Can't See by Gary Price, Chris Sherman, Danny Sullivan

AltaVista, American Society of Civil Engineers: Report Card, Bill Atkinson, bioinformatics, Brewster Kahle, business intelligence, dark matter, Donald Davies, Douglas Engelbart, Douglas Engelbart, full text search, HyperCard, hypertext link, information retrieval, Internet Archive, it's over 9,000, joint-stock company, knowledge worker, machine readable, machine translation, natural language processing, pre–internet, profit motive, Project Xanadu, publish or perish, search engine result page, side project, Silicon Valley, speech recognition, stealth mode startup, Ted Nelson, Vannevar Bush, web application

As more and more computers connected to the Internet, users began to demand tools that would allow them to search for and locate text and other files on computers anywhere on the Net. Early Net Search Tools Although sophisticated search and information retrieval techniques date back to the late 1950s and early ‘60s, these techniques were used primarily in closed or proprietary systems. Early Internet search and retrieval tools lacked even the most basic capabilities, primarily because it was thought that traditional information retrieval techniques would not work well on an open, unstructured information universe like the Internet. Accessing a file on the Internet was a two-part process.

What was needed was an automated approach to Web page discovery and indexing. The Web had now grown large enough that information scientists became interested in creating search services specifically for the Web. Sophisticated information retrieval techniques had been available since the early 1960s, but they were only effective when searching closed, relatively structured databases. The open, laissez-faire nature of the Web made it too messy to easily adapt traditional information retrieval techniques. New, Web-centric approaches were needed. But how best to approach the problem? Web search would clearly have to be more sophisticated than a simple Archie-type service.

But in the early days of the Web, the reality was that most of the Web consisted of simple HTML text documents. Since few servers offered local site search services, developers of the first Web search engines opted for the model of indexing the full text of pages stored on Web servers. To adapt traditional information retrieval techniques to Web search, they built huge databases that attempted to replicate the Web, searching over these relatively controlled, closed archives of pages rather than trying to search the Web itself in real time. With this fateful architectural decision, limiting search engines to HTML text documents and essentially ignoring all other types of data available via the Web, the Invisible Web was born.


pages: 504 words: 89,238

Natural language processing with Python by Steven Bird, Ewan Klein, Edward Loper

bioinformatics, business intelligence, business logic, Computing Machinery and Intelligence, conceptual framework, Donald Knuth, duck typing, elephant in my pajamas, en.wikipedia.org, finite state, Firefox, functional programming, Guido van Rossum, higher-order functions, information retrieval, language acquisition, lolcat, machine translation, Menlo Park, natural language processing, P = NP, search inside the book, sparse data, speech recognition, statistical model, text mining, Turing test, W. E. B. Du Bois

While named entity recognition is frequently a prelude to identifying relations in Information Extraction, it can also contribute to other tasks. For example, in Question Answering (QA), we try to improve the precision of Information Retrieval by recovering not whole pages, but just those parts which contain an answer to the user’s question. Most QA systems take the documents returned by standard Information Retrieval, and then attempt to isolate the minimal text snippet in the document containing the answer. Now suppose the question was Who was the first President of the US?, and one of the documents that was retrieved contained the following passage: (5) The Washington Monument is the most prominent structure in Washington, D.C. and one of the city’s early attractions.
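As a rough sketch of how NLTK's named entity recognizer could be used to pull candidate entities out of a retrieved passage like (5), the snippet below tokenizes, POS-tags, and NE-chunks the text. It is an illustration rather than the book's own code, and it assumes the standard NLTK data packages (tokenizer, tagger, and NE chunker models) have already been downloaded.

import nltk

passage = ("The Washington Monument is the most prominent structure in "
           "Washington, D.C. and one of the city's early attractions.")

tokens = nltk.word_tokenize(passage)
tagged = nltk.pos_tag(tokens)
tree = nltk.ne_chunk(tagged)

# Collect (entity text, entity label) pairs from the chunk tree; the exact
# labels assigned depend on the bundled NE chunker model.
entities = [(" ".join(word for word, tag in subtree.leaves()), subtree.label())
            for subtree in tree.subtrees()
            if subtree.label() != "S"]
print(entities)

A QA system could then keep only snippets whose entities are of the type the question asks for (here, a PERSON for "Who was the first President of the US?").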

[General Index of the book, pages 466–479 of the back matter; its information retrieval entries cover POS tagging in information retrieval and the precision/recall trade-off in information retrieval.]

About the Authors: Steven Bird is Associate Professor in the Department of Computer Science and Software Engineering at the University of Melbourne, and Senior Research Associate in the Linguistic Data Consortium at the University of Pennsylvania.

Further readings in quantitative data analysis in linguistics are: (Baayen, 2008), (Gries, 2009), and (Woods, Fletcher, & Hughes, 1986). The original description of WordNet is (Fellbaum, 1998). Although WordNet was originally developed for research in psycholinguistics, it is now widely used in NLP and Information Retrieval. WordNets are being developed for many other languages, as documented at http://www.globalwordnet.org/. For a study of WordNet similarity measures, see (Budanitsky & Hirst, 2006). Other topics touched on in this chapter were phonetics and lexical semantics, and we refer readers to Chapters 7 and 20 of (Jurafsky & Martin, 2008). 2.8 Exercises 1. ○ Create a variable phrase containing a list of words.
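For readers who want to experiment with the WordNet similarity measures mentioned above, a minimal NLTK sketch (assuming the WordNet data has already been installed with nltk.download('wordnet')) looks like this:

    # Sketch: path-based similarity between two WordNet synsets.
    from nltk.corpus import wordnet as wn

    dog = wn.synset('dog.n.01')
    cat = wn.synset('cat.n.01')

    # Path similarity lies in (0, 1]; higher means closer in the hypernym hierarchy.
    print(dog.path_similarity(cat))

    # One step up the hypernym hierarchy for 'dog'.
    print(dog.hypernyms())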


Bootstrapping: Douglas Engelbart, Coevolution, and the Origins of Personal Computing (Writing Science) by Thierry Bardini

Apple II, augmented reality, Bill Duvall, Charles Babbage, classic study, Compatible Time-Sharing System, Computing Machinery and Intelligence, conceptual framework, Donald Davies, Douglas Engelbart, Douglas Engelbart, Dynabook, experimental subject, Grace Hopper, hiring and firing, hypertext link, index card, information retrieval, invention of hypertext, Ivan Sutherland, Jaron Lanier, Jeff Rulifson, John von Neumann, knowledge worker, Leonard Kleinrock, Menlo Park, military-industrial complex, Mother of all demos, Multics, new economy, Norbert Wiener, Norman Mailer, packet switching, Project Xanadu, QWERTY keyboard, Ralph Waldo Emerson, RAND corporation, RFC: Request For Comment, Sapir-Whorf hypothesis, Silicon Valley, Steve Crocker, Steve Jobs, Steve Wozniak, Steven Levy, Stewart Brand, stochastic process, Ted Nelson, the medium is the message, theory of mind, Turing test, unbiased observer, Vannevar Bush, Whole Earth Catalog, work culture

I was trying to explain what I wanted to do and one guy just kept telling me, "You are just giving fancy names to information retrieval. Why do that? Why don't you just admit that it's information retrieval and get on with the rest of it and make it all work?" He was getting kind of nasty. The other guy was trying to get him to back off. (Engelbart 1996) It seems difficult to dispute, therefore, that the Memex was not conceived as a medium, only as a personal "tool" for information retrieval. Personal access to information was emphasized over communication. The later research of Ted Nelson on hypertext is very representative of that emphasis. It is problematic, however, to grant Bush the status of the "unique forefather" of computerized hypertext systems.

The regnant term at the time for what Bush was proposing was indeed "information retrieval," and Engelbart himself has testified to the power that a preconceived notion of information retrieval held for creating misunderstanding of his work on hypertext networks: I started trying to reach out to make connections in domains of interest and concerns out there that fit along the vector I was interested in. I went to the information retrieval people. I remember one instance when I went to the Ford Foundation's Center for Advanced Study in Social Sciences to see somebody who was there for a year, who was into information retrieval. We sat around. In fact, at coffee break, there were about five people sitting there.

The difference in objectives signals the difference in means that characterized the two approaches. The first revolved around the "association" of ideas on the model of how the individual mind is supposed to work. The second revolved around the intersubjective "connection" of words in the systems of natural languages. What actually differentiates hypertext systems from information-retrieval systems is not the process of "association," the term Bush proposed as analogous to the way the individual mind works. Instead, what constitutes a hypertext system is clear in the definition of hypertext already cited: "a style of building systems for information representation and management around a network of nodes connected together by typed links."
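That definition, a network of nodes connected by typed links, is easy to make concrete. The sketch below is ours, with made-up node names and link types, purely to illustrate the data structure rather than any historical system.

    # Sketch: a hypertext store as nodes joined by typed links (illustrative data only).
    from collections import defaultdict

    class Hypertext:
        def __init__(self):
            self.nodes = {}                     # node id -> content
            self.links = defaultdict(list)      # node id -> [(link type, target id)]

        def add_node(self, node_id, content):
            self.nodes[node_id] = content

        def add_link(self, source, target, link_type):
            self.links[source].append((link_type, target))

        def follow(self, node_id, link_type):
            """Nodes reachable from node_id over links of the given type."""
            return [target for lt, target in self.links[node_id] if lt == link_type]

    # Hypothetical usage:
    ht = Hypertext()
    ht.add_node("memex", "Bush's associative memory device")
    ht.add_node("nls", "Engelbart's oN-Line System")
    ht.add_link("memex", "nls", "influenced")
    print(ht.follow("memex", "influenced"))     # -> ['nls']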


pages: 502 words: 107,510

Natural Language Annotation for Machine Learning by James Pustejovsky, Amber Stubbs

Amazon Mechanical Turk, bioinformatics, cloud computing, computer vision, crowdsourcing, easy for humans, difficult for computers, finite state, Free Software Foundation, game design, information retrieval, iterative process, language acquisition, machine readable, machine translation, natural language processing, pattern recognition, performance metric, power law, sentiment analysis, social web, sparse data, speech recognition, statistical model, text mining

Literary researchers begin compiling systematic collections of the complete works of different authors. Key Word in Context (KWIC) is invented as a means of indexing documents and creating concordances.
1960s: Kucera and Francis publish A Standard Corpus of Present-Day American English (the Brown Corpus), the first broadly available large corpus of language texts. Work in Information Retrieval (IR) develops techniques for statistical similarity of document content.
1970s: Stochastic models developed from speech corpora make Speech Recognition systems possible. The vector space model is developed for document indexing. The London-Lund Corpus (LLC) is developed through the work of the Survey of English Usage.
1980s: The Lancaster-Oslo-Bergen (LOB) Corpus, designed to match the Brown Corpus in terms of size and genres, is compiled.

They are also used in speech disambiguation—if a person speaks unclearly but utters a sequence that does not commonly (or ever) occur in the language being spoken, an n-gram model can help recognize that problem and find the words that the speaker probably intended to say. Another modern corpus is ClueWeb09 (http://lemurproject.org/clueweb09.php/), a dataset “created to support research on information retrieval and related human language technologies. It consists of about 1 billion web pages in ten languages that were collected in January and February 2009.” This corpus is too large to use for an annotation project (it’s about 25 terabytes uncompressed), but some projects have taken parts of the dataset (such as a subset of the English websites) and used them for research (Pomikálek et al. 2012).

So the first word in the ranking occurs about twice as often as the second word in the ranking, and three times as often as the third word in the ranking, and so on.
N-grams
In this section we introduce the notion of an n-gram. N-grams are important for a wide range of applications in Natural Language Processing (NLP), because fairly straightforward language models can be built using them, for speech, Machine Translation, indexing, Information Retrieval (IR), and, as we will see, classification. Imagine that we have a string of tokens, W, consisting of the elements w1, w2, … , wn. Now consider a sliding window over W. If the sliding window consists of one cell (wi), then the collection of one-cell substrings is called the unigram profile of the string; there will be as many unigrams in the profile as there are elements in the string.
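A minimal sketch of that sliding window, with a sample sentence and function name of our own choosing, shows how unigram and bigram profiles fall out of the same loop:

    # Sketch: n-gram profiles of a token string via a sliding window.
    def ngrams(tokens, n):
        """Return every n-token window over the sequence, in order."""
        return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

    w = "the cat sat on the mat".split()
    print(ngrams(w, 1))   # unigrams: one window per token
    print(ngrams(w, 2))   # bigrams: ('the', 'cat'), ('cat', 'sat'), ...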


pages: 392 words: 108,745

Talk to Me: How Voice Computing Will Transform the Way We Live, Work, and Think by James Vlahos

Albert Einstein, AltaVista, Amazon Mechanical Turk, Amazon Web Services, augmented reality, Automated Insights, autonomous vehicles, backpropagation, Big Tech, Cambridge Analytica, Chuck Templeton: OpenTable:, cloud computing, Colossal Cave Adventure, computer age, deep learning, DeepMind, Donald Trump, Elon Musk, fake news, Geoffrey Hinton, information retrieval, Internet of things, Jacques de Vaucanson, Jeff Bezos, lateral thinking, Loebner Prize, machine readable, machine translation, Machine translation of "The spirit is willing, but the flesh is weak." to Russian and back, Mark Zuckerberg, Menlo Park, natural language processing, Neal Stephenson, Neil Armstrong, OpenAI, PageRank, pattern recognition, Ponzi scheme, randomized controlled trial, Ray Kurzweil, Ronald Reagan, Rubik’s Cube, self-driving car, sentiment analysis, Silicon Valley, Skype, Snapchat, speech recognition, statistical model, Steve Jobs, Steve Wozniak, Steven Levy, TechCrunch disrupt, Turing test, Watson beat the top human players on Jeopardy!

[Index entries from the book's back matter, H through M; those relevant here include: information retrieval (IR), 103–4, 146, 149–50, 160; internet search technology, 3, 26, 54, 199–200, 203, 212, 278.]

The simplest method for getting it to reply is for it to fire off a line of dialogue that its programmer authored in advance. People from Weizenbaum on have done this; even Siri, Alexa, and the Assistant use some prescripted content. But this technique is laborious and limited to the narrow pool of conversational situations designers imagine in advance. A more scalable technique is information retrieval, or IR, in which the AI grabs a suitable response from a database or web page. Because there’s so much content online, IR gives machines vastly more to say than if they were limited to hand-authored utterances. The technique can also be combined with the scripted approach, filling blanks within prewritten templates.

For instance, responding to a question about the weather, a voice assistant might say, “It’ll be sunny with a high of 78. Looks like a great day to go outside!” In that case, the specifics (“sunny,” “78”) were retrieved from a weather service while the surrounding words (“great day to go outside”) were manually authored as reusable boilerplate. Voice AI creators use information retrieval more than any other technique, and IR will pop up again later in this book. So we will focus now on an intriguing new method in which responses are neither written out in advance nor cherry-picked from some preexisting source. For what are known as generative methods, computers use deep learning to come up with words all on their own.
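A minimal sketch of that combination, retrieved slot values dropped into a prewritten template, might look like the following; the weather lookup is faked with a dictionary and every name is an illustrative assumption.

    # Sketch: information retrieval plus a hand-authored response template.
    def get_weather(city):
        """Stand-in for a call to a real weather service (hypothetical data)."""
        return {"condition": "sunny", "high_f": 78}

    TEMPLATE = "It'll be {condition} with a high of {high_f}. Looks like a great day to go outside!"

    def answer_weather_question(city):
        # The specifics come from retrieval; the surrounding words are reusable boilerplate.
        facts = get_weather(city)
        return TEMPLATE.format(**facts)

    print(answer_weather_question("San Francisco"))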


pages: 205 words: 20,452

Data Mining in Time Series Databases by Mark Last, Abraham Kandel, Horst Bunke

backpropagation, call centre, computer vision, discrete time, G4S, information retrieval, iterative process, NP-complete, p-value, pattern recognition, random walk, sensor fusion, speech recognition, web application

Fast Similarity Search in the Presence of Noise, Scaling, and Translation in Time-Series Database. Proc. 21st Int. Conf. on Very Large Databases (VLDB), pp. 490–501.
3. Baeza-Yates, R. and Gonnet, G.H. (1999). A Fast Algorithm on Average for All-Against-All Sequence Matching. Proc. 6th String Processing and Information Retrieval Symposium (SPIRE), pp. 16–23.
4. Baeza-Yates, R. and Ribeiro-Neto, B. (1999). Modern Information Retrieval. ACM Press/Addison–Wesley Longman Limited.
5. Chakrabarti, K. and Mehrotra, S. (1999). The Hybrid Tree: An Index Structure for High Dimensional Feature Spaces. Proc. 15th Int. Conf. on Data Engineering (ICDE), pp. 440–447.
6. Chan, K. and Fu, A.W. (1999).

Proceedings of the 4th International Conference of Knowledge Discovery and Data Mining, AAAI Press, pp. 239–241.
14. Keogh, E. and Pazzani, M. (1999). Relevance Feedback Retrieval of Time Series Data. Proceedings of the 22nd Annual International ACM-SIGIR Conference on Research and Development in Information Retrieval, pp. 183–190.
15. Keogh, E. and Smyth, P. (1997). A Probabilistic Approach to Fast Pattern Matching in Time Series Databases. Proceedings of the 3rd International Conference of Knowledge Discovery and Data Mining, pp. 24–20.
16. Last, M., Klein, Y., and Kandel, A. (2001). Knowledge Discovery in Time Series Databases.

Such similarity-based retrieval has attracted a great deal of attention in recent years. Although several different approaches have appeared, most are based on the common premise of dimensionality reduction and spatial access methods. This chapter gives an overview of recent research and shows how the methods fit into a general context of signature extraction.
Keywords: Information retrieval; sequence databases; similarity search; spatial indexing; time sequences.
1. Introduction
Time sequences arise in many applications—any applications that involve storing sensor inputs, or sampling a value that changes over time. A problem which has received an increasing amount of attention lately is the problem of similarity retrieval in databases of time sequences, so-called “query by example.”
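As one concrete instance of that premise, the sketch below reduces each sequence to a piecewise-aggregate signature (the mean of each of a few equal segments) and compares signatures by Euclidean distance; the segment count and the synthetic data are our assumptions, not the method of any single paper surveyed in the chapter.

    # Sketch: signature extraction for time-series similarity search
    # (piecewise aggregate approximation + Euclidean distance).
    import numpy as np

    def paa_signature(series, n_segments=8):
        """Reduce a 1-D sequence to the mean of each of n_segments equal chunks."""
        chunks = np.array_split(np.asarray(series, dtype=float), n_segments)
        return np.array([chunk.mean() for chunk in chunks])

    def signature_distance(a, b, n_segments=8):
        """Euclidean distance between the low-dimensional signatures of two series."""
        return float(np.linalg.norm(paa_signature(a, n_segments) - paa_signature(b, n_segments)))

    # Hypothetical query-by-example over a small database of random-walk sequences:
    rng = np.random.default_rng(0)
    database = [np.cumsum(rng.normal(size=256)) for _ in range(5)]
    query = database[2] + rng.normal(scale=0.1, size=256)
    best = min(range(len(database)), key=lambda i: signature_distance(query, database[i]))
    print("closest sequence index:", best)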


pages: 1,082 words: 87,792

Python for Algorithmic Trading: From Idea to Cloud Deployment by Yves Hilpisch

algorithmic trading, Amazon Web Services, automated trading system, backtesting, barriers to entry, bitcoin, Brownian motion, cloud computing, coronavirus, cryptocurrency, data science, deep learning, Edward Thorp, fiat currency, global macro, Gordon Gekko, Guido van Rossum, implied volatility, information retrieval, margin call, market microstructure, Myron Scholes, natural language processing, paper trading, passive investing, popular electronics, prediction markets, quantitative trading / quantitative finance, random walk, risk free rate, risk/return, Rubik’s Cube, seminal paper, Sharpe ratio, short selling, sorting algorithm, systematic trading, transaction costs, value at risk

Index A absolute maximum drawdown, Case Study AdaBoost algorithm, Vectorized Backtesting addition (+) operator, Data Types adjusted return appraisal ratio, Algorithmic Trading algorithmic trading (generally)advantages of, Algorithmic Trading basics, Algorithmic Trading-Algorithmic Trading strategies, Trading Strategies-Conclusions alpha seeking strategies, Trading Strategies alpha, defined, Algorithmic Trading anonymous functions, Python Idioms API key, for data sets, Working with Open Data Sources-Working with Open Data Sources Apple, Inc.intraday stock prices, Getting into the Basics reading stock price data from different sources, Reading Financial Data From Different Sources-Reading from Excel and JSON retrieving historical unstructured data about, Retrieving Historical Unstructured Data-Retrieving Historical Unstructured Data app_key, for Eikon Data API, Eikon Data API AQR Capital Management, pandas and the DataFrame Class arithmetic operations, Data Types array programming, Making Use of Vectorization(see also vectorization) automated trading operations, Automating Trading Operations-Strategy Monitoringcapital management, Capital Management-Kelly Criterion for Stocks and Indices configuring Oanda account, Configuring Oanda Account hardware setup, Setting Up the Hardware infrastructure and deployment, Infrastructure and Deployment logging and monitoring, Logging and Monitoring-Logging and Monitoring ML-based trading strategy, ML-Based Trading Strategy-Persisting the Model Object online algorithm, Online Algorithm-Online Algorithm Python environment setup, Setting Up the Python Environment Python scripts for, Python Script-Strategy Monitoring real-time monitoring, Real-Time Monitoring running code, Running the Code uploading code, Uploading the Code visual step-by-step overview, Visual Step-by-Step Overview-Real-Time Monitoring B backtestingbased on simple moving averages, Strategies Based on Simple Moving Averages-Generalizing the Approach Python scripts for classification algorithm backtesting, Classification Algorithm Backtesting Class Python scripts for linear regression backtesting class, Linear Regression Backtesting Class vectorized (see vectorized backtesting) BacktestLongShort class, Long-Short Backtesting Class, Long-Short Backtesting Class bar charts, matplotlib bar plots (see Plotly; streaming bar plot) base class, for event-based backtesting, Backtesting Base Class-Backtesting Base Class, Backtesting Base Class Bash script, Building a Ubuntu and Python Docker Imagefor Droplet set-up, Script to Orchestrate the Droplet Set Up-Script to Orchestrate the Droplet Set Up for Python/Jupyter Lab installation, Installation Script for Python and Jupyter Lab-Installation Script for Python and Jupyter Lab Bitcoin, pandas and the DataFrame Class, Working with Open Data Sources Boolean operationsNumPy, Boolean Operations pandas, Boolean Operations C callback functions, Retrieving Streaming Data capital managementautomated trading operations and, Capital Management-Kelly Criterion for Stocks and Indices Kelly criterion for stocks and indices, Kelly Criterion for Stocks and Indices-Kelly Criterion for Stocks and Indices Kelly criterion in binomial setting, Kelly Criterion in Binomial Setting-Kelly Criterion in Binomial Setting Carter, Graydon, FX Trading with FXCM CFD (contracts for difference)algorithmic trading risks, Logging and Monitoring defined, CFD Trading with Oanda risks of losses, Long-Short Backtesting Class risks of trading on margin, FX Trading with FXCM trading with Oanda, 
CFD Trading with Oanda-Python Script(see also Oanda) classification problemsmachine learning for, A Simple Classification Problem-A Simple Classification Problem neural networks for, The Simple Classification Problem Revisited-The Simple Classification Problem Revisited Python scripts for vectorized backtesting, Classification Algorithm Backtesting Class .close_all() method, Placing Orders cloud instances, Using Cloud Instances-Script to Orchestrate the Droplet Set Upinstallation script for Python and Jupyter Lab, Installation Script for Python and Jupyter Lab-Installation Script for Python and Jupyter Lab Jupyter Notebook configuration file, Jupyter Notebook Configuration File RSA public/private keys, RSA Public and Private Keys script to orchestrate Droplet set-up, Script to Orchestrate the Droplet Set Up-Script to Orchestrate the Droplet Set Up Cocteau, Jean, Building Classes for Event-Based Backtesting comma separated value (CSV) files (see CSV files) condaas package manager, Conda as a Package Manager-Basic Operations with Conda as virtual environment manager, Conda as a Virtual Environment Manager-Conda as a Virtual Environment Manager basic operations, Basic Operations with Conda-Basic Operations with Conda installing Miniconda, Installing Miniconda-Installing Miniconda conda remove, Basic Operations with Conda configparser module, The Oanda API containers (see Docker containers) contracts for difference (see CFD) control structures, Control Structures CPython, Python for Finance, Python Infrastructure .create_market_buy_order() method, Placing Orders .create_order() method, Placing Market Orders-Placing Market Orders cross-sectional momentum strategies, Strategies Based on Momentum CSV filesinput-output operations, Input-Output Operations-Input-Output Operations reading from a CSV file with pandas, Reading from a CSV File with pandas reading from a CSV file with Python, Reading from a CSV File with Python-Reading from a CSV File with Python .cummax() method, Case Study currency pairs, Logging and Monitoring(see also EUR/USD exchange rate) algorithmic trading risks, Logging and Monitoring D data science stack, Python, NumPy, matplotlib, pandas data snooping, Data Snooping and Overfitting data storageSQLite3 for, Storing Data with SQLite3-Storing Data with SQLite3 storing data efficiently, Storing Financial Data Efficiently-Storing Data with SQLite3 storing DataFrame objects, Storing DataFrame Objects-Storing DataFrame Objects TsTables package for, Using TsTables-Using TsTables data structures, Data Structures-Data Structures DataFrame class, pandas and the DataFrame Class-pandas and the DataFrame Class, Reading from a CSV File with pandas, DataFrame Class-DataFrame Class DataFrame objectscreating, Vectorization with pandas storing, Storing DataFrame Objects-Storing DataFrame Objects dataism, Preface DatetimeIndex() constructor, Plotting with pandas decision tree classification algorithm, Vectorized Backtesting deep learningadding features to analysis, Adding Different Types of Features-Adding Different Types of Features classification problem, The Simple Classification Problem Revisited-The Simple Classification Problem Revisited deep neural networks for predicting market direction, Using Deep Neural Networks to Predict Market Direction-Adding Different Types of Features market movement prediction, Using Deep Learning for Market Movement Prediction-Adding Different Types of Features trading strategies and, Machine and Deep Learning deep neural networks, Using Deep Neural Networks to 
Predict Market Direction-Adding Different Types of Features delta hedging, Algorithmic Trading dense neural network (DNN), The Simple Classification Problem Revisited, Using Deep Neural Networks to Predict Market Direction dictionary (dict) objects, Reading from a CSV File with Python, Data Structures DigitalOceancloud instances, Using Cloud Instances-Script to Orchestrate the Droplet Set Up droplet setup, Setting Up the Hardware DNN (dense neural network), The Simple Classification Problem Revisited, Using Deep Neural Networks to Predict Market Direction Docker containers, Using Docker Containers-Building a Ubuntu and Python Docker Imagebuilding a Ubuntu and Python Docker image, Building a Ubuntu and Python Docker Image-Building a Ubuntu and Python Docker Image defined, Docker Images and Containers Docker images versus, Docker Images and Containers Docker imagesdefined, Docker Images and Containers Docker containers versus, Docker Images and Containers Dockerfile, Building a Ubuntu and Python Docker Image-Building a Ubuntu and Python Docker Image Domingos, Pedro, Automating Trading Operations Droplet, Using Cloud Instancescosts, Infrastructure and Deployment script to orchestrate set-up, Script to Orchestrate the Droplet Set Up-Script to Orchestrate the Droplet Set Up dynamic hedging, Algorithmic Trading E efficient market hypothesis, Predicting Market Movements with Machine Learning Eikon Data API, Eikon Data API-Retrieving Historical Unstructured Dataretrieving historical structured data, Retrieving Historical Structured Data-Retrieving Historical Structured Data retrieving historical unstructured data, Retrieving Historical Unstructured Data-Retrieving Historical Unstructured Data Euler discretization, Python Versus Pseudo-Code EUR/USD exchange ratebacktesting momentum strategy on minute bars, Backtesting a Momentum Strategy on Minute Bars-Backtesting a Momentum Strategy on Minute Bars evaluation of regression-based strategy, Generalizing the Approach factoring in leverage/margin, Factoring In Leverage and Margin-Factoring In Leverage and Margin gross performance versus deep learning-based strategy, Using Deep Neural Networks to Predict Market Direction-Using Deep Neural Networks to Predict Market Direction, Adding Different Types of Features-Adding Different Types of Features historical ask close prices, Retrieving Historical Data-Retrieving Historical Data historical candles data for, Retrieving Candles Data historical tick data for, Retrieving Tick Data implementing trading strategies in real time, Implementing Trading Strategies in Real Time-Implementing Trading Strategies in Real Time logistic regression-based strategies, Generalizing the Approach placing orders, Placing Orders-Placing Orders predicting, Predicting Index Levels-Predicting Index Levels predicting future returns, Predicting Future Returns-Predicting Future Returns predicting index levels, Predicting Index Levels-Predicting Index Levels retrieving streaming data for, Retrieving Streaming Data retrieving trading account information, Retrieving Account Information-Retrieving Account Information SMA calculation, Getting into the Basics-Generalizing the Approach vectorized backtesting of ML-based trading strategy, Vectorized Backtesting-Vectorized Backtesting vectorized backtesting of regression-based strategy, Vectorized Backtesting of Regression-Based Strategy event-based backtesting, Building Classes for Event-Based Backtesting-Long-Short Backtesting Classadvantages, Building Classes for Event-Based Backtesting base 
class, Backtesting Base Class-Backtesting Base Class, Backtesting Base Class building classes for, Building Classes for Event-Based Backtesting-Long-Short Backtesting Class long-only backtesting class, Long-Only Backtesting Class-Long-Only Backtesting Class, Long-Only Backtesting Class long-short backtesting class, Long-Short Backtesting Class-Long-Short Backtesting Class, Long-Short Backtesting Class Python scripts for, Backtesting Base Class-Long-Short Backtesting Class Excelexporting financial data to, Exporting to Excel and JSON reading financial data from, Reading from Excel and JSON F featuresadding different types, Adding Different Types of Features-Adding Different Types of Features lags and, Using Logistic Regression to Predict Market Direction financial data, working with, Working with Financial Data-Python Scriptsdata set for examples, The Data Set Eikon Data API, Eikon Data API-Retrieving Historical Unstructured Data exporting to Excel/JSON, Exporting to Excel and JSON open data sources, Working with Open Data Sources-Working with Open Data Sources reading data from different sources, Reading Financial Data From Different Sources-Reading from Excel and JSON reading data from Excel/JSON, Reading from Excel and JSON reading from a CSV file with pandas, Reading from a CSV File with pandas reading from a CSV file with Python, Reading from a CSV File with Python-Reading from a CSV File with Python storing data efficiently, Storing Financial Data Efficiently-Storing Data with SQLite3 .flatten() method, matplotlib foreign exchange trading (see FX trading; FXCM) future returns, predicting, Predicting Future Returns-Predicting Future Returns FX trading, FX Trading with FXCM-References and Further Resources(see also EUR/USD exchange rate) FXCMFX trading, FX Trading with FXCM-References and Further Resources getting started, Getting Started placing orders, Placing Orders-Placing Orders retrieving account information, Account Information retrieving candles data, Retrieving Candles Data-Retrieving Candles Data retrieving data, Retrieving Data-Retrieving Candles Data retrieving historical data, Retrieving Historical Data-Retrieving Historical Data retrieving streaming data, Retrieving Streaming Data retrieving tick data, Retrieving Tick Data-Retrieving Tick Data working with the API, Working with the API-Account Information fxcmpy wrapper packagecallback functions, Retrieving Streaming Data installing, Getting Started tick data retrieval, Retrieving Tick Data fxTrade, CFD Trading with Oanda G GDX (VanEck Vectors Gold Miners ETF)logistic regression-based strategies, Generalizing the Approach mean-reversion strategies, Getting into the Basics-Generalizing the Approach regression-based strategies, Generalizing the Approach generate_sample_data(), Storing Financial Data Efficiently .get_account_summary() method, Retrieving Account Information .get_candles() method, Retrieving Historical Data .get_data() method, Backtesting Base Class, Retrieving Tick Data .get_date_price() method, Backtesting Base Class .get_instruments() method, Looking Up Instruments Available for Trading .get_last_price() method, Retrieving Streaming Data .get_raw_data() method, Retrieving Tick Data get_timeseries() function, Retrieving Historical Structured Data .get_transactions() method, Retrieving Account Information GLD (SPDR Gold Shares)logistic regression-based strategies, Using Logistic Regression to Predict Market Direction-Using Logistic Regression to Predict Market Direction mean-reversion strategies, Getting into 
the Basics-Generalizing the Approach gold pricemean-reversion strategies, Getting into the Basics-Getting into the Basics momentum strategy and, Getting into the Basics-Getting into the Basics, Generalizing the Approach-Generalizing the Approach Goldman Sachs, Python and Algorithmic Trading, Algorithmic Trading .go_long() method, Long-Short Backtesting Class H half Kelly criterion, Optimal Leverage Harari, Yuval Noah, Preface HDF5 binary storage library, Using TsTables-Using TsTables HDFStore wrapper, Storing DataFrame Objects-Storing DataFrame Objects high frequency trading (HFQ), Algorithmic Trading histograms, matplotlib hit ratio, defined, Vectorized Backtesting I if-elif-else control structure, Python Idioms in-sample fitting, Generalizing the Approach index levels, predicting, Predicting Index Levels-Predicting Index Levels infrastructure (see Python infrastructure) installation script, Python/Jupyter Lab, Installation Script for Python and Jupyter Lab-Installation Script for Python and Jupyter Lab Intel Math Kernel Library, Basic Operations with Conda iterations, Control Structures J JSONexporting financial data to, Exporting to Excel and JSON reading financial data from, Reading from Excel and JSON Jupyter Labinstallation script for, Installation Script for Python and Jupyter Lab-Installation Script for Python and Jupyter Lab RSA public/private keys for, RSA Public and Private Keys tools included, Using Cloud Instances Jupyter Notebook, Jupyter Notebook Configuration File K Kelly criterionin binomial setting, Kelly Criterion in Binomial Setting-Kelly Criterion in Binomial Setting optimal leverage, Optimal Leverage-Optimal Leverage stocks and indices, Kelly Criterion for Stocks and Indices-Kelly Criterion for Stocks and Indices Keras, Using Deep Learning for Market Movement Prediction, Using Deep Neural Networks to Predict Market Direction, Adding Different Types of Features key-value stores, Data Structures keys, public/private, RSA Public and Private Keys L lags, The Basic Idea for Price Prediction, Using Logistic Regression to Predict Market Direction lambda functions, Python Idioms LaTeX, Python Versus Pseudo-Code leveraged trading, risks of, Factoring In Leverage and Margin, FX Trading with FXCM, Optimal Leverage linear regressiongeneralizing the approach, Generalizing the Approach market movement prediction, Using Linear Regression for Market Movement Prediction-Generalizing the Approach predicting future market direction, Predicting Future Market Direction predicting future returns, Predicting Future Returns-Predicting Future Returns predicting index levels, Predicting Index Levels-Predicting Index Levels price prediction based on time series data, The Basic Idea for Price Prediction-The Basic Idea for Price Prediction review of, A Quick Review of Linear Regression scikit-learn and, Linear Regression with scikit-learn vectorized backtesting of regression-based strategy, Vectorized Backtesting of Regression-Based Strategy, Linear Regression Backtesting Class list comprehension, Python Idioms list constructor, Data Structures list objects, Reading from a CSV File with Python, Data Structures, Regular ndarray Object logging, of automated trading operations, Logging and Monitoring-Logging and Monitoring logistic regressiongeneralizing the approach, Generalizing the Approach-Generalizing the Approach market direction prediction, Using Logistic Regression to Predict Market Direction-Using Logistic Regression to Predict Market Direction Python script for vectorized backtesting, 
Classification Algorithm Backtesting Class long-only backtesting class, Long-Only Backtesting Class-Long-Only Backtesting Class, Long-Only Backtesting Class long-short backtesting class, Long-Short Backtesting Class-Long-Short Backtesting Class, Long-Short Backtesting Class longest drawdown period, Risk Analysis M machine learningclassification problem, A Simple Classification Problem-A Simple Classification Problem linear regression with scikit-learn, Linear Regression with scikit-learn market movement prediction, Using Machine Learning for Market Movement Prediction-Generalizing the Approach ML-based trading strategy, ML-Based Trading Strategy-Persisting the Model Object Python scripts, Linear Regression Backtesting Class trading strategies and, Machine and Deep Learning using logistic regression to predict market direction, Using Logistic Regression to Predict Market Direction-Using Logistic Regression to Predict Market Direction macro hedge funds, algorithmic trading and, Algorithmic Trading __main__ method, Backtesting Base Class margin trading, FX Trading with FXCM market direction prediction, Predicting Future Market Direction market movement predictiondeep learning for, Using Deep Learning for Market Movement Prediction-Adding Different Types of Features deep neural networks for, Using Deep Neural Networks to Predict Market Direction-Adding Different Types of Features linear regression for, Using Linear Regression for Market Movement Prediction-Generalizing the Approach linear regression with scikit-learn, Linear Regression with scikit-learn logistic regression to predict market direction, Using Logistic Regression to Predict Market Direction-Using Logistic Regression to Predict Market Direction machine learning for, Using Machine Learning for Market Movement Prediction-Generalizing the Approach predicting future market direction, Predicting Future Market Direction predicting future returns, Predicting Future Returns-Predicting Future Returns predicting index levels, Predicting Index Levels-Predicting Index Levels price prediction based on time series data, The Basic Idea for Price Prediction-The Basic Idea for Price Prediction vectorized backtesting of regression-based strategy, Vectorized Backtesting of Regression-Based Strategy market orders, placing, Placing Market Orders-Placing Market Orders math module, Data Types mathematical functions, Data Types matplotlib, matplotlib-matplotlib, Plotting with pandas-Plotting with pandas maximum drawdown, Risk Analysis, Case Study McKinney, Wes, pandas and the DataFrame Class mean-reversion strategies, NumPy and Vectorization, Strategies Based on Mean Reversion-Generalizing the Approachbasics, Getting into the Basics-Generalizing the Approach generalizing the approach, Generalizing the Approach Python code with a class for vectorized backtesting, Momentum Backtesting Class Miniconda, Installing Miniconda-Installing Miniconda mkl (Intel Math Kernel Library), Basic Operations with Conda ML-based strategies, ML-Based Trading Strategy-Persisting the Model Objectoptimal leverage, Optimal Leverage-Optimal Leverage persisting the model object, Persisting the Model Object Python script for, Automated Trading Strategy risk analysis, Risk Analysis-Risk Analysis vectorized backtesting, Vectorized Backtesting-Vectorized Backtesting MLPClassifier, The Simple Classification Problem Revisited MLTrader class, Online Algorithm-Online Algorithm momentum strategies, Momentumbacktesting on minute bars, Backtesting a Momentum Strategy on Minute 
Bars-Backtesting a Momentum Strategy on Minute Bars basics, Getting into the Basics-Getting into the Basics generalizing the approach, Generalizing the Approach Python code with a class for vectorized backtesting, Momentum Backtesting Class Python script for custom streaming class, Python Script Python script for momentum online algorithm, Momentum Online Algorithm vectorized backtesting of, Strategies Based on Momentum-Generalizing the Approach MomentumTrader class, Implementing Trading Strategies in Real Time-Implementing Trading Strategies in Real Time MomVectorBacktester class, Generalizing the Approach monitoringautomated trading operations, Logging and Monitoring-Logging and Monitoring, Real-Time Monitoring Python scripts for strategy monitoring, Strategy Monitoring Monte Carlo simulationsample tick data server, Sample Tick Data Server time series data based on, Python Scripts motives, for trading, Algorithmic Trading MRVectorBacktester class, Generalizing the Approach multi-layer perceptron, The Simple Classification Problem Revisited Musashi, Miyamoto, Python Infrastructure N natural language processing (NLP), Retrieving Historical Unstructured Data ndarray class, Vectorization with NumPy-Vectorization with NumPy ndarray objects, NumPy and Vectorization, ndarray Methods and NumPy Functions-ndarray Methods and NumPy Functionscreating, ndarray Creation linear regression and, A Quick Review of Linear Regression regular, Regular ndarray Object nested structures, Data Structures NLP (natural language processing), Retrieving Historical Unstructured Data np.arange(), ndarray Creation numbers, data typing of, Data Types numerical operations, pandas, Numerical Operations NumPy, NumPy and Vectorization-NumPy and Vectorization, NumPy-Random NumbersBoolean operations, Boolean Operations ndarray creation, ndarray Creation ndarray methods, ndarray Methods and NumPy Functions-ndarray Methods and NumPy Functions random numbers, Random Numbers regular ndarray object, Regular ndarray Object universal functions, ndarray Methods and NumPy Functions vectorization, Vectorization with NumPy-Vectorization with NumPy vectorized operations, Vectorized Operations numpy.random sub-package, Random Numbers NYSE Arca Gold Miners Index, Getting into the Basics O Oandaaccount configuration, Configuring Oanda Account account setup, Setting Up an Account API access, The Oanda API-The Oanda API backtesting momentum strategy on minute bars, Backtesting a Momentum Strategy on Minute Bars-Backtesting a Momentum Strategy on Minute Bars CFD trading, CFD Trading with Oanda-Python Script factoring in leverage/margin with historical data, Factoring In Leverage and Margin-Factoring In Leverage and Margin implementing trading strategies in real time, Implementing Trading Strategies in Real Time-Implementing Trading Strategies in Real Time looking up instruments available for trading, Looking Up Instruments Available for Trading placing market orders, Placing Market Orders-Placing Market Orders Python script for custom streaming class, Python Script retrieving account information, Retrieving Account Information-Retrieving Account Information retrieving historical data, Retrieving Historical Data-Factoring In Leverage and Margin working with streaming data, Working with Streaming Data Oanda v20 RESTful API, The Oanda API, ML-Based Trading Strategy-Persisting the Model Object, Vectorized Backtesting offline algorithmdefined, Signal Generation in Real Time transformation to online algorithm, Online Algorithm OLS (ordinary least 
squares) regression, matplotlib online algorithmautomated trading operations, Online Algorithm-Online Algorithm defined, Signal Generation in Real Time Python script for momentum online algorithm, Momentum Online Algorithm signal generation in real time, Signal Generation in Real Time-Signal Generation in Real Time transformation of offline algorithm to, Online Algorithm .on_success() method, Implementing Trading Strategies in Real Time, Online Algorithm open data sources, Working with Open Data Sources-Working with Open Data Sources ordinary least squares (OLS) regression, matplotlib out-of-sample evaluation, Generalizing the Approach overfitting, Data Snooping and Overfitting P package manager, conda as, Conda as a Package Manager-Basic Operations with Conda pandas, pandas and the DataFrame Class-pandas and the DataFrame Class, pandas-Input-Output OperationsBoolean operations, Boolean Operations case study, Case Study-Case Study data selection, Data Selection-Data Selection DataFrame class, DataFrame Class-DataFrame Class exporting financial data to Excel/JSON, Exporting to Excel and JSON input-output operations, Input-Output Operations-Input-Output Operations numerical operations, Numerical Operations plotting, Plotting with pandas-Plotting with pandas reading financial data from Excel/JSON, Reading from Excel and JSON reading from a CSV file, Reading from a CSV File with pandas storing DataFrame objects, Storing DataFrame Objects-Storing DataFrame Objects vectorization, Vectorization with pandas-Vectorization with pandas password protection, for Jupyter lab, Jupyter Notebook Configuration File .place_buy_order() method, Backtesting Base Class .place_sell_order() method, Backtesting Base Class Plotlybasics, The Basics multiple real-time streams for, Three Real-Time Streams multiple sub-plots for streams, Three Sub-Plots for Three Streams streaming data as bars, Streaming Data as Bars visualization of streaming data, Visualizing Streaming Data with Plotly-Streaming Data as Bars plotting, with pandas, Plotting with pandas-Plotting with pandas .plot_data() method, Backtesting Base Class polyfit()/polyval() convenience functions, matplotlib price prediction, based on time series data, The Basic Idea for Price Prediction-The Basic Idea for Price Prediction .print_balance() method, Backtesting Base Class .print_net_wealth() method, Backtesting Base Class .print_transactions() method, Retrieving Account Information pseudo-code, Python versus, Python Versus Pseudo-Code publisher-subscriber (PUB-SUB) pattern, Working with Real-Time Data and Sockets Python (generally)advantages of, Python for Algorithmic Trading basics, Python and Algorithmic Trading-References and Further Resources control structures, Control Structures data structures, Data Structures-Data Structures data types, Data Types-Data Types deployment difficulties, Python Infrastructure idioms, Python Idioms-Python Idioms NumPy and vectorization, NumPy and Vectorization-NumPy and Vectorization obstacles to adoption in financial industry, Python for Finance origins, Python for Finance pandas and DataFrame class, pandas and the DataFrame Class-pandas and the DataFrame Class pseudo-code versus, Python Versus Pseudo-Code reading from a CSV file, Reading from a CSV File with Python-Reading from a CSV File with Python Python infrastructure, Python Infrastructure-References and Further Resourcesconda as package manager, Conda as a Package Manager-Basic Operations with Conda conda as virtual environment manager, Conda as a Virtual Environment 
Manager-Conda as a Virtual Environment Manager Docker containers, Using Docker Containers-Building a Ubuntu and Python Docker Image using cloud instances, Using Cloud Instances-Script to Orchestrate the Droplet Set Up Python scriptsautomated trading operations, Running the Code, Python Script-Strategy Monitoring backtesting base class, Backtesting Base Class custom streaming class that trades a momentum strategy, Python Script linear regression backtesting class, Linear Regression Backtesting Class long-only backtesting class, Long-Only Backtesting Class long-short backtesting class, Long-Short Backtesting Class real-time data handling, Python Scripts-Sample Data Server for Bar Plot sample time series data set, Python Scripts strategy monitoring, Strategy Monitoring uploading for automated trading operations, Uploading the Code vectorized backtesting, Python Scripts-Mean Reversion Backtesting Class Q Quandlpremium data sets, Working with Open Data Sources working with open data sources, Working with Open Data Sources-Working with Open Data Sources R random numbers, Random Numbers random walk hypothesis, Predicting Index Levels range (iterator object), Control Structures read_csv() function, Reading from a CSV File with pandas real-time data, Working with Real-Time Data and Sockets-Sample Data Server for Bar PlotPython script for handling, Python Scripts-Sample Data Server for Bar Plot signal generation in real time, Signal Generation in Real Time-Signal Generation in Real Time tick data client for, Connecting a Simple Tick Data Client tick data server for, Running a Simple Tick Data Server-Running a Simple Tick Data Server, Sample Tick Data Server visualizing streaming data with Plotly, Visualizing Streaming Data with Plotly-Streaming Data as Bars real-time monitoring, Real-Time Monitoring Refinitiv, Eikon Data API relative maximum drawdown, Case Study returns, predicting future, Predicting Future Returns-Predicting Future Returns risk analysis, for ML-based trading strategy, Risk Analysis-Risk Analysis RSA public/private keys, RSA Public and Private Keys .run_mean_reversion_strategy() method, Long-Only Backtesting Class, Long-Short Backtesting Class .run_simulation() method, Kelly Criterion in Binomial Setting S S&P 500, Algorithmic Trading-Algorithmic Tradinglogistic regression-based strategies and, Generalizing the Approach momentum strategies, Getting into the Basics passive long position in, Kelly Criterion for Stocks and Indices-Kelly Criterion for Stocks and Indices scatter objects, Three Real-Time Streams scientific stack, NumPy and Vectorization, Python, NumPy, matplotlib, pandas scikit-learn, Linear Regression with scikit-learn ScikitBacktester class, Generalizing the Approach-Generalizing the Approach SciPy package project, NumPy and Vectorization seaborn library, matplotlib-matplotlib simple moving averages (SMAs), pandas and the DataFrame Class, Simple Moving Averagestrading strategies based on, Strategies Based on Simple Moving Averages-Generalizing the Approach visualization with price ticks, Three Real-Time Streams .simulate_value() method, Running a Simple Tick Data Server Singer, Paul, CFD Trading with Oanda sockets, real-time data and, Working with Real-Time Data and Sockets-Sample Data Server for Bar Plot sorting list objects, Data Structures SQLite3, Storing Data with SQLite3-Storing Data with SQLite3 SSL certificate, RSA Public and Private Keys storage (see data storage) streaming bar plots, Streaming Data as Bars, Sample Data Server for Bar Plot streaming dataOanda 
and, Working with Streaming Data visualization with Plotly, Visualizing Streaming Data with Plotly-Streaming Data as Bars string objects (str), Data Types-Data Types Swiss Franc event, CFD Trading with Oanda systematic macro hedge funds, Algorithmic Trading T TensorFlow, Using Deep Learning for Market Movement Prediction, Using Deep Neural Networks to Predict Market Direction Thomas, Rob, Working with Financial Data Thorp, Edward, Capital Management tick data client, Connecting a Simple Tick Data Client tick data server, Running a Simple Tick Data Server-Running a Simple Tick Data Server, Sample Tick Data Server time series data setspandas and vectorization, Vectorization with pandas price prediction based on, The Basic Idea for Price Prediction-The Basic Idea for Price Prediction Python script for generating sample set, Python Scripts SQLite3 for storage of, Storing Data with SQLite3-Storing Data with SQLite3 TsTables for storing, Using TsTables-Using TsTables time series momentum strategies, Strategies Based on Momentum(see also momentum strategies) .to_hdf() method, Storing DataFrame Objects tpqoa wrapper package, The Oanda API, Working with Streaming Data trading platforms, factors influencing choice of, CFD Trading with Oanda trading strategies, Trading Strategies-Conclusions(see also specific strategies) implementing in real time with Oanda, Implementing Trading Strategies in Real Time-Implementing Trading Strategies in Real Time machine learning/deep learning, Machine and Deep Learning mean-reversion, NumPy and Vectorization momentum, Momentum simple moving averages, Simple Moving Averages trading, motives for, Algorithmic Trading transaction costs, Long-Only Backtesting Class, Vectorized Backtesting TsTables package, Using TsTables-Using TsTables tuple objects, Data Structures U Ubuntu, Building a Ubuntu and Python Docker Image-Building a Ubuntu and Python Docker Image universal functions, NumPy, ndarray Methods and NumPy Functions V v20 wrapper package, The Oanda API, ML-Based Trading Strategy-Persisting the Model Object, Vectorized Backtesting value-at-risk (VAR), Risk Analysis-Risk Analysis vectorization, NumPy and Vectorization, Strategies Based on Mean Reversion-Generalizing the Approach vectorized backtestingdata snooping and overfitting, Data Snooping and Overfitting-Conclusions ML-based trading strategy, Vectorized Backtesting-Vectorized Backtesting momentum-based trading strategies, Strategies Based on Momentum-Generalizing the Approach potential shortcomings, Building Classes for Event-Based Backtesting Python code with a class for vectorized backtesting of mean-reversion trading strategies, Momentum Backtesting Class Python scripts for, Python Scripts-Mean Reversion Backtesting Class, Linear Regression Backtesting Class regression-based strategy, Vectorized Backtesting of Regression-Based Strategy trading strategies based on simple moving averages, Strategies Based on Simple Moving Averages-Generalizing the Approach vectorization with NumPy, Vectorization with NumPy-Vectorization with NumPy vectorization with pandas, Vectorization with pandas-Vectorization with pandas vectorized operations, Vectorized Operations virtual environment management, Conda as a Virtual Environment Manager-Conda as a Virtual Environment Manager W while loops, Control Structures Z ZeroMQ, Working with Real-Time Data and Sockets About the Author Dr.
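The index above repeatedly points to vectorized backtesting of simple-moving-average (SMA) strategies with NumPy and pandas. As a rough illustration of that pattern only (this is not code from the book; the CSV file name, column name, and window lengths are assumptions), a minimal sketch:

```python
# Minimal sketch of vectorized SMA-crossover backtesting with pandas/NumPy.
# Assumptions: a local CSV file "eurusd.csv" with a "close" price column
# indexed by date, and SMA windows of 42 and 252 bars.
import numpy as np
import pandas as pd

data = pd.read_csv("eurusd.csv", index_col=0, parse_dates=True)

data["SMA1"] = data["close"].rolling(42).mean()    # short moving average
data["SMA2"] = data["close"].rolling(252).mean()   # long moving average

# Log returns of the instrument itself (the buy-and-hold benchmark).
data["returns"] = np.log(data["close"] / data["close"].shift(1))

# Position: long (+1) when the short SMA is above the long SMA, else short (-1).
data["position"] = np.where(data["SMA1"] > data["SMA2"], 1, -1)

# Strategy returns: yesterday's position applied to today's return
# (the shift avoids foresight bias).
data["strategy"] = data["position"].shift(1) * data["returns"]

# Gross cumulative performance of buy-and-hold versus the SMA strategy.
print(data[["returns", "strategy"]].dropna().sum().apply(np.exp))
```

This mirrors the vectorized style the entries under "simple moving averages (SMAs)" and "vectorized backtesting" refer to: signals and returns are computed column-wise on the whole DataFrame rather than bar-by-bar in an explicit loop.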

Predict Market Direction-Adding Different Types of Features delta hedging, Algorithmic Trading dense neural network (DNN), The Simple Classification Problem Revisited, Using Deep Neural Networks to Predict Market Direction dictionary (dict) objects, Reading from a CSV File with Python, Data Structures DigitalOceancloud instances, Using Cloud Instances-Script to Orchestrate the Droplet Set Up droplet setup, Setting Up the Hardware DNN (dense neural network), The Simple Classification Problem Revisited, Using Deep Neural Networks to Predict Market Direction Docker containers, Using Docker Containers-Building a Ubuntu and Python Docker Imagebuilding a Ubuntu and Python Docker image, Building a Ubuntu and Python Docker Image-Building a Ubuntu and Python Docker Image defined, Docker Images and Containers Docker images versus, Docker Images and Containers Docker imagesdefined, Docker Images and Containers Docker containers versus, Docker Images and Containers Dockerfile, Building a Ubuntu and Python Docker Image-Building a Ubuntu and Python Docker Image Domingos, Pedro, Automating Trading Operations Droplet, Using Cloud Instancescosts, Infrastructure and Deployment script to orchestrate set-up, Script to Orchestrate the Droplet Set Up-Script to Orchestrate the Droplet Set Up dynamic hedging, Algorithmic Trading E efficient market hypothesis, Predicting Market Movements with Machine Learning Eikon Data API, Eikon Data API-Retrieving Historical Unstructured Dataretrieving historical structured data, Retrieving Historical Structured Data-Retrieving Historical Structured Data retrieving historical unstructured data, Retrieving Historical Unstructured Data-Retrieving Historical Unstructured Data Euler discretization, Python Versus Pseudo-Code EUR/USD exchange ratebacktesting momentum strategy on minute bars, Backtesting a Momentum Strategy on Minute Bars-Backtesting a Momentum Strategy on Minute Bars evaluation of regression-based strategy, Generalizing the Approach factoring in leverage/margin, Factoring In Leverage and Margin-Factoring In Leverage and Margin gross performance versus deep learning-based strategy, Using Deep Neural Networks to Predict Market Direction-Using Deep Neural Networks to Predict Market Direction, Adding Different Types of Features-Adding Different Types of Features historical ask close prices, Retrieving Historical Data-Retrieving Historical Data historical candles data for, Retrieving Candles Data historical tick data for, Retrieving Tick Data implementing trading strategies in real time, Implementing Trading Strategies in Real Time-Implementing Trading Strategies in Real Time logistic regression-based strategies, Generalizing the Approach placing orders, Placing Orders-Placing Orders predicting, Predicting Index Levels-Predicting Index Levels predicting future returns, Predicting Future Returns-Predicting Future Returns predicting index levels, Predicting Index Levels-Predicting Index Levels retrieving streaming data for, Retrieving Streaming Data retrieving trading account information, Retrieving Account Information-Retrieving Account Information SMA calculation, Getting into the Basics-Generalizing the Approach vectorized backtesting of ML-based trading strategy, Vectorized Backtesting-Vectorized Backtesting vectorized backtesting of regression-based strategy, Vectorized Backtesting of Regression-Based Strategy event-based backtesting, Building Classes for Event-Based Backtesting-Long-Short Backtesting Classadvantages, Building Classes for Event-Based Backtesting base 
class, Backtesting Base Class-Backtesting Base Class, Backtesting Base Class building classes for, Building Classes for Event-Based Backtesting-Long-Short Backtesting Class long-only backtesting class, Long-Only Backtesting Class-Long-Only Backtesting Class, Long-Only Backtesting Class long-short backtesting class, Long-Short Backtesting Class-Long-Short Backtesting Class, Long-Short Backtesting Class Python scripts for, Backtesting Base Class-Long-Short Backtesting Class Excelexporting financial data to, Exporting to Excel and JSON reading financial data from, Reading from Excel and JSON F featuresadding different types, Adding Different Types of Features-Adding Different Types of Features lags and, Using Logistic Regression to Predict Market Direction financial data, working with, Working with Financial Data-Python Scriptsdata set for examples, The Data Set Eikon Data API, Eikon Data API-Retrieving Historical Unstructured Data exporting to Excel/JSON, Exporting to Excel and JSON open data sources, Working with Open Data Sources-Working with Open Data Sources reading data from different sources, Reading Financial Data From Different Sources-Reading from Excel and JSON reading data from Excel/JSON, Reading from Excel and JSON reading from a CSV file with pandas, Reading from a CSV File with pandas reading from a CSV file with Python, Reading from a CSV File with Python-Reading from a CSV File with Python storing data efficiently, Storing Financial Data Efficiently-Storing Data with SQLite3 .flatten() method, matplotlib foreign exchange trading (see FX trading; FXCM) future returns, predicting, Predicting Future Returns-Predicting Future Returns FX trading, FX Trading with FXCM-References and Further Resources(see also EUR/USD exchange rate) FXCMFX trading, FX Trading with FXCM-References and Further Resources getting started, Getting Started placing orders, Placing Orders-Placing Orders retrieving account information, Account Information retrieving candles data, Retrieving Candles Data-Retrieving Candles Data retrieving data, Retrieving Data-Retrieving Candles Data retrieving historical data, Retrieving Historical Data-Retrieving Historical Data retrieving streaming data, Retrieving Streaming Data retrieving tick data, Retrieving Tick Data-Retrieving Tick Data working with the API, Working with the API-Account Information fxcmpy wrapper packagecallback functions, Retrieving Streaming Data installing, Getting Started tick data retrieval, Retrieving Tick Data fxTrade, CFD Trading with Oanda G GDX (VanEck Vectors Gold Miners ETF)logistic regression-based strategies, Generalizing the Approach mean-reversion strategies, Getting into the Basics-Generalizing the Approach regression-based strategies, Generalizing the Approach generate_sample_data(), Storing Financial Data Efficiently .get_account_summary() method, Retrieving Account Information .get_candles() method, Retrieving Historical Data .get_data() method, Backtesting Base Class, Retrieving Tick Data .get_date_price() method, Backtesting Base Class .get_instruments() method, Looking Up Instruments Available for Trading .get_last_price() method, Retrieving Streaming Data .get_raw_data() method, Retrieving Tick Data get_timeseries() function, Retrieving Historical Structured Data .get_transactions() method, Retrieving Account Information GLD (SPDR Gold Shares)logistic regression-based strategies, Using Logistic Regression to Predict Market Direction-Using Logistic Regression to Predict Market Direction mean-reversion strategies, Getting into 
the Basics-Generalizing the Approach gold pricemean-reversion strategies, Getting into the Basics-Getting into the Basics momentum strategy and, Getting into the Basics-Getting into the Basics, Generalizing the Approach-Generalizing the Approach Goldman Sachs, Python and Algorithmic Trading, Algorithmic Trading .go_long() method, Long-Short Backtesting Class H half Kelly criterion, Optimal Leverage Harari, Yuval Noah, Preface HDF5 binary storage library, Using TsTables-Using TsTables HDFStore wrapper, Storing DataFrame Objects-Storing DataFrame Objects high frequency trading (HFQ), Algorithmic Trading histograms, matplotlib hit ratio, defined, Vectorized Backtesting I if-elif-else control structure, Python Idioms in-sample fitting, Generalizing the Approach index levels, predicting, Predicting Index Levels-Predicting Index Levels infrastructure (see Python infrastructure) installation script, Python/Jupyter Lab, Installation Script for Python and Jupyter Lab-Installation Script for Python and Jupyter Lab Intel Math Kernel Library, Basic Operations with Conda iterations, Control Structures J JSONexporting financial data to, Exporting to Excel and JSON reading financial data from, Reading from Excel and JSON Jupyter Labinstallation script for, Installation Script for Python and Jupyter Lab-Installation Script for Python and Jupyter Lab RSA public/private keys for, RSA Public and Private Keys tools included, Using Cloud Instances Jupyter Notebook, Jupyter Notebook Configuration File K Kelly criterionin binomial setting, Kelly Criterion in Binomial Setting-Kelly Criterion in Binomial Setting optimal leverage, Optimal Leverage-Optimal Leverage stocks and indices, Kelly Criterion for Stocks and Indices-Kelly Criterion for Stocks and Indices Keras, Using Deep Learning for Market Movement Prediction, Using Deep Neural Networks to Predict Market Direction, Adding Different Types of Features key-value stores, Data Structures keys, public/private, RSA Public and Private Keys L lags, The Basic Idea for Price Prediction, Using Logistic Regression to Predict Market Direction lambda functions, Python Idioms LaTeX, Python Versus Pseudo-Code leveraged trading, risks of, Factoring In Leverage and Margin, FX Trading with FXCM, Optimal Leverage linear regressiongeneralizing the approach, Generalizing the Approach market movement prediction, Using Linear Regression for Market Movement Prediction-Generalizing the Approach predicting future market direction, Predicting Future Market Direction predicting future returns, Predicting Future Returns-Predicting Future Returns predicting index levels, Predicting Index Levels-Predicting Index Levels price prediction based on time series data, The Basic Idea for Price Prediction-The Basic Idea for Price Prediction review of, A Quick Review of Linear Regression scikit-learn and, Linear Regression with scikit-learn vectorized backtesting of regression-based strategy, Vectorized Backtesting of Regression-Based Strategy, Linear Regression Backtesting Class list comprehension, Python Idioms list constructor, Data Structures list objects, Reading from a CSV File with Python, Data Structures, Regular ndarray Object logging, of automated trading operations, Logging and Monitoring-Logging and Monitoring logistic regressiongeneralizing the approach, Generalizing the Approach-Generalizing the Approach market direction prediction, Using Logistic Regression to Predict Market Direction-Using Logistic Regression to Predict Market Direction Python script for vectorized backtesting, 
Classification Algorithm Backtesting Class long-only backtesting class, Long-Only Backtesting Class-Long-Only Backtesting Class, Long-Only Backtesting Class long-short backtesting class, Long-Short Backtesting Class-Long-Short Backtesting Class, Long-Short Backtesting Class longest drawdown period, Risk Analysis M machine learningclassification problem, A Simple Classification Problem-A Simple Classification Problem linear regression with scikit-learn, Linear Regression with scikit-learn market movement prediction, Using Machine Learning for Market Movement Prediction-Generalizing the Approach ML-based trading strategy, ML-Based Trading Strategy-Persisting the Model Object Python scripts, Linear Regression Backtesting Class trading strategies and, Machine and Deep Learning using logistic regression to predict market direction, Using Logistic Regression to Predict Market Direction-Using Logistic Regression to Predict Market Direction macro hedge funds, algorithmic trading and, Algorithmic Trading __main__ method, Backtesting Base Class margin trading, FX Trading with FXCM market direction prediction, Predicting Future Market Direction market movement predictiondeep learning for, Using Deep Learning for Market Movement Prediction-Adding Different Types of Features deep neural networks for, Using Deep Neural Networks to Predict Market Direction-Adding Different Types of Features linear regression for, Using Linear Regression for Market Movement Prediction-Generalizing the Approach linear regression with scikit-learn, Linear Regression with scikit-learn logistic regression to predict market direction, Using Logistic Regression to Predict Market Direction-Using Logistic Regression to Predict Market Direction machine learning for, Using Machine Learning for Market Movement Prediction-Generalizing the Approach predicting future market direction, Predicting Future Market Direction predicting future returns, Predicting Future Returns-Predicting Future Returns predicting index levels, Predicting Index Levels-Predicting Index Levels price prediction based on time series data, The Basic Idea for Price Prediction-The Basic Idea for Price Prediction vectorized backtesting of regression-based strategy, Vectorized Backtesting of Regression-Based Strategy market orders, placing, Placing Market Orders-Placing Market Orders math module, Data Types mathematical functions, Data Types matplotlib, matplotlib-matplotlib, Plotting with pandas-Plotting with pandas maximum drawdown, Risk Analysis, Case Study McKinney, Wes, pandas and the DataFrame Class mean-reversion strategies, NumPy and Vectorization, Strategies Based on Mean Reversion-Generalizing the Approachbasics, Getting into the Basics-Generalizing the Approach generalizing the approach, Generalizing the Approach Python code with a class for vectorized backtesting, Momentum Backtesting Class Miniconda, Installing Miniconda-Installing Miniconda mkl (Intel Math Kernel Library), Basic Operations with Conda ML-based strategies, ML-Based Trading Strategy-Persisting the Model Objectoptimal leverage, Optimal Leverage-Optimal Leverage persisting the model object, Persisting the Model Object Python script for, Automated Trading Strategy risk analysis, Risk Analysis-Risk Analysis vectorized backtesting, Vectorized Backtesting-Vectorized Backtesting MLPClassifier, The Simple Classification Problem Revisited MLTrader class, Online Algorithm-Online Algorithm momentum strategies, Momentumbacktesting on minute bars, Backtesting a Momentum Strategy on Minute 
Bars-Backtesting a Momentum Strategy on Minute Bars basics, Getting into the Basics-Getting into the Basics generalizing the approach, Generalizing the Approach Python code with a class for vectorized backtesting, Momentum Backtesting Class Python script for custom streaming class, Python Script Python script for momentum online algorithm, Momentum Online Algorithm vectorized backtesting of, Strategies Based on Momentum-Generalizing the Approach MomentumTrader class, Implementing Trading Strategies in Real Time-Implementing Trading Strategies in Real Time MomVectorBacktester class, Generalizing the Approach monitoringautomated trading operations, Logging and Monitoring-Logging and Monitoring, Real-Time Monitoring Python scripts for strategy monitoring, Strategy Monitoring Monte Carlo simulationsample tick data server, Sample Tick Data Server time series data based on, Python Scripts motives, for trading, Algorithmic Trading MRVectorBacktester class, Generalizing the Approach multi-layer perceptron, The Simple Classification Problem Revisited Musashi, Miyamoto, Python Infrastructure N natural language processing (NLP), Retrieving Historical Unstructured Data ndarray class, Vectorization with NumPy-Vectorization with NumPy ndarray objects, NumPy and Vectorization, ndarray Methods and NumPy Functions-ndarray Methods and NumPy Functionscreating, ndarray Creation linear regression and, A Quick Review of Linear Regression regular, Regular ndarray Object nested structures, Data Structures NLP (natural language processing), Retrieving Historical Unstructured Data np.arange(), ndarray Creation numbers, data typing of, Data Types numerical operations, pandas, Numerical Operations NumPy, NumPy and Vectorization-NumPy and Vectorization, NumPy-Random NumbersBoolean operations, Boolean Operations ndarray creation, ndarray Creation ndarray methods, ndarray Methods and NumPy Functions-ndarray Methods and NumPy Functions random numbers, Random Numbers regular ndarray object, Regular ndarray Object universal functions, ndarray Methods and NumPy Functions vectorization, Vectorization with NumPy-Vectorization with NumPy vectorized operations, Vectorized Operations numpy.random sub-package, Random Numbers NYSE Arca Gold Miners Index, Getting into the Basics O Oandaaccount configuration, Configuring Oanda Account account setup, Setting Up an Account API access, The Oanda API-The Oanda API backtesting momentum strategy on minute bars, Backtesting a Momentum Strategy on Minute Bars-Backtesting a Momentum Strategy on Minute Bars CFD trading, CFD Trading with Oanda-Python Script factoring in leverage/margin with historical data, Factoring In Leverage and Margin-Factoring In Leverage and Margin implementing trading strategies in real time, Implementing Trading Strategies in Real Time-Implementing Trading Strategies in Real Time looking up instruments available for trading, Looking Up Instruments Available for Trading placing market orders, Placing Market Orders-Placing Market Orders Python script for custom streaming class, Python Script retrieving account information, Retrieving Account Information-Retrieving Account Information retrieving historical data, Retrieving Historical Data-Factoring In Leverage and Margin working with streaming data, Working with Streaming Data Oanda v20 RESTful API, The Oanda API, ML-Based Trading Strategy-Persisting the Model Object, Vectorized Backtesting offline algorithmdefined, Signal Generation in Real Time transformation to online algorithm, Online Algorithm OLS (ordinary least 
squares) regression, matplotlib online algorithmautomated trading operations, Online Algorithm-Online Algorithm defined, Signal Generation in Real Time Python script for momentum online algorithm, Momentum Online Algorithm signal generation in real time, Signal Generation in Real Time-Signal Generation in Real Time transformation of offline algorithm to, Online Algorithm .on_success() method, Implementing Trading Strategies in Real Time, Online Algorithm open data sources, Working with Open Data Sources-Working with Open Data Sources ordinary least squares (OLS) regression, matplotlib out-of-sample evaluation, Generalizing the Approach overfitting, Data Snooping and Overfitting P package manager, conda as, Conda as a Package Manager-Basic Operations with Conda pandas, pandas and the DataFrame Class-pandas and the DataFrame Class, pandas-Input-Output OperationsBoolean operations, Boolean Operations case study, Case Study-Case Study data selection, Data Selection-Data Selection DataFrame class, DataFrame Class-DataFrame Class exporting financial data to Excel/JSON, Exporting to Excel and JSON input-output operations, Input-Output Operations-Input-Output Operations numerical operations, Numerical Operations plotting, Plotting with pandas-Plotting with pandas reading financial data from Excel/JSON, Reading from Excel and JSON reading from a CSV file, Reading from a CSV File with pandas storing DataFrame objects, Storing DataFrame Objects-Storing DataFrame Objects vectorization, Vectorization with pandas-Vectorization with pandas password protection, for Jupyter lab, Jupyter Notebook Configuration File .place_buy_order() method, Backtesting Base Class .place_sell_order() method, Backtesting Base Class Plotlybasics, The Basics multiple real-time streams for, Three Real-Time Streams multiple sub-plots for streams, Three Sub-Plots for Three Streams streaming data as bars, Streaming Data as Bars visualization of streaming data, Visualizing Streaming Data with Plotly-Streaming Data as Bars plotting, with pandas, Plotting with pandas-Plotting with pandas .plot_data() method, Backtesting Base Class polyfit()/polyval() convenience functions, matplotlib price prediction, based on time series data, The Basic Idea for Price Prediction-The Basic Idea for Price Prediction .print_balance() method, Backtesting Base Class .print_net_wealth() method, Backtesting Base Class .print_transactions() method, Retrieving Account Information pseudo-code, Python versus, Python Versus Pseudo-Code publisher-subscriber (PUB-SUB) pattern, Working with Real-Time Data and Sockets Python (generally)advantages of, Python for Algorithmic Trading basics, Python and Algorithmic Trading-References and Further Resources control structures, Control Structures data structures, Data Structures-Data Structures data types, Data Types-Data Types deployment difficulties, Python Infrastructure idioms, Python Idioms-Python Idioms NumPy and vectorization, NumPy and Vectorization-NumPy and Vectorization obstacles to adoption in financial industry, Python for Finance origins, Python for Finance pandas and DataFrame class, pandas and the DataFrame Class-pandas and the DataFrame Class pseudo-code versus, Python Versus Pseudo-Code reading from a CSV file, Reading from a CSV File with Python-Reading from a CSV File with Python Python infrastructure, Python Infrastructure-References and Further Resourcesconda as package manager, Conda as a Package Manager-Basic Operations with Conda conda as virtual environment manager, Conda as a Virtual Environment 
Manager-Conda as a Virtual Environment Manager Docker containers, Using Docker Containers-Building a Ubuntu and Python Docker Image using cloud instances, Using Cloud Instances-Script to Orchestrate the Droplet Set Up Python scriptsautomated trading operations, Running the Code, Python Script-Strategy Monitoring backtesting base class, Backtesting Base Class custom streaming class that trades a momentum strategy, Python Script linear regression backtesting class, Linear Regression Backtesting Class long-only backtesting class, Long-Only Backtesting Class long-short backtesting class, Long-Short Backtesting Class real-time data handling, Python Scripts-Sample Data Server for Bar Plot sample time series data set, Python Scripts strategy monitoring, Strategy Monitoring uploading for automated trading operations, Uploading the Code vectorized backtesting, Python Scripts-Mean Reversion Backtesting Class Q Quandlpremium data sets, Working with Open Data Sources working with open data sources, Working with Open Data Sources-Working with Open Data Sources R random numbers, Random Numbers random walk hypothesis, Predicting Index Levels range (iterator object), Control Structures read_csv() function, Reading from a CSV File with pandas real-time data, Working with Real-Time Data and Sockets-Sample Data Server for Bar PlotPython script for handling, Python Scripts-Sample Data Server for Bar Plot signal generation in real time, Signal Generation in Real Time-Signal Generation in Real Time tick data client for, Connecting a Simple Tick Data Client tick data server for, Running a Simple Tick Data Server-Running a Simple Tick Data Server, Sample Tick Data Server visualizing streaming data with Plotly, Visualizing Streaming Data with Plotly-Streaming Data as Bars real-time monitoring, Real-Time Monitoring Refinitiv, Eikon Data API relative maximum drawdown, Case Study returns, predicting future, Predicting Future Returns-Predicting Future Returns risk analysis, for ML-based trading strategy, Risk Analysis-Risk Analysis RSA public/private keys, RSA Public and Private Keys .run_mean_reversion_strategy() method, Long-Only Backtesting Class, Long-Short Backtesting Class .run_simulation() method, Kelly Criterion in Binomial Setting S S&P 500, Algorithmic Trading-Algorithmic Tradinglogistic regression-based strategies and, Generalizing the Approach momentum strategies, Getting into the Basics passive long position in, Kelly Criterion for Stocks and Indices-Kelly Criterion for Stocks and Indices scatter objects, Three Real-Time Streams scientific stack, NumPy and Vectorization, Python, NumPy, matplotlib, pandas scikit-learn, Linear Regression with scikit-learn ScikitBacktester class, Generalizing the Approach-Generalizing the Approach SciPy package project, NumPy and Vectorization seaborn library, matplotlib-matplotlib simple moving averages (SMAs), pandas and the DataFrame Class, Simple Moving Averagestrading strategies based on, Strategies Based on Simple Moving Averages-Generalizing the Approach visualization with price ticks, Three Real-Time Streams .simulate_value() method, Running a Simple Tick Data Server Singer, Paul, CFD Trading with Oanda sockets, real-time data and, Working with Real-Time Data and Sockets-Sample Data Server for Bar Plot sorting list objects, Data Structures SQLite3, Storing Data with SQLite3-Storing Data with SQLite3 SSL certificate, RSA Public and Private Keys storage (see data storage) streaming bar plots, Streaming Data as Bars, Sample Data Server for Bar Plot streaming dataOanda 
and, Working with Streaming Data visualization with Plotly, Visualizing Streaming Data with Plotly-Streaming Data as Bars string objects (str), Data Types-Data Types Swiss Franc event, CFD Trading with Oanda systematic macro hedge funds, Algorithmic Trading T TensorFlow, Using Deep Learning for Market Movement Prediction, Using Deep Neural Networks to Predict Market Direction Thomas, Rob, Working with Financial Data Thorp, Edward, Capital Management tick data client, Connecting a Simple Tick Data Client tick data server, Running a Simple Tick Data Server-Running a Simple Tick Data Server, Sample Tick Data Server time series data setspandas and vectorization, Vectorization with pandas price prediction based on, The Basic Idea for Price Prediction-The Basic Idea for Price Prediction Python script for generating sample set, Python Scripts SQLite3 for storage of, Storing Data with SQLite3-Storing Data with SQLite3 TsTables for storing, Using TsTables-Using TsTables time series momentum strategies, Strategies Based on Momentum(see also momentum strategies) .to_hdf() method, Storing DataFrame Objects tpqoa wrapper package, The Oanda API, Working with Streaming Data trading platforms, factors influencing choice of, CFD Trading with Oanda trading strategies, Trading Strategies-Conclusions(see also specific strategies) implementing in real time with Oanda, Implementing Trading Strategies in Real Time-Implementing Trading Strategies in Real Time machine learning/deep learning, Machine and Deep Learning mean-reversion, NumPy and Vectorization momentum, Momentum simple moving averages, Simple Moving Averages trading, motives for, Algorithmic Trading transaction costs, Long-Only Backtesting Class, Vectorized Backtesting TsTables package, Using TsTables-Using TsTables tuple objects, Data Structures U Ubuntu, Building a Ubuntu and Python Docker Image-Building a Ubuntu and Python Docker Image universal functions, NumPy, ndarray Methods and NumPy Functions V v20 wrapper package, The Oanda API, ML-Based Trading Strategy-Persisting the Model Object, Vectorized Backtesting value-at-risk (VAR), Risk Analysis-Risk Analysis vectorization, NumPy and Vectorization, Strategies Based on Mean Reversion-Generalizing the Approach vectorized backtestingdata snooping and overfitting, Data Snooping and Overfitting-Conclusions ML-based trading strategy, Vectorized Backtesting-Vectorized Backtesting momentum-based trading strategies, Strategies Based on Momentum-Generalizing the Approach potential shortcomings, Building Classes for Event-Based Backtesting Python code with a class for vectorized backtesting of mean-reversion trading strategies, Momentum Backtesting Class Python scripts for, Python Scripts-Mean Reversion Backtesting Class, Linear Regression Backtesting Class regression-based strategy, Vectorized Backtesting of Regression-Based Strategy trading strategies based on simple moving averages, Strategies Based on Simple Moving Averages-Generalizing the Approach vectorization with NumPy, Vectorization with NumPy-Vectorization with NumPy vectorization with pandas, Vectorization with pandas-Vectorization with pandas vectorized operations, Vectorized Operations virtual environment management, Conda as a Virtual Environment Manager-Conda as a Virtual Environment Manager W while loops, Control Structures Z ZeroMQ, Working with Real-Time Data and Sockets About the Author Dr.


pages: 252 words: 74,167

Thinking Machines: The Inside Story of Artificial Intelligence and Our Race to Build the Future by Luke Dormehl

"World Economic Forum" Davos, Ada Lovelace, agricultural Revolution, AI winter, Albert Einstein, Alexey Pajitnov wrote Tetris, algorithmic management, algorithmic trading, AlphaGo, Amazon Mechanical Turk, Apple II, artificial general intelligence, Automated Insights, autonomous vehicles, backpropagation, Bletchley Park, book scanning, borderless world, call centre, cellular automata, Charles Babbage, Claude Shannon: information theory, cloud computing, computer vision, Computing Machinery and Intelligence, correlation does not imply causation, crowdsourcing, deep learning, DeepMind, driverless car, drone strike, Elon Musk, Flash crash, Ford Model T, friendly AI, game design, Geoffrey Hinton, global village, Google X / Alphabet X, Hans Moravec, hive mind, industrial robot, information retrieval, Internet of things, iterative process, Jaron Lanier, John Markoff, John Maynard Keynes: Economic Possibilities for our Grandchildren, John Maynard Keynes: technological unemployment, John von Neumann, Kickstarter, Kodak vs Instagram, Law of Accelerating Returns, life extension, Loebner Prize, machine translation, Marc Andreessen, Mark Zuckerberg, Menlo Park, Mustafa Suleyman, natural language processing, Nick Bostrom, Norbert Wiener, out of africa, PageRank, paperclip maximiser, pattern recognition, radical life extension, Ray Kurzweil, recommendation engine, remote working, RFID, scientific management, self-driving car, Silicon Valley, Skype, smart cities, Smart Cities: Big Data, Civic Hackers, and the Quest for a New Utopia, social intelligence, speech recognition, Stephen Hawking, Steve Jobs, Steve Wozniak, Steven Pinker, strong AI, superintelligent machines, tech billionaire, technological singularity, The Coming Technological Singularity, The Future of Employment, Tim Cook: Apple, Tony Fadell, too big to fail, traumatic brain injury, Turing machine, Turing test, Vernor Vinge, warehouse robotics, Watson beat the top human players on Jeopardy!

‘See it, THINK, and marvel at the mind of man and his machine,’ wrote one giddy reviewer, borrowing the ‘Think’ tagline that had been IBM’s since the 1920s. IBM showed off several impressive technologies at the event. One was a groundbreaking handwriting recognition computer, which the official fair brochure referred to as an ‘Optical Scanning and Information Retrieval’ system. This demo allowed visitors to write an historical date of their choosing (post-1851) in their own handwriting on a small card. That card was then fed into an ‘optical character reader’ where it was converted into digital form, and then relayed once more to a state-of-the-art IBM 1460 computer system.

Simon’s prediction was hopelessly off, but as it turns out, the second thing that registers about the World’s Fair is that IBM wasn’t wrong. All three of the technologies that dropped jaws in 1964 are commonplace today – despite our continued insistence that AI is not yet here. The Optical Scanning and Information Retrieval system has become the Internet: granting us access to more information at a moment’s notice than we could possibly hope to absorb in a lifetime. While we still cannot see the future, we are making enormous advances in this capacity, thanks to the huge datasets generated by users that offer constant forecasts about the news stories, books or songs that are likely to be of interest to us.

Another, called ANALOGY, did the same for the geometric questions found in IQ tests, while STUDENT cracked complex algebra story conundrums such as: ‘If the number of customers Tom gets is twice the square of 20 per cent of the number of advertisements he runs, and the number of advertisements he runs is 45, what is the number of customers Tom gets?’ A particularly impressive display of computational reasoning was a program called SIR (standing for Semantic Information Retrieval). SIR appeared to understand English sentences and was even able to learn relationships between objects in a way that resembled real intelligence. In reality, this ‘knowledge’ relied on a series of pre-programmed templates, such as A is a part of B, with nouns substituting for the variables.
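For the quoted STUDENT problem, 20 per cent of 45 is 9, its square is 81, and twice that gives 162 customers. The template mechanism described for SIR can also be sketched in a few lines of code. The Python fragment below is not the original MIT program, only a toy reconstruction of the idea of a single pre-programmed pattern, "A is a part of B", with nouns substituted for the variables; the pattern, function names, and example sentences are all invented for illustration.

    import re

    # One pre-programmed template: "<part> is (a) part of <whole>", optionally with articles.
    PART_OF = re.compile(
        r"^(a |an |the )?(?P<part>\w+) is (a )?part of (a |an |the )?(?P<whole>\w+)\.?$",
        re.IGNORECASE)

    knowledge = set()   # stored (part, whole) pairs

    def learn(sentence):
        """If the sentence matches the template, record the relationship."""
        m = PART_OF.match(sentence.strip())
        if m:
            knowledge.add((m.group("part").lower(), m.group("whole").lower()))

    def is_part_of(part, whole):
        """Answer only from what has been stored; nothing resembling understanding happens here."""
        return (part.lower(), whole.lower()) in knowledge

    learn("A finger is part of a hand.")
    learn("A hand is part of an arm.")
    print(is_part_of("finger", "hand"))   # True
    print(is_part_of("finger", "arm"))    # False: this toy version stores facts but draws no inferences

As the passage notes, what looks like understanding is really pattern matching against a fixed template.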


Designing Search: UX Strategies for Ecommerce Success by Greg Nudelman, Pabini Gabriel-Petit

access to a mobile phone, Albert Einstein, AltaVista, augmented reality, barriers to entry, Benchmark Capital, business intelligence, call centre, cognitive load, crowdsourcing, folksonomy, information retrieval, Internet of things, Neal Stephenson, Palm Treo, performance metric, QR code, recommendation engine, RFID, search costs, search engine result page, semantic web, Silicon Valley, social graph, social web, speech recognition, text mining, the long tail, the map is not the territory, The Wisdom of Crowds, web application, zero-sum game, Zipcar

He holds a degree in music theory and composition from Harvard. Daniel Tunkelang is a leading industry advocate of human-computer information retrieval (HCIR). He was a founding employee of faceted search pioneer Endeca, where he spent ten years as Chief Scientist. During that time, he established the HCIR workshop, which has taken place annually since 2007. Always working to bring together industry and academia, he co-organized the 2010 Workshop on Search and Social Media and has served as an organizer for the industry tracks of the premier conferences on information retrieval: SIGIR and CIKM. He authored a popular book on faceted search as part of the Morgan & Claypool Synthesis Lectures.

Journal of the American Society for Information Science, 48(11): pp. 1036–1048, 1997. [Cousins, 1997] S.B. Cousins. “Reification and Affordances in a User Interface for Interacting with Heterogeneous Distributed Applications.” PhD thesis, Stanford University, May 1997. [Ellis, 1989] D. Ellis. A behavioural model for information retrieval system design. Journal of Information Science, 15: pp. 237–247, 1989. [Bates, 1979] M.J. Bates. Information search tactics. Journal of the American Society for Information Science, 30(4): pp. 205–214, 1979. [Norman, 1988] D.A. Norman. The Psychology of Everyday Things. Basic Books, New York, 1988.

Basic Books, New York, 1988. [Pirolli and Card, 1999] P. Pirolli and S.K. Card. Information foraging. Psychological Review, 106(4):pp. 643–675, 1999. [Belkin et al., 1993] N. Belkin, P. G. Marchetti, and C. Cool. Braque – design of an interface to support user interaction in information retrieval. Information Processing and Management, 29(3): pp. 325–344, 1993. [Chang and Rice, 1993] Shan-Ju Chang and Ronald E. Rice. Browsing: A multidimensional framework. Annual Review of Information Science and Technology, 28: pp. 231–276, 1993. [Marchionini, 1995] Gary Marchionini. Information Seeking in Electronic Environments.


pages: 1,535 words: 337,071

Networks, Crowds, and Markets: Reasoning About a Highly Connected World by David Easley, Jon Kleinberg

Albert Einstein, AltaVista, AOL-Time Warner, Apollo 13, classic study, clean water, conceptual framework, Daniel Kahneman / Amos Tversky, Douglas Hofstadter, Dutch auction, Erdős number, experimental subject, first-price auction, fudge factor, Garrett Hardin, George Akerlof, Gerard Salton, Gerard Salton, Gödel, Escher, Bach, incomplete markets, information asymmetry, information retrieval, John Nash: game theory, Kenneth Arrow, longitudinal study, market clearing, market microstructure, moral hazard, Nash equilibrium, Network effects, Pareto efficiency, Paul Erdős, planetary scale, power law, prediction markets, price anchoring, price mechanism, prisoner's dilemma, random walk, recommendation engine, Richard Thaler, Ronald Coase, sealed-bid auction, search engine result page, second-price auction, second-price sealed-bid, seminal paper, Simon Singh, slashdot, social contagion, social web, Steve Jobs, Steve Jurvetson, stochastic process, Ted Nelson, the long tail, The Market for Lemons, the strength of weak ties, The Wisdom of Crowds, trade route, Tragedy of the Commons, transaction costs, two and twenty, ultimatum game, Vannevar Bush, Vickrey auction, Vilfredo Pareto, Yogi Berra, zero-sum game

Before discussing some of the ideas behind the ranking of pages, let’s begin by considering some of the basic reasons why it’s a hard problem. First, search is a hard problem for computers to solve in any setting, not just on the Web. Indeed, the field of information retrieval [35, 354] has dealt with this problem for decades before the creation of the Web: automated information retrieval systems starting in the 1960s were designed to search repositories of newspaper articles, scientific papers, patents, legal abstracts, and other document collections in response to keyword queries. Information retrieval systems have always had to deal with the problem that keywords are a very limited way to express a complex information need; in addition to the fact that a list of keywords is short and inexpressive, it suffers from the problems of synonymy (multiple ways to say the same thing, so that your search for recipes involving scallions fails because the recipe you wanted called them “green onions”) and polysemy (multiple meanings for the same term, so that your search for information about the animal called a jaguar instead produces results primarily about automobiles, football players, and an operating system for the Apple Macintosh).
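The synonymy and polysemy problems are easy to reproduce with a naive keyword matcher. The Python sketch below only illustrates the point made above and is not code from the book; the tiny document collection and the queries are invented.

    # Exact keyword matching over a toy collection.
    docs = {
        "recipe": "saute the green onions with garlic and ginger",
        "wildlife": "the jaguar is the largest cat in the americas",
        "cars": "the new jaguar sedan has a quiet cabin",
    }

    def keyword_search(query, docs):
        """Return the ids of documents containing every query keyword verbatim."""
        terms = query.lower().split()
        return [doc_id for doc_id, text in docs.items()
                if all(term in text.lower().split() for term in terms)]

    # Synonymy: the recipe says "green onions", so a search for "scallions" finds nothing.
    print(keyword_search("scallions", docs))   # []
    # Polysemy: "jaguar" matches both the animal and the automobile documents.
    print(keyword_search("jaguar", docs))      # ['wildlife', 'cars']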

Even today, such news search features are only partly integrated into the core parts of the search engine interface, and emerging Web sites such as Twitter continue to fill in the spaces that exist between static content and real-time awareness. More fundamental still, and at the heart of many of these issues, is the fact that the Web has shifted much of the information retrieval question from a problem of scarcity to a problem of abundance. The prototypical applications of information retrieval in the pre-Web era had a “needle-in-a-haystack” flavor — for example, an intellectual-property attorney might express the information need, “find me any patents that have dealt with the design of elevator speed regulators based on fuzzy-logic controllers.”

With this in mind, people who depended on the success of their Web sites increasingly began modifying their Web-page authoring styles to score highly in search engine rankings. For people who had conceived of Web search as a kind of classical information retrieval application, this was something novel. Back in the 1970s and 1980s, when people designed information retrieval tools for scientific papers or newspaper articles, authors were not overtly writing their papers or abstracts with these search tools in mind. From the relatively early days of the Web, however, people have written Web pages with search engines quite explicitly in mind.


Cataloging the World: Paul Otlet and the Birth of the Information Age by Alex Wright

1960s counterculture, Ada Lovelace, barriers to entry, British Empire, business climate, business intelligence, Cape to Cairo, card file, centralized clearinghouse, Charles Babbage, Computer Lib, corporate governance, crowdsourcing, Danny Hillis, Deng Xiaoping, don't be evil, Douglas Engelbart, Douglas Engelbart, Electric Kool-Aid Acid Test, European colonialism, folksonomy, Frederick Winslow Taylor, Great Leap Forward, hive mind, Howard Rheingold, index card, information retrieval, invention of movable type, invention of the printing press, Jane Jacobs, John Markoff, Kevin Kelly, knowledge worker, Law of Accelerating Returns, Lewis Mumford, linked data, Livingstone, I presume, lone genius, machine readable, Menlo Park, military-industrial complex, Mother of all demos, Norman Mailer, out of africa, packet switching, pneumatic tube, profit motive, RAND corporation, Ray Kurzweil, scientific management, Scramble for Africa, self-driving car, semantic web, Silicon Valley, speech recognition, Steve Jobs, Stewart Brand, systems thinking, Ted Nelson, The Death and Life of Great American Cities, the scientific method, Thomas L Friedman, urban planning, Vannevar Bush, W. E. B. Du Bois, Whole Earth Catalog

In 1780, an Austrian named Gerhard van Swieten further adapted the technique to create a master catalog for the Austrian National Library, known as the Josephinian Catalog (named for Austria’s “enlightened despot” Joseph II). Van Swieten decided to store his catalog cards in 205 wooden boxes, sealed in an airtight locker—the first recognizable precursor to the once familiar, now rapidly disappearing, library card catalog. Today, we might tend to think of the card catalog as a simplistic information retrieval tool: the dominion of somber librarians in fusty reading rooms. However, to take such a dismissive view of these compact, efficient systems—the direct ancestors of the modern database—may lead us to overlook the critical role they played in the industrial information explosion that would reshape the European world in the nineteenth century.

In the mid-1930s, IBM was building its portfolio of electronic devices (even before it had started manufacturing any of them), long before Vannevar Bush, then dean of engineering at the Massachusetts Institute of Technology, published his famous essay “As We May Think.” Today, most computer science historians have characterized Bush’s Rapid Selector as the first electronic information-retrieval machine. When Bush tried to patent his invention in 1937 and 1940, however, the U.S. Patent Office turned him down, citing Goldberg’s work. And while there is no evidence that Goldberg’s invention directly influenced Bush’s work, Donker Duyvis—Paul Otlet’s eventual successor at the IIB—did tell Bush about Goldberg’s invention in 1946. Despite his considerable achievements, Goldberg remains all but unknown today.

Only when the conflict between nation-states had been eliminated could humanity finally realize its spiritual and intellectual potential. Worldwide dissemination of recorded knowledge was an essential step along that path. Like Otlet, Wells believed that better access to information might help prevent future wars. Beginning with his 1905 work, A Modern Utopia, Wells had developed a fascination with the problem of information retrieval— the need for better methods for organizing the world’s recorded knowledge. This led him to reject old values and institutional strictures and embrace a mechanistic approach, one founded on Taylorist ideals of scientific management and a belief in the power of science to solve humanity’s problems, and the coming war in particular.


pages: 174 words: 56,405

Machine Translation by Thierry Poibeau

Alignment Problem, AlphaGo, AltaVista, augmented reality, call centre, Claude Shannon: information theory, cloud computing, combinatorial explosion, crowdsourcing, deep learning, DeepMind, easy for humans, difficult for computers, en.wikipedia.org, geopolitical risk, Google Glasses, information retrieval, Internet of things, language acquisition, machine readable, machine translation, Machine translation of "The spirit is willing, but the flesh is weak." to Russian and back, natural language processing, Necker cube, Norbert Wiener, RAND corporation, Robert Mercer, seminal paper, Skype, speech recognition, statistical model, technological singularity, Turing test, wikimedia commons

Speech translation has become a hot topic (“speech to speech” applications aim at making it possible to speak in one’s own language with another interlocutor speaking in a foreign language by using live automated translation). The machine translation market is growing fast. Over the last few years we have witnessed the emergence of new applications, particularly on mobile devices. Cross-Language Information Retrieval Cross-language information retrieval aims to give access to documents initially written in different languages. Consider research on patents: when a company seeks to know if an idea or a process has already been patented, it must ensure that its research is exhaustive and covers all parts of the world. It is therefore fundamental to cross the language barrier, for both the query (i.e., the information need expressed through keywords) and the analysis of the responses (i.e., documents relevant to the information need).
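The two crossing points the passage mentions, the query and the retrieved documents, can be shown with a deliberately tiny sketch. The Python fragment below is not from the book; the bilingual dictionary, the query, and the French documents are invented, and a real system would use full machine translation rather than word-for-word lookup.

    # Toy cross-language retrieval: translate the query, then match foreign-language documents.
    query_en = ["patent", "engine"]
    en_to_fr = {"patent": "brevet", "engine": "moteur"}   # invented toy dictionary

    documents_fr = {
        1: "brevet pour un moteur électrique compact",
        2: "rapport annuel sur le marché automobile",
    }

    # Step 1: carry the information need (the keywords) across the language barrier.
    query_fr = [en_to_fr[word] for word in query_en]

    # Step 2: retrieve the documents, written in the other language, that satisfy it.
    hits = [doc_id for doc_id, text in documents_fr.items()
            if all(term in text.split() for term in query_fr)]
    print(hits)   # [1]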

See Smart glasses; Smart watch Construction (linguistic), 23 Context (linguistic), 17–21, 31, 34, 54–56, 64–67, 71, 92, 117–119, 129, 150, 176–178, 186, 188, 215–216, 238 Continuous model, 186–187 Conversational agent, 2. See also Artificial dialogue Coordination, 175 Corpus alignment, 91–108 Cross-language information retrieval, 238–239 Cryptography, 49, 52, 56, 58–60 Cryptology. See Cryptography CSLi, 232, 236 Cultural hegemony, 168, 250–251 Czech, 210, 213 DARPA, 200–203, 209, 259 Database access, 241 Date expressions, 115, 152, 160 Deceptive cognate, 11, 261 Decoder, 141, 144, 185, 186, 190 Deep learning, 34–35, 37, 170, 181–195, 228, 234, 247, 253–255 Deepmind, 182 Defense industry, 77, 88, 173, 232–233, 235 De Firmas-Périés, Arman-Charles-Daniel, 41 De Maimieux, Joseph, 41 Descartes, René, 40–42 Determiner, 133, 215 Dialogue.

See Machine translation systems Ideographic writing system, 105 Idiom. See Idiomatic expression Idiomatic expression, 10, 11, 15, 23, 28, 30, 33, 115, 125, 178, 217, 219, 262 Iida, Hitoshi, 117 Image recognition, 183 Indirect machine translation, 25–32 Indo-European languages, 165, 213, 214, 250 Information retrieval, 45, 92, 238–239 Informativeness, 201, 206 Intelligence industry. See Intelligence services Intelligence services, 77, 89, 173, 225, 233, 235, 249 Interception (of communications), 225, 232 Interlingua, 24, 28–32, 40, 58, 63, 66–68, 85, 262 Interlingual machine translation. See Interlingua Intermediate representation, 25–32, 63 Internet, 33, 93, 97, 98, 100, 102, 164, 166, 168–169, 172, 197, 227–233, 238, 242–243, 247–250 link, 98–99 Interpretation, 20, 201 Island of confidence, 102, 108, 150 Isolating language, 215–216 Israel, 60, 69 Japan, 44, 67, 86, 87, 109 Japanese, 11, 88, 117–118, 164–165, 192, 242 Jibbigo, 236 JRC-Acquis corpus, 97, 212–213, 223 Keyword, 92, 99, 238 Kilgarriff, Adam, 18 King, Gilbert, 76 Kircher, Athanasius, 41 Koehn, Philip, 136, 212–213 Korean, 88, 235–236 Language complexity (see Complexity) diversity, 1, 164–170 (see also typology) exposure (see Child language acquisition) family, 30, 106, 138, 172–174 independent representation (see Interlingua) learning (see Child language acquisition) model, 127, 140, 142, 144, 153, 185 proximity, 163 (see also family) typology, 138, 192 (see also family) universal, 56, 66, 67 (see also Universal language) Lavie, Alon, 206 Learning step (or learning phase).


pages: 394 words: 108,215

What the Dormouse Said: How the Sixties Counterculture Shaped the Personal Computer Industry by John Markoff

Any sufficiently advanced technology is indistinguishable from magic, Apple II, back-to-the-land, beat the dealer, Bill Duvall, Bill Gates: Altair 8800, Buckminster Fuller, California gold rush, card file, computer age, Computer Lib, computer vision, conceptual framework, cuban missile crisis, different worldview, digital divide, Donald Knuth, Douglas Engelbart, Douglas Engelbart, Dynabook, Edward Thorp, El Camino Real, Electric Kool-Aid Acid Test, Fairchild Semiconductor, General Magic , general-purpose programming language, Golden Gate Park, Hacker Ethic, Hans Moravec, hypertext link, informal economy, information retrieval, invention of the printing press, Ivan Sutherland, Jeff Rulifson, John Markoff, John Nash: game theory, John von Neumann, Kevin Kelly, knowledge worker, Lewis Mumford, Mahatma Gandhi, Menlo Park, military-industrial complex, Mother of all demos, Norbert Wiener, packet switching, Paul Terrell, popular electronics, punch-card reader, QWERTY keyboard, RAND corporation, RFC: Request For Comment, Richard Stallman, Robert X Cringely, Sand Hill Road, Silicon Valley, Silicon Valley startup, South of Market, San Francisco, speech recognition, Steve Crocker, Steve Jobs, Steve Wozniak, Steven Levy, Stewart Brand, technological determinism, Ted Nelson, The Hackers Conference, The Theory of the Leisure Class by Thorstein Veblen, Thorstein Veblen, Turing test, union organizing, Vannevar Bush, We are as Gods, Whole Earth Catalog, William Shockley: the traitorous eight

In 1960, Engelbart presented a paper at the annual meeting of the American Documentation Institute, outlining how computer systems of the future might change the role of information-retrieval specialists. The idea didn’t sit at all well with his audience, which gave his paper a blasé reception. He also got into an argument with a researcher who asserted that Engelbart was proposing nothing that was any different from any of the other information-retrieval efforts that were already under way. It was a long and lonely two years. The state of the art of computer science was moving quickly toward mathematical algorithms, and the computer scientists looked down their nose at his work, belittling it as mere office automation and hence beneath their notice.

For a while, he thought that the emergent field of artificial intelligence might provide him with some support, or at least meaningful overlap. But the AI researchers translated his ideas into their own, and the concept of Augmentation seemed pallid when viewed through their eyes, reduced to the more mundane idea of information retrieval, missing Engelbart’s dream entirely. Gradually, he began to understand that the AI community was actually his philosophical enemy. After all, their vision was to replace humans with machines, while he wanted to extend and empower people. Engelbart would later say that he had nothing against the vision of AI but just believed that it would be decades and decades before it could be realized.

There was an abyss between the original work done by Engelbart’s group in the sixties and the motley crew of hobbyists that would create the personal-computer industry beginning in 1975. In their hunger to possess their own computers, the PC hobbyists would miss the crux of the original idea: communications as an integral part of the design. That was at the heart of the epiphanies that Engelbart had years earlier, which led to the realization of Vannevar Bush’s Memex information-retrieval system of the 1940s. During the period from the early 1960s until 1969, when most of the development of the NLS system was completed, Engelbart and his band of researchers remained in a comfortable bubble. They were largely Pentagon funded, but unlike many of the engineering and computing groups that surrounded them at SRI, they weren’t doing work that directly contributed to the Vietnam War.


pages: 893 words: 199,542

Structure and interpretation of computer programs by Harold Abelson, Gerald Jay Sussman, Julie Sussman

Andrew Wiles, conceptual framework, Donald Knuth, Douglas Hofstadter, Eratosthenes, Fermat's Last Theorem, functional programming, Gödel, Escher, Bach, higher-order functions, industrial robot, information retrieval, iterative process, Ivan Sutherland, Johannes Kepler, loose coupling, machine translation, Multics, probability theory / Blaise Pascal / Pierre de Fermat, Richard Stallman, Turing machine

Use the results of exercises 2.63 and 2.64 to give θ(n) implementations of union-set and intersection-set for sets implemented as (balanced) binary trees.41 Sets and information retrieval We have examined options for using lists to represent sets and have seen how the choice of representation for a data object can have a large impact on the performance of the programs that use the data. Another reason for concentrating on sets is that the techniques discussed here appear again and again in applications involving information retrieval. Consider a data base containing a large number of individual records, such as the personnel files for a company or the transactions in an accounting system.
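The excerpt stops just as the database example begins, but the idea the chapter develops is a lookup procedure keyed on a record identifier, with the set of records held in a (balanced) binary tree so lookups take logarithmically many steps. A minimal Python sketch of that idea (my own illustration, not the book's Scheme; the personnel records and key values are made up):

# Records stored in a binary search tree ordered by a numeric key, so that
# lookup walks one branch of the tree instead of scanning every record.
class Node:
    def __init__(self, key, record, left=None, right=None):
        self.key, self.record = key, record
        self.left, self.right = left, right

def insert(tree, key, record):
    if tree is None:
        return Node(key, record)
    if key < tree.key:
        tree.left = insert(tree.left, key, record)
    elif key > tree.key:
        tree.right = insert(tree.right, key, record)
    else:
        tree.record = record              # replace an existing record
    return tree

def lookup(tree, key):
    while tree is not None:
        if key == tree.key:
            return tree.record            # found the record for this key
        tree = tree.left if key < tree.key else tree.right
    return None                           # no record with this key

# Usage: a toy personnel file keyed by employee number.
db = None
for emp_no, name in [(42, "Alyssa P. Hacker"), (7, "Ben Bitdiddle"), (99, "Cy D. Fect")]:
    db = insert(db, emp_no, {"name": name})
print(lookup(db, 7))                      # {'name': 'Ben Bitdiddle'}

The logarithmic behavior depends on the tree staying balanced, which is exactly what the exercise above is about.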

Also, a central role is played in the implementation by a frame data structure, which determines the correspondence between symbols and their associated values. One additional interesting aspect of our query-language implementation is that we make substantial use of streams, which were introduced in chapter 3. 4.4.1 Deductive Information Retrieval Logic programming excels in providing interfaces to data bases for information retrieval. The query language we shall implement in this chapter is designed to be used in this way. In order to illustrate what the query system does, we will show how it can be used to manage the data base of personnel records for Microshaft, a thriving high-technology company in the Boston area.

The resulting RSA algorithm has become a widely used technique for enhancing the security of electronic communications. Because of this and related developments, the study of prime numbers, once considered the epitome of a topic in “pure” mathematics to be studied only for its own sake, now turns out to have important practical applications to cryptography, electronic funds transfer, and information retrieval. 1.3 Formulating Abstractions with Higher-Order Procedures We have seen that procedures are, in effect, abstractions that describe compound operations on numbers independent of the particular numbers. For example, when we (define (cube x) (* x x x)) we are not talking about the cube of a particular number, but rather about a method for obtaining the cube of any number.
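Section 1.3 goes on to abstract common patterns over such procedures. As a rough sketch of the idea (in Python rather than the book's Scheme; the function names here are mine, not the book's), a summation procedure can take the function being summed as an argument:

def cube(x):
    return x * x * x

def sum_over(f, a, b):
    # abstracts the pattern "add up f(k) for k from a to b"
    total = 0
    for k in range(a, b + 1):
        total += f(k)
    return total

print(sum_over(cube, 1, 10))            # sum of the cubes 1..10 -> 3025
print(sum_over(lambda k: k, 1, 10))     # sum of the integers 1..10 -> 55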


pages: 523 words: 143,139

Algorithms to Live By: The Computer Science of Human Decisions by Brian Christian, Tom Griffiths

4chan, Ada Lovelace, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, Albert Einstein, algorithmic bias, algorithmic trading, anthropic principle, asset allocation, autonomous vehicles, Bayesian statistics, behavioural economics, Berlin Wall, Big Tech, Bill Duvall, bitcoin, Boeing 747, Charles Babbage, cognitive load, Community Supported Agriculture, complexity theory, constrained optimization, cosmological principle, cryptocurrency, Danny Hillis, data science, David Heinemeier Hansson, David Sedaris, delayed gratification, dematerialisation, diversification, Donald Knuth, Donald Shoup, double helix, Dutch auction, Elon Musk, exponential backoff, fault tolerance, Fellow of the Royal Society, Firefox, first-price auction, Flash crash, Frederick Winslow Taylor, fulfillment center, Garrett Hardin, Geoffrey Hinton, George Akerlof, global supply chain, Google Chrome, heat death of the universe, Henri Poincaré, information retrieval, Internet Archive, Jeff Bezos, Johannes Kepler, John Nash: game theory, John von Neumann, Kickstarter, knapsack problem, Lao Tzu, Leonard Kleinrock, level 1 cache, linear programming, martingale, multi-armed bandit, Nash equilibrium, natural language processing, NP-complete, P = NP, packet switching, Pierre-Simon Laplace, power law, prediction markets, race to the bottom, RAND corporation, RFC: Request For Comment, Robert X Cringely, Sam Altman, scientific management, sealed-bid auction, second-price auction, self-driving car, Silicon Valley, Skype, sorting algorithm, spectrum auction, Stanford marshmallow experiment, Steve Jobs, stochastic process, Thomas Bayes, Thomas Malthus, Tragedy of the Commons, traveling salesman, Turing machine, urban planning, Vickrey auction, Vilfredo Pareto, Walter Mischel, Y Combinator, zero-sum game

Anderson and Milson, “Human Memory,” in turn, draws from a statistical study of library borrowing that appears in Burrell, “A Simple Stochastic Model for Library Loans.” the missing piece in the study of the mind: Anderson’s initial exploration of connections between information retrieval by computers and the organization of human memory was conducted in an era when most people had never interacted with an information retrieval system, and the systems in use were quite primitive. As search engine research has pushed the boundaries of what information retrieval systems can do, it’s created new opportunities for discovering parallels between minds and machines. For example, Tom and his colleagues have shown how ideas behind Google’s PageRank algorithm are relevant to understanding human semantic memory.

Does it suggest that human memory is good or bad? What’s the underlying story here? These questions have stimulated psychologists’ speculation and research for more than a hundred years. In 1987, Carnegie Mellon psychologist and computer scientist John Anderson found himself reading about the information retrieval systems of university libraries. Anderson’s goal—or so he thought—was to write about how the design of those systems could be informed by the study of human memory. Instead, the opposite happened: he realized that information science could provide the missing piece in the study of the mind.

Basically, all of these theories characterize memory as an arbitrary and non-optimal configuration.… I had long felt that the basic memory processes were quite adaptive and perhaps even optimal; however, I had never been able to see a framework in which to make this point. In the computer science work on information retrieval, I saw that framework laid out before me.” A natural way to think about forgetting is that our minds simply run out of space. The key idea behind Anderson’s new account of human memory is that the problem might be not one of storage, but of organization. According to his theory, the mind has essentially infinite capacity for memories, but we have only a finite amount of time in which to search for them.


pages: 250 words: 73,574

Nine Algorithms That Changed the Future: The Ingenious Ideas That Drive Today's Computers by John MacCormick, Chris Bishop

Ada Lovelace, AltaVista, Charles Babbage, Claude Shannon: information theory, Computing Machinery and Intelligence, fault tolerance, information retrieval, Menlo Park, PageRank, pattern recognition, Richard Feynman, Silicon Valley, Simon Singh, sorting algorithm, speech recognition, Stephen Hawking, Steve Jobs, Steve Wozniak, traveling salesman, Turing machine, Turing test, Vannevar Bush

An example set of web pages that each have a title and a body. We already know that the Babylonians were using indexing 5000 years before search engines existed. It turns out that search engines did not invent the word-location trick either: this is a well-known technique that was used in other types of information retrieval before the internet arrived on the scene. However, in the next section we will learn about a new trick that does appear to have been invented by search engine designers: the metaword trick. The cunning use of this trick and various related ideas helped to catapult the AltaVista search engine to the top of the search industry in the late 1990s.
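As commonly implemented, the word-location trick means the inverted index records not just which pages contain a word but where in each page it occurs, which is what makes phrase and proximity queries possible. A rough Python sketch of such a positional index (my own illustration, not MacCormick's code; document IDs and texts are made up):

from collections import defaultdict

def build_positional_index(docs):
    # docs: {doc_id: text}; index: word -> {doc_id: [positions]}
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, word in enumerate(text.lower().split()):
            index[word][doc_id].append(pos)
    return index

def phrase_search(index, phrase):
    words = phrase.lower().split()
    hits = []
    for doc_id, positions in index.get(words[0], {}).items():
        for p in positions:
            # the phrase matches if each later word appears at the next position
            if all(p + i in index.get(w, {}).get(doc_id, [])
                   for i, w in enumerate(words[1:], start=1)):
                hits.append((doc_id, p))
    return hits

docs = {1: "the cat sat on the mat", 2: "the mat sat on the cat"}
index = build_positional_index(docs)
print(phrase_search(index, "the cat"))   # [(1, 0), (2, 4)]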

In other cases, the algorithms may have existed in the research community for some time, waiting in the wings for the right wave of new technology to give them wide applicability. The search algorithms for indexing and ranking fall into this category: similar algorithms had existed for years in the field known as information retrieval, but it took the phenomenon of web search to make these algorithms “great,” in the sense of daily use by ordinary computer users. Of course, the algorithms also evolved for their new application; PageRank is a good example of this. Note that the emergence of new technology does not necessarily lead to new algorithms.

Among the many college-level computer science texts on algorithms, three particularly readable options are Algorithms, by Dasgupta, Papadimitriou, and Vazirani; Algorithmics: The Spirit of Computing, by Harel and Feldman; and Introduction to Algorithms, by Cormen, Leiserson, Rivest, and Stein. Search engine indexing (chapter 2). The original AltaVista patent covering the metaword trick is U.S. patent 6105019, “Constrained Searching of an Index,” by Mike Burrows (2000). For readers with a computer science background, Search Engines: Information Retrieval in Practice, by Croft, Metzler, and Strohman, is a good option for learning more about indexing and many other aspects of search engines. PageRank (chapter 3). The opening quotation by Larry Page is taken from an interview by Ben Elgin, published in Businessweek, May 3, 2004. Vannevar Bush's “As We May Think” was, as mentioned above, originally published in The Atlantic magazine (July 1945).


pages: 397 words: 102,910

The Idealist: Aaron Swartz and the Rise of Free Culture on the Internet by Justin Peters

4chan, Aaron Swartz, activist lawyer, Alan Greenspan, Any sufficiently advanced technology is indistinguishable from magic, Bayesian statistics, Brewster Kahle, buy low sell high, crowdsourcing, digital rights, disintermediation, don't be evil, Free Software Foundation, global village, Hacker Ethic, hypertext link, index card, informal economy, information retrieval, Internet Archive, invention of movable type, invention of writing, Isaac Newton, John Markoff, Joi Ito, Lean Startup, machine readable, military-industrial complex, moral panic, Open Library, Paul Buchheit, Paul Graham, profit motive, RAND corporation, Republic of Letters, Richard Stallman, selection bias, semantic web, Silicon Valley, social bookmarking, social web, Steve Jobs, Steven Levy, Stewart Brand, strikebreaker, subprime mortgage crisis, Twitter Arab Spring, Vannevar Bush, Whole Earth Catalog, Y Combinator

His brief remarks to the group at Woods Hole were wistful: “I merely wish I were young enough to participate with you in the fascinating intricacies you will encounter and bring under your control.”48 Vannevar rhymes with believer, and when it came to government funding of scientific research, Bush certainly was. He was also a lifelong believer in libraries, and the benefits to be derived from their automation. In 1945, he published an article in the Atlantic Monthly that proposed a rudimentary mechanized library called Memex, a linked-information retrieval system. Memex was a desk-size machine that was equal parts stenographer, filing cabinet, and reference librarian: “a device in which an individual stores his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility.”49 The goal was to build a machine that could capture a user’s thought patterns, compile and organize his reading material and correspondence, and record the resulting “associative trails” between them all, such that the user could trace his end insights back to conception.

The rise of photoduplication technologies that facilitated the rapid spread of information merely underscored the fragility of copyright holders’ claims that intellectual property was indistinguishable from regular physical property. “We know that volumes of information can be stored on microfilm and magnetic tape. We keep hearing about information-retrieval networks,” former senator Kenneth B. Keating told Congress in 1965. “The inexorable question arises—what will happen in the long run if authors’ income is cut down and down by increasing free uses by photocopy and information storage and retrieval? Will the authors continue writing? Will the publishers continue publishing if their markets are diluted, eroded, and eventually, the profit motive and incentive completely destroyed?

Project Gutenberg had become an eloquent counterargument to copyright advocates’ dismissive claims about the public domain. It demonstrated just how easily a network could be used to breathe new life into classics that might otherwise go unseen. Despite the existence of initiatives such as Project Gutenberg, despite the emergence of the Internet as a new medium for information retrieval and distribution, the same official attitudes about intellectual property prevailed. The public domain was regarded as a penalty rather than as an opportunity. Parochial concerns were conflated with the public interest. The rise of the Internet might portend an informational revolution, but from the standpoint of the people in power, Hart warned, revolution was a bad thing.


pages: 1,387 words: 202,295

Structure and Interpretation of Computer Programs, Second Edition by Harold Abelson, Gerald Jay Sussman, Julie Sussman

Andrew Wiles, conceptual framework, Donald Knuth, Douglas Hofstadter, Eratosthenes, functional programming, Gödel, Escher, Bach, higher-order functions, industrial robot, information retrieval, iterative process, Ivan Sutherland, Johannes Kepler, loose coupling, machine translation, Multics, probability theory / Blaise Pascal / Pierre de Fermat, Richard Stallman, Turing machine, wikimedia commons

Exercise 2.65: Use the results of Exercise 2.63 and Exercise 2.64 to give implementations of union-set and intersection-set for sets implemented as (balanced) binary trees.107 Sets and information retrieval We have examined options for using lists to represent sets and have seen how the choice of representation for a data object can have a large impact on the performance of the programs that use the data. Another reason for concentrating on sets is that the techniques discussed here appear again and again in applications involving information retrieval. Consider a data base containing a large number of individual records, such as the personnel files for a company or the transactions in an accounting system.

Also, a central role is played in the implementation by a frame data structure, which determines the correspondence between symbols and their associated values. One additional interesting aspect of our query-language implementation is that we make substantial use of streams, which were introduced in Chapter 3. 4.4.1Deductive Information Retrieval Logic programming excels in providing interfaces to data bases for information retrieval. The query language we shall implement in this chapter is designed to be used in this way. In order to illustrate what the query system does, we will show how it can be used to manage the data base of personnel records for Microshaft, a thriving high-technology company in the Boston area.
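The core of such a query system is matching a query pattern against each assertion in the database, using a frame to record what each pattern variable has been bound to. A toy Python sketch of that pattern-matching idea (my own illustration, not the book's Scheme implementation; the assertions are loosely modeled on the Microshaft personnel example):

# Assertions are tuples; pattern variables are strings starting with "?".
# A frame is a dict mapping variable names to the values they are bound to.
def match(pattern, assertion, frame):
    if len(pattern) != len(assertion):
        return None
    frame = dict(frame)                   # extend a copy, leave the original alone
    for p, a in zip(pattern, assertion):
        if isinstance(p, str) and p.startswith("?"):
            if p in frame and frame[p] != a:
                return None               # conflicting binding
            frame[p] = a
        elif p != a:
            return None
    return frame

def query(pattern, database):
    # one frame per assertion that matches the pattern
    return [f for f in (match(pattern, a, {}) for a in database) if f is not None]

database = [
    ("job", "Ben Bitdiddle", "computer wizard"),
    ("job", "Alyssa P. Hacker", "computer programmer"),
    ("supervisor", "Alyssa P. Hacker", "Ben Bitdiddle"),
]
print(query(("job", "?who", "?what"), database))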

2.1.4 Extended Exercise: Interval Arithmetic 2.2 Hierarchical Data and the Closure Property 2.2.1 Representing Sequences 2.2.2 Hierarchical Structures 2.2.3 Sequences as Conventional Interfaces 2.2.4 Example: A Picture Language 2.3 Symbolic Data 2.3.1 Quotation 2.3.2 Example: Symbolic Differentiation 2.3.3 Example: Representing Sets 2.3.4 Example: Huffman Encoding Trees 2.4 Multiple Representations for Abstract Data 2.4.1 Representations for Complex Numbers 2.4.2 Tagged data 2.4.3 Data-Directed Programming and Additivity 2.5 Systems with Generic Operations 2.5.1 Generic Arithmetic Operations 2.5.2 Combining Data of Different Types 2.5.3 Example: Symbolic Algebra 3 Modularity, Objects, and State 3.1 Assignment and Local State 3.1.1 Local State Variables 3.1.2 The Benefits of Introducing Assignment 3.1.3 The Costs of Introducing Assignment 3.2 The Environment Model of Evaluation 3.2.1 The Rules for Evaluation 3.2.2 Applying Simple Procedures 3.2.3 Frames as the Repository of Local State 3.2.4 Internal Definitions 3.3 Modeling with Mutable Data 3.3.1 Mutable List Structure 3.3.2 Representing Queues 3.3.3 Representing Tables 3.3.4 A Simulator for Digital Circuits 3.3.5 Propagation of Constraints 3.4 Concurrency: Time Is of the Essence 3.4.1 The Nature of Time in Concurrent Systems 3.4.2 Mechanisms for Controlling Concurrency 3.5 Streams 3.5.1 Streams Are Delayed Lists 3.5.2 Infinite Streams 3.5.3 Exploiting the Stream Paradigm 3.5.4 Streams and Delayed Evaluation 3.5.5 Modularity of Functional Programs and Modularity of Objects 4 Metalinguistic Abstraction 4.1 The Metacircular Evaluator 4.1.1 The Core of the Evaluator 4.1.2 Representing Expressions 4.1.3 Evaluator Data Structures 4.1.4 Running the Evaluator as a Program 4.1.5 Data as Programs 4.1.6 Internal Definitions 4.1.7 Separating Syntactic Analysis from Execution 4.2 Variations on a Scheme — Lazy Evaluation 4.2.1 Normal Order and Applicative Order 4.2.2 An Interpreter with Lazy Evaluation 4.2.3 Streams as Lazy Lists 4.3 Variations on a Scheme — Nondeterministic Computing 4.3.1 Amb and Search 4.3.2 Examples of Nondeterministic Programs 4.3.3 Implementing the Amb Evaluator 4.4 Logic Programming 4.4.1 Deductive Information Retrieval 4.4.2 How the Query System Works 4.4.3 Is Logic Programming Mathematical Logic? 
4.4.4 Implementing the Query System 4.4.4.1 The Driver Loop and Instantiation 4.4.4.2 The Evaluator 4.4.4.3 Finding Assertions by Pattern Matching 4.4.4.4 Rules and Unification 4.4.4.5 Maintaining the Data Base 4.4.4.6 Stream Operations 4.4.4.7 Query Syntax Procedures 4.4.4.8 Frames and Bindings 5 Computing with Register Machines 5.1 Designing Register Machines 5.1.1 A Language for Describing Register Machines 5.1.2 Abstraction in Machine Design 5.1.3 Subroutines 5.1.4 Using a Stack to Implement Recursion 5.1.5 Instruction Summary 5.2 A Register-Machine Simulator 5.2.1 The Machine Model 5.2.2 The Assembler 5.2.3 Generating Execution Procedures for Instructions 5.2.4 Monitoring Machine Performance 5.3 Storage Allocation and Garbage Collection 5.3.1 Memory as Vectors 5.3.2 Maintaining the Illusion of Infinite Memory 5.4 The Explicit-Control Evaluator 5.4.1 The Core of the Explicit-Control Evaluator 5.4.2 Sequence Evaluation and Tail Recursion 5.4.3 Conditionals, Assignments, and Definitions 5.4.4 Running the Evaluator 5.5 Compilation 5.5.1 Structure of the Compiler 5.5.2 Compiling Expressions 5.5.3 Compiling Combinations 5.5.4 Combining Instruction Sequences 5.5.5 An Example of Compiled Code 5.5.6 Lexical Addressing 5.5.7 Interfacing Compiled Code to the Evaluator References List of Exercises List of Figures Term Index Colophon Unofficial Texinfo Format This is the second edition SICP book, from Unofficial Texinfo Format.


The Card Catalog: Books, Cards, and Literary Treasures by Library Of Congress, Carla Hayden

In Cold Blood by Truman Capote, index card, information retrieval, Johannes Kepler, late fees, machine readable, W. E. B. Du Bois

About the same time, the handful of computer companies that existed were making major innovations and had moved away from the punched-card system, advancing to vacuum tubes and magnetic tapes. Seeing new possibilities for cataloging and storing data, Librarian of Congress Lawrence Quincy Mumford established the Committee on Mechanized Information Retrieval in January 1958. In the years that followed, and with the approval of Congress, the Library purchased an IBM 1401, a small-scale computer system the size of a Volkswagen bus. The committee also recommended establishing a group to both design and implement the procedures required to automate the catalog.

See Roman Catholic Church census, 151 Centennial International Exhibition of 1876, 84 Ch’eng Ti, 15 Chicago Public Library, 7 Christianity, rise of, 15 clay, 12 Clemens, Samuel, 121 codex, 17 Cole, John, 87, 107 Collins, Billy, 156 Collyer, Homer, 148 Collyer, Langley, 148 Committee on Mechanized Information Retrieval, 152 computer punch cards, 151 Computing-Tabulating-Recording Company, 151 Congress Main Reading Room, 159 Copyright Act of 1870, 103 cross-referencing, 17 cuneiform, 12 Cutter, Charles Ammi, 82, 83, 108 D Dana, John, 146 Descartes, René, 19 Dewey Decimal Classification, 84 Dewey, Melville Louis, 82, 83, 85, 87, 107, 113, 151 Dixson, Kathy, 155 Diderot, Denis, 33 Douglass, Frederick, 102 Dove, Rita, 156 E Edlund, Paul, 112, 158 Eliot, T.


pages: 402 words: 110,972

Nerds on Wall Street: Math, Machines and Wired Markets by David J. Leinweber

"World Economic Forum" Davos, AI winter, Alan Greenspan, algorithmic trading, AOL-Time Warner, Apollo 11, asset allocation, banking crisis, barriers to entry, Bear Stearns, Big bang: deregulation of the City of London, Bob Litterman, book value, business cycle, butter production in bangladesh, butterfly effect, buttonwood tree, buy and hold, buy low sell high, capital asset pricing model, Charles Babbage, citizen journalism, collateralized debt obligation, Cornelius Vanderbilt, corporate governance, Craig Reynolds: boids flock, creative destruction, credit crunch, Credit Default Swap, credit default swaps / collateralized debt obligations, Danny Hillis, demand response, disintermediation, distributed generation, diversification, diversified portfolio, electricity market, Emanuel Derman, en.wikipedia.org, experimental economics, fake news, financial engineering, financial innovation, fixed income, Ford Model T, Gordon Gekko, Hans Moravec, Herman Kahn, implied volatility, index arbitrage, index fund, information retrieval, intangible asset, Internet Archive, Ivan Sutherland, Jim Simons, John Bogle, John Nash: game theory, Kenneth Arrow, load shedding, Long Term Capital Management, machine readable, machine translation, Machine translation of "The spirit is willing, but the flesh is weak." to Russian and back, market fragmentation, market microstructure, Mars Rover, Metcalfe’s law, military-industrial complex, moral hazard, mutually assured destruction, Myron Scholes, natural language processing, negative equity, Network effects, optical character recognition, paper trading, passive investing, pez dispenser, phenotype, prediction markets, proprietary trading, quantitative hedge fund, quantitative trading / quantitative finance, QWERTY keyboard, RAND corporation, random walk, Ray Kurzweil, Reminiscences of a Stock Operator, Renaissance Technologies, risk free rate, risk tolerance, risk-adjusted returns, risk/return, Robert Metcalfe, Ronald Reagan, Rubik’s Cube, Savings and loan crisis, semantic web, Sharpe ratio, short selling, short squeeze, Silicon Valley, Small Order Execution System, smart grid, smart meter, social web, South Sea Bubble, statistical arbitrage, statistical model, Steve Jobs, Steven Levy, stock buybacks, Tacoma Narrows Bridge, the scientific method, The Wisdom of Crowds, time value of money, tontine, too big to fail, transaction costs, Turing machine, two and twenty, Upton Sinclair, value at risk, value engineering, Vernor Vinge, Wayback Machine, yield curve, Yogi Berra, your tax dollars at work

Reporters were necessary intermediaries in an era when (for example) press releases were sent to a few thousand fax machines and assigned to reporters by editors, and when SEC filings were found on a shelf in the Commission’s reading rooms in major cities. Press releases go to everyone over the Web. SEC filings are completely electronic. The reading rooms are closed. There is a great deal of effort to develop persistent specialized information-retrieval software agents for these sorts of routine newsgathering activities, which in turn creates incentives for reporters to move up from moving information around to interpretation and analysis. Examples and more in-depth discussion on these “new research” topics are forthcoming in Chapters 9 and 10. Innovative algo systems will facilitate the use of news, in processed and raw forms.

Reuters Newscope algorithmic offerings, http://about.reuters.com/productinfo/newsscoperealtime/index.aspx?user=1&. 27. These tools are called Open Calais (www.opencalais.com/). 28. For the technically ambitious reader, Lucene (http://lucene.apache.org/), Lingpipe (http://alias-i.com/lingpipe/), and Lemur (www.lemurproject.org/) are popular open source language and information retrieval tools. 29. Anthony Oettinger, a pioneer in machine translation at Harvard going back to the 1950s, told a story of an early English-Russian-English system sponsored by U.S. intelligence agencies. The English “The spirit is willing but the flesh is weak” went in, was translated to Russian, which was then sent in again to be translated back into English.

Direct access to primary sources of financially relevant information is disintermediating reporters, who now have to provide more than just a conduit to earn their keep. We would be hard-pressed to find more innovation than we see today on the Web. Google Finance, Yahoo! Finance, and their brethren have made more advanced information retrieval and analysis tools available for free than could be purchased for any amount in the not-so-distant past. Other new technologies enable a new level of human-machine collaboration in investment research, such as XML (extensible markup language), discussed in Chapter 2. One of this technology’s most vocal proponents is Christopher Cox, former chairman of the SEC, who has taken the lead in encouraging the adoption of XBRL (extensible Business Reporting Language) to keep U.S. markets, exchanges, companies, and investors ahead of the curve. We constantly hear about information overload, information glut, information anxiety, data smog, and the like.


pages: 223 words: 52,808

Intertwingled: The Work and Influence of Ted Nelson (History of Computing) by Douglas R. Dechow

3D printing, Apple II, Bill Duvall, Brewster Kahle, Buckminster Fuller, Claude Shannon: information theory, cognitive dissonance, computer age, Computer Lib, conceptual framework, Douglas Engelbart, Douglas Engelbart, Dynabook, Edward Snowden, game design, HyperCard, hypertext link, Ian Bogost, information retrieval, Internet Archive, Ivan Sutherland, Jaron Lanier, knowledge worker, linked data, Marc Andreessen, Marshall McLuhan, Menlo Park, Mother of all demos, pre–internet, Project Xanadu, RAND corporation, semantic web, Silicon Valley, software studies, Steve Jobs, Steve Wozniak, Stewart Brand, Ted Nelson, TED Talk, The Home Computer Revolution, the medium is the message, Vannevar Bush, Wall-E, Whole Earth Catalog

J Technol Educ 10(1). http://scholar.lib.vt.edu/ejournals/JTE/v10n1/childress.html 5. Nelson TH (1965) A file structure for the complex, the changing and the indeterminate. In: Proceedings of the ACM 20th national conference. ACM Press, New York, pp 84–100 6. Nelson TH (1967) Getting it out of our system. In: Schlechter G (ed) Information retrieval: a critical review. Thompson Books, Washington, DC, pp 191–210 7. Nelson TH (1968) Hypertext implementation notes, 6–10 March 1968. Xuarchives. http://xanadu.com/REF%20XUarchive%20SET%2003.11.06/hin68.tif 8. Nelson TH (1974) Computer lib: you can and must understand computers now/dream machines.

Ted signed my copy of Literary Machines [25] at a talk in the mid-1990s, thus I was in awe of the man when Bill Dutton put us together as visiting scholars in the OII attic, a wonderful space overlooking the Ashmolean Museum. Ted and I arrived at concepts of data and metadata from very different paths. He brought his schooling in the theater and literary theory to the pioneer days of personal computing. I brought my schooling in mathematics, information retrieval, documentation, libraries, and communication to the study of scholarship. While Ted was sketching personal computers to revolutionize written communication [24], I was learning how to pry data out of card catalogs and move them into the first generation of online catalogs [6]. Our discussions that began 30 years later revealed the interaction of these threads, which have since converged. 10.2 Collecting and Organizing Data Ted overwhelms himself in data, hence he needs metadata to manage his collections.

In: Proceedings of the World Documentation Federation Nelson TH (1966–1967) Hypertext notes. http://web.archive.org/web/20031127035740/http://www.xanadu.com/XUarchive/. Unpublished series of ten short essays or “notes” Nelson TH (1967) Getting it out of our system. In: Schechter G (ed) Information retrieval: a critical review. Thompson Books, Washington, DC, pp 191–210 Nelson TH, Carmody S, Gross W, Rice D, van Dam A (1969) A hypertext editing system for the/360. In: Faiman M, Nievergelt J (eds) Pertinent concepts in computer graphics. Proceedings of the Second University of Illinois conference on computer graphics.


pages: 371 words: 93,570

Broad Band: The Untold Story of the Women Who Made the Internet by Claire L. Evans

4chan, Ada Lovelace, air gap, Albert Einstein, Bletchley Park, British Empire, Charles Babbage, colonial rule, Colossal Cave Adventure, computer age, crowdsourcing, D. B. Cooper, dark matter, dematerialisation, Doomsday Book, Douglas Engelbart, Douglas Engelbart, Douglas Hofstadter, East Village, Edward Charles Pickering, game design, glass ceiling, Grace Hopper, Gödel, Escher, Bach, Haight Ashbury, Harvard Computers: women astronomers, Honoré de Balzac, Howard Rheingold, HyperCard, hypertext link, index card, information retrieval, Internet Archive, Jacquard loom, John von Neumann, Joseph-Marie Jacquard, junk bonds, knowledge worker, Leonard Kleinrock, machine readable, Mahatma Gandhi, Mark Zuckerberg, Menlo Park, military-industrial complex, Mondo 2000, Mother of all demos, Network effects, old-boy network, On the Economy of Machinery and Manufactures, packet switching, PalmPilot, pets.com, rent control, RFC: Request For Comment, rolodex, San Francisco homelessness, semantic web, side hustle, Silicon Valley, Skype, South of Market, San Francisco, Steve Jobs, Steven Levy, Stewart Brand, subscription business, tech worker, technoutopianism, Ted Nelson, telepresence, The Soul of a New Machine, Wayback Machine, Whole Earth Catalog, Whole Earth Review, women in the workforce, Works Progress Administration, Y2K

With a couple of phones and boxes of index cards, it coordinated extensive group action for quick-response incidents like the 1971 San Francisco Bay oil spill—an early version of the kind of organizing that happens so easily today on social media. Resource One took up where these efforts left off, even inheriting the San Francisco Switchboard’s corporate shell. When Pam and the Chrises moved into the warehouse, their plan was to design a common information retrieval system for all the existing Switchboards in the city, interlinking their various resources into a database running on borrowed computer time. “Our vision was making technology accessible to people,” Pam explains. “It was a very passionate time. And we thought anything was possible.” But borrowing computer time to build such a database was far too limiting; if they were to imbue their politics into a computer system for the people, they’d need to build it from the ground up.

That summer, while the other communards plumbed the building’s twenty-foot hot tub, the Resource One group installed cabinet racks and drum storage units. Nobody on the job had done anything remotely like it—even the lead electrician learned as he went, and the software was written from scratch, encoding the counterculture’s values into the computer at an operating system level. The Resource One Generalized Information Retrieval System, ROGIRS, written by a hacker, Ephrem Lipkin, was designed for the underground Switchboards, as a way to manage the offerings of an alternative economy. Once up and running, the machine would become the heart of Northern California’s underground free-access network, a glimmer of the Internet’s vital cultural importance years before most people would ever hear of it.

Bolton told them how social services agencies in the Bay Area didn’t share a citywide database for referral information; he’d personally observed how social workers at different agencies relied on their own Rolodexes. The quality of referrals they gave varied throughout the city, and people weren’t always connected to the services they needed, even if the services did exist. Chris Macie, who founded Resource One with Pam and stayed on after she left, programmed a new information retrieval system for the project, and the women started calling social workers all over San Francisco. If they kept an updated database of referral information, they asked, would the agencies be interested in subscribing? The answer was a resounding yes. The women of Resource One found their cause: using the computer to help the most disadvantaged people in the city gain access to services.


pages: 480 words: 99,288

Mastering ElasticSearch by Rafal Kuc, Marek Rogozinski

Amazon Web Services, book value, business logic, create, read, update, delete, en.wikipedia.org, fault tolerance, finite state, full text search, information retrieval

The Lucene conceptual formula The conceptual version of the TF/IDF formula is a representation of the Boolean model of Information Retrieval combined with the Vector Space Model of Information Retrieval. Let's not dwell on it; let's jump straight to the practical formula, which is the one Apache Lucene actually implements and uses. Note The details of the Boolean model and the Vector Space Model of Information Retrieval are far beyond the scope of this book. If you would like to read more about them, start with http://en.wikipedia.org/wiki/Standard_Boolean_model and http://en.wikipedia.org/wiki/Vector_Space_Model.
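For reference, the conceptual scoring formula as the Lucene TFIDFSimilarity documentation gives it (the formulation this discussion follows) is approximately:

score(q, d) = coord(q, d) · queryBoost(q) · ( V(q) · V(d) / |V(q)| ) · docLenNorm(d) · docBoost(d)

where V(q) and V(d) are the TF-IDF weight vectors of the query and the document. The middle factor is the Vector Space Model's cosine-style similarity, while coord(q, d), which rewards documents that match more of the query's terms, is the Boolean-model ingredient. The exact factor names here follow the Lucene documentation rather than this excerpt.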

Rafał began his journey with Lucene in 2002 and it wasn't love at first sight. When he came back to Lucene in late 2003, he revised his thoughts about the framework and saw the potential in search technologies. Then Solr came and this was it. He started working with ElasticSearch in the middle of 2010. Currently, Lucene, Solr, ElasticSearch, and information retrieval are his main points of interest. Rafał is also an author of Solr 3.1 Cookbook, the update to it—Solr 4.0 Cookbook, and is a co-author of ElasticSearch Server all published by Packt Publishing. The book you are holding in your hands was something that I wanted to write after finishing the ElasticSearch Server book and I got the opportunity.


pages: 660 words: 141,595

Data Science for Business: What You Need to Know About Data Mining and Data-Analytic Thinking by Foster Provost, Tom Fawcett

Albert Einstein, Amazon Mechanical Turk, Apollo 13, big data - Walmart - Pop Tarts, bioinformatics, business process, call centre, chief data officer, Claude Shannon: information theory, computer vision, conceptual framework, correlation does not imply causation, crowdsourcing, data acquisition, data science, David Brooks, en.wikipedia.org, Erik Brynjolfsson, Gini coefficient, Helicobacter pylori, independent contractor, information retrieval, intangible asset, iterative process, Johann Wolfgang von Goethe, Louis Pasteur, Menlo Park, Nate Silver, Netflix Prize, new economy, p-value, pattern recognition, placebo effect, price discrimination, recommendation engine, Ronald Coase, selection bias, Silicon Valley, Skype, SoftBank, speech recognition, Steve Jobs, supply-chain management, systems thinking, Teledyne, text mining, the long tail, The Signal and the Noise by Nate Silver, Thomas Bayes, transaction costs, WikiLeaks

It forms the core of several prediction algorithms that estimate a target value such as the expected resource usage of a client or the probability that a customer will respond to an offer. It is also the basis for clustering techniques, which group entities by their shared features without a focused objective. Similarity forms the basis of information retrieval, in which documents or webpages relevant to a search query are retrieved. Finally, it underlies several common algorithms for recommendation. A traditional algorithm-oriented book might present each of these tasks in a different chapter, under different names, with common aspects buried in algorithm details or mathematical propositions.

Jaccard distance Cosine distance is often used in text classification to measure the similarity of two documents. It is defined in Equation 6-5. Equation 6-5. Cosine distance: d_cosine(X, Y) = 1 − (X · Y) / (||X||2 ||Y||2), where ||·||2 again represents the L2 norm, or Euclidean length, of each feature vector (for a vector this is simply the distance from the origin). Note The information retrieval literature more commonly talks about cosine similarity, which is simply the fraction in Equation 6-5. Alternatively, it is 1 – cosine distance. In text classification, each word or token corresponds to a dimension, and the location of a document along each dimension is the number of occurrences of the word in that document.
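A brief sketch of Equation 6-5 in code (my own illustration; the example texts are made up), building the word-count vectors and computing the distance:

import math
from collections import Counter

def cosine_distance(text_a, text_b):
    # each word is a dimension; the value is how often the word occurs
    a, b = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    vocab = set(a) | set(b)
    dot = sum(a[w] * b[w] for w in vocab)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return 1 - dot / (norm_a * norm_b)    # 1 minus cosine similarity

print(cosine_distance("data science for business", "the science of data"))   # 0.5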

The True negative rate and False positive rate are analogous for the instances that are actually negative. These are often taken as estimates of the probability of predicting Y when the instance is actually p, that is p(Y|p), etc. We will continue to explore these measures in Chapter 8. The metrics Precision and Recall are often used, especially in text classification and information retrieval. Recall is the same as true positive rate, while precision is TP/(TP + FP), which is the accuracy over the cases predicted to be positive. The F-measure is the harmonic mean of precision and recall at a given point, and is: F-measure = 2 · precision · recall / (precision + recall). Practitioners in many fields such as statistics, pattern recognition, and epidemiology speak of the sensitivity and specificity of a classifier: sensitivity = TP/(TP + FN), which is the same as the true positive rate, and specificity = TN/(TN + FP), which is the same as the true negative rate. You may also hear about the positive predictive value, which is the same as precision.
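A short sketch (my own illustration, with made-up confusion-matrix counts) computing these measures from the four counts:

def classification_metrics(tp, fp, tn, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)               # also the true positive rate / sensitivity
    specificity = tn / (tn + fp)          # the true negative rate
    f_measure = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall,
            "specificity": specificity, "f_measure": f_measure}

print(classification_metrics(tp=30, fp=10, tn=50, fn=10))
# precision 0.75, recall 0.75, specificity ~0.833, F-measure 0.75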


pages: 413 words: 119,587

Machines of Loving Grace: The Quest for Common Ground Between Humans and Robots by John Markoff

A Declaration of the Independence of Cyberspace, AI winter, airport security, Andy Rubin, Apollo 11, Apple II, artificial general intelligence, Asilomar, augmented reality, autonomous vehicles, backpropagation, basic income, Baxter: Rethink Robotics, Bill Atkinson, Bill Duvall, bioinformatics, Boston Dynamics, Brewster Kahle, Burning Man, call centre, cellular automata, Charles Babbage, Chris Urmson, Claude Shannon: information theory, Clayton Christensen, clean water, cloud computing, cognitive load, collective bargaining, computer age, Computer Lib, computer vision, crowdsourcing, Danny Hillis, DARPA: Urban Challenge, data acquisition, Dean Kamen, deep learning, DeepMind, deskilling, Do you want to sell sugared water for the rest of your life?, don't be evil, Douglas Engelbart, Douglas Engelbart, Douglas Hofstadter, Dr. Strangelove, driverless car, dual-use technology, Dynabook, Edward Snowden, Elon Musk, Erik Brynjolfsson, Evgeny Morozov, factory automation, Fairchild Semiconductor, Fillmore Auditorium, San Francisco, From Mathematics to the Technologies of Life and Death, future of work, Galaxy Zoo, General Magic , Geoffrey Hinton, Google Glasses, Google X / Alphabet X, Grace Hopper, Gunnar Myrdal, Gödel, Escher, Bach, Hacker Ethic, Hans Moravec, haute couture, Herbert Marcuse, hive mind, hype cycle, hypertext link, indoor plumbing, industrial robot, information retrieval, Internet Archive, Internet of things, invention of the wheel, Ivan Sutherland, Jacques de Vaucanson, Jaron Lanier, Jeff Bezos, Jeff Hawkins, job automation, John Conway, John Markoff, John Maynard Keynes: Economic Possibilities for our Grandchildren, John Maynard Keynes: technological unemployment, John Perry Barlow, John von Neumann, Kaizen: continuous improvement, Kevin Kelly, Kiva Systems, knowledge worker, Kodak vs Instagram, labor-force participation, loose coupling, Marc Andreessen, Mark Zuckerberg, Marshall McLuhan, medical residency, Menlo Park, military-industrial complex, Mitch Kapor, Mother of all demos, natural language processing, Neil Armstrong, new economy, Norbert Wiener, PageRank, PalmPilot, pattern recognition, Philippa Foot, pre–internet, RAND corporation, Ray Kurzweil, reality distortion field, Recombinant DNA, Richard Stallman, Robert Gordon, Robert Solow, Rodney Brooks, Sand Hill Road, Second Machine Age, self-driving car, semantic web, Seymour Hersh, shareholder value, side project, Silicon Valley, Silicon Valley startup, Singularitarianism, skunkworks, Skype, social software, speech recognition, stealth mode startup, Stephen Hawking, Steve Ballmer, Steve Jobs, Steve Wozniak, Steven Levy, Stewart Brand, Strategic Defense Initiative, strong AI, superintelligent machines, tech worker, technological singularity, Ted Nelson, TED Talk, telemarketer, telepresence, telepresence robot, Tenerife airport disaster, The Coming Technological Singularity, the medium is the message, Thorstein Veblen, Tony Fadell, trolley problem, Turing test, Vannevar Bush, Vernor Vinge, warehouse automation, warehouse robotics, Watson beat the top human players on Jeopardy!, We are as Gods, Whole Earth Catalog, William Shockley: the traitorous eight, zero-sum game

Engelbart’s researchers, an eclectic collection of buttoned-down white-shirted engineers and long-haired computer hackers, were taking computing in a direction so different it was not even in the same coordinate system. The Shakey project was struggling to mimic the human mind and body. Engelbart had a very different goal. During World War II he had stumbled across an article by Vannevar Bush, who had proposed a microfiche-based information retrieval system called Memex to manage all of the world’s knowledge. Engelbart later decided that such a system could be assembled based on the then newly available computers. He thought the time was right to build an interactive system to capture knowledge and organize information in such a way that it would now be possible for a small group of people—scientists, engineers, educators—to create and collaborate more effectively.

The PageRank algorithm Larry Page developed to improve Internet search results essentially mined human intelligence by using the crowd-sourced accumulation of human decisions about valuable information sources. Google initially began by collecting and organizing human knowledge and then making it available to humans as part of a glorified Memex, the original global information retrieval system first proposed by Vannevar Bush in the Atlantic Monthly in 1945.11 As the company has evolved, however, it has started to push heavily toward systems that replace rather than extend humans. Google’s executives have obviously thought to some degree about the societal consequences of the systems they are creating.

Louis and Stanford, but dropped out of both programs before receiving an advanced degree. Once he was on the West Coast, he had gotten involved with Brewster Kahle’s Internet Archive Project, which sought to save a copy of every Web page on the Internet. Larry Page and Sergey Brin had given Hassan stock for programming PageRank, and Hassan also sold E-Groups, another of his information retrieval projects, to Yahoo! for almost a half-billion dollars. By then, he was a very wealthy Silicon Valley technologist looking for interesting projects. In 2006 he backed both Ng and Salisbury and hired Salisbury’s students to join Willow Garage, a laboratory he’d already created to facilitate the next generation of robotics technology—like designing driverless cars.


The Art of SEO by Eric Enge, Stephan Spencer, Jessie Stricchiola, Rand Fishkin

AltaVista, barriers to entry, bounce rate, Build a better mousetrap, business intelligence, cloud computing, content marketing, dark matter, en.wikipedia.org, Firefox, folksonomy, Google Chrome, Google Earth, hypertext link, index card, information retrieval, Internet Archive, Larry Ellison, Law of Accelerating Returns, linked data, mass immigration, Metcalfe’s law, Network effects, optical character recognition, PageRank, performance metric, Quicken Loans, risk tolerance, search engine result page, self-driving car, sentiment analysis, social bookmarking, social web, sorting algorithm, speech recognition, Steven Levy, text mining, the long tail, vertical integration, Wayback Machine, web application, wikimedia commons

However, the search engines recognize an iframe or a frame used to pull in another site’s content for what it is, and therefore ignore the content inside the iframe or frame as it is content published by another publisher. In other words, they don’t consider content pulled in from another site as part of the unique content of your web page. Determining Searcher Intent and Delivering Relevant, Fresh Content Modern commercial search engines rely on the science of information retrieval (IR). This science has existed since the middle of the twentieth century, when retrieval systems powered computers in libraries, research facilities, and government labs. Early in the development of search systems, IR scientists realized that two critical components comprised the majority of search functionality: relevance and importance (which we defined earlier in this chapter).

As far as the search engines are concerned, however, the text in a document—and particularly the frequency with which a particular term or phrase is used—has very little impact on how happy a searcher will be with that page. In fact, quite often a page laden with repetitive keywords in an attempt to please the engines will provide a very poor user experience; thus, although some SEO professionals today do claim to use term weight (a mathematical equation grounded in the real science of information retrieval) or other, more “modern” keyword text usage methods, nearly all optimization can be done very simply. The best way to ensure that you’ve achieved the greatest level of targeting in your text for a particular term or phrase is to use it in the title tag, in one or more of the section headings (within reason), and in the copy on the web page.

Hiding text in Java applets As with text in images, the search engines cannot easily parse content inside Java applets. Using them as a tool to hide text would certainly be a strange choice, though. Forcing form submission Search engines will not submit HTML forms in an attempt to access the information retrieved from a search or submission. Thus, if you keep content behind a forced-form submission and never link to it externally, your content will remain out of the engines’ indexes (as Figure 6-43 demonstrates). Figure 6-43. Content that can only be accessed by submitting a form is unreadable by crawlers The problem comes when content behind forms earns links outside your control, as when bloggers, journalists, or researchers decide to link to the pages in your archives without your knowledge.


pages: 160 words: 45,516

Tomorrow's Lawyers: An Introduction to Your Future by Richard Susskind

business intelligence, business process, business process outsourcing, call centre, Clayton Christensen, cloud computing, commoditize, crowdsourcing, data science, disruptive innovation, global supply chain, information retrieval, invention of the wheel, power law, pre–internet, Ray Kurzweil, Silicon Valley, Skype, speech recognition, supply-chain management, telepresence, Watson beat the top human players on Jeopardy!

Consider recent progress in artificial intelligence (AI) and, in particular, the achievements of Watson, IBM’s AI-based system that competed—in a live broadcast in 2011—on the US television general knowledge quiz show Jeopardy! Watson beat the show’s two finest ever human contestants. This is a phenomenal technological feat, combining advanced natural language understanding, machine learning, information retrieval, knowledge processing, speech synthesis, and more. While the remarkable Google retrieves information for us that might be relevant, Watson shows how AI-based systems, in years to come, will actually speak with us and solve our problems. It is significant that many new and emerging applications do not simply computerize and streamline pre-existing and inefficient manual processes.

This thinking led me in 1996, in my book The Future of Law, to predict a shift in legal paradigm, by which I meant that many or most of our fundamental assumptions about legal service and legal process would be challenged and displaced by IT and the Internet. It was a 20-year prediction, so I can be called fully to account in 2016. I do not think I will be far out—when I look at IBM’s Watson (see Chapter 1) and think of similar technology in law, or reflect on information retrieval systems that are already outperforming human beings engaged in document review, then I feel we are on the brink of a monumental shift. Crucially, I concluded in 1996 that legal service would move from being a one-to-one, consultative, print-based advisory service to a one-to-many, packaged, Internet-based information service.


pages: 347 words: 97,721

Only Humans Need Apply: Winners and Losers in the Age of Smart Machines by Thomas H. Davenport, Julia Kirby

"World Economic Forum" Davos, AI winter, Amazon Robotics, Andy Kessler, Apollo Guidance Computer, artificial general intelligence, asset allocation, Automated Insights, autonomous vehicles, basic income, Baxter: Rethink Robotics, behavioural economics, business intelligence, business process, call centre, carbon-based life, Clayton Christensen, clockwork universe, commoditize, conceptual framework, content marketing, dark matter, data science, David Brooks, deep learning, deliberate practice, deskilling, digital map, disruptive innovation, Douglas Engelbart, driverless car, Edward Lloyd's coffeehouse, Elon Musk, Erik Brynjolfsson, estate planning, financial engineering, fixed income, flying shuttle, follow your passion, Frank Levy and Richard Murnane: The New Division of Labor, Freestyle chess, game design, general-purpose programming language, global pandemic, Google Glasses, Hans Lippershey, haute cuisine, income inequality, independent contractor, index fund, industrial robot, information retrieval, intermodal, Internet of things, inventory management, Isaac Newton, job automation, John Markoff, John Maynard Keynes: Economic Possibilities for our Grandchildren, John Maynard Keynes: technological unemployment, Joi Ito, Khan Academy, Kiva Systems, knowledge worker, labor-force participation, lifelogging, longitudinal study, loss aversion, machine translation, Mark Zuckerberg, Narrative Science, natural language processing, Nick Bostrom, Norbert Wiener, nuclear winter, off-the-grid, pattern recognition, performance metric, Peter Thiel, precariat, quantitative trading / quantitative finance, Ray Kurzweil, Richard Feynman, risk tolerance, Robert Shiller, robo advisor, robotic process automation, Rodney Brooks, Second Machine Age, self-driving car, Silicon Valley, six sigma, Skype, social intelligence, speech recognition, spinning jenny, statistical model, Stephen Hawking, Steve Jobs, Steve Wozniak, strong AI, superintelligent machines, supply-chain management, tacit knowledge, tech worker, TED Talk, the long tail, transaction costs, Tyler Cowen, Tyler Cowen: Great Stagnation, Watson beat the top human players on Jeopardy!, Works Progress Administration, Zipcar

Forms of Augmentation: Superpowers and Leverage In the realm of knowledge work, we’ve seen augmentation by intelligent machines take four forms, and we can further group them into just two categories. The first two we would class as superpowers, and the second two as leverage. When a machine greatly augments your powers of information retrieval, as many information systems do, we would call that gaining a superpower. Indeed, in the Terminator film franchise, out of all the superhuman capabilities Skynet designed into its “cybernetic organisms,” the one filmgoers covet most is the instant pop-up retrieval of biographical information on any humans encountered.

It was the inspiration, for example, for Google Glass, according to the technical lead on that product, Thad Starner.6 (And although we had to say Hasta la vista, baby, to that particular product, Google assures us it will be back.) When Tom wrote a book about knowledge workers a decade ago, there were already some examples of how empowering such information retrieval can be for them. He wrote in some detail, for example, about the idea of “computer-aided physician order entry,” particularly focusing on an example of this type of system at Partners HealthCare, a care network in Boston. When physicians input medical orders (drugs, tests, referrals, etc.) for their patients into the system, it checks to see if the order is consistent with what it thinks is best medical practice.

See also augmentation; specific professions augmentation and, 31–32, 62, 65, 74, 76, 100, 122, 139, 176, 185, 228, 234, 251 big-picture perspective and, 100 codified tasks and automation, 12–13, 14, 16–18, 19, 27–28, 30, 70, 139, 156, 167, 191, 204, 216, 246 creativity and, 120–21 defined, 5 demand peak, 6 deskilling and, 16 five options for, 76–77, 218, 232 (see also specific steps) how job loss happens, 23–24 information retrieval and, 65–66 lack of wage growth, 24 machine encroachment, 13, 24–25 political strategy to help, 239 roles better done by humans, 26–30 signs of coming automation, 19–22 Stepping In, post-automation work, 30–32 taking charge of destiny, 8–9 time frame for dislocation of, 24–26 who they are, 5–6 working hours of, 70 Kraft, Robert, 172–73 Krans, Mike, 102–3, 132, 134–35, 138 Kurup, Deepika, 164 Kurzweil, Ray, 36 labor unions, 1, 16, 25 Lacerte, 22 language recognition technologies, 39–40, 43, 44–45, 50, 53, 56, 212 natural language processing (NLP), 34, 37, 178 Lawton, Jim, 50, 182–83, 193 Learning by Doing (Bessen), 133, 233 legal field augmentation as leverage in, 68 automation (e-discovery), 13, 142–44, 145, 151 content analysis and automation, 20 narrow specializations, 159–60, 162 number of U.S. lawyers, 68 Stepping Up in, 93 Leibniz Institute for Astrophysics, 59 Levasseur, M.


pages: 582 words: 160,693

The Sovereign Individual: How to Survive and Thrive During the Collapse of the Welfare State by James Dale Davidson, William Rees-Mogg

affirmative action, agricultural Revolution, Alan Greenspan, Alvin Toffler, bank run, barriers to entry, Berlin Wall, borderless world, British Empire, California gold rush, classic study, clean water, colonial rule, Columbine, compound rate of return, creative destruction, Danny Hillis, debt deflation, ending welfare as we know it, epigenetics, Fall of the Berlin Wall, falling living standards, feminist movement, financial independence, Francis Fukuyama: the end of history, full employment, George Gilder, Hernando de Soto, illegal immigration, income inequality, independent contractor, informal economy, information retrieval, Isaac Newton, John Perry Barlow, Kevin Kelly, market clearing, Martin Wolf, Menlo Park, money: store of value / unit of account / medium of exchange, new economy, New Urbanism, Norman Macrae, offshore financial centre, Parkinson's law, pattern recognition, phenotype, price mechanism, profit maximization, rent-seeking, reserve currency, road to serfdom, Ronald Coase, Sam Peltzman, school vouchers, seigniorage, Silicon Valley, spice trade, statistical model, telepresence, The Nature of the Firm, the scientific method, The Wealth of Nations by Adam Smith, Thomas L Friedman, Thomas Malthus, trade route, transaction costs, Turing machine, union organizing, very high income, Vilfredo Pareto

Before agreeing to perform an operation, the skilled surgeon will probably call upon a digital lawyer to draft an instant contract that specifies and limits liability based upon the size and characteristics of the tumor revealed in images displayed by the magnetic resonance machine. Digital lawyers will be information-retrieval systems that automate selection of contract provisions, employing artificial intelligence processes such as neural networks to customize private contracts to meet transnational legal conditions. Participants in most high-value or important transactions will not only shop for suitable partners with whom to conduct a business; they will also shop for a suitable domicile for their transactions.

• Lifetime employment will disappear as "jobs" increasingly become tasks or "piece work" rather than positions within an organization. • Control over economic resources will shift away from the state to persons of superior skills and intelligence, as it becomes increasingly easy to create wealth by adding knowledge to products. • Many members of learned professions will be displaced by interactive information-retrieval systems. • New survival strategies for persons of lower intelligence will evolve, involving greater concentration on development of leisure skills, sports abilities, and crime, as well as service to the growing numbers of Sovereign Individuals as income inequality within jurisdictions rises.

As a consequence, broad paradigmatic understanding, or unspoken theories about the way the world works, are being antiquated more quickly than in the past. This increases the importance of the broad overview and diminishes the value of individual "facts" of the kind that are readily available to almost anyone with an information retrieval system. 3. The growing tribalization and marginalization of life have had a stunting effect on discourse, and even on thinking. Many people have consequently gotten into the habit of shying away from conclusions that are obviously implied by the facts at their disposal. A recent psychological study disguised as a public opinion poll showed that members of individual occupational groups were almost uniformly unwilling to accept any conclusion that implied a loss of income for them, no matter how airtight the logic supporting it.


Sorting Things Out: Classification and Its Consequences (Inside Technology) by Geoffrey C. Bowker

affirmative action, business process, classic study, corporate governance, Drosophila, government statistician, information retrieval, loose coupling, Menlo Park, Mitch Kapor, natural language processing, Occam's razor, QWERTY keyboard, Scientific racism, scientific worldview, sexual politics, statistical model, Stephen Hawking, Stewart Brand, tacit knowledge, the built environment, the medium is the message, the strength of weak ties, transaction costs, William of Occam

So one culture sees spirit possession as a valid cause of death, another ridicules this as superstition; one medical specialty sees cancer as a localized phenomenon to be cut out and stopped from spreading, another sees it as a disorder of the whole immune system that merely manifests in one location or another. The implications for both treatment and classification differ. Trying to encode both causes results in serious information retrieval problems. In addition, classifications shift historically. In Britain in 1650 we find that 696 people died of being "aged"; 31 succumbed to wolves, 9 to grief, and 19 to "King's Evil." "Mother" claimed 2 in 1647 but none in 1650, but in that year 2 were "smothered and stifled" (see figure 1.3).

) indexicality: the 48 points were only recognized if they were at least 0.5 cun from a classic acupuncture point, where a cun is: "the distance between the interphalangeal creases of the patient's middle finger" (WHO 1991, 14). Formal Classification The structural aspects of classification are themselves a technical specialty in information science, biology, and statistics, among other places. Information scientists design thesauri for information retrieval, valuing parsimony and accuracy of terms, and the overall stability of the system over long periods of time. For biologists the choice of structure reflects how one sees species and the evolutionary process. For transformed cladists and numerical taxonomists, no useful statement about the past can be read out of their classifications; for evolutionary taxonomists that is the very basis of their system.

The ICD, he points out, originated as a means for describing causes of death; a trace of its heritage is its continued difficulty with describing chronic as opposed to acute forms of disease. This is one basis for the temporal fault lines that emerge in its usage. The UMLS originated as a means of information retrieval (the MeSH scheme) and is not as sensitive to clinical conditions as it might be (Musen 1992, 440). The two basic problems for any overarching classification scheme in a rapidly changing and complex field can be described as follows. First, any classificatory decision made now might by its nature block off valuable future developments.


pages: 32 words: 7,759

8 Day Trips From London by Dee Maldon

Doomsday Book, information retrieval, Isaac Newton, Stephen Hawking, the market place

8 Day Trips from London A simple guide for visitors who want to see more than the capital By Dee Maldon Bookline & Thinker Ltd Bookline & Thinker Ltd #231, 405 King’s Road London SW10 OBB www.booklinethinker.com Eight Days Out From London Copyright © Bookline & Thinker Ltd 2010 This book is a work of non-fiction A CIP catalogue record for this book is available from the British Library All rights reserved. No part of this work may be reproduced or stored in an information retrieval system without the express permission of the publisher ISBN: 9780956517715 Printed and bound by Lightning Source UK Book cover designed by Donald McColl Contents Bath Brighton Cambridge Canterbury Oxford Stonehenge Winchester Windsor Introduction Why take any day trips from London?


pages: 379 words: 109,612

Is the Internet Changing the Way You Think?: The Net's Impact on Our Minds and Future by John Brockman

A Declaration of the Independence of Cyberspace, Albert Einstein, AltaVista, Amazon Mechanical Turk, Asperger Syndrome, availability heuristic, Benoit Mandelbrot, biofilm, Black Swan, bread and circuses, British Empire, conceptual framework, corporate governance, Danny Hillis, disinformation, Douglas Engelbart, Douglas Engelbart, Emanuel Derman, epigenetics, Evgeny Morozov, financial engineering, Flynn Effect, Frank Gehry, Future Shock, Google Earth, hive mind, Howard Rheingold, index card, information retrieval, Internet Archive, invention of writing, Jane Jacobs, Jaron Lanier, John Markoff, John Perry Barlow, Kevin Kelly, Large Hadron Collider, lifelogging, lone genius, loss aversion, mandelbrot fractal, Marc Andreessen, Marshall McLuhan, Menlo Park, meta-analysis, Neal Stephenson, New Journalism, Nicholas Carr, One Laptop per Child (OLPC), out of africa, Paul Samuelson, peer-to-peer, pneumatic tube, Ponzi scheme, power law, pre–internet, Project Xanadu, Richard Feynman, Rodney Brooks, Ronald Reagan, satellite internet, Schrödinger's Cat, search costs, Search for Extraterrestrial Intelligence, SETI@home, Silicon Valley, Skype, slashdot, smart grid, social distancing, social graph, social software, social web, Stephen Hawking, Steve Wozniak, Steven Pinker, Stewart Brand, synthetic biology, Ted Nelson, TED Talk, telepresence, the medium is the message, the scientific method, the strength of weak ties, The Wealth of Nations by Adam Smith, theory of mind, trade route, upwardly mobile, Vernor Vinge, Whole Earth Catalog, X Prize, Yochai Benkler

And when a file becomes corrupt, all I am left with is a pointer, a void where an idea should be, the ghost of a departed thought. The New Balance: More Processing, Less Memorization Fiery Cushman Postdoctoral fellow, Mind/Brain/Behavior Interfaculty Initiative, Harvard University The Internet changes the way I behave, and possibly the way I think, by reducing the processing costs of information retrieval. I focus more on knowing how to obtain and use information online and less on memorizing it. This tradeoff between processing and memory reminds me of one of my father’s favorite stories, perhaps apocryphal, about studying the periodic table of the elements in his high school chemistry class.

And when a friend cooks a good meal, I’m more interested to learn what Website it came from than how it was spiced. I don’t know most of the American Psychological Association rules for style and citation, but my computer does. For any particular “computation” I perform, I don’t need the same depth of knowledge, because I have access to profoundly more efficient processes of information retrieval. So the Internet clearly changes the way I behave. It must be changing the way I think at some level, insofar as my behavior is a product of my thoughts. It probably is not changing the basic kinds of mental processes I can perform but it might be changing their relative weighting. We psychologists love to impress undergraduates with the fact that taxi drivers have unusually large hippocampi.

Anthony Aguirre Associate professor of physics, University of California, Santa Cruz Recently I wanted to learn about twelfth-century China—not a deep or scholarly understanding, just enough to add a bit of not-wrong color to something I was writing. Wikipedia was perfect! More regularly, my astrophysics and cosmology endeavors bring me to databases such as arXiv, ADS (Astrophysics Data System), and SPIRES (Stanford Physics Information Retrieval System), which give instant and organized access to all the articles and information I might need to research and write. Between such uses and an appreciable fraction of my time spent processing e-mails, I, like most of my colleagues, spend a lot of time connected to the Internet. It is a central tool in my research life.


pages: 408 words: 105,715

Kingdom of Characters: The Language Revolution That Made China Modern by Jing Tsu

affirmative action, British Empire, computer age, Deng Xiaoping, Frederick Winslow Taylor, Great Leap Forward, information retrieval, invention of movable type, machine readable, machine translation, Menlo Park, natural language processing, Norbert Wiener, QWERTY keyboard, scientific management, Silicon Valley, smart cities, South China Sea, transcontinental railway

Only selected state agencies and research institutions were given permission to build mainframe computers or to house them, and their equipment was largely dependent on imported parts if not entire machines. Information retrieval was also a distant goal. Back then, information retrieval meant something more basic than typing a query into a search box on Google or Bing. It was literally about where and how to store data information and how to call it up as a file or other format. Both electronic and informational retrieval would take longer-term planning. For the time being, the only area that was both urgent and achievable was phototypesetting. This method of typesetting involved taking a snapshot of the character to be printed, then transferring the film image to printing plates.


pages: 58 words: 12,386

Big Data Glossary by Pete Warden

business intelligence, business logic, crowdsourcing, fault tolerance, functional programming, information retrieval, linked data, machine readable, natural language processing, recommendation engine, web application

For example, you might want to extract product names and prices from a shopping site. With the tool, you could find a single product page, select the product name and price, and then the same elements would be pulled for every other page it crawled from the site. It relies on the fact that most web pages are generated by combining templates with information retrieved from a database, and so have a very consistent structure. Once you’ve gathered the data, it offers some features that are a bit like Google Refine’s for de-duplicating and cleaning up the data. All in all, it’s a very powerful tool for turning web content into structured information, with a very approachable interface.


pages: 586 words: 186,548

Architects of Intelligence by Martin Ford

3D printing, agricultural Revolution, AI winter, algorithmic bias, Alignment Problem, AlphaGo, Apple II, artificial general intelligence, Asilomar, augmented reality, autonomous vehicles, backpropagation, barriers to entry, basic income, Baxter: Rethink Robotics, Bayesian statistics, Big Tech, bitcoin, Boeing 747, Boston Dynamics, business intelligence, business process, call centre, Cambridge Analytica, cloud computing, cognitive bias, Colonization of Mars, computer vision, Computing Machinery and Intelligence, correlation does not imply causation, CRISPR, crowdsourcing, DARPA: Urban Challenge, data science, deep learning, DeepMind, Demis Hassabis, deskilling, disruptive innovation, Donald Trump, Douglas Hofstadter, driverless car, Elon Musk, Erik Brynjolfsson, Ernest Rutherford, fake news, Fellow of the Royal Society, Flash crash, future of work, general purpose technology, Geoffrey Hinton, gig economy, Google X / Alphabet X, Gödel, Escher, Bach, Hans Moravec, Hans Rosling, hype cycle, ImageNet competition, income inequality, industrial research laboratory, industrial robot, information retrieval, job automation, John von Neumann, Large Hadron Collider, Law of Accelerating Returns, life extension, Loebner Prize, machine translation, Mark Zuckerberg, Mars Rover, means of production, Mitch Kapor, Mustafa Suleyman, natural language processing, new economy, Nick Bostrom, OpenAI, opioid epidemic / opioid crisis, optical character recognition, paperclip maximiser, pattern recognition, phenotype, Productivity paradox, radical life extension, Ray Kurzweil, recommendation engine, Robert Gordon, Rodney Brooks, Sam Altman, self-driving car, seminal paper, sensor fusion, sentiment analysis, Silicon Valley, smart cities, social intelligence, sparse data, speech recognition, statistical model, stealth mode startup, stem cell, Stephen Hawking, Steve Jobs, Steve Wozniak, Steven Pinker, strong AI, superintelligent machines, synthetic biology, systems thinking, Ted Kaczynski, TED Talk, The Rise and Fall of American Growth, theory of mind, Thomas Bayes, Travis Kalanick, Turing test, universal basic income, Wall-E, Watson beat the top human players on Jeopardy!, women in the workforce, working-age population, workplace surveillance , zero-sum game, Zipcar

From 1996 to 1999, he worked for Digital Equipment Corporation’s Western Research Lab in Palo Alto, where he worked on low-overhead profiling tools, design of profiling hardware for out-of-order microprocessors, and web-based information retrieval. From 1990 to 1991, Jeff worked for the World Health Organization’s Global Programme on AIDS, developing software to do statistical modeling, forecasting, and analysis of the HIV pandemic. In 2009, Jeff was elected to the National Academy of Engineering, and he was also named a Fellow of the Association for Computing Machinery (ACM) and a Fellow of the American Association for the Advancement of Sciences (AAAS). His areas of interest include large-scale distributed systems, performance monitoring, compression techniques, information retrieval, application of machine learning to search and other related problems, microprocessor architecture, compiler optimizations, and development of new products that organize existing information in new and interesting ways.

That made me start thinking about AI again. I eventually figured out that the reason Watson won is because it was actually a narrower AI problem than it first appeared to be. That’s almost always the answer. In Watson’s case it’s because about 95% of the answers in Jeopardy turn out to be the titles of Wikipedia pages. Instead of understanding language, reasoning about it and so forth, it was mostly doing information retrieval from a restricted set, namely the pages that are Wikipedia titles. It was actually not as hard of a problem as it looked like to the untutored eye, but it was interesting enough that it got me to think about AI again. Around the same time, I started writing for The New Yorker, where I was producing a lot of pieces about neuroscience, linguistics, psychology, and also AI.

MARTIN FORD: Of course, that’s not a problem that’s exclusive to AI; humans are subject to the same issues when confronted with flawed data. It’s a bias in the data that results from past decisions that people doing research made. BARBARA GROSZ: Right, but now look what’s going on in some areas of medicine. The computer system can “read all the papers” (more than a person could) and do certain kinds of information retrieval from them and extract results, and then do statistical analyses. But if most of the papers are on scientific work that was done only on male mice, or only on male humans, then the conclusions the system is coming to are limited. We’re also seeing this problem in the legal realm, with policing and fairness.


pages: 259 words: 73,193

The End of Absence: Reclaiming What We've Lost in a World of Constant Connection by Michael Harris

4chan, Albert Einstein, algorithmic management, AltaVista, Andrew Keen, augmented reality, Burning Man, Carrington event, cognitive dissonance, crowdsourcing, dematerialisation, disinformation, en.wikipedia.org, Evgeny Morozov, Filter Bubble, Firefox, Google Glasses, informal economy, information retrieval, invention of movable type, invention of the printing press, invisible hand, James Watt: steam engine, Jaron Lanier, jimmy wales, Kevin Kelly, Lewis Mumford, lifelogging, Loebner Prize, low earth orbit, Marshall McLuhan, McMansion, moral panic, Nicholas Carr, off-the-grid, pattern recognition, Plato's cave, pre–internet, Republic of Letters, Silicon Valley, Skype, Snapchat, social web, Steve Jobs, technological solutionism, TED Talk, the medium is the message, The Wisdom of Crowds, traumatic brain injury, Turing test

As we inevitably off-load media content to the cloud—storing our books, our television programs, our videos of the trip to Taiwan, and photos of Grandma’s ninetieth birthday, all on a nameless server—can we happily dematerialize our mind’s stores, too? Perhaps we should side with philosopher Lewis Mumford, who insisted in The Myth of the Machine that “information retrieving,” however expedient, is simply no substitute for the possession of knowledge accrued through personal and direct labor. Author Clive Thompson wondered about this when he came across recent research suggesting that we remember fewer and fewer facts these days—of three thousand people polled by neuroscientist Ian Robertson, the young were less able to recall basic personal information (a full one-third, for example, didn’t know their own phone numbers).

., III, 84–85 “He Poos Clouds” (Pallett), 164 History of Reading, A (Manguel), 16, 117, 159 Hollinghurst, Alan, 115 Holmes, Sherlock, 147–48 House at Pooh Corner, The (Milne), 93 Hugo, Victor, 20–21 “Idea of North, The” (Gould), 200–201 In Defense of Elitism (Henry), 84–85 Information, The (Gleick), 137 information retrieval, 141–42 Innis, Harold, 202 In Search of Lost Time (Proust), 160 Instagram, 19, 104, 149 Internet, 19, 20, 21, 23, 26–27, 55, 69, 125, 126, 129, 141, 143, 145, 146, 187, 199, 205 brain and, 37–38, 40, 142, 185 going without, 185, 186, 189–97, 200, 208–9 remembering life before, 7–8, 15–16, 21–22, 48, 55, 203 Internship, The, 89 iPad, 21, 31 children and, 26–27, 45 iPhone, see phones iPotty, 26 iTunes, 89 Jobs, Steve, 134 Jones, Patrick, 152n Justification of Johann Gutenberg, The (Morrison), 12 Kaiser Foundation, 27, 28n Kandel, Eric, 154 Kaufman, Charlie, 155 Keen, Andrew, 88 Kelly, Kevin, 43 Kierkegaard, Søren, 49 Kinsey, Alfred, 173 knowledge, 11–12, 75, 80, 82, 83, 86, 92, 94, 98, 141, 145–46 Google Books and, 102–3 Wikipedia and, 63, 78 Koller, Daphne, 95 Kranzberg, Melvin, 7 Kundera, Milan, 184 Lanier, Jaron, 85, 106–7, 189 latent Dirichlet allocation (LDA), 64–65 Leonardo da Vinci, 56 Lewis, R.


pages: 288 words: 86,995

Rule of the Robots: How Artificial Intelligence Will Transform Everything by Martin Ford

AI winter, Airbnb, algorithmic bias, algorithmic trading, Alignment Problem, AlphaGo, Amazon Mechanical Turk, Amazon Web Services, artificial general intelligence, Automated Insights, autonomous vehicles, backpropagation, basic income, Big Tech, big-box store, call centre, carbon footprint, Chris Urmson, Claude Shannon: information theory, clean water, cloud computing, commoditize, computer age, computer vision, Computing Machinery and Intelligence, coronavirus, correlation does not imply causation, COVID-19, crowdsourcing, data is the new oil, data science, deep learning, deepfake, DeepMind, Demis Hassabis, deskilling, disruptive innovation, Donald Trump, Elon Musk, factory automation, fake news, fulfillment center, full employment, future of work, general purpose technology, Geoffrey Hinton, George Floyd, gig economy, Gini coefficient, global pandemic, Googley, GPT-3, high-speed rail, hype cycle, ImageNet competition, income inequality, independent contractor, industrial robot, informal economy, information retrieval, Intergovernmental Panel on Climate Change (IPCC), Internet of things, Jeff Bezos, job automation, John Markoff, Kiva Systems, knowledge worker, labor-force participation, Law of Accelerating Returns, license plate recognition, low interest rates, low-wage service sector, Lyft, machine readable, machine translation, Mark Zuckerberg, Mitch Kapor, natural language processing, Nick Bostrom, Northpointe / Correctional Offender Management Profiling for Alternative Sanctions, Ocado, OpenAI, opioid epidemic / opioid crisis, passive income, pattern recognition, Peter Thiel, Phillips curve, post scarcity, public intellectual, Ray Kurzweil, recommendation engine, remote working, RFID, ride hailing / ride sharing, Robert Gordon, Rodney Brooks, Rubik’s Cube, Sam Altman, self-driving car, Silicon Valley, Silicon Valley startup, social distancing, SoftBank, South of Market, San Francisco, special economic zone, speech recognition, stealth mode startup, Stephen Hawking, superintelligent machines, TED Talk, The Future of Employment, The Rise and Fall of American Growth, the scientific method, Turing machine, Turing test, Tyler Cowen, Tyler Cowen: Great Stagnation, Uber and Lyft, uber lyft, universal basic income, very high income, warehouse automation, warehouse robotics, Watson beat the top human players on Jeopardy!, WikiLeaks, women in the workforce, Y Combinator

The robot is able to engage in rudimentary conversations and do a number of practical things that center around information retrieval; it can look up things on the internet, get weather and traffic reports, play music and so forth. In other words, Jibo offers a set of capabilities that are broadly similar to Amazon’s Alexa-powered Echo smart speakers. The Echo, of course, can’t move at all, but backed by Amazon’s massive cloud computing infrastructure and far larger team of highly paid AI developers, its information retrieval and natural language capabilities are likely stronger—and certain to become more so over time.


Beautiful Data: The Stories Behind Elegant Data Solutions by Toby Segaran, Jeff Hammerbacher

23andMe, airport security, Amazon Mechanical Turk, bioinformatics, Black Swan, business intelligence, card file, cloud computing, computer vision, correlation coefficient, correlation does not imply causation, crowdsourcing, Daniel Kahneman / Amos Tversky, DARPA: Urban Challenge, data acquisition, data science, database schema, double helix, en.wikipedia.org, epigenetics, fault tolerance, Firefox, Gregor Mendel, Hans Rosling, housing crisis, information retrieval, lake wobegon effect, Large Hadron Collider, longitudinal study, machine readable, machine translation, Mars Rover, natural language processing, openstreetmap, Paradox of Choice, power law, prediction markets, profit motive, semantic web, sentiment analysis, Simon Singh, social bookmarking, social graph, SPARQL, sparse data, speech recognition, statistical model, supply-chain management, systematic bias, TED Talk, text mining, the long tail, Vernor Vinge, web application

The author demonstrates shocking prescience. The title of the paper is “A Business Intelligence System,” and it appears to be the first use of the term “Business Intelligence” in its modern context. In addition to the dissemination of information in real time, the system was to allow for “information retrieval”—search—to be conducted over the entire document collection. Luhn’s emphasis on action points focuses the role of information processing on goal completion. In other words, it’s not enough to just collect and aggregate data; an organization must improve its capacity to complete critical tasks because of the insights gleaned from the data.

These may reflect a range of viewpoints and enable us to begin to consider alternative notions of place as we attempt to describe it more effectively. Consequently, Ross Purves and Alistair Edwardes have been using Geograph as a source of descriptions of place in their research at the University of Zurich. Their ultimate objective involves improving information retrieval by automatically adding indexing terms to georeferenced digital photographs that relate to popular notions of place, such as “mountain,” “remote,” or “hiking.” Their work involves validating previous studies and forming new perspectives by comparing Geograph to existing efforts to describe place and analyzing term co-occurrence in the geograph descriptions (Edwardes and Purves 2007).

Zerfos, and J. Cho. “Downloading textual hidden web content through keyword queries.” JCDL 2005: 100–109. SURFACING THE DEEP WEB Download at Boykma.Com 147 Raghavan, S. and H. Garcia-Molina. “Crawling the Hidden Web.” VLDB 2001: 129–138. Salton, G. and M. J. McGill. Introduction to Modern Information Retrieval. New York: McGraw-Hill, 1983. SpiderMonkey (JavaScript-C) Engine, http://www.mozilla.org/js/spidermonkey/. V8 JavaScript Engine, http://code.google.com/p/v8/. 148 CHAPTER NINE Download at Boykma.Com Chapter 10 CHAPTER TEN Building Radiohead’s House of Cards Aaron Koblin with Valdean Klump THIS IS THE STORY OF HOW THE GRAMMY-NOMINATED MUSIC VIDEO FOR RADIOHEAD’S “HOUSE OF Cards” was created entirely with data.


pages: 135 words: 26,407

How to DeFi by Coingecko, Darren Lau, Sze Jin Teh, Kristian Kho, Erina Azmi, Tm Lee, Bobby Ong

algorithmic trading, asset allocation, Bernie Madoff, bitcoin, blockchain, buy and hold, capital controls, collapse of Lehman Brothers, cryptocurrency, distributed ledger, diversification, Ethereum, ethereum blockchain, fiat currency, Firefox, information retrieval, litecoin, margin call, new economy, passive income, payday loans, peer-to-peer, prediction markets, QR code, reserve currency, robo advisor, smart contracts, tulip mania, two-sided market

Retrieved from https://www.defisnap.io/#/dashboard ~ Chapter 14: DeFi in Action (n.d.). Retrieved October 19, 2019, from https://slideslive.com/38920018/living-on-defi-how-i-survive-argentinas-50-inflation Gundiuc, C. (2019, September 29). Argentina Central Bank Exposed 800 Citizens' Sensitive Information. Retrieved from https://beincrypto.com/argentina-central-bank-exposed-sensitive-information-of-800-citizens/ Lopez, J. M. S. (2020, February 5). Argentina’s ‘little trees’ blossom as forex controls fuel black market. Retrieved from https://www.reuters.com/article/us-argentina-currency-blackmarket/argentinas-little-trees-blossom-as-forex-controls-fuel-black-market-idUSKBN1ZZ1H1 Russo, C. (2019, December 9).


The Art of Computer Programming: Sorting and Searching by Donald Ervin Knuth

card file, Charles Babbage, Claude Shannon: information theory, complexity theory, correlation coefficient, Donald Knuth, double entry bookkeeping, Eratosthenes, Fermat's Last Theorem, G4S, information retrieval, iterative process, John von Neumann, linked data, locality of reference, Menlo Park, Norbert Wiener, NP-complete, p-value, Paul Erdős, RAND corporation, refrigerator car, sorting algorithm, Vilfredo Pareto, Yogi Berra, Zipf's Law

. — TITUS LIVIUS, Ab Urbe Condita XXXIX.vi (Robert Burton, Anatomy of Melancholy 1.2.2.2) This book forms a natural sequel to the material on information structures in Chapter 2 of Volume 1, because it adds the concept of linearly ordered data to the other basic structural ideas. The title "Sorting and Searching" may sound as if this book is only for those systems programmers who are concerned with the preparation of general-purpose sorting routines or applications to information retrieval. But in fact the area of sorting and searching provides an ideal framework for discussing a wide variety of important general issues: • How are good algorithms discovered? • How can given algorithms and programs be improved? • How can the efficiency of algorithms be analyzed mathematically?

For example, given a large file about stage performers, a producer might wish to find all unemployed actresses between 25 and 30 with dancing talent and a French accent; given a large file of baseball statistics, a sportswriter may wish to determine the total number of runs scored by the Chicago White Sox in 1964, during the seventh inning of night games, against left-handed pitchers. Given a large file of data about anything, people like to ask arbitrarily complicated questions. Indeed, we might consider an entire library as a database, and a searcher may want to find everything that has been published about information retrieval. An introduction to the techniques for such secondary key (multi-attribute) retrieval problems appears below in Section 6.5. Before entering into a detailed study of searching, it may be helpful to put things in historical perspective. During the pre-computer era, many books of logarithm tables, trigonometry tables, etc., were compiled, so that mathematical calculations could be replaced by searching.

Suppose that we want to test a given search argument to see whether it is one of the 31 most common words of English (see Figs. 12 and 13 in Section 6.2.2). The data is represented in Table 1 as a trie structure; this name was suggested by E. Fredkin [CACM 3 (1960), 490–500] because it is a part of information retrieval. A trie — pronounced "try" — is essentially an M-ary tree, whose nodes are M-place vectors with components corresponding to digits or characters. Each node on level l represents the set of all keys that begin with a certain sequence of l characters called its prefix; the node specifies an M-way branch, depending on the (l + 1)st character.
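A minimal sketch of the trie idea in Python may help here: each node branches on the next character, and a node reached by following l characters stands for all keys sharing that l-character prefix. The dictionary-per-node representation and the short word list are illustrative choices, not Knuth's M-place vectors or his exact table of 31 words.

class TrieNode:
    # One node of the trie; children are keyed by the next character.
    def __init__(self):
        self.children = {}
        self.is_word = False

def insert(root, word):
    node = root
    for ch in word:
        node = node.children.setdefault(ch, TrieNode())
    node.is_word = True

def contains(root, word):
    node = root
    for ch in word:
        node = node.children.get(ch)
        if node is None:
            return False
    return node.is_word

root = TrieNode()
for w in ["the", "of", "and", "to", "a", "in", "that", "is", "was", "he"]:
    insert(root, w)
print(contains(root, "that"))  # True
print(contains(root, "tray"))  # False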


Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems by Martin Kleppmann

active measures, Amazon Web Services, billion-dollar mistake, bitcoin, blockchain, business intelligence, business logic, business process, c2.com, cloud computing, collaborative editing, commoditize, conceptual framework, cryptocurrency, data science, database schema, deep learning, DevOps, distributed ledger, Donald Knuth, Edward Snowden, end-to-end encryption, Ethereum, ethereum blockchain, exponential backoff, fake news, fault tolerance, finite state, Flash crash, Free Software Foundation, full text search, functional programming, general-purpose programming language, Hacker News, informal economy, information retrieval, Internet of things, iterative process, John von Neumann, Ken Thompson, Kubernetes, Large Hadron Collider, level 1 cache, loose coupling, machine readable, machine translation, Marc Andreessen, microservices, natural language processing, Network effects, no silver bullet, operational security, packet switching, peer-to-peer, performance metric, place-making, premature optimization, recommendation engine, Richard Feynman, self-driving car, semantic web, Shoshana Zuboff, social graph, social web, software as a service, software is eating the world, sorting algorithm, source of truth, SPARQL, speech recognition, SQL injection, statistical model, surveillance capitalism, systematic bias, systems thinking, Tragedy of the Commons, undersea cable, web application, WebSocket, wikimedia commons

. • Particle physicists have been doing Big Data–style large-scale data analysis for decades, and projects like the Large Hadron Collider (LHC) now work with hundreds of petabytes! At such a scale custom solutions are required to stop the hardware cost from spiraling out of control [49]. • Full-text search is arguably a kind of data model that is frequently used alongside databases. Information retrieval is a large specialist subject that we won’t cover in great detail in this book, but we’ll touch on search indexes in Chapter 3 and Part III. We have to leave it there for now. In the next chapter we will discuss some of the trade-offs that come into play when implementing the data models described in this chapter.

In LevelDB, this in-memory index is a sparse collection of some of the keys, but in Lucene, the in-memory index is a finite state automaton over the characters in the keys, similar to a trie [38]. This automaton can be transformed into a Levenshtein automaton, which supports efficient search for words within a given edit distance [39]. Other fuzzy search techniques go in the direction of document classification and machine learning. See an information retrieval textbook for more detail [e.g., 40]. Keeping everything in memory The data structures discussed so far in this chapter have all been answers to the limitations of disks. Compared to main memory, disks are awkward to deal with. With both magnetic disks and SSDs, data on disk needs to be laid out carefully if you want good performance on reads and writes.
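As a rough illustration of the two ideas above, the Python sketch below keeps only every Nth key of a sorted segment in memory and scans the small block it points to, and uses a plain dynamic-programming edit distance as a conceptual stand-in for the Levenshtein automaton used for fuzzy matching; the data and names are illustrative, not LevelDB's or Lucene's actual structures.

import bisect

# Sorted, immutable segment of key-value pairs; only every Nth key is indexed in memory.
segment = sorted([("apple", 1), ("apricot", 2), ("banana", 3),
                  ("blueberry", 4), ("cherry", 5), ("citrus", 6)])
SPARSITY = 2
sparse_keys = [segment[i][0] for i in range(0, len(segment), SPARSITY)]
sparse_offsets = list(range(0, len(segment), SPARSITY))

def get(key):
    # Find the closest indexed key at or before `key`, then scan its block.
    i = bisect.bisect_right(sparse_keys, key) - 1
    if i < 0:
        return None
    start = sparse_offsets[i]
    for k, v in segment[start:start + SPARSITY]:
        if k == key:
            return v
    return None

def edit_distance(a, b):
    # Classic Levenshtein distance by dynamic programming (conceptual stand-in only).
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

print(get("cherry"))                                              # 5
print([k for k, _ in segment if edit_distance(k, "chery") <= 1])  # ['cherry']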

Schulz and Stoyan Mihov: “Fast String Correction with Levenshtein Automata,” International Journal on Document Analysis and Recognition, volume 5, number 1, pages 67–85, November 2002. doi:10.1007/s10032-002-0082-8 [40] Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze: Introduction to Information Retrieval. Cambridge University Press, 2008. ISBN: 978-0-521-86571-5, available online at nlp.stanford.edu/IR-book [41] Michael Stonebraker, Samuel Madden, Daniel J. Abadi, et al.: “The End of an Architectural Era (It’s Time for a Complete Rewrite),” at 33rd International Conference on Very Large Data Bases (VLDB), September 2007. [42] “VoltDB Technical Overview White Paper,” VoltDB, 2014. [43] Stephen M.


pages: 1,237 words: 227,370

Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems by Martin Kleppmann

active measures, Amazon Web Services, billion-dollar mistake, bitcoin, blockchain, business intelligence, business logic, business process, c2.com, cloud computing, collaborative editing, commoditize, conceptual framework, cryptocurrency, data science, database schema, deep learning, DevOps, distributed ledger, Donald Knuth, Edward Snowden, end-to-end encryption, Ethereum, ethereum blockchain, exponential backoff, fake news, fault tolerance, finite state, Flash crash, Free Software Foundation, full text search, functional programming, general-purpose programming language, Hacker News, informal economy, information retrieval, Infrastructure as a Service, Internet of things, iterative process, John von Neumann, Ken Thompson, Kubernetes, Large Hadron Collider, level 1 cache, loose coupling, machine readable, machine translation, Marc Andreessen, microservices, natural language processing, Network effects, no silver bullet, operational security, packet switching, peer-to-peer, performance metric, place-making, premature optimization, recommendation engine, Richard Feynman, self-driving car, semantic web, Shoshana Zuboff, social graph, social web, software as a service, software is eating the world, sorting algorithm, source of truth, SPARQL, speech recognition, SQL injection, statistical model, surveillance capitalism, systematic bias, systems thinking, Tragedy of the Commons, undersea cable, web application, WebSocket, wikimedia commons

Particle physicists have been doing Big Data–style large-scale data analysis for decades, and projects like the Large Hadron Collider (LHC) now work with hundreds of petabytes! At such a scale custom solutions are required to stop the hardware cost from spiraling out of control [49]. Full-text search is arguably a kind of data model that is frequently used alongside databases. Information retrieval is a large specialist subject that we won’t cover in great detail in this book, but we’ll touch on search indexes in Chapter 3 and Part III. We have to leave it there for now. In the next chapter we will discuss some of the trade-offs that come into play when implementing the data models described in this chapter.

In LevelDB, this in-memory index is a sparse collection of some of the keys, but in Lucene, the in-memory index is a finite state automaton over the characters in the keys, similar to a trie [38]. This automaton can be transformed into a Levenshtein automaton, which supports efficient search for words within a given edit distance [39]. Other fuzzy search techniques go in the direction of document classification and machine learning. See an information retrieval textbook for more detail [e.g., 40]. Keeping everything in memory The data structures discussed so far in this chapter have all been answers to the limitations of disks. Compared to main memory, disks are awkward to deal with. With both magnetic disks and SSDs, data on disk needs to be laid out carefully if you want good performance on reads and writes.

Schulz and Stoyan Mihov: “Fast String Correction with Levenshtein Automata,” International Journal on Document Analysis and Recognition, volume 5, number 1, pages 67–85, November 2002. doi:10.1007/s10032-002-0082-8 [40] Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze: Introduction to Information Retrieval. Cambridge University Press, 2008. ISBN: 978-0-521-86571-5, available online at nlp.stanford.edu/IR-book [41] Michael Stonebraker, Samuel Madden, Daniel J. Abadi, et al.: “The End of an Architectural Era (It’s Time for a Complete Rewrite),” at 33rd International Conference on Very Large Data Bases (VLDB), September 2007


pages: 855 words: 178,507

The Information: A History, a Theory, a Flood by James Gleick

Ada Lovelace, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, Albert Einstein, AltaVista, bank run, bioinformatics, Bletchley Park, Brownian motion, butterfly effect, Charles Babbage, citation needed, classic study, Claude Shannon: information theory, clockwork universe, computer age, Computing Machinery and Intelligence, conceptual framework, crowdsourcing, death of newspapers, discovery of DNA, Donald Knuth, double helix, Douglas Hofstadter, en.wikipedia.org, Eratosthenes, Fellow of the Royal Society, Gregor Mendel, Gödel, Escher, Bach, Henri Poincaré, Honoré de Balzac, index card, informal economy, information retrieval, invention of the printing press, invention of writing, Isaac Newton, Jacquard loom, Jaron Lanier, jimmy wales, Johannes Kepler, John von Neumann, Joseph-Marie Jacquard, Lewis Mumford, lifelogging, Louis Daguerre, machine translation, Marshall McLuhan, Menlo Park, microbiome, Milgram experiment, Network effects, New Journalism, Norbert Wiener, Norman Macrae, On the Economy of Machinery and Manufactures, PageRank, pattern recognition, phenotype, Pierre-Simon Laplace, pre–internet, quantum cryptography, Ralph Waldo Emerson, RAND corporation, reversible computing, Richard Feynman, Rubik’s Cube, Simon Singh, Socratic dialogue, Stephen Hawking, Steven Pinker, stochastic process, talking drums, the High Line, The Wisdom of Crowds, transcontinental railway, Turing machine, Turing test, women in the workforce, yottabyte

Shannon’s theory made a bridge between information and uncertainty; between information and entropy; and between information and chaos. It led to compact discs and fax machines, computers and cyberspace, Moore’s law and all the world’s Silicon Alleys. Information processing was born, along with information storage and information retrieval. People began to name a successor to the Iron Age and the Steam Age. “Man the food-gatherer reappears incongruously as information-gatherer,”♦ remarked Marshall McLuhan in 1967.♦ He wrote this an instant too soon, in the first dawn of computation and cyberspace. We can see now that information is what our world runs on: the blood and the fuel, the vital principle.

It is an ancient observation, but one that seemed to bear restating when information became plentiful—particularly in a world where all bits are created equal and information is divorced from meaning. The humanist and philosopher of technology Lewis Mumford, for example, restated it in 1970: “Unfortunately, ‘information retrieving,’ however swift, is no substitute for discovering by direct personal inspection knowledge whose very existence one had possibly never been aware of, and following it at one’s own pace through the further ramification of relevant literature.”♦ He begged for a return to “moral self-discipline.”

♦ “KNOWLEDGE OF SPEECH, BUT NOT OF SILENCE”: T. S. Eliot, “The Rock,” in Collected Poems: 1909–1962 (New York: Harcourt Brace, 1963), 147. ♦ “THE TSUNAMI OF AVAILABLE FACT”: David Foster Wallace, Introduction to The Best American Essays 2007 (New York: Mariner, 2007). ♦ “UNFORTUNATELY, ‘INFORMATION RETRIEVING,’ HOWEVER SWIFT”: Lewis Mumford, The Myth of the Machine, vol. 2, The Pentagon of Power (New York: Harcourt, Brace, 1970), 182. ♦ “ELECTRONIC MAIL SYSTEM”: Jacob Palme, “You Have 134 Unread Mail! Do You Want to Read Them Now?” in Computer-Based Message Services, ed. Hugh T. Smith (North Holland: Elsevier, 1984), 175–76


pages: 153 words: 27,424

REST API Design Rulebook by Mark Masse

anti-pattern, business logic, conceptual framework, create, read, update, delete, data acquisition, database schema, hypertext link, information retrieval, off-the-grid, web application

The first web server.[8] The first web browser, which Berners-Lee also named “WorldWideWeb” and later renamed “Nexus” to avoid confusion with the Web itself. The first WYSIWYG[9] HTML editor, which was built right into the browser. On August 6, 1991, on the Web’s first page, Berners-Lee wrote, The WorldWideWeb (W3) is a wide-area hypermedia information retrieval initiative aiming to give universal access to a large universe of documents.[10] From that moment, the Web began to grow, at times exponentially. Within five years, the number of web users skyrocketed to 40 million. At one point, the number was doubling every two months. The “universe of documents” that Berners-Lee had described was indeed expanding.


pages: 281 words: 95,852

The Googlization of Everything: by Siva Vaidhyanathan

"Friedman doctrine" OR "shareholder theory", 1960s counterculture, activist fund / activist shareholder / activist investor, AltaVista, barriers to entry, Berlin Wall, borderless world, Burning Man, Cass Sunstein, choice architecture, cloud computing, commons-based peer production, computer age, corporate social responsibility, correlation does not imply causation, creative destruction, data acquisition, death of newspapers, digital divide, digital rights, don't be evil, Firefox, Francis Fukuyama: the end of history, full text search, global pandemic, global village, Google Earth, Great Leap Forward, Howard Rheingold, Ian Bogost, independent contractor, informal economy, information retrieval, John Markoff, Joseph Schumpeter, Kevin Kelly, knowledge worker, libertarian paternalism, market fundamentalism, Marshall McLuhan, means of production, Mikhail Gorbachev, moral panic, Naomi Klein, Network effects, new economy, Nicholas Carr, PageRank, Panopticon Jeremy Bentham, pirate software, radical decentralization, Ray Kurzweil, Richard Thaler, Ronald Reagan, side project, Silicon Valley, Silicon Valley ideology, single-payer health, Skype, Social Responsibility of Business Is to Increase Its Profits, social web, Steven Levy, Stewart Brand, technological determinism, technoutopianism, the long tail, The Nature of the Firm, The Structural Transformation of the Public Sphere, Thorstein Veblen, Tyler Cowen, urban decay, web application, Yochai Benkler, zero-sum game

In 2009 the core service of Google—its Web search engine—handled more than 70 percent of the Web search business in the United States and more than 90 percent in much of Europe, and grew at impressive rates elsewhere around the world. 15. Thorsten Joachims et al., “Accurately Interpreting Clickthrough Data as Implicit Feedback,” Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (Salvador, Brazil: ACM, 2005), 154–61. 16. B. J. Jansen and U. Pooch, “A Review of Web Searching Studies and a Framework for Future Research,” Journal of the American Society for Information Science and Technology 52, no. 3 (2001): 235–46; Amanda Spink and Bernard J. Jansen, Web Search: Public Searching on the Web (Dordrecht: Kluwer Academic Publishers, 2004); Caroline M.

A Comparison of Websites across Countries and Domains,” Journal of Computer-Mediated Communication 12, no. 3 (2007), http://jcmc.indiana.edu. 69. Wingyan Chung, “Web Searching in a Multilingual World,” Communications of the ACM 51, no. 5 (2008): 32–40; Fotis Lazarinis et al., “Current Research Issues and Trends in Non-English Web Searching,” Information Retrieval 12, no. 3 (2009): 230–50. 70. “Google’s Market Share in Your Country.” 71. Choe Sang-Hun, “Crowd’s Wisdom Helps South Korean Search Engine Beat Google and Yahoo,” New York Times, July 4, 2007. 72. “S. Korea May Clash with Google over Internet Regulation Differences,” Hankyoreh, April 17, 2009; Kim Tong-hyung, “Google Refuses to Bow to Gov’t Pressure,” Korea Times, April 9, 2009. 73.


RDF Database Systems: Triples Storage and SPARQL Query Processing by Olivier Cure, Guillaume Blin

Amazon Web Services, bioinformatics, business intelligence, cloud computing, database schema, fault tolerance, folksonomy, full text search, functional programming, information retrieval, Internet Archive, Internet of things, linked data, machine readable, NP-complete, peer-to-peer, performance metric, power law, random walk, recommendation engine, RFID, semantic web, Silicon Valley, social intelligence, software as a service, SPARQL, sparse data, web application

Indeed, while compression is the main objective in URI encoding, the main feature sought in RDF stores related to literals is a full text search. The most popular solution for handling a full text search in literals is Lucene, integrated in RDF stores such as Yars2, Jena TDB/SDB, and GraphDB (formerly OWLIM), and in Big Data RDF databases, but it’s also popular for other systems, such as IBM OmnifindY! Edition, Technorati, Wikipedia, Internet Archive, and LinkedIn. Lucene is a very popular open-source information-retrieval library from the Apache Software Foundation (originally created in Java by Doug Cutting). It provides Java-based full-text indexing and searching capabilities for applications through an easy-to-use API. Lucene is based on powerful and efficient search algorithms using indexes.

Therefore, they can be used to identify the fastest index among the six clustered indexes. The overall claim of this multiple-index approach is that, due to a clever compression strategy, the total size of the indexes is less than the size required by a standard triples table solution. The system supports both individual update operations and updates to entire batches. More details on RDF-3X and its extension X-RDF-3X are provided in Chapter 6. The YARS (Harth and Decker, 2005) system combines methods from information retrieval and databases to allow for better query answering performance over RDF data. It stores RDF data persistently by using six B+tree indexes. It not only stores the subject, the predicate, and the object, but also the context information about the data origin. Each element of the corresponding quad (i.e., 4-uplet) is encoded in a dictionary storing mappings from literals and URIs to object IDs (object IDs are stored as number identifiers for compactness). To speed up keyword queries, the lexicon keeps an inverted index on string literals to allow fast full-text searches.
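The dictionary-encoding step described above is easy to picture with a small sketch; the class and sample quad below are illustrative Python, not YARS's actual storage layout.

class TermDictionary:
    # Maps RDF terms (URIs and literals) to compact integer IDs and back.
    def __init__(self):
        self.term_to_id = {}
        self.id_to_term = []

    def encode(self, term):
        if term not in self.term_to_id:
            self.term_to_id[term] = len(self.id_to_term)
            self.id_to_term.append(term)
        return self.term_to_id[term]

    def decode(self, term_id):
        return self.id_to_term[term_id]

d = TermDictionary()
# A quad is (subject, predicate, object, context); it is stored as four integer IDs.
quad = ("http://example.org/alice", "http://xmlns.com/foaf/0.1/name",
        '"Alice"', "http://example.org/graph1")
encoded = tuple(d.encode(term) for term in quad)
print(encoded)                              # (0, 1, 2, 3)
print(tuple(d.decode(i) for i in encoded))  # the original quad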


Beautiful Visualization by Julie Steele

barriers to entry, correlation does not imply causation, data acquisition, data science, database schema, Drosophila, en.wikipedia.org, epigenetics, global pandemic, Hans Rosling, index card, information retrieval, iterative process, linked data, Mercator projection, meta-analysis, natural language processing, Netflix Prize, no-fly zone, pattern recognition, peer-to-peer, performance metric, power law, QR code, recommendation engine, semantic web, social bookmarking, social distancing, social graph, sorting algorithm, Steve Jobs, the long tail, web application, wikimedia commons, Yochai Benkler

[2] See http://bit.ly/4iZib. [3] See http://en.wikipedia.org/wiki/George_Washingtons_Farewell_Address. [4] See http://avalon.law.yale.edu/18th_century/washing.asp. Chapter Nine The Big Picture: Search and Discovery Todd Holloway Search and discovery are two styles of information retrieval. Search is a familiar modality, well exemplified by Google and other web search engines. While there is a discovery aspect to search engines, there are more straightforward examples of discovery systems, such as product recommendations on Amazon and movie recommendations on Netflix. These two types of retrieval systems have in common that they can be incredibly complex under the hood.

She’s the author of acclaimed site thisisindexed.com, and her work has appeared in the New York Times, the BBC Magazine Online, Paste, Golf Digest, Redbook, New York Magazine, the National Post of Canada, the Guardian, Time, and many other old and new media outlets. Todd Holloway can’t get enough of information visualization, information retrieval, machine learning, data mining, the science of networks, and artificial intelligence. He is a Grinnell College and Indiana University alumnus. Noah Iliinsky has spent the last several years thinking about effective approaches to creating diagrams and other types of information visualization.


Getting the Builders in : How to Manage Homebuilding and Renovation Projects by Sales, Leonard John.

information retrieval

Published by How To Content, A division of How To Books Ltd, Spring Hill House, Spring Hill Road, Begbroke Oxford OX5 1RX, United Kingdom. Tel: (01865) 375794. Fax: (01865) 379162. info_howtobooks.co.uk www.howtobooks.co.uk All rights reserved. No part of this work may be reproduced or stored in an information retrieval system (other than for purposes of review) without the express permission of the publisher in writing. The right of Leonard Sales to be identified as the author of this work has been asserted by him in accordance with the Copyright, Designs and Patents Act 1988. © 2008 Leonard Sales First published 2004 Second edition 2006 Third edition 2008 First published in electronic form 2008 British Library Cataloguing in Publication Data A catalogue record for this book is available from the British Library ISBN 978 1 84803 285 9 Cover design by Baseline Arts Ltd, Oxford Produced for How To Books by Deer Park Productions, Tavistock, Devon Typeset by TW Typesetting, Plymouth, Devon NOTE: The material contained in this book is set out in good faith for general guidance and no liability can be accepted for loss or expense incurred as a result of relying in particular circumstances on statements made in the book.


pages: 123 words: 32,382

Grouped: How Small Groups of Friends Are the Key to Influence on the Social Web by Paul Adams

Airbnb, Cass Sunstein, cognitive dissonance, content marketing, David Brooks, Dunbar number, information retrieval, invention of the telegraph, Jeff Hawkins, mirror neurons, planetary scale, race to the bottom, Richard Thaler, sentiment analysis, social web, statistical model, the strength of weak ties, The Wisdom of Crowds, web application, white flight

Just as we are surrounded by people throughout our daily life, the web is being rebuilt around people. People are increasingly using the web to seek the information they need from each other, rather than from businesses directly. People always sourced information from each other offline, but up until now, online information retrieval tended to be from a business to a person. The second driving factor is an acknowledgment in our business models of the fact that people live in networks. For many years, we considered people as isolated, independent actors. Most of our consumer behavior models are structured this way—people acting independently, moving down a decision funnel, making objective choices along the way.


pages: 648 words: 108,814

Solr 1.4 Enterprise Search Server by David Smiley, Eric Pugh

Amazon Web Services, bioinformatics, cloud computing, continuous integration, database schema, domain-specific language, en.wikipedia.org, fault tolerance, Firefox, information retrieval, Ruby on Rails, SQL injection, Wayback Machine, web application, Y Combinator

The major features found in Lucene are as follows: • A text-based inverted index persistent storage for efficient retrieval of documents by indexed terms • A rich set of text analyzers to transform a string of text into a series of terms (words), which are the fundamental units indexed and searched • A query syntax with a parser and a variety of query types from a simple term lookup to exotic fuzzy matches • A good scoring algorithm based on sound Information Retrieval (IR) principles to produce the more likely candidates first, with flexible means to affect the scoring • A highlighter feature to show words found in context • A query spellchecker based on indexed content. For even more information on the query spellchecker, check out the Lucene In Action book (LINA for short) by Erik Hatcher and Otis Gospodnetić.
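A toy sketch of the first two features, an analyzer feeding an inverted index with a simple relevance score, may make the list more concrete; it is illustrative Python, not Lucene's implementation, and the scoring is only a TF-IDF-like stand-in for Lucene's actual ranking.

import math
import re
from collections import defaultdict

def analyze(text):
    # Toy analyzer: lowercase and split into terms (real Lucene analyzers do much more).
    return re.findall(r"[a-z0-9]+", text.lower())

docs = {
    1: "Solr is an enterprise search server built on Lucene",
    2: "Lucene provides an inverted index and a query parser",
    3: "An inverted index maps terms to the documents containing them",
}

index = defaultdict(set)  # term -> set of document ids
for doc_id, text in docs.items():
    for term in analyze(text):
        index[term].add(doc_id)

def search(query):
    # Score each matching document with a simple TF-IDF-like sum over query terms.
    scores = defaultdict(float)
    for term in analyze(query):
        postings = index.get(term, set())
        if not postings:
            continue
        idf = math.log(len(docs) / len(postings)) + 1.0
        for doc_id in postings:
            tf = analyze(docs[doc_id]).count(term)
            scores[doc_id] += tf * idf
    return sorted(scores.items(), key=lambda kv: -kv[1])

print(search("inverted index"))  # only documents 2 and 3 match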

NW, , Atlanta, , 30327 hl fragmenter, highlighting component 165 hl maxAlternateFieldLength, highlighting component 165 hl maxAnalyzedChars, highlighting component 165 home directory, Solr bin 15 conf 15 conf/schema.xml 15 conf/solrconfig.xml 15 conf/xslt 15 data 15 lib 15 HTML, indexing in Solr 227 HTMLStripStandardTokenizerFactory 52 HTMLStripStandardTokenizerFactory tokenizer 227 HTMLStripWhitespaceTokenizerFactory 52 HTTP caching 277-279 HTTP server request access logs, logging about 201, 202 log directory, creating 201 Tailing 202 I IDF 33 idf 112 ID field 44 indent, diagnostic parameter 98 index 31 index-time and query-time, boosting 113 versus query-time 57 index-time boosting 70 IndexBasedSpellChecker options field 174 sourceLocation 174 thresholdTokenFrequency 175 index data document access, controlling 221 securing 220 indexed, field option 41 indexed, schema design 282 indexes sharding 295 indexing strategies about 283 factors, committing 285 factors, optimizing 285 unique document checking, disabling 285 Index Searchers 280 Information Retrieval. See IR int element 92 InternetArchive 226 invariants 111 Inverse Document Frequency. See IDF inverse reciprocals 125 IR 8 ISOLatin1AccentFilterFactory filter 62 issue tracker, Solr 27 J J2SE with JConsole 212 JARmageddon 205 jarowinkler, spellchecker 172 java.util.logging package 203 Java class names abbreviated 40 org.apache.solr.schema.BoolField 40 Java Development Kit (JDK) URL 11 JavaDoc tags 234 Java Management Extensions.


pages: 353 words: 104,146

European Founders at Work by Pedro Gairifo Santos

business intelligence, clean tech, cloud computing, crowdsourcing, deal flow, do what you love, fail fast, fear of failure, full text search, Hacker News, hockey-stick growth, information retrieval, inventory management, iterative process, Jeff Bezos, Joi Ito, Lean Startup, Mark Zuckerberg, Multics, natural language processing, pattern recognition, pre–internet, recommendation engine, Richard Stallman, Salesforce, Silicon Valley, Skype, slashdot, SoftBank, Steve Jobs, Steve Wozniak, subscription business, technology bubble, TED Talk, web application, Y Combinator

They're a group who meet every year about music recommendations and information retrieval in music. We ended up hiring a guy called Norman, who was both a great scientist and understood all the algorithms and captive audience sort of things, but also an excellent programmer who was able to implement all these ideas. So we got really lucky. The first person we hired was great and he just took over. He chucked out all of our crappy recommendation systems we had and built something good, and then improved it constantly for the next several years. __________ 2 The International Society for Music Information Retrieval So we had some A/B testing, split testing systems in there for the radio so they could try out new tweaks to the algorithms and see what was performing better.


pages: 913 words: 265,787

How the Mind Works by Steven Pinker

affirmative action, agricultural Revolution, Alfred Russel Wallace, Apple Newton, backpropagation, Buckminster Fuller, cognitive dissonance, Columbine, combinatorial explosion, complexity theory, computer age, computer vision, Computing Machinery and Intelligence, Daniel Kahneman / Amos Tversky, delayed gratification, disinformation, double helix, Dr. Strangelove, experimental subject, feminist movement, four colour theorem, Geoffrey Hinton, Gordon Gekko, Great Leap Forward, greed is good, Gregor Mendel, hedonic treadmill, Henri Poincaré, Herman Kahn, income per capita, information retrieval, invention of agriculture, invention of the wheel, Johannes Kepler, John von Neumann, lake wobegon effect, language acquisition, lateral thinking, Linda problem, Machine translation of "The spirit is willing, but the flesh is weak." to Russian and back, Mikhail Gorbachev, Murray Gell-Mann, mutually assured destruction, Necker cube, out of africa, Parents Music Resource Center, pattern recognition, phenotype, Plato's cave, plutocrats, random walk, Richard Feynman, Ronald Reagan, Rubik’s Cube, Saturday Night Live, scientific worldview, Search for Extraterrestrial Intelligence, sexual politics, social intelligence, Steven Pinker, Stuart Kauffman, tacit knowledge, theory of mind, Thorstein Veblen, Tipper Gore, Turing machine, urban decay, Yogi Berra

A piece that has been requested recently is more likely to be needed now than a piece that has not been requested for a while. An optimal information-retrieval system should therefore be biased to fetch frequently and recently encountered items. Anderson notes that that is exactly what human memory retrieval does: we remember common and recent events better than rare and long-past events. He found four other classic phenomena in memory research that meet the optimal design criteria independently established for computer information-retrieval systems. A third notable feature of access-consciousness is the emotional coloring of experience.

So the neural medium itself is not necessarily to blame. The psychologist John Anderson has reverse-engineered human memory retrieval, and has shown that the limits of memory are not a byproduct of a mushy storage medium. As programmers like to say, “It’s not a bug, it’s a feature.” In an optimally designed information-retrieval system, an item should be recovered only when the relevance of the item outweighs the cost of retrieving it. Anyone who has used a computerized library retrieval system quickly comes to rue the avalanche of titles spilling across the screen. A human expert, despite our allegedly feeble powers of retrieval, vastly outperforms any computer in locating a piece of information from its content.

A human expert, despite our allegedly feeble powers of retrieval, vastly outperforms any computer in locating a piece of information from its content. When I need to find articles on a topic in an unfamiliar field, I don’t use the library computer; I send email to a pal in the field. What would it mean for an information-retrieval system to be optimally designed? It should cough up the information most likely to be useful at the time of the request. But how could that be known in advance? The probabilities could be estimated, using general laws about what kinds of information are most likely to be needed. If such laws exist, we should be able to find them in information systems in general, not just human memory; for example, the laws should be visible in the statistics of books requested at a library or the files retrieved in a computer.
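As a rough illustration of the idea Anderson describes, the sketch below is a toy model, not his actual analysis: each stored item is scored by how often and how recently it has been requested, and it is "retrieved" only when that score outweighs a fixed retrieval cost. All numbers and item names are made up for the example.

import math

def need_score(frequency, hours_since_last_use, decay=0.1):
    # More frequent and more recent items get higher scores.
    return math.log(1 + frequency) * math.exp(-decay * hours_since_last_use)

def retrieve(items, retrieval_cost=0.5):
    # Fetch an item only when its estimated usefulness outweighs the cost.
    return [name for name, freq, hours in items
            if need_score(freq, hours) > retrieval_cost]

items = [
    ("today's appointment", 12, 1),           # common and recent: recalled
    ("a phone number from 1996", 1, 200000),  # rare and long past: skipped
]
print(retrieve(items))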


Bookkeeping the Easy Way by Wallace W. Kravitz

double entry bookkeeping, information retrieval, post-work, profit motive

Any similarity with the names and types of business of a real person or company is purely coincidental. © Copyright 1999 by Barron's Educational Series, Inc. Prior © copyrights 1990, 1983 by Barron's Educational Series, Inc. All rights reserved. No part of this book may be reproduced in any form, by photostat, microfilm, xerography, or any other means, or incorporated into any information retrieval system, electronic or mechanical, without the written permission of the copyright owner. All inquiries should be addressed to: Barron's Educational Series, Inc. 250 Wireless Boulevard Hauppauge, NY 11788 http://www.barronseduc.com Library of Congress Catalog Card No. 99-17245 International Standard Book No. 0-7641-1079-9 Library of Congress Cataloging-in-Publication Data Kravitz, Wallace W.


pages: 429 words: 114,726

The Computer Boys Take Over: Computers, Programmers, and the Politics of Technical Expertise by Nathan L. Ensmenger

barriers to entry, business process, Charles Babbage, Claude Shannon: information theory, computer age, deskilling, Donald Knuth, Firefox, Frederick Winslow Taylor, functional programming, future of work, Grace Hopper, informal economy, information retrieval, interchangeable parts, Isaac Newton, Jacquard loom, job satisfaction, John von Neumann, knowledge worker, Larry Ellison, loose coupling, machine readable, new economy, no silver bullet, Norbert Wiener, pattern recognition, performance metric, Philip Mirowski, post-industrial society, Productivity paradox, RAND corporation, Robert Gordon, scientific management, Shoshana Zuboff, sorting algorithm, Steve Jobs, Steven Levy, systems thinking, tacit knowledge, technological determinism, the market place, The Theory of the Leisure Class by Thorstein Veblen, Thomas Kuhn: the structure of scientific revolutions, Thorstein Veblen, Turing machine, Von Neumann architecture, world market for maybe five computers, Y2K

Daniel McCracken, “The Human Side of Computing,” Datamation 7, no. 1 (1961): 9–11. Chapter 6 1. “The Thinking Machine,” Time magazine, January 23, 1950, 54–60. 2. J. Lear, “Can a Mechanical Brain Replace You?” Colliers, no. 131 (1953), 58–63. 3. “Office Robots,” Fortune 45 (January 1952), 82–87, 112, 114, 116, 118. 4. Cheryl Knott Malone, “Imagining Information Retrieval in the Library: Desk Set in Historical Context,” IEEE Annals of the History of Computing 24, no. 3 (2002): 14–22. 5. Ibid. 6. Ibid. 7. Thorstein Veblen, The Theory of the Leisure Class (New York: McMillan, 1899). 8. Thomas Haigh, “The Chromium-Plated Tabulator: Institutionalizing an Electronic Revolution, 1954–1958,” IEEE Annals of the History of Computing 4, no. 23 (2001), 75–104. 9.

In History of Computing: Software Issues, ed. Ulf Hashagen, Reinhard Keil-Slawik, and Arthur Norberg. Berlin: Springer-Verlag, 2002, 25–48. Mahoney, Michael. “What Makes the History of Software Hard.” IEEE Annals of the History of Computing 30 (3) (2008): 8–18. Malone, Cheryl Knott. “Imagining Information Retrieval in the Library: Desk Set in Historical Context.” IEEE Annals of the History of Computing 24 (3) (2002): 14–22. Mandel, Lois. “The Computer Girls.” Cosmopolitan, April 1967, 52–56. Manion, Mark, and William M. Evan. “The Y2K problem: technological risk and professional responsibility.”


pages: 924 words: 196,343

JavaScript & jQuery: The Missing Manual by David Sawyer McFarland

Firefox, framing effect, functional programming, HyperCard, information retrieval, Ruby on Rails, Steve Jobs, web application

JavaScript lets a web page react intelligently. With it, you can create smart web forms that let visitors know when they’ve forgotten to include necessary information; you can make elements appear, disappear, or move around a web page (see Figure 1-1); you can even update the contents of a web page with information retrieved from a web server—without having to load a new web page. In short, JavaScript lets you make your websites more engaging and effective. Figure 1-1. JavaScript lets web pages respond to visitors. On Amazon.com, mousing over the “Gifts & Wish Lists” link opens a tab that floats above the other content on the page and offers additional options.

It can be as simple as this: { firstName : 'Bob', lastName : 'Smith' } In this code, firstName acts like a key with a value of Bob—a simple string value. However, the value can also be another object (see Figure 11-10 on page 376), so you can often end up with a complex nested structure—like dolls within dolls. That’s what Flickr’s JSON feed is like. Here’s a small snippet of one of those feeds. It shows the information retrieved for two photos: 1 { 2 "title": "Uploads from Smithsonian Institution", 3 "link": "http://www.flickr.com/photos/smithsonian/", 4 "description": "", 5 "modified": "2011-08-11T13:16:37Z", 6 "generator": "http://www.flickr.com/", 7 "items": [ 8 { 9 "title": "East Island, June 12, 1966.", 10 "link": "http://www.flickr.com/photos/smithsonian/5988083516/", 11 "media": {"m":"http://farm7.static.flickr.com/6029/5988083516_ bfc9f41286_m.jpg"}, 12 "date_taken": "2011-07-29T11:45:50-08:00", 13 "description": "Short description here", 14 "published": "2011-08-11T13:16:37Z", 15 "author": "nobody@flickr.com (Smithsonian Institution)", 16 "author_id": "25053835@N03", 17 "tags": "ocean birds redfootedbooby" 18 }, 19 { 20 "title": "Phoenix Island, April 15, 1966
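To show how a nested feed like the one above is typically consumed, here is a small Python sketch; the field names follow the sample excerpt, but the data itself is invented and the code is only illustrative of walking objects within objects.

import json

feed_text = """
{
  "title": "Uploads from an example account",
  "items": [
    {"title": "East Island", "media": {"m": "http://example.com/a.jpg"}},
    {"title": "Phoenix Island", "media": {"m": "http://example.com/b.jpg"}}
  ]
}
"""

feed = json.loads(feed_text)          # the outer object
for item in feed["items"]:            # "items" is a list of nested objects
    print(item["title"], item["media"]["m"])  # objects within objects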

You might add some code in your program to do that like this: $('.weed').click(function() { $(this).remove(); }); // end click The problem with this code is that it only applies to elements that already exist. If you programmatically add new divs—<div class=“weed”>—the click handler isn’t applied to them. Code that applies only to existing elements is also a problem when you use Ajax as described in Part Four of this book. Ajax lets you update content on a page using information retrieved from a web server. Gmail, for example, can display new mail as you receive it by continually retrieving it from a web server and updating the content in the web browser. In this case, your list of received emails changes after you first started using Gmail. Any events that were applied to the page content when the page loads won’t apply to the new content added from the server.


Scikit-Learn Cookbook by Trent Hauck

bioinformatics, book value, computer vision, data science, information retrieval, p-value

We just need to find some distance metric, compute the pairwise distances, and compare the outcomes to what's expected.

Getting ready

A lower-level utility in scikit-learn is sklearn.metrics.pairwise. This contains several functions to compute the distances between the vectors in a matrix X or the distances between the vectors in X and Y easily. This can be useful for information retrieval. For example, given a set of customers with attributes of X, we might want to take a reference customer and find the closest customers to this customer. In fact, we might want to rank customers by the notion of similarity measured by a distance function. The quality of the similarity depends upon the feature space selection as well as any transformation we might do on the space.
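A minimal sketch of the ranking idea described above, using sklearn.metrics.pairwise_distances; the customer attributes here are invented for the example. It computes the distance from one reference customer to all others and sorts by it.

import numpy as np
from sklearn.metrics import pairwise_distances

# Invented customer attributes: age, yearly spend (thousands), visits per month.
X = np.array([
    [34, 5.2, 3],
    [36, 4.9, 4],
    [58, 1.1, 1],
    [29, 6.0, 5],
], dtype=float)

reference = X[0].reshape(1, -1)                 # the customer we compare against
dists = pairwise_distances(reference, X, metric="euclidean").ravel()
ranking = np.argsort(dists)                     # closest customers first
print(list(zip(ranking, dists[ranking])))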


pages: 742 words: 137,937

The Future of the Professions: How Technology Will Transform the Work of Human Experts by Richard Susskind, Daniel Susskind

23andMe, 3D printing, Abraham Maslow, additive manufacturing, AI winter, Albert Einstein, Amazon Mechanical Turk, Amazon Robotics, Amazon Web Services, Andrew Keen, Atul Gawande, Automated Insights, autonomous vehicles, Big bang: deregulation of the City of London, big data - Walmart - Pop Tarts, Bill Joy: nanobots, Blue Ocean Strategy, business process, business process outsourcing, Cass Sunstein, Checklist Manifesto, Clapham omnibus, Clayton Christensen, clean water, cloud computing, commoditize, computer age, Computer Numeric Control, computer vision, Computing Machinery and Intelligence, conceptual framework, corporate governance, creative destruction, crowdsourcing, Daniel Kahneman / Amos Tversky, data science, death of newspapers, disintermediation, Douglas Hofstadter, driverless car, en.wikipedia.org, Erik Brynjolfsson, Evgeny Morozov, Filter Bubble, full employment, future of work, Garrett Hardin, Google Glasses, Google X / Alphabet X, Hacker Ethic, industrial robot, informal economy, information retrieval, interchangeable parts, Internet of things, Isaac Newton, James Hargreaves, John Maynard Keynes: Economic Possibilities for our Grandchildren, John Maynard Keynes: technological unemployment, Joseph Schumpeter, Khan Academy, knowledge economy, Large Hadron Collider, lifelogging, lump of labour, machine translation, Marshall McLuhan, Metcalfe’s law, Narrative Science, natural language processing, Network effects, Nick Bostrom, optical character recognition, Paul Samuelson, personalized medicine, planned obsolescence, pre–internet, Ray Kurzweil, Richard Feynman, Second Machine Age, self-driving car, semantic web, Shoshana Zuboff, Skype, social web, speech recognition, spinning jenny, strong AI, supply-chain management, Susan Wojcicki, tacit knowledge, TED Talk, telepresence, The Future of Employment, the market place, The Wealth of Nations by Adam Smith, The Wisdom of Crowds, Tragedy of the Commons, transaction costs, Turing test, Two Sigma, warehouse robotics, Watson beat the top human players on Jeopardy!, WikiLeaks, world market for maybe five computers, Yochai Benkler, young professional

Here is a system that undoubtedly performs tasks that we would normally think require human intelligence. The version of Watson that competed on Jeopardy! holds over 200 million pages of documents and implements a wide range of AI tools and techniques, including natural language processing, machine learning, speech synthesis, game-playing, information retrieval, intelligent search, knowledge processing and reasoning, and much more. This type of AI, we stress again, is radically different from the first wave of rule-based expert systems of the 1980s (see section 4.9). It is interesting to note, harking back again to the exponential growth of information technology, that the hardware on which Watson ran in 2011 was said to be about the size of the average bedroom.

In the thaw that has followed the winter, over the past few years, we have seen a series of significant developments—Big Data, Watson, robotics, and affective computing—that we believe point to a second wave of AI. In summary, the computerization of the work of professionals began in earnest in the late 1970s with information retrieval systems. Then, in the 1980s, there were first-generation AI systems in the professions, whose main focus was expert systems technologies. In the next decade, the 1990s, there was a shift towards the field of knowledge management, when professionals started to store and retrieve not just source materials but know-how and working practices.


pages: 170 words: 49,193

The People vs Tech: How the Internet Is Killing Democracy (And How We Save It) by Jamie Bartlett

Ada Lovelace, Airbnb, AlphaGo, Amazon Mechanical Turk, Andrew Keen, autonomous vehicles, barriers to entry, basic income, Bernie Sanders, Big Tech, bitcoin, Black Lives Matter, blockchain, Boris Johnson, Californian Ideology, Cambridge Analytica, central bank independence, Chelsea Manning, cloud computing, computer vision, creative destruction, cryptocurrency, Daniel Kahneman / Amos Tversky, data science, deep learning, DeepMind, disinformation, Dominic Cummings, Donald Trump, driverless car, Edward Snowden, Elon Musk, Evgeny Morozov, fake news, Filter Bubble, future of work, general purpose technology, gig economy, global village, Google bus, Hans Moravec, hive mind, Howard Rheingold, information retrieval, initial coin offering, Internet of things, Jeff Bezos, Jeremy Corbyn, job automation, John Gilmore, John Maynard Keynes: technological unemployment, John Perry Barlow, Julian Assange, manufacturing employment, Mark Zuckerberg, Marshall McLuhan, Menlo Park, meta-analysis, mittelstand, move fast and break things, Network effects, Nicholas Carr, Nick Bostrom, off grid, Panopticon Jeremy Bentham, payday loans, Peter Thiel, post-truth, prediction markets, QR code, ransomware, Ray Kurzweil, recommendation engine, Renaissance Technologies, ride hailing / ride sharing, Robert Mercer, Ross Ulbricht, Sam Altman, Satoshi Nakamoto, Second Machine Age, sharing economy, Silicon Valley, Silicon Valley billionaire, Silicon Valley ideology, Silicon Valley startup, smart cities, smart contracts, smart meter, Snapchat, Stanford prison experiment, Steve Bannon, Steve Jobs, Steven Levy, strong AI, surveillance capitalism, TaskRabbit, tech worker, technological singularity, technoutopianism, Ted Kaczynski, TED Talk, the long tail, the medium is the message, the scientific method, The Spirit Level, The Wealth of Nations by Adam Smith, The Wisdom of Crowds, theory of mind, too big to fail, ultimatum game, universal basic income, WikiLeaks, World Values Survey, Y Combinator, you are the product

Apple splashed out $200 million for Turi, a machine learning start-up, in 2016, and Intel has invested over $1 billion in AI companies over the past couple of years.7 Market leaders in AI like Google, with the data, the geniuses, the experience and the computing power, won’t be limited to just search and information retrieval. They will also be able to leap ahead in almost anything where AI is important: logistics, driverless cars, medical research, television, factory production, city planning, agriculture, energy use, storage, clerical work, education and who knows what else. Amazon is already a retailer, marketing platform, delivery and logistics network, payment system, credit lender, auction house, book publisher, TV production company, fashion designer and cloud computing provider.8 What next?


pages: 165 words: 50,798

Intertwingled: Information Changes Everything by Peter Morville

A Pattern Language, Airbnb, Albert Einstein, Arthur Eddington, augmented reality, Bernie Madoff, bike sharing, Black Swan, business process, Cass Sunstein, cognitive dissonance, collective bargaining, Computer Lib, disinformation, disruptive innovation, folksonomy, holacracy, index card, information retrieval, Internet of things, Isaac Newton, iterative process, Jane Jacobs, Jeff Hawkins, John Markoff, Kanban, Lean Startup, Lyft, messenger bag, minimum viable product, Mother of all demos, Nelson Mandela, Paul Graham, peer-to-peer, Project Xanadu, quantum entanglement, RFID, Richard Thaler, ride hailing / ride sharing, Schrödinger's Cat, self-driving car, semantic web, sharing economy, Silicon Valley, Silicon Valley startup, single source of truth, source of truth, Steve Jobs, Stewart Brand, systems thinking, Ted Nelson, the Cathedral and the Bazaar, The Death and Life of Great American Cities, the scientific method, The Wisdom of Crowds, theory of mind, uber lyft, urban planning, urban sprawl, Vannevar Bush, vertical integration, zero-sum game

In 1992, I started classes at the School of Information and Library Studies, and promptly began to panic. I was stuck in required courses like Reference and Cataloging with people who wanted to be librarians. In hindsight, I’m glad I took those classes, but at the time I was convinced I’d made a very big mistake. It took a while to find my groove. I studied information retrieval and database design. I explored Dialog, the world’s first commercial online search service. And I fell madly in love with the Internet. The tools were crude, the content sparse, but the promise irresistible. A global network of networks that provides universal access to ideas and information: how could anyone who loves knowledge resist that?


Principles of Protocol Design by Robin Sharp

accounting loophole / creative accounting, business process, discrete time, exponential backoff, fault tolerance, finite state, functional programming, Gödel, Escher, Bach, information retrieval, loose coupling, MITM: man-in-the-middle, OSI model, packet switching, quantum cryptography, RFC: Request For Comment, stochastic process

Challenge-response mechanism for authentication of client (see Section 11.4.4). Coding: ASCII encoding of all PDUs. Addressing: Uniform Resource Identifier (URI) identifies destination system and path to resource. Fault tolerance: Resistance to corruption via optional MD5 checksumming of resource content during transfer. 11.4.3 Web Caching Since most distributed information retrieval applications involve transfer of considerable amounts of data through the network, caching is commonly used in order to reduce the amount of network traffic and reduce response times. HTTP, which is intended to support such applications, therefore includes explicit mechanisms for controlling the operation of caching.
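The caching control the passage refers to is expressed through headers such as Cache-Control: max-age. Purely as an illustration (this is not the mechanism of any particular client or of the book's protocol examples), the Python sketch below caches responses keyed by URI and reuses them until their age exceeds the advertised max-age, avoiding repeated transfers.

import time

class ResponseCache:
    def __init__(self):
        self._store = {}  # uri -> (body, max_age, fetched_at)

    def get(self, uri, fetch):
        entry = self._store.get(uri)
        if entry is not None:
            body, max_age, fetched_at = entry
            if time.time() - fetched_at < max_age:
                return body            # still fresh: no network traffic
        body, max_age = fetch(uri)     # otherwise go to the origin server
        self._store[uri] = (body, max_age, time.time())
        return body

# Hypothetical fetch function standing in for a real HTTP request.
def fetch(uri):
    return "<html>...</html>", 60      # body plus a 60-second max-age

cache = ResponseCache()
print(cache.get("http://example.org/", fetch))   # first call hits the origin
print(cache.get("http://example.org/", fetch))   # second call is served from cache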

The proceedings of the two series of international workshops on “Intelligent Agents for Telecommunication Applications”, and on “Cooperative Information Agents” are good places to search for the results of recent research into both theory and applications of agents in the telecommunications and information retrieval areas. A new trend in the construction of very large distributed systems is to base them on Grid technology. This is a technology for coordinating the activities of a potentially huge number of computers, in order to supply users with computer power, in the form of CPU power, storage and other resources.


pages: 286 words: 94,017

Future Shock by Alvin Toffler

Albert Einstein, Alvin Toffler, Brownian motion, Buckminster Fuller, Charles Lindbergh, cognitive dissonance, Colonization of Mars, corporate governance, East Village, Future Shock, global village, Great Leap Forward, Haight Ashbury, Herman Kahn, information retrieval, intentional community, invention of agriculture, invention of movable type, invention of writing, Lewis Mumford, longitudinal study, Marshall McLuhan, mass immigration, Menlo Park, New Urbanism, Norman Mailer, open immigration, planned obsolescence, post-industrial society, RAND corporation, social intelligence, Teledyne, the market place, Thomas Kuhn: the structure of scientific revolutions, urban renewal, Whole Earth Catalog, zero-sum game

The profession of airline flight engineer, he notes, emerged and then began to die out within a brief period of fifteen years. A look at the "help wanted" pages of any major newspaper brings home the fact that new occupations are increasing at a mind-dazzling rate. Systems analyst, console operator, coder, tape librarian, tape handler, are only a few of those connected with computer operations. Information retrieval, optical scanning, thin-film technology all require new kinds of expertise, while old occupations lose importance or vanish altogether. When Fortune magazine in the mid-1960's surveyed 1,003 young executives employed by major American corporations, it found that fully one out of three held a job that simply had not existed until he stepped into it.

This itself, with its demands for uniform discipline, regular hours, attendance checks and the like, was a standardizing force. Advanced technology will, in the future, make much of this unnecessary. A good deal of education will take place in the student's own room at home or in a dorm, at hours of his own choosing. With vast libraries of data available to him via computerized information retrieval systems, with his own tapes and video units, his own language laboratory and his own electronically equipped study carrel, he will be freed, for much of the time, of the restrictions and unpleasantness that dogged him in the lockstep classroom. The technology upon which these new freedoms will be based will inevitably spread through the schools in the years ahead—aggressively pushed, no doubt, by major corporations like IBM, RCA, and Xerox.


pages: 573 words: 157,767

From Bacteria to Bach and Back: The Evolution of Minds by Daniel C. Dennett

Ada Lovelace, adjacent possible, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, AlphaGo, Andrew Wiles, Bayesian statistics, bioinformatics, bitcoin, Bletchley Park, Build a better mousetrap, Claude Shannon: information theory, computer age, computer vision, Computing Machinery and Intelligence, CRISPR, deep learning, disinformation, double entry bookkeeping, double helix, Douglas Hofstadter, Elon Musk, epigenetics, experimental subject, Fermat's Last Theorem, Gödel, Escher, Bach, Higgs boson, information asymmetry, information retrieval, invention of writing, Isaac Newton, iterative process, John von Neumann, language acquisition, megaproject, Menlo Park, Murray Gell-Mann, Necker cube, Norbert Wiener, pattern recognition, phenotype, Richard Feynman, Rodney Brooks, self-driving car, social intelligence, sorting algorithm, speech recognition, Stephen Hawking, Steven Pinker, strong AI, Stuart Kauffman, TED Talk, The Wealth of Nations by Adam Smith, theory of mind, Thomas Bayes, trickle-down economics, Turing machine, Turing test, Watson beat the top human players on Jeopardy!, Y2K

“Flash Signal Evolution, Mate Choice and Predation in Fireflies.” Annual Review of Entomology 53: 293–321. Lieberman, Matthew D. 2013. Social: Why Our Brains Are Wired to Connect. New York: Crown. Littman, Michael L., Susan T. Dumais, and Thomas K. Landauer. 1998. “Automatic Cross-Language Information Retrieval Using Latent Semantic Indexing.” In Cross-Language Information Retrieval, 51–62. New York: Springer. Lycan, William G. 1987. Consciousness. Cambridge, Mass.: MIT Press. MacCready, P. 1999. “An Ambivalent Luddite at a Technological Feast.” Designfax, August. MacKay, D. M. 1968. “Electroencephalogram Potentials Evoked by Accelerated Visual Motion.”


pages: 501 words: 145,943

If Mayors Ruled the World: Dysfunctional Nations, Rising Cities by Benjamin R. Barber

"World Economic Forum" Davos, Aaron Swartz, Affordable Care Act / Obamacare, American Legislative Exchange Council, Berlin Wall, bike sharing, borderless world, Boris Johnson, Bretton Woods, British Empire, car-free, carbon footprint, Cass Sunstein, Celebration, Florida, classic study, clean water, congestion pricing, corporate governance, Crossrail, crowdsourcing, David Brooks, desegregation, Detroit bankruptcy, digital divide, digital Maoism, digital rights, disinformation, disintermediation, edge city, Edward Glaeser, Edward Snowden, Etonian, Evgeny Morozov, failed state, Fall of the Berlin Wall, feminist movement, Filter Bubble, gentrification, George Gilder, ghettoisation, global pandemic, global village, Hernando de Soto, Howard Zinn, illegal immigration, In Cold Blood by Truman Capote, income inequality, informal economy, information retrieval, Jane Jacobs, Jaron Lanier, Jeff Bezos, Lewis Mumford, London Interbank Offered Rate, Mark Zuckerberg, market fundamentalism, Marshall McLuhan, Masdar, megacity, microcredit, Mikhail Gorbachev, mortgage debt, mutually assured destruction, new economy, New Urbanism, Nicholas Carr, Norman Mailer, nuclear winter, obamacare, Occupy movement, off-the-grid, Panopticon Jeremy Bentham, Peace of Westphalia, Pearl River Delta, peer-to-peer, planetary scale, plutocrats, Prenzlauer Berg, profit motive, Ralph Waldo Emerson, RFID, Richard Florida, Ronald Reagan, self-driving car, Silicon Valley, SimCity, Skype, smart cities, smart meter, Steve Jobs, Stewart Brand, technological determinism, technological solutionism, TED Talk, Telecommunications Act of 1996, The Death and Life of Great American Cities, The Fortune at the Bottom of the Pyramid, The future is already here, The Wealth of Nations by Adam Smith, Tobin tax, Tony Hsieh, trade route, UNCLOS, UNCLOS, unpaid internship, urban sprawl, Virgin Galactic, War on Poverty, zero-sum game

I myself was fascinated when, nearly thirty years ago, I enthused about emerging interactive technologies and the impact they might have on citizenship and “strong democracy”: The wiring of homes for cable television across America . . . the availability of low frequency and satellite transmissions in areas beyond regular transmission or cable and the interactive possibilities of video, computers, and information retrieval systems open up a new mode of human communication that can be used either in civic and constructive ways or in manipulative and destructive ways.19 Mine was one of the earliest instances of anticipatory enthusiasm (though laced with skepticism), but a decade later with the web actually in development, cyber zealots were everywhere predicting a new electronic frontier for civic interactivity.

—than the founders and CEOs of immensely powerful tech firms that are first of all profit-seeking, market-monopolizing, consumer-craving commercial entities no more virtuous (or less virtuous) than oil or tobacco or weapons manufacturing firms. It should not really be a surprise that Apple will exploit cheap labor at its Foxconn subsidiary glass manufacturer in China or that Google will steer to the wind, allowing states like China to dictate the terms of “information retrieval” in their own domains. Or that the World Wide Web is being called the “walled-wide-web” by defenders of an open network who fear they are losing the battle. Dictators, nowadays mostly faltering or gone, are no longer the most potent threat to democracy: robust corporations are, not because they are enemies of popular sovereignty but because court decisions like Buckley v.


pages: 189 words: 57,632

Content: Selected Essays on Technology, Creativity, Copyright, and the Future of the Future by Cory Doctorow

AltaVista, AOL-Time Warner, book scanning, Brewster Kahle, Burning Man, cognitive load, drop ship, en.wikipedia.org, general purpose technology, informal economy, information retrieval, Internet Archive, invention of movable type, Jeff Bezos, John Gilmore, John Perry Barlow, Law of Accelerating Returns, machine readable, Metcalfe's law, mirror neurons, Mitch Kapor, moral panic, mutually assured destruction, Neal Stephenson, new economy, optical character recognition, PalmPilot, patent troll, pattern recognition, peer-to-peer, Ponzi scheme, post scarcity, QWERTY keyboard, Ray Kurzweil, RFID, Sand Hill Road, Skype, slashdot, Snow Crash, social software, speech recognition, Steve Jobs, the long tail, Thomas Bayes, Turing test, Vernor Vinge, Wayback Machine

Taken more broadly, this kind of metadata can be thought of as a pedigree: who thinks that this document is valuable? How closely correlated have this person's value judgments been with mine in times gone by? This kind of implicit endorsement of information is a far better candidate for an information-retrieval panacea than all the world's schema combined. Amish for QWERTY (Originally published on the O'Reilly Network, 07/09/2003) I learned to type before I learned to write. The QWERTY keyboard layout is hard-wired to my brain, such that I can't write anything of significance without that I have a 101-key keyboard in front of me.


pages: 144 words: 55,142

Interlibrary Loan Practices Handbook by Cherie L. Weible, Karen L. Janke

Firefox, information retrieval, Internet Archive, late fees, machine readable, Multics, optical character recognition, pull request, QR code, transaction costs, Wayback Machine, Works Progress Administration

If an electronic resources management system is not available or used, it is important to find the interlibrary loan terms on a license and record this information in the ILL department. The terms of the license should be upheld. Regular communication with library staff who are responsible for licensing will ensure that ILL staff are aware of any new or updated license information.

Retrieving the Item

If the print item is owned and available, the call number or other location-specific information should be noted on the request. Borrowers might request a particular edition or year, so careful attention should be paid to make sure the call number and item are an exact match. All requests should be collected and sorted by location and the items pulled from the stacks at least daily.


pages: 190 words: 62,941

Wild Ride: Inside Uber's Quest for World Domination by Adam Lashinsky

"Susan Fowler" uber, "World Economic Forum" Davos, Airbnb, always be closing, Amazon Web Services, asset light, autonomous vehicles, Ayatollah Khomeini, Benchmark Capital, business process, Chuck Templeton: OpenTable:, cognitive dissonance, corporate governance, DARPA: Urban Challenge, Didi Chuxing, Donald Trump, driverless car, Elon Musk, Erlich Bachman, gig economy, Golden Gate Park, Google X / Alphabet X, hustle culture, independent contractor, information retrieval, Jeff Bezos, John Zimmer (Lyft cofounder), Lyft, Marc Andreessen, Mark Zuckerberg, megacity, Menlo Park, multilevel marketing, new economy, pattern recognition, price mechanism, public intellectual, reality distortion field, ride hailing / ride sharing, Salesforce, San Francisco homelessness, Sand Hill Road, self-driving car, side hustle, Silicon Valley, Silicon Valley billionaire, Silicon Valley startup, Skype, Snapchat, South of Market, San Francisco, sovereign wealth fund, statistical model, Steve Jobs, super pumped, TaskRabbit, tech worker, Tony Hsieh, transportation-network company, Travis Kalanick, turn-by-turn navigation, Uber and Lyft, Uber for X, uber lyft, ubercab, young professional

Still, the simplicity of the product masked the complexity of the software code necessary to build it. Camp was getting a master’s degree in software engineering, and though he and his friends bootstrapped StumbleUpon with their labor and little cash, their graduate research dovetailed with the product. Camp’s thesis was on “information retrieval through collaborative interface design and evolutionary algorithms.” Like Facebook, which began a few years later, StumbleUpon was a dorm-room success. It grew quickly to hundreds of thousands of users with Camp and his cofounders as the only employees. (Revenue would follow in later years with an early form of “native” advertising, full-page ads that would appear after several “stumbles,” or items users were discovering.)


pages: 673 words: 164,804

Peer-to-Peer by Andy Oram

AltaVista, big-box store, c2.com, combinatorial explosion, commoditize, complexity theory, correlation coefficient, dark matter, Dennis Ritchie, fault tolerance, Free Software Foundation, Garrett Hardin, independent contractor, information retrieval, Kickstarter, Larry Wall, Marc Andreessen, moral hazard, Network effects, P = NP, P vs NP, p-value, packet switching, PalmPilot, peer-to-peer, peer-to-peer model, Ponzi scheme, power law, radical decentralization, rolodex, Ronald Coase, Search for Extraterrestrial Intelligence, semantic web, SETI@home, Silicon Valley, slashdot, statistical model, Tragedy of the Commons, UUNET, Vernor Vinge, web application, web of trust, Zimmermann PGP

Suppose you query the Gnutella network for “strawberry rhubarb pie.” You expect a few results that let you download a recipe. That’s what we expect from today’s Gnutella system, but it actually doesn’t capture the unique properties Gnutella offers. Remember, Gnutella is a distributed, real-time information retrieval system wherein your query is disseminated across the network in its raw form. That means that every node that receives your query can interpret your query however it wants and respond however it wants, in free form. In fact, Gnutella file-sharing software does just that. Each flavor of Gnutella software interprets the search queries differently.
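To make the "raw query, free-form interpretation" point concrete, here is a toy Python sketch; it is not the actual Gnutella protocol, and the peers and handlers are invented. A query floods outward from one node with a time-to-live, and each node applies its own matching rule to decide how to respond.

def flood_query(peers, handlers, start, query, ttl=3):
    # peers: node -> list of neighbours; handlers: node -> match function.
    seen, results = set(), []
    frontier = [(start, ttl)]
    while frontier:
        node, remaining = frontier.pop()
        if node in seen or remaining == 0:
            continue
        seen.add(node)
        # Every node interprets the raw query however it wants.
        results.extend(handlers[node](query))
        frontier.extend((n, remaining - 1) for n in peers[node])
    return results

peers = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a"]}
handlers = {
    "a": lambda q: [],                                    # has nothing to offer
    "b": lambda q: ["rhubarb_pie_recipe.txt"] if "pie" in q else [],
    "c": lambda q: [f"ad: buy '{q}' merchandise"],        # responds however it likes
}
print(flood_query(peers, handlers, "a", "strawberry rhubarb pie"))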

[75] “eBay Feedback Removal Policy,” http://pages.ebay.com/help/community/fbremove.html. [76] D. Chaum (1981), “Untraceable Electronic Mail, Return Addresses, and Digital Pseudonyms.” Communications of the ACM, vol. 24, no. 2, pp.84-88. [77] “Electronic Frontiers Georgia Remailer Uptime List,” http://anon.efga.org. [78] Tal Malkin (1999), MIT Ph.D. thesis, “Private Information Retrieval and Oblivious Transfer.” [79] Masayuki Abe (1998), “Universally Verifiable MIX-Network with Verification Work Independent of the Number of MIX Servers,” EUROCRYPT ’98, Springer-Verlag LNCS. [80] We ignore the possibility of traffic analysis here and assume that the user chooses more than one hop


pages: 245 words: 64,288

Robots Will Steal Your Job, But That's OK: How to Survive the Economic Collapse and Be Happy by Pistono, Federico

3D printing, Albert Einstein, autonomous vehicles, bioinformatics, Buckminster Fuller, cloud computing, computer vision, correlation does not imply causation, en.wikipedia.org, epigenetics, Erik Brynjolfsson, Firefox, future of work, gamification, George Santayana, global village, Google Chrome, happiness index / gross national happiness, hedonic treadmill, illegal immigration, income inequality, information retrieval, Internet of things, invention of the printing press, Jeff Hawkins, jimmy wales, job automation, John Markoff, Kevin Kelly, Khan Academy, Kickstarter, Kiva Systems, knowledge worker, labor-force participation, Lao Tzu, Law of Accelerating Returns, life extension, Loebner Prize, longitudinal study, means of production, Narrative Science, natural language processing, new economy, Occupy movement, patent troll, pattern recognition, peak oil, post scarcity, QR code, quantum entanglement, race to the bottom, Ray Kurzweil, recommendation engine, RFID, Rodney Brooks, selection bias, self-driving car, seminal paper, slashdot, smart cities, software as a service, software is eating the world, speech recognition, Steven Pinker, strong AI, synthetic biology, technological singularity, TED Talk, Turing test, Vernor Vinge, warehouse automation, warehouse robotics, women in the workforce

While our brains will stay pretty much the same for the next 20 years, computers' efficiency and computational power will have doubled about twenty times. That is a million-fold increase. So, for the same $3 million you will have a computer a million times more powerful than Watson, or you could have a Watson-equivalent computer for $3. Watson’s computational power and exceptional skills of advanced Natural Language Processing, Information Retrieval, Knowledge Representation and Reasoning, Machine Learning, and open domain question answering are already being put to better use than showing off at a TV contest. IBM and Nuance Communications Inc. are partnering for the research project to develop a commercial product during the next 18 to 24 months that will exploit Watson’s capabilities as a clinical decision support system to aid the diagnosis and treatment of patients.86 Recall the example of automated radiologists we mentioned earlier.


pages: 222 words: 70,132

Move Fast and Break Things: How Facebook, Google, and Amazon Cornered Culture and Undermined Democracy by Jonathan Taplin

"Friedman doctrine" OR "shareholder theory", "there is no alternative" (TINA), 1960s counterculture, affirmative action, Affordable Care Act / Obamacare, Airbnb, AlphaGo, Amazon Mechanical Turk, American Legislative Exchange Council, AOL-Time Warner, Apple's 1984 Super Bowl advert, back-to-the-land, barriers to entry, basic income, battle of ideas, big data - Walmart - Pop Tarts, Big Tech, bitcoin, Brewster Kahle, Buckminster Fuller, Burning Man, Clayton Christensen, Cody Wilson, commoditize, content marketing, creative destruction, crony capitalism, crowdsourcing, data is the new oil, data science, David Brooks, David Graeber, decentralized internet, don't be evil, Donald Trump, Douglas Engelbart, Douglas Engelbart, Dynabook, Edward Snowden, Elon Musk, equal pay for equal work, Erik Brynjolfsson, Fairchild Semiconductor, fake news, future of journalism, future of work, George Akerlof, George Gilder, Golden age of television, Google bus, Hacker Ethic, Herbert Marcuse, Howard Rheingold, income inequality, informal economy, information asymmetry, information retrieval, Internet Archive, Internet of things, invisible hand, Jacob Silverman, Jaron Lanier, Jeff Bezos, job automation, John Markoff, John Maynard Keynes: technological unemployment, John Perry Barlow, John von Neumann, Joseph Schumpeter, Kevin Kelly, Kickstarter, labor-force participation, Larry Ellison, life extension, Marc Andreessen, Mark Zuckerberg, Max Levchin, Menlo Park, Metcalfe’s law, military-industrial complex, Mother of all demos, move fast and break things, natural language processing, Network effects, new economy, Norbert Wiener, offshore financial centre, packet switching, PalmPilot, Paul Graham, paypal mafia, Peter Thiel, plutocrats, pre–internet, Ray Kurzweil, reality distortion field, recommendation engine, rent-seeking, revision control, Robert Bork, Robert Gordon, Robert Metcalfe, Ronald Reagan, Ross Ulbricht, Sam Altman, Sand Hill Road, secular stagnation, self-driving car, sharing economy, Silicon Valley, Silicon Valley ideology, Skinner box, smart grid, Snapchat, Social Justice Warrior, software is eating the world, Steve Bannon, Steve Jobs, Stewart Brand, tech billionaire, techno-determinism, technoutopianism, TED Talk, The Chicago School, the long tail, The Market for Lemons, The Rise and Fall of American Growth, Tim Cook: Apple, trade route, Tragedy of the Commons, transfer pricing, Travis Kalanick, trickle-down economics, Tyler Cowen, Tyler Cowen: Great Stagnation, universal basic income, unpaid internship, vertical integration, We are as Gods, We wanted flying cars, instead we got 140 characters, web application, Whole Earth Catalog, winner-take-all economy, women in the workforce, Y Combinator, you are the product

The effect on the thousand people gathered for the conference was revolutionary. Imagine the first performance of Stravinsky’s The Rite of Spring but without the boos and walkouts. People were thunderstruck by this radical upending of what a computer could be. No longer a giant calculation machine, it was a personal tool of communication and information retrieval. 2. It is not an exaggeration to say that the work of Steve Jobs, Bill Gates, Larry Page, and Mark Zuckerberg stands on the shoulders of Doug Engelbart. Yet Engelbart’s vision of the computing future was different from today’s reality. In the run-up to the demonstration, Bill English had enlisted the help of Whole Earth Catalog publisher Stewart Brand, who had produced the Acid Tests with Ken Kesey two years earlier.


pages: 222 words: 74,587

Paper Machines: About Cards & Catalogs, 1548-1929 by Markus Krajewski, Peter Krapp

Apollo 11, business process, Charles Babbage, continuation of politics by other means, double entry bookkeeping, Frederick Winslow Taylor, Gödel, Escher, Bach, index card, Index librorum prohibitorum, information retrieval, invention of movable type, invention of the printing press, Jacques de Vaucanson, Johann Wolfgang von Goethe, Joseph-Marie Jacquard, knowledge worker, means of production, new economy, paper trading, Turing machine, work culture

Paper Machines History and Foundations of Information Science Edited by Michael Buckland, Jonathan Furner, and Markus Krajewski Human Information Retrieval by Julian Warner Good Faith Collaboration: The Culture of Wikipedia by Joseph Michael Reagle Jr. Paper Machines: About Cards & Catalogs, 1548–1929 by Markus Krajewski Paper Machines About Cards & Catalogs, 1548–1929 Markus Krajewski translated by Peter Krapp The MIT Press Cambridge, Massachusetts London, England © 2011 Massachusetts Institute of Technology © für die deutsche Ausgabe 2002, Kulturverlag Kadmos Berlin All rights reserved.


pages: 244 words: 66,599

Insanely Great: The Life and Times of Macintosh, the Computer That Changed Everything by Steven Levy

Apple II, Apple's 1984 Super Bowl advert, Bill Atkinson, computer age, Computer Lib, conceptual framework, Do you want to sell sugared water for the rest of your life?, Douglas Engelbart, Douglas Engelbart, Dynabook, General Magic , Howard Rheingold, HyperCard, information retrieval, information trail, Ivan Sutherland, John Markoff, John Perry Barlow, Kickstarter, knowledge worker, Marshall McLuhan, Mitch Kapor, Mother of all demos, Pepsi Challenge, Productivity paradox, QWERTY keyboard, reality distortion field, rolodex, Silicon Valley, skunkworks, speech recognition, Steve Jobs, Steve Wozniak, Steven Levy, Ted Nelson, The Home Computer Revolution, the medium is the message, Vannevar Bush

That was the intangible benefit of HyperCard-a hastening of what now seems an inevitable reordering of the way we consume information. On a more basic level, HyperCard found several niches, the most prevalent being an easy-to-use control panel, or "front end," for databases, providing easy access for files, pictures, notes, and video clips that otherwise would be elusive to those unschooled in the black arts of information retrieval. Thus it became associated with another use of Macintosh that would become central to the computer's role in nudging digital technology a little closer to the familiar: multimedia. In recent years multimedia has taken on a negative connotation in the computer industry. The term is often used with a suspicious fuzziness, and is often dismissed as a meaningless buzzword, tainted by hucksters invoking the word to move new hardware.


pages: 671 words: 228,348

Pro AngularJS by Adam Freeman

business logic, business process, create, read, update, delete, en.wikipedia.org, Google Chrome, information retrieval, inventory management, MVC pattern, place-making, premature optimization, revision control, Ruby on Rails, single page application, web application

A URL like this will be requested relative to the main HTML document, which means that I don’t have to hard-code protocols, hostnames, and ports into the application.

GET AND POST: PICK THE RIGHT ONE

The rule of thumb is that GET requests should be used for all read-only information retrieval, while POST requests should be used for any operation that changes the application state. In standards-compliance terms, GET requests are for safe interactions (having no side effects besides information retrieval), and POST requests are for unsafe interactions (making a decision or changing something). These conventions are set by the World Wide Web Consortium (W3C), at www.w3.org/Protocols/rfc2616/rfc2616-sec9.html.
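The same safe/unsafe convention can be shown outside Angular; the sketch below uses Python's requests library against a hypothetical API (the base URL, paths, and fields are invented for illustration, not taken from the book).

import requests

BASE = "http://example.com/api"   # hypothetical endpoint for illustration

# GET: a safe, read-only interaction -- it only retrieves information.
products = requests.get(BASE + "/products", params={"category": "fish"}).json()

# POST: an unsafe interaction -- it changes application state on the server.
response = requests.post(BASE + "/orders", json={"product_id": 3, "quantity": 2})
print(response.status_code)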


pages: 1,302 words: 289,469

The Web Application Hacker's Handbook: Finding and Exploiting Security Flaws by Dafydd Stuttard, Marcus Pinto

business logic, call centre, cloud computing, commoditize, database schema, defense in depth, easy for humans, difficult for computers, Firefox, information retrieval, information security, lateral thinking, machine readable, MITM: man-in-the-middle, MVC pattern, optical character recognition, Ruby on Rails, SQL injection, Turing test, Wayback Machine, web application

Application Pages Versus Functional Paths

The enumeration techniques described so far have been implicitly driven by one particular picture of how web application content may be conceptualized and cataloged. This picture is inherited from the pre-application days of the World Wide Web, in which web servers functioned as repositories of static information, retrieved using URLs that were effectively filenames. To publish some web content, an author simply generated a bunch of HTML files and copied these into the relevant directory on a web server. When users followed hyperlinks, they navigated the set of files created by the author, requesting each file via its name within the directory tree residing on the server.

The authors' favorite is sqlmap, which can attack MySQL, Oracle, and MS-SQL, among others. It implements UNION-based and inference-based retrieval. It supports various escalation methods, including retrieval of files from the operating system, and command execution under Windows using xp_cmdshell. In practice, sqlmap is an effective tool for database information retrieval through time-delay or other inference methods and can be useful for union-based retrieval. One of the best ways to use it is with the --sql-shell option. This gives the attacker a SQL prompt and performs the necessary union, error-based, or blind SQL injection behind the scenes to send and retrieve results.

/default/fedefault.aspx
SessionUser.Key f7e50aef8fadd30f31f3aeal04cef26ed2ce2be50073c
SessionClient.ID 306
SessionClient.ReviewID 245
UPriv.2100
SessionUser.NetworkLevelUser 0
UPriv.2200
SessionUser.BranchLevelUser 0
SessionDatabase fd219.prod.wahh-bank.com

The following items are commonly included in verbose debug messages:

■ Values of key session variables that can be manipulated via user input
■ Hostnames and credentials for back-end components such as databases
■ File and directory names on the server
■ Information embedded within meaningful session tokens (see Chapter 7)
■ Encryption keys used to protect data transmitted via the client (see Chapter 5)
■ Debug information for exceptions arising in native code components, including the values of CPU registers, contents of the stack, and a list of the loaded DLLs and their base addresses (see Chapter 16)

When this kind of error reporting functionality is present in live production code, it may signify a critical weakness in the application's security. You should review it closely to identify any items that can be used to further advance your attack, and any ways in which you can supply crafted input to manipulate the application's state and control the information retrieved.

Server and Database Messages

Informative error messages are often returned not by the application itself but by some back-end component such as a database, mail server, or SOAP server. If a completely unhandled error occurs, the application typically responds with an HTTP 500 status code, and the response body may contain further information about the error.


pages: 1,164 words: 309,327

Trading and Exchanges: Market Microstructure for Practitioners by Larry Harris

active measures, Andrei Shleifer, AOL-Time Warner, asset allocation, automated trading system, barriers to entry, Bernie Madoff, Bob Litterman, book value, business cycle, buttonwood tree, buy and hold, compound rate of return, computerized trading, corporate governance, correlation coefficient, data acquisition, diversified portfolio, equity risk premium, fault tolerance, financial engineering, financial innovation, financial intermediation, fixed income, floating exchange rates, High speed trading, index arbitrage, index fund, information asymmetry, information retrieval, information security, interest rate swap, invention of the telegraph, job automation, junk bonds, law of one price, London Interbank Offered Rate, Long Term Capital Management, margin call, market bubble, market clearing, market design, market fragmentation, market friction, market microstructure, money market fund, Myron Scholes, National best bid and offer, Nick Leeson, open economy, passive investing, pattern recognition, payment for order flow, Ponzi scheme, post-materialism, price discovery process, price discrimination, principal–agent problem, profit motive, proprietary trading, race to the bottom, random walk, Reminiscences of a Stock Operator, rent-seeking, risk free rate, risk tolerance, risk-adjusted returns, search costs, selection bias, shareholder value, short selling, short squeeze, Small Order Execution System, speech recognition, statistical arbitrage, statistical model, survivorship bias, the market place, transaction costs, two-sided market, vertical integration, winner-take-all economy, yield curve, zero-coupon bond, zero-sum game

In contrast, if the manager wants to buy the stock because he believes that it is fundamentally undervalued, Bob can be more patient. The prices of such stocks usually do not rise so quickly that Bob needs to hurry to trade. The portfolio manager says that he wants to buy Exxon Mobil because he believes it is fundamentally undervalued. Bob then uses an electronic information retrieval system to examine the recent price and trade history for Exxon Mobil. He looks to see whether other traders are trying to fill large orders. If a large seller is pushing prices down, Bob might be able to fill his order quickly at a good price. If Bob must compete with another large buyer, the order may be hard to execute at a good price.

In addition to their regulatory functions, the SEC and CFTC collect and disseminate information useful to traders, investors, speculators, and legislators. The SEC collects various financial reports from issuers and position reports from large traders. Investors who are interested in estimating security values can access these reports over the Internet via the SEC’s Edgar information retrieval system. The CFTC likewise collects and publishes information about commodity market supply and demand conditions and large trader positions. Traders use this information to value commodities and to forecast what other traders might do in the future. Both organizations also provide information to Congress through their regular annual reports, their special reports on specific issues, their testimony at congressional hearings, and their responses to requests for information from members of Congress and their staffs.

His notes now project price targets of 20 and 25 dollars per share, with the possibility of more than 50 dollars a share by the time the new plant comes on line. 12.1.1 The Successful Ending: Bill Profits Some traders who follow BNB closely see the price change. They immediately query their electronic information retrieval services to determine why the stock is moving, and when it started to move. They find the story about producing in China and see that the price increase immediately followed its publication. Although the news has no particular fundamental value, many traders infer more from the story than they should because of the large positive price change that followed the announcement.


Mastering Book-Keeping: A Complete Guide to the Principles and Practice of Business Accounting by Peter Marshall

accounting loophole / creative accounting, asset allocation, book value, double entry bookkeeping, information retrieval, intangible asset, the market place

Fax: (01865) 379162. info@howtobooks.co.uk www.howtobooks.co.uk © 2009 Dr Peter Marshall First edition 1992 Second edition 1995 Third edition 1997 Fourth edition 1999 Fifth edition 2001 Sixth edition 2003 Seventh edition 2005 Reprinted 2006 Eighth edition 2009 First published in electronic form 2009 All rights reserved. No part of this work may be reproduced or stored in an information retrieval system (other than for purposes of review) without the express permission of the publisher in writing. The rights of Peter Marshall to be identified as the author of this work have been asserted by him in accordance with the Copyright Designs and Patents Act 1988. British Library Cataloguing in Publication Data A catalogue record for this book is available from the British Library ISBN 978 1 84803 324 5 Produced for How To Books by Deer Park Productions, Tavistock, Devon Typeset by PDQ Typesetting, Newcastle-under-Lyme, Staffordshire Cover design by Baseline Arts Ltd, Oxford NOTE: The material contained in this book is set out in good faith for general guidance and no liability can be accepted for loss or expense incurred as a result of relying in particular circumstances on statements made in the book.


Deep Work: Rules for Focused Success in a Distracted World by Cal Newport

8-hour work day, Albert Einstein, barriers to entry, behavioural economics, Bluma Zeigarnik, business climate, Cal Newport, Capital in the Twenty-First Century by Thomas Piketty, Clayton Christensen, David Brooks, David Heinemeier Hansson, deliberate practice, digital divide, disruptive innovation, do what you love, Donald Knuth, Donald Trump, Downton Abbey, en.wikipedia.org, Erik Brynjolfsson, Evgeny Morozov, experimental subject, follow your passion, Frank Gehry, Hacker News, Higgs boson, informal economy, information retrieval, Internet Archive, Jaron Lanier, knowledge worker, Mark Zuckerberg, Marshall McLuhan, Merlin Mann, Nate Silver, Neal Stephenson, new economy, Nicholas Carr, popular electronics, power law, remote working, Richard Feynman, Ruby on Rails, seminal paper, Silicon Valley, Silicon Valley startup, Snapchat, statistical model, the medium is the message, Tyler Cowen, Watson beat the top human players on Jeopardy!, web application, winner-take-all economy, work culture , zero-sum game

In this case, I would suggest that you maintain the strategy of scheduling Internet use even after the workday is over. To simplify matters, when scheduling Internet use after work, you can allow time-sensitive communication into your offline blocks (e.g., texting with a friend to agree on where you’ll meet for dinner), as well as time-sensitive information retrieval (e.g., looking up the location of the restaurant on your phone). Outside of these pragmatic exceptions, however, when in an offline block, put your phone away, ignore texts, and refrain from Internet usage. As in the workplace variation of this strategy, if the Internet plays a large and important role in your evening entertainment, that’s fine: Schedule lots of long Internet blocks.


pages: 242 words: 71,938

The Google Resume: How to Prepare for a Career and Land a Job at Apple, Microsoft, Google, or Any Top Tech Company by Gayle Laakmann Mcdowell

barriers to entry, cloud computing, do what you love, game design, information retrieval, job-hopping, side project, Silicon Valley, Steve Jobs, TED Talk, why are manhole covers round?

The one thing that would make this slightly stronger is for Bill to list the dates of the projects.
Distributed Hash Table (Language/Platform: Java/Linux). Successfully implemented a Distributed Hash Table based on the Chord lookup protocol; Chord is one solution for connecting the peers of a P2P network and consistently maps each key onto a node.
Information Retrieval System (Language/Platform: Java/Linux). Developed an indexer to index a corpus of files and a Query Processor to handle Boolean queries. The Query Processor outputs the file name, title, line number, and word position. Implemented using Java APIs such as serialization and collections (SortedSet, HashMap).
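
A minimal sketch of the second project described above, an indexer plus a Boolean query processor that reports file name, line number, and word position, may help make the bullet concrete. It is not taken from the book, it uses Python rather than the Java the candidate lists, and the corpus file names are hypothetical.

    # Indexer and Boolean AND query processor over plain-text files (illustrative sketch).
    from collections import defaultdict
    from pathlib import Path

    def build_index(paths):
        """Map each term to its postings: (file name, line number, word position)."""
        index = defaultdict(list)
        for path in paths:
            for line_no, line in enumerate(Path(path).read_text().splitlines(), start=1):
                for pos, word in enumerate(line.lower().split(), start=1):
                    index[word].append((path, line_no, pos))
        return index

    def boolean_and(index, terms):
        """Return the set of files that contain every query term."""
        postings = [{f for f, _, _ in index.get(t.lower(), [])} for t in terms]
        return set.intersection(*postings) if postings else set()

    idx = build_index(["doc1.txt", "doc2.txt"])            # hypothetical corpus files
    print(boolean_and(idx, ["information", "retrieval"]))  # files matching both terms
    print(idx.get("retrieval", []))                        # (file, line, position) hits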


pages: 242 words: 245

The New Ruthless Economy: Work & Power in the Digital Age by Simon Head

Alan Greenspan, Asian financial crisis, business cycle, business process, call centre, conceptual framework, deskilling, Erik Brynjolfsson, Ford Model T, Ford paid five dollars a day, Frederick Winslow Taylor, Great Leap Forward, informal economy, information retrieval, Larry Ellison, medical malpractice, new economy, Panopticon Jeremy Bentham, scientific management, shareholder value, Shoshana Zuboff, Silicon Valley, single-payer health, supply-chain management, telemarketer, Thomas Davenport, Toyota Production System, union organizing, work culture

Instead, the company believed that "reducing dependency on people knowledge and skills through expert and artificial intelligence systems" offered the best approach. With the expert system containing "most, if not all, of the knowledge required to perform a task or solve a problem," the knowledgeability of the agent could be confined "largely to data entry and information retrieval procedures"—echoes of Hammer and Champy's deal structurers and case managers.29 The chief KM problem faced by MMR's software engineers was how to achieve an accurate definition of the problem to be solved by CasePoint. The one thing the expert system could not do was provide for itself an accurate description of the symptoms of machine breakdown.


pages: 345 words: 75,660

Prediction Machines: The Simple Economics of Artificial Intelligence by Ajay Agrawal, Joshua Gans, Avi Goldfarb

Abraham Wald, Ada Lovelace, AI winter, Air France Flight 447, Airbus A320, algorithmic bias, AlphaGo, Amazon Picking Challenge, artificial general intelligence, autonomous vehicles, backpropagation, basic income, Bayesian statistics, Black Swan, blockchain, call centre, Capital in the Twenty-First Century by Thomas Piketty, Captain Sullenberger Hudson, carbon tax, Charles Babbage, classic study, collateralized debt obligation, computer age, creative destruction, Daniel Kahneman / Amos Tversky, data acquisition, data is the new oil, data science, deep learning, DeepMind, deskilling, disruptive innovation, driverless car, Elon Musk, en.wikipedia.org, Erik Brynjolfsson, everywhere but in the productivity statistics, financial engineering, fulfillment center, general purpose technology, Geoffrey Hinton, Google Glasses, high net worth, ImageNet competition, income inequality, information retrieval, inventory management, invisible hand, Jeff Hawkins, job automation, John Markoff, Joseph Schumpeter, Kevin Kelly, Lyft, Minecraft, Mitch Kapor, Moneyball by Michael Lewis explains big data, Nate Silver, new economy, Nick Bostrom, On the Economy of Machinery and Manufactures, OpenAI, paperclip maximiser, pattern recognition, performance metric, profit maximization, QWERTY keyboard, race to the bottom, randomized controlled trial, Ray Kurzweil, ride hailing / ride sharing, Robert Solow, Salesforce, Second Machine Age, self-driving car, shareholder value, Silicon Valley, statistical model, Stephen Hawking, Steve Jobs, Steve Jurvetson, Steven Levy, strong AI, The Future of Employment, the long tail, The Signal and the Noise by Nate Silver, Tim Cook: Apple, trolley problem, Turing test, Uber and Lyft, uber lyft, US Airways Flight 1549, Vernor Vinge, vertical integration, warehouse automation, warehouse robotics, Watson beat the top human players on Jeopardy!, William Langewiesche, Y Combinator, zero-sum game

To be mobile-first is to drive traffic to your mobile experience and optimize consumers’ interfaces for mobile even at the expense of your full website and other platforms. The last part is what makes it strategic. “Do well on mobile” is something to aim for. But saying you will do so even if it harms other channels is a real commitment. What does this mean in the context of AI-first? Google’s research director Peter Norvig gives an answer: With information retrieval, anything over 80% recall and precision is pretty good—not every suggestion has to be perfect, since the user can ignore the bad suggestions. With assistance, there is a much higher barrier. You wouldn’t use a service that booked the wrong reservation 20% of the time, or even 2% of the time.
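
Norvig's 80 percent threshold is easier to read against the standard definitions: precision is the share of returned suggestions that are relevant, and recall is the share of relevant items that were returned. A small illustrative computation follows; the document sets are invented, not from the book.

    # Illustrative precision/recall computation; the example sets are made up.
    def precision_recall(retrieved, relevant):
        retrieved, relevant = set(retrieved), set(relevant)
        hits = len(retrieved & relevant)
        precision = hits / len(retrieved) if retrieved else 0.0
        recall = hits / len(relevant) if relevant else 0.0
        return precision, recall

    suggested = [f"doc{i}" for i in range(10)]      # 10 suggestions shown to the user
    relevant = [f"doc{i}" for i in range(2, 12)]    # 10 items that are actually relevant
    print(precision_recall(suggested, relevant))    # (0.8, 0.8): fine for retrieval,
                                                    # far too error-prone for booking a reservation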


Mastering Structured Data on the Semantic Web: From HTML5 Microdata to Linked Open Data by Leslie Sikos

AGPL, Amazon Web Services, bioinformatics, business process, cloud computing, create, read, update, delete, Debian, en.wikipedia.org, fault tolerance, Firefox, Google Chrome, Google Earth, information retrieval, Infrastructure as a Service, Internet of things, linked data, machine readable, machine translation, natural language processing, openstreetmap, optical character recognition, platform as a service, search engine result page, semantic web, Silicon Valley, social graph, software as a service, SPARQL, text mining, Watson beat the top human players on Jeopardy!, web application, Wikidata, wikimedia commons, Wikivoyage

Oracle (2015) Oracle Spatial and Graph. www.oracle.com/technetwork/database/options/spatialandgraph/overview/index.html. Accessed 10 April 2015. 11. SYSTAP LLC (2015) Blazegraph. www.blazegraph.com/bigdata. Accessed 10 April 2015.
Chapter 7 Querying
While machine-readable datasets are published primarily for software agents, automatic data extraction is not always an option. Semantic Information Retrieval often involves users searching for the answer to a complex question, based on the formally represented knowledge in a dataset or database. While Structured Query Language (SQL) is used to query relational databases, querying graph databases and flat Resource Description Framework (RDF) files can be done using the SPARQL Protocol and RDF Query Language (SPARQL), the primary query language of RDF, which is much more powerful than SQL.
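
As a rough illustration of the querying this chapter introduces, the sketch below runs a SPARQL SELECT over a tiny in-memory RDF graph using the Python rdflib library; the library choice and the example triples are assumptions made for illustration, not taken from the book.

    # SPARQL graph-pattern matching over RDF with rdflib (illustrative data).
    from rdflib import Graph

    turtle_data = """
    @prefix ex: <http://example.org/> .
    ex:alice ex:knows ex:bob .
    ex:bob   ex:knows ex:carol .
    """

    g = Graph()
    g.parse(data=turtle_data, format="turtle")

    query = """
    PREFIX ex: <http://example.org/>
    SELECT ?person WHERE { ex:alice ex:knows ?person . }
    """
    for row in g.query(query):
        print(row.person)   # -> http://example.org/bob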


pages: 238 words: 77,730

Final Jeopardy: Man vs. Machine and the Quest to Know Everything by Stephen Baker

23andMe, AI winter, Albert Einstein, artificial general intelligence, behavioural economics, business process, call centre, clean water, commoditize, computer age, Demis Hassabis, Frank Gehry, information retrieval, Iridium satellite, Isaac Newton, job automation, machine translation, pattern recognition, Ray Kurzweil, Silicon Valley, Silicon Valley startup, statistical model, The Soul of a New Machine, theory of mind, thinkpad, Turing test, Vernor Vinge, vertical integration, Wall-E, Watson beat the top human players on Jeopardy!

Researchers at Harvard, studying the brain scans of people suffering from tip of the tongue syndrome, have noted increased activity in the anterior cingulate—a part of the brain behind the frontal lobe, devoted to conflict resolution and detecting surprise. Few of these conflicts appeared to interfere with Jennings’s information retrieval. During his unprecedented seventy-four-game streak, he routinely won the buzz on more than half the clues. And his snap judgments that the answers were on call in his head somewhere led him to a remarkable 92 percent precision rate, according to statistics compiled by the quiz show’s fans.


Writing Effective Use Cases by Alistair Cockburn

business process, c2.com, create, read, update, delete, finite state, index card, information retrieval, iterative process, operational security, recommendation engine, Silicon Valley, web application, work culture

System has been set up to require the Shopper to identify themselves: Shopper establishes identity.
2f. System is set up to interact with known other systems (parts inventory, process & planning) that will affect product availability and selection:
2f.1. System interacts with known other systems (parts inventory, process & planning) to get the needed information (Retrieve Part Availability, Retrieve Build Schedule).
2f.2. System uses the results to filter or show availability of product and/or options (parts).
2g. Shopper is presented with, and selects, a link to an industry-related website: Shopper views the other website.
2h. System is set up to interact with a known Customer Information System:
2h.1.


Paper Knowledge: Toward a Media History of Documents by Lisa Gitelman

Alvin Toffler, An Inconvenient Truth, Andrew Keen, Charles Babbage, computer age, corporate governance, Dennis Ritchie, deskilling, Douglas Engelbart, Douglas Engelbart, East Village, en.wikipedia.org, information retrieval, Internet Archive, invention of movable type, Ivan Sutherland, Jaron Lanier, Ken Thompson, knowledge economy, Lewis Mumford, machine translation, Marshall McLuhan, Mikhail Gorbachev, military-industrial complex, national security letter, Neal Stephenson, On the Economy of Machinery and Manufactures, optical character recognition, profit motive, QR code, RAND corporation, RFC: Request For Comment, scientific management, Shoshana Zuboff, Silicon Valley, Steve Jobs, tacit knowledge, technological determinism, The Structural Transformation of the Public Sphere, Turing test, WikiLeaks, Works Progress Administration

Both microform databanks and Sutherland’s Sketchpad gesture selectively toward a prehistory for the pdf page image because both—though differently—mobilized pages and images of pages for a screen-based interface. The databanks retrieved televisual reproductions of existing source pages, modeling not just information retrieval but also encouraging certain citation norms (since users could indicate that, for example, “the information appears on page 10”). Meanwhile, Sketchpad established a page as a fixed computational field, a visible ground on which further computational objects might be rendered. The portable document format is related more tenuously to mainframes and microform, even though today’s reference databases—the majority of which of course include and serve up pdf—clearly descend in some measure from experiments like Intrex and the Times Information Bank.


pages: 411 words: 80,925

What's Mine Is Yours: How Collaborative Consumption Is Changing the Way We Live by Rachel Botsman, Roo Rogers

"World Economic Forum" Davos, Abraham Maslow, Airbnb, Apollo 13, barriers to entry, behavioural economics, Bernie Madoff, bike sharing, Buckminster Fuller, business logic, buy and hold, carbon footprint, Cass Sunstein, collaborative consumption, collaborative economy, commoditize, Community Supported Agriculture, credit crunch, crowdsourcing, dematerialisation, disintermediation, en.wikipedia.org, experimental economics, Ford Model T, Garrett Hardin, George Akerlof, global village, hedonic treadmill, Hugh Fearnley-Whittingstall, information retrieval, intentional community, iterative process, Kevin Kelly, Kickstarter, late fees, Mark Zuckerberg, market design, Menlo Park, Network effects, new economy, new new economy, out of africa, Paradox of Choice, Parkinson's law, peer-to-peer, peer-to-peer lending, peer-to-peer rental, planned obsolescence, Ponzi scheme, pre–internet, public intellectual, recommendation engine, RFID, Richard Stallman, ride hailing / ride sharing, Robert Shiller, Ronald Coase, Search for Extraterrestrial Intelligence, SETI@home, Simon Kuznets, Skype, slashdot, smart grid, South of Market, San Francisco, Stewart Brand, systems thinking, TED Talk, the long tail, The Nature of the Firm, The Spirit Level, the strength of weak ties, The Theory of the Leisure Class by Thorstein Veblen, The Wealth of Nations by Adam Smith, The Wisdom of Crowds, Thorstein Veblen, Torches of Freedom, Tragedy of the Commons, transaction costs, traveling salesman, ultimatum game, Victor Gruen, web of trust, women in the workforce, work culture , Yochai Benkler, Zipcar

This section was heavily influenced by Richard Grant, “Drowning in Plastic: The Great Pacific Garbage Patch Is Twice the Size of France,” Telegraph (April 24, 2009), www.telegraph.co.uk/earth/environment/5208645/Drowning-in-plastic-The-Great-Pacific-Garbage-Patch-is-twice-the-size-of-France.html. 5. Statistics on annual consumption of plastic materials come from “Plastics Recycling Information.” Retrieved August 2009, www.wasteonline.org.uk/resources/InformationSheets/Plastics.htm. 6. Thomas M. Kostigen, “The World’s Largest Dump: The Great Pacific Garbage Patch,” Discover Magazine (July 10, 2008), http://discovermagazine.com/2008/jul/10-the-worlds-largest-dump. 7. Paul Hawken, Amory Lovins, and L.


pages: 290 words: 83,248

The Greed Merchants: How the Investment Banks Exploited the System by Philip Augar

Alan Greenspan, Andy Kessler, AOL-Time Warner, barriers to entry, Bear Stearns, Berlin Wall, Big bang: deregulation of the City of London, Bonfire of the Vanities, business cycle, buttonwood tree, buy and hold, capital asset pricing model, Carl Icahn, commoditize, corporate governance, corporate raider, crony capitalism, cross-subsidies, deal flow, equity risk premium, financial deregulation, financial engineering, financial innovation, fixed income, Glass-Steagall Act, Gordon Gekko, high net worth, information retrieval, interest rate derivative, invisible hand, John Meriwether, junk bonds, Long Term Capital Management, low interest rates, Martin Wolf, Michael Milken, new economy, Nick Leeson, offshore financial centre, pensions crisis, proprietary trading, regulatory arbitrage, risk free rate, Sand Hill Road, shareholder value, short selling, Silicon Valley, South Sea Bubble, statistical model, systematic bias, Telecommunications Act of 1996, The Chicago School, The Predators' Ball, The Wealth of Nations by Adam Smith, transaction costs, tulip mania, value at risk, yield curve

They range in size from top firms like UBS, Fidelity, State Street, and Barclays Global Investors, which manage over a trillion dollars apiece, to small hedge funds looking after a few million dollars. They rely heavily on brokers whose job is to provide them with advice, information and share dealing: ‘Our best brokers have a great appetite for information retrieval and dissemination. We get our first Bloomberg messages at 5.20 a.m., it’s an information game. We pay brokers $60 million of commission out of a $3 billion fund and most goes to those that phone us most often. They are fast ten-second conversations, often Bloomberg driven. I get a thousand e-mails a day and I read them all.’7 The broking divisions of the top investment banks flood their clients with information: ‘We give them a view on every single price movement; it’s all about short term momentum.


pages: 791 words: 85,159

Social Life of Information by John Seely Brown, Paul Duguid

Alvin Toffler, business process, Charles Babbage, Claude Shannon: information theory, computer age, Computing Machinery and Intelligence, cross-subsidies, disintermediation, double entry bookkeeping, Frank Gehry, frictionless, frictionless market, future of work, George Gilder, George Santayana, global village, Goodhart's law, Howard Rheingold, informal economy, information retrieval, invisible hand, Isaac Newton, John Markoff, John Perry Barlow, junk bonds, Just-in-time delivery, Kenneth Arrow, Kevin Kelly, knowledge economy, knowledge worker, lateral thinking, loose coupling, Marshall McLuhan, medical malpractice, Michael Milken, moral hazard, Network effects, new economy, Productivity paradox, Robert Metcalfe, rolodex, Ronald Coase, scientific management, shareholder value, Shoshana Zuboff, Silicon Valley, Steve Jobs, Superbowl ad, tacit knowledge, Ted Nelson, telepresence, the medium is the message, The Nature of the Firm, the strength of weak ties, The Wealth of Nations by Adam Smith, Thomas Malthus, transaction costs, Turing test, Vannevar Bush, Y2K

The definitions of knowledge management that began this chapter perform a familiar two-step. First, they define the core problem in terms of information, so that, second, they can put solutions in the province of information technology.13 Here, retrieval looks as easy as search. If information retrieval were all that is required for such things as knowledge management or best practice, HP would have nothing to worry about. It has an abundance of very good information technology. The persistence of HP's problem, then, argues that knowledge management, knowledge, and learning involve more than information.


pages: 344 words: 94,332

The 100-Year Life: Living and Working in an Age of Longevity by Lynda Gratton, Andrew Scott

"World Economic Forum" Davos, 3D printing, Airbnb, asset light, assortative mating, behavioural economics, carbon footprint, carbon tax, classic study, Clayton Christensen, collapse of Lehman Brothers, creative destruction, crowdsourcing, deep learning, delayed gratification, disruptive innovation, diversification, Downton Abbey, driverless car, Erik Brynjolfsson, falling living standards, financial engineering, financial independence, first square of the chessboard, first square of the chessboard / second half of the chessboard, future of work, gender pay gap, gig economy, Google Glasses, indoor plumbing, information retrieval, intangible asset, Isaac Newton, job satisfaction, longitudinal study, low skilled workers, Lyft, Nelson Mandela, Network effects, New Economic Geography, old age dependency ratio, pattern recognition, pension reform, Peter Thiel, Ray Kurzweil, Richard Florida, Richard Thaler, risk free rate, Second Machine Age, sharing economy, Sheryl Sandberg, side project, Silicon Valley, smart cities, Stanford marshmallow experiment, Stephen Hawking, Steve Jobs, tacit knowledge, The Future of Employment, uber lyft, warehouse robotics, women in the workforce, young professional

There are those who argue that even these skills can be performed by AI – pointing, for example, to the development of IBM’s supercomputer Watson, which is able to perform detailed oncological diagnosis. This means that with diagnostic augmentation, the skill set for the medical profession will shift from information retrieval to deeper intuitive experience, more person-to-person skills and greater emphasis on team motivation and judgement. The same technological developments will occur in the education sector, where digital teaching will replace textbooks and classroom teaching and the valuable skills will move towards the intricate human skills of empathy, motivation and encouragement.


pages: 382 words: 92,138

The Entrepreneurial State: Debunking Public vs. Private Sector Myths by Mariana Mazzucato

Apple II, banking crisis, barriers to entry, Bretton Woods, business cycle, California gold rush, call centre, carbon footprint, carbon tax, Carmen Reinhart, circular economy, clean tech, computer age, creative destruction, credit crunch, David Ricardo: comparative advantage, demand response, deskilling, dual-use technology, endogenous growth, energy security, energy transition, eurozone crisis, everywhere but in the productivity statistics, Fairchild Semiconductor, Financial Instability Hypothesis, full employment, G4S, general purpose technology, green transition, Growth in a Time of Debt, Hyman Minsky, incomplete markets, information retrieval, intangible asset, invisible hand, Joseph Schumpeter, Kenneth Rogoff, Kickstarter, knowledge economy, knowledge worker, linear model of innovation, natural language processing, new economy, offshore financial centre, Philip Mirowski, popular electronics, Post-Keynesian economics, profit maximization, Ralph Nader, renewable energy credits, rent-seeking, ride hailing / ride sharing, risk tolerance, Robert Solow, shareholder value, Silicon Valley, Silicon Valley ideology, smart grid, Solyndra, Steve Jobs, Steve Wozniak, The Wealth of Nations by Adam Smith, Tim Cook: Apple, Tony Fadell, too big to fail, total factor productivity, trickle-down economics, vertical integration, Washington Consensus, William Shockley: the traitorous eight

Available online at http://www.guardian.co.uk/technology/2002/apr/04/internetnews.maths/print (accessed 10 October 2012). DIUS (Department of Innovation, Universities and Skills). 2008. Innovation Nation, March. Cm 7345. London: DIUS. DoD (United States Department of Defense). 2011. Selected Acquisition Report (SAR): RCS: DD-A&T(Q&A)823-166 : NAVSTAR GPS: Defense Acquisition Management Information Retrieval (DAMIR). Los Angeles, 31 December. DoE (United States Department of Energy). 2007. ‘DOE-Supported Researcher Is Co-winner of 2007 Nobel Prize in Physics’. 10 September. Available online at http://science.energy.gov/news/in-the-news/2007/10-09-07/?p=1 (accessed 21 January 2013). _____. 2009.


pages: 339 words: 94,769

Possible Minds: Twenty-Five Ways of Looking at AI by John Brockman

AI winter, airport security, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, Alignment Problem, AlphaGo, artificial general intelligence, Asilomar, autonomous vehicles, basic income, Benoit Mandelbrot, Bill Joy: nanobots, Bletchley Park, Buckminster Fuller, cellular automata, Claude Shannon: information theory, Computing Machinery and Intelligence, CRISPR, Daniel Kahneman / Amos Tversky, Danny Hillis, data science, David Graeber, deep learning, DeepMind, Demis Hassabis, easy for humans, difficult for computers, Elon Musk, Eratosthenes, Ernest Rutherford, fake news, finite state, friendly AI, future of work, Geoffrey Hinton, Geoffrey West, Santa Fe Institute, gig economy, Hans Moravec, heat death of the universe, hype cycle, income inequality, industrial robot, information retrieval, invention of writing, it is difficult to get a man to understand something, when his salary depends on his not understanding it, James Watt: steam engine, Jeff Hawkins, Johannes Kepler, John Maynard Keynes: Economic Possibilities for our Grandchildren, John Maynard Keynes: technological unemployment, John von Neumann, Kevin Kelly, Kickstarter, Laplace demon, Large Hadron Collider, Loebner Prize, machine translation, market fundamentalism, Marshall McLuhan, Menlo Park, military-industrial complex, mirror neurons, Nick Bostrom, Norbert Wiener, OpenAI, optical character recognition, paperclip maximiser, pattern recognition, personalized medicine, Picturephone, profit maximization, profit motive, public intellectual, quantum cryptography, RAND corporation, random walk, Ray Kurzweil, Recombinant DNA, Richard Feynman, Rodney Brooks, self-driving car, sexual politics, Silicon Valley, Skype, social graph, speech recognition, statistical model, Stephen Hawking, Steven Pinker, Stewart Brand, strong AI, superintelligent machines, supervolcano, synthetic biology, systems thinking, technological determinism, technological singularity, technoutopianism, TED Talk, telemarketer, telerobotics, The future is already here, the long tail, the scientific method, theory of mind, trolley problem, Turing machine, Turing test, universal basic income, Upton Sinclair, Von Neumann architecture, Whole Earth Catalog, Y2K, you are the product, zero-sum game

To be fair, the human body needs 100 watts to operate and twenty years to build, hence about 6 trillion joules of energy to “manufacture” a mature human brain. The cost of manufacturing Watson-scale computing is similar. So why aren’t humans displacing computers? For one, the Jeopardy! contestants’ brains were doing far more than information retrieval—much of which would be considered mere distractions by Watson (e.g., cerebellar control of smiling). Other parts allow leaping out of the box with transcendence unfathomable by Watson, such as what we see in Einstein’s five annus mirabilis papers of 1905. Also, humans consume more energy than the minimum (100 watts) required for life and reproduction.


Noam Chomsky: A Life of Dissent by Robert F. Barsky

Albert Einstein, anti-communist, centre right, feminist movement, Herbert Marcuse, Howard Zinn, information retrieval, language acquisition, machine translation, means of production, military-industrial complex, Murray Bookchin, Norman Mailer, profit motive, public intellectual, Ralph Nader, Ronald Reagan, strong AI, The Bell Curve by Richard Herrnstein and Charles Murray, theory of mind, Yom Kippur War

He tried to use the features of linguistic analysis for discourse analysis" (qtd. in R. A. Harris 83). From this project discourse analysis was born. Chomsky was in search of transformations "to model the linguistic knowledge in a native speaker's head," while Harris was interested in "such practical purposes as machine translation and automated information retrieval" (R. A. Harris 84). Their linguistic interests were irrevocably diverging. Chomsky's last communications with Harris were in the early 1960s, "when [Harris] asked me to [approach] contacts at the [National Science Foundation] for a research contract for him, which I did. We then spent a couple of days together in Israel, in 1964.


The Internet Trap: How the Digital Economy Builds Monopolies and Undermines Democracy by Matthew Hindman

A Declaration of the Independence of Cyberspace, accounting loophole / creative accounting, activist fund / activist shareholder / activist investor, AltaVista, Amazon Web Services, barriers to entry, Benjamin Mako Hill, bounce rate, business logic, Cambridge Analytica, cloud computing, computer vision, creative destruction, crowdsourcing, David Ricardo: comparative advantage, death of newspapers, deep learning, DeepMind, digital divide, discovery of DNA, disinformation, Donald Trump, fake news, fault tolerance, Filter Bubble, Firefox, future of journalism, Ida Tarbell, incognito mode, informal economy, information retrieval, invention of the telescope, Jeff Bezos, John Perry Barlow, John von Neumann, Joseph Schumpeter, lake wobegon effect, large denomination, longitudinal study, loose coupling, machine translation, Marc Andreessen, Mark Zuckerberg, Metcalfe’s law, natural language processing, Netflix Prize, Network effects, New Economic Geography, New Journalism, pattern recognition, peer-to-peer, Pepsi Challenge, performance metric, power law, price discrimination, recommendation engine, Robert Metcalfe, search costs, selection bias, Silicon Valley, Skype, sparse data, speech recognition, Stewart Brand, surveillance capitalism, technoutopianism, Ted Nelson, The Chicago School, the long tail, The Soul of a New Machine, Thomas Malthus, web application, Whole Earth Catalog, Yochai Benkler

., and Dugan, M. (2012). A live comparison of methods for personalized article recommendation at Forbes.com. Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Bristol, England (pp. 51–66). Knight Foundation. (2016, May). Mobile first news: how people use smartphones to access information. Retrieved from https://www.knightfoundation.org/media/uploads/publication_pdfs/KF_Mobile-Report_Final_050916.pdf. Kohavi, R., Deng, A., Frasca, B., Walker, T., Xu, Y., and Pohlmann, N. (2013). Online controlled experiments at large scale. In Proceedings of the 19th ACM SIGKDD international conference on Knowledge Discovery and Data Mining, Chicago, IL (pp. 1168–76).


UNIX® Network Programming, Volume 1: The Sockets Networking API, 3rd Edition by W. Richard Stevens, Bill Fenner, Andrew M. Rudoff

Dennis Ritchie, exponential backoff, failed state, fudge factor, global macro, history of Unix, information retrieval, OpenAI, OSI model, p-value, RFC: Request For Comment, Richard Stallman, UUNET, web application

Unfortunately, it is implementation-dependent how an administrator configures a host to use the different types of name services. Solaris 2.x, HP-UX 10 and later, and FreeBSD 5.x and later use the file /etc/nsswitch.conf, and AIX uses the file /etc/netsvc.conf. BIND 9.2.2 supplies its own version named the Information Retrieval Service (IRS), which uses the file /etc/irs.conf. If a name server is to be used for hostname lookups, then all these systems use the file /etc/resolv.conf to specify the IP addresses of the name servers. Fortunately, these differences are normally hidden to the application programmer, so we just call the resolver functions such as gethostbyname and gethostbyaddr.
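
For comparison only, and as an assumption rather than anything from the book, the short Python sketch below uses the platform resolver in the same way the C functions do, so the /etc/nsswitch.conf and /etc/resolv.conf configuration described above determines how the lookups are answered without any change to the calling code.

    # Hostname and address lookups through the system resolver (illustrative).
    import socket

    print(socket.gethostbyname("localhost"))       # hostname -> IPv4 address
    print(socket.gethostbyaddr("127.0.0.1")[0])    # address -> canonical hostname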

Index entries: Information Retrieval Service, see IRS; IRS (Information Retrieval Service), 306.


pages: 386 words: 91,913

The Elements of Power: Gadgets, Guns, and the Struggle for a Sustainable Future in the Rare Metal Age by David S. Abraham

"World Economic Forum" Davos, 3D printing, Airbus A320, Boeing 747, carbon footprint, circular economy, Citizen Lab, clean tech, clean water, commoditize, Deng Xiaoping, Elon Musk, en.wikipedia.org, Fairphone, geopolitical risk, gigafactory, glass ceiling, global supply chain, information retrieval, Intergovernmental Panel on Climate Change (IPCC), Internet of things, Large Hadron Collider, new economy, oil shale / tar sands, oil shock, planned obsolescence, reshoring, Robert Metcalfe, Ronald Reagan, Silicon Valley, Solyndra, South China Sea, Steve Ballmer, Steve Jobs, systems thinking, telemarketer, Tesla Model S, thinkpad, upwardly mobile, uranium enrichment, WikiLeaks, Y2K

O’Rourke, “Navy Virginia (SSN-774) Class Attack Submarine Procurement: Background and Issues for Congress,” Congressional Research Service, July 31, 2014, http://www.fas.org/sgp/crs/weapons/RL32418.pdf. For information on Virginia class submarine purchases, see, “DDG 51 Arleigh Burke Class Guided Missile Destroyer,” Defense Acquisition Management Information Retrieval, December 31, 2012, accessed December 18, 2014, http://www.dod.mil/pubs/foi/logistics_material_readiness/acq_bud_fin/SARs/2012-sars/13-F-0884_SARs_as_of_Dec_2012/Navy/DDG_51_December_2012_SAR.pdf. For information on the DDG 51 Aegis Destroyer Ships as of 2012, including expected production until 2016, see “Next Global Positioning System Receiver Equipment,” Committee Reports 113th Congress (2013–2014), House Report 113-102, June 7, 2013, accessed December 18, 2014, thomas.loc.gov/cgi-bin/cpquery/?


Lessons-Learned-in-Software-Testing-A-Context-Driven-Approach by Anson-QA

anti-pattern, Chuck Templeton: OpenTable:, finite state, framing effect, full employment, independent contractor, information retrieval, job automation, knowledge worker, lateral thinking, Ralph Nader, Richard Feynman, side project, Silicon Valley, statistical model, systems thinking, tacit knowledge, web application

For example (these examples are all based on successes from our personal experience), imagine bringing in a smart person whose most recent work role was as an attorney who can analyze any specification you can give them and is trained as an advocate, a director of sales and marketing (the one we hired trained our staff in new methods of researching and writing bug reports to draw the attention of the marketing department), a hardware repair technician, a librarian (think about testing databases or other information retrieval systems), a programmer, a project manager (of nonsoftware projects), a technical support representative with experience supporting products like the ones you're testing, a translator (especially useful if your company publishes software in many languages), a secretary (think about all the information that you collect, store, and disseminate and all the time management you and your staff have to do), a system administrator who knows networks, or a user of the software you're testing.


pages: 352 words: 96,532

Where Wizards Stay Up Late: The Origins of the Internet by Katie Hafner, Matthew Lyon

air freight, Bill Duvall, Charles Babbage, Compatible Time-Sharing System, computer age, conceptual framework, Donald Davies, Douglas Engelbart, Douglas Engelbart, fault tolerance, Hush-A-Phone, information retrieval, Ivan Sutherland, John Markoff, Kevin Kelly, Leonard Kleinrock, Marc Andreessen, Menlo Park, military-industrial complex, Multics, natural language processing, OSI model, packet switching, RAND corporation, RFC: Request For Comment, Robert Metcalfe, Ronald Reagan, seminal paper, Silicon Valley, Skinner box, speech recognition, Steve Crocker, Steven Levy, The Soul of a New Machine

It was “published” electronically in the MsgGroup in 1977. They went on: “As computer communication systems become more powerful, more humane, more forgiving and above all, cheaper, they will become ubiquitous.” Automated hotel reservations, credit checking, real-time financial transactions, access to insurance and medical records, general information retrieval, and real-time inventory control in businesses would all come. In the late 1970s, the Information Processing Techniques Office’s final report to ARPA management on the completion of the ARPANET research program concluded similarly: “The largest single surprise of the ARPANET program has been the incredible popularity and success of network mail.


pages: 443 words: 98,113

The Corruption of Capitalism: Why Rentiers Thrive and Work Does Not Pay by Guy Standing

"World Economic Forum" Davos, 3D printing, Airbnb, Alan Greenspan, Albert Einstein, Amazon Mechanical Turk, anti-fragile, Asian financial crisis, asset-backed security, bank run, banking crisis, basic income, Ben Bernanke: helicopter money, Bernie Sanders, Big bang: deregulation of the City of London, Big Tech, bilateral investment treaty, Bonfire of the Vanities, Boris Johnson, Bretton Woods, business cycle, Capital in the Twenty-First Century by Thomas Piketty, carried interest, cashless society, central bank independence, centre right, Clayton Christensen, collapse of Lehman Brothers, collective bargaining, commons-based peer production, credit crunch, crony capitalism, cross-border payments, crowdsourcing, debt deflation, declining real wages, deindustrialization, disruptive innovation, Doha Development Round, Donald Trump, Double Irish / Dutch Sandwich, ending welfare as we know it, eurozone crisis, Evgeny Morozov, falling living standards, financial deregulation, financial innovation, Firefox, first-past-the-post, future of work, Garrett Hardin, gentrification, gig economy, Goldman Sachs: Vampire Squid, Greenspan put, Growth in a Time of Debt, housing crisis, income inequality, independent contractor, information retrieval, intangible asset, invention of the steam engine, investor state dispute settlement, it's over 9,000, James Watt: steam engine, Jeremy Corbyn, job automation, John Maynard Keynes: technological unemployment, labour market flexibility, light touch regulation, Long Term Capital Management, low interest rates, lump of labour, Lyft, manufacturing employment, Mark Zuckerberg, market clearing, Martin Wolf, means of production, megaproject, mini-job, Money creation, Mont Pelerin Society, moral hazard, mortgage debt, mortgage tax deduction, Neil Kinnock, non-tariff barriers, North Sea oil, Northern Rock, nudge unit, Occupy movement, offshore financial centre, oil shale / tar sands, open economy, openstreetmap, patent troll, payday loans, peer-to-peer lending, Phillips curve, plutocrats, Ponzi scheme, precariat, quantitative easing, remote working, rent control, rent-seeking, ride hailing / ride sharing, Right to Buy, Robert Gordon, Ronald Coase, Ronald Reagan, Sam Altman, savings glut, Second Machine Age, secular stagnation, sharing economy, Silicon Valley, Silicon Valley startup, Simon Kuznets, SoftBank, sovereign wealth fund, Stephen Hawking, Steve Ballmer, structural adjustment programs, TaskRabbit, The Chicago School, The Future of Employment, the payments system, The Rise and Fall of American Growth, Thomas Malthus, Thorstein Veblen, too big to fail, Tragedy of the Commons, Travis Kalanick, Uber and Lyft, Uber for X, uber lyft, Y Combinator, zero-sum game, Zipcar

McClintick, ‘How Harvard lost Russia’, Institutional Investor, 27 February 2006. 2 ‘The new age of crony capitalism’, The Economist, 15 March 2014, pp. 9, 53–4; ‘The party winds down’, The Economist, 7 May 2016, pp. 46–8. 3 M. Lupu, K. Mayer, J. Tait and A. J. Trippe (eds), Current Challenges in Patent Information Retrieval (Heidelberg: Springer-Verlag, 2011), p. v. 4 Letter to Isaac McPherson, 13 August 1813. A public good is one that can be consumed or used by one person without affecting its consumption or use by others; it is available to all. 5 ‘A question of utility’, The Economist, 8 August 2015. 6 M.


How to Form Your Own California Corporation by Anthony Mancuso

book value, business cycle, corporate governance, corporate raider, distributed generation, estate planning, independent contractor, information retrieval, intangible asset, passive income, passive investing, Silicon Valley

California Secretary of State contact Information www.ss.ca.gov/business/corp/corporate.htm Office hours for all locations are Monday through Friday 8:00 a.m. to 5:00 p.m. Sacramento Office 1500 11th Street Sacramento, CA 95814 (916) 657-5448* • Name Availability Unit (*recorded information on how to obtain) • Document Filing Support Unit • Legal Review Unit • Information Retrieval and Certification Unit • Status (*recorded information on how to obtain) • Statement of Information Unit (filings only) P.O. Box 944230 Sacramento, CA 94244-2300 • Substituted Service of Process (must be hand delivered to the Sacramento office) San Francisco Regional Office 455 Golden Gate Avenue, Suite 14500 San Francisco, CA 94102-7007 415-557-8000 Fresno Regional Office 1315 Van Ness Ave., Suite 203 Fresno, CA 93721-1729 559-445-6900 Los Angeles Regional Office 300 South Spring Street, Room 12513 Los Angeles, CA 90013-1233 213-897-3062 San Diego Regional Office 1350 Front Street, Suite 2060 San Diego, CA 92101-3609 619-525-4113 California Department of Corporations contact information www.corp.ca.gov Contact Information The Department of Corporations, the office that receives your Notice of Stock Issuance, as explained in Chapter 5, Step 7, has four offices.


pages: 364 words: 102,926

What the F: What Swearing Reveals About Our Language, Our Brains, and Ourselves by Benjamin K. Bergen

correlation does not imply causation, information retrieval, intentional community, machine readable, Parler "social media", pre–internet, Ronald Reagan, seminal paper, statistical model, Steven Pinker, traumatic brain injury

NBC Sports. Retrieved from http://profootballtalk.nbcsports.com/2014/03/03/richard-sherman-calls-nfl-banning-the-n-word-an-atrocious-idea. Snopes. (October 11, 2014). Pluck Yew. Retrieved from http://www.snopes.com/language/apocryph/pluckyew.asp. Social Security Administration. (n.d.). Background information. Retrieved from https://www.ssa.gov/oact/babynames/background.html. Songbass. (November 3, 2008). Obama gives McCain the middle finger. YouTube. Retrieved from https://www.youtube.com/watch?v=Pc8Wc1CN7sY. Spears, A. K. (1998). African-American language use: Ideology and so-called obscenity. In African-American English: Structure, history, and use, Salikoko S.


Your Own Allotment : How to Find It, Cultivate It, and Enjoy Growing Your Own Food by Russell-Jones, Neil.

Berlin Wall, British Empire, carbon footprint, Corn Laws, David Attenborough, discovery of the americas, Easter island, information retrieval, Kickstarter, mass immigration, spice trade

YOUR OWN ALLOTMENT If you want to know how… How to Grow Your Own Food A week-by-week guide to wild life friendly fruit and vegetable gardening Planning and Creating Your First Garden A step-by-step guide to designing your garden – whatever your experience or knowledge How to Start Your Own Gardening Business An insider guide to setting yourself up as a professional gardener Please send for a free copy of the latest catalogue: How To Books Ltd Spring Hill House, Spring Hill Road, Begbroke Oxford OX5 1RX, United Kingdom info@howtobooks.co.uk www.howtobooks.co.uk YOUR OWN ALLOTMENT How to find it, cultivate it, and enjoy growing your own food Neil Russell-Jones SPRING HILL Published by How To Content, A division of How To Books Ltd, Spring Hill House, Spring Hill Road, Begbroke, Oxford OX5 1RX, United Kingdom. Tel: (01865) 375794. Fax: (01865) 379162. info@howtobooks.co.uk www.howtobooks.co.uk All rights reserved. No part of this work may be reproduced or stored in an information retrieval system (other than for purposes of review), without the express permission of the publisher in writing. The right of Neil Russell-Jones to be identified as author of this work has been asserted by him in accordance with the Copyright, Design and Patents Act 1988. © 2008 Neil Russell-Jones First edition 2008 First published in electronic form 2008 British Library Cataloguing in Publication Data A catalogue record for this book is available from the British Library ISBN 978 1 84803 247 7 Cover design by Mousemat Design Illustrations by Deborah Andrews Produced for How To Books by Deer Park Productions, Tavistock, Devon Typeset by Pantek Arts Ltd, Maidstone, Kent NOTE: The material contained in this book is set out in good faith for general guidance and no liability can be accepted for loss or expense incurred as a result of relying in particular circumstances on statements made in this book. The laws and regulations are complex and liable to change, and readers should check the current positions with the relevant authorities before making personal arrangements.


The Myth of Artificial Intelligence: Why Computers Can't Think the Way We Do by Erik J. Larson

AI winter, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, Albert Einstein, Alignment Problem, AlphaGo, Amazon Mechanical Turk, artificial general intelligence, autonomous vehicles, Big Tech, Black Swan, Bletchley Park, Boeing 737 MAX, business intelligence, Charles Babbage, Claude Shannon: information theory, Computing Machinery and Intelligence, conceptual framework, correlation does not imply causation, data science, deep learning, DeepMind, driverless car, Elon Musk, Ernest Rutherford, Filter Bubble, Geoffrey Hinton, Georg Cantor, Higgs boson, hive mind, ImageNet competition, information retrieval, invention of the printing press, invention of the wheel, Isaac Newton, Jaron Lanier, Jeff Hawkins, John von Neumann, Kevin Kelly, Large Hadron Collider, Law of Accelerating Returns, Lewis Mumford, Loebner Prize, machine readable, machine translation, Nate Silver, natural language processing, Nick Bostrom, Norbert Wiener, PageRank, PalmPilot, paperclip maximiser, pattern recognition, Peter Thiel, public intellectual, Ray Kurzweil, retrograde motion, self-driving car, semantic web, Silicon Valley, social intelligence, speech recognition, statistical model, Stephen Hawking, superintelligent machines, tacit knowledge, technological singularity, TED Talk, The Coming Technological Singularity, the long tail, the scientific method, The Signal and the Noise by Nate Silver, The Wisdom of Crowds, theory of mind, Turing machine, Turing test, Vernor Vinge, Watson beat the top human players on Jeopardy!, Yochai Benkler

Alice easily succeeds at improving Hugh-Machine's chess competence (despite his being a champion player already), by simply downloading some StockFish chess code off her smartphone. Similarly, she gives the Hugh-Machine perfect arithmetical abilities with a calculator, and supercomputer memory, as well as access to all the information retrievable by Google. System X is optimal, and the Hugh-Machine can do something that Hugh Alexander, for all his seeming intelligence, could not: it can play superhuman chess, and superhumanly add numbers, and excel at many other System X things. The problem is, so too could Bob-Machine. In fact, Alice realizes that Bob-Machine and Hugh-Machine are provably equivalent.


Artificial Whiteness by Yarden Katz

affirmative action, AI winter, algorithmic bias, AlphaGo, Amazon Mechanical Turk, autonomous vehicles, benefit corporation, Black Lives Matter, blue-collar work, Californian Ideology, Cambridge Analytica, cellular automata, Charles Babbage, cloud computing, colonial rule, computer vision, conceptual framework, Danny Hillis, data science, David Graeber, deep learning, DeepMind, desegregation, Donald Trump, Dr. Strangelove, driverless car, Edward Snowden, Elon Musk, Erik Brynjolfsson, European colonialism, fake news, Ferguson, Missouri, general purpose technology, gentrification, Hans Moravec, housing crisis, income inequality, information retrieval, invisible hand, Jeff Bezos, Kevin Kelly, knowledge worker, machine readable, Mark Zuckerberg, mass incarceration, Menlo Park, military-industrial complex, Nate Silver, natural language processing, Nick Bostrom, Norbert Wiener, pattern recognition, phenotype, Philip Mirowski, RAND corporation, recommendation engine, rent control, Rodney Brooks, Ronald Reagan, Salesforce, Seymour Hersh, Shoshana Zuboff, Silicon Valley, Silicon Valley billionaire, Silicon Valley ideology, Skype, speech recognition, statistical model, Stephen Hawking, Stewart Brand, Strategic Defense Initiative, surveillance capitalism, talking drums, telemarketer, The Signal and the Noise by Nate Silver, W. E. B. Du Bois, Whole Earth Catalog, WikiLeaks

In the 1990s probabilistic modeling and inference were becoming AI’s dominant new computational engine and starting to displace logic-based approaches to reasoning within the field. These probabilistic frameworks, which crystalized in the 1980s, did not always develop under the umbrella of “AI” but also under headings such as “statistical pattern recognition,” “data mining,” or “information retrieval.”97 Regardless, these frameworks were being absorbed into AI’s familiar narratives. An article in AI Magazine titled “Is AI Going Mainstream at Last? A Look Inside Microsoft Research” (1993) exemplifies this turn. The piece omits AI’s history of shifting in and out of the mainstream, claiming that “AI” merely had its “15 minutes of fame in the mid-80s,” but that new developments in probabilistic modeling could put it back on the map.


JUST ONE DAMNED THING AFTER ANOTHER by Jodi Taylor

clean water, friendly fire, information retrieval

I fell in love with the library, which, together with the Hall, obviously constituted the heart of the building. High ceilings made it spacious and a huge fireplace made it cosy. Comfortable chairs were scattered around and tall windows all along one wall let the sunshine flood in. As well as bays of books they had all the latest electronic information retrieval systems, study areas and data tables and through an archway, a huge archive. ‘You name it, we’ve got it somewhere,’ said Doctor Dowson, introduced to me as Librarian and Archivist and who appeared to be wearing a kind of sou’ wester. ‘At least until that old fool upstairs blows us all sky high.


pages: 416 words: 106,582

This Will Make You Smarter: 150 New Scientific Concepts to Improve Your Thinking by John Brockman

23andMe, adjacent possible, Albert Einstein, Alfred Russel Wallace, Anthropocene, banking crisis, Barry Marshall: ulcers, behavioural economics, Benoit Mandelbrot, Berlin Wall, biofilm, Black Swan, Bletchley Park, butterfly effect, Cass Sunstein, cloud computing, cognitive load, congestion charging, correlation does not imply causation, Daniel Kahneman / Amos Tversky, dark matter, data acquisition, David Brooks, delayed gratification, Emanuel Derman, epigenetics, Evgeny Morozov, Exxon Valdez, Flash crash, Flynn Effect, Garrett Hardin, Higgs boson, hive mind, impulse control, information retrieval, information security, Intergovernmental Panel on Climate Change (IPCC), Isaac Newton, Jaron Lanier, Johannes Kepler, John von Neumann, Kevin Kelly, Large Hadron Collider, lifelogging, machine translation, mandelbrot fractal, market design, Mars Rover, Marshall McLuhan, microbiome, Murray Gell-Mann, Nicholas Carr, Nick Bostrom, ocean acidification, open economy, Pierre-Simon Laplace, place-making, placebo effect, power law, pre–internet, QWERTY keyboard, random walk, randomized controlled trial, rent control, Richard Feynman, Richard Feynman: Challenger O-ring, Richard Thaler, Satyajit Das, Schrödinger's Cat, scientific management, security theater, selection bias, Silicon Valley, Stanford marshmallow experiment, stem cell, Steve Jobs, Steven Pinker, Stewart Brand, Stuart Kauffman, sugar pill, synthetic biology, the scientific method, Thorstein Veblen, Turing complete, Turing machine, twin studies, Vilfredo Pareto, Walter Mischel, Whole Earth Catalog, WikiLeaks, zero-sum game

Sometimes this information is found by directed search using a Web search engine, sometimes by serendipity by following links, and sometimes by asking hundreds of people in your social network or hundreds of thousands of people on a question-answering Web site such as Answers.com, Quora, or Yahoo Answers. I do not actually know of a real findability index, but tools in the field of information retrieval could be applied to develop one. One of the unsolved problems in the field is how to help the searcher to determine if the information simply is not available. An Assertion Is Often an Empirical Question, Settled by Collecting Evidence Susan Fiske Eugene Higgins Professor of Psychology, Princeton University; author, Envy Up, Scorn Down: How Status Divides Us The most important scientific concept is that an assertion is often an empirical question, settled by collecting evidence.


pages: 519 words: 102,669

Programming Collective Intelligence by Toby Segaran

algorithmic management, always be closing, backpropagation, correlation coefficient, Debian, en.wikipedia.org, Firefox, full text search, functional programming, information retrieval, PageRank, prediction markets, recommendation engine, slashdot, social bookmarking, sparse data, Thomas Bayes, web application

Algorithms for full-text searches are among the most important collective intelligence algorithms, and many fortunes have been made by new ideas in this field. It is widely believed that Google's rapid rise from an academic project to the world's most popular search engine was based largely on the PageRank algorithm, a variation that you'll learn about in this chapter. Information retrieval is a huge field with a long history. This chapter will only be able to cover a few key concepts, but we'll go through the construction of a search engine that will index a set of documents and leave you with ideas on how to improve things further. Although the focus will be on algorithms for searching and ranking rather than on the infrastructure requirements for indexing large portions of the Web, the search engine you build should have no problem with collections of up to 100,000 pages.
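
The excerpt describes building a search engine that indexes a set of documents and ranks results. As a rough illustration of the core data structure involved, the short Python sketch below builds a tiny inverted index and scores documents by summed term frequency; it is not the book's own code, and all function names and toy documents here are invented for illustration.

    import re
    from collections import defaultdict

    def tokenize(text):
        # Lowercase the text and split it into alphanumeric terms.
        return re.findall(r"[a-z0-9]+", text.lower())

    def build_index(docs):
        # docs maps doc_id -> text; the index maps term -> {doc_id: term frequency}.
        index = defaultdict(lambda: defaultdict(int))
        for doc_id, text in docs.items():
            for term in tokenize(text):
                index[term][doc_id] += 1
        return index

    def search(index, query):
        # Score each document by the summed frequency of the query terms it
        # contains, and return (doc_id, score) pairs, best match first.
        scores = defaultdict(int)
        for term in tokenize(query):
            for doc_id, freq in index.get(term, {}).items():
                scores[doc_id] += freq
        return sorted(scores.items(), key=lambda item: item[1], reverse=True)

    docs = {
        1: "algorithms for full text searches rank documents",
        2: "the pagerank algorithm ranks pages by incoming links",
    }
    print(search(build_index(docs), "pagerank algorithm"))  # [(2, 2)]

A real engine would add stemming, stop-word removal, and a link-based score such as PageRank on top of this simple term-frequency ranking.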


pages: 433 words: 106,048

The End of Illness by David B. Agus

confounding variable, Coronary heart disease and physical activity of work, Danny Hillis, discovery of penicillin, double helix, epigenetics, germ theory of disease, Google Earth, Gregor Mendel, impulse control, information retrieval, Larry Ellison, longitudinal study, Marc Benioff, meta-analysis, Michael Milken, microbiome, Murray Gell-Mann, pattern recognition, Pepto Bismol, personalized medicine, randomized controlled trial, risk tolerance, Salesforce, Steve Jobs, systems thinking, TED Talk, the scientific method

The poll also found that 68 percent of those who have access have used the Internet to look for information about specific medicines, and nearly four in ten use it to look for other patients’ experiences of a condition. Without a doubt new technologies are helping more people around the world to find out more about their health and to make better-informed decisions, but often their online searches lack usefulness because the information retrieved cannot be personalized. Relying on dodgy information can easily lead people to take risks with inappropriate tests and treatments, wasting money and causing unnecessary worry. But with a health-record system like Dell’s and its developing infrastructure to tailor health advice and guidance to individual people based on their personal records, the outcome could be revolutionary to our health-care system, instigating the reform that’s sorely needed.


pages: 540 words: 103,101

Building Microservices by Sam Newman

airport security, Amazon Web Services, anti-pattern, business logic, business process, call centre, continuous integration, Conway's law, create, read, update, delete, defense in depth, don't repeat yourself, Edward Snowden, fail fast, fallacies of distributed computing, fault tolerance, index card, information retrieval, Infrastructure as a Service, inventory management, job automation, Kubernetes, load shedding, loose coupling, microservices, MITM: man-in-the-middle, platform as a service, premature optimization, pull request, recommendation engine, Salesforce, SimCity, social graph, software as a service, source of truth, sunk-cost fallacy, systems thinking, the built environment, the long tail, two-pizza team, web application, WebSocket

However, we still need to know how to set up and maintain these systems in a resilient fashion. Starting Again The architecture that gets you started may not be the architecture that keeps you going when your system has to handle very different volumes of load. As Jeff Dean said in his presentation “Challenges in Building Large-Scale Information Retrieval Systems” (WSDM 2009 conference), you should “design for ~10× growth, but plan to rewrite before ~100×.” At certain points, you need to do something pretty radical to support the next level of growth. Recall the story of Gilt, which we touched on in Chapter 6. A simple monolithic Rails application did well for Gilt for two years.


pages: 461 words: 106,027

Zero to Sold: How to Start, Run, and Sell a Bootstrapped Business by Arvid Kahl

business logic, business process, centre right, Chuck Templeton: OpenTable:, cognitive load, content marketing, continuous integration, coronavirus, COVID-19, crowdsourcing, domain-specific language, financial independence, functional programming, Google Chrome, hockey-stick growth, if you build it, they will come, information asymmetry, information retrieval, inventory management, Jeff Bezos, job automation, Kanban, Kubernetes, machine readable, minimum viable product, Network effects, performance metric, post-work, premature optimization, risk tolerance, Ruby on Rails, sentiment analysis, side hustle, Silicon Valley, single source of truth, software as a service, solopreneur, source of truth, statistical model, subscription business, sunk-cost fallacy, supply-chain management, the long tail, trickle-down economics, value engineering, web application

Start by explaining your documents and what they contain in an overview document. Provide a master document that gives your buyer quick access to the data they're looking for at a glance. If you're storing all of your documents in cloud storage like Google Drive, you can cross-link between documents easily. Anything you can do to speed up information retrieval will make the due diligence process less stressful. While the due diligence phase usually comes with certain legal guarantees, don't be naive: there will be bad actors in the field, and some people will just promise more than they're willing to do. While most buyers are serious, some may just want to take a look under the hood of your business.


Reset by Ronald J. Deibert

23andMe, active measures, air gap, Airbnb, Amazon Web Services, Anthropocene, augmented reality, availability heuristic, behavioural economics, Bellingcat, Big Tech, bitcoin, blockchain, blood diamond, Brexit referendum, Buckminster Fuller, business intelligence, Cal Newport, call centre, Cambridge Analytica, carbon footprint, cashless society, Citizen Lab, clean water, cloud computing, computer vision, confounding variable, contact tracing, contact tracing app, content marketing, coronavirus, corporate social responsibility, COVID-19, crowdsourcing, data acquisition, data is the new oil, decarbonisation, deep learning, deepfake, Deng Xiaoping, disinformation, Donald Trump, Doomsday Clock, dual-use technology, Edward Snowden, Elon Musk, en.wikipedia.org, end-to-end encryption, Evgeny Morozov, failed state, fake news, Future Shock, game design, gig economy, global pandemic, global supply chain, global village, Google Hangouts, Great Leap Forward, high-speed rail, income inequality, information retrieval, information security, Internet of things, Jaron Lanier, Jeff Bezos, John Markoff, Lewis Mumford, liberal capitalism, license plate recognition, lockdown, longitudinal study, Mark Zuckerberg, Marshall McLuhan, mass immigration, megastructure, meta-analysis, military-industrial complex, move fast and break things, Naomi Klein, natural language processing, New Journalism, NSO Group, off-the-grid, Peter Thiel, planetary scale, planned obsolescence, post-truth, proprietary trading, QAnon, ransomware, Robert Mercer, Sheryl Sandberg, Shoshana Zuboff, Silicon Valley, single source of truth, Skype, Snapchat, social distancing, sorting algorithm, source of truth, sovereign wealth fund, sparse data, speech recognition, Steve Bannon, Steve Jobs, Stuxnet, surveillance capitalism, techlash, technological solutionism, the long tail, the medium is the message, The Structural Transformation of the Public Sphere, TikTok, TSMC, undersea cable, unit 8200, Vannevar Bush, WikiLeaks, zero day, zero-sum game

Retrieved from https://www.salon.com/2019/06/17/lithium-mining-for-green-electric-cars-is-leaving-a-stain-on-the-planet/ In Chile’s Atacama and Argentina’s Salar de Hombre Muerto regions: Zacune. Lithium. More than half of the world’s cobalt supply is sourced from the Democratic Republic of Congo: U.S. Department of the Interior. (n.d.). Cobalt statistics and information. Retrieved June 16, 2020, from https://www.usgs.gov/centers/nmic/cobalt-statistics-and-information; Eichstaedt, P. (2011). Consuming the Congo: War and conflict minerals in the world’s deadliest place. Chicago Review Press. Cobalt mining operations in the DRC routinely use child labour: Amnesty International. (2016).


The Dream Machine: J.C.R. Licklider and the Revolution That Made Computing Personal by M. Mitchell Waldrop

Ada Lovelace, air freight, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, Albert Einstein, anti-communist, Apple II, battle of ideas, Berlin Wall, Bill Atkinson, Bill Duvall, Bill Gates: Altair 8800, Bletchley Park, Boeing 747, Byte Shop, Charles Babbage, Claude Shannon: information theory, Compatible Time-Sharing System, computer age, Computing Machinery and Intelligence, conceptual framework, cuban missile crisis, Dennis Ritchie, do well by doing good, Donald Davies, double helix, Douglas Engelbart, Douglas Engelbart, Dynabook, experimental subject, Fairchild Semiconductor, fault tolerance, Frederick Winslow Taylor, friendly fire, From Mathematics to the Technologies of Life and Death, functional programming, Gary Kildall, Haight Ashbury, Howard Rheingold, information retrieval, invisible hand, Isaac Newton, Ivan Sutherland, James Watt: steam engine, Jeff Rulifson, John von Neumann, Ken Thompson, Leonard Kleinrock, machine translation, Marc Andreessen, Menlo Park, Multics, New Journalism, Norbert Wiener, packet switching, pink-collar, pneumatic tube, popular electronics, RAND corporation, RFC: Request For Comment, Robert Metcalfe, Silicon Valley, Skinner box, Steve Crocker, Steve Jobs, Steve Wozniak, Steven Levy, Stewart Brand, Ted Nelson, The Soul of a New Machine, Turing machine, Turing test, Vannevar Bush, Von Neumann architecture, Wiener process, zero-sum game

I first tried to find close relevance within established disciplines [such as artificial intelligence,] but in each case I found that the people I would talk with would immediately translate my admittedly strange (for the times) statements of purpose and possibility into their own discipline's framework."9 At the 1960 meeting of the American Documentation Institute, a talk he gave was greeted with yawns, and his proposed augmentation environment was dismissed as just another information-retrieval system. No, Engelbart realized, if his augmentation ideas were ever going to fly, he would have to create a new discipline from scratch. And to do that, he would have to give this new discipline a conceptual framework all its own-a manifesto that would lay out his thinking in the most compelling way possible.

To begin with, while he very much liked the idea of having a big influence on PARC's research, he considered Pake's notion of a "graphics research group" a complete nonstarter. Sure, graphics technology was a critical part of this whatever-it-was he wanted to create. But so were text display, mass-storage technology, networking technology, information retrieval, and all the rest. Taylor wanted to go after the whole, integrated vision, just as he'd gone after the whole Intergalactic Network. To focus entirely on graphics would be like trying to build the Arpanet by focusing entirely on the technology of telephone lines. And yet Pake did have a point, damn it.


pages: 365 words: 117,713

The Selfish Gene by Richard Dawkins

double helix, Garrett Hardin, Gregor Mendel, information retrieval, lateral thinking, Necker cube, pattern recognition, phenotype, prisoner's dilemma, zero-sum game

By this device, the timing of muscle contractions could be influenced not only by events in the immediate past, but by events in the distant past as well. The memory, or store, is an essential part of a digital computer too. Computer memories are more reliable than human ones, but they are less capacious, and enormously less sophisticated in their techniques of information-retrieval. One of the most striking properties of survival-machine behaviour is its apparent purposiveness. By this I do not just mean that it seems to be well calculated to help the animal's genes to survive, although of course it is. I am talking about a closer analogy to human purposeful behaviour.


pages: 396 words: 117,149

The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World by Pedro Domingos

Albert Einstein, Amazon Mechanical Turk, Arthur Eddington, backpropagation, basic income, Bayesian statistics, Benoit Mandelbrot, bioinformatics, Black Swan, Brownian motion, cellular automata, Charles Babbage, Claude Shannon: information theory, combinatorial explosion, computer vision, constrained optimization, correlation does not imply causation, creative destruction, crowdsourcing, Danny Hillis, data is not the new oil, data is the new oil, data science, deep learning, DeepMind, double helix, Douglas Hofstadter, driverless car, Erik Brynjolfsson, experimental subject, Filter Bubble, future of work, Geoffrey Hinton, global village, Google Glasses, Gödel, Escher, Bach, Hans Moravec, incognito mode, information retrieval, Jeff Hawkins, job automation, John Markoff, John Snow's cholera map, John von Neumann, Joseph Schumpeter, Kevin Kelly, large language model, lone genius, machine translation, mandelbrot fractal, Mark Zuckerberg, Moneyball by Michael Lewis explains big data, Narrative Science, Nate Silver, natural language processing, Netflix Prize, Network effects, Nick Bostrom, NP-complete, off grid, P = NP, PageRank, pattern recognition, phenotype, planetary scale, power law, pre–internet, random walk, Ray Kurzweil, recommendation engine, Richard Feynman, scientific worldview, Second Machine Age, self-driving car, Silicon Valley, social intelligence, speech recognition, Stanford marshmallow experiment, statistical model, Stephen Hawking, Steven Levy, Steven Pinker, superintelligent machines, the long tail, the scientific method, The Signal and the Noise by Nate Silver, theory of mind, Thomas Bayes, transaction costs, Turing machine, Turing test, Vernor Vinge, Watson beat the top human players on Jeopardy!, white flight, yottabyte, zero-sum game

The use of Naïve Bayes in spam filtering is described in “Stopping spam,” by Joshua Goodman, David Heckerman, and Robert Rounthwaite (Scientific American, 2005). “Relevance weighting of search terms,”* by Stephen Robertson and Karen Sparck Jones (Journal of the American Society for Information Science, 1976), explains the use of Naïve Bayes–like methods in information retrieval. “First links in the Markov chain,” by Brian Hayes (American Scientist, 2013), recounts Markov’s invention of the eponymous chains. “Large language models in machine translation,”* by Thorsten Brants et al. (Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, 2007), explains how Google Translate works.


pages: 437 words: 113,173

Age of Discovery: Navigating the Risks and Rewards of Our New Renaissance by Ian Goldin, Chris Kutarna

"World Economic Forum" Davos, 2013 Report for America's Infrastructure - American Society of Civil Engineers - 19 March 2013, 3D printing, Airbnb, Albert Einstein, AltaVista, Asian financial crisis, asset-backed security, autonomous vehicles, banking crisis, barriers to entry, battle of ideas, Bear Stearns, Berlin Wall, bioinformatics, bitcoin, Boeing 747, Bonfire of the Vanities, bread and circuses, carbon tax, clean water, collective bargaining, Colonization of Mars, Credit Default Swap, CRISPR, crowdsourcing, cryptocurrency, Dava Sobel, demographic dividend, Deng Xiaoping, digital divide, Doha Development Round, double helix, driverless car, Edward Snowden, Elon Musk, en.wikipedia.org, epigenetics, experimental economics, Eyjafjallajökull, failed state, Fall of the Berlin Wall, financial innovation, full employment, Galaxy Zoo, general purpose technology, Glass-Steagall Act, global pandemic, global supply chain, Higgs boson, Hyperloop, immigration reform, income inequality, indoor plumbing, industrial cluster, industrial robot, information retrieval, information security, Intergovernmental Panel on Climate Change (IPCC), intermodal, Internet of things, invention of the printing press, Isaac Newton, Islamic Golden Age, Johannes Kepler, Khan Academy, Kickstarter, Large Hadron Collider, low cost airline, low skilled workers, Lyft, Mahbub ul Haq, Malacca Straits, mass immigration, Max Levchin, megacity, Mikhail Gorbachev, moral hazard, Nelson Mandela, Network effects, New Urbanism, non-tariff barriers, Occupy movement, On the Revolutions of the Heavenly Spheres, open economy, Panamax, Paris climate accords, Pearl River Delta, personalized medicine, Peter Thiel, post-Panamax, profit motive, public intellectual, quantum cryptography, rent-seeking, reshoring, Robert Gordon, Robert Metcalfe, Search for Extraterrestrial Intelligence, Second Machine Age, self-driving car, Shenzhen was a fishing village, Silicon Valley, Silicon Valley startup, Skype, smart grid, Snapchat, special economic zone, spice trade, statistical model, Stephen Hawking, Steve Jobs, Stuxnet, synthetic biology, TED Talk, The Future of Employment, too big to fail, trade liberalization, trade route, transaction costs, transatlantic slave trade, uber lyft, undersea cable, uranium enrichment, We are the 99%, We wanted flying cars, instead we got 140 characters, working poor, working-age population, zero day

Goldin, Ian and Kenneth Reinert (2012). Globalization for Development. Oxford: Oxford University Press. 49. Vietnam Food Association (2014). “Yearly Export Statistics.” Retrieved from vietfood.org.vn/en/default.aspx?c=108. 50. Bangladesh Garment Manufacturers and Exporters Association (2015). “Trade Information.” Retrieved from bgmea.com.bd/home/pages/TradeInformation#.U57MMhZLGYU. 51. Burke, Jason (2013, November 14). “Bangladesh Garment Workers Set for 77% Pay Rise.” The Guardian. Retrieved from www.theguardian.com. 52. Goldin, Ian and Kenneth Reinert (2012). Globalization for Development. Oxford: Oxford University Press. 53.


pages: 470 words: 109,589

Apache Solr 3 Enterprise Search Server by Unknown

bioinformatics, business logic, continuous integration, database schema, en.wikipedia.org, fault tolerance, Firefox, full text search, functional programming, information retrieval, natural language processing, performance metric, platform as a service, Ruby on Rails, SQL injection, Wayback Machine, web application

A rich set of chainable text analysis components, such as tokenizers and language-specific stemmers that transform a text string into a series of terms (words). A query syntax with a parser and a variety of query types from a simple term lookup to exotic fuzzy matching. A good scoring algorithm based on sound Information Retrieval (IR) principles to produce the more likely candidates first, with flexible means to affect the scoring. Search enhancing features like: A highlighter feature to show query words found in context. A query spellchecker based on indexed content or a supplied dictionary. A "more like this" feature to list documents that are statistically similar to provided text.
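
As a rough idea of what such a chainable analysis pipeline does, the Python sketch below pushes text through a tokenizer, a lowercasing filter, and a crude suffix-stripping stand-in for a stemmer; it is not Lucene or Solr code, and every name in it is invented for illustration.

    import re

    def tokenizer(text):
        # Split raw text into word tokens.
        return re.findall(r"\w+", text)

    def lowercase_filter(tokens):
        return [t.lower() for t in tokens]

    def naive_stem_filter(tokens):
        # Strip a few common English suffixes; real stemmers are far more careful.
        return [re.sub(r"(ing|ed|es|s)$", "", t) for t in tokens]

    def analyze(text, chain):
        # Run the token stream through each component of the chain in order.
        tokens = tokenizer(text)
        for component in chain:
            tokens = component(tokens)
        return tokens

    print(analyze("Searching indexed documents", [lowercase_filter, naive_stem_filter]))
    # ['search', 'index', 'document']

Each component consumes and produces a token stream, which is what makes the chain easy to rearrange or extend.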


pages: 397 words: 110,130

Smarter Than You Think: How Technology Is Changing Our Minds for the Better by Clive Thompson

4chan, A Declaration of the Independence of Cyberspace, Andy Carvin, augmented reality, barriers to entry, behavioural economics, Benjamin Mako Hill, butterfly effect, citizen journalism, Claude Shannon: information theory, compensation consultant, conceptual framework, context collapse, corporate governance, crowdsourcing, Deng Xiaoping, digital rights, discovery of penicillin, disruptive innovation, Douglas Engelbart, Douglas Engelbart, drone strike, Edward Glaeser, Edward Thorp, en.wikipedia.org, Evgeny Morozov, experimental subject, Filter Bubble, folksonomy, Freestyle chess, Galaxy Zoo, Google Earth, Google Glasses, Gunnar Myrdal, guns versus butter model, Henri Poincaré, hindsight bias, hive mind, Howard Rheingold, Ian Bogost, information retrieval, iterative process, James Bridle, jimmy wales, John Perry Barlow, Kevin Kelly, Khan Academy, knowledge worker, language acquisition, lifelogging, lolcat, Mark Zuckerberg, Marshall McLuhan, Menlo Park, Netflix Prize, Nicholas Carr, Panopticon Jeremy Bentham, patent troll, pattern recognition, pre–internet, public intellectual, Richard Feynman, Ronald Coase, Ronald Reagan, Rubik’s Cube, sentiment analysis, Silicon Valley, Skype, Snapchat, Socratic dialogue, spaced repetition, superconnector, telepresence, telepresence robot, The future is already here, The Nature of the Firm, the scientific method, the strength of weak ties, The Wisdom of Crowds, theory of mind, transaction costs, Twitter Arab Spring, Two Sigma, Vannevar Bush, Watson beat the top human players on Jeopardy!, WikiLeaks, X Prize, éminence grise

the Wikipedia page on “Drone attacks in Pakistan”: “Drone attacks in Pakistan,” Wikipedia, accessed March 24, 2013, en.wikipedia.org/wiki/Drone_attacks_in_Pakistan. 40 percent of all queries are acts of remembering: Jaime Teevan, Eytan Adar, Rosie Jones, and Michael A. S. Potts, “Information Re-Retrieval: Repeat Queries in Yahoo’s Logs,” in SIGIR ’07: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (2007), 151–58. collaborative inhibition: Celia B. Harris, Paul G. Keil, John Sutton, and Amanda J. Barnier, “We Remember, We Forget: Collaborative Remembering in Older Couples,” Discourse Processes 48, no. 4 (2011), 267–303. In his essay “Mathematical Creation”: Henri Poincaré, “Mathematical Creation,” in The Anatomy of Memory: An Anthology (New York: Oxford University Press, 1996), 126–35.


pages: 1,172 words: 114,305

New Laws of Robotics: Defending Human Expertise in the Age of AI by Frank Pasquale

affirmative action, Affordable Care Act / Obamacare, Airbnb, algorithmic bias, Amazon Mechanical Turk, Anthropocene, augmented reality, Automated Insights, autonomous vehicles, basic income, battle of ideas, Bernie Sanders, Big Tech, Bill Joy: nanobots, bitcoin, blockchain, Brexit referendum, call centre, Cambridge Analytica, carbon tax, citizen journalism, Clayton Christensen, collective bargaining, commoditize, computer vision, conceptual framework, contact tracing, coronavirus, corporate social responsibility, correlation does not imply causation, COVID-19, critical race theory, cryptocurrency, data is the new oil, data science, decarbonisation, deep learning, deepfake, deskilling, digital divide, digital twin, disinformation, disruptive innovation, don't be evil, Donald Trump, Douglas Engelbart, driverless car, effective altruism, Elon Musk, en.wikipedia.org, Erik Brynjolfsson, Evgeny Morozov, fake news, Filter Bubble, finite state, Flash crash, future of work, gamification, general purpose technology, Google Chrome, Google Glasses, Great Leap Forward, green new deal, guns versus butter model, Hans Moravec, high net worth, hiring and firing, holacracy, Ian Bogost, independent contractor, informal economy, information asymmetry, information retrieval, interchangeable parts, invisible hand, James Bridle, Jaron Lanier, job automation, John Markoff, Joi Ito, Khan Academy, knowledge economy, late capitalism, lockdown, machine readable, Marc Andreessen, Mark Zuckerberg, means of production, medical malpractice, megaproject, meta-analysis, military-industrial complex, Modern Monetary Theory, Money creation, move fast and break things, mutually assured destruction, natural language processing, new economy, Nicholas Carr, Nick Bostrom, Norbert Wiener, nuclear winter, obamacare, One Laptop per Child (OLPC), open immigration, OpenAI, opioid epidemic / opioid crisis, paperclip maximiser, paradox of thrift, pattern recognition, payday loans, personalized medicine, Peter Singer: altruism, Philip Mirowski, pink-collar, plutocrats, post-truth, pre–internet, profit motive, public intellectual, QR code, quantitative easing, race to the bottom, RAND corporation, Ray Kurzweil, recommendation engine, regulatory arbitrage, Robert Shiller, Rodney Brooks, Ronald Reagan, self-driving car, sentiment analysis, Shoshana Zuboff, Silicon Valley, Singularitarianism, smart cities, smart contracts, software is eating the world, South China Sea, Steve Bannon, Strategic Defense Initiative, surveillance capitalism, Susan Wojcicki, tacit knowledge, TaskRabbit, technological solutionism, technoutopianism, TED Talk, telepresence, telerobotics, The Future of Employment, The Turner Diaries, Therac-25, Thorstein Veblen, too big to fail, Turing test, universal basic income, unorthodox policies, wage slave, Watson beat the top human players on Jeopardy!, working poor, workplace surveillance , Works Progress Administration, zero day

The sociologist Harold Wilensky once observed that “many occupations engage in heroic struggles for professional identification; few make the grade.”43 But if we are to maintain a democratic society rather than give ourselves over to the rise of the robots—or to those who bid them to rise—then we must spread the status and autonomy now enjoyed by professionals in fields like law and medicine to information retrieval, dispute resolution, elder care, marketing, planning, designing, and many other fields. Imagine a labor movement built on solidarity between workers who specialize in non-routine tasks. If they succeed in uniting, they might project a vision of labor far more concrete and realistic than the feudal futurism of techno-utopians.


pages: 523 words: 112,185

Doing Data Science: Straight Talk From the Frontline by Cathy O'Neil, Rachel Schutt

Amazon Mechanical Turk, augmented reality, Augustin-Louis Cauchy, barriers to entry, Bayesian statistics, bike sharing, bioinformatics, computer vision, confounding variable, correlation does not imply causation, crowdsourcing, data science, distributed generation, Dunning–Kruger effect, Edward Snowden, Emanuel Derman, fault tolerance, Filter Bubble, finite state, Firefox, game design, Google Glasses, index card, information retrieval, iterative process, John Harrison: Longitude, Khan Academy, Kickstarter, machine translation, Mars Rover, Nate Silver, natural language processing, Netflix Prize, p-value, pattern recognition, performance metric, personalized medicine, pull request, recommendation engine, rent-seeking, selection bias, Silicon Valley, speech recognition, statistical model, stochastic process, tacit knowledge, text mining, the scientific method, The Wisdom of Crowds, Watson beat the top human players on Jeopardy!, X Prize

Another evaluation metric you could use is precision, defined in Chapter 5. The fact that some of the same formulas have different names is due to the fact that different academic disciplines have developed these ideas separately. So precision and recall are the quantities used in the field of information retrieval. Note that precision is not the same thing as specificity. Finally, we have accuracy, which is the ratio of the number of correct labels to the total number of labels, and the misclassification rate, which is just 1–accuracy. Minimizing the misclassification rate then just amounts to maximizing accuracy.
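
To make the definitions above concrete, the Python sketch below computes precision, recall, accuracy, and the misclassification rate for a toy binary classifier; the label vectors and helper name are invented, not taken from the book.

    def confusion_counts(y_true, y_pred, positive=1):
        # Count true/false positives and negatives for the chosen positive class.
        tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
        fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
        fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
        tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
        return tp, fp, fn, tn

    y_true = [1, 0, 1, 1, 0, 0, 1, 0]
    y_pred = [1, 0, 1, 0, 0, 1, 1, 0]
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)

    precision = tp / (tp + fp)            # fraction of predicted positives that are right
    recall = tp / (tp + fn)               # fraction of actual positives that are found
    accuracy = (tp + tn) / len(y_true)    # correct labels over all labels
    misclassification_rate = 1 - accuracy
    print(precision, recall, accuracy, misclassification_rate)  # 0.75 0.75 0.75 0.25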


Succeeding With AI: How to Make AI Work for Your Business by Veljko Krunic

AI winter, Albert Einstein, algorithmic trading, AlphaGo, Amazon Web Services, anti-fragile, anti-pattern, artificial general intelligence, autonomous vehicles, Bayesian statistics, bioinformatics, Black Swan, Boeing 737 MAX, business process, cloud computing, commoditize, computer vision, correlation coefficient, data is the new oil, data science, deep learning, DeepMind, en.wikipedia.org, fail fast, Gini coefficient, high net worth, information retrieval, Internet of things, iterative process, job automation, Lean Startup, license plate recognition, minimum viable product, natural language processing, recommendation engine, self-driving car, sentiment analysis, Silicon Valley, six sigma, smart cities, speech recognition, statistical model, strong AI, tail risk, The Design of Experiments, the scientific method, web application, zero-sum game

Although you can use the F-score to measure other characteristics of this system, it’s certainly not a good technical metric for a profit curve in which the business metric is cost savings. If this is the case, why do people use F-score at all? Because the F-score makes sense in many areas of information retrieval, but not in our particular business case. F-score is often used in the context of NLP [124], so if you’re debating which technical metrics to use, it’s a reasonable starting point. The broader teaching point is that just because a certain technical metric is widely used, doesn’t automatically make it a useful metric for your profit curve.
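
For reference, the F-score discussed here is the harmonic mean of precision and recall (the F-beta variant weights recall more heavily when beta is greater than 1); the Python sketch below simply evaluates that formula on invented numbers to make the definition concrete.

    def f_beta(precision, recall, beta=1.0):
        # Harmonic mean of precision and recall, weighted by beta.
        if precision == 0 and recall == 0:
            return 0.0
        b2 = beta ** 2
        return (1 + b2) * precision * recall / (b2 * precision + recall)

    print(f_beta(0.9, 0.5))            # F1 ~= 0.643
    print(f_beta(0.9, 0.5, beta=2))    # F2 ~= 0.549, leans toward recall

A classifier can post a respectable F-score and still do badly on a cost-savings profit curve whenever false positives and false negatives carry very different dollar costs, which is the mismatch the passage is pointing at.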


pages: 410 words: 119,823

Radical Technologies: The Design of Everyday Life by Adam Greenfield

3D printing, Airbnb, algorithmic bias, algorithmic management, AlphaGo, augmented reality, autonomous vehicles, bank run, barriers to entry, basic income, bitcoin, Black Lives Matter, blockchain, Boston Dynamics, business intelligence, business process, Californian Ideology, call centre, cellular automata, centralized clearinghouse, centre right, Chuck Templeton: OpenTable:, circular economy, cloud computing, Cody Wilson, collective bargaining, combinatorial explosion, Computer Numeric Control, computer vision, Conway's Game of Life, CRISPR, cryptocurrency, David Graeber, deep learning, DeepMind, dematerialisation, digital map, disruptive innovation, distributed ledger, driverless car, drone strike, Elon Musk, Ethereum, ethereum blockchain, facts on the ground, fiat currency, fulfillment center, gentrification, global supply chain, global village, Goodhart's law, Google Glasses, Herman Kahn, Ian Bogost, IBM and the Holocaust, industrial robot, informal economy, information retrieval, Internet of things, Jacob Silverman, James Watt: steam engine, Jane Jacobs, Jeff Bezos, Jeff Hawkins, job automation, jobs below the API, John Conway, John Markoff, John Maynard Keynes: Economic Possibilities for our Grandchildren, John Maynard Keynes: technological unemployment, John Perry Barlow, John von Neumann, joint-stock company, Kevin Kelly, Kickstarter, Kiva Systems, late capitalism, Leo Hollis, license plate recognition, lifelogging, M-Pesa, Mark Zuckerberg, means of production, megacity, megastructure, minimum viable product, money: store of value / unit of account / medium of exchange, natural language processing, Network effects, New Urbanism, Nick Bostrom, Occupy movement, Oculus Rift, off-the-grid, PalmPilot, Pareto efficiency, pattern recognition, Pearl River Delta, performance metric, Peter Eisenman, Peter Thiel, planetary scale, Ponzi scheme, post scarcity, post-work, printed gun, proprietary trading, RAND corporation, recommendation engine, RFID, rolodex, Rutger Bregman, Satoshi Nakamoto, self-driving car, sentiment analysis, shareholder value, sharing economy, Shenzhen special economic zone , Sidewalk Labs, Silicon Valley, smart cities, smart contracts, social intelligence, sorting algorithm, special economic zone, speech recognition, stakhanovite, statistical model, stem cell, technoutopianism, Tesla Model S, the built environment, The Death and Life of Great American Cities, The Future of Employment, Tony Fadell, transaction costs, Uber for X, undersea cable, universal basic income, urban planning, urban sprawl, vertical integration, Vitalik Buterin, warehouse robotics, When a measure becomes a target, Whole Earth Review, WikiLeaks, women in the workforce

One scenario along these lines is that proposed by Simon Taylor, VP for Blockchain R&D at Barclays Bank, in a white paper on distributed-ledger applications prepared for the British government.19 Taylor imagines all of our personal information stored on a common blockchain, duly encrypted. Any legitimate actor, public or private—the HR department, the post office, your bank, the police—could query the same unimpeachable source of information, retrieve from it only what they were permitted to, and leave behind a trace of their access. Each of us would have read/write access to our own record; should we find erroneous information, we would have to make but one correction, and it would then propagate across the files of every institution with access to the ledger.


pages: 597 words: 119,204

Website Optimization by Andrew B. King

AltaVista, AOL-Time Warner, bounce rate, don't be evil, Dr. Strangelove, en.wikipedia.org, Firefox, In Cold Blood by Truman Capote, information retrieval, iterative process, Kickstarter, machine readable, medical malpractice, Network effects, OSI model, performance metric, power law, satellite internet, search engine result page, second-price auction, second-price sealed-bid, semantic web, Silicon Valley, slashdot, social bookmarking, social graph, Steve Jobs, the long tail, three-martini lunch, traumatic brain injury, web application

This is especially true in the more complex world of the Web where application calls are hidden within the content portion of the page and third parties are critical to the overall download time. You need to have a view into every piece of the page load in order to manage and improve it. * * * [167] Roast, C. 1998. "Designing for Delay in Interactive Information Retrieval." Interacting with Computers 10 (1): 87–104. [168] Balashov, K., and A. King. 2003. "Compressing the Web." In Speed Up Your Site: Web Site Optimization. Indianapolis: New Riders, 412. A test of 25 popular sites found that HTTP gzip compression saved 75% on average off text file sizes and 37% overall
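
As a rough sense of why text content compresses so well, the Python sketch below gzips a made-up, highly repetitive HTML snippet with the standard library; real pages are less repetitive than this toy input, so their savings are closer to the figures quoted above.

    import gzip

    # Invented, deliberately repetitive markup purely for illustration.
    html = ("<html><body>" +
            "<p>Speed up your site with compression.</p>" * 200 +
            "</body></html>").encode("utf-8")

    compressed = gzip.compress(html)
    print(len(html), len(compressed), round(1 - len(compressed) / len(html), 3))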


pages: 436 words: 124,373

Galactic North by Alastair Reynolds

back-to-the-land, Buckminster Fuller, hive mind, information retrieval, Kickstarter, risk/return, stem cell, time dilation, trade route

"Veda would have figured it out." "We'll never know now, will we?" "What does it matter?" she said. "Gonna kill them anyway, aren't you?" Seven flashed an arc of teeth filed to points and waved a hand towards the female pirate. "Allow me to introduce Mirsky, our loose-tongued but efficient information retrieval specialist. She's going to take you on a little trip down memory lane; see if we can't remember those access codes." "What codes?" "It'll come back to you," Seven said. They were taken through the tunnels, past half-assembled mining machines, onto the surface and then into the pirate ship.


pages: 320 words: 87,853

The Black Box Society: The Secret Algorithms That Control Money and Information by Frank Pasquale

Adam Curtis, Affordable Care Act / Obamacare, Alan Greenspan, algorithmic trading, Amazon Mechanical Turk, American Legislative Exchange Council, asset-backed security, Atul Gawande, bank run, barriers to entry, basic income, Bear Stearns, Berlin Wall, Bernie Madoff, Black Swan, bonus culture, Brian Krebs, business cycle, business logic, call centre, Capital in the Twenty-First Century by Thomas Piketty, Chelsea Manning, Chuck Templeton: OpenTable:, cloud computing, collateralized debt obligation, computerized markets, corporate governance, Credit Default Swap, credit default swaps / collateralized debt obligations, crowdsourcing, cryptocurrency, data science, Debian, digital rights, don't be evil, drone strike, Edward Snowden, en.wikipedia.org, Evgeny Morozov, Fall of the Berlin Wall, Filter Bubble, financial engineering, financial innovation, financial thriller, fixed income, Flash crash, folksonomy, full employment, Gabriella Coleman, Goldman Sachs: Vampire Squid, Google Earth, Hernando de Soto, High speed trading, hiring and firing, housing crisis, Ian Bogost, informal economy, information asymmetry, information retrieval, information security, interest rate swap, Internet of things, invisible hand, Jaron Lanier, Jeff Bezos, job automation, John Bogle, Julian Assange, Kevin Kelly, Kevin Roose, knowledge worker, Kodak vs Instagram, kremlinology, late fees, London Interbank Offered Rate, London Whale, machine readable, Marc Andreessen, Mark Zuckerberg, Michael Milken, mobile money, moral hazard, new economy, Nicholas Carr, offshore financial centre, PageRank, pattern recognition, Philip Mirowski, precariat, profit maximization, profit motive, public intellectual, quantitative easing, race to the bottom, reality distortion field, recommendation engine, regulatory arbitrage, risk-adjusted returns, Satyajit Das, Savings and loan crisis, search engine result page, shareholder value, Silicon Valley, Snapchat, social intelligence, Spread Networks laid a new fibre optics cable between New York and Chicago, statistical arbitrage, statistical model, Steven Levy, technological solutionism, the scientific method, too big to fail, transaction costs, two-sided market, universal basic income, Upton Sinclair, value at risk, vertical integration, WikiLeaks, Yochai Benkler, zero-sum game

The question now is whether its dictatorship will be benign. Does Google intend Book Search to promote widespread public access, or is it envisioning finely tiered access to content, granted (and withheld) in opaque ways?168 Will Google grant open access to search results on its platform, so experts in library science and information retrieval can understand (and critique) its orderings of results?169 Finally, where will the profits go from this immense cooperative project? Will they be distributed fairly among contributors, or will this be another instance in which the aggregator of content captures an unfair share of revenues from well-established dynamics of content digitization?


pages: 570 words: 115,722

The Tangled Web: A Guide to Securing Modern Web Applications by Michal Zalewski

barriers to entry, business process, defense in depth, easy for humans, difficult for computers, fault tolerance, finite state, Firefox, Google Chrome, information retrieval, information security, machine readable, Multics, RFC: Request For Comment, semantic web, Steve Jobs, telemarketer, Tragedy of the Commons, Turing test, Vannevar Bush, web application, WebRTC, WebSocket

The subsequent proposals experimented with an increasingly bizarre set of methods to permit interactions other than retrieving a document or running a script, including such curiosities as SHOWMETHOD, CHECKOUT, or—why not—SPACEJUMP.[122] Most of these thought experiments have been abandoned in HTTP/1.1, which settles on a more manageable set of eight methods. Only the first two request types—GET and POST—are of any significance to most of the modern Web. GET The GET method is meant to signify information retrieval. In practice, it is used for almost all client-server interactions in the course of a normal browsing session. Regular GET requests carry no browser-supplied payloads, although they are not strictly prohibited from doing so. The expectation is that GET requests should not have, to quote the RFC, “significance of taking an action other than retrieval” (that is, they should make no persistent changes to the state of the application).
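
As a small illustration of the convention described above, here is how the two request types look from a client using Python's standard urllib; the example.com URLs are placeholders.

from urllib import parse, request

# GET: pure retrieval, no request body, no expected server-side state change.
with request.urlopen("http://example.com/") as resp:
    body = resp.read()
    print(resp.status, len(body), "bytes retrieved")

# POST: supplying data makes urlopen send a request body, signalling an action
# that may change state on the server (so the call is left commented out here).
form = parse.urlencode({"comment": "hello"}).encode()
post_req = request.Request("http://example.com/submit", data=form, method="POST")
# request.urlopen(post_req)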


pages: 455 words: 138,716

The Divide: American Injustice in the Age of the Wealth Gap by Matt Taibbi

"RICO laws" OR "Racketeer Influenced and Corrupt Organizations", Alan Greenspan, banking crisis, Bear Stearns, Bernie Madoff, book value, butterfly effect, buy and hold, collapse of Lehman Brothers, collateralized debt obligation, company town, Corrections Corporation of America, Credit Default Swap, credit default swaps / collateralized debt obligations, Edward Snowden, ending welfare as we know it, fake it until you make it, fixed income, forensic accounting, Glass-Steagall Act, Gordon Gekko, greed is good, illegal immigration, information retrieval, London Interbank Offered Rate, London Whale, Michael Milken, naked short selling, off-the-grid, offshore financial centre, Ponzi scheme, profit motive, regulatory arbitrage, Savings and loan crisis, short selling, social contagion, telemarketer, too big to fail, two and twenty, War on Poverty

“Just think what I could do with your emails,” he hissed, adding that he, Spyro, was going to “consider all my options as maintaining our confidentiality,” and that if the executive didn’t cooperate, he could “no longer rely on my discretion.” Contogouris seemed to be playing a triple game. First, he was genuinely trying to deliver an informant to the FBI and set himself up as an FBI informant. Second, he was trying to deliver confidential information to the hedge funds, to whom he had set himself up as an expert at information retrieval. And third, he was playing secret source to “reputable” journalists, to whom he had promised to deliver stunning exposés. Contogouris even referenced one of those contacts in his adolescent coded emails to Sender sent from London that day: CONTOGOURIS: We have been rapping here about the postman.


pages: 303 words: 67,891

Advances in Artificial General Intelligence: Concepts, Architectures and Algorithms: Proceedings of the Agi Workshop 2006 by Ben Goertzel, Pei Wang

AI winter, artificial general intelligence, backpropagation, bioinformatics, brain emulation, classic study, combinatorial explosion, complexity theory, computer vision, Computing Machinery and Intelligence, conceptual framework, correlation coefficient, epigenetics, friendly AI, functional programming, G4S, higher-order functions, information retrieval, Isaac Newton, Jeff Hawkins, John Conway, Loebner Prize, Menlo Park, natural language processing, Nick Bostrom, Occam's razor, p-value, pattern recognition, performance metric, precautionary principle, Ray Kurzweil, Rodney Brooks, semantic web, statistical model, strong AI, theory of mind, traveling salesman, Turing machine, Turing test, Von Neumann architecture, Y2K

NARS can be connected to existing knowledge bases, such as Cyc (for commonsense knowledge), WordNet (for linguistic knowledge), Mizar (for mathematical knowledge), and so on. For each of them, a special interface module should be able to approximately translate knowledge from its original format into Narsese. • The Internet. It is possible for NARS to be equipped with additional modules, which use techniques like semantic web, information retrieval, and data mining, to directly acquire certain knowledge from the Internet, and put them into Narsese. • Natural language interface. After NARS has learned a natural language (as discussed previously), it should be able to accept knowledge from various sources in that language. Additionally, interactive tutoring will be necessary, which allows a human trainer to monitor the establishing of the knowledge base, to answer questions, to guide the system to form a proper goal structure and priority distributions among its concepts, tasks, and beliefs.


pages: 696 words: 143,736

The Age of Spiritual Machines: When Computers Exceed Human Intelligence by Ray Kurzweil

Ada Lovelace, Alan Greenspan, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, Albert Einstein, Alvin Toffler, Any sufficiently advanced technology is indistinguishable from magic, backpropagation, Buckminster Fuller, call centre, cellular automata, Charles Babbage, classic study, combinatorial explosion, complexity theory, computer age, computer vision, Computing Machinery and Intelligence, cosmological constant, cosmological principle, Danny Hillis, double helix, Douglas Hofstadter, Everything should be made as simple as possible, financial engineering, first square of the chessboard / second half of the chessboard, flying shuttle, fudge factor, functional programming, George Gilder, Gödel, Escher, Bach, Hans Moravec, I think there is a world market for maybe five computers, information retrieval, invention of movable type, Isaac Newton, iterative process, Jacquard loom, John Gilmore, John Markoff, John von Neumann, Lao Tzu, Law of Accelerating Returns, mandelbrot fractal, Marshall McLuhan, Menlo Park, natural language processing, Norbert Wiener, optical character recognition, ought to be enough for anybody, pattern recognition, phenotype, punch-card reader, quantum entanglement, Ralph Waldo Emerson, Ray Kurzweil, Richard Feynman, Robert Metcalfe, Schrödinger's Cat, Search for Extraterrestrial Intelligence, self-driving car, Silicon Valley, social intelligence, speech recognition, Steven Pinker, Stewart Brand, stochastic process, Stuart Kauffman, technological singularity, Ted Kaczynski, telepresence, the medium is the message, The Soul of a New Machine, There's no reason for any individual to have a computer in his home - Ken Olsen, traveling salesman, Turing machine, Turing test, Whole Earth Review, world market for maybe five computers, Y2K

Cybernetics A term coined by Norbert Wiener to describe the “science of control and communication in animals and machines.” Cybernetics is based on the theory that intelligent living beings adapt to their environments and accomplish objectives primarily by reacting to feedback from their surroundings. Database The structured collection of data that is designed in connection with an information retrieval system. A database management system (DBMS) allows monitoring, updating, and interacting with the database. Debugging The process of discovering and correcting errors in computer hardware and software. The issue of bugs or errors in a program will become increasingly important as computers are integrated into the human brain and physiology throughout the twenty-first century.


pages: 527 words: 147,690

Terms of Service: Social Media and the Price of Constant Connection by Jacob Silverman

"World Economic Forum" Davos, 23andMe, 4chan, A Declaration of the Independence of Cyberspace, Aaron Swartz, Airbnb, airport security, Amazon Mechanical Turk, augmented reality, basic income, Big Tech, Brian Krebs, California gold rush, Californian Ideology, call centre, cloud computing, cognitive dissonance, commoditize, company town, context collapse, correlation does not imply causation, Credit Default Swap, crowdsourcing, data science, deep learning, digital capitalism, disinformation, don't be evil, driverless car, drone strike, Edward Snowden, Evgeny Morozov, fake it until you make it, feminist movement, Filter Bubble, Firefox, Flash crash, game design, global village, Google Chrome, Google Glasses, Higgs boson, hive mind, Ian Bogost, income inequality, independent contractor, informal economy, information retrieval, Internet of things, Jacob Silverman, Jaron Lanier, jimmy wales, John Perry Barlow, Kevin Kelly, Kevin Roose, Kickstarter, knowledge economy, knowledge worker, Larry Ellison, late capitalism, Laura Poitras, license plate recognition, life extension, lifelogging, lock screen, Lyft, machine readable, Mark Zuckerberg, Mars Rover, Marshall McLuhan, mass incarceration, meta-analysis, Minecraft, move fast and break things, national security letter, Network effects, new economy, Nicholas Carr, Occupy movement, off-the-grid, optical character recognition, payday loans, Peter Thiel, planned obsolescence, postindustrial economy, prediction markets, pre–internet, price discrimination, price stability, profit motive, quantitative hedge fund, race to the bottom, Ray Kurzweil, real-name policy, recommendation engine, rent control, rent stabilization, RFID, ride hailing / ride sharing, Salesforce, self-driving car, sentiment analysis, shareholder value, sharing economy, Sheryl Sandberg, Silicon Valley, Silicon Valley ideology, Snapchat, social bookmarking, social graph, social intelligence, social web, sorting algorithm, Steve Ballmer, Steve Jobs, Steven Levy, systems thinking, TaskRabbit, technological determinism, technological solutionism, technoutopianism, TED Talk, telemarketer, transportation-network company, Travis Kalanick, Turing test, Uber and Lyft, Uber for X, uber lyft, universal basic income, unpaid internship, women in the workforce, Y Combinator, yottabyte, you are the product, Zipcar

As storage costs decrease and analytical powers grow, it’s not unreasonable to think that this capability will be extended to other targets, including, should the political environment allow it, the United States. Some of the NSA’s surveillance capacity derives from deals made with Internet firms—procedures for automating court-authorized information retrieval, direct access to central servers, and even (as in the case of Verizon) fiber optic cables piped from military bases into major Internet hubs. In the United States, the NSA uses the FBI to conduct surveillance authorized under the Patriot Act and to issue National Security Letters (NSLs)—subpoenas requiring recipients to turn over any information deemed relevant to an ongoing investigation.


pages: 550 words: 154,725

The Idea Factory: Bell Labs and the Great Age of American Innovation by Jon Gertner

Albert Einstein, back-to-the-land, Black Swan, business climate, Charles Babbage, Claude Shannon: information theory, Clayton Christensen, complexity theory, corporate governance, cuban missile crisis, Dennis Ritchie, Edward Thorp, Fairchild Semiconductor, Henry Singleton, horn antenna, Hush-A-Phone, information retrieval, invention of the telephone, James Watt: steam engine, Karl Jansky, Ken Thompson, knowledge economy, Leonard Kleinrock, machine readable, Metcalfe’s law, Nicholas Carr, Norbert Wiener, Picturephone, Richard Feynman, Robert Metcalfe, Russell Ohl, Sand Hill Road, Silicon Valley, Skype, space junk, Steve Jobs, Telecommunications Act of 1996, Teledyne, traveling salesman, undersea cable, uranium enrichment, vertical integration, William Shockley: the traitorous eight

A visitor could also try something called a portable “pager,” a big, blocky device that could alert doctors and other busy professionals when they received urgent calls.2 New York’s fair would dwarf Seattle’s. The crowds were expected to be immense—probably somewhere around 50 or 60 million people in total. Pierce and David’s 1961 memo recommended a number of exhibits: “personal hand-carried telephones,” “business letters in machine-readable form, transmitted by wire,” “information retrieval from a distant computer-automated library,” and “satellite and space communications.” By the time the fair opened in April 1964, though, the Bell System exhibits, housed in a huge white cantilevered building nicknamed the “floating wing,” described a more conservative future than the one Pierce and David had envisioned.


pages: 492 words: 153,565

Countdown to Zero Day: Stuxnet and the Launch of the World's First Digital Weapon by Kim Zetter

air gap, Ayatollah Khomeini, Brian Krebs, crowdsourcing, data acquisition, Doomsday Clock, drone strike, Edward Snowden, facts on the ground, false flag, Firefox, friendly fire, Google Earth, information retrieval, information security, John Markoff, Julian Assange, Kickstarter, Loma Prieta earthquake, machine readable, Maui Hawaii, military-industrial complex, MITM: man-in-the-middle, Morris worm, pre–internet, RAND corporation, rolling blackouts, Silicon Valley, skunkworks, smart grid, smart meter, South China Sea, Stuxnet, Timothy McVeigh, two and twenty, undersea cable, unit 8200, uranium enrichment, Vladimir Vetrov: Farewell Dossier, WikiLeaks, Y2K, zero day

See “Software Problem Led to System Failure at Dhahran, Saudi Arabia,” US Government Accountability Office, February 4, 1992, available at gao.gov/products/IMTEC-92-26. 22 Bryan, “Lessons from Our Cyber Past.” 23 “The Information Operations Roadmap,” dated October 30, 2003, is a seventy-four-page report that was declassified in 2006, though the pages dealing with computer network attacks are heavily redacted. The document is available at http://information-retrieval.info/docs/DoD-IO.html. 24 Arquilla Frontline “CyberWar!” interview. A Washington Post story indicates that attacks on computers controlling air-defense systems in Kosovo were launched from electronic-jamming aircraft rather than over computer networks from ground-based keyboards. Bradley Graham, “Military Grappling with Rules for Cyber,” Washington Post, November 8, 1999. 25 James Risen, “Crisis in the Balkans: Subversion; Covert Plan Said to Take Aim at Milosevic’s Hold on Power,” New York Times, June 18, 1999.


pages: 467 words: 149,632

If Then: How Simulmatics Corporation Invented the Future by Jill Lepore

A Declaration of the Independence of Cyberspace, Alvin Toffler, anti-communist, Apollo 11, Buckminster Fuller, Cambridge Analytica, company town, computer age, coronavirus, cuban missile crisis, data science, desegregation, don't be evil, Donald Trump, Dr. Strangelove, Elon Musk, fake news, game design, George Gilder, Grace Hopper, Hacker Ethic, Howard Zinn, index card, information retrieval, Jaron Lanier, Jeff Bezos, Jeffrey Epstein, job automation, John Perry Barlow, land reform, linear programming, Mahatma Gandhi, Marc Andreessen, Mark Zuckerberg, mass incarceration, Maui Hawaii, Menlo Park, military-industrial complex, New Journalism, New Urbanism, Norbert Wiener, Norman Mailer, packet switching, Peter Thiel, profit motive, punch-card reader, RAND corporation, Robert Bork, Ronald Reagan, Rosa Parks, self-driving car, Silicon Valley, SimCity, smart cities, social distancing, South China Sea, Stewart Brand, technoutopianism, Ted Sorensen, Telecommunications Act of 1996, urban renewal, War on Poverty, white flight, Whole Earth Catalog

To help his reader picture what he pictured, he conjured a scene set in 2000 in which a person sits at a computer console and attempts to get to the bottom of a research question merely by undertaking a series of searches. Nearly all of what Licklider described in Libraries of the Future later came to pass: the digitization of printed material, the networking of library catalogs and their contents, the development of sophisticated, natural language–based information-retrieval and search mechanisms.23 Licklider described, with a contagious amazement, what would become, in the twenty-first century, the Internet at its very best. In 1962, Licklider left Bolt Beranek and Newman for ARPA, where his many duties included funding behavioral science projects, including Pool’s Project ComCom.


pages: 1,331 words: 163,200

Hands-On Machine Learning With Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems by Aurélien Géron

AlphaGo, Amazon Mechanical Turk, Anton Chekhov, backpropagation, combinatorial explosion, computer vision, constrained optimization, correlation coefficient, crowdsourcing, data science, deep learning, DeepMind, don't repeat yourself, duck typing, Elon Musk, en.wikipedia.org, friendly AI, Geoffrey Hinton, ImageNet competition, information retrieval, iterative process, John von Neumann, Kickstarter, machine translation, natural language processing, Netflix Prize, NP-complete, OpenAI, optical character recognition, P = NP, p-value, pattern recognition, pull request, recommendation engine, self-driving car, sentiment analysis, SpamAssassin, speech recognition, stochastic process

Build a classification deep neural network, reusing the lower layers of the autoencoder. Train it using only 10% of the training set. Can you get it to perform as well as the same classifier trained on the full training set? Semantic hashing, introduced in 2008 by Ruslan Salakhutdinov and Geoffrey Hinton,13 is a technique used for efficient information retrieval: a document (e.g., an image) is passed through a system, typically a neural network, which outputs a fairly low-dimensional binary vector (e.g., 30 bits). Two similar documents are likely to have identical or very similar hashes. By indexing each document using its hash, it is possible to retrieve many documents similar to a particular document almost instantly, even if there are billions of documents: just compute the hash of the document and look up all documents with that same hash (or hashes differing by just one or two bits).
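
The retrieval step described here is easy to sketch once the codes exist. In the toy Python version below, a random projection stands in for the trained network (purely an illustrative assumption), and lookup checks the query's bucket plus every bucket whose code differs by a single bit.

import numpy as np
from collections import defaultdict

rng = np.random.default_rng(0)
n_bits, n_features = 16, 100
projection = rng.normal(size=(n_features, n_bits))  # stand-in for the trained encoder

def semantic_code(vec):
    # Map a feature vector to an n_bits-bit integer code (sign of each projection).
    bits = (vec @ projection) > 0
    return int("".join("1" if b else "0" for b in bits), 2)

# Index a corpus of random "documents" by their codes.
docs = rng.normal(size=(10000, n_features))
index = defaultdict(list)
for i, doc in enumerate(docs):
    index[semantic_code(doc)].append(i)

def lookup(query):
    # Exact-code bucket plus all buckets at Hamming distance 1.
    q = semantic_code(query)
    candidates = list(index[q])
    for bit in range(n_bits):
        candidates.extend(index[q ^ (1 << bit)])
    return candidates

print(len(lookup(docs[0])))  # candidate set retrieved without scanning the corpus

With learned codes, as in the text, similar documents share buckets far more reliably than this random projection suggests; the lookup mechanics are the same.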


Smart Grid Standards by Takuro Sato

business cycle, business process, carbon footprint, clean water, cloud computing, data acquisition, decarbonisation, demand response, distributed generation, electricity market, energy security, exponential backoff, factory automation, Ford Model T, green new deal, green transition, information retrieval, information security, Intergovernmental Panel on Climate Change (IPCC), Internet of things, Iridium satellite, iterative process, knowledge economy, life extension, linear programming, low earth orbit, machine readable, market design, MITM: man-in-the-middle, off grid, oil shale / tar sands, OSI model, packet switching, performance metric, RFC: Request For Comment, RFID, smart cities, smart grid, smart meter, smart transportation, Thomas Davenport

Unlike the C12.18 or C12.21 protocols, which only support session-oriented communications, sessionless communication has the advantage of requiring less complex handling on both sides of the communication link and reduces signaling overhead. ANSI C12.22 has a common application layer (layer 7 in the OSI, Open System Interconnection, reference model), which provides a minimal set of services and data structures required to support C12.22 nodes for the purposes of configuration, programming, and information retrieval in a networked environment. The application layer is independent of the underlying network technologies. This enables interoperability of C12.22 with already existing communication systems. C12.22 also defines a number of application layer services, which are combined to realize the various functions of the C12.22 protocols.


pages: 542 words: 161,731

Alone Together by Sherry Turkle

Albert Einstein, Columbine, Computing Machinery and Intelligence, fake news, Future Shock, global village, Hacker Ethic, helicopter parent, Howard Rheingold, industrial robot, information retrieval, Jacques de Vaucanson, Jaron Lanier, Joan Didion, John Markoff, Kevin Kelly, lifelogging, Loebner Prize, Marshall McLuhan, meta-analysis, mirror neurons, Nicholas Carr, Norbert Wiener, off-the-grid, Panopticon Jeremy Bentham, Paradox of Choice, Ralph Waldo Emerson, Rodney Brooks, Skype, social intelligence, stem cell, technological determinism, technoutopianism, The Great Good Place, the medium is the message, the strength of weak ties, theory of mind, Turing test, Vannevar Bush, Wall-E, warehouse robotics, women in the workforce, Year of Magical Thinking

From 1996 on, Thad Starner, who like Steve Mann was a member of the MIT cyborg group, worked on the Remembrance Agent, a tool that would sit on your computer desktop (or now, your mobile device) and not only record what you were doing but make suggestions about what you might be interested in looking at next. See Bradley J. Rhodes and Thad Starner, “Remembrance Agent: A Continuously Running Personal Information Retrieval System,” Proceedings of the First International Conference on the Practical Application of Intelligent Agents and Multi Agent Technology (PAAM ’96), 487–495, www.bradleyrhodes.com/Papers/remembrance.html (accessed December 14, 2009). Albert Frigo’s “Storing, Indexing and Retrieving My Autobiography,” presented at the 2004 Workshop on Memory and the Sharing of Experience in Vienna, Austria, describes a device to take pictures of what comes into his hand.


pages: 574 words: 164,509

Superintelligence: Paths, Dangers, Strategies by Nick Bostrom

agricultural Revolution, AI winter, Albert Einstein, algorithmic trading, anthropic principle, Anthropocene, anti-communist, artificial general intelligence, autism spectrum disorder, autonomous vehicles, backpropagation, barriers to entry, Bayesian statistics, bioinformatics, brain emulation, cloud computing, combinatorial explosion, computer vision, Computing Machinery and Intelligence, cosmological constant, dark matter, DARPA: Urban Challenge, data acquisition, delayed gratification, Demis Hassabis, demographic transition, different worldview, Donald Knuth, Douglas Hofstadter, driverless car, Drosophila, Elon Musk, en.wikipedia.org, endogenous growth, epigenetics, fear of failure, Flash crash, Flynn Effect, friendly AI, general purpose technology, Geoffrey Hinton, Gödel, Escher, Bach, hallucination problem, Hans Moravec, income inequality, industrial robot, informal economy, information retrieval, interchangeable parts, iterative process, job automation, John Markoff, John von Neumann, knowledge worker, Large Hadron Collider, longitudinal study, machine translation, megaproject, Menlo Park, meta-analysis, mutually assured destruction, Nash equilibrium, Netflix Prize, new economy, Nick Bostrom, Norbert Wiener, NP-complete, nuclear winter, operational security, optical character recognition, paperclip maximiser, pattern recognition, performance metric, phenotype, prediction markets, price stability, principal–agent problem, race to the bottom, random walk, Ray Kurzweil, recommendation engine, reversible computing, search costs, social graph, speech recognition, Stanislav Petrov, statistical model, stem cell, Stephen Hawking, Strategic Defense Initiative, strong AI, superintelligent machines, supervolcano, synthetic biology, technological singularity, technoutopianism, The Coming Technological Singularity, The Nature of the Firm, Thomas Kuhn: the structure of scientific revolutions, time dilation, Tragedy of the Commons, transaction costs, trolley problem, Turing machine, Vernor Vinge, WarGames: Global Thermonuclear War, Watson beat the top human players on Jeopardy!, World Values Survey, zero-sum game

Software polices the world’s email traffic, and despite continual adaptation by spammers to circumvent the countermeasures being brought against them, Bayesian spam filters have largely managed to hold the spam tide at bay. Software using AI components is responsible for automatically approving or declining credit card transactions, and continuously monitors account activity for signs of fraudulent use. Information retrieval systems also make extensive use of machine learning. The Google search engine is, arguably, the greatest AI system that has yet been built. Now, it must be stressed that the demarcation between artificial intelligence and software in general is not sharp. Some of the applications listed above might be viewed more as generic software applications rather than AI in particular—though this brings us back to McCarthy’s dictum that when something works it is no longer called AI.
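
The spam filtering mentioned in passing reduces, in its simplest form, to combining per-word likelihood ratios. A toy naive Bayes sketch in Python, with made-up training counts, shows the shape of it.

import math

# Made-up word counts from a hypothetical labelled training corpus.
spam_counts = {"free": 60, "winner": 40, "meeting": 2}
ham_counts = {"free": 5, "winner": 1, "meeting": 50}
n_spam, n_ham = 100, 100  # number of training messages in each class

def spam_probability(message):
    # P(spam | words) under a naive Bayes model with Laplace (add-one) smoothing.
    log_odds = math.log(n_spam / n_ham)  # prior odds
    for word in message.lower().split():
        p_word_spam = (spam_counts.get(word, 0) + 1) / (n_spam + 2)
        p_word_ham = (ham_counts.get(word, 0) + 1) / (n_ham + 2)
        log_odds += math.log(p_word_spam / p_word_ham)
    return 1 / (1 + math.exp(-log_odds))

print(spam_probability("free winner"))       # close to 1
print(spam_probability("meeting tomorrow"))  # close to 0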


In the Age of the Smart Machine by Shoshana Zuboff

affirmative action, American ideology, blue-collar work, collective bargaining, computer age, Computer Numeric Control, conceptual framework, data acquisition, demand response, deskilling, factory automation, Ford paid five dollars a day, fudge factor, future of work, industrial robot, information retrieval, interchangeable parts, job automation, lateral thinking, linked data, Marshall McLuhan, means of production, old-boy network, optical character recognition, Panopticon Jeremy Bentham, pneumatic tube, post-industrial society, radical decentralization, RAND corporation, scientific management, Shoshana Zuboff, social web, systems thinking, tacit knowledge, The Wealth of Nations by Adam Smith, Thorstein Veblen, union organizing, vertical integration, work culture , zero-sum game

The designers complained that as systems use became more central to task performance, managers and operators would need a more analytic understanding of their work in order to determine their information requirements. They would also need a deeper level of insight into the systems themselves (procedural reasoning) that would allow them to go beyond simple information retrieval to actually becoming familiar with data and generating new insights. People don't know enough about what goes into making up their job. Time hasn't been spent with them to tell them why. They've just been told, "Here's the system and here's how to use it." But they have to learn more about their job and more about the systems if they are going to figure out not only how to get data but what data they need.


pages: 611 words: 188,732

Valley of Genius: The Uncensored History of Silicon Valley (As Told by the Hackers, Founders, and Freaks Who Made It Boom) by Adam Fisher

adjacent possible, Airbnb, Albert Einstein, AltaVista, An Inconvenient Truth, Andy Rubin, AOL-Time Warner, Apple II, Apple Newton, Apple's 1984 Super Bowl advert, augmented reality, autonomous vehicles, Bill Atkinson, Bob Noyce, Brownian motion, Buckminster Fuller, Burning Man, Byte Shop, circular economy, cognitive dissonance, Colossal Cave Adventure, Computer Lib, disintermediation, Do you want to sell sugared water for the rest of your life?, don't be evil, Donald Trump, Douglas Engelbart, driverless car, dual-use technology, Dynabook, Elon Musk, Fairchild Semiconductor, fake it until you make it, fake news, frictionless, General Magic , glass ceiling, Hacker Conference 1984, Hacker Ethic, Henry Singleton, Howard Rheingold, HyperCard, hypertext link, index card, informal economy, information retrieval, Ivan Sutherland, Jaron Lanier, Jeff Bezos, Jeff Rulifson, John Markoff, John Perry Barlow, Jony Ive, Kevin Kelly, Kickstarter, knowledge worker, Larry Ellison, life extension, Marc Andreessen, Marc Benioff, Mark Zuckerberg, Marshall McLuhan, Maui Hawaii, Menlo Park, Metcalfe’s law, Mondo 2000, Mother of all demos, move fast and break things, Neal Stephenson, Network effects, new economy, nuclear winter, off-the-grid, PageRank, Paul Buchheit, paypal mafia, peer-to-peer, Peter Thiel, pets.com, pez dispenser, popular electronics, quantum entanglement, random walk, reality distortion field, risk tolerance, Robert Metcalfe, rolodex, Salesforce, self-driving car, side project, Silicon Valley, Silicon Valley startup, skeuomorphism, skunkworks, Skype, Snow Crash, social graph, social web, South of Market, San Francisco, Startup school, Steve Jobs, Steve Jurvetson, Steve Wozniak, Steven Levy, Stewart Brand, Susan Wojcicki, synthetic biology, Ted Nelson, telerobotics, The future is already here, The Hackers Conference, the long tail, the new new thing, Tim Cook: Apple, Tony Fadell, tulip mania, V2 rocket, We are as Gods, Whole Earth Catalog, Whole Earth Review, Y Combinator

He showed off a way to edit text, a version of e-mail, even a primitive Skype. To modern eyes, Engelbart’s computer system looks pretty familiar, but to an audience used to punch cards and printouts it was a revelation. The computer could be more than a number cruncher; it could be a communications and information-retrieval tool. In one ninety-minute demo Engelbart shattered the military-industrial computing paradigm, and gave the hippies and freethinkers and radicals who were already gathering in Silicon Valley a vision of the future that would drive the culture of technology for the next several decades. Bob Taylor: There was about a thousand or more people in the audience and they were blown away.


Mining of Massive Datasets by Jure Leskovec, Anand Rajaraman, Jeffrey David Ullman

cloud computing, crowdsourcing, en.wikipedia.org, first-price auction, G4S, information retrieval, John Snow's cholera map, Netflix Prize, NP-complete, PageRank, pattern recognition, power law, random walk, recommendation engine, second-price auction, sentiment analysis, social graph, statistical model, the long tail, web application

Widom, Database Systems: The Complete Book, Second Edition, Prentice-Hall, Upper Saddle River, NJ, 2009. [4] D.E. Knuth, The Art of Computer Programming, Vol. 3 (Sorting and Searching), Second Edition, Addison-Wesley, Upper Saddle River, NJ, 1998. [5] C.D. Manning, P. Raghavan, and H. Schütze, Introduction to Information Retrieval, Cambridge University Press, 2008. [6] R.K. Merton, “The Matthew effect in science,” Science 159:3810, pp. 56–63, Jan. 5, 1968. [7] P.-N. Tan, M. Steinbach, and V. Kumar, Introduction to Data Mining, Addison-Wesley, Upper Saddle River, NJ, 2005. 1 This startup attempted to use machine learning to mine large-scale data, and hired many of the top machine-learning people to do so.


pages: 685 words: 203,949

The Organized Mind: Thinking Straight in the Age of Information Overload by Daniel J. Levitin

Abraham Maslow, airport security, Albert Einstein, Amazon Mechanical Turk, Anton Chekhov, autism spectrum disorder, Bayesian statistics, behavioural economics, big-box store, business process, call centre, Claude Shannon: information theory, cloud computing, cognitive bias, cognitive load, complexity theory, computer vision, conceptual framework, correlation does not imply causation, crowdsourcing, cuban missile crisis, Daniel Kahneman / Amos Tversky, data science, deep learning, delayed gratification, Donald Trump, en.wikipedia.org, epigenetics, Eratosthenes, Exxon Valdez, framing effect, friendly fire, fundamental attribution error, Golden Gate Park, Google Glasses, GPS: selective availability, haute cuisine, How many piano tuners are there in Chicago?, human-factors engineering, if you see hoof prints, think horses—not zebras, impulse control, index card, indoor plumbing, information retrieval, information security, invention of writing, iterative process, jimmy wales, job satisfaction, Kickstarter, language acquisition, Lewis Mumford, life extension, longitudinal study, meta-analysis, more computing power than Apollo, Network effects, new economy, Nicholas Carr, optical character recognition, Pareto efficiency, pattern recognition, phenotype, placebo effect, pre–internet, profit motive, randomized controlled trial, Rubik’s Cube, Salesforce, shared worldview, Sheryl Sandberg, Skype, Snapchat, social intelligence, statistical model, Steve Jobs, supply-chain management, the scientific method, The Wealth of Nations by Adam Smith, The Wisdom of Crowds, theory of mind, Thomas Bayes, traumatic brain injury, Turing test, Twitter Arab Spring, ultimatum game, Wayback Machine, zero-sum game

All bits are created equal After writing this, I discovered the same phrase “all bits are created equal” in Gleick, J. (2011). The information: A history, a theory, a flood. New York, NY: Vintage. Information has thus become separated from meaning Gleick writes “information is divorced from meaning.” He cites the technology philosopher Lewis Mumford from 1970: “Unfortunately, ‘information retrieving,’ however swift, is no substitute for discovering by direct personal inspection knowledge whose very existence one had possibly never been aware of, and following it at one’s own pace through the further ramification of relevant literature.” Gleick, J. (2011). The information: A history, a theory, a flood.


The Code: Silicon Valley and the Remaking of America by Margaret O'Mara

A Declaration of the Independence of Cyberspace, accounting loophole / creative accounting, affirmative action, Airbnb, Alan Greenspan, AltaVista, Alvin Toffler, Amazon Web Services, An Inconvenient Truth, AOL-Time Warner, Apple II, Apple's 1984 Super Bowl advert, autonomous vehicles, back-to-the-land, barriers to entry, Ben Horowitz, Berlin Wall, Big Tech, Black Lives Matter, Bob Noyce, Buckminster Fuller, Burning Man, business climate, Byte Shop, California gold rush, Californian Ideology, carried interest, clean tech, clean water, cloud computing, cognitive dissonance, commoditize, company town, Compatible Time-Sharing System, computer age, Computer Lib, continuous integration, cuban missile crisis, Danny Hillis, DARPA: Urban Challenge, deindustrialization, different worldview, digital divide, Do you want to sell sugared water for the rest of your life?, don't be evil, Donald Trump, Doomsday Clock, Douglas Engelbart, driverless car, Dynabook, Edward Snowden, El Camino Real, Elon Musk, en.wikipedia.org, Erik Brynjolfsson, Fairchild Semiconductor, Frank Gehry, Future Shock, Gary Kildall, General Magic , George Gilder, gig economy, Googley, Hacker Ethic, Hacker News, high net worth, hockey-stick growth, Hush-A-Phone, immigration reform, income inequality, industrial research laboratory, informal economy, information retrieval, invention of movable type, invisible hand, Isaac Newton, It's morning again in America, Jeff Bezos, Joan Didion, job automation, job-hopping, John Gilmore, John Markoff, John Perry Barlow, Julian Assange, Kitchen Debate, knowledge economy, knowledge worker, Larry Ellison, Laura Poitras, Lyft, Marc Andreessen, Mark Zuckerberg, market bubble, Mary Meeker, mass immigration, means of production, mega-rich, Menlo Park, Mikhail Gorbachev, military-industrial complex, millennium bug, Mitch Kapor, Mother of all demos, move fast and break things, mutually assured destruction, Neil Armstrong, new economy, Norbert Wiener, old-boy network, Palm Treo, pattern recognition, Paul Graham, Paul Terrell, paypal mafia, Peter Thiel, pets.com, pirate software, popular electronics, pre–internet, prudent man rule, Ralph Nader, RAND corporation, Richard Florida, ride hailing / ride sharing, risk tolerance, Robert Metcalfe, ROLM, Ronald Reagan, Salesforce, Sand Hill Road, Second Machine Age, self-driving car, shareholder value, Sheryl Sandberg, side hustle, side project, Silicon Valley, Silicon Valley ideology, Silicon Valley startup, skunkworks, Snapchat, social graph, software is eating the world, Solyndra, speech recognition, Steve Ballmer, Steve Jobs, Steve Wozniak, Steven Levy, Stewart Brand, Strategic Defense Initiative, supercomputer in your pocket, Susan Wojcicki, tacit knowledge, tech billionaire, tech worker, technoutopianism, Ted Nelson, TED Talk, the Cathedral and the Bazaar, the market place, the new new thing, The Soul of a New Machine, There's no reason for any individual to have a computer in his home - Ken Olsen, Thomas L Friedman, Tim Cook: Apple, Timothy McVeigh, transcontinental railway, Twitter Arab Spring, Uber and Lyft, uber lyft, Unsafe at Any Speed, upwardly mobile, Vannevar Bush, War on Poverty, Wargames Reagan, WarGames: Global Thermonuclear War, We wanted flying cars, instead we got 140 characters, Whole Earth Catalog, WikiLeaks, William Shockley: the traitorous eight, work culture , Y Combinator, Y2K

The firm had defense industry roots: founded by Martin Marietta president George Bunker and TRW vice president Simon Ramo, the firm was dedicated to what the two founders termed “a national need in the application of electronics to information handling.” An early client was NASA, for which Bunker Ramo built one of the world’s first computerized information retrieval systems, using the networked computer to classify and categorize large data sets a la Vannevar Bush’s memex.16 At first, the system Bunker Ramo designed for the dealers was simply another digital database that put paper stock tables on line. But when the firm added a feature that allowed brokers to buy and sell over the network, AT&T again cried foul.


pages: 843 words: 223,858

The Rise of the Network Society by Manuel Castells

air traffic controllers' union, Alan Greenspan, Apple II, Asian financial crisis, barriers to entry, Big bang: deregulation of the City of London, Bob Noyce, borderless world, British Empire, business cycle, capital controls, classic study, complexity theory, computer age, Computer Lib, computerized trading, content marketing, creative destruction, Credit Default Swap, declining real wages, deindustrialization, delayed gratification, dematerialisation, deskilling, digital capitalism, digital divide, disintermediation, double helix, Douglas Engelbart, Douglas Engelbart, edge city, experimental subject, export processing zone, Fairchild Semiconductor, financial deregulation, financial independence, floating exchange rates, future of work, gentrification, global village, Gunnar Myrdal, Hacker Ethic, hiring and firing, Howard Rheingold, illegal immigration, income inequality, independent contractor, Induced demand, industrial robot, informal economy, information retrieval, intermodal, invention of the steam engine, invention of the telephone, inventory management, Ivan Sutherland, James Watt: steam engine, job automation, job-hopping, John Markoff, John Perry Barlow, Kanban, knowledge economy, knowledge worker, labor-force participation, laissez-faire capitalism, Leonard Kleinrock, longitudinal study, low skilled workers, manufacturing employment, Marc Andreessen, Marshall McLuhan, means of production, megacity, Menlo Park, military-industrial complex, moral panic, new economy, New Urbanism, offshore financial centre, oil shock, open economy, packet switching, Pearl River Delta, peer-to-peer, planetary scale, popular capitalism, popular electronics, post-Fordism, post-industrial society, Post-Keynesian economics, postindustrial economy, prediction markets, Productivity paradox, profit maximization, purchasing power parity, RAND corporation, Recombinant DNA, Robert Gordon, Robert Metcalfe, Robert Solow, seminal paper, Shenzhen special economic zone , Shoshana Zuboff, Silicon Valley, Silicon Valley startup, social software, South China Sea, South of Market, San Francisco, special economic zone, spinning jenny, statistical model, Steve Jobs, Steve Wozniak, Strategic Defense Initiative, tacit knowledge, technological determinism, Ted Nelson, the built environment, the medium is the message, the new new thing, The Wealth of Nations by Adam Smith, Thomas Kuhn: the structure of scientific revolutions, total factor productivity, trade liberalization, transaction costs, urban renewal, urban sprawl, vertical integration, work culture , zero-sum game

From these he accepts only a few dozen each instant, from which to make an image.19 Because of the low definition of TV, McLuhan argued, viewers have to fill in the gaps in the image, thus becoming more emotionally involved in the viewing (what he, paradoxically, characterized as a “cool medium”). Such involvement does not contradict the hypothesis of the least effort because TV appeals to the associative/lyrical mind, not involving the psychological effort of information retrieving and analyzing to which Herbert Simon’s theory refers. This is why Neil Postman, a leading media scholar, considers that television represents an historical rupture with the typographic mind. While print favors systematic exposition, TV is best suited to casual conversation. To make the distinction sharply, in his own words: “Typography has the strongest possible bias towards exposition: a sophisticated ability to think conceptually, deductively and sequentially; a high valuation of reason and order; an abhorrence of contradiction; a large capacity for detachment and objectivity; and a tolerance for delayed response.”20 While for television, “entertainment is the supra-ideology of all discourse on television.


Red Rabbit by Tom Clancy, Scott Brick

anti-communist, battle of ideas, disinformation, diversified portfolio, false flag, Ignaz Semmelweis: hand washing, information retrieval, operational security, union organizing, urban renewal

Not that this would matter all that much to the corpse in question. "Wet" operations interfered with the main mission, which was gathering information. That was something people occasionally forgot, but something that CIA and KGB mainly understood, which was why both agencies had gotten away from it. But when the information retrieved frightened or otherwise upset the politicians who oversaw the intelligence services, then the spook shops were ordered to do things that they usually preferred to avoid—and so, then, they took their action through surrogates and/or mercenaries, mainly… "Arthur, if KGB wants to hurt the Pope, how do you suppose they'd go about it?"


pages: 933 words: 205,691

Hadoop: The Definitive Guide by Tom White

Amazon Web Services, bioinformatics, business intelligence, business logic, combinatorial explosion, data science, database schema, Debian, domain-specific language, en.wikipedia.org, exponential backoff, fallacies of distributed computing, fault tolerance, full text search, functional programming, Grace Hopper, information retrieval, Internet Archive, Kickstarter, Large Hadron Collider, linked data, loose coupling, openstreetmap, recommendation engine, RFID, SETI@home, social graph, sparse data, web application

Here are the contents of MaxTemperatureWithCounters_Temperature.properties:

CounterGroupName=Air Temperature Records
MISSING.name=Missing
MALFORMED.name=Malformed

Hadoop uses the standard Java localization mechanisms to load the correct properties for the locale you are running in, so, for example, you can create a Chinese version of the properties in a file named MaxTemperatureWithCounters_Temperature_zh_CN.properties, and they will be used when running in the zh_CN locale. Refer to the documentation for java.util.PropertyResourceBundle for more information.

Retrieving counters: In addition to being available via the web UI and the command line (using hadoop job -counter), you can retrieve counter values using the Java API. You can do this while the job is running, although it is more usual to get counters at the end of a job run, when they are stable.
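
The command-line route mentioned above is also scriptable from outside the JVM. As a small, hedged sketch, one could shell out to hadoop job -counter from Python; the job ID, group, and counter arguments in the commented call are placeholders, to be replaced with whatever the job actually defines.

import subprocess

def get_counter(job_id, group, counter):
    # Invokes: hadoop job -counter <job-id> <group-name> <counter-name>
    # and returns the printed value as an integer.
    result = subprocess.run(
        ["hadoop", "job", "-counter", job_id, group, counter],
        capture_output=True, text=True, check=True,
    )
    return int(result.stdout.strip())

# Placeholders only; supply the real job ID and the counter identifiers your job uses.
# value = get_counter("<job-id>", "<counter-group>", "<counter-name>")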


Seeking SRE: Conversations About Running Production Systems at Scale by David N. Blank-Edelman

Affordable Care Act / Obamacare, algorithmic trading, AlphaGo, Amazon Web Services, backpropagation, Black Lives Matter, Bletchley Park, bounce rate, business continuity plan, business logic, business process, cloud computing, cognitive bias, cognitive dissonance, cognitive load, commoditize, continuous integration, Conway's law, crowdsourcing, dark matter, data science, database schema, Debian, deep learning, DeepMind, defense in depth, DevOps, digital rights, domain-specific language, emotional labour, en.wikipedia.org, exponential backoff, fail fast, fallacies of distributed computing, fault tolerance, fear of failure, friendly fire, game design, Grace Hopper, imposter syndrome, information retrieval, Infrastructure as a Service, Internet of things, invisible hand, iterative process, Kaizen: continuous improvement, Kanban, Kubernetes, loose coupling, Lyft, machine readable, Marc Andreessen, Maslow's hierarchy, microaggression, microservices, minimum viable product, MVC pattern, performance metric, platform as a service, pull request, RAND corporation, remote working, Richard Feynman, risk tolerance, Ruby on Rails, Salesforce, scientific management, search engine result page, self-driving car, sentiment analysis, Silicon Valley, single page application, Snapchat, software as a service, software is eating the world, source of truth, systems thinking, the long tail, the scientific method, Toyota Production System, traumatic brain injury, value engineering, vertical integration, web application, WebSocket, zero day

One of their first questions would be something like, “Who are our customers? And why is getting the response in 10 seconds important for them?” Despite the fact that these questions came primarily from the business perspective, the information that questions like these reveal can change the game dramatically. What if this service is for an “information retrieval” development team whose purpose is to address the necessity of content validation on the search engine results page, to make sure that the new index serves only live links? And what if we download a page with a million links on it? Now we can see the conflict between the priorities in the SLA and those of the service’s purposes.


pages: 864 words: 222,565

Inventor of the Future: The Visionary Life of Buckminster Fuller by Alec Nevala-Lee

Adam Neumann (WeWork), Airbnb, Albert Einstein, Alvin Toffler, American energy revolution, Apple II, basic income, Biosphere 2, blockchain, British Empire, Buckminster Fuller, Burning Man, Charles Lindbergh, cloud computing, Columbine, complexity theory, Computer Lib, coronavirus, cotton gin, COVID-19, cryptocurrency, declining real wages, digital nomad, double helix, Douglas Engelbart, Douglas Engelbart, East Village, Electric Kool-Aid Acid Test, Elon Musk, Evgeny Morozov, Frank Gehry, gentrification, gig economy, global village, Golden Gate Park, Henry Ford's grandson gave labor union leader Walter Reuther a tour of the company’s new, automated factory…, hydraulic fracturing, index card, information retrieval, James Dyson, Jane Jacobs, Jaron Lanier, Jeff Bezos, John Markoff, Kitchen Debate, Lao Tzu, lateral thinking, Lean Startup, Lewis Mumford, Mark Zuckerberg, Marshall McLuhan, megastructure, Menlo Park, minimum viable product, Mother of all demos, Neil Armstrong, New Journalism, Norbert Wiener, Norman Mailer, Own Your Own Home, Paul Graham, public intellectual, Ralph Waldo Emerson, reality distortion field, remote working, Ronald Reagan, side project, Silicon Valley, Steve Jobs, Steve Wozniak, Steven Levy, Stewart Brand, Ted Nelson, the built environment, The Death and Life of Great American Cities, the medium is the message, Thomas Malthus, universal basic income, urban planning, urban renewal, We are as Gods, WeWork, Whole Earth Catalog, WikiLeaks

Unlike the student activists of the New Left, who emphasized politics and protest, its fans—described by one of Brand’s colleagues as “baling wire hippies”—were drawn to technology, and they would advance far beyond Fuller’s sense of what computers could be. On December 9, 1968, Brand assisted with a talk at the Joint Computer Conference in San Francisco by Douglas Engelbart, who treated computers as tools for communication and information retrieval, rather than for data processing alone. Along with advising on logistics, Brand operated a camera that provided a live feed from Menlo Park as Engelbart demonstrated windows, hypertext, and the mouse. At first, its impact was limited to a handful of researchers, but the presentation would be known one day as the Mother of All Demos


pages: 857 words: 232,302

The Evolutionary Void by Peter F. Hamilton

clean water, information retrieval, Kickstarter, megacity, Neil Armstrong, orbital mechanics / astrodynamics, pattern recognition, plutocrats, trade route, urban sprawl

Whatever.” The Delivery Man was mildly puzzled by Gore’s lack of focus. It wasn’t like him at all. “All right. So what I was thinking is that there has to be some kind of web and database in the cities.” “There is. You can’t access it.” “Why not?” “The AIs are sentient. They won’t allow any information retrieval.” “That’s stupid.” “From our point of view, yes, but they’re the same as the borderguards: They maintain the homeworld’s sanctity; the AIs keep the Anomine’s information safe.” “Why?” “Because that’s what the Anomine do; that’s what they are. They’re entitled to protect what they’ve built, same as anyone.”


pages: 761 words: 231,902

The Singularity Is Near: When Humans Transcend Biology by Ray Kurzweil

additive manufacturing, AI winter, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, Albert Einstein, anthropic principle, Any sufficiently advanced technology is indistinguishable from magic, artificial general intelligence, Asilomar, augmented reality, autonomous vehicles, backpropagation, Benoit Mandelbrot, Bill Joy: nanobots, bioinformatics, brain emulation, Brewster Kahle, Brownian motion, business cycle, business intelligence, c2.com, call centre, carbon-based life, cellular automata, Charles Babbage, Claude Shannon: information theory, complexity theory, conceptual framework, Conway's Game of Life, coronavirus, cosmological constant, cosmological principle, cuban missile crisis, data acquisition, Dava Sobel, David Brooks, Dean Kamen, digital divide, disintermediation, double helix, Douglas Hofstadter, en.wikipedia.org, epigenetics, factory automation, friendly AI, functional programming, George Gilder, Gödel, Escher, Bach, Hans Moravec, hype cycle, informal economy, information retrieval, information security, invention of the telephone, invention of the telescope, invention of writing, iterative process, Jaron Lanier, Jeff Bezos, job automation, job satisfaction, John von Neumann, Kevin Kelly, Law of Accelerating Returns, life extension, lifelogging, linked data, Loebner Prize, Louis Pasteur, mandelbrot fractal, Marshall McLuhan, Mikhail Gorbachev, Mitch Kapor, mouse model, Murray Gell-Mann, mutually assured destruction, natural language processing, Network effects, new economy, Nick Bostrom, Norbert Wiener, oil shale / tar sands, optical character recognition, PalmPilot, pattern recognition, phenotype, power law, precautionary principle, premature optimization, punch-card reader, quantum cryptography, quantum entanglement, radical life extension, randomized controlled trial, Ray Kurzweil, remote working, reversible computing, Richard Feynman, Robert Metcalfe, Rodney Brooks, scientific worldview, Search for Extraterrestrial Intelligence, selection bias, semantic web, seminal paper, Silicon Valley, Singularitarianism, speech recognition, statistical model, stem cell, Stephen Hawking, Stewart Brand, strong AI, Stuart Kauffman, superintelligent machines, technological singularity, Ted Kaczynski, telepresence, The Coming Technological Singularity, Thomas Bayes, transaction costs, Turing machine, Turing test, two and twenty, Vernor Vinge, Y2K, Yogi Berra

John Smith, director of the ABC Institute—you last saw him six months ago at the XYZ conference" or, "That's the Time-Life Building—your meeting is on the tenth floor." We'll have real-time translation of foreign languages, essentially subtitles on the world, and access to many forms of online information integrated into our daily activities. Virtual personalities that overlay the real world will help us with information retrieval and our chores and transactions. These virtual assistants won't always wait for questions and directives but will step forward if they see us struggling to find a piece of information. (As we wonder about "That actress ... who played the princess, or was it the queen ... in that movie with the robot," our virtual assistant may whisper in our ear or display in our visual field of view: "Natalie Portman as Queen Amidala in Star Wars, episodes 1, 2, and 3.")


pages: 496 words: 174,084

Masterminds of Programming: Conversations With the Creators of Major Programming Languages by Federico Biancuzzi, Shane Warden

Benevolent Dictator For Life (BDFL), business intelligence, business logic, business process, cellular automata, cloud computing, cognitive load, commoditize, complexity theory, conceptual framework, continuous integration, data acquisition, Dennis Ritchie, domain-specific language, Douglas Hofstadter, Fellow of the Royal Society, finite state, Firefox, follow your passion, Frank Gehry, functional programming, general-purpose programming language, Guido van Rossum, higher-order functions, history of Unix, HyperCard, industrial research laboratory, information retrieval, information security, iterative process, Ivan Sutherland, John von Neumann, Ken Thompson, Larry Ellison, Larry Wall, linear programming, loose coupling, machine readable, machine translation, Mars Rover, millennium bug, Multics, NP-complete, Paul Graham, performance metric, Perl 6, QWERTY keyboard, RAND corporation, randomized controlled trial, Renaissance Technologies, Ruby on Rails, Sapir-Whorf hypothesis, seminal paper, Silicon Valley, slashdot, software as a service, software patent, sorting algorithm, SQL injection, Steve Jobs, traveling salesman, Turing complete, type inference, Valgrind, Von Neumann architecture, web application

When I write a line of code, I need to rely on understanding what it’s going to do. Don: Well, there are applications where determinism is important and applications where it is not. Traditionally there has been a dividing line between what you might call databases and what you might call information retrieval. Certainly both of those are flourishing fields and they have their respective uses. XQuery and XML Will XML affect the way we use search engines in the future? Don: I think it’s possible. Search engines already exploit the kinds of metadata that are included in HTML tags such as hyperlinks.


pages: 1,201 words: 233,519

Coders at Work by Peter Seibel

Ada Lovelace, Bill Atkinson, bioinformatics, Bletchley Park, Charles Babbage, cloud computing, Compatible Time-Sharing System, Conway's Game of Life, Dennis Ritchie, domain-specific language, don't repeat yourself, Donald Knuth, fallacies of distributed computing, fault tolerance, Fermat's Last Theorem, Firefox, Free Software Foundation, functional programming, George Gilder, glass ceiling, Guido van Rossum, history of Unix, HyperCard, industrial research laboratory, information retrieval, Ken Thompson, L Peter Deutsch, Larry Wall, loose coupling, Marc Andreessen, Menlo Park, Metcalfe's law, Multics, no silver bullet, Perl 6, premature optimization, publish or perish, random walk, revision control, Richard Stallman, rolodex, Ruby on Rails, Saturday Night Live, side project, slashdot, speech recognition, systems thinking, the scientific method, Therac-25, Turing complete, Turing machine, Turing test, type inference, Valgrind, web application

If you don't feel really pretty comfortable swimming around in that world, maybe programming isn't what you should be doing. Seibel: Did you have any important mentors? Deutsch: There were two people. One of them is someone who's no longer around; his name was Calvin Mooers. He was an early pioneer in information systems. I believe he is credited with actually coining the term information retrieval. His background was originally in library science. I met him when I was, I think, high-school or college age. He had started to design a programming language that he thought would be usable directly by just people. But he didn't know anything about programming languages. And at that point, I did because I had built this Lisp system and I'd studied some other programming languages.


pages: 903 words: 235,753

The Stack: On Software and Sovereignty by Benjamin H. Bratton

1960s counterculture, 3D printing, 4chan, Ada Lovelace, Adam Curtis, additive manufacturing, airport security, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, algorithmic trading, Amazon Mechanical Turk, Amazon Robotics, Amazon Web Services, Andy Rubin, Anthropocene, augmented reality, autonomous vehicles, basic income, Benevolent Dictator For Life (BDFL), Berlin Wall, bioinformatics, Biosphere 2, bitcoin, blockchain, Buckminster Fuller, Burning Man, call centre, capitalist realism, carbon credits, carbon footprint, carbon tax, carbon-based life, Cass Sunstein, Celebration, Florida, Charles Babbage, charter city, clean water, cloud computing, company town, congestion pricing, connected car, Conway's law, corporate governance, crowdsourcing, cryptocurrency, dark matter, David Graeber, deglobalization, dematerialisation, digital capitalism, digital divide, disintermediation, distributed generation, don't be evil, Douglas Engelbart, driverless car, Edward Snowden, Elon Musk, en.wikipedia.org, Eratosthenes, Ethereum, ethereum blockchain, Evgeny Morozov, facts on the ground, Flash crash, Frank Gehry, Frederick Winslow Taylor, fulfillment center, functional programming, future of work, Georg Cantor, gig economy, global supply chain, Google Earth, Google Glasses, Guggenheim Bilbao, High speed trading, high-speed rail, Hyperloop, Ian Bogost, illegal immigration, industrial robot, information retrieval, Intergovernmental Panel on Climate Change (IPCC), intermodal, Internet of things, invisible hand, Jacob Appelbaum, James Bridle, Jaron Lanier, Joan Didion, John Markoff, John Perry Barlow, Joi Ito, Jony Ive, Julian Assange, Khan Academy, Kim Stanley Robinson, Kiva Systems, Laura Poitras, liberal capitalism, lifelogging, linked data, lolcat, Mark Zuckerberg, market fundamentalism, Marshall McLuhan, Masdar, McMansion, means of production, megacity, megaproject, megastructure, Menlo Park, Minecraft, MITM: man-in-the-middle, Monroe Doctrine, Neal Stephenson, Network effects, new economy, Nick Bostrom, ocean acidification, off-the-grid, offshore financial centre, oil shale / tar sands, Oklahoma City bombing, OSI model, packet switching, PageRank, pattern recognition, peak oil, peer-to-peer, performance metric, personalized medicine, Peter Eisenman, Peter Thiel, phenotype, Philip Mirowski, Pierre-Simon Laplace, place-making, planetary scale, pneumatic tube, post-Fordism, precautionary principle, RAND corporation, recommendation engine, reserve currency, rewilding, RFID, Robert Bork, Sand Hill Road, scientific management, self-driving car, semantic web, sharing economy, Silicon Valley, Silicon Valley ideology, skeuomorphism, Slavoj Žižek, smart cities, smart grid, smart meter, Snow Crash, social graph, software studies, South China Sea, sovereign wealth fund, special economic zone, spectrum auction, Startup school, statistical arbitrage, Steve Jobs, Steven Levy, Stewart Brand, Stuxnet, Superbowl ad, supply-chain management, supply-chain management software, synthetic biology, TaskRabbit, technological determinism, TED Talk, the built environment, The Chicago School, the long tail, the scientific method, Torches of Freedom, transaction costs, Turing complete, Turing machine, Turing test, undersea cable, universal basic income, urban planning, Vernor Vinge, vertical integration, warehouse automation, warehouse robotics, Washington Consensus, web application, Westphalian system, WikiLeaks, working poor, Y Combinator, yottabyte

., telcos, states, standards bodies, hardware original equipment manufacturers, and cloud software platforms) all play different roles and control hardware and software applications in different ways and toward different ends. Internet backbone is generally provided and shared by tier 1 bandwidth providers (such as telcos), but one key trend is for very large platforms, such as Google, to bypass other actors and architect complete end-to-end networks, from browser, to fiber, to data center, such that information retrieval, composition, and analysis are consolidated and optimized on private loops. Consider that if Google's own networks, both internal and external, were compared to others, they would represent one of the largest Internet service providers in the world, and by the time this sentence is published, they may very well be the largest.


pages: 891 words: 253,901

The Devil's Chessboard: Allen Dulles, the CIA, and the Rise of America's Secret Government by David Talbot

Albert Einstein, anti-communist, Berlin Wall, Bletchley Park, Bretton Woods, British Empire, Charles Lindbergh, colonial rule, Cornelius Vanderbilt, cuban missile crisis, disinformation, Dr. Strangelove, drone strike, independent contractor, information retrieval, Internet Archive, land reform, means of production, Naomi Klein, Norman Mailer, operation paperclip, Ralph Waldo Emerson, RAND corporation, Ted Sorensen

Army and the CIA. The top secret work conducted by the SO Division included research on LSD-induced mind control, assassination toxins, and biological warfare agents like those allegedly being used in Korea. Olson’s division also was involved in research that was euphemistically labeled “information retrieval”—extreme methods of extracting intelligence from uncooperative captives. For the past two years, Olson had been traveling to secret centers in Europe where Soviet prisoners and other human guinea pigs were subjected to these experimental interrogation methods. Dulles began spearheading this CIA research even before he became director of the agency, under a secret program that preceded MKULTRA code-named Operation Artichoke, after the spymaster’s favorite vegetable.


pages: 982 words: 221,145

Ajax: The Definitive Guide by Anthony T. Holdener

AltaVista, Amazon Web Services, business logic, business process, centre right, Citizen Lab, Colossal Cave Adventure, create, read, update, delete, database schema, David Heinemeier Hansson, en.wikipedia.org, Firefox, full text search, game design, general-purpose programming language, Guido van Rossum, information retrieval, loose coupling, machine readable, MVC pattern, Necker cube, p-value, Ruby on Rails, SimCity, slashdot, social bookmarking, sorting algorithm, SQL injection, Wayback Machine, web application

You know nothing about this site until you dig further by following the links on the page. The point of a business site’s main page is to grab your attention with a central focus: We do web design. Our specialty is architectural engineering. We sell fluffy animals. Regardless of the focus, it should be readily apparent. * Chris Roast, “Designing for Delay in Interactive Information Retrieval,” Interacting with Computers 10 (1998): 87–104. “Need for Speed I,” Zona Research, Zona Market Bulletin (1999). “Need for Speed II,” Zona Research, Zona Market Bulletin (2001). Jonathan Klein, Youngme Moon, and Rosalind W. Picard, “This Computer Responds to User Frustration: Theory, Design, and Results,” Interacting with Computers 14 (2) (2002): 119–140. Obscurity: This can cover two different problems you do not want for your application.


pages: 1,073 words: 314,528

Strategy: A History by Lawrence Freedman

Albert Einstein, anti-communist, Anton Chekhov, Ayatollah Khomeini, barriers to entry, battle of ideas, behavioural economics, Black Swan, Blue Ocean Strategy, British Empire, business process, butterfly effect, centre right, Charles Lindbergh, circulation of elites, cognitive dissonance, coherent worldview, collective bargaining, complexity theory, conceptual framework, Cornelius Vanderbilt, corporate raider, correlation does not imply causation, creative destruction, cuban missile crisis, Daniel Kahneman / Amos Tversky, defense in depth, desegregation, disinformation, Dr. Strangelove, Edward Lorenz: Chaos theory, en.wikipedia.org, endogenous growth, endowment effect, escalation ladder, Ford Model T, Ford paid five dollars a day, framing effect, Frederick Winslow Taylor, Gordon Gekko, greed is good, Herbert Marcuse, Herman Kahn, Ida Tarbell, information retrieval, interchangeable parts, invisible hand, John Nash: game theory, John von Neumann, Kenneth Arrow, lateral thinking, linear programming, loose coupling, loss aversion, Mahatma Gandhi, means of production, mental accounting, Murray Gell-Mann, mutually assured destruction, Nash equilibrium, Nelson Mandela, Norbert Wiener, Norman Mailer, oil shock, Pareto efficiency, performance metric, Philip Mirowski, prisoner's dilemma, profit maximization, race to the bottom, Ralph Nader, RAND corporation, Richard Thaler, road to serfdom, Ronald Reagan, Rosa Parks, scientific management, seminal paper, shareholder value, social contagion, social intelligence, Steven Pinker, strikebreaker, The Chicago School, The Myth of the Rational Market, the scientific method, theory of mind, Thomas Davenport, Thomas Kuhn: the structure of scientific revolutions, Torches of Freedom, Toyota Production System, transaction costs, Twitter Arab Spring, ultimatum game, unemployed young men, Upton Sinclair, urban sprawl, Vilfredo Pareto, W. E. B. Du Bois, War on Poverty, women in the workforce, Yogi Berra, zero-sum game

They operated quickly and automatically when needed, managing cognitive tasks of great complexity and evaluating situations and options before they reached consciousness. This referred to not one but a number of processes, perhaps with different evolutionary roots, ranging from simple forms of information retrieval to complex mental representations.43 They all involved the extraordinary computational and storage power of the brain, drawing on past learning and experiences, picking up on and interpreting cues and signals from the environment, suggesting appropriate and effective behavior, and enabling individuals to cope with the circumstances in which they might find themselves without having to deliberate on every move.


pages: 1,199 words: 332,563

Golden Holocaust: Origins of the Cigarette Catastrophe and the Case for Abolition by Robert N. Proctor

"RICO laws" OR "Racketeer Influenced and Corrupt Organizations", bioinformatics, carbon footprint, clean water, corporate social responsibility, Deng Xiaoping, desegregation, disinformation, Dr. Strangelove, facts on the ground, friendly fire, germ theory of disease, global pandemic, index card, Indoor air pollution, information retrieval, invention of gunpowder, John Snow's cholera map, language of flowers, life extension, New Journalism, optical character recognition, pink-collar, Ponzi scheme, Potemkin village, precautionary principle, publication bias, Ralph Nader, Ronald Reagan, selection bias, speech recognition, stem cell, telemarketer, Thomas Kuhn: the structure of scientific revolutions, Triangle Shirtwaist Factory, Upton Sinclair, vertical integration, Yogi Berra

The Ad Hoc Committee was also responsible for helping to locate medical witnesses and prepare testimony. Edwin Jacob from Jacob, Medinger & Finnegan supervised the Central File with financial support from all parties to the conspiracy. Responsibility for maintaining the Central File Information Center in 1971 was transferred to the CTR, which managed “informational retrieval” and maintenance through a CTR Special Project, organized as part of a new Information Systems division, by which means the CTR became a crucial resource for the industry’s effort to defend itself against litigation. See Kessler’s “Amended Final Opinion,” pp. 165–68. 46. “Congressional Preparation,” Jan. 26, 1968, Bates 955007434–7439; F.


pages: 2,054 words: 359,149

The Art of Software Security Assessment: Identifying and Preventing Software Vulnerabilities by Justin Schuh

address space layout randomization, Albert Einstein, Any sufficiently advanced technology is indistinguishable from magic, bash_history, business logic, business process, database schema, Debian, defense in depth, en.wikipedia.org, Firefox, information retrieval, information security, iterative process, Ken Thompson, loose coupling, MITM: man-in-the-middle, Multics, MVC pattern, off-by-one error, operational security, OSI model, RFC: Request For Comment, slashdot, SQL injection, web application

In fact, this type of error is even more relevant in RPC because many factors can cause impersonation functions to fail. Context Handles and State Before you go any further, you need to see how RPC keeps state information about connected clients. RPC is inherently stateless, but it does provide explicit mechanisms for maintaining state. This state information might include session information retrieved from a database or information on whether a client has called procedures in the correct sequence. The typical RPC mechanism for maintaining state is the context handle, a unique token a client can supply to a server that’s similar in function to a session ID stored in an HTTP cookie. From the server’s point of view, the context handle is a pointer to the associated data for that client, so no special translation of the context handle is necessary.
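As a rough illustration of the session-ID analogy drawn in this excerpt (a sketch of my own, not the book's code or the actual RPC runtime interface; names such as ContextStore and open_context are invented), a server can keep all per-client state on its own side and hand the client nothing but an opaque token:

    import secrets

    class ContextStore:
        # Illustrative sketch only: the server returns an opaque token to the
        # client and keeps the real per-client state in a private table, so the
        # client never holds a raw pointer it could tamper with.
        def __init__(self):
            self._contexts = {}

        def open_context(self, user):
            handle = secrets.token_hex(16)          # opaque handle given to the client
            self._contexts[handle] = {"user": user, "last_call": None}
            return handle

        def lookup(self, handle):
            ctx = self._contexts.get(handle)
            if ctx is None:
                raise KeyError("unknown or expired context handle")
            return ctx

        def close_context(self, handle):
            self._contexts.pop(handle, None)

    store = ContextStore()
    h = store.open_context("alice")
    print(store.lookup(h))      # server-side state is reachable only through the token

State such as "has this client called procedures in the correct sequence" would live in that server-side record, never in anything the client can edit directly.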


The Art of Computer Programming by Donald Ervin Knuth

Abraham Wald, Brownian motion, Charles Babbage, complexity theory, correlation coefficient, Donald Knuth, Eratosthenes, G4S, Georg Cantor, information retrieval, Isaac Newton, iterative process, John von Neumann, Louis Pasteur, mandelbrot fractal, Menlo Park, NP-complete, P = NP, Paul Erdős, probability theory / Blaise Pascal / Pierre de Fermat, RAND corporation, random walk, sorting algorithm, Turing machine, Y2K

I. Collision test. Chi-square tests can be made only when a nontrivial number of items are expected in each category. But another kind of test can be used when the number of categories is much larger than the number of observations; this test is related to "hashing," an important method for information retrieval that we shall study in Section 6.4. Suppose we have m urns and we throw n balls at random into those urns, where m is much greater than n. Most of the balls will land in urns that were previously empty, but if a ball falls into an urn that already contains at least one ball we say that a "collision" has occurred.
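The urn experiment described here is easy to simulate. The sketch below is my own illustration rather than code from the book (function name and parameters are arbitrary): it throws n balls into m urns with m much larger than n and counts collisions; a generator under test would supply the urn choices, and its collision count would then be compared against the expected value.

    import random

    def count_collisions(m, n, seed=2024):
        # Throw n balls into m urns at random (here using Python's own generator
        # purely for illustration) and count how many balls land in an urn that
        # already holds at least one ball.
        rng = random.Random(seed)
        occupied = set()
        collisions = 0
        for _ in range(n):
            urn = rng.randrange(m)
            if urn in occupied:
                collisions += 1
            else:
                occupied.add(urn)
        return collisions

    m, n = 2**20, 2**14                       # m much greater than n
    expected = n - m * (1 - (1 - 1/m)**n)     # roughly n*n/(2*m) when n << m
    print(count_collisions(m, n), round(expected, 1))

A generator whose observed collision count falls far from the expected value (about 128 for these parameters) is suspect, which is the point of the test.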


pages: 889 words: 433,897

The Best of 2600: A Hacker Odyssey by Emmanuel Goldstein

affirmative action, Apple II, benefit corporation, call centre, disinformation, don't be evil, Firefox, game design, Hacker Ethic, hiring and firing, information retrieval, information security, John Markoff, John Perry Barlow, late fees, license plate recognition, Mitch Kapor, MITM: man-in-the-middle, Oklahoma City bombing, optical character recognition, OSI model, packet switching, pirate software, place-making, profit motive, QWERTY keyboard, RFID, Robert Hanssen: Double agent, rolodex, Ronald Reagan, satellite internet, Silicon Valley, Skype, spectrum auction, statistical model, Steve Jobs, Steve Wozniak, Steven Levy, Telecommunications Act of 1996, telemarketer, undersea cable, UUNET, Y2K

Immediately, you’ll get a list of everyone with that name, as well as their city and state, which often don’t fit properly on the line. There are no reports of any wildcards that allow you to see everybody at once. (The closest thing is *R, which will show all of the usernames that you’re sending to.) It’s also impossible for a user not to be seen if you get his name or alias right. It’s a good free information retrieval system. But there’s more. MCI Mail can also be used as a free word processor of sorts. The system will allow you to enter a letter, or for that matter, a manuscript. You can then hang up and do other things, come back within 24 hours, and your words will still be there. You can conceivably list them out using your own printer on a fresh sheet of paper and send it through the mail all by yourself, thus sparing MCI Mail’s laser printer the trouble.