optical character recognition

Mastering Machine Learning With Scikit-Learn by Gavin Hackeling

backpropagation, computer vision, constrained optimization, correlation coefficient, data science, Debian, deep learning, distributed generation, iterative process, natural language processing, Occam's razor, optical character recognition, performance metric, recommendation engine

Extracting features from pixel intensities

A digital image is usually a raster, or pixmap, that maps colors to coordinates on a grid. An image can be viewed as a matrix in which each element represents a color. A basic feature representation for an image can be constructed by reshaping the matrix into a vector by concatenating its rows together. Optical character recognition (OCR) is a canonical machine learning problem. Let's use this technique to create basic feature representations that could be used in an OCR application for recognizing hand-written digits in character-delimited forms. The digits dataset included with scikit-learn contains grayscale images of more than 1,700 hand-written digits between zero and nine.
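The row-concatenation step can be sketched in plain Python. The 8×8 grid below is a made-up "7"-like glyph on a 0 to 16 intensity scale (scikit-learn's digits images are also 8×8, but these values are invented), and `flatten_image` is a hypothetical helper name. With NumPy the same operation is `image.reshape(-1)`.

```python
def flatten_image(image):
    """Concatenate the rows of a 2-D grid of pixel intensities into one vector."""
    return [pixel for row in image for pixel in row]

# Hypothetical 8x8 grayscale glyph (values invented for illustration).
image = [
    [0, 9, 16, 16, 16, 16, 9, 0],
    [0, 0, 0, 0, 0, 14, 6, 0],
    [0, 0, 0, 0, 5, 15, 1, 0],
    [0, 0, 0, 1, 15, 6, 0, 0],
    [0, 0, 0, 9, 13, 0, 0, 0],
    [0, 0, 2, 16, 4, 0, 0, 0],
    [0, 0, 8, 14, 0, 0, 0, 0],
    [0, 0, 11, 9, 0, 0, 0, 0],
]

features = flatten_image(image)
print(len(features))    # 64: one feature per pixel
print(features[8:16])   # the second row now occupies positions 8-15
```

Every pixel becomes its own explanatory variable, which is why this representation grows quickly with image size.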

…

You learned to preprocess text by filtering stop words and stemming tokens, and you also replaced the term counts in our feature vectors with TF-IDF weights that penalize common words and normalize for documents of different lengths. Next, we created feature vectors for images. We began with an optical character recognition problem in which we represented images of hand-written digits with flattened matrices of pixel intensities. This is a computationally costly approach. We improved our representations of images by extracting only their most interesting points as SURF descriptors. Finally, you learned to standardize data to ensure that our estimators can learn from all of the explanatory variables and can converge as quickly as possible.
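Standardizing usually means rescaling each explanatory variable to zero mean and unit variance. The following plain-Python sketch does this column by column. `standardize` is a hypothetical helper; scikit-learn's `preprocessing.scale` and `StandardScaler` perform the equivalent operation on arrays, and like this sketch they use the population standard deviation.

```python
from statistics import fmean, pstdev

def standardize(rows):
    """Rescale each column to zero mean and unit (population) variance."""
    columns = list(zip(*rows))
    means = [fmean(col) for col in columns]
    stds = [pstdev(col) for col in columns]
    return [
        # Constant columns (std == 0) are only centered, to avoid dividing by zero.
        [(x - m) / s if s else x - m for x, m, s in zip(row, means, stds)]
        for row in rows
    ]

# Two features on very different scales (values made up).
X = [[1.0, 200.0], [2.0, 400.0], [3.0, 600.0]]
for row in standardize(X):
    print(row)
```

After rescaling, both columns have identical values, so neither dominates a distance- or gradient-based estimator merely because of its units.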

…

The SMO algorithm breaks the optimization problem down into a series of the smallest possible subproblems, each involving just two Lagrange multipliers, which are then solved analytically.

Classifying characters in scikit-learn

Let's apply support vector machines to a classification problem. In recent years, support vector machines have been used successfully in the task of character recognition. Given an image, the classifier must predict the character that is depicted. Character recognition is a component of many optical character-recognition systems. Even small images require high-dimensional representations when raw pixel intensities are used as features. If the classes are linearly inseparable and must be mapped to a higher-dimensional feature space, the number of dimensions can become even larger. Fortunately, SVMs are suited to working with such data efficiently.
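A common reason SVMs cope with such mappings is the kernel trick: a kernel returns the inner product in the mapped space without ever building the mapped vectors. This illustrative sketch is not taken from the text. It uses a degree-2 polynomial kernel, (x · z + 1)^2, checks it against an explicit feature map on a tiny vector, and counts the explicit features a 20×20 image (a size chosen only for illustration) would need.

```python
from itertools import combinations
from math import comb, isclose, sqrt

def poly2_kernel(x, z):
    """Degree-2 polynomial kernel (x . z + 1)^2, computed in O(n)."""
    return (sum(a * b for a, b in zip(x, z)) + 1) ** 2

def poly2_features(x):
    """Explicit feature map whose inner product equals poly2_kernel."""
    features = [1.0]                                     # constant term
    features += [sqrt(2) * a for a in x]                 # linear terms
    features += [a * a for a in x]                       # squared terms
    features += [sqrt(2) * x[i] * x[j]                   # cross terms, i < j
                 for i, j in combinations(range(len(x)), 2)]
    return features

x, z = [0.2, 0.7, 0.1], [0.9, 0.4, 0.5]
explicit = sum(a * b for a, b in zip(poly2_features(x), poly2_features(z)))
print(isclose(poly2_kernel(x, z), explicit))  # True

# Explicit map size for 400 raw pixels versus the closed form C(n + 2, 2).
n_pixels = 20 * 20
print(len(poly2_features([0.0] * n_pixels)), comb(n_pixels + 2, 2))  # 80601 80601
```

The kernel needs one 400-term dot product, while the explicit map would need vectors with 80,601 entries. SVM training and prediction only ever use such inner products.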

Programming Computer Vision with Python by Jan Erik Solem

augmented reality, computer vision, database schema, en.wikipedia.org, optical character recognition, pattern recognition, text mining, Thomas Bayes, web application

Now if we apply PCA to reduce the dimensions to 50, as we did in 8.2 Bayes Classifier, the accuracy changes to:

Accuracy: 0.890052356021

Not bad, considering that the feature vectors are about 200 times smaller than the original data (and the space needed to store the support vectors is then also about 200 times smaller).

8.4 Optical Character Recognition

As an example of a multi-class problem, let’s look at interpreting images of Sudokus. Optical character recognition (OCR) is the process of interpreting images of hand- or machine-written text. A common example is text extraction from scanned documents, such as ZIP codes on letters or book pages such as the library volumes in Google Books (http://books.google.com/).
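The PCA reduction to 50 dimensions mentioned before 8.4 can be sketched with NumPy's SVD: center the data, then project it onto the leading principal directions. This is a generic sketch, not the book's own implementation, and the random 300×400 data matrix is a stand-in.

```python
import numpy as np

def pca_project(X, n_components):
    """Project the rows of X onto their top principal components via SVD."""
    mean = X.mean(axis=0)
    centered = X - mean
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    components = Vt[:n_components]
    return centered @ components.T, components, mean

rng = np.random.default_rng(0)
X = rng.random((300, 400))               # 300 hypothetical samples, 400 raw features
projected, components, mean = pca_project(X, 50)
print(X.shape, "->", projected.shape)    # (300, 400) -> (300, 50)
```

New samples must be projected with the same `mean` and `components`, computed as `(x - mean) @ components.T`, so that training and test features live in the same reduced space.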

…

What You Will Learn: hands-on programming with images using Python; the computer vision techniques behind a wide variety of real-world applications; and many of the fundamental algorithms, along with how to implement and apply them yourself. The code examples in this book will show you object recognition, content-based image retrieval, image search, optical character recognition, optical flow, tracking, 3D reconstruction, stereo imaging, augmented reality, pose estimation, panorama creation, image segmentation, de-noising, image grouping, and more.

Chapter Overview: Chapter 1 introduces the basic tools for working with images and the central Python modules used in the book.

…

If you have two cameras, mount them in a stereo rig and capture stereo image pairs using cv2.VideoCapture() with different video device IDs. Try 0 and 1 for starters. Compute depth maps for some varying scenes. Use Hu moments with cv2.HuMoments() as features for the Sudoku OCR classification problem in 8.4 Optical Character Recognition and check the performance. OpenCV has an implementation of the GrabCut segmentation algorithm. Use the function cv2.grabCut() on the Microsoft Research GrabCut dataset (see 9.1 Graph Cuts). Hopefully you will get better results than the low-resolution segmentation in our examples.
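`cv2.HuMoments()` takes the result of `cv2.moments()` and returns the seven Hu invariants. These are designed to be unchanged by translation, scale, and rotation: exactly so for translations and 90° rotations on a pixel grid, and only approximately for scaling. To show what those features are without requiring OpenCV, here is a pure-Python sketch of the standard formulas. `hu_moments`, `draw_l`, and the L-shaped test glyph are hypothetical; for real use, prefer `cv2.HuMoments(cv2.moments(img))`.

```python
from math import isclose

def hu_moments(image):
    """Seven Hu moment invariants of a 2-D grid (row index = y, column index = x)."""
    pixels = [(x, y, v) for y, row in enumerate(image) for x, v in enumerate(row) if v]
    m00 = sum(v for _, _, v in pixels)                     # total intensity
    xc = sum(x * v for x, _, v in pixels) / m00            # centroid
    yc = sum(y * v for _, y, v in pixels) / m00

    def eta(p, q):
        # Central moment (translation-invariant), normalized by m00 for scale.
        mu = sum((x - xc) ** p * (y - yc) ** q * v for x, y, v in pixels)
        return mu / m00 ** (1 + (p + q) / 2)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    n30, n03, n21, n12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    a, b = n30 + n12, n21 + n03
    return [
        n20 + n02,
        (n20 - n02) ** 2 + 4 * n11 ** 2,
        (n30 - 3 * n12) ** 2 + (3 * n21 - n03) ** 2,
        a ** 2 + b ** 2,
        (n30 - 3 * n12) * a * (a ** 2 - 3 * b ** 2) + (3 * n21 - n03) * b * (3 * a ** 2 - b ** 2),
        (n20 - n02) * (a ** 2 - b ** 2) + 4 * n11 * a * b,
        (3 * n21 - n03) * a * (a ** 2 - 3 * b ** 2) - (n30 - 3 * n12) * b * (3 * a ** 2 - b ** 2),
    ]

def draw_l(size, dx, dy):
    """Hypothetical test glyph: an L shape whose top-left corner is at (dx, dy)."""
    img = [[0] * size for _ in range(size)]
    for y in range(4):
        img[dy + y][dx] = 1          # vertical stroke
    for x in range(3):
        img[dy + 3][dx + x] = 1      # horizontal stroke
    return img

def rotate90(image):
    """Rotate a square grid by 90 degrees."""
    return [list(row) for row in zip(*image[::-1])]

def same(h1, h2):
    return all(isclose(a, b, rel_tol=1e-9, abs_tol=1e-12) for a, b in zip(h1, h2))

glyph = draw_l(10, 1, 1)
print(same(hu_moments(glyph), hu_moments(draw_l(10, 5, 4))))  # translated copy
print(same(hu_moments(glyph), hu_moments(rotate90(glyph))))    # rotated copy
```

Rotation invariance may cut both ways for digits: a 6 and a 9 differ mainly by a 180° rotation, so Hu moments alone may confuse them.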

pages: 291 words: 77,596

Total Recall: How the E-Memory Revolution Will Change Everything by Gordon Bell, Jim Gemmell

airport security, Albert Einstein, book scanning, cloud computing, Computing Machinery and Intelligence, conceptual framework, Douglas Engelbart, full text search, information retrieval, invention of writing, inventory management, Isaac Newton, Ivan Sutherland, John Markoff, language acquisition, lifelogging, Menlo Park, optical character recognition, pattern recognition, performance metric, RAND corporation, RFID, semantic web, Silicon Valley, Skype, social web, statistical model, Stephen Hawking, Steve Ballmer, Steve Bannon, Ted Nelson, telepresence, Turing test, Vannevar Bush, web application

Scanned documents are image files, not text files, and as such, they’re invisible to keyword searches. But with thousands upon thousands of documents in my e-memory, keyword searching would be the only way to re-locate an old file that I could only recollect one or two fragments of, such as a name, a dollar amount, or a dateline. So I ran all the scanned documents through optical character recognition (OCR) software, which is able to recognize written letters and numbers in an image and reconstruct them in a text file. What I ended up with were thousands upon thousands of text files that were neatly interleaved among the scanned files. Now I just needed desktop search software, that is, software that would allow me to search through my thousands of files for some desired text, just like you search for Web pages now using Yahoo or Google.

…

Unlike a digital camera, it always gets the lighting right. Some stuff just won’t fit in any scanner, though, so sometimes you will have to use a camera; if you can get it outdoors on a cloudy day, you can often get it nicely lit without reflections. Finally, make sure your scanner software is performing optical character recognition (OCR) on the scanned pages so that later the computer will be able to search for the text inside them.

DEALING WITH WHAT YOU ALREADY HAVE

Properly equipped, you are now ready to convert your old analog life’s worth of papers and memorabilia to digital form. Set a goal of being paperless within a year.

…

Nike 1984 (Orwell) Nintendo Nixon, Richard Nokia Norman, Donald Northeastern University Northrup, Christiane note taking notebook computers . See also laptops O The Observers (Williamson) Office of Scientific Research and Development OneNote open systems operating systems optical character recognition (OCR) oral histories organic light-emitting polymer technology organization of data. See also files-and-folders organization automatic summarization categorization schemes and clutter and data analysis and DSpace and electronic memory and implementation of Total Recall and indexing and lifelong learning and lifetime periods and scanned documents Ornish, Dean Orwell, George OS X, Otlet, Paul Outlook ownership of data .

pages: 189 words: 57,632

Content: Selected Essays on Technology, Creativity, Copyright, and the Future of the Future by Cory Doctorow

AltaVista, AOL-Time Warner, book scanning, Brewster Kahle, Burning Man, cognitive load, drop ship, en.wikipedia.org, general purpose technology, informal economy, information retrieval, Internet Archive, invention of movable type, Jeff Bezos, John Gilmore, John Perry Barlow, Law of Accelerating Returns, machine readable, Metcalfe's law, mirror neurons, Mitch Kapor, moral panic, mutually assured destruction, Neal Stephenson, new economy, optical character recognition, PalmPilot, patent troll, pattern recognition, peer-to-peer, Ponzi scheme, post scarcity, QWERTY keyboard, Ray Kurzweil, RFID, Sand Hill Road, Skype, slashdot, Snow Crash, social software, speech recognition, Steve Jobs, the long tail, Thomas Bayes, Turing test, Vernor Vinge, Wayback Machine

Will it further decentralize decision-making for artists? And for SF writers and fans, the further question is, "Will it be any good to our chosen medium?" Like I said, science fiction is the only literature people care enough about to steal on the Internet. It's the only literature that regularly shows up, scanned and run through optical character recognition software and lovingly hand-edited on darknet newsgroups, Russian websites, IRC channels and elsewhere (yes, there's also a brisk trade in comics and technical books, but I'm talking about prose fiction here — though this is clearly a sign of hope for our friends in tech publishing and funnybooks).

…

One meaning for that word is "legitimate" ebook ventures, that is to say, rightsholder-authorized editions of the texts of books, released in a proprietary, use-restricted format, sometimes for use on a general-purpose PC and sometimes for use on a special-purpose hardware device like the nuvoMedia Rocketbook [ROCKETBOOK]. The other meaning for ebook is a "pirate" or unauthorized electronic edition of a book, usually made by cutting the binding off of a book and scanning it a page at a time, then running the resulting bitmaps through an optical character recognition app to convert them into ASCII text, to be cleaned up by hand. These books are pretty buggy, full of errors introduced by the OCR. A lot of my colleagues worry that these books also have deliberate errors, created by mischievous book-rippers who cut, add or change text in order to "improve" the work.

…

More importantly, the free e-book skeptics have no evidence to offer in support of their position — just hand-waving and dark muttering about a mythological future when book-lovers give up their printed books for electronic book-readers (as opposed to the much more plausible future where book lovers go on buying their fetish objects and carry books around on their electronic devices). I started giving away e-books after I witnessed the early days of the "bookwarez" scene, wherein fans cut the binding off their favorite books, scanned them, ran them through optical character recognition software, and manually proofread them to eliminate the digitization errors. These fans were easily spending 80 hours to rip their favorite books, and they were only ripping their favorite books, books they loved and wanted to share. (The 80-hour figure comes from my own attempt to do this — I'm sure that rippers get faster with practice.)

pages: 255 words: 78,207

Web Scraping With Python: Collecting Data From the Modern Web by Ryan Mitchell

AltaVista, Amazon Web Services, Apollo 13, cloud computing, Computing Machinery and Intelligence, data science, en.wikipedia.org, Firefox, Guido van Rossum, information security, machine readable, meta-analysis, natural language processing, optical character recognition, random walk, self-driving car, Turing test, web application

Hamidi, 227 intellectual property, 217-219 internal links crawling an entire site, 35-40 crawling with Scrapy, 45-48 traversing a single domain, 31-35 Internet about, 213-216 cautions downloading files from, 74 crawling across, 40-45 moving forward, 206 IP address blocking, avoiding, 199-200 ISO character sets, 96-98 is_displayed function, 186 Item object, 46, 48 items.py file, 46 L lambda expressions, 28, 74 legalities of web scraping, 217-230 lexicographical analysis with NLTK, 132-136 libraries bundling with projects, 7 OCR support, 161-164 logging with Scrapy, 48 logins about, 137 handling, 142-143 troubleshooting, 187 lxml library, 29 M machine learning, 135, 180 machine training, 135, 171-174 Markov text generators, 123-129 media files, storing, 71-74 Mersenne Twister algorithm, 34 methods (HTTP), 51 Microsoft SQL Server, 76 Microsoft Word, 102-105 MIME (Multipurpose Internet Mail Extensions) protocol, 90 MIMEText object, 90 MySQL about, 76 basic commands, 79-82 database techniques, 85-87 installing, 77-79 integrating with Python, 82-85 Wikipedia example, 87-89 N name attribute, 140 natural language processing about, 119 additional resources, 136 Markov models, 123-129 Natural Language Toolkit, 129-136 summarizing data, 120-123 Natural Language Toolkit (NLTK) about, 129 installation and setup, 129 lexicographical analysis, 132-136 statistical analysis, 130-132 NavigableString object, 18 navigating trees, 18-22 network connections about, 3-5 connecting reliably, 9-11 security considerations, 181 next_siblings() function, 21 ngrams module, 132 n-grams, 109-112, 120 NLTK (Natural Language Toolkit) about, 129 installation and setup, 129 lexicographical analysis, 132-136 statistical analysis, 130-132 NLTK Downloader interface, 130 NLTK module, 129 None object, 10 normalizing data, 112-113 NumPy library, 164 O OAuth authentication, 57 OCR (optical character recognition) about, 161 library support, 162-164 OpenRefine Expression Language (GREL), 116
OpenRefine tool about, 114 cleaning data, 116-118 filtering data, 115-116 installing, 114 usage considerations, 114 optical character recognition (OCR) about, 161 library support, 162-164 Oracle DBMS, 76 OrderedDict object, 112 os module, 74 P page load times, 154, 182 parentheses (), 25 parents (tags), 20, 22 parsing HTML pages (see HTML parsing) parsing JSON, 63 patents, 217 pay-per-hour computing instances, 205 PDF files, 100-102 PDFMiner3K library, 101 Penn Treebank Project, 133 period (.), 25 Peters, Tim, 211 PhantomJS tool, 152-155, 203 PIL (Python Imaging Library), 162 Pillow library about, 162 processing well-formatted text, 165-169 pipe (|), 25 plus sign (+), 25 POST method (HTTP) about, 51 tracking requests, 140 troubleshooting, 186 variable names and, 138 viewing form parameters, 140 previous_siblings() function, 21 primary keys in tables, 85 programming languages, regular expressions and, 27 projects, bundling with libraries, 7 pseudorandom number generators, 34 PUT method (HTTP), 51 PyMySQL library, 82-85 PySocks module, 202 Python Imaging Library (PIL), 162 Python language, installing, 209-211 Q query time versus database size, 86 quotation marks ("), 17 R random number generators, 34 random seeds, 34 rate limits about, 52 Google APIs, 60 Twitter API, 55 reading documents document encoding, 93 Microsoft Word, 102-105 PDF files, 100 text files, 94-98 recursion limit, 38, 89 redirects, 44, 158 Referrer header, 179 RegexPal website, 24 regular expressions about, 22-27 BeautifulSoup example, 27 commonly used symbols, 25 programming languages and, 27 relational data, 77 remote hosting running from a website hosting account, 203 running from the cloud, 204 remote servers avoiding IP address blocking, 199-200 extensibility and, 200 portability and, 200 PySocks and, 202 Tor and, 201-202 Requests library about, 137 auth module, 144 installing, 138, 179 submitting forms, 138 tracking cookies, 142-143 requests module, 179-181 responses, API calls and, 52 Robots Exclusion Standard, 223 robots.txt file, 138, 167, 222-225, 229 S safe harbor protection, 219, 230 Scrapy library, 45-48 screenshots, 197 script tag, 147 search engine optimization (SEO), 222 searching text data, 135 security considerations copyright law and, 219 forms and, 183-186 handling cookies, 181 SELECT statement, 79, 81 Selenium library about, 143 elements and, 153, 194 executing JavaScript, 152-156 handling redirects, 158 security considerations, 185 testing example, 193-198 Tor support, 203 semicolon (;), 210 SEO (search engine optimization), 222 server-side processing handling redirects, 44, 158 scripting languages and, 147 sets, 67 siblings (tags), 21 Simple Mail Transfer Protocol (SMTP), 90 smtplib package, 90 sorted function, 112 span tag, 15 Spitler, Daniel, 227 SQL Server (Microsoft), 76 square brackets [], 25 src attribute, 28, 72, 74 StaleElementReferenceException, 158 statistical analysis with NLTK, 130-132 storing data (see data management) StringIO object, 99 strings, regular expressions and, 22-28 stylesheets about, 14, 216 dynamic HTML and, 151 hidden fields and, 184 Surface Web, 36 T trademarks, 218 traversing the Web (see web crawlers) tree navigation, 18-22 trespass to chattels, 219-220, 226 trigrams module, 132 try...finally statement, 85 Twitov app, 123 Twitter API, 55-59 U underscore (_), 17 undirected graph problems, 127 Unicode standard, 83, 95-98, 110 unit tests, 190, 197 United States v.

…

Even in this day and age, many documents are simply scanned from hard copies and put on the Web, making these documents inaccessible as far as much of the Internet is concerned, although they are “hiding in plain sight.” Without image-to-text capabilities, the only way to make these documents accessible is for a human to type them up by hand, and nobody has time for that. Translating images into text is called optical character recognition, or OCR. There are a few major libraries that are able to perform OCR, and many other libraries that support them or are built on top of them. This system of libraries can get fairly complicated at times, so I recommend you read the next section before attempting any of the exercises in this chapter.

Overview of Libraries

Python is a fantastic language for image processing and reading, image-based machine learning, and even image creation.
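Before a scanned image is handed to an OCR library, it is commonly binarized so that faint background noise drops out and the glyphs become pure black on white. With Pillow this can be done through `Image.point()`. The plain-Python sketch below applies the same per-pixel rule to a small grid of grayscale values; the threshold of 128 and the sample pixels are arbitrary choices for illustration.

```python
def binarize(gray, threshold=128):
    """Map each 0-255 grayscale pixel to pure black (0) or pure white (255)."""
    return [[0 if value < threshold else 255 for value in row] for row in gray]

# Hypothetical scan: dark strokes (~40) on a smudged, uneven background (180-230).
scan = [
    [220, 210, 45, 40, 200],
    [225, 38, 190, 42, 215],
    [180, 35, 40, 44, 230],
]
for row in binarize(scan):
    print(row)
```

A single global threshold works for evenly lit scans. Unevenly lit pages may need a different cutoff, or a locally adaptive one, before OCR accuracy improves.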

…

In addition, you will need to understand not just how to use the tools presented in this book in isolation, but how they can work together to solve a larger problem. Sometimes the data is easily available and well formatted, allowing a simple scraper to do the trick. Other times you have to put some thought into it. In Chapter 10, for example, I combined the Selenium library, to identify Ajax-loaded images on Amazon, with Tesseract, to read them using optical character recognition. In the “Six Degrees of Wikipedia” problem, I used regular expressions to write a crawler that stored link information in a database, and then used a graph-solving algorithm to answer the question, “What is the shortest path of links between Kevin Bacon and Eric Idle?”
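For an unweighted link graph, breadth-first search finds a shortest path. It explores pages in order of their distance from the start, so the first time it reaches the target, it has used the fewest links. The sketch below runs BFS over a small in-memory dictionary. The intermediate page names and links are invented and are not real Wikipedia data; in the text, the links came from a database instead.

```python
from collections import deque

def shortest_link_path(links, start, target):
    """Breadth-first search over a dict mapping page -> list of linked pages."""
    previous = {start: None}          # also serves as the visited set
    queue = deque([start])
    while queue:
        page = queue.popleft()
        if page == target:
            path = []
            while page is not None:   # walk back along the recorded predecessors
                path.append(page)
                page = previous[page]
            return path[::-1]
        for neighbor in links.get(page, []):
            if neighbor not in previous:
                previous[neighbor] = page
                queue.append(neighbor)
    return None                       # target unreachable

# Hypothetical link graph; the Page_* nodes are made up.
links = {
    "Kevin_Bacon": ["Page_A", "Page_B"],
    "Page_A": ["Page_C"],
    "Page_B": ["Page_D", "Page_E"],
    "Page_C": ["Eric_Idle"],
    "Page_D": ["Page_C"],
    "Page_E": [],
}
print(shortest_link_path(links, "Kevin_Bacon", "Eric_Idle"))
```

Because links are followed in one direction only, the path from A to B need not be the reverse of the path from B to A.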

pages: 328 words: 77,877

API Marketplace Engineering: Design, Build, and Run a Platform for External Developers by Rennay Dorasamy

Airbnb, Amazon Web Services, barriers to entry, business logic, business process, butterfly effect, continuous integration, DevOps, digital divide, disintermediation, fault tolerance, if you build it, they will come, information security, Infrastructure as a Service, Internet of things, Jeff Bezos, Kanban, Kubernetes, Lyft, market fragmentation, microservices, minimum viable product, MITM: man-in-the-middle, mobile money, optical character recognition, platform as a service, pull request, ride hailing / ride sharing, speech recognition, the payments system, transaction costs, two-pizza team, Uber and Lyft, uber lyft, underbanked, web application

For example, to upload an image to a server, the file is split into chunks that are sent in a stream to the server, which reassembles the image for processing. We use this approach to upload images for Optical Character Recognition (OCR) processing.

4. Bidirectional streaming RPC: The client and server can read and write messages in any order. For example, the image could be processed by different Optical Character Recognition (OCR) engines, and responses from each engine are returned in separate streams.

gRPC is a prime example of how technology can be simplified to encourage mass adoption. I also consider this specific journey a stark reminder to keep an open mind to new concepts and approaches and to fight preconceived notions as we evolve our platform architecture.
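The chunk-and-reassemble logic behind client streaming can be sketched without a gRPC dependency. A generator yields fixed-size chunks of the image, and the receiving side joins them in arrival order. In a real gRPC service, a client-streaming stub method (defined in a `.proto` file) accepts an iterator of request messages. Here the "server" is just a function, and the 64 KiB chunk size is an arbitrary assumption.

```python
from typing import Iterable, Iterator

CHUNK_SIZE = 64 * 1024  # assumed chunk size; tune for the transport

def stream_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Client side: yield the image as a sequence of fixed-size chunks."""
    for offset in range(0, len(data), chunk_size):
        yield data[offset:offset + chunk_size]

def reassemble(chunks: Iterable[bytes]) -> bytes:
    """Server side: concatenate the chunks in arrival order."""
    return b"".join(chunks)

image = bytes(range(256)) * 1000          # 256,000 bytes of stand-in image data
chunks = list(stream_chunks(image))
print(len(chunks))                        # 4: three full 64 KiB chunks plus a remainder
print(reassemble(iter(chunks)) == image)  # True
```

Because the generator is lazy, the client never has to hold more than one chunk in its send buffer, which is the main appeal of streaming large images.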

…

Depending on the nature of the API product, the microservice may perform validation and verification such as confirming end user consent before fulfilling the request. To streamline process execution, requests to middleware components can be initiated concurrently. A gRPC stream can also be used to optimize processing of requests. As an example, for Optical Character Recognition (OCR) of a document, we initiate the call to a middleware component which then initiates a request to a local enterprise service and to a Cloud vision service. Once the request is processed, a response is returned in the stream. If the first response received was not successful, the component waits for a second response in a best-effort approach to process the request.
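The concurrent, best-effort pattern described here (call two OCR back ends at once, accept the first successful response, otherwise wait for the other) can be sketched with `asyncio`. The engine functions below are hypothetical stand-ins that simulate latency with `sleep`, and the response dictionaries are an assumed shape. The text's component uses a gRPC stream rather than in-process coroutines.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional

Engine = Callable[[bytes], Awaitable[dict]]

async def local_ocr(image: bytes) -> dict:
    """Hypothetical local enterprise OCR service: responds first but fails here."""
    await asyncio.sleep(0.01)
    return {"ok": False, "engine": "local", "text": ""}

async def cloud_ocr(image: bytes) -> dict:
    """Hypothetical cloud vision service: slower, succeeds here."""
    await asyncio.sleep(0.05)
    return {"ok": True, "engine": "cloud", "text": "SAMPLE TEXT"}

async def first_successful(image: bytes, engines: List[Engine]) -> Optional[dict]:
    """Call every engine concurrently; return the first successful response, or None."""
    tasks = [asyncio.create_task(engine(image)) for engine in engines]
    try:
        for next_done in asyncio.as_completed(tasks):
            try:
                result = await next_done
            except Exception:
                continue  # treat an engine error like an unsuccessful response
            if result.get("ok"):
                return result
        return None
    finally:
        for task in tasks:
            task.cancel()  # no effect on finished tasks; stops any still running

result = asyncio.run(first_successful(b"<image bytes>", [local_ocr, cloud_ocr]))
print(result["engine"])  # cloud
```

Cancelling the remaining tasks once a success arrives avoids paying for work whose result would be discarded.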

…

Index A Account Information Service Providers (AISP) Account Servicing Payment Service Provider (ASPSP) Amazon Echo devices Application development administration authorization development guidance gateway product implementation hosted applications lightweight microservices operation architecture client/server elements integration logic components logic orchestration middleware component platform services portal applications principles rate limiting security third-party onboarding well-defined requests Application Performance Monitoring (APM) Application Programming Interface (API) aircraft carrier altruistic view Amazon Echo devices banking benefits banking industry See Banking industry benefits coding principle collaboration/sharing information connectedness counter-survival delivery lead democratizing technology developer centricity developer ecosystem digital channel engineering lead (EL) enterprise-wide impact Forensics teams foundational elements human-readable information security integration pattern interface Krypton lab environment manifesto concepts operational platform operations lead (OL) pivot product owner (PO) program executive requirements retrofit products sharing services speech recognition service speedboat streamline performance and execution team dynamic members technical debt third-party participation timeframe Automatic Teller Machines (ATMs) B Banking industry across territories Australia benefits China European Union facilitative/flexible approach Hong Kong India Japan objectives operational risk overview payment system benefits prescriptive approach reputational risk risks association significant risks Singapore South Africa terminology third-party providers United States (USA) visual representation Billing engineering event sequence monetization approach real-time sequences technical flow transaction bunq application Business development attract build trust case studies detailing collaboration consultation sessions customer testimonial educate lead mailing lists non-technical overview partnership pricing reference customers technical jargon timelines transparency use-cases Business to Business (B2B) security Business to Consumer (B2C) security C CHIP application Client URL (cURL) capability code snippets education fiddler mandatory elements neutrality postman sharing variables and environments Commercial-Off-The-Shelf (COTS) Commercial proposition See Monetization Consumption advocacy business See Business development business/technical personas client URL See Client URL (cURL) ecosystem engagements interactions internal and external developers Marketplace vs. internal APIs personas support technical-developer portal attract beta program blog posts and podcasts blueprints collaborate developer certification educate hackathons impact assessment instructions/code lead/product owner (PO) messaging channels patterns product/service release notes status page transparency trust user groups/community sessions toolings User Experience (UX) Continuous Delivery (CD) Continuous integration (CI) D Demilitarized Zone (DMZ) Design Strategies access mechanisms availability bottom up build your own strategy business rules compliance requirements consideration consumer-driven approach definition documentation enterprise architecture (EA) error handling filtering/pagination governance guidelines integration approach lifecycle design process developer experience end of life versioning maintainability/onboarding patterns asynchronous callback pattern complex business logics event-driven architecture polling approach proxy vs. tap debate synchronous tap-and-go strategy performance pre-defined/top-down regulatory requirements reporting and historical data requirements Software Development Kit (SDK) viability/feasibility E Elasticsearch, Logstash, and Kibana (ELK) Electronic funds transfer (EFT) Enterprise Application Integration (EAI) Enterprise Java Beans (EJBs) Extensible Markup Language (XML) F Financial Services (FS) APIs applications benefits commercial models customer financial data digital payment services direct vs. brokered integration dispute mechanisms open banking/finance payment initiation regulation screen scraping standardization wait-and-see approach Financial stability and security G, H Google Remote Procedure Call (gRPC) bidirectional streaming client streaming knee-jerk reaction proto definition server streaming/unary types GraphQL I Integration strategy code elements components dedicated adapter deployment architecture as-is configuration launch configuration to-be configuration deployment strategies duplicated framework gRPC logic review microservice middleware components overview platform as a service platform services auditing error handling property management shared libraries/packages snowflakes tracing port-forward connectivity shared library taxonomy business logic categories components connectivity component loose architectural pattern microservice middleware service assembly J Java Enterprise Edition (JEE) JavaScript Object Notation (JSON) K, L Kibana Query Language (KQL) Knee-jerk reaction M Marketplace vs. internal APIs Mobile Virtual Network Operators (MVNO) Monetization analytics/insight analytics collection dashboards email historical path/progress implementation instant messages operational metrics reporting platform billing engineering business models flywheel identification implementation leveraging economies Marketplace positioning notional income statement value/revenue strategies affiliate Business-to-business (B2B) customer data financial perspective free/freemium freemium gets paid indirect strategy information pays points based service referral program reputational process revenue share social media personalities tiered services transaction fee Monitoring capability alerting strategy application performance monitoring environment functional monitoring infrastructure overview service telemetry transaction tracing user interface (UI) Monolith vs. microservice N Notional Income Statement O OAuth (open-standard framework) actors/participants application registration client/resource server process flow server interaction authorization code flow client credentials client credentials/authorization credential phishing flow grant type/access tokens Lucidchart login open banking variation permission administration refresh token scenario vulnerability Open Web Application Security Project (OWASP) authentication mechanisms excessive data exposure function level authorization guidelines improper assets management injection flaws insufficient logging and monitoring mass assignment misconfiguration object level authorization resources and rate limiting working process Operational expenditure (OPEX) budget Operational universe change/release management high-level view implementation guide internal/supporting elements product iteration quality assurance team release funnel requirements third-party consumption updates DevOps process foundational elements logging process architecture principle contexts Elasticsearch Kibana console mounting persistent storage overview strategies monitoring overview parameters/conditions platform supporting systems approaches architecture backend dependencies managed services process flow application domain incident management issue tracking/reporting severity-response mapping sub-domains support capability supporting systems dependencies traffic/value transactions Optical Character Recognition (OCR) Organizational capability P, Q Payment Initiation Service Providers (PISP) Payment Services Directive (PSD2) Platform architecture API gateway (external/internal) authorization framework container deployment database docker configuration elements identification inherent benefit integration strategy See Integration strategy iterations Managed container platform microservices layer middleware platform snowflake environment spreadsheet time-lapsed view virtualized approach virtual Machines (VM) Platform as a Service (PaaS) Proof of Concept (PoC) R Remote Procedure Calls (RPC) See Google Remote Procedure Call (gRPC) Representational State Transfer (REST) Revolut application S Sandbox strategies access key areas/services backend simulation approach conditions design environment pros/cons system virtualiser beta approach design considerations inception phase pros/cons strategies use-cases environments full-blown operational environment functional testing interface definition live context objective onboarding process overview purpose quality assurance (QA) approach design foundational consumer pros/cons use-cases semi-live approach design environment pros/cons use-cases shallow approach backend simulation approach design flow diagram overview pros/cons use-cases third parties unique opportunity unique synergy virtualiser configuration customizations design philosophy implementation options integration components predicate parameters process requirements responses runtime configuration Screen scraping Security application code approaches Business to Business (B2B) Business to Consumer (B2C) client identifier container cross-cutting concern dynamicity infrastructure meaning network OAuth See OAuth (open-standard framework) open API pattern OWASP review process Service-Level Agreements (SLAs) Simple Object Access Protocol (SOAP) Software Development Kit (SDK) Software development lifecycle application See Application development delivery approach agile methodology initial strategy ongoing delivery planning sprint approach squad strategies DevOps aspirational goal continuous delivery continuous integration (CI) implementation microservices integration requirement objectives philosophy borrows retrieve reference data team structure automated testing consumer/provider delivery leads developer journey development documentation finance/reporting orchestrating development ownership performance testing quality assurance (QA) responsibilities security team unit testing Speedboat operating model Standardization and neutral technology Storage Area Network (SAN) Support capability communications strategy details development focus call foundational principle full-stack squad level 1/2 support logging/error handling operations lead people/process/technology periodic updates third party communication transition war room T, U, V, W, X Third-Party Provider (TPP) Transparency and public accountability TrueLayer application Y, Z YOLT’s application

pages: 138 words: 27,404

OpenCV Computer Vision With Python by Joseph Howse

augmented reality, computer vision, Debian, optical character recognition, pattern recognition

David now has more than 10 years of experience in IT, including more than seven years of experience in computer vision, computer graphics, and pattern recognition, working on different projects and startups and applying his knowledge of computer vision, optical character recognition, and augmented reality. He is the author of DamilesBlog (http://blog.damiles.com), where he publishes research articles and tutorials about OpenCV, computer vision in general, and optical character recognition algorithms. He is a co-author of Mastering OpenCV with Practical Computer Vision Projects (Daniel Lélis Baggio, Shervin Emami, David Millán Escrivá, Khvedchenia Ievgen, Naureen Mahmood, Jason Saragih, and Roy Shilkrot; Packt Publishing).

pages: 296 words: 66,815

The AI-First Company by Ash Fontana

23andMe, Amazon Mechanical Turk, Amazon Web Services, autonomous vehicles, barriers to entry, blockchain, business intelligence, business process, business process outsourcing, call centre, Charles Babbage, chief data officer, Clayton Christensen, cloud computing, combinatorial explosion, computer vision, crowdsourcing, data acquisition, data science, deep learning, DevOps, en.wikipedia.org, Geoffrey Hinton, independent contractor, industrial robot, inventory management, John Conway, knowledge economy, Kubernetes, Lean Startup, machine readable, minimum viable product, natural language processing, Network effects, optical character recognition, Pareto efficiency, performance metric, price discrimination, recommendation engine, Ronald Coase, Salesforce, single source of truth, software as a service, source of truth, speech recognition, the scientific method, transaction costs, vertical integration, yield management

☐ Perishability
☐ Veracity
☐ Dimensionality
☐ Breadth
☐ Self-reinforcement

Discrimination

There are five ways to value data based on how difficult it is for others to get it (as opposed to its ultimate utility to the acquirer of that data): accessibility, availability, cost, time, and fungibility.

Accessibility

Data that is hard to obtain might be hard for others to get, too. To give you an example, it might require traveling to a special site, such as a local council office, manually collecting paper files, photocopying them, and then running them through optical character recognition software that turns the image from the photocopier into text that a computer can read. It’s important to assess whether data may be hard to obtain in the future, based on contracts or policies related to a dataset. Typically, this might involve the data owner’s restricting access to it.

…

., variants A and B) to different groups of users; also known as a split test ACYCLIC: jumping between points rather than going through points in the same pattern each time AGENT-BASED MODEL: model that generates the actions of agents and interactions with other agents given the agent’s properties, incentives, and environmental constraints AGGREGATION THEORY: the theory that new entrants in a market can aggregate existing quantities in that market; for example, data points, to create new and valuable products APPLICATION PROGRAMMING INTERFACE: a set of functions that allows applications to communicate with other applications, either to use a feature or fetch data; effectively, a structured way for software to communicate with other pieces of software AREA UNDERNEATH THE CURVE (AUC): the integral of the ROC curve BLOCKCHAIN: decentralized and distributed public ledger of transactions CLUSTERING TOOL: using unsupervised machine learning to group similar objects COMPLEMENTARY DATA: new data that increases the value of existing data CONCAVE PAYOFF: decreasing dividends from using a product CONCEPT DRIFT: when the idea behind the subject of a prediction changes based on observations CONSUMER APP: software application primarily used by individuals (rather than businesses) CONTRIBUTION MARGIN: average price per unit minus labor and quality control costs associated with that unit CONVEX PAYOFF: increasing dividends from using a product COST LEADERSHIP: a form of competitive advantage that comes from having the lowest cost of production with respect to competitors in a given industry CRYPTOGRAPHY: writing and solving codes CRYPTO TOKEN: representation of an asset that is kept on a blockchain CUSTOMER RELATIONSHIP MANAGEMENT SOFTWARE: software that stores and manipulates data about customers CUSTOMER SUPPORT AGENT: employee that is paid to respond to customer support tickets CUSTOMER SUPPORT TICKET: message from user of a product requesting help in using that product 
CYBERNETICS: the science of control and communication in machines and living things DATA: facts and statistics collected together for reference or analysis DATA ANALYST: person who sets up dashboards, visualizes data, and interprets model outputs DATA DRIFT: (1) when the distribution on which a prediction is based changes such that it no longer represents observed reality; or (2) when the data on which a prediction is based changes such that some of it is no longer available or properly formed DATA ENGINEER: person who cleans data, creates automated data management tools, maintains the data catalogue, consolidates data assets, incorporates new data sources, maintains data pipelines, sets up links to external data sources, and more DATA EXHAUST: data collected when users perform operations in an application, for example clicking buttons and changing values DATA INFRASTRUCTURE ENGINEER: person who chooses the right database, sets up databases, moves data between them, and manages infrastructure cost and more DATA LABELING: adding a piece of information to a piece of data DATA LEARNING EFFECT: the automatic compounding of information DATA LEARNING LOOP: the endogenous and continuous generation of proprietary data from an intelligent system that provides the basis of the next generation of that intelligent system DATA NETWORK: set of data that is built by a group of otherwise unrelated entities, rather than a single entity DATA NETWORK EFFECT: the increase in marginal benefit that comes from adding a new data point to an existing collection of data; the marginal benefit is defined in terms of informational value DATA PRODUCT MANAGER: person who incorporates the data needs of the model with the usability intentions of the product designers and preferences of users in order to prioritize product features that collect proprietary data DATA SCIENTIST: person who sets up and runs data science experiments DATA STEWARD: person responsible for ensuring compliance to data 
storage standards DEEP LEARNING: artificial neural network with multiple layers DEFENSIBILITY: the relative ability to protect a source of income; for example, an income-generating asset DIFFERENTIAL PRIVACY: system for sharing datasets without revealing the individual data points DIMENSIONALITY REDUCTION: transforming data (using a method such as principal component analysis) to reduce the number of measures associated with each data point DISRUPTION THEORY: the theory that new entrants in a market can appropriate customers from incumbent suppliers by selling a specialized product to a niche segment of customers at a fraction of the cost of the incumbent’s product DRIFT: when a model’s concept or data diverges from reality EDGE: connection between nodes; also called a link or a line ENTERPRISE RESOURCE PLANNING PRODUCT: software that collects and thus contains data about product inventory ENTRY-LEVEL DATA NETWORK EFFECT: the compounding marginal benefit that comes from adding new data to an existing collection of data; the marginal benefit is defined in terms of informational value to the model computed over that data EPOCH: completed pass of the entire training dataset by the machine learning model ETL (EXTRACT, TRANSFORM, AND LOAD): the three main steps in moving data from one place to another EXAMPLE: a single input and output pair from which a machine learning model learns FEATURE: set of mathematical functions that are fed data to output a prediction FEDERATED LEARNING: method for training machine learning models across different computers without exchanging data between those computers FIRST-MOVER: company that collects scarce assets, builds technological leadership, and creates switching costs in a market by virtue of entering that market before other companies GAUSSIAN MIXTURE MODEL: probabilistic model representing a subset within a set, assuming a normal distribution, without requiring the observed data match the subset GLOBAL, MULTIUSER MODEL: model that makes
predictions about something common to all customers of a given company; this is generally trained on data aggregated across all customers GRADIENT BOOSTED TREE: method for producing a decision tree in multiple stages according to a loss function GRAPH: mathematical structure made up of nodes and edges that is typically used to model interactions between objects HEURISTICS: knowledge acquired by doing HISTOGRAM: diagram consisting of rectangles whose area is proportional to the frequency of a variable and whose width is equal to the class interval HORIZONTAL INTEGRATION: the combination in one product of multiple industry-specific functions normally operated by separate products HUMAN-IN-THE-LOOP SYSTEM: machine learning system that requires human input to generate an output HYPERPARAMETER: parameter that is used to control the machine learning model HYPERTEXT MARKUP LANGUAGE: markup language specifically for writing documents to be displayed in a web browser INCUMBENT: existing market leader INDEPENDENT SOFTWARE VENDOR: a company that publishes software INFORMATION: data that resolves uncertainty for the receiver INSOURCING: finding the resources to complete a task within an existing organization such that it’s not necessary to contract new resources to complete that task INTEGRATOR: software company that builds tools to connect data sources, normalizes data across sources, and updates connections as these sources change INTELLIGENT APPLICATION: application that runs machine learning algorithms over data to make predictions or decisions INTERACTIVE MACHINE LEARNING: machine learning models that collect data from a user, put that data into a model, present the model output back to the user, and so on K-MEANS: unsupervised machine learning method that groups objects into a number of clusters by assigning each object to the cluster whose center point, or average, is closest to it LABEL: the output of a machine learning system based on learning from examples LAYER:
aggregation of neurons; layers can be connected to other layers LEAN AI: the process of building a small but complete and accurate AI to solve a specific problem LEARNING EFFECT: the process through which knowledge accumulation leads to an economic benefit LEGACY APPLICATION: application already in use LOSS: the quantum of how right or wrong a model was in making a given prediction LOSS FUNCTION: mathematical function that determines the degree to which the output of a model is incorrect MACHINE LEARNING: computable algorithms that automatically improve with the addition of novel data MACHINE LEARNING ENGINEER: person who implements, trains, monitors, and fixes machine learning models MACHINE LEARNING MANAGEMENT LOOP: automated system for continuous incorporation of real-world data into machine learning models MACHINE LEARNING RESEARCHER: person who sets up and runs machine learning experiments MARKETING SEGMENTATION: dividing customers into groups based on similarity MINIMUM VIABLE PRODUCT: the minimum set of product features that a customer needs for a product to be useful MOAT: accumulation of assets that form a barrier to other parties that may reduce the income-generating potential of those assets MONITORING: observing a product to ensure both quality and reliability NETWORK EFFECT: the increase in marginal benefit that comes from adding a new node to an existing collection of nodes; the marginal benefit is defined in terms of utility to the user of the collection NEURAL NETWORK: collection of nodes that are connected to each other such that they can transmit signals to each other across the edges of the network, with the strength of the signal depending on the weights on each node and edge NEXT-LEVEL DATA NETWORK EFFECT: the compounding marginal benefit that comes from adding new data to an existing collection of data; the marginal benefit is defined in terms of the rate of automatic data creation by the model computed over that data NODE: discrete part of a 
network that can receive, process, and send signals; also called a vertex or a point OPTICAL CHARACTER RECOGNITION SOFTWARE: software that turns images into machine-readable text PARETO OPTIMAL SOLUTION: achieving 80 percent of the optimal solution for 20 percent of the work PARTIAL PLOT: graph that shows the effect of adding an incremental variable to a function PERSONALLY IDENTIFIABLE INFORMATION: information that can be linked to a real person PERTURBATION: deliberately modifying an object, e.g., data POWER GENERATOR: user that contributes an inordinate amount of data with respect to other users POWER USER: user that uses a product an inordinate amount with respect to other users PRECISION: the number of relevant data points retrieved by the model over the total number of data points retrieved by the model PREDICTION USABILITY THRESHOLD: the point at which a prediction becomes useful to a customer PRICING: product usage; for example, hours spent using a product PROOF OF CONCEPT: a project jointly conducted by potential customers and vendors of a software product to prove, in practice, the value theoretically provided by that product PROPRIETARY INFORMATION: information that is owned by a specific entity and not in the public domain QUERY LANGUAGE: programming language used to retrieve data from a database RANDOM FOREST: method for analyzing data that constructs multiple decision trees and outputs either the class predicted most often by the individual trees or the average prediction across all of the decision trees RECALL: the number of relevant data points retrieved by the model over the total number of relevant data points RECEIVER OPERATING CHARACTERISTIC (ROC) CURVE: plot that shows how well the model performed at different discrimination thresholds by plotting the true positive rate against the false positive rate RECURSION: repeated application of a method REINFORCEMENT LEARNING: ML that learns from objectives RETURN ON INVESTMENT (ROI): calculated by dividing the return from using an asset by the investment in
that asset ROI-BASED PRICING: pricing that is directly correlated with a rate of return on an investment SCALE EFFECT: the increase in marginal benefit or reduction in marginal cost that comes from having a higher quantity of the asset or capability that generates the benefit SCATTER PLOT: graph in which the values of two variables are plotted along two axes, the pattern of the resulting points revealing any correlation present SCHEME: the form common to all values in a particular table in a database SECURE MULTIPARTY COMPUTATION: method for jointly computing a function over the inputs of several participating computers while keeping each computer’s inputs private from the others SENSOR NETWORK: a collection of devices that collect data from the real world SIMULATION: method that generates inputs to put through a program to see if that program fails to execute or generates inaccurate outputs SOFTWARE-AS-A-SERVICE (SAAS): method of delivering software online and licensing that software on a subscription basis SOFTWARE DEVELOPMENT KIT: tool made by software developers to allow other software developers to build on top of or alongside their software STATISTICAL PROCESS CONTROL: quality control process that is based on statistical methods SUPERVISED MACHINE LEARNING: ML that learns from inputs given outputs SUPPORT VECTOR MACHINE: supervised learning model that classifies new data points by category SYSTEM OF ENGAGEMENT: system that actively (e.g., through user input) aggregates information about a particular business function SYSTEM OF RECORD: system that passively aggregates information about a particular business function SYSTEMS INTEGRATOR: an entity that installs new software systems such that they function with customers’ existing systems TALENT LOOP: the compounding competitive advantage in attracting high-quality personnel that comes from having more high-quality data than competitors TRANSACTIONAL PRICING: pricing that is directly correlated with the quantum of units transacted through a product, for example,
number of processed data points or computation cycles UNSUPERVISED MACHINE LEARNING: ML that learns from inputs without outputs USAGE-BASED PRICING: pricing that is directly correlated with the quantum of product usage; for example, hours of time spent using a product USER INTERFACE: set of objects that exist in software that are manipulated to initiate a function in that software VALUE CHAIN: the process by which a company adds value to an asset; for example, adding value to a data point by processing that data into information, and that information into a prediction VARIABLE IMPORTANCE PLOT: list of the most important variables in a function in terms of their contribution to a given prediction, or predictive power VERSIONING: keeping a copy of every form of a model, program, or dataset VERTICAL INTEGRATION: the combination in one company of multiple stages of production (or points on a value chain) normally operated by separate firms VERTICAL PRODUCT: software product that is only relevant to users in a particular industry WATERFALL CHART: data visualization that shows the result of adding or subtracting sequential values in adjacent columns WEB CRAWLER: program that systematically queries webpages or other documents on the internet; strips out the unnecessary content on those pages, such as formatting; grabs salient data, puts it in a standard document format (e.g., JSON), and puts it in a private data repository WEIGHT: the relative measure of strength ascribed to nodes and edges in a network; this can be automatically or manually adjusted after learning of a more optimal weight WORKFLOW APPLICATION: software that takes a sequence of things that someone does in the real world and puts those steps into an interface that allows for data input at each step ZETTABYTE: 10^21 bytes or 1 trillion gigabytes

INDEX

The page numbers in this index refer to the printed version of this book.

…

A/B test, 271 accessibility of data, 72, 107 accuracy, 175, 203–4 in proof of concept phase, 59–60 active learning-based systems, 94–95 acyclic, 150, 271 advertising, 227, 240 agent-based models (ABMs), 103–5, 271 simulations versus, 105 aggregated data, 81, 83 aggregating advantages, 222–65 branding and, 255–56 data aggregation and, 241–45 on demand side, 225 disruption and, 239–41 first-mover advantage and, 253–55 and integrating incumbents, 244–45 and leveraging the loop against incumbents, 256–61 positioning and, 245–56 ecosystem, 251–53 staging, 249–51 standardization, 247–51 storage, 246–47 pricing and, 236–39 customer data contribution, 237 features, 238–39 transactional, 237, 281 updating, 238 usage-based, 237–38, 281 on supply side, 224–25 talent loop and, 260–61 traditional forms of competitive advantage versus, 224–25 with vertical integration, see vertical integration aggregation theory, 243–44, 271 agreement rate, 216 AI (artificial intelligence), 1–3 coining of term, 5 definitions and analogies regarding, 15–16 investment in, 7 lean, see Lean AI AI-First Century, 3 first half of (1950–2000), 3–9 cost and power of computers and, 8 progression to practice, 5–7 theoretical foundations, 4–5 second half of (2000–2050), 9 AI-First companies, 1, 9, 10, 44 eight-part framework for, 10–13 learning journey of, 44–45 AI-First teams, 127–42 centralized, 138–39 decentralized, 139 management of, 135–38 organization structure of, 138–39 outsourcing, 131 support for, 134–35 when to hire, 130–32 where to find people for, 133 who to hire, 128–30 airlines, 42 Alexa, 8, 228, 230 algorithms, 23, 58, 200–201 evolutionary, 150–51, 153 alliances of corporate and noncorporate organizations, 251 Amazon, 34, 37, 84, 112, 226 Alexa, 8, 228, 230 Mechanical Turk, 98, 99, 215 analytics, 50–52 anonymized data, 81, 83 Apple, 8, 226 iPhone, 252 application programming interfaces (APIs), 86, 118–22, 159, 172, 236, 271 applications, 171 area underneath the curve (AUC), 206, 272
artificial intelligence, see AI artificial neural network, 5 Atlassian Corporation, 243 augmentation, 172 automation versus, 163 availability of data, 72–73 Babbage, Charles, 2 Bank of England, 104–5 Bayesian networks, 150, 201 Bengio, Yoshua, 7 bias, 177 big-data era, 28 BillGuard, 112 binary classification, 204–6 blockchain, 109–10, 117, 272 Bloomberg, 73, 121 brain, 5, 15, 31–32 shared, 31–33 branding, 256–57 breadth of data, 76 business goal, in proof of concept phase, 60 business software companies, 113 buying data, 119–22 data brokers, 119–22 financial, 120–21 marketing, 120 car insurance, 85 Carnegie, Andrew, 226 cars, 6, 254 causes, 145 census, 118 centrifugal process, 49–50 centripetal process, 50 chess, 6 chief data officer (CDO), 138 chief information officer (CIO), 138 chief technology officer (CTO), 139 Christensen, Clay, 239 cloud computing, 8, 22, 78–79, 87, 242, 248, 257 Cloudflare, 35–36 clustering, 53, 64, 95, 272 Coase, Ronald, 226 compatibility, 251–52 competitions, 117–18 competitive advantages, 16, 20, 22 in DLEs, 24, 33 traditional forms of, 224–25 see also aggregating advantages complementarity, 253 complementary data, 89, 124, 272 compliance concerns, 80 computer chips, 7, 22, 250 computers, 2, 3, 6 cost of, 8–9 power of, 7, 8, 19, 22 computer vision, 90 concave payoffs, 195–98, 272 concept drift, 175–76, 272 confusion matrix, 173–74 consistency, 256–57 consultants, 117–18, 131 consumer apps, 111–13, 272 consumer data, 109–14 apps, 111–13 customer-contributed data versus, 109 sensor networks, 113–14 token-based incentives for, 109–10 consumer reviews, 29, 43 contractual rights, 78–82 clean start advantage and, 78–79 negotiating, 79 structuring, 79–82 contribution margin, 214, 272 convex payoffs, 195–97, 202, 272 convolutional neural networks (CNNs), 151, 153 Conway, John, 104 cost of data labeling, 108 in ML management, 158 in proof of concept phase, 60 cost leadership, 272 DLEs and, 39–41 cost of goods sold (COGS), 217 crawling, 115–16, 
281 Credit Karma, 112 credit scores, 36–37 CRM (customer relationship management), 159, 230–31, 255, 260, 272 Salesforce, 159, 212, 243, 248, 258 cryptography, 272 crypto tokens, 109–10, 272 CUDA, 250 customer-generated data, 77–91 consumer data versus, 109 contractual rights and, 78–82 clean start advantage and, 78–79 negotiating, 79 structuring, 79–82 customer data coalitions, 82–84 data integrators and, 86–89 partnerships and, 89–91 pricing and, 237 workflow applications for, 84–86 customers costs to serve, 242 direct relationship with, 242 needs of, 49–50 customer support agents, 232, 272 customer support tickets, 260, 272 cybernetics, 4, 273 Dark Sky, 112, 113 DARPA (Defense Advanced Research Projects Agency), 5 dashboards, 171 data, 1, 8, 69, 273 aggregation of, 241–45 big-data era, 28 complementary, 89 harvesting from multiple sources, 57 incomplete, 178 information versus, 22–23 missing sources of, 177 in proof of concept phase, 60 quality of, 177–78 scale effects with, 22 sensitive, 57 starting small with, 56–58 vertical integration and, 231–32 data acquisition, 69–126, 134 buying data, 119–22 consumer data, 109–14 apps, 111–13 customer-contributed data versus, 109 sensor networks, 113–14 token-based incentives for, 109–10 customer-generated data, see customer-generated data human-generated data, see human-generated data machine-generated data, 102–8 agent-based models, 103–5 simulation, 103–4 synthetic, 105–8 partnerships for, 89–91 public data, 115–22 buying, see buying data consulting and competitions, 117–18 crawling, 115–16, 281 governments, 118–19 media, 118 valuation of, 71–77 accessibility, 72, 107 availability, 72–73 breadth, 76 cost, 73 determination, 74–76 dimensionality, 75 discrimination, 72–74 fungibility, 74 perishability and relevance, 74–75, 201 self-reinforcement, 76 time, 73–74 veracity, 75 volume of, 76–77 data analysts, 128–30, 132, 133, 137, 273 data as a service (DaaS), 116, 120 databases, 258 data brokers, 119–22 financial, 120–21 
marketing, 120 data cleaning, 162–63 data distribution drift, 178 data drift, 176, 273 data-driven media, 118 data engineering, 52 data engineers, 128–30, 132, 133, 137, 161, 273 data exhaust, 80, 257–58, 273 data infrastructure engineers, 129–32, 137, 273 data integration and integrators, 86–90, 276 data labeling, 57, 58, 92–100, 273 best practices for, 98 human-in-the-loop (HIL) systems, 100–101, 276 management of, 98–99 measurement in, 99–100 missing labels, 178 outsourcing of, 101–2 profitability metrics and, 215–16 tools for, 93–97 data lake, 57, 163 data learning effects (DLEs), 15–47, 48, 69, 222, 273 competitive advantages of, 24, 33 data network effects, 19, 26–33, 44, 273 edges of, 24 entry-level, 26–29, 31–33, 36–37, 274 network effects versus, 24–25 next-level, 26–27, 29–33, 36–37, 278 what type to build, 33 economies of scale in, 34 formula for, 17–20 information accumulation and, 21 learning effects and, 20–21 limitations of, 21, 42–43 loops around, see loops network effects and, 24–26 powers of, 34–42 compounding, 36–38 cost leadership, 39–41 flywheels, 37–38 price optimization, 41–42 product utility, 35–36 winner-take-all dynamics, 34–35 product value and, 39 scale effects and, 21–23 variety and, 34–35 data learning loops, see loops data lock-in, 247–48 data networks, 109–10, 143–44, 273 normal networks versus, 26 underneath products, 25–26 data pipelines, 181, 216 breaks in, 87, 181 data platform, 57 data processing capabilities (computing power), 7, 8, 19, 22 data product managers, 129–32, 274 data rights, 78–82, 246 data science, 52–56 decoupling software engineering from, 133 data scientists, 54–56, 117, 128–30, 132–39, 161, 274 data stewards, 58, 274 data storage, 57, 81, 246–47, 257 data validators, 161 data valuation, 71–77 accessibility in, 72, 107 availability in, 72–73 breadth in, 76 cost in, 73 determination in, 74–76 dimensionality in, 75 discrimination in, 72–74 fungibility in, 74 perishability and relevance in, 74–75, 201 
self-reinforcement in, 76 time in, 73–74 veracity in, 75 decision networks, 150, 153 decision trees, 149–50, 153 deduction and induction, 49–50 deep learning, 7, 147–48, 274 defensibility, 200, 274 defensible assets, 25 Dell, Michael, 226 Dell Technologies, 226 demand, 225 denial-of-service (DoS) attacks, 36 designers, 129 differential privacy, 117, 274 dimensionality reduction, 53, 274 disruption, 239–41 disruption theory, 239, 274 distributed systems, 8, 9 distribution costs, 243 DLEs, see data learning effects DoS (denial-of-service) attacks, 36 drift, 175–77, 203, 274 concept, 175–76 data, 176 minimizing, 201 e-commerce, 29, 31, 34, 37, 41, 84 economies of scale, 19, 34 ecosystem, 251–52 edges, 24, 274 enterprise resource planning (ERP), 161, 250, 274 entry-level data network effects, 26–29, 31–33, 36–37, 274 epochs, 173, 275 equity capital, 230 ETL (extract, transform, and load), 58, 275 evolutionary algorithms, 150–51, 153 expected error reduction, 96 expected model change, 96 Expensify, 85–86 Facebook, 25, 43, 112, 119, 122 features, 63–64, 145, 275 finding, 64–65 pricing and, 238–39 federated learning, 117, 275 feedback data, 36, 199–200 feed-forward networks, 151, 153 financial data brokers for, 120–21 stock market, 72, 74, 120–21 first-movers, 253–55, 275 flywheels, 37–38, 243–44 Ford, Henry, 49 fungibility of data, 74 Game of Life, 104 Gaussian mixture model, 275 generative adversarial networks (GANs), 152, 153 give-to-get model, 36 global multiuser models, 275 glossary, 271–82 Google, 111–12, 115, 195, 241, 251, 253–54 governments, 118–19 gradient boosted tree, 53, 275 gradient descent, 208 graph, 275 Gulf War, 6 hedge funds, 227 heuristics, 139, 231, 275 Hinton, Geoffrey, 7 histogram, 53, 275 holdout data, 199 horizontal products, 210–12, 276 HTML (hypertext markup language), 116, 276 human-generated data, 91–102 data labeling in, 57, 58, 92–100, 273 best practices for, 98 human-in-the-loop (HIL) systems, 100–101, 276 management of, 98–99 measurement 
in, 99–100 missing labels, 178 outsourcing of, 101–2 profitability metrics and, 215–16 tools for, 93–97 human learning, 16–17 hyperparameters, 173, 276 hypertext markup language (HTML), 116, 276 IBM (International Business Machines), 5–8, 255 image recognition, 76–77, 146 optical character, 72, 278 incumbents, 276 integrating, 245–46 leveraging the loop against, 256–61 independent software vendors (ISVs), 161, 248, 276 induction and deduction, 49–50 inductive logic programming (ILP), 149, 153 Informatica, 86 information, 1, 2, 276 data versus, 22–23 informational leverage, 3 Innovator’s Dilemma, The (Christensen), 239 input cost analysis, 215–16 input data, 199 insourcing, 102, 276 integration, 86–90, 276 predictions and, 171 testing, 174 integrations-first versus workflow-first companies, 88–89 intellectual leverage, 3 intellectual property (IP), 25, 251 intelligence, 1, 2, 5, 15, 16 artificial, see AI intelligent applications, 257–60, 276 intelligent systems, 19 interaction frequency, 197 interactive machine learning (IML), 96–97, 276 International Telecommunications Union (ITU), 250–51 Internet, 8, 19, 32, 69, 241–42, 244 inventory management software, 260 investment firms, 232 iPhone, 252 JIRA, 243 Kaggle, 9, 56, 117 Keras, 251 k-means, 276 knowledge economy, 21 Kubernetes, 251 language processing, 77, 94 latency, 158 layers of neurons, 7, 277 Lean AI, 48–68, 277 customer needs and, 49–50 decision tree for, 50–52 determining customer need for AI, 50–60 data and, 56–58 data science and, 54–56 sales and, 58–60 statistics and, 53–54 lean start-up versus, 61–62 levels in, 65–66 milestones for, 61 minimum viable product and, 62–63 model features lean start-ups, 61–62 learning human formula for, 16–17 machine formula for, 17–20 learning effects, 20–22, 277 moving beyond, 20–21 legacy applications, 257, 277 leverage, 3 linear optimization, 42 LinkedIn, 122 loans, 35, 37, 227 lock-in, 247–48 loops, 187–221, 273 drift and, 201 entropy and, 191–92 good versus bad, 191–92 
metrics for measuring, see metrics moats versus, 187–88, 192–94 physics of, 190–92 prediction and, 202–3 product payoffs and, 195–98, 202 concave, 195–98 convex, 195–97, 202 picking the product to build, 198 repeatability in, 188–89 scale and, 198–201, 203 and data that doesn’t contribute to output, 199–200 loss, 207–8, 277 loss function, 275, 277 machine-generated data, 102–8 agent-based models, 103–5 simulation, 103–4 synthetic, 105–8 machine learning (ML), 9, 145–47, 277 types of, 147–48 machine learning engineers, 39, 56, 117–18, 129, 130, 132, 138, 139, 161, 277 machine learning management loop, 277 machine learning models (ML models), 9, 26, 27, 31, 52–56, 59, 61, 134 customer predictions and, 80–81 features of, 61, 63 machine learning models, building, 64–65, 143–54 compounding, 148–52 diverse disciplines in, 149–51 convolutional neural networks in, 151, 153 decision networks in, 150, 153 decision trees in, 149–50, 153 defining features, 146–47 evolutionary algorithms in, 150–51, 153 feed-forward networks in, 151, 153 generative adversarial networks in, 152, 153 inductive logic programming in, 149, 153 machine learning in, 151–52 primer for, 145–47 recurrent neural networks in, 151, 153 reinforcement learning in, 152, 153 statistical analysis in, 149, 153 machine learning models, managing, 155–86 acceptance, 157, 162–66 accountability and, 164 and augmentation versus automation, 163 budget and, 164 data cleaning and, 162–63 distribution and, 165 executive education and, 165–66 experiments and, 165 explainability and, 166 feature development and, 163 incentives and, 164 politics and, 163–66 product enhancements and, 165 retraining and, 163 and revenues versus costs, 164 schedule and, 163 technical, 162–63 and time to value, 164 usage tracking and, 166 decentralization versus centralization in, 156 experimentation versus implementation in, 155 implementation, 158–66 data, 158–59 security, 159–60 sensors, 160 services, 161 software, 159 staffing, 161–162 loop 
in, 156, 166–81 deployment, 171–72 monitoring, see monitoring model performance training, 168–69 redeploying, 181 reproducibility and, 170 rethinking, 181 reworking, 179–80 testing, 172–74 versioning, 169–70, 281 ROI in, 164, 176, 181 testing and observing in, 156 machine learning researchers, 129–34, 135–36, 138, 277 management of AI-First teams, 135–38 of data labeling, 98–99 of machine learning models, see machine learning models, managing manual acceptance, 208–9 manufacturing, 6 marketing, customer data coalitions and, 83 marketing segmentation, 277 McCulloch, Warren, 4–5 McDonald’s, 256 Mechanical Turk, 98, 99, 215 media, 118 medical applications, 90–91, 145, 208 metrics, 203 measurement, 203–9 accuracy, 203–4 area underneath the curve, 206, 272 binary classification, 204–6 loss, 207–8 manual acceptance, 208–9 precision and recall, 206–7 receiver operating characteristic, 205–6, 279 usage, 209 profitability, 209–18 data labeling and, 215–16 data pipes and, 216 input cost analysis, 215–16 research cost analysis, 217–18 unit analysis, 213–14 and vertical versus horizontal products, 210–12 Microsoft, 8, 247 Access, 257 Outlook, 252 military, 6, 7 minimum viable product (MVP), 62–63, 277 MIT (Massachusetts Institute of Technology), 4, 5 ML models, see machine learning models moats, 277 loops versus, 187–88, 192–94 mobile phones, 113 iPhone, 252 monitoring, 277 monitoring model performance, 174–78 accuracy, 175 bias, 177 data quality, 177–78 reworking and, 179–80 stability, 175–77 MuleSoft, 86, 87 negotiating data rights, 79, 80 Netflix, 242, 243 network effects, 15–16, 20, 22, 23, 44, 278 compounding of, 36 data network effects versus, 24–25 edges of, 24 limits to, 42–43 moving beyond, 24–26 products with versus without, 26 scale effects versus, 24 traditional, 27 value of, 27 networks, 7, 15, 17 data networks versus, 26 neural networks, 5, 7, 8, 19, 23, 53, 54, 277–78 neurons, 5, 7, 15 layers of, 7, 276 next-level data network effects, 26–27, 29–33, 36–37, 278 
nodes, 21, 23–25, 27, 44, 278 NVIDIA, 250 Obama administration, 118 Onavo, 112 optical character recognition software, 72, 278 Oracle, 247, 248 outsourcing, 216 data labeling, 101–2 team members, 131 overfitting, 82 Pareto optimal solution, 56, 278 partial plots, 53, 278 payoffs, 195–98 concave, 195–98 convex, 195–97, 202 perceptron algorithm, 5 perishability of data, 74–75, 201 personalization, 255–56 personally identifiable information (PII), 81, 278 personnel lock-in, 248 perturbation, 178, 278 physical leverage, 3 Pitts, Walter, 4–5 POC (proof of concept), 59–60, 63, 278 positioning, 245–56 power generators, 209, 278 power teachers, 209 precision, 278 precision and recall, 206–7 prediction usability threshold (PUT), 62–64, 90, 91, 173, 200–202, 279 predictions, 34–35, 48, 63, 65, 148, 202–3 predictive pricing, 41, 42 prices charged by data vendors, 73 pricing of AI-First products, 236–39 customer data contribution and, 237 features and, 238–39 transactional, 237, 280 updating and, 238 usage-based, 237–38, 281 of data integration products, 87 optimization of, 41–42 personalized, 41 predictive, 41, 42 ROI-based, 235–36, 279 Principia Mathematica, 4 prisoner’s dilemma, 104 probability, in data labeling, 107 process automation, 6 process lock-in, 248 products, 59 features of, 61, 63 lock-in and, 248 utility of, 35–36 value of, 39 profit, 213 profitability metrics, 209–18 data labeling and, 215–16 data pipes and, 216 input cost analysis, 215–16 research cost analysis, 217–18 unit analysis, 213–14 and vertical versus horizontal products, 210–12 proof of concept (POC), 59–60, 63, 278 proprietary information, 44, 279 feedback data, 199–200 protocols, 248 public data, 115–22 buying, see buying data consulting and competitions, 117–18 crawling, 115–16, 281 governments, 118–19 media, 118 PUT (prediction usability threshold), 62–64, 90, 91, 173, 200–202, 278 quality, 175, 177–78 query by committee, 96 query languages, 279 random forest, 53, 64, 279 recall, 279 receiver 
operating characteristic (ROC) curve, 205–6, 279 recurrent neural networks (RNNs), 151, 153 recursion, 150, 279 regression, 64 reinforcement learning (RL), 103, 147–48, 152, 153, 279 relevance of data, 74–75 reliability, 175 reports, 171 research and development (R & D), 42 cost analysis, 217–18 revolutionary products, 252 robots, 6 ROI (return on investment), 55, 63–65, 93, 164, 176, 181, 198, 279 pricing based on, 235–36, 279 Russell, Bertrand, 4 sales, 58–60 Salesforce, 159, 212, 243, 248, 258 SAP (Systems Applications and Products in Data Processing), 6, 159, 161, 247, 248 SAS, 253 scalability, in data labeling, 106 scale, 20–22, 227, 279 economies of, 19, 34 loops and, 198–200, 203 in ML management, 158 moving beyond, 21–23 network effects versus, 24 scatter plot, 53, 280 scheme, 279 search engines, 31 secure multiparty computation, 117, 279 security, 159 Segment, 87–88 self-reinforcing data, 76 selling data, 122 sensors, 113–14, 160, 280 shopping online, 29, 31, 34, 37, 41, 84 simulation, 103–4, 280 ABMs versus, 105 social networks, 16, 20, 44 Facebook, 25, 43, 112, 119, 122 LinkedIn, 122 software, 159 traditional business models for, 233–34 software-as-a-service (SaaS), 87, 280 software development kits (SDKs), 112, 280 software engineering, decoupling data science from, 133 software engineers, 139, 134–37 Sony, 7 speed of data labeling, 108 spreadsheets, 171 Square Capital, 35 stability, 175–77 staging, 249–51 standardization, 247–48, 249–50 statistical analysis, 149, 153 statistical process control (SPC), 156, 173, 280 statistics, 53–54 stocks, 72, 74, 120–21 supervised machine learning, 147–48, 280 supply, 225 supply-chain tracking, 260 support vector machines, 280 synthetic data, 105–8, 216 system of engagement, 280 system of record, 243, 281 systems integrators (SIs), 161, 248, 281 Tableau, 253 talent loop, 260–61, 281 Taylor, Frederick W., 6 teams in proof of concept phase, 60 see also AI-First teams telecommunications industry, 250–51 telephones 
mobile, 113 iPhone, 253 networks, 23–25 templates, 171 temporal leverage, 3 threshold logic unit (TLU), 5 ticker data, 120–21 token-based incentives, 109–10 tools, 2–3, 93–97 training data, 199 transactional pricing, 237, 280 transaction costs, 243 transfer learning, 147–48 true and false, 204–6 Turing, Alan, 5 23andMe, 112 Twilio, 87 uncertainty sampling, 96 unit analysis, 213–14 United Nations, 250 unsupervised machine learning, 53, 147–48, 281 Upwork, 99 usability, 255–56 usage-based pricing, 237–38, 281 usage metrics, 209 user interface (UI), 89, 159, 281 utility of network effects, 42 of products, 35–36 validation data, 199 value chain, 18–19, 281 value proposition, 59 values, missing, 178 variable importance plots, 53, 281 variance reduction, 96 Veeva Systems, 212 vendors, 73, 161 data, prices charged by, 73 independent software, 161, 248, 276 lock-in and, 247–48 venture capital, 230 veracity of data, 75 versioning, 169–70, 281 vertical integration, 226–37, 239, 244, 252, 281 vertical products, 210–12, 282 VMWare, 248 waterfall charts, 282 Web crawlers, 115–16, 282 weights, 150, 281 workflow applications, 84–86, 253, 259, 282 workflow-first versus integrations-first companies, 88–89 yield management systems, 42 Zapier, 87 Zendesk, 233 zettabyte, 8, 282 Zetta Venture Partners, 8–9

ABOUT THE AUTHOR Ash Fontana became one of the most recognized startup investors in the world after launching online investing at AngelList.

pages: 696 words: 143,736

The Age of Spiritual Machines: When Computers Exceed Human Intelligence by Ray Kurzweil

Ada Lovelace, Alan Greenspan, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, Albert Einstein, Alvin Toffler, Any sufficiently advanced technology is indistinguishable from magic, backpropagation, Buckminster Fuller, call centre, cellular automata, Charles Babbage, classic study, combinatorial explosion, complexity theory, computer age, computer vision, Computing Machinery and Intelligence, cosmological constant, cosmological principle, Danny Hillis, double helix, Douglas Hofstadter, Everything should be made as simple as possible, financial engineering, first square of the chessboard / second half of the chessboard, flying shuttle, fudge factor, functional programming, George Gilder, Gödel, Escher, Bach, Hans Moravec, I think there is a world market for maybe five computers, information retrieval, invention of movable type, Isaac Newton, iterative process, Jacquard loom, John Gilmore, John Markoff, John von Neumann, Lao Tzu, Law of Accelerating Returns, mandelbrot fractal, Marshall McLuhan, Menlo Park, natural language processing, Norbert Wiener, optical character recognition, ought to be enough for anybody, pattern recognition, phenotype, punch-card reader, quantum entanglement, Ralph Waldo Emerson, Ray Kurzweil, Richard Feynman, Robert Metcalfe, Schrödinger's Cat, Search for Extraterrestrial Intelligence, self-driving car, Silicon Valley, social intelligence, speech recognition, Steven Pinker, Stewart Brand, stochastic process, Stuart Kauffman, technological singularity, Ted Kaczynski, telepresence, the medium is the message, The Soul of a New Machine, There's no reason for any individual to have a computer in his home - Ken Olsen, traveling salesman, Turing machine, Turing test, Whole Earth Review, world market for maybe five computers, Y2K

Because the sequence is random and without meaning, noise carries no information. Contrasted with information. Objective experience The experience of an entity as observed by another entity, or measurement apparatus. OCR See Optical character recognition. Operating system A software program that manages and provides a variety of services to application programs, including user interface facilities and management of input-output and memory devices. Optical character recognition (OCR) A process in which a machine scans, recognizes, and encodes printed (and possibly handwritten) characters into digital form. Optical computer A computer that processes information encoded in patterns of light beams; different from today’s conventional computers, in which information is represented in electronic circuitry or encoded on magnetic surfaces.

…

A few parents, on the other hand, were furious that we had failed to recommend Harvard. It was my first experience with the ability of computers to affect people’s lives. I sold that company to Harcourt, Brace & World, a New York publisher, and moved on to other ideas. In 1974, computer programs that could recognize printed letters, called optical character recognition (OCR), were capable of handling only one or two specialized type styles. I founded Kurzweil Computer Products that year to develop the first OCR program that could recognize any style of print, which we succeeded in doing later that year. So the question then became, What is it good for?

…

.), creator of speech recognition and natural language software systems: <http://www.lhs.com/dictation/> The overall Lernout & Hauspie web site: <http://www.lhs.com/> Kurzweil Music Systems, Inc., creator of computer-based music synthesizers, sold to Young Chang in 1990: <http://www.youngchang.com/kurzweil/index.html> TextBridge Optical Character Recognition (OCR). Formerly Kurzweil OCR from Kurzweil Computer Products, Inc. (sold to Xerox Corp. in 1980): <http://www.xerox.com/scansoft/textbridge/> ARTIFICIAL LIFE AND ARTIFICIAL INTELLIGENCE RESEARCH The Artificial Intelligence Laboratory at Massachusetts Institute of Technology (MIT): <http://www.ai.mit.edu/> Artificial Life Online: <http://alife.santafe.edu> Contemporary Philosophy of Mind: An Annotated Bibliography: <http://ling.ucsc.edu/~chalmers/biblio.html> Machine Learning Laboratory, the University of Massachusetts, Amherst: <http://www-ml.cs.umass.edu/> The MIT Media Lab: <http://www.media.mit.edu/> SSIE 580B: Evolutionary Systems and Artificial Life, by Luis M.

Paper Knowledge: Toward a Media History of Documents by Lisa Gitelman

Alvin Toffler, An Inconvenient Truth, Andrew Keen, Charles Babbage, computer age, corporate governance, Dennis Ritchie, deskilling, Douglas Engelbart, East Village, en.wikipedia.org, information retrieval, Internet Archive, invention of movable type, Ivan Sutherland, Jaron Lanier, Ken Thompson, knowledge economy, Lewis Mumford, machine translation, Marshall McLuhan, Mikhail Gorbachev, military-industrial complex, national security letter, Neal Stephenson, On the Economy of Machinery and Manufactures, optical character recognition, profit motive, QR code, RAND corporation, RFC: Request For Comment, scientific management, Shoshana Zuboff, Silicon Valley, Steve Jobs, tacit knowledge, technological determinism, The Structural Transformation of the Public Sphere, Turing test, WikiLeaks, Works Progress Administration

That said, there are plenty of PDFs—called “image-only”—that cannot be searched within a PDF-reader application until or unless they have been manipulated computationally to identify the alphanumeric characters they contain through optical character recognition (OCR), which produces machine-encoded text. Before being scanned, these image-only PDFs do function as images, and very “poor” ones at that.87 “To OCR” a document has become a verb at least as handy in some situations as “to PDF” one. Optical character recognition points precisely to the line that separates electronic texts from images. It is a line that disappears at the level of the alphanumeric character since “the algorithmic eyes of” scanning technology effectively identify the shapes of characters, “seeing” them as patterns of yes/no variables that can together be “recognized” (that is, processed) as alphanumeric characters.
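The idea of scanning software "seeing" character shapes as patterns of yes/no variables can be illustrated with a small toy sketch in Python. It rests on deliberately simple assumptions. Each glyph is a tiny 3×5 grid of grayscale intensities. The `TEMPLATES` table is hypothetical and holds only three digits. Recognition is plain nearest-template matching: pixels are thresholded into bits, and the template with the fewest mismatched bits (Hamming distance) wins. Real OCR engines use far more robust methods, so this does not describe any particular product.

```python
# Toy OCR sketch (assumption: tiny fixed-size glyphs, hypothetical templates).
# Pixels are thresholded into yes/no bits, then matched to the closest template.
from typing import Dict, List, Tuple

Bits = Tuple[int, ...]

# Hypothetical 3x5 digit templates: '#' = ink (1), '.' = background (0).
TEMPLATES: Dict[str, List[str]] = {
    "0": ["###", "#.#", "#.#", "#.#", "###"],
    "1": [".#.", "##.", ".#.", ".#.", "###"],
    "7": ["###", "..#", ".#.", ".#.", ".#."],
}


def template_bits(rows: List[str]) -> Bits:
    """Flatten a template into a tuple of yes/no bits, row by row."""
    return tuple(1 if ch == "#" else 0 for row in rows for ch in row)


def binarize(gray: List[List[int]], threshold: int = 128) -> Bits:
    """Turn 0-255 intensities into bits; dark pixels (below threshold) count as ink."""
    return tuple(1 if px < threshold else 0 for row in gray for px in row)


def hamming(a: Bits, b: Bits) -> int:
    """Count the positions where two bit patterns disagree."""
    if len(a) != len(b):
        raise ValueError("glyphs must have the same number of pixels")
    return sum(x != y for x, y in zip(a, b))


def recognize(gray: List[List[int]], threshold: int = 128) -> str:
    """Return the label of the template closest to the binarized glyph."""
    bits = binarize(gray, threshold)
    return min(TEMPLATES, key=lambda label: hamming(bits, template_bits(TEMPLATES[label])))


# A hypothetical noisy scan of a "7": the bottom-right pixel (100) is a smudge.
SAMPLE_SCAN = [
    [20, 30, 25],
    [240, 200, 40],
    [250, 60, 230],
    [245, 35, 250],
    [230, 50, 100],
]

if __name__ == "__main__":
    print(recognize(SAMPLE_SCAN))  # prints "7"
```

The smudged pixel leaves the scan one bit away from the "7" template and several bits away from the others. This shows why matching on whole bit patterns tolerates a little scanning noise.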

…

., 19 MLA Handbook for Writers of Research Papers, 9 Morgan, Pierpont, 63 Moskowitz, Sam, 146, 148 Moxon, John, 49 Mumford, Lewis, 61 The Myth of the Paperless Office (Sellen and Harper), 4, 111, 128, 130 National Association of Book Publishers, 73 National Library of Medicine, 107 Neilsen, Jakob, 132 New Deal, 14, 62, 167n15 New York Public Library, 23, 55, 66–67, 73 New York Times, 84–88, 92, 94, 120–21, 129 newspapers, 2, 4, 25, 31, 33, 36, 40, 43, 45, 46, 52, 58, 72, 75, 77, 78, 85, 88, 111, 124, 138, 139, 141, 148, 149 Nixon, Richard M., 86–87, 95–96 novels, 3, 36, 40, 115 Novelty Job Printing Press, 138–40, 184n6 Nunberg, Geoffrey, 4 Obama, Barack, 95, 97, 116 The Office (television series), 106 Ohmann, Richard, 145 Oliver Optic’s Magazine, 141–42 optical character recognition (OCR), 134 Our Young Folks, 142 Oushakine, Serguei Alex., 174n41 Owen, Robert Dale, 34 paper, 3–4, 33, 46, 89, 123, 128, 147; format, 12, 41, 62; perishability of, 54, 66–67; shredding, 96 passport, 1, 10 Patriot Act of 2001, 97 Pentagon Papers, 16, 85–88, 90, 94–96, 107, 116–17 Phillips, John L., 38, 40–42; “The Art of Preservative”: 100 Fancy Specimens of Job Printing, 11, 38, 40, 42, 44, 140 photocopy.

pages: 688 words: 107,867

Python Data Analytics: With Pandas, NumPy, and Matplotlib by Fabio Nelli

Amazon Web Services, backpropagation, centre right, computer vision, data science, Debian, deep learning, DevOps, functional programming, Google Earth, Guido van Rossum, Internet of things, optical character recognition, pattern recognition, sentiment analysis, speech recognition, statistical model, web application

Think about, for example, the ZIP codes on letters at the post office and the automation needed to recognize these five digits. Perfect recognition of these codes is necessary in order to sort mail automatically and efficiently. Among the other applications that may come to mind is OCR (Optical Character Recognition) software. OCR software must read handwritten text, or pages of printed books, and turn them into electronic documents in which each character is well defined. But the problem of handwriting recognition goes farther back in time, more precisely to the early 20th century (1920s), when Emanuel Goldberg (1881–1970) began his studies of this issue and suggested that a statistical approach would be an optimal choice.

…

Index A Accents, LaTeX Advanced Data aggregation apply() functions transform() function Anaconda Anderson Iris Dataset, see Iris flower dataset Array manipulation joining arrays column_stack() and row_stack() hstack() function vstack() function splitting arrays hsplit() function split() function vsplit() function Artificial intelligence schematization of Artificial neural networks biological networks edges hidden layer input and output layer multi layer perceptron nodes schematization of SLP ( see Single layer perceptron (SLP)) weight B Bar chart 3D error bars horizontal matplotlib multiserial multiseries stacked bar pandas DataFrame representations stacked bar charts x-axis xticks() function Bayesian methods Big Data Bigrams Biological neural networks Blending operation C Caffe2 Chart typology Choropleth maps D3 library geographical representations HTML() function jinja2 JSON and TSV JSON TopoJSON require.config() results US population data source census.gov file TSV, codes HTML() function jinja2.Template pop2014_by_county dataframe population.csv render() function SUMLEV values Classification and regression trees Classification models Climatic data Clustered bar chart IPython Notebook jinja2 render() function Clustering models Collocations Computer vision Concatenation arrays combining concat() function dataframe keys option pivoting hierarchical indexing long to wide format stack() function unstack() function removing Correlation Covariance Cross-validation Cython D Data aggregation apply() functions GroupBy groupby() function operations output of SPLIT-APPLY-COMBINE hierarchical grouping merge() numeric and string values price1 column transform() function Data analysis charts data visualization definition deployment phase information knowledge knowledge domains computer science disciplines fields of application machine learning and artificial intelligence mathematics and statistics problems of open data predictive model process data sources deployment 
exploration/visualization extraction model validation planning phase predictive modeling preparation problem definition stages purpose of Python and quantitative and qualitative types categorical data numerical data DataFrame pandas definition nested dict operations structure transposition structure Data manipulation aggregation ( see Data aggregation) concatenation discretization and binning group iteration permutation phases of preparation ( see Data preparation) string ( see String manipulation) transformation Data preparation DataFrame merging operation pandas.concat() pandas.DataFrame.combine_first() pandas.merge() procedures of Data structures, operations DataFrame and series flexible arithmetic methods Data transformation drop_duplicates() function mapping adding values axes dict objects replacing values remove duplicates Data visualization adding text axis labels informative label mathematical expression modified of text() function bar chart ( see Bar chart) chart typology contour plot/map data analysis 3D surfaces grid grids, subplots handling date values histogram installation IPython and IPython QtConsole kwargs figures and axes horizontal subplots linewidth plot() function vertical subplots legend chart of legend() function multiseries chart upper-right corner line chart ( see Line chart) matplotlib architecture and NumPy matplotlib library ( see matplotlib library) mplot3d multi-panel plots grids, subplots subplots pie charts axis() function modified chart pandas Dataframe pie() function shadow kwarg plotting window buttons of commands matplotlib and NumPy plt.plot() function properties QtConsole polar chart pyplot module saving, charts HTML file image file source code scatter plot, 3D Decision trees Deep learning artificial ( see Artificial neural networks) artificial intelligence data availability machine learning neural networks and GPUs Python frameworks programming language schematization of TensorFlow ( see TensorFlow) Digits dataset definition 
digits.images array digit.targets array handwritten digits handwritten number images matplotlib library scikit-learn library Discretization and binning any() function categorical type cut() function describe() function detecting and filtering outliers qcut() std() function value_counts() function Django Dropping E Eclipse (pyDev) Element-wise computation Expression-oriented programming F Financial data Flexible arithmetic methods Fonts, LaTeX G Gradient theory Graphics Processing Unit (GPU) Grouping Group iteration chain of transformations functions on groups mark() function quantiles() function GroupBy object H Handwriting recognition digits dataset handwritten digits, matplotlib library learning and predicting OCR software scikit-learn svc estimator TensorFlow validation set, six digits Health data Hierarchical indexing arrays DataFrame reordering and sorting levels stack() function statistic levels structure two-dimensional structure I IDEs, see Interactive development environments (IDEs) Image analysis concept of convolutions definition edge detection blackandwhite.jpg image black and white system filters function gradients.jpg image gray gradients Laplacian and Sobel filters results source code face detection gradient theory OpenCV ( see Open Source Computer Vision (OpenCV)) operations representation of Indexing functionalities arithmetic and data alignment dropping reindexing Integration Interactive development environments (IDEs) Eclipse (pyDev) Komodo Liclipse NinjaIDE Spyder Sublime Interactive programming language Interfaced programming language Internet of Things (IoT) Interpreted programming language Interpreter characterization Cython Jython PVM PyPy tokenization IPython and IPython QtConsole Jupyter project logo Notebook DataFrames QtConsole shell tools of Iris flower dataset Anderson Iris Dataset IPython QtConsole Iris setosa features length and width, petal matplotlib library PCA decomposition target attribute types of analysis variables J 
JavaScript D3 Library bar chart CSS definitions data-driven documents HTML importing library IPython Notebooks Jinja2 library pandas dataframe render() function require.config() method web chart creation Jinja2 library Jython K K-nearest neighbors classification decision boundaries 2D scatterplot, sepals predict() function random.permutation() training and testing set L LaTeX accents fonts fractions, binomials, and stacked numbers with IPython Notebook in Markdown Cell in Python 2 Cell with matplotlib radicals subscripts and superscripts symbols arrow symbols big symbols binary operation and relation symbols Delimiters Hebrew lowercase Greek miscellaneous symbols standard function names uppercase Greek Learning phase Liclipse Linear regression Line chart annotate() arrowprops kwarg Cartesian axes color codes data points different series gca() function Greek characters LaTeX expression line and color styles mathematical expressions mathematical function pandas plot() function set_position() function xticks() and yticks() functions Linux distribution LOD cloud diagram Logistic regression M Machine learning (ML) algorithm development process deep learning diabetes dataset features/attributes Iris flower dataset learning problem linear/least square regression coef_ attribute fit() function linear correlation parameters physiological factors and progression of diabetes single physiological factor schematization of supervised learning SVM ( see Support vector machines (SVMs)) training and testing set unsupervised learning Mapping adding values inplace option rename() function renaming, axes replacing values Mathematical expressions with LaTeX, see LaTeX MATLAB matplotlib matplotlib library architecture artist layer backend layer functions and tools layers pylab and pyplot scripting layer (pyplot) artist layer graphical representation hierarchical structure primitive and composite graphical representation LaTeX NumPy Matrix product Merging operation DataFrame dataframe 
objects index join() function JOIN operation left_index/right_index options left join, right join and outer join left_on and right_on merge() function Meteorological data Adriatic Sea and Po Valley cities Comacchio image of mountainous areas reference standards TheTimeNow website climate data source JSON file Weather Map site IPython Notebook chart representation CSV files DataFrames humidity function linear regression matplotlib library Milan read_csv() function result shape() function SVR method temperature Jupyter Notebook access internal data command line dataframe extraction procedures Ferrara JSON file json.load() function parameters prepare() function RoseWind ( see RoseWind) wind speed Microsoft excel files dataframe data.xls internal module xlrd read_excel() function MongoDB Multi Layer Perceptron (MLP) artificial networks evaluation of experimental data hidden layers IPython session learning phase model definition test phase and accuracy calculation Musical data N Natural Language Toolkit (NLTK) bigrams and collocations common_contexts() function concordance() function corpora downloader tool fileids() function HTML pages, text len() function library macbeth variable Python library request() function selecting words sentimental analysis sents() function similar() function text, network word frequency macbeth variable most_common() function nltk.download() function nltk.FreqDist() function stopwords string() function word search Ndarray array() function data, types dtype (data-type) intrinsic creation type() function NOSE MODULE “Not a Number” data filling, NaN occurrences filtering out NaN values NaN value NumPy library array manipulation ( see Array manipulation) basic operations aggregate functions arithmetic operators increment and decrement operators matrix product ufunc broadcasting compatibility complex cases operator/function BSD conditions and Boolean arrays copies/views of objects data analysis indexing bidimensional array monodimensional ndarray 
negative index value installation iterating an array ndarray ( see Ndarray) Numarray python language reading and writing array data shape manipulation slicing structured arrays vectorization O Object-oriented programming language OCR, see Optical Character Recognition (OCR) software Open data Open data sources climatic data demographics IPython Notebook matplotlib pandas dataframes pop2014_by_state dataframe pop2014 dataframe United States Census Bureau financial data health data miscellaneous and public data sets musical data political and government data publications, newspapers, and books social data sports data Open Source Computer Vision (OpenCV) deep learning image processing and analysis add() function blackish image blending destroyWindow() method elementary operations imread() method imshow() method load and display merge() method NumPy matrices saving option waitKey() method working process installation MATLAB packages start programming Open-source programming language Optical Character Recognition (OCR) software order() function P Pandas dataframes Pandas data structures DataFrame assigning values deleting column element selection filtering membership value nested dict transposition evaluating values index objects duplicate labels methods NaN values NumPy arrays and existing series operations operations and mathematical functions series assigning values declaration dictionaries filtering values index internal elements, selection operations Pandas library correlation and covariance data structures ( see Pandas data structures) function application and mapping element row/column statistics getting started hierarchical indexing and leveling indexes ( see Indexing functionalities) installation Anaconda development phases Linux module repository, Windows PyPI source testing “Not a Number” data python data analysis sorting and ranking Permutation new_order array np.random.randint() function numpy.random.permutation() function random sampling DataFrame take() 
function Pickle—python object serialization cPickle frame.pkl pandas library stream of bytes Political and government data pop2014_by_county dataframe pop2014_by_state dataframe pop2014 dataframe Portable programming language PostgreSQL Principal component analysis (PCA) Public data sets PVM, see Python virtual machine (PVM) pyplot module interactive chart Line2D object plotting window show() function PyPy interpreter Python data analysis library deep learning frameworks module OpenCV Python Package Index (PyPI) Python’s world code implementation distributions Anaconda Enthought Canopy Python(x,y) IDEs ( see Interactive development environments (IDEs)) installation interact interpreter ( see Interpreter) IPython ( see IPython) programming language PyPI Python 2 Python 3 running, entire program code SciPy libraries matplotlib NumPy pandas shell source code data structure dictionaries and lists functional programming Hello World index libraries and functions map() function mathematical operations print() function writing python code, indentation Python virtual machine (PVM) PyTorch Q Qualitative analysis Quantitative analysis R R Radial Basis Function (RBF) Radicals, LaTeX Ranking Reading and writing array binary files tabular data Reading and writing data CSV and textual files header option index_col option myCSV_01.csv myCSV_03.csv names option read_csv() function read_table() function .txt extension databases create_engine() function dataframe pandas.io.sql module pgAdmin III PostgreSQL read_sql() function read_sql_query() function read_sql_table() function sqlalchemy sqlite3 DataFrame objects functionalities HDF5 library data structures HDFStore hierarchical data format mydata.h5 HTML files data structures read_html () web_frames web pages web scraping I/O API Tools JSON data books.json frame.json json_normalize() function JSONViewer normalization read_json() and to_json() read_json() function Microsoft excel files NoSQL database insert() function MongoDB 
pickle—python object serialization RegExp metacharacters read_table() skiprows TXT files nrows and skiprows options portion by portion writing ( see Writing data) XML ( see XML) Regression models Reindexing RoseWind DataFrame hist array polar chart scatter plot representation showRoseWind() function S Scikit-learn library data analysis k-nearest neighbors classification PCA Python module sklearn.svm.SVC supervised learning svm module SciPy libraries matplotlib NumPy pandas Sentimental analysis document_features() function documents list() function movie_reviews negative/positive opinion opinion mining Shape manipulation reshape() function shape attribute transpose() function Single layer perceptron (SLP) accuracy activation function architecture cost optimization data analysis evaluation phase learning phase model definition explicitly implicitly learning phase placeholders tf.add() function tf.nn.softmax() function modules representation testing set test phase and accuracy calculation training sets Social data sort_index() function Sports data SQLite3 stack() function String manipulation built-in methods count() function error message index() and find() join() function replace() function split() function strip() function regular expressions findall() function match() function re.compile() function regex re.split() function split() function Structured arrays dtype option structs/records Subjective interpretations Subscripts and superscripts, LaTeX Supervised learning machine learning scikit-learn Support vector classification (SVC) decision area effect, decision boundary nonlinear number of points, C parameter predict() function regularization support_vectors array training set, decision space Support vector machines (SVMs) decisional space decision boundary Iris Dataset decision boundaries linear decision boundaries polynomial decision boundaries polynomial kernel RBF kernel training set SVC ( see Support vector classification (SVC)) SVR ( see Support vector 
regression (SVR)) Support vector regression (SVR) curves diabetes dataset linear predictive model test set, data swaplevel() function T TensorFlow data flow graph Google’s framework installation IPython QtConsole MLP ( see Multi Layer Perceptron (MLP)) model and sessions SLP ( see Single layer perceptron (SLP)) tensors operation parameters print() function representations of tf.convert_to_tensor() function tf.ones() method tf.random_normal() function tf.random_uniform() function tf.zeros() method Text analysis techniques definition NLTK ( see Natural Language Toolkit (NLTK)) techniques Theano trigrams() function U, V United States Census Bureau Universal functions (ufunc) Unsupervised learning W Web Scraping Wind speed polar chart representation RoseWind_Speed() function ShowRoseWind() function ShowRoseWind_Speed() function to_csv() function Writing data HTML files myFrame.html to_html() function na_rep option to_csv() function X, Y, Z XML books.xml getchildren() getroot() function lxml.etree tree structure lxml library objectify parse() function tag attribute text attribute

…

pages: 1,199 words: 332,563

Golden Holocaust: Origins of the Cigarette Catastrophe and the Case for Abolition by Robert N. Proctor

"RICO laws" OR "Racketeer Influenced and Corrupt Organizations", bioinformatics, carbon footprint, clean water, corporate social responsibility, Deng Xiaoping, desegregation, disinformation, Dr. Strangelove, facts on the ground, friendly fire, germ theory of disease, global pandemic, index card, Indoor air pollution, information retrieval, invention of gunpowder, John Snow's cholera map, language of flowers, life extension, New Journalism, optical character recognition, pink-collar, Ponzi scheme, Potemkin village, precautionary principle, publication bias, Ralph Nader, Ronald Reagan, selection bias, speech recognition, stem cell, telemarketer, Thomas Kuhn: the structure of scientific revolutions, Triangle Shirtwaist Factory, Upton Sinclair, vertical integration, Yogi Berra

The present text is different in taking more of a global view (even if America remains the centerpiece) but also by virtue of being almost entirely based on the industry’s formerly secret archives, now (and only recently) available online in full-text searchable form. In this sense the book represents a new kind of historiography: history based on optical character recognition, allowing a rapid “combing” of the archives for historical gems (and fleas). Searching by optical character recognition works like a powerful magnet, allowing anyone with an Internet connection to pull out rhetorical needles from large and formidable document haystacks. (Try it—you need only go to http://legacy.library.ucsf.edu, and enter whatever search term you might fancy.)

…

Prior to computerization, it would have taken many lifetimes to go through such a large body of documents and gather up all usages of words such as “alleged,” “castoreum,” or “propaganda.” With full-text searchability online, however (thanks to optical character recognition), this can now be done in a matter of seconds, and by anyone with an Internet connection. We can only search what has been turned over by the companies, of course (and that limitation is profound), but the archives do make it harder for ideas once captured to be lost. And optical character recognition works like an enormous magnet, allowing the tiniest of rhetorical needles to be found even in large archival haystacks. History is rendered transparent in ways not previously possible.

…

Now accessible at http://legacy.library.ucsf.edu, the Legacy Tobacco Documents Library is the largest business archive in the world. Most documents are full-text searchable, and searches for terms like “cancer” or “nicotine” turn up hundreds of thousands of documents. Searches for terms like “baseball” or “sports” yield many thousands of hits. Optical character recognition was introduced in 2007, which means you can now search for expressions like “please destroy” or “subjects to be avoided,” with options to order the documents by date or by size; one can limit one’s search to documents from a particular company or a particular year or author or a particular document type (consumer letters, for example).

pages: 295 words: 66,912

Walled Culture: How Big Content Uses Technology and the Law to Lock Down Culture and Keep Creators Poor by Glyn Moody

Aaron Swartz, Big Tech, bioinformatics, Brewster Kahle, connected car, COVID-19, disinformation, Donald Knuth, en.wikipedia.org, full text search, intangible asset, Internet Archive, Internet of things, jimmy wales, Kevin Kelly, Kickstarter, non-fungible token, Open Library, optical character recognition, p-value, peer-to-peer, place-making, quantitative trading / quantitative ﬁnance, rent-seeking, text mining, the market place, TikTok, transaction costs, WikiLeaks

According to the official Google history of the project, Page told the university that he believed Google could do it in six years. In due course, the Google team came up with a non-destructive scanning technique that allowed even curved pages to be scanned at up to 6,000 pages per hour. Optical Character Recognition (OCR)59 then converted the scans into digital texts. In 2004, what had originally been called Project Ocean was formally announced as Google Print. By then a number of universities and publishers had joined the project. Later that year, Google announced the Google Print Library Project and its plans to digitise 15 million volumes within a decade.

…

_Hart
53 https://web.archive.org/web/20220620163648/http://www.gutenbergnews.org/about/history-of-project-gutenberg/
54 https://web.archive.org/web/20220620163648/http://www.gutenbergnews.org/about/history-of-project-gutenberg/
55 https://web.archive.org/web/20220620163725/https://pro.europeana.eu/about-us/mission
56 https://web.archive.org/web/20220621071000/https://www.europeana.eu/en
57 https://web.archive.org/web/20220621070928/https://pro.europeana.eu/post/the-missing-decades-the-20th-century-black-hole-in-europeana
58 https://web.archive.org/web/20160206043510/http://books.google.com/googlebooks/about/history.html
59 https://web.archive.org/web/20220621071024/https://en.wikipedia.org/wiki/Optical_character_recognition
60 https://web.archive.org/web/20220621071132/https://en.wikipedia.org/wiki/Google_Books
61 https://web.archive.org/web/20220621071154/https://www.authorsguild.org/who-we-are/
62 https://web.archive.org/web/20220621071953/https://www.nytimes.com/2005/09/21/technology/writers-sue-google-accusing-it-of-copyright-violation.html
63 https://web.archive.org/web/20220621072029/https://www.cnet.com/tech/tech-industry/publishers-sue-google-over-book-search-project/
64 https://web.archive.org/web/20220620164503/https://www.theatlantic.com/technology/archive/2017/04/the-tragedy-of-google-books/523320/
65 https://web.archive.org/web/20220620171524/https://www.authorsguild.org/wp-content/uploads/2012/08/Amended-Settlement-Agreement.pdf
66 https://web.archive.org/web/20220621072210/https://facultydirectory.virginia.edu/faculty/sv2r
67 https://web.archive.org/web/20220620171618/https://lawreview.law.ucdavis.edu/issues/40/3/copyright-creativity-catalogs/DavisVol40No3_Vaidhyanathan.pdf
68 https://web.archive.org/web/20220620164503/https://www.theatlantic.com/technology/archive/2017/04/the-tragedy-of-google-books/523320/
69 https://web.archive.org/web/20131103165236/http://publishers.org/press/85/
70 https://web.archive.org/web/20220621072300/https://www.eff.org/cases/authors-guild-v-google-part-ii-fair-use-proceedings
71 https://web.archive.org/web/20220621072320/https://www.hathitrust.org/authors_guild_lawsuit_information
72 https://web.archive.org/web/20220621072339/https://www.hathitrust.org/press_10-13-2008
73 https://web.archive.org/web/20220620172106/https://www.publishersweekly.com/pw/by-topic/digital/copyright/article/48659-authors-guild-sues-libraries-over-scan-plan.html
74 https://web.archive.org/web/20220621073236/https://www.nytimes.com/2011/09/13/business/media/authors-sue-to-remove-books-from-digital-archive.html
75 https://web.archive.org/web/20220620172106/https://www.publishersweekly.com/pw/by-topic/digital/copyright/article/48659-authors-guild-sues-libraries-over-scan-plan.html
76 https://web.archive.org/web/20220621073355/https://en.wikipedia.org/wiki/Transformative_use
77 https://web.archive.org/web/20220620173115/https://www.publishersweekly.com/pw/by-topic/digital/copyright/article/54321-in-hathitrust-ruling-judge-says-google-scanning-is-fair-use.html
78 https://web.archive.org/web/20220620173140/https://www.wipo.int/marrakesh_treaty/en/
79 https://web.archive.org/web/20220705085826/https://en.wikipedia.org/wiki/James_Love_%28NGO_director%29
80 https://web.archive.org/web/20220620173240/https://www.keionline.org/
81 https://web.archive.org/web/20220909084848/https://walledculture.org/interview-james-love/
82 https://web.archive.org/web/20220701065841/https://corporateeurope.org/en/power-lobbies/2017/03/marrakesh-brussels-long-arm-eu-copyright-lobby
83 https://web.archive.org/web/20220621164103/https://www.wired.com/2017/04/how-google-book-search-got-lost/
84 https://web.archive.org/web/20220620183320/https://www.hathitrust.org/files/14MillionBooksand6MillionVisitors_1.pdf
85 https://web.archive.org/web/20220621164158/https://www.europeana.eu/en/about-us
86 https://web.archive.org/web/20220620183528/https://dp.la/
87 https://web.archive.org/web/20220621034828/https://archive.org/
88 https://web.archive.org/web/20220701073551/https://www.nytimes.com/2005/10/03/business/in-challenge-to-google-yahoo-will-scan-books.html
89 https://web.archive.org/web/20220701073629/https://openlibrary.org/
90 https://web.archive.org/web/20220621034828/https://archive.org/
91 https://web.archive.org/web/20220621074031/https://walledculture.org/interview-brewster-kahle-libraries-role-3-internet-battles-licensing-pains-the-national-emergency-library-and-the-internet-archives-controlled-digital-lending-efforts-vs-the-publishers-lawsuit/
92 https://web.archive.org/web/20220817072842/https:/digitalcommons.du.edu/collaborativelibrarianship/vol12/iss2/8
93 https://web.archive.org/web/20220621074146/https://www.hathitrust.org/ETAS-Description
94 https://web.archive.org/web/20220621075018/https://en.wikipedia.org/wiki/Controlled_digital_lending
95 https://web.archive.org/web/20220621075101/https://controlleddigitallending.org/
96 https://web.archive.org/web/20220621075139/https://controlleddigitallending.org/faq
97 https://web.archive.org/web/20220621183602/http://openlibraries.online/
98 https://web.archive.org/web/20220701075026/https://www.wsj.com/articles/SB10001424052748703279704575335193054884632
99 https://web.archive.org/web/20220121095547/https://archive.org/details/TransformingourLibrariesintoDigitalLibraries102016
100 https://web.archive.org/web/20220701075207/https://docs.google.com/document/u/1/d/e/2PACX-1vQeYK7dKWH7Qqw9wLVnmEo1ZktykuULBq15j7L2gPCXSL3zem4WZO4JFyj-dS9yVK6BTnu7T1UAluOl/pub
101 https://web.archive.org/web/20220701075343/https://www.newyorker.com/books/page-turner/the-national-emergency-library-is-a-gift-to-readers-everywhere
102 https://web.archive.org/web/20220621164306/https://www.authorsguild.org/industry-advocacy/internet-archives-uncontrolled-digital-lending/
103 https://web.archive.org/web/20220621164330/https://publishers.org/news/comment-from-aap-president-and-ceo-maria-pallante-on-the-internet-archives-national-emergency-library/
104 https://web.archive.org/web/20220621164359/https://www.publishersweekly.com/pw/by-topic/industry-news/publisher-news/article/83472-publishers-charge-the-internet-archive-with-copyright-infringement.html
105 https://web.archive.org/web/20220620201839/https://www.publishersweekly.com/binary-data/ARTICLE_ATTACHMENT/file/000/004/4388-1.pdf
106 https://web.archive.org/web/20220620201859/https://blog.archive.org/2020/06/10/temporary-national-emergency-library-to-close-2-weeks-early-returning-to-traditional-controlled-digital-lending/
107 https://web.archive.org/web/20220621165251/https://walledculture.org/interview-brewster-kahle-libraries-role-3-internet-battles-licensing-pains-the-national-emergency-library-and-the-internet-archives-controlled-digital-lending-efforts-vs-the-publishers-lawsuit/
108 https://web.archive.org/web/20220621164359/https://www.publishersweekly.com/pw/by-topic/industry-news/publisher-news/article/83472-publishers-charge-the-internet-archive-with-copyright-infringement.html
109 https://web.archive.org/web/20220701075637/https://www.techdirt.com/2020/06/04/major-publishers-sue-internet-archives-digital-library-program-midst-pandemic/
110 https://web.archive.org/web/20220701075917/https://help.archive.org/help/national-emergency-library-faqs/
111 https://web.archive.org/web/20220621165356/https://www.publishersweekly.com/pw/by-topic/industry-news/trade-shows-events/article/62877-ala-2014-raising-the-stakes.html
112 https://web.archive.org/web/20220620202027/https://www.publishersweekly.com/pw/by-topic/industry-news/libraries/article/77532-tor-scales-back-library-e-book-lending-as-part-of-test.html
113 https://web.archive.org/web/20220621075324/https://www.authorsguild.org/industry-advocacy/e-book-library-pricing-the-game-changes-again/
114 https://web.archive.org/web/20220620202210/https://www.publishersweekly.com/pw/by-topic/digital/content-and-e-books/article/46333-librarian-unhappiness-over-new-harper-e-book-lending-policy-grows.html
115 https://web.archive.org/web/20220620203003/https://www.publishersweekly.com/pw/by-topic/industry-news/libraries/article/80758-after-tor-experiment-macmillan-expands-embargo-on-library-e-books.html
116 https://web.archive.org/web/20220621075545/https://www.ala.org/news/press-releases/2019/07/ala-denounces-new-macmillan-library-lending-model-urges-library-customers
117 https://web.archive.org/web/20220620203047/https://www.publishersweekly.com/binary-data/ARTICLE_ATTACHMENT/file/000/004/4353-1.pdf
118 https://web.archive.org/web/20220621165501/https://www.newyorker.com/news/annals-of-communications/an-app-called-libby-and-the-surprisingly-big-business-of-library-e-books
119 https://web.archive.org/web/20220621165531/https://company.overdrive.com/2020/05/14/check-out-mays-trending-titles-on-libby/
120 https://web.archive.org/web/20220621165603/https://publishers.org/news/aap-october-2020-statshot-report-publishing-industry-up-7-3-for-month-down-1-0-year-to-date/
121 https://web.archive.org/web/20220620203518/https://www.authorsalliance.org/2021/12/10/update-aap-sues-maryland-over-e-lending-law/
122 https://web.archive.org/web/20220620203541/https://www.publishersweekly.com/pw/by-topic/industry-news/libraries/article/85785-maryland-legislature-passes-law-supporting-library-access-to-digital-content.html
123 https://web.archive.org/web/20220620203604/https://publishers.org/wp-content/uploads/2021/12/AAP-v.

pages: 470 words: 125,992

The Laundromat : Inside the Panama Papers, Illicit Money Networks, and the Global Elite by Jake Bernstein

Albert Einstein, banking crisis, Berlin Wall, bitcoin, blockchain, blood diamond, British Empire, central bank independence, Charlie Hebdo massacre, clean water, commoditize, company town, corporate governance, cryptocurrency, Deng Xiaoping, Donald Trump, Edward Snowden, fake news, Fall of the Berlin Wall, high net worth, income inequality, independent contractor, Julian Assange, Laura Poitras, liberation theology, mega-rich, Mikhail Gorbachev, new economy, offshore financial centre, optical character recognition, pirate software, Ponzi scheme, profit motive, rising living standards, Ronald Reagan, Seymour Hersh, Skype, traveling salesman, WikiLeaks

He wasn’t entirely positive he could pull the project off himself. The September deadline was looking increasingly unrealistic. Making the data accessible for the partners had turned into a complicated slog. Many of the documents consisted of photos or PDFs. Each one had to be put through optical character recognition software in order to be keyword searchable, a slow and laborious process. Ryle set up a special dedicated computer at ICIJ to give the leakers remote access to the material. The idea was that they would help reconstruct the data and add their insights. One day while on the computer, Ryle discovered a communication he wasn’t supposed to see.

…

The most revealing information, such as who actually owned the company, was buried inside the documents themselves. For example, power-of-attorney agreements often gave the actual owner control of the company. This would be inside a document. Financial transactions were in documents as well. ICIJ would have to process the documents with optical character recognition software to make them searchable. It would be a mammoth undertaking. Then, once the documents were processed, teams of reporters would have to comb through the data to find their stories. Cabra knew they would need another data journalist. After she returned to Spain, ICIJ added a computer engineer, Miguel Fiandor, to the team to help with the reconstruction.

…

When a CPU selected a document, the queue of documents froze so that it would be impossible for a different server to take the same file. Caruana Galizia designed a program that made this ballet of data extraction possible. A CPU grabbed a document, for example an email with a PDF file as an attachment. The program indexed the text from the email and performed optical character recognition on the PDF. Then the machine extracted all the information from the PDF and indexed that as well, to ensure that every scrap of data from the email could be searched in Blacklight. The data processing was analogous to the journalistic effort. By the Munich meeting, Prometheus was already the largest cross-border media collaboration ever undertaken.

Raw Data Is an Oxymoron by Lisa Gitelman

23andMe, collateralized debt obligation, computer age, continuous integration, crowdsourcing, disruptive innovation, Drosophila, Edmond Halley, Filter Bubble, Firefox, fixed income, folksonomy, Google Earth, Howard Rheingold, index card, informal economy, information security, Isaac Newton, Johann Wolfgang von Goethe, knowledge worker, Large Hadron Collider, liberal capitalism, lifelogging, longitudinal study, Louis Daguerre, Menlo Park, off-the-grid, optical character recognition, Panopticon Jeremy Bentham, peer-to-peer, RFID, Richard Thaler, Silicon Valley, social graph, software studies, statistical model, Stephen Hawking, Steven Pinker, text mining, time value of money, trade route, Turing machine, urban renewal, Vannevar Bush, WikiLeaks

And while users can search for words under the page images, they cannot reveal what the computer sees; they cannot see the characters that the computer recognizes in the page image. Ironically, over time ECCO’s publisher has loosened its rules on downloading page images. So, for database subscribers, it has become easy and quick to download page images of full books from ECCO. Yet regular users cannot even download a single page of text as interpreted by ECCO’s optical character recognition (OCR) software, which suggests that over time Gale determined there is no percentage in books, not even in digitized images of books, unless the books are already packaged as data.23 The future is in data. Using ECCO, I began trying to understand the sense of “data” in Priestley. Happily, my first searches turned out to be promising.

…

., 83–84 National Data Center, 126 National Security Agency, 2 Nature, 151 Newcomb, Simon, 79–81, 85 New York Times, 1, 24 Newton, Isaac, 16, 21, 169 Newton, Robert Russell, 81–84, 85 Nissenbaum, Helen, 130 Noelle-Neumann, Elisabeth, 105 Number, 6, 8, 20, 36, 43–44, 50, 61, 69, 71, 84, 106, 124, 162 Nunberg, Geoffrey, 26, 91 Objectivity, 3–6, 7, 11, 50, 148, 164 Observation, 80, 82–84 Ohm, Paul, 128 Optical Character Recognition (OCR), 28, 31 Orwell, George, 124 Oxford English Dictionary (OED), 18, 20, 34, 35, 126 Paper, 9–10, 157, 165 Paper machine, 10, 105, 108–109 Pattern, 6, 16–17, 95, 123, 129, 159 Petacenter, 151–152, 164 Phenomenology of Spirit, 108, 112 Picciotto, Joanna, 4–5 Pinker, Steven, 19 Playfair, William, 17 Poindexter, John, 131 Pollan, Michael, 148–149 Poovey, Mary, 7, 17 Porter, Theodore, 14n20, 17 Poster, Mark, 126, 129 Preemptive Media Collective, 135–136 Press, the, 10, 89, 96 Priestley, Joseph, 15–17, 28 Privacy, 2, 128, 132, 136 Procrustean Marxism, 50 Program, 170 Project Gutenberg, 22, 35 Property, 131 Protocol, 1, 31, 131, 138, 161 Proximity searching, 35 Ptolemy, 82, 83 Pynchon, Thomas, 129 Stephenson, F.

Machine Learning Design Patterns: Solutions to Common Challenges in Data Preparation, Model Building, and MLOps by Valliappa Lakshmanan, Sara Robinson, Michael Munn

A Pattern Language, Airbnb, algorithmic trading, automated trading system, business intelligence, business logic, business process, combinatorial explosion, computer vision, continuous integration, COVID-19, data science, deep learning, DevOps, discrete time, en.wikipedia.org, Hacker News, industrial research laboratory, iterative process, Kubernetes, machine translation, microservices, mobile money, natural language processing, Netflix Prize, optical character recognition, pattern recognition, performance metric, recommendation engine, ride hailing / ride sharing, selection bias, self-driving car, sentiment analysis, speech recognition, statistical model, the payments system, web application

lineage tracking, Lineage tracking in ML pipelines linear models, Models and Frameworks, Solution long short-term memory model (see LSTM) low latency, The Machine Learning Process, Trade-Offs and Alternatives-Trade-Offs and Alternatives, Why It Works, Phase 1: Building the offline model, Problem-Feast, Why It Works LSTM, Choosing a model architecture, Solution, Sequence models M machine learning engineers (see ML engineers) machine learning feasibility study, Discovery machine learning framework, Reproducibility machine learning life cycle (see ML life cycle) machine learning models, Models and Frameworks machine learning problems (see supervised learning; unsupervised learning) machine learning, definition of, Models and Frameworks MAE (mean absolute error), Problem MAP (mean average precision), Problem MapReduce, Why It Works matrix factorization, Problem Representation Design Patterns MD5 hash, Cryptographic hash mean absolute error (MAE), Problem mean average precision (MAP), Problem Mesh TensorFlow, Model parallelism mesh-free approximation, Data-driven discretizations microservices architecture, Problem min-max scaling, Linear scaling-Linear scaling Mirrored Variable, Synchronous training MirroredStrategy, Synchronous training, Synchronous training Mixed Input Representation, Deterministic inputs ML approximation, Solution, Interpolation and chaos theory ML engineersrole of, Roles, Scale, Why It Works, Responsible AI, Problem tasks of, Problem, Solution, Solution, Model versioning with a managed service ML life cycle, ML Life Cycle-Discovery, Development-Deployment ML Operations (see MLOps) ML pipelines, The Machine Learning Process ML researchers, Responsible AI MLflow, Solution MLOps, Deployment, Tactical phase: Manual development MNIST dataset, Images as pixel values, Combining different image representations MobileNetV2, Phase 1: Building the offline model, Phase 1: Building the offline model Mockus, Jonas, Bayesian optimization model builders, 
Responsible AI(see also data scientists, ML researchers) Model Card Toolkit, Model Cards Model Cards, Model Cards model evaluation, The Machine Learning Process, Problem, Lineage tracking in ML pipelines, Limitations of explanations, Solution(see also Continued Model Evaluation design pattern) model parallelism, Solution, Model parallelism-Model parallelism model parameters, Problem-Problem model understanding (see explainability) Model Versioning design pattern, Reproducibility Design Patterns, Design Pattern 27: Model Versioning-New models versus new model versions, Pattern Interactions model, pre-trained, Implementing transfer learning -Pre-trained embeddings, Fine-tuning versus feature extraction, Responsible AI, Pattern Interactions model, text classification, Problem, Custom serving function, Problem, Multiple serving functions monolithic applications, Problem Monte Carlo approach, Monte Carlo methods-Unbounded domains multi-hot encoding, Array of categorical variables multiclass classification problems, Design Pattern 6: Multilabel multilabel classification, Sigmoid output for models with two classes-Parsing sigmoid results, Solution Multilabel design pattern, Problem Representation Design Patterns, Design Pattern 6: Multilabel -One versus rest multilabel, multiclass classification (see Multilabel design pattern) Multimodal Input design pattern, Design Pattern 4: Multimodal Input-Multimodal feature representations and model interpretability, Pattern Interactions multimodal inputs, definition of, Trade-Offs and Alternatives MultiWorkerMirroredStrategy, Synchronous training, Synchronous training MySQL, Cached results of batch serving MySQL Cluster, Alternative implementations N naive Bayes, Bagging natural language understanding (NLU), Natural Language Understanding Netflix Prize, Trade-Offs and Alternatives Neural Machine Translation, Model parallelism neural networks, Models and Frameworks, Data-driven discretizations Neutral Class design pattern, Problem 
Representation Design Patterns, Design Pattern 9: Neutral Class -Reframing with neutral class, Responsible AI, Pattern Interactions NLU (natural language understanding), Natural Language Understanding NNLM, Context language models nonlinear transformations, Nonlinear transformations-Nonlinear transformations numerical data, Data and Feature Engineering-Data and Feature Engineering O objective function, Bayesian optimization OCR (optical character recognition), Pre-trained models one versus rest approach, One versus rest one-hot encoding, One-hot encoding-One-hot encoding, Problem-Problem, Choosing the embedding dimension, Static method OneDeviceStrategy, Synchronous training, Asynchronous training online machine learning, Scheduled retraining online prediction, The Machine Learning Process, Trade-Offs and Alternatives online update, High-throughput data streams ONNX, Model export optical character recognition (OCR), Pre-trained models orchestration, definition of, Development versus production pipelines outliers, Linear scaling output layer bias, Weighted classes overfit model, Problem, Problem(see also physics-based model) overfitting, Overfitting a batch-Overfitting a batch P parameter server architecture, Asynchronous training parameter sharing, Multitask learning-Multitask learning ParameterServerStrategy, Asynchronous training partial differential equation (see PDE) Parzen estimator, Bayesian optimization Pattern Language, A, What Are Design Patterns?

…

For example, let’s say we are building a model to detect authorized entrants to a building so that we can automatically open the gate. One of the inputs to our model might be the license plate of the vehicle. Instead of using the security photo directly in our model, we might find it simpler to use the output of an optical character recognition (OCR) model. It is critical that we recognize that OCR systems will have errors, and so we should not train our model with perfect license plate information. Instead, we should train the model on the actual output of the OCR system. Indeed, because different OCR models will behave differently and have different errors, it is necessary to retrain the model if we change the vendor of our OCR system.
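A minimal Python sketch of the point above, under loudly labeled assumptions: the two OCR "vendors" are hypothetical character-confusion tables (not real OCR behavior), and a plain set lookup stands in for a trained model. It only illustrates why the training inputs should come from the OCR system actually used at serving time, and why switching vendors calls for retraining.

```python
from typing import Callable, Dict, Set

# Hypothetical OCR "vendors": each misreads certain characters in its own way.
# These confusion tables are illustrative assumptions, not real OCR behavior.
VENDOR_A_CONFUSIONS: Dict[str, str] = {"O": "0", "B": "8"}
VENDOR_B_CONFUSIONS: Dict[str, str] = {"O": "0", "I": "1"}


def make_ocr(confusions: Dict[str, str]) -> Callable[[str], str]:
    """Return a toy OCR function that applies a fixed character-confusion table."""
    def ocr(plate: str) -> str:
        return "".join(confusions.get(ch, ch) for ch in plate)
    return ocr


def train_allowlist(authorized_plates: Set[str], ocr: Callable[[str], str]) -> Set[str]:
    """'Train' a trivial model: store each plate as the OCR system will report it."""
    return {ocr(plate) for plate in authorized_plates}


def is_authorized(model: Set[str], ocr_output: str) -> bool:
    """Predict by checking whether the OCR reading was seen during training."""
    return ocr_output in model


authorized = {"BOB123", "KIM456"}
ocr_a = make_ocr(VENDOR_A_CONFUSIONS)
ocr_b = make_ocr(VENDOR_B_CONFUSIONS)

naive_model = set(authorized)                    # trained on perfect plates
ocr_model = train_allowlist(authorized, ocr_a)   # trained on vendor A's actual output

seen = ocr_a("BOB123")                           # what the gate camera reports: "808123"
print(is_authorized(naive_model, seen))          # False: "808123" never appeared in training
print(is_authorized(ocr_model, seen))            # True
print(is_authorized(ocr_model, ocr_b("BOB123"))) # False: vendor B reads "B0B123"
```

Calling `train_allowlist(authorized, ocr_b)` rebuilds the model from vendor B's readings and restores acceptance, mirroring the excerpt's advice to retrain when the OCR vendor changes.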

pages: 398 words: 86,023

The Wikipedia Revolution: How a Bunch of Nobodies Created the World's Greatest Encyclopedia by Andrew Lih

Albert Einstein, AltaVista, barriers to entry, Benjamin Mako Hill, Bill Atkinson, c2.com, Cass Sunstein, citation needed, commons-based peer production, crowdsourcing, Debian, disinformation, en.wikipedia.org, Firefox, Ford Model T, Free Software Foundation, Hacker Ethic, HyperCard, index card, Jane Jacobs, Jason Scott: textfiles.com, jimmy wales, Ken Thompson, Kickstarter, Marshall McLuhan, Mitch Kapor, Network effects, optical character recognition, Ralph Waldo Emerson, Richard Stallman, side project, Silicon Valley, Skype, slashdot, social software, Steve Jobs, the Cathedral and the Bazaar, The Death and Life of Great American Cities, the long tail, The Wisdom of Crowds, Tragedy of the Commons, urban planning, urban renewal, Vannevar Bush, wikimedia commons, Y2K, Yochai Benkler

One was related to Project Gutenberg, a movement to make public-domain print works available for free on the Internet. Project Gutenberg actually started in 1971 on mainframe computers; now it is one of the oldest online text repositories. Starting in 1989, it digitized books using optical character recognition systems to automatically turn images of book pages into computer text. The problem was that OCR was imperfect, and there were small but numerous errors because of smudges, bad image quality, or dust. That gave Charles Franks the idea to start Distributed Proofreaders in 2000, where people from anywhere on the Internet could help proofread these imperfect OCR texts and fix the problems.

…

., 169 211 Merel, Peter, 62 editorial process of, 37–41, 43, 63, 64 meta-moderation, 68–69 GNUpedia and, 79 metaphors, 46–47 rules of, 36–37 242_Index Nupedia ( continued ) radio, amateur ham, 45–46 structure of, 37–38 Ramsey, Derek (Ram-man), 99–104, 108, Wikipedia and, 64–65, 88, 136, 109, 111, 177 171, 172 Rand, Ayn, 32 wiki software and, 61–65 Raul654 (Mark Pellegrini), 180–81 Nupedia Advisory Board, 37, 38, 64 Raymond, Eric S., 43, 85, 172–73, 175 Nupedia-L, 63 Reagle, Joseph, 82, 96, 112 Nupedia Open Content License, 35, 72 Rec.food.chocolate, 84–85 RickK, 120, 185–88 rings, Web site, 23, 31 objectivism, 32, 36–37 robots, software, 88, 99–106, 145, 147, OCR (optical character recognition), 35 177, 179 Open Directory Project (ODP), 30–31, Rosenfeld, Jeremy, 45 33, 35 Rousseau, Jean-Jacques, 15 Ota, Takashi, 146 Russell, Bertrand, 13, 81 Oxford English Dictionary (OED), 70–72 Russian language, 152 peer production, 108–9 Sandbox, 97, 115 Pellegrini, Mark (Raul654), 180–81 Sanger, Larry, 6–7, 32–34, 36–38, Perl, 56, 67, 101, 140 40–41, 43–45, 61–65, 67, 88, 89, Peul language, 158 115, 184, 202, 210–11 phantom authority, 175–76 boldness directive and, 91, 113 Philological Society, 70 Citizendium project of, 190, 211–12 PHP, 74, 101 Essjay and, 197 Pike, Rob, 144 memoir of, 174, 190, 225 piranha effect, 83, 106, 109, 113, 120 resignation from Wikipedia, 174–75, Plautus Satire, 181 210 Pliny the Elder, 15 on rules, 76, 112 Poe, Marshall, 171 Spanish Wikipedia and, 9, 136–38 Polish Wikipedia, 146, 147 trolls and, 170–75, 189–90 Popular Science, 126 Wikipedia license and, 72 Portland Pattern Repository, 59 Y2K bug and, 32–33 Portuguese language, 136 San Jose Mercury News, 126 PostScript, 52 Schechter, Danny, 8–9 “Potato chip” article, 136 Schiff, Stacy, 196 Professor and the Madman, The Schlossberg, Edwin, 46 (Winchester), 70, 71 schools, 177–78 Project Gutenberg, 35 Scott, Jason, 131, 189 public domain content, 26, 111 search engines, 11, 22, 34 Pupek, Dan, 58 
Google, see Google Seigenthaler, John, 9–10, 191–94, 200, 220 Quickpolls, 126–27 Senegal, 158 Quiz Show, 13 Serbian Wikipedia, 155–56 servers, 77–79, 191 Tagalog language, 160 Sethilys (Seth Anthony), 106–11 Taiwan, 150, 151, 154 Shah, Sunir, 59–60, 64 “Talossan language” article, 120 Shaw, George Bernard, 135 Tamil language, 160 Shell, Tim, 21–22, 32, 36, 66, 174, Tawker, 177, 179, 186 184 Tektronix, 46, 47, 50, 55, 56 sidewalks, 96–97 termites, 82 Sieradski, Daniel, 204 Thompson, Ken, 143–44 Signpost, 200 Time, 9, 13 Silsor, 186 Torvalds, Linus, 28–29, 30, 173, 175 Sinitic languages, 159 Tower of Babel, 133–34 see also China tragedy of the commons, 223 Skrenta, Rich, 23, 30 Trench, Chenevix, 70 Slashdot, 67–69, 73, 76, 88, 205, trolls, 170–76, 179, 186, 187, 189–90 207, 216 Truel, Bob, 23, 30 Sanger’s memoir for, 174, 190, 225 2channel, 145 Sneakernet, 50 Snow, Michael, 206–7 Socialtext, 207 “U,” article on, 64 sock puppets, 128, 178–79 Unicode, 142, 144 software, open-source, 5, 23–28, 30, 35, UTF-8, 144–45 62, 67, 79, 216 UTF-32, 142, 143 design patterns and, 55, 59 UNIX, 27, 30–31, 54, 56, 143 Linux, 28–30, 56, 108, 140, 143, 173, Unregistered Words Committee, 70 216, 228 urban planning, 96–97 software robots, 88, 99–106, 145, 147, URL (Uniform Resource Locator), 53, 54 177, 179 USA Today, 9, 191, 220 Souren, Kasper, 158 UseModWiki, 61–63, 66, 73–74, 140–41 South Africa, 157–58 Usenet, 35, 83–88, 114, 170, 190, 223 spam, 11, 87, 220 Usenet Moderation Project (Usemod), 62 Spanish Wikipedia, 9, 136–39, 175, 183, USWeb, 211 215, 226 squid servers, 77–79 Stallman, Richard, 23–32, 74, 86, 217 vandalism: GNU Free Documentation License of, on LA Times Wikitorial, 207–8 72–73, 211–12 on Wikipedia, 6, 93, 95, 125, 128, GNU General Public License of, 27, 72 176–79, 181, 184–88, 194, 195, GNU Manifesto of, 26 202, 220, 227 GNUpedia of, 79 Van Doren, Charles, 13–14 Steele, Guy, 86 verein, 147 Stevertigo, 184 VeryVerily, 128 stigmergy, 82, 89, 92, 109 Vibber, Brion, 76 Sun Microsystems, 23, 27, 29–30, 56 Viola, 54 Sun Tzu, 169 ViolaWWW, 54–55 Swedish language, 140, 152 Voltaire, 15 WAIS, 34, 53 Wik, 123–25, 170, 180 Wales, Christine, 20–21, 22, 139 Wikia, 196, 197 Wales, Doris, 18, 19 Wiki Base, 62 Wales, Jimmy, 1, 8, 9, 18–22, 44, 76, Wikibooks, 216 88, 115, 131, 184, 196, 213, 215, Wikimania, 1–3, 8, 146, 147–48 220 WikiMarkup, 90 administrators and, 94, 185 Wikimedia Commons, 216 background of, 18–19 Wikimedia Foundation, 146, 157, 183–84, at Chicago Options Associates, 20, 196, 199, 213–15, 225–26, 227 21, 22 Wikipedia: Cunctator essay and, 172 administrators of, 67, 93–96, 119, 121, and deletion of articles, 120 125, 127, 148, 178, 185–86, dispute resolution and, 179–80, 181, 195–96, 224–25 223 advertising and, 9, 11, 136–38, 215, Essjay and, 197, 199 226 languages and, 139, 140, 157–58 amateurs and professionals in, 225 neutrality policy and, 6, 7, 113 Arbitration Committee of, 180–81, 184, objectivism and, 32, 36–37 197, 223 Nupedia and, 32–35, 41, 43–45, “assume good faith” policy in, 114, 187, 61–63 195, 200 on piranha effect, 83 blocking of people from, 93 role of, in Wikipedia community, 174–76, boldness directive in, 8, 91, 102, 179–80, 223 113–14, 115, 122, 221 Seigenthaler incident and, 192, 194 categories in, 97–98, 221 Spanish Wikipedia and, 137, 175 “checkuser” privilege in, 179, 196, 199 Stallman and, 30–32 database for, 73–74, 77, 78, 94 three revert rule and, 127–28 discussions in, 7–8, 65–66, 75–76, 89, Wikimania and, 146 121–22 Wikipedia license and, 72 DMOZ as inspiration for, 23 Wikitorials and, 206–7 five pillars of, 113, 216 Wales, Jimmy, Sr., 18 future of, 213–17, 219–29 Wall Street Journal, 126 growth of, 4, 9, 10, 77, 88–89, 95–97, “War and Consequences” Wikitorial, 99–100, 126, 184, 215, 219, 220 206–7 how it works, 90–96 wasps, 82 influence of, 201–212 Weatherly, Keith, 106 launch of, 64, 69, 139, 171 Web browsers, 51–55 legal issues and, 94, 111, 186, 191–92, Weblogs Inc., 215 227; see also copyright; libel WebShare, 209 linking in, 66–67, 73 Webster, Noah, 70, 133 mailing list for, 89, 95 Web 2.0, 68, 111, 114, 201 main community namespace in, 76 Wei, Pei-Yan, 54–55 main page of, 95 Weinstock, Steven, 202–3 MeatballWiki and, 60, 114, 119, 187–88 “Why Wikipedia Must Jettison Its mediation of disputes in, 180, 181, 195 Anti-Elitism” (Sanger), 189–90 meta pages in, 91 name of, 45 “diff” function and, 74, 75, 93, 99 namespaces in, 75–76 edit histories of, 64–65, 71, 82, 91–93 number of editors in, 95–96 editing of, 3–4, 6, 38, 64–66, 69, 73, Nupedia and, 64–65, 88, 136, 171, 172 88, 114–15, 131, 194 openness of, 5–6, 9 edit wars and, 95, 122–31, 136, 146 origins of, 43–60 eventualism and, 120–21, 129, 159 policies and rules of, 76, 112–14, first written, 64 127–28, 170, 171, 192, 221, flagged revisions of, 148–49, 215–16, 224–25 227 popularity of, 4 inclusion of, 115–21 Quickpolls in, 126–27 inverted pyramid formula for, 90 Recent Changes page in, 64–65, 82, license covering content of, 72–73, 98, 104, 109, 176–77 211–12 schools and, 177–78 locking of, 95 servers for, 77–79, 191 maps in, 107, 109–11 Slashdot and, 69, 73, 76, 88 neutral point of view in, 6–7, 82, 89, 111, sock puppets and, 128, 178–79 112–13, 117, 140, 174, 203–4, 217, SOFIXIT directive in, 114–15, 122, 221 228 software robots and, 88, 99–106, 145, news and, 7 147, 177, 179 original research and, 112–13, 117, 174 spam and self-promotion on, 11, 220 protection and semi-protection of, 194, talk pages in, 75–76, 89, 93, 98 216 templates in, 97–98, 113, 221 reverts and, 125, 127–28 trolls and, 170–76, 179, 186, 187, single versions of, 6 189–90 spelling mistakes in, 104–5 user pages in, 76, 89 stability of, 227–28 vandalism and, 6, 93, 95, 125, 128, stub, 92, 97, 101, 104, 148 176–79, 181, 184–88, 194, 195, talk pages for, 75–76, 89, 93, 98 202, 220, 227 test edits of, 176 watchlists in, 74, 82, 98–99, 109 “undo” function and, 93 wiki markup language for, 221–22 uneven development of, 220 wiki software for, 64–67, 73, 77, 90, 93, unusual, 92, 117–18 140–41, 216 verifiability and, 112–13, 117 Wikipedia articles: watchlists for, 74, 82, 98–99, 109 accuracy of, 10, 72, 188–89, 194, 208 Wikipedia community, 7–8, 81–132, 174, attempts to influence, 11–12 175, 183–200, 215–17, 222–23 biographies of living persons, 192, Essjay controversy and, 194–200 220–21 Missing Wikipedians page and, 184–85, census data in, 100–104, 106 188 citations in, 113 partitioning of, 223 consensus and, 7, 94, 95, 119–20, 122, Seigenthaler incident and, 9–10, 222–23 191–94, 220 consistency among, 213 stress in, 184 creation of, 90–93, 130–31, 188–89 trolls and, 170 deletion of, 93–94, 96, 119–21, 174 Wales’s role in, 174–76, 179–80, 223 Wikipedia international editions, 12, 77, Wikitorials, 205–8 100, 131–32, 133–67 Wikiversity, 216 African, 157–58 WikiWikiWeb, 44–45, 58–60, 61, 62 Chinese, 10, 141–44, 146, 150–55 Willy on Wheels (WoW), 178–79 encoding languages for, 140–45 Winchester, Simon, 70, 71 French, 83, 139, 146, 147 Wizards of OS conference, 211 German, 11, 139, 140, 147–49, 215, Wolof language, 158 220, 227 Wool, Danny, 3, 158, 199 Japanese, 139, 140, 141–42, 144, World Book, 16–19 145–47 World Is Flat, The (Friedman), 11 Kazakh, 155–57 World Wide Web, 34, 35, 47, 51–55 links to, 134–35, 140 Web 2.0, 68, 111, 114, 201 list of languages by size, 160–67 WYSIWYG, 222 Serbian, 155–56 Spanish, 9, 136–39, 175, 183, 215, 226 Yahoo, 4, 22, 23, 30, 191, 214 Wikipedia Watch, 192 “Year zero” article, 117 Wikipedia Weekly, 225 Yeats, William Butler, 183 wikis, 44, 51 Yongle encyclopedia, 15 Cunningham’s creation of, 2, 4, 56–60, “You have two cows” article, 118 62, 65–66, 90 YouTube, 58 MeatballWiki, 59–60, 114, 119, 175, Y2K bug, 32–33 187–88 Nupedia and, 61–65 Wikisource, 216 ZhengZhu, 152–57

About the Author

Andrew Lih was an academic for ten years at Columbia University and Hong Kong University in new media and journalism.

pages: 304 words: 82,395

Big Data: A Revolution That Will Transform How We Live, Work, and Think by Viktor Mayer-Schönberger, Kenneth Cukier

23andMe, Affordable Care Act / Obamacare, airport security, Apollo 11, barriers to entry, Berlin Wall, big data - Walmart - Pop Tarts, Black Swan, book scanning, book value, business intelligence, business process, call centre, cloud computing, computer age, correlation does not imply causation, dark matter, data science, double entry bookkeeping, Eratosthenes, Erik Brynjolfsson, game design, hype cycle, IBM and the Holocaust, index card, informal economy, intangible asset, Internet of things, invention of the printing press, Jeff Bezos, Joi Ito, lifelogging, Louis Pasteur, machine readable, machine translation, Marc Benioff, Mark Zuckerberg, Max Levchin, Menlo Park, Moneyball by Michael Lewis explains big data, Nate Silver, natural language processing, Netflix Prize, Network effects, obamacare, optical character recognition, PageRank, paypal mafia, performance metric, Peter Thiel, Plato's cave, post-materialism, random walk, recommendation engine, Salesforce, self-driving car, sentiment analysis, Silicon Valley, Silicon Valley startup, smart grid, smart meter, social graph, sparse data, speech recognition, Steve Jobs, Steven Levy, systematic bias, the scientific method, The Signal and the Noise by Nate Silver, The Wealth of Nations by Adam Smith, Thomas Davenport, Turing test, vertical integration, Watson beat the top human players on Jeopardy!

All that Google had were images that only humans could transform into useful information—by reading. While this would still have been a great tool—a modern, digital Library of Alexandria, more comprehensive than any library in history—Google wanted more. The company understood that information has stored value that can only be released once it is datafied. And so Google used optical character-recognition software that could take a digital image and recognize the letters, words, sentences, and paragraphs on it. The result was datafied text rather than a digitized picture of a page. Now the information on the page was usable not just for human readers, but also for computers to process and algorithms to analyze.

…

But when he realized that he was responsible for millions of people wasting lots of time each day typing in annoying, squiggly letters—vast amounts of information that was simply discarded afterwards—he didn’t feel so smart. Looking for ways to put all that human computational power to more productive use, he came up with a successor, fittingly named ReCaptcha. Instead of typing in random letters, people type two words from text-scanning projects that a computer’s optical character-recognition program couldn’t understand. One word is meant to confirm what other users have typed and thus is a signal that the person is a human; the other is a new word in need of disambiguation. To ensure accuracy, the system presents the same fuzzy word to an average of five different people to type in correctly before it trusts it’s right.
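The two-word scheme described above can be sketched as a small voting store: a correct control word proves the user is human, and readings of the unknown word accumulate until enough people agree. This is a minimal illustration, not reCAPTCHA's actual implementation; the class name, the case-insensitive exact-match comparison, and the fixed agreement_needed threshold are assumptions (the excerpt only says each word goes to an average of five people).

```python
from collections import Counter, defaultdict


class TwoWordChallenge:
    """Hypothetical reCAPTCHA-style store: one control word, one unknown word per challenge."""

    def __init__(self, agreement_needed=5):
        # Assumed fixed threshold; the excerpt cites an average of five people per word
        self.agreement_needed = agreement_needed
        self.votes = defaultdict(Counter)  # unknown word id -> counts of typed readings
        self.trusted = {}                  # unknown word id -> accepted reading

    def submit(self, control_typed, control_truth, unknown_id, unknown_typed):
        """Return True if the user passed; record their reading of the unknown word."""
        # A wrong control word fails the test, and the unknown-word guess is discarded
        if control_typed.strip().lower() != control_truth.lower():
            return False
        if unknown_id not in self.trusted:
            reading = unknown_typed.strip().lower()
            self.votes[unknown_id][reading] += 1
            if self.votes[unknown_id][reading] >= self.agreement_needed:
                self.trusted[unknown_id] = reading
        return True


checker = TwoWordChallenge(agreement_needed=3)
for typed in ["morrow", "morrow", "marrow", "morrow"]:
    checker.submit("upon", "upon", "scan-0042", typed)
print(checker.trusted)  # {'scan-0042': 'morrow'}
```

Requiring agreement among several independent readers, rather than trusting the first answer, is what lets the system tolerate occasional typos and guesses on the unknown word.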

pages: 527 words: 147,690

Terms of Service: Social Media and the Price of Constant Connection by Jacob Silverman

"World Economic Forum" Davos, 23andMe, 4chan, A Declaration of the Independence of Cyberspace, Aaron Swartz, Airbnb, airport security, Amazon Mechanical Turk, augmented reality, basic income, Big Tech, Brian Krebs, California gold rush, Californian Ideology, call centre, cloud computing, cognitive dissonance, commoditize, company town, context collapse, correlation does not imply causation, Credit Default Swap, crowdsourcing, data science, deep learning, digital capitalism, disinformation, don't be evil, driverless car, drone strike, Edward Snowden, Evgeny Morozov, fake it until you make it, feminist movement, Filter Bubble, Firefox, Flash crash, game design, global village, Google Chrome, Google Glasses, Higgs boson, hive mind, Ian Bogost, income inequality, independent contractor, informal economy, information retrieval, Internet of things, Jacob Silverman, Jaron Lanier, jimmy wales, John Perry Barlow, Kevin Kelly, Kevin Roose, Kickstarter, knowledge economy, knowledge worker, Larry Ellison, late capitalism, Laura Poitras, license plate recognition, life extension, lifelogging, lock screen, Lyft, machine readable, Mark Zuckerberg, Mars Rover, Marshall McLuhan, mass incarceration, meta-analysis, Minecraft, move fast and break things, national security letter, Network effects, new economy, Nicholas Carr, Occupy movement, off-the-grid, optical character recognition, payday loans, Peter Thiel, planned obsolescence, postindustrial economy, prediction markets, pre–internet, price discrimination, price stability, profit motive, quantitative hedge fund, race to the bottom, Ray Kurzweil, real-name policy, recommendation engine, rent control, rent stabilization, RFID, ride hailing / ride sharing, Salesforce, self-driving car, sentiment analysis, shareholder value, sharing economy, Sheryl Sandberg, Silicon Valley, Silicon Valley ideology, Snapchat, social bookmarking, social graph, social intelligence, social web, sorting algorithm, Steve Ballmer, Steve Jobs, Steven Levy, 
systems thinking, TaskRabbit, technological determinism, technological solutionism, technoutopianism, TED Talk, telemarketer, transportation-network company, Travis Kalanick, Turing test, Uber and Lyft, Uber for X, uber lyft, universal basic income, unpaid internship, women in the workforce, Y Combinator, yottabyte, you are the product, Zipcar

Add RFID chips to all packaged food and grocery products and you can track their movement through supply chains and stores without human assistance. Perhaps companies can partner with stores to help utilize their surveillance systems to monitor the placement of goods. Firm up sentiment analysis, trending-topic algorithms, and optical-character-recognition scanning so that humans aren’t forced to do such drudgery. To save content moderators from their on-the-job stress, we have to put them out of work again. Just as Facebook or Pinterest retains control of your data, online labor markets keep workers wedded to the platform. You can’t take your profile elsewhere, unless two labor markets form a partnership or decide to create an open protocol that other markets can take advantage of.

…

As with online labor markets, the digital labor of social media is highly mediated, disguised. It’s made to look like play or a normal part of Web browsing. For example, CAPTCHA tests—those forms that require you to read a blurry sequence of words/letters/numbers/shapes and enter them to prove you are a human being—often double as ways of improving optical character recognition (OCR) programs. These words were scanned from books, newspaper archives, or other media, but existing OCR software can’t read them. Like a Mechanical Turk worker, you provide the final bit of cognitive labor, deciphering the word for the computer. That word then goes back to whichever company paid the CAPTCHA service to help digitize their material.

…

., 356–57 Nissenbaum, Helen, 284, 297 notifications and alerts, 50–53, 214 NSA (National Security Agency), 129–32, 312, 314 NSF (National Science Foundation), 279 NSLs (National Security Letters), 130 Obama, Barack, 134, 169, 194 “Obama Is Wrong” (Hayes), 105–6 ObscuraCam, 357 Occupy movement, 136–37 OCR (optical character recognition) software, 260, 358 O’Donnell, Robert, 152 Office Max, 279–80 OkCupid, 204 Old Spice advertising campaign, 93–94 Omidyar, Pierre, 239 online persona, 344–45 online recommendations, 201–2 online reputation. See reputation On the Media radio program, 109 Open Graph, 11–12 opting out of advertising-based social networks, 275–77 cost of, 295 difficulty finding option for, 32, 33 of friends adding you to a group, 92 of Google Shared Endorsements, 33 of including your location in messages, 177 of Klout, 195 opt-in vs., 7–8 of social media, 272, 340–41, 342, 346, 347 oral storytelling, 62, 63 Oremus, Will, 106–7, 265 outing students via privacy faux pas, 286 ownership of your identity, 256–57, 273–74, 275–77, 311, 360 Page, Larry, 250 page views overview, 95–96, 98 and advertising dollars, 71, 93, 97 Facebook-ready content for generating, 115 and invented controversy, 107 meme-related, 84, 103–4, 105 new outlets’ boosting of, 122–23 Palihapitiya, Chamath, 249 Pandora, 303 paparazzos, 211–12 parents, scrapbooking about their children, 46, 55–60 Pariser, Eli, 122 Paris, France, 267, 268 Patriot Act, 130 pay-per-gaze advertising, 302 Peers, 238–39, 244 peer-to-peer social networks, 311 Peretti, Jonah, 114–15 personal care, 224 personal endorsements, 31–35 personal graph, 18–19 Persson, Markus, 164–65 Pezold, John, 187 PGP, 368–69 PHD, 304 PhoneID Score, TeleSign, 40 phones.

pages: 339 words: 94,769

Possible Minds: Twenty-Five Ways of Looking at AI by John Brockman

AI winter, airport security, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, Alignment Problem, AlphaGo, artificial general intelligence, Asilomar, autonomous vehicles, basic income, Benoit Mandelbrot, Bill Joy: nanobots, Bletchley Park, Buckminster Fuller, cellular automata, Claude Shannon: information theory, Computing Machinery and Intelligence, CRISPR, Daniel Kahneman / Amos Tversky, Danny Hillis, data science, David Graeber, deep learning, DeepMind, Demis Hassabis, easy for humans, difficult for computers, Elon Musk, Eratosthenes, Ernest Rutherford, fake news, finite state, friendly AI, future of work, Geoffrey Hinton, Geoffrey West, Santa Fe Institute, gig economy, Hans Moravec, heat death of the universe, hype cycle, income inequality, industrial robot, information retrieval, invention of writing, it is difficult to get a man to understand something, when his salary depends on his not understanding it, James Watt: steam engine, Jeff Hawkins, Johannes Kepler, John Maynard Keynes: Economic Possibilities for our Grandchildren, John Maynard Keynes: technological unemployment, John von Neumann, Kevin Kelly, Kickstarter, Laplace demon, Large Hadron Collider, Loebner Prize, machine translation, market fundamentalism, Marshall McLuhan, Menlo Park, military-industrial complex, mirror neurons, Nick Bostrom, Norbert Wiener, OpenAI, optical character recognition, paperclip maximiser, pattern recognition, personalized medicine, Picturephone, profit maximization, profit motive, public intellectual, quantum cryptography, RAND corporation, random walk, Ray Kurzweil, Recombinant DNA, Richard Feynman, Rodney Brooks, self-driving car, sexual politics, Silicon Valley, Skype, social graph, speech recognition, statistical model, Stephen Hawking, Steven Pinker, Stewart Brand, strong AI, superintelligent machines, supervolcano, synthetic biology, systems thinking, technological determinism, technological singularity, technoutopianism, TED Talk, 
telemarketer, telerobotics, The future is already here, the long tail, the scientific method, theory of mind, trolley problem, Turing machine, Turing test, universal basic income, Upton Sinclair, Von Neumann architecture, Whole Earth Catalog, Y2K, you are the product, zero-sum game

Wearable and public cameras with facial-recognition software touch taboos. Should people with hyperthymesia or photographic memories be excluded from those same settings? Shouldn’t people with prosopagnosia (face blindness) or forgetfulness be able to benefit from facial-recognition software and optical character recognition wherever they go, and if them, then why not everyone? If we all have those tools to some extent, shouldn’t we all be able to benefit? These scenarios echo Kurt Vonnegut’s 1961 short story “Harrison Bergeron,” in which exceptional aptitude is suppressed in deference to the mediocre lowest common denominator of society.

…

You show it an image, and for about ten thousand kinds of things, it will tell you what it is. It’s fun to show it an abstract painting and see what it says. But it does a pretty good job. It works using the same neural-network technology that McCulloch and Pitts imagined in 1943 and lots of us worked on in the early eighties. Back in the 1980s, people successfully did OCR—optical character recognition. They took the twenty-six letters of the alphabet and said, “OK, is that an A? Is that a B? Is that a C?” and so on. That could be done for twenty-six different possibilities, but it couldn’t be done for ten thousand. It was just a matter of scaling up the whole system that makes this possible today.
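The 1980s approach described above treats OCR as picking one label from a fixed set ("Is that an A? Is that a B?"). A minimal way to see this is a multiclass perceptron, one weight vector per letter, trained on tiny bitmaps. The 5x5 glyphs, the +1/-1 pixel encoding, and the function names below are illustrative assumptions, not the system the speaker worked on.

```python
# Hypothetical 5x5 glyphs: '#' is ink, '.' is background
GLYPHS = {
    "A": ["..#..", ".#.#.", "#####", "#...#", "#...#"],
    "B": ["####.", "#...#", "####.", "#...#", "####."],
    "C": [".####", "#....", "#....", "#....", ".####"],
}


def to_vector(rows):
    """Flatten a bitmap into +1 (ink) / -1 (background) values plus a bias input of 1."""
    return [1 if ch == "#" else -1 for row in rows for ch in row] + [1]


def score(weight, x):
    return sum(w * xi for w, xi in zip(weight, x))


def predict(weights, x):
    """Pick the label whose weight vector scores highest (the first label wins ties)."""
    return max(weights, key=lambda label: score(weights[label], x))


def train(glyphs, max_epochs=100):
    """Multiclass perceptron: on a mistake, move the true class toward x and the guess away."""
    labels = sorted(glyphs)
    size = len(to_vector(glyphs[labels[0]]))
    weights = {label: [0] * size for label in labels}
    for _ in range(max_epochs):
        mistakes = 0
        for label in labels:
            x = to_vector(glyphs[label])
            guess = predict(weights, x)
            if guess != label:
                weights[label] = [w + xi for w, xi in zip(weights[label], x)]
                weights[guess] = [w - xi for w, xi in zip(weights[guess], x)]
                mistakes += 1
        if mistakes == 0:  # every training glyph is now classified correctly
            break
    return weights


weights = train(GLYPHS)
for label, rows in GLYPHS.items():
    print(label, "->", predict(weights, to_vector(rows)))
```

Each extra class adds one more weight vector and one more score to compare, which is the "scaling up" the passage refers to; going from 26 letters to ten thousand object categories needs far more data and capacity than this toy shows.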

pages: 372 words: 101,174

How to Create a Mind: The Secret of Human Thought Revealed by Ray Kurzweil

Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, Albert Einstein, Albert Michelson, anesthesia awareness, anthropic principle, brain emulation, cellular automata, Charles Babbage, Claude Shannon: information theory, cloud computing, computer age, Computing Machinery and Intelligence, Dean Kamen, discovery of DNA, double helix, driverless car, en.wikipedia.org, epigenetics, George Gilder, Google Earth, Hans Moravec, Isaac Newton, iterative process, Jacquard loom, Jeff Hawkins, John von Neumann, Law of Accelerating Returns, linear programming, Loebner Prize, mandelbrot fractal, Nick Bostrom, Norbert Wiener, optical character recognition, PalmPilot, pattern recognition, Peter Thiel, Ralph Waldo Emerson, random walk, Ray Kurzweil, reversible computing, selective serotonin reuptake inhibitor (SSRI), self-driving car, speech recognition, Steven Pinker, strong AI, the scientific method, theory of mind, Turing complete, Turing machine, Turing test, Wall-E, Watson beat the top human players on Jeopardy!, X Prize

Once a digital neocortex learns a skill, it can transfer that know-how in minutes or even seconds. As one of many examples, at my first company, Kurzweil Computer Products (now Nuance Speech Technologies), which I founded in 1973, we spent years training a set of research computers to recognize printed letters from scanned documents, a technology called omni-font (any type font) optical character recognition (OCR). This particular technology has now been in continual development for almost forty years, with the current product called OmniPage from Nuance. If you want your computer to recognize printed letters, you don’t need to spend years training it to do so, as we did—you can simply download the evolved patterns already learned by the research computers in the form of software.
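The point of the passage is that training is the expensive step, while the learned result is just data that can be copied to another machine. The sketch below illustrates that idea with a deliberately tiny nearest-template recognizer; it is not how OmniPage works, and the 3x3 bitmaps, the JSON format, and the function names are assumptions chosen for illustration.

```python
import json


def learn_templates(examples):
    """Collapse labelled 0/1 bitmaps into one template per label.

    A template pixel is 1 if that pixel was on in at least half of the label's examples.
    """
    sums, counts = {}, {}
    for label, pixels in examples:
        acc = sums.setdefault(label, [0] * len(pixels))
        for i, p in enumerate(pixels):
            acc[i] += p
        counts[label] = counts.get(label, 0) + 1
    return {label: [1 if 2 * s >= counts[label] else 0 for s in acc]
            for label, acc in sums.items()}


def recognize(templates, pixels):
    """Return the label whose template differs from pixels in the fewest positions."""
    return min(templates,
               key=lambda label: sum(a != b for a, b in zip(templates[label], pixels)))


# "Research computer": learn the patterns once from hypothetical 3x3 bitmaps
examples = [
    ("I", [0, 1, 0, 0, 1, 0, 0, 1, 0]),
    ("I", [0, 1, 0, 0, 1, 0, 1, 1, 0]),
    ("L", [1, 0, 0, 1, 0, 0, 1, 1, 1]),
    ("L", [1, 0, 0, 1, 0, 0, 1, 1, 0]),
]
exported = json.dumps(learn_templates(examples))  # the shareable "software"

# "Your computer": load the learned patterns; no training step is repeated
downloaded = json.loads(exported)
print(recognize(downloaded, [0, 1, 0, 0, 1, 0, 0, 1, 1]))  # prints I
```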

…

., 28–29, 206–7, 217 dimming of, 29, 59 hippocampus and, 101–2 as ordered sequences of patterns, 27–29, 54 redundancy of, 59 unexpected recall of, 31–32, 54, 68–69 working, 101 Menabrea, Luigi, 190 metacognition, 200, 201 metaphors, 14–15, 113–17, 176–77 Michelson, Albert, 18, 19, 36, 114 Michelson-Morley experiment, 19, 36, 114 microtubules, 206, 207, 208, 274 Miescher, Friedrich, 16 mind, 11 pattern recognition theory of (PRTM), 5–6, 8, 11, 34–74, 79, 80, 86, 92, 111, 172, 217 thought experiments on, 199–247 mind-body problem, 221 Minsky, Marvin, 62, 133–35, 134, 199, 228 MIT Artificial Intelligence Laboratory, 134 MIT Picower Institute for Learning and Memory, 101 MobilEye, 159 modeling, complexity and, 37–38 Modha, Dharmendra, 128, 195, 271–72 momentum, 20–21 conservation of, 21–22 Money, John William, 118, 119 montane vole, 119 mood, regulation of, 106 Moore, Gordon, 251 Moore’s law, 251, 255, 268 moral intelligence, 201 moral systems, consciousness as basis of, 212–13 Moravec, Hans, 196 Morley, Edward, 18, 19, 36, 114 Moskovitz, Dustin, 156 motor cortex, 36, 99 motor nerves, 99 Mountcastle, Vernon, 36, 37, 94 Mozart, Leopold, 111 Mozart, Wolfgang Amadeus, 111, 112 MRI (magnetic resonance imaging), 129 spatial resolution of, 262–65, 263, 309n MT (V5) visual cortex region, 83, 95 Muckli, Lars, 225 music, as universal to human culture, 62 mutations, simulated, 148 names, recalling, 32 National Institutes of Health, 129 natural selection, 76 geologic process as metaphor for, 14–15, 114, 177 see also evolution Nature, 94 nematode nervous system, simulation of, 124 neocortex, 3, 7, 77, 78 AI reverse-engineering of, see neocortex, digital bidirectional flow of information in, 85–86, 91 evolution of, 35–36 expansion of, through AI, 172, 266–72, 276 expansion of, through collaboration, 116 hierarchical order of, 41–53 learning process of, see learning linear organization of, 250 as metaphor machine, 113 neural leakage in, 150–51 old brain as modulated by, 93–94, 105, 
108 one-dimensional representations of multidimensional data in, 53, 66, 91, 141–42 pattern recognition in, see pattern recognition pattern recognizers in, see pattern recognition modules plasticity of, see brain plasticity prediction by, 50–51, 52, 58, 60, 66–67, 250 PRTM as basic algorithm of, 6 pruning of unused connections in, 83, 90, 143, 174 redundancy in, 9, 224 regular grid structure of, 82–83, 84, 85, 129, 262 sensory input in, 58, 60 simultaneous processing of information in, 193 specific types of patterns associated with regions of, 86–87, 89–90, 91, 111, 152 structural simplicity of, 11 structural uniformity of, 36–37 structure of, 35–37, 38, 75–92 as survival mechanism, 79, 250 thalamus as gateway to, 100–101 total capacity of, 40, 280 total number of neurons in, 230 unconscious activity in, 228, 231, 233 unified model of, 24, 34–74 as unique to mammalian brain, 93, 286n universal processing algorithm of, 86, 88, 90–91, 152, 272 see also cerebral cortex neocortex, digital, 6–8, 41, 116–17, 121–78, 195 benefits of, 123–24, 247 bidirectional flow of information in, 173 as capable of being copied, 247 critical thinking module for, 176, 197 as extension of human brain, 172, 276 HHMMs in, 174–75 hierarchical structure of, 173 knowledge bases of, 177 learning in, 127–28, 175–76 metaphor search module in, 176–77 moral education of, 177–78 pattern redundancy in, 175 simultaneous searching in, 177 structure of, 172–78 virtual neural connections in, 173–74 neocortical columns, 36–37, 38, 90, 124–25 nervous systems, 2 neural circuits, unreliability of, 185 neural implants, 243, 245 neural nets, 131–35, 144, 155 algorithm for, 291n–97n feedforward, 134, 135 learning in, 132–33 neural processing: digital emulation of, 195–97 massive parallelism of, 192, 193, 195 speed of, 192, 195 neuromorphic chips, 194–95, 196 neuromuscular junction, 99 neurons, 2, 36, 38, 43, 80, 172 neurotransmitters, 105–7 new brain, see neocortex Newell, Allen, 181 New Kind of Science, A 
(Wolfram), 236, 239 Newton, Isaac, 94 Nietzsche, Friedrich, 117 nonbiological systems, as capable of being copied, 247 nondestructive imaging techniques, 127, 129, 264, 312n–13n nonmammals, reasoning by, 286n noradrenaline, 107 norepinephrine, 118 Notes from Underground (Dostoevsky), 199 Nuance Speech Technologies, 6–7, 108, 122, 152, 161, 162, 168 nucleus accumbens, 77, 105 Numenta, 156 NuPIC, 156 obsessive-compulsive disorder, 118 occipital lobe, 36 old brain, 63, 71, 90, 93–108 neocortex as modulator of, 93–94, 105, 108 sensory pathway in, 94–98 olfactory system, 100 Oluseun, Oluseyi, 204 OmniPage, 122 One Hundred Years of Solitude (García Márquez), 283n–85n On Intelligence (Hawkins and Blakeslee), 73, 156 On the Origin of Species (Darwin), 15–16 optical character recognition (OCR), 122 optic nerve, 95, 100 channels of, 94–95, 96 organisms, simulated, evolution of, 147–53 overfitting problem, 150 oxytocin, 119 pancreas, 37 panprotopsychism, 203, 213 Papert, Seymour, 134–35, 134 parameters, in pattern recognition: “God,” 147 importance, 42, 48–49, 60, 66, 67 size, 42, 49–50, 60, 61, 66, 67, 73–74, 91–92, 173 size variability, 42, 49–50, 67, 73–74, 91–92 Parker, Sean, 156 Parkinson’s disease, 243, 245 particle physics, see quantum mechanics Pascal, Blaise, 117 patch-clamp robotics, 125–26, 126 pattern recognition, 195 of abstract concepts, 58–59 as based on experience, 50, 90, 273–74 as basic unit of learning, 80–81 bidirectional flow of information in, 52, 58, 68 distortions and, 30 eye movement and, 73 as hierarchical, 33, 90, 138, 142 of images, 48 invariance and, see invariance, in pattern recognition learning as simultaneous with, 63 list combining in, 60–61 in neocortex, see pattern recognition modules redundancy in, 39–40, 57, 60, 64, 185 pattern recognition modules, 35–41, 42, 90, 198 autoassociation in, 60–61 axons of, 42, 43, 66, 67, 113, 173 bidirectional flow of information to and from thalamus, 100–101 dendrites of, 42, 43, 66, 67 digital, 172–73, 
175, 195 expectation (excitatory) signals in, 42, 52, 54, 60, 67, 73, 85, 91, 100, 112, 173, 175, 196–97 genetically determined structure of, 80 “God parameter” in, 147 importance parameters in, 42, 48–49, 60, 66, 67 inhibitory signals in, 42, 52–53, 67, 85, 91, 100, 173 input in, 41–42, 42, 53–59 love and, 119–20 neural connections between, 90 as neuronal assemblies, 80–81 one-dimensional representation of multidimensional data in, 53, 66, 91, 141–42 prediction by, 50–51, 52, 58, 60, 66–67 redundancy of, 42, 43, 48, 91 sequential processing of information by, 266 simultaneous firings of, 57–58, 57, 146 size parameters in, 42, 49–50, 60, 61, 66, 67, 73–74, 91–92, 173 size variability parameters in, 42, 67, 73–74, 91–92, 173 of sounds, 48 thresholds of, 48, 52–53, 60, 66, 67, 111–12, 173 total number of, 38, 40, 41, 113, 123, 280 universal algorithm of, 111, 275 pattern recognition theory of mind (PRTM), 5–6, 8, 11, 34–74, 79, 80, 86, 92, 111, 172, 217 patterns: hierarchical ordering of, 41–53 higher-level patterns attached to, 43, 45, 66, 67 input in, 41, 42, 44, 66, 67 learning of, 63–64, 90 name of, 42–43 output of, 42, 44, 66, 67 redundancy and, 64 specific areas of neocortex associated with, 86–87, 89–90, 91, 111, 152 storing of, 64–65 structure of, 41–53 Patterns, Inc., 156 Pavlov, Ivan Petrovich, 216 Penrose, Roger, 207–8, 274 perceptions, as influenced by expectations and interpretations, 31 perceptrons, 131–35 Perceptrons (Minsky and Papert), 134–35, 134 phenylethylamine, 118 Philosophical Investigations (Wittgenstein), 221 phonemes, 61, 135, 137, 146, 152 photons, 20–21 physics, 37 computational capacity and, 281, 316n–19n laws of, 37, 267 standard model of, 2 see also quantum mechanics Pinker, Steven, 76–77, 278 pituitary gland, 77 Plato, 212, 221, 231 pleasure, in old and new brains, 104–8 Poggio, Tomaso, 85, 159 posterior ventromedial nucleus (VMpo), 99–100, 99 prairie vole, 119 predictable outcomes, determined outcomes vs., 26, 239 President’s Council 
of Advisors on Science and Technology, 269 price/performance, of computation, 4–5, 250–51, 257, 257, 267–68, 301n–3n Principia Mathematica (Russell and Whitehead), 181 probability fields, 218–19, 235–36 professional knowledge, 39–40 proteins, reverse-engineering of, 4–5 qualia, 203–5, 210, 211 quality of life, perception of, 277–78 quantum computing, 207–9, 274 quantum mechanics, 218–19 observation in, 218–19, 235–36 randomness vs. determinism in, 236 Quinlan, Karen Ann, 101 Ramachandran, Vilayanur Subramanian “Rama,” 230 random access memory: growth in, 259, 260, 301n–3n, 306n–7n three-dimensional, 268 randomness, determinism and, 236 rationalization, see confabulation reality, hierarchical nature of, 4, 56, 90, 94, 172 recursion, 3, 7–8, 56, 65, 91, 153, 156, 177, 188 “Red” (Oluseum), 204 redundancy, 9, 39–40, 64, 184, 185, 197, 224 in genome, 271, 314n, 315n of memories, 59 of pattern recognition modules, 42, 43, 48, 91 thinking and, 57 religious ecstacy, 118 “Report to the President and Congress, Designing a Digital Future” (President’s Council of Advisors on Science and Technology), 269 retina, 95 reverse-engineering: of biological systems, 4–5 of human brain, see brain, human, computer emulation of; neocortex, digital Rosenblatt, Frank, 131, 133, 134, 135, 191 Roska, Boton, 94 Rothblatt, Martine, 278 routine tasks, as series of hierarchical steps, 32–33 Rowling, J.

pages: 117 words: 30,654

Kindle Formatting: The Complete Guide to Formatting Books for the Amazon Kindle by Joshua Tallent

book scanning, job automation, optical character recognition

This is most common with out-of-print books, but it can also happen when the rights to the book revert back to the author and the publisher, for whatever reason, does not have a copy of the book in a PDF or other digital format. The easiest way to get the book back into a digital format is to scan it and run it through an Optical Character Recognition (OCR) software program. There are a variety of options available to the do-it-yourself person or to the pay-someone-else person. The main benefit to doing the process yourself is saving money, but you may find that having some help in the process is easier and faster. The first step in the OCR process is to have your book scanned.

pages: 118 words: 35,663

Smart Machines: IBM's Watson and the Era of Cognitive Computing (Columbia Business School Publishing) by John E. Kelly III

AI winter, book value, call centre, carbon footprint, Computing Machinery and Intelligence, crowdsourcing, demand response, discovery of DNA, disruptive innovation, Erik Brynjolfsson, Fairchild Semiconductor, future of work, Geoffrey West, Santa Fe Institute, global supply chain, Great Leap Forward, Internet of things, John von Neumann, Large Hadron Collider, Mars Rover, natural language processing, optical character recognition, pattern recognition, planetary scale, RAND corporation, RFID, Richard Feynman, smart grid, smart meter, speech recognition, TED Talk, Turing test, Von Neumann architecture, Watson beat the top human players on Jeopardy!

Their invention represented a major advance in the science of machine learning, a branch of artificial intelligence that focuses on building systems that learn from data. The field was first defined in 1959 by IBM scientist Arthur Samuel and has found plenty of uses over the years—including common applications such as optical character recognition and e-mail spam filters. Such systems are trained to recognize repeated patterns in words or shapes and to react in a certain way when they encounter them again. Watson takes machine learning to a new level. In creating the technology for Watson, called DeepQA, which includes the learning capability, the developers provided the machine with a large corpus of unstructured information and the algorithms to extract knowledge from it.

pages: 666 words: 181,495

In the Plex: How Google Thinks, Works, and Shapes Our Lives by Steven Levy

"World Economic Forum" Davos, 23andMe, AltaVista, Andy Rubin, Anne Wojcicki, Apple's 1984 Super Bowl advert, autonomous vehicles, Bill Atkinson, book scanning, Brewster Kahle, Burning Man, business process, clean water, cloud computing, crowdsourcing, Dean Kamen, discounted cash flows, don't be evil, Donald Knuth, Douglas Engelbart, Dutch auction, El Camino Real, Evgeny Morozov, fault tolerance, Firefox, General Magic, Gerard Salton, Google bus, Google Chrome, Google Earth, Googley, high-speed rail, HyperCard, hypertext link, IBM and the Holocaust, informal economy, information retrieval, Internet Archive, Jeff Bezos, John Markoff, Ken Thompson, Kevin Kelly, Kickstarter, large language model, machine translation, Mark Zuckerberg, Menlo Park, one-China policy, optical character recognition, PageRank, PalmPilot, Paul Buchheit, Potemkin village, prediction markets, Project Xanadu, recommendation engine, risk tolerance, Rubik’s Cube, Sand Hill Road, Saturday Night Live, search inside the book, second-price auction, selection bias, Sheryl Sandberg, Silicon Valley, SimCity, skunkworks, Skype, slashdot, social graph, social software, social web, spectrum auction, speech recognition, statistical model, Steve Ballmer, Steve Jobs, Steven Levy, subscription business, Susan Wojcicki, Ted Nelson, telemarketer, The future is already here, the long tail, trade route, traveling salesman, turn-by-turn navigation, undersea cable, Vannevar Bush, web application, WikiLeaks, Y Combinator

If a web page required users to fill out a form to see certain content, Google had probably taught its spiders how to fill out the form. Sometimes content was locked inside programs that ran when users visited a page—applications running in the JavaScript language or a media program like Adobe’s Flash. Google knew how to look inside those programs and suck out the content for its indexes. Google even used optical character recognition to figure out if an image on the website had text on it. The accumulation of all those improvements lengthened Google’s lead over its competitors, and the circle of early adopters who first discovered Google was eventually joined by the masses, building a dominant market share. Even Google’s toughest competitors had to admit that Brin and Page had built something special.

…

It had to be a rate that someone could maintain for a long time—this was going to scale, remember, to every book ever written. They finally used a metronome to synchronize their actions. After some practice, they found that they could capture a 300-page book such as Startup in about forty-two minutes, faster than they expected. Then they ran optical character recognition (OCR) software on the images and began searching inside the book. Page would open the book to a random page and say, “This word—can you find it?” Mayer would do a search to see if she could. It worked. Presumably, a dedicated machine could work faster, and that would make it possible to capture millions of books.

…

., 236, 331, 344–47, 364, 365–66 Kahle, Brewster, 362, 365 Kamangar, Salar, 71–72, 74, 233, 235 and advertising, 86, 89, 91–92, 109, 113 and business plan, 72, 75, 201 and Google motto, 143–44 and YouTube, 248, 260–65 Karen, Alana, 97–98 Karim, Jawed, 243, 247, 250 Kay, Erik, 207 Keyhole, 239–40, 340 Keyword Pricing Index, 118 Khosla, Vinod, 28, 29 Kim, Jini, 166 Klau, Rick, 312, 318 Kleinberg, Jon, 24–26, 34, 38, 292 Knol, 240 Knuth, Donald, 14 Kohl, Herb, 332 Koogle, Timothy, 44 Kordestani, Omid, 75–76, 78, 81, 96, 97, 130, 155, 242 Krane, David, 69–70, 143, 144–45, 150, 156 Kraus, Joe, 28, 136, 201, 374–75 Kundra, Vivek, 322, 326 Kurzweil, Raymond, 66 language, translations, 55, 62–65 Lantos, Tom, 285–87 Larson, Mark, 208 Leach, Jim, 286 Lee, Kai-Fu: and China office, 4, 281–83, 289–90, 291, 292, 293, 294, 296, 298, 302, 303, 305, 307–8, 313 departure of, 307–8, 312 Lee, Steve, 338–39 Lenat, Douglas, 47 Leonard, Herman, 117 Lessig, Lawrence, 359, 360, 363 Levick, Jeff, 96, 110–11, 112–13 Levinson, Arthur, 218, 237 Li, Mark, 293, 298–99 Li, Robin (Yanhong), 26–27, 278, 292, 293, 298 Library of Congress, 352, 361 Liebman, Jason, 103–5 LinkAds, 102–3 Linux, 78, 182, 210 Litvack, Sanford “Sandy,” 345, 347 Liu, John, 296 Liu, Jun, 294, 303–4 long-tail businesses, 85, 105, 107, 118, 243, 334 Lu, Qi, 380 Lucovsky, Mark, 283 Luk, Ben, 290, 302 Maarek, Yoelle, 272 MacDonald, Brian, 380 Macgillivray, Alex “AMac,” 353–55 machine learning, 64–66, 100–101, 385 Malone, Kim (Scott), 107–8, 135 Manber, Udi, 44, 45, 57–58, 68, 240, 355, 380 MapReduce, 199–200 Marconi International Fellowship Award, 278 Markoff, John, 193 Matias, Yossi, 272 Mayer, Marissa, 36, 41, 381 and advertising, 78–79 and APM program, 1, 4, 5, 161–62, 259 and books, 348–50, 358, 365 and Gmail, 170–71 and Google culture, 121, 122, 126–27, 141, 142, 163, 164, 365 and Google motto, 143–44 and Google’s look, 206–7 and management structure, 160, 235 and social networking, 371–73, 375 and stock price, 155, 156–57 
McCaffrey, Cindy, 3, 76, 77, 145, 150, 153, 164 McCarthy, John, 127 McLaughlin, Andrew: and China, 276–79, 283–84, 303, 304 and Obama administration, 316, 321, 322–23, 325–26, 327 and privacy, 176–78, 379 memex, 15, 44 Merrill, Douglas, 183 Mi, James, 276 Microsoft: and antitrust issues, 331–32, 344–45 and aQuantive, 331 Bing search engine, 186, 380–81 and books, 361, 363 and browser wars, 206, 283 and China, 281, 282, 283, 284, 285, 304 and competition, 70, 191, 197, 200–212, 218, 220, 266, 282–83, 331, 344–47, 363, 380–81 and Danger, 214 data centers of, 190 and disclosure, 108 and email, 168, 169, 179–80 Excel, 200 and Facebook, 370 Hotmail, 30, 168, 172, 180, 209 IE 7, 209 Internet Explorer, 204–7 and mapping, 342 monopolies of, 200, 331–32, 364 Office, 200, 202, 203 Outlook, 169 PowerPoint, 200, 203 and user data, 335 and values, 144 WebTV, 217 Windows, 200, 210, 212, 219, 331 Windows Mobile, 220 Word, 200 and Yahoo, 343–44, 346, 380 of yesterday, 369 MIDAS (Mining Data at Stanford), 16 Milgrom, Paul, 90 Miner, Rich, 215, 216 Mobile Accord, 325 mobile phones, 214–17, 219–22, 251 Moderator, 323–24 Mohan, Neal, 332, 336 Monier, Louis, 19, 20, 37 Montessori, Maria, 121, 124, 166 Montessori schools, 121–25, 129, 138, 149 Moonves, Leslie, 246 Moore’s Law, 169, 180, 261 Morgan Stanley, 149, 157 Moritz, Mike, 32, 73–74, 80, 147, 247–48, 249 Morozov, Evgeny, 379 Morris, Doug, 261 Mossberg, Walt, 94 Mowani, Rajeev, 38 Mozilla Firefox, 204, 206, 207–8, 209 Murdoch, Rupert, 249, 370 MySpace, 243, 375 name detection system, 50–52 Napier’s constant, 149 National Federation of the Blind, 365–66 National Institute of Standards and Technology (NIST), 65 National Science Digital Library, 347 National Security Agency (NSA), 310 Native Client, 212 navigation, 229, 232, 338 Nelson, Ted, 15 net neutrality, 222, 326–27, 330, 383–84 Netscape, 30, 75, 78, 147, 204, 206 Nevill-Manning, Craig, 129 Newsweek, 2, 3, 20, 179 New York Public Library, 354, 357 Nexus One, 230, 231–32 95th Percentile Rule, 187 Nokia, 341, 374 Norman, Donald, 12, 106 Norvig, Peter, 47, 62, 63, 138, 142, 316 Novell, 70 Obama, Barack, 315–21, 322, 323–24, 329, 346 Obama administration, 320–28 Ocean, 350–55 Och, Franz, 63–65 Oh, Sunny, 283, 297, 298 OKR (Objectives and Key Results), 163–64, 165, 186, 209, 325 Open Book Alliance, 362 Open Handset Alliance, 221–22 OpenSocial, 375–76 operating systems, 210–12 optical character recognition (OCR), 53, 349–50 Oracle, 220 Orkut, 371–73, 375 Otellini, Paul, 218 Overture, 89, 90, 91, 95, 96, 98–99, 103, 150 Oxford University Press, 354, 357 Page, Larry, 3, 5 achievements of, 53, 383 and advertising, 84, 86–87, 90, 92, 94, 95–97, 114, 334, 336–37 ambition of, 12, 39, 73, 127–28, 139, 198, 215, 238, 362, 386–87 and applications, 205, 206, 207, 208, 210, 240–42, 340 and artificial intelligence, 62, 100, 246, 385–86 and BackRub/PageRank, 17, 18, 21–24, 26 and birth of Google, 31–34 and Book Search, 11, 347–52, 355, 357, 359, 361, 362, 364 on capturing all the web, 22–24, 52, 58, 63 on changing the world, 6, 13, 33, 39, 97, 120, 125, 146, 173, 232, 279, 316, 327, 361, 384–85 childhood and early years of, 11–13 and China, 267, 276, 277–78, 279–80, 283, 284, 305, 311 and data centers, 182–83 and eco-activism, 241 and email, 169–72, 174, 179 and Excite, 28–29 and funding, 32, 33–34, 73–75 and hiring practices, 139–40, 142, 182, 271, 386 imagination of, 14, 271 and IPO, 146–47, 149–54, 157 and machine learning, 66, 67 and management, 74, 75–77, 79–82, 110, 143, 158–60, 162–66, 228, 231, 235, 252–53, 254, 255, 260, 272, 273, 386–87 marriage of, 254 as Montessori kid, 121–25, 127–28, 149, 331, 387 and Obama, 315–16 and philanthropy, 257–58 and privacy, 174, 176–77, 337 and robots, 246, 385 and secrecy, 31–32, 70, 72–73, 106, 218 and smart phones, 214–16, 224, 225, 226–30, 234 and social networking, 372 and speed, 184–85, 207 and Stanford, 12–13, 14, 16–17, 28, 29, 34 and trust, 221, 237 values of, 127–28, 130, 132, 135, 139–40, 146, 196, 361, 364 and wealth, 157 and web links, 51 and YouTube, 248 PageRank, 3, 17, 18, 21–24, 27, 34, 38, 48–49, 53, 55, 56, 294 Palm, 216, 221 Park, Lori, 235, 258 Pashupathy, Kannan, 270–72, 277, 282 Passion Device, 230 Patel, Amit, 45–46, 82 and Google motto, 143–44, 146 patents, 27, 39, 89, 102, 221, 235, 237, 350 PayPal, 242, 243 peer-to-peer protocols, 234–35 Peters, Marybeth, 352 Phil, 99–103 Philip, Prince, 122 Picasa, 185–86, 187, 239 Pichai, Sundar, 205–6, 207–8, 209–12 Pichette, Patrick, 120, 150, 254–56 Pike, Rob, 241 Pittman, R.

pages: 685 words: 203,949

The Organized Mind: Thinking Straight in the Age of Information Overload by Daniel J. Levitin

Abraham Maslow, airport security, Albert Einstein, Amazon Mechanical Turk, Anton Chekhov, autism spectrum disorder, Bayesian statistics, behavioural economics, big-box store, business process, call centre, Claude Shannon: information theory, cloud computing, cognitive bias, cognitive load, complexity theory, computer vision, conceptual framework, correlation does not imply causation, crowdsourcing, cuban missile crisis, Daniel Kahneman / Amos Tversky, data science, deep learning, delayed gratification, Donald Trump, en.wikipedia.org, epigenetics, Eratosthenes, Exxon Valdez, framing effect, friendly fire, fundamental attribution error, Golden Gate Park, Google Glasses, GPS: selective availability, haute cuisine, How many piano tuners are there in Chicago?, human-factors engineering, if you see hoof prints, think horses—not zebras, impulse control, index card, indoor plumbing, information retrieval, information security, invention of writing, iterative process, jimmy wales, job satisfaction, Kickstarter, language acquisition, Lewis Mumford, life extension, longitudinal study, meta-analysis, more computing power than Apollo, Network effects, new economy, Nicholas Carr, optical character recognition, Pareto efficiency, pattern recognition, phenotype, placebo effect, pre–internet, profit motive, randomized controlled trial, Rubik’s Cube, Salesforce, shared worldview, Sheryl Sandberg, Skype, Snapchat, social intelligence, statistical model, Steve Jobs, supply-chain management, the scientific method, The Wealth of Nations by Adam Smith, The Wisdom of Crowds, theory of mind, Thomas Bayes, traumatic brain injury, Turing test, Twitter Arab Spring, ultimatum game, Wayback Machine, zero-sum game

, IBM, and Apple) advocates scanning everything into PDFs and keeping them on your computer. Home scanners are relatively inexpensive, and there are strikingly good scanning apps available on cell phones. If it’s something you want to keep, Malcolm says, scan it and save it under a filename and folder that will help you find it later. Use OCR (optical character recognition) mode so that the PDF is readable as text characters rather than simply a photograph of the file, to allow your computer’s own search function to find specific keywords you’re looking for. The advantage of digital filing is that it takes up virtually no space, is environmentally friendly, and is electronically searchable.

…

The technology for automatically scanning written materials and turning them into searchable text is not perfect. Many words that a human being can discern are misread by computers. Consider the following example from an actual book being scanned by Google: After the text is scanned, two different OCR (for optical character recognition) programs attempt to map these blotches on the page to known words. If the programs disagree, the word is deemed unsolved, and then reCAPTCHA uses it as a challenge for users to solve. How does the system know if you guessed an unknown word correctly? It doesn’t! But reCAPTCHAs pair the unknown words with known words; they assume that if you solve the known word, you’re a human, and that your guess on the unknown word is reasonable.
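The reCAPTCHA scheme described above can be sketched in a few lines of Python. This is a hypothetical illustration, not reCAPTCHA's actual implementation: the function names, the case-insensitive comparison, and the tally used to collect guesses for an unsolved word are all assumptions. The excerpt only says that a word is "unsolved" when two OCR programs disagree, and that a guess on the unknown word is accepted when the paired known word is answered correctly.

```python
from typing import Dict, Optional


def ocr_consensus(reading_a: str, reading_b: str) -> Optional[str]:
    """Return the word if both OCR programs agree; None marks it as unsolved."""
    return reading_a if reading_a == reading_b else None


def grade_challenge(known_answer: str, typed_known: str, typed_unknown: str,
                    guesses: Dict[str, int]) -> bool:
    """Grade a two-word challenge (hypothetical helper).

    The user passes only if the control word (one whose answer is already
    known) is typed correctly. Only then is their reading of the unsolved
    word trusted and counted in `guesses` (the tally is an assumption).
    """
    if typed_known.strip().lower() != known_answer.strip().lower():
        return False  # control word wrong: treat as a failed attempt, discard the guess
    reading = typed_unknown.strip().lower()
    guesses[reading] = guesses.get(reading, 0) + 1
    return True


if __name__ == "__main__":
    # The two OCR readings disagree, so the word becomes a challenge.
    print(ocr_consensus("morning", "moming"))  # None
    tally: Dict[str, int] = {}
    print(grade_challenge("upon", "Upon", "morning", tally))  # True
    print(grade_challenge("upon", "apon", "mourning", tally))  # False
    print(tally)  # {'morning': 1}
```

Checking the control word before recording anything is what lets the system accept an answer it cannot verify directly: a guess is kept only from someone who has just shown they can read a word whose answer is known.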

…

See brain physiology news media, 338–40 Newton, Isaac, 162 New Yorker, 120, 336 New York Times, 6, 339, 365 Nietzsche, Friedrich, 375 Nixon, Richard, 201 NMDA receptor, 167 nonlinear thinking and perception, 38, 215, 217–18, 262, 380 Norman, Don, 35 number needed to treat metric, 236, 240, 247, 264, 264 Obama, Barack, 219, 303 object permanence, 24 Office of Presidential Correspondence, 303 Olds, James, 101 Old Testament, 151 O’Neal, Shaquille, 352–53 One Hundred Names for Love (Ackerman), 364–65 online dating, 130–34, 422n130, 423n132 optical character recognition (OCR), 93, 119, 119 optimal information, 308–10 orders of magnitude, 354–55, 358–59, 361, 363, 400n7 organizational structure, 271–76, 315–18, 470n315, 471n317 Otellini, Paul, 380–81 Overbye, Dennis, 6, 19 Oxford English Dictionary, 114 Oxford Filing Supply Company, 93–94 Page, Jimmy, 174 pair-bonding, 128, 142 paperwork, 293–306 Pareto optimality, 269 parking tickets, 237, 451n237 Parkinson’s disease, 167–68 passwords, xx, 103–5 Patel, Shreena, 258 paternalism, medical, 245, 257 pattern recognition, 28, 249 Patton, George S., 73–74 peak performance, 167, 189, 191–92, 203, 206 Peer Instruction (Mazur), 367 perfectionism, 174, 199–200 periodic table of elements, 372–73, 373, 480n372 Perry, Bruce, 56 Peterson, Jennifer, 368 pharmaceuticals, 256–57, 343, 345–46 Picasso, Pablo, 283 Pierce, John R., 73 Pirsig, Robert, 69–73, 89, 295–97, 300 placebo effect, 253, 255 place memory, 82–83, 106, 293–94 planning, 43, 161, 174–75, 319–26 Plato, 14, 58, 65–66 plausibility, 350, 352, 478n352 Plimpton, George, 200 Plutarch, 340 Poldrack, Russ, 97 Polya, George, 357 Ponzo illusion, 21, 22 positron emission tomography (PET), 40 prediction, 344–45 prefrontal cortex, 161 Area 47, 287 and attention, 16–17, 43, 45–46 and changing behaviors, 176 and children’s television, 368 and creative time, 202, 210 and decision-making, 277, 282 and flow state, 203, 207 and information overload, 8 and literary fiction, 367 and manager/worker distinction, 176 and multitasking, 96, 98, 307 and procrastination, 197, 198, 200–201 and sleep, 187 and task switching, 171–72 and time organization, 161, 165–66, 174, 180 See also brain physiology preselection effect, 331, 343 Presidential Committee on Information Literacy, 365 primacy effect, 55, 408n56 primates, 17–18, 125–26, 135 Prince, 174 Princeton Theological Seminary, 145–46 prior distributions, 249 prioritization, 5–7, 33–35, 379–80 probability.

The Future of Technology by Tom Standage

air freight, Alan Greenspan, barriers to entry, business process, business process outsourcing, call centre, Clayton Christensen, computer vision, connected car, corporate governance, creative destruction, disintermediation, disruptive innovation, distributed generation, double helix, experimental economics, financial engineering, Ford Model T, full employment, hydrogen economy, hype cycle, industrial robot, informal economy, information asymmetry, information security, interchangeable parts, job satisfaction, labour market flexibility, Larry Ellison, Marc Andreessen, Marc Benioff, market design, Menlo Park, millennium bug, moral hazard, natural language processing, Network effects, new economy, Nicholas Carr, optical character recognition, PalmPilot, railway mania, rent-seeking, RFID, Salesforce, seminal paper, Silicon Valley, Silicon Valley ideology, Silicon Valley startup, six sigma, Skype, smart grid, software as a service, spectrum auction, speech recognition, stem cell, Steve Ballmer, Steve Jurvetson, technological determinism, technology bubble, telemarketer, transcontinental railway, vertical integration, Y2K

The American company had been trying to write software to automate some of this work and reduce its licence-fee payments. Wipro scrapped the software project, hired 110 Indians and still did the work more cheaply. Once work has moved abroad, however, it joins the same cycle of automation and innovation that pushes technology forward everywhere. Optical-character-recognition software is automating the work of Indian data-entry workers. Electronic airline tickets are eliminating some of the ticket-reconciliation work airlines carry out in India. Eventually, natural-language speech recognition is likely to automate some of the call-centre work that is currently going to India, says Steve Rolls of Convergys, the world’s largest call-centre operator.

…

Multiband OFDM Alliance (MBOA) 215–17 Murray, Bill 52, 63 Murthy, Narayana 125, 130, 142 music ix, 95, 99–101, 102, 165–7, 168, 170–1, 172, 202–8, 212–13, 219–29 music industry internet threat 222–9 quality concerns 224–5 music players 204, 207–8, 219–29 see also iPod... hard disks 204, 207–8, 219–20 social issues 220–1 351 THE FUTURE OF TECHNOLOGY MVNOs see mobile virtual network operators Myriad 243 N N-Gage handset 161, 171 nanobots 316 Nanogen 323 nanometre, definition ix Nanometrics 323 Nanosolar 315 NanoSonic 308 Nanosys 321–2, 326 nanotechnology ix–x, 233, 263–4, 306–29 applications 308–15 chemistry 310–11 companies 321–6 computer chips 313–14, 325–6 concepts 306–29 definition 306 developing world 319–20 energy 314–15 fears 316–20, 327–9 funding 308–9 future prospects 309–15, 321–6 “grey goo” 309, 316 patents 321–6, 329 problems 316–20, 327–9 profit expectations 321–6 revenue streams 321–6 safeguards 327–9 toxicity issues 316–17, 319, 328–9 warfare 319 Napster 229 Narayanan, Lakshmi 125, 131–2 NASA 311, 315, 333 National Football League (NFL), America 194 natural-language search software, AI 339–40 NBIC 327 NCR Corporation 210 NEC 171, 203, 311 .NET 86 Netscape 8, 54 Network Associates 68 network computers 102 network effect 91 networks see also internet complexity problems 85–7 concepts 6–7, 13–16, 24–7, 42–8, 85–7, 338 costs 14–15 digital homes ix, 94–7, 147, 200, 202–32 open standards 24–7, 31, 43, 85–7, 115, 152 security issues 42–8, 49–65, 66–9 wireless technology ix, 11, 34–5, 39, 66–7, 93, 95–7, 109–10, 147, 150–3, 167, 168–9, 171–3, 203, 209–13, 334 neural networks 338 new inventions see innovations New York Power Authority 287–8 New Zealand 168, 301 Newcomer, Eric 26 Newell, Alan 336 news media, camera phones 182 Nexia Biotechnologies 263, 269 NFL see National Football League nickel-cadmium batteries 280 nickel-metal-hydride batteries 280 Nilekani, Nandan 131 Nimda virus 45, 50, 55 Nintendo 191–3 Nokia 120, 130, 150, 152–3, 154–61, 164–6, 170–4, 176, 208, 211, 217, 280 Nordan, Matthew 322, 325 Nordhaus, William 136–7 Norman, Donald 78, 82–3, 101–2 Novartis 240 Novell 9, 69 Novozymes 258 NTF messages 87 nucleotides 236, 241–8 Nuovo, Frank 173, 176 NVIDIA 202 O Oblix 68–9 obsolescence issues, built-in obsolescence 8–9, 29 ODMs see original design manufacturers OFDM see orthogonal frequency-division multiplexing on-demand computing 22, 88 see also services... O’Neil, David 73 O’Neil, John 28, 30 OneSaf 197 online banks 37 online shopping viii, 37 open standards 7, 10, 22–7, 31, 38, 43, 85–7, 115, 118–19, 152 operating systems 9, 10, 23–5, 31, 38, 85, 101, 109 operators, mobile phones 157–61, 162–9 Opsware 8, 15 optical-character recognition 121 Oracle 5, 20–2, 33, 38, 39–40, 46, 56, 62, 86, 243 Orange 157–8 organic IT 13–16, 88 original design manufacturers (ODMs), mobile phones 156–7 O’Roarke, Brian 192 O’Roarke, John 96 Orr, Scott 187 orthogonal frequency-division multiplexing (OFDM) 212–13, 215–17 INDEX Otellini, Paul 11, 95 outshored developments, software 38, 115, 138–9 outsourcing viii, 9, 19–20, 22, 38, 68–9, 71, 72, 88–92, 112–46, 158–60 see also globalisation barriers 121–2, 143 concepts 112–46 costs 112–24, 131–5, 140–3 cultural issues 122, 142 Europe 140–6 historical background 119–20, 125–6, 133 India 38, 109, 112–15, 119–22, 125–35, 137–8, 140–6 legal agreements 121–4 mobile phones 155–6, 158–60 opportunities 144–6 protectionists 140–6 reasons 123–4, 143 services 113–30 social outsourcing 143 “overshoot” stage, industries 9, 10–11, 109 overview vii–x, 6–7 Ovi, Alessandro 275–6 Oxford GlycoSciences 243 P Pacific Cycle 140 Page, Larry 9 Pait, Rob 207 Palladium 74, 76 Palm Pilot 150 Palmisano, Samuel 22 Paltrow, Gwyneth 173 Panasonic 156 Papadopoulos, Greg 14, 78–9, 83–4, 91 Papadopoulos, Stelios 237 Parker, Andrew 143 Parks Associates 96, 203 Parr, Doug 319 particulate filters 296–7 passwords 53, 58–61, 67, 96–7 patents, nanotechnology 321–6, 329 Patriot Act, America 35 PCs 9–16, 78–81, 82–110, 151, 171–3, 202–18 see also digital homes; hardware commoditisation issues 9–16, 132–5, 203 complexity issues 78–81, 82–110 screen sizes 100–1 UWB 214–18 Wi-Fi 209–18 PDAs see personal digital assistants Peck, Art 203 PentaSafe Security 60 Pentium chips 199–200 PeopleSoft 39, 86, 119, 126, 132 Perez, Carlota 5–6, 134 performance issues see also processing power; returns cars 291–8 Cell chips 198–200 cost links 29–30 Perlegen 244 personal digital assistants (PDAs) 151, 277, 279 see also handheld computers personal video recorders (PVRs) 203, 205–6 perverse incentives, security issues 61–2 Pescatore, John 55 Pfizer 69, 240, 247, 312, 315 pharmaceutical companies 239–40, 241–50, 312 PHAs 260 Philippines 130 Philips 120, 217 “phishing” 76, 89 phonograph 82, 84 photo-voltaic cells 280 photos ix, 78, 95, 101, 179–83 Physiome 248 Picardi, Tony 79 Pick, Adam 156 Pink Floyd 225 Piper, H. 292 Pittsburgh convention centre 304 Pivotal 187 plasma screens 230–2 plastics 238–9, 259–64 PlayStation 191–2, 199–200, 206–7 plug-and-play devices 78 plug-in hybrid cars 295–6 Poland 120 police involvement, security breaches 72 polio 265 politics 32–5 see also governments Pollard, John 157 pollution 275, 296–7, 299–304, 319 Pop Idol (TV show) 225 Pope, Alexander 267 Porsche 292 “post-technology” period, IT industry vii, 5–7 Powell, Michael 98, 206 power grids 233, 285–90 PowerPoint presentations 4–5, 107 Predictive Networks 337 Presley, Elvis 225 prices, downward trends viii, 4–7 PricewaterhouseCoopers 38 printers 78, 96 privacy issues 27, 34, 42–8, 179–83 see also security... mobile phones 179–83 processing power see also computer chips exponential growth 4–7, 8–14 Proctor, Donald 106 Prodi, Romano 274–5 profits, future prospects 7, 17–18, 37–40 proprietary technology 24, 26, 80, 86 protectionists, outsourcing 140–6 proteins, biotechnology 241–64 protocols, complexity issues 86 Proxim 210 Prozac 315 PSA Peugeot Citroën 293, 296–7 PSP, Sony 191–3 public accounts 44 Pullin, Graham 177–8 PVRs see personal video recorders Q Qualcomm 164 quantum dots 312, 317, 322, 325 R radiation fears, mobile phones 176 radio 34–5, 36, 39, 94–5, 108, 155–61, 164, 209–18, 223 see also wireless... chips 155–61, 164 “garbage bands” 209–10, 215 music industry 223 spectrum 34–5, 94–5, 209–18 UWB 96–7, 214–19 Radjou, Navi 333–4 railway age vii, 5, 7, 23, 36, 39, 134 Raleigh, Greg 211 RAND 195 rationalisation exercises 31 RCA 108–9, 206, 208, 220, 315 real-world skills, gaming comparisons 194–7 RealNetworks 203 rechargeable batteries 280–4 Recourse Technologies 62–3 Reed, Philip 177 regulations 35, 44, 209–10, 326–9 see also legal issues relational databases 101–2 reliability needs viii, 42–8 religion 19 renewable energy 275–6, 286, 289, 300, 310, 315 ReplayTV 205 Research in Motion (RIM) 152–3 resistance problems, employees 31 return on investment (ROI) 30–1 returns 20, 29–31, 329 see also performance issues risk 20, 30, 329 revenue streams biotechnology 237–8, 241–2 gaming 189–90, 191 GM 251–2 mobile phones 151, 154–5, 157, 162–3, 165–6, 174 nanotechnology 321–6 revolutionary ideas vii–viii, 5–7, 13–14, 36–40, 80–4, 107–10, 116, 134, 151–3, 198–200, 236–40, 326–9 RFID radio tags 39, 94–5 Rhapsody 203 Ricardo 296–7 Riley, James, Lieutenant-Colonel 195–7 RIM see Research in Motion ringtones 165–6 RISC chips 200 risk assessments 70–4, 76 attitudes 18 handling methods 71 insurance policies 71–3 management 70–4 mitigation 71–3 outsourced risk 71, 72, 88–92 returns 20, 30, 329 security issues 42–8, 49–69, 70–4 RNA molecules 241–2, 249–50, 265 Robinson, Shane 15–16 robotics x, 233, 316, 332–5 Roco, Mihail 309 Rodgers, T.J. 32 Rofheart, Martin 216–17 Rogers, Richard 300 ROI see return on investment Rolls, Steve 121 Romm, Joseph 298 Roomba 332, 334–5 “root kit” software 51 Rose, John 226 Roslin Institute 256 Roy, Raman 125–8 Russia 115, 130, 140, 142, 145, 319 Ryan, John 312 S S700 mobile phone 171 Saffo, Paul 83–4, 103, 182 Salesforce.com 19, 20, 84, 91–2, 109 Samsung 158–60, 181, 208, 217, 231, 277 Santa Fe Institute 39 SAP 22, 38, 86, 119, 126, 132 satellite television 205 Saudi Arabia 180 scandals 28 scanning tunnelling microscope (STM) 306 SCC see Sustainable Computing Consortium Schadler, Ted 95, 97 Schainker, Robert 285, 289 Scherf, Kurt 96–7 Schmelzer, Robert 91 Schmidt, Eric 9, 35, 36–8 Schmidt, Nathan 66 Schneider National 29–31 Schneier, Bruce 43, 58, 61–2, 65, 70, 73–4 schools, surveillance technology 181 Schwartz, John 46 Schwinn 140, 143 Scott, Tony 43, 68–9 screen sizes 100–1 screws 23–4 Seagate Technology 207 seamless computing 96–7 Sears, Roebuck & Co 36 Securities and Exchange Commission 321 security issues viii, 25–7, 32–5, 42–8, 49–74, 86–7 see also privacy... airport approach 68–9 anti-virus software 50–1, 60, 67–8 biometric systems 60, 64–5, 71, 74 breaches 43–4, 46, 49–52, 62, 72–3 civil liberties 74 concepts 42–74, 86–7 costs 45–6, 50–1, 62, 70–4 employees 58–63, 69 encryption 53–4 firewalls 51–3, 58, 60, 62, 66–8, 71, 86–7 hackers 4, 43, 47, 49, 51–3, 58–63 handheld computers 67–8 honeypot decoys 62–3 human factors 57–63, 69 identity management 69 IDSs 51, 53–4, 62, 87 impact assessments 70–1, 76 insider attacks 62–3 insurance policies 71–3 internet 35, 42–8, 49–57, 61–2, 66, 66–7, 71, 73–6, 179–83 job vacancies 46 joint ventures 67 major threats 35, 42, 43, 47, 49–63, 66–9 management approaches 60–3, 69 Microsoft 54–6, 72, 74, 76 misconceptions 46–8 networks 42–8, 49–65, 66–9 passwords 53, 58–61, 67, 96–7 patches 56–7, 76 perverse incentives 61–2 police involvement 72 risk assessments 70–4, 76 standards 71–3 terrorism 35, 42, 43, 50, 65, 74, 75–6, 265–6 tools 49–63, 86–7 viruses 45, 47, 49–56, 59–60, 67–8, 74, 86, 89 Wi-Fi 66–7, 93 sedimentation factors 8–9, 84 segmentation issues, mobile phones 167–9 self-configuration concepts 88–9 Sellers, William 23 Seminis 254 Sendo 160 Senegal 182 September 11th 2001 terrorist attacks 35, 42, 43, 50, 65, 75 servers 9–16, 37–8, 62–3, 85–7, 132–3, 203 services industry 14, 17–22, 25–7, 31, 36–40, 80, 88–92, 109, 113–35, 203 see also web services outsourcing 113–46 session initiation protocol (SIP) 104–6 sewing machines 82, 84 SG Cowen 237 shapes, mobile phones 170–6 Shapiro, Carl 24 Sharp 156, 231, 326 shelfware phenomenon 20 Shelley, Mary 267, 269 shipping costs 121 sick building syndrome 302 Siebel 86 Siemens 120, 130, 142, 156, 159, 170, 172, 174 SightSpeed 84, 98, 103 SilentRunner 62 Silicon Valley 9, 32–40, 45–6, 54, 69, 79, 96, 98, 101, 103, 152, 313–14, 321 silk 263, 269 Simon, Herbert 336 simplicity needs 78–81, 84, 87, 88–92, 98–110 SIP see session initiation protocol Sircam virus 45, 49 Sirkin, Hal 120, 140 “six sigma” methods 128 SK 169 Skidmore, Owings & Merrill 302 Sky 205 Skype 103–4, 110 Sloan School of Management, MIT 30 Slovakia 120 small screens 100 Smalley, Richard 311 smallpox 265–6 smart power grids 233, 285–90 smartcards 64, 69 smartphones 150–3, 157–61 see also mobile phones SMES devices 289 Smith Barney 37 Smith, George 307–8 Smith, Lamar 75 Smith, Vernon 17 SNP 243–4 SOAP 25–7 social issues mobile phones 177–8, 182–3 music players 220–1 social outsourcing 143 software see also information technology ASPs 19–20, 91–2, 109 bugs 20–1, 54–6 Cell chips 198–200 commoditisation issues 10–16, 25, 132–5, 159, 203 complexity issues 14–15, 78–81, 82–110, 117–22 firewalls 52–3, 58, 86–7 hackers 51–3, 58–63 Java programming language 21–2, 25, 86 management software 13–16, 21–2, 88, 117–18 mobile phones 158–9 natural-language search software 339–40 operating systems 9, 10, 23–5, 31, 38, 85, 101, 109 outsourcing 38, 115, 138–9 patches 56–7, 76 premature releases 20–1 shelfware phenomenon 20 viruses 45, 47, 49–56, 59–60, 67, 74, 89 solar power 275–6, 286, 289, 301–2, 310, 315, 325 Solectron 112–13, 119 solid-state storage media 204, 207, 219 SOMO... project, mobile phones 177–8 Sony 95, 108, 156, 191–3, 198–200, 203, 206–7, 217, 228, 231, 282–4, 332, 334, 338 Sony Ericsson 156, 158, 159–60, 171 Sony/BMG 222–3, 227, 229 Sood, Rahul 38 Sorrent 187 South Africa 309, 319, 334 South Korea 156, 158, 163–5, 167–9, 170–1, 181, 319 soyabean crops 252–4 spam 76, 89, 118 Spar, Debora 32–3 speculation vii speech recognition 102, 121, 336 SPH-V5400 mobile phone 208 Spider-Man 189–90 Spinks, David 60–1, 63 Spitzer, Eliot 223 Sprint 167–8, 180–1 SQL 53 @Stake 54 Standage, Ella 316 standards green buildings 300–4 open standards 7, 10, 22–7, 31, 38, 43, 85–7, 115, 118–19, 152 security issues 71–3 W-CDMA standard 163–4, 168 web services 90–1 Wi-Fi 210–13 Stanford University 82, 137 Star Wars (movie) 186 steam power ix, 5, 134 steel industry 134 steering committees 31 stem cells 268–9 Steven Winter Associates 302 Stewart, Martha 249 STM see scanning tunnelling microscope stop-start hybrid cars 293–4 storage problems, electricity 275–6, 289–90 StorageTek 85 strategy 30 stress-resistance, biotechnology 254 Studio Daniel Libeskind 302 Sturiale, Nick 45 Sun Microsystems 9, 13–15, 21–2, 25, 27, 37–8, 43, 56, 58, 78–9, 83, 85, 87, 91, 102 supercomputers 199–200 Superdome machines 21 supply chains 8, 37–40, 155 surveillance technology 35, 74, 179–83, 309 Sussex University 5, 220, 310 Sustainable Computing Consortium (SCC) 27 Sweden 109 Swiss Army-knife design, mobile phones 171–2 Swiss Re Tower, 30 St Mary Axe 299, 301–2, 304 swivel design, mobile phones 171 Symantec 39, 46, 50, 62–3, 67 Symbian 158 Symbol 210 synthetic materials 258–64, 317 systems analysts 137 T T-Mobile 167–8 Taiwan 156–7, 160 Talwar, Vikram 144 Taylor, Andy 226 Taylor, Carson 287 TCP/IP 25 TCS 132–5, 145–6 Teague, Clayton 314 TechNet 33 techniques, technology 17–18 techno-jewellery design, mobile phones 172–4 technology see also individual technologies concepts vii–x, 4–7, 17–18, 23–7, 32–3, 82–4, 134, 326–9 cultural issues 93–4, 142 geekiness problems 83–4 government links 7, 18, 27, 31–5, 43–8, 123–4, 179–83, 209–10 Luddites 327 surveillance technology 35, 74, 179–83, 309 Tehrani, Rich 105 telecommunications viii, 23, 26, 103–6, 134, 164–5 telegraph 32–3, 108 telephone systems 84, 103–6, 109–10, 212–13, 214 Telia 109 terrorism 35, 42, 43, 50, 65, 74, 75–6, 265–6 Tesco 168 Tetris 12 Texas Energy Centre 287 Texas Instruments 125–6, 217 text-messaging facilities 165, 167 Thelands, Mike 164 therapeutic antibodies 249–50, 256–7 Thiercy, Max 339–40 thin clients 102 third-generation mobile phone networks (3G) 151, 162–9, 212 Thomas, Jim 318 Thomson, Ken 59 Thornley, Tony 164 3G networks see third-generation mobile phone networks TIA see Total Information Awareness TiVo 203, 205–6 Tomb Raider (game/movie) 187–8 Toshiba 156, 198–200, 203 Total Information Awareness (TIA) 35 toxicity issues, nanotechnology 316–17, 319, 328–9 Toyota 291–5, 297, 300–1, 334 toys see also gaming robotics 334 transatlantic cable 36, 39 transistors 4–7, 8–12, 85–7, 109 see also computer chips Transmeta 313 Treat, Brad 84, 98 Tredennick, Nick 10–11 Treo 150, 153 “Trojan horse” software 51–2 True Crime (game) 187 TruSecure 52, 60, 63 TTPCom 155–6 Tuch, Bruce 210 TVs see also video recorders flat-panel displays ix, 94, 147, 202–3, 230–2, 311 hard disks 204–8 screens 202–3, 230–2 set-top boxes 203, 205–6 UWB 214–18 Wi-Fi 212–18 U UBS Warburg 31, 45, 80–1, 89, 170, 174 UDDI 25–7 ultrawideband (UWB) 96–7, 214–19 UMTS see W-CDMA standard “undershoot” stage, industries 9, 109 UNECE 332–4 Ungerman, Jerry 52 Unimate 332–3 United Airlines 27 Universal Music 222–3, 226–7 Unix 9, 25, 85, 108 USB ports 78 usernames 59 USGBC 300–2 utility companies, cyber-terrorism threats 75–6 utility factors 7, 16, 17, 19–22, 42–8 UWB see ultrawideband V V500 mobile phone 157 vaccines 265–6 Vadasz, Les 33 Vail, Tim 290 value added 5–7, 37–40, 133, 138–9 value transistors 11 van Nee, Richard 211 Varian, Hal 24 VC see venture capital Veeco Instruments 324 vendors complexity issues 84–110 consumer needs 94–7 Venter, Craig 262–3, 271 venture capital (VC) 12, 31, 45, 79, 92, 107, 126–7, 238, 308, 321–6 Verdia 254–5, 261 Veritas 39, 85 Vertex 247 vertical integration, mobile phones 156–61 Vertu brand 173–4 Viacom 224 video phone calls 84, 103–6, 164–5, 167–8 video recorders see also TVs DVRs 205–6 handheld video players 206 hard disks 204–8 PVRs 203, 205–6 Wi-Fi 212–13 video searches, Google 11 Video Voyeurism Prevention Act, America 180 video-game consoles see gaming Virgin 95, 160, 167–8 Virgin Mobile 160, 167–8 virtual private networks (VPNs) 54, 68, 86–7 virtual tissue, biotechnology 248 virtualisation concepts 15–16, 88–92 viruses 45, 47, 49–56, 59–60, 67–8, 74, 86, 89 anti-virus software 50–1, 60, 67–8 concepts 49–56, 59–60, 74 costs 50–1 double-clicking dangers 59–60 Vista Research 46, 62, 67 Vodafone 164–5 voice conversations internet 103–6 mobile phones 165–9, 171 voice mail 104–6 voice-over-internet protocol (VOIP) 103–6, 167 Vonage 104, 110 VPNs see virtual private networks W W3C see World Wide Web Consortium W-CDMA standard 163–4, 168 Waksal, Sam 249 Wal-Mart 95, 114–15, 131–2, 140, 224, 228 Walkman 192 warfare AI 338 biotechnology 265–6 gaming comparisons 195–7, 339 nanotechnology 319 Warner Music 222–3, 226–7 Watson, James 236, 247, 271 web services 21–2, 25–7, 31, 80, 88–92, 109, 203 see also internet; services... complexity issues 88–92, 109 standards 90–1 Webster, Mark 211 WECA see Wireless Ethernet Compatibility Alliance Weill, Peter 30 Welland, Mark 318 Western Union 33, 108 Westinghouse Electric 332 wheat 253 white page 99–100 Wi-Fi 34–5, 66–7, 93, 95–7, 153, 203, 209–18 concepts 209–18 forecasts 209, 212–13 historical background 209–13 hotspots 211–12 mobile phones 212 standards 210–13 threats 212–13 UWB 214–18 Wilkerson, John 237 Williams, Robbie 222, 226 Wilsdon, James 318 WiMax 212–13 WiMedia 213 Wimmer, Eckard 265 wind power 275–6, 286, 289–90, 302 Windows 15, 24–5, 55–6, 96, 101, 108, 152, 203 Windows Media Center 203 WinFS 101 Wipro 112, 115, 120–1, 125–9, 131–5, 138, 145–6 Wireless Ethernet Compatibility Alliance (WECA) 211 wireless technology ix, 11, 34–5, 39, 66–7, 93, 95–7, 109–10, 147, 150–3, 167, 168–9, 171–3, 203, 209–13, 334 see also Wi-Fi Bluetooth wireless links 171–2, 173, 214–15, 218 concepts 209–13, 334 historical background 209–13 Wladawsky-Berger, Irving vii, 5, 19, 22, 25, 38–9 Wolfe, Josh 323 Wong, Leonard 195 Wood, Ben 156–7, 160, 174 Woodcock, Steven 338–9 Word 84, 107 work-life balance 80–1, 94 see also employees World Wide Web Consortium (W3C) 25 worm viruses 49–50, 59, 86, 89 Wright, Myles 118 “ws splat” 90–1 WSDL 25–7 X x-ray crystallography 247–8 Xbox 189, 206–7 Xelibri mobile phones 170, 172, 174 Xerox 108–9 XML see extensible markup language XtremeSpectrum 216 Y Y2K crisis 76, 126, 128 Yagan, Sam 229 Yanagi, Soetsu 84 Yurek, Greg 288 Z ZapThink 91

pages: 566 words: 122,184

Code: The Hidden Language of Computer Hardware and Software by Charles Petzold

Bill Gates: Altair 8800, Charles Babbage, Claude Shannon: information theory, computer age, Dennis Ritchie, digital divide, Donald Knuth, Douglas Engelbart, Dynabook, Eratosthenes, Fairchild Semiconductor, Free Software Foundation, Gary Kildall, Grace Hopper, invention of the telegraph, Isaac Newton, Ivan Sutherland, Jacquard loom, James Watt: steam engine, John von Neumann, Joseph-Marie Jacquard, Ken Thompson, Louis Daguerre, millennium bug, Multics, Norbert Wiener, optical character recognition, popular electronics, Richard Feynman, Richard Stallman, Silicon Valley, Steve Jobs, Turing machine, Turing test, Vannevar Bush, Von Neumann architecture

But another way to look at the UPC is as a series of bits. Keep in mind that the whole bar code symbol isn't exactly what the scanning wand "sees" at the checkout counter. The wand doesn't try to interpret the numbers at the bottom, for example, because that would require a more sophisticated computing technique known as optical character recognition, or OCR. Instead, the scanner sees just a thin slice of this whole block. The UPC is as large as it is to give the checkout person something to aim the scanner at. The slice that the scanner sees can be represented like this: This looks almost like Morse code, doesn't it? As the computer scans this information from left to right, it assigns a 1 bit to the first black bar it encounters, a 0 bit to the next white gap.
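The left-to-right scan described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a real scanner's firmware: it assumes the wand reports the widths of alternating dark bars and light gaps measured in modules (the narrowest bar width), turns each module of a dark bar into a 1 bit and each module of a light gap into a 0 bit, then checks the `101` left guard pattern and decodes the 7-bit left-hand (odd-parity) UPC-A digit codes. The function names `runs_to_bits` and `decode_left_half` are hypothetical.

```python
# Left-hand (odd-parity) UPC-A digit patterns, 7 modules each.
LEFT_DIGITS = {
    "0001101": 0, "0011001": 1, "0010011": 2, "0111101": 3, "0100011": 4,
    "0110001": 5, "0101111": 6, "0111011": 7, "0110111": 8, "0001011": 9,
}


def runs_to_bits(widths, start_dark=True):
    """Expand alternating bar/gap widths (in modules) into a bit string.

    Every module of a dark bar becomes "1"; every module of a light gap becomes "0".
    """
    bits = []
    dark = start_dark
    for width in widths:
        bits.append(("1" if dark else "0") * width)
        dark = not dark  # bars and gaps strictly alternate
    return "".join(bits)


def decode_left_half(bits):
    """Check the 101 left guard, then decode consecutive 7-bit digit codes."""
    if not bits.startswith("101"):
        raise ValueError("missing left guard pattern 101")
    digits = []
    for i in range(3, len(bits), 7):
        code = bits[i:i + 7]
        if code not in LEFT_DIGITS:
            raise ValueError(f"unrecognized digit pattern {code!r}")
        digits.append(LEFT_DIGITS[code])
    return digits


# Guard (1,1,1), then the digit 0 (3,2,1,1) and the digit 1 (2,2,2,1).
widths = [1, 1, 1, 3, 2, 1, 1, 2, 2, 2, 1]
bits = runs_to_bits(widths)
print(bits)                    # 10100011010011001
print(decode_left_half(bits))  # [0, 1]
```

Each left-hand code contains an odd number of 1 bits, which lets a scanner tell a left-half digit from a right-half one and detect some misreads; the right-half codes are the bitwise complements of these patterns.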

…

Because video display memory and bitmaps are conceptually identical, if a program knows how to draw a metafile in video display memory, it knows how to draw a metafile on a bitmap. But converting a bitmap to a metafile isn't so easy, and for some complex images might well be impossible. One technique related to this job is optical character recognition, or OCR. OCR is used when you have a bitmap of some text (from a fax machine, perhaps, or scanned from typed pages) and need to convert it to ASCII character codes. The OCR software needs to analyze the patterns of bits and determine what characters they represent. Due to the algorithmic complexity of this job, OCR software is usually not 100 percent accurate.
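One very simple way for software to "analyze the patterns of bits" is template matching: compare the bitmap of an unknown character with a stored bitmap for each known character and choose the closest. The sketch below is only an illustration of that idea, not how production OCR engines work; the 3-by-5 glyph templates and the functions `hamming` and `recognize` are made up for the example. It returns the ASCII code of the best match along with how many pixels disagreed.

```python
# Hypothetical 3-pixel-wide, 5-pixel-tall templates ("1" = ink, "0" = paper).
TEMPLATES = {
    "I": ("010", "010", "010", "010", "010"),
    "L": ("100", "100", "100", "100", "111"),
    "T": ("111", "010", "010", "010", "010"),
}


def hamming(a, b):
    """Count pixels that differ between two equally sized bitmaps."""
    return sum(pa != pb for row_a, row_b in zip(a, b) for pa, pb in zip(row_a, row_b))


def recognize(bitmap):
    """Return (ASCII code of the closest template, number of mismatched pixels)."""
    best_char = min(TEMPLATES, key=lambda ch: hamming(TEMPLATES[ch], bitmap))
    return ord(best_char), hamming(TEMPLATES[best_char], bitmap)


# A "T" scanned with one stray ink pixel in the fourth row.
noisy_t = ("111", "010", "010", "011", "010")
code, distance = recognize(noisy_t)
print(code, chr(code), distance)  # 84 T 1
```

Because the program always picks the nearest template, enough noise (a smudge, a faint scan) can push a bitmap closer to the wrong character, which is one reason OCR output is rarely 100 percent accurate.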

pages: 382 words: 120,064

Bank 3.0: Why Banking Is No Longer Somewhere You Go but Something You Do by Brett King

3D printing, Abraham Maslow, additive manufacturing, Airbus A320, Albert Einstein, Amazon Web Services, Any sufficiently advanced technology is indistinguishable from magic, Apollo 11, Apollo 13, Apollo Guidance Computer, asset-backed security, augmented reality, barriers to entry, behavioural economics, bitcoin, bounce rate, business intelligence, business process, business process outsourcing, call centre, capital controls, citizen journalism, Clayton Christensen, cloud computing, credit crunch, crowdsourcing, disintermediation, en.wikipedia.org, fixed income, George Gilder, Google Glasses, high net worth, I think there is a world market for maybe five computers, Infrastructure as a Service, invention of the printing press, Jeff Bezos, jimmy wales, Kickstarter, London Interbank Offered Rate, low interest rates, M-Pesa, Mark Zuckerberg, mass affluent, Metcalfe’s law, microcredit, mobile money, more computing power than Apollo, Northern Rock, Occupy movement, operational security, optical character recognition, peer-to-peer, performance metric, Pingit, platform as a service, QR code, QWERTY keyboard, Ray Kurzweil, recommendation engine, RFID, risk tolerance, Robert Metcalfe, self-driving car, Skype, speech recognition, stem cell, telepresence, the long tail, Tim Cook: Apple, transaction costs, underbanked, US Airways Flight 1549, web application, world market for maybe five computers

Augmenting our environment with the application of smart data will be an intriguing and highly profitable business over the next decade.

Augmented reality

Something that is a little bit out there, but interesting to think about, is the emerging technology around image recognition and data overlays in the real world. We’ve had OCR or Optical Character Recognition for many years now, but there have been recent improvements in image processing and matching. Recently Google has developed search engine technology called “Google Goggles” that allows users to search based on images taken by their camera phones. It is currently in beta with some reasonable search support for books, DVDs, landmarks, logos, contact info, artwork, businesses, products, barcodes, and text.

…

Moore’s Law: Named after Gordon Moore, this law basically states that the number of transistors on a chip doubles every 24 months.
NFC: Near Field Communication—a short-range high-frequency wireless communication technology which enables the exchange of data between devices over about a 10-centimetre distance.
OCR: Optical Character Recognition.
OpEx: Operating Expense.
OLED: Organic Light-Emitting Diode (also Organic Electro-luminescent Device OELD)—an LED whose electroluminescent layer is composed of a film of organic compounds.
OTC: Over the Counter—refers to physical transactions or trades done on behalf of a customer by a trader or customer representative who has access to a specific closed financial system or network.

pages: 170 words: 51,205

Information Doesn't Want to Be Free: Laws for the Internet Age by Cory Doctorow, Amanda Palmer, Neil Gaiman

Airbnb, barriers to entry, Big Tech, Brewster Kahle, cloud computing, Dean Kamen, Edward Snowden, game design, general purpose technology, Internet Archive, John von Neumann, Kickstarter, Large Hadron Collider, machine readable, MITM: man-in-the-middle, optical character recognition, plutocrats, pre–internet, profit maximization, recommendation engine, rent-seeking, Saturday Night Live, Skype, Steve Jobs, Steve Wozniak, Stewart Brand, Streisand effect, technological determinism, transfer pricing, Whole Earth Catalog, winner-take-all economy

Pair it up with your e-book-reading app (Amazon’s Kindle app, say), click the button that takes you to the first page, and then click the button that captures and saves the rectangle of screen where the page is. Do this once for every page in the book—call it one page per second—and you’ll end up with a folder full of pages. Now upload those pages to Google’s free optical character-recognition software (which converts pictures of words back into plain text), download the results, and call it a day. There are analogs to these processes for practically all locked media. You can play locked audio out the headphone jack of one device and into the mic jack of another, recapturing the audio.

pages: 188 words: 9,226

Collaborative Futures by Mike Linksvayer, Michael Mandiberg, Mushon Zer-Aviv

4chan, AGPL, Benjamin Mako Hill, British Empire, citizen journalism, cloud computing, collaborative economy, corporate governance, crowdsourcing, Debian, Eben Moglen, en.wikipedia.org, fake news, Firefox, informal economy, jimmy wales, Kickstarter, late capitalism, lolcat, loose coupling, Marshall McLuhan, means of production, Naomi Klein, Network effects, optical character recognition, packet switching, planned obsolescence, postnationalism / post nation state, prediction markets, Richard Stallman, semantic web, Silicon Valley, slashdot, Slavoj Žižek, stealth mode startup, technoutopianism, The future is already here, the medium is the message, The Wisdom of Crowds, web application, WikiLeaks, Yochai Benkler

The most evident example is Google, whose PageRank algorithm uses a survey of links between sites to classify their relevance to a user’s query. Likewise ReCaptcha uses a commonplace authentication in a two-part implementation, firstly to exclude automated spam, and then to digitize words from books that were not recognizable by optical character recognition. Contributions are extracted from participants unconscious of the recycling of their activity into the finessing of the value-chain. Web site operators who integrate ReCaptcha, however, know precisely what they're doing, and choose to transform a necessary defense mechanism for their site into a productive channel of contributions to what they regard as a useful task. (2) Aggregation services such as delicious and photographic archives like flickr, ordered by tags and geographic information, leverage users’ self-interests in categorizing their own materials to enhance usability.

pages: 528 words: 146,459

Computer: A History of the Information Machine by Martin Campbell-Kelly, William Aspray, Nathan L. Ensmenger, Jeffrey R. Yost

Ada Lovelace, air freight, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, Apple's 1984 Super Bowl advert, barriers to entry, Bill Gates: Altair 8800, Bletchley Park, borderless world, Buckminster Fuller, Build a better mousetrap, Byte Shop, card file, cashless society, Charles Babbage, cloud computing, combinatorial explosion, Compatible Time-Sharing System, computer age, Computer Lib, deskilling, don't be evil, Donald Davies, Douglas Engelbart, Dynabook, Edward Jenner, Evgeny Morozov, Fairchild Semiconductor, fault tolerance, Fellow of the Royal Society, financial independence, Frederick Winslow Taylor, game design, garden city movement, Gary Kildall, Grace Hopper, Herman Kahn, hockey-stick growth, Ian Bogost, industrial research laboratory, informal economy, interchangeable parts, invention of the wheel, Ivan Sutherland, Jacquard loom, Jeff Bezos, jimmy wales, John Markoff, John Perry Barlow, John von Neumann, Ken Thompson, Kickstarter, light touch regulation, linked data, machine readable, Marc Andreessen, Mark Zuckerberg, Marshall McLuhan, Menlo Park, Mitch Kapor, Multics, natural language processing, Network effects, New Journalism, Norbert Wiener, Occupy movement, optical character recognition, packet switching, PageRank, PalmPilot, pattern recognition, Pierre-Simon Laplace, pirate software, popular electronics, prediction markets, pre–internet, QWERTY keyboard, RAND corporation, Robert X Cringely, Salesforce, scientific management, Silicon Valley, Silicon Valley startup, Steve Jobs, Steven Levy, Stewart Brand, Ted Nelson, the market place, Turing machine, Twitter Arab Spring, Vannevar Bush, vertical integration, Von Neumann architecture, Whole Earth Catalog, William Shockley: the traitorous eight, women in the workforce, young professional

Applying the code to products had to be inexpensive in order not to disadvantage small manufacturers; and the expense of product coding could not add significantly to the cost of goods, as this would disadvantage retailers who were unable to participate in the system because they would be forced to bear the increased cost of the system without deriving any benefit. The checkout equipment also had to be relatively inexpensive because millions of barcode readers would eventually be needed. These conditions ruled out the use of the expensive magnetic ink and optical character-recognition systems then in use by banks and the Federal Reserve. Various experimental systems were tested at a cost of several million dollars. By the end of 1971 there was a much better awareness of the complex trade-offs that had to be made—for example, between printing costs and scanner costs.

…

See Business machine industry Office of Naval Research (ONR), 147–148, 150 Office of Scientific Research and Development (OSRD), 49, 65–66, 74 Office systematizers, 19, 134 Olivetti, 197, 251 Olsen, Kenneth, 217–218 Omidyar, Pierre, 295 “On Computable Numbers with an Application to the Entscheidungsproblem” (Turing), 60 Opel, John, 246 Open-source software, 215, 288, 296 Operating systems for mainframe computers, 179–182, 205, 206, 210, 212–215 for mobile devices, 297, 298 for personal computers, 242–243, 246–247, 253–254, 257–258, 264–267 See also specific operating systems Optical character recognition, 164 OS/2 operating system, 265, 266 OS/360 operating system, 179–182, 183, 212 Osborne 1 computer, 198 (photo), 296 Outsourcing of components and software, 245–246, 247 Oxford English Dictionary, 3 Packaged software programs, 186–188, 254 Packard, David, 249 Packet-switching technology, 281–282 Page, Larry, 294 Palm, Inc., 297, 298 Palo Alto Research Center (PARC), 260, 261, 280, 296 Papian, Bill, 150 Parker, Sean, 301 Pascal programming language, 185 Passages from the Life of a Philosopher (C.

The Art of SEO by Eric Enge, Stephan Spencer, Jessie Stricchiola, Rand Fishkin

AltaVista, barriers to entry, bounce rate, Build a better mousetrap, business intelligence, cloud computing, content marketing, dark matter, en.wikipedia.org, Firefox, folksonomy, Google Chrome, Google Earth, hypertext link, index card, information retrieval, Internet Archive, Larry Ellison, Law of Accelerating Returns, linked data, mass immigration, Metcalfe’s law, Network effects, optical character recognition, PageRank, performance metric, Quicken Loans, risk tolerance, search engine result page, self-driving car, sentiment analysis, social bookmarking, social web, sorting algorithm, speech recognition, Steven Levy, text mining, the long tail, vertical integration, Wayback Machine, web application, wikimedia commons

They can only recognize some very basic types of information within images, such as the presence of a face, or whether images may have pornographic content (by how much flesh tone there is in the image). A search engine cannot tell whether an image is a picture of Bart Simpson, a boat, a house, or a tornado. In addition, search engines will not recognize any text rendered in the image. The search engines are experimenting with technologies to use optical character recognition (OCR) to extract text from images, but this technology is not yet in general use within search. In addition, conventional SEO wisdom has always held that the search engines cannot read Flash files, but this is a little overstated. Search engines have been extracting some information from Flash for years, as indicated by this Google announcement in 2008: http://googlewebmastercentral.blogspot.com/2008/06/improved-flash-indexing.html.

…

Google is known to be experimenting with artificial intelligence algorithms to improve detection of image content. For example, you can drag the image of a well-known person or place onto the Google Image search box (http://images.google.com), and Google will attempt to identify the contents and show you other similar images. Search engines are also experimenting with techniques such as optical character recognition (OCR) to read text content within images, but most images don’t have any text to read. Search engines also make use of facial recognition software to be able to determine when an image is of a face versus a body, or something else entirely. However, although these types of technologies are very useful, they are limited in terms of what they can do.

…

, When to Show Different Content to Engines and Visitors, SWFObject and NoScript tags, NoScript, Spammy giveaways misuse by spammers, SWFObject and NoScript tags using with Flash .swf files, NoScript widget giveaways, links embedded in, Spammy giveaways number range operator, Using number ranges O office productivity software, user metrics from, How Google and Bing Collect Engagement Metrics offline relationship building, Offline Relationship Building offline sales, influence of search on, How Search Engines Drive Commerce on the Web OKCupid, Local Search ontology of a website, Taxonomy and ontology Open Site Explorer, Two Spam Examples, Find out where your competitors get links, Open Site Explorer, Open Site Explorer mapping links across the Web, Open Site Explorer operating systems, mobile phones, How Google and Bing Collect Engagement Metrics opportunities, Navigational Queries, Informational Queries, Transactional Queries, Combining Business Assets and Historical Data to Conduct SEO/Website SWOT Analysis identifying in SWOT analysis, Combining Business Assets and Historical Data to Conduct SEO/Website SWOT Analysis in informational queries, Informational Queries in navigational queries, Navigational Queries in transactional queries, Transactional Queries optical character recognition (OCR), use by search engines, What search engines cannot see OR search operator, Using Advanced Search Techniques Organic Search Insight (OSI), Covario, SEO Platforms, Covario Organic Search Insight (OSI) Organic Search Optimizer (OSO), Covario, Covario Organic Search Optimizer (OSO) organic search results, Glossary originurlextension: operator, Advanced doc type searches outbound links, Glossary outsourced SEO, The Dynamics and Challenges of Using In-House Talent Versus Outsourcing, The Value of Outsourced Solutions, Outsourced Agency/Consultant/Contractor, Contracting for Specialist Knowledge and Experience, The Case for Working with an Outside Expert, How to Best Leverage 
Outside Help, How to Best Leverage Outside Help, Selecting an SEO Firm/Consultant, Making the Decision, Getting the Process Started, Preparing a Request for Proposal (RFP), A sample RFP document outline, Communicating with Candidate SEO Firms, Making the Decision, Mixing Outsourced SEO with In-House SEO Teams case for working with outside expert, The Case for Working with an Outside Expert, How to Best Leverage Outside Help, How to Best Leverage Outside Help how to best leverage outside help, How to Best Leverage Outside Help contracting for specialist for large organization, Contracting for Specialist Knowledge and Experience mixing outsourced SEO with in-house teams, Mixing Outsourced SEO with In-House SEO Teams selecting an SEO firm/consultant, Selecting an SEO Firm/Consultant, Making the Decision, Getting the Process Started, Preparing a Request for Proposal (RFP), A sample RFP document outline, Communicating with Candidate SEO Firms, Making the Decision communicating with candidate firms, Communicating with Candidate SEO Firms making the decision, Making the Decision preparing request for proposal (RFP), Preparing a Request for Proposal (RFP), A sample RFP document outline starting the process, Getting the Process Started for small organizations, Outsourced Agency/Consultant/Contractor value of, The Value of Outsourced Solutions P page and content creation/optimization, SEO for Raw Traffic, SEO for Ecommerce Sales, SEO for Mindshare/Branding, SEO for Lead Generation and Direct Marketing, SEO for Reputation Management, SEO for Ideological Influence SEO for ecommerce sales, SEO for Ecommerce Sales SEO for ideological influence, SEO for Ideological Influence SEO for lead generation and direct marketing, SEO for Lead Generation and Direct Marketing SEO for mindshare or branding, SEO for Mindshare/Branding SEO for raw traffic, SEO for Raw Traffic SEO for reputation management, SEO for Reputation Management page level keyword agnostic features ranking factor, 
Analyzing Ranking Factors page level keyword usage ranking factor, Analyzing Ranking Factors page level link metrics ranking factor, Analyzing Ranking Factors page level social metrics ranking factor, Analyzing Ranking Factors page level traffic/query data ranking factor, Analyzing Ranking Factors page load time, Glossary Page Not Found (404) error, Redirects page speed (load time) as ranking factor, Page Speed page views per visitor, Measuring Content Quality and User Engagement Page, Larry, The Original PageRank Algorithm pagejacking, Glossary PageRank, The Original PageRank Algorithm, The Original PageRank Algorithm, Other Search Engine Courses of Action, Measuring the value of a link, Measuring the value of a link, What is crawl efficiency and why is it important?

The Orbital Perspective: Lessons in Seeing the Big Picture From a Journey of 71 Million Miles by Astronaut Ron Garan, Muhammad Yunus

Airbnb, Apollo 13, barriers to entry, book scanning, Buckminster Fuller, carbon credits, clean water, corporate social responsibility, crowdsourcing, fake it until you make it, global village, Google Earth, Indoor air pollution, jimmy wales, low earth orbit, optical character recognition, overview effect, private spaceflight, ride hailing / ride sharing, shareholder value, Silicon Valley, Skype, smart transportation, Stephen Hawking, transaction costs, Turing test, Uber for X, web of trust

ReCAPTCHA is an offshoot of this project, stemming from the realization that humans type about two hundred million CAPTCHAs into Internet pages every day—totaling more than 500,000 hours, if typing a single CAPTCHA takes ten seconds. But in typing that CAPTCHA, the human brain is doing something a machine can’t. And that human capability is now being used to help digitize books. Digitizing old books usually involves scanning the books and then converting the images into text using optical character recognition algorithms. Unfortunately, computers can’t recognize all of the words, and generally the older the book is, the harder it is for computers to decipher the words. ReCAPTCHA takes the images computers can’t recognize and uses those words as CAPTCHAs, using humans to identify the words the computer couldn’t.
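The hour figure in the passage follows from simple arithmetic. The short sketch below just reproduces it from the two numbers the passage quotes (about 200 million CAPTCHAs a day, ten seconds each); nothing else is assumed.

```python
captchas_per_day = 200_000_000  # figure quoted in the passage
seconds_per_captcha = 10        # the passage's assumed typing time

total_seconds = captchas_per_day * seconds_per_captcha  # 2,000,000,000 s
hours_per_day = total_seconds / 3600                    # 3,600 seconds per hour
print(round(hours_per_day))  # 555556, i.e. "more than 500,000 hours"
```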

pages: 144 words: 55,142

Interlibrary Loan Practices Handbook by Cherie L. Weible, Karen L. Janke

Firefox, information retrieval, Internet Archive, late fees, machine readable, Multics, optical character recognition, pull request, QR code, transaction costs, Wayback Machine, Works Progress Administration

In June 2002, Texas A&M significantly expanded its services to digitize or reformat its print collections on demand by implementing free electronic document delivery of articles for a campus of 48,000 students. Increasingly, ILL personnel will change ILL services and systems to include more document delivery, digital library production, digitization on demand that supports print-on-demand or reprinting services, and other services such as Optical Character Recognition (OCR). The need for format conversion parallels users’ expectations that the content we deliver take the form of their preferred technology and intended use.

Change factors for tomorrow’s hybrid services and context-sensitive workflow

Today’s workflow must evolve new service models because of several key factors in our environment:

• Increased full-text sources will reduce the need for scanning of print.

pages: 1,331 words: 163,200

Hands-On Machine Learning With Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems by Aurélien Géron

AlphaGo, Amazon Mechanical Turk, Anton Chekhov, backpropagation, combinatorial explosion, computer vision, constrained optimization, correlation coefficient, crowdsourcing, data science, deep learning, DeepMind, don't repeat yourself, duck typing, Elon Musk, en.wikipedia.org, friendly AI, Geoffrey Hinton, ImageNet competition, information retrieval, iterative process, John von Neumann, Kickstarter, machine translation, natural language processing, Netflix Prize, NP-complete, OpenAI, optical character recognition, P = NP, p-value, pattern recognition, pull request, recommendation engine, self-driving car, sentiment analysis, SpamAssassin, speech recognition, stochastic process

The Machine Learning Landscape

When most people hear “Machine Learning,” they picture a robot: a dependable butler or a deadly Terminator depending on who you ask. But Machine Learning is not just a futuristic fantasy; it’s already here. In fact, it has been around for decades in some specialized applications, such as Optical Character Recognition (OCR). But the first ML application that really became mainstream, improving the lives of hundreds of millions of people, took over the world back in the 1990s: it was the spam filter. Not exactly a self-aware Skynet, but it does technically qualify as Machine Learning (it has actually learned so well that you seldom need to flag an email as spam anymore).

…

Pac-Man Using Deep Q-Learning min_after_dequeue, RandomShuffleQueue MNIST dataset, MNIST-MNIST model parallelism, Model Parallelism-Model Parallelism model parameters, Gradient Descent, Batch Gradient Descent, Early Stopping, Under the Hood, Quadratic Programming, Creating Your First Graph and Running It in a Session, Construction Phase, Training RNNs defining, Model-based learning model selection, Model-based learning model zoos, Model Zoos model-based learning, Model-based learning-Model-based learning models analyzing, Analyze the Best Models and Their Errors-Analyze the Best Models and Their Errors evaluating on test set, Evaluate Your System on the Test Set-Evaluate Your System on the Test Set moments, Adam Optimization Momentum optimization, Momentum optimization-Momentum optimization Monte Carlo tree search, Policy Gradients Multi-Layer Perceptrons (MLP), Introduction to Artificial Neural Networks, The Perceptron-Multi-Layer Perceptron and Backpropagation, Neural Network Policies training with TF.Learn, Training an MLP with TensorFlow’s High-Level API multiclass classifiers, Multiclass Classification-Multiclass Classification Multidimensional Scaling (MDS), Other Dimensionality Reduction Techniques multilabel classifiers, Multilabel Classification-Multilabel Classification Multinomial Logistic Regression (see Softmax Regression) multinomial(), Neural Network Policies multioutput classifiers, Multioutput Classification-Multioutput Classification MultiRNNCell, Distributing a Deep RNN Across Multiple GPUs multithreaded readers, Multithreaded readers using a Coordinator and a QueueRunner-Multithreaded readers using a Coordinator and a QueueRunner multivariate regression, Frame the Problem N naive Bayes classifiers, Multiclass Classification name scopes, Name Scopes natural language processing (NLP), Recurrent Neural Networks, Natural Language Processing-An Encoder–Decoder Network for Machine Translation encoder-decoder network for machine translation, An
Encoder–Decoder Network for Machine Translation-An Encoder–Decoder Network for Machine Translation TensorFlow tutorials, Natural Language Processing, An Encoder–Decoder Network for Machine Translation word embeddings, Word Embeddings-Word Embeddings Nesterov Accelerated Gradient (NAG), Nesterov Accelerated Gradient-Nesterov Accelerated Gradient Nesterov momentum optimization, Nesterov Accelerated Gradient-Nesterov Accelerated Gradient network topology, Fine-Tuning Neural Network Hyperparameters neural network hyperparameters, Fine-Tuning Neural Network Hyperparameters-Activation Functions activation functions, Activation Functions neurons per hidden layer, Number of Neurons per Hidden Layer number of hidden layers, Number of Hidden Layers-Number of Hidden Layers neural network policies, Neural Network Policies-Neural Network Policies neurons biological, From Biological to Artificial Neurons-Biological Neurons logical computations with, Logical Computations with Neurons neuron_layer(), Construction Phase next_batch(), Execution Phase No Free Lunch theorem, Testing and Validating node edges, Visualizing the Graph and Training Curves Using TensorBoard nonlinear dimensionality reduction (NLDR), LLE (see also Kernel PCA; LLE (Locally Linear Embedding)) nonlinear SVM classification, Nonlinear SVM Classification-Computational Complexity computational complexity, Computational Complexity Gaussian RBF kernel, Gaussian RBF Kernel-Gaussian RBF Kernel with polynomial features, Nonlinear SVM Classification-Polynomial Kernel polynomial kernel, Polynomial Kernel-Polynomial Kernel similarity features, adding, Adding Similarity Features-Adding Similarity Features nonparametric models, Regularization Hyperparameters nonresponse bias, Nonrepresentative Training Data nonsaturating activation functions, Nonsaturating Activation Functions-Nonsaturating Activation Functions normal distribution (see Gaussian distribution) Normal Equation, The Normal Equation-Computational Complexity
normalization, Feature Scaling normalized exponential, Softmax Regression norms, Select a Performance Measure notations, Select a Performance Measure-Select a Performance Measure NP-Complete problems, The CART Training Algorithm null hypothesis, Regularization Hyperparameters numerical differentiation, Numerical Differentiation NumPy, Create the Workspace NumPy arrays, Handling Text and Categorical Attributes NVidia Compute Capability, Installation nvidia-smi, Managing the GPU RAM n_components, Choosing the Right Number of Dimensions O observation space, Neural Network Policies off-policy algorithm, Temporal Difference Learning and Q-Learning offline learning, Batch learning one-hot encoding, Handling Text and Categorical Attributes one-versus-all (OvA) strategy, Multiclass Classification, Softmax Regression, Exercises one-versus-one (OvO) strategy, Multiclass Classification online learning, Online learning-Online learning online SVMs, Online SVMs-Online SVMs OpenAI Gym, Introduction to OpenAI Gym-Introduction to OpenAI Gym operation_timeout_in_ms, In-Graph Versus Between-Graph Replication Optical Character Recognition (OCR), The Machine Learning Landscape optimal state value, Markov Decision Processes optimizers, Faster Optimizers-Learning Rate Scheduling AdaGrad, AdaGrad-AdaGrad Adam optimization, Faster Optimizers, Adam Optimization-Adam Optimization Gradient Descent (see Gradient Descent optimizer) learning rate scheduling, Learning Rate Scheduling-Learning Rate Scheduling Momentum optimization, Momentum optimization-Momentum optimization Nesterov Accelerated Gradient (NAG), Nesterov Accelerated Gradient-Nesterov Accelerated Gradient RMSProp, RMSProp out-of-bag evaluation, Out-of-Bag Evaluation-Out-of-Bag Evaluation out-of-core learning, Online learning out-of-memory (OOM) errors, Static Unrolling Through Time out-of-sample error, Testing and Validating OutOfRangeError, Reading the training data directly from the graph, Multithreaded readers using a Coordinator
and a QueueRunner output gate, LSTM Cell output layer, Multi-Layer Perceptron and Backpropagation OutputProjectionWrapper, Training to Predict Time Series-Training to Predict Time Series output_put_keep_prob, Applying Dropout overcomplete autoencoder, Unsupervised Pretraining Using Stacked Autoencoders overfitting, Overfitting the Training Data-Overfitting the Training Data, Create a Test Set, Soft Margin Classification, Gaussian RBF Kernel, Regularization Hyperparameters, Regression, Number of Neurons per Hidden Layer avoiding through regularization, Avoiding Overfitting Through Regularization-Data Augmentation P p-value, Regularization Hyperparameters PaddingFIFOQueue, PaddingFifoQueue Pandas, Create the Workspace, Download the Data scatter_matrix, Looking for Correlations-Looking for Correlations parallel distributed computing, Distributing TensorFlow Across Devices and Servers-Exercises data parallelism, Data Parallelism-TensorFlow implementation in-graph versus between-graph replication, In-Graph Versus Between-Graph Replication-Model Parallelism model parallelism, Model Parallelism-Model Parallelism multiple devices across multiple servers, Multiple Devices Across Multiple Servers-Other convenience functions asynchronous communication using queues, Asynchronous Communication Using TensorFlow Queues-PaddingFifoQueue loading training data, Loading Data Directly from the Graph-Other convenience functions master and worker services, The Master and Worker Services opening a session, Opening a Session pinning operations across tasks, Pinning Operations Across Tasks sharding variables, Sharding Variables Across Multiple Parameter Servers sharing state across sessions, Sharing State Across Sessions Using Resource Containers-Sharing State Across Sessions Using Resource Containers multiple devices on a single machine, Multiple Devices on a Single Machine-Control Dependencies control dependencies, Control Dependencies installation, Installation-Installation managing the GPU
RAM, Managing the GPU RAM-Managing the GPU RAM parallel execution, Parallel Execution-Parallel Execution placing operations on devices, Placing Operations on Devices-Soft placement one neural network per device, One Neural Network per Device-One Neural Network per Device parameter efficiency, Number of Hidden Layers parameter matrix, Softmax Regression parameter server (ps), Multiple Devices Across Multiple Servers parameter space, Gradient Descent parameter vector, Linear Regression, Gradient Descent, Training and Cost Function, Softmax Regression parametric models, Regularization Hyperparameters partial derivative, Batch Gradient Descent partial_fit(), Incremental PCA Pearson's r, Looking for Correlations peephole connections, Peephole Connections penalties (see rewards, in RL) percentiles, Take a Quick Look at the Data Structure Perceptron convergence theorem, The Perceptron Perceptrons, The Perceptron-Multi-Layer Perceptron and Backpropagation versus Logistic Regression, The Perceptron training, The Perceptron-The Perceptron performance measures, Select a Performance Measure-Select a Performance Measure confusion matrix, Confusion Matrix-Confusion Matrix cross-validation, Measuring Accuracy Using Cross-Validation-Measuring Accuracy Using Cross-Validation precision and recall, Precision and Recall-Precision/Recall Tradeoff ROC (receiver operating characteristic) curve, The ROC Curve-The ROC Curve performance scheduling, Learning Rate Scheduling permutation(), Create a Test Set PG algorithms, Policy Gradients photo-hosting services, Semisupervised learning pinning operations, Pinning Operations Across Tasks pip, Create the Workspace Pipeline constructor, Transformation Pipelines-Select and Train a Model pipelines, Frame the Problem placeholder nodes, Feeding Data to the Training Algorithm placers (see simple placer; dynamic placer) policy, Policy Search policy gradients, Policy Search (see PG algorithms) policy space, Policy Search polynomial features, adding,
Nonlinear SVM Classification-Polynomial Kernel polynomial kernel, Polynomial Kernel-Polynomial Kernel, Kernelized SVM Polynomial Regression, Training Models, Polynomial Regression-Polynomial Regression learning curves in, Learning Curves-Learning Curves pooling kernel, Pooling Layer pooling layer, Pooling Layer-Pooling Layer power scheduling, Learning Rate Scheduling precision, Confusion Matrix precision and recall, Precision and Recall-Precision/Recall Tradeoff F-1 score, Precision and Recall-Precision and Recall precision/recall (PR) curve, The ROC Curve precision/recall tradeoff, Precision/Recall Tradeoff-Precision/Recall Tradeoff predetermined piecewise constant learning rate, Learning Rate Scheduling predict(), Data Cleaning predicted class, Confusion Matrix predictions, Confusion Matrix-Confusion Matrix, Decision Function and Predictions-Decision Function and Predictions, Making Predictions-Estimating Class Probabilities predictors, Supervised learning, Data Cleaning preloading training data, Preload the data into a variable PReLU (parametric leaky ReLU), Nonsaturating Activation Functions preprocessed attributes, Take a Quick Look at the Data Structure pretrained layers reuse, Reusing Pretrained Layers-Pretraining on an Auxiliary Task auxiliary task, Pretraining on an Auxiliary Task-Pretraining on an Auxiliary Task caching frozen layers, Caching the Frozen Layers freezing lower layers, Freezing the Lower Layers model zoos, Model Zoos other frameworks, Reusing Models from Other Frameworks TensorFlow model, Reusing a TensorFlow Model-Reusing a TensorFlow Model unsupervised pretraining, Unsupervised Pretraining-Unsupervised Pretraining upper layers, Tweaking, Dropping, or Replacing the Upper Layers Pretty Tensor, Up and Running with TensorFlow primal problem, The Dual Problem principal component, Principal Components Principal Component Analysis (PCA), PCA-Randomized PCA explained variance ratios, Explained Variance Ratio finding principal components, Principal
Components-Principal Components for compression, PCA for Compression-Incremental PCA Incremental PCA, Incremental PCA-Randomized PCA Kernel PCA (kPCA), Kernel PCA-Selecting a Kernel and Tuning Hyperparameters projecting down to d dimensions, Projecting Down to d Dimensions Randomized PCA, Randomized PCA Scikit Learn for, Using Scikit-Learn variance, preserving, Preserving the Variance-Preserving the Variance probabilistic autoencoders, Variational Autoencoders probabilities, estimating, Estimating Probabilities-Estimating Probabilities, Estimating Class Probabilities producer functions, Other convenience functions projection, Projection-Projection propositional logic, From Biological to Artificial Neurons pruning, Regularization Hyperparameters, Symbolic Differentiation Pythonisolated environment in, Create the Workspace-Create the Workspace notebooks in, Create the Workspace-Download the Data pickle, Better Evaluation Using Cross-Validation pip, Create the Workspace Q Q-Learning algorithm, Temporal Difference Learning and Q-Learning-Learning to Play Ms.

pages: 505 words: 161,581

The Founders: The Story of Paypal and the Entrepreneurs Who Shaped Silicon Valley by Jimmy Soni

activist fund / activist shareholder / activist investor, Ada Lovelace, AltaVista, Apple Newton, barriers to entry, Big Tech, bitcoin, Blitzscaling, book value, business logic, butterfly effect, call centre, Carl Icahn, Claude Shannon: information theory, cloud computing, Colonization of Mars, Computing Machinery and Intelligence, corporate governance, COVID-19, crack epidemic, cryptocurrency, currency manipulation / currency intervention, digital map, disinformation, disintermediation, drop ship, dumpster diving, Elon Musk, Fairchild Semiconductor, fear of failure, fixed income, General Magic , general-purpose programming language, Glass-Steagall Act, global macro, global pandemic, income inequality, index card, index fund, information security, intangible asset, Internet Archive, iterative process, Jeff Bezos, Jeff Hawkins, John Markoff, Kwajalein Atoll, Lyft, Marc Andreessen, Mark Zuckerberg, Mary Meeker, Max Levchin, Menlo Park, Metcalfe’s law, mobile money, money market fund, multilevel marketing, mutually assured destruction, natural language processing, Network effects, off-the-grid, optical character recognition, PalmPilot, pattern recognition, paypal mafia, Peter Thiel, pets.com, Potemkin village, public intellectual, publish or perish, Richard Feynman, road to serfdom, Robert Metcalfe, Robert X Cringely, rolodex, Sand Hill Road, Satoshi Nakamoto, seigniorage, shareholder value, side hustle, Silicon Valley, Silicon Valley startup, slashdot, SoftBank, software as a service, Startup school, Steve Ballmer, Steve Jobs, Steve Jurvetson, Steve Wozniak, technoutopianism, the payments system, transaction costs, Turing test, uber lyft, Vanguard fund, winner-take-all economy, Y Combinator, Y2K

Levchin queried his assembled team of engineers. Engineer David Gausebeck thought back to his college research on computers’ ability to decipher images. Humans, he remembered, could read warped, hidden, or distorted letters—a much harder task for computers. He looked at Levchin and said, “OCR,” referring to optical character recognition. The concept wasn’t new to Levchin. In the Usenet and other forums he frequented, hackers distorted words all the time to keep information from prying eyes. Thus SWEET would become $VV££, and HELLO could be expressed as |-|3|_|_() or )-(3££0. Humans could read these codes; government computers could not.

…

Once complete, he pushed the code live—then blasted Wagner’s “The Ride of the Valkyries” over a cubicle-mounted speaker. * * * To perfect their creation, Levchin and his team studied the automated tools available at the time. Levchin trekked to a nearby computer store and bought armfuls of optical character recognition (OCR) software—programs (then still in their infancy) that extracted machine-legible text from images or handwriting. That research led to further refinements, including the use of a stencil font and the addition of thick, translucent lines over the text, both of which tripped up the store-bought OCR software.

pages: 1,302 words: 289,469

The Web Application Hacker's Handbook: Finding and Exploiting Security Flaws by Dafydd Stuttard, Marcus Pinto

business logic, call centre, cloud computing, commoditize, database schema, defense in depth, easy for humans, difficult for computers, Firefox, information retrieval, information security, lateral thinking, machine readable, MITM: man-in-the-middle, MVC pattern, optical character recognition, Ruby on Rails, SQL injection, Turing test, Wayback Machine, web application

The most significant challenges arise with segmenting the image into letters, particularly where letters overlap and are heavily distorted. For simple puzzles in which segmentation into letters is trivial, it is likely that some homegrown code can be used to remove image noise and pass the text into an existing OCR (optical character recognition) library to recognize the letters. For more complex puzzles in which segmentation is a serious challenge, various research projects have successfully compromised the CAPTCHA puzzles of high-profile web applications. For other types of puzzles, a different approach is needed, tailored to the nature of the puzzle images.
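The excerpt's simple case is one where letters do not touch. For that case, the noise-removal and segmentation steps can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the book's method:

- The image is assumed to be already binarized into a grid of 0 (background) and 1 (ink).
- Any ink pixel with no inked 8-neighbour is treated as noise.
- Letters are split at fully blank columns.

The function names are hypothetical. Handing each segment to an OCR library is not shown.

```python
# Minimal sketch: assumes a binarized image (0 = background, 1 = ink).
# Function names are hypothetical, chosen for illustration only.

def remove_isolated_pixels(img: list[list[int]]) -> list[list[int]]:
    """Clear ink pixels that have no ink anywhere in their 8-neighbourhood."""
    rows, cols = len(img), len(img[0])
    cleaned = [row[:] for row in img]
    for r in range(rows):
        for c in range(cols):
            if img[r][c] == 0:
                continue
            has_neighbour = any(
                img[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            )
            if not has_neighbour:
                cleaned[r][c] = 0  # lone speck: treat as noise
    return cleaned


def segment_columns(img: list[list[int]]) -> list[tuple[int, int]]:
    """Return half-open (start, end) column ranges separated by blank columns."""
    cols = len(img[0])
    ink = [any(row[c] for row in img) for c in range(cols)]
    segments, start = [], None
    for c, has_ink in enumerate(ink):
        if has_ink and start is None:
            start = c                      # a letter begins
        elif not has_ink and start is not None:
            segments.append((start, c))    # a letter ends at a blank column
            start = None
    if start is not None:
        segments.append((start, cols))     # letter touching the right edge
    return segments


# Two "letters" (columns 0-1 and column 4) plus one noise pixel at row 1, column 7.
puzzle = [
    [1, 1, 0, 0, 1, 0, 0, 0],
    [1, 0, 0, 0, 1, 0, 0, 1],
    [1, 1, 0, 0, 1, 0, 0, 0],
]
cleaned = remove_isolated_pixels(puzzle)
print(segment_columns(cleaned))  # [(0, 2), (4, 5)]
```

Without the noise-removal pass, the stray pixel in column 7 would be reported as a third letter. Column projection fails exactly where the excerpt locates the hard cases: overlapping or heavily distorted letters leave no blank column between them.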

…

See .NET Binary Format for SOAP negative price method, 120 Ness, Jonathan, 634 .NET encryption, 686 padding oracle, 685-687 .NET Binary Format for SOAP (NBFS), 138 Netcat, 788-789 NETGEAR router, 562 network disclosure, session tokens, 234-237 network hosts, attackers, 561-562 network perimeter, web application security and new, 12-14 next Payload method, 578 NGSSoftware, 640 Nikto hacker's toolkit, 785 hidden content, 93 maximizing effectiveness, 797 non-HTTP services, 562-563 NoSQL advantages, 343 data stores, 342-343 injection, 342-344 MongoDB, 343-344 notNetgear function, 562 nslookup command, 365 NTLM protocol, 50 NULL bytes attackers, 23-24 WAFs, 460 XSS, 460 NULL value, 306-307 numeric data limits, 417 SQL injection into, 299-301, 315-316 O obfuscation bytecode, decompiling browser extensions, 144-146 custom schemes, 109 OCR. See optical character recognition ODBC. See open database connectivity off-by-one vulnerabilities, 636-638 OllyDbg, 153 Omitted Results, Google, 90 100 Continue, 48 on-site request forgery (OSRF), 502-503 onsubmit attributes, 130 opaque data attackers, 124 client-side data transmission, 123-124 open database connectivity (ODBC), 624 open redirection vulnerabilities causes, 540-541 finding and exploiting, 542-546 hacker's methodology, 830-831 JavaScript, 546 preventing, 546-547 rickrolling attacks, 541 source code, 707-708 URLs, 542 absolute prefix, 545-546 blocking absolute, 544-545 user input, 543-544 OpenLDAP, 352 operating system commands (OS commands) ASP.NET API methods, 722-723 injection, 358-368 ASP.net, 360-361 dynamic code execution, 362 dynamic code execution, vulnerabilities, 366-367 flaws, 363-366 hacker's methodology, 832-833 metacharacters, 420 Perl language, 358-360 preventing, 367-368 shell metacharacters, 363, 365 source code, 708 spaces, 366 time delay, 363-364 Java API methods, 715-716 Perl language API methods, 738 PHP API methods, 731 optical character recognition (OCR), 611 OPTIONS functions, 43 OPTIONS method, 679-680 OPTIONS request, 528 Oracle databases attackers, 327 11g, 318 error messages, 334-338 out-of-band channels, 317-318 syntax, 332-334 time delays, 323-324 UNION operator, 307-308 PL/SQL Exclusion List, 676-677 web server software filter bypass, 692-694 web server, 676-677 The Oracle Hacker's Handbook (Litchfield), 693 oracles.

Mastering Structured Data on the Semantic Web: From HTML5 Microdata to Linked Open Data by Leslie Sikos

AGPL, Amazon Web Services, bioinformatics, business process, cloud computing, create, read, update, delete, Debian, en.wikipedia.org, fault tolerance, Firefox, Google Chrome, Google Earth, information retrieval, Infrastructure as a Service, Internet of things, linked data, machine readable, machine translation, natural language processing, openstreetmap, optical character recognition, platform as a service, search engine result page, semantic web, Silicon Valley, social graph, software as a service, SPARQL, text mining, Watson beat the top human players on Jeopardy!, web application, Wikidata, wikimedia commons, Wikivoyage

However, because there is a huge gap between what the human mind understands and what computers can interpret, a large amount of data on the Internet cannot be processed efficiently with computer software. For example, a scanned table in an image file is unstructured and cannot be interpreted by computers. While optical character recognition programs can be used to convert images of printed text into machine-encoded text, such conversions cannot be done in real time and with 100% accuracy, rely on a relatively clear image in high resolution, and require different processing algorithms, depending on the image file format. More important, table headings and table data cells will all become plain text, with no correlation whatsoever.

Hands-On Machine Learning With Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems by Aurelien Geron

AlphaGo, Amazon Mechanical Turk, Bayesian statistics, centre right, combinatorial explosion, constrained optimization, correlation coefficient, crowdsourcing, data science, deep learning, DeepMind, duck typing, en.wikipedia.org, Geoffrey Hinton, iterative process, Netflix Prize, NP-complete, optical character recognition, P = NP, p-value, pattern recognition, performance metric, recommendation engine, self-driving car, SpamAssassin, speech recognition, statistical model

pages: 301 words: 89,076

The Globotics Upheaval: Globalisation, Robotics and the Future of Work by Richard Baldwin

agricultural Revolution, Airbnb, AlphaGo, AltaVista, Amazon Web Services, Apollo 11, augmented reality, autonomous vehicles, basic income, Big Tech, bread and circuses, business process, business process outsourcing, call centre, Capital in the Twenty-First Century by Thomas Piketty, Cass Sunstein, commoditize, computer vision, Corn Laws, correlation does not imply causation, Credit Default Swap, data science, David Ricardo: comparative advantage, declining real wages, deep learning, DeepMind, deindustrialization, deskilling, Donald Trump, Douglas Hofstadter, Downton Abbey, Elon Musk, Erik Brynjolfsson, facts on the ground, Fairchild Semiconductor, future of journalism, future of work, George Gilder, Google Glasses, Google Hangouts, Hans Moravec, hiring and firing, hype cycle, impulse control, income inequality, industrial robot, intangible asset, Internet of things, invisible hand, James Watt: steam engine, Jeff Bezos, job automation, Kevin Roose, knowledge worker, laissez-faire capitalism, Les Trente Glorieuses, low skilled workers, machine translation, Machine translation of "The spirit is willing, but the flesh is weak." to Russian and back, manufacturing employment, Mark Zuckerberg, mass immigration, mass incarceration, Metcalfe’s law, mirror neurons, new economy, optical character recognition, pattern recognition, Ponzi scheme, post-industrial society, post-work, profit motive, remote working, reshoring, ride hailing / ride sharing, Robert Gordon, Robert Metcalfe, robotic process automation, Ronald Reagan, Salesforce, San Francisco homelessness, Second Machine Age, self-driving car, side project, Silicon Valley, Skype, Snapchat, social intelligence, sovereign wealth fund, standardized shipping container, statistical model, Stephen Hawking, Steve Jobs, supply-chain management, systems thinking, TaskRabbit, telepresence, telepresence robot, telerobotics, Thomas Malthus, trade liberalization, universal basic income, warehouse automation

You’d need another sixteen years or so to get up to a hundred million—and by then, Flickr probably would have doubled their dataset size several times. But with all this amazing computer power and all these big data sets, why don’t we see machine learning deployed more widely? One problem is that once AI gets good enough, we stop thinking of it as AI. For example, Optical Character Recognition, which lets you scan a document and turn it into a Word file is AI, but most people just think of it as a standard feature. In other words, we already are surrounded by AI, but we don’t know it. A second problem is a skill shortage. RPA systems like Poppy or Henry can be trained very easily by people with only minimal training in the training.

pages: 294 words: 81,292

Our Final Invention: Artificial Intelligence and the End of the Human Era by James Barrat

AI winter, air gap, AltaVista, Amazon Web Services, artificial general intelligence, Asilomar, Automated Insights, Bayesian statistics, Bernie Madoff, Bill Joy: nanobots, Bletchley Park, brain emulation, California energy crisis, cellular automata, Chuck Templeton: OpenTable:, cloud computing, cognitive bias, commoditize, computer vision, Computing Machinery and Intelligence, cuban missile crisis, Daniel Kahneman / Amos Tversky, Danny Hillis, data acquisition, don't be evil, drone strike, dual-use technology, Extropian, finite state, Flash crash, friendly AI, friendly fire, Google Glasses, Google X / Alphabet X, Hacker News, Hans Moravec, Isaac Newton, Jaron Lanier, Jeff Hawkins, John Markoff, John von Neumann, Kevin Kelly, Law of Accelerating Returns, life extension, Loebner Prize, lone genius, machine translation, mutually assured destruction, natural language processing, Neil Armstrong, Nicholas Carr, Nick Bostrom, optical character recognition, PageRank, PalmPilot, paperclip maximiser, pattern recognition, Peter Thiel, precautionary principle, prisoner's dilemma, Ray Kurzweil, Recombinant DNA, Rodney Brooks, rolling blackouts, Search for Extraterrestrial Intelligence, self-driving car, semantic web, Silicon Valley, Singularitarianism, Skype, smart grid, speech recognition, statistical model, stealth mode startup, stem cell, Stephen Hawking, Steve Jobs, Steve Jurvetson, Steve Wozniak, strong AI, Stuxnet, subprime mortgage crisis, superintelligent machines, technological singularity, The Coming Technological Singularity, Thomas Bayes, traveling salesman, Turing machine, Turing test, Vernor Vinge, Watson beat the top human players on Jeopardy!, zero day

In essence, the ANN is recognizing patterns in the data. Today, finding patterns in vast amounts of unstructured data is one of AI’s most lucrative jobs. Besides language translation and data mining, ANNs are at work today in computer game AI, analyzing the stock market, and identifying objects in images. They’re in Optical Character Recognition programs that read the printed word, and in computer chips that steer guided missiles. ANNs put the “smart” in smart bombs. They’ll be critical to most AGI architectures as well. And there’s something important to remember from chapter 7 about these ubiquitous neural nets. Like genetic algorithms, ANNs are “black box” systems.

pages: 259 words: 84,261

Scary Smart: The Future of Artificial Intelligence and How You Can Save Our World by Mo Gawdat

3D printing, accounting loophole / creative accounting, AI winter, AlphaGo, anthropic principle, artificial general intelligence, autonomous vehicles, basic income, Big Tech, Black Lives Matter, Black Monday: stock market crash in 1987, butterfly effect, call centre, carbon footprint, cloud computing, computer vision, coronavirus, COVID-19, CRISPR, cryptocurrency, deep learning, deepfake, DeepMind, Demis Hassabis, digital divide, digital map, Donald Trump, Elon Musk, fake news, fulfillment center, game design, George Floyd, global pandemic, Google Glasses, Google X / Alphabet X, Law of Accelerating Returns, lockdown, microplastics / micro fibres, Nick Bostrom, off-the-grid, OpenAI, optical character recognition, out of africa, pattern recognition, Ponzi scheme, Ray Kurzweil, recommendation engine, self-driving car, Silicon Valley, smart contracts, Stanislav Petrov, Stephen Hawking, subprime mortgage crisis, superintelligent machines, TED Talk, TikTok, Turing machine, Turing test, universal basic income, Watson beat the top human players on Jeopardy!, Y2K

All of the above, however, were based on traditional computer programming, and while they delivered impressive results, they failed to offer the accuracy and scale today’s computer vision can offer, due to the advancement of Deep Learning artificial intelligence techniques, which have completely surpassed and replaced all prior methods. This intelligence did not learn to see by following a programmer’s list of instructions, but rather through the very act of seeing itself. With AI helping computers see, they can now do it much better than we do, specifically when it comes to individual tasks. Optical character recognition allows computers to read text just like you are reading these words. Object recognition allows them to recognize objects in a picture or in the real world, through the lens of a camera. Computers today not only recognize the items you take off the shelf in an Amazon Go store, but they can give you all the information you need to know about a historical monument if you just point your phone at it and use Google Goggles.

Algorithms Unlocked by Thomas H. Cormen

bioinformatics, Donald Knuth, knapsack problem, NP-complete, optical character recognition, P = NP, Silicon Valley, sorting algorithm, traveling salesman

We can still say that the routing algorithm that the GPS runs is correct, however, even if the input to the algorithm is not; for the input given to the routing algorithm, the algorithm produces the fastest route. Now, for some problems, it might be difficult or even impossible to say whether an algorithm produces a correct solution. Take optical character recognition for example. Is this 11 × 6 pixel image a 5 or an S? Some people might call it a 5, whereas others might call it an S, so how could we declare that a computer’s decision is either correct or incorrect? We won’t. In this book, we will focus on computer algorithms that have knowable solutions.
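The ambiguity can be made concrete with a nearest-template classifier. It scores a glyph by how many pixels differ from each reference bitmap (Hamming distance). This is a hypothetical sketch. The 5 × 3 templates and the ambiguous glyph below are invented for illustration and are much smaller than the book's 11 × 6 image.

```python
# Hypothetical 5 x 3 templates ("1" = ink); the book's example is 11 x 6.
TEMPLATES = {
    "5": ["111", "100", "111", "001", "111"],
    "S": ["011", "100", "010", "001", "110"],
}


def hamming(a: list[str], b: list[str]) -> int:
    """Count pixels that differ between two bitmaps of the same size."""
    return sum(x != y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def distances(glyph: list[str]) -> dict[str, int]:
    """Distance from the glyph to every template."""
    return {label: hamming(glyph, t) for label, t in TEMPLATES.items()}


# Matches "5" on two of the four pixels where the templates disagree, "S" on the other two.
ambiguous = ["111", "100", "011", "001", "110"]
print(distances(ambiguous))  # {'5': 2, 'S': 2}
```

Both templates are two pixels away. Any tie-break, such as `min(d, key=d.get)` (which returns whichever label was inserted first), produces an answer that cannot meaningfully be called correct or incorrect.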

pages: 330 words: 91,805

Peers Inc: How People and Platforms Are Inventing the Collaborative Economy and Reinventing Capitalism by Robin Chase

Airbnb, Amazon Web Services, Andy Kessler, Anthropocene, Apollo 13, banking crisis, barriers to entry, basic income, Benevolent Dictator For Life (BDFL), bike sharing, bitcoin, blockchain, Burning Man, business climate, call centre, car-free, carbon tax, circular economy, cloud computing, collaborative consumption, collaborative economy, collective bargaining, commoditize, congestion charging, creative destruction, crowdsourcing, cryptocurrency, data science, deal flow, decarbonisation, different worldview, do-ocracy, don't be evil, Donald Shoup, Elon Musk, en.wikipedia.org, Ethereum, ethereum blockchain, Eyjafjallajökull, Ferguson, Missouri, Firefox, Free Software Foundation, frictionless, Gini coefficient, GPS: selective availability, high-speed rail, hive mind, income inequality, independent contractor, index fund, informal economy, Intergovernmental Panel on Climate Change (IPCC), Internet of things, Jane Jacobs, Jeff Bezos, jimmy wales, job satisfaction, Kickstarter, Kinder Surprise, language acquisition, Larry Ellison, Lean Startup, low interest rates, Lyft, machine readable, means of production, megacity, Minecraft, minimum viable product, Network effects, new economy, Oculus Rift, off-the-grid, openstreetmap, optical character recognition, pattern recognition, peer-to-peer, peer-to-peer lending, peer-to-peer model, Post-Keynesian economics, Richard Stallman, ride hailing / ride sharing, Ronald Coase, Ronald Reagan, Salesforce, Satoshi Nakamoto, Search for Extraterrestrial Intelligence, self-driving car, shareholder value, sharing economy, Silicon Valley, six sigma, Skype, smart cities, smart grid, Snapchat, sovereign wealth fund, Steve Crocker, Steve Jobs, Steven Levy, TaskRabbit, The Death and Life of Great American Cities, The Future of Employment, the long tail, The Nature of the Firm, Tragedy of the Commons, transaction costs, Turing test, turn-by-turn navigation, Uber and Lyft, uber lyft, vertical integration, Zipcar

reCAPTCHA takes the effort of typing the characters in a CAPTCHA and repurposes it to solve an entirely different problem. In order to make old newspapers or books useful online, they have to be scanned and the resulting images turned into machine-readable text to be usefully searchable. Sometimes the scanned or photographed image results in words that can’t be decoded using optical character recognition (OCR). This is a problem. When the CAPTCHAs are constructed using words tagged by OCR programs as unreadable, we smart humans do what computers can’t: We easily decode them! Tests have shown that reCAPTCHA text images are deciphered and transcribed with 99.1 percent accuracy, a rate comparable to the best human professional transcription services.
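The scheme can be sketched as a vote-collection loop. The sketch rests on an assumption that goes beyond the excerpt, taken from how reCAPTCHA's designers have described it:

- Each challenge pairs a control word, whose answer is already known, with a word that OCR flagged as unreadable.
- A person's reading of the unknown word is counted only when the control word is typed correctly.

The class name and the `votes_needed` agreement threshold are hypothetical.

```python
from collections import Counter, defaultdict


class TranscriptionPool:
    """Hypothetical sketch: collect human readings of words OCR could not decode."""

    def __init__(self, votes_needed: int = 3):
        self.votes_needed = votes_needed               # assumed agreement threshold
        self.votes: defaultdict[str, Counter] = defaultdict(Counter)
        self.solved: dict[str, str] = {}               # word id -> accepted text

    def submit(self, control_answer: str, control_guess: str,
               unknown_id: str, unknown_guess: str) -> bool:
        """Record one challenge; return True if the user passes the CAPTCHA."""
        if control_guess.strip().lower() != control_answer.lower():
            return False                               # failed the known word: discard
        guess = unknown_guess.strip().lower()
        counts = self.votes[unknown_id]
        counts[guess] += 1
        if unknown_id not in self.solved and counts[guess] >= self.votes_needed:
            self.solved[unknown_id] = guess            # enough independent readers agree
        return True


pool = TranscriptionPool(votes_needed=2)
pool.submit("morning", "morning", "scan-17", "upon")
pool.submit("morning", "mourning", "scan-17", "upen")  # control wrong: ignored
pool.submit("river", "river", "scan-17", "Upon")
print(pool.solved)  # {'scan-17': 'upon'}
```

Two design choices let noisy individual answers add up to a reliable transcription. The pool discards readers who fail the known word, and it requires agreement from several independent readers before accepting a transcription.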

pages: 224 words: 12,941

From Gutenberg to Google: electronic representations of literary texts by Peter L. Shillingsburg

bread and circuses, British Empire, computer age, disinformation, double helix, HyperCard, hypertext link, interchangeable parts, invention of the telephone, language acquisition, means of production, optical character recognition, pattern recognition, Saturday Night Live, Socratic dialogue

This unsought notion of a dank cellar of electronic texts initiated a train of thoughts – the first being that even this early in the electronic revolution the world is overwhelmed by texts of unknown provenance, with unknown corruptions, representing unidentified or misidentified versions. These texts frequently result from enthusiasm for computers and the Internet in particular. Texts are easily scanned, either as images or by optical character recognition (OCR) software and posted on the World Wide Web; thus, almost anyone can easily become an editor, producer, and publisher. From comments at conferences and advice given on the Internet, I conclude that the big worry is not authenticity, verification, or attribution. It is to avoid posting texts of works still in copyright.

pages: 336 words: 93,672

The Future of the Brain: Essays by the World's Leading Neuroscientists by Gary Marcus, Jeremy Freeman

23andMe, Albert Einstein, backpropagation, bioinformatics, bitcoin, brain emulation, cloud computing, complexity theory, computer age, computer vision, conceptual framework, correlation does not imply causation, crowdsourcing, dark matter, data acquisition, data science, deep learning, Drosophila, epigenetics, Geoffrey Hinton, global pandemic, Google Glasses, ITER tokamak, iterative process, language acquisition, linked data, mouse model, optical character recognition, pattern recognition, personalized medicine, phenotype, race to the bottom, Richard Feynman, Ronald Reagan, semantic web, speech recognition, stem cell, Steven Pinker, supply-chain management, synthetic biology, tacit knowledge, traumatic brain injury, Turing machine, twin studies, web application

The PDP approach promised to solve problems that classic AI could not. Although neural networks and machine learning have proven to be very powerful at performing certain kinds of tasks, they have not bridged the gap between biological and artificial intelligence, except in very narrow domains, such as optical character recognition. What is missing? One possibility is that even neural networks are not “biological” enough. For example, in my PhD thesis I explored the possibility that endowing the simple summation nodes of neural networks with greater complexity, such as that provided by the elaborate dendritic trees of real neurons, would qualitatively enhance the power of these networks to compute.

pages: 340 words: 97,723

The Big Nine: How the Tech Titans and Their Thinking Machines Could Warp Humanity by Amy Webb

"Friedman doctrine" OR "shareholder theory", Ada Lovelace, AI winter, air gap, Airbnb, airport security, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, algorithmic bias, AlphaGo, Andy Rubin, artificial general intelligence, Asilomar, autonomous vehicles, backpropagation, Bayesian statistics, behavioural economics, Bernie Sanders, Big Tech, bioinformatics, Black Lives Matter, blockchain, Bretton Woods, business intelligence, Cambridge Analytica, Cass Sunstein, Charles Babbage, Claude Shannon: information theory, cloud computing, cognitive bias, complexity theory, computer vision, Computing Machinery and Intelligence, CRISPR, cross-border payments, crowdsourcing, cryptocurrency, Daniel Kahneman / Amos Tversky, data science, deep learning, DeepMind, Demis Hassabis, Deng Xiaoping, disinformation, distributed ledger, don't be evil, Donald Trump, Elon Musk, fail fast, fake news, Filter Bubble, Flynn Effect, Geoffrey Hinton, gig economy, Google Glasses, Grace Hopper, Gödel, Escher, Bach, Herman Kahn, high-speed rail, Inbox Zero, Internet of things, Jacques de Vaucanson, Jeff Bezos, Joan Didion, job automation, John von Neumann, knowledge worker, Lyft, machine translation, Mark Zuckerberg, Menlo Park, move fast and break things, Mustafa Suleyman, natural language processing, New Urbanism, Nick Bostrom, one-China policy, optical character recognition, packet switching, paperclip maximiser, pattern recognition, personalized medicine, RAND corporation, Ray Kurzweil, Recombinant DNA, ride hailing / ride sharing, Rodney Brooks, Rubik’s Cube, Salesforce, Sand Hill Road, Second Machine Age, self-driving car, seminal paper, SETI@home, side project, Silicon Valley, Silicon Valley startup, skunkworks, Skype, smart cities, South China Sea, sovereign wealth fund, speech recognition, Stephen Hawking, strong AI, superintelligent machines, surveillance capitalism, technological singularity, The Coming Technological Singularity, the long tail, theory 
of mind, Tim Cook: Apple, trade route, Turing machine, Turing test, uber lyft, Von Neumann architecture, Watson beat the top human players on Jeopardy!, zero day

Sometime in the middle of all that work were a handful of researchers who, once again, were workshopping neural networks, an idea championed by Marvin Minsky and Frank Rosenblatt during the initial Dartmouth meeting. Cognitive scientist Geoff Hinton and computer scientists Yann Lecun and Yoshua Bengio each believed that neural net–based systems would not only have serious practical applications—like automatic fraud detection for credit cards and automatic optical character recognition for reading documents and checks—but that it would become the basis for what artificial intelligence would become. It was Hinton, a professor at the University of Toronto, who imagined a new kind of neural net, one made up of multiple layers that each extracted different information until it recognized what it was looking for.

pages: 368 words: 96,825

Bold: How to Go Big, Create Wealth and Impact the World by Peter H. Diamandis, Steven Kotler

3D printing, additive manufacturing, adjacent possible, Airbnb, Amazon Mechanical Turk, Amazon Web Services, Apollo 11, augmented reality, autonomous vehicles, Boston Dynamics, Charles Lindbergh, cloud computing, company town, creative destruction, crowdsourcing, Daniel Kahneman / Amos Tversky, data science, deal flow, deep learning, dematerialisation, deskilling, disruptive innovation, driverless car, Elon Musk, en.wikipedia.org, Exxon Valdez, fail fast, Fairchild Semiconductor, fear of failure, Firefox, Galaxy Zoo, Geoffrey Hinton, Google Glasses, Google Hangouts, gravity well, hype cycle, ImageNet competition, industrial robot, information security, Internet of things, Jeff Bezos, John Harrison: Longitude, John Markoff, Jono Bacon, Just-in-time delivery, Kickstarter, Kodak vs Instagram, Law of Accelerating Returns, Lean Startup, life extension, loss aversion, Louis Pasteur, low earth orbit, Mahatma Gandhi, Marc Andreessen, Mark Zuckerberg, Mars Rover, meta-analysis, microbiome, minimum viable product, move fast and break things, Narrative Science, Netflix Prize, Network effects, Oculus Rift, OpenAI, optical character recognition, packet switching, PageRank, pattern recognition, performance metric, Peter H. Diamandis: Planetary Resources, Peter Thiel, pre–internet, Ray Kurzweil, recommendation engine, Richard Feynman, ride hailing / ride sharing, risk tolerance, rolodex, Scaled Composites, self-driving car, sentiment analysis, shareholder value, Sheryl Sandberg, Silicon Valley, Silicon Valley startup, skunkworks, Skype, smart grid, SpaceShipOne, stem cell, Stephen Hawking, Steve Jobs, Steven Levy, Stewart Brand, Stuart Kauffman, superconnector, Susan Wojcicki, synthetic biology, technoutopianism, TED Talk, telepresence, telepresence robot, Turing test, urban renewal, Virgin Galactic, Wayback Machine, web application, X Prize, Y Combinator, zero-sum game

“What if,” says Ahn, “there was some giant task that humans could do that computers could not that can be broken down into ten-second chunks?” This was the birth of reCAPTCHA, a website that serves a dual purpose, both helping to distinguish bots from humans while simultaneously helping to digitize books.15 Normally, we digitize books by scanning pages into a computer; next, an optical character recognition program runs through this text, attempting to turn images into actual words. Sometimes this works great; other times, not so well. The big problem is with old books, especially ones whose pages have yellowed. On average, for books written more than fifty years ago, computers can make out only about 70 percent of the text.

pages: 484 words: 104,873

Rise of the Robots: Technology and the Threat of a Jobless Future by Martin Ford

3D printing, additive manufacturing, Affordable Care Act / Obamacare, AI winter, algorithmic management, algorithmic trading, Amazon Mechanical Turk, artificial general intelligence, assortative mating, autonomous vehicles, banking crisis, basic income, Baxter: Rethink Robotics, Bernie Madoff, Bill Joy: nanobots, bond market vigilante, business cycle, call centre, Capital in the Twenty-First Century by Thomas Piketty, carbon tax, Charles Babbage, Chris Urmson, Clayton Christensen, clean water, cloud computing, collateralized debt obligation, commoditize, computer age, creative destruction, data science, debt deflation, deep learning, deskilling, digital divide, disruptive innovation, diversified portfolio, driverless car, Erik Brynjolfsson, factory automation, financial innovation, Flash crash, Ford Model T, Fractional reserve banking, Freestyle chess, full employment, general purpose technology, Geoffrey Hinton, Goldman Sachs: Vampire Squid, Gunnar Myrdal, High speed trading, income inequality, indoor plumbing, industrial robot, informal economy, iterative process, Jaron Lanier, job automation, John Markoff, John Maynard Keynes: technological unemployment, John von Neumann, Kenneth Arrow, Khan Academy, Kiva Systems, knowledge worker, labor-force participation, large language model, liquidity trap, low interest rates, low skilled workers, low-wage service sector, Lyft, machine readable, machine translation, manufacturing employment, Marc Andreessen, McJob, moral hazard, Narrative Science, Network effects, new economy, Nicholas Carr, Norbert Wiener, obamacare, optical character recognition, passive income, Paul Samuelson, performance metric, Peter Thiel, plutocrats, post scarcity, precision agriculture, price mechanism, public intellectual, Ray Kurzweil, rent control, rent-seeking, reshoring, RFID, Richard Feynman, Robert Solow, Rodney Brooks, Salesforce, Sam Peltzman, secular stagnation, self-driving car, Silicon Valley, Silicon Valley billionaire, Silicon Valley startup, single-payer health, software is eating the world, sovereign wealth fund, speech recognition, Spread Networks laid a new fibre optics cable between New York and Chicago, stealth mode startup, stem cell, Stephen Hawking, Steve Jobs, Steven Levy, Steven Pinker, strong AI, Stuxnet, technological singularity, telepresence, telepresence robot, The Bell Curve by Richard Herrnstein and Charles Murray, The Coming Technological Singularity, The Future of Employment, the long tail, Thomas L Friedman, too big to fail, Tragedy of the Commons, Tyler Cowen, Tyler Cowen: Great Stagnation, uber lyft, union organizing, Vernor Vinge, very high income, warehouse automation, warehouse robotics, Watson beat the top human players on Jeopardy!, women in the workforce

The first truly intelligent machine, he tells us, will be built by the late 2020s. The Singularity itself will occur some time around 2045. Kurzweil is by all accounts a brilliant inventor and engineer. He has founded a series of successful companies to market his inventions in areas like optical character recognition, computer-generated speech, and music synthesis. He’s been awarded twenty honorary doctorate degrees as well as the National Medal of Technology and was inducted into the US Patent Office’s Hall of Fame. Inc. magazine once referred to him as the “rightful heir” to Thomas Edison. His work on the Singularity, however, is an odd mixture composed of a well-grounded and coherent narrative about technological acceleration, together with ideas that seem so speculative as to border on the absurd—including, for example, a heartfelt desire to resurrect his late father by gathering DNA from the gravesite and then regenerating his body using futuristic nanotechnology.

Remix: Making Art and Commerce Thrive in the Hybrid Economy by Lawrence Lessig

Aaron Swartz, Amazon Web Services, Andrew Keen, Benjamin Mako Hill, Berlin Wall, Bernie Sanders, Brewster Kahle, carbon tax, Cass Sunstein, collaborative editing, commoditize, disintermediation, don't be evil, Erik Brynjolfsson, folksonomy, Free Software Foundation, Internet Archive, invisible hand, Jeff Bezos, Jimmy Wales, John Perry Barlow, Joi Ito, Kevin Kelly, Larry Wall, late fees, Mark Shuttleworth, Netflix Prize, Network effects, new economy, optical character recognition, PageRank, peer-to-peer, recommendation engine, revision control, Richard Stallman, Ronald Coase, Saturday Night Live, search costs, SETI@home, sharing economy, Silicon Valley, Skype, slashdot, Steve Jobs, the long tail, The Nature of the Firm, thinkpad, transaction costs, VA Linux, Wayback Machine, yellow journalism, Yochai Benkler

• Distributed Proofreaders is a sharing economy. Inspired by Michael Hart’s Project Gutenberg, and launched in 2000 by Charles Franks, the Distributed Proofreaders project was conceived to help proofread for free the books that Hart made available for free. To compensate for the errors of optical character recognition (OCR) technology, the Distributed Proofreaders project takes individual pages from scanned books and presents them to individuals, along with the original text. Volunteers then correct the text through a kind of distributed-computing project. (See the next item for more on distributed computing.)

pages: 372 words: 109,536

The Panama Papers: Breaking the Story of How the Rich and Powerful Hide Their Money by Frederik Obermaier

air gap, banking crisis, blood diamond, book value, credit crunch, crony capitalism, Deng Xiaoping, Edward Snowden, family office, Global Witness, high net worth, income inequality, Jeremy Corbyn, Kickstarter, Laura Poitras, liquidationism / Banker’s doctrine / the Treasury view, mega-rich, megaproject, Mikhail Gorbachev, mortgage debt, Nelson Mandela, offshore financial centre, optical character recognition, out of africa, race to the bottom, vertical integration, We are the 99%, WikiLeaks

This is relatively easy with Word documents and emails, but harder with PDFs and photo files – and there are already hundreds of thousands of those in our data by this point. The Nuix program must therefore first be able to identify if there is any text in the pictures. This is done by text recognition software called optical character recognition or OCR. Only when every document has undergone OCR is a negative search result truly a negative search result. Only then can you be relatively certain that Angela Merkel is not hiding in the data after the search for ‘Angela Merkel’ has produced zero hits. That is, as long as the name is not in a fax that has been printed out and later scanned or has been written using an old typewriter; if that’s the case, OCR will not produce any hits.

pages: 482 words: 106,041

The World Without Us by Alan Weisman

British Empire, carbon-based life, company town, conceptual framework, coronavirus, invention of radio, Nick Bostrom, nuclear winter, optical character recognition, out of africa, Ray Kurzweil, rewilding, the High Line, trade route, uranium enrichment, William Langewiesche

Via the self-accruing wizardry of computers, an abundance of silicon, and vast opportunities afforded by modular memory and mechanical appendages, human extinction would become merely a jettisoning of the limited and not very durable vessels that our technological minds have finally outgrown. Prominent in the transhumanist (sometimes called posthuman) movement are Oxford philosopher Nick Bostrom; heralded inventor Ray Kurzweil, originator of optical character recognition, flat-bed scanners, and print-to-speech reading machines for the blind; and Trinity College bioethicist James Hughes, author of Citizen Cyborg: Why Democratic Societies Must Respond to the Redesigned Human of the Future. However Faustian, their discussion is compelling in its lure of immortality and preternatural power—and almost touching in its Utopian faith that a machine could be made so perfect that it would transcend entropy.

pages: 302 words: 82,233

Beautiful Security by Andy Oram, John Viega

Albert Einstein, Amazon Web Services, An Inconvenient Truth, Bletchley Park, business intelligence, business process, call centre, cloud computing, corporate governance, credit crunch, crowdsourcing, defense in depth, do well by doing good, Donald Davies, en.wikipedia.org, fault tolerance, Firefox, information security, loose coupling, Marc Andreessen, market design, MITM: man-in-the-middle, Monroe Doctrine, new economy, Nicholas Carr, Nick Leeson, Norbert Wiener, operational security, optical character recognition, packet switching, peer-to-peer, performance metric, pirate software, Robert Bork, Search for Extraterrestrial Intelligence, security theater, SETI@home, Silicon Valley, Skype, software as a service, SQL injection, statistical model, Steven Levy, the long tail, The Wisdom of Crowds, Upton Sinclair, web application, web of trust, zero day, Zimmermann PGP

Printed books were and are exempt from the export controls. This happened first in 1995 with the publication of PGP Source Code and Internals (MIT Press). It happened again later when Pretty Good Privacy, Inc., published the source code of PGP in a more sophisticated set of books with specialized software tools that were optimized for easy optical character recognition (OCR) scanning of C source code. This made it easy to export unlimited quantities of cryptographic source code, rendering the export controls moot and undermining the political will to continue imposing the export controls. Today, there has been nearly an about-face in government attitude about cryptography.

pages: 380 words: 118,675

The Everything Store: Jeff Bezos and the Age of Amazon by Brad Stone

airport security, Amazon Mechanical Turk, Amazon Web Services, AOL-Time Warner, Apollo 11, bank run, Bear Stearns, Bernie Madoff, big-box store, Black Swan, book scanning, Brewster Kahle, buy and hold, call centre, centre right, Chuck Templeton: OpenTable:, Clayton Christensen, cloud computing, collapse of Lehman Brothers, crowdsourcing, cuban missile crisis, Danny Hillis, deal flow, Douglas Hofstadter, drop ship, Elon Musk, facts on the ground, fulfillment center, game design, housing crisis, invention of movable type, inventory management, James Dyson, Jeff Bezos, John Markoff, junk bonds, Kevin Kelly, Kiva Systems, Kodak vs Instagram, Larry Ellison, late fees, loose coupling, low skilled workers, Maui Hawaii, Menlo Park, Neal Stephenson, Network effects, new economy, off-the-grid, optical character recognition, PalmPilot, pets.com, Ponzi scheme, proprietary trading, quantitative hedge fund, reality distortion field, recommendation engine, Renaissance Technologies, RFID, Rodney Brooks, search inside the book, shareholder value, Silicon Valley, Silicon Valley startup, six sigma, skunkworks, Skype, SoftBank, statistical arbitrage, Steve Ballmer, Steve Jobs, Steven Levy, Stewart Brand, the long tail, Thomas L Friedman, Tony Hsieh, two-pizza team, Virgin Galactic, Whole Earth Catalog, why are manhole covers round?, zero-sum game

Which one do you think will sell more books?” Publishers were concerned that Search Inside the Book might open up the floodgates of online piracy. Most, however, agreed to try it out and gave Amazon physical copies of their titles, which were shipped to a contractor in the Philippines to be scanned. Then Manber’s team ran optical character-recognition software over the book files to convert the scanned images into text that Amazon’s search algorithms could navigate and index. To reduce the chance that customers would read the books for free, Amazon served up only snippets of content—one or two pages before and after the search term, for example, and only to customers who had credit cards on file.

pages: 402 words: 110,972

Nerds on Wall Street: Math, Machines and Wired Markets by David J. Leinweber

"World Economic Forum" Davos, AI winter, Alan Greenspan, algorithmic trading, AOL-Time Warner, Apollo 11, asset allocation, banking crisis, barriers to entry, Bear Stearns, Big bang: deregulation of the City of London, Bob Litterman, book value, business cycle, butter production in bangladesh, butterfly effect, buttonwood tree, buy and hold, buy low sell high, capital asset pricing model, Charles Babbage, citizen journalism, collateralized debt obligation, Cornelius Vanderbilt, corporate governance, Craig Reynolds: boids flock, creative destruction, credit crunch, Credit Default Swap, credit default swaps / collateralized debt obligations, Danny Hillis, demand response, disintermediation, distributed generation, diversification, diversified portfolio, electricity market, Emanuel Derman, en.wikipedia.org, experimental economics, fake news, financial engineering, financial innovation, fixed income, Ford Model T, Gordon Gekko, Hans Moravec, Herman Kahn, implied volatility, index arbitrage, index fund, information retrieval, intangible asset, Internet Archive, Ivan Sutherland, Jim Simons, John Bogle, John Nash: game theory, Kenneth Arrow, load shedding, Long Term Capital Management, machine readable, machine translation, Machine translation of "The spirit is willing, but the flesh is weak." to Russian and back, market fragmentation, market microstructure, Mars Rover, Metcalfe’s law, military-industrial complex, moral hazard, mutually assured destruction, Myron Scholes, natural language processing, negative equity, Network effects, optical character recognition, paper trading, passive investing, pez dispenser, phenotype, prediction markets, proprietary trading, quantitative hedge fund, quantitative trading / quantitative finance, QWERTY keyboard, RAND corporation, random walk, Ray Kurzweil, Reminiscences of a Stock Operator, Renaissance Technologies, risk free rate, risk tolerance, risk-adjusted returns, risk/return, Robert Metcalfe, Ronald Reagan, Rubik’s Cube, Savings and loan crisis, semantic web, Sharpe ratio, short selling, short squeeze, Silicon Valley, Small Order Execution System, smart grid, smart meter, social web, South Sea Bubble, statistical arbitrage, statistical model, Steve Jobs, Steven Levy, stock buybacks, Tacoma Narrows Bridge, the scientific method, The Wisdom of Crowds, time value of money, tontine, too big to fail, transaction costs, Turing machine, two and twenty, Upton Sinclair, value at risk, value engineering, Vernor Vinge, Wayback Machine, yield curve, Yogi Berra, your tax dollars at work

I hope that after reading this book you’ll have a better sense of how technology shapes markets, and how to be a nimble participant in the future of electronic finance.

Web Site

This book includes many URLs, which would tire the fingers of even the most dedicated nerds. Someday soon you’ll point your handheld’s camera at the book and it will use OCR (optical character recognition) to find (or offer to sell you) the material you’re looking for. Absent that fancy gadget, try the web site NerdsonWallStreet.com. It has links to all of these references, plus color and animated versions of the black & white screen grabs found in the book. The site will be updated often with new and topical items.

pages: 397 words: 110,222

Habeas Data: Privacy vs. The Rise of Surveillance Tech by Cyrus Farivar

Apple's 1984 Super Bowl advert, autonomous vehicles, call centre, citizen journalism, cloud computing, computer age, connected car, do-ocracy, Donald Trump, Edward Snowden, en.wikipedia.org, failed state, Ferguson, Missouri, Frank Gehry, Golden Gate Park, information security, John Markoff, Laura Poitras, license plate recognition, lock screen, Lyft, national security letter, Occupy movement, operational security, optical character recognition, Port of Oakland, RAND corporation, Ronald Reagan, sharing economy, Silicon Valley, Silicon Valley startup, Skype, Steve Jobs, Steven Levy, tech worker, The Hackers Conference, Tim Cook: Apple, transaction costs, uber lyft, WikiLeaks, you are the product, Zimmermann PGP

Fourteen years after the FBI first began testing the technology, LPRs are now in use nearly everywhere across America. These are essentially specialized cameras that can scan license plates incredibly fast—60 plates per second. When mounted on a police patrol car, they can scan in multiple directions, capturing cars driving in front or parked perpendicular. LPRs use the same optical character recognition technology as modern-day desktop scanners. The software can read license plates, which have a standard size and format, and compare them against a “hot list” of stolen or wanted cars. If the LPR scans a “hot” car, the computer inside the police car will alert the officer, and she or he is typically supposed to verify that the scanned plate actually matches the wanted plate, and that it’s attached to the right make and model of car.

pages: 423 words: 126,096

Our Own Devices: How Technology Remakes Humanity by Edward Tenner

A. Roger Ekirch, Apple Newton, Bonfire of the Vanities, card file, Douglas Engelbart, Frederick Winslow Taylor, future of work, indoor plumbing, informal economy, invention of the telephone, invisible hand, Johannes Kepler, John Markoff, Joseph-Marie Jacquard, Lewis Mumford, Multics, multilevel marketing, Network effects, optical character recognition, PalmPilot, QWERTY keyboard, safety bicycle, scientific management, Shoshana Zuboff, Stewart Brand, tacit knowledge, women in the workforce

While voice recognition software can promote its own overuse injuries, it can now work with natural phrasing and infer spelling from context. But it is unlikely to eliminate the keyboard, because it will still make errors (or users will still fail to enunciate properly), and editing copy orally is even slower and more tedious than correcting it with a keyboard. Optical character recognition data also need checking and editing. Typing will probably be further reduced in familiar applications in the future, but it will also be extended to new tasks. A new global keyboard order is emerging. Intensive “production” typing is less necessary because more data arrive already digitized and need only formatting and correction.

pages: 742 words: 137,937

The Future of the Professions: How Technology Will Transform the Work of Human Experts by Richard Susskind, Daniel Susskind

23andMe, 3D printing, Abraham Maslow, additive manufacturing, AI winter, Albert Einstein, Amazon Mechanical Turk, Amazon Robotics, Amazon Web Services, Andrew Keen, Atul Gawande, Automated Insights, autonomous vehicles, Big bang: deregulation of the City of London, big data - Walmart - Pop Tarts, Bill Joy: nanobots, Blue Ocean Strategy, business process, business process outsourcing, Cass Sunstein, Checklist Manifesto, Clapham omnibus, Clayton Christensen, clean water, cloud computing, commoditize, computer age, Computer Numeric Control, computer vision, Computing Machinery and Intelligence, conceptual framework, corporate governance, creative destruction, crowdsourcing, Daniel Kahneman / Amos Tversky, data science, death of newspapers, disintermediation, Douglas Hofstadter, driverless car, en.wikipedia.org, Erik Brynjolfsson, Evgeny Morozov, Filter Bubble, full employment, future of work, Garrett Hardin, Google Glasses, Google X / Alphabet X, Hacker Ethic, industrial robot, informal economy, information retrieval, interchangeable parts, Internet of things, Isaac Newton, James Hargreaves, John Maynard Keynes: Economic Possibilities for our Grandchildren, John Maynard Keynes: technological unemployment, Joseph Schumpeter, Khan Academy, knowledge economy, Large Hadron Collider, lifelogging, lump of labour, machine translation, Marshall McLuhan, Metcalfe’s law, Narrative Science, natural language processing, Network effects, Nick Bostrom, optical character recognition, Paul Samuelson, personalized medicine, planned obsolescence, pre–internet, Ray Kurzweil, Richard Feynman, Second Machine Age, self-driving car, semantic web, Shoshana Zuboff, Skype, social web, speech recognition, spinning jenny, strong AI, supply-chain management, Susan Wojcicki, tacit knowledge, TED Talk, telepresence, The Future of Employment, the market place, The Wealth of Nations by Adam Smith, The Wisdom of Crowds, Tragedy of the Commons, transaction costs, Turing test, Two Sigma, warehouse robotics, Watson beat the top human players on Jeopardy!, WikiLeaks, world market for maybe five computers, Yochai Benkler, young professional

This system was being used by more than 70 per cent of FTSE 100 companies when it was sold in 2009 to Thomson Reuters, the global information provider. Also at Deloitte, the task of recovering foreign VAT payments is no longer done by human experts, but by a system, Revatic Smart. This scans clients’ documents using optical character-recognition software, and automatically files the correct forms, with little human input.260 In most cases, these tax platforms computerize tasks that would have once been done manually by a human being. But the firm also employs more than 10,000 people in India to undertake routine tax work. With regard to national tax authorities and their operations, many still rely on taxpayers, from individuals to the largest multinationals, to self-assess.

pages: 370 words: 129,096

Elon Musk: Tesla, SpaceX, and the Quest for a Fantastic Future by Ashlee Vance

addicted to oil, Burning Man, clean tech, digital map, El Camino Real, Elon Musk, fail fast, Ford Model T, gigafactory, global supply chain, Great Leap Forward, high-speed rail, Hyperloop, industrial robot, Jeff Bezos, Kickstarter, Kwajalein Atoll, Larry Ellison, low earth orbit, Mark Zuckerberg, Mars Society, Maui Hawaii, Max Levchin, Menlo Park, Mercator projection, military-industrial complex, money market fund, multiplanetary species, off-the-grid, optical character recognition, orbital mechanics / astrodynamics, PalmPilot, paypal mafia, performance metric, Peter Thiel, pneumatic tube, pre–internet, risk tolerance, Ronald Reagan, Sand Hill Road, Scaled Composites, self-driving car, side project, Silicon Valley, Silicon Valley startup, Solyndra, Steve Jobs, Steve Jurvetson, technoutopianism, Tesla Model S, Tony Fadell, transaction costs, Tyler Cowen, Tyler Cowen: Great Stagnation, vertical integration, Virgin Galactic, We wanted flying cars, instead we got 140 characters, X Prize

It depicted a pair of giant solar arrays in space—each four kilometers in width—sending their juice down to Earth via microwave beams to a receiving antenna with a seven-kilometer diameter. Musk received a 98 on what his professor deemed a “very interesting and well written paper.” A second paper talked about taking research documents and books and electronically scanning them, performing optical character recognition, and putting all of the information in a single database—much like a mix between today’s Google Books and Google Scholar. And a third paper dwelled on another of Musk’s favorite topics—ultracapacitors. In the forty-four-page document, Musk is plainly jubilant over the idea of a new form of energy storage that would suit his future pursuits with cars, planes, and rockets.

AI 2041 by Kai-Fu Lee, Chen Qiufan

3D printing, Abraham Maslow, active measures, airport security, Albert Einstein, AlphaGo, Any sufficiently advanced technology is indistinguishable from magic, artificial general intelligence, augmented reality, autonomous vehicles, basic income, bitcoin, blockchain, blue-collar work, Cambridge Analytica, carbon footprint, Charles Babbage, computer vision, contact tracing, coronavirus, corporate governance, corporate social responsibility, COVID-19, CRISPR, cryptocurrency, DALL-E, data science, deep learning, deepfake, DeepMind, delayed gratification, dematerialisation, digital map, digital rights, digital twin, Elon Musk, fake news, fault tolerance, future of work, Future Shock, game design, general purpose technology, global pandemic, Google Glasses, Google X / Alphabet X, GPT-3, happiness index / gross national happiness, hedonic treadmill, hiring and firing, Hyperloop, information security, Internet of things, iterative process, job automation, language acquisition, low earth orbit, Lyft, Maslow's hierarchy, mass immigration, mirror neurons, money: store of value / unit of account / medium of exchange, mutually assured destruction, natural language processing, Neil Armstrong, Nelson Mandela, OpenAI, optical character recognition, pattern recognition, plutocrats, post scarcity, profit motive, QR code, quantitative easing, Richard Feynman, ride hailing / ride sharing, robotic process automation, Satoshi Nakamoto, self-driving car, seminal paper, Silicon Valley, smart cities, smart contracts, smart transportation, Snapchat, social distancing, speech recognition, Stephen Hawking, synthetic biology, telemarketer, Tesla Model S, The future is already here, trolley problem, Turing test, uber lyft, universal basic income, warehouse automation, warehouse robotics, zero-sum game

These provide a natural source of supervision for machines to learn to translate languages. The AI can be trained from the simple pairing of, say, each of the millions of sentences in English with its professionally translated counterpart in French. Using this approach, supervised learning can be extended to speech recognition (converting speech into text), optical character recognition (converting handwriting or images into text), or speech synthesis (converting text to speech). For these types of natural language recognition tasks where supervised training is feasible, AI already outperforms most humans. A more complex application of NLP goes from recognition to understanding.

pages: 562 words: 146,544

Daemon by Daniel Suarez

Berlin Wall, Burning Man, call centre, digital map, disruptive innovation, double helix, failed state, Fall of the Berlin Wall, game design, high net worth, invisible hand, McMansion, offshore financial centre, optical character recognition, peer-to-peer, plutocrats, RFID, satellite internet, SQL injection, Stewart Brand, tech worker, telemarketer, web application

Gragg had an epiphany: there was no encrypted string in the Monte Cassino map. Gragg had perceived the encrypted text, but it wasn’t really computer text; it was a graphical image—and one done in a Teutonic stone-carved font, no less. The encrypted string, “m0wFG3PRCo JVTs7JcgBwsOXb3U7yPxBB,” was an arrangement of pixels that only a human eye—or a really good optical character-recognition scanner—could interpret. Programmatically scanning the contents of this map wouldn’t uncover any encrypted text—only a human being viewing the map in the context in which it was meant to be seen could see its significance. But even within the game the significance of the coded string wasn’t truly revealed until… Gragg smiled.

Multitool Linux: Practical Uses for Open Source Software by Michael Schwarz, Jeremy Anderson, Peter Curtis

business process, Debian, defense in depth, Free Software Foundation, GnuPG, index card, indoor plumbing, Larry Ellison, Larry Wall, MITM: man-in-the-middle, optical character recognition, PalmPilot, publish or perish, RFC: Request For Comment, Richard Stallman, seminal paper, SETI@home, slashdot, the Cathedral and the Bazaar, two and twenty, web application

Linux provides a number of powerful tools that make it an excellent platform for image processing. And this means more than just making images for a Web site: It can include document archiving and preservation, scene rendering and creation of textures for computer games, artistic endeavors, and optical character recognition, to name a few. In this chapter, we'll discuss some of the popular types of image formats, their strengths and weaknesses, and ways to convert between them. We'll then discuss various types of image retouching, primarily oriented toward Web presentation. And we'll learn how to use various Linux tools along the way.

pages: 523 words: 148,929

Physics of the Future: How Science Will Shape Human Destiny and Our Daily Lives by the Year 2100 by Michio Kaku

agricultural Revolution, AI winter, Albert Einstein, Alvin Toffler, Apollo 11, Asilomar, augmented reality, Bill Joy: nanobots, bioinformatics, blue-collar work, British Empire, Brownian motion, caloric restriction, cloud computing, Colonization of Mars, DARPA: Urban Challenge, data science, delayed gratification, digital divide, double helix, Douglas Hofstadter, driverless car, en.wikipedia.org, Ford Model T, friendly AI, Gödel, Escher, Bach, Hans Moravec, hydrogen economy, I think there is a world market for maybe five computers, industrial robot, Intergovernmental Panel on Climate Change (IPCC), invention of movable type, invention of the telescope, Isaac Newton, John Markoff, John von Neumann, Large Hadron Collider, life extension, Louis Pasteur, Mahatma Gandhi, Mars Rover, Mars Society, mass immigration, megacity, Mitch Kapor, Murray Gell-Mann, Neil Armstrong, new economy, Nick Bostrom, oil shale / tar sands, optical character recognition, pattern recognition, planetary scale, postindustrial economy, Ray Kurzweil, refrigerator car, Richard Feynman, Rodney Brooks, Ronald Reagan, Search for Extraterrestrial Intelligence, Silicon Valley, Simon Singh, social intelligence, SpaceShipOne, speech recognition, stem cell, Stephen Hawking, Steve Jobs, synthetic biology, telepresence, The future is already here, The Wealth of Nations by Adam Smith, Thomas L Friedman, Thomas Malthus, trade route, Turing machine, uranium enrichment, Vernor Vinge, Virgin Galactic, Wall-E, Walter Mischel, Whole Earth Review, world market for maybe five computers, X Prize

However, his supporters say that he has an uncanny ability to correctly see into the future, judging by his track record. Kurzweil cut his teeth on the computer revolution by starting up companies in diverse fields involving pattern recognition, such as speech recognition technology, optical character recognition, and electronic keyboard instruments. In 1999, he wrote a best seller, The Age of Spiritual Machines: When Computers Exceed Human Intelligence, which predicted when robots will surpass us in intelligence. In 2005, he wrote The Singularity Is Near and elaborated on those predictions. The fateful day when computers surpass human intelligence will come in stages.

pages: 574 words: 164,509

Superintelligence: Paths, Dangers, Strategies by Nick Bostrom

agricultural Revolution, AI winter, Albert Einstein, algorithmic trading, anthropic principle, Anthropocene, anti-communist, artificial general intelligence, autism spectrum disorder, autonomous vehicles, backpropagation, barriers to entry, Bayesian statistics, bioinformatics, brain emulation, cloud computing, combinatorial explosion, computer vision, Computing Machinery and Intelligence, cosmological constant, dark matter, DARPA: Urban Challenge, data acquisition, delayed gratification, Demis Hassabis, demographic transition, different worldview, Donald Knuth, Douglas Hofstadter, driverless car, Drosophila, Elon Musk, en.wikipedia.org, endogenous growth, epigenetics, fear of failure, Flash crash, Flynn Effect, friendly AI, general purpose technology, Geoffrey Hinton, Gödel, Escher, Bach, hallucination problem, Hans Moravec, income inequality, industrial robot, informal economy, information retrieval, interchangeable parts, iterative process, job automation, John Markoff, John von Neumann, knowledge worker, Large Hadron Collider, longitudinal study, machine translation, megaproject, Menlo Park, meta-analysis, mutually assured destruction, Nash equilibrium, Netflix Prize, new economy, Nick Bostrom, Norbert Wiener, NP-complete, nuclear winter, operational security, optical character recognition, paperclip maximiser, pattern recognition, performance metric, phenotype, prediction markets, price stability, principal–agent problem, race to the bottom, random walk, Ray Kurzweil, recommendation engine, reversible computing, search costs, social graph, speech recognition, Stanislav Petrov, statistical model, stem cell, Stephen Hawking, Strategic Defense Initiative, strong AI, superintelligent machines, supervolcano, synthetic biology, technological singularity, technoutopianism, The Coming Technological Singularity, The Nature of the Firm, Thomas Kuhn: the structure of scientific revolutions, time dilation, Tragedy of the Commons, transaction costs, trolley problem, Turing machine, Vernor Vinge, WarGames: Global Thermonuclear War, Watson beat the top human players on Jeopardy!, World Values Survey, zero-sum game

There are robotic pets and cleaning robots, lawn-mowing robots, rescue robots, surgical robots, and over a million industrial robots.64 The world population of robots exceeds 10 million.65 Modern speech recognition, based on statistical techniques such as hidden Markov models, has become sufficiently accurate for practical use (some fragments of this book were drafted with the help of a speech recognition program). Personal digital assistants, such as Apple’s Siri, respond to spoken commands and can answer simple questions and execute commands. Optical character recognition of handwritten and typewritten text is routinely used in applications such as mail sorting and digitization of old documents.66 Machine translation remains imperfect but is good enough for many applications. Early systems used the GOFAI approach of hand-coded grammars that had to be developed by skilled linguists from the ground up for each language.

Digital Accounting: The Effects of the Internet and Erp on Accounting by Ashutosh Deshmukh

accounting loophole / creative accounting, AltaVista, book value, business continuity plan, business intelligence, business logic, business process, call centre, computer age, conceptual framework, corporate governance, currency risk, data acquisition, disinformation, dumpster diving, fixed income, hypertext link, information security, interest rate swap, inventory management, iterative process, late fees, machine readable, money market fund, new economy, New Journalism, optical character recognition, packet switching, performance metric, profit maximization, semantic web, shareholder value, six sigma, statistical model, supply chain finance, supply-chain management, supply-chain management software, telemarketer, transaction costs, value at risk, vertical integration, warehouse automation, web application, Y2K

The checks at this post box are sorted, totaled, recorded, and deposited. The details of these checks are then forwarded to the accounts receivable or credit department of the organization concerned. There are generally two types of lockboxes: retail and wholesale. Retail lockboxes use Optical Character Recognition (OCR) technology and are suitable for low-dollar, high-volume payments. Wholesale lockboxes process invoices and payments manually and are suitable for high-dollar, low-volume payments. The paper-based lockbox invariably introduces a time lag in payment information, which creates problems for credit departments and for working capital management.

In the Age of the Smart Machine by Shoshana Zuboff

affirmative action, American ideology, blue-collar work, collective bargaining, computer age, Computer Numeric Control, conceptual framework, data acquisition, demand response, deskilling, factory automation, Ford paid five dollars a day, fudge factor, future of work, industrial robot, information retrieval, interchangeable parts, job automation, lateral thinking, linked data, Marshall McLuhan, means of production, old-boy network, optical character recognition, Panopticon Jeremy Bentham, pneumatic tube, post-industrial society, radical decentralization, RAND corporation, scientific management, Shoshana Zuboff, social web, systems thinking, tacit knowledge, The Wealth of Nations by Adam Smith, Thorstein Veblen, union organizing, vertical integration, work culture, zero-sum game

New methods of automating this textualization process, such as building it into organizational members' natural activities (for example, account officers enter their own data), sharing it among several organizations through interorganizational systems, and relying on increasingly sophisticated automated data-entry devices based on optical character recognition and high-speed communications, mean that fewer people will be needed to accomplish routine transactions in conjunction with the machine system. This new scenario calls into question both the forms of knowledge that people need and the way in which that knowledge should be distributed.

pages: 586 words: 186,548

Architects of Intelligence by Martin Ford

3D printing, agricultural Revolution, AI winter, algorithmic bias, Alignment Problem, AlphaGo, Apple II, artificial general intelligence, Asilomar, augmented reality, autonomous vehicles, backpropagation, barriers to entry, basic income, Baxter: Rethink Robotics, Bayesian statistics, Big Tech, bitcoin, Boeing 747, Boston Dynamics, business intelligence, business process, call centre, Cambridge Analytica, cloud computing, cognitive bias, Colonization of Mars, computer vision, Computing Machinery and Intelligence, correlation does not imply causation, CRISPR, crowdsourcing, DARPA: Urban Challenge, data science, deep learning, DeepMind, Demis Hassabis, deskilling, disruptive innovation, Donald Trump, Douglas Hofstadter, driverless car, Elon Musk, Erik Brynjolfsson, Ernest Rutherford, fake news, Fellow of the Royal Society, Flash crash, future of work, general purpose technology, Geoffrey Hinton, gig economy, Google X / Alphabet X, Gödel, Escher, Bach, Hans Moravec, Hans Rosling, hype cycle, ImageNet competition, income inequality, industrial research laboratory, industrial robot, information retrieval, job automation, John von Neumann, Large Hadron Collider, Law of Accelerating Returns, life extension, Loebner Prize, machine translation, Mark Zuckerberg, Mars Rover, means of production, Mitch Kapor, Mustafa Suleyman, natural language processing, new economy, Nick Bostrom, OpenAI, opioid epidemic / opioid crisis, optical character recognition, paperclip maximiser, pattern recognition, phenotype, Productivity paradox, radical life extension, Ray Kurzweil, recommendation engine, Robert Gordon, Rodney Brooks, Sam Altman, self-driving car, seminal paper, sensor fusion, sentiment analysis, Silicon Valley, smart cities, social intelligence, sparse data, speech recognition, statistical model, stealth mode startup, stem cell, Stephen Hawking, Steve Jobs, Steve Wozniak, Steven Pinker, strong AI, superintelligent machines, synthetic biology, systems thinking, Ted Kaczynski, TED Talk, The Rise and Fall of American Growth, theory of mind, Thomas Bayes, Travis Kalanick, Turing test, universal basic income, Wall-E, Watson beat the top human players on Jeopardy!, women in the workforce, working-age population, workplace surveillance, zero-sum game, Zipcar

Ray received his engineering degree from MIT, where he was mentored by Marvin Minsky, one of the founding fathers of the field of artificial intelligence. He went on to make major contributions in a variety of areas. He was the principal inventor of the first CCD flat-bed scanner, the first omni-font optical character recognition, the first print-to-speech reading machine for the blind, the first text-to-speech synthesizer, the first music synthesizer capable of recreating the grand piano and other orchestral instruments, and the first commercially marketed large-vocabulary speech recognition. Among Ray’s many honors, he received a Grammy Award for outstanding achievements in music technology; he is the recipient of the National Medal of Technology (the nation’s highest honor in technology), was inducted into the National Inventors Hall of Fame, holds twenty-one honorary doctorates, and has received honors from three US presidents.

pages: 677 words: 206,548

Future Crimes: Everything Is Connected, Everyone Is Vulnerable and What We Can Do About It by Marc Goodman

23andMe, 3D printing, active measures, additive manufacturing, Affordable Care Act / Obamacare, Airbnb, airport security, Albert Einstein, algorithmic trading, Alvin Toffler, Apollo 11, Apollo 13, artificial general intelligence, Asilomar, Asilomar Conference on Recombinant DNA, augmented reality, autonomous vehicles, Baxter: Rethink Robotics, Bill Joy: nanobots, bitcoin, Black Swan, blockchain, borderless world, Boston Dynamics, Brian Krebs, business process, butterfly effect, call centre, Charles Lindbergh, Chelsea Manning, Citizen Lab, cloud computing, Cody Wilson, cognitive dissonance, computer vision, connected car, corporate governance, crowdsourcing, cryptocurrency, data acquisition, data is the new oil, data science, Dean Kamen, deep learning, DeepMind, digital rights, disinformation, disintermediation, Dogecoin, don't be evil, double helix, Downton Abbey, driverless car, drone strike, Edward Snowden, Elon Musk, Erik Brynjolfsson, Evgeny Morozov, Filter Bubble, Firefox, Flash crash, Free Software Foundation, future of work, game design, gamification, global pandemic, Google Chrome, Google Earth, Google Glasses, Gordon Gekko, Hacker News, high net worth, High speed trading, hive mind, Howard Rheingold, hypertext link, illegal immigration, impulse control, industrial robot, information security, Intergovernmental Panel on Climate Change (IPCC), Internet of things, Jaron Lanier, Jeff Bezos, job automation, John Harrison: Longitude, John Markoff, Joi Ito, Jony Ive, Julian Assange, Kevin Kelly, Khan Academy, Kickstarter, Kiva Systems, knowledge worker, Kuwabatake Sanjuro: assassination market, Large Hadron Collider, Larry Ellison, Laura Poitras, Law of Accelerating Returns, Lean Startup, license plate recognition, lifelogging, litecoin, low earth orbit, M-Pesa, machine translation, Mark Zuckerberg, Marshall McLuhan, Menlo Park, Metcalfe’s law, MITM: man-in-the-middle, mobile money, more computing power than Apollo, move fast and break things, Nate Silver, national security letter, natural language processing, Nick Bostrom, obamacare, Occupy movement, Oculus Rift, off grid, off-the-grid, offshore financial centre, operational security, optical character recognition, Parag Khanna, pattern recognition, peer-to-peer, personalized medicine, Peter H. Diamandis: Planetary Resources, Peter Thiel, pre–internet, printed gun, RAND corporation, ransomware, Ray Kurzweil, Recombinant DNA, refrigerator car, RFID, ride hailing / ride sharing, Rodney Brooks, Ross Ulbricht, Russell Brand, Salesforce, Satoshi Nakamoto, Second Machine Age, security theater, self-driving car, shareholder value, Sheryl Sandberg, Silicon Valley, Silicon Valley startup, SimCity, Skype, smart cities, smart grid, smart meter, Snapchat, social graph, SoftBank, software as a service, speech recognition, stealth mode startup, Stephen Hawking, Steve Jobs, Steve Wozniak, strong AI, Stuxnet, subscription business, supply-chain management, synthetic biology, tech worker, technological singularity, TED Talk, telepresence, telepresence robot, Tesla Model S, The future is already here, The Future of Employment, the long tail, The Wisdom of Crowds, Tim Cook: Apple, trade route, uranium enrichment, Virgin Galactic, Wall-E, warehouse robotics, Watson beat the top human players on Jeopardy!, Wave and Pay, We are Anonymous. We are Legion, web application, Westphalian system, WikiLeaks, Y Combinator, you are the product, zero day

Previously, such high-tech gear would only have resided in a spy agency or with the FBI, but now, given the exponential drop in pricing of these technologies, even a neighborhood mom can spy on her kids or a potentially cheating spouse. In the world of big data, we can even leak our physical location without a bugged mobile phone or GPS tracker hidden in our car. A new technology, known as an automatic license plate reader (ALPR), allows both governments and individuals to use video cameras and optical character recognition to record the locations of cars as they pass from one camera point to another, revealing the real-time movement of any vehicle throughout a city or country in great detail. From Minnesota to New Jersey, and from Ankara to Sydney, hundreds of millions of individual license plate records have been stored.

pages: 669 words: 210,153

Tools of Titans: The Tactics, Routines, and Habits of Billionaires, Icons, and World-Class Performers by Timothy Ferriss

Abraham Maslow, Adam Curtis, Airbnb, Alexander Shulgin, Alvin Toffler, An Inconvenient Truth, artificial general intelligence, asset allocation, Atul Gawande, augmented reality, back-to-the-land, Ben Horowitz, Bernie Madoff, Bertrand Russell: In Praise of Idleness, Beryl Markham, billion-dollar mistake, Black Swan, Blue Bottle Coffee, Blue Ocean Strategy, blue-collar work, book value, Boris Johnson, Buckminster Fuller, business process, Cal Newport, call centre, caloric restriction, Carl Icahn, Charles Lindbergh, Checklist Manifesto, cognitive bias, cognitive dissonance, Colonization of Mars, Columbine, commoditize, correlation does not imply causation, CRISPR, David Brooks, David Graeber, deal flow, digital rights, diversification, diversified portfolio, do what you love, Donald Trump, effective altruism, Elon Musk, fail fast, fake it until you make it, fault tolerance, fear of failure, Firefox, follow your passion, fulfillment center, future of work, Future Shock, Girl Boss, Google X / Alphabet X, growth hacking, Howard Zinn, Hugh Fearnley-Whittingstall, Jeff Bezos, job satisfaction, Johann Wolfgang von Goethe, John Markoff, Kevin Kelly, Kickstarter, Lao Tzu, lateral thinking, life extension, lifelogging, Mahatma Gandhi, Marc Andreessen, Mark Zuckerberg, Mason jar, Menlo Park, microdosing, Mikhail Gorbachev, MITM: man-in-the-middle, Neal Stephenson, Nelson Mandela, Nicholas Carr, Nick Bostrom, off-the-grid, optical character recognition, PageRank, Paradox of Choice, passive income, pattern recognition, Paul Graham, peer-to-peer, Peter H. Diamandis: Planetary Resources, Peter Singer: altruism, Peter Thiel, phenotype, PIHKAL and TIHKAL, post scarcity, post-work, power law, premature optimization, private spaceflight, QWERTY keyboard, Ralph Waldo Emerson, Ray Kurzweil, recommendation engine, rent-seeking, Richard Feynman, risk tolerance, Ronald Reagan, Salesforce, selection bias, sharing economy, side project, Silicon Valley, skunkworks, Skype, Snapchat, Snow Crash, social graph, software as a service, software is eating the world, stem cell, Stephen Hawking, Steve Jobs, Stewart Brand, superintelligent machines, TED Talk, Tesla Model S, The future is already here, the long tail, The Wisdom of Crowds, Thomas L Friedman, traumatic brain injury, trolley problem, vertical integration, Wall-E, Washington Consensus, We are as Gods, Whole Earth Catalog, Y Combinator, zero-sum game

I copy them from that page and paste them into an Evernote file to have all of my notes on a specific book in one place. I also take a screen grab of a specific iPad Kindle page with my highlighted passage, and then email that screen grab into my Evernote email because Evernote has, as you know, optical character recognition. So, when I search within it, it’s also going to search the text in that image. I don’t have to wait until I finish the book to explore all my notes. . . . I love Evernote. I’ve been using it for many years, and I could probably not get through my day without it.” If Maria is reading a paper book and adding her notes in the margins (what she calls “marginalia”), she’ll sometimes add “BL” to indicate “beautiful language.”

pages: 678 words: 216,204

The Wealth of Networks: How Social Production Transforms Markets and Freedom by Yochai Benkler

affirmative action, AOL-Time Warner, barriers to entry, bioinformatics, Brownian motion, business logic, call centre, Cass Sunstein, centre right, clean water, commoditize, commons-based peer production, dark matter, desegregation, digital divide, East Village, Eben Moglen, fear of failure, Firefox, Free Software Foundation, game design, George Gilder, hiring and firing, Howard Rheingold, informal economy, information asymmetry, information security, invention of radio, Isaac Newton, iterative process, Jean Tirole, jimmy wales, John Markoff, John Perry Barlow, Kenneth Arrow, Lewis Mumford, longitudinal study, machine readable, Mahbub ul Haq, market bubble, market clearing, Marshall McLuhan, Mitch Kapor, New Journalism, optical character recognition, pattern recognition, peer-to-peer, power law, precautionary principle, pre–internet, price discrimination, profit maximization, profit motive, public intellectual, radical decentralization, random walk, Recombinant DNA, recommendation engine, regulatory arbitrage, rent-seeking, RFID, Richard Stallman, Ronald Coase, scientific management, search costs, Search for Extraterrestrial Intelligence, SETI@home, shareholder value, Silicon Valley, Skype, slashdot, social software, software patent, spectrum auction, subscription business, tacit knowledge, technological determinism, technoutopianism, The Fortune at the Bottom of the Pyramid, the long tail, The Nature of the Firm, the strength of weak ties, Timothy McVeigh, transaction costs, vertical integration, Vilfredo Pareto, work culture, Yochai Benkler

The volunteer submits a copy of the title page of the book to Michael Hart, who founded the project, for copyright research. The volunteer is notified to proceed if the book passes copyright clearance. The decision on which book to convert to e-text is left up to the volunteer, subject to copyright limitations. Typically, a volunteer converts a book to ASCII format using OCR (optical character recognition) and proofreads it once to screen it for major errors. He or she then passes the ASCII file to a volunteer proofreader. This exchange is orchestrated with very little supervision. The volunteers use a Listserv mailing list and a bulletin board to initiate and supervise the exchange.

pages: 761 words: 231,902

The Singularity Is Near: When Humans Transcend Biology by Ray Kurzweil

additive manufacturing, AI winter, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, Albert Einstein, anthropic principle, Any sufficiently advanced technology is indistinguishable from magic, artificial general intelligence, Asilomar, augmented reality, autonomous vehicles, backpropagation, Benoit Mandelbrot, Bill Joy: nanobots, bioinformatics, brain emulation, Brewster Kahle, Brownian motion, business cycle, business intelligence, c2.com, call centre, carbon-based life, cellular automata, Charles Babbage, Claude Shannon: information theory, complexity theory, conceptual framework, Conway's Game of Life, coronavirus, cosmological constant, cosmological principle, cuban missile crisis, data acquisition, Dava Sobel, David Brooks, Dean Kamen, digital divide, disintermediation, double helix, Douglas Hofstadter, en.wikipedia.org, epigenetics, factory automation, friendly AI, functional programming, George Gilder, Gödel, Escher, Bach, Hans Moravec, hype cycle, informal economy, information retrieval, information security, invention of the telephone, invention of the telescope, invention of writing, iterative process, Jaron Lanier, Jeff Bezos, job automation, job satisfaction, John von Neumann, Kevin Kelly, Law of Accelerating Returns, life extension, lifelogging, linked data, Loebner Prize, Louis Pasteur, mandelbrot fractal, Marshall McLuhan, Mikhail Gorbachev, Mitch Kapor, mouse model, Murray Gell-Mann, mutually assured destruction, natural language processing, Network effects, new economy, Nick Bostrom, Norbert Wiener, oil shale / tar sands, optical character recognition, PalmPilot, pattern recognition, phenotype, power law, precautionary principle, premature optimization, punch-card reader, quantum cryptography, quantum entanglement, radical life extension, randomized controlled trial, Ray Kurzweil, remote working, reversible computing, Richard Feynman, Robert Metcalfe, Rodney Brooks, scientific worldview, Search for Extraterrestrial Intelligence, selection bias, semantic web, seminal paper, Silicon Valley, Singularitarianism, speech recognition, statistical model, stem cell, Stephen Hawking, Stewart Brand, strong AI, Stuart Kauffman, superintelligent machines, technological singularity, Ted Kaczynski, telepresence, The Coming Technological Singularity, Thomas Bayes, transaction costs, Turing machine, Turing test, two and twenty, Vernor Vinge, Y2K, Yogi Berra

There are tens of thousands of projects that are advancing the various aspects of the law of accelerating returns in diverse incremental ways. Regardless of near-term business cycles, support for "high tech" in the business community, and in particular for software development, has grown enormously. When I started my optical character recognition (OCR) and speech-synthesis company (Kurzweil Computer Products) in 1974, high-tech venture deals in the United States totaled less than thirty million dollars (in 1974 dollars). Even during the recent high-tech recession (2000–2003), the figure was almost one hundred times greater.79 We would have to repeal capitalism and every vestige of economic competition to stop this progression.

pages: 825 words: 228,141

MONEY Master the Game: 7 Simple Steps to Financial Freedom by Tony Robbins

"World Economic Forum" Davos, 3D printing, active measures, activist fund / activist shareholder / activist investor, addicted to oil, affirmative action, Affordable Care Act / Obamacare, Albert Einstein, asset allocation, backtesting, Bear Stearns, behavioural economics, bitcoin, Black Monday: stock market crash in 1987, buy and hold, Carl Icahn, clean water, cloud computing, corporate governance, corporate raider, correlation does not imply causation, Credit Default Swap, currency risk, Dean Kamen, declining real wages, diversification, diversified portfolio, Donald Trump, estate planning, fear of failure, fiat currency, financial independence, fixed income, forensic accounting, high net worth, index fund, Internet of things, invention of the wheel, it is difficult to get a man to understand something, when his salary depends on his not understanding it, Jeff Bezos, John Bogle, junk bonds, Kenneth Rogoff, lake wobegon effect, Lao Tzu, London Interbank Offered Rate, low interest rates, Marc Benioff, market bubble, Michael Milken, money market fund, mortgage debt, Neil Armstrong, new economy, obamacare, offshore financial centre, oil shock, optical character recognition, Own Your Own Home, passive investing, profit motive, Ralph Waldo Emerson, random walk, Ray Kurzweil, Richard Thaler, risk free rate, risk tolerance, riskless arbitrage, Robert Shiller, Salesforce, San Francisco homelessness, self-driving car, shareholder value, Silicon Valley, Skype, Snapchat, sovereign wealth fund, stem cell, Steve Jobs, subscription business, survivorship bias, tail risk, TED Talk, telerobotics, The 4% rule, The future is already here, the rule of 72, thinkpad, tontine, transaction costs, Upton Sinclair, Vanguard fund, World Values Survey, X Prize, Yogi Berra, young professional, zero-sum game

If you’ve ever dictated an email to Siri or other voice-to-text systems, that’s because of Ray. I remember meeting Ray Kurzweil nearly 20 years ago and listening with amazement as he described the future. It seemed like magic then, but it’s all real now. Self-driving cars. A computer that could beat the world’s greatest chess master. He had already invented an optical character-recognition system to create the first reading machine for the blind—Stevie Wonder was his first customer. Now he wanted to help blind people read street signs and navigate cities without help, and go into restaurants and order off the menu using a little device the size of a pack of cigarettes. He told me the year it was going to happen: 2005.

Growth: From Microorganisms to Megacities by Vaclav Smil

2013 Report for America's Infrastructure - American Society of Civil Engineers - 19 March 2013, 3D printing, agricultural Revolution, air freight, Alan Greenspan, American Society of Civil Engineers: Report Card, Anthropocene, Apollo 11, Apollo Guidance Computer, autonomous vehicles, Benoit Mandelbrot, Berlin Wall, Bernie Madoff, Boeing 747, Bretton Woods, British Empire, business cycle, caloric restriction, carbon tax, circular economy, colonial rule, complexity theory, coronavirus, decarbonisation, degrowth, deindustrialization, dematerialisation, demographic dividend, demographic transition, Deng Xiaoping, disruptive innovation, Dissolution of the Soviet Union, Easter island, endogenous growth, energy transition, epigenetics, Fairchild Semiconductor, Ford Model T, general purpose technology, Gregor Mendel, happiness index / gross national happiness, Helicobacter pylori, high-speed rail, hydraulic fracturing, hydrogen economy, Hyperloop, illegal immigration, income inequality, income per capita, industrial robot, Intergovernmental Panel on Climate Change (IPCC), invention of movable type, Isaac Newton, James Watt: steam engine, knowledge economy, Kondratiev cycle, labor-force participation, Law of Accelerating Returns, longitudinal study, low interest rates, mandelbrot fractal, market bubble, mass immigration, McMansion, megacity, megaproject, megastructure, meta-analysis, microbiome, microplastics / micro fibres, moral hazard, Network effects, new economy, New Urbanism, old age dependency ratio, optical character recognition, out of africa, peak oil, Pearl River Delta, phenotype, Pierre-Simon Laplace, planetary scale, Ponzi scheme, power law, Productivity paradox, profit motive, purchasing power parity, random walk, Ray Kurzweil, Report Card for America’s Infrastructure, Republic of Letters, rolodex, Silicon Valley, Simon Kuznets, social distancing, South China Sea, synthetic biology, techno-determinism, technoutopianism, the market place, The Rise and Fall of American Growth, three-masted sailing ship, total factor productivity, trade liberalization, trade route, urban sprawl, Vilfredo Pareto, yield curve

But the rapid diffusion of electronics and software is a trivial matter compared to the expected ultimate achievements of accelerated growth, and nobody has expressed them more expansively than Ray Kurzweil, since 2012 the director of engineering at Google and long before that the inventor of such electronic devices as the charge-coupled-device (CCD) flat-bed scanner, the first commercial text-to-speech synthesizer, and the first omnifont optical character recognition. In 2001 he formulated his law of accelerating returns (Kurzweil 2001, 1): An analysis of the history of technology shows that technological change is exponential, contrary to the common-sense “intuitive linear” view. So we won’t experience 100 years of progress in the 21st century; it will be more like 20,000 years of progress (at today’s rate).

pages: 889 words: 433,897

The Best of 2600: A Hacker Odyssey by Emmanuel Goldstein

affirmative action, Apple II, benefit corporation, call centre, disinformation, don't be evil, Firefox, game design, Hacker Ethic, hiring and firing, information retrieval, information security, John Markoff, John Perry Barlow, late fees, license plate recognition, Mitch Kapor, MITM: man-in-the-middle, Oklahoma City bombing, optical character recognition, OSI model, packet switching, pirate software, place-making, profit motive, QWERTY keyboard, RFID, Robert Hanssen: Double agent, rolodex, Ronald Reagan, satellite internet, Silicon Valley, Skype, spectrum auction, statistical model, Steve Jobs, Steve Wozniak, Steven Levy, Telecommunications Act of 1996, telemarketer, undersea cable, UUNET, Y2K

If there are different speed limits for trucks and cars, this is how they can be differentiated. The resultant freeze frame will be automatically processed to produce a printed picture of your vehicle from the rear, showing your license plate, and the image is then imprinted with your vehicle’s speed, the date, and time. AT&T is above 95 percent accuracy in doing optical character recognition on your license plate and automatically entering the plate number into the computer system. Imagine how easy those European license plates must be for OCR. Now if we could just standardize the print and colors used on U.S. plates.… Not uncommonly, a second camera will simultaneously take a photo of the driver.