iterative process

95 results

pages: 893 words: 199,542

Structure and Interpretation of Computer Programs by Harold Abelson, Gerald Jay Sussman, Julie Sussman


Andrew Wiles, conceptual framework, Donald Knuth, Douglas Hofstadter, Eratosthenes, Fermat's Last Theorem, Gödel, Escher, Bach, industrial robot, information retrieval, iterative process, loose coupling, probability theory / Blaise Pascal / Pierre de Fermat, Richard Stallman, Turing machine

By contrast, the second process does not grow and shrink. At each step, all we need to keep track of, for any n, are the current values of the variables product, counter, and max-count. We call this an iterative process. In general, an iterative process is one whose state can be summarized by a fixed number of state variables, together with a fixed rule that describes how the state variables should be updated as the process moves from state to state and an (optional) end test that specifies conditions under which the process should terminate. In computing n!, the number of steps required grows linearly with n. Such a process is called a linear iterative process. The contrast between the two processes can be seen in another way. In the iterative case, the program variables provide a complete description of the state of the process at any point.
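The three-variable state the passage describes transcribes directly into a loop; a minimal sketch in Python (the variable names follow the text; the function name is my own):

```python
# Linear iterative process for n!: the complete state is the three
# variables product, counter, and max_count, updated by a fixed rule
# until the end test fires. Steps grow linearly with n; space is constant.

def factorial(n):
    product, counter, max_count = 1, 1, n
    while counter <= max_count:                            # end test
        product, counter = product * counter, counter + 1  # update rule
    return product
```

At any point the pair (product, counter) is a complete description of the computation: stop the process, restart it from those values, and nothing is lost.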

It may seem disturbing that we refer to a recursive procedure such as fact-iter as generating an iterative process. However, the process really is iterative: Its state is captured completely by its three state variables, and an interpreter need keep track of only three variables in order to execute the process. One reason that the distinction between process and procedure may be confusing is that most implementations of common languages (including Ada, Pascal, and C) are designed in such a way that the interpretation of any recursive procedure consumes an amount of memory that grows with the number of procedure calls, even when the process described is, in principle, iterative. As a consequence, these languages can describe iterative processes only by resorting to special-purpose “looping constructs” such as do, repeat, until, for, and while.

The implementation of Scheme we shall consider in chapter 5 does not share this defect. It will execute an iterative process in constant space, even if the iterative process is described by a recursive procedure. An implementation with this property is called tail-recursive. With a tail-recursive implementation, iteration can be expressed using the ordinary procedure call mechanism, so that special iteration constructs are useful only as syntactic sugar.

Exercise 1.9. Each of the following two procedures defines a method for adding two positive integers in terms of the procedures inc, which increments its argument by 1, and dec, which decrements its argument by 1.

(define (+ a b) (if (= a 0) b (inc (+ (dec a) b))))

(define (+ a b) (if (= a 0) b (+ (dec a) (inc b))))

Using the substitution model, illustrate the process generated by each procedure in evaluating (+ 4 5).
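The two procedures of Exercise 1.9 can also be transcribed into Python for comparison (the function names are mine; note that Python, like the Ada, Pascal, and C implementations mentioned above, is not tail-recursive, so both versions consume stack space proportional to a):

```python
def inc(x):
    return x + 1

def dec(x):
    return x - 1

# First procedure: inc is applied *after* the recursive call returns, so a
# chain of deferred inc operations builds up -- a linear recursive process.
def plus_recursive(a, b):
    return b if a == 0 else inc(plus_recursive(dec(a), b))

# Second procedure: the recursive call is the last thing done and (a, b)
# capture the state completely -- an iterative process:
# (4, 5) -> (3, 6) -> (2, 7) -> (1, 8) -> (0, 9)
def plus_iterative(a, b):
    return b if a == 0 else plus_iterative(dec(a), inc(b))
```

Working the substitution by hand for plus_iterative shows exactly the fixed-size state transition sketched in the comment.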

pages: 1,387 words: 202,295

Structure and Interpretation of Computer Programs, Second Edition by Harold Abelson, Gerald Jay Sussman, Julie Sussman


Andrew Wiles, conceptual framework, Donald Knuth, Douglas Hofstadter, Eratosthenes, Gödel, Escher, Bach, industrial robot, information retrieval, iterative process, loose coupling, probability theory / Blaise Pascal / Pierre de Fermat, Richard Stallman, Turing machine, wikimedia commons

By contrast, the second process does not grow and shrink. At each step, all we need to keep track of, for any n, are the current values of the variables product, counter, and max-count. We call this an iterative process. In general, an iterative process is one whose state can be summarized by a fixed number of state variables, together with a fixed rule that describes how the state variables should be updated as the process moves from state to state and an (optional) end test that specifies conditions under which the process should terminate. In computing n!, the number of steps required grows linearly with n. Such a process is called a linear iterative process. The contrast between the two processes can be seen in another way. In the iterative case, the program variables provide a complete description of the state of the process at any point.

It may seem disturbing that we refer to a recursive procedure such as fact-iter as generating an iterative process. However, the process really is iterative: Its state is captured completely by its three state variables, and an interpreter need keep track of only three variables in order to execute the process. One reason that the distinction between process and procedure may be confusing is that most implementations of common languages (including Ada, Pascal, and C) are designed in such a way that the interpretation of any recursive procedure consumes an amount of memory that grows with the number of procedure calls, even when the process described is, in principle, iterative. As a consequence, these languages can describe iterative processes only by resorting to special-purpose “looping constructs” such as do, repeat, until, for, and while.

The implementation of Scheme we shall consider in Chapter 5 does not share this defect. It will execute an iterative process in constant space, even if the iterative process is described by a recursive procedure. An implementation with this property is called tail-recursive. With a tail-recursive implementation, iteration can be expressed using the ordinary procedure call mechanism, so that special iteration constructs are useful only as syntactic sugar.

Exercise 1.9: Each of the following two procedures defines a method for adding two positive integers in terms of the procedures inc, which increments its argument by 1, and dec, which decrements its argument by 1.

(define (+ a b) (if (= a 0) b (inc (+ (dec a) b))))

(define (+ a b) (if (= a 0) b (+ (dec a) (inc b))))

Using the substitution model, illustrate the process generated by each procedure in evaluating (+ 4 5).

pages: 132 words: 31,976

Getting Real by Jason Fried, David Heinemeier Hansson, Matthew Linderman, 37 Signals


call centre, collaborative editing, David Heinemeier Hansson, iterative process, John Gruber, knowledge worker, Merlin Mann, Metcalfe's law, performance metric, premature optimization, Ruby on Rails, slashdot, Steve Jobs, web application

—Matt Hamer, developer and product manager, Kinja

Rinse and Repeat

Work in iterations. Don't expect to get it right the first time. Let the app grow and speak to you. Let it morph and evolve. With web-based software there's no need to ship perfection. Design screens, use them, analyze them, and then start over again. Instead of banking on getting everything right upfront, the iterative process lets you continue to make informed decisions as you go along. Plus, you'll get an active app up and running quicker since you're not striving for perfection right out the gate. The result is real feedback and real guidance on what requires your attention.

Iterations lead to liberation

You don't need to aim for perfection on the first try if you know it's just going to be done again later anyway.

No One's Going to Read It

I can't even count how many multi-page product specifications or business requirement documents have languished, unread, gathering dust near my dev team while we coded away, discussing problems, asking questions, and user testing as we went. I've even worked with developers who've spent hours writing long, descriptive emails or coding standards documents that also went unread. Webapps don't move forward with copious documentation. Software development is a constantly shifting, iterative process that involves interaction, snap decisions, and impossible-to-predict issues that crop up along the way. None of this can or should be captured on paper. Don't waste your time typing up that long visionary tome; no one's going to read it. Take consolation in the fact that if you give your product enough room to grow itself, in the end it won't resemble anything you wrote about anyway.

—Gina Trapani, web developer and editor of Lifehacker, the productivity and software guide

Tell Me a Quick Story

Write stories, not details. If you do find yourself requiring words to explain a new feature or concept, write a brief story about it.

pages: 1,758 words: 342,766

Code Complete (Developer Best Practices) by Steve McConnell


Ada Lovelace, Albert Einstein, Buckminster Fuller, call centre, choice architecture, continuous integration, data acquisition, database schema, don't repeat yourself, Donald Knuth, fault tolerance, Grace Hopper, haute cuisine, if you see hoof prints, think horses—not zebras, index card, inventory management, iterative process, Larry Wall, late fees, loose coupling, Menlo Park, Perl 6, place-making, premature optimization, revision control, Sapir-Whorf hypothesis, slashdot, sorting algorithm, statistical model, Tacoma Narrows Bridge, the scientific method, Thomas Kuhn: the structure of scientific revolutions, Turing machine, web application

The quality of the thinking that goes into a program largely determines the quality of the program, so paying attention to warnings about the quality of thinking directly affects the final product.

34.8. Iterate, Repeatedly, Again and Again

Iteration is appropriate for many software-development activities. During your initial specification of a system, you work with the user through several versions of requirements until you're sure you agree on them. That's an iterative process. When you build flexibility into your process by building and delivering a system in several increments, that's an iterative process. If you use prototyping to develop several alternative solutions quickly and cheaply before crafting the final product, that's another form of iteration. Iterating on requirements is perhaps as important as any other aspect of the software-development process. Projects fail because they commit themselves to a solution before exploring alternatives.

—Scott Meyers

The interface to a class should reveal as little as possible about its inner workings. As shown in Figure 5-9, a class is a lot like an iceberg: seven-eighths is under water, and you can see only the one-eighth that's above the surface.

Figure 5-9. A good class interface is like the tip of an iceberg, leaving most of the class unexposed

Designing the class interface is an iterative process just like any other aspect of design. If you don't get the interface right the first time, try a few more times until it stabilizes. If it doesn't stabilize, you need to try a different approach.

An Example of Information Hiding

Suppose you have a program in which each object is supposed to have a unique ID stored in a member variable called id. One design approach would be to use integers for the IDs and to store the highest ID assigned so far in a global variable called g_maxId.
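McConnell develops the example further in the book; as a sketch of the information-hiding alternative to a global g_maxId (every name other than id and g_maxId is my own invention), in Python:

```python
# Information hiding applied to unique IDs: rather than letting every
# caller read and write a global g_maxId, the allocation policy lives
# behind one narrow interface. Switching later to, say, UUIDs would
# touch only this class, not every call site.

class IdGenerator:
    def __init__(self):
        self._max_id = 0   # hidden state: the seven-eighths below the waterline

    def new_id(self):
        self._max_id += 1
        return self._max_id


class Widget:
    _ids = IdGenerator()   # one shared generator instead of a global variable

    def __init__(self):
        self.id = Widget._ids.new_id()
```

Callers see only new_id(); the highest-ID-so-far counter stays unexposed, like the iceberg's submerged bulk.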

Iterate You might have had an experience in which you learned so much from writing a program that you wished you could write it again, armed with the insights you gained from writing it the first time. The same phenomenon applies to design, but the design cycles are shorter and the effects downstream are bigger, so you can afford to whirl through the design loop a few times. Design is an iterative process. You don't usually go from point A only to point B; you go from point A to point B and back to point A. As you cycle through candidate designs and try different approaches, you'll look at both high-level and low-level views. The big picture you get from working with high-level issues will help you to put the low-level details in perspective. The details you get from working with low-level issues will provide a foundation in solid reality for the high-level decisions.

Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable and Maintainable Systems by Martin Kleppmann, O'Reilly (2017)

active measures, Amazon Web Services, bitcoin, blockchain, business intelligence, business process, cloud computing, collaborative editing, commoditize, conceptual framework, cryptocurrency, database schema, DevOps, distributed ledger, Donald Knuth, Edward Snowden, ethereum blockchain, fault tolerance, finite state, Flash crash, full text search, general-purpose programming language, informal economy, information retrieval, Internet of things, iterative process, John von Neumann, loose coupling, Marc Andreessen, natural language processing, Network effects, packet switching, peer-to-peer, performance metric, place-making, premature optimization, recommendation engine, Richard Feynman, self-driving car, semantic web, Shoshana Zuboff, social graph, social web, software as a service, software is eating the world, sorting algorithm, source of truth, SPARQL, speech recognition, statistical model, web application, WebSocket, wikimedia commons

Batch Processing
  Batch Processing with Unix Tools: Simple Log Analysis; The Unix Philosophy
  MapReduce and Distributed Filesystems: MapReduce Job Execution; Reduce-Side Joins and Grouping; Map-Side Joins; The Output of Batch Workflows; Comparing Hadoop to Distributed Databases
  Beyond MapReduce: Materialization of Intermediate State; Graphs and Iterative Processing; High-Level APIs and Languages
  Summary
11. Stream Processing
  Transmitting Event Streams: Messaging Systems; Partitioned Logs
  Databases and Streams: Keeping Systems in Sync; Change Data Capture; Event Sourcing; State, Streams, and Immutability
  Processing Streams: Uses of Stream Processing; Reasoning About Time; Stream Joins; Fault Tolerance
  Summary
12.

When the job completes, its output needs to go somewhere durable so that users can find it and use it—most likely, it is written to the distributed filesystem again. Thus, when using a dataflow engine, materialized datasets on HDFS are still usually the inputs and the final outputs of a job. Like with MapReduce, the inputs are immutable and the output is completely replaced. The improvement over MapReduce is that you save yourself writing all the intermediate state to the filesystem as well.

Graphs and Iterative Processing

In “Graph-Like Data Models” on page 49 we discussed using graphs for modeling data, and using graph query languages to traverse the edges and vertices in a graph. The discussion in Chapter 2 was focused around OLTP-style use: quickly executing queries to find a small number of vertices matching certain criteria. It is also interesting to look at graphs in a batch processing context, where the goal is to perform some kind of offline processing or analysis on an entire graph.
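As a toy illustration of whole-graph iterative processing (not code from the book; the graph, damping factor, and iteration count are invented for the example), a PageRank-style loop in Python:

```python
# Repeatedly propagate rank along the edges for a fixed number of
# iterations. Pregel-style systems run the same "superstep until done"
# loop, but partitioned across machines, with operator state kept
# durable between iterations rather than in local variables.

def pagerank(edges, iterations=50, d=0.85):
    nodes = {n for edge in edges for n in edge}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    out_degree = {n: sum(1 for src, _ in edges if src == n) for n in nodes}
    for _ in range(iterations):
        incoming = dict.fromkeys(nodes, 0.0)
        for src, dst in edges:
            # every node in this example graph has outgoing edges,
            # so out_degree is never zero here
            incoming[dst] += rank[src] / out_degree[src]
        rank = {n: (1 - d) / len(nodes) + d * incoming[n] for n in nodes}
    return rank

ranks = pagerank([("a", "b"), ("b", "c"), ("c", "a"), ("a", "c")])
```

With no dangling nodes, the total rank stays at 1.0 across iterations; the loop body is exactly the per-superstep work a distributed engine would checkpoint.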

The opposite of bounded. 558 | Glossary Index A aborts (transactions), 222, 224 in two-phase commit, 356 performance of optimistic concurrency con‐ trol, 266 retrying aborted transactions, 231 abstraction, 21, 27, 222, 266, 321 access path (in network model), 37, 60 accidental complexity, removing, 21 accountability, 535 ACID properties (transactions), 90, 223 atomicity, 223, 228 consistency, 224, 529 durability, 226 isolation, 225, 228 acknowledgements (messaging), 445 active/active replication (see multi-leader repli‐ cation) active/passive replication (see leader-based rep‐ lication) ActiveMQ (messaging), 137, 444 distributed transaction support, 361 ActiveRecord (object-relational mapper), 30, 232 actor model, 138 (see also message-passing) comparison to Pregel model, 425 comparison to stream processing, 468 Advanced Message Queuing Protocol (see AMQP) aerospace systems, 6, 10, 305, 372 aggregation data cubes and materialized views, 101 in batch processes, 406 in stream processes, 466 aggregation pipeline query language, 48 Agile, 22 minimizing irreversibility, 414, 497 moving faster with confidence, 532 Unix philosophy, 394 agreement, 365 (see also consensus) Airflow (workflow scheduler), 402 Ajax, 131 Akka (actor framework), 139 algorithms algorithm correctness, 308 B-trees, 79-83 for distributed systems, 306 hash indexes, 72-75 mergesort, 76, 402, 405 red-black trees, 78 SSTables and LSM-trees, 76-79 all-to-all replication topologies, 175 AllegroGraph (database), 50 ALTER TABLE statement (SQL), 40, 111 Amazon Dynamo (database), 177 Amazon Web Services (AWS), 8 Kinesis Streams (messaging), 448 network reliability, 279 postmortems, 9 RedShift (database), 93 S3 (object storage), 398 checking data integrity, 530 amplification of bias, 534 of failures, 364, 495 Index | 559 of tail latency, 16, 207 write amplification, 84 AMQP (Advanced Message Queuing Protocol), 444 (see also messaging systems) comparison to log-based messaging, 448, 451 message ordering, 446 
analytics, 90 comparison to transaction processing, 91 data warehousing (see data warehousing) parallel query execution in MPP databases, 415 predictive (see predictive analytics) relation to batch processing, 411 schemas for, 93-95 snapshot isolation for queries, 238 stream analytics, 466 using MapReduce, analysis of user activity events (example), 404 anti-caching (in-memory databases), 89 anti-entropy, 178 Apache ActiveMQ (see ActiveMQ) Apache Avro (see Avro) Apache Beam (see Beam) Apache BookKeeper (see BookKeeper) Apache Cassandra (see Cassandra) Apache CouchDB (see CouchDB) Apache Curator (see Curator) Apache Drill (see Drill) Apache Flink (see Flink) Apache Giraph (see Giraph) Apache Hadoop (see Hadoop) Apache HAWQ (see HAWQ) Apache HBase (see HBase) Apache Helix (see Helix) Apache Hive (see Hive) Apache Impala (see Impala) Apache Jena (see Jena) Apache Kafka (see Kafka) Apache Lucene (see Lucene) Apache MADlib (see MADlib) Apache Mahout (see Mahout) Apache Oozie (see Oozie) Apache Parquet (see Parquet) Apache Qpid (see Qpid) Apache Samza (see Samza) Apache Solr (see Solr) Apache Spark (see Spark) 560 | Index Apache Storm (see Storm) Apache Tajo (see Tajo) Apache Tez (see Tez) Apache Thrift (see Thrift) Apache ZooKeeper (see ZooKeeper) Apama (stream analytics), 466 append-only B-trees, 82, 242 append-only files (see logs) Application Programming Interfaces (APIs), 5, 27 for batch processing, 403 for change streams, 456 for distributed transactions, 361 for graph processing, 425 for services, 131-136 (see also services) evolvability, 136 RESTful, 133 SOAP, 133 application state (see state) approximate search (see similarity search) archival storage, data from databases, 131 arcs (see edges) arithmetic mean, 14 ASCII text, 119, 395 ASN.1 (schema language), 127 asynchronous networks, 278, 553 comparison to synchronous networks, 284 formal model, 307 asynchronous replication, 154, 553 conflict detection, 172 data loss on failover, 157 reads from asynchronous 
follower, 162 Asynchronous Transfer Mode (ATM), 285 atomic broadcast (see total order broadcast) atomic clocks (caesium clocks), 294, 295 (see also clocks) atomicity (concurrency), 553 atomic increment-and-get, 351 compare-and-set, 245, 327 (see also compare-and-set operations) replicated operations, 246 write operations, 243 atomicity (transactions), 223, 228, 553 atomic commit, 353 avoiding, 523, 528 blocking and nonblocking, 359 in stream processing, 360, 477 maintaining derived data, 453 for multi-object transactions, 229 for single-object writes, 230 auditability, 528-533 designing for, 531 self-auditing systems, 530 through immutability, 460 tools for auditable data systems, 532 availability, 8 (see also fault tolerance) in CAP theorem, 337 in service level agreements (SLAs), 15 Avro (data format), 122-127 code generation, 127 dynamically generated schemas, 126 object container files, 125, 131, 414 reader determining writer’s schema, 125 schema evolution, 123 use in Hadoop, 414 awk (Unix tool), 391 AWS (see Amazon Web Services) Azure (see Microsoft) B B-trees (indexes), 79-83 append-only/copy-on-write variants, 82, 242 branching factor, 81 comparison to LSM-trees, 83-85 crash recovery, 82 growing by splitting a page, 81 optimizations, 82 similarity to dynamic partitioning, 212 backpressure, 441, 553 in TCP, 282 backups database snapshot for replication, 156 integrity of, 530 snapshot isolation for, 238 use for ETL processes, 405 backward compatibility, 112 BASE, contrast to ACID, 223 bash shell (Unix), 70, 395, 503 batch processing, 28, 389-431, 553 combining with stream processing lambda architecture, 497 unifying technologies, 498 comparison to MPP databases, 414-418 comparison to stream processing, 464 comparison to Unix, 413-414 dataflow engines, 421-423 fault tolerance, 406, 414, 422, 442 for data integration, 494-498 graphs and iterative processing, 424-426 high-level APIs and languages, 403, 426-429 log-based messaging and, 451 maintaining derived 
state, 495 MapReduce and distributed filesystems, 397-413 (see also MapReduce) measuring performance, 13, 390 outputs, 411-413 key-value stores, 412 search indexes, 411 using Unix tools (example), 391-394 Bayou (database), 522 Beam (dataflow library), 498 bias, 534 big ball of mud, 20 Bigtable data model, 41, 99 binary data encodings, 115-128 Avro, 122-127 MessagePack, 116-117 Thrift and Protocol Buffers, 117-121 binary encoding based on schemas, 127 by network drivers, 128 binary strings, lack of support in JSON and XML, 114 BinaryProtocol encoding (Thrift), 118 Bitcask (storage engine), 72 crash recovery, 74 Bitcoin (cryptocurrency), 532 Byzantine fault tolerance, 305 concurrency bugs in exchanges, 233 bitmap indexes, 97 blockchains, 532 Byzantine fault tolerance, 305 blocking atomic commit, 359 Bloom (programming language), 504 Bloom filter (algorithm), 79, 466 BookKeeper (replicated log), 372 Bottled Water (change data capture), 455 bounded datasets, 430, 439, 553 (see also batch processing) bounded delays, 553 in networks, 285 process pauses, 298 broadcast hash joins, 409 Index | 561 brokerless messaging, 442 Brubeck (metrics aggregator), 442 BTM (transaction coordinator), 356 bulk synchronous parallel (BSP) model, 425 bursty network traffic patterns, 285 business data processing, 28, 90, 390 byte sequence, encoding data in, 112 Byzantine faults, 304-306, 307, 553 Byzantine fault-tolerant systems, 305, 532 Byzantine Generals Problem, 304 consensus algorithms and, 366 C caches, 89, 553 and materialized views, 101 as derived data, 386, 499-504 database as cache of transaction log, 460 in CPUs, 99, 338, 428 invalidation and maintenance, 452, 467 linearizability, 324 CAP theorem, 336-338, 554 Cascading (batch processing), 419, 427 hash joins, 409 workflows, 403 cascading failures, 9, 214, 281 Cascalog (batch processing), 60 Cassandra (database) column-family data model, 41, 99 compaction strategy, 79 compound primary key, 204 gossip protocol, 216 hash 
partitioning, 203-205 last-write-wins conflict resolution, 186, 292 leaderless replication, 177 linearizability, lack of, 335 log-structured storage, 78 multi-datacenter support, 184 partitioning scheme, 213 secondary indexes, 207 sloppy quorums, 184 cat (Unix tool), 391 causal context, 191 (see also causal dependencies) causal dependencies, 186-191 capturing, 191, 342, 494, 514 by total ordering, 493 causal ordering, 339 in transactions, 262 sending message to friends (example), 494 562 | Index causality, 554 causal ordering, 339-343 linearizability and, 342 total order consistent with, 344, 345 consistency with, 344-347 consistent snapshots, 340 happens-before relationship, 186 in serializable transactions, 262-265 mismatch with clocks, 292 ordering events to capture, 493 violations of, 165, 176, 292, 340 with synchronized clocks, 294 CEP (see complex event processing) certificate transparency, 532 chain replication, 155 linearizable reads, 351 change data capture, 160, 454 API support for change streams, 456 comparison to event sourcing, 457 implementing, 454 initial snapshot, 455 log compaction, 456 changelogs, 460 change data capture, 454 for operator state, 479 generating with triggers, 455 in stream joins, 474 log compaction, 456 maintaining derived state, 452 Chaos Monkey, 7, 280 checkpointing in batch processors, 422, 426 in high-performance computing, 275 in stream processors, 477, 523 chronicle data model, 458 circuit-switched networks, 284 circular buffers, 450 circular replication topologies, 175 clickstream data, analysis of, 404 clients calling services, 131 pushing state changes to, 512 request routing, 214 stateful and offline-capable, 170, 511 clocks, 287-299 atomic (caesium) clocks, 294, 295 confidence interval, 293-295 for global snapshots, 294 logical (see logical clocks) skew, 291-294, 334 slewing, 289 synchronization and accuracy, 289-291 synchronization using GPS, 287, 290, 294, 295 time-of-day versus monotonic clocks, 288 timestamping 
events, 471 cloud computing, 146, 275 need for service discovery, 372 network glitches, 279 shared resources, 284 single-machine reliability, 8 Cloudera Impala (see Impala) clustered indexes, 86 CODASYL model, 36 (see also network model) code generation with Avro, 127 with Thrift and Protocol Buffers, 118 with WSDL, 133 collaborative editing multi-leader replication and, 170 column families (Bigtable), 41, 99 column-oriented storage, 95-101 column compression, 97 distinction between column families and, 99 in batch processors, 428 Parquet, 96, 131, 414 sort order in, 99-100 vectorized processing, 99, 428 writing to, 101 comma-separated values (see CSV) command query responsibility segregation (CQRS), 462 commands (event sourcing), 459 commits (transactions), 222 atomic commit, 354-355 (see also atomicity; transactions) read committed isolation, 234 three-phase commit (3PC), 359 two-phase commit (2PC), 355-359 commutative operations, 246 compaction of changelogs, 456 (see also log compaction) for stream operator state, 479 of log-structured storage, 73 issues with, 84 size-tiered and leveled approaches, 79 CompactProtocol encoding (Thrift), 119 compare-and-set operations, 245, 327 implementing locks, 370 implementing uniqueness constraints, 331 implementing with total order broadcast, 350 relation to consensus, 335, 350, 352, 374 relation to transactions, 230 compatibility, 112, 128 calling services, 136 properties of encoding formats, 139 using databases, 129-131 using message-passing, 138 compensating transactions, 355, 461, 526 complex event processing (CEP), 465 complexity distilling in theoretical models, 310 hiding using abstraction, 27 of software systems, managing, 20 composing data systems (see unbundling data‐ bases) compute-intensive applications, 3, 275 concatenated indexes, 87 in Cassandra, 204 Concord (stream processor), 466 concurrency actor programming model, 138, 468 (see also message-passing) bugs from weak transaction isolation, 233 conflict 
resolution, 171, 174 detecting concurrent writes, 184-191 dual writes, problems with, 453 happens-before relationship, 186 in replicated systems, 161-191, 324-338 lost updates, 243 multi-version concurrency control (MVCC), 239 optimistic concurrency control, 261 ordering of operations, 326, 341 reducing, through event logs, 351, 462, 507 time and relativity, 187 transaction isolation, 225 write skew (transaction isolation), 246-251 conflict-free replicated datatypes (CRDTs), 174 conflicts conflict detection, 172 causal dependencies, 186, 342 in consensus algorithms, 368 in leaderless replication, 184 Index | 563 in log-based systems, 351, 521 in nonlinearizable systems, 343 in serializable snapshot isolation (SSI), 264 in two-phase commit, 357, 364 conflict resolution automatic conflict resolution, 174 by aborting transactions, 261 by apologizing, 527 convergence, 172-174 in leaderless systems, 190 last write wins (LWW), 186, 292 using atomic operations, 246 using custom logic, 173 determining what is a conflict, 174, 522 in multi-leader replication, 171-175 avoiding conflicts, 172 lost updates, 242-246 materializing, 251 relation to operation ordering, 339 write skew (transaction isolation), 246-251 congestion (networks) avoidance, 282 limiting accuracy of clocks, 293 queueing delays, 282 consensus, 321, 364-375, 554 algorithms, 366-368 preventing split brain, 367 safety and liveness properties, 365 using linearizable operations, 351 cost of, 369 distributed transactions, 352-375 in practice, 360-364 two-phase commit, 354-359 XA transactions, 361-364 impossibility of, 353 membership and coordination services, 370-373 relation to compare-and-set, 335, 350, 352, 374 relation to replication, 155, 349 relation to uniqueness constraints, 521 consistency, 224, 524 across different databases, 157, 452, 462, 492 causal, 339-348, 493 consistent prefix reads, 165-167 consistent snapshots, 156, 237-242, 294, 455, 500 (see also snapshots) 564 | Index crash recovery, 82 
enforcing constraints (see constraints) eventual, 162, 322 (see also eventual consistency) in ACID transactions, 224, 529 in CAP theorem, 337 linearizability, 324-338 meanings of, 224 monotonic reads, 164-165 of secondary indexes, 231, 241, 354, 491, 500 ordering guarantees, 339-352 read-after-write, 162-164 sequential, 351 strong (see linearizability) timeliness and integrity, 524 using quorums, 181, 334 consistent hashing, 204 consistent prefix reads, 165 constraints (databases), 225, 248 asynchronously checked, 526 coordination avoidance, 527 ensuring idempotence, 519 in log-based systems, 521-524 across multiple partitions, 522 in two-phase commit, 355, 357 relation to consensus, 374, 521 relation to event ordering, 347 requiring linearizability, 330 Consul (service discovery), 372 consumers (message streams), 137, 440 backpressure, 441 consumer offsets in logs, 449 failures, 445, 449 fan-out, 11, 445, 448 load balancing, 444, 448 not keeping up with producers, 441, 450, 502 context switches, 14, 297 convergence (conflict resolution), 172-174, 322 coordination avoidance, 527 cross-datacenter, 168, 493 cross-partition ordering, 256, 294, 348, 523 services, 330, 370-373 coordinator (in 2PC), 356 failure, 358 in XA transactions, 361-364 recovery, 363 copy-on-write (B-trees), 82, 242 CORBA (Common Object Request Broker Architecture), 134 correctness, 6 auditability, 528-533 Byzantine fault tolerance, 305, 532 dealing with partial failures, 274 in log-based systems, 521-524 of algorithm within system model, 308 of compensating transactions, 355 of consensus, 368 of derived data, 497, 531 of immutable data, 461 of personal data, 535, 540 of time, 176, 289-295 of transactions, 225, 515, 529 timeliness and integrity, 524-528 corruption of data detecting, 519, 530-533 due to pathological memory access, 529 due to radiation, 305 due to split brain, 158, 302 due to weak transaction isolation, 233 formalization in consensus, 366 integrity as absence of, 524 network 
packets, 306 on disks, 227 preventing using write-ahead logs, 82 recovering from, 414, 460 Couchbase (database) durability, 89 hash partitioning, 203-204, 211 rebalancing, 213 request routing, 216 CouchDB (database) B-tree storage, 242 change feed, 456 document data model, 31 join support, 34 MapReduce support, 46, 400 replication, 170, 173 covering indexes, 86 CPUs cache coherence and memory barriers, 338 caching and pipelining, 99, 428 increasing parallelism, 43 CRDTs (see conflict-free replicated datatypes) CREATE INDEX statement (SQL), 85, 500 credit rating agencies, 535 Crunch (batch processing), 419, 427 hash joins, 409 sharded joins, 408 workflows, 403 cryptography defense against attackers, 306 end-to-end encryption and authentication, 519, 543 proving integrity of data, 532 CSS (Cascading Style Sheets), 44 CSV (comma-separated values), 70, 114, 396 Curator (ZooKeeper recipes), 330, 371 curl (Unix tool), 135, 397 cursor stability, 243 Cypher (query language), 52 comparison to SPARQL, 59 D data corruption (see corruption of data) data cubes, 102 data formats (see encoding) data integration, 490-498, 543 batch and stream processing, 494-498 lambda architecture, 497 maintaining derived state, 495 reprocessing data, 496 unifying, 498 by unbundling databases, 499-515 comparison to federated databases, 501 combining tools by deriving data, 490-494 derived data versus distributed transac‐ tions, 492 limits of total ordering, 493 ordering events to capture causality, 493 reasoning about dataflows, 491 need for, 385 data lakes, 415 data locality (see locality) data models, 27-64 graph-like models, 49-63 Datalog language, 60-63 property graphs, 50 RDF and triple-stores, 55-59 query languages, 42-48 relational model versus document model, 28-42 data protection regulations, 542 data systems, 3 about, 4 Index | 565 concerns when designing, 5 future of, 489-544 correctness, constraints, and integrity, 515-533 data integration, 490-498 unbundling databases, 499-515 
heterogeneous, keeping in sync, 452 maintainability, 18-22 possible faults in, 221 reliability, 6-10 hardware faults, 7 human errors, 9 importance of, 10 software errors, 8 scalability, 10-18 unreliable clocks, 287-299 data warehousing, 91-95, 554 comparison to data lakes, 415 ETL (extract-transform-load), 92, 416, 452 keeping data systems in sync, 452 schema design, 93 slowly changing dimension (SCD), 476 data-intensive applications, 3 database triggers (see triggers) database-internal distributed transactions, 360, 364, 477 databases archival storage, 131 comparison of message brokers to, 443 dataflow through, 129 end-to-end argument for, 519-520 checking integrity, 531 inside-out, 504 (see also unbundling databases) output from batch workflows, 412 relation to event streams, 451-464 (see also changelogs) API support for change streams, 456, 506 change data capture, 454-457 event sourcing, 457-459 keeping systems in sync, 452-453 philosophy of immutable events, 459-464 unbundling, 499-515 composing data storage technologies, 499-504 designing applications around dataflow, 504-509 observing derived state, 509-515 datacenters geographically distributed, 145, 164, 278, 493 multi-tenancy and shared resources, 284 network architecture, 276 network faults, 279 replication across multiple, 169 leaderless replication, 184 multi-leader replication, 168, 335 dataflow, 128-139, 504-509 correctness of dataflow systems, 525 differential, 504 message-passing, 136-139 reasoning about, 491 through databases, 129 through services, 131-136 dataflow engines, 421-423 comparison to stream processing, 464 directed acyclic graphs (DAG), 424 partitioning, approach to, 429 support for declarative queries, 427 Datalog (query language), 60-63 datatypes binary strings in XML and JSON, 114 conflict-free, 174 in Avro encodings, 122 in Thrift and Protocol Buffers, 121 numbers in XML and JSON, 114 Datomic (database) B-tree storage, 242 data model, 50, 57 Datalog query language, 60
excision (deleting data), 463 languages for transactions, 255 serial execution of transactions, 253 deadlocks detection, in two-phase commit (2PC), 364 in two-phase locking (2PL), 258 Debezium (change data capture), 455 declarative languages, 42, 554 Bloom, 504 CSS and XSL, 44 Cypher, 52 Datalog, 60 for batch processing, 427 recursive SQL queries, 53 relational algebra and SQL, 42 SPARQL, 59 delays bounded network delays, 285 bounded process pauses, 298 unbounded network delays, 282 unbounded process pauses, 296 deleting data, 463 denormalization (data representation), 34, 554 costs, 39 in derived data systems, 386 materialized views, 101 updating derived data, 228, 231, 490 versus normalization, 462 derived data, 386, 439, 554 from change data capture, 454 in event sourcing, 458-458 maintaining derived state through logs, 452-457, 459-463 observing, by subscribing to streams, 512 outputs of batch and stream processing, 495 through application code, 505 versus distributed transactions, 492 deterministic operations, 255, 274, 554 accidental nondeterminism, 423 and fault tolerance, 423, 426 and idempotence, 478, 492 computing derived data, 495, 526, 531 in state machine replication, 349, 452, 458 joins, 476 DevOps, 394 differential dataflow, 504 dimension tables, 94 dimensional modeling (see star schemas) directed acyclic graphs (DAGs), 424 dirty reads (transaction isolation), 234 dirty writes (transaction isolation), 235 discrimination, 534 disks (see hard disks) distributed actor frameworks, 138 distributed filesystems, 398-399 decoupling from query engines, 417 indiscriminately dumping data into, 415 use by MapReduce, 402 distributed systems, 273-312, 554 Byzantine faults, 304-306 cloud versus supercomputing, 275 detecting network faults, 280 faults and partial failures, 274-277 formalization of consensus, 365 impossibility results, 338, 353 issues with failover, 157 limitations of distributed transactions, 363 multi-datacenter, 169, 335 network problems, 277-286 
quorums, relying on, 301 reasons for using, 145, 151 synchronized clocks, relying on, 291-295 system models, 306-310 use of clocks and time, 287 distributed transactions (see transactions) Django (web framework), 232 DNS (Domain Name System), 216, 372 Docker (container manager), 506 document data model, 30-42 comparison to relational model, 38-42 document references, 38, 403 document-oriented databases, 31 many-to-many relationships and joins, 36 multi-object transactions, need for, 231 versus relational model convergence of models, 41 data locality, 41 document-partitioned indexes, 206, 217, 411 domain-driven design (DDD), 457 DRBD (Distributed Replicated Block Device), 153 drift (clocks), 289 Drill (query engine), 93 Druid (database), 461 Dryad (dataflow engine), 421 dual writes, problems with, 452, 507 duplicates, suppression of, 517 (see also idempotence) using a unique ID, 518, 522 durability (transactions), 226, 554 duration (time), 287 measurement with monotonic clocks, 288 dynamic partitioning, 212 dynamically typed languages analogy to schema-on-read, 40 code generation and, 127 Dynamo-style databases (see leaderless replication) E edges (in graphs), 49, 403 property graph model, 50 edit distance (full-text search), 88 effectively-once semantics, 476, 516 (see also exactly-once semantics) preservation of integrity, 525 elastic systems, 17 Elasticsearch (search server) document-partitioned indexes, 207 partition rebalancing, 211 percolator (stream search), 467 usage example, 4 use of Lucene, 79 ElephantDB (database), 413 Elm (programming language), 504, 512 encodings (data formats), 111-128 Avro, 122-127 binary variants of JSON and XML, 115 compatibility, 112 calling services, 136 using databases, 129-131 using message-passing, 138 defined, 113 JSON, XML, and CSV, 114 language-specific formats, 113 merits of schemas, 127 representations of data, 112 Thrift and Protocol Buffers, 117-121 end-to-end argument, 277, 519-520 checking integrity, 531
publish/subscribe streams, 512 enrichment (stream), 473 Enterprise JavaBeans (EJB), 134 entities (see vertices) epoch (consensus algorithms), 368 epoch (Unix timestamps), 288 equi-joins, 403 erasure coding (error correction), 398 Erlang OTP (actor framework), 139 error handling for network faults, 280 in transactions, 231 error-correcting codes, 277, 398 Esper (CEP engine), 466 etcd (coordination service), 370-373 linearizable operations, 333 locks and leader election, 330 quorum reads, 351 service discovery, 372 use of Raft algorithm, 349, 353 Ethereum (blockchain), 532 Ethernet (networks), 276, 278, 285 packet checksums, 306, 519 Etherpad (collaborative editor), 170 ethics, 533-543 code of ethics and professional practice, 533 legislation and self-regulation, 542 predictive analytics, 533-536 amplifying bias, 534 feedback loops, 536 privacy and tracking, 536-543 consent and freedom of choice, 538 data as assets and power, 540 meaning of privacy, 539 surveillance, 537 respect, dignity, and agency, 543, 544 unintended consequences, 533, 536 ETL (extract-transform-load), 92, 405, 452, 554 use of Hadoop for, 416 event sourcing, 457-459 commands and events, 459 comparison to change data capture, 457 comparison to lambda architecture, 497 deriving current state from event log, 458 immutability and auditability, 459, 531 large, reliable data systems, 519, 526 Event Store (database), 458 event streams (see streams) events, 440 deciding on total order of, 493 deriving views from event log, 461 difference to commands, 459 event time versus processing time, 469, 477, 498 immutable, advantages of, 460, 531 ordering to capture causality, 493 reads as, 513 stragglers, 470, 498 timestamp of, in stream processing, 471 EventSource (browser API), 512 eventual consistency, 152, 162, 308, 322 (see also conflicts) and perpetual inconsistency, 525 evolvability, 21, 111 calling services, 136 graph-structured data, 52 of databases, 40, 129-131, 461, 497 of message-passing, 138 reprocessing data, 496, 498 schema evolution in Avro, 123 schema evolution in Thrift and Protocol Buffers, 120 schema-on-read, 39, 111, 128 exactly-once semantics, 360, 476, 516 parity with batch processors, 498 preservation of integrity, 525 exclusive mode (locks), 258 eXtended Architecture transactions (see XA transactions) extract-transform-load (see ETL) F Facebook Presto (query engine), 93 React, Flux, and Redux (user interface libraries), 512 social graphs, 49 Wormhole (change data capture), 455 fact tables, 93 failover, 157, 554 (see also leader-based replication) in leaderless replication, absence of, 178 leader election, 301, 348, 352 potential problems, 157 failures amplification by distributed transactions, 364, 495 failure detection, 280 automatic rebalancing causing cascading failures, 214 perfect failure detectors, 359 timeouts and unbounded delays, 282, 284 using ZooKeeper, 371 faults versus, 7 partial failures in distributed systems, 275-277, 310 fan-out (messaging systems), 11, 445 fault tolerance, 6-10, 555 abstractions for, 321 formalization in consensus, 365-369 use of replication, 367 human fault tolerance, 414 in batch processing, 406, 414, 422, 425 in log-based systems, 520, 524-526 in stream processing, 476-479 atomic commit, 477 idempotence, 478 maintaining derived state, 495 microbatching and checkpointing, 477 rebuilding state after a failure, 478 of distributed transactions, 362-364 transaction atomicity, 223, 354-361 faults, 6 Byzantine faults, 304-306 failures versus, 7 handled by transactions, 221 handling in supercomputers and cloud computing, 275 hardware, 7 in batch processing versus distributed databases, 417 in distributed systems, 274-277 introducing deliberately, 7, 280 network faults, 279-281 asymmetric faults, 300 detecting, 280 tolerance of, in multi-leader replication, 169 software errors, 8 tolerating (see fault tolerance) federated databases, 501 fence (CPU instruction), 338 fencing (preventing split brain), 158, 302-304 generating fencing tokens, 349, 370 properties of fencing tokens, 308 stream processors writing to databases, 478, 517 Fibre Channel (networks), 398 field tags (Thrift and Protocol Buffers), 119-121 file descriptors (Unix), 395 financial data, 460 Firebase (database), 456 Flink (processing framework), 421-423 dataflow APIs, 427 fault tolerance, 422, 477, 479 Gelly API (graph processing), 425 integration of batch and stream processing, 495, 498 machine learning, 428 query optimizer, 427 stream processing, 466 flow control, 282, 441, 555 FLP result (on consensus), 353 FlumeJava (dataflow library), 403, 427 followers, 152, 555 (see also leader-based replication) foreign keys, 38, 403 forward compatibility, 112 forward decay (algorithm), 16 Fossil (version control system), 463 shunning (deleting data), 463 FoundationDB (database) serializable transactions, 261, 265, 364 fractal trees, 83 full table scans, 403 full-text search, 555 and fuzzy indexes, 88 building search indexes, 411 Lucene storage engine, 79 functional reactive programming (FRP), 504 functional requirements, 22 futures (asynchronous operations), 135 fuzzy search (see similarity search) G garbage collection immutability and, 463 process pauses for, 14, 296-299, 301 (see also process pauses) genome analysis, 63, 429 geographically distributed datacenters, 145, 164, 278, 493 geospatial indexes, 87 Giraph (graph processing), 425 Git (version control system), 174, 342, 463 GitHub, postmortems, 157, 158, 309 global indexes (see term-partitioned indexes) GlusterFS (distributed filesystem), 398 GNU Coreutils (Linux), 394 GoldenGate (change data capture), 161, 170, 455 (see also Oracle) Google Bigtable (database) data model (see Bigtable data model) partitioning scheme, 199, 202 storage layout, 78 Chubby (lock service), 370 Cloud Dataflow (stream processor), 466, 477, 498 (see also Beam) Cloud Pub/Sub (messaging), 444, 448 Docs (collaborative editor), 170 Dremel (query engine), 93, 96
FlumeJava (dataflow library), 403, 427 GFS (distributed file system), 398 gRPC (RPC framework), 135 MapReduce (batch processing), 390 (see also MapReduce) building search indexes, 411 task preemption, 418 Pregel (graph processing), 425 Spanner (see Spanner) TrueTime (clock API), 294 gossip protocol, 216 government use of data, 541 GPS (Global Positioning System) use for clock synchronization, 287, 290, 294, 295 GraphChi (graph processing), 426 graphs, 555 as data models, 49-63 example of graph-structured data, 49 property graphs, 50 RDF and triple-stores, 55-59 versus the network model, 60 processing and analysis, 424-426 fault tolerance, 425 Pregel processing model, 425 query languages Cypher, 52 Datalog, 60-63 recursive SQL queries, 53 SPARQL, 59 Gremlin (graph query language), 50 grep (Unix tool), 392 GROUP BY clause (SQL), 406 grouping records in MapReduce, 406 handling skew, 407 H Hadoop (data infrastructure) comparison to distributed databases, 390 comparison to MPP databases, 414-418 comparison to Unix, 413-414, 499 diverse processing models in ecosystem, 417 HDFS distributed filesystem (see HDFS) higher-level tools, 403 join algorithms, 403-410 (see also MapReduce) MapReduce (see MapReduce) YARN (see YARN) happens-before relationship, 340 capturing, 187 concurrency and, 186 hard disks access patterns, 84 detecting corruption, 519, 530 faults in, 7, 227 sequential write throughput, 75, 450 hardware faults, 7 hash indexes, 72-75 broadcast hash joins, 409 partitioned hash joins, 409 hash partitioning, 203-205, 217 consistent hashing, 204 problems with hash mod N, 210 range queries, 204 suitable hash functions, 203 with fixed number of partitions, 210 HAWQ (database), 428 HBase (database) bug due to lack of fencing, 302 bulk loading, 413 column-family data model, 41, 99 dynamic partitioning, 212 key-range partitioning, 202 log-structured storage, 78 request routing, 216 size-tiered compaction, 79 use of HDFS, 417 use of ZooKeeper, 370 HDFS (Hadoop Distributed File System), 398-399 (see also distributed filesystems) checking data integrity, 530 decoupling from query engines, 417 indiscriminately dumping data into, 415 metadata about datasets, 410 NameNode, 398 use by Flink, 479 use by HBase, 212 use by MapReduce, 402 HdrHistogram (numerical library), 16 head (Unix tool), 392 head vertex (property graphs), 51 head-of-line blocking, 15 heap files (databases), 86 Helix (cluster manager), 216 heterogeneous distributed transactions, 360, 364 heuristic decisions (in 2PC), 363 Hibernate (object-relational mapper), 30 hierarchical model, 36 high availability (see fault tolerance) high-frequency trading, 290, 299 high-performance computing (HPC), 275 hinted handoff, 183 histograms, 16 Hive (query engine), 419, 427 for data warehouses, 93 HCatalog and metastore, 410 map-side joins, 409 query optimizer, 427 skewed joins, 408 workflows, 403 Hollerith machines, 390 hopping windows (stream processing), 472 (see also windows) horizontal scaling (see scaling out) HornetQ (messaging), 137, 444 distributed transaction support, 361 hot spots, 201 due to celebrities, 205 for time-series data, 203 in batch processing, 407 relieving, 205 hot standbys (see leader-based replication) HTTP, use in APIs (see services) human errors, 9, 279, 414 HyperDex (database), 88 HyperLogLog (algorithm), 466 I I/O operations, waiting for, 297 IBM DB2 (database) distributed transaction support, 361 recursive query support, 54 serializable isolation, 242, 257 XML and JSON support, 30, 42 electromechanical card-sorting machines, 390 IMS (database), 36 imperative query APIs, 46 InfoSphere Streams (CEP engine), 466 MQ (messaging), 444 distributed transaction support, 361 System R (database), 222 WebSphere (messaging), 137 idempotence, 134, 478, 555 by giving operations unique IDs, 518, 522 idempotent operations, 517 immutability advantages of, 460, 531 deriving state from event log, 459-464 for crash recovery, 75 in B-trees, 82, 242
in event sourcing, 457 inputs to Unix commands, 397 limitations of, 463 Impala (query engine) for data warehouses, 93 hash joins, 409 native code generation, 428 use of HDFS, 417 impedance mismatch, 29 imperative languages, 42 setting element styles (example), 45 in doubt (transaction status), 358 holding locks, 362 orphaned transactions, 363 in-memory databases, 88 durability, 227 serial transaction execution, 253 incidents cascading failures, 9 crashes due to leap seconds, 290 data corruption and financial losses due to concurrency bugs, 233 data corruption on hard disks, 227 data loss due to last-write-wins, 173, 292 data on disks unreadable, 309 deleted items reappearing, 174 disclosure of sensitive data due to primary key reuse, 157 errors in transaction serializability, 529 gigabit network interface with 1 Kb/s throughput, 311 network faults, 279 network interface dropping only inbound packets, 279 network partitions and whole-datacenter failures, 275 poor handling of network faults, 280 sending message to ex-partner, 494 sharks biting undersea cables, 279 split brain due to 1-minute packet delay, 158, 279 vibrations in server rack, 14 violation of uniqueness constraint, 529 indexes, 71, 555 and snapshot isolation, 241 as derived data, 386, 499-504 B-trees, 79-83 building in batch processes, 411 clustered, 86 comparison of B-trees and LSM-trees, 83-85 concatenated, 87 covering (with included columns), 86 creating, 500 full-text search, 88 geospatial, 87 hash, 72-75 index-range locking, 260 multi-column, 87 partitioning and secondary indexes, 206-209, 217 secondary, 85 (see also secondary indexes) problems with dual writes, 452, 491 SSTables and LSM-trees, 76-79 updating when data changes, 452, 467 Industrial Revolution, 541 InfiniBand (networks), 285 InfiniteGraph (database), 50 InnoDB (storage engine) clustered index on primary key, 86 not preventing lost updates, 245 preventing write skew, 248, 257 serializable isolation, 257 snapshot isolation support, 239 inside-out databases, 504 (see also unbundling databases) integrating different data systems (see data integration) integrity, 524 coordination-avoiding data systems, 528 correctness of dataflow systems, 525 in consensus formalization, 365 integrity checks, 530 (see also auditing) end-to-end, 519, 531 use of snapshot isolation, 238 maintaining despite software bugs, 529 Interface Definition Language (IDL), 117, 122 intermediate state, materialization of, 420-423 internet services, systems for implementing, 275 invariants, 225 (see also constraints) inversion of control, 396 IP (Internet Protocol) unreliability of, 277 ISDN (Integrated Services Digital Network), 284 isolation (in transactions), 225, 228, 555 correctness and, 515 for single-object writes, 230 serializability, 251-266 actual serial execution, 252-256 serializable snapshot isolation (SSI), 261-266 two-phase locking (2PL), 257-261 violating, 228 weak isolation levels, 233-251 preventing lost updates, 242-246 read committed, 234-237 snapshot isolation, 237-242 iterative processing, 424-426 J Java Database Connectivity (JDBC) distributed transaction support, 361 network drivers, 128 Java Enterprise Edition (EE), 134, 356, 361 Java Message Service (JMS), 444 (see also messaging systems) comparison to log-based messaging, 448, 451 distributed transaction support, 361 message ordering, 446 Java Transaction API (JTA), 355, 361 Java Virtual Machine (JVM) bytecode generation, 428 garbage collection pauses, 296 process reuse in batch processors, 422 JavaScript in MapReduce querying, 46 setting element styles (example), 45 use in advanced queries, 48 Jena (RDF framework), 57 Jepsen (fault tolerance testing), 515 jitter (network delay), 284 joins, 555 by index lookup, 403 expressing as relational operators, 427 in relational and document databases, 34 MapReduce map-side joins, 408-410 broadcast hash joins, 409 merge joins, 410 partitioned hash joins, 409 MapReduce reduce-side joins, 403-408 handling skew, 407 sort-merge joins, 405 parallel execution of, 415 secondary indexes and, 85 stream joins, 472-476 stream-stream join, 473 stream-table join, 473 table-table join, 474 time-dependence of, 475 support in document databases, 42 JOTM (transaction coordinator), 356 JSON Avro schema representation, 122 binary variants, 115 for application data, issues with, 114 in relational databases, 30, 42 representing a résumé (example), 31 Juttle (query language), 504 K k-nearest neighbors, 429 Kafka (messaging), 137, 448 Kafka Connect (database integration), 457, 461 Kafka Streams (stream processor), 466, 467 fault tolerance, 479 leader-based replication, 153 log compaction, 456, 467 message offsets, 447, 478 request routing, 216 transaction support, 477 usage example, 4 Ketama (partitioning library), 213 key-value stores, 70 as batch process output, 412 hash indexes, 72-75 in-memory, 89 partitioning, 201-205 by hash of key, 203, 217 by key range, 202, 217 dynamic partitioning, 212 skew and hot spots, 205 Kryo (Java), 113 Kubernetes (cluster manager), 418, 506 L lambda architecture, 497 Large Hadron Collider (LHC), 64 last write wins (LWW), 173, 334 discarding concurrent writes, 186 problems with, 292 prone to lost updates, 246 late binding, 396 latency instability under two-phase locking, 259 network latency and resource utilization, 286 response time versus, 14 tail latency, 15, 207 leader-based replication, 152-161 (see also replication) failover, 157, 301 handling node outages, 156 implementation of replication logs change data capture, 454-457 (see also changelogs) statement-based, 158 trigger-based replication, 161 write-ahead log (WAL) shipping, 159 linearizability of operations, 333 locking and leader election, 330 log sequence number, 156, 449 read-scaling architecture, 161 relation to consensus, 367 setting up new followers, 155 synchronous versus asynchronous, 153-155 leaderless replication, 177-191 (see also replication)
detecting concurrent writes, 184-191 capturing happens-before relationship, 187 happens-before relationship and concurrency, 186 last write wins, 186 merging concurrently written values, 190 version vectors, 191 multi-datacenter, 184 quorums, 179-182 consistency limitations, 181-183, 334 sloppy quorums and hinted handoff, 183 read repair and anti-entropy, 178 leap seconds, 8, 290 in time-of-day clocks, 288 leases, 295 implementation with ZooKeeper, 370 need for fencing, 302 ledgers, 460 distributed ledger technologies, 532 legacy systems, maintenance of, 18 less (Unix tool), 397 LevelDB (storage engine), 78 leveled compaction, 79 Levenshtein automata, 88 limping (partial failure), 311 linearizability, 324-338, 555 cost of, 335-338 CAP theorem, 336 memory on multi-core CPUs, 338 definition, 325-329 implementing with total order broadcast, 350 in ZooKeeper, 370 of derived data systems, 492, 524 avoiding coordination, 527 of different replication methods, 332-335 using quorums, 334 relying on, 330-332 constraints and uniqueness, 330 cross-channel timing dependencies, 331 locking and leader election, 330 stronger than causal consistency, 342 using to implement total order broadcast, 351 versus serializability, 329 LinkedIn Azkaban (workflow scheduler), 402 Databus (change data capture), 161, 455 Espresso (database), 31, 126, 130, 153, 216 Helix (cluster manager) (see Helix) profile (example), 30 reference to company entity (example), 34 (RPC framework), 135 Voldemort (database) (see Voldemort) Linux, leap second bug, 8, 290 liveness properties, 308 LMDB (storage engine), 82, 242 load approaches to coping with, 17 describing, 11 load testing, 16 load balancing (messaging), 444 local indexes (see document-partitioned indexes) locality (data access), 32, 41, 555 in batch processing, 400, 405, 421 in stateful clients, 170, 511 in stream processing, 474, 478, 508, 522 location transparency, 134 in the actor model, 138 locks, 556 deadlock, 258 distributed locking, 301-304, 330 fencing tokens, 303 implementation with ZooKeeper, 370 relation to consensus, 374 for transaction isolation in snapshot isolation, 239 in two-phase locking (2PL), 257-261 making operations atomic, 243 performance, 258 preventing dirty writes, 236 preventing phantoms with index-range locks, 260, 265 read locks (shared mode), 236, 258 shared mode and exclusive mode, 258 in two-phase commit (2PC) deadlock detection, 364 in-doubt transactions holding locks, 362 materializing conflicts with, 251 preventing lost updates by explicit locking, 244 log sequence number, 156, 449 logic programming languages, 504 logical clocks, 293, 343, 494 for read-after-write consistency, 164 logical logs, 160 logs (data structure), 71, 556 advantages of immutability, 460 compaction, 73, 79, 456, 460 for stream operator state, 479 creating using total order broadcast, 349 implementing uniqueness constraints, 522 log-based messaging, 446-451 comparison to traditional messaging, 448, 451 consumer offsets, 449 disk space usage, 450 replaying old messages, 451, 496, 498 slow consumers, 450 using logs for message storage, 447 log-structured storage, 71-79 log-structured merge tree (see LSM-trees) replication, 152, 158-161 change data capture, 454-457 (see also changelogs) coordination with snapshot, 156 logical (row-based) replication, 160 statement-based replication, 158 trigger-based replication, 161 write-ahead log (WAL) shipping, 159 scalability limits, 493 loose coupling, 396, 419, 502 lost updates (see updates) LSM-trees (indexes), 78-79 comparison to B-trees, 83-85 Lucene (storage engine), 79 building indexes in batch processes, 411 similarity search, 88 Luigi (workflow scheduler), 402 LWW (see last write wins) M machine learning ethical considerations, 534 (see also ethics) iterative processing, 424 models derived from training data, 505 statistical and numerical algorithms, 428 MADlib (machine learning toolkit), 428 magic scaling sauce, 18 Mahout (machine learning toolkit), 428 maintainability, 18-22, 489 defined, 23 design principles for software systems, 19 evolvability (see evolvability) operability, 19 simplicity and managing complexity, 20 many-to-many relationships in document model versus relational model, 39 modeling as graphs, 49 many-to-one and many-to-many relationships, 33-36 many-to-one relationships, 34 MapReduce (batch processing), 390, 399-400 accessing external services within job, 404, 412 comparison to distributed databases designing for frequent faults, 417 diversity of processing models, 416 diversity of storage, 415 comparison to stream processing, 464 comparison to Unix, 413-414 disadvantages and limitations of, 419 fault tolerance, 406, 414, 422 higher-level tools, 403, 426 implementation in Hadoop, 400-403 the shuffle, 402 implementation in MongoDB, 46-48 machine learning, 428 map-side processing, 408-410 broadcast hash joins, 409 merge joins, 410 partitioned hash joins, 409 mapper and reducer functions, 399 materialization of intermediate state, 419-423 output of batch workflows, 411-413 building search indexes, 411 key-value stores, 412 reduce-side processing, 403-408 analysis of user activity events (example), 404 grouping records by same key, 406 handling skew, 407 sort-merge joins, 405 workflows, 402 marshalling (see encoding) massively parallel processing (MPP), 216 comparison to composing storage technologies, 502 comparison to Hadoop, 414-418, 428 master-master replication (see multi-leader replication) master-slave replication (see leader-based replication) materialization, 556 aggregate values, 101 conflicts, 251 intermediate state (batch processing), 420-423 materialized views, 101 as derived data, 386, 499-504 maintaining, using stream processing, 467, 475 Maven (Java build tool), 428 Maxwell (change data capture), 455 mean, 14 media monitoring, 467 median, 14 meeting room booking (example), 249, 259, 521 membership services, 372 Memcached (caching server), 4, 89 memory in-memory databases, 88 durability, 227 serial transaction execution, 253 in-memory representation of data, 112 random bit-flips in, 529 use by indexes, 72, 77 memory barrier (CPU instruction), 338 MemSQL (database) in-memory storage, 89 read committed isolation, 236 memtable (in LSM-trees), 78 Mercurial (version control system), 463 merge joins, MapReduce map-side, 410 mergeable persistent data structures, 174 merging sorted files, 76, 402, 405 Merkle trees, 532 Mesos (cluster manager), 418, 506 message brokers (see messaging systems) message-passing, 136-139 advantages over direct RPC, 137 distributed actor frameworks, 138 evolvability, 138 MessagePack (encoding format), 116 messages exactly-once semantics, 360, 476 loss of, 442 using total order broadcast, 348 messaging systems, 440-451 (see also streams) backpressure, buffering, or dropping messages, 441 brokerless messaging, 442 event logs, 446-451 comparison to traditional messaging, 448, 451 consumer offsets, 449 replaying old messages, 451, 496, 498 slow consumers, 450 message brokers, 443-446 acknowledgements and redelivery, 445 comparison to event logs, 448, 451 multiple consumers of same topic, 444 reliability, 442 uniqueness in log-based messaging, 522 Meteor (web framework), 456 microbatching, 477, 495 microservices, 132 (see also services) causal dependencies across services, 493 loose coupling, 502 relation to batch/stream processors, 389, 508 Microsoft Azure Service Bus (messaging), 444 Azure Storage, 155, 398 Azure Stream Analytics, 466 DCOM (Distributed Component Object Model), 134 MSDTC (transaction coordinator), 356 Orleans (see Orleans) SQL Server (see SQL Server) migrating (rewriting) data, 40, 130, 461, 497 modulus operator (%), 210 MongoDB (database) aggregation pipeline, 48 atomic operations, 243 BSON, 41 document data model, 31 hash partitioning (sharding), 203-204 key-range partitioning, 202 lack of join support, 34, 42 leader-based replication, 153 MapReduce support, 46, 400 oplog parsing, 455, 456 partition splitting, 212 request routing, 216 secondary indexes, 207 Mongoriver (change data capture), 455 monitoring, 10, 19 monotonic clocks, 288 monotonic reads, 164 MPP (see massively parallel processing) MSMQ (messaging), 361 multi-column indexes, 87 multi-leader replication, 168-177 (see also replication) handling write conflicts, 171 conflict avoidance, 172 converging toward a consistent state, 172 custom conflict resolution logic, 173 determining what is a conflict, 174 linearizability, lack of, 333 replication topologies, 175-177 use cases, 168 clients with offline operation, 170 collaborative editing, 170 multi-datacenter replication, 168, 335 multi-object transactions, 228 need for, 231 Multi-Paxos (total order broadcast), 367 multi-table index cluster tables (Oracle), 41 multi-tenancy, 284 multi-version concurrency control (MVCC), 239, 266 detecting stale MVCC reads, 263 indexes and snapshot isolation, 241 mutual exclusion, 261 (see also locks) MySQL (database) binlog coordinates, 156 binlog parsing for change data capture, 455 circular replication topology, 175 consistent snapshots, 156 distributed transaction support, 361 InnoDB storage engine (see InnoDB) JSON support, 30, 42 leader-based replication, 153 performance of XA transactions, 360 row-based replication, 160 schema changes in, 40 snapshot isolation support, 242 (see also InnoDB) statement-based replication, 159 Tungsten Replicator (multi-leader replication), 170 conflict detection, 177 N nanomsg (messaging library), 442 Narayana (transaction coordinator), 356 NATS (messaging), 137 near-real-time (nearline) processing, 390 (see also stream processing) Neo4j (database) Cypher query language, 52 graph data model, 50 Nephele (dataflow engine), 421 netcat (Unix tool), 397 Netflix Chaos Monkey, 7, 280 Network Attached Storage (NAS), 146, 398 network model, 36 graph databases versus, 60 imperative query APIs, 46 Network Time Protocol (see NTP) networks congestion and queueing, 282 datacenter network topologies, 276 faults (see faults) linearizability and network delays, 338 network partitions, 279, 337 timeouts and unbounded delays, 281 next-key locking, 260 nodes (in graphs) (see vertices) nodes (processes), 556 handling outages in leader-based replication, 156 system models for failure, 307 noisy neighbors, 284 nonblocking atomic commit, 359 nondeterministic operations accidental nondeterminism, 423 partial failures in distributed systems, 275 nonfunctional requirements, 22 nonrepeatable reads, 238 (see also read skew) normalization (data representation), 33, 556 executing joins, 39, 42, 403 foreign key references, 231 in systems of record, 386 versus denormalization, 462 NoSQL, 29, 499 transactions and, 223 Notation3 (N3), 56 npm (package manager), 428 NTP (Network Time Protocol), 287 accuracy, 289, 293 adjustments to monotonic clocks, 289 multiple server addresses, 306 numbers, in XML and JSON encodings, 114 O object-relational mapping (ORM) frameworks, 30 error handling and aborted transactions, 232 unsafe read-modify-write cycle code, 244 object-relational mismatch, 29 observer pattern, 506 offline systems, 390 (see also batch processing) stateful, offline-capable clients, 170, 511 offline-first applications, 511 offsets consumer offsets in partitioned logs, 449 messages in partitioned logs, 447 OLAP (online analytic processing), 91, 556 data cubes, 102 OLTP (online transaction processing), 90, 556 analytics queries versus, 411 workload characteristics, 253 one-to-many relationships, 30 JSON representation, 32 online systems, 389 (see also services) Oozie (workflow scheduler), 402 OpenAPI (service definition format), 133 OpenStack Nova (cloud infrastructure) use of ZooKeeper, 370 Swift (object storage), 398 operability, 19 operating systems versus databases, 499 operation identifiers, 518, 522 operational transformation, 174 operators, 421 flow of data between, 424 in stream processing, 464
optimistic concurrency control, 261 Oracle (database) distributed transaction support, 361 GoldenGate (change data capture), 161, 170, 455 lack of serializability, 226 leader-based replication, 153 multi-table index cluster tables, 41 not preventing write skew, 248 partitioned indexes, 209 PL/SQL language, 255 preventing lost updates, 245 read committed isolation, 236 Real Application Clusters (RAC), 330 recursive query support, 54 snapshot isolation support, 239, 242 TimesTen (in-memory database), 89 WAL-based replication, 160 XML support, 30 ordering, 339-352 by sequence numbers, 343-348 causal ordering, 339-343 partial order, 341 limits of total ordering, 493 total order broadcast, 348-352 Orleans (actor framework), 139 outliers (response time), 14 Oz (programming language), 504 P package managers, 428, 505 packet switching, 285 packets corruption of, 306 sending via UDP, 442 PageRank (algorithm), 49, 424 paging (see virtual memory) ParAccel (database), 93 parallel databases (see massively parallel pro‐ cessing) parallel execution of graph analysis algorithms, 426 queries in MPP databases, 216 Parquet (data format), 96, 131 (see also column-oriented storage) use in Hadoop, 414 partial failures, 275, 310 limping, 311 partial order, 341 partitioning, 199-218, 556 and replication, 200 in batch processing, 429 multi-partition operations, 514 enforcing constraints, 522 secondary index maintenance, 495 of key-value data, 201-205 by key range, 202 skew and hot spots, 205 rebalancing partitions, 209-214 automatic or manual rebalancing, 213 problems with hash mod N, 210 using dynamic partitioning, 212 using fixed number of partitions, 210 using N partitions per node, 212 replication and, 147 request routing, 214-216 secondary indexes, 206-209 document-based partitioning, 206 term-based partitioning, 208 serial execution of transactions and, 255 Paxos (consensus algorithm), 366 ballot number, 368 Multi-Paxos (total order broadcast), 367 percentiles, 14, 556 calculating 
efficiently, 16 importance of high percentiles, 16 use in service level agreements (SLAs), 15 Percona XtraBackup (MySQL tool), 156 performance describing, 13 of distributed transactions, 360 of in-memory databases, 89 of linearizability, 338 of multi-leader replication, 169 perpetual inconsistency, 525 pessimistic concurrency control, 261 phantoms (transaction isolation), 250 materializing conflicts, 251 preventing, in serializability, 259 physical clocks (see clocks) pickle (Python), 113 Pig (dataflow language), 419, 427 replicated joins, 409 skewed joins, 407 workflows, 403 Pinball (workflow scheduler), 402 pipelined execution, 423 in Unix, 394 point in time, 287 polyglot persistence, 29 polystores, 501 PostgreSQL (database) BDR (multi-leader replication), 170 causal ordering of writes, 177 Bottled Water (change data capture), 455 Bucardo (trigger-based replication), 161, 173 distributed transaction support, 361 foreign data wrappers, 501 full text search support, 490 leader-based replication, 153 log sequence number, 156 MVCC implementation, 239, 241 PL/pgSQL language, 255 PostGIS geospatial indexes, 87 preventing lost updates, 245 preventing write skew, 248, 261 read committed isolation, 236 recursive query support, 54 representing graphs, 51 Index | 579 serializable snapshot isolation (SSI), 261 snapshot isolation support, 239, 242 WAL-based replication, 160 XML and JSON support, 30, 42 pre-splitting, 212 Precision Time Protocol (PTP), 290 predicate locks, 259 predictive analytics, 533-536 amplifying bias, 534 ethics of (see ethics) feedback loops, 536 preemption of datacenter resources, 418 of threads, 298 Pregel processing model, 425 primary keys, 85, 556 compound primary key (Cassandra), 204 primary-secondary replication (see leaderbased replication) privacy, 536-543 consent and freedom of choice, 538 data as assets and power, 540 deleting data, 463 ethical considerations (see ethics) legislation and self-regulation, 542 meaning of, 539 surveillance, 537 
tracking behavioral data, 536 probabilistic algorithms, 16, 466 process pauses, 295-299 processing time (of events), 469 producers (message streams), 440 programming languages dataflow languages, 504 for stored procedures, 255 functional reactive programming (FRP), 504 logic programming, 504 Prolog (language), 61 (see also Datalog) promises (asynchronous operations), 135 property graphs, 50 Cypher query language, 52 Protocol Buffers (data format), 117-121 field tags and schema evolution, 120 provenance of data, 531 publish/subscribe model, 441 publishers (message streams), 440 punch card tabulating machines, 390 580 | Index pure functions, 48 putting computation near data, 400 Q Qpid (messaging), 444 quality of service (QoS), 285 Quantcast File System (distributed filesystem), 398 query languages, 42-48 aggregation pipeline, 48 CSS and XSL, 44 Cypher, 52 Datalog, 60 Juttle, 504 MapReduce querying, 46-48 recursive SQL queries, 53 relational algebra and SQL, 42 SPARQL, 59 query optimizers, 37, 427 queueing delays (networks), 282 head-of-line blocking, 15 latency and response time, 14 queues (messaging), 137 quorums, 179-182, 556 for leaderless replication, 179 in consensus algorithms, 368 limitations of consistency, 181-183, 334 making decisions in distributed systems, 301 monitoring staleness, 182 multi-datacenter replication, 184 relying on durability, 309 sloppy quorums and hinted handoff, 183 R R-trees (indexes), 87 RabbitMQ (messaging), 137, 444 leader-based replication, 153 race conditions, 225 (see also concurrency) avoiding with linearizability, 331 caused by dual writes, 452 dirty writes, 235 in counter increments, 235 lost updates, 242-246 preventing with event logs, 462, 507 preventing with serializable isolation, 252 write skew, 246-251 Raft (consensus algorithm), 366 sensitivity to network problems, 369 term number, 368 use in etcd, 353 RAID (Redundant Array of Independent Disks), 7, 398 railways, schema migration on, 496 RAMCloud (in-memory storage), 89 
ranking algorithms, 424 RDF (Resource Description Framework), 57 querying with SPARQL, 59 RDMA (Remote Direct Memory Access), 276 read committed isolation level, 234-237 implementing, 236 multi-version concurrency control (MVCC), 239 no dirty reads, 234 no dirty writes, 235 read path (derived data), 509 read repair (leaderless replication), 178 for linearizability, 335 read replicas (see leader-based replication) read skew (transaction isolation), 238, 266 as violation of causality, 340 read-after-write consistency, 163, 524 cross-device, 164 read-modify-write cycle, 243 read-scaling architecture, 161 reads as events, 513 real-time collaborative editing, 170 near-real-time processing, 390 (see also stream processing) publish/subscribe dataflow, 513 response time guarantees, 298 time-of-day clocks, 288 rebalancing partitions, 209-214, 556 (see also partitioning) automatic or manual rebalancing, 213 dynamic partitioning, 212 fixed number of partitions, 210 fixed number of partitions per node, 212 problems with hash mod N, 210 recency guarantee, 324 recommendation engines batch process outputs, 412 batch workflows, 403, 420 iterative processing, 424 statistical and numerical algorithms, 428 records, 399 events in stream processing, 440 recursive common table expressions (SQL), 54 redelivery (messaging), 445 Redis (database) atomic operations, 243 durability, 89 Lua scripting, 255 single-threaded execution, 253 usage example, 4 redundancy hardware components, 7 of derived data, 386 (see also derived data) Reed–Solomon codes (error correction), 398 refactoring, 22 (see also evolvability) regions (partitioning), 199 register (data structure), 325 relational data model, 28-42 comparison to document model, 38-42 graph queries in SQL, 53 in-memory databases with, 89 many-to-one and many-to-many relation‐ ships, 33 multi-object transactions, need for, 231 NoSQL as alternative to, 29 object-relational mismatch, 29 relational algebra and SQL, 42 versus document model 
convergence of models, 41 data locality, 41 relational databases eventual consistency, 162 history, 28 leader-based replication, 153 logical logs, 160 philosophy compared to Unix, 499, 501 schema changes, 40, 111, 130 statement-based replication, 158 use of B-tree indexes, 80 relationships (see edges) reliability, 6-10, 489 building a reliable system from unreliable components, 276 defined, 6, 22 hardware faults, 7 human errors, 9 importance of, 10 of messaging systems, 442 Index | 581 software errors, 8 Remote Method Invocation (Java RMI), 134 remote procedure calls (RPCs), 134-136 (see also services) based on futures, 135 data encoding and evolution, 136 issues with, 134 using Avro, 126, 135 using Thrift, 135 versus message brokers, 137 repeatable reads (transaction isolation), 242 replicas, 152 replication, 151-193, 556 and durability, 227 chain replication, 155 conflict resolution and, 246 consistency properties, 161-167 consistent prefix reads, 165 monotonic reads, 164 reading your own writes, 162 in distributed filesystems, 398 leaderless, 177-191 detecting concurrent writes, 184-191 limitations of quorum consistency, 181-183, 334 sloppy quorums and hinted handoff, 183 monitoring staleness, 182 multi-leader, 168-177 across multiple datacenters, 168, 335 handling write conflicts, 171-175 replication topologies, 175-177 partitioning and, 147, 200 reasons for using, 145, 151 single-leader, 152-161 failover, 157 implementation of replication logs, 158-161 relation to consensus, 367 setting up new followers, 155 synchronous versus asynchronous, 153-155 state machine replication, 349, 452 using erasure coding, 398 with heterogeneous data systems, 453 replication logs (see logs) reprocessing data, 496, 498 (see also evolvability) from log-based messaging, 451 request routing, 214-216 582 | Index approaches to, 214 parallel query execution, 216 resilient systems, 6 (see also fault tolerance) response time as performance metric for services, 13, 389 guarantees on, 298 
latency versus, 14 mean and percentiles, 14 user experience, 15 responsibility and accountability, 535 REST (Representational State Transfer), 133 (see also services) RethinkDB (database) document data model, 31 dynamic partitioning, 212 join support, 34, 42 key-range partitioning, 202 leader-based replication, 153 subscribing to changes, 456 Riak (database) Bitcask storage engine, 72 CRDTs, 174, 191 dotted version vectors, 191 gossip protocol, 216 hash partitioning, 203-204, 211 last-write-wins conflict resolution, 186 leaderless replication, 177 LevelDB storage engine, 78 linearizability, lack of, 335 multi-datacenter support, 184 preventing lost updates across replicas, 246 rebalancing, 213 search feature, 209 secondary indexes, 207 siblings (concurrently written values), 190 sloppy quorums, 184 ring buffers, 450 Ripple (cryptocurrency), 532 rockets, 10, 36, 305 RocksDB (storage engine), 78 leveled compaction, 79 rollbacks (transactions), 222 rolling upgrades, 8, 112 routing (see request routing) row-oriented storage, 96 row-based replication, 160 rowhammer (memory corruption), 529 RPCs (see remote procedure calls) Rubygems (package manager), 428 rules (Datalog), 61 S safety and liveness properties, 308 in consensus algorithms, 366 in transactions, 222 sagas (see compensating transactions) Samza (stream processor), 466, 467 fault tolerance, 479 streaming SQL support, 466 sandboxes, 9 SAP HANA (database), 93 scalability, 10-18, 489 approaches for coping with load, 17 defined, 22 describing load, 11 describing performance, 13 partitioning and, 199 replication and, 161 scaling up versus scaling out, 146 scaling out, 17, 146 (see also shared-nothing architecture) scaling up, 17, 146 scatter/gather approach, querying partitioned databases, 207 SCD (slowly changing dimension), 476 schema-on-read, 39 comparison to evolvable schema, 128 in distributed filesystems, 415 schema-on-write, 39 schemaless databases (see schema-on-read) schemas, 557 Avro, 122-127 reader 
determining writer’s schema, 125 schema evolution, 123 dynamically generated, 126 evolution of, 496 affecting application code, 111 compatibility checking, 126 in databases, 129-131 in message-passing, 138 in service calls, 136 flexibility in document model, 39 for analytics, 93-95 for JSON and XML, 115 merits of, 127 schema migration on railways, 496 Thrift and Protocol Buffers, 117-121 schema evolution, 120 traditional approach to design, fallacy in, 462 searches building search indexes in batch processes, 411 k-nearest neighbors, 429 on streams, 467 partitioned secondary indexes, 206 secondaries (see leader-based replication) secondary indexes, 85, 557 partitioning, 206-209, 217 document-partitioned, 206 index maintenance, 495 term-partitioned, 208 problems with dual writes, 452, 491 updating, transaction isolation and, 231 secondary sorts, 405 sed (Unix tool), 392 self-describing files, 127 self-joins, 480 self-validating systems, 530 semantic web, 57 semi-synchronous replication, 154 sequence number ordering, 343-348 generators, 294, 344 insufficiency for enforcing constraints, 347 Lamport timestamps, 345 use of timestamps, 291, 295, 345 sequential consistency, 351 serializability, 225, 233, 251-266, 557 linearizability versus, 329 pessimistic versus optimistic concurrency control, 261 serial execution, 252-256 partitioning, 255 using stored procedures, 253, 349 serializable snapshot isolation (SSI), 261-266 detecting stale MVCC reads, 263 detecting writes that affect prior reads, 264 distributed execution, 265, 364 performance of SSI, 265 preventing write skew, 262-265 two-phase locking (2PL), 257-261 index-range locks, 260 performance, 258 Serializable (Java), 113 Index | 583 serialization, 113 (see also encoding) service discovery, 135, 214, 372 using DNS, 216, 372 service level agreements (SLAs), 15 service-oriented architecture (SOA), 132 (see also services) services, 131-136 microservices, 132 causal dependencies across services, 493 loose coupling, 502 
relation to batch/stream processors, 389, 508 remote procedure calls (RPCs), 134-136 issues with, 134 similarity to databases, 132 web services, 132, 135 session windows (stream processing), 472 (see also windows) sessionization, 407 sharding (see partitioning) shared mode (locks), 258 shared-disk architecture, 146, 398 shared-memory architecture, 146 shared-nothing architecture, 17, 146-147, 557 (see also replication) distributed filesystems, 398 (see also distributed filesystems) partitioning, 199 use of network, 277 sharks biting undersea cables, 279 counting (example), 46-48 finding (example), 42 website about (example), 44 shredding (in relational model), 38 siblings (concurrent values), 190, 246 (see also conflicts) similarity search edit distance, 88 genome data, 63 k-nearest neighbors, 429 single-leader replication (see leader-based rep‐ lication) single-threaded execution, 243, 252 in batch processing, 406, 421, 426 in stream processing, 448, 463, 522 size-tiered compaction, 79 skew, 557 584 | Index clock skew, 291-294, 334 in transaction isolation read skew, 238, 266 write skew, 246-251, 262-265 (see also write skew) meanings of, 238 unbalanced workload, 201 compensating for, 205 due to celebrities, 205 for time-series data, 203 in batch processing, 407 slaves (see leader-based replication) sliding windows (stream processing), 472 (see also windows) sloppy quorums, 183 (see also quorums) lack of linearizability, 334 slowly changing dimension (data warehouses), 476 smearing (leap seconds adjustments), 290 snapshots (databases) causal consistency, 340 computing derived data, 500 in change data capture, 455 serializable snapshot isolation (SSI), 261-266, 329 setting up a new replica, 156 snapshot isolation and repeatable read, 237-242 implementing with MVCC, 239 indexes and MVCC, 241 visibility rules, 240 synchronized clocks for global snapshots, 294 snowflake schemas, 95 SOAP, 133 (see also services) evolvability, 136 software bugs, 8 maintaining integrity, 
529 solid state drives (SSDs) access patterns, 84 detecting corruption, 519, 530 faults in, 227 sequential write throughput, 75 Solr (search server) building indexes in batch processes, 411 document-partitioned indexes, 207 request routing, 216 usage example, 4 use of Lucene, 79 sort (Unix tool), 392, 394, 395 sort-merge joins (MapReduce), 405 Sorted String Tables (see SSTables) sorting sort order in column storage, 99 source of truth (see systems of record) Spanner (database) data locality, 41 snapshot isolation using clocks, 295 TrueTime API, 294 Spark (processing framework), 421-423 bytecode generation, 428 dataflow APIs, 427 fault tolerance, 422 for data warehouses, 93 GraphX API (graph processing), 425 machine learning, 428 query optimizer, 427 Spark Streaming, 466 microbatching, 477 stream processing on top of batch process‐ ing, 495 SPARQL (query language), 59 spatial algorithms, 429 split brain, 158, 557 in consensus algorithms, 352, 367 preventing, 322, 333 using fencing tokens to avoid, 302-304 spreadsheets, dataflow programming capabili‐ ties, 504 SQL (Structured Query Language), 21, 28, 43 advantages and limitations of, 416 distributed query execution, 48 graph queries in, 53 isolation levels standard, issues with, 242 query execution on Hadoop, 416 résumé (example), 30 SQL injection vulnerability, 305 SQL on Hadoop, 93 statement-based replication, 158 stored procedures, 255 SQL Server (database) data warehousing support, 93 distributed transaction support, 361 leader-based replication, 153 preventing lost updates, 245 preventing write skew, 248, 257 read committed isolation, 236 recursive query support, 54 serializable isolation, 257 snapshot isolation support, 239 T-SQL language, 255 XML support, 30 SQLstream (stream analytics), 466 SSDs (see solid state drives) SSTables (storage format), 76-79 advantages over hash indexes, 76 concatenated index, 204 constructing and maintaining, 78 making LSM-Tree from, 78 staleness (old data), 162 cross-channel 
timing dependencies, 331 in leaderless databases, 178 in multi-version concurrency control, 263 monitoring for, 182 of client state, 512 versus linearizability, 324 versus timeliness, 524 standbys (see leader-based replication) star replication topologies, 175 star schemas, 93-95 similarity to event sourcing, 458 Star Wars analogy (event time versus process‐ ing time), 469 state derived from log of immutable events, 459 deriving current state from the event log, 458 interplay between state changes and appli‐ cation code, 507 maintaining derived state, 495 maintenance by stream processor in streamstream joins, 473 observing derived state, 509-515 rebuilding after stream processor failure, 478 separation of application code and, 505 state machine replication, 349, 452 statement-based replication, 158 statically typed languages analogy to schema-on-write, 40 code generation and, 127 statistical and numerical algorithms, 428 StatsD (metrics aggregator), 442 stdin, stdout, 395, 396 Stellar (cryptocurrency), 532 Index | 585 stock market feeds, 442 STONITH (Shoot The Other Node In The Head), 158 stop-the-world (see garbage collection) storage composing data storage technologies, 499-504 diversity of, in MapReduce, 415 Storage Area Network (SAN), 146, 398 storage engines, 69-104 column-oriented, 95-101 column compression, 97-99 defined, 96 distinction between column families and, 99 Parquet, 96, 131 sort order in, 99-100 writing to, 101 comparing requirements for transaction processing and analytics, 90-96 in-memory storage, 88 durability, 227 row-oriented, 70-90 B-trees, 79-83 comparing B-trees and LSM-trees, 83-85 defined, 96 log-structured, 72-79 stored procedures, 161, 253-255, 557 and total order broadcast, 349 pros and cons of, 255 similarity to stream processors, 505 Storm (stream processor), 466 distributed RPC, 468, 514 Trident state handling, 478 straggler events, 470, 498 stream processing, 464-481, 557 accessing external services within job, 474, 477, 478, 517 
combining with batch processing lambda architecture, 497 unifying technologies, 498 comparison to batch processing, 464 complex event processing (CEP), 465 fault tolerance, 476-479 atomic commit, 477 idempotence, 478 microbatching and checkpointing, 477 rebuilding state after a failure, 478 for data integration, 494-498 586 | Index maintaining derived state, 495 maintenance of materialized views, 467 messaging systems (see messaging systems) reasoning about time, 468-472 event time versus processing time, 469, 477, 498 knowing when window is ready, 470 types of windows, 472 relation to databases (see streams) relation to services, 508 search on streams, 467 single-threaded execution, 448, 463 stream analytics, 466 stream joins, 472-476 stream-stream join, 473 stream-table join, 473 table-table join, 474 time-dependence of, 475 streams, 440-451 end-to-end, pushing events to clients, 512 messaging systems (see messaging systems) processing (see stream processing) relation to databases, 451-464 (see also changelogs) API support for change streams, 456 change data capture, 454-457 derivative of state by time, 460 event sourcing, 457-459 keeping systems in sync, 452-453 philosophy of immutable events, 459-464 topics, 440 strict serializability, 329 strong consistency (see linearizability) strong one-copy serializability, 329 subjects, predicates, and objects (in triplestores), 55 subscribers (message streams), 440 (see also consumers) supercomputers, 275 surveillance, 537 (see also privacy) Swagger (service definition format), 133 swapping to disk (see virtual memory) synchronous networks, 285, 557 comparison to asynchronous networks, 284 formal model, 307 synchronous replication, 154, 557 chain replication, 155 conflict detection, 172 system models, 300, 306-310 assumptions in, 528 correctness of algorithms, 308 mapping to the real world, 309 safety and liveness, 308 systems of record, 386, 557 change data capture, 454, 491 treating event log as, 460 systems thinking, 
536 T t-digest (algorithm), 16 table-table joins, 474 Tableau (data visualization software), 416 tail (Unix tool), 447 tail vertex (property graphs), 51 Tajo (query engine), 93 Tandem NonStop SQL (database), 200 TCP (Transmission Control Protocol), 277 comparison to circuit switching, 285 comparison to UDP, 283 connection failures, 280 flow control, 282, 441 packet checksums, 306, 519, 529 reliability and duplicate suppression, 517 retransmission timeouts, 284 use for transaction sessions, 229 telemetry (see monitoring) Teradata (database), 93, 200 term-partitioned indexes, 208, 217 termination (consensus), 365 Terrapin (database), 413 Tez (dataflow engine), 421-423 fault tolerance, 422 support by higher-level tools, 427 thrashing (out of memory), 297 threads (concurrency) actor model, 138, 468 (see also message-passing) atomic operations, 223 background threads, 73, 85 execution pauses, 286, 296-298 memory barriers, 338 preemption, 298 single (see single-threaded execution) three-phase commit, 359 Thrift (data format), 117-121 BinaryProtocol, 118 CompactProtocol, 119 field tags and schema evolution, 120 throughput, 13, 390 TIBCO, 137 Enterprise Message Service, 444 StreamBase (stream analytics), 466 time concurrency and, 187 cross-channel timing dependencies, 331 in distributed systems, 287-299 (see also clocks) clock synchronization and accuracy, 289 relying on synchronized clocks, 291-295 process pauses, 295-299 reasoning about, in stream processors, 468-472 event time versus processing time, 469, 477, 498 knowing when window is ready, 470 timestamp of events, 471 types of windows, 472 system models for distributed systems, 307 time-dependence in stream joins, 475 time-of-day clocks, 288 timeliness, 524 coordination-avoiding data systems, 528 correctness of dataflow systems, 525 timeouts, 279, 557 dynamic configuration of, 284 for failover, 158 length of, 281 timestamps, 343 assigning to events in stream processing, 471 for read-after-write consistency, 163 for 
transaction ordering, 295 insufficiency for enforcing constraints, 347 key range partitioning by, 203 Lamport, 345 logical, 494 ordering events, 291, 345 Titan (database), 50 tombstones, 74, 191, 456 topics (messaging), 137, 440 total order, 341, 557 limits of, 493 sequence numbers or timestamps, 344 total order broadcast, 348-352, 493, 522 consensus algorithms and, 366-368 Index | 587 implementation in ZooKeeper and etcd, 370 implementing with linearizable storage, 351 using, 349 using to implement linearizable storage, 350 tracking behavioral data, 536 (see also privacy) transaction coordinator (see coordinator) transaction manager (see coordinator) transaction processing, 28, 90-95 comparison to analytics, 91 comparison to data warehousing, 93 transactions, 221-267, 558 ACID properties of, 223 atomicity, 223 consistency, 224 durability, 226 isolation, 225 compensating (see compensating transac‐ tions) concept of, 222 distributed transactions, 352-364 avoiding, 492, 502, 521-528 failure amplification, 364, 495 in doubt/uncertain status, 358, 362 two-phase commit, 354-359 use of, 360-361 XA transactions, 361-364 OLTP versus analytics queries, 411 purpose of, 222 serializability, 251-266 actual serial execution, 252-256 pessimistic versus optimistic concur‐ rency control, 261 serializable snapshot isolation (SSI), 261-266 two-phase locking (2PL), 257-261 single-object and multi-object, 228-232 handling errors and aborts, 231 need for multi-object transactions, 231 single-object writes, 230 snapshot isolation (see snapshots) weak isolation levels, 233-251 preventing lost updates, 242-246 read committed, 234-238 transitive closure (graph algorithm), 424 trie (data structure), 88 triggers (databases), 161, 441 implementing change data capture, 455 implementing replication, 161 588 | Index triple-stores, 55-59 SPARQL query language, 59 tumbling windows (stream processing), 472 (see also windows) in microbatching, 477 tuple spaces (programming model), 507 Turtle (RDF 
data format), 56 Twitter constructing home timelines (example), 11, 462, 474, 511 DistributedLog (event log), 448 Finagle (RPC framework), 135 Snowflake (sequence number generator), 294 Summingbird (processing library), 497 two-phase commit (2PC), 353, 355-359, 558 confusion with two-phase locking, 356 coordinator failure, 358 coordinator recovery, 363 how it works, 357 issues in practice, 363 performance cost, 360 transactions holding locks, 362 two-phase locking (2PL), 257-261, 329, 558 confusion with two-phase commit, 356 index-range locks, 260 performance of, 258 type checking, dynamic versus static, 40 U UDP (User Datagram Protocol) comparison to TCP, 283 multicast, 442 unbounded datasets, 439, 558 (see also streams) unbounded delays, 558 in networks, 282 process pauses, 296 unbundling databases, 499-515 composing data storage technologies, 499-504 federation versus unbundling, 501 need for high-level language, 503 designing applications around dataflow, 504-509 observing derived state, 509-515 materialized views and caching, 510 multi-partition data processing, 514 pushing state changes to clients, 512 uncertain (transaction status) (see in doubt) uniform consensus, 365 (see also consensus) uniform interfaces, 395 union type (in Avro), 125 uniq (Unix tool), 392 uniqueness constraints asynchronously checked, 526 requiring consensus, 521 requiring linearizability, 330 uniqueness in log-based messaging, 522 Unix philosophy, 394-397 command-line batch processing, 391-394 Unix pipes versus dataflow engines, 423 comparison to Hadoop, 413-414 comparison to relational databases, 499, 501 comparison to stream processing, 464 composability and uniform interfaces, 395 loose coupling, 396 pipes, 394 relation to Hadoop, 499 UPDATE statement (SQL), 40 updates preventing lost updates, 242-246 atomic write operations, 243 automatically detecting lost updates, 245 compare-and-set operations, 245 conflict resolution and replication, 246 using explicit locking, 244 preventing 
write skew, 246-251 V validity (consensus), 365 vBuckets (partitioning), 199 vector clocks, 191 (see also version vectors) vectorized processing, 99, 428 verification, 528-533 avoiding blind trust, 530 culture of, 530 designing for auditability, 531 end-to-end integrity checks, 531 tools for auditable data systems, 532 version control systems, reliance on immutable data, 463 version vectors, 177, 191 capturing causal dependencies, 343 versus vector clocks, 191 Vertica (database), 93 handling writes, 101 replicas using different sort orders, 100 vertical scaling (see scaling up) vertices (in graphs), 49 property graph model, 50 Viewstamped Replication (consensus algo‐ rithm), 366 view number, 368 virtual machines, 146 (see also cloud computing) context switches, 297 network performance, 282 noisy neighbors, 284 reliability in cloud services, 8 virtualized clocks in, 290 virtual memory process pauses due to page faults, 14, 297 versus memory management by databases, 89 VisiCalc (spreadsheets), 504 vnodes (partitioning), 199 Voice over IP (VoIP), 283 Voldemort (database) building read-only stores in batch processes, 413 hash partitioning, 203-204, 211 leaderless replication, 177 multi-datacenter support, 184 rebalancing, 213 reliance on read repair, 179 sloppy quorums, 184 VoltDB (database) cross-partition serializability, 256 deterministic stored procedures, 255 in-memory storage, 89 output streams, 456 secondary indexes, 207 serial execution of transactions, 253 statement-based replication, 159, 479 transactions in stream processing, 477 W WAL (write-ahead log), 82 web services (see services) Web Services Description Language (WSDL), 133 webhooks, 443 webMethods (messaging), 137 WebSocket (protocol), 512 Index | 589 windows (stream processing), 466, 468-472 infinite windows for changelogs, 467, 474 knowing when all events have arrived, 470 stream joins within a window, 473 types of windows, 472 winners (conflict resolution), 173 WITH RECURSIVE syntax (SQL), 54 
workflows (MapReduce), 402 outputs, 411-414 key-value stores, 412 search indexes, 411 with map-side joins, 410 working set, 393 write amplification, 84 write path (derived data), 509 write skew (transaction isolation), 246-251 characterizing, 246-251, 262 examples of, 247, 249 materializing conflicts, 251 occurrence in practice, 529 phantoms, 250 preventing in snapshot isolation, 262-265 in two-phase locking, 259-261 options for, 248 write-ahead log (WAL), 82, 159 writes (database) atomic write operations, 243 detecting writes affecting prior reads, 264 preventing dirty writes with read commit‐ ted, 235 WS-* framework, 133 (see also services) WS-AtomicTransaction (2PC), 355 590 | Index X XA transactions, 355, 361-364 heuristic decisions, 363 limitations of, 363 xargs (Unix tool), 392, 396 XML binary variants, 115 encoding RDF data, 57 for application data, issues with, 114 in relational databases, 30, 41 XSL/XPath, 45 Y Yahoo!

pages: 416 words: 39,022

Asset and Risk Management: Risk Oriented Finance by Louis Esch, Robert Kieffer, Thierry Lopez


asset allocation, Brownian motion, business continuity plan, business process, capital asset pricing model, computer age, corporate governance, discrete time, diversified portfolio, fixed income, implied volatility, index fund, interest rate derivative, iterative process, P = NP, p-value, random walk, risk/return, shareholder value, statistical model, stochastic process, transaction costs, value at risk, Wiener process, yield curve, zero-coupon bond

Let us now assume that we wish to determine a solution with a degree of precision ε. We could stop the iterative process on the basis of an error estimation formula. These formulae, however, require a certain level of information on the derivative f′(x), information that is not easy to obtain. On the other hand, the limit precision ε_a will not generally be known beforehand. Consequently, we run the risk that ε, the accuracy level sought, is never reached, because it is better than the limit precision ε_a (ε < ε_a). In this case, the iterative process would carry on indefinitely. This leads us to accept the following stop criterion: stop at iteration n as soon as

|x_n − x_{n−1}| < ε    or    |x_{n+1} − x_n| ≥ |x_n − x_{n−1}|

That is, the iteration is stopped either when the required precision has been reached, or when iteration n + 1 fails to produce a smaller variation in value than iteration n did.
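This two-part stop criterion can be sketched as a small loop. The fixed-point map g(x) = cos(x) and the function name `iterate` below are illustrative choices, not from the book:

```python
import math

def iterate(g, x0, eps, max_steps=1000):
    """Iterate x_{n+1} = g(x_n), stopping when the last step is smaller
    than eps, or when the steps stop shrinking (limit precision reached)."""
    x = g(x0)
    prev_step = abs(x - x0)
    for _ in range(max_steps):
        x_next = g(x)
        step = abs(x_next - x)
        # The criterion from the text: |x_n - x_{n-1}| < eps, or
        # |x_{n+1} - x_n| >= |x_n - x_{n-1}| (variation no longer decreases)
        if prev_step < eps or step >= prev_step:
            return x
        x, prev_step = x_next, step
    return x

root = iterate(math.cos, 1.0, eps=1e-12)  # fixed point of cos(x) = x
```

The second condition is what protects the loop when ε is tighter than the attainable limit precision ε_a: once rounding keeps the steps from shrinking further, the iteration halts rather than running forever.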

In addition, if the Jacobian matrix J(x), defined by [J(x)]_{ij} = ∂g_j(x)/∂x_i, is such that ||J(x)|| ≤ m < 1 for every x ∈ I, for some compatible norm, then Lipschitz’s condition is satisfied. The order of convergence is defined by

lim_{k→∞} ||e_{k+1}|| / ||e_k||^p = C

where C is the constant for the asymptotic error.

8.3.2 Principal methods

If one chooses a constant matrix A as the value for A(x), the iterative process is the generalisation in n dimensions of the chord method. If the inverse of the Jacobian matrix of f is chosen as the value of A(x), we will obtain the generalisation in n dimensions of the Newton–Raphson method. Another approach to solving the equation f(x) = 0 involves using the i-th equation to determine the i-th component, re-using the components already updated in the current sweep. Therefore, for i = 1, 2, . . . , n, the following equations will be solved in succession with respect to x_i:

f_i(x_1^{(k+1)}, . . . , x_{i−1}^{(k+1)}, x_i, x_{i+1}^{(k)}, . . . , x_n^{(k)}) = 0
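As a toy illustration of the Newton–Raphson generalisation (a two-dimensional example of my own, not from the book): at each step solve J(x)·s = f(x) and update x ← x − s.

```python
def newton2(f, J, x0, tol=1e-12, max_iter=50):
    """2-D Newton-Raphson: solve J(x) * s = f(x), then x <- x - s.
    Assumes J stays non-singular along the iteration path."""
    x, y = x0
    for _ in range(max_iter):
        f1, f2 = f(x, y)
        (a, b), (c, d) = J(x, y)
        det = a * d - b * c
        sx = (d * f1 - b * f2) / det      # Cramer's rule for the 2x2 solve
        sy = (a * f2 - c * f1) / det
        x, y = x - sx, y - sy
        if abs(sx) + abs(sy) < tol:
            break
    return x, y

# intersect the unit circle with the line y = x
f = lambda x, y: (x**2 + y**2 - 1.0, x - y)
J = lambda x, y: ((2 * x, 2 * y), (1.0, -1.0))
root = newton2(f, J, (1.0, 0.5))
```

Starting from (1.0, 0.5), the iterates converge quadratically to (√2/2, √2/2).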

pages: 398 words: 31,161

Gnuplot in Action: Understanding Data With Graphs by Philipp Janert


bioinformatics, business intelligence, centre right, Debian, general-purpose programming language, iterative process, mandelbrot fractal, pattern recognition, random walk, Richard Stallman, six sigma, survivorship bias

Here are just a few items of practical advice to get you started. You may also want to take a look at the gnuplot reference documentation for further discussion and additional features. Since the fitting algorithm is an iterative process, it’s not guaranteed to converge. If the iteration doesn’t converge, or converges to an obviously wrong solution, try to initialize the fitting parameters with better starting values. Unless the variables have been initialized explicitly, they’ll be equal to zero, which is often a particularly bad starting value. In special situations, you may also want to try hand-tuning the iteration process itself by fiddling with the values of FIT_START_LAMBDA and FIT_LAMBDA_FACTOR. All fitting parameters should be of roughly equal scale. If some of the parameters differ wildly (by many orders of magnitude) from one another, the fitting function should be modified to take these factors into account explicitly.
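These hints translate directly into a fit session. A hypothetical script (the data file name and the model are invented; FIT_START_LAMBDA and FIT_LAMBDA_FACTOR are the built-in fit-control variables mentioned above):

```gnuplot
# model to fit; 'data.dat' is a hypothetical two-column data file
f(x) = a * exp(-b * x) + c

# initialize the parameters explicitly -- the default of zero
# is often a particularly bad starting value
a = 10; b = 0.5; c = 1

# optional hand-tuning of the iteration itself
FIT_START_LAMBDA = 0.01
FIT_LAMBDA_FACTOR = 5

fit f(x) 'data.dat' using 1:2 via a, b, c
plot 'data.dat' using 1:2, f(x)
```

If the fit runs off to a wrong solution, the usual first remedy is to change only the three initialization lines and re-run.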

It takes only one line to read and plot a data file, and most of the command syntax is straightforward and quite intuitive. Gnuplot does not require programming or any deeper understanding of its command syntax to get started. So this is the fundamental workflow of all work with gnuplot: plot, examine, repeat—until you have found out whatever you wanted to learn from the data. Gnuplot supports the iterative process model required for exploratory work perfectly.

1.3.1 Gnuplot isn’t GNU

To dispel one common confusion right away: gnuplot isn’t GNU software, has nothing to do with the GNU project, and isn’t released under the GNU General Public License (GPL). Gnuplot is released under a permissive open source license. Gnuplot has been around a long time—a very long time! It was started by Thomas Williams and Colin Kelley in 1986.

In general, the colors are distributed rather uniformly over the entire spectrum, because this matches up with the regularly varying function in this plot.

9.4.2 A complex figure

As an example of a graph that includes a lot of fine detail, I’ve chosen a section from the edge of the Mandelbrot set. The Mandelbrot set is the set of all points in the complex plane for which a certain simple iteration process stays bounded. What’s noteworthy here is that the border between points inside the set and outside of it isn’t smooth—in fact the border is “infinitely” complicated, showing details at all levels of magnification. For points far from the Mandelbrot set, the iteration will diverge quickly (after just a few steps). But as we approach the border, the iteration will take many more steps before finally diverging.
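The "steps before diverging" measure is exactly what escape-time renderings color by. A minimal sketch in plain Python (rather than gnuplot), using the standard fact that once |z| exceeds 2 the iteration is certain to diverge:

```python
def escape_time(c, max_iter=100):
    """Iterate z -> z*z + c from z = 0; return the step at which |z|
    first exceeds 2 (guaranteed divergence), or max_iter if the orbit
    stays bounded for all tested steps."""
    z = 0j
    for n in range(max_iter):
        if abs(z) > 2.0:
            return n
        z = z * z + c
    return max_iter

inside = escape_time(0 + 0j)    # orbit stays at 0: never escapes
outside = escape_time(2 + 2j)   # far from the set: escapes almost at once
```

Points near the border return intermediate counts, and mapping those counts onto a palette produces the familiar filigree images.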

pages: 410 words: 114,005

Black Box Thinking: Why Most People Never Learn From Their Mistakes--But Some Do by Matthew Syed


Airbus A320, Alfred Russel Wallace, Arthur Eddington, Atul Gawande, Black Swan, British Empire, call centre, Captain Sullenberger Hudson, Checklist Manifesto, cognitive bias, cognitive dissonance, conceptual framework, corporate governance, creative destruction, credit crunch, crew resource management, deliberate practice, double helix, epigenetics, fear of failure, fundamental attribution error, Henri Poincaré, hindsight bias, Isaac Newton, iterative process, James Dyson, James Hargreaves, James Watt: steam engine, Joseph Schumpeter, Lean Startup, mandatory minimum, meta analysis, meta-analysis, minimum viable product, publication bias, quantitative easing, randomized controlled trial, selection bias, Silicon Valley, six sigma, spinning jenny, Steve Jobs, the scientific method, Thomas Kuhn: the structure of scientific revolutions, too big to fail, Toyota Production System, US Airways Flight 1549, Wall-E, Yom Kippur War

One of the pit stops I witnessed was completed in an astonishing 1.95 seconds.* Vowles said: The secret to modern F1 is not really to do with big ticket items; it is about hundreds of thousands of small items, optimized to the nth degree. People think that things like engines are based upon high-level strategic decisions, but they are not. What is an engine except many iterations of small components? You start with a sensible design, but it is the iterative process that guides you to the best solution. Success is about creating the most effective optimization loop. I also spoke to Andy Cowell, the leader of the team that devised the engine. His attitude was a carbon copy of that of Vowles. We got our development engine up and running in late December [2012]. We didn’t design it to be car friendly. We didn’t try and figure out the perfect weight and aerodynamic design.

“A cyclone has a number of variables: size of entry, exit, angle, diameter, length: and the trying thing is that if you change one dimension, it affects all the others.” His discipline was astonishing. “I couldn’t afford a computer, so I would hand-write the results into a book,” he recalls. “In the first year alone, I conducted literally hundreds of experiments. It was a very, very thick book.” But as the intensive, iterative process gradually solved the problem of separating ultra-fine dust, Dyson came up against another problem: long pieces of hair and fluff. These were not being separated from the airflow by the cyclone dynamics. “They were just coming out of the top along with the air,” he says. “It was another huge problem and it didn’t seem as if a conventional cyclone could solve it.” The sheer scale of the problem set the stage for a second eureka moment: the dual cyclone.

A good storyline is an act of creative synthesis: bringing disparate narrative strands together in novel form. It is a crucial part of the Pixar process. But now consider what happens next. The story line is pulled apart. As the animation gets into operation, each frame, each strand of the story, each scene is subject to debate, dissent, and testing. All told, it takes around twelve thousand storyboard drawings to make one ninety-minute feature, and because of the iterative process, story teams often create more than 125,000 storyboards by the time the film is actually delivered. Monsters, Inc. is a perfect illustration of a creative idea adapted in the light of criticism. It started off with a plot centered on a middle-aged accountant who hates his job and who is given a sketchbook by his mother. As a child he had drawn some monsters in the sketchbook and that night they turn up in his bedroom, but only the accountant can see them.

pages: 396 words: 112,748

Chaos by James Gleick


Benoit Mandelbrot, butterfly effect, cellular automata, Claude Shannon: information theory, discrete time, Edward Lorenz: Chaos theory, experimental subject, Georg Cantor, Henri Poincaré, Isaac Newton, iterative process, John von Neumann, Louis Pasteur, mandelbrot fractal, Murray Gell-Mann, Norbert Wiener, pattern recognition, Richard Feynman, Stephen Hawking, stochastic process, trade route

The Mandelbrot set became a kind of public emblem for chaos, appearing on the glossy covers of conference brochures and engineering quarterlies, forming the centerpiece of an exhibit of computer art that traveled internationally in 1985 and 1986. Its beauty was easy to feel from these pictures; harder to grasp was the meaning it had for the mathematicians who slowly understood it. Many fractal shapes can be formed by iterated processes in the complex plane, but there is just one Mandelbrot set. It started appearing, vague and spectral, when Mandelbrot tried to find a way of generalizing about a class of shapes known as Julia sets. These were invented and studied during World War I by the French mathematicians Gaston Julia and Pierre Fatou, laboring without the pictures that a computer could provide. Mandelbrot had seen their modest drawings and read their work—already obscure—when he was twenty years old.

Were the buglike, floating “molecules” isolated islands? Or were they attached to the main body by filaments too fine to be observed? It was impossible to tell. For a one-dimensional process, no one need actually resort to experimental trial. It is easy enough to establish that numbers greater than one lead to infinity and the rest do not. But in the two dimensions of the complex plane, to deduce a shape defined by an iterated process, knowing the equation is generally not enough. Unlike the traditional shapes of geometry, circles and ellipses and parabolas, the Mandelbrot set allows no shortcuts. The only way to see what kind of shape goes with a particular equation is by trial and error, and the trial-and-error style brought the explorers of this new terrain closer in spirit to Magellan than to Euclid. Joining the world of shapes to the world of numbers in this way represented a break with the past.

If you look at it now it seems to have passed. People don’t like it any more. In Germany they built huge apartment blocks in the Bauhaus style and people move out, they don’t like to live there. There are very deep reasons, it seems to me, in society right now to dislike some aspects of our conception of nature.” Peitgen had been helping a visitor select blowups of regions of the Mandelbrot set, Julia sets, and other complex iterative processes, all exquisitely colored. In his small California office he offered slides, large transparencies, even a Mandelbrot set calendar. “The deep enthusiasm we have has to do with this different perspective of looking at nature. What is the true aspect of the natural object? The tree, let’s say—what is important? Is it the straight line, or is it the fractal object?” At Cornell, meanwhile, John Hubbard was struggling with the demands of commerce.

pages: 287 words: 44,739

Guide to business modelling by John Tennent, Graham Friend, Economist Group


correlation coefficient, discounted cash flows, double entry bookkeeping, intangible asset, iterative process, purchasing power parity, RAND corporation, shareholder value, the market place, time value of money

Chart 15.10 Applying a discount rate

As the discount rate increases, so the NPV of the project will fall. The graph in Chart 15.11 shows the NPV for a range of discount rates.

Chart 15.11 The NPV of a project with a range of discount rates

The graph is a curved shape, so the IRR has to be found by trial and error or interpolation between two known points. Even spreadsheets use a trial-and-error iterative process to find the breakeven point. In the example it was possible to find the point almost exactly. In spreadsheets the IRR function can be used to find the breakeven interest rate. The syntax is:

=IRR(range,guess)

The range is the cash flows in the model and the guess is the point near where the IRR is expected to be found. If no guess is supplied, the formula assumes 10%. Note that the range for IRR is time 0 to time N, whereas with NPV above the range is time 1 to time N.
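The trial-and-error search can be sketched outside a spreadsheet as well. A hedged illustration in Python, using bisection on the sign of the NPV (a stand-in for the spreadsheet's own iterative search, which is seeded by the guess; the cash-flow numbers are invented):

```python
def npv(rate, cashflows):
    """Net present value of cashflows[t] received at time t (t = 0 is today)."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cashflows))

def irr(cashflows, lo=-0.99, hi=10.0, tol=1e-9):
    """Find the discount rate at which NPV crosses zero by bisection.
    Assumes a conventional project: one sign change, NPV falling as
    the rate rises."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if npv(mid, cashflows) > 0:
            lo = mid          # NPV still positive: the rate must rise
        else:
            hi = mid
    return (lo + hi) / 2

rate = irr([-100, 60, 60])    # invest 100 today, receive 60 in each of two years
```

At the returned rate the NPV is essentially zero, which is exactly the breakeven property the IRR function delivers.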

Even when a model contains no technical errors, however, it may still fail to deliver intuitive results because of a conceptual flaw.

Conceptual errors

These constitute a flaw in the logic, the rationale or the mechanisms depicted in the model. As the business modelling process map in Chapter 6 (Chart 6.1, page 34) indicated, developing an understanding of the logical flows and the relationships within the environment is an iterative process. The testing phase offers the modeller another opportunity to increase and test his or her understanding of the business.

User errors

These occur in poorly structured and badly documented models with limited checks on user inputs and inadequately trained users. The problems may arise as a result of human error, but the fundamental problem often lies with the design of the model.

Allow time for testing

A testing and debugging strategy must ensure the identification and removal of as many of the three types of errors as possible.

However, if a circular reference is an integral part of the design, such as in the case of interest calculations, then the spreadsheet package must be instructed to find a set of values that satisfy the circularity. The values represent an equilibrium that is effectively the solution to a set of simultaneous equations. To allow the spreadsheet to solve the circular reference, the modeller should select tools➞options➞calculation tab➞iteration. The model uses an iterative process where a range of values is used until a consistent set of results is found. Additional error handling may be required in the presence of circular references because if, for example, a #DIV/0!, #N/A! or #REF! occurs, the model will be unable to find a solution and the errors become compounded by the circularity. This is common in the case of items that influence all the financial statements such as taxation, cash and dividend calculations.
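The convergence-to-equilibrium idea behind the spreadsheet's iteration setting can be sketched in a few lines. A toy example of my own (interest earned on the average cash balance, where the closing balance itself includes that interest, the classic circularity):

```python
def solve_interest(opening_cash, operating_flow, rate, tol=1e-10, max_iter=100):
    """Resolve the interest circularity by fixed-point iteration, as a
    spreadsheet does when iterative calculation is switched on: interest
    depends on the closing balance, which depends on the interest."""
    interest = 0.0
    for _ in range(max_iter):
        closing = opening_cash + operating_flow + interest
        new_interest = rate * (opening_cash + closing) / 2   # on average balance
        if abs(new_interest - interest) < tol:
            break
        interest = new_interest
    return interest, closing

interest, closing = solve_interest(100.0, 50.0, 0.05)
```

Each pass uses the previous pass's interest figure, and the numbers settle quickly onto the consistent set of values that solves the implicit simultaneous equations.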

pages: 312 words: 35,664

The Mathematics of Banking and Finance by Dennis W. Cox, Michael A. A. Cox


barriers to entry, Brownian motion, call centre, correlation coefficient, fixed income, inventory management, iterative process, linear programming, meta analysis, meta-analysis, P = NP, pattern recognition, random walk, traveling salesman, value at risk


The current cost can now be calculated as: 142.86 cost(x1) + 285.71 cost(x2) = 142.86 × 1 + 285.71 × 1 = 428.57. There are now no remaining negative opportunity costs, so the solution has variable 1 as 142.86 and variable 2 as 285.71, with the remaining variables no longer being used since they are now zero.

17.4 THE CONCERNS WITH THE APPROACH

In practice, when you are inputting data into a system and then using some iterative process to try to find a better estimate of what is the best strategy, you are actually conducting the process laid out in this chapter – it is just that the actual work is normally embedded within a computer program. However, where possible there are real merits in carrying out the analysis in a manual form, not the least of which is the relative complexity of the software solutions currently available.

pages: 372 words: 101,174

How to Create a Mind: The Secret of Human Thought Revealed by Ray Kurzweil


Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, Albert Einstein, Albert Michelson, anesthesia awareness, anthropic principle, brain emulation, cellular automata, Claude Shannon: information theory, cloud computing, computer age, Dean Kamen, discovery of DNA, double helix, epigenetics, George Gilder, Google Earth, Isaac Newton, iterative process, Jacquard loom, John von Neumann, Law of Accelerating Returns, linear programming, Loebner Prize, mandelbrot fractal, Norbert Wiener, optical character recognition, pattern recognition, Peter Thiel, Ralph Waldo Emerson, random walk, Ray Kurzweil, reversible computing, selective serotonin reuptake inhibitor (SSRI), self-driving car, speech recognition, Steven Pinker, strong AI, the scientific method, theory of mind, Turing complete, Turing machine, Turing test, Wall-E, Watson beat the top human players on Jeopardy!, X Prize

(For an algorithmic description of genetic algorithms, see this endnote.) The key to a genetic algorithm is that the human designers don’t directly program a solution; rather, we let one emerge through an iterative process of simulated competition and improvement. Biological evolution is smart but slow, so to enhance its intelligence we greatly speed up its ponderous pace. The computer is fast enough to simulate many generations in a matter of hours or days, and we’ve occasionally had them run for as long as weeks to simulate hundreds of thousands of generations. But we have to go through this iterative process only once; as soon as we have let this simulated evolution run its course, we can apply the evolved and highly refined rules to real problems in a rapid fashion. In the case of our speech recognition systems, we used them to evolve the initial topology of the network and other critical parameters.
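A minimal sketch of such a genetic algorithm, with a toy "one-max" fitness function (maximise the number of 1 bits) standing in for the real speech-recognition objective; all parameter values here are illustrative, not Kurzweil's:

```python
import random

def evolve(fitness, genome_len=16, pop_size=30, generations=60,
           mutation_rate=0.05, seed=42):
    """Minimal genetic algorithm: truncation selection, single-point
    crossover, and random bit-flip mutation. No solution is programmed
    in; one emerges over the simulated generations."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(genome_len)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[:pop_size // 2]            # keep the fitter half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, genome_len)   # single-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ (rng.random() < mutation_rate)
                     for bit in child]           # bit-flip mutation
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

best = evolve(fitness=sum)
```

Because the fitter half survives unchanged each generation, the best genome never degrades, and crossover plus mutation steadily push it toward the all-ones optimum.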

We then collapse the two (one-point) clusters that are closest together into a single cluster. We are thus still left with 1,024 clusters. After processing the 1,025th vector, one of those clusters now has more than one point. We keep processing points in this way, always maintaining 1,024 clusters. After we have processed all the points, we represent each multipoint cluster by the geometric center of the points in that cluster. We continue this iterative process until we have run through all the sample points. Typically we would process millions of points into 1,024 (2^10) clusters; we’ve also used 2,048 (2^11) or 4,096 (2^12) clusters. Each cluster is represented by one vector that is at the geometric center of all the points in that cluster. Thus the total of the distances of all the points in the cluster to the center point of the cluster is as small as possible.
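The scheme just described can be sketched in miniature. A one-dimensional toy version of my own (the real system works on high-dimensional vectors and 1,024 clusters; here k = 3 and the "points" are scalars):

```python
def online_cluster(points, k):
    """Seed one cluster per point until k exist; thereafter each new
    point becomes its own cluster and the two closest cluster centres
    are merged, so exactly k clusters survive. Each cluster keeps a
    running centroid and a point count."""
    clusters = []  # list of (centroid, count)
    for p in points:
        clusters.append((p, 1))
        if len(clusters) > k:
            best = None                     # find the closest centre pair
            for i in range(len(clusters)):
                for j in range(i + 1, len(clusters)):
                    d = abs(clusters[i][0] - clusters[j][0])
                    if best is None or d < best[0]:
                        best = (d, i, j)
            _, i, j = best
            (c1, n1), (c2, n2) = clusters[i], clusters[j]
            merged = ((c1 * n1 + c2 * n2) / (n1 + n2), n1 + n2)
            del clusters[j]                 # j > i, so i is unaffected
            clusters[i] = merged
    return [c for c, _ in clusters]

centres = online_cluster([0.0, 0.1, 0.2, 10.0, 10.1, 20.0], 3)
```

The three surviving centroids sit at the centres of the three natural groups in the data, which is the vector-quantization effect the passage describes.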

pages: 193 words: 98,671

The Inmates Are Running the Asylum by Alan Cooper


Albert Einstein, delayed gratification, Donald Trump, Howard Rheingold, informal economy, iterative process, Jeff Bezos, Menlo Park, natural language processing, new economy, Robert X Cringely, Silicon Valley, Silicon Valley startup, skunkworks, Steve Jobs, Steven Pinker, telemarketer, urban planning

I'll talk a lot about goals in the next chapter, but we discover them in the same way we discover personas. We determine the relevant personas and their goals in a process of successive refinement during our initial investigation of the problem domain. Typically, we start with a reasonable approximation and quickly converge on a believable population of personas. Although this iterative process is similar to the iterative process used by software engineers during the implementation process, it is significantly different in one major respect. Iterating the design and its premises is quick and easy because we are working in paper and words. Iterating the implementation is slow and difficult because it requires code. The Cast of Characters We give every project its own cast of characters, which consists of anywhere from 3 to 12 unique personas.

They write any old program that can be built in the least time and then put it before their users. They then listen to the complaints and feedback, measure the patterns of the user's navigation clicks, change the weak parts, and then ship it again. Generally, programmers aren't thrilled about the iterative method because it means extra work for them. Typically, it's managers new to technology who like the iterative process because it relieves them of having to perform rigorous planning, thinking, and product due diligence (in other words, interaction design). Of course, it's the users who pay the dearest price. They have to suffer through one halfhearted attempt after another before they get a program that isn't too painful. Just because customer feedback improves your understanding of your product or service, you cannot then deduce that it is efficient, cheap, or even effective to toss random features at your customers and see which ones are liked and which are disliked.

Functional Programming in Scala by Paul Chiusano, Rúnar Bjarnason


domain-specific language, iterative process, loose coupling, sorting algorithm, type inference, web application

Though because this style of organization is so common in FP, we sometimes don't bother to distinguish between an ordinary functional library and a "combinator library". 7.2 Choosing data types and functions Our goal in this section is to discover a data type and a set of primitive functions for our domain, and derive some useful combinators. This will be a somewhat meandering journey. Functional design can be a messy, iterative process. We hope to show at least a stylized view of this messiness that nonetheless gives some insight into how functional design proceeds in the real world. Don't worry if you don't follow absolutely every bit of discussion throughout this process. This chapter is a bit like peering over the shoulder of someone as they think through possible designs. And because no two people approach this process the same way, the particular path we walk here might not strike you as the most natural one—perhaps it considers issues in what seems like an odd order, skips too fast or goes too slow.

And lastly, you can look at your implementation and come up with laws you expect to hold based on your implementation. (Footnote 17: This last way of generating laws is probably the weakest, since it can be a little too easy to just have the laws reflect the implementation, even if the implementation is buggy or requires all sorts of unusual side conditions that make composition difficult.)

EXERCISE 13: Can you think of other laws that should hold for your implementation of unit, fork, and map2? Do any of them have interesting consequences?

7.2.4 Expressiveness and the limitations of an algebra

Functional design is an iterative process. After you've written down your API and have at least a prototype implementation, try using it for progressively more complex or realistic scenarios. Often you'll find that these scenarios require only some combination of existing primitive or derived combinators, and this is a chance to factor out common usage patterns into other combinators; occasionally you'll find situations where your existing primitives are insufficient.

As a result, you might arrive at a design that's much better for your purposes. But even if you decide you like the existing library's solution, spending an hour or two of playing with designs and writing down some type signatures is a great way to learn more about a domain, understand the design tradeoffs, and improve your ability to think through design problems. 8.3 Choosing data types and functions In this section, we will embark on another messy, iterative process of discovering data types and a set of primitive functions and combinators for doing property-based testing. As before, this is a chance to peer over the shoulder of someone working through possible designs. The particular path we take and the library we arrive at isn't necessarily the same as what you would discover. If property-based testing is unfamiliar to you, even better; this is a chance to explore a new domain and its design space, and make your own discoveries about it.

pages: 338 words: 106,936

The Physics of Wall Street: A Brief History of Predicting the Unpredictable by James Owen Weatherall


Albert Einstein, algorithmic trading, Antoine Gombaud: Chevalier de Méré, Asian financial crisis, bank run, beat the dealer, Benoit Mandelbrot, Black Swan, Black-Scholes formula, Bonfire of the Vanities, Bretton Woods, Brownian motion, butterfly effect, capital asset pricing model, Carmen Reinhart, Claude Shannon: information theory, collateralized debt obligation, collective bargaining, dark matter, Edward Lorenz: Chaos theory, Edward Thorp, Emanuel Derman, Eugene Fama: efficient market hypothesis, financial innovation, fixed income, George Akerlof, Gerolamo Cardano, Henri Poincaré, invisible hand, Isaac Newton, iterative process, John Nash: game theory, Kenneth Rogoff, Long Term Capital Management, Louis Bachelier, mandelbrot fractal, martingale, Myron Scholes, new economy, Paul Lévy, Paul Samuelson, prediction markets, probability theory / Blaise Pascal / Pierre de Fermat, quantitative trading / quantitative finance, random walk, Renaissance Technologies, risk-adjusted returns, Robert Gordon, Robert Shiller, Robert Shiller, Ronald Coase, Sharpe ratio, short selling, Silicon Valley, South Sea Bubble, statistical arbitrage, statistical model, stochastic process, The Chicago School, The Myth of the Rational Market, tulip mania, V2 rocket, Vilfredo Pareto, volatility smile

Extreme events occur far more often than Bachelier and Osborne believed they would, and markets are wilder places than normal distributions can describe. To fully understand markets, and to model them as safely as possible, these facts must be accounted for. And Mandelbrot is singularly responsible for discovering the shortcomings of the Bachelier-Osborne approach, and for developing the mathematics necessary to study them. Getting the details right may be an ongoing project — indeed, we should never expect to finish the iterative process of improving our mathematical models — but there is no doubt that Mandelbrot took a crucially important step forward. After a decade of interest in the statistics of markets, Mandelbrot gave up on his crusade to replace normal distributions with other Lévy-stable distributions. By this time, his ideas on randomness and disorder had begun to find applications in a wide variety of other fields, from cosmology to meteorology.

A sledgehammer may be great for laying train rails, but you need to recognize that it won’t be very good for hammering in finishing nails on a picture frame. I believe the history that I have recounted in this book supports the closely related claims that models in finance are best thought of as tools for certain kinds of purposes, and also that these tools make sense only in the context of an iterative process of developing models and then figuring out when, why, and how they fail — so that the next generation of models is robust in ways that the older models were not. From this perspective, Bachelier represents a first volley, the initial attempt to apply new ideas from statistical physics to an entirely different set of problems. He laid the groundwork for a revolutionary way of thinking about markets.

This led to a feedback loop that wasn’t fully recognized until after the 1987 crash. As sociologist Donald MacKenzie has observed, financial models are as much the engine behind markets as they are a camera capable of describing them. This means that the markets financial models are trying to capture are a moving target. Far from undermining the usefulness of models in understanding markets, the fact that markets are constantly evolving only makes the iterative process I have emphasized more important. Suppose that Sornette’s model of market crashes is perfect for current markets. Even then, we have to remain ever vigilant. What would happen if investors around the world started using his methods to predict crashes? Would this prevent crashes from occurring? Or would it simply make them bigger, or harder to predict? I don’t think anyone knows the answer to this question, which means that it is just the kind of thing we should be studying.

pages: 343 words: 93,544

vN: The First Machine Dynasty (The Machine Dynasty Book 1) by Madeline Ashby


big-box store, iterative process, natural language processing, place-making, traveling salesman, urban planning

Dr Singh had suggested a vN variety of naan as a replacement. "You're going to be entering a deep game immersion. You won't eat for a few hours. So you'd better fuel up now." Amy re-examined the plates. They were the smart kind; if she'd asked, they would have told her how many ounces she was eating from each. But she didn't need to ask. "There's too much here," she said. "If I eat all this without having to repair myself, it'll trigger the iteration process." She leaned as far forward as the Cuddlebug would allow. "Will I have to repair myself?" "No. It'll just wear you out, that's all." "How do you know?" "I've seen it happen." Dr Singh stood. "I thought you'd be happy with the spread. Your mother says you were never allowed to eat as much as you wanted. She says you were always hungry." Amy shut her eyes. She was going to cry, and she didn't want Dr Singh or the others to see it.

Asimov's Frankenstein Complex notion isn't just an early version of Mori's Uncanny Valley hypothesis, it's a reasonable extension of the fear that when we create in our own image, we will inevitably re-create the worst parts of ourselves. In other words: "I'm fucked up – therefore my kids will be fucked up, too." When I completed the submission draft of this book, I had just finished the first year of my second Master's – a design degree in strategic foresight. So I had spent months listening to discussions about the iterative process. And I started to realize that a self-replicating species of machine wouldn't have the usual fears about its offspring repeating its signature mistakes, nor would it have that uncanny response to copying. Machines like that could consider their iterations as prototypes, and nothing more. Stephen King has a famous adage about killing your darlings, and they could do that – literally – without a flood of oxytocin or normative culture telling them different.

But even so, the book was rejected by a bunch of different publishers, and I still had to rewrite the whole opening of the submission draft before the book became saleable. David Nickle was invaluable for that – we watched A History of Violence together and suddenly everything clicked. Normally he gives me the end of all my stories, and this time he helped me see a new beginning. In short: it was an iterative process.

pages: 52 words: 14,333

Growth Hacker Marketing: A Primer on the Future of PR, Marketing, and Advertising by Ryan Holiday


Airbnb, iterative process, Kickstarter, Lean Startup, Marc Andreessen, market design, minimum viable product, Paul Graham, Silicon Valley, slashdot, Steve Wozniak

I start and end with my own experiences in this book not because I am anyone special but because I think they illustrate a microcosm of the industry itself. The old way—where product development and marketing were two distinct and separate processes—has been replaced. We all find ourselves in the same position: needing to do more with less and finding, increasingly, that the old strategies no longer generate results. So in this book, I am going to take you through a new cycle, a much more fluid and iterative process. A growth hacker doesn’t see marketing as something one does, but rather as something one builds into the product itself. The product is then kick-started, shared, and optimized (with these steps repeated multiple times) on its way to massive and rapid growth. The chapters of this book follow that structure. But first, let’s make a clean break between the old and the new. What Is Growth Hacking?

pages: 205 words: 20,452

Data Mining in Time Series Databases by Mark Last, Abraham Kandel, Horst Bunke


4chan, call centre, computer vision, discrete time, information retrieval, iterative process, NP-complete, p-value, pattern recognition, random walk, sensor fusion, speech recognition, web application

For better computational efficiency a second genetic algorithm is designed using a more elaborate chromosome coding scheme of strings; see [17] for details. The time complexity becomes O(NnmpP), m << n, implying a substantial speedup. 5.2.3. Perturbation-Based Iterative Refinement The set median represents an approximation of the generalized median string. The greedy algorithms and the genetic search techniques also give approximate solutions. An approximate solution p̄ can be further improved by an iterative process of systematic perturbations. This idea was first suggested in [20], but no algorithmic details are specified there. A concrete algorithm for realizing systematic perturbations is given in [26]. For each position i, the following operations are performed: (i) Build perturbations:
• Substitution: Replace the i-th symbol of p̄ by each symbol of Σ in turn and choose the resulting string x with the smallest consensus error relative to S.
• Insertion: Insert each symbol of Σ in turn at the i-th position of p̄ and choose the resulting string y with the smallest consensus error relative to S.
• Deletion: Delete the i-th symbol of p̄ to generate z.
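The substitution/insertion/deletion perturbation scheme described above can be sketched in Python. This is a minimal illustration of the idea, not the concrete algorithm of [26]; the function names and the toy strings are illustrative only:

```python
def edit_distance(a, b):
    # Standard Levenshtein distance via dynamic programming.
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + cost)  # substitution/match
    return d[m][n]

def consensus_error(p, S):
    # Sum (not average) of distances from p to every input string.
    return sum(edit_distance(p, s) for s in S)

def refine(p, S, alphabet):
    """Systematic perturbations: at each position try every substitution,
    insertion, and deletion; keep the best candidate. Repeat until no
    perturbation lowers the consensus error."""
    improved = True
    while improved:
        improved = False
        best, best_err = p, consensus_error(p, S)
        for i in range(len(p) + 1):
            candidates = []
            if i < len(p):
                candidates += [p[:i] + c + p[i + 1:] for c in alphabet]  # substitution
                candidates.append(p[:i] + p[i + 1:])                     # deletion
            candidates += [p[:i] + c + p[i:] for c in alphabet]          # insertion
            for q in candidates:
                err = consensus_error(q, S)
                if err < best_err:
                    best, best_err = q, err
        if best != p:
            p, improved = best, True
    return p

# Toy example: refine an initial approximation against a small set S.
S = ["karolin", "kathrin", "karlin"]
median = refine("karolin", S, alphabet=set("".join(S)))
```

Each pass strictly lowers the consensus error, so the loop terminates; this matches the chapter's framing of refinement as an iterative process with a fixed improvement rule and an end test.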

Note that the consensus errors of digit 6 are substantially larger than those of the other digits because of the definition of consensus error as the sum, not the average, of the distances to all input samples. The best results are achieved by GA, followed by the dynamic approach. Except for digit 1, the greedy algorithm reveals some weakness. Looking at the medians for digits 2, 3, and 6, it seems that the iterative process terminates too early, resulting in a string (digit) much shorter than it should be. The reason lies in the simple termination criterion defined in [4]. It works well for the (short) words used there, but obviously encounters difficulties in dealing with the longer strings occurring in our study. At first glance, the dynamic approach needs more computation time than the greedy algorithm. But one has to take into account that the recorded time is the total time of the dynamic process of adding one sample to the existing set each time, starting from a set consisting of the first sample.

pages: 252 words: 73,131

The Inner Lives of Markets: How People Shape Them—And They Shape Us by Tim Sullivan


Airbnb, airport security, Al Roth, Alvin Roth, Andrei Shleifer, attribution theory, autonomous vehicles, barriers to entry, Brownian motion, centralized clearinghouse, Chuck Templeton: OpenTable, clean water, conceptual framework, constrained optimization, continuous double auction, creative destruction, deferred acceptance, Donald Trump, Edward Glaeser, experimental subject, first-price auction, framing effect, frictionless, fundamental attribution error, George Akerlof, Goldman Sachs: Vampire Squid, Gunnar Myrdal, helicopter parent, information asymmetry, Internet of things, invisible hand, Isaac Newton, iterative process, Jean Tirole, Jeff Bezos, Johann Wolfgang von Goethe, John Nash: game theory, John von Neumann, Joseph Schumpeter, Kenneth Arrow, late fees, linear programming, Lyft, market clearing, market design, market friction, medical residency, multi-sided market, mutually assured destruction, Nash equilibrium, Occupy movement, Pareto efficiency, Paul Samuelson, Peter Thiel, pez dispenser, pre–internet, price mechanism, price stability, prisoner's dilemma, profit motive, proxy bid, RAND corporation, ride hailing / ride sharing, Robert Shiller, Ronald Coase, school choice, school vouchers, sealed-bid auction, second-price auction, second-price sealed-bid, sharing economy, Silicon Valley, spectrum auction, Steve Jobs, Tacoma Narrows Bridge, technoutopianism, telemarketer, The Market for Lemons, The Wisdom of Crowds, Thomas Malthus, Thorstein Veblen, trade route, transaction costs, two-sided market, uranium enrichment, Vickrey auction, Vilfredo Pareto, winner-take-all economy

The pioneers of information economics set the profession on a path to better describing the nature of markets, which has in turn led the current generation to turn its attention outward to dabble in the design of markets and policy. A great many applied theorists and empirical economists are, together, able to match theories up to data they can use to evaluate how they perform in practice. We hope that the iterative process of theorizing and testing of theories in the field and the reformulating of theories (a process of experimentation that we’re in the midst of) will make it more likely that economics’ increased influence on the world is ultimately for the better. 5 BUILDING AN AUCTION FOR EVERYTHING THE TALE OF THE ROLLER-SKATING ECONOMIST In the fall of 2006, the Japanese baseball phenom Daisuke Matsuzaka announced his interest in moving to the American big leagues.

These allocation problems all now have centralized clearinghouses, many designed with the basic deferred acceptance algorithm as their foundations. But that’s really all that Gale and Shapley provided: a conceptual framework that market designers have, for several decades now, been applying, evaluating, and refining. They’ve learned from its successes and, unfortunately, learned even more from its inevitable failures: modeling real-life exchanges is an imprecise, iterative process in which many of us find ourselves as experimental subjects. The Complicated Job of Engineering Matches Market designer Al Roth likes to use a bridge-building metaphor to explain the contrast between his own work and that of design pioneers like Shapley. Suppose you want to build a suspension bridge connecting Brooklyn and Manhattan. In confronting decisions like where to place the suspension cables and how thick each should be, you’d better have paid attention in physics class.
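The deferred acceptance algorithm named above can be sketched in a few lines. This is a minimal Gale-Shapley illustration with hypothetical toy data, assuming complete preference lists and equal numbers on each side, not a real clearinghouse design:

```python
def deferred_acceptance(proposer_prefs, reviewer_prefs):
    """Proposers propose in preference order; each reviewer tentatively
    holds the best offer received so far, releasing a held proposer if a
    better one arrives. Returns a stable matching {proposer: reviewer}."""
    # rank[r][p]: position of proposer p in reviewer r's list (lower = better)
    rank = {r: {p: i for i, p in enumerate(prefs)}
            for r, prefs in reviewer_prefs.items()}
    free = list(proposer_prefs)              # proposers not yet held
    next_choice = {p: 0 for p in proposer_prefs}
    held = {}                                # reviewer -> proposer tentatively held

    while free:
        p = free.pop()
        r = proposer_prefs[p][next_choice[p]]
        next_choice[p] += 1
        if r not in held:
            held[r] = p                      # first offer is held
        elif rank[r][p] < rank[r][held[r]]:
            free.append(held[r])             # reviewer trades up; old proposer freed
            held[r] = p
        else:
            free.append(p)                   # rejected; will propose further down list
    return {p: r for r, p in held.items()}

# Hypothetical toy instance (the names are illustrative only).
students = {"ann": ["x", "y"], "bob": ["x", "y"]}
schools  = {"x": ["bob", "ann"], "y": ["ann", "bob"]}
match = deferred_acceptance(students, schools)
```

The conceptual framework is this simple; the decades of applying, evaluating, and refining that the excerpt describes lie in handling what the sketch omits, such as incomplete lists, couples, and quotas.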

pages: 462 words: 172,671

Clean Code: A Handbook of Agile Software Craftsmanship by Robert C. Martin


continuous integration, database schema, domain-specific language, don't repeat yourself, Donald Knuth, Eratosthenes, finite state, Ignaz Semmelweis: hand washing, iterative process, place-making, Rubik’s Cube, web application, WebSocket

All the analysis functions appear first, and all the synthesis functions appear last. If you look carefully, you will notice that I reversed several of the decisions I made earlier in this chapter. For example, I inlined some extracted methods back into formatCompactedComparison, and I changed the sense of the shouldNotBeCompacted expression. This is typical. Often one refactoring leads to another that leads to the undoing of the first. Refactoring is an iterative process full of trial and error, inevitably converging on something that we feel is worthy of a professional. Conclusion And so we have satisfied the Boy Scout Rule. We have left this module a bit cleaner than we found it. Not that it wasn’t clean already. The authors had done an excellent job with it. But no module is immune from improvement, and each of us has the responsibility to leave the code a little better than we found it. 16 Refactoring SerialDate If you go to, you will find the JCommon library.

pages: 556 words: 46,885

The World's First Railway System: Enterprise, Competition, and Regulation on the Railway Network in Victorian Britain by Mark Casson


banking crisis, barriers to entry, Beeching cuts, British Empire, combinatorial explosion, Corn Laws, corporate social responsibility, David Ricardo: comparative advantage, intermodal, iterative process, joint-stock company, joint-stock limited liability company, knowledge economy, linear programming, Network effects, New Urbanism, performance metric, railway mania, rent-seeking, strikebreaker, the market place, transaction costs

It is this ‘dominating’ counterfactual that is reported here. It should be emphasized that this counterfactual is not superior for every conceivable consignment of traffic, but only for a typical consignment of a certain type of traffic. The counterfactual system is developed from a ‘blank sheet of paper’—almost literally—and not by simply exploring variations to the configuration of the actual network. It is constructed using an iterative process, as explained below. To avoid the need for a full evaluation of the performance of the network after each iteration, a set of simple criteria were used to guide the initial formulation of the model. These criteria represent conditions that the counterfactual would almost certainly have to fulfil if it were to stand any chance of matching the performance of the actual system.

The destinations involved both large and small towns—because large towns typically act as hubs for small towns on the counterfactual, distinguishing between them is somewhat artificial. Finally, seven regional samples were constructed—reflecting the fact that most journeys on any railway system tend to be relatively short. Comparing the results for different samples illustrates how well the actual system served different types of traffic and different parts of the country. The counterfactual network was constructed using an iterative process. The performance of the actual system was first assessed. This proved to be a most illuminating process, indicating that the actual performance of the system for many categories of traffic was much inferior to what has often been suggested— particularly in the enthusiasts’ literature. An initial counterfactual system was then constructed, using only a limited number of local lines, and its performance compared with the actual system.

The second stage was to specify the local lines. Wherever possible, local lines feed into the hubs identified at the first stage. By concentrating interchange traffic at a limited number of hubs, the number of stops that long-distance trains need to make for connection purposes is reduced. At the same time, the power of hubs to act as a ‘one-stop shop’ for local connections is increased. An iterative process was then followed to fine-tune the interfaces between trunk and local networks. The final stage is based on the counterfactual timetable. The preparation of the timetable provides an opportunity to assess whether upgrading certain links to permit higher speeds would improve connections. Connections are improved when speeding up one link relative to others allows trains entering a hub from the accelerated link to make connections with trains that they would otherwise miss.

pages: 376 words: 110,796

Realizing Tomorrow: The Path to Private Spaceflight by Chris Dubbs, Emeline Paat-dahlstrom, Charles D. Walker


Berlin Wall, call centre, desegregation, Donald Trump, Doomsday Book, Elon Musk, high net worth, Iridium satellite, iterative process, Jeff Bezos, Mark Shuttleworth, Mikhail Gorbachev, multiplanetary species, Norman Mailer, Richard Feynman, Ronald Reagan, Search for Extraterrestrial Intelligence, Silicon Valley, Skype, Steve Jobs, Steve Wozniak, technoutopianism, V2 rocket, X Prize, young professional

Having watched one of the CATS teams develop their rocket, Carmack then set about going through the same process. "They build a rocket for a year, go out into the desert, they press the button and hope it doesn't blow up. It really rarely works right." He wanted to follow the same "rapid iterative process" he used in developing software and apply it to his rocketry business. "The background that I came from in software is you compile and test maybe a dozen times a day. It's a cyclic thing where you try to make it right but much of the benefit you get is in the exploration of the process, not so much plan it out perfect, implement it perfect for it to work. It's an iterative process of exploring your options." Carmack taught himself aerospace engineering and became one of Armadillo's principal engineers for the project. Armadillo officially registered for the X PRIZE in October 2002 when Carmack was sure the prize was funded.

pages: 353 words: 104,146

European Founders at Work by Pedro Gairifo Santos


business intelligence, cloud computing, crowdsourcing, fear of failure, full text search, information retrieval, inventory management, iterative process, Jeff Bezos, Lean Startup, Mark Zuckerberg, natural language processing, pattern recognition, pre–internet, recommendation engine, Richard Stallman, Silicon Valley, Skype, slashdot, Steve Jobs, Steve Wozniak, subscription business, technology bubble, web application, Y Combinator

So that was a small signal that certain people had the kind of engagement with this product that they would probably pay for it. But it got me thinking around that time about monetizing it. I was actually more bothered about what happens when all these early adopters have the product, and perhaps some of them have donated generously. That’s very nice. What happens next? Is it literally just the iterative process of adding more and more features? Or is there something a bit more to this? I think over time it became very apparent that to keep up the iterative process, I needed more staff. I needed help to just keep doing this. We needed to integrate Facebook, and LinkedIn, and more recently Foursquare. To make it more of a hub than simply a Twitter client. So, yes, it didn’t take very long for me to start thinking in that way. It did take quite a while to actually move on it though, to actually start turning it into the company.

pages: 445 words: 105,255

Radical Abundance: How a Revolution in Nanotechnology Will Change Civilization by K. Eric Drexler


3D printing, additive manufacturing, agricultural Revolution, Bill Joy: nanobots, Brownian motion, carbon footprint, Cass Sunstein, conceptual framework, continuation of politics by other means, crowdsourcing, dark matter, double helix, failed state, global supply chain, industrial robot, iterative process, Mars Rover, means of production, Menlo Park, mutually assured destruction, New Journalism, performance metric, reversible computing, Richard Feynman, Silicon Valley, South China Sea, Thomas Malthus, V2 rocket, Vannevar Bush, zero-sum game

These words are inscribed on the obelisk that marks his grave: “Man will not always stay on Earth; the pursuit of light and space will lead him to penetrate the bounds of the atmosphere, timidly at first but in the end to conquer the whole of solar space.” 139 Engineering with an Exploratory Twist: The structural contrasts between conventional and exploratory engineering can be laid out as follows:

Kind of engineering:  Production-oriented       Exploratory
Basic purpose:        provides products         provides knowledge
Basic constraint:     accessible fabrication    valid modeling
Level of design:      detailed specification    parametric model
Primary costs:        production, operation     design, analysis
Design margins:       enable robust products    enable robust analyses
Larger margins:       increase costs            reduce costs

Chapter 10: The Machinery of Radical Abundance 150 successive layers will have thicknesses of 1, ½, ¼, ⅛, and so on: This neat, self-similar architecture was suggested by Ralph Merkle and improves on the functionally similar version described in Nanosystems. 153 inherently messy contact with the stuff of nature: In converting raw materials into purified feedstocks, the path to atomic precision leads through reducing molecular complexity, a natural task for conventional chemical processes like those used in industrial processes to dissolve minerals and convert their materials into simple molecular and ionic species. These can then be separated through cascade sorting processes well suited to atomically precise mechanisms, as discussed in Nanosystems. 153 with a site that binds a feedstock molecule: A mechanism downstream can probe each site to ensure that it’s full and push empties on a path that leads them around for another go. With an iterative process of this kind, the free energy requirement for reliable binding can approach the minimum required for reducing entropy, which can be thought of as the work required for compression in a configuration space with both positional and angular coordinates.
Computational chemists will note that free energy calculations involving oriented, non-solvated molecules in a rigid binding site are far less challenging than calculations that must probe the configuration space of mobile, solvated, conformationally flexible molecules.

A large slowdown factor from this base (to reduce phonon drag in bearings, for example) is compatible with a still-enormous product throughput.

154 chemical steps that prepare reactive bits of molecular structure: To work reliably with small reactive groups and fine-grained structures, placement mechanisms must be stiff enough to adequately constrain thermal fluctuations. (Appendix I and Appendix II place this requirement in perspective.)

156 each step typically must expend substantial chemical energy: As with binding, an iterated process with conditional repetition can in some instances avoid this constraint.

157 density functional methods . . . applied in conservative ways: Methods in quantum chemistry have limited accuracy and ranges of applicability that must be kept in mind when considering which methods to use and how far to trust their results. Density functional methods, for example, typically underestimate the energies of reaction transition states, and this may or may not be acceptable, depending on the intended application.

pages: 121 words: 36,908

Four Futures: Life After Capitalism by Peter Frase

3D printing, Airbnb, basic income, bitcoin, call centre, Capital in the Twenty-First Century by Thomas Piketty, carbon footprint, cryptocurrency, deindustrialization, Edward Snowden, Erik Brynjolfsson, Ferguson, Missouri, fixed income, full employment, future of work, high net worth, income inequality, industrial robot, informal economy, Intergovernmental Panel on Climate Change (IPCC), iterative process, job automation, John Maynard Keynes: Economic Possibilities for our Grandchildren, litecoin, mass incarceration, means of production, Norbert Wiener, Occupy movement, pattern recognition, peak oil, plutocrats, postindustrial economy, price mechanism, private military company, Ray Kurzweil, Robert Gordon, Second Machine Age, self-driving car, sharing economy, Silicon Valley, smart meter, TaskRabbit, technoutopianism, The Future of Employment, Thomas Malthus, Tyler Cowen: Great Stagnation, universal basic income, Wall-E, Watson beat the top human players on Jeopardy!, We are the 99%, Wolfgang Streeck

Even high-level managerial functions can be partly automated: in 2014, a Hong Kong venture capital fund called Deep Knowledge appointed an algorithm, a program called VITAL, to its board, where it receives a vote on all investments.19 And perhaps even “creativity” isn’t such a uniquely human talent (if we reduce that word to the creation of replicator patterns). In a paper presented to a 2014 conference of the Association for Computing Machinery, a group of medical researchers presented a method for automatically generating plausible hypotheses for scientists to test, using data mining techniques.20 Such approaches could eventually be applied to other formulaic, iterative processes like the design of pop songs or smartphone games. What’s more, there is also another way for private companies to avoid employing workers for some of these tasks: turn them into activities that people will find pleasurable and will thus do for free on their own time. The computer scientist Luis von Ahn has specialized in developing such “games with a purpose”: applications that present themselves to end users as enjoyable diversions but which also perform a useful computational task, what von Ahn calls “Human Computation.”21 One of von Ahn’s early games asked users to identify objects in photos, and the data was then fed back into a database that was used for searching images, a technology later licensed by Google to improve its Image Search.

The Singularity Is Near: When Humans Transcend Biology by Ray Kurzweil


additive manufacturing, AI winter, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, Albert Einstein, anthropic principle, Any sufficiently advanced technology is indistinguishable from magic, artificial general intelligence, Asilomar, augmented reality, autonomous vehicles, Benoit Mandelbrot, Bill Joy: nanobots, bioinformatics, brain emulation, Brewster Kahle, Brownian motion, business intelligence, call centre, carbon-based life, cellular automata, Claude Shannon: information theory, complexity theory, conceptual framework, Conway's Game of Life, cosmological constant, cosmological principle, cuban missile crisis, data acquisition, Dava Sobel, David Brooks, Dean Kamen, disintermediation, double helix, Douglas Hofstadter, epigenetics, factory automation, friendly AI, George Gilder, Gödel, Escher, Bach, informal economy, information retrieval, invention of the telephone, invention of the telescope, invention of writing, Isaac Newton, iterative process, Jaron Lanier, Jeff Bezos, job automation, job satisfaction, John von Neumann, Kevin Kelly, Law of Accelerating Returns, life extension, lifelogging, linked data, Loebner Prize, Louis Pasteur, mandelbrot fractal, Mikhail Gorbachev, mouse model, Murray Gell-Mann, mutually assured destruction, natural language processing, Network effects, new economy, Norbert Wiener, oil shale / tar sands, optical character recognition, pattern recognition, phenotype, premature optimization, randomized controlled trial, Ray Kurzweil, remote working, reversible computing, Richard Feynman, Robert Metcalfe, Rodney Brooks, Search for Extraterrestrial Intelligence, selection bias, semantic web, Silicon Valley, Singularitarianism, speech recognition, statistical model, stem cell, Stephen Hawking, Stewart Brand, strong AI, superintelligent machines, technological singularity, Ted Kaczynski, telepresence, The Coming Technological Singularity, Thomas Bayes, transaction costs, Turing machine,
Turing test, Vernor Vinge, Y2K, Yogi Berra

When the improvement in the evaluation of the design creatures from one generation to the next becomes very small, we stop this iterative cycle of improvement and use the best design(s) in the last generation. (For an algorithmic description of genetic algorithms, see this note.175) The key to a GA is that the human designers don't directly program a solution; rather, they let one emerge through an iterative process of simulated competition and improvement. As we discussed, biological evolution is smart but slow, so to enhance its intelligence we retain its discernment while greatly speeding up its ponderous pace. The computer is fast enough to simulate many generations in a matter of hours or days or weeks. But we have to go through this iterative process only once; once we have let this simulated evolution run its course, we can apply the evolved and highly refined rules to real problems in a rapid fashion. Like neural nets, GAs are a way to harness the subtle but profound patterns that exist in chaotic data.
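The loop described here — evaluate a population of candidate designs, keep the fittest, breed replacements by crossover and occasional mutation, and repeat — can be sketched as a toy program. This is not Kurzweil's code: the onemax_fitness task (maximize the number of 1 bits) and every parameter value are invented purely for illustration.

```python
import random

def onemax_fitness(bits):
    """Toy evaluation function: count the 1 bits in a candidate design."""
    return sum(bits)

def evolve(length=20, pop_size=30, generations=100, mutation_rate=0.02, seed=0):
    rng = random.Random(seed)
    # Start from a random population of candidate "design creatures".
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        # Evaluate and keep the fitter half (simulated competition).
        pop.sort(key=onemax_fitness, reverse=True)
        survivors = pop[: pop_size // 2]
        # Breed replacements by one-point crossover plus rare bit flips.
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, length)
            child = [bit ^ (rng.random() < mutation_rate)
                     for bit in a[:cut] + b[cut:]]
            children.append(child)
        pop = survivors + children
    # Use the best design found in the final generation.
    return max(pop, key=onemax_fitness)
```

In a real application, onemax_fitness would be replaced by evaluation of an actual design, and the loop would stop when the generation-to-generation improvement becomes very small rather than after a fixed number of generations.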

But it certainly achieves vast levels of all of these qualities, including intelligence. With the reverse engineering of the human brain we will be able to apply the parallel, self-organizing, chaotic algorithms of human intelligence to enormously powerful computational substrates. This intelligence will then be in a position to improve its own design, both hardware and software, in a rapidly accelerating iterative process. But there still appears to be a limit. The capacity of the universe to support intelligence appears to be only about 10^90 calculations per second, as I discussed in chapter 6. There are theories such as the holographic universe that suggest the possibility of higher numbers (such as 10^120), but these levels are all decidedly finite. Of course, the capabilities of such an intelligence may appear infinite for all practical purposes to our current level of intelligence.

pages: 1,606 words: 168,061

Python Cookbook by David Beazley, Brian K. Jones


don't repeat yourself, Firefox, Guido van Rossum, iterative process, p-value, web application

For example:

    >>> s = ' hello world \n'
    >>> s = s.strip()
    >>> s
    'hello world'
    >>>

If you needed to do something to the inner space, you would need to use another technique, such as using the replace() method or a regular expression substitution. For example:

    >>> s.replace(' ', '')
    'helloworld'
    >>> import re
    >>> re.sub(r'\s+', ' ', s)
    'hello world'
    >>>

It is often the case that you want to combine string stripping operations with some other kind of iterative processing, such as reading lines of data from a file. If so, this is one area where a generator expression can be useful. For example:

    with open(filename) as f:
        lines = (line.strip() for line in f)
        for line in lines:
            ...

Here, the expression lines = (line.strip() for line in f) acts as a kind of data transform. It’s efficient because it doesn’t actually read the data into any kind of temporary list first.

                nextchild = next(self._child_iter)
                return nextchild
            except StopIteration:
                self._child_iter = None
                return next(self)
        # Advance to the next child and start its iteration
        else:
            self._child_iter = next(self._children_iter).depth_first()
            return next(self)

The DepthFirstIterator class works in the same way as the generator version, but it’s a mess because the iterator has to maintain a lot of complex state about where it is in the iteration process. Frankly, nobody likes to write mind-bending code like that. Define your iterator as a generator and be done with it.

4.5. Iterating in Reverse

Problem
You want to iterate in reverse over a sequence.

Solution
Use the built-in reversed() function. For example:

    >>> a = [1, 2, 3, 4]
    >>> for x in reversed(a):
    ...     print(x)
    ...
    4
    3
    2
    1

Reversed iteration only works if the object in question has a size that can be determined or if the object implements a __reversed__() special method.
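As a minimal sketch of that second case, a custom object can support reversed() by implementing __reversed__() itself. The Countdown class below is illustrative, not taken from the excerpt above:

```python
class Countdown:
    """Counts start, start-1, ..., 1 forward; 1, 2, ..., start in reverse."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        # Forward iteration: start down to 1.
        n = self.start
        while n > 0:
            yield n
            n -= 1

    def __reversed__(self):
        # Reverse iteration: 1 up to start; reversed() calls this directly,
        # so no temporary list of the forward sequence is ever built.
        n = 1
        while n <= self.start:
            yield n
            n += 1

print(list(Countdown(3)))            # → [3, 2, 1]
print(list(reversed(Countdown(3))))  # → [1, 2, 3]
```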

Nor does it perform any kind of validation of the inputs to check if they meet the ordering requirements. Instead, it simply examines the set of items from the front of each input sequence and emits the smallest one found. A new item from the chosen sequence is then read, and the process repeats itself until all input sequences have been fully consumed.

4.16. Replacing Infinite while Loops with an Iterator

Problem
You have code that uses a while loop to iteratively process data because it involves a function or some kind of unusual test condition that doesn’t fall into the usual iteration pattern.

Solution
A somewhat common scenario in programs involving I/O is to write code like this:

    CHUNKSIZE = 8192

    def reader(s):
        while True:
            data = s.recv(CHUNKSIZE)
            if data == b'':
                break
            process_data(data)

Such code can often be replaced using iter(), as follows:

    def reader(s):
        for chunk in iter(lambda: s.recv(CHUNKSIZE), b''):
            process_data(chunk)

If you’re a bit skeptical that it might work, you can try a similar example involving files.
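The file-based variant mentioned at the end can be sketched like this. The two-argument form of iter() calls f.read(10) repeatedly until it returns the sentinel '' at end of file; an in-memory io.StringIO stands in for a real file so the example is self-contained, and the text and chunk size are illustrative.

```python
import io

# Same iter()-with-sentinel pattern, applied to a file-like object.
f = io.StringIO('hello world, hello world')
chunks = list(iter(lambda: f.read(10), ''))
print(chunks)  # → ['hello worl', 'd, hello w', 'orld']
```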

pages: 348 words: 39,850

Data Scientists at Work by Sebastian Gutierrez


Albert Einstein, algorithmic trading, Bayesian statistics, bioinformatics, bitcoin, business intelligence, chief data officer, clean water, cloud computing, commoditize, computer vision, continuous integration, correlation does not imply causation, creative destruction, crowdsourcing, data is the new oil, DevOps, domain-specific language, Donald Knuth, follow your passion, full text search, informal economy, information retrieval, Infrastructure as a Service, Intergovernmental Panel on Climate Change (IPCC), inventory management, iterative process, lifelogging, linked data, Mark Zuckerberg, microbiome, Moneyball by Michael Lewis explains big data, move fast and break things, natural language processing, Network effects, nuclear winter, optical character recognition, pattern recognition, Paul Graham, personalized medicine, Peter Thiel, pre–internet, quantitative hedge fund, quantitative trading / quantitative finance, recommendation engine, Renaissance Technologies, Richard Feynman, self-driving car, side project, Silicon Valley, Skype, software as a service, speech recognition, statistical model, Steve Jobs, stochastic process, technology bubble, text mining, the scientific method, web application

You have to think about what question you want to answer, as well as what question you can answer with the data. So many people, I think, neglect to consider how long that takes, and what industry-specific knowledge, as well as knowledge of your own data, it requires. So that was an important lesson.

Gutierrez: Once you arrived at the modeling stage, what was the process like?

Hu: The modeling was definitely an iterative process. We started off with throwing theoretical models at it, and quickly realized that there were a lot of things we had not accounted for in the initial thinking. For example, most artists do not have all the social media networks set up and connected. So you get this unusual data artifact that, for each row of data about an artist, you only have a couple of metrics for that artist, and it varies across the whole universe of artists.

I realized it when I ran the model, and all of a sudden, all of these artists who did not have certain networks connected were showing up really low—like Kanye West did not have Facebook or a similar network connected, so his predictions were really low, and that obviously did not make any sense. We had to go back and figure out how to deal with that, so it was very much an iterative process. That was where a lot of the statistical testing comes in, and you can see that the fact that someone does not have a network connected actually does provide a lot of information. Eventually, I had to code that in—the presence of a network is one of the predictor variables. So that is one interesting and kind of unusual aspect to the music data that we discovered during the modeling process.

The chapter locations were each chosen for their unique combination of local technical data science expertise and the range of opportunities to work with mission-driven organizations tackling the world’s biggest problems. This means that we’re building on our track record of project work, where we’ve been functioning like a Data for Good consultancy, and taking the first steps to build a global Data for Good movement, where people are doing the same thing around the world in their own communities, on their own, with our help and our playbook. And of course it’s an iterative process. They’ll learn from our experience and framework, but they’ll also find ways that work better and help us improve our process. This year is going to be a really big year for us in terms of understanding this process, its impact, and helping scale it out to others.

Gutierrez: Why is it important to scale up DataKind?

Porway: We really feel that if we’re going to tackle the world’s biggest problems with data science, then we want as much of a movement and community as possible.

pages: 696 words: 143,736

The Age of Spiritual Machines: When Computers Exceed Human Intelligence by Ray Kurzweil


Ada Lovelace, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, Albert Einstein, Any sufficiently advanced technology is indistinguishable from magic, Buckminster Fuller, call centre, cellular automata, combinatorial explosion, complexity theory, computer age, computer vision, cosmological constant, cosmological principle, Danny Hillis, double helix, Douglas Hofstadter, Everything should be made as simple as possible, first square of the chessboard / second half of the chessboard, fudge factor, George Gilder, Gödel, Escher, Bach, I think there is a world market for maybe five computers, information retrieval, invention of movable type, Isaac Newton, iterative process, Jacquard loom, John Markoff, John von Neumann, Lao Tzu, Law of Accelerating Returns, mandelbrot fractal, Marshall McLuhan, Menlo Park, natural language processing, Norbert Wiener, optical character recognition, ought to be enough for anybody, pattern recognition, phenotype, Ralph Waldo Emerson, Ray Kurzweil, Richard Feynman, Robert Metcalfe, Schrödinger's Cat, Search for Extraterrestrial Intelligence, self-driving car, Silicon Valley, speech recognition, Steven Pinker, Stewart Brand, stochastic process, technological singularity, Ted Kaczynski, telepresence, the medium is the message, There's no reason for any individual to have a computer in his home - Ken Olsen, traveling salesman, Turing machine, Turing test, Whole Earth Review, Y2K

This includes a majority stake in Advanced Investment Technologies, which runs a successful fund in which buy-and-sell decisions are made by a program combining these methods.30 Evolutionary and related techniques guide a $95 billion fund managed by Barclays Global Investors, as well as funds run by Fidelity and PanAgora Asset Management. The above paradigm is called an evolutionary (sometimes called genetic) algorithm.31 The system designers don’t directly program a solution; they let one emerge through an iterative process of simulated competition and improvement. Recall that evolution is smart but slow, so to enhance its intelligence we retain its discernment while greatly speeding up its ponderous pace. The computer is fast enough to simulate thousands of generations in a matter of hours or days or weeks. But we have only to go through this iterative process one time. Once we have let this simulated evolution run its course, we can apply the evolved and highly refined rules to real problems in a rapid fashion. Like neural nets, evolutionary algorithms are a way of harnessing the subtle but profound patterns that exist in chaotic data.

pages: 468 words: 124,573

How to Build a Billion Dollar App: Discover the Secrets of the Most Successful Entrepreneurs of Our Time by George Berkowski


Airbnb, Amazon Web Services, barriers to entry, Black Swan, business intelligence, call centre, crowdsourcing, game design, Google Glasses, Google Hangouts, Google X / Alphabet X, iterative process, Jeff Bezos, Jony Ive, Kickstarter, knowledge worker, Lean Startup, loose coupling, Marc Andreessen, Mark Zuckerberg, minimum viable product, move fast and break things, Network effects, Oculus Rift, Paul Graham, QR code, Ruby on Rails, self-driving car, Silicon Valley, Silicon Valley startup, Skype, Snapchat, social graph, software as a service, software is eating the world, Steve Jobs, Steven Levy, Y Combinator

We knew that, if we couldn’t generate enough interest from drivers to use the app, there wouldn’t be a large enough supply of taxis, which would mean no passengers using the service, which would then mean we didn’t have a business. The passenger app could clearly wait. Great. So Jay, Caspar and our taxi-driver cofounders – TRG (Terry, Russell and Gary) – were all full of great ideas about what features to include in the driver app to make it appeal to drivers. After a long debate we had agreed on three features – and they were baked into the very first ‘paper sketches’ of the driver app. Now came the long – and iterative – process of actually transforming those sketches into a piece of software. Every week I hosted a group of about 15 taxi drivers for breakfast at a café and restaurant called Smiths of Smithfields. They were all keen to give us a round of colourful weekly feedback: ‘These buttons are way too small, gov,’ said one driver. ‘I think your fingers are just too fat,’ replied another driver. I couldn’t disagree – he did seem to have pretty fat fingers.

We were able to build it in a matter of weeks, cutting plenty of corners along the way. We even added the ability for the customer to pay with a stored credit card (yes, we faked it again). The first time I summoned one of our test group drivers he actually drove quite a few miles to pick me up, and I have to say that I was truly wowed. I couldn’t believe it had worked. Really Viable Building your own prototype is a tricky and iterative process. What you are trying to do is create the bare bones of something – the very basic vision of your app – and see whether it can become something that people love. You need to get to wow as quickly, cheaply and efficiently as possible. There’s no point wasting time or money on any app that doesn’t get to wow. The point of doing it this way – using paper designs, testing the bare minimum – is to get real data, to get real validation.

pages: 170 words: 42,196

Don't Make Me Think!: A Common Sense Approach to Web Usability by Steve Krug


collective bargaining, iterative process, Silicon Valley, web application, Whole Earth Catalog

People like to think, for instance, that they can use testing to prove whether navigation system “a” is better than navigation system “b”, but you can’t. No one has the resources to set up the kind of controlled experiment you’d need. What testing can do is provide you with invaluable input which, taken together with your experience, professional judgment, and common sense, will make it easier for you to choose wisely—and with greater confidence—between “a” and “b.” > Testing is an iterative process. Testing isn’t something you do once. You make something, test it, fix it, and test it again. > Nothing beats a live audience reaction. One reason why the Marx Brothers’ movies are so wonderful is that before they started filming they would go on tour on the vaudeville circuit and perform scenes from the movie, doing five shows a day, improvising constantly and noting which lines got the best laughs.

pages: 199 words: 43,653

Hooked: How to Build Habit-Forming Products by Nir Eyal


Airbnb, AltaVista, Cass Sunstein, choice architecture, cognitive bias, cognitive dissonance, framing effect, game design, Google Glasses, Inbox Zero, invention of the telephone, iterative process, Jeff Bezos, Lean Startup, Mahatma Gandhi, Mark Zuckerberg, meta-analysis, Oculus Rift, Paul Buchheit, Paul Graham, Peter Thiel, QWERTY keyboard, Silicon Valley, Silicon Valley startup, Snapchat, TaskRabbit, telemarketer, the new new thing, Toyota Production System, Y Combinator

The process of developing successful habit-forming technologies requires patience and persistence. The Hook Model can be a helpful tool for filtering out bad ideas with low habit potential as well as a framework for identifying room for improvement in existing products. However, after the designer has formulated new hypotheses, there is no way to know which ideas will work without testing them with actual users. Building a habit-forming product is an iterative process and requires user behavior analysis and continuous experimentation. How can you implement the concepts in this book to measure your product’s effectiveness building user habits? Through my studies and discussions with entrepreneurs at today’s most successful habit-forming companies, I’ve distilled this process into what I call “Habit Testing.” It is a process inspired by the build-measure-learn methodology championed by the lean startup movement.

pages: 137 words: 44,363

Design Is a Job by Mike Monteiro


4chan, crowdsourcing, index card, iterative process, John Gruber, Kickstarter, late fees, Steve Jobs

Encourage them to stay in their own zone of expertise and they won’t attempt to hop on yours. Never apologize for what you’re not showing. By the time you’re presenting, you should be focused on presenting what you have, not making excuses for what you don’t. And you need to believe what you’re saying to convince the client of the same. If you think the work is on the way to meeting their goals then say that. Design is an iterative process, done with a client’s proper involvement at key points. The goal isn’t always to present finished work; it’s to present work at the right time. I’ve met a few designers over the years who feel like selling design is manipulation. Manipulation is convincing someone that the truth is different than what it seems. You’re familiar with the marketing phrase “Sell the sizzle, not the steak”?

pages: 202 words: 62,199

Essentialism: The Disciplined Pursuit of Less by Greg McKeown


Albert Einstein, Clayton Christensen, Daniel Kahneman / Amos Tversky, deliberate practice, double helix, endowment effect, Isaac Newton, iterative process, Jeff Bezos, Lao Tzu, loss aversion, Mahatma Gandhi, microcredit, minimum viable product, North Sea oil, Peter Thiel, Ralph Waldo Emerson, Richard Thaler, Rosa Parks, side project, Silicon Valley, Silicon Valley startup, sovereign wealth fund, Steve Jobs, Vilfredo Pareto

We can ask ourselves, “What is the smallest amount of progress that will be useful and valuable to the essential task we are trying to get done?” I used this practice in writing this book. For example, when I was still in the exploratory mode of the book, before I’d even begun to put pen to paper (or fingers to keyboard), I would share a short idea (my minimal viable product) on Twitter. If it seemed to resonate with people there, I would write a blog piece on Harvard Business Review. Through this iterative process, which required very little effort, I was able to find where there seemed to be a connection between what I was thinking and what seemed to have the highest relevancy in other people’s lives. It is the process Pixar uses on their movies. Instead of starting with a script, they start with storyboards—or what have been described as the comic book version of a movie. They try ideas out and see what works.

pages: 222 words: 53,317

Overcomplicated: Technology at the Limits of Comprehension by Samuel Arbesman


3D printing, algorithmic trading, Anton Chekhov, Apple II, Benoit Mandelbrot, citation needed, combinatorial explosion, Danny Hillis, David Brooks, digital map, discovery of the americas, Erik Brynjolfsson, Flash crash, friendly AI, game design, Google X / Alphabet X, Googley, HyperCard, Inbox Zero, Isaac Newton, iterative process, Kevin Kelly, Machine translation of "The spirit is willing, but the flesh is weak." to Russian and back, mandelbrot fractal, Minecraft, Netflix Prize, Nicholas Carr, Parkinson's law, Ray Kurzweil, recommendation engine, Richard Feynman, Richard Feynman: Challenger O-ring, Second Machine Age, self-driving car, software studies, statistical model, Steve Jobs, Steve Wozniak, Steven Pinker, Stewart Brand, superintelligent machines, Therac-25, Tyler Cowen: Great Stagnation, urban planning, Watson beat the top human players on Jeopardy!, Whole Earth Catalog, Y2K

Humility recognizes our own limitations but is not paralyzed by them, nor does it enshrine them. A humble approach to our technologies helps us strive to understand these human-made, messy constructions, yet still yield to our limits. And this humble approach to technology fits quite nicely with biological thinking. While at every moment an incremental approach to knowledge provides additional understanding of a system, this iterative process will always feel incomplete. And that’s okay. New York Times columnist David Brooks has noted, “Wisdom starts with epistemological modesty.” Humility, alongside an interest in the details of complex systems, can do what both fear and worship cannot: help us peer and poke around the backs of our systems, even if we never look them in the face with complete understanding. In many instances, an incomplete muddle of understanding may be the best that we can do.

pages: 189 words: 52,741

Lifestyle Entrepreneur: Live Your Dreams, Ignite Your Passions and Run Your Business From Anywhere in the World by Jesse Krieger


Airbnb, always be closing, bounce rate, call centre, carbon footprint, commoditize, Deng Xiaoping, financial independence, follow your passion, income inequality, iterative process, Ralph Waldo Emerson, search engine result page, Skype, software as a service, South China Sea, Steve Jobs

Ask for specific deliverables, but always ask for 1-2 creative ideas of their own. This encourages creativity, and you may be pleasantly surprised. Once the initial sketch designs are completed, I’ll look for various elements in the logo that I like and write feedback asking them to incorporate various aspects from the initial designs into a new round of logos based on my feedback. This is an iterative process where each round of designs helps clarify the idea I have in mind and informs the directions I give the designer for the next round of improvements. Generally going through this process 2-3 times gets me 80-90% of the way there, and then the final changes usually revolve around changing font styles, adjusting color schemes and the placement of elements within the logo. Having a basic working knowledge of Photoshop allows me to try ideas out and play with the placement of elements, although describing the changes that need to be made accomplishes the same goal.

pages: 167 words: 50,652

Alternatives to Capitalism by Robin Hahnel, Erik Olin Wright


3D printing, affirmative action, basic income, crowdsourcing, inventory management, iterative process, Kickstarter, loose coupling, means of production, Pareto efficiency, profit maximization, race to the bottom, transaction costs

If the proposals are rejected, households revise them.

5. Neighborhood consumption councils aggregate the approved individual consumption requests of all households in the neighborhood, append requests for whatever neighborhood public goods they want, and submit the total list as the neighborhood consumption council’s request in the planning process.
6. Higher-level federations of consumption councils make requests for whatever public goods are consumed by their membership.
7. On the basis of all of the consumption proposals along with the production proposals from worker councils, the IFB recalculates the indicative prices and, where necessary, sends proposals back to the relevant councils for revision.
8. This iterative process continues until no revisions are needed.

There are two issues that I would like to raise with this account about how household consumption planning would actually work in practice: (1) How useful is household consumption planning? (2) How marketish are “adjustments”?

How Useful Is Household Consumption Planning?

Robin argues that this planning process would not be especially demanding on people.
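The price-revision loop in steps 7 and 8 can be sketched as a deliberately toy iteration: raise the indicative price of any good demanded in excess of its proposed supply, lower it when supply exceeds demand, and repeat until proposals balance. The linear demand and supply schedules, the step size, and the function name below are all invented for illustration; they are not from the text.

```python
def iterate_prices(price=1.0, step=0.1, tol=1e-6, max_rounds=10_000):
    for _ in range(max_rounds):
        demand = max(0.0, 10.0 - 2.0 * price)  # hypothetical consumption proposals
        supply = 3.0 * price                   # hypothetical production proposals
        excess = demand - supply
        if abs(excess) < tol:                  # no revisions needed: plan settles
            return price
        price += step * excess                 # revise the indicative price
    return price
```

With these toy schedules the loop converges to the price at which proposals balance (here 2.0, where 10 - 2p = 3p).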

pages: 176 words: 54,784

The Subtle Art of Not Giving a F*ck: A Counterintuitive Approach to Living a Good Life by Mark Manson


false memory syndrome, fear of failure, iterative process, Parkinson's law, Rubik’s Cube

Just as Present Mark can look back on Past Mark’s every flaw and mistake, one day Future Mark will look back on Present Mark’s assumptions (including the contents of this book) and notice similar flaws. And that will be a good thing. Because that will mean I have grown. There’s a famous Michael Jordan quote about him failing over and over and over again, and that’s why he succeeded. Well, I’m always wrong about everything, over and over and over again, and that’s why my life improves. Growth is an endlessly iterative process. When we learn something new, we don’t go from “wrong” to “right.” Rather, we go from wrong to slightly less wrong. And when we learn something additional, we go from slightly less wrong to slightly less wrong than that, and then to even less wrong than that, and so on. We are always in the process of approaching truth and perfection without actually ever reaching truth or perfection. We shouldn’t seek to find the ultimate “right” answer for ourselves, but rather, we should seek to chip away at the ways that we’re wrong today so that we can be a little less wrong tomorrow.

pages: 307 words: 17,123

Behind the cloud: the untold story of how went from idea to billion-dollar company--and revolutionized an industry by Marc Benioff, Carlye Adler


Albert Einstein, Apple's 1984 Super Bowl advert, barriers to entry, Bay Area Rapid Transit, business continuity plan, call centre, carbon footprint, Clayton Christensen, cloud computing, corporate social responsibility, crowdsourcing, iterative process, Maui Hawaii, Nicholas Carr, platform as a service, Silicon Valley, software as a service, Steve Ballmer, Steve Jobs

In fact, we have determined a sequential process to growth that we initiated in the United States and adhere to in nearly every market we enter. The system includes entering a country, establishing a beachhead, gaining customers, earning local references, and then making hires. Next, we seek partners, build add-ons, and grow field sales. It is a system that operates as a machine with distinct cogs that work together. The best part is that it is an iterative process that works in almost all markets; or as Doug Farber, who’s built our markets in Australia and Asia, says, the ability to “rinse and repeat” is the key to global growth. Play #81: Uphold a One-Company Attitude Across Borders The Internet was making the world more homogeneous when it came to IT needs, and the services we were selling were not affected by global boundaries.

pages: 220 words: 73,451

Democratizing innovation by Eric von Hippel


additive manufacturing, correlation coefficient, Debian, hacker house, informal economy, information asymmetry, inventory management, iterative process, James Watt: steam engine, knowledge economy, meta analysis, meta-analysis, Network effects, placebo effect, principal–agent problem, Richard Stallman, software patent, transaction costs, Vickrey auction

Repartitioning of Development Tasks To create the setting for a toolkit, one must partition the tasks of product development to concentrate need-related information in some and solution-related information in others. This can involve fundamental changes to the underlying architecture of a product or service. As illustration, I first discuss the repartitioning of the tasks involved in custom semiconductor chip development. Then, I show how the same principles can be applied in the less technical context of custom food design. Traditionally, fully customized integrated circuits were developed in an iterative process like that illustrated in figure 11.1. The process began with a user specifying the functions that the custom chip was to perform to a manufacturer of integrated circuits. The chip would then be designed by manufacturer employees, and an (expensive) prototype would be produced and sent to the user. Testing by the user would typically reveal faults in the chip and/or in the initial specification, responsive changes would be made, and a new prototype would be built.

pages: 411 words: 80,925

What's Mine Is Yours: How Collaborative Consumption Is Changing the Way We Live by Rachel Botsman, Roo Rogers


Airbnb, barriers to entry, Bernie Madoff, bike sharing scheme, Buckminster Fuller, carbon footprint, Cass Sunstein, collaborative consumption, collaborative economy, commoditize, Community Supported Agriculture, credit crunch, crowdsourcing, dematerialisation, disintermediation, experimental economics, George Akerlof, global village, Hugh Fearnley-Whittingstall, information retrieval, iterative process, Kevin Kelly, Kickstarter, late fees, Mark Zuckerberg, market design, Menlo Park, Network effects, new economy, new new economy, out of africa, Parkinson's law, peer-to-peer, peer-to-peer lending, peer-to-peer rental, Ponzi scheme, pre–internet, recommendation engine, RFID, Richard Stallman, ride hailing / ride sharing, Robert Shiller, Ronald Coase, Search for Extraterrestrial Intelligence, SETI@home, Simon Kuznets, Skype, slashdot, smart grid, South of Market, San Francisco, Stewart Brand, The Nature of the Firm, The Spirit Level, The Wealth of Nations by Adam Smith, The Wisdom of Crowds, Thorstein Veblen, Torches of Freedom, transaction costs, traveling salesman, ultimatum game, Victor Gruen, web of trust, women in the workforce, Zipcar

Consumption is no longer an asymmetrical activity of endless acquisition but a dynamic push and pull of giving and collaborating in order to get what you want. Along the way, the acts of collaboration and giving become an end in themselves. Collaborative Consumption shows consumers that their material wants and needs do not need to be in conflict with the responsibilities of a connected citizen. The idea of happiness being epitomized by the lone shopper surrounded by stuff becomes absurd, and happiness becomes a much broader, more iterative process. Reputation Bank Account Reputation is one of the most salient areas where the push and pull between the collective good and self-interest have real impact. Reputation is a personal reward that is intimately bound up with respecting and considering the needs of others. Undeniably, almost all of us wonder and care, at least a little bit, what other people—friends, family, coworkers, and people we have just met—think about us.

pages: 239 words: 64,812

Geek Sublime: The Beauty of Code, the Code of Beauty by Vikram Chandra


Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, Apple II, barriers to entry, Berlin Wall, British Empire, business process, conceptual framework, create, read, update, delete, crowdsourcing, don't repeat yourself, Donald Knuth, East Village, European colonialism, finite state, Firefox, Flash crash, glass ceiling, Grace Hopper, haute couture, iterative process, Jaron Lanier, John von Neumann, land reform, London Whale, Norman Mailer, Paul Graham, pink-collar, revision control, Silicon Valley, Silicon Valley ideology, Skype, Steve Jobs, Steve Wozniak, supercomputer in your pocket, theory of mind, Therac-25, Turing machine, wikimedia commons, women in the workforce

The best-known assertion of this notion is the essay “Hackers and Painters” by programmer and venture capitalist Paul Graham. “Of all the different types of people I’ve known, hackers and painters are among the most alike,” writes Graham. “What hackers and painters have in common is that they’re both makers. Along with composers, architects, and writers, what hackers and painters are trying to do is make good things.”1 According to Graham, the iterative processes of programming—write, debug (discover and remove bugs, which are coding errors, mistakes), rewrite, experiment, debug, rewrite—exactly duplicate the methods of artists: “The way to create something beautiful is often to make subtle tweaks to something that already exists, or to combine existing ideas in a slightly new way … You should figure out programs as you’re writing them, just as writers and painters and architects do.”2 Attention to detail further marks good hackers with artist-like passion: All those unseen details [in a Leonardo da Vinci painting] combine to produce something that’s just stunning, like a thousand barely audible voices all singing in tune.

pages: 186 words: 50,651

Interactive Data Visualization for the Web by Scott Murray


barriers to entry, Firefox, iterative process, web application, your tax dollars at work

So we usually employ the power of computation to speed things up. The increased speed enables us to work with much larger datasets of thousands or millions of values; what would have taken years of effort by hand can be mapped in a moment. Just as important, we can rapidly experiment with alternate mappings, tweaking our rules and seeing their output re-rendered immediately. This loop of write/render/evaluate is critical to the iterative process of refining a design. Sets of mapping rules function as design systems. The human hand no longer executes the visual output; the computer does. Our human role is to conceptualize, craft, and write out the rules of the system, which is then finally executed by software. Unfortunately, software (and computation generally) is extremely bad at understanding what, exactly, people want. (To be fair, many humans are also not good at this challenging task.)
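The write/render/evaluate loop described above can be miniaturized in a few lines. This is a hypothetical text-based renderer invented for illustration (the book itself works with D3 and SVG, not Python): the mapping rule is a small function, and swapping in a tweaked rule and re-rendering is the loop in miniature.

```python
def render(data, scale):
    """Apply one mapping rule (scale) to each data value and 'render' it
    as a text bar; changing the rule and re-rendering is the
    write/render/evaluate loop in miniature."""
    return ["#" * scale(v) for v in data]

data = [3, 9, 27]
linear = lambda v: v // 3     # first mapping rule: one mark per 3 units
bars = render(data, linear)   # evaluate the output, then tweak the rule
```

The human role the passage describes is exactly the part the code leaves open: conceptualizing and writing the `scale` rule, while the machine executes it.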

pages: 202 words: 64,725

Designing Your Life: How to Build a Well-Lived, Joyful Life by Bill Burnett, Dave Evans


David Brooks, fear of failure, financial independence, game design, Haight Ashbury, invention of the printing press, iterative process, knowledge worker, market design, science of happiness, Silicon Valley, Silicon Valley startup, Skype, Steve Jobs

Knowing the current status of your health / work / play / love dashboard gives you a framework and some data about yourself, all in one place. Only you know what’s good enough or not good enough—right now. After a few more chapters and a few more tools and ideas, you may want to come back to this assessment and check the dashboard one more time, to see if anything has changed. Since life design is an iterative process of prototypes and experimentation, there are lots of on ramps and off ramps along the way. If you’re beginning to think like a designer, you will recognize that life is never done. Work is never done. Play is never done. Love and health are never done. We are only done designing our lives when we die. Until then, we’re involved in a constant iteration of the next big thing: life as we know it.

pages: 310 words: 82,592

Never Split the Difference: Negotiating as if Your Life Depended on It by Chris Voss, Tahl Raz


banking crisis, Black Swan, clean water, cognitive bias, Daniel Kahneman / Amos Tversky, Donald Trump, framing effect, friendly fire, iterative process, loss aversion, market fundamentalism, price anchoring, telemarketer, ultimatum game, uranium enrichment

All I knew about the techniques we used at the FBI was that they worked. In the twenty years I spent at the Bureau we’d designed a system that had successfully resolved almost every kidnapping we applied it to. But we didn’t have grand theories. Our techniques were the products of experiential learning; they were developed by agents in the field, negotiating through crisis and sharing stories of what succeeded and what failed. It was an iterative process, not an intellectual one, as we refined the tools we used day after day. And it was urgent. Our tools had to work, because if they didn’t someone died. But why did they work? That was the question that drew me to Harvard, to that office with Mnookin and Blum. I lacked confidence outside my narrow world. Most of all, I needed to articulate my knowledge and learn how to combine it with theirs—and they clearly had some—so I could understand, systematize, and expand it.

pages: 265 words: 70,788

The Wide Lens by Ron Adner

barriers to entry, call centre, Clayton Christensen, inventory management, iterative process, Jeff Bezos, Lean Startup, M-Pesa, minimum viable product, mobile money, new economy, RAND corporation, RFID, smart grid, smart meter, spectrum auction, Steve Ballmer, Steve Jobs, Steven Levy, supply-chain management, Tim Cook: Apple, transaction costs

Red lights, however, are a major problem. Any red light that appears on your map—whether because of a partner’s inability to deliver or unwillingness to cooperate, or due to a problem on your part—must be addressed. This can mean any number of scenarios, from managing incentives to finding a way to eliminate the troublesome link in your blueprint. Often, identifying the most promising path is an iterative process. Only once you have made the necessary adjustments can you confidently start your engines. This is not to say that seeing all green guarantees success; you will still face all the usual unknowns of the market and its vagaries. Execution is critical. But unless you have a plan to get to green across the board, expect delays and disappointment even if you deliver your own part flawlessly. The Elusive E-Reader Let’s apply the value blueprint methodology to examine why Amazon and Sony achieved radically different outcomes in developing the market for e-readers, and how these outcomes were rooted in the starkly different approach they used to construct their ecosystems.

Data Mining: Concepts and Techniques by Jiawei Han, Micheline Kamber, Jian Pei


bioinformatics, business intelligence, business process, Claude Shannon: information theory, cloud computing, computer vision, correlation coefficient, cyber-physical system, database schema, discrete time, distributed generation, finite state, information retrieval, iterative process, knowledge worker, linked data, natural language processing, Netflix Prize, Occam's razor, pattern recognition, performance metric, phenotype, random walk, recommendation engine, RFID, semantic web, sentiment analysis, speech recognition, statistical model, stochastic process, supply-chain management, text mining, thinkpad, Thomas Bayes, web application

For each of these K seeds, we find all the patterns within a ball of a size specified by τ. All the patterns in each “ball” are then fused together to generate a set of superpatterns. These superpatterns form a new pool. If the pool contains more than K patterns, the next iteration begins with this pool for the new round of random drawing. As the support set of every superpattern shrinks with each new iteration, the iteration process terminates. Note that Pattern-Fusion merges small subpatterns of a large pattern instead of incrementally-expanding patterns with single items. This gives the method an advantage to circumvent midsize patterns and progress on a path leading to a potential colossal pattern. The idea is illustrated in Figure 7.10. Each point shown in the metric space represents a core pattern. In comparison to a smaller pattern, a larger pattern has far more core patterns that are close to one another, all of which are bounded by a ball, as shown by the dotted lines.
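One Pattern-Fusion iteration can be sketched under some illustrative assumptions: patterns are modeled as frozensets of items, and `pattern_distance` is a stand-in Jaccard-style metric, not the book's own distance. This is a toy sketch of the draw-seeds, gather-ball, fuse-superpatterns step, not the authors' implementation.

```python
import random

def pattern_distance(p, q):
    """Toy distance between two itemset patterns: 1 minus Jaccard similarity.
    Stands in for the metric the book defines over core patterns."""
    union = p | q
    if not union:
        return 0.0
    return 1.0 - len(p & q) / len(union)

def pattern_fusion_step(pool, k, tau):
    """One iteration: draw K seeds at random, collect each seed's ball of
    radius tau, and fuse every ball into a single superpattern."""
    seeds = random.sample(pool, min(k, len(pool)))
    superpatterns = set()
    for seed in seeds:
        ball = [p for p in pool if pattern_distance(seed, p) <= tau]
        superpatterns.add(frozenset().union(*ball))  # merge the whole ball
    return list(superpatterns)

def pattern_fusion(pool, k, tau, max_iters=10):
    """Repeat fusion steps while the pool still holds more than K patterns."""
    for _ in range(max_iters):
        if len(pool) <= k:
            break
        pool = pattern_fusion_step(pool, k, tau)
    return pool
```

Because each step fuses whole neighborhoods rather than growing patterns one item at a time, the pool leaps toward large candidate patterns, which is the point the passage makes about circumventing midsize patterns.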

Cross-validation techniques for accuracy estimation (described in Chapter 8) can be used to help decide when an acceptable network has been found. A number of automated techniques have been proposed that search for a “good” network structure. These typically use a hill-climbing approach that starts with an initial structure that is selectively modified. 9.2.3. Backpropagation “How does backpropagation work?” Backpropagation learns by iteratively processing a data set of training tuples, comparing the network's prediction for each tuple with the actual known target value. The target value may be the known class label of the training tuple (for classification problems) or a continuous value (for numeric prediction). For each training tuple, the weights are modified so as to minimize the mean-squared error between the network's prediction and the actual target value.
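The tuple-by-tuple weight-update loop can be sketched with a single sigmoid unit. This is a deliberate simplification: a real backpropagation network also propagates each tuple's error backward through hidden layers, and the learning rate, epoch count, and initialization here are illustrative choices, not from the book.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_single_unit(data, lr=0.5, epochs=5000, seed=0):
    """Process the training tuples one at a time: predict, compare with the
    known target value, and adjust the weights to reduce the squared error,
    exactly the per-tuple loop the passage describes (for one unit only)."""
    rng = random.Random(seed)
    w = [rng.uniform(-0.5, 0.5) for _ in data[0][0]]
    b = rng.uniform(-0.5, 0.5)
    for _ in range(epochs):
        for x, target in data:
            out = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            # gradient of the squared error with respect to the net input
            delta = (out - target) * out * (1.0 - out)
            w = [wi - lr * delta * xi for wi, xi in zip(w, x)]
            b -= lr * delta
    return w, b
```

Trained on a small linearly separable target such as logical OR, the repeated per-tuple updates drive the unit's outputs to the correct side of 0.5.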

The Partitioning Around Medoids (PAM) algorithm (see Figure 10.5 later) is a popular realization of k-medoids clustering. It tackles the problem in an iterative, greedy way. Like the k-means algorithm, the initial representative objects (called seeds) are chosen arbitrarily. We consider whether replacing a representative object by a nonrepresentative object would improve the clustering quality. All the possible replacements are tried out. The iterative process of replacing representative objects by other objects continues until the quality of the resulting clustering cannot be improved by any replacement. This quality is measured by a cost function of the average dissimilarity between an object and the representative object of its cluster. Specifically, let o1, …, ok be the current set of representative objects (i.e., medoids). To determine whether a nonrepresentative object, denoted by orandom, is a good replacement for a current medoid oj (1 ≤ j ≤ k), we calculate the distance from every object p to the closest object in the set {o1, …, oj−1, orandom, oj+1, …, ok}, and use the distance to update the cost function.
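The swap loop can be sketched compactly. This is a simplified greedy variant, assuming a user-supplied distance function and taking the first k objects as the arbitrary seeds; the book's Figure 10.5 gives the full algorithm.

```python
def total_cost(objects, medoids, dist):
    """Cost of a clustering: sum over all objects of the distance to the
    closest representative object (medoid)."""
    return sum(min(dist(p, m) for m in medoids) for p in objects)

def pam(objects, k, dist):
    """Greedy k-medoids: try replacing each medoid oj with every
    nonrepresentative object, keep the best cost-improving swap, and stop
    when no replacement improves the clustering quality."""
    medoids = list(objects[:k])          # arbitrary initial seeds
    while True:
        best_cost, best_medoids = total_cost(objects, medoids, dist), medoids
        for j in range(k):
            for o_random in objects:
                if o_random in medoids:
                    continue
                candidate = medoids[:j] + [o_random] + medoids[j + 1:]
                cost = total_cost(objects, candidate, dist)
                if cost < best_cost:
                    best_cost, best_medoids = cost, candidate
        if best_medoids == medoids:      # no swap improved the cost
            return medoids
        medoids = best_medoids
```

On one-dimensional data such as `[1, 2, 3, 10, 11, 12]` with `k=2` and absolute difference as the distance, the swaps settle on the two cluster centers.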

pages: 728 words: 182,850

Cooking for Geeks by Jeff Potter


3D printing, A Pattern Language, carbon footprint, centre right, Community Supported Agriculture, Computer Numeric Control, crowdsourcing, Donald Knuth, double helix, European colonialism, fear of failure, food miles, functional fixedness, hacker house, haute cuisine, helicopter parent, Internet Archive, iterative process, Parkinson's law, placebo effect, random walk, Rubik’s Cube, slashdot, stochastic process, the scientific method

Sure, to be proficient at something you do need the technical skill to be able to see where you want to go and to understand how to get there. And happy accidents do happen. However, the methodical approach is to look at A, wonder if maybe B would be better, and rework it until you have B. ("Hmm, seems a bit dull, needs a bit more zing, how about some lemon juice?") The real skill isn’t in getting to B, though: it’s in holding the memory of A in your head and judging whether B is actually an improvement. It’s an iterative process—taste, adjust, taste, adjust—with each loop either improving the dish or educating you about what guesses didn’t work out. Even the bad guesses are useful because they’ll help you build up a body of knowledge. Taste the dish. It’s your feedback mechanism both for checking if A is "good enough" and for determining if B is better than A. Don’t be afraid to burn dinner! Talking with other geeks, I realized how lucky I was as a kid to have parents who both liked to cook and made time to sit down with us every day over a good home-cooked meal.

It’s like learning to play the guitar: at first you strive just to hit the notes and play the chords, and it takes time to gain command of the basic techniques and to move on to the level where subtle improvisation and nuanced expression can occur. If your dream is to play in a band, don’t expect to get up on stage after a day or even a month; start by picking up a basic book on learning to play the guitar and practicing somewhere you’re comfortable. A beta tester for this book commented: While there are chefs with natural-born abilities, people have to be aware that learning to cook is an iterative process. They have to learn to expect not to get it right the first time, and proceed from there, doing it again and again. What about when you fubar (foobar?) a meal and can’t figure out why? Think of it like not solving a puzzle on the first try. When starting to cook, make sure you don’t pick puzzles that are too difficult. Start with simpler puzzles (recipes) that will allow you to gain the insights needed to solve the harder ones.

pages: 678 words: 216,204

The Wealth of Networks: How Social Production Transforms Markets and Freedom by Yochai Benkler


affirmative action, barriers to entry, bioinformatics, Brownian motion, call centre, Cass Sunstein, centre right, clean water, commoditize, dark matter, desegregation, East Village, fear of failure, Firefox, game design, George Gilder, hiring and firing, Howard Rheingold, informal economy, invention of radio, Isaac Newton, iterative process, Jean Tirole, jimmy wales, John Markoff, Kenneth Arrow, market bubble, market clearing, Marshall McLuhan, New Journalism, optical character recognition, pattern recognition, peer-to-peer, pre–internet, price discrimination, profit maximization, profit motive, random walk, recommendation engine, regulatory arbitrage, rent-seeking, RFID, Richard Stallman, Ronald Coase, Search for Extraterrestrial Intelligence, SETI@home, shareholder value, Silicon Valley, Skype, slashdot, social software, software patent, spectrum auction, technoutopianism, The Fortune at the Bottom of the Pyramid, The Nature of the Firm, transaction costs, Vilfredo Pareto

The fact that power law distributions of attention to Web sites result from random distributions of interests, not from formal or practical bottlenecks that cannot be worked around, means that whenever an individual chooses to search based on some mechanism other than the simplest, thinnest belief that individuals are all equally similar and dissimilar, a different type of site will emerge as highly visible. Topical sites cluster, unsurprisingly, around topical preference groups; one site does not account for all readers irrespective of their interests. We, as individuals, also go through an iterative process of assigning a likely relevance to the judgments of others. Through this process, we limit the information overload that would threaten to swamp our capacity to know; we diversify the sources of information to which we expose ourselves; and we avoid a stifling dependence on an editor whose judgments we cannot circumvent. We might spend some of our time using the most general, "human interest has some overlap" algorithm represented by Google for some things, but use political common interest, geographic or local interest, hobbyist, subject matter, or the like, to slice the universe of potential others with whose judgments we will choose to affiliate for any given search.

Without forming or requiring a formal hierarchy, and without creating single points of control, each cluster generates a set of sites that offer points of initial filtering, in ways that are still congruent with the judgments of participants in the highly connected small cluster. The process is replicated at larger and more general clusters, to the point where positions that have been synthesized "locally" and "regionally" can reach Web-wide visibility and salience. It turns out that we are not intellectual lemmings. We do not use the freedom that the network has made possible to plunge into the abyss of incoherent babble. Instead, through iterative processes of cooperative filtering and "transmission" through the high visibility nodes, the low-end thin tail turns out to be a peer-produced filter and transmission medium for a vastly larger number of speakers than was imaginable in the mass-media model. The effects of the topology of the network are reinforced by the cultural forms of linking, e-mail lists, and the writable Web. The network topology literature treats every page or site as a node.

pages: 398 words: 86,855

Bad Data Handbook by Q. Ethan McCallum


Amazon Mechanical Turk, asset allocation, barriers to entry, Benoit Mandelbrot, business intelligence, cellular automata, chief data officer, Chuck Templeton: OpenTable, cloud computing, cognitive dissonance, combinatorial explosion, commoditize, conceptual framework, database schema, Firefox, Flash crash, Gini coefficient, illegal immigration, iterative process, labor-force participation, loose coupling, natural language processing, Netflix Prize, quantitative trading / quantitative finance, recommendation engine, selection bias, sentiment analysis, statistical model, supply-chain management, survivorship bias, text mining, too big to fail, web application

This was one of the areas where our data contest approach led to quite a different experience than you’d find with a more traditionally outsourced project. All of the previous preparation steps had created an extremely well-defined problem for the contestants to tackle, and on our end we couldn’t update any data or change the rules part-way through. Working with a consultant or part-time employee is a much more iterative process, because you can revise your requirements and inputs as you go. Because those changes are often costly in terms of time and resources, up-front preparation is still extremely effective. Several teams apparently tried to use external data sources to help improve their results, without much success. Their hope was that by adding in extra information about things like the geographic location where a photo was taken, they could produce better guesses about its quality.

pages: 502 words: 107,510

Natural Language Annotation for Machine Learning by James Pustejovsky, Amber Stubbs


Amazon Mechanical Turk, bioinformatics, cloud computing, computer vision, crowdsourcing, easy for humans, difficult for computers, finite state, game design, information retrieval, iterative process, natural language processing, pattern recognition, performance metric, sentiment analysis, social web, speech recognition, statistical model, text mining

In particular, we will look at: What makes a good annotation goal Where to find related research How your dataset reflects your annotation goals Preparing the data for annotators to use How much data you will need for your task What you should be able to take away from this chapter is a clear answer to the questions “What am I trying to do?”, “How am I trying to do it?”, and “Which resources best fit my needs?”. As you progress through the MATTER cycle, the answers to these questions will probably change—corpus creation is an iterative process—but having a stated goal will help keep you from getting off track. Defining Your Goal In terms of the MATTER cycle, at this point we’re right at the start of “M”—being able to clearly explain what you hope to accomplish with your corpus is the first step in creating your model. While you probably already have a good idea about what you want to do, in this section we’ll give you some pointers on how to create a goal definition that is useful and will help keep you focused in the later stages of the MATTER cycle.

pages: 411 words: 108,119

The Irrational Economist: Making Decisions in a Dangerous World by Erwann Michel-Kerjan, Paul Slovic


Andrei Shleifer, availability heuristic, bank run, Black Swan, Cass Sunstein, clean water, cognitive dissonance, collateralized debt obligation, complexity theory, conceptual framework, corporate social responsibility, Credit Default Swap, credit default swaps / collateralized debt obligations, cross-subsidies, Daniel Kahneman / Amos Tversky, endowment effect, experimental economics, financial innovation, Fractional reserve banking, George Akerlof, hindsight bias, incomplete markets, information asymmetry, Intergovernmental Panel on Climate Change (IPCC), invisible hand, Isaac Newton, iterative process, Kenneth Arrow, Loma Prieta earthquake, London Interbank Offered Rate, market bubble, market clearing, money market fund, moral hazard, mortgage debt, Pareto efficiency, Paul Samuelson, placebo effect, price discrimination, price stability, RAND corporation, Richard Thaler, Robert Shiller, Ronald Reagan, source of truth, statistical model, stochastic process, The Wealth of Nations by Adam Smith, Thomas Bayes, Thomas Kuhn: the structure of scientific revolutions, too big to fail, transaction costs, ultimatum game, University of East Anglia, urban planning, Vilfredo Pareto

All rights reserved. Figure 6.2 provides a snapshot of the quality of the decision at a given point in time, in order to judge whether more work is needed. The practical challenge is to select decision-making methods that move the cursor in each link to the right efficiently, while periodically taking stock of the overall profile, without overshooting the optimal target. This is essentially a heuristic and iterative process, guided by intuition and decision coaching, in order to find the optimal position for each link. It is important to recognize that in the rational components of the model lie judgments and values that are behaviorally rooted and, thus, that deep biases may never fully surface or be completely eliminated. This is a key challenge for the aggregation assumption alluded to earlier. The DEF approach, illustrated in Figure 6.3, is a well-founded, proven, and practical way to integrate traditional decision analysis with behavioral insights about the psychology and sociology of choice.

pages: 364 words: 102,528

An Economist Gets Lunch: New Rules for Everyday Foodies by Tyler Cowen


agricultural Revolution, big-box store, business climate, carbon footprint, cognitive bias, creative destruction, cross-subsidies, East Village, food miles, guest worker program, haute cuisine, illegal immigration, informal economy, iterative process, mass immigration, oil shale / tar sands, out of africa, pattern recognition, Peter Singer: altruism, price discrimination, refrigerator car, The Wealth of Nations by Adam Smith, Tyler Cowen: Great Stagnation, Upton Sinclair, winner-take-all economy, women in the workforce

They are a constant challenge as to whether I have mastered various codes of Indian cooking and their lack of detail gives me room to improvise, learn, and make mistakes. Every now and then I go back to the more thorough cookbooks (another is 1,000 Indian Recipes by Neelam Batra) to learn new recipes and techniques, and then I do a batch more Indian cooking from the shorter guides. It’s an iterative process where I step back and forth between a food world where I am told what to do and a food world where I am immersed in the implicit codes of meaning and contributing to innovation within established structures. Some of your cookbooks, or more broadly your recipe sources, should have very short recipes for use in this manner. Maureen Evans posts recipes at @cookbook on Twitter and has a book called Eat Tweet.

pages: 398 words: 100,679

The Knowledge: How to Rebuild Our World From Scratch by Lewis Dartnell


agricultural Revolution, Albert Einstein, Any sufficiently advanced technology is indistinguishable from magic, clean water, Dava Sobel, decarbonisation, discovery of penicillin, Dmitri Mendeleev, global village, Haber-Bosch Process, invention of movable type, invention of radio, invention of writing, iterative process, James Watt: steam engine, John Harrison: Longitude, lone genius, mass immigration, nuclear winter, off grid, Richard Feynman, technology bubble, the scientific method, Thomas Kuhn: the structure of scientific revolutions, trade route

Once you get fermentation, throw half of the culture away and replace with fresh flour and water in the same proportions, repeating this refill twice a day. This gives the culture more nutrients to reproduce and continually doubles the size of the microbial territory to expand into. After about a week, once you have a healthy-smelling culture reliably growing and frothing after every replenishment, like a microbial pet thriving on the feed left in its bowl, you are ready to extract some of the dough and bake bread. By running through this iterative process you have essentially created a rudimentary microbiological selection protocol—narrowing down to wild strains that can grow on the starch nutrients in the flour with the fastest cell division rates at a temperature of around 20°–30°C. Your resultant sourdough is not a pure culture of a single isolate, but actually a balanced community of lactobacillus bacteria, able to break down the complex storage molecules of the grain, and yeast living on the byproducts of the lactobacilli and releasing carbon dioxide gas to leaven the bread.
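The selection logic of the refill-and-halve protocol can be shown with a toy numerical model. The division rates below are made-up illustrative values (divisions per hour), not measurements: each strain grows exponentially between refills, then half the culture is discarded and replaced with fresh flour and water.

```python
def simulate_selection(rates, days=7, refills_per_day=2):
    """Toy model of the sourdough protocol: exponential growth between
    refills at each strain's own division rate, then halve the culture.
    Returns each strain's final share of the population."""
    hours_between_refills = 24 // refills_per_day
    populations = [1.0] * len(rates)
    for _ in range(days * refills_per_day):
        populations = [p * 2 ** (r * hours_between_refills)  # growth phase
                       for p, r in zip(populations, rates)]
        populations = [p / 2 for p in populations]           # discard half
    total = sum(populations)
    return [p / total for p in populations]
```

Note that halving shrinks every strain equally, so the dilution itself never changes the mix; it only keeps the culture fed while the differential growth rates do the selecting, which is why a week of refills is enough for the fastest dividers to dominate.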

Data Mining the Web: Uncovering Patterns in Web Content, Structure, and Usage by Zdravko Markov, Daniel T. Larose


Firefox, information retrieval, Internet Archive, iterative process, natural language processing, pattern recognition, random walk, recommendation engine, semantic web, speech recognition, statistical model, William of Occam

The standard formulas for mean and standard deviation are adjusted to use the cluster membership probabilities wi as weights. Thus, the following weighted mean and standard deviation are computed:

μC = (Σi=1..n wi xi) / (Σi=1..n wi)        σC² = (Σi=1..n wi (xi − μC)²) / (Σi=1..n wi)

Note that the sums go over all values, not only those belonging to the corresponding cluster. Thus, given a sample size n, we have an n-component weight vector for each cluster. The iterative process is similar to that of k-means; the data points are redistributed among clusters repeatedly until the process reaches a fixpoint. The k-means algorithm stops when the cluster membership does not change from one iteration to the next. k-Means uses “hard” cluster assignment, however, whereas the EM uses “soft” assignment—probability of membership. (In fact, there exist versions of k-means with soft assignment, which are special cases of EM.)
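The weighted statistics are straightforward to compute directly. A minimal sketch with one-dimensional values (the function name is mine, not the book's); wi is the probability that point xi belongs to the cluster.

```python
def weighted_cluster_stats(xs, ws):
    """Weighted mean and standard deviation of one cluster, where ws[i] is
    the membership probability of point xs[i]. The sums run over ALL points,
    not only those hard-assigned to the cluster."""
    total = sum(ws)
    mean = sum(w * x for w, x in zip(ws, xs)) / total
    var = sum(w * (x - mean) ** 2 for w, x in zip(ws, xs)) / total
    return mean, var ** 0.5
```

With equal weights this reduces to the ordinary mean and standard deviation, which is the "hard"-assignment special case.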

pages: 336 words: 93,672

The Future of the Brain: Essays by the World's Leading Neuroscientists by Gary Marcus, Jeremy Freeman


23andMe, Albert Einstein, bioinformatics, bitcoin, brain emulation, cloud computing, complexity theory, computer age, computer vision, conceptual framework, correlation does not imply causation, crowdsourcing, dark matter, data acquisition, Drosophila, epigenetics, Google Glasses, iterative process, linked data, mouse model, optical character recognition, pattern recognition, personalized medicine, phenotype, race to the bottom, Richard Feynman, Ronald Reagan, semantic web, speech recognition, stem cell, Steven Pinker, supply-chain management, Turing machine, web application

Even if it is clear which kinds of measurements we want to make (for example, whole-brain calcium imaging of the larval zebrafish, two-photon imaging of multiple areas of mouse cortex), it is not clear which behaviors the organism should be performing while we collect those data, or which environment it should be experiencing. It is hard to imagine a single dataset, however massive, from which the truths we seek will emerge with only the right analysis, especially when we consider the nearly infinite set of alternative experiments we might have performed. Instead, we need an iterative process by which we move back and forth between using analytic tools to identify patterns in data and using the recovered patterns to inform and guide the next set of experiments. After many iterations, the patterns we identify may coalesce into rules and themes, perhaps even themes that extend across different systems and modalities. And with luck, we might ultimately arrive at theories of neural computation, which will shape not only the design of our experiments but also the very foundations of neuroscience.

pages: 398 words: 108,889

The Paypal Wars: Battles With Ebay, the Media, the Mafia, and the Rest of Planet Earth by Eric M. Jackson


bank run, business process, call centre, creative destruction, disintermediation, Elon Musk, index fund, Internet Archive, iterative process, Joseph Schumpeter, market design, Menlo Park, Metcalfe’s law, money market fund, moral hazard, Network effects, new economy, offshore financial centre, Peter Thiel, Robert Metcalfe, Sand Hill Road, shareholder value, Silicon Valley, Silicon Valley startup, telemarketer, The Chicago School, the new new thing, Turing test

But the liberals are even worse—they always want to rely on regulation to make things better. Neither side is asking the right questions regarding the pressing needs of the day. “In our own way, at PayPal this is what we’ve been doing all along. We’ve been creating a system that enables global commerce for everyone. And we’ve been fighting the people who would do us and our users harm. It’s been a gradual, iterative process, and we’ve gotten plenty of stuff wrong along the way, but we’ve kept moving in the right direction to address these major issues while the rest of the world has been ignoring them. “And so I’d like to send a message back to planet Earth from Palo Alto. Life is good here in Palo Alto. We’ve been able to improve on many of the ways you do things. Come to Palo Alto for a visit sometime and learn something.

pages: 292 words: 94,324

How Doctors Think by Jerome Groopman


affirmative action, Atul Gawande, Daniel Kahneman / Amos Tversky, deliberate practice, fear of failure, framing effect, index card, iterative process, medical malpractice, medical residency, Menlo Park, pattern recognition, placebo effect, stem cell, theory of mind

In addition to his work with patients, Nimer oversees a large research program studying malignant blood diseases like lymphoma and leukemia. "I believe that my thinking in the clinic is helped by having a laboratory. If you do an experiment two times and you don't get results, then it doesn't make sense to do it the same way a third time. You have to ask yourself: What am I missing? How should I do it differently the next time? It is the same iterative process in the clinic. If you are taking care of someone and he is not getting better, then you have to think of a new way to treat him, not just keep giving him the same therapy. You also have to wonder whether you are missing something." This seemingly obvious set of statements is actually a profound realization, because it is much easier both psychologically and logistically for a doctor to keep treating a serious disease with a familiar therapy even when the disease is not responding.

pages: 323 words: 95,939

Present Shock: When Everything Happens Now by Douglas Rushkoff


algorithmic trading, Andrew Keen, bank run, Benoit Mandelbrot, big-box store, Black Swan, British Empire, Buckminster Fuller, cashless society, citizen journalism, clockwork universe, cognitive dissonance, Credit Default Swap, crowdsourcing, Danny Hillis, disintermediation, Donald Trump, double helix, East Village, Elliott wave, European colonialism, Extropian, facts on the ground, Flash crash, game design, global supply chain, global village, Howard Rheingold, hypertext link, Inbox Zero, invention of agriculture, invention of hypertext, invisible hand, iterative process, John Nash: game theory, Kevin Kelly, laissez-faire capitalism, Law of Accelerating Returns, loss aversion, mandelbrot fractal, Marshall McLuhan, Merlin Mann, Milgram experiment, mutually assured destruction, negative equity, Network effects, New Urbanism, Nicholas Carr, Norbert Wiener, Occupy movement, passive investing, pattern recognition, peak oil, price mechanism, prisoner's dilemma, Ralph Nelson Elliott, RAND corporation, Ray Kurzweil, recommendation engine, selective serotonin reuptake inhibitor (SSRI), Silicon Valley, Skype, social graph, South Sea Bubble, Steve Jobs, Steve Wozniak, Steven Pinker, Stewart Brand, supply-chain management, the medium is the message, The Wisdom of Crowds, theory of mind, Turing test, upwardly mobile, Whole Earth Catalog, WikiLeaks, Y2K, zero-sum game

She achieved greater efficiency while also granting herself greater flow. The digital can be stacked; the human gets to live in real time. This experience is what makes us creative, intelligent, and capable of learning. As science and innovation writer Steven Johnson has shown, great ideas don’t really come out of sudden eureka moments, but after long, steady slogs through problems.31 They are slow, iterative processes. Great ideas, as Johnson explained it to a TED audience, “fade into view over long periods of time.” For instance, Charles Darwin described his discovery of evolution as a eureka moment that occurred while he was reading Malthus on a particular night in October of 1838. But Darwin’s notebooks reveal that he had the entire theory of evolution long before this moment; he simply hadn’t fully articulated it yet.

pages: 484 words: 104,873

Rise of the Robots: Technology and the Threat of a Jobless Future by Martin Ford


3D printing, additive manufacturing, Affordable Care Act / Obamacare, AI winter, algorithmic trading, Amazon Mechanical Turk, artificial general intelligence, assortative mating, autonomous vehicles, banking crisis, basic income, Baxter: Rethink Robotics, Bernie Madoff, Bill Joy: nanobots, call centre, Capital in the Twenty-First Century by Thomas Piketty, Chris Urmson, Clayton Christensen, clean water, cloud computing, collateralized debt obligation, commoditize, computer age, creative destruction, debt deflation, deskilling, diversified portfolio, Erik Brynjolfsson, factory automation, financial innovation, Flash crash, Fractional reserve banking, Freestyle chess, full employment, Goldman Sachs: Vampire Squid, Gunnar Myrdal, High speed trading, income inequality, indoor plumbing, industrial robot, informal economy, iterative process, Jaron Lanier, job automation, John Markoff, John Maynard Keynes: technological unemployment, John von Neumann, Kenneth Arrow, Khan Academy, knowledge worker, labor-force participation, labour mobility, liquidity trap, low skilled workers, low-wage service sector, Lyft, manufacturing employment, Marc Andreessen, McJob, moral hazard, Narrative Science, Network effects, new economy, Nicholas Carr, Norbert Wiener, obamacare, optical character recognition, passive income, Paul Samuelson, performance metric, Peter Thiel, Plutocrats, plutocrats, post scarcity, precision agriculture, price mechanism, Ray Kurzweil, rent control, rent-seeking, reshoring, RFID, Richard Feynman, Rodney Brooks, secular stagnation, self-driving car, Silicon Valley, Silicon Valley startup, single-payer health, software is eating the world, sovereign wealth fund, speech recognition, Spread Networks laid a new fibre optics cable between New York and Chicago, stealth mode startup, stem cell, Stephen Hawking, Steve Jobs, Steven Levy, Steven Pinker, strong AI, Stuxnet, technological singularity, telepresence, telepresence robot, The Bell Curve by
Richard Herrnstein and Charles Murray, The Coming Technological Singularity, The Future of Employment, Thomas L Friedman, too big to fail, Tyler Cowen: Great Stagnation, union organizing, Vernor Vinge, very high income, Watson beat the top human players on Jeopardy!, women in the workforce

Inevitably, we would soon share the planet with something entirely unprecedented: a genuinely alien—and superior—intellect. And that might well be only the beginning. It’s generally accepted by AI researchers that such a system would eventually be driven to direct its intelligence inward. It would focus its efforts on improving its own design, rewriting its software, or perhaps using evolutionary programming techniques to create, test, and optimize enhancements to its design. This would lead to an iterative process of “recursive improvement.” With each revision, the system would become smarter and more capable. As the cycle accelerated, the ultimate result would be an “intelligence explosion”—quite possibly culminating in a machine thousands or even millions of times smarter than any human being. As Hawking and his collaborators put it, it “would be the biggest event in human history.” If such an intelligence explosion were to occur, it would certainly have dramatic implications for humanity.

pages: 315 words: 93,522

How Music Got Free: The End of an Industry, the Turn of the Century, and the Patient Zero of Piracy by Stephen Witt


4chan, barriers to entry, Berlin Wall, big-box store, cloud computing, collaborative economy, crowdsourcing, game design, Internet Archive, invention of movable type, inventory management, iterative process, Jason Scott, job automation, late fees, mental accounting, moral panic, packet switching, pattern recognition, peer-to-peer, pirate software, Ronald Reagan, security theater, sharing economy, side project, Silicon Valley, software patent, Steve Jobs, zero day

A NOTE ON SOURCES A private detective once explained to me the essence of the investigative method: “You start with a document. Then you take that document to a person, and ask them about it. Then that person tells you about another document. You repeat this process until you run out of people, or documents.” Starting with the Affinity e-zine interview quoted in this book, and following this iterative process for the next four years, I ended up with dozens of people and tens of thousands of documents. A comprehensive catalog would take pages—below is a selection. The key interview subjects for this book were Karlheinz Brandenburg, Robert Buchanan, Brad Buckles, Leonardo Chiariglione, Ernst Eberlein, Keith P. Ellison, Frank Foti, Harvey Geller, Bennie Lydell Glover, Bennie Glover, Jr., Loretta Glover, Iain Grant, Tom Grasso, Bernhard Grill, Bruce Hack, Jürgen Herre, Bruce Huckfeldt, James Johnston, Larry Kenswil, Carlos Linares, Henri Linde, Doug Morris, George Murphy, Tyler Newby, Harald Popp, Eileen Richardson, Domingo Rivera, Hilary Rosen, Johnny Ryan, Patrick Saunders, Dieter Seitzer, Jacob Stahler, Alex Stein, Simon Tai, Steve Van Buren, Terry Yates, and Elizabeth Young.

Writing Effective Use Cases by Alistair Cockburn


business process, create, read, update, delete, finite state, index card, information retrieval, iterative process, recommendation engine, Silicon Valley, web application

The most common use is when there are many asynchronous or interrupting services the user might use, which should not disturb the base use case. Often, they will be developed by different teams. These situations show up with shrink-wrapped software such as word processors, as illustrated above. The second situation is when you are writing additions to a locked requirements document. Susan Lilly writes, "You’re working on a project with an iterative process and multiple drops. You have baselined requirements for a drop. In a subsequent drop, you extend a baselined use case with new or additional functionality. You do not touch the baselined use case." If the base use case is not locked, then the extension is fragile: changing the base use case can mess up the condition mentioned in the extending use case. Be wary about using extension use cases in such situations.

pages: 325 words: 110,330

Creativity, Inc.: Overcoming the Unseen Forces That Stand in the Way of True Inspiration by Ed Catmull, Amy Wallace


Albert Einstein, business climate, buy low sell high, complexity theory, fear of failure, Golden Gate Park, iterative process, Menlo Park, rolodex, Rubik’s Cube, Sand Hill Road, Silicon Valley, Silicon Valley startup, Steve Jobs, Wall-E

But think about how easy it would be for a movie about talking toys to feel derivative, sappy, or overtly merchandise-driven. Think about how off-putting a movie about rats preparing food could be, or how risky it must’ve seemed to start WALL-E with 39 dialogue-free minutes. We dare to attempt these stories, but we don’t get them right on the first pass. And this is as it should be. Creativity has to start somewhere, and we are true believers in the power of bracing, candid feedback and the iterative process—reworking, reworking, and reworking again, until a flawed story finds its throughline or a hollow character finds its soul. As I’ve discussed, first we draw storyboards of the script and then edit them together with temporary voices and music to make a crude mock-up of the film, known as reels. Then the Braintrust watches this version of the movie and discusses what’s not ringing true, what could be better, what’s not working at all.

pages: 378 words: 110,408

Peak: Secrets From the New Science of Expertise by Anders Ericsson, Robert Pool


Albert Einstein, deliberate practice, iterative process, meta analysis, meta-analysis, pattern recognition, randomized controlled trial, Richard Feynman, Rubik’s Cube, sensible shoes

Once you get to the edge of your field, you may not know exactly where you’re headed, but you know the general direction, and you have spent a good deal of your life building this ladder, so you have a good sense of what it takes to add on one more step. Researchers who study how the creative geniuses in any field—science, art, music, sports, and so on—come up with their innovations have found that it is always a long, slow, iterative process. Sometimes these pathbreakers know what they want to do but don’t know how to do it—like a painter trying to create a particular effect in the eye of the viewer—so they explore various approaches to find one that works. And sometimes they don’t know exactly where they’re going, but they recognize a problem that needs a solution or a situation that needs improving—like mathematicians trying to prove an intractable theorem—and again they try different things, guided by what has worked in the past.

pages: 366 words: 94,209

Throwing Rocks at the Google Bus: How Growth Became the Enemy of Prosperity by Douglas Rushkoff


3D printing, activist fund / activist shareholder / activist investor, Airbnb, algorithmic trading, Amazon Mechanical Turk, Andrew Keen, bank run, banking crisis, barriers to entry, bitcoin, blockchain, Burning Man, business process, buy low sell high, California gold rush, Capital in the Twenty-First Century by Thomas Piketty, carbon footprint, centralized clearinghouse, citizen journalism, clean water, cloud computing, collaborative economy, collective bargaining, colonial exploitation, Community Supported Agriculture, corporate personhood, corporate raider, creative destruction, crowdsourcing, cryptocurrency, disintermediation, diversified portfolio, Elon Musk, Erik Brynjolfsson, ethereum blockchain, fiat currency, Firefox, Flash crash, full employment, future of work, gig economy, Gini coefficient, global supply chain, global village, Google bus, Howard Rheingold, IBM and the Holocaust, impulse control, income inequality, index fund, iterative process, Jaron Lanier, Jeff Bezos, jimmy wales, job automation, Joseph Schumpeter, Kickstarter, loss aversion, Lyft, Marc Andreessen, Mark Zuckerberg, market bubble, market fundamentalism, Marshall McLuhan, means of production, medical bankruptcy, minimum viable product, Naomi Klein, Network effects, new economy, Norbert Wiener, Oculus Rift, passive investing, payday loans, peer-to-peer lending, Peter Thiel, post-industrial society, profit motive, quantitative easing, race to the bottom, recommendation engine, reserve currency, RFID, Richard Stallman, ride hailing / ride sharing, Ronald Reagan, Satoshi Nakamoto, Second Machine Age, shareholder value, sharing economy, Silicon Valley, Snapchat, social graph, software patent, Steve Jobs, TaskRabbit, The Future of Employment, trade route, transportation-network company, Turing test, Uber and Lyft, Uber for X, unpaid internship, Y Combinator, young professional, zero-sum game, Zipcar

Indeed, the more that algorithms dominate the marketplace, the more the market begins to take on the properties of a dynamic system. It’s no longer a marketplace driven directly by supply and demand, business conditions, or commodity prices. Rather, prices, flows, and volatility are determined by the trading going on among all the algorithms. Each algorithm is a feedback loop, taking an action, observing the resulting conditions, and taking another action after that. Again, and again, and again. It’s an iterative process, in which the algorithms adjust themselves and their activity on every loop, responding less to the news on the ground than to one another. Such systems go out of control because the feedback of their own activity has become louder than the original signal. It’s like when a performer puts a microphone too close to an amplified speaker. It picks up its own feedback, sends it to the speaker, picks it up again, and sends it through again, ad infinitum.
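The feedback-on-feedback dynamic can be sketched as a toy simulation (entirely hypothetical, not drawn from the book): two "algorithms" each react only to the other's last output. Once the combined reaction gain passes 1, the echo of their own activity swamps the original signal, like the microphone held in front of its own speaker.

```python
def feedback_sim(gain, steps=20, signal=1.0):
    """Two 'algorithms' that each react to the other's last output.
    Returns the sequence of one algorithm's outputs over time."""
    a, b = signal, signal
    history = [a]
    for _ in range(steps):
        a, b = gain * b, gain * a   # each responds only to the other, not the signal
        history.append(a)
    return history

stable = feedback_sim(gain=0.9)    # reactions damp out: echo fades
runaway = feedback_sim(gain=1.1)   # reactions amplify: echo dominates
print(stable[-1], runaway[-1])
```

With gain below 1 the loop settles; above 1 it diverges without any change in external conditions, which is the sense in which such systems "go out of control."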

pages: 345 words: 86,394

Frequently Asked Questions in Quantitative Finance by Paul Wilmott


Albert Einstein, asset allocation, beat the dealer, Black-Scholes formula, Brownian motion, butterfly effect, capital asset pricing model, collateralized debt obligation, Credit Default Swap, credit default swaps / collateralized debt obligations, delta neutral, discrete time, diversified portfolio, Edward Thorp, Emanuel Derman, Eugene Fama: efficient market hypothesis, fixed income, fudge factor, implied volatility, incomplete markets, interest rate derivative, interest rate swap, iterative process, London Interbank Offered Rate, Long Term Capital Management, Louis Bachelier, mandelbrot fractal, margin call, market bubble, martingale, Myron Scholes, Norbert Wiener, Paul Samuelson, quantitative trading / quantitative finance, random walk, regulatory arbitrage, risk/return, Sharpe ratio, statistical arbitrage, statistical model, stochastic process, stochastic volatility, transaction costs, urban planning, value at risk, volatility arbitrage, volatility smile, Wiener process, yield curve, zero-coupon bond

Since it is not traded on an exchange it must be priced using some mathematical model. See pages 305-325.

Expected loss
The average loss once a specified threshold has been breached. Used as a measure of Value at Risk. See page 48.

Finite difference
A numerical method for solving differential equations wherein derivatives are approximated by differences. The differential equation thus becomes a difference equation which can be solved numerically, usually by an iterative process.

Gamma
The sensitivity of an option’s delta to the underlying. Therefore it is the second derivative of an option price with respect to the underlying. See page 111.

GARCH
Generalized Auto Regressive Conditional Heteroscedasticity, an econometric model for volatility in which the current variance depends on the previous random increments.

Hedge
To reduce risk by exploiting correlations between financial instruments.
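The finite-difference entry can be made concrete with a short sketch (my own illustration, not the book's): the boundary-value problem u''(x) = 0 with u(0) = 0 and u(1) = 1 becomes the difference equation u[i] = (u[i-1] + u[i+1]) / 2 on a grid, which is then solved by an iterative (Jacobi) process.

```python
def solve_laplace_1d(n=9, iterations=2000):
    """Finite-difference solution of u''(x) = 0 on [0, 1] with
    u(0) = 0 and u(1) = 1.  Approximating the second derivative by
    differences gives u[i] = (u[i-1] + u[i+1]) / 2 at each interior
    grid point; the difference equation is solved by Jacobi iteration.
    The exact solution is u(x) = x."""
    u = [0.0] * (n + 1)
    u[n] = 1.0                        # boundary conditions
    for _ in range(iterations):
        new = u[:]
        for i in range(1, n):         # update interior points only
            new[i] = 0.5 * (u[i - 1] + u[i + 1])
        u = new
    return u

u = solve_laplace_1d()
print(u[3], u[6])  # approaches 3/9 and 6/9
```

Direct (non-iterative) solvers exist for this small tridiagonal system; the iterative route is what the glossary entry alludes to, and is the practical choice for larger grids.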

pages: 344 words: 96,020

Hacking Growth: How Today's Fastest-Growing Companies Drive Breakout Success by Sean Ellis, Morgan Brown

Airbnb, Amazon Web Services, barriers to entry, bounce rate, business intelligence, business process, correlation does not imply causation, crowdsourcing, DevOps, Elon Musk, game design, Google Glasses, Internet of things, inventory management, iterative process, Jeff Bezos, Khan Academy, Lean Startup, Lyft, Mark Zuckerberg, market design, minimum viable product, Network effects, Paul Graham, Peter Thiel, Ponzi scheme, recommendation engine, ride hailing / ride sharing, side project, Silicon Valley, Silicon Valley startup, Skype, Snapchat, software as a service, Steve Jobs, subscription business, Uber and Lyft, Uber for X, working poor, Y Combinator, young professional

It wasn’t the immaculate conception of a world-changing product nor any single insight, lucky break, or stroke of genius that rocketed these companies to success. In reality, their success was driven by the methodical, rapid-fire generation and testing of new ideas for product development and marketing, and the use of data on user behavior to find the winning ideas that drove growth. If this iterative process sounds familiar, it’s likely because you’ve encountered a similar approach in agile software development or the Lean Startup methodology. What those two approaches have done for new business models and product development, respectively, growth hacking does for customer acquisition, retention, and revenue growth. Building on these methods was natural for Sean and other start-up teams, because the companies that Sean advised and others that developed the method were stacked with great engineering talent familiar with the methods, and because the founders were inclined to apply a similar approach to customer growth as the engineers applied to their software and product development.

pages: 290 words: 87,549

The Airbnb Story: How Three Ordinary Guys Disrupted an Industry, Made Billions...and Created Plenty of Controversy by Leigh Gallagher

Airbnb, Amazon Web Services, barriers to entry, Bernie Sanders, cloud computing, crowdsourcing, don't be evil, Donald Trump, East Village, Elon Musk, housing crisis, iterative process, Jeff Bezos, Jony Ive, Lyft, Marc Andreessen, Mark Zuckerberg, medical residency, Menlo Park, Network effects, Paul Buchheit, Paul Graham, performance metric, Peter Thiel, RFID, Sand Hill Road, Saturday Night Live, sharing economy, side project, Silicon Valley, Silicon Valley startup, South of Market, San Francisco, Startup school, Steve Jobs, TaskRabbit, the payments system, Tony Hsieh, Y Combinator, yield management

“My hope is that at the end of this launch, everything you knew about travel will look different,” he says. “You might still call it a trip, you might still call it travel, but it’s going to make all the travel you knew before that look very, very different.” In many ways, this plan is a logical extension of the company’s core business. It doubles down on its focus on “living like a local,” the “anti-Frommer’s” approach to tourism that Airbnb has homed in on in the past few years. During the iteration process, the company plucked a tourist named Ricardo out of Fisherman’s Wharf and followed him with a photographer for a few days, documenting him at Alcatraz Island, trying to gaze through a fogged-out view of the Golden Gate Bridge, and eating at Bubba Gump Shrimp Company. Airbnb tallied up his receipts and found he spent most of his money on chain franchises based in other cities. The Magical Trips team reengineered what might be a perfect trip for that same tourist, plunging him into a 1920s-themed dinner party, sending him on a walking tour of the city’s Bernal Heights neighborhood led by a local, and presenting him with instructions to show up for a spontaneous midnight “mystery” bike ride, where sixty riders outfitted their bikes in neon lights and rode all over the city until 2 or 3 a.m.

pages: 597 words: 119,204

Website Optimization by Andrew B. King


AltaVista, bounce rate, don't be evil, Firefox, In Cold Blood by Truman Capote, information retrieval, iterative process, medical malpractice, Network effects, performance metric, search engine result page, second-price auction, second-price sealed-bid, semantic web, Silicon Valley, slashdot, social graph, Steve Jobs, web application

Acquire inbound links

Search engines use external factors such as inbound links, anchor text, surrounding text, and domain history, among others, to determine the relative importance of your site. Most of your rankings in search engines are determined by the number and popularity of your inbound links.[19] These concepts will come up again and again as you optimize for search-friendliness, and we'll discuss them in more detail shortly.

Step 1: Determine Your Keyword Phrases

Finding the best keyword phrases to target is an iterative process. First, start with a list of keywords that you want to target with your website. Next, expand that list by brainstorming about other phrases, looking at competitor sites and your logfiles, and including plurals, splits, stems, synonyms, and common misspellings. Then triage those phrases based on search demand and the number of result pages to find the most effective phrases. Finally, play the long tail by targeting multiword phrases to get more targeted traffic and higher conversion rates.
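The expansion step can be sketched as a toy script (the seed phrases and the two variant rules here are invented for illustration; real keyword tools also generate stems, synonyms, and common misspellings, and the triage against search demand is omitted):

```python
def expand_keywords(seeds):
    """Naively expand a seed keyword list with plurals and word-order
    splits -- an illustrative subset of the expansions described above."""
    phrases = set(seeds)
    for phrase in seeds:
        if not phrase.endswith("s"):
            phrases.add(phrase + "s")               # naive plural
        words = phrase.split()
        if len(words) == 2:
            phrases.add(" ".join(reversed(words)))  # word-order split
    return sorted(phrases)

print(expand_keywords(["web optimization", "seo tip"]))
```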

pages: 420 words: 124,202

The Most Powerful Idea in the World: A Story of Steam, Industry, and Invention by William Rosen


Albert Einstein, All science is either physics or stamp collecting, barriers to entry, collective bargaining, computer age, Copley Medal, creative destruction, David Ricardo: comparative advantage, decarbonisation, delayed gratification, Fellow of the Royal Society, Flynn Effect, fudge factor, full employment, invisible hand, Isaac Newton, Islamic Golden Age, iterative process, Jacquard loom, James Hargreaves, James Watt: steam engine, John Harrison: Longitude, Joseph Schumpeter, Joseph-Marie Jacquard, knowledge economy, moral hazard, Network effects, Paul Samuelson, Peace of Westphalia, Peter Singer: altruism, QWERTY keyboard, Ralph Waldo Emerson, rent-seeking, Ronald Coase, Simon Kuznets, spinning jenny, the scientific method, The Wealth of Nations by Adam Smith, Thomas Malthus, transaction costs, transcontinental railway, zero-sum game, éminence grise

And not just a screw fastener; the reason lathes are frequently called history’s “first self-replicating machines” is that, beginning in the sixteenth century, they were used to produce their own leadscrews. A dozen inventors from all over Europe, including the Huguenots Jacques Besson and Salomon de Caus, the Italian clockmaker Torriano de Cremona, the German military engineer Konrad Keyser, and the Swede Christopher Polhem, mastered the iterative process by which a lathe could use one leadscrew to cut another, over and over again, each time achieving a higher order of accuracy. By connecting the lathe spindle and carriage to the leadscrew, the workpiece could be moved a set distance for every revolution of the spindle; if the workpiece revolved eight times while the cutting tool was moved a single inch, then eight spiral grooves would be cut on the metal for every inch: eight turns per inch.

Programming Android by Zigurd Mednieks, Laird Dornin, G. Blake Meike, Masumi Nakamura


anti-pattern, business process, conceptual framework, create, read, update, delete, database schema, Debian, domain-specific language, fault tolerance, Google Earth, interchangeable parts, iterative process, loose coupling, MVC pattern, revision control, RFID, web application

In the following text, we describe SQLite commands as they are used inside the sqlite3 command-line utility. Later we will show ways to achieve the same effects using the Android API. Although command-line SQL will not be part of the application you ship, it can certainly help to debug applications as you’re developing them. You will find that writing database code in Android is usually an iterative process of writing Java code to manipulate tables, and then peeking at created data using the command line.

SQL Data Definition Commands

Statements in the SQL language fall into two distinct categories: those used to create and modify tables—the locations where data is stored—and those used to create, read, update, and delete the data in those tables. In this section we’ll look at the former, the data definition commands:

CREATE TABLE
Developers start working with SQL by creating a table to store data.
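Although the book's context is the sqlite3 command-line utility and the Android API, the same data definition command can be tried from any SQLite binding. Here is a sketch using Python's sqlite3 module (the table and column names are invented for illustration):

```python
import sqlite3

# An in-memory database stands in for the sqlite3 command-line session.
db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE contacts (
        _id   INTEGER PRIMARY KEY AUTOINCREMENT,
        name  TEXT NOT NULL,
        email TEXT
    )
""")
db.execute("INSERT INTO contacts (name, email) VALUES (?, ?)",
           ("Ada", "ada@example.com"))
rows = db.execute("SELECT name, email FROM contacts").fetchall()
print(rows)
```

The CREATE TABLE text itself works verbatim inside the sqlite3 command-line utility, which is what makes the write-Java-then-peek-at-the-command-line loop described above practical.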

pages: 471 words: 124,585

The Ascent of Money: A Financial History of the World by Niall Ferguson


Admiral Zheng, Andrei Shleifer, Asian financial crisis, asset allocation, asset-backed security, Atahualpa, bank run, banking crisis, banks create money, Black Swan, Black-Scholes formula, Bonfire of the Vanities, Bretton Woods, BRICs, British Empire, capital asset pricing model, capital controls, Carmen Reinhart, Cass Sunstein, central bank independence, collateralized debt obligation, colonial exploitation, commoditize, Corn Laws, corporate governance, creative destruction, credit crunch, Credit Default Swap, credit default swaps / collateralized debt obligations, currency manipulation / currency intervention, currency peg, Daniel Kahneman / Amos Tversky, deglobalization, diversification, diversified portfolio, double entry bookkeeping, Edmond Halley, Edward Glaeser, Edward Lloyd's coffeehouse, financial innovation, financial intermediation, fixed income, floating exchange rates, Fractional reserve banking, Francisco Pizarro, full employment, German hyperinflation, Hernando de Soto, high net worth, hindsight bias, Home mortgage interest deduction, Hyman Minsky, income inequality, information asymmetry, interest rate swap, Intergovernmental Panel on Climate Change (IPCC), Isaac Newton, iterative process, John Meriwether, joint-stock company, joint-stock limited liability company, Joseph Schumpeter, Kenneth Arrow, Kenneth Rogoff, knowledge economy, labour mobility, Landlord’s Game, liberal capitalism, London Interbank Offered Rate, Long Term Capital Management, market bubble, market fundamentalism, means of production, Mikhail Gorbachev, money market fund, money: store of value / unit of account / medium of exchange, moral hazard, mortgage debt, mortgage tax deduction, Myron Scholes, Naomi Klein, negative equity, Nick Leeson, Northern Rock, Parag Khanna, pension reform, price anchoring, price stability, principal–agent problem, probability theory / Blaise Pascal / Pierre de Fermat, profit motive, quantitative hedge fund, RAND corporation, random walk, rent control, 
rent-seeking, reserve currency, Richard Thaler, Robert Shiller, Ronald Reagan, savings glut, seigniorage, short selling, Silicon Valley, South Sea Bubble, sovereign wealth fund, spice trade, structural adjustment programs, technology bubble, The Wealth of Nations by Adam Smith, The Wisdom of Crowds, Thomas Bayes, Thomas Malthus, Thorstein Veblen, too big to fail, transaction costs, value at risk, Washington Consensus, Yom Kippur War

This provides the basis for the concept of statistical significance and modern formulations of probabilities at specified confidence intervals (for example, the statement that 40 per cent of the balls in the jar are white, at a confidence interval of 95 per cent, implies that the precise value lies somewhere between 35 and 45 per cent - 40 plus or minus 5 per cent). 4. Normal distribution. It was Abraham de Moivre who showed that outcomes of any kind of iterated process could be distributed along a curve according to their variance around the mean or standard deviation. ‘Tho’ Chance produces Irregularities,’ wrote de Moivre in 1733, ‘still the Odds will be infinitely great, that in process of Time, those Irregularities will bear no proportion to recurrency of that Order which naturally results from Original Design.’ The bell curve that we encountered in Chapter 3 represents the normal distribution, in which 68.2 per cent of outcomes are within one standard deviation (plus or minus) of the mean. 5.
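De Moivre's observation is easy to reproduce numerically (a sketch of my own, not from the book): repeat an iterated chance process many times and the outcomes settle around the mean, with a share close to the stated 68.2 per cent falling within one standard deviation. (Because coin-flip totals are discrete, the simulated share will not match the continuous figure exactly.)

```python
import random

random.seed(0)  # fixed seed for reproducibility

def trial(n_flips=100):
    """One iterated process: total heads in n_flips fair-coin tosses."""
    return sum(random.random() < 0.5 for _ in range(n_flips))

outcomes = [trial() for _ in range(10_000)]
mean = sum(outcomes) / len(outcomes)
sd = (sum((x - mean) ** 2 for x in outcomes) / len(outcomes)) ** 0.5

within_one_sd = sum(abs(x - mean) <= sd for x in outcomes) / len(outcomes)
print(round(mean, 1), round(sd, 1), within_one_sd)
```

The "Irregularities" of individual trials wash out exactly as de Moivre described: the histogram of outcomes traces the bell curve around the mean of about 50 heads.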

pages: 1,064 words: 114,771

Tcl/Tk in a Nutshell by Paul Raines, Jeff Tranter


AltaVista, iterative process, off grid, place-making, Silicon Valley

Semicolons are another way to separate commands, in addition to newline characters: set n {[0-9]};# regular expression to match a digit Without the semicolon before the comment character, the set command will fail because it would receive too many arguments. Tcl treats "#" as an ordinary character if it is not at the beginning of a command. A Symbolic Gesture Much of Tcl's strength as a programming language lies in the manipulation of strings and lists. Compare the following two methods for printing each element of a list: set cpu_types [list pentium sparc powerpc m88000 alpha mips hppa] # "C-like" method of iterative processing for {set i 0} {$i < [llength $cpu_types]} {incr i} { puts [lindex $cpu_types $i] } # "The Tcl Way"-using string symbols foreach cpu $cpu_types { puts $cpu } The loop coded with for is similar to how a C program might be coded, iterating over the list by the use of an integer index value. The second loop, coded with foreach, is more natural for Tcl. The loop coded with foreach contains over 50% fewer characters, contributing to greater readability and less code to maintain.
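The same contrast between index-driven and element-driven iteration exists in most scripting languages. A Python rendering of the two loops, using the same hypothetical list of CPU types:

```python
cpu_types = ["pentium", "sparc", "powerpc", "m88000", "alpha", "mips", "hppa"]

# "C-like" method: iterate over the list by integer index
for i in range(len(cpu_types)):
    print(cpu_types[i])

# Idiomatic method: iterate over the elements directly, as foreach does
for cpu in cpu_types:
    print(cpu)
```

Both loops print the same elements; the second says what it means with less machinery.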

pages: 655 words: 141,257

Programming Android: Java Programming for the New Generation of Mobile Devices by Zigurd Mednieks, Laird Dornin, G. Blake Meike, Masumi Nakamura


anti-pattern, business process, conceptual framework, create, read, update, delete, database schema, Debian, domain-specific language, fault tolerance, Google Earth, interchangeable parts, iterative process, loose coupling, MVC pattern, revision control, RFID, web application, yellow journalism

In the following text, we describe SQLite commands as they are used inside the sqlite3 command-line utility. Later we will show ways to achieve the same effects using the Android API. Although command-line SQL will not be part of the application you ship, it can certainly help to debug applications as you’re developing them. You will find that writing database code in Android is usually an iterative process of writing Java code to manipulate tables, and then peeking at created data using the command line. SQL Data Definition Commands Statements in the SQL language fall into two distinct categories: those used to create and modify tables—the locations where data is stored—and those used to create, read, update, and delete the data in those tables. In this section we’ll look at the former, the data definition commands: CREATE TABLE Developers start working with SQL by creating a table to store data.
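On Android the table manipulation happens through Java APIs, but the SQL itself can be exercised anywhere SQLite runs. A Python `sqlite3` sketch of the same write-then-peek loop (the `contacts` table and its columns are invented for illustration; the SQL statements work verbatim in the sqlite3 command-line shell):

```python
import sqlite3

# Create a table, insert a row, then "peek" at the stored data.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE contacts (
           _id   INTEGER PRIMARY KEY AUTOINCREMENT,
           name  TEXT NOT NULL,
           phone TEXT
       )"""
)
conn.execute("INSERT INTO contacts (name, phone) VALUES (?, ?)",
             ("Ada", "555-0100"))
conn.commit()

rows = list(conn.execute("SELECT _id, name, phone FROM contacts"))
for row in rows:
    print(row)          # (1, 'Ada', '555-0100')
conn.close()
```

During development you would run the CREATE/INSERT side from application code and the SELECT side from the command line, iterating between the two.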

pages: 503 words: 131,064

Liars and Outliers: How Security Holds Society Together by Bruce Schneier


airport security, barriers to entry, Berlin Wall, Bernie Madoff, Bernie Sanders, Brian Krebs, Broken windows theory, carried interest, Cass Sunstein, Chelsea Manning, commoditize, corporate governance, crack epidemic, credit crunch, crowdsourcing, cuban missile crisis, Daniel Kahneman / Amos Tversky, David Graeber, desegregation, don't be evil, Double Irish / Dutch Sandwich, Douglas Hofstadter, experimental economics, Fall of the Berlin Wall, financial deregulation, George Akerlof, hydraulic fracturing, impulse control, income inequality, invention of agriculture, invention of gunpowder, iterative process, Jean Tirole, John Nash: game theory, joint-stock company, Julian Assange, mass incarceration, meta analysis, meta-analysis, microcredit, moral hazard, mutually assured destruction, Nate Silver, Network effects, Nick Leeson, offshore financial centre, patent troll, phenotype, pre–internet, principal–agent problem, prisoner's dilemma, profit maximization, profit motive, race to the bottom, Ralph Waldo Emerson, RAND corporation, rent-seeking, RFID, Richard Thaler, risk tolerance, Ronald Coase, security theater, shareholder value, slashdot, statistical model, Steven Pinker, Stuxnet, technological singularity, The Market for Lemons, The Nature of the Firm, The Spirit Level, The Wealth of Nations by Adam Smith, The Wisdom of Crowds, theory of mind, too big to fail, traffic fines, transaction costs, ultimatum game, UNCLOS, union organizing, Vernor Vinge, WikiLeaks, World Values Survey, Y2K, zero-sum game

Security systems are often what economists call an experiential good: something you don't understand the value of until you've already bought, installed, and experienced it.3 This holds true for other forms of societal pressure as well. If you're knowledgeable and experienced and perform a good analysis, you can make some good guesses, but it can be impossible to know the actual effects—or unintended consequences—of a particular societal pressure until you've already implemented it. This means that implementing societal pressures is always an iterative process. We try something, see how well it works, then fine-tune. Any society—a family, a business, a government—is constantly balancing its need for security with the side effects, unintended consequences, and other considerations. Can we afford this particular societal pressure system? Are our fundamental freedoms and liberties more important than more security?4 More onerous ATM security will result in fewer ATM transactions, costing a bank more than the ATM fraud.

pages: 560 words: 158,238

Fifty Degrees Below by Kim Stanley Robinson


airport security, bioinformatics, Burning Man, clean water, Donner party, full employment, Intergovernmental Panel on Climate Change (IPCC), invisible hand, iterative process, means of production, minimum wage unemployment, North Sea oil, Ralph Waldo Emerson, Richard Feynman, statistical model, Stephen Hawking, the scientific method

When Frank expressed doubt that any major climate mitigation was possible, either physically or politically, Wracke waved a hand. “The Corps has always done things on a big scale. Huge scale. Sometimes with huge blunders. All with the best intentions of course. That’s just the way things happen. We’re still gung-ho to try. Lots of things are reversible, in the long run. Hopefully this time around we’ll be working with better science. But, you know, it’s an iterative process. So, long story short, you get a project approved, and we’re good to go. We’ve got the expertise. The Corps’ esprit de corps is always high.” “What about budget?” Frank asked. “What about it? We’ll spend what we’re given.” “Well, but is there any kind of, you know, discretionary fund that you can tap into?” “We don’t seek funding, usually,” the general admitted. “But could you?” “Well, in tandem with a request for action.

pages: 574 words: 164,509

Superintelligence: Paths, Dangers, Strategies by Nick Bostrom


agricultural Revolution, AI winter, Albert Einstein, algorithmic trading, anthropic principle, anti-communist, artificial general intelligence, autonomous vehicles, barriers to entry, Bayesian statistics, bioinformatics, brain emulation, cloud computing, combinatorial explosion, computer vision, cosmological constant, dark matter, DARPA: Urban Challenge, data acquisition, delayed gratification, demographic transition, Donald Knuth, Douglas Hofstadter, Drosophila, Elon Musk, endogenous growth, epigenetics, fear of failure, Flash crash, Flynn Effect, friendly AI, Gödel, Escher, Bach, income inequality, industrial robot, informal economy, information retrieval, interchangeable parts, iterative process, job automation, John Markoff, John von Neumann, knowledge worker, Menlo Park, meta analysis, meta-analysis, mutually assured destruction, Nash equilibrium, Netflix Prize, new economy, Norbert Wiener, NP-complete, nuclear winter, optical character recognition, pattern recognition, performance metric, phenotype, prediction markets, price stability, principal–agent problem, race to the bottom, random walk, Ray Kurzweil, recommendation engine, reversible computing, social graph, speech recognition, Stanislav Petrov, statistical model, stem cell, Stephen Hawking, strong AI, superintelligent machines, supervolcano, technological singularity, technoutopianism, The Coming Technological Singularity, The Nature of the Firm, Thomas Kuhn: the structure of scientific revolutions, transaction costs, Turing machine, Vernor Vinge, Watson beat the top human players on Jeopardy!, World Values Survey, zero-sum game

The idea of using learning as a means of bootstrapping a simpler system to human-level intelligence can be traced back at least to Alan Turing’s notion of a “child machine,” which he wrote about in 1950: Instead of trying to produce a programme to simulate the adult mind, why not rather try to produce one which simulates the child’s? If this were then subjected to an appropriate course of education one would obtain the adult brain.3 Turing envisaged an iterative process to develop such a child machine: We cannot expect to find a good child machine at the first attempt. One must experiment with teaching one such machine and see how well it learns. One can then try another and see if it is better or worse. There is an obvious connection between this process and evolution…. One may hope, however, that this process will be more expeditious than evolution. The survival of the fittest is a slow method for measuring advantages.
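Turing's try-compare-keep loop is the skeleton of evolutionary search. A minimal random-mutation hill climb in Python makes the connection concrete (the "machine" here is just a bit string scored against a target; everything in this sketch is illustrative, not Turing's actual design):

```python
import random

random.seed(0)

TARGET = [1] * 20          # the behaviour we want the "machine" to learn

def score(machine):
    """How well this machine matches the target behaviour."""
    return sum(bit == want for bit, want in zip(machine, TARGET))

machine = [random.randint(0, 1) for _ in TARGET]   # first attempt
for generation in range(500):
    candidate = machine[:]
    candidate[random.randrange(len(candidate))] ^= 1   # "try another"
    if score(candidate) > score(machine):              # "better or worse?"
        machine = candidate                            # keep the fitter one

print(score(machine), "of", len(TARGET))
```

Because each step keeps only improvements, the process is far more expeditious than blind survival of the fittest, which is exactly the hope Turing expresses.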

pages: 1,025 words: 150,187

ZeroMQ by Pieter Hintjens


anti-pattern, carbon footprint, cloud computing, Debian, distributed revision control, domain-specific language, factory automation, fault tolerance, fear of failure, finite state, Internet of things, iterative process, premature optimization, profit motive, pull request, revision control, RFC: Request For Comment, Richard Stallman, Skype, smart transportation, software patent, Steve Jobs, Valgrind, WebSocket

This was how we built (or grew, or gently steered) the ØMQ community into existence. Your goal as leader of a community is to motivate people to get out there and explore; to ensure they can do so safely and without disturbing others; to reward them when they make successful discoveries; and to ensure they share their knowledge with everyone else (and not because we ask them, not because they feel generous, but because it’s The Law). It is an iterative process. You make a small product, at your own cost, but in public view. You then build a small community around that product. If you have a small but real hit, the community then helps design and build the next version, and grows larger. And then that community builds the next version, and so on. It’s evident that you remain part of the community, maybe even a majority contributor, but the more control you try to assert over the material results, the less people will want to participate.

Poking a Dead Frog: Conversations With Today's Top Comedy Writers by Mike Sacks


Bernie Madoff, Columbine, hive mind, index card, iterative process, Norman Mailer, period drama, Ponzi scheme, pre–internet, Saturday Night Live, Upton Sinclair

They’re overly clever and jump around a lot, and have more conversational fill in them—clichés and empty phrases and so on. And they meander in terms of their causality. Things happen for no reason, and lead to nothing, or lead to something, but with weak causation. But in revision they get tighter and funnier and also gentler. And one thing leads to the next in a tighter, more undeniable way—a way that seems to “mean.” Which, I guess, makes sense, if we think of revision as just an iterative process of exerting one’s taste. Gradually the story comes to feel more like “you” than you could have imagined at the outset, and starts to manifest a sort of superlogic—an internal logic that is more direct and “caused” than mere real-life logic. The thing is, writing is really just the process of charming someone via prose—compelling them to keep reading. So, as with actual personality, part of the process is learning what it is that you’ve got to work with: How do I keep that reader reading?

pages: 481 words: 125,946

What to Think About Machines That Think: Today's Leading Thinkers on the Age of Machine Intelligence by John Brockman


3D printing, agricultural Revolution, AI winter, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, algorithmic trading, artificial general intelligence, augmented reality, autonomous vehicles, basic income, bitcoin, blockchain, clean water, cognitive dissonance, Colonization of Mars, complexity theory, computer age, computer vision, constrained optimization, corporate personhood, cosmological principle, cryptocurrency, cuban missile crisis, Danny Hillis, dark matter, discrete time, Douglas Engelbart, Elon Musk, Emanuel Derman, endowment effect, epigenetics, Ernest Rutherford, experimental economics, Flash crash, friendly AI, functional fixedness, Google Glasses, hive mind, income inequality, information trail, Internet of things, invention of writing, iterative process, Jaron Lanier, job automation, John Markoff, John von Neumann, Kevin Kelly, knowledge worker, loose coupling, microbiome, Moneyball by Michael Lewis explains big data, natural language processing, Network effects, Norbert Wiener, pattern recognition, Peter Singer: altruism, phenotype, planetary scale, Ray Kurzweil, recommendation engine, Republic of Letters, RFID, Richard Thaler, Rory Sutherland, Satyajit Das, Search for Extraterrestrial Intelligence, self-driving car, sharing economy, Silicon Valley, Skype, smart contracts, speech recognition, statistical model, stem cell, Stephen Hawking, Steve Jobs, Steven Pinker, Stewart Brand, strong AI, Stuxnet, superintelligent machines, supervolcano, the scientific method, The Wisdom of Crowds, theory of mind, Thorstein Veblen, too big to fail, Turing machine, Turing test, Von Neumann architecture, Watson beat the top human players on Jeopardy!, Y2K

AI can easily look like the real thing but still be a million miles away from being the real thing—like kissing through a pane of glass: It looks like a kiss but is only a faint shadow of the actual concept. I concede to AI proponents all of the semantic prowess of Shakespeare, the symbol juggling they do perfectly. Missing is the direct relationship with the ideas the symbols represent. Much of what is certain to come soon would have belonged in the old-school “Strong AI” territory. Anything that can be approached in an iterative process can and will be achieved, sooner than many think. On this point I reluctantly side with the proponents: exaflops in CPU+GPU performance, 10K resolution immersive VR, personal petabyte databases . . . here in a couple of decades. But it is not all “iterative.” There’s a huge gap between that and the level of conscious understanding that truly deserves to be called Strong, as in “Alive AI.” The big elusive question: Is consciousness an emergent behavior?

Commodity Trading Advisors: Risk, Performance Analysis, and Selection by Greg N. Gregoriou, Vassilios Karavas, François-Serge Lhabitant, Fabrice Douglas Rouah


Asian financial crisis, asset allocation, backtesting, capital asset pricing model, collateralized debt obligation, commodity trading advisor, compound rate of return, constrained optimization, corporate governance, correlation coefficient, Credit Default Swap, credit default swaps / collateralized debt obligations, discrete time, distributed generation, diversification, diversified portfolio, dividend-yielding stocks, fixed income, high net worth, implied volatility, index arbitrage, index fund, interest rate swap, iterative process, linear programming, London Interbank Offered Rate, Long Term Capital Management, market fundamentalism, merger arbitrage, Mexican peso crisis / tequila crisis, p-value, Pareto efficiency, Ponzi scheme, quantitative trading / quantitative finance, random walk, risk-adjusted returns, risk/return, selection bias, Sharpe ratio, short selling, stochastic process, survivorship bias, systematic trading, technology bubble, transaction costs, value at risk, zero-sum game

Following Chang, Pinegar, and Schachter (1997), the volume and volatility relationship is modeled without including past volatility. 2. Following Irwin and Yoshimaru (1999), volatility lags are included as independent variables to account for the time series persistence of volatility. 3. Following Bessembinder and Seguin (1993), the persistence in volume and volatility is modeled through specification of an iterative process.5 Since estimation results for the different model specifications are quite similar, only results for a modified version of Chang, Pinegar, and Schachter’s specification are reported here.6 Chang, Pinegar, and Schachter (1997) regress futures price volatility on volume associated with large speculators (as provided by the CFTC large trader reports) and all other market volume. Including two additional sets
5 Another approach would be to use a model with a mean equation and a volatility equation that has both volume and GARCH (generalized autoregressive conditional heteroskedasticity) terms.

pages: 397 words: 110,130

Smarter Than You Think: How Technology Is Changing Our Minds for the Better by Clive Thompson


3D printing, 4chan, A Declaration of the Independence of Cyberspace, augmented reality, barriers to entry, Benjamin Mako Hill, butterfly effect, citizen journalism, Claude Shannon: information theory, conceptual framework, corporate governance, crowdsourcing, Deng Xiaoping, discovery of penicillin, Douglas Engelbart, drone strike, Edward Glaeser, Edward Thorp, experimental subject, Filter Bubble, Freestyle chess, Galaxy Zoo, Google Earth, Google Glasses, Gunnar Myrdal, Henri Poincaré, hindsight bias, hive mind, Howard Rheingold, information retrieval, iterative process, jimmy wales, Kevin Kelly, Khan Academy, knowledge worker, lifelogging, Mark Zuckerberg, Marshall McLuhan, Menlo Park, Netflix Prize, Nicholas Carr, patent troll, pattern recognition, pre–internet, Richard Feynman, Ronald Coase, Ronald Reagan, Rubik’s Cube, sentiment analysis, Silicon Valley, Skype, Snapchat, Socratic dialogue, spaced repetition, telepresence, telepresence robot, The Nature of the Firm, the scientific method, The Wisdom of Crowds, theory of mind, transaction costs, Vannevar Bush, Watson beat the top human players on Jeopardy!, WikiLeaks, X Prize, éminence grise

Young chess enthusiasts could buy CD-ROMs filled with hundreds of thousands of chess games. Chess-playing software could show you how an artificial opponent would respond to any move. This dramatically increased the pace at which young chess players built up intuition. If you were sitting at lunch and had an idea for a bold new opening move, you could instantly find out which historic players had tried it, then war-game it yourself by playing against software. The iterative process of thought experiments—“If I did this, then what would happen?”—sped up exponentially. Chess itself began to evolve. “Players became more creative and daring,” as Frederic Friedel, the publisher of the first popular chess databases and software, tells me. Before computers, grand masters would stick to lines of attack they’d long studied and honed. Since it took weeks or months for them to research and mentally explore the ramifications of a new move, they stuck with what they knew.

pages: 624 words: 127,987

The Personal MBA: A World-Class Business Education in a Single Volume by Josh Kaufman


Albert Einstein, Atul Gawande, Black Swan, business process, buy low sell high, capital asset pricing model, Checklist Manifesto, cognitive bias, correlation does not imply causation, Credit Default Swap, Daniel Kahneman / Amos Tversky, David Heinemeier Hansson, David Ricardo: comparative advantage, Dean Kamen, delayed gratification, discounted cash flows, Donald Knuth, double entry bookkeeping, Douglas Hofstadter, Frederick Winslow Taylor, George Santayana, Gödel, Escher, Bach, high net worth, hindsight bias, index card, inventory management, iterative process, job satisfaction, Johann Wolfgang von Goethe, Kevin Kelly, Lao Tzu, loose coupling, loss aversion, Marc Andreessen, market bubble, Network effects, Parkinson's law, Paul Buchheit, Paul Graham, place-making, premature optimization, Ralph Waldo Emerson, rent control, side project, statistical model, stealth mode startup, Steve Jobs, Steve Wozniak, subscription business, telemarketer, the scientific method, time value of money, Toyota Production System, tulip mania, Upton Sinclair, Vilfredo Pareto, Walter Mischel, Y Combinator, Yogi Berra

Even the most discouraging Feedback contains crucial pieces of information that can help you make your offering better. The worst response you can get when asking for Feedback isn’t emphatic dislike: it’s total apathy. If no one seems to care about what you’ve created, you don’t have a viable business idea. 5. Give potential customers the opportunity to preorder. One of the most important pieces of Feedback you can receive during the iteration process is the other person’s willingness to actually purchase what you’re creating. It’s one thing for a person to say that they’d purchase something and quite another for them to be willing to pull out their wallet or credit card and place a real order. You can do this even if the offer isn’t ready yet—a tactic called Shadow Testing (discussed later). Whenever possible, give the people who are giving you Feedback the opportunity to preorder the offering.

pages: 472 words: 117,093

Machine, Platform, Crowd: Harnessing Our Digital Future by Andrew McAfee, Erik Brynjolfsson

3D printing, additive manufacturing, AI winter, Airbnb, airline deregulation, airport security, Albert Einstein, Amazon Mechanical Turk, Amazon Web Services, artificial general intelligence, augmented reality, autonomous vehicles, backtesting, barriers to entry, bitcoin, blockchain, book scanning, British Empire, business process, carbon footprint, Cass Sunstein, centralized clearinghouse, Chris Urmson, cloud computing, cognitive bias, commoditize, complexity theory, computer age, creative destruction, crony capitalism, crowdsourcing, cryptocurrency, Daniel Kahneman / Amos Tversky, Dean Kamen, discovery of DNA, disintermediation, distributed ledger, double helix, Elon Musk, Erik Brynjolfsson, ethereum blockchain, everywhere but in the productivity statistics, family office, fiat currency, financial innovation, George Akerlof, global supply chain, Hernando de Soto, hive mind, information asymmetry, Internet of things, inventory management, iterative process, Jean Tirole, Jeff Bezos, jimmy wales, John Markoff, joint-stock company, Joseph Schumpeter, Kickstarter, law of one price, Lyft, Machine translation of "The spirit is willing, but the flesh is weak." to Russian and back, Marc Andreessen, Mark Zuckerberg, meta analysis, meta-analysis, moral hazard, multi-sided market, Myron Scholes, natural language processing, Network effects, new economy, Norbert Wiener, Oculus Rift, PageRank, pattern recognition, peer-to-peer lending, performance metric, Plutocrats, plutocrats, precision agriculture, prediction markets, pre–internet, price stability, principal–agent problem, Ray Kurzweil, Renaissance Technologies, Richard Stallman, ride hailing / ride sharing, risk tolerance, Ronald Coase, Satoshi Nakamoto, Second Machine Age, self-driving car, sharing economy, Silicon Valley, Skype, slashdot, smart contracts, Snapchat, speech recognition, statistical model, Steve Ballmer, Steve Jobs, Steven Pinker, supply-chain management, TaskRabbit, Ted Nelson, The Market for Lemons, The Nature of the Firm, Thomas L Friedman, too big to fail, transaction costs, transportation-network company, traveling salesman, two-sided market, Uber and Lyft, Uber for X, Watson beat the top human players on Jeopardy!, winner-take-all economy, yield management, zero day

It’s highly energy efficient, using technology that reduces its carbon footprint by 34,000 metric tons per year, and sparing enough in its use of materials to save $58 million in construction costs. What’s more, we find its twisting, gleaming form quite beautiful. Both the building’s initial shape and structure were computer-generated. They were then advanced and refined by teams of human architects in a highly iterative process, but the starting point for these human teams was a computer-designed building, which is about as far from a blank sheet of paper as you can get. What We Are That Computers Aren’t Autogenerated-music pioneer David Cope says, “Most of what I’ve heard [and read] is the same old crap. It’s all about machines versus humans, and ‘aren’t you taking away the last little thing we have left that we can call unique to human beings — creativity?’

pages: 923 words: 516,602

The C++ Programming Language by Bjarne Stroustrup


combinatorial explosion, conceptual framework, database schema, distributed generation, Donald Knuth, fault tolerance, general-purpose programming language, index card, iterative process, job-hopping, locality of reference, Menlo Park, Parkinson's law, premature optimization, sorting algorithm

The purpose of "design" is to create a clean and relatively simple internal structure, sometimes also called an architecture, for a program. In other words, we want to create a framework into which the individual pieces of code can fit and thereby guide the writing of those individual pieces of code. A design is the end product of the design process (as far as there is an end product of an iterative process). It is the focus of the communication between the designer and the programmer and between programmers. It is important to have a sense of proportion here. If I – as an individual programmer – design a small program that I’m going to implement tomorrow, the appropriate level of precision and detail may be some scribbles on the back of an envelope. At the other extreme, the development of a system involving hundreds of designers and programmers may require books of specifications carefully written using formal or semi-formal notations.

In particular, consider the needs for construction, copying, and destruction.
– Consider minimalism, completeness, and convenience.
[3] Refine the classes by specifying their dependencies.
– Consider parameterization, inheritance, and use dependencies.
[4] Specify the interfaces.
– Separate functions into public and protected operations.
– Specify the exact type of the operations on the classes.
Note that these are steps in an iterative process. Typically, several loops through this sequence are needed to produce a design one can comfortably use for an initial implementation or a reimplementation. One advantage of well-done analysis and data abstraction as described here is that it becomes relatively easy to reshuffle class relationships even after code has been written. This is never a trivial task, though. After that, we implement the classes and go back and review the design based on what was learned from implementing them.

pages: 496 words: 174,084

Masterminds of Programming: Conversations With the Creators of Major Programming Languages by Federico Biancuzzi, Shane Warden


Benevolent Dictator For Life (BDFL), business intelligence, business process, cellular automata, cloud computing, commoditize, complexity theory, conceptual framework, continuous integration, data acquisition, domain-specific language, Douglas Hofstadter, Fellow of the Royal Society, finite state, Firefox, follow your passion, Frank Gehry, general-purpose programming language, Guido van Rossum, HyperCard, information retrieval, iterative process, John von Neumann, Larry Wall, linear programming, loose coupling, Mars Rover, millennium bug, NP-complete, Paul Graham, performance metric, Perl 6, QWERTY keyboard, RAND corporation, randomized controlled trial, Renaissance Technologies, Ruby on Rails, Sapir-Whorf hypothesis, Silicon Valley, slashdot, software as a service, software patent, sorting algorithm, Steve Jobs, traveling salesman, Turing complete, type inference, Valgrind, Von Neumann architecture, web application

Well, all of a sudden I’ve just made some decisions like, “Wow, it’s maybe like a 2D language as opposed to a 3D language.” Maybe I make the decision that color is important for me and all of a sudden I realize, “Wow, I’ve just alienated the whole community of color-blind programmers.” Every one of those things becomes a constraint that I have to work out, and I have to deal with the consequences of those constraints. That argues for an iterative process. Grady: Absolutely. All of life is iterative. It goes back to the point I made earlier, which is you can’t a priori know enough to even ask the right questions. One has to take a leap of faith and move forward in the presence of imperfect information. Is it likely we’ll see a break-out visual programming language or system in the next 10 years? Grady: Oh, it already exists. It’s National Instruments’ LabVIEW.

pages: 1,156 words: 229,431

The IDA Pro Book by Chris Eagle


barriers to entry, business process, information retrieval, iterative process

However, the process seldom runs that smoothly. Note The sigmake documentation file, sigmake.txt, recommends that signature filenames follow the MS-DOS 8.3 name-length convention. This is not a hard-and-fast requirement, however. When longer filenames are used, only the first eight characters of the base filename are displayed in the signature-selection dialog. Signature generation is often an iterative process, as it is during this phase when collisions must be handled. A collision occurs anytime two functions have identical patterns. If collisions are not resolved in some manner, it is not possible to determine which function is actually being matched during the signature-application process. Therefore, sigmake must be able to resolve each generated signature to exactly one function name. When this is not possible, based on the presence of identical patterns for one or more functions, sigmake refuses to generate a .sig file and instead generates an exclusions file (.exc).
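The collision problem is easy to picture: group candidate functions by their signature pattern, and any group with more than one member cannot be matched unambiguously. A Python sketch of that grouping (the byte patterns and names are invented for illustration, not real sigmake output):

```python
from collections import defaultdict

# Hypothetical function-name -> signature-pattern pairs. Two identical
# patterns mean a match could name either function: a collision.
functions = {
    "strcpy": "8B 54 24 04 8B 44 24 08",
    "strcat": "8B 54 24 04 8B 44 24 08",   # same pattern as strcpy
    "strlen": "8B 4C 24 04 33 C0",
}

by_pattern = defaultdict(list)
for name, pattern in functions.items():
    by_pattern[pattern].append(name)

for pattern, names in by_pattern.items():
    if len(names) > 1:
        # sigmake would refuse to emit a .sig and instead list these
        # in a .exc exclusions file for the user to resolve
        print("collision:", ", ".join(sorted(names)))
    else:
        print("unique:", names[0])
```

Resolving the exclusions file and regenerating is exactly the iteration the text describes.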

pages: 685 words: 203,949

The Organized Mind: Thinking Straight in the Age of Information Overload by Daniel J. Levitin


airport security, Albert Einstein, Amazon Mechanical Turk, Anton Chekhov, Bayesian statistics, big-box store, business process, call centre, Claude Shannon: information theory, cloud computing, cognitive bias, complexity theory, computer vision, conceptual framework, correlation does not imply causation, crowdsourcing, cuban missile crisis, Daniel Kahneman / Amos Tversky, delayed gratification, Donald Trump, epigenetics, Eratosthenes, Exxon Valdez, framing effect, friendly fire, fundamental attribution error, Golden Gate Park, Google Glasses, haute cuisine, impulse control, index card, indoor plumbing, information retrieval, invention of writing, iterative process, jimmy wales, job satisfaction, Kickstarter, life extension, meta analysis, meta-analysis, more computing power than Apollo, Network effects, new economy, Nicholas Carr, optical character recognition, Pareto efficiency, pattern recognition, phenotype, placebo effect, pre–internet, profit motive, randomized controlled trial, Rubik’s Cube, Skype, Snapchat, statistical model, Steve Jobs, supply-chain management, the scientific method, The Wealth of Nations by Adam Smith, The Wisdom of Crowds, theory of mind, Thomas Bayes, Turing test, ultimatum game, zero-sum game

And the drawers could not be allowed to get too full, since then papers would catch and tear as the drawers were opened. Letter boxes had to be taken down from a shelf and opened up, a time-consuming operation when large amounts of filing were done. As Yates notes, keeping track of whether a given document or pile of documents was deemed active or archival was not always made explicit. Moreover, if the user wanted to expand, this might require transferring the contents of one box to another in an iterative process that might require dozens of boxes being moved down in the cabinet, to make room for the new box. To help prevent document loss, and to keep documents in the order they were filed, a ring system was introduced around 1881, similar to the three-ring binders we now use. The advantages of ringed flat files were substantial, providing random access (like Phaedrus’s 3 x 5 index card system) and minimizing the risk of document loss.

The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling by Ralph Kimball, Margy Ross


active measures, Albert Einstein, business intelligence, business process, call centre, cloud computing, data acquisition, discrete time, inventory management, iterative process, job automation, knowledge worker, performance metric, platform as a service, side project, supply-chain management, zero-sum game

The modeling effort typically works through the following sequence of tasks and deliverables, as illustrated in Figure 18-1:
■ High-level model defining the model’s scope and granularity
■ Detailed design with table-by-table attributes and metrics
■ Review and validation with IT and business representatives
■ Finalization of the design documentation
As with any data modeling effort, dimensional modeling is an iterative process. You will work back and forth between business requirements and source details to further refine the model, changing it as you learn more. This section describes each of these major tasks. Depending on the design team’s experience and exposure to dimensional modeling concepts, you might begin with basic dimensional modeling education before kicking off the effort, to ensure everyone is on the same page regarding standard dimensional vocabulary and best practices.

pages: 828 words: 232,188

Political Order and Political Decay: From the Industrial Revolution to the Globalization of Democracy by Francis Fukuyama


Affordable Care Act / Obamacare, Andrei Shleifer, Asian financial crisis, Atahualpa, banking crisis, barriers to entry, Berlin Wall, blood diamonds, British Empire, centre right, clean water, collapse of Lehman Brothers, colonial rule, conceptual framework, crony capitalism, deindustrialization, Deng Xiaoping, double entry bookkeeping, Edward Snowden, Erik Brynjolfsson, European colonialism, facts on the ground, failed state, Fall of the Berlin Wall, first-past-the-post, Francis Fukuyama: the end of history, Francisco Pizarro, Frederick Winslow Taylor, full employment, Gini coefficient, Hernando de Soto, Home mortgage interest deduction, income inequality, information asymmetry, invention of the printing press, iterative process, knowledge worker, land reform, land tenure, life extension, low skilled workers, manufacturing employment, means of production, Menlo Park, Mohammed Bouazizi, Monroe Doctrine, moral hazard, new economy, open economy, out of africa, Peace of Westphalia, Port of Oakland, post-industrial society, post-materialism, price discrimination, quantitative easing, RAND corporation, rent-seeking, road to serfdom, Ronald Reagan, Scientific racism, Scramble for Africa, Second Machine Age, Silicon Valley, special economic zone, stem cell, the scientific method, The Wealth of Nations by Adam Smith, Thomas L Friedman, Thomas Malthus, too big to fail, trade route, transaction costs, Tyler Cowen: Great Stagnation, Vilfredo Pareto, women in the workforce, World Values Survey, zero-sum game

But while variation in biological evolution is random, human beings exercise some degree of agency over the design of their institutions. It is true, as authors like Friedrich A. Hayek have argued, that human beings are never knowledgeable or wise enough to be able to predict the outcomes of their efforts to design institutions or plan policies with full ex ante knowledge of the results.1 But the exercise of human agency is not a one-shot affair: human beings learn from their mistakes and take actions to correct them in an iterative process. The constitution adopted by the Federal Republic of Germany in 1949 differed in significant ways from the constitution of the Weimar Republic, precisely because Germans had learned from the failure of democracy during the 1930s. In biological evolution, there are separate specific and general processes. Under specific evolution, organisms adapt to particular environments and diverge in their characteristics.

pages: 823 words: 220,581

Debunking Economics - Revised, Expanded and Integrated Edition: The Naked Emperor Dethroned? by Steve Keen


accounting loophole / creative accounting, banking crisis, banks create money, barriers to entry, Benoit Mandelbrot, Big bang: deregulation of the City of London, Black Swan, Bonfire of the Vanities, butterfly effect, capital asset pricing model, cellular automata, central bank independence, citizen journalism, clockwork universe, collective bargaining, complexity theory, correlation coefficient, creative destruction, credit crunch, David Ricardo: comparative advantage, debt deflation, diversification, double entry bookkeeping, Eugene Fama: efficient market hypothesis, experimental subject, Financial Instability Hypothesis, fixed income, Fractional reserve banking, full employment, Henri Poincaré, housing crisis, Hyman Minsky, income inequality, information asymmetry, invisible hand, iterative process, John von Neumann, laissez-faire capitalism, liquidity trap, Long Term Capital Management, mandelbrot fractal, margin call, market bubble, market clearing, market microstructure, means of production, minimum wage unemployment, money market fund, open economy, Pareto efficiency, Paul Samuelson, place-making, Ponzi scheme, profit maximization, quantitative easing, RAND corporation, random walk, risk tolerance, risk/return, Robert Shiller, Ronald Coase, Schrödinger's Cat, scientific mainstream, seigniorage, six sigma, South Sea Bubble, stochastic process, The Great Moderation, The Wealth of Nations by Adam Smith, Thorstein Veblen, time value of money, total factor productivity, tulip mania, wage slave, zero-sum game

The auctioneer then refuses to allow any sale to take place, and instead adjusts prices – increasing the price of those commodities where demand exceeded supply, and decreasing the price where demand was less than supply. This then results in a second set of prices, which are also highly unlikely to balance demand and supply for all commodities; so another round of price adjustments will take place, and another, and another. Walras called this iterative process of trying to find a set of prices which equates supply to demand for all commodities ‘tatonnement’ – which literally translates as ‘groping.’ He believed that this process would eventually converge to an equilibrium set of prices, where supply and demand are balanced in all markets (so long as trade at disequilibrium prices can be prevented). This was not necessarily the case, since adjusting one price so that supply and demand are balanced for one commodity could well push demand and supply farther apart for all other commodities.
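Walras’s groping can be sketched in a few lines. The linear demand and supply curves below are hypothetical stand-ins (not from the book); the loop does exactly what the passage describes: raise the price where demand exceeds supply, lower it where demand falls short, and repeat.

```python
# Sketch of Walrasian tatonnement for a single market, using made-up
# linear demand and supply curves.
def demand(p):
    return 10.0 - p       # buyers want less as the price rises

def supply(p):
    return p              # sellers offer more as the price rises

def tatonnement(p, step=0.1, tol=1e-6, max_rounds=10_000):
    """Adjust the price in proportion to excess demand until the market clears."""
    for _ in range(max_rounds):
        excess = demand(p) - supply(p)
        if abs(excess) < tol:
            break
        p += step * excess    # raise price if demand > supply, else lower it
    return p

print(round(tatonnement(1.0), 4))  # converges to the clearing price 5.0
```

With one market and well-behaved curves the loop converges; Keen’s point is that with many interdependent commodities the same adjustment rule need not, since clearing one market can unbalance the others.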

pages: 1,294 words: 210,361

The Emperor of All Maladies: A Biography of Cancer by Siddhartha Mukherjee


Barry Marshall: ulcers, conceptual framework, discovery of penicillin, experimental subject, iterative process, life extension, Louis Pasteur, medical residency, meta-analysis, mouse model, New Journalism, phenotype, randomized controlled trial, scientific mainstream, Silicon Valley, social web, statistical model, stem cell, women in the workforce, éminence grise

In other words, if you started off with 100,000 leukemia cells in a mouse and administered a drug that killed 99 percent of those cells in a single round, then every round would kill cells in a fractional manner, resulting in fewer and fewer cells after every round of chemotherapy: 100,000 . . . 1,000 . . . 10 . . . and so forth, until the number finally fell to zero after four rounds. Killing leukemia was an iterative process, like halving a monster’s body, then halving the half, and halving the remnant half. Second, Skipper found that by adding drugs in combination, he could often get synergistic effects on killing. Since different drugs elicited different resistance mechanisms, and produced different toxicities in cancer cells, using drugs in concert dramatically lowered the chance of resistance and increased cell killing.
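Skipper’s fractional (“log”) kill is easy to compute. The starting count of 100,000 cells and the 99 percent kill rate come from the passage; treating the cell count as a fractional number is this sketch’s simplification.

```python
def kill_sequence(cells, surviving_fraction=0.01):
    """Return the surviving (fractional) cell count after each round of
    chemotherapy, stopping once fewer than one cell remains."""
    survivors = []
    while cells >= 1:
        cells *= surviving_fraction   # each round kills a fixed fraction, not a fixed number
        survivors.append(cells)
    return survivors

# Each round removes 99 percent of whatever remains:
print([round(c, 6) for c in kill_sequence(100_000)])  # [1000.0, 10.0, 0.1]
```

The fixed-fraction rule is why the sequence shrinks geometrically rather than by a constant amount, the “halving the half” behavior Mukherjee describes.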

pages: 892 words: 91,000

Valuation: Measuring and Managing the Value of Companies by Tim Koller, McKinsey, Company Inc., Marc Goedhart, David Wessels, Barbara Schwimmer, Franziska Manoury


activist fund / activist shareholder / activist investor, air freight, barriers to entry, Basel III, BRICs, business climate, business process, capital asset pricing model, capital controls, Chuck Templeton: OpenTable, cloud computing, commoditize, compound rate of return, conceptual framework, corporate governance, corporate social responsibility, creative destruction, credit crunch, Credit Default Swap, discounted cash flows, distributed generation, diversified portfolio, energy security, equity premium, fixed income, index fund, intangible asset, iterative process, Long Term Capital Management, market bubble, market friction, meta-analysis, Myron Scholes, negative equity, new economy, p-value, performance metric, Ponzi scheme, price anchoring, purchasing power parity, quantitative easing, risk/return, Robert Shiller, shareholder value, six sigma, sovereign wealth fund, speech recognition, survivorship bias, technology bubble, time value of money, too big to fail, transaction costs, transfer pricing, value at risk, yield curve, zero-coupon bond

ENTERPRISE DISCOUNTED CASH FLOW MODEL

EXHIBIT 8.4 UPS: Enterprise DCF Valuation

Forecast year       Free cash flow   Discount factor   Present value of FCF
                    ($ million)      @ 8.0%            ($ million)
2014                3,472            0.926             3,215
2015                4,108            0.857             3,522
2016                4,507            0.794             3,579
2017                4,892            0.735             3,596
2018                5,339            0.681             3,634
2019                5,748            0.630             3,623
2020                6,194            0.584             3,615
2021                6,678            0.541             3,609
2022                7,086            0.501             3,547
2023                7,523            0.463             3,486
Continuing value    168,231          0.463             77,967
Present value of cash flow                             113,395

Midyear adjustment factor                                1.039
Value of operations                                    117,840
Value of excess cash                                     4,136
Value of investments                                       148
Enterprise value                                       122,124
Less: Value of debt                                    (10,872)
Less: Value of after-tax unfunded retirement obligations (5,042)
Less: Value of capitalized operating leases             (5,841)
Less: Value of noncontrolling interest                     (14)
Equity value                                           100,355
Millions of shares outstanding (December 2013)             923
Equity value per share ($)                                 109

amount of noncontrolling interests.5 Divide the resulting equity value of $100.4 billion by the number of shares outstanding (923 million) to estimate a per-share value of $109. During the middle part of 2014, when we performed this valuation, UPS’s stock traded between $95 and $105 per share, well within a reasonable range of the DCF valuation (reasonable changes in forecast assumptions or WACC estimates can easily move a company’s value by up to 15 percent). Although this chapter presents the enterprise DCF valuation sequentially, valuation is an iterative process. To value operations, first reorganize the company’s financial statements to separate operating items from nonoperating items and capital structure. Then analyze the company’s historical performance; define and project free cash flow over the short, medium, and long run; and discount the projected free cash flows at the weighted average cost of capital.

5 A noncontrolling interest arises when an outside investor owns a minority share of a subsidiary. Since this outside investor has a partial claim on cash flows, the claim’s value must be deducted from enterprise value to compute equity value.
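The arithmetic behind Exhibit 8.4 can be reproduced from the figures in the excerpt. The 8.0 percent WACC, the cash flows, and the claims deducted below are all taken from the exhibit; small rounding differences against the printed present values are expected.

```python
# Rebuild the UPS enterprise DCF from the exhibit's inputs ($ millions).
fcf = [3472, 4108, 4507, 4892, 5339, 5748, 6194, 6678, 7086, 7523]
continuing_value = 168_231
wacc = 0.08

# Discount each year's free cash flow; the continuing value is discounted
# at the final year's factor.
pv = sum(cf / (1 + wacc) ** t for t, cf in enumerate(fcf, start=1))
pv += continuing_value / (1 + wacc) ** len(fcf)

value_of_operations = pv * (1 + wacc) ** 0.5           # midyear adjustment
enterprise_value = value_of_operations + 4_136 + 148   # + excess cash, investments
equity_value = enterprise_value - 10_872 - 5_042 - 5_841 - 14  # debt, retirement, leases, NCI
per_share = equity_value / 923                         # millions of shares outstanding

print(round(per_share))  # ≈ 109, matching the exhibit
```

Because every downstream number is a pure function of the forecast inputs, revising a forecast and rerunning the chain is exactly the back-and-forth iteration the authors describe.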

The Art of Computer Programming by Donald Ervin Knuth


Brownian motion, complexity theory, correlation coefficient, Donald Knuth, Eratosthenes, Georg Cantor, information retrieval, Isaac Newton, iterative process, John von Neumann, Louis Pasteur, mandelbrot fractal, Menlo Park, NP-complete, P = NP, Paul Erdős, probability theory / Blaise Pascal / Pierre de Fermat, RAND corporation, random walk, sorting algorithm, Turing machine, Y2K

(On the other hand the constant of proportionality is such that N must be really large before Algorithms L and T lose out to this “high-speed” method.) Historical note: J. N. Bramhall and M. A. Chapple published the first O(N^3) method for power series reversion in CACM 4 (1961), 317–318, 503. It was an offline algorithm essentially equivalent to the method of exercise 16, with running time approximately the same as that of Algorithms L and T. Iteration of series. If we want to study the behavior of an iterative process x_n ← f(x_{n−1}), we are interested in studying the n-fold composition of a given function f with itself, namely x_n = f(f(... f(x_0) ...)). Let us define f^[0](x) = x and f^[n](x) = f(f^[n−1](x)), so that

    f^[m+n](x) = f^[m](f^[n](x))    (18)

for all integers m, n ≥ 0. In many cases the notation f^[n](x) makes sense also when n is a negative integer, namely if f^[n] and f^[−n] are inverse functions such that x = f^[n](f^[−n](x)); if inverse functions are unique, (18) holds for all integers m and n.
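The bracketed superscript is just repeated application, which a short helper makes concrete; the doubling function below is an arbitrary illustration, not from the text.

```python
def iterate(f, n, x):
    """Compute f^[n](x): apply f to x, n times (n >= 0)."""
    for _ in range(n):
        x = f(x)
    return x

f = lambda x: 2 * x + 1   # any function will do

# The composition law f^[m+n](x) = f^[m](f^[n](x)) with m = 3, n = 4:
assert iterate(f, 3, iterate(f, 4, 5)) == iterate(f, 7, 5)
print(iterate(f, 7, 5))  # 767
```

Extending to negative n requires an inverse, as the text notes: here g(y) = (y − 1) / 2 undoes f, so iterate(g, n, ·) plays the role of f^[−n].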

pages: 1,797 words: 390,698

Power at Ground Zero: Politics, Money, and the Remaking of Lower Manhattan by Lynne B. Sagalyn


affirmative action, airport security, Bonfire of the Vanities, clean water, conceptual framework, corporate governance, deindustrialization, Donald Trump, Edward Glaeser, estate planning, Frank Gehry, Guggenheim Bilbao, high net worth, informal economy, intermodal, iterative process, Jane Jacobs, mortgage debt, New Urbanism, place-making, rent control, Rosa Parks, Rubik’s Cube, Silicon Valley, sovereign wealth fund, the built environment, the High Line, time value of money, too big to fail, Torches of Freedom, urban decay, urban planning, urban renewal, white flight, young professional

Two months later, the LMDC released both statements to the public and then launched “an aggressive public outreach campaign to solicit public input,” receiving some twenty-four hundred comments from public hearings, meetings with its advisory councils and Community Board 1, mailings to the families of victims and elected officials, and input from its official website, e-mail, and regular mail. The drafting committees convened again to review the public comments and make adjustments. During the process, Contini constantly went back to the Families Advisory Council to keep its members informed; there were always a few who didn’t agree, but most agreed with what was being formulated. The memorial, she said, “had to be about the individual and about the larger event.” The iterative process, Goldberger wrote, produced a final version “not nearly so genteel” as the initial attempt at a mission statement, which was “notable for its cautious, even hesitant language and sense of propriety.” The final version was “short, simpler, and blunter”:21 Remember and honor the thousands of innocent men, women, and children murdered by terrorists in the horrific attacks of February 26, 1993, and September 11, 2001.