semantic web

42 results

pages: 315 words: 70,044

Learning SPARQL by Bob DuCharme


database schema, Donald Knuth, G4S, linked data, semantic web, SPARQL, web application

Later chapters describe how to create more complex queries, how to modify data, how to build applications around your queries, and how it all fits into the semantic web, but if you can execute the queries shown in this chapter, you’re ready to put SPARQL to work for you. Chapter 2. The Semantic Web, RDF, and Linked Data (and SPARQL) SPARQL is a query language for data that follows a particular model, but the semantic web isn’t about the query language or about the model—it’s about the data. The booming amount of data becoming available on the semantic web is making great new kinds of applications possible, and as a well-implemented, mature standard designed with the semantic web in mind, SPARQL is the best way to get that data and put it to work in your applications. What Exactly Is the “Semantic Web”? As excitement over the semantic web grows, some vendors use the phrase to sell products with strong connections to the ideas behind the semantic web, and others use it to sell products with weaker connections.

, Querying the Data, More Realistic Data and Matching on Multiple Triples, URLs, URIs, IRIs, and Namespaces, Storing RDF in Databases, Data That Might Not Be There, Searching Further in the Data, Querying a Remote SPARQL Service, Creating New Data, Using Existing SPARQL Rules Vocabularies, Deleting and Replacing Triples in Named Graphs, Middleware SPARQL Support join (SPARQL equivalent), Searching Further in the Data normalization and, Creating New Data outer join (SPARQL equivalent), Data That Might Not Be There row ID values and, More Realistic Data and Matching on Multiple Triples, URLs, URIs, IRIs, and Namespaces SPARQL middleware and, Middleware SPARQL Support SPARQL rules and, Using Existing SPARQL Rules Vocabularies SQL habits, Querying the Data remote SPARQL service, querying, Querying a Remote SPARQL Service, Querying a Remote SPARQL Service Resource Description Format, The Data to Query (see RDF) round(), Numeric Functions S sample code, Using Code Examples schema, What Exactly Is the “Semantic Web”?, Glossary Schemarama, Using Existing SPARQL Rules Vocabularies screen scraping, What Exactly Is the “Semantic Web”?, Storing RDF in Files, Glossary searching for string, Searching for Strings SELECT, Querying the Data, Query Forms: SELECT, DESCRIBE, ASK, and CONSTRUCT semantic web, What Exactly Is the “Semantic Web”? 
semantics, What Exactly Is the “Semantic Web”?, Reusing and Creating Vocabularies: RDF Schema and OWL semicolon, Storing RDF in Files, More Readable Query Results, Converting Data, Named Graphs CONSTRUCT queries and, Converting Data in N3 and Turtle, Storing RDF in Files serialization, Storing RDF in Files, Glossary SERVICE, Querying a Remote SPARQL Service simple literal, Glossary SKOS, Making RDF More Readable with Language Tags and Labels, Datatypes and Queries, Checking, Adding, and Removing Spoken Language Tags creating, Checking, Adding, and Removing Spoken Language Tags custom datatypes and, Datatypes and Queries SKOS-XL, Changing Existing Data SNORQL, Querying a Public Data Source sorting data, Sorting Data space before SPARQL punctuation, The Data to Query SPARQL, Jumping Right In: Some Data and Some Queries, Jumping Right In: Some Data and Some Queries, The Data to Query, Querying the Data, Querying the Data, Querying the Data, Storing RDF in Databases, The SPARQL Specifications, The SPARQL Specifications, The SPARQL Specifications, Updating Data with SPARQL, Named Graphs, Glossary comments, The Data to Query engine, Querying the Data Graph Store HTTP Protocol specification, Named Graphs processor, Querying the Data protocol, Jumping Right In: Some Data and Some Queries, The SPARQL Specifications query language, The SPARQL Specifications SPARQL 1.1, Updating Data with SPARQL specifications, The SPARQL Specifications triplestores and, Storing RDF in Databases uppercase keywords, Querying the Data SPARQL endpoint, Querying a Public Data Source, SPARQL and Web Application Development, Triplestore SPARQL Support, Glossary creating your own, Triplestore SPARQL Support SPARQL processor, Glossary SPARQL protocol, Glossary SPARQL Query Results XML Format, The SPARQL Specifications, SPARQL Query Results XML Format, Standalone Processors as ARQ output, Standalone Processors SPARQL rules, Defining Rules with SPARQL, Defining Rules with SPARQL SPIN, Using Existing 
SPARQL Rules Vocabularies spreadsheets, Checking, Adding, and Removing Spoken Language Tags SQL, Querying the Data, Glossary square braces, Blank Nodes and Why They’re Useful, Using Existing SPARQL Rules Vocabularies str(), Node Type Conversion Functions STRDT(), Datatype Conversion STRENDS(), String Functions string datatype, Datatypes and Queries, Representing Strings striping, Storing RDF in Files, Glossary STRLANG(), Checking, Adding, and Removing Spoken Language Tags STRLEN(), String Functions STRSTARTS(), String Functions subject (of triple), The Data to Query, URLs, URIs, IRIs, and Namespaces, The Resource Description Format (RDF), Glossary namespaces and, URLs, URIs, IRIs, and Namespaces subqueries, Queries in Your Queries, Combining Values and Assigning Values to Variables, Federated Queries: Searching Multiple Datasets with One Query SUBSTR(), Creating New Data, String Functions subtraction, Comparing Values and Doing Arithmetic SUM(), Finding the Smallest, the Biggest, the Count, the Average...

By adding just a little bit of metadata (for example, the information about the ab:spouse, ab:patient, and ab:doctor properties above) to a small set of data (the information about Richard, Craig, and Cindy) we got more out of this dataset than we originally put into it. This is one of the great payoffs of semantic web technology. Tip The OWL 2 upgrade to the original OWL standard introduced several profiles, or subsets of OWL, that are specialized for certain kinds of applications. These profiles are easier to implement and use than attempting to take on all of OWL at once. If you’re thinking of doing some data modeling with OWL, look into OWL 2 RL, OWL 2 QL, and OWL 2 EL as possible starting points for your needs. Of all the W3C semantic web standards, OWL is the key one for putting the “semantic” in “semantic web.” The term “semantics” is sometimes defined as the meaning behind words, and those who doubt the value of semantic web technology like to question the viability of storing all the meaning of a word in a machine-readable way.
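The payoff described above, getting more out of a dataset than was put into it, can be sketched in a few lines of Python. This is a minimal illustration rather than the book's own example: it assumes that ab:spouse is declared symmetric and that ab:patient and ab:doctor are declared inverses of each other, and the two starting facts about Richard, Craig, and Cindy are made up, since the excerpt does not give them.

```python
# Hypothetical starting facts; the excerpt names the people and the
# properties but not the actual triples.
data = {
    ("Richard", "ab:spouse", "Cindy"),
    ("Richard", "ab:patient", "Craig"),
}

# Assumed metadata about the properties (not stated in the excerpt).
symmetric = {"ab:spouse"}
inverse_of = {"ab:patient": "ab:doctor", "ab:doctor": "ab:patient"}


def infer(triples, symmetric, inverse_of):
    """Return the triples plus those implied by the property metadata."""
    inferred = set(triples)
    for s, p, o in triples:
        if p in symmetric:
            inferred.add((o, p, s))              # the relation holds both ways
        if p in inverse_of:
            inferred.add((o, inverse_of[p], s))  # the same fact, seen from the other end
    return inferred


expanded = infer(data, symmetric, inverse_of)
new_triples = expanded - data
for triple in sorted(new_triples):
    print(triple)
```

Two stored facts plus three lines of metadata yield four facts. A real OWL reasoner handles far more than these two rule types.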

pages: 511 words: 111,423

Learning SPARQL by Bob DuCharme


Donald Knuth, G4S, hypertext link, linked data, place-making, semantic web, SPARQL, web application

Summary In this chapter, we learned: What SPARQL is The basics of RDF The meaning and role of URIs The parts of a simple SPARQL query How to execute a SPARQL query with ARQ How the same variable in multiple triple patterns can connect up the data in different triples What can lead to a query returning nothing What SPARQL endpoints are and how to query the most popular one, DBpedia Later chapters describe how to create more complex queries, how to modify data, how to build applications around your queries, the potential role of inferencing, and the technology’s roots in the semantic web world, but if you can execute the queries shown in this chapter, you’re ready to put SPARQL to work for you. Chapter 2. The Semantic Web, RDF, and Linked Data (and SPARQL) The SPARQL query language is for data that follows a particular model, but the semantic web isn’t about the query language or about the model—it’s about the data. The booming amount of data becoming available on the semantic web is making great new kinds of applications possible, and as a well-implemented, mature standard designed with the semantic web in mind, SPARQL is the best way to get that data and put it to work in your applications. Note The flexibility of the RDF data model means that it’s being used more and more with projects that have nothing to do with the “semantic web” other than their use of technology that uses these standards—that’s why you’ll often see references to “semantic web technology.”
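One item in the chapter summary above, that the same variable in multiple triple patterns connects data in different triples, can be sketched with a toy pattern matcher. This is not how ARQ or any real SPARQL processor is implemented. The `match` and `query` helpers, the `ab:` property names, and the sample data are all hypothetical, chosen only to show the idea.

```python
def match(pattern, triple, bindings):
    """Extend bindings so that pattern matches triple, or return None."""
    result = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the value it was bound to earlier.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def query(patterns, triples):
    """Return every set of variable bindings that satisfies all patterns."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [
            extended
            for bindings in solutions
            for triple in triples
            if (extended := match(pattern, triple, bindings)) is not None
        ]
    return solutions


# Hypothetical address-book data.
triples = [
    ("ab:richard", "ab:email", "richard@example.org"),
    ("ab:richard", "ab:firstName", "Richard"),
    ("ab:cindy", "ab:firstName", "Cindy"),
]

# ?person appears in both patterns, so it joins the two triples.
patterns = [
    ("?person", "ab:firstName", "Richard"),
    ("?person", "ab:email", "?email"),
]

results = query(patterns, triples)
print(results)
```

The same sketch also shows one way a query can return nothing: if any single pattern matches no triple (a misspelled property name, for instance), the list of solutions becomes empty.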

What Exactly Is the “Semantic Web”? As excitement over the semantic web grows, some vendors use the phrase to sell products with strong connections to the ideas behind the semantic web, and others use it to sell products with weaker connections. This can be confusing for people trying to understand the semantic web landscape. I like to define the semantic web as a set of standards and best practices for sharing data and the semantics of that data over the Web for use by applications. Let’s look at this definition one or two phrases at a time, and then we’ll look at these issues in more detail. A set of standards Before Tim Berners-Lee invented the World Wide Web, more powerful hypertext systems were available, but he built his around simple specifications that he published as public standards.

, Storing RDF in Databases, Querying a Remote SPARQL Service, Deleting and Replacing Triples in Named Graphs (see also SQL) join (SPARQL equivalent), Searching Further in the Data normalization and, Creating New Data outer join (SPARQL equivalent), Data That Might Not Be There row ID values and, More Realistic Data and Matching on Multiple Triples, URLs, URIs, IRIs, and Namespaces SPARQL middleware and, Middleware SPARQL Support SPARQL rules and, Using Existing SPARQL Rules Vocabularies remote SPARQL service, querying, Querying a Remote SPARQL Service–Querying a Remote SPARQL Service Resource Description Framework (see RDF) REST, SPARQL and HTTP restriction classes, SPARQL and OWL Inferencing round(), Numeric Functions Ruby, SPARQL and Web Application Development rules, SPARQL (see SPARQL rules) S sameTerm(), Node Type and Datatype Checking Functions sample code, Using Code Examples schema, What Exactly Is the “Semantic Web”?, Glossary querying, Querying Schemas Schemarama, Using Existing SPARQL Rules Vocabularies Schematron, Finding Bad Data screen scraping, What Exactly Is the “Semantic Web”?, Storing RDF in Files, Glossary search space, Reduce the Search Space searching for string, Searching for Strings SELECT, Querying the Data, Query Forms: SELECT, DESCRIBE, ASK, and CONSTRUCT semantic web, What Exactly Is the “Semantic Web”?, Glossary semantics, What Exactly Is the “Semantic Web”?, Reusing and Creating Vocabularies: RDF Schema and OWL semicolon, More Readable Query Results connecting operations with, Named Graphs CONSTRUCT queries and, Converting Data in N3 and Turtle, Storing RDF in Files serialization, Storing RDF in Files, Glossary SERVICE, Querying a Remote SPARQL Service Sesame triplestore, Querying Named Graphs, Datatypes and Queries inferencing with, Inferred Triples and Your Query repositories, SPARQL and HTTP simple literal, Glossary SKOS, Making RDF More Readable with Language Tags and Labels creating, Checking, Adding, and Removing Spoken Language 
Tags custom datatypes and, Datatypes and Queries SKOS-XL, Changing Existing Data SNORQL, Querying a Public Data Source sorting, Sorting Data query efficiency and, Efficiency Outside the WHERE Clause space before SPARQL punctuation, The Data to Query SPARQL, Jumping Right In: Some Data and Some Queries, Glossary comments, The Data to Query endpoint, Querying a Remote SPARQL Service engine, Querying the Data Graph Store HTTP Protocol specification, Named Graphs processor, Querying the Data protocol, Jumping Right In: Some Data and Some Queries, The SPARQL Specifications query language, The SPARQL Specifications SPARQL 1.1, Updating Data with SPARQL specifications, The SPARQL Specifications triplestores and, Storing RDF in Databases uppercase keywords, Querying the Data SPARQL algebra, SPARQL Algebra SPARQL endpoint, Querying a Public Data Source, Public Endpoints, Private Endpoints–Public Endpoints, Private Endpoints, Glossary creating your own, Triplestore SPARQL Support identifier, SPARQL and Web Application Development Linked Data Cloud and, Problem retrieving triples from, Problem SERVICE keyword and, Federated Queries: Searching Multiple Datasets with One Query SPARQL processor, SPARQL Processors–Public Endpoints, Private Endpoints, Glossary SPARQL protocol, Glossary SPARQL Query Results CSV and TSV Formats, SPARQL Query Results CSV and TSV Formats SPARQL Query Results JSON Format, SPARQL Query Results JSON Format SPARQL Query Results XML Format, The SPARQL Specifications, SPARQL Query Results XML Format as ARQ output, Standalone Processors SPARQL rules, Defining Rules with SPARQL–Defining Rules with SPARQL SPIN (SPARQL Inferencing Notation), Using Existing SPARQL Rules Vocabularies, Using SPARQL to Do Your Inferencing spreadsheets, Checking, Adding, and Removing Spoken Language Tags, Using CSV Query Results SQL, Querying the Data, Query Forms: SELECT, DESCRIBE, ASK, and CONSTRUCT, Middleware SPARQL Support, Glossary habits, Querying the Data square braces, 
Blank Nodes and Why They’re Useful, Using Existing SPARQL Rules Vocabularies str(), Node Type Conversion Functions CSV format and, SPARQL Query Results CSV and TSV Formats STRDT(), Datatype Conversion STRENDS(), String Functions string converting to URI, Problem datatype, Datatypes and Queries, Representing Strings functions, String Functions–String Functions searching for substrings, Problem striping, Storing RDF in Files, Glossary STRLANG(), Checking, Adding, and Removing Spoken Language Tags STRLEN(), String Functions STRSTARTS(), String Functions subject (of triple), The Data to Query, The Resource Description Framework (RDF), Glossary namespaces and, URLs, URIs, IRIs, and Namespaces subqueries, Queries in Your Queries, Combining Values and Assigning Values to Variables, Federated Queries: Searching Multiple Datasets with One Query SUBSTR(), Creating New Data, String Functions subtraction, Comparing Values and Doing Arithmetic SUM(), Finding the Smallest, the Biggest, the Count, the Average...

pages: 541 words: 109,698

Mining the Social Web: Finding Needles in the Social Haystack by Matthew A. Russell


Climategate, cloud computing, crowdsourcing, fault tolerance, Firefox, full text search, Georg Cantor, Google Earth, information retrieval, Mark Zuckerberg, natural language processing, NP-complete, profit motive, Saturday Night Live, semantic web, Silicon Valley, slashdot, social graph, social web, statistical model, Steve Jobs, supply-chain management, text mining, traveling salesman, Turing test, web application

Figure 9-9. A rotating tag cloud that’s highly customizable and requires very little effort to get up and running Chapter 10. The Semantic Web: A Cocktail Discussion While the previous chapters attempted to provide an overview of the social web and motivate you to get busy hacking on data, it seems appropriate to wrap up with a brief postscript on the semantic web. This short discussion makes no attempt to regurgitate the reams of interesting mailing list discussions, blog posts, and other sources of information that document the origin of the Web, how it has revolutionized just about everything in our lives in under two decades, and how the semantic web has always been a part of that vision. It does, however, aim to engage you in something akin to a cocktail discussion that, while glossing over a lot of the breadth and depth of these issues, hopefully excites you about the possibilities that lie ahead.

At present, there’s no real consensus about what Web 3.0 really means, but most discussions of the subject generally include the phrase “semantic web” and the notion of information being consumed and acted upon by machines in ways that are not yet possible at web scale. For example, it’s still very difficult for machines to extract and make inferences about the facts contained in documents available online. Keyword searching and heuristics can certainly provide listings of very relevant search results, but human intelligence is still required to interpret and synthesize the information in the documents themselves. Whether Web 3.0 and the semantic web are really the same thing is open for debate; however, it’s generally accepted that the term semantic web refers to a web that’s much like the one we already know and love, but that has evolved to the point where machines can extract and act on the information contained in documents at a granular level.

Various manifestations/eras of the Web and their virtues
Manifestation/era: Virtues
Internet: Application protocols such as SMTP, FTP, BitTorrent, HTTP, etc.
Web 1.0: Mostly static HTML pages and hyperlinks
Web 2.0: Platforms, collaboration, rich user experiences
Social web (Web 2.x ???): People and their virtual and real-world social connections and activities
Web 3.0 (the semantic web): Prolific amounts of machine-understandable content
* * *
[62] As defined in Programming the Semantic Web, by Toby Segaran, Jamie Taylor, and Colin Evans (O’Reilly).
[63] Inter-net literally implies “mutual or cooperating networks.”
Man Cannot Live on Facts Alone The semantic web’s fundamental construct for representing knowledge is called a triple, which is a highly intuitive and very natural way of expressing a fact. As an example, the sentence we’ve considered on many previous occasions—“Mr. Green killed Colonel Mustard in the study with the candlestick”—expressed as a triple might be something like (Mr.
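A triple can be sketched as a three-field record. The excerpt breaks off before giving its own version, so the field names and the way the example sentence is split into triples below are illustrative assumptions, not the book's notation.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One fact: a subject, a predicate, and an object."""
    subject: str
    predicate: str
    obj: str


# Hypothetical decomposition: one triple holds only one relationship,
# so the place and the weapon need triples of their own.
facts = [
    Triple("Mr. Green", "killed", "Colonel Mustard"),
    Triple("Colonel Mustard", "was killed in", "the study"),
    Triple("Colonel Mustard", "was killed with", "the candlestick"),
]


def about(subject, triples):
    """Return every triple whose subject matches."""
    return [t for t in triples if t.subject == subject]


for fact in about("Colonel Mustard", facts):
    print(fact.subject, fact.predicate, fact.obj)
```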

pages: 377 words: 110,427

The Boy Who Could Change the World: The Writings of Aaron Swartz by Aaron Swartz, Lawrence Lessig


affirmative action, Alfred Russel Wallace, American Legislative Exchange Council, Benjamin Mako Hill, bitcoin, Bonfire of the Vanities, Brewster Kahle, Cass Sunstein, deliberate practice, Donald Knuth, Donald Trump, failed state, fear of failure, Firefox, full employment, Howard Zinn, index card, invisible hand, John Gruber, Lean Startup, More Guns, Less Crime, peer-to-peer, post scarcity, Richard Feynman, Richard Stallman, Ronald Reagan, school vouchers, semantic web, single-payer health, SpamAssassin, SPARQL, telemarketer, The Bell Curve by Richard Herrnstein and Charles Murray, the scientific method, Toyota Production System, unbiased observer, wage slave, Washington Consensus, web application, WikiLeaks, working poor, zero-sum game

All of which has led “web engineers” (as this series’ title so cutely calls them) to tune out and go back to doing real work, not wanting to waste their time with things that don’t exist and, in all likelihood, never will. And it’s led many who have been working on the Semantic Web, in the vain hope of actually building a world where software can communicate, to burn out and tune out and find more productive avenues for their attentions. For an example, look at Sean B. Palmer. In his influential piece, “Ditching the Semantic Web?,” he proclaims “It’s not prudent, perhaps even not moral (if that doesn’t sound too melodramatic), to work on RDF, OWL, SPARQL, RIF, the broken ideas of distributed trust, CWM, Tabulator, Dublin Core, FOAF, SIOC, and any of these kinds of things” and says not only will he “stop working on the Semantic Web” but “I will, moreover, actively dissuade anyone from working on the Semantic Web where it distracts them from working on” more practical projects. It would be only fair here to point out that I am not exactly an unbiased observer.

The Techniques of Mass Collaboration: A Third Way Out July 19, 2006 Age 19 I’m not the first to suggest that the Internet could be used for bringing users together to build grand databases. The most famous example is the Semantic Web project (where, in full disclosure, I worked for several years). The project, spearheaded by Tim Berners-Lee, inventor of the web, proposed to extend the working model of the web to more structured data, so that instead of simply publishing text web pages, users could publish their own databases, which could be aggregated by search engines like Google into major resources. The Semantic Web project has received an enormous amount of criticism, much (in my view) rooted in misunderstandings, but much legitimate as well. In the news today is just the most recent example, in which famed computer scientist turned Google executive Peter Norvig challenged Tim Berners-Lee on the subject at a conference.

But Wikipedia points to a different model, where all the users come to one website, where the interface for inputting data in the proper format is clear and unambiguous, and the users can work together to resolve any conflicts that may come up. Indeed, this method strikes me as so superior that I’m surprised I don’t see it discussed in this context more often. Ignorance doesn’t seem plausible; even if Wikipedia was a latecomer, sites like ChefMoz and MusicBrainz followed this model and were Semantic Web case studies. (Full disclosure: I worked on the Semantic Web portions of MusicBrainz.) Perhaps the reason is simply that both sides—W3C and Google—have the existing web as the foundation for their work, so it’s not surprising that they assume future work will follow from the same basic model. One possible criticism of the million-dollar-users proposal is that it’s somehow less free than the individualist approach. One site will end up being in charge of all the data and thus will be able to control its formation.

pages: 223 words: 52,808

Intertwingled: The Work and Influence of Ted Nelson (History of Computing) by Douglas R. Dechow


3D printing, Apple II, Bill Duvall, Brewster Kahle, Buckminster Fuller, Claude Shannon: information theory, cognitive dissonance, computer age, conceptual framework, Douglas Engelbart, Dynabook, Edward Snowden, game design, HyperCard, hypertext link, information retrieval, Internet Archive, Jaron Lanier, knowledge worker, linked data, Marc Andreessen, Marshall McLuhan, Menlo Park, Mother of all demos, pre–internet, RAND corporation, semantic web, Silicon Valley, software studies, Steve Jobs, Steve Wozniak, Stewart Brand, Ted Nelson, the medium is the message, Vannevar Bush, Wall-E, Whole Earth Catalog

Microcosm was an open hypermedia system in that all the links were stored in a database as first-class entities that could be reasoned about and applied to any document. Each link was a triple that consisted of a source, a destination and a description. Little did I know at the time how prescient of the Semantic Web these ideas would be. Of course, there are problems with automatically making a link on a word without knowing its precise semantic meaning. There are a lot of different people with the name Mountbatten in the Mountbatten archive for example. So working out the context in which the link was being applied and therefore the meaning of the word became a key focus of our work: problems we are still dealing with as the Semantic Web develops today. We did also have specific links in Microcosm that were more like standard hypertext links because they were embedded in the documents and represented to the user through highlighted buttons, and you could trace them backwards through the link database or linkbase as we called it.
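The linkbase idea can be sketched as follows. This is a loose illustration, not Microcosm's actual design or data format: the `Link` and `Linkbase` names, their methods, and the sample entries are all hypothetical. It shows only that storing links outside the documents, as source/destination/description triples, lets them be followed in either direction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """A link held as data: where it starts, where it goes, what it means."""
    source: str
    destination: str
    description: str


class Linkbase:
    """Links stored apart from the documents they connect."""

    def __init__(self):
        self._links = []

    def add(self, link):
        self._links.append(link)

    def links_from(self, source):
        """Follow links forwards from a source."""
        return [l for l in self._links if l.source == source]

    def links_to(self, destination):
        """Trace links backwards to find what points at a destination."""
        return [l for l in self._links if l.destination == destination]


# Hypothetical entries for illustration only.
linkbase = Linkbase()
linkbase.add(Link("Mountbatten", "doc/biography", "person named in the archive"))
linkbase.add(Link("doc/letter-17", "doc/biography", "mentions"))

print([l.source for l in linkbase.links_to("doc/biography")])
```

A link whose source is a bare word such as "Mountbatten" applies wherever that word appears, which is exactly where the ambiguity described in the passage comes from.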

Three things happened at that conference as I recall. Tim started talking about the Semantic Web again in his keynote for the conference. He had talked about it at the first WWW conference in 1994 [1] and the idea of making links on data in the information management proposal he wrote in 1989. As far as he was concerned in 1998, the web of linked documents was beginning to emerge but his vision wasn’t complete until it was also a web of linked data, and so he started to re-educate the community about this at the Brisbane conference. Ted was also at the Brisbane conference to pick up a special award. I remember him demoing ZigZag to us in the bar one night at that conference. He was so excited, and we were all mesmerized. So I had heard Tim talk about the Semantic Web and I saw Ted demo ZigZag at the same conference, and I didn’t fully appreciate either of them at the time.

I understood the principles, but I didn’t understand the detail. It’s taken me a long time to appreciate both the Semantic Web and ZigZag, but as my understanding of both of them has increased I now firmly believe what I suspected all along: there is a one-to-one correspondence between the two ideas, and that you can implement ZigZag in the RDF graph. Someday I’ll find the time to prove that. I need to get Ted involved in making that happen. I really believe that these two amazing people—Tim and Ted—have the same idea of how you can make links on data to create an incredibly rich hyper-structure for generating knowledge. Tim will never talk about it like that. His idea with the Semantic Web is that machines can, if you describe the data using a vocabulary like an ontology, make inferences about the information contained in the data that couldn’t be made in any other way.

pages: 314 words: 94,600

Business Metadata: Capturing Enterprise Knowledge by William H. Inmon, Bonnie K. O'Neil, Lowell Fryman


affirmative action, bioinformatics, business intelligence, business process, call centre, carbon-based life, continuous integration, corporate governance, create, read, update, delete, database schema, informal economy, knowledge economy, knowledge worker, semantic web, The Wisdom of Crowds, web application

After a brief survey of semantics and semantic technology, we will cover the relationship of semantics and business metadata. 11.2 The Vision of the Semantic Web Tim Berners-Lee envisioned the idea of the “semantic web,” wherein intelligent agents would be truly intelligent. In his vision the computer would know exactly what “booking a restaurant reservation” meant, as well as all the underlying tasks associated with it. For example, you could ask the computer to book a reservation at an Indian restaurant on the way home from work, and the computer would find an Indian restaurant located directly on your way home, book a reservation for you, and put it automatically on your calendar, all without human intervention. In the context of searching for documents, a semantic web would be able to understand what the documents contained.

Today, we rely mostly on document titles and tagging. Tagging is usually done manually either by the document author, someone else charged with tagging after the fact, or through a folksonomy like But a true semantic web could decipher document contents on its own. On a smaller scale, the semantic web means distinguishing between word senses: when there are two or more senses of a word, the user is asked, “Did you mean…?” For example, we have used the word “mole” throughout the book to illustrate word sense. Google can now distinguish between spelling variations and probable errors. However, if Google were semantically enabled, it would be able to distinguish between the different word senses of mole, and Google would either ask the user which sense he or she wanted or, better, would display results based on each sense.

Business Metadata Praise for Business Metadata “Despite the presence of some excellent books on what is essentially “technical” metadata, up until now there has been a dearth of well-presented material to help address the growing need for interaction at the conceptual and semantic levels between data professionals and the business clients they support. In Business Metadata, Bill, Bonnie, and Lowell provide the means for bridging the gap between the sometimes “fuzzy” human perception of data that fuels business processes and the rigid information management models used by business applications. Look to the future: next generation business intelligence, enterprise content management and search, the semantic web all will depend on business metadata. Read this book!” —David Loshin, President, Knowledge Integrity Incorporated These authors have written a book that ventures into new territory for data and information management. There are several books about metadata, but this is the first to offer in-depth discussion of the important topic of business metadata. Business metadata is really about understanding the business – something that IT people have struggled with since the dawn of information technology.

The Data Revolution: Big Data, Open Data, Data Infrastructures and Their Consequences by Rob Kitchin


Bayesian statistics, business intelligence, business process, cellular automata, Celtic Tiger, cloud computing, collateralized debt obligation, conceptual framework, congestion charging, corporate governance, correlation does not imply causation, crowdsourcing, discrete time, George Gilder, Google Earth, Infrastructure as a Service, Internet Archive, Internet of things, invisible hand, knowledge economy, late capitalism, lifelogging, linked data, Masdar, means of production, Nate Silver, natural language processing, openstreetmap, pattern recognition, platform as a service, recommendation engine, RFID, semantic web, sentiment analysis, slashdot, smart cities, Smart Cities: Big Data, Civic Hackers, and the Quest for a New Utopia, smart grid, smart meter, software as a service, statistical model, supply-chain management, the scientific method, The Signal and the Noise by Nate Silver, transaction costs

McClean, T. (2011) Not with a Bang but a Whimper: the Politics of Accountability and Open Data in the UK. Paper prepared for the American Political Science Association Annual Meeting. Seattle, Washington, 1–4 September 2011. (last accessed 19 August 2013). McCreary, D. (2009) ‘Entity extraction and the semantic web’, Semantic Web, 12 January, (last accessed 19 July 2013). McKeon, S.G. (2013) ‘Hacking the hackathon’,, 10 October, (last accessed 21 October 2013). McNay, L. (1994) Foucault: A Critical Introduction. Polity Press, Oxford. Miller, H.J. (2010) ‘The data avalanche is here. Shouldn’t we be digging?’, Journal of Regional Science, 50(1): 181–201.

Since the late 2000s the movement has noticeably gained prominence and traction, initially with the Guardian newspaper’s campaign in the UK to ‘Free Our Data’ (, the Organization for Economic Cooperation and Development (OECD)’s call for member governments to open up their data in 2008, the launch in 2009 by the US government of, a website designed to provide access to non-sensitive and historical datasets held by US state and federal agencies, and the development of linked data and the promotion of the ‘Semantic Web’ as a standard element of future Internet technologies, in which open and linked data are often discursively conjoined (Berners-Lee 2009). Since 2010 dozens of countries and international organisations (e.g., the European Union [EU] and the United Nations Development Programme [UNDP]) have followed suit, making thousands of previously restricted datasets open in nature for non-commercial and commercial use (see DataRemixed 2013).

Given that by their nature open data generate no or little income to fund such service arrangements, nor indeed the costs of opening data, while it is easy to agree that open data should be delivered as a service, in practice it might be an aspiration unless effective funding models are developed (as discussed more fully below). Linked Data The idea of linked data is to transform the Internet from a ‘web of documents’ to a ‘web of data’ through the creation of a semantic web (Berners-Lee 2009; P. Miller, 2010), or what Goddard and Byrne (2010) term a ‘machine-readable web’. Such a vision recognises that all of the information shared on the Web contains a rich diversity of data – names, addresses, product details, facts, figures, and so on. However, these data are not necessarily formally identified as such, nor are they formally structured in such a way as to be easily harvested and used.

pages: 287 words: 86,919

Protocol: how control exists after decentralization by Alexander R. Galloway


Ada Lovelace, airport security, Berlin Wall, bioinformatics, Bretton Woods, computer age, Craig Reynolds: boids flock, discovery of DNA, Donald Davies, double helix, Douglas Engelbart, easy for humans, difficult for computers, Fall of the Berlin Wall, Grace Hopper, Hacker Ethic, informal economy, John Conway, John Markoff, Kevin Kelly, late capitalism, linear programming, Marshall McLuhan, means of production, Menlo Park, moral panic, mutually assured destruction, Norbert Wiener, old-boy network, packet switching, phenotype, post-industrial society, profit motive, QWERTY keyboard, RAND corporation, Ray Kurzweil, RFC: Request For Comment, Richard Stallman, semantic web, SETI@home, stem cell, Steve Crocker, Steven Levy, Stewart Brand, Ted Nelson, telerobotics, the market place, theory of mind, urban planning, Vannevar Bush, Whole Earth Review, working poor

By making the descriptive protocols more complex, one is able to say more complex things about information, namely, that Galloway is my surname, and my given name is Alexander, and so on. The Semantic Web is simply the process of adding extra metalayers on top of information so that it can be parsed according to its semantic value. Why is this significant? Before this, protocol had very little to do with meaningful information. Protocol does not interface with content, with semantic value. It is, as I have said, against interpretation. But with Berners-Lee comes a new strain of protocol: protocol that cares about meaning. This is what he means by a Semantic Web. It is, as he says, “machine-understandable information.” Does the Semantic Web, then, contradict my earlier principle that protocol is against interpretation? I’m not so sure. Protocols can certainly say things about their contents.

In many ways the core protocols of the Internet had their development heyday in the 1980s. But Web protocols are experiencing explosive growth today. Current growth is due to an evolution of the concept of the Web into what Berners-Lee calls the Semantic Web. In the Semantic Web, information is not simply interconnected on the Internet using links and graphical markup (what he calls “a space in which information could permanently exist and be referred to”41) but it is enriched using descriptive protocols that say what the information actually is. For example, the word “Galloway” is meaningless to a machine. It is just a piece of information that says nothing about what it is or what it means.
38. Berners-Lee, Weaving the Web, p. 36.
39. Berners-Lee, Weaving the Web, p. 71.
40. Berners-Lee, Weaving the Web, pp. 92, 94.

But do they actually know the meaning of their contents? So it is a matter of debate as to whether descriptive protocols actually add intelligence to information, or whether they are simply subjective descriptions (originally written by a human) that computers mimic but understand little about. Berners-Lee himself stresses that the Semantic Web is not an artificial intelligence machine.42 He calls it “well-defined” data, not interpreted data, and in reality those are two very different things. I promised in the introduction to skip all epistemological questions, and so I leave this one to be debated by others. As this survey of protocological institutionalization shows, the primary source materials for any protocological analysis of Internet standards are the RFC memos.
41. Berners-Lee, Weaving the Web, p. 18.
42. Tim Berners-Lee, “What the Semantic Web Can Represent,” available online at http://

Data Mining the Web: Uncovering Patterns in Web Content, Structure, and Usage by Zdravko Markov, Daniel T. Larose


Firefox, information retrieval, Internet Archive, iterative process, natural language processing, pattern recognition, random walk, recommendation engine, semantic web, speech recognition, statistical model, William of Occam

There are also approaches to do this automatically by applying machine learning methods for classification and clustering. We look into these approaches in Part II. Semantic Web Semantic web is a recent initiative led by the web consortium. Its main objective is to bring formal knowledge representation techniques into the Web. Currently, web pages are designed basically for human readers. It is widely acknowledged that the Web is like a “fancy fax machine” used to send good-looking documents worldwide. The problem here is that the nice format of web pages is very difficult for computers to understand, something that we expect search engines to do. The main idea behind the semantic web is to add formal descriptive material to each web page that although invisible to people would make its content easily understandable by computers.

Thus, the Web would be organized and turned into the largest knowledge base in the world, which with the help of advanced reasoning techniques developed in the area of artificial intelligence would be able not just to provide ranked documents that match a keyword search query, but would also be able to answer questions and give explanations. The web consortium site provides detailed information about the latest developments in the area of the semantic web. Although the semantic web is probably the future of the Web, our focus is on the former two approaches to bring semantics to the Web. The reason for this is that web search is the data mining approach to web semantics: extracting knowledge from web data. In contrast, the semantic web approach is about turning web pages into formal knowledge structures and extending the functionality of web browsers with knowledge manipulation and reasoning tools. CRAWLING THE WEB In this and later sections we use basic web terminology such as HTML, URL, web browsers, and servers.

Larose, Daniel T. II. Title. QA76.9.D343M38 2007 005.74 – dc22 2006025099 Printed in the United States of America
For my children Teodora, Kalin, and Svetoslav – Z.M.
For my children Chantal, Ellyriane, Tristan, and Ravel – D.T.L.
CONTENTS
PREFACE
PART I WEB STRUCTURE MINING
1 INFORMATION RETRIEVAL AND WEB SEARCH: Web Challenges; Web Search Engines; Topic Directories; Semantic Web; Crawling the Web; Web Basics; Web Crawlers; Indexing and Keyword Search; Document Representation; Implementation Considerations; Relevance Ranking; Advanced Text Search; Using the HTML Structure in Keyword Search; Evaluating Search Quality; Similarity Search; Cosine Similarity; Jaccard Similarity; Document Resemblance; References; Exercises
2 HYPERLINK-BASED RANKING: Introduction; Social Networks Analysis; PageRank; Authorities and Hubs; Link-Based Similarity Search; Enhanced Techniques for Page Ranking; References; Exercises
PART II WEB CONTENT MINING
3 CLUSTERING: Introduction; Hierarchical Agglomerative Clustering; k-Means Clustering; Probability-Based Clustering; Finite Mixture Problem; Classification Problem; Clustering Problem; Collaborative Filtering (Recommender Systems); References; Exercises
4 EVALUATING CLUSTERING: Approaches to Evaluating Clustering; Similarity-Based Criterion Functions; Probabilistic Criterion Functions; MDL-Based Model and Feature Evaluation; Minimum Description Length Principle; MDL-Based Model Evaluation; Feature Selection; Classes-to-Clusters Evaluation; Precision, Recall, and F-Measure; Entropy; References; Exercises
5 CLASSIFICATION: General Setting and Evaluation Techniques; Nearest-Neighbor Algorithm; Feature Selection; Naive Bayes Algorithm; Numerical Approaches; Relational Learning; References; Exercises
PART III WEB USAGE MINING
6 INTRODUCTION TO WEB USAGE MINING: Definition of Web Usage Mining; Cross-Industry Standard Process for Data Mining; Clickstream Analysis; Web Server Log Files; Remote Host Field; Date/Time Field; HTTP Request Field; Status Code Field; Transfer Volume (Bytes) Field; Common Log Format; Identification Field; Authuser Field; Extended Common Log Format; Referrer Field; User Agent Field; Example of a Web Log Record; Microsoft IIS Log Format; Auxiliary Information; References; Exercises
7 PREPROCESSING FOR WEB USAGE MINING: Need for Preprocessing the Data; Data Cleaning and Filtering; Page Extension Exploration and Filtering; De-Spidering the Web Log File; User Identification; Session Identification; Path Completion; Directories and the Basket Transformation; Further Data Preprocessing Steps; References; Exercises
8 EXPLORATORY DATA ANALYSIS FOR WEB USAGE MINING: Introduction; Number of Visit Actions; Session Duration; Relationship between Visit Actions and Session Duration; Average Time per Page; Duration for Individual Pages; References; Exercises
9 MODELING FOR WEB USAGE MINING: CLUSTERING, ASSOCIATION, AND CLASSIFICATION: Introduction; Modeling Methodology; Definition of Clustering; The BIRCH Clustering Algorithm; Affinity Analysis and the A Priori Algorithm; Discretizing the Numerical Variables: Binning; Applying the A Priori Algorithm to the CCSU Web Log Data; Classification and Regression Trees; The C4.5 Algorithm; References; Exercises
INDEX
PREFACE
DEFINING DATA MINING THE WEB
By data mining the Web, we refer to the application of data mining methodologies, techniques, and models to the variety of data forms, structures, and usage patterns that comprise the World Wide Web.

pages: 369 words: 80,355

Too Big to Know: Rethinking Knowledge Now That the Facts Aren't the Facts, Experts Are Everywhere, and the Smartest Person in the Room Is the Room by David Weinberger


airport security, Alfred Russel Wallace, Amazon Mechanical Turk, Berlin Wall, Black Swan, book scanning, Cass Sunstein, commoditize, corporate social responsibility, crowdsourcing, Danny Hillis, David Brooks, Debian, double entry bookkeeping, double helix, Exxon Valdez, Fall of the Berlin Wall, future of journalism, Galaxy Zoo, Hacker Ethic, Haight Ashbury, hive mind, Howard Rheingold, invention of the telegraph, jimmy wales, John Harrison: Longitude, Kevin Kelly, linked data, Netflix Prize, New Journalism, Nicholas Carr, Norbert Wiener, openstreetmap, P = NP, Pluto: dwarf planet, profit motive, Ralph Waldo Emerson, RAND corporation, Ray Kurzweil, Republic of Letters, RFID, Richard Feynman, Ronald Reagan, semantic web, slashdot, social graph, Steven Pinker, Stewart Brand, technological singularity, Ted Nelson, the scientific method, The Wisdom of Crowds, Thomas Kuhn: the structure of scientific revolutions, Thomas Malthus, Whole Earth Catalog, X Prize

But because each of these has a different way of identifying the book, there’s no easy way to write a program that will reliably pull all that information together. If each of these sites followed the conventions specified by the Semantic Web—initiated by Sir Tim Berners-Lee, the inventor of the World Wide Web, around the turn of the millennium—computer programs could far more easily know that these sites were referring to the same book. In fact, the Semantic Web would make it possible to share far more complex information from across multiple sites. Agreeing on how to encode metadata makes the Net capable of expressing more knowledge than was put into it. That is the very definition of a smart network. But creating that metadata can be difficult, especially since many Semantic Web adherents originally proceeded by trying to write large, complex, logical representations of domains of the world. Writing these ontologies, as they are called, can be difficult.

If you’re just trying to write a model of, say, knitting, it might not be too complex; you’d have to represent all the objects (needles, yarns, patterns, knitters, knit goods, etc.) and all their relationships (knit goods have knitters, knitters use needles, needles have sizes, etc.). But writing an ontology of financial markets would require agreeing on exactly what the required definitional elements of a “trade,” “bond,” “regulation,” and “report” are—as well as on every detail and every connection with other domains, such as law, economics, and politics. So, some supporters of the Semantic Web (including Tim Berners-Lee 8) decided that there would be faster and enormous benefits to making data accessible in standardized but imperfect form—as what is called “Linked Data”—without waiting for agreement about overarching ontologies. So, if you have a store of information about, say, chemical elements, you can make it available on the Web as a series of basic assertions that are called “triples” because they have the form of two objects joined by a relation: “Mercury is an element.”
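As a rough illustration of the “triples” idea, here is a minimal Python sketch that stores assertions as plain (subject, relation, object) tuples and merges two independently published sets. Only “Mercury is an element” comes from the text; the names `site_a`, `site_b`, and `facts_about`, and the atomic-number assertion, are assumptions added for illustration, and this is not any particular Linked Data library.

```python
# A minimal sketch: triples as plain tuples (subject, relation, object).
# Only "Mercury is an element" comes from the text; site_a, site_b,
# facts_about, and the atomic-number triple are illustrative assumptions.

site_a = {
    ("Mercury", "is a", "element"),
}

site_b = {
    ("Mercury", "is a", "element"),  # same assertion, published elsewhere
    ("Mercury", "has atomic number", "80"),
}

# Because both sites use the same simple form, merging is a set union;
# the duplicated assertion collapses into a single triple.
merged = site_a | site_b


def facts_about(subject, triples):
    """Return the (relation, object) pairs asserted about a subject, sorted."""
    return sorted((rel, obj) for subj, rel, obj in triples if subj == subject)


print(len(merged))
print(facts_about("Mercury", merged))
```

The point of the sketch is only that data in a shared, imperfect form can be combined without first agreeing on an overarching ontology.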

Now any application that wants to understand your triple knows that the relationship is the one defined over on the Dublin Core site. This approach may be messy and imperfect, but it is 100 percent better than not releasing data because you haven’t figured out how to get the metadata perfectly right. The rise of Linked Data encapsulates the transformation of knowledge we have explored throughout this book. While the original Semantic Web emphasized building ontologies that are “knowledge representations” of the world, it turns out that if we go straight to unleashing an abundance of linked but imperfect data, making it widely and openly available in standardized form, the Net becomes a dramatically improved infrastructure for knowledge. Linked Data is nevertheless itself only an example of a more expansive practice: Create metadata so your information can be reused.

pages: 397 words: 102,910

The Idealist: Aaron Swartz and the Rise of Free Culture on the Internet by Justin Peters


4chan, Any sufficiently advanced technology is indistinguishable from magic, Bayesian statistics, Brewster Kahle, buy low sell high, corporate governance, crowdsourcing, disintermediation, don't be evil, global village, Hacker Ethic, hypertext link, index card, informal economy, information retrieval, Internet Archive, invention of movable type, invention of writing, Isaac Newton, John Markoff, Lean Startup, moral panic, Paul Buchheit, Paul Graham, profit motive, RAND corporation, Republic of Letters, Richard Stallman, selection bias, semantic web, Silicon Valley, social web, Steve Jobs, Steven Levy, Stewart Brand, strikebreaker, Vannevar Bush, Whole Earth Catalog, Y Combinator

. ,” Schoolyard Subversion, August 16, 2000,
41 Aaron Swartz, “The Weight of School,” Schoolyard Subversion, October 8, 2000.
42 Aaron Swartz, “Welcome to Unschooling,” Schoolyard Subversion, April 5, 2001.
43 Robert Swartz, interview.
44 Tim Berners-Lee, James Hendler, and Ora Lassila, “The Semantic Web,” Scientific American, May 17, 2001.
45 Aaron Swartz, “I think there is a,” Aaron Swartz: The Weblog, January 14, 2002.
46 Felter, interview.
47 Wilcox-O’Hearn, “Part 1.”
48 Eldred, “Battle of the Books.”
49 Interview with Lisa Rein, January 2013.
50 Interview with Ben Adida, January 2013.
51 Rein, interview.
52 Aaron Swartz, “Emerging Technologies—Day 2,” Aaron Swartz: The Weblog, May 15, 2002.
53 Aaron Swartz, “May 13, 2002: Visiting Google,” Google Weblog, May 13, 2002.
54 Felter, interview.
55 Aaron Swartz, “Emerging Technologies—Day 3,” Aaron Swartz: The Weblog, May 16, 2002.
56 Ibid.
57 Eric Eldred to Book People mailing list, October 19, 1998.

,” a developer named Gabe Beged-Dov wrote to an online mailing list on July 3, 2000.38 Swartz responded: “I generally try not to mention my age, because I find that unfortunately some people immediately discredit me because of it. :-(, Thanks to everyone who is able to put aside their prejudices not only in age, but in all matters, so that work on standards like these can go ahead and we can build the Web of the future. I don’t know about all of you, but I get very excited when I think about the possibilities for the Semantic Web. The sooner we get standards, the better. It’s not hard—even an 8th grader can do it! :-) So let’s get moving.”39 Swartz attended a private school, North Shore Country Day, in Winnetka, Illinois, and he chafed at its rules and customs. After-school sports were mandatory, much to his dismay. (“I narrowly escaped another day of practice due to an awful migraine headache. I don’t know which is worse: the headache or practice,” he blogged in August 2000.)40 Students were burdened with too much homework and too many course requirements.

The W3C members were among the first to understand the potential value and power of metadata as a solution to the search-and-retrieval problems that plagued the Web. Just as a supermarket checkout machine scans a bar code to determine exactly what you’re buying and how much it costs, a computer reads a website’s metadata to acquire salient information about that site and the content therein. In a 2001 Scientific American article, Berners-Lee, James Hendler, and Ora Lassila made the case for a metadata-rich “Semantic Web” as one “in which information is given well-defined meaning, better enabling computers and people to work in cooperation. . . . In the near future, these developments will usher in significant new functionality as machines become much better able to process and ‘understand’ the data that they merely display at present.”44 The idea sounded great to Swartz. “The future will be made of thousands of small pieces—computers, protocols, programming languages and people—all working together,” he wrote in January 2002.

pages: 291 words: 77,596

Total Recall: How the E-Memory Revolution Will Change Everything by C. Gordon Bell, Jim Gemmell


airport security, Albert Einstein, book scanning, cloud computing, conceptual framework, Douglas Engelbart, full text search, information retrieval, invention of writing, inventory management, Isaac Newton, John Markoff, lifelogging, Menlo Park, optical character recognition, pattern recognition, performance metric, RAND corporation, RFID, semantic web, Silicon Valley, Skype, social web, statistical model, Stephen Hawking, Steve Ballmer, Ted Nelson, telepresence, Turing test, Vannevar Bush, web application

Bell, Gordon, and Jim Gemmell. 2002. “A Call for the Home Media Network.” Communications of the ACM 45, no. 7 (July): 71-75. Association for Computing Machinery, Inc. Montalbano, Elizabeth. 2008. “IBM Pledges $1 Billion to Unified Communications.” PC World (March 11). O’Reilly, Paul. 2009. “Managing Unified Communications Performance.” CRN (March 9). Semantic Web: Berners-Lee, T., and J. Hendler. 2001. “Scientific Publishing on the Semantic Web.” Nature (26 April). W3C Semantic Web Frequently Asked Questions. British Library Digital Lives Project and conference: Digital Lives Research Project Web page. First Digital Lives Research Conference: Personal Digital Archives for the 21st Century. British Library, St. Pancras, London, February 9-11, 2009. Randy Hahn helped us craft the story about him.

Translation software is required to preserve the correct meaning between systems. As anyone who has translated between languages knows, a word-for-word translation is inadequate; it gives us translations that turn “The spirit is willing but the flesh is weak” to “The alcohol is good but the meat is bad.” Likewise, it can be difficult to translate between storage formats, and a lot of work is yet to go into this effort. The Semantic Web, which aims to standardize transmission and translation of information, is an important effort in this area. There will also be a unification of networking in the sense that we will cease to have distinct networks for different types of data. Already we get telephone over our cable TV network and TV shows over our telephone’s DSL. Eventually, we will get a digital dial tone that carries anything and everything.

scanners and digitizing books and file formats and implementation of Total Recall and memex and organization of data and origin of MyLifeBits pen scanners scanning services Schacter, Daniel scholarship science fiction scientific method scrapbooking screensavers Scripps Genomic Health Initiative searching data and associative memory and data analysis desktop search and e-books and implementation of Total Recall and lifelogging Second Life security of data and adaptation to lifelogging and education and encryption and ownership of health records and passwords and privacy self-awareness semantic memory Semantic Web SenseCam and CARPE and diet monitoring and memory aids origin of and summarization of data and travelogues sensory technology. See also biometric sensors The Seven Sins of Memory: How the Mind Forgets and Remembers (Schacter) sexual molestation memories sharing data sheet music shopping lists The Simpsons situational awareness Sixth Sense system Sky Server sleep data SmartDraw smartphones.

pages: 201 words: 63,192

Graph Databases by Ian Robinson, Jim Webber, Emil Eifrem


Amazon Web Services, anti-pattern, bioinformatics, commoditize, corporate governance, create, read, update, delete, data acquisition, fault tolerance, linked data, loose coupling, Network effects, recommendation engine, semantic web, sentiment analysis, social graph, software as a service, SPARQL, web application

However, in situations where you need to capture meta-intent, effectively qualifying one relationship with another (e.g., I like the fact that you liked that car), hypergraphs typically require fewer primitives than property graphs. Triples Triple stores come from the Semantic Web movement, where researchers are interested in large-scale knowledge inference by adding semantic markup to the links that connect Web resources. To date, very little of the Web has been marked up in a useful fashion, so running queries across the semantic layer is uncommon. Instead, most effort in the semantic Web appears to be invested in harvesting useful data and relationship information from the Web (or other more mundane data sources, such as applications) and depositing it in triple stores for querying.

A triple is a subject-predicate-object data structure. Using triples, we can capture facts, such as “Ginger dances with Fred” and “Fred likes ice cream.” Individually, single triples are semantically rather poor, but en-masse they provide a rich dataset from which to harvest knowledge and infer connections. Triple stores typically provide SPARQL capabilities to reason about stored RDF data. RDF, the lingua franca of triple stores and the Semantic Web, can be serialized several ways. RDF encoding of a simple three-node graph shows the RDF/XML format. Here we see how triples come together to form linked data. RDF encoding of a simple three-node graph.
<rdf:RDF xmlns:rdf="" xmlns="">
  <rdf:Description rdf:about="">
    <name>Ginger Rogers</name>
    <occupation>dancer</occupation>
    <partner rdf:resource=""/>
  </rdf:Description>
  <rdf:Description rdf:about="">
    <name>Fred Astaire</name>
    <occupation>dancer</occupation>
    <likes rdf:resource=""/>
  </rdf:Description>
</rdf:RDF>
W3C support That they produce logical representations of triples doesn’t mean triple stores necessarily have triple-like internal implementations. Most triple stores, however, are unified by their support for Semantic Web technology such as RDF and SPARQL. While there’s nothing particularly special about RDF as a means of serializing linked data, it is endorsed by the W3C and therefore benefits from being widely understood and well documented. The query language SPARQL benefits from similar W3C patronage. In the graph database space there is a similar abundance of innovation around graph serialization formats (e.g.
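To make the subject-predicate-object idea concrete, the following is a minimal Python sketch of pattern matching over triples, loosely in the spirit of a SPARQL basic graph pattern. It is not a real triple store or SPARQL engine: the `match` helper, the predicate spellings such as `dances_with`, and the use of `None` as a wildcard are all assumptions made for illustration.

```python
# A minimal sketch of triple pattern matching. Not a real triple store or
# SPARQL engine; `match`, the predicate names, and the None-as-wildcard
# convention are assumptions for illustration.

triples = [
    ("Ginger", "dances_with", "Fred"),
    ("Fred", "likes", "ice cream"),
    ("Ginger", "occupation", "dancer"),
    ("Fred", "occupation", "dancer"),
]


def match(pattern, store):
    """Return every triple that agrees with the pattern; None matches anything."""
    return [
        t for t in store
        if all(p is None or p == v for p, v in zip(pattern, t))
    ]


# Single pattern: who is a dancer?
dancers = [s for s, _, _ in match((None, "occupation", "dancer"), triples)]
print(dancers)

# Two patterns chained: what do Ginger's dance partners like?
partners = [o for _, _, o in match(("Ginger", "dances_with", None), triples)]
likes = [o for p in partners for _, _, o in match((p, "likes", None), triples)]
print(likes)
```

The second query is the interesting one: neither triple says anything about Ginger and ice cream on its own, but chaining two patterns connects them, which is the sense in which triples become richer en masse.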

pages: 400 words: 94,847

Reinventing Discovery: The New Era of Networked Science by Michael Nielsen


Albert Einstein, augmented reality, barriers to entry, bioinformatics, Cass Sunstein, Climategate, Climatic Research Unit, conceptual framework, dark matter, discovery of DNA, Donald Knuth, double helix, Douglas Engelbart, Erik Brynjolfsson, fault tolerance, Fellow of the Royal Society, Firefox, Freestyle chess, Galaxy Zoo, Internet Archive, invisible hand, Jane Jacobs, Jaron Lanier, Kevin Kelly, Magellanic Cloud, means of production, medical residency, Nicholas Carr, publish or perish, Richard Feynman, Richard Stallman, selection bias, semantic web, Silicon Valley, Silicon Valley startup, Simon Singh, Skype, slashdot, social web, statistical model, Stephen Hawking, Stewart Brand, Ted Nelson, The Death and Life of Great American Cities, The Nature of the Firm, The Wisdom of Crowds, University of East Anglia, Vannevar Bush, Vernor Vinge

[12] Yochai Benkler. Coase’s penguin, or, Linux and The Nature of the Firm. The Yale Law Journal, 112:369–446, 2002. [13] Yochai Benkler. The Wealth of Networks. New Haven: Yale University Press, 2006. [14] Tim Berners-Lee. Weaving the Web. New York: Harper Business, 2000. [15] Tim Berners-Lee and James Hendler. Publishing on the semantic web. Nature, 410:1023–1024, April 26, 2001. [16] Tim Berners-Lee, James Hendler, and Ora Lassila. The semantic web. Scientific American, May 17, 2001. [17] Mario Biagioli. Galileo’s Instruments of credit: Telescopes, images, secrecy. Chicago: University of Chicago Press, 2006. [18] Peter Block. Community: The Structure of Belonging. San Francisco: Berrett Koehler, 2008. [19] Barry Boehm, Bradford Clark, Ellis Horowitz, Ray Madachy, Richard Shelby, and Chris Westland.

It was out of that mess of ideas that the first airplanes slowly emerged. In a similar way, today thousands of people and organizations have their own ideas about the best way to build the data web. All are aiming in roughly the same direction, but there are many differences in the details. Perhaps the best-known effort comes from academia, where many researchers are developing an approach called the semantic web. In the business world, the state of affairs is more fluid, as companies try out many different ways of sharing data. Because of these many approaches, there are passionate arguments about the best way to build the data web, often carried out with great conviction and certainty. But the data web is still in its infancy, and it’s too early to say which approach will succeed. For these reasons, I’ll use the term “data web” rather loosely to refer to all open data, taken together in aggregate.

Interestingly, Hydra has played and lost twice in games of correspondence chess, against correspondence chess grandmaster Arno Nickel. Nickel was, however, allowed to use computer chess programs in these games. A full record of Hydra’s games may be found at [40]. p 119: Chuck Hansen’s book is [92]. The story I recount about Hansen’s methodology is told in Richard Rhodes’s book How to Write, [182], page 61. p 120: On the semantic web, see [16, 15]. A stimulating alternate point of view is [88]. p 120: For Obama’s memorandum on transparency and open government, see [158]. p 123: The beautiful summary of Einstein’s general theory of relativity, “Spacetime tells matter how to move; matter tells spacetime how to curve,” is due to John Wheeler [240]. p 125 these models have no understanding of the meaning of “hola” or “hello”: I use the term “understanding” here in its everyday sense.

Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable and Maintainable Systems (O’Reilly, 2017) by Martin Kleppmann

active measures, Amazon Web Services, bitcoin, blockchain, business intelligence, business process, cloud computing, collaborative editing, commoditize, conceptual framework, cryptocurrency, database schema, DevOps, distributed ledger, Donald Knuth, Edward Snowden, ethereum blockchain, fault tolerance, finite state, Flash crash, full text search, general-purpose programming language, informal economy, information retrieval, Internet of things, iterative process, John von Neumann, loose coupling, Marc Andreessen, natural language processing, Network effects, packet switching, peer-to-peer, performance metric, place-making, premature optimization, recommendation engine, Richard Feynman, self-driving car, semantic web, Shoshana Zuboff, social graph, social web, software as a service, software is eating the world, sorting algorithm, source of truth, SPARQL, speech recognition, statistical model, web application, WebSocket, wikimedia commons

_:lucy     a :Person;   :name "Lucy";          :bornIn _:idaho.
_:idaho    a :Location; :name "Idaho";         :type "state";     :within _:usa.
_:usa      a :Location; :name "United States"; :type "country";   :within _:namerica.
_:namerica a :Location; :name "North America"; :type "continent".
The semantic web If you read more about triple-stores, you may get sucked into a maelstrom of articles written about the semantic web. The triple-store data model is completely independent of the semantic web; for example, Datomic [40] is a triple-store that does not claim to have anything to do with it. But since the two are so closely linked in many people’s minds, we should discuss them briefly. The semantic web is fundamentally a simple and reasonable idea: websites already publish information as text and pictures for humans to read, so why don’t they also publish information as machine-readable data for computers to read?

The Resource Description Framework (RDF) [41] was intended as a mechanism for different websites to publish data in a consistent format, allowing data from different websites to be automatically combined into a web of data, a kind of internet-wide “database of everything.” Unfortunately, the semantic web was overhyped in the early 2000s but so far hasn’t shown any sign of being realized in practice, which has made many people cynical about it. It has also suffered from a dizzying plethora of acronyms, overly complex standards proposals, and hubris. However, if you look past those failings, there is also a lot of good work that has come out of the semantic web project. Triples can be a good internal data model for applications, even if you have no interest in publishing RDF data on the semantic web. The RDF data model The Turtle language we used in Example 2-7 is a human-readable format for RDF data. Sometimes RDF is also written in an XML format, which does the same thing much more verbosely (see Example 2-8).
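As a rough sketch of what “triples as an internal data model” might look like, the Python below holds the Lucy example as in-memory tuples and follows the within edges outward from her birthplace. It is not an RDF library and does not parse Turtle; the helper names `objects` and `located_in`, and the choice to drop the `_:` and `:` prefixes, are assumptions made for illustration.

```python
# A minimal sketch: the Lucy example held as in-memory triples, with a
# traversal that follows "within" edges. The helpers (objects, located_in)
# are hypothetical names; this is not an RDF library.

triples = [
    ("lucy", "name", "Lucy"),
    ("lucy", "bornIn", "idaho"),
    ("idaho", "name", "Idaho"),
    ("idaho", "type", "state"),
    ("idaho", "within", "usa"),
    ("usa", "name", "United States"),
    ("usa", "type", "country"),
    ("usa", "within", "namerica"),
    ("namerica", "name", "North America"),
    ("namerica", "type", "continent"),
]


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is in the store."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def located_in(place):
    """Names of the place and of every location reachable via 'within' edges."""
    names = []
    seen = set()  # guards against cycles in malformed data
    while place is not None and place not in seen:
        seen.add(place)
        names.extend(objects(place, "name"))
        parents = objects(place, "within")
        place = parents[0] if parents else None
    return names


birthplace = objects("lucy", "bornIn")[0]
print(located_in(birthplace))
```

Note that the same `objects` lookup serves both for property values (name) and for edges to other vertices (within), which mirrors how RDF uses predicates for both.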

Because RDF doesn’t distinguish between properties and edges but just uses predicates for both, you can use the same syntax for matching properties. In the following expression, the variable usa is bound to any vertex that has a name property whose value is the string "United States":
(usa {name:'United States'}) # Cypher
?usa :name "United States". # SPARQL
SPARQL is a nice query language: even if the semantic web never happens, it can be a powerful tool for applications to use internally.
Graph Databases Compared to the Network Model In “Are Document Databases Repeating History?” on page 36 we discussed how CODASYL and the relational model competed to solve the problem of many-to-many relationships in IMS. At first glance, CODASYL’s network model looks similar to the graph model.

pages: 903 words: 235,753

The Stack: On Software and Sovereignty by Benjamin H. Bratton


1960s counterculture, 3D printing, 4chan, Ada Lovelace, additive manufacturing, airport security, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, algorithmic trading, Amazon Mechanical Turk, Amazon Web Services, augmented reality, autonomous vehicles, basic income, Benevolent Dictator For Life (BDFL), Berlin Wall, bioinformatics, bitcoin, blockchain, Buckminster Fuller, Burning Man, call centre, carbon footprint, carbon-based life, Cass Sunstein, Celebration, Florida, charter city, clean water, cloud computing, connected car, corporate governance, crowdsourcing, cryptocurrency, dark matter, David Graeber, deglobalization, dematerialisation, disintermediation, distributed generation, don't be evil, Douglas Engelbart, Edward Snowden, Elon Musk, Eratosthenes, ethereum blockchain, facts on the ground, Flash crash, Frank Gehry, Frederick Winslow Taylor, future of work, Georg Cantor, gig economy, global supply chain, Google Earth, Google Glasses, Guggenheim Bilbao, High speed trading, Hyperloop, illegal immigration, industrial robot, information retrieval, Intergovernmental Panel on Climate Change (IPCC), intermodal, Internet of things, invisible hand, Jacob Appelbaum, Jaron Lanier, John Markoff, Jony Ive, Julian Assange, Khan Academy, liberal capitalism, lifelogging, linked data, Mark Zuckerberg, market fundamentalism, Marshall McLuhan, Masdar, McMansion, means of production, megacity, megastructure, Menlo Park, Minecraft, Monroe Doctrine, Network effects, new economy, offshore financial centre, oil shale / tar sands, packet switching, PageRank, pattern recognition, peak oil, peer-to-peer, performance metric, personalized medicine, Peter Eisenman, Peter Thiel, phenotype, Philip Mirowski, Pierre-Simon Laplace, place-making, planetary scale, RAND corporation, recommendation engine, reserve currency, RFID, Robert Bork, Sand Hill Road, self-driving car, semantic web, sharing economy, Silicon Valley, Silicon Valley ideology, Slavoj Žižek, smart cities, smart grid, smart meter, social graph, software studies, South China Sea, sovereign wealth fund, special economic zone, spectrum auction, Startup school, statistical arbitrage, Steve Jobs, Steven Levy, Stewart Brand, Stuxnet, Superbowl ad, supply-chain management, supply-chain management software, TaskRabbit, the built environment, The Chicago School, the scientific method, Torches of Freedom, transaction costs, Turing complete, Turing machine, Turing test, universal basic income, urban planning, Vernor Vinge, Washington Consensus, web application, Westphalian system, WikiLeaks, working poor, Y Combinator

These range from the prosaic (Google “my car keys” to find them under the couch) to the barely fathomable (“search the contagion distribution of the RNA in the virus that laid me up”). Just as for today's web pages, search providers are eager to provide more direct services built directly into query results themselves by predictively interpreting the intention of the query and providing its likely solution along with tools for the User to accomplish that intention as part of the search result. These are techniques sometimes associated with the semantic web, for which structured data are linked and associated to allow instrumental relations with other data, making the web as a whole more programmable by Users. Through various combinations of open or proprietary exegetics of data, and perhaps a sequence of application programming interfaces (APIs), a query entered as “book me a ticket to New York” can activate a series of secondary inquiries to calendars, banks, flight schedules, airline databases, bank accounts, and so on and, through this, initiate the cascading programming resulting in that booking.

The designation of semantic relations between objects, according to some disinterested (or extremely interested and capitalized) graph of addresses and their interlocking sets, might reorganize what we take to be the natural proximities of one thing to one another and introduce another map (even topology) of queryable association between them. This resulting platform might provide for the programming and counterprogramming of the resulting object landscapes and event graphs, putting them to direct use, as well as providing secondary metadata about their efficacy or accuracy. Just as most of the traffic on the Internet today is machine-to-machine, or at least machine generated, so too a semantic web of things21 would be correlated less by the cognitive dispositions or instrumental intentions of human Users, but those of “objects” and other instances within the larger meta-assemblage all querying and programming one another without human intervention or supervision. In the hype, it's easy to forget that the Internet of Things is also an Internet for Things (or for any addressable entity, however immaterial).

Kerry Stevenson, “The 3D Printer Virus, Really?” Fabbaloo, April 7, 2010. 20. Cory Doctorow, “Metacrap: Putting the Torch to Seven Straw-men of the Meta-Utopia,” Well, August 26, 2011. 21. Payam Barnaghi, Cory Henson, Kerry Taylor, and Wei Wang, “Semantics for the Internet of Things: Early Progress and Back to the Future,” International Journal on Semantic Web and Information Systems 8, no. 1 (2012): 1–21. 22. Yann Moulier-Boutang, Cognitive Capitalism (London: Polity Press, 2012). 23. Open Internet of Things Assembly, “Bill of Rights” (July 17, 2012). 24. See, for example, Saul A. Kripke, Naming and Necessity (Cambridge, MA: Harvard University Press, 1980).

pages: 402 words: 110,972

Nerds on Wall Street: Math, Machines and Wired Markets by David J. Leinweber


AI winter, algorithmic trading, asset allocation, banking crisis, barriers to entry, Big bang: deregulation of the City of London, butterfly effect, buttonwood tree, buy low sell high, capital asset pricing model, citizen journalism, collateralized debt obligation, corporate governance, Craig Reynolds: boids flock, creative destruction, credit crunch, Credit Default Swap, credit default swaps / collateralized debt obligations, Danny Hillis, demand response, disintermediation, distributed generation, diversification, diversified portfolio, Emanuel Derman, experimental economics, financial innovation, fixed income, Gordon Gekko, implied volatility, index arbitrage, index fund, information retrieval, intangible asset, Internet Archive, John Nash: game theory, Kenneth Arrow, Khan Academy, load shedding, Long Term Capital Management, Machine translation of "The spirit is willing, but the flesh is weak." to Russian and back, market fragmentation, market microstructure, Mars Rover, Metcalfe’s law, moral hazard, mutually assured destruction, Myron Scholes, natural language processing, negative equity, Network effects, optical character recognition, paper trading, passive investing, pez dispenser, phenotype, prediction markets, quantitative hedge fund, quantitative trading / quantitative finance, QWERTY keyboard, RAND corporation, random walk, Ray Kurzweil, Renaissance Technologies, Richard Stallman, risk tolerance, risk-adjusted returns, risk/return, Robert Metcalfe, Ronald Reagan, Rubik’s Cube, semantic web, Sharpe ratio, short selling, Silicon Valley, Small Order Execution System, smart grid, smart meter, social web, South Sea Bubble, statistical arbitrage, statistical model, Steve Jobs, Steven Levy, Tacoma Narrows Bridge, the scientific method, The Wisdom of Crowds, time value of money, too big to fail, transaction costs, Turing machine, Upton Sinclair, value at risk, Vernor Vinge, yield curve, Yogi Berra, your tax dollars at work

The new so-called adaptive market hypothesis and a certain degree of common sense allow that some news (but not all) is news to everyone at the same time, and that someone can be the first to profit from it. This opens yet another front in the algo wars. In the past year, we have seen the major news providers, Dow Jones25 and Reuters,26 offering costly high-end, low-latency news feeds designed for machines. In addition to being faster, they include extensive XML tagging for a variety of stories. These semantic Web approaches allow clever algo warriors to extract the salient facts with much greater accuracy than they could achieve writing code to parse plaintext feeds designed for human readers. What kind of tags are they talking about? The Dow Jones product is described as over 150 macroeconomic indicators, in developed markets, and a wide range of news on publicly traded U.S. and Canadian firms, as well as some in the United Kingdom.
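The contrast the excerpt draws between tagged feeds and plaintext parsing can be sketched roughly as follows. This is a minimal illustration, not the Dow Jones or Reuters format: the `story`, `company`, and `indicator` elements, their attributes, and the sample headline are all invented for the example.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical machine-readable story. Element and attribute names are
# assumptions made for illustration, not a real vendor schema.
TAGGED_STORY = """
<story>
  <headline>Acme beats estimates</headline>
  <company ticker="ACME" country="US"/>
  <indicator name="eps" value="1.42" unit="USD"/>
</story>
"""

# The same (invented) news item as a human-oriented headline.
PLAIN_STORY = "Acme (ACME) beats estimates with EPS of $1.42"


def facts_from_tagged(xml_text):
    """Read the facts straight out of the tags; no guessing is needed."""
    root = ET.fromstring(xml_text.strip())
    company = root.find("company")
    indicator = root.find("indicator")
    return {
        "ticker": company.get("ticker"),
        "metric": indicator.get("name"),
        "value": float(indicator.get("value")),
    }


def facts_from_plaintext(text):
    """Guess the facts with regular expressions; brittle by construction."""
    ticker = re.search(r"\(([A-Z]+)\)", text)
    value = re.search(r"\$(\d+(?:\.\d+)?)", text)
    return {
        "ticker": ticker.group(1) if ticker else None,
        "metric": None,  # the headline never says which metric in a fixed form
        "value": float(value.group(1)) if value else None,
    }
```

The tagged version names the metric explicitly, while the plaintext version has to infer what the dollar figure refers to, which is the accuracy gap the passage describes.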

Collectively, the new alphabet soup of technologies—AI, IA, NLP, and IR (artificial intelligence, intelligence amplification, natural language processing, and information retrieval, for those with a bigger soup bowl)—provides a means to make sense of patterns in the data collected in enterprise and global search. These means are molecular search, the use of persistent software agents so you don’t have to keep doing the same thing all the time; the semantic Web, using the information associated with data at the point of origin so there is less guessing about the meaning of what you find; and modern user interfaces and visualizations, so you can prioritize what you find, and focus on the important and the valuable in a timely way. The SEC: The Mother Lode of Pre-News The Securities and Exchange Commission is a good place to start looking for pre-news. There are many reasons for this.

This came along at the same time as other Enron (and WorldCom and Tyco)-inspired reforms in the Sarbanes-Oxley Act. The elimination of the time disadvantage for ordinary investors, paying only with their taxes and using the SEC web site, is an overdue improvement in a system that (literally) delivered yesterday’s news for its first six years of existence. Other advances were slower in coming. The filings themselves remain unstructured text files, with no sign of the semantic Web and XML ideas that are used to deliver meaningful information in many other contexts. After years of lip service to modernizing EDGAR, SEC chairman Christopher Cox (who took office in 2005) made a serious effort to do so, replacing TRW with more Internet-savvy firms and actually demonstrating prototypes that allow extraction of specific content from SEC filings. A description of the agency’s plans for this “21st Century Disclosure Initiative” is now featured prominently on the home page of the SEC site.

pages: 394 words: 118,929

Dreaming in Code: Two Dozen Programmers, Three Years, 4,732 Bugs, and One Quest for Transcendent Software by Scott Rosenberg


A Pattern Language, Benevolent Dictator For Life (BDFL), Berlin Wall, call centre, collaborative editing, conceptual framework, continuous integration, Donald Knuth, Douglas Engelbart, Douglas Hofstadter, Dynabook, Firefox, Ford paid five dollars a day, Francis Fukuyama: the end of history, George Santayana, Grace Hopper, Guido van Rossum, Gödel, Escher, Bach, Howard Rheingold, index card, Internet Archive, inventory management, Jaron Lanier, John Markoff, John von Neumann, knowledge worker, Larry Wall, life extension, Loma Prieta earthquake, Menlo Park, Merlin Mann, new economy, Nicholas Carr, Norbert Wiener, pattern recognition, Paul Graham, Potemkin village, RAND corporation, Ray Kurzweil, Richard Stallman, Ronald Reagan, Ruby on Rails, semantic web, side project, Silicon Valley, Singularitarianism, slashdot, software studies, source of truth, South of Market, San Francisco, speech recognition, stealth mode startup, stem cell, Stephen Hawking, Steve Jobs, Stewart Brand, Ted Nelson, Therac-25, thinkpad, Turing test, VA Linux, Vannevar Bush, Vernor Vinge, web application, Whole Earth Catalog, Y2K

You could model just about anything in a simple three-part format that looked something like the subject-verb-object arrangement of a simple English sentence: <this> <has-relationship-with> <that> Then they discovered that the answer they’d come up with had already been outlined and at least partially implemented by researchers led by Tim Berners-Lee, the scientist who had invented the World Wide Web a dozen years before. Berners-Lee had a dream he called the Semantic Web, an upgraded version of the existing Web that relied on smarter and more complex representations of data. The Semantic Web would be built on a technical foundation called RDF, for Resource Description Framework. RDF stores all information in “triples”—statements in three parts that declare relationships between things. This was very close to the structure Sagen had independently sketched out, with the advantage that a considerable amount of work over several years had already been put into codifying the details.
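The three-part statement format described above can be sketched as a tiny in-memory store. This is an illustrative toy under stated assumptions, not RDF itself and not the Shimmer or Chandler code: the `Triple` and `TripleStore` names and the sample data are hypothetical, and real RDF identifies resources with URIs rather than bare strings.

```python
from typing import NamedTuple, Optional, List


class Triple(NamedTuple):
    """One <this> <has-relationship-with> <that> statement."""
    subject: str
    predicate: str
    obj: str


class TripleStore:
    """Hypothetical minimal store: a set of triples plus pattern matching."""

    def __init__(self) -> None:
        self._triples = set()

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self._triples.add(Triple(subject, predicate, obj))

    def match(
        self,
        subject: Optional[str] = None,
        predicate: Optional[str] = None,
        obj: Optional[str] = None,
    ) -> List[Triple]:
        """Return triples matching the pattern; None acts as a wildcard."""
        return sorted(
            t
            for t in self._triples
            if (subject is None or t.subject == subject)
            and (predicate is None or t.predicate == predicate)
            and (obj is None or t.obj == obj)
        )
```

For example, after `store.add("note-1", "has-author", "alice")`, calling `store.match(predicate="has-author")` lists every authorship statement, regardless of what kind of thing the subject is. That uniformity is what makes the format able to model "just about anything".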

In meetings through November and December, culminating in a marathon right after the New Year, the Chandler team struggled toward an elusive consensus. The RDF-based Shimmer repository was something Morgen Sagen had built expressly as a prototype; it couldn’t simply be hitched onto the real Chandler. Besides, John Anderson had never gotten the RDF religion. The whole RDF enterprise had a reputation for academic complexity and impracticality. There were lots of papers about the Semantic Web, but not a lot of working software. As one programmer after another had a look at the world of RDF, each came to a similar conclusion: It was “scary.” Anderson knew how much work programming Chandler’s user interface would be. He had been there before and understood how critical it was to keep that job manageable; it was the area most likely to cause endless delay. His chief requirement for the repository was to make things easier for the front-end developers.

“We took the plan out”: From “Painful Birth: Creating New Software Was Agonizing Task for Mitch Kapor Firm” by Paul B. Carroll, Wall Street Journal, May 11, 1990. CHAPTER 3 PROTOTYPES AND PYTHON “a crew of twenty people”: Artist Chris Cobb’s project at the Adobe Bookstore in San Francisco is chronicled at the McSweeney’s Web site at htm. Information about the Semantic Web and RDF is at “plan to throw one away” and “promise to deliver a throwaway”: Frederick Brooks, The Mythical Man-Month (Addison Wesley, 1995), pp. 115–16. “The programmer, like the poet”: Ibid., p. 7. “The lunatic, the lover, and the poet”: William Shakespeare, A Midsummer Night’s Dream, Act V, sc. i. “The process of combining multiple”: The phrase is from Wikipedia’s definition of “Abstraction (computer science).”

pages: 570 words: 115,722

The Tangled Web: A Guide to Securing Modern Web Applications by Michal Zalewski


barriers to entry, business process, defense in depth, easy for humans, difficult for computers, fault tolerance, finite state, Firefox, Google Chrome, information retrieval, RFC: Request For Comment, semantic web, Steve Jobs, telemarketer, Turing test, Vannevar Bush, web application, WebRTC, WebSocket

In the traditional HTML parser in Firefox versions prior to 4, any occurrence of “--”, later followed by “>”, is also considered good enough. The Battle over Semantics The low-level syntax of the language aside, HTML is also the subject of a fascinating conceptual struggle: a clash between the ideology and the reality of the online world. Tim Berners-Lee always championed the vision of a semantic web, an interconnected system of documents in which every functional block, such as a citation, a snippet of code, a mailing address, or a heading, has its meaning explained by an appropriate machine-readable tag (say, <cite>, <code>, <address>, or <h1> to <h6>). This approach, he and other proponents argued, would make it easier for machines to crawl, analyze, and index the content in a meaningful way, and in the near future, it would enable computers to reason using the sum of human knowledge.
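As a rough illustration of why such tags help automated tools, the following sketch uses Python's standard `html.parser` module to pull out only the blocks marked with the tags named above. The `SemanticExtractor` class and `extract` function are hypothetical names for this example, and the approach assumes well-nested markup, which real pages often are not.

```python
from html.parser import HTMLParser

# Tags from the passage that describe what a block *is*, not how it looks.
SEMANTIC_TAGS = {"cite", "code", "address", "h1", "h2", "h3", "h4", "h5", "h6"}


class SemanticExtractor(HTMLParser):
    """Collect (tag, text) pairs for semantically tagged blocks."""

    def __init__(self):
        super().__init__()
        self._open = []    # stack of (tag, text parts) for open semantic tags
        self.blocks = []   # finished (tag, text) pairs, in closing order

    def handle_starttag(self, tag, attrs):
        if tag in SEMANTIC_TAGS:
            self._open.append((tag, []))

    def handle_data(self, data):
        # Text belongs to every semantic element that is currently open.
        for _, parts in self._open:
            parts.append(data)

    def handle_endtag(self, tag):
        if self._open and self._open[-1][0] == tag:
            name, parts = self._open.pop()
            self.blocks.append((name, "".join(parts).strip()))


def extract(html):
    parser = SemanticExtractor()
    parser.feed(html)
    parser.close()
    return parser.blocks
```

Text wrapped only in presentational `<div>` or `<span>` elements never reaches `blocks`, so a heading built that way is invisible to this kind of extractor.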

Although tags such as <font> have been successfully obsoleted and largely abandoned in favor of CSS, this is only because stylesheets offered more powerful and consistent visual controls. With the help of CSS, the developers simply started relying on a soup of semantically agnostic <span> and <div> tags to build everything from headings to user-clickable buttons, all in a manner completely opaque to any automated content extraction tools. Despite having had a lasting impact on the design of the language, in some ways, the idea of a semantic web may be becoming obsolete: Online content less frequently maps to the concept of a single, viewable document, and HTML is often reduced to providing a convenient drawing surface and graphic primitives for JavaScript applications to build their interfaces with. * * * [25] To process HTML documents, Internet Explorer uses the Trident engine (aka MSHTML); Firefox and some derived products use Gecko; Safari, Chrome, and several other browsers use WebKit; and Opera relies on Presto.

See Safari (Apple), Type-Specific Content Inclusion, Content Rendering with Browser Plug-ins, Sun Java, Cross-Domain Content Inclusion application/binary, Detection for Non-HTTP Files application/javascript document type, Plaintext Files application/json document type, Plaintext Files, Unrecognized Content Type application/mathml+xml document type, Audio and Video application/octet-stream document type, Special Content-Type Values, Detection for Non-HTTP Files application/x-www-form-urlencoded, Forms and Form-Triggered Requests Arce, Ivan, Information Security in a Nutshell Arya, Abhishek, Character Set Inheritance and Override asynchronous XMLHttpRequest, Interactions with Browser Credentials Atom, RSS and Atom Feeds authentication, in HTTP, HTTP Cookie Semantics Authorization header (HTTP), HTTP Authentication authorization, vs. authentication, HTTP Cookie Semantics B background parameter for HTML tags, Type-Specific Content Inclusion background processes, in JavaScript, Content-Level Features \ (backslashes) in URLs, browser acceptance of, Fragment ID backslashes (\) in URLs, browser acceptance of, Fragment ID ` (backticks), as quote characters, Understanding HTML Parser Behavior, The Document Object Model backticks (`), as quote characters, Understanding HTML Parser Behavior, The Document Object Model Bad Request status error (400), 300-399: Redirection and Other Status Messages bandwidth, and XML, XML User Interface Language Barth, Adam, Nonconvergence of Visions, Frame Descendant Policy and Cross-Domain Communications, XDomainRequest, Other Uses of the Origin Header, Sandboxed Frames, URL- and Protocol-Level Proposals Base64 encoding, Header Character Set and Encoding Schemes basic credential-passing method, HTTP Authentication Bell-La Padula security model, Flirting with Formal Solutions, Flirting with Formal Solutions Berners-Lee, Tim, Tales of the Stone Age: 1945 to 1994, Tales of the Stone Age: 1945 to 1994, The First Browser Wars: 1995 to 1999, Hypertext
Transfer Protocol, Hypertext Markup Language, Document Parsing Modes and semantic web, Document Parsing Modes World Wide Web browser, Tales of the Stone Age: 1945 to 1994 World Wide Web Consortium, The First Browser Wars: 1995 to 1999 binary HTTP, URL- and Protocol-Level Proposals bitmap images, browser recognition of, Plaintext Files blacklists, Same-Origin Policy for XMLHttpRequest, Same-Origin Policy for XMLHttpRequest, New and Upcoming Security Features malicious URLs, New and Upcoming Security Features of HTTP headers in XMLHttpRequest, Same-Origin Policy for XMLHttpRequest BMP file format, Type-Specific Content Inclusion BOM (byte order marks), Character Set Handling Breckman, John, Referer Header Behavior browser cache, Caching Behavior, Caching Behavior, Caching Behavior information in, Caching Behavior poisoning, Caching Behavior browser extensions and UI, Pseudo-URLs browser market share, May 2011, Global browser market share, May 2011 browser wars, The First Browser Wars: 1995 to 1999, A Glimpse of Things to Come browser-managed site permissions, Extrinsic Site Privileges browser-side scripts, Browser-Side Scripts buffer overflow, Common Problems Unique to Server-Side Code bugs, preventing classes of, Enlightenment Through Taxonomy Bush, Vannevar, Toward Practical Approaches byte order marks (BOM), Character Set Handling C cache manifests, URL- and Protocol-Level Proposals cache poisoning, Access to Internal Networks, Vulnerabilities Specific to Web Applications Cache-Control directive, Resolution of Duplicate or Conflicting Headers, Caching Behavior cache.

pages: 188 words: 9,226

Collaborative Futures by Mike Linksvayer, Michael Mandiberg, Mushon Zer-Aviv


4chan, Benjamin Mako Hill, British Empire, citizen journalism, cloud computing, collaborative economy, corporate governance, crowdsourcing, Debian, Firefox, informal economy, jimmy wales, Kickstarter, late capitalism, loose coupling, Marshall McLuhan, means of production, Naomi Klein, Network effects, optical character recognition, packet switching, postnationalism / post nation state, prediction markets, Richard Stallman, semantic web, Silicon Valley, slashdot, Slavoj Žižek, stealth mode startup, technoutopianism, the medium is the message, The Wisdom of Crowds, web application

The announcement of Google Wave was probably the most ambitious vision for a decentralized collaborative protocol coming from Silicon Valley. It was launched with the same celebratory terminology propagated by the self-proclaimed social media gurus, only to be terminated a year later when the vision could not live up to the hype. Web 3.0 is also bullshit. The term was initially used to describe a web enhanced by Semantic Web technologies. However, these technologies have been developed painstakingly over essentially the entire history of the web and deployed increasingly in the latter part of the last decade. Many Open Source projects reject the arbitrary and counter-productive terminology of “dot releases”: the difference between the 2.9 release and the 3.0 release should not necessarily be more substantial than the one between 2.8 and 2.9.

Publishing the entire “research compendium” under appropriate terms (e.g. usually public domain for data, a free software license for software, and a liberal Creative Commons license for articles and other content) and in open formats has recently been called “reproducible research”—in computational fields, the publication of such a compendium gives other researchers all of the tools they need to build upon one’s work. Standards are also very important for enabling scientific collaboration, and not just coarse standards like RSS. The Semantic Web and in particular ontologies have sometimes been ridiculed by consumer web developers, but they are necessary for science. How can one treat the world's scientific literature as a database if it isn't possible to identify, for example, a specific chemical or gene, and agree on a name for the chemical or gene in question that different programs can use interoperably? The biological sciences have taken a lead in implementation of semantic technologies, from ontology development and semantic databases to in-line web page annotation using RDFa.

pages: 66 words: 9,247

MongoDB and Python by Niall O’Higgins


cloud computing, Debian, fault tolerance, semantic web, web application

I’ve worked with most of the usual relational databases (MSSQL Server, MySQL, PostgreSQL) and with some very interesting nonrelational databases (’s Graphd/MQL, Berkeley DB, MongoDB). MongoDB is at this point the system I enjoy working with the most, and choose for most projects. It sits somewhere at a crossroads between the performance and pragmatism of a relational system and the flexibility and expressiveness of a semantic web database. It has been central to my success in building some quite complicated systems in a short period of time. I hope that after reading this book you will find MongoDB to be a pleasant database to work with, and one which doesn’t get in the way between you and the application you wish to build. Conventions Used in This Book The following typographical conventions are used in this book: Italic Indicates new terms, URLs, email addresses, filenames, and file extensions.

pages: 272 words: 83,378

Digital Barbarism: A Writer's Manifesto by Mark Helprin


Albert Einstein, anti-communist, Berlin Wall, carbon footprint, computer age, crowdsourcing, hive mind, invention of writing, Jacquard loom, plutocrats, race to the bottom, semantic web, Silicon Valley, Silicon Valley ideology, the scientific method, Yogi Berra, zero-sum game

Had each Turkish soldier had to decide individually whether or not to make that winter ascent, they all might have thought harder and better about it in the absence of so many others carrying them and their orders along on an utterly worthless wave of quick-set belief. In the electronic culture, however, the decision has already been made in regard to such things. To quote Jeremy Carroll, chief product architect of Top Quadrant, discussing an aspect of his work: “Semantic Web technology…” will make possible “consensus instructions from many different sources, or instructions that other people have already found helpful (rather than back-breaking searches and comparisons).”23 It is the labor, care, and learning in making such comparisons that bring the benefits of experience, a sharp eye, and good judgment. As anyone who has ever used it knows, the internet is a magnificent (if often unreliable) research tool.

See also Taxes from copyright, 111 royalties, 47, 51, 74, 78, 113, 158 A River Runs Through It (Maclean), 164 Robinson Crusoe (DeFoe), 119 Roth, Philip, 114 Royalties, 47, 51, 74, 78, 113, 158 Rushdie, Salman, 75 Russia, copyright law in, 128 S Satie, Erik, 80 Schlesinger, Arthur, 89 Schumann, Robert, 80 SEC. See Securities and Exchange Commission Second World War, 192, 196 Securities and Exchange Commission (SEC), 29 “Semantic Web technology,” 64 Seward, William H., 59–60 Sex, 17 Shakespeare, William, 179, 194 Sharpton, Al, 166 Signet Society (Harvard), 183 Silent Spring (Carson), 105 Silicon Valley, xiii, 205 Sinatra, Frank, 24 Skull and Bones (Yale), 182 Smith, Kate, 52 Social contract, 173 Socialism, 168 Social Security, 81 Social theorists, 185 Software, piracy, 38, 214 A Soldier of the Great War (Helprin), 113 Sonny Bono Copyright Term Extension Act (1998), 120, 125–126, 127, 139, 140 Sports, 94–95 Star Wars, 164 Statute of Queen Anne (1709), 124, 127 Stevens, Justice, 115 Story, Joseph, 124 Sweden, copyright law in, 128 Switzerland, copyright law in, 128 T Tartakovsky, Joseph, 87 Taubman, Arnold, 24 Taxes, 81, 86, 87, 171–172.

pages: 742 words: 137,937

The Future of the Professions: How Technology Will Transform the Work of Human Experts by Richard Susskind, Daniel Susskind


23andMe, 3D printing, additive manufacturing, AI winter, Albert Einstein, Amazon Mechanical Turk, Amazon Web Services, Andrew Keen, Atul Gawande, Automated Insights, autonomous vehicles, Big bang: deregulation of the City of London, big data - Walmart - Pop Tarts, Bill Joy: nanobots, business process, business process outsourcing, Cass Sunstein, Checklist Manifesto, Clapham omnibus, Clayton Christensen, clean water, cloud computing, commoditize, computer age, Computer Numeric Control, computer vision, conceptual framework, corporate governance, creative destruction, crowdsourcing, Daniel Kahneman / Amos Tversky, death of newspapers, disintermediation, Douglas Hofstadter, Erik Brynjolfsson, Filter Bubble, Frank Levy and Richard Murnane: The New Division of Labor, full employment, future of work, Google Glasses, Google X / Alphabet X, Hacker Ethic, industrial robot, informal economy, information retrieval, interchangeable parts, Internet of things, Isaac Newton, James Hargreaves, John Maynard Keynes: Economic Possibilities for our Grandchildren, John Maynard Keynes: technological unemployment, Joseph Schumpeter, Khan Academy, knowledge economy, lifelogging, lump of labour, Marshall McLuhan, Metcalfe’s law, Narrative Science, natural language processing, Network effects, optical character recognition, Paul Samuelson, personalized medicine, pre–internet, Ray Kurzweil, Richard Feynman, Second Machine Age, self-driving car, semantic web, Shoshana Zuboff, Skype, social web, speech recognition, spinning jenny, strong AI, supply-chain management, telepresence, The Future of Employment, the market place, The Wealth of Nations by Adam Smith, The Wisdom of Crowds, transaction costs, Turing test, Watson beat the top human players on Jeopardy!, young professional

Next, there are machines that can interact with apparent manual skill and dexterity in the physical world (robotics). Finally, there are systems that can detect and express emotions (affective computing). Volumes have already been written on each of these four subjects. We try to give an overview rather than make an academic assessment. We are not suggesting, incidentally, that these are the only important developments. We could also have added the ‘semantic web’, ‘search algorithms’, and ‘intelligent agents’.33 But to debate which technologies are primary distracts from the bigger point—that, exploiting various technologies, our machines will continue to become increasingly capable, and able to discharge more and more tasks that we used to think were the distinctive province of human beings. Big Data In 1988, foreshadowing much that is now claimed in the field of ‘Big Data’, Harvard’s Shoshana Zuboff made the following claim in her ground-breaking book In the Age of the Smart Machine: ‘Information technology not only produces action but also produces a voice that symbolically renders events, objects, and processes so that they become visible, knowable, and shareable in a new way.’34 In more homely terms, she was referring to the value of the great streams of information that are generated as a by-product of computerization.

Searle, John, Mind, Language and Society (London: Weidenfeld & Nicolson, 1999). Searle, John, ‘Watson Doesn’t Know it Won on “Jeopardy!”’, Wall Street Journal, 23 Feb. 2011 <> (accessed 28 March 2015). Seidman, Dov, How (Hoboken, NJ: Wiley, 2007). Sennett, Richard, The Craftsman (London: Penguin Books, 2009). Sennett, Richard, Together (London: Allen Lane, 2012). Shadbolt, Nigel, Wendy Hall, and Tim Berners-Lee, ‘The Semantic Web Revisited’, IEEE Intelligent Systems, 21: 3 (2006), 96–101. Shanteau, James, ‘Cognitive Heuristics and Biases in Behavioral Auditing: Review, Comments, and Observations’, Accounting, Organizations, and Society, 14: 1 (1989), 165–77. Shapiro, Carl, and Hal Varian, Information Rules (Boston: Harvard Business School Press, 1999). Shapiro, Carl, and Hal Varian, ‘Versioning: The Smart Way to Sell Information’, in James Gilmore and Joseph Pine (eds.), Markets of One (Boston: Harvard Business School Press, 2000).

Wickenden, William, A Professional Guide for Young Engineers (New York: Engineers’ Council for Professional Development, 1949). Widdicombe, Lizzie, ‘From Mars’, New Yorker, 23 Sept. 2013. WikiHouse, ‘WikiHouse 4.0’ <> (accessed 8 March 2015). Wikistrat, ‘Become an Analyst’ <> (accessed 8 March 2015). Wilensky, Harold, ‘The Professionalization of Everyone?’, American Journal of Sociology, 70: 2 (1964), 137–58. Wilks, Yorick, ‘What is the Semantic Web and What Will it Do for eScience’, Research Report, No.12, Oxford Internet Institute, October 2006. Winner, Langdon, Autonomous Technology (Cambridge, Mass.: MIT Press, 1977). Winner, Langdon, ‘Technology Today: Utopia or Dystopia?’, in Technology and the Rest of Culture, ed. Arien Mack (Columbus, Ohio: Ohio State University Press, 2001). Winograd, Terry, Language as a Cognitive Process (Boston: Addison-Wesley, 1982).

pages: 573 words: 163,302

Year's Best SF 15 by David G. Hartwell; Kathryn Cramer


air freight, Black Swan, experimental subject, Georg Cantor, gravity well, job automation, Kuiper Belt, phenotype, semantic web

I’d seen him trudging the porticoes in Turin, hunch-shouldered, slapping his feet, always looking sly and preoccupied. You only had to see the man to know that he had an agenda like no other writer in the world. “When Calvino finished his six lectures,” mused Massimo, “they carried him off to CERN in Geneva and they made him work on the ‘Semantic Web.’ The Semantic Web works beautifully, by the way. It’s not like your foul little Internet—so full of spam and crime.” He wiped the sausage knife on an oil-stained napkin. “I should qualify that remark. The Semantic Web works beautifully—in the Italian language. Because the Semantic Web was built by Italians. They had a little bit of help from a few French Oulipo writers.” “Can we leave this place now? And visit this Italy you boast so much about? And then drop by my Italy?” “That situation is complicated,” Massimo hedged, and stood up.

pages: 336 words: 93,672

The Future of the Brain: Essays by the World's Leading Neuroscientists by Gary Marcus, Jeremy Freeman


23andMe, Albert Einstein, bioinformatics, bitcoin, brain emulation, cloud computing, complexity theory, computer age, computer vision, conceptual framework, correlation does not imply causation, crowdsourcing, dark matter, data acquisition, Drosophila, epigenetics, Google Glasses, iterative process, linked data, mouse model, optical character recognition, pattern recognition, personalized medicine, phenotype, race to the bottom, Richard Feynman, Ronald Reagan, semantic web, speech recognition, stem cell, Steven Pinker, supply-chain management, Turing machine, web application

INCF has worked closely with partners, including the Allen Institute, University of Oslo, Duke University, University of Edinburgh, and others, to develop a standard coordinate space for mouse brain data, dubbed “Waxholm Space” and web services to facilitate translation between mouse brain atlases. In addition, in collaboration with the Neuroscience Information Framework (NIF, San Diego) it has produced community consensus ontologies and nomenclatures for neurons and brain structures, which have been placed in a public wiki employing the latest semantic web technologies. INCF supports working groups of experts from around the world to produce new standards, tools, services, and guidelines for the global community. With the advent of multiple large-scale brain initiatives around the world, INCF is well positioned to help coordinate standards and infrastructure between such projects at a global scale. INCF has agreed to coordinate some of the tools for brain atlases in the HBP Neuroinformatics Platform, and the HBP will build off of INCF infrastructures and adhere to INCF standards and guidelines whenever applicable.

Ontologies formalize the definitions of these structures and their names (and synonyms) so that the relationships between entities are explicit. Alternatively, by annotating the data with the spatial coordinates of where it was measured, it would be associated with the volume that has been named reticular nucleus of the thalamus. Through careful curation of data and annotation using next-generation semantic web technologies and spatial coordinates, each piece of data will be part of a rich brain atlas integrated with a web of knowledge about the brain. The Neuroinformatics Platform, coordinated by groups from the École Polytechnique Fédérale de Lausanne (EPFL), Karolinska Institute, University of Oslo, Forschungszentrum Jülich, Universidad Politécnica de Madrid, and Radboud Universiteit Nijmegen, will provide the tools for organizing neuroscience data in atlases that bring together collections of data about the mouse and human brains from around the world.

pages: 532 words: 139,706

Googled: The End of the World as We Know It by Ken Auletta


23andMe, AltaVista, Anne Wojcicki, Apple's 1984 Super Bowl advert, bioinformatics, Burning Man, carbon footprint, citizen journalism, Clayton Christensen, cloud computing, Colonization of Mars, commoditize, corporate social responsibility, creative destruction, death of newspapers, disintermediation, don't be evil, facts on the ground, Firefox, Frank Gehry, Google Earth, hypertext link, Innovator's Dilemma, Internet Archive, invention of the telephone, Jeff Bezos, jimmy wales, John Markoff, Kevin Kelly, knowledge worker, Long Term Capital Management, Marc Andreessen, Mark Zuckerberg, Marshall McLuhan, Menlo Park, Network effects, new economy, Nicholas Carr, PageRank, Paul Buchheit, Peter Thiel, Ralph Waldo Emerson, Richard Feynman, Sand Hill Road, Saturday Night Live, semantic web, sharing economy, Silicon Valley, Skype, slashdot, social graph, spectrum auction, stealth mode startup, Stephen Hawking, Steve Ballmer, Steve Jobs, strikebreaker, telemarketer, the scientific method, The Wisdom of Crowds, Upton Sinclair, X Prize, yield management, zero-sum game

Google will have competition from Microsoft’s renamed and reengineered search engine, Bing, launched in May 2009, which in July 2009 finally succeeded in merging with Yahoo search. One could argue that the ultimate vertical search would be provided by Artificial Intelligence (AI), computers that could infer what users actually sought. This has always been an obsession of Google’s founders, and they have recruited engineers who specialize in AI. The term is sometimes used synonymously with another, “the semantic Web,” which has long been championed by Tim Berners-Lee. This vision appears to be a long way from becoming real. Craig Silverstein, Google employee number 1, said a thinking machine is probably “hundreds of years away.” Marc Andreessen suggests that it is a pipe dream. “We are no closer to a computer that thinks like a person than we were fifty years ago,” he said. Sometimes lost in the excitement over the wonders of ever more relevant search is the potential social cost.

Davenport, “Reverse Engineering Google’s Innovation Machine,” Harvard Business Review, April 2008. 324 Its social network site: author interviews with Google executives in Russia, Jason Bush, “Where Google Isn’t Goliath,” BusinessWeek, June 26, 2008. 324 “These companies air kiss”: author interview with Andrew Lack, October 4, 2007. 324 Facebook had 200 million users: author interview with Sheryl Sandberg, March 30, 2009. 324 “Anybody that gets”: author interview with Bill Campbell, October 8, 2007. 325 Lee began with : author interview with Kwan Lee, February 10, 2009. 325 “lacks a social gene”: author interview with John Borthwick, April 28, 2008. 326 “If I were Google”: author interview with Danny Sullivan, August 27, 2007. 326 The problem with horizontal search: author interview with Jason Calacanus, September 21, 2007. 327 “the semantic web”: Katie Franklin, “Google May Be Displaced, Said World Wide Web Creator Tim Berners-Lee”, Daily Telegraph, March 3, 2008. 327 “hundreds of years away”: author interview with Craig Silverstein, September 17, 2007. 327 “We are no closer”: author interview with Marc Andreessen, March 27, 2008. 327 In his provocative book: Nicholas Carr, The Big Switch: Rewiring the World, from Edison to Google, W.

pages: 597 words: 119,204

Website Optimization by Andrew B. King


AltaVista, bounce rate, don't be evil, Firefox, In Cold Blood by Truman Capote, information retrieval, iterative process, medical malpractice, Network effects, performance metric, search engine result page, second-price auction, second-price sealed-bid, semantic web, Silicon Valley, slashdot, social graph, Steve Jobs, web application

So, instead of this: do this: Even better, remove all the variable query characters (?, $, and #): By eliminating the suffix to URIs, you avoid broken links and messy mapping when changing technologies in the future. See Chapter 9 for details on URI rewriting. See also "Cool URIs for the Semantic Web."

Write compelling summaries

In newspaper parlance, the description that goes with a headline is called a deck or a blurb. Great decks summarize the story in a couple of sentences, enticing the user to read the article. Include keywords describing the major theme of the article for search engines. Don't get too bogged down in the details of your story.
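The URI advice in the excerpt above (drop the file suffix and the variable query characters) can be illustrated with a small sketch. The excerpt's own before-and-after URIs did not survive extraction, so the function name `clean_uri` and the `example.com` address below are hypothetical, and the sketch handles only the `?` query and `#` fragment parts, not `$`. It shows what a cleaned URI looks like; it is not the server-side rewriting the book covers in its Chapter 9.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def clean_uri(uri: str) -> str:
    """Return the URI without its query string, fragment, or file suffix.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(uri)
    # Strip a trailing extension such as ".php" from the last path segment.
    root, _ext = posixpath.splitext(parts.path)
    # Rebuild the URI with an empty query and an empty fragment.
    return urlunsplit((parts.scheme, parts.netloc, root, "", ""))


# Hypothetical example address, not taken from the book.
print(clean_uri("http://example.com/products/widget.php?id=42#top"))
```

Because `posixpath.splitext` only looks at the final path segment, a dot earlier in the path (for example a version number in a directory name) is left alone.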

The following short example shows how the statement mentioned previously could be encoded in a web page: <div xmlns:dc="" about=""> <span property="dc:creator">Andy King</span> </div> Soon, a significant amount of traffic from search engines will depend on the extent to which the underlying site makes useful structured data available. Things such as microformats and RDFa have been around in various forms for years, but now that search engines are noticing them, SEO practitioners are starting to take note, too.

Chapter 2. SEO Case Study: In this chapter, we'll show you how to put into action the optimization techniques that you learned in Chapter 1 and the conversion techniques you'll learn in Chapter 5.
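To make the RDFa snippet in the Website Optimization excerpt above more concrete, here is a minimal sketch of how a consumer might read `property` attributes out of such markup using Python's standard `html.parser`. The class and function names are hypothetical, the namespace and `about` values are left empty exactly as they appear in the excerpt, and this is a simplification: it is not a conforming RDFa processor and makes no claim about how search engines actually parse structured data.

```python
from html.parser import HTMLParser


class PropertyCollector(HTMLParser):
    """Collect (property, text) pairs from elements with a `property` attribute."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._current = None  # property name of the element being read, if any

    def handle_starttag(self, tag, attrs):
        prop = dict(attrs).get("property")
        if prop:
            self._current = prop

    def handle_data(self, data):
        # Record the first non-blank text found inside a property element.
        if self._current and data.strip():
            self.pairs.append((self._current, data.strip()))
            self._current = None

    def handle_endtag(self, tag):
        self._current = None


def extract_properties(markup):
    """Hypothetical helper: return all (property, text) pairs in the markup."""
    collector = PropertyCollector()
    collector.feed(markup)
    collector.close()
    return collector.pairs


# Markup copied from the excerpt; the attribute values are empty in the source.
snippet = '<div xmlns:dc="" about=""> <span property="dc:creator">Andy King</span> </div>'
print(extract_properties(snippet))
```

The sketch pairs each property name with the element's text, which is the "creator is Andy King" statement the excerpt describes; resolving the `dc:` prefix to a vocabulary would need the namespace URI that is missing here.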

pages: 303 words: 67,891

Advances in Artificial General Intelligence: Concepts, Architectures and Algorithms: Proceedings of the Agi Workshop 2006 by Ben Goertzel, Pei Wang


AI winter, artificial general intelligence, bioinformatics, brain emulation, combinatorial explosion, complexity theory, computer vision, conceptual framework, correlation coefficient, epigenetics, friendly AI, information retrieval, Isaac Newton, John Conway, Loebner Prize, Menlo Park, natural language processing, Occam's razor, p-value, pattern recognition, performance metric, Ray Kurzweil, Rodney Brooks, semantic web, statistical model, strong AI, theory of mind, traveling salesman, Turing machine, Turing test, Von Neumann architecture, Y2K

It also includes the biennial ECAI, the European Conference on Artificial Intelligence, proceedings volumes, and other ECCAI – the European Coordinating Committee on Artificial Intelligence – sponsored publications. An editorial panel of internationally well-known scholars is appointed to provide a high quality selection. Series Editors: J. Breuker, R. Dieng-Kuntz, N. Guarino, J.N. Kok, J. Liu, R. López de Mántaras, R. Mizoguchi, M. Musen and N. Zhong Volume 157 Recently published in this series Vol. 156. R.M. Colomb, Ontology and the Semantic Web Vol. 155. O. Vasilecas et al. (Eds.), Databases and Information Systems IV – Selected Papers from the Seventh International Baltic Conference DB&IS’2006 Vol. 154. M. Duží et al. (Eds.), Information Modelling and Knowledge Bases XVIII Vol. 153. Y. Vogiazou, Design for Emergence – Collaborative Social Play with Online and Location-Based Media Vol. 152. T.M. van Engers (Ed.), Legal Knowledge and Information Systems – JURIX 2006: The Nineteenth Annual Conference Vol. 151.

NARS can be connected to existing knowledge bases, such as Cyc (for commonsense knowledge), WordNet (for linguistic knowledge), Mizar (for mathematical knowledge), and so on. For each of them, a special interface module should be able to approximately translate knowledge from its original format into Narsese.
• The Internet. It is possible for NARS to be equipped with additional modules, which use techniques like semantic web, information retrieval, and data mining, to directly acquire certain knowledge from the Internet, and put them into Narsese.
• Natural language interface. After NARS has learned a natural language (as discussed previously), it should be able to accept knowledge from various sources in that language. Additionally, interactive tutoring will be necessary, which allows a human trainer to monitor the establishing of the knowledge base, to answer questions, to guide the system to form a proper goal structure and priority distributions among its concepts, tasks, and beliefs.

pages: 219 words: 63,495

50 Future Ideas You Really Need to Know by Richard Watson


23andMe, 3D printing, access to a mobile phone, Albert Einstein, artificial general intelligence, augmented reality, autonomous vehicles, BRICs, Buckminster Fuller, call centre, clean water, cloud computing, collaborative consumption, computer age, computer vision, crowdsourcing, dark matter, dematerialisation, digital Maoism, digital map, Elon Musk, energy security, failed state, future of work, Geoffrey West, Santa Fe Institute, germ theory of disease, happiness index / gross national happiness, hive mind, hydrogen economy, Internet of things, Jaron Lanier, life extension, Mark Shuttleworth, Marshall McLuhan, megacity, natural language processing, Network effects, new economy, oil shale / tar sands, pattern recognition, peak oil, personalized medicine, phenotype, precision agriculture, profit maximization, RAND corporation, Ray Kurzweil, RFID, Richard Florida, Search for Extraterrestrial Intelligence, self-driving car, semantic web, Skype, smart cities, smart meter, smart transportation, statistical model, stem cell, Stephen Hawking, Steve Jobs, Steven Pinker, Stewart Brand, strong AI, Stuxnet, supervolcano, telepresence, The Wisdom of Crowds, Thomas Malthus, Turing test, urban decay, Vernor Vinge, Watson beat the top human players on Jeopardy!, web application, women in the workforce, working-age population, young professional

Web 2.0: A term often used to describe Web applications that help individuals to share information online, examples being sites such as Facebook and YouTube. Sometimes referred to as the participatory or conversational Web.
Web 3.0: The next stage of Web development, although the term causes much disagreement. Sometimes refers to the ability of search engines to answer complex questions. It can also refer to the personalized Web, semantic Web or the geo-tagging of information.
Web 4.0: Like Web 3.0 but immersive.

pages: 226 words: 17,533

Programming Scala: tackle multicore complexity on the JVM by Venkat Subramaniam


augmented reality, continuous integration, domain-specific language, don't repeat yourself, loose coupling, semantic web, type inference, web application

You don’t have to throw away the time, money, and effort you’ve invested writing Java code. You can intermix Scala with Java libraries. You can build full applications entirely in Scala or intermix it to the extent you desire with Java and other languages on the JVM. So, your Scala code could be as small as a script or as large as a full-fledged enterprise application. Scala has been used to build applications in various domains including telecommunications, social networking, semantic web, and digital asset management. Apache Camel uses Scala for its DSL to create routing rules. Lift WebFramework is a powerful web development framework built using Scala. It takes full advantage of Scala features such as conciseness, expressiveness, pattern matching, and concurrency.

1.2 What’s Scala?

Scala, short for Scalable Language, is a hybrid functional programming language. It was created by Martin Odersky3 and was first released in 2003.

pages: 193 words: 19,478

Memory Machines: The Evolution of Hypertext by Belinda Barnet


augmented reality, Benoit Mandelbrot, Bill Duvall, British Empire, Buckminster Fuller, Claude Shannon: information theory, collateralized debt obligation, computer age, conceptual framework, Douglas Engelbart, game design, hiring and firing, Howard Rheingold, HyperCard, hypertext link, information retrieval, Internet Archive, John Markoff, linked data, mandelbrot fractal, Marshall McLuhan, Menlo Park, nonsequential writing, Norbert Wiener, publish or perish, Robert Metcalfe, semantic web, Steve Jobs, Stewart Brand, technoutopianism, Ted Nelson, the scientific method, Vannevar Bush, wikimedia commons

At least some of these cultures take part in open-source software development, including some for whom Xanadu is not just a vision in a dream. There are some grounds for hope. However poorly conceived the general infrastructure, however corrupt and benighted the superstructure, the society of networks does support, somewhat obscurely, a plurality of ideas. Even on what ostensibly counts as the ascendant side, there is room for Berners-Lee to envision a Semantic Web that aims to cast some light below our diving boards – and for great institutional innovators such as Wendy Hall of Southampton to extend the affordances of the Web through artful exploitations on the server side. Hypertext takes no single line. The concept itself arises from the idea of extension or complication – writing in a higher-dimensional space – so how could it be confined to one chain of transmission?

pages: 743 words: 201,651

Free Speech: Ten Principles for a Connected World by Timothy Garton Ash


A Declaration of the Independence of Cyberspace, Affordable Care Act / Obamacare, Andrew Keen, Apple II, Ayatollah Khomeini, battle of ideas, Berlin Wall, bitcoin, British Empire, Cass Sunstein, Chelsea Manning, citizen journalism, Clapham omnibus, colonial rule, crowdsourcing, David Attenborough, don't be evil, Donald Davies, Douglas Engelbart, Edward Snowden, Etonian, European colonialism, eurozone crisis, failed state, Fall of the Berlin Wall, Ferguson, Missouri, Filter Bubble, financial independence, Firefox, Galaxy Zoo, George Santayana, global village, index card, Internet Archive, invention of movable type, invention of writing, Jaron Lanier, jimmy wales, John Markoff, Julian Assange, Mark Zuckerberg, Marshall McLuhan, mass immigration, megacity, mutually assured destruction, national security letter, Netflix Prize, Nicholas Carr, obamacare, Peace of Westphalia, Peter Thiel, pre–internet, profit motive, RAND corporation, Ray Kurzweil, Ronald Reagan, semantic web, Silicon Valley, Simon Singh, Snapchat, social graph, Stephen Hawking, Steve Jobs, Steve Wozniak, The Death and Life of Great American Cities, The Wisdom of Crowds, Turing test, We are Anonymous. We are Legion, WikiLeaks, World Values Survey, Yom Kippur War

The richest university in the world, Harvard, called on its scholars to make their work available in open-access journals, saying that its library could no longer afford the $3.5 million annual bill payable to the likes of Elsevier.36 The British government demanded that the results of any publicly funded research should be made freely available to the public and commissioned a report on the best way to cover the editorial, peer-review and production costs of academic publications.37 Tragically, this battle over intellectual property claimed the life of a brilliant young man. Aaron Swartz, an American computing prodigy, co-developed Reddit, an online bulletin board which by 2015 clocked more than 150 million unique monthly visitors viewing more than six billion pages. He was involved in pioneering the widely used RSS web feed, worked with Tim Berners-Lee to improve data sharing through the Semantic Web and with cyberlaw guru Lawrence Lessig on the Creative Commons licences. All this by age 26.38 Swartz believed passionately that data, information and knowledge should be freely accessible to all. So he obtained the book-cataloguing data kept by the Library of Congress, for which it usually charged, and posted it on something called the Open Library. He found his way into 19.9 million pages of electronic records of US court proceedings and uploaded them for all to see. Using his computer skills and his Massachusetts Institute of Technology (MIT) guest access to the JSTOR online library of journal articles, for which most universities pay a hefty fee, he started downloading articles to a laptop hidden in a wiring cupboard at MIT.

., 195 search engine manipulation, 365 ‘search engine optimisation,’ 302 Second Life, 316 secrecy: C/S ratio, 324; guarding the guardians, 334–38; official, 324–27, 332–34, 337–38, 344–45; in wartime, 326; ‘well-placed sources,’ 341–45; whistleblowers and leakers, 339–41 section 295 of Indian/Pakistani penal code, 225, 254, 268, 275 secularism, 261, 265, 267, 273, 277–78, 281 security: executive oversight of, 335; versus freedom, 327–29; judiciary oversight of, 336–37; legislative oversight of, 335–36; national and personal, 321 sedition, 325 seditious libel, 331 Sedley, Stephen, 77, 131 Seinfeld, Jerry, 244 Selassie, Haile, 205 self-broadcasting/-publishing, 56–58 self-restraint, 213 Semantic Web, 164 Semprun, Jorgé, 304 Sen, Amartya, 78, 109, 193–94 Senegal, 243, 277 September 11, 2001 attacks, 64, 273, 322–24 Serbia, 133, 242 Serbo-Croat language, 123, 207 Serrano, Andres, 146 Serres, Michel, 25 sex, speech as, 89, 247–48 Shakarian, Hayastan, 349 Shakespeare, William, 156, 212 Shamikah, 313 Shamsie, Kamila, 90 ‘sharing,’ 166 Sharp, Gene, 148–49 Shaw, George Bernard, 17, 109 Shayegan, Daryush, 98 shield laws, 342 Shils, Edward, 99, 208–9 Shotoku (Prince), 109 Shrimsley, Robert, 142 Shteyngart, Gary, 13, 16 ‘Shunga’ art exhibition, 246–47 Sikhs, 131, 253, 262, 274 Siliconese, 50 Simone, Nina, 74, 78, 119, 212 Simpson, O.

pages: 336 words: 90,749

How to Fix Copyright by William Patry


A Declaration of the Independence of Cyberspace, barriers to entry, big-box store, borderless world, business intelligence, citizen journalism, cloud computing, commoditize, creative destruction, crowdsourcing, death of newspapers, facts on the ground, Frederick Winslow Taylor, George Akerlof, Gordon Gekko, haute cuisine, informal economy, invisible hand, Joseph Schumpeter, Kickstarter, knowledge economy, lone genius, means of production, moral panic, new economy, road to serfdom, Ronald Coase, Ronald Reagan, semantic web, shareholder value, Silicon Valley, The Chicago School, The Wealth of Nations by Adam Smith, trade route, transaction costs, trickle-down economics, web application, winner-take-all economy, zero-sum game

To make copyright work on a technical system (like the Internet) you’d need to look at the system itself for the means to implement the law. 6. I have no idea. 7. Advances in technologies create problems that can only be solved by further advances in those same technologies. 8. The answer to beating a machine (say, at chess) is understanding how it works. 9. The semantic web is the answer to all potential problems of access, control, and copying online. In other words, the proliferation of metadata standards will solve the “problem” of the existing behavior of computers, and in particular, search engines. 10. The challenge to copyright that the machine has always posed historically—shifting production cost and thus power—can be met by building a response into the same machine.

pages: 387 words: 105,250

The Caryatids by Bruce Sterling


carbon footprint, clean water, failed state, impulse control, negative equity, new economy, nuclear winter, semantic web, sexual politics, social software, stem cell, supervolcano, urban renewal, Whole Earth Review

The results arrived in a blistering deluge of search hits. The results were ugly. They had hit on a subject that knowledgeable experts had been discussing for a hundred years. The most heavily trafficked tag was the strange coinage “Supervolcano.” Supervolcanoes had been a topic of mild intellectual interest for many years. Recently, people had talked much less about supervolcanoes, and with more pejoratives in their semantics. Web-semantic traffic showed that people were actively shunning the subject of supervolcanoes. That scientific news seemed to be rubbing people the wrong way. “So,” said Guillermo at last, “according to our best sources here, there are some giant … and I mean really giant magma plumes rising up and chewing at the West Coast of North America. Do we have a Family consensus about that issue?” Raph still wasn’t buying it.

pages: 294 words: 81,292

Our Final Invention: Artificial Intelligence and the End of the Human Era by James Barrat


3D printing, AI winter, Amazon Web Services, artificial general intelligence, Asilomar, Automated Insights, Bayesian statistics, Bernie Madoff, Bill Joy: nanobots, brain emulation, cellular automata, Chuck Templeton: OpenTable, cloud computing, cognitive bias, commoditize, computer vision, cuban missile crisis, Daniel Kahneman / Amos Tversky, Danny Hillis, data acquisition, don't be evil, drone strike, Extropian, finite state, Flash crash, friendly AI, friendly fire, Google Glasses, Google X / Alphabet X, Isaac Newton, Jaron Lanier, John Markoff, John von Neumann, Kevin Kelly, Law of Accelerating Returns, life extension, Loebner Prize, lone genius, mutually assured destruction, natural language processing, Nicholas Carr, optical character recognition, PageRank, pattern recognition, Peter Thiel, prisoner's dilemma, Ray Kurzweil, Rodney Brooks, Search for Extraterrestrial Intelligence, self-driving car, semantic web, Silicon Valley, Singularitarianism, Skype, smart grid, speech recognition, statistical model, stealth mode startup, stem cell, Stephen Hawking, Steve Jobs, Steve Wozniak, strong AI, Stuxnet, superintelligent machines, technological singularity, The Coming Technological Singularity, Thomas Bayes, traveling salesman, Turing machine, Turing test, Vernor Vinge, Watson beat the top human players on Jeopardy!, zero day

But since AIXI is uncomputable, it would never be a candidate for an intelligence explosion anyway. AIXItl—a computable approximation of AIXI—is another matter. This is also probably not true of mind uploading, if such a thing ever comes to pass. Computer science-based researchers want to engineer AGI: The mind versus brain debate is too large to address here. with $50 million in grants: Lenat, Doug, “Doug Lenat on Cyc, a truly semantic Web, and artificial intelligence (AI),” developerWorks, September 16, 2008, (accessed September 28, 2011). Carnegie Mellon University’s NELL: Lohr, Steve, “Aiming to Learn as We Do, a Machine Teaches Itself,” New York Times, sec. science, October 4, 2010, (accessed September 28, 2011).

pages: 407 words: 103,501

The Digital Divide: Arguments for and Against Facebook, Google, Texting, and the Age of Social Networking by Mark Bauerlein


Amazon Mechanical Turk, Andrew Keen, centre right, citizen journalism, collaborative editing, computer age, computer vision, corporate governance, crowdsourcing, David Brooks, disintermediation, Frederick Winslow Taylor, Howard Rheingold, invention of movable type, invention of the steam engine, invention of the telephone, Jaron Lanier, Jeff Bezos, jimmy wales, Kevin Kelly, knowledge worker, late fees, Mark Zuckerberg, Marshall McLuhan, means of production, meta analysis, meta-analysis, moral panic, Network effects, new economy, Nicholas Carr, PageRank, peer-to-peer, Results Only Work Environment, Saturday Night Live, search engine result page, semantic web, Silicon Valley, slashdot, social graph, social web, software as a service, speech recognition, Steve Jobs, Stewart Brand, technology bubble, Ted Nelson, The Wisdom of Crowds, Thorstein Veblen, web application

Hence our theme for this year: Web Squared. 1990–2004 was the match being struck; 2005–2009 was the fuse; and 2010 will be the explosion. Ever since we first introduced the term “Web 2.0,” people have been asking, “What’s next?” Assuming that Web 2.0 was meant to be a kind of software version number (rather than a statement about the second coming of the Web after the dot-com bust), we’re constantly asked about “Web 3.0.” Is it the semantic web? The sentient web? Is it the social web? The mobile web? Is it some form of virtual reality? It is all of those, and more. The Web is no longer a collection of static pages of HTML that describe something in the world. Increasingly, the Web is the world—everything and everyone in the world casts an “information shadow,” an aura of data which, when captured and processed intelligently, offers extraordinary opportunity and mind-bending implications.

pages: 1,038 words: 137,468

JavaScript Cookbook by Shelley Powers


Firefox, Google Chrome, hypertext link, p-value, semantic web, web application, WebSocket

operator, 120 test method (RegExp), 24 U testing code with JsUnit, 392–396 text elements (forms), 162 undefined array elements, 70 text input (forms), accessing, 159–161 undefined data type, 11 text results to Ajax requests, processing, 422 Unicode sequences, 16 text value (aria-relevant attribute), 324 unit testing, 393 textareas universal selector (*), 232 events for, 162 unload events, 115 lines in, processing, 16–17 warnings when leaving pages, 147 observing character input for, 129–132 unordered lists, applying striping theme to, textInput events, 130 230–231 TextRectangle objects, 272 uppercase (see case) this context, 163 URIError errors, 185 this keyword, 360, 383–385 URLs, adding persistent information to, 458– keeping object members private, 361–362 461 throw statements, 184 user error, about, 177 throwing exceptions, 184 user input, form (see forms) Thunderbird extensions, building, 486 user input, validating (see validating) time (see date and time; tiers) userAgent property (Navigator), 146 timed page updates, 427–430 UTC date and time, printing, 42–43 timeouts, 49–50 UTCString method (Date), 42 timerEvent function, 428 timers, 41 V function closures with, 52–53 validating incremental counters in code, 57–58 array contents, 86–87 recurring, 50–51 checking for function errors, 180–181 triggering timeouts, 49–50 with forms title elements, 211 based on format, 166–167 today’s date, printing, 41–42 canceling invalid data submission, 167– toISOString method (Date), 44 168 toLowerCase method (String), 5 dynamic selection lists, 173–176 tools, extending with JavaScript, 496–499 preventing multiple submissions, 169– top property (bounding rectangle), 272, 273 171 toString method, 1, 59 function arguments (input), 95 touch swiping events, 117 highlighting invalid form fields, 302–307 toUpperCase method (String), 5 with jQuery Validation plug-in, 403 tr elements social security numbers, 26–28 adding to tables, 257–260 value attribute (objects), 370 Index | 527 
valueOf method, 11, 12 writable attribute (objects), 370 variable values, checking, 181–182 vendor property (Navigator), 146 X video (see rich media) video elements, 326, 353–357 X3D, 326 visibility property (CSS), 172, 276 XML documents VoiceOver screen reader, 297 extracting pertinent information from, 437–442 W processing, 436–437 XMLHttpRequest objects \w in regular expressions, 23 accessing, 414–415 \W in regular expressions, 23 adding callback functions to, 420–421 warn function (JsUnit), 394 checking for error conditions, 421 Watch panel (Firebug), 190 making requests to other domains, 422– Web Inspector (Safari), 203 424 web page elements (see elements) XScriptContext objects, 499 web page space (see page space) web pages (see document elements; pages) web-safe colors, 148 Web Sockets API, 413, 429 Web Workers, 500–509 WebGL (Web Graphics Library), 326, 350–351 WebKit (Google) debugging with, 208–209 WebGL support in, 350–351 .wgt files, 493 while loop, iterating through arrays with, 71 whitespace, 269 (see also page space) matching in regular expressions, 23 nonbreaking space character, 19 trimming from form data, 162 trimming from strings, 17–19 using regular expressions, 35–36 widgets, creating, 489–494 width (see size) width attribute (canvas element), 327 width property (bounding rectangle), 272 width property (Screen), 149 window area, measuring, 270–271 window elements, 143 creating new stripped-down, 144–145 open method, 145 window space (see page space) windows, communicating across, 430–434 Windows-Eyes, 297 words, 32 (see also strings) swapping order of, 32–34

About the Author

Shelley Powers has been working with and writing about web technologies—from the first release of JavaScript to the latest graphics and design tools—for more than 15 years. Her recent O’Reilly books have covered the semantic web, Ajax, JavaScript, and web graphics. She’s an avid amateur photographer and web development aficionado.
Colophon

The animal on the cover of JavaScript Cookbook is a little (or lesser) egret (Egretta garzetta). A small white heron, it is the old world counterpart to the very similar new world snowy egret. It is the smallest and most common egret in Singapore, and its original breeding distribution included the large inland and coastal wetlands in warm temperate parts of Europe, Asia, Africa, Taiwan, and Australia.

pages: 413 words: 119,587

Machines of Loving Grace: The Quest for Common Ground Between Humans and Robots by John Markoff


A Declaration of the Independence of Cyberspace, AI winter, airport security, Apple II, artificial general intelligence, Asilomar, augmented reality, autonomous vehicles, basic income, Baxter: Rethink Robotics, Bill Duvall, bioinformatics, Brewster Kahle, Burning Man, call centre, cellular automata, Chris Urmson, Claude Shannon: information theory, Clayton Christensen, clean water, cloud computing, collective bargaining, computer age, computer vision, crowdsourcing, Danny Hillis, DARPA: Urban Challenge, data acquisition, Dean Kamen, deskilling, don't be evil, Douglas Engelbart, Douglas Hofstadter, Dynabook, Edward Snowden, Elon Musk, Erik Brynjolfsson, factory automation, From Mathematics to the Technologies of Life and Death, future of work, Galaxy Zoo, Google Glasses, Google X / Alphabet X, Grace Hopper, Gunnar Myrdal, Gödel, Escher, Bach, Hacker Ethic, haute couture, hive mind, hypertext link, indoor plumbing, industrial robot, information retrieval, Internet Archive, Internet of things, invention of the wheel, Jacques de Vaucanson, Jaron Lanier, Jeff Bezos, job automation, John Conway, John Markoff, John Maynard Keynes: Economic Possibilities for our Grandchildren, John Maynard Keynes: technological unemployment, John von Neumann, Kevin Kelly, knowledge worker, Kodak vs Instagram, labor-force participation, loose coupling, Marc Andreessen, Mark Zuckerberg, Marshall McLuhan, medical residency, Menlo Park, Mother of all demos, natural language processing, new economy, Norbert Wiener, PageRank, pattern recognition, pre–internet, RAND corporation, Ray Kurzweil, Richard Stallman, Robert Gordon, Rodney Brooks, Sand Hill Road, Second Machine Age, self-driving car, semantic web, shareholder value, side project, Silicon Valley, Silicon Valley startup, Singularitarianism, skunkworks, Skype, social software, speech recognition, stealth mode startup, Stephen Hawking, Steve Ballmer, Steve Jobs, Steve Wozniak, Steven Levy, Stewart Brand, strong AI, superintelligent machines, technological singularity, Ted Nelson, telemarketer, telepresence, telepresence robot, Tenerife airport disaster, The Coming Technological Singularity, the medium is the message, Thorstein Veblen, Turing test, Vannevar Bush, Vernor Vinge, Watson beat the top human players on Jeopardy!, Whole Earth Catalog, William Shockley: the traitorous eight, zero-sum game

Critics, such as Ben Shneiderman, insisted that software assistants were both technically and ethically flawed. They argued for keeping human users in direct control rather than handing off decisions to a software valet. The Siri team did not shy away from the controversy, and it wasn’t long before they pulled back the curtain on their project, just a bit. By late spring 2009, Gruber was speaking obliquely about the new technology. During the summer of that year he appeared at a Semantic Web conference and described, point by point, how the futuristic technologies in the Knowledge Navigator were becoming a reality: there were now touch screens that enabled so-called gestural interfaces, there was a global network for information sharing and collaboration, developers were coding programs that interacted with humans, and engineers had started to finesse natural and continuous speech recognition.

pages: 373 words: 112,822

The Upstarts: How Uber, Airbnb, and the Killer Companies of the New Silicon Valley Are Changing the World by Brad Stone

Affordable Care Act / Obamacare, Airbnb, AltaVista, Amazon Web Services, Andy Kessler, autonomous vehicles, Burning Man, call centre, Chuck Templeton: OpenTable, collaborative consumption, East Village, fixed income, Google X / Alphabet X, housing crisis, inflight wifi, Jeff Bezos, Kickstarter, Lyft, Marc Andreessen, Mark Zuckerberg, Menlo Park, Necker cube, obamacare, Paul Graham, peer-to-peer, Peter Thiel, race to the bottom, rent control, ride hailing / ride sharing, Ruby on Rails, Sand Hill Road, self-driving car, semantic web, sharing economy, side project, Silicon Valley, Silicon Valley startup, Skype, South of Market, San Francisco, Startup school, Steve Jobs, TaskRabbit, Tony Hsieh, transportation-network company, Uber and Lyft, Uber for X, Y Combinator, Y2K, Zipcar

He got his undergraduate degree in 2001 and stayed at the university to pursue a master of science, finally leaving his comfortable nest after he turned twenty-two to move into a campus apartment with classmates. Camp met Geoff Smith, who would become his StumbleUpon co-founder, through one of his childhood friends and together they started the site as a way for users to share and find interesting things on the internet without having to search for them on Google. Camp was obsessed with collaborative information systems and the semantic web. He didn’t go out much back then, splitting his time between his graduate thesis and the company and immersing himself in dense academic papers about esoteric topics in computer science. By the time Camp finished his degree in 2005, StumbleUpon was starting to show promise. Camp and Smith met an angel investor that year who convinced them to move to San Francisco and raise capital. They incorporated the company in the United States, and over the next year, the number of users on StumbleUpon grew from five hundred thousand to two million.

The Singularity Is Near: When Humans Transcend Biology by Ray Kurzweil


additive manufacturing, AI winter, Alan Turing: On Computable Numbers, with an Application to the Entscheidungsproblem, Albert Einstein, anthropic principle, Any sufficiently advanced technology is indistinguishable from magic, artificial general intelligence, Asilomar, augmented reality, autonomous vehicles, Benoit Mandelbrot, Bill Joy: nanobots, bioinformatics, brain emulation, Brewster Kahle, Brownian motion, business intelligence, call centre, carbon-based life, cellular automata, Claude Shannon: information theory, complexity theory, conceptual framework, Conway's Game of Life, cosmological constant, cosmological principle, cuban missile crisis, data acquisition, Dava Sobel, David Brooks, Dean Kamen, disintermediation, double helix, Douglas Hofstadter, epigenetics, factory automation, friendly AI, George Gilder, Gödel, Escher, Bach, informal economy, information retrieval, invention of the telephone, invention of the telescope, invention of writing, Isaac Newton, iterative process, Jaron Lanier, Jeff Bezos, job automation, job satisfaction, John von Neumann, Kevin Kelly, Law of Accelerating Returns, life extension, lifelogging, linked data, Loebner Prize, Louis Pasteur, mandelbrot fractal, Mikhail Gorbachev, mouse model, Murray Gell-Mann, mutually assured destruction, natural language processing, Network effects, new economy, Norbert Wiener, oil shale / tar sands, optical character recognition, pattern recognition, phenotype, premature optimization, randomized controlled trial, Ray Kurzweil, remote working, reversible computing, Richard Feynman, Robert Metcalfe, Rodney Brooks, Search for Extraterrestrial Intelligence, selection bias, semantic web, Silicon Valley, Singularitarianism, speech recognition, statistical model, stem cell, Stephen Hawking, Stewart Brand, strong AI, superintelligent machines, technological singularity, Ted Kaczynski, telepresence, The Coming Technological Singularity, Thomas Bayes, transaction costs, Turing machine,
Turing test, Vernor Vinge, Y2K, Yogi Berra

Wikipedia, 10. See note 57 in chapter 2 for an analysis of the information content in the genome, which I estimate to be 30 to 100 million bytes, therefore less than 10^9 bits. See the section "Human Memory Capacity" in chapter 3 (p. 126) for my analysis of the information in a human brain, estimated at 10^18 bits. 11. Marie Gustafsson and Christian Balkenius, "Using Semantic Web Techniques for Validation of Cognitive Models against Neuroscientific Data," AILS04 Workshop, SAIS/SSLS Workshop (Swedish Artificial Intelligence Society; Swedish Society for Learning Systems), April 15–16, 2004, Lund, Sweden, 12. See discussion in chapter 3. In one useful reference, when modeling neuron by neuron, Tomaso Poggio and Christof Koch describe the neuron as similar to a chip with thousands of logical gates.

pages: 999 words: 194,942

Clojure Programming by Chas Emerick, Brian Carper, Christophe Grand


Amazon Web Services, Benoit Mandelbrot, cloud computing, continuous integration, database schema, domain-specific language, don't repeat yourself, failed state, finite state, Firefox, game design, general-purpose programming language, Guido van Rossum, Larry Wall, mandelbrot fractal, Paul Graham, platform as a service, premature optimization, random walk, Ruby on Rails, Schrödinger's Cat, semantic web, software as a service, sorting algorithm, Turing complete, type inference, web application

* * * [177] Note that metadata on keys of &env can’t be relied upon, in particular in the presence of local aliases. [178] See Testing Contextual Macros for our stab at an alternative macroexpansion function that does support this without the var-dereferencing line noise. [179] Or, returned by a previous expansion. [180] Triples are a term for subject-predicate-object expressions, as found in semantic web technologies like RDF. Specific representations and semantics of triples vary from implementation to implementation, but a simplified example of a vector triple might be ["Boston" :capital-of "Massachusetts"]. [181] refer is described in “refer”, and is also reused by use, described later in that chapter. [182] We describe type hints in Type Hinting for Performance. [183] This is due to an unfortunate implementation detail: special forms (like let, the outermost form in the expression returned by or) cannot be hinted.
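The subject-predicate-object idea in footnote 180 can be sketched in a few lines. This is only an illustration, written in Python rather than Clojure: the `Triple` type, the `match` helper, and every triple other than the Boston one from the footnote are hypothetical names and data chosen for the example, not part of the book or of any RDF library.

```python
from collections import namedtuple

# Hypothetical in-memory representation of a triple (not an RDF library type).
Triple = namedtuple("Triple", ["subject", "predicate", "obj"])

# Illustrative data; only the first triple comes from the footnote.
triples = [
    Triple("Boston", "capital-of", "Massachusetts"),
    Triple("Albany", "capital-of", "New York"),
    Triple("Boston", "located-in", "Massachusetts"),
]


def match(store, subject=None, predicate=None, obj=None):
    """Return the triples in store that fit the pattern; None is a wildcard."""
    return [
        t for t in store
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


if __name__ == "__main__":
    # "What is the capital of Massachusetts?" expressed as a triple pattern.
    for t in match(triples, predicate="capital-of", obj="Massachusetts"):
        print(t.subject)
```

Leaving a position as `None` plays roughly the role that a variable plays in a query pattern over triples: any value is accepted in that slot.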

Data Mining: Concepts and Techniques by Jiawei Han, Micheline Kamber, Jian Pei


bioinformatics, business intelligence, business process, Claude Shannon: information theory, cloud computing, computer vision, correlation coefficient, cyber-physical system, database schema, discrete time, distributed generation, finite state, information retrieval, iterative process, knowledge worker, linked data, natural language processing, Netflix Prize, Occam's razor, pattern recognition, performance metric, phenotype, random walk, recommendation engine, RFID, semantic web, sentiment analysis, speech recognition, statistical model, stochastic process, supply-chain management, text mining, thinkpad, Thomas Bayes, web application

see Structural Clustering Algorithm for Networks core vertex 531 illustrated 532 scatter plots 54 2-D data set visualization with 59 3-D data set visualization with 60 correlations between attributes 54–56 illustrated 55 matrix 56, 59 schemas integration 94 snowflake 140–141 star 139–140 science applications 611–613 search engines 28 search space pruning 263, 301 second guess heuristic 369 selection dimensions 225 self-training 432 semantic annotations applications 317, 313, 320–321 with context modeling 316 from DBLP data set 316–317 effectiveness 317 example 314–315 of frequent patterns 313–317 mutual information 315–316 task definition 315 Semantic Web 597 semi-offline materialization 226 semi-supervised classification 432–433, 437 alternative approaches 433 cotraining 432–433 self-training 432 semi-supervised learning 25 outlier detection by 572 semi-supervised outlier detection 551 sensitivity analysis 408 sensitivity measure 367 sentiment classification 434 sequence data analysis 319 sequences 586 alignment 590 biological 586, 590–591 classification of 589–590 similarity searches 587 symbolic 586, 588–590 time-series 586, 587–588 sequential covering algorithm 359 general-to-specific search 360 greedy search 361 illustrated 359 rule induction with 359–361 sequential pattern mining 589 constraint-based 589 in symbolic sequences 588–589 shapelets method 590 shared dimensions 204 pruning 205 shared-sorts 193 shared-partitions 193 shell cubes 160 shell fragments 192, 235 approach 211–212 computation algorithm 212, 213 computation example 214–215 precomputing 210 shrinking diameter 592 sigmoid function 402 signature-based detection 614 significance levels 373 significance measure 312 significance tests 372–373, 386 silhouette coefficient 489–490 similarity asymmetric binary 71 cosine 77–78 measuring 65–78, 79 nominal attributes 70 similarity measures 447–448, 525–528 constraints on 533 geodesic distance 525–526 SimRank 526–528 similarity searches 587 in information
networks 594 in multimedia data mining 596 simple random sample with replacement (SRSWR) 108 simple random sample without replacement (SRSWOR) 108 SimRank 526–528, 539 computation 527–528 random walk 526–528 structural context 528 simultaneous aggregation 195 single-dimensional association rules 17, 287 single-linkage algorithm 460, 461 singular value decomposition (SVD) 587 skewed data balanced 271 negatively 47 positively 47 wavelet transforms on 102 slice operation 148 small-world phenomenon 592 smoothing 112 by bin boundaries 89 by bin means 89 by bin medians 89 for data discretization 90 snowflake schema 140 example 141 illustrated 141 star schema versus 140 social networks 524–525, 526–528 densification power law 592 evolution of 594 mining 623 small-world phenomenon 592 see also networks social science/social studies data mining 613 soft clustering 501 soft constraints 534, 539 example 534 handling 536–537 space-filling curve 58 sparse data 102 sparse data cubes 190 sparsest cuts 539 sparsity coefficient 579 spatial data 14 spatial data mining 595 spatiotemporal data analysis 319 spatiotemporal data mining 595, 623–624 specialized SQL servers 165 specificity measure 367 spectral clustering 520–522, 539 effectiveness 522 framework 521 steps 520–522 speech recognition 430 speed, classification 369 spiral method 152 split-point 333, 340, 342 splitting attributes 333 splitting criterion 333, 342 splitting rules.