Wikipedia Find Link

In 2008 I built a tool called 'find link' for adding links between Wikipedia articles. Briefly, an editor enters the title of an article into find link, and the software uses the Wikipedia search API to find mentions of that topic in other articles. Articles whose mentions already link back to the original are filtered out.
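As a rough illustration of the filtering step, here is a minimal Python sketch. It assumes the search results have already been fetched from the search API as (title, wikitext) pairs. The function names `link_regex`, `mentions` and `candidates` are hypothetical, and this is not the code from the find_link repository. It only recognises plain `[[...]]` links, not links made through redirects or templates.

```python
from __future__ import annotations

import re


def link_regex(title: str) -> re.Pattern:
    """Compile a regex matching a wikilink to `title`.

    Matches [[Cantilever bridge]], [[cantilever_bridge|...]] and
    [[Cantilever bridge#History]]. MediaWiki ignores the case of a title's
    first letter and treats spaces and underscores as equivalent.
    """
    words = re.split(r"[ _]+", title.strip())
    first = "(?i:" + re.escape(words[0][0]) + ")"  # case-insensitive first letter
    rest = re.escape(words[0][1:]) + "".join("[ _]+" + re.escape(w) for w in words[1:])
    return re.compile(r"\[\[\s*" + first + rest + r"\s*(?:\||#|\]\])")


def mentions(wikitext: str, title: str) -> bool:
    """True if the title's words appear in the text, ignoring case."""
    words = re.split(r"[ _]+", title.strip())
    pattern = r"\b" + r"\s+".join(re.escape(w) for w in words) + r"\b"
    return re.search(pattern, wikitext, re.IGNORECASE) is not None


def candidates(target: str, results: list[tuple[str, str]]) -> list[str]:
    """Titles of search results that mention `target` but don't link to it."""
    link = link_regex(target)
    return [
        title
        for title, text in results
        if title != target and mentions(text, target) and not link.search(text)
    ]


results = [
    ("Forth Bridge", "The Forth Bridge is a cantilever bridge in Scotland."),
    ("Quebec Bridge", "A [[Cantilever bridge|cantilever]] design."),
    ("Tower Bridge", "A combined bascule and suspension bridge."),
]
print(candidates("Cantilever bridge", results))  # ['Forth Bridge']
```

Only the Forth Bridge article is kept: the Quebec Bridge text already links back, and the Tower Bridge text never mentions the topic.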

Each of the remaining results includes a link to a Wikipedia edit preview with the change applied; the editor can inspect the diff, then hit save to add the link. The find link tool is linked from the Orphan box template on English Wikipedia. I've used it myself to add 47,800 links to Wikipedia, and in total I estimate that 562,000 links have been added to Wikipedia using find link.
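The edit itself comes down to wrapping a mention in double square brackets. The sketch below uses a hypothetical `add_link` function to show one way of doing that while respecting MediaWiki's rule that the first letter of a link target is case-insensitive. It naively takes the first mention, even one inside a template or file name, which is one reason a human checking the diff before saving matters.

```python
import re


def add_link(wikitext: str, title: str) -> str:
    """Wrap the first plain-text mention of `title` in a wikilink (sketch)."""
    pattern = r"\b" + r"\s+".join(re.escape(w) for w in title.split()) + r"\b"
    match = re.search(pattern, wikitext, re.IGNORECASE)
    if match is None:
        return wikitext  # nothing to link
    found = match.group(0)
    # [[cantilever bridge]] already points at "Cantilever bridge" because the
    # first letter is case-insensitive; any other difference needs a piped link.
    if found[1:] == title[1:] and found[:1].lower() == title[:1].lower():
        link = f"[[{found}]]"
    else:
        link = f"[[{title}|{found}]]"
    return wikitext[: match.start()] + link + wikitext[match.end():]


print(add_link("The Forth Bridge is a cantilever bridge in Scotland.", "Cantilever bridge"))
# The Forth Bridge is a [[cantilever bridge]] in Scotland.
```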

Example: http://edwardbetts.com/find_link/cantilever_bridge

Source code: https://github.com/EdwardBetts/find_link

Adding Wikidata identifiers to OpenStreetMap

I'm involved with two Wikipedia-style collaborative projects that contain large amounts of geographic data. Wikidata is a knowledge base operated by the Wikimedia Foundation. OpenStreetMap (OSM) is a project that aims to build a free, editable map of the world. It would be useful if the places and geographic entities in the two systems were linked together. OSM editors have added 32,500 Wikidata identifiers to objects in OSM by hand; I've written software to automate the process.

The software starts by looking for Wikidata items within a set of categories, skipping any item without geographic information. For each remaining item it searches OSM for nearby entities of the same type and compares their names. The name matching is complex because both systems support names in multiple languages, so the names are normalized to improve the chances of finding a positive match. If the location, entity type and names all agree, the system considers it a match.
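Here is a minimal sketch of that matching rule. It assumes each side has already been reduced to a hypothetical `Place` record with coordinates, an entity type shared by both systems, and a set of names. "Same coordinates" is approximated as "within `max_distance_m` metres" (500 m is an arbitrary choice). The normalization is a guess at the kind of folding involved; the real osm-wikidata code handles many more cases.

```python
from __future__ import annotations

import math
import re
import unicodedata
from dataclasses import dataclass


@dataclass
class Place:
    """Hypothetical simplified record for a Wikidata item or an OSM object."""
    lat: float
    lon: float
    kind: str              # entity type, assumed already mapped between the systems
    names: dict[str, str]  # language code or OSM name tag -> name


def normalize(name: str) -> str:
    """Fold case and strip accents and punctuation: "St. Mary's" -> "st marys"."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    no_punct = re.sub(r"[^\w\s]", "", no_accents.casefold())
    return " ".join(no_punct.split())


def distance_m(a: Place, b: Place) -> float:
    """Great-circle (haversine) distance in metres."""
    earth_radius_m = 6_371_000
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlambda = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * earth_radius_m * math.asin(math.sqrt(h))


def is_match(item: Place, osm: Place, max_distance_m: float = 500.0) -> bool:
    """Same type, close together, and at least one normalized name in common."""
    if item.kind != osm.kind or distance_m(item, osm) > max_distance_m:
        return False
    item_names = {normalize(n) for n in item.names.values()}
    osm_names = {normalize(n) for n in osm.names.values()}
    return not item_names.isdisjoint(osm_names)


wikidata_item = Place(48.8542, 2.3326, "cafe", {"en": "Café de Flore", "fr": "Café de Flore"})
osm_object = Place(48.8543, 2.3327, "cafe", {"name": "Cafe de Flore"})
print(is_match(wikidata_item, osm_object))  # True
```

Comparing sets of normalized names means a match in any one language is enough, which is one simple way to cope with both systems storing names in several languages.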

Using this method I've been able to find 226,919 matches.

The latest results: http://edwardbetts.com/osm-wikidata/results/

Source code: https://github.com/EdwardBetts/osm-wikidata

There are still a few bugs that need to be resolved before I upload my results to OpenStreetMap. There are at least 34 false positives in the results that need to be investigated.

Open Library work finder

As part of my work as Open Library data munger I built a system for identifying bibliographic works. When we started Open Library, it contained database records for authors and editions. For popular authors we had over 100 editions of the same work, which made the system difficult to browse or search.

The work finder analyzes all the editions written by a given author and groups them into works. Library catalog records of translated books have the translated title in the title field but include the original, untranslated title in another field. I was able to use this information to produce work records that included translations. Matching the titles was similar to the OSM and Wikidata name matching.
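The grouping idea can be sketched in a few lines of Python. This assumes each edition is a dict with a `title` and, for translations, an `original_title` key; both are hypothetical names standing in for the catalog fields. Editions that share a normalized title key end up in the same work. The leading-article rule is also an assumption, and this is not the code in find_works.py.

```python
from __future__ import annotations

import re
import unicodedata
from collections import defaultdict


def normalize_title(title: str) -> str:
    """Fold case, accents and punctuation, and drop a leading English article."""
    decomposed = unicodedata.normalize("NFKD", title)
    text = "".join(c for c in decomposed if not unicodedata.combining(c)).casefold()
    words = re.sub(r"[^\w\s]", "", text).split()
    if words and words[0] in {"the", "a", "an"}:  # hypothetical rule
        words = words[1:]
    return " ".join(words)


def group_into_works(editions: list[dict]) -> dict[str, list[dict]]:
    """Group one author's editions into works keyed by normalized title.

    A translation is filed under its original title when the record has one,
    so it joins the same work as the untranslated editions.
    """
    works: dict[str, list[dict]] = defaultdict(list)
    for edition in editions:
        key_title = edition.get("original_title") or edition["title"]
        works[normalize_title(key_title)].append(edition)
    return dict(works)


editions = [
    {"title": "Crime and Punishment", "original_title": "Prestuplenie i nakazanie"},
    {"title": "Crime and punishment.", "original_title": "Prestuplenie i nakazanie"},
    {"title": "Schuld und Sühne", "original_title": "Prestuplenie i nakazanie"},
    {"title": "Prestuplenie i nakazanie"},
    {"title": "The Idiot"},
    {"title": "Idiot"},
]
works = group_into_works(editions)
print({key: len(group) for key, group in works.items()})
# {'prestuplenie i nakazanie': 4, 'idiot': 2}
```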

Source code: https://github.com/internetarchive/openlibrary/blob/master/openlibrary/catalog/works/find_works.py

I was hired by the Internet Archive to work on Open Library with the job title Data Munger. I loaded data about books from many sources, including libraries, publishers and booksellers. Each of these data sources used a different format to describe a book; I munged this data into a standard format. Our book database contained records describing authors and editions. When