- New solr
- http://pi.us.archive.org/control/pi.php
- http://ia600307.us.archive.org/~edward/marc.php?path=/35/items/flatlandromanceo00abbouoft&identifier=flatlandromanceo00abbouoft
Import
Merging needs four indexes: three book identifiers (ISBN, OCLC, LCCN) and, as a fallback, the normalised title truncated to 25 characters. These indexes are implemented using dbmhash. The problem with this is that only one process can read and write the indexes at a time. The solution is to build an import server that handles all reading and writing of the indexes. The most obvious way of implementing interprocess communication is HTTP, so the import server will be written using web.py.
The HTTP method GET is used for searching the indexes, and POST for adding a new record to the database and updating the indexes.
The import server handles updating the database to avoid conflicts when generating new keys for authors and editions.
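The single-writer idea described above can be sketched as follows. This is only an illustration, not the import server itself: in-memory dicts stand in for the dbmhash files, and the class name `MergeIndexes`, the function `title_key`, and the exact normalisation rule (lowercase, punctuation to spaces) are assumptions. Only the 25-character truncation comes from these notes.

```python
import re
import threading
from collections import defaultdict

INDEX_NAMES = ("isbn", "oclc", "lccn", "title")
TITLE_KEY_LENGTH = 25  # truncation length stated in the notes


def title_key(title):
    """Assumed normalisation: lowercase, punctuation to spaces, then truncate."""
    cleaned = re.sub(r"[\W_]+", " ", title.lower()).strip()
    return cleaned[:TITLE_KEY_LENGTH]


class MergeIndexes:
    """Hypothetical in-memory stand-in for the four on-disk indexes."""

    def __init__(self):
        # One lock serialises all access, mirroring the one-process restriction.
        self._lock = threading.Lock()
        self._indexes = {name: defaultdict(set) for name in INDEX_NAMES}
        self._next_id = 1

    def search(self, fields):
        """GET side: for each index, collect the IDs matching any given value."""
        pool = {}
        with self._lock:
            for name, values in fields.items():
                found = set()
                for value in values:
                    found |= self._indexes[name].get(value, set())
                if found:
                    pool[name] = sorted(found)
        return pool

    def add(self, fields):
        """POST side: allocate a new ID and update the indexes under one lock."""
        with self._lock:
            record_id = self._next_id
            self._next_id += 1
            for name, values in fields.items():
                for value in values:
                    self._indexes[name][value].add(record_id)
        return record_id
```

Because ID allocation and index updates happen under the same lock, two concurrent imports cannot be given the same new key.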
Index fields are passed as GET parameters; lists are joined with '_'. The response is in JSON. For example:
For the fields: {'isbn': ['0415045568', '0391025511'], 'title': ['phenomenology of percepti']}
The URL is: http://wiki-beta.us.archive.org:9020/?isbn=0415045568_0391025511&title=phenomenology+of+percepti
And the response is:
{"fields": {"isbn": ["0415045568"], "title": ["phenomenology of percepti"]}, "pool": {"isbn": [1366447, 10187591], "title": [1366447, 8071041, 10146455, 10187591, 10198188, 10198270, 10568028, 13557619, 13620735, 17343673]}}
The numbers in the response are database IDs.
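A client for the example above might look like the following sketch. The function names are hypothetical. Taking the IDs that appear in every pool is just one way a caller could narrow the candidates; these notes do not say how the merge code actually uses the pool.

```python
import json
from urllib.parse import urlencode

BASE_URL = "http://wiki-beta.us.archive.org:9020/"


def build_search_url(fields, base_url=BASE_URL):
    """Join each list of values with '_' and encode them as GET parameters."""
    params = {name: "_".join(values) for name, values in fields.items()}
    return base_url + "?" + urlencode(params)


def candidates_in_every_pool(response_text):
    """Return the database IDs that occur in every index pool of a response."""
    pool = json.loads(response_text)["pool"]
    id_sets = [set(ids) for ids in pool.values()]
    return sorted(set.intersection(*id_sets)) if id_sets else []


fields = {
    "isbn": ["0415045568", "0391025511"],
    "title": ["phenomenology of percepti"],
}
url = build_search_url(fields)

# Response copied from the example above.
response_text = (
    '{"fields": {"isbn": ["0415045568"], "title": ["phenomenology of percepti"]}, '
    '"pool": {"isbn": [1366447, 10187591], "title": [1366447, 8071041, 10146455, '
    '10187591, 10198188, 10198270, 10568028, 13557619, 13620735, 17343673]}}'
)
candidates = candidates_in_every_pool(response_text)
```

Here `url` reproduces the URL shown above, and `candidates` holds the two IDs that appear in both the ISBN pool and the title pool.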
Robert's wishlist
- Lending waiting list
- Multiple copies to loan, pull books from shelves in libraries
- Search inside filtered by collection
- All archive.org books on Open Library
Bugs
- MARC source should say 'Humboldt State University'
- Get rid of ia:ic in source_records: bug report from Dan
Todo
Fix ol-tasks so it has FTP iptables rules:
iptables -I INPUT 8 -p tcp --dport ftp-data -j ACCEPT
iptables -I INPUT 8 -p tcp --dport ftp-data -j ACCEPT
- Load data from http://librisbloggen.kb.se/2011/09/21/swedish-national-bibliography-and-authority-data-released-with-open-license/ (see e-mail)
- Browse http://www.smalldemons.com/
Notes
- Hank implemented range requests on archive.org: http://www.archive.org/download/marc_records_scriblio_net/part05.dat?range=18095778:686
- Available circulation data (e-mail from Karen): http://www.oclc.org/research/activities/ohiolink/circulation.htm
- S3: http://www.archive.org/help/abouts3.txt
- contrib_submit: http://www.archive.org/contrib_submit.php?help=1
clear out ivm29
onix_wiley_crawl
elsevier_covers_crawl
onix_princeton