It would be nice to have a machine-readable copy of the UK National Rail timetable. Network Rail supplies the timetable in PDF format, and it is possible to extract machine-readable timetable data from this PDF.

Requirements

  • pdftops: Portable Document Format (PDF) to PostScript converter. Ships as part of Poppler (GPL 2)

    To install on Debian or Ubuntu run: apt-get install poppler-utils

  • CompleteTimetable.pdf: Download the timetable PDF from Network Rail (about 60 MB)

  • Perl: You also need the Data::Dump and List::MoreUtils modules.

    On Debian or Ubuntu run: apt-get install libdata-dump-perl liblist-moreutils-perl

  • Rail timetable parser: Download parser

Usage

Put CompleteTimetable.pdf in the same directory as parse, then run: perl parse

The first time it runs, parse calls pdftops to convert the PDF, a binary format, into PostScript, a text format that is easy to work with.
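The convert-once step can be sketched in Perl as below. This is a minimal illustration, not the parser's actual code: the helper name ensure_postscript and the choice to write the PostScript file next to the PDF with a .ps extension are assumptions. The only real interface relied on is pdftops' documented usage, pdftops [options] <PDF-file> [<PS-file>].

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

# Hypothetical helper (not taken from parse): convert the PDF to PostScript
# only if a .ps file does not already exist beside it, and return its path.
sub ensure_postscript {
    my ($pdf) = @_;

    # Split "dir/CompleteTimetable.pdf" into "CompleteTimetable" and "dir/".
    my ($name, $dir) = fileparse($pdf, qr/\.pdf\z/i);
    my $ps = "$dir$name.ps";    # assumed output location

    # Skip the slow conversion if an earlier run already produced the file.
    return $ps if -e $ps;

    # pdftops usage: pdftops [options] <PDF-file> [<PS-file>]
    system('pdftops', $pdf, $ps) == 0
        or die "pdftops failed on $pdf: exit status $?\n";
    return $ps;
}
```

Calling ensure_postscript('CompleteTimetable.pdf') on the first run invokes pdftops. On later runs it returns the existing PostScript path straight away, so the text can be re-parsed without converting the 60 MB PDF again.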

It then prints lots of debugging output about pages, timetables and trains.

Output

Here is a sample of output in a range of formats:

Further work

This code is unfinished and many cases are not handled yet. Specifically:

  • Base notes
  • Head notes
  • Date ranges
  • Train flags
  • Trains that join or split
  • Repeat trains: "and at the same minutes past each hour until"
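As an illustration of what handling repeat trains might involve, here is a hedged Perl sketch that expands an "and at the same minutes past each hour until" column into individual trains. It is not part of the existing parser. The function names are hypothetical, and the sketch makes several assumptions: the column gives the first train's stop times plus an "until" time for the first departure, that bound is inclusive, and all times fall within a single day (trains running past midnight are not handled).

```perl
use strict;
use warnings;

# Convert "HH:MM" to minutes after midnight.
sub to_minutes {
    my ($hh, $mm) = split /:/, shift;
    return $hh * 60 + $mm;
}

# Convert minutes after midnight back to "HH:MM" (wrapping at 24 hours).
sub to_hhmm {
    my $m = shift;
    return sprintf '%02d:%02d', int($m / 60) % 24, $m % 60;
}

# Hypothetical expansion of a repeat column: emit one train per hour, each
# with the first train's stop times shifted by a whole number of hours,
# while the shifted first departure is no later than $until (inclusive).
sub expand_hourly {
    my ($until, @first_train) = @_;
    return () unless @first_train;

    my @base  = map { to_minutes($_) } @first_train;
    my $limit = to_minutes($until);

    my @trains;
    for (my $offset = 0; $base[0] + $offset <= $limit; $offset += 60) {
        push @trains, [ map { to_hhmm($_ + $offset) } @base ];
    }
    return @trains;
}
```

For example, expand_hourly('08:15', '06:15', '06:40') returns three trains: 06:15/06:40, 07:15/07:40 and 08:15/08:40. Working in minutes after midnight keeps the hourly arithmetic trivial. A real implementation would also need to handle services that cross midnight.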