Problem
Can the PDFs at http://www.x.org/docs/AMD/ be converted into a more usable format?
Solution
Convert the PDF to PostScript, because a text format is easier to work with:
pdftops -noembcidps -noembcidtt -noembtt -noembt1 42589_rv630_rrg_1.01o.pdf
This creates a PostScript file named '42589_rv630_rrg_1.01o.ps'. We don't need any fonts, which is the reason for the flags.
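The same conversion can be applied to several PDFs in one go. The following is only a sketch, assuming pdftops is on the PATH; the helper names ps_name and convert_all are made up for illustration and are not part of the scripts described here.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the original scripts.

# Derive the PostScript file name from a PDF name by swapping the
# trailing ".pdf" for ".ps".
ps_name() {
    printf '%s\n' "${1%.pdf}.ps"
}

# Convert every PDF given as an argument, using the same font flags as
# the single-file command. Stops at the first failed conversion.
convert_all() {
    for pdf in "$@"; do
        pdftops -noembcidps -noembcidtt -noembtt -noembt1 \
            "$pdf" "$(ps_name "$pdf")" || return 1
    done
}

# Example (run in the directory holding the downloaded PDFs):
#   convert_all *.pdf
```

Passing the output name explicitly keeps each .ps file next to its source PDF, whatever directory the script is started from.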
Download two Perl scripts I wrote: parse_ati_pdf and ati_pdf_to_html. To run parse_ati_pdf you'll need these Perl modules: Data::Dump, List::Util, and List::MoreUtils.
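Before running the scripts, it may help to check that the required modules can be loaded. This is a small sketch; check_modules is a made-up helper name. It relies only on perl's -M switch, which loads a module before running the code given with -e.

```shell
#!/bin/sh
# Hypothetical helper: report whether each named Perl module can be loaded.
check_modules() {
    for mod in "$@"; do
        # "perl -MModule -e 1" exits non-zero if the module is missing.
        if perl -M"$mod" -e 1 2>/dev/null; then
            echo "ok      $mod"
        else
            echo "missing $mod"
        fi
    done
}

# Example:
#   check_modules Data::Dump List::Util List::MoreUtils
```

Any module reported as missing can usually be installed from CPAN or from the distribution's package manager.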
Run like this:
perl parse_ati_pdf > ati_pdf_data
perl ati_pdf_to_html ati_pdf_data > output.html
Result
Further work
It is left as an exercise for the reader to modify the parser to work with the other PDF.