The index page number problems described in the previous section cannot be solved by the DocBook XSL stylesheets because the page number for a given indexterm is not known in the XSLT step. Text is placed on pages by the XSL-FO processor, which does not necessarily recognize that text is an index entry. Also, there are no properties in the XSL-FO standard to consolidate page ranges.
Some FO processors such as XEP and Antenna House have extension functions that can be used to fix up index page numbers. The DocBook XSL stylesheets output these indexing extensions if the xep.extensions parameter or the axf.extensions parameter, respectively, is set to 1. The FOP processor does not yet have such extensions.
For FOP, one solution to this problem is to extract page number information from the PDF output file, and then use that to fix up the FO file. This method is described briefly on the reference page for the make.index.markup parameter. The following is a summary of the steps.
You need a utility named pstotext to extract information from PDF files. It is available packaged in an RPM for Linux from http://rpmfind.net.
Process your document containing an empty <index/> element with the fo/docbook.xsl stylesheet with the make.index.markup parameter set to 1. That will generate the index but will insert it as XML markup in the FO file. For example:
xsltproc -o mybook.fo \
    --stringparam make.index.markup 1 \
    fo/docbook.xsl mybook.xml
Convert the FO file to PDF using your favorite XSL-FO processor.
Run the fo/pdf2index Perl script on your PDF file and save the output to a file:
fo/pdf2index mybook.pdf > myindex.xml
The resulting myindex.xml file contains an index marked up with DocBook index elements, with page information inserted as well.
Replace the empty <index/> element in your document with the contents of this generated file. You can do it with a system entity or XInclude.
Process your document again with fo/docbook.xsl and your favorite XSL-FO processor, this time omitting the make.index.markup parameter.
The result of this process is a PDF file for your document that contains an index with page numbers properly collapsed. Duplicate numbers should be removed, and sequences of consecutive pages should appear as page ranges.
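To make the intended result concrete, the following is a minimal Python sketch of what "collapsing" a list of page numbers means: duplicates are dropped and runs of consecutive pages become ranges. It is only an illustration of the target behavior, not the code used by pdf2index or by the stylesheets. The function name collapse_pages and the hyphen used as the range separator are assumptions made for this example.

```python
def collapse_pages(pages):
    """Return a display string such as '3-5, 9' for a list of page numbers.

    Illustrative only: collapse_pages is a hypothetical helper, and the
    hyphen range separator is an assumption, not DocBook XSL behavior.
    """
    ranges = []  # list of [first, last] pairs, one per run of pages
    for page in sorted(set(pages)):           # set() removes duplicates
        if ranges and page == ranges[-1][1] + 1:
            ranges[-1][1] = page              # page extends the current run
        else:
            ranges.append([page, page])       # page starts a new run
    return ", ".join(
        str(first) if first == last else f"{first}-{last}"
        for first, last in ranges
    )


if __name__ == "__main__":
    # Page numbers as they might be collected for one index entry.
    print(collapse_pages([12, 3, 4, 4, 5, 9]))
```

For the sample input, the duplicate 4 is removed and pages 3, 4, and 5 are merged into a single range, while 9 and 12 remain separate entries.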
DocBook XSL: The Complete Guide - 3rd Edition | PDF version available | Copyright © 2002-2005 Sagehill Enterprises |