Installing an XSL-FO processor

This section describes how to install and use the free XSL-FO processor, FOP. The commercial processors are assumed to provide their own documentation and support, so installation instructions for commercial processors are not provided in this book.

Installing FOP

FOP is also a Java program, so it is easy to install, especially if you already are using Java programs such as Saxon or Xalan.

  1. Update your Java

    Since FOP requires a Java runtime environment, you might need to obtain or update your Java setup before FOP will work. See the step on updating Java in the section “Installing Saxon”.

  2. Download FOP

    To download FOP, go to http://xml.apache.org and locate the latest stable version for download. You probably won't want the latest experimental version. The distribution comes as a compressed tar file with everything you need. That site will also provide you with detailed instructions for getting started with FOP.

  3. Unpack the archive

    FOP is distributed as a gzipped tar file (.tar.gz suffix), so you need to use gzip to uncompress the downloaded file. If your computer doesn't have a program that handles gzip files, you can download one from http://www.gzip.org. Then you use the tar command to extract the files to a temporary location:

    tar xvf fop-0.20.5-bin.tar

    On Windows, you may need to locate a utility that can unpack tar files, such as PowerArchiver 2000 (which also handles gzip files).

  4. Locate the FOP .jar files

    To run FOP, you only need to tell your Java processor where the FOP .jar files are. The main file you need is build/fop.jar in the directory you unpacked FOP into. The lib directory has other .jar files that may be used by the FOP convenience scripts. The version numbers shown here may differ from the ones in your distribution.

    avalon-framework-cvs-20020806.jar

    A software framework that allows software components to work together. It is used internally by FOP.

    batik.jar

    Provides the support library for SVG graphics.

    xalan-2.4.1.jar

    The Xalan XSLT processor that may be used by the FOP convenience scripts. The scripts have an option to convert your XML to XSL-FO using Xalan, and then process the XSL-FO, all with one command.

    xercesImpl-2.2.1.jar

    The XML parser used to parse the XSL-FO file.

    xml-apis.jar

    Provides the SAX, DOM, and JAVAX interfaces used by Xalan.

  5. Download the graphics library files

    You will most likely want to process bitmap graphics in your document. FOP has built-in support for some graphics formats, but some popular formats such as PNG are not supported natively. To process other graphics formats, FOP version 0.20.5 and later supports the use of Sun's Java Advanced Imaging (JAI) library, although it doesn't include the files. You can download the JAI files from http://java.sun.com/products/java-media/jai/current.html (you don't need the Image IO Tools download). If you do the CLASSPATH installation, you can put the files wherever you like. The easiest way to get JAI included is to copy the jai_core.jar and the jai_codec.jar files from the JAI installation area to the lib subdirectory of the FOP 0.20.5 installation. Then they will automatically be included in the CLASSPATH for FOP processing.

    If you are using a version of FOP prior to 0.20.5, then you must use the Jimi graphics library instead. The readme file in the lib subdirectory of your FOP distribution describes where to download the Jimi file. It is currently http://java.sun.com/products/jimi/. Unpack the zip file and locate JimiProClasses.jar, currently in examples/AppletDemo/JimiProClasses.jar in the distribution. You'll need to include that file in your CLASSPATH, so you might want to copy it to your FOP lib subdirectory, where it will automatically be included in the CLASSPATH when you use the FOP convenience scripts described later.
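
The unpacking and JAI steps above can be sketched as a small shell function. This is only an illustration under stated assumptions: install_fop is a hypothetical helper name, the archive is assumed to unpack into a single top-level directory that contains a lib subdirectory, and gzip and tar are assumed to be installed. Substitute the name of the file you actually downloaded.

```shell
# Sketch only: install_fop is a hypothetical helper, not part of FOP.
# Usage: install_fop ARCHIVE.tar.gz TARGET_DIR [JAI_DIR]
install_fop() {
    archive=$1
    target=$2
    jai_dir=${3:-}

    mkdir -p "$target" || return 1

    # Uncompress with gzip and pass the tar stream to tar for extraction.
    gzip -dc "$archive" | (cd "$target" && tar xf -) || return 1

    # Optionally copy the two JAI jars into the unpacked lib directory
    # (assumed layout: TARGET_DIR/<fop-directory>/lib).
    if [ -n "$jai_dir" ]; then
        for lib in "$target"/*/lib; do
            [ -d "$lib" ] || continue
            cp "$jai_dir/jai_core.jar" "$jai_dir/jai_codec.jar" "$lib"/ || return 1
        done
    fi
}
```

The third argument is optional. Leave it off if you have not downloaded JAI, or if you use Jimi with an older FOP version.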

Using FOP

FOP converts a .fo file generated by an XSLT processor such as Saxon, Xalan, or xsltproc into a .pdf file. FOP is a Java application, so to use the FOP Java command line, you need to set the CLASSPATH environment variable as described in the section “Setting CLASSPATH manually”. However, if you use one of the FOP convenience scripts, the script sets the CLASSPATH for the duration of its run.

Before you run the FOP command, you need to process your DocBook file with the fo/docbook.xsl stylesheet to generate a .fo file. The .fo file is the input to the FOP processor. The stylesheet will tune the XSL-FO output for FOP when you set the stylesheet parameter fop.extensions to 1. Here is an example using xsltproc:

xsltproc  \
    --output myfile.fo  \
    --stringparam fop.extensions 1  \
    docbook-xsl/fo/docbook.xsl  \
    myfile.xml

See Chapter 5, Using stylesheet parameters for more information on using stylesheet parameters.

FOP convenience scripts

The FOP distribution includes some convenience scripts that set the CLASSPATH for you and run the Java command. Which script you use depends on the operating system: fop.sh for Linux or Unix, or fop.bat for Windows. The scripts can optionally run the XSLT process on your XML source file to produce the XSL-FO file before generating PDF. That may save you a step, but you won't be able to set any stylesheet parameters when you do that. Here are some examples of using the scripts:

Convert a .fo file on Unix or Linux:
fop.sh -fo myfile.fo -pdf myfile.pdf

Convert an XML source file on Unix or Linux:
fop.sh -xsl /docbook-xsl/fo/docbook.xsl -xml myfile.xml -pdf myfile.pdf

Convert a .fo file on Windows:
fop.bat -fo myfile.fo -pdf myfile.pdf

Convert an XML source file on Windows:
fop.bat -xsl /docbook-xsl/fo/docbook.xsl -xml myfile.xml -pdf myfile.pdf

All of the arguments to the command are in the form of options, and they can be presented in any order. The options for FOP are listed at http://xml.apache.org/fop/running.html. One option you won't find is the ability to set DocBook stylesheet parameters on the command line when you use the -xsl option that processes the stylesheet. If you need to use parameters, you should use a separate XSLT processor first to generate the XSL-FO file for FOP to process.
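
The two-step approach described above, running a separate XSLT processor with parameters and then FOP, can be wrapped in a small script. The following is a minimal sketch, not a script shipped with FOP or the stylesheets: the function name docbook_to_pdf and the variables XSLTPROC, FOP_CMD, and STYLESHEET are assumptions made for illustration, and paper.type is used only as an example of a second stylesheet parameter.

```shell
# Sketch only: docbook_to_pdf is a hypothetical wrapper.
# XSLTPROC, FOP_CMD, and STYLESHEET can be overridden from the environment.
XSLTPROC=${XSLTPROC:-xsltproc}
FOP_CMD=${FOP_CMD:-fop.sh}
STYLESHEET=${STYLESHEET:-docbook-xsl/fo/docbook.xsl}

# Usage: docbook_to_pdf BASENAME [PARAM=VALUE ...]
docbook_to_pdf() {
    base=$1
    shift

    # Rewrite each PARAM=VALUE argument as "--stringparam PARAM VALUE".
    # The argument list is rebuilt in place so values with spaces survive.
    n=$#
    while [ "$n" -gt 0 ]; do
        pair=$1
        shift
        set -- "$@" --stringparam "${pair%%=*}" "${pair#*=}"
        n=$((n - 1))
    done

    # Step 1: XML to XSL-FO, with the stylesheet parameters applied.
    "$XSLTPROC" --output "$base.fo" "$@" "$STYLESHEET" "$base.xml" || return 1

    # Step 2: XSL-FO to PDF.
    "$FOP_CMD" -fo "$base.fo" -pdf "$base.pdf"
}

# Example call (commented out so that loading this file runs nothing):
# docbook_to_pdf myfile fop.extensions=1 paper.type=A4
```

Keeping the two steps separate also leaves the intermediate .fo file on disk, which can help when you need to see what the stylesheet produced.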

Setting CLASSPATH manually

You may want to set your CLASSPATH yourself to run the FOP Java command. See the section “Installing FOP” for information on which files need to be included in the CLASSPATH. The safest approach is to include everything in the lib directory of the FOP distribution as well as build/fop.jar. The example below assumes the FOP .jar files are installed into /usr/java. Replace any version strings in the example below with the actual version numbers on the files in your FOP distribution.

Setting CLASSPATH:
CLASSPATH="/usr/java/fop-0.20.5/build/fop.jar:\
/usr/java/fop-0.20.5/lib/batik.jar:\
/usr/java/fop-0.20.5/lib/xalan-version.jar:\
/usr/java/fop-0.20.5/lib/xercesImpl-version.jar:\
/usr/java/fop-0.20.5/lib/JimiProClasses.jar:\
/usr/java/fop-0.20.5/lib/avalon-framework-cvs-version.jar"
export CLASSPATH
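
If you prefer the "include everything in lib" approach, the list can be built by a loop instead of typed by hand. This is a minimal sketch: fop_classpath is a hypothetical helper name, and it assumes the layout described earlier, with build/fop.jar and a lib directory under the FOP installation directory.

```shell
# Sketch only: fop_classpath is a hypothetical helper.
# Usage: CLASSPATH=$(fop_classpath /usr/java/fop-0.20.5); export CLASSPATH
fop_classpath() {
    fop_home=$1
    classpath=$fop_home/build/fop.jar

    # Append every .jar file found in the lib directory.
    for jar in "$fop_home"/lib/*.jar; do
        # Skip the unexpanded pattern when lib has no .jar files.
        [ -f "$jar" ] || continue
        classpath=$classpath:$jar
    done

    printf '%s\n' "$classpath"
}
```

Because the loop picks up whatever is in lib, any graphics library .jar files you copied there are included without editing version strings.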

General syntax:
java  org.apache.fop.apps.Fop  [options]  \
    [-fo|-xml] infile  \
    [-xsl stylesheet-path]   \
    -pdf  outfile.pdf

Convert a .fo file to pdf:
java  org.apache.fop.apps.Fop  \
    -fo  myfile.fo  \
    -pdf myfile.pdf 

Convert an XML source file directly to pdf:
java  org.apache.fop.apps.Fop  \
    -xml myfile.xml  \
    -xsl docbook-xsl/fo/docbook.xsl  \
    -pdf myfile.pdf

This form of the command takes the same set of options as the FOP convenience scripts.

FOP java.lang.OutOfMemoryError

Depending on the memory configuration of your machine, your FOP process may fail on large documents with a java.lang.OutOfMemoryError. It may be that your system is not allocating enough memory to the Java Virtual Machine. You can increase the memory allocation by adding a -Xmx option to any Java command. You can make the change permanent by adding it in the FOP convenience script, such as fop.bat:

java -Xmx128m -cp "%LOCALCLASSPATH%" org.apache.fop.apps.Fop %1 %2 ^
    %3 %4 %5 %6 %7 %8

In this example, the memory allocation is 128 MB. The value you use should be less than the installed memory on the system, and should leave enough memory for other processes that may be running.
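
On Unix or Linux, the same option can be passed when you run the Java command yourself. The following is a minimal sketch that assumes CLASSPATH is already set as described in “Setting CLASSPATH manually”. The names run_fop, JAVA, and FOP_HEAP are hypothetical and exist only in this example.

```shell
# Sketch only: run_fop, JAVA, and FOP_HEAP are hypothetical names.
JAVA=${JAVA:-java}
FOP_HEAP=${FOP_HEAP:-128m}

# Usage: run_fop -fo myfile.fo -pdf myfile.pdf
run_fop() {
    # -Xmx sets the maximum heap size of the Java Virtual Machine.
    "$JAVA" "-Xmx$FOP_HEAP" org.apache.fop.apps.Fop "$@"
}
```

Setting FOP_HEAP=256m before calling run_fop would raise the limit to 256 MB for that shell session without editing any script.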

Using other XSL-FO processors

The number of XSL-FO processors is growing. Most of them are commercial products, but they are in serious competition on price and features, which benefits the user community. They also differ in the features they offer. Here is a quick description of some of the features:

  • Some products like Antenna House's XSL Formatter provide a graphical interface that previews the formatted output.

  • Some products provide a command line interface or convenience script. These are useful for automated batch processing of many documents, so you don't have to open them one at a time in a graphical interface.

  • Some provide a programming API, so that you can incorporate the XSL-FO processing into larger applications.

  • Some provide extension elements and processing instructions to enable features that are not covered in the XSL-FO 1.0 standard. Many of those extensions will appear in the emerging XSL-FO 1.1 standard.

  • Some products can generate multiple output types, such as PDF and PostScript.

Because these products are undergoing rapid development, and because they provide their own documentation and support, this book will not provide general instructions on how to use them. But the DocBook XSL stylesheets include support for some of the extensions provided by a few of the processors, and those will be described in this book.

Processor extensions

As of the current writing, the DocBook stylesheets support extensions in RenderX's XEP and Antenna House's XSL Formatter products. When the extensions for one of these processors are turned on, extra code is written by the stylesheet into the XSL-FO file. That extra code is understood only by a specific processor, so this feature is controlled by stylesheet parameters.

If you are using XEP, then set the xep.extensions parameter to 1. If you are using Antenna House's product, then set the axf.extensions parameter to 1. You should never turn on the extensions for a processor you aren't using, or you will likely get a lot of error messages from the XSL-FO processor that doesn't understand the extra code.
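
One way to avoid turning on the wrong extensions is to derive the parameter name from the processor you intend to run. This is a minimal sketch: extension_param is a hypothetical helper, and the short keys fop, xep, and axf are labels chosen for this example. Only the parameter names themselves come from the stylesheets.

```shell
# Sketch only: extension_param is a hypothetical helper.
# Prints the stylesheet parameter that enables extensions for a processor.
extension_param() {
    case $1 in
        fop) echo fop.extensions ;;
        xep) echo xep.extensions ;;
        axf) echo axf.extensions ;;
        *)   echo "unknown processor: $1" >&2
             return 1 ;;
    esac
}

# Example use with xsltproc (commented out so that loading this file runs nothing):
# xsltproc --output myfile.fo \
#     --stringparam "$(extension_param xep)" 1 \
#     docbook-xsl/fo/docbook.xsl myfile.xml
```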

Not all extension functions in each product are used by the DocBook stylesheets. If you find an extension you want to use in a product's documentation, you can write a customization layer that implements it.

Here are the XSL-FO processor extensions that the stylesheets currently implement:

  • PDF bookmarks. When you open a PDF file in a PDF reader, the left window pane may show a table of contents. Those links are PDF bookmarks inserted into the PDF file by the stylesheet using the processor's extensions. In XEP, the extension element is rx:bookmark. In Antenna House, the extension is an attribute named axf:outline-level.

  • PDF document information. When you view a PDF file's document properties in the reader, it may show title, author, subject, and keywords information. That information is inserted by the stylesheet as extension elements in the XSL-FO file. In XEP, the extension element is rx:meta-info. In Antenna House, the extension element is axf:document-info.

  • Index cleanup. The XSL-FO 1.0 standard has no way of specifying how page numbers in a book's index should be cleaned up. The cleanup process entails removing duplicate page numbers on an entry, and converting a sequence of consecutive numbers to a page range. This produces a more usable index. In XEP, the extension element is rx:page-index. In Antenna House, the extension is an attribute named axf:suppress-duplicate-page-number.
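
The index cleanup described in the last item can be illustrated with a short shell function. This is only a sketch of the idea, not code from the stylesheets or from either processor, which do this work internally. The function names collapse_pages and format_run are hypothetical, and the input is assumed to be whole page numbers in ascending order.

```shell
# Sketch only: illustrates duplicate removal and page-range collapsing.
# Usage: collapse_pages 3 3 4 5 9 12 13    prints: 3-5, 9, 12-13
collapse_pages() {
    result=""
    start=""
    prev=""
    for page in "$@"; do
        if [ -z "$start" ]; then
            start=$page
        elif [ "$page" -eq "$prev" ]; then
            continue                    # duplicate page number: drop it
        elif [ "$page" -ne $((prev + 1)) ]; then
            # Gap found: emit the finished run and start a new one.
            result="$result${result:+, }$(format_run "$start" "$prev")"
            start=$page
        fi
        prev=$page
    done
    if [ -n "$start" ]; then
        result="$result${result:+, }$(format_run "$start" "$prev")"
    fi
    printf '%s\n' "$result"
}

# Print a single page as "N" and a run of consecutive pages as "N-M".
format_run() {
    if [ "$1" -eq "$2" ]; then
        printf '%s' "$1"
    else
        printf '%s-%s' "$1" "$2"
    fi
}
```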