Re-using Xerces2 Parser Components
 

The Xerces Native Interface (XNI) defines a general way to build parser components and configurations. The Xerces2 reference implementation is written using this framework so that the parser components can be re-used to create new parser configurations. In order to re-use the Xerces2 parser components, however, the developer must know the dependencies of each standard component. This document provides an overview of the Xerces2 parser components and lists the relevant dependencies.

The general dependencies and the dependencies of each standard component are detailed below.


Overview
 

The standard parser configuration for the Xerces2 reference implementation of XNI is defined in the org.apache.xerces.parsers.StandardParserConfiguration class. This configuration is composed of a number of components. Some of these components are configurable and some are shared within the configuration but do not implement the XMLComponent interface.

The following list details the set of components used in the Xerces2 standard configuration. The components marked with an asterisk (*) are configurable.

  • Symbol Table
  • Error Reporter (*)
  • Document Scanner (*)
  • DTD Scanner (*)
  • Entity Manager (*)
  • DTD Validator (*)
  • Namespace Binder (*)
  • Schema Validator (*)
Note: There are additional components other than those in the above list, such as the "Grammar Pool" and "Datatype Validator Factory". However, the validation engine in the Xerces2 reference implementation is currently being re-designed and re-implemented. Therefore, these components are subject to change and should not be used or relied upon.

In general, there are levels of dependency among the components in the standard configuration. Some components are required by all of the configurable components, whereas others are required only by specific components. The following diagram illustrates these basic levels of dependency.

Figure: Xerces2 Component Dependencies

The dependencies of each component are detailed in subsequent sections of this document but the basic dependencies are listed below:

Each configurable component queries the components that it depends on before each document is parsed. The configuration is required to call the XMLComponent's "reset" method. From the XMLComponentManager object that is passed to the "reset" method, the component can query the other components that it needs. Therefore, each component is assigned a unique property identifier used to query the components from the component manager.

The following example source code shows how one of the standard Xerces2 components is queried within a configurable component. However, for complete dependency details and the property identifiers defined for each component, refer to the appropriate sections of this document.

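import org.apache.xerces.util.SymbolTable;
import org.apache.xerces.xni.parser.XMLComponent;
import org.apache.xerces.xni.parser.XMLComponentManager;
import org.apache.xerces.xni.parser.XMLConfigurationException;

public class MyComponent
    implements XMLComponent {
    
    // Constants
    
    // property identifier used to look up the shared symbol table
    public static final String SYMBOL_TABLE =
        "http://apache.org/xml/properties/internal/symbol-table";

    // XMLComponent methods
    
    // called by the configuration before each document is parsed
    public void reset(XMLComponentManager manager)
        throws XMLConfigurationException {
        SymbolTable symbolTable = 
            (SymbolTable)manager.getProperty(SYMBOL_TABLE);
    }

    // NOTE: the remaining XMLComponent methods (feature and
    //       property handling) are omitted for brevity

}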

Symbol Table
 

Property information:

Property Id  http://apache.org/xml/properties/internal/symbol-table 
Type  org.apache.xerces.util.SymbolTable 

For performance reasons, the Xerces2 reference implementation uses a custom symbol table in order to re-use common strings that appear in the document. The symbol table is responsible for keeping track of these common strings and always returns the same java.lang.String reference for lexically equivalent strings. This not only reduces the number of unique objects created while parsing, it also allows components (e.g. the validators) to compare certain string objects directly by reference without having to call the "equals" method.
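The idea of returning one canonical reference per distinct string can be sketched in a few lines. The class below is a simplified stand-in written for this illustration (the name SymbolTableSketch and its HashMap-based storage are assumptions); it is not the org.apache.xerces.util.SymbolTable implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for illustration only; this is NOT the
// org.apache.xerces.util.SymbolTable implementation.
final class SymbolTableSketch {

    // Maps each distinct string to its single canonical instance.
    private final Map<String, String> symbols = new HashMap<>();

    /** Returns the canonical String for the given slice of a character buffer. */
    String addSymbol(char[] buffer, int offset, int length) {
        // A production table would want to hash the buffer directly to avoid
        // this temporary allocation; the sketch favors brevity.
        String candidate = new String(buffer, offset, length);
        String existing = symbols.putIfAbsent(candidate, candidate);
        return existing != null ? existing : candidate;
    }

    public static void main(String[] args) {
        SymbolTableSketch table = new SymbolTableSketch();
        char[] document = "<item/><item/>".toCharArray();

        String first = table.addSymbol(document, 1, 4);   // first "item"
        String second = table.addSymbol(document, 8, 4);  // second "item"

        // Both lookups yield the same object, so == is a valid comparison.
        System.out.println(first == second); // prints "true"
    }
}
```

Because every lookup for "item" yields the same object, a component holding one of these references can compare names with == instead of "equals".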

Note: Nearly all of the standard components depend on this component. Therefore, if you write a parser configuration that re-uses any of the standard components, you must have an instance of this component registered with the appropriate property identifier.


Error Reporter
 

Property information:

Property Id  http://apache.org/xml/properties/internal/error-reporter 
Type  org.apache.xerces.impl.XMLErrorReporter 

Recognized features:

  • http://apache.org/xml/features/continue-after-fatal-error

Recognized properties:

  • http://apache.org/xml/properties/internal/error-handler

In any parser instance, there must be a way for components to report errors in a uniform way. The "Error Reporter" component serves this purpose and simplifies the process of localizing error messages and notifying the registered XMLErrorHandler.

In general, errors are identified by the domain of the error and a unique key within that domain. The XMLErrorReporter class allows message formatters to be set for each domain and then delegates the formatting of error messages (with replacement text) to the message formatter assigned to that error domain. The localized error message is then sent to the registered error handler.

An error message formatter is any class that implements the org.apache.xerces.util.MessageFormatter interface. If you write a new parser component for use with the existing Xerces2 components, you should implement your own message formatter and register it with the Error Reporter. For example:

import java.util.Locale;
import java.util.MissingResourceException;
import org.apache.xerces.util.MessageFormatter;

public class MyFormatter
    implements MessageFormatter {

    // MessageFormatter methods
    
    public String formatMessage(Locale locale, String key, Object[] args)
        throws MissingResourceException {
        // localize and format message based on locale, key,
        // and replacement text arguments
        return "MY ERROR ("+key+")";
    }

}

import org.apache.xerces.impl.XMLErrorReporter;
import org.apache.xerces.xni.parser.XMLComponent;
import org.apache.xerces.xni.parser.XMLComponentManager;
import org.apache.xerces.xni.parser.XMLConfigurationException;

public class MyComponent
    implements XMLComponent {
    
    // Constants
    
    public static final String ERROR_REPORTER =
        "http://apache.org/xml/properties/internal/error-reporter";

    public static final String DOMAIN = "http://example.com/mydomain";

    // XMLComponent methods
    
    public void reset(XMLComponentManager manager)
        throws XMLConfigurationException {
        XMLErrorReporter reporter = 
            (XMLErrorReporter)manager.getProperty(ERROR_REPORTER);
        if (reporter.getMessageFormatter(DOMAIN) == null) {
            reporter.putMessageFormatter(DOMAIN, new MyFormatter());
        }
    }

}
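
The MyFormatter stub above ignores its replacement text arguments. As a rough sketch of how a formatter might fill them in, the class below uses java.text.MessageFormat from the JDK. It mirrors the shape of formatMessage but deliberately does not implement the Xerces interface so that it compiles on its own; the class name, the error key and the message pattern are all hypothetical.

```java
import java.text.MessageFormat;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.MissingResourceException;

// Hypothetical formatter with the same method shape as
// MessageFormatter.formatMessage; it does not implement the Xerces interface.
final class PatternFormatterSketch {

    // Hypothetical patterns keyed by error key. A real formatter would more
    // likely load these from a locale-specific resource bundle.
    private final Map<String, String> patterns = new HashMap<>();

    PatternFormatterSketch() {
        patterns.put("ElementNotAllowed",
                     "Element \"{0}\" is not allowed inside \"{1}\".");
    }

    String formatMessage(Locale locale, String key, Object[] args)
            throws MissingResourceException {
        String pattern = patterns.get(key);
        if (pattern == null) {
            throw new MissingResourceException(
                "No message for key", getClass().getName(), key);
        }
        // Substitute the replacement text arguments into the pattern.
        return new MessageFormat(pattern, locale).format(args);
    }

    public static void main(String[] args) {
        PatternFormatterSketch formatter = new PatternFormatterSketch();
        System.out.println(formatter.formatMessage(
            Locale.ENGLISH, "ElementNotAllowed", new Object[] { "b", "a" }));
        // prints: Element "b" is not allowed inside "a".
    }
}
```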

Note: It is strongly encouraged that any new error domains that you create follow the standard URI syntax. While there is no requirement that the URI must point to an actual resource on the Internet, it is a common way to separate domains and it provides more useful information to the application.

Note: Nearly all of the standard components depend on this component. Therefore, if you write a parser configuration that re-uses any of the standard components, you must have an instance of this component registered with the appropriate property identifier.


Document Scanner
 

Property information:

Property Id  http://apache.org/xml/properties/internal/document-scanner 
Type  org.apache.xerces.xni.parser.XMLDocumentScanner 

Required properties:

  • http://apache.org/xml/properties/internal/symbol-table
  • http://apache.org/xml/properties/internal/error-reporter
  • http://apache.org/xml/properties/internal/entity-manager
  • http://apache.org/xml/properties/internal/dtd-scanner

Recognized features:

  • http://xml.org/sax/features/namespaces
  • http://xml.org/sax/features/validation
  • http://apache.org/xml/features/nonvalidating/load-external-dtd
  • http://apache.org/xml/features/scanner/notify-char-refs
  • http://apache.org/xml/features/scanner/notify-builtin-refs

The org.apache.xerces.impl.XMLDocumentScannerImpl class implements the XNI document scanner interface and is designed so that it can also function as a "pull" parser. A pull parser allows the application to drive the parsing of the document instead of having all of the document events "pushed" to the registered handlers.
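
To make the push/pull distinction concrete, the sketch below drives a parse from application code. It uses the JDK's StAX API (javax.xml.stream) purely as a stand-in for the pull style; it does not use the XNI scanner or any Xerces2 class, and the class name PullStyleSketch is made up for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Illustrates pull-style parsing with StAX as a stand-in; not the XNI API.
final class PullStyleSketch {

    /** Collects element names by asking the parser for one event at a time. */
    static List<String> elementNames(String xml) {
        List<String> names = new ArrayList<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
            // The application decides when the next event is scanned.
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    names.add(reader.getLocalName());
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(elementNames("<a><b/><b/></a>")); // prints "[a, b, b]"
    }
}
```

In the push style, the same information would arrive through handler callbacks while the parser runs the loop itself.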


DTD Scanner
 

Property information:

Property Id  http://apache.org/xml/properties/internal/dtd-scanner 
Type  org.apache.xerces.xni.parser.XMLDTDScanner 

Required properties:

  • http://apache.org/xml/properties/internal/symbol-table
  • http://apache.org/xml/properties/internal/error-reporter
  • http://apache.org/xml/properties/internal/entity-manager

Recognized features:

  • http://xml.org/sax/features/validation
  • http://apache.org/xml/features/scanner/notify-char-refs

The org.apache.xerces.impl.XMLDTDScannerImpl class implements the XNI DTD scanner interface and is designed so that it can also function as a "pull" parser. A pull parser allows the application to drive the parsing of the DTD instead of having all of the DTD events "pushed" to the registered handlers.


Entity Manager
 

Property information:

Property Id  http://apache.org/xml/properties/internal/entity-manager 
Type  org.apache.xerces.impl.XMLEntityManager 

Required properties:

  • http://apache.org/xml/properties/internal/symbol-table
  • http://apache.org/xml/properties/internal/error-reporter

Recognized features:

  • http://xml.org/sax/features/validation
  • http://xml.org/sax/features/external-general-entities
  • http://xml.org/sax/features/external-parameter-entities
  • http://apache.org/xml/features/allow-java-encodings

Recognized properties:

  • http://apache.org/xml/properties/entity-resolver

Both the Document Scanner and the DTD Scanner depend on the Entity Manager. This component handles starting and stopping entities automatically so that the scanners can continue operation transparently even when entities go in and out of scope.

The Entity Manager implements an Entity Scanner, which is a low-level scanner for document and DTD information. Because the document and DTD scanners interact only with the Entity Scanner to scan the document, the scanners are shielded from changes caused by starting and stopping entities. Changes in the entities being scanned happen transparently within the Manager/Scanner combination, but the scanner components are notified of the start and end of each entity by implementing the XMLEntityHandler interface, which is part of the Xerces2 reference implementation only, not of XNI itself.
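
From the application's side, this transparency can be observed through the standard SAX API. The sketch below is an illustration that assumes the JDK's bundled JAXP parser rather than a hand-built Xerces2 configuration: it registers an entity resolver that supplies the replacement text for an external general entity from memory, and the content handler receives that text as ordinary character data. The class name and the example.com system identifier are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Sketch using the JDK's JAXP/SAX parser; names and URLs are hypothetical.
final class EntityResolverSketch {

    static final String EXTERNAL_ID = "http://example.com/ext.xml";

    /** Parses xml and returns all character data, resolving EXTERNAL_ID in memory. */
    static String parseText(String xml, String replacement) {
        StringBuilder text = new StringBuilder();
        try {
            XMLReader reader =
                SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setFeature(
                "http://xml.org/sax/features/external-general-entities", true);
            // Supply the entity content from memory instead of the network;
            // returning null falls back to the parser's default resolution.
            reader.setEntityResolver((publicId, systemId) ->
                EXTERNAL_ID.equals(systemId)
                    ? new InputSource(new StringReader(replacement))
                    : null);
            reader.setContentHandler(new DefaultHandler() {
                @Override
                public void characters(char[] ch, int start, int length) {
                    text.append(ch, start, length);
                }
            });
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        String xml = "<!DOCTYPE doc [<!ENTITY ext SYSTEM \"" + EXTERNAL_ID
                   + "\">]><doc>[&ext;]</doc>";
        System.out.println(parseText(xml, "hello")); // prints "[hello]"
    }
}
```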


DTD Validator
 

Property information:

Property Id  http://apache.org/xml/properties/internal/validator/dtd 
Type  org.apache.xerces.impl.dtd.XMLDTDValidator 

Required properties:

  • http://apache.org/xml/properties/internal/symbol-table
  • http://apache.org/xml/properties/internal/error-reporter

Recognized features:

  • http://xml.org/sax/features/namespaces
  • http://xml.org/sax/features/validation
  • http://apache.org/xml/features/validation/dynamic

The DTD Validator validates the document events that it receives and may augment the streaming information set with default attribute values and normalized attribute values.
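
The augmentation can be seen from a SAX application. The following sketch assumes the JDK's bundled JAXP parser and an illustrative internal DTD (the element and attribute names are made up): the "lang" attribute is not present in the instance but is declared with a default, and the "ref" attribute is declared as NMTOKEN, so its surrounding whitespace should be removed by normalization.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Sketch using the JDK's JAXP/SAX parser; the DTD below is illustrative only.
final class DtdDefaultsSketch {

    /** Returns the attributes reported for the elements of the document. */
    static Map<String, String> reportedAttributes(String xml) {
        Map<String, String> seen = new LinkedHashMap<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                for (int i = 0; i < atts.getLength(); i++) {
                    seen.put(atts.getQName(i), atts.getValue(i));
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return seen;
    }

    public static void main(String[] args) {
        String xml =
            "<!DOCTYPE doc ["
          + "<!ELEMENT doc EMPTY>"
          + "<!ATTLIST doc lang CDATA \"en\" ref NMTOKEN #IMPLIED>"
          + "]>"
          + "<doc ref=\"  a1  \"/>";
        Map<String, String> attrs = reportedAttributes(xml);
        System.out.println("lang=" + attrs.get("lang")); // defaulted from the DTD
        System.out.println("ref=" + attrs.get("ref"));   // whitespace normalized
    }
}
```

Note that validation is not switched on here; the defaulting and normalization come from the declarations in the internal subset.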


Namespace Binder
 

Property information:

Property Id  http://apache.org/xml/properties/internal/namespace-binder 
Type  org.apache.xerces.impl.XMLNamespaceBinder 

Required properties:

  • http://apache.org/xml/properties/internal/symbol-table
  • http://apache.org/xml/properties/internal/error-reporter

Recognized features:

  • http://xml.org/sax/features/namespaces

The Namespace Binder is responsible for detecting namespace bindings in the startElement/emptyElement methods and emitting appropriate start and end prefix mapping events. Namespace binding should always occur after DTD validation (since namespace bindings may have been defaulted from an attribute declaration in the DTD) and before Schema validation.
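
The ordering requirement can be illustrated with a document whose only namespace declaration comes from the DTD. The sketch below assumes the JDK's bundled, namespace-aware JAXP parser rather than a hand-built Xerces2 pipeline, and the DTD and namespace URI are made up. The instance never writes xmlns:p, yet the "p" prefix should still resolve because the declaration is defaulted before binding takes place.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Sketch using the JDK's JAXP/SAX parser; the DTD and URI are illustrative.
final class NamespaceBindingSketch {

    /** Records prefix mapping and element events as readable strings. */
    static List<String> events(String xml) {
        List<String> log = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startPrefixMapping(String prefix, String uri) {
                log.add("prefix " + prefix + " -> " + uri);
            }
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                log.add("element {" + uri + "}" + localName);
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return log;
    }

    public static void main(String[] args) {
        String xml =
            "<!DOCTYPE doc ["
          + "<!ELEMENT doc (p:item)>"
          + "<!ELEMENT p:item EMPTY>"
          + "<!ATTLIST doc xmlns:p CDATA #FIXED \"http://example.com/p\">"
          + "]>"
          + "<doc><p:item/></doc>";
        // The log should include "element {http://example.com/p}item".
        events(xml).forEach(System.out::println);
    }
}
```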


Schema Validator
 

Property information:

Property Id  http://apache.org/xml/properties/internal/validator/schema 
Type  org.apache.xerces.impl.xs.XMLSchemaValidator 

Required properties:

  • http://apache.org/xml/properties/internal/symbol-table
  • http://apache.org/xml/properties/internal/error-reporter

Recognized features:

  • http://xml.org/sax/features/namespaces
  • http://xml.org/sax/features/validation
  • http://apache.org/xml/features/validation/dynamic

The Schema Validator validates the document events that it receives and may augment the streaming information set with default simple type values and normalized simple type values.
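
As a rough illustration of schema-driven augmentation, the sketch below attaches a small schema through the standard JAXP validation API (javax.xml.validation) and inspects the attributes that reach the SAX handler. It assumes the JDK's bundled parser and schema processor rather than a hand-built Xerces2 configuration, and the schema itself is made up for the example. The instance omits "lang", which the schema declares with a default value.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Sketch using the JDK's JAXP validation API; the schema is illustrative.
final class SchemaDefaultsSketch {

    static final String XSD =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
      + "<xs:element name=\"doc\"><xs:complexType>"
      + "<xs:attribute name=\"lang\" type=\"xs:string\" default=\"en\"/>"
      + "</xs:complexType></xs:element>"
      + "</xs:schema>";

    /** Returns the value of "lang" as reported on the root element. */
    static String reportedLang(String xml) {
        String[] lang = new String[1];
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                lang[0] = atts.getValue("lang");
            }
        };
        try {
            Schema schema = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // schema validation needs namespaces
            factory.setSchema(schema);
            factory.newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return lang[0];
    }

    public static void main(String[] args) {
        // The attribute is absent from the instance document.
        System.out.println("lang=" + reportedLang("<doc/>"));
    }
}
```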



Copyright © 1999-2005 The Apache Software Foundation. All Rights Reserved.