What can I use instead of Xerces' HTML, XHTML, or XML serializers?

If you want to achieve interoperability, you should not be using Xerces serialization code directly.
Instead, use the JAXP Transformer API to serialize HTML, XHTML, and SAX event streams. Use the DOM Level 3 Load and Save API (or the JAXP Transformer API) to serialize a DOM.
Using JAXP, you can serialize HTML and XHTML as follows:
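```java
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

// Create an "identity" transformer, which copies its input to its output unchanged
Transformer t = TransformerFactory.newInstance().newTransformer();
// For "XHTML" serialization, use the output method "xml"
// and set the DOCTYPE public and system identifiers as shown
t.setOutputProperty(OutputKeys.METHOD, "xml");
t.setOutputProperty(OutputKeys.DOCTYPE_PUBLIC,
    "-//W3C//DTD XHTML 1.0 Transitional//EN");
t.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM,
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd");
// For "HTML" serialization, use the "html" output method instead
// (calling this here would replace the "xml" method set above):
// t.setOutputProperty(OutputKeys.METHOD, "html");
// Serialize the DOM tree; doc is an existing org.w3c.dom.Document
t.transform(new DOMSource(doc), new StreamResult(System.out));
```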
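For the SAX case, here is a minimal sketch. It assumes the configured TransformerFactory supports `SAXTransformerFactory.FEATURE` (the JDK's built-in factory does). An identity `TransformerHandler` acts as a SAX `ContentHandler` and writes whatever events it receives to its `Result`. The class name `SaxSerializeSketch` and the hand-fired events are illustrative only; in practice an `XMLReader` or another SAX source would drive the handler.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.helpers.AttributesImpl;

public class SaxSerializeSketch {

    // Serializes a small, hand-built stream of SAX events to a String.
    public static String serialize() throws Exception {
        TransformerFactory tf = TransformerFactory.newInstance();
        // The identity TransformerHandler is only available if the factory supports SAX
        if (!tf.getFeature(SAXTransformerFactory.FEATURE)) {
            throw new IllegalStateException("TransformerFactory does not support SAX");
        }
        TransformerHandler handler = ((SAXTransformerFactory) tf).newTransformerHandler();

        // Output properties must be set before setResult() is called
        handler.getTransformer().setOutputProperty(OutputKeys.METHOD, "xml");
        handler.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        StringWriter out = new StringWriter();
        handler.setResult(new StreamResult(out));

        // Fire the events by hand to keep the sketch self-contained
        AttributesImpl attrs = new AttributesImpl();
        attrs.addAttribute("", "id", "id", "CDATA", "1");
        handler.startDocument();
        handler.startElement("", "greeting", "greeting", attrs);
        char[] text = "hello".toCharArray();
        handler.characters(text, 0, text.length);
        handler.endElement("", "greeting", "greeting");
        handler.endDocument();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(serialize());
    }
}
```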
You can find more details about the future of Xerces' serializers in the Xerces-J mailing list archives.
Note: The HTML and XHTML serializers (org.apache.xml.serialize) were deprecated in the Xerces 2.6.2 release. We might deprecate XMLSerializer in a future release.
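For the DOM Level 3 Load and Save route, a comparable sketch follows. It assumes the document's `DOMImplementation` also implements `DOMImplementationLS` (Xerces' DOM implementation does), and the class and method names are hypothetical. `xml-declaration` is a standard `LSSerializer` configuration parameter; it is set to false here only to keep the output short.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSOutput;
import org.w3c.dom.ls.LSSerializer;

public class LsSerializeSketch {

    // Builds a tiny DOM: <greeting>hello</greeting>
    public static Document sampleDocument() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .newDocument();
        Element root = doc.createElement("greeting");
        root.appendChild(doc.createTextNode("hello"));
        doc.appendChild(root);
        return doc;
    }

    // Serializes a DOM with the Load and Save API instead of a Xerces-specific class
    public static String serialize(Document doc) {
        DOMImplementation impl = doc.getImplementation();
        if (!(impl instanceof DOMImplementationLS)) {
            throw new IllegalStateException("DOM implementation does not support LS");
        }
        DOMImplementationLS ls = (DOMImplementationLS) impl;
        LSSerializer serializer = ls.createLSSerializer();
        // Standard DOMConfiguration parameter: omit the <?xml ...?> declaration
        serializer.getDomConfig().setParameter("xml-declaration", Boolean.FALSE);

        // Write to a character stream; setByteStream() would be used for an OutputStream
        StringWriter out = new StringWriter();
        LSOutput output = ls.createLSOutput();
        output.setCharacterStream(out);
        serializer.write(doc, output);
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(serialize(sampleDocument()));
    }
}
```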

What international encodings are supported by Xerces-J?

- UTF-8
- UTF-16 Big Endian and Little Endian
- UCS-2 (ISO-10646-UCS-2) Big Endian and Little Endian
- UCS-4 (ISO-10646-UCS-4) Big Endian and Little Endian
- IBM-1208
- ISO Latin-1 (ISO-8859-1)
- ISO Latin-2 (ISO-8859-2) [Bosnian, Croatian, Czech, Hungarian, Polish, Romanian, Serbian (in Latin transcription), Serbocroatian, Slovak, Slovenian, Upper and Lower Sorbian]
- ISO Latin-3 (ISO-8859-3) [Maltese, Esperanto]
- ISO Latin-4 (ISO-8859-4)
- ISO Latin Cyrillic (ISO-8859-5)
- ISO Latin Arabic (ISO-8859-6)
- ISO Latin Greek (ISO-8859-7)
- ISO Latin Hebrew (ISO-8859-8)
- ISO Latin-5 (ISO-8859-9) [Turkish]
- ISO Latin-7 (ISO-8859-13)
- ISO Latin-9 (ISO-8859-15)
- Extended Unix Code, packed for Japanese (euc-jp, eucjis)
- Japanese Shift JIS (shift-jis)
- Chinese (big5)
- Chinese for PRC (mixed 1/2 byte) (gb2312)
- Japanese ISO-2022-JP (iso-2022-jp)
- Extended Unix Code, packed for Korean (euc-kr)
- Russian Unix, Cyrillic (koi8-r)
- Windows Thai (cp874)
- Latin 1 Windows (cp1252) (and the other cp125x Windows code pages, cp1250 through cp1258, registered with IANA)
- cp858
- EBCDIC encodings:
  - EBCDIC US (ebcdic-cp-us)
  - EBCDIC Canada (ebcdic-cp-ca)
  - EBCDIC Netherland (ebcdic-cp-nl)
  - EBCDIC Denmark (ebcdic-cp-dk)
  - EBCDIC Norway (ebcdic-cp-no)
  - EBCDIC Finland (ebcdic-cp-fi)
  - EBCDIC Sweden (ebcdic-cp-se)
  - EBCDIC Italy (ebcdic-cp-it)
  - EBCDIC Spain, Latin America (ebcdic-cp-es)
  - EBCDIC Great Britain (ebcdic-cp-gb)
  - EBCDIC France (ebcdic-cp-fr)
  - EBCDIC Hebrew (ebcdic-cp-he)
  - EBCDIC Switzerland (ebcdic-cp-ch)
  - EBCDIC Roece (ebcdic-cp-roece)
  - EBCDIC Yugoslavia (ebcdic-cp-yu)
  - EBCDIC Iceland (ebcdic-cp-is)
  - EBCDIC Urdu (ebcdic-cp-ar2)
  - Latin 0 EBCDIC
  - EBCDIC Arabic (ebcdic-cp-ar1)
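To illustrate one entry from the list above, this sketch encodes a short document in ISO-8859-2 and parses it back through JAXP. The parser reads the `encoding` declaration, so the declared name must match the actual bytes. It assumes `DocumentBuilderFactory` resolves to Xerces-J (for example, with xercesImpl.jar on the classpath); on a plain JDK the factory returns the JDK's own Xerces-derived parser instead. The class name `EncodingSketch` is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.Charset;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class EncodingSketch {

    // Encodes <word>text</word> as ISO-8859-2 bytes, parses them,
    // and returns the root element's text content.
    // text must not contain markup characters such as '<' or '&'.
    public static String roundTrip(String text) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-2\"?>"
                + "<word>" + text + "</word>";
        // The bytes really are ISO-8859-2, matching the encoding declaration
        byte[] bytes = xml.getBytes(Charset.forName("ISO-8859-2"));
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(bytes));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // "Łódź", written with Unicode escapes to avoid source-encoding issues
        System.out.println(roundTrip("\u0141\u00f3d\u017a"));
    }
}
```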