he / latest8.9 / bj / net / sf / saxon / charcode / package.html @ 5c9d209e

<html>
<head>
<title>Package overview for net.sf.saxon.charcode</title>
</head>
<body>
<p>This package provides classes for handling different output character sets. </p>
<p>The sole function of these classes is to determine whether a particular character is present in the
character set: if it is not, Saxon has to replace it with a character reference.</p>
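<p>The idea can be illustrated with a minimal sketch that uses only the standard
<code>java.nio.charset</code> API. It is not Saxon's own code: the class name <code>CharRefSketch</code>
and the method <code>escapeUnencodable</code> are hypothetical, and the hexadecimal form of the
character reference is an assumption made for the example.</p>

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Hypothetical illustration only; this is not a Saxon class.
final class CharRefSketch {

    // Copies text, replacing every character the charset cannot represent
    // with a hexadecimal numeric character reference.
    static String escapeUnencodable(String text, Charset charset) {
        CharsetEncoder encoder = charset.newEncoder();
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            int codePoint = text.codePointAt(i);
            int length = Character.charCount(codePoint); // 2 for a surrogate pair
            String piece = text.substring(i, i + length);
            if (encoder.canEncode(piece)) {
                out.append(piece);
            } else {
                out.append("&#x").append(Integer.toHexString(codePoint)).append(';');
            }
            i += length;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // U+00E9 and U+20AC are not in US-ASCII, so both are escaped.
        // prints: caf&#xe9; &#x20ac;
        System.out.println(escapeUnencodable("caf\u00e9 \u20ac", Charset.forName("US-ASCII")));
    }
}
```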
<p>The actual translation of Unicode characters to characters in the selected encoding
is left to the Java run-time library. (Note that different versions of Java support
different sets of encodings, and there is no easy way to find out which encodings
are supported in a given installation.)</p>
<p>It is possible to configure Saxon to support additional character sets by writing an
implementation of the <code>PluggableCharacterSet</code> interface, and registering this class as the
value of the system property whose name is given by the expression:</p>
<p><code>OutputKeys.ENCODING + "." + encoding</code></p>
<p>where <code>encoding</code> is the name of the encoding as used in <code>&lt;xsl:output&gt;</code>,
for example <code>iso-8859-10</code>.</p>
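<p>As a rough sketch, the property name for a given encoding can be computed and set as follows.
The class <code>RegisterCharacterSetSketch</code> and the value <code>com.example.Latin6CharacterSet</code>
are hypothetical; that the property value is the fully qualified name of the implementing class,
and that it must be set before the transformation runs, are assumptions rather than documented behaviour.</p>

```java
import javax.xml.transform.OutputKeys;

// Hypothetical illustration only; this is not a Saxon class.
final class RegisterCharacterSetSketch {

    // Builds the system property name from the expression given in the text.
    // OutputKeys.ENCODING is the JAXP constant "encoding".
    static String propertyNameFor(String encoding) {
        return OutputKeys.ENCODING + "." + encoding;
    }

    public static void main(String[] args) {
        String name = propertyNameFor("iso-8859-10");
        // Assumption: the value is the name of a class implementing
        // PluggableCharacterSet; com.example.Latin6CharacterSet is made up.
        System.setProperty(name, "com.example.Latin6CharacterSet");
        // prints: encoding.iso-8859-10 = com.example.Latin6CharacterSet
        System.out.println(name + " = " + System.getProperty(name));
    }
}
```

<p>The same property could equally be supplied on the command line with the JVM's standard
<code>-D</code> option instead of calling <code>System.setProperty</code>.</p>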
<p>If an output encoding is requested that Saxon does not recognize, but which the Java
platform does recognize, then Saxon attempts to determine which characters the encoding
can represent, so that unsupported characters can be written as numeric character references.
Saxon uses two approaches to do this. (The logic is in the
<code>CharacterSetFactory</code> class.) Where possible, it uses the <code>UnknownCharacterSet</code>
class, which tests the availability of individual characters using the Java interrogative
<code>encoding.canEncode()</code>. However, some encodings do not implement this method
reliably; Saxon attempts to detect this, and represents such encodings instead using the
<code>BuggyCharacterSet</code> class. This class attempts to encode each character, and relies
on catching an exception when it fails: this is expensive, but it only happens once for any given character.</p>
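<p>The two approaches can be sketched with the standard <code>java.nio.charset</code> API.
This is not the code of <code>UnknownCharacterSet</code> or <code>BuggyCharacterSet</code>:
the names <code>EncodabilitySketch</code>, <code>viaCanEncode</code> and <code>TrialEncoder</code>
are hypothetical, the <code>BitSet</code> cache is an assumed way of making each trial happen only once,
and only single <code>char</code> values are handled.</p>

```java
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.util.BitSet;

// Hypothetical illustration only; these are not Saxon classes.
final class EncodabilitySketch {

    // Approach 1: ask the encoder whether it can encode the character.
    static boolean viaCanEncode(Charset charset, char c) {
        return charset.newEncoder().canEncode(c);
    }

    // Approach 2: try to encode the character and treat an exception as
    // "not in the character set", remembering the answer per character.
    static final class TrialEncoder {
        private final CharsetEncoder encoder;
        private final BitSet tested = new BitSet(65536);
        private final BitSet encodable = new BitSet(65536);

        TrialEncoder(Charset charset) {
            // A new encoder reports malformed or unmappable input by default.
            this.encoder = charset.newEncoder();
        }

        boolean inCharset(char c) {
            if (!tested.get(c)) {
                try {
                    encoder.encode(CharBuffer.wrap(new char[] {c}));
                    encodable.set(c);
                } catch (CharacterCodingException e) {
                    // Unmappable (or a lone surrogate): leave the bit clear.
                }
                tested.set(c); // the costly trial runs once per character
            }
            return encodable.get(c);
        }
    }

    public static void main(String[] args) {
        Charset latin1 = Charset.forName("ISO-8859-1");
        TrialEncoder trial = new TrialEncoder(latin1);
        System.out.println(viaCanEncode(latin1, '\u00e9')); // prints: true
        System.out.println(trial.inCharset('\u20ac'));      // prints: false
    }
}
```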
<hr>
<p align="center"><i>Michael H. Kay<br/>
Saxonica Limited<br/>
9 February 2005</i></p>
</body>
</html>