Here's what the input file contains in hex, according to net.sf.saxon.functions.UnparsedText:
3c 3f 78 6d 6c 20 76 65 72 73 69 6f 6e 3d 22 31 2e 30 22 20 65 6e 63 6f 64 69 6e
< ? x m l v e r s i o n = " 1 . 0 " e n c o d i n
67 3d 22 55 54 46 2d 38 22 3f 3e 3c 6e 61 6d 65 20 73 6f 72 74 61 62 6c 65 3d 22
g = " U T F - 8 " ? > < n a m e s o r t a b l e = "
f0 9d 9a a4 f0 9d 9a a4 22 2f 3e
ð ¤ ð ¤ " / >
So yes, the sequence (f0 9d 9a a4), which is the UTF-8 encoding of the single non-BMP character U+1D6A4, appears twice in the value of the attribute.
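To check what that four-byte sequence is, here is a minimal Java sketch that decodes it as UTF-8. The class and method names (DecodeSequence, decode) are made up for illustration; only the bytes come from the dump above.

```java
import java.nio.charset.Charset;

public class DecodeSequence {

    // Decode a UTF-8 byte sequence into a Java (UTF-16) string.
    static String decode(byte[] utf8) {
        return new String(utf8, Charset.forName("UTF-8"));
    }

    public static void main(String[] args) {
        // The sequence that appears in the sortable attribute.
        byte[] seq = { (byte) 0xf0, (byte) 0x9d, (byte) 0x9a, (byte) 0xa4 };
        String s = decode(seq);
        // One code point outside the BMP, held in Java as a surrogate pair.
        System.out.printf("code point U+%X, UTF-16 length %d%n",
                s.codePointAt(0), s.length());
        // prints: code point U+1D6A4, UTF-16 length 2
    }
}
```

So each occurrence is one character to XML but two char values (a surrogate pair) to Java, which is the sort of character a parser has to take extra care with.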
When I transform this with:
Saxon-EE 9.7.0.1J from Saxonica
Java version 1.6.0_27
Generating byte code...
Stylesheet compilation time: 355.856ms
Processing file:/Users/mike/bugs/2015/nunes/input.xml
Using parser org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser
I get a file whose content is identical:
3c 3f 78 6d 6c 20 76 65 72 73 69 6f 6e 3d 22 31 2e 30 22 20 65 6e 63 6f 64 69 6e
< ? x m l v e r s i o n = " 1 . 0 " e n c o d i n
67 3d 22 55 54 46 2d 38 22 3f 3e 3c 6e 61 6d 65 20 73 6f 72 74 61 62 6c 65 3d 22
g = " U T F - 8 " ? > < n a m e s o r t a b l e = "
f0 9d 9a a4 f0 9d 9a a4 22 2f 3e
ð ¤ ð ¤ " / >
If I remove the Apache parser from the classpath and use the JDK parser:
Saxon-EE 9.7.0.1J from Saxonica
Java version 1.6.0_27
Stylesheet compilation time: 300.808ms
Processing file:/Users/mike/bugs/2015/nunes/input.xml
Using parser com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser
I get this output file:
3c 3f 78 6d 6c 20 76 65 72 73 69 6f 6e 3d 22 31 2e 30 22 20 65 6e 63 6f 64 69 6e
< ? x m l v e r s i o n = " 1 . 0 " e n c o d i n
67 3d 22 55 54 46 2d 38 22 3f 3e 3c 6e 61 6d 65 20 73 6f 72 74 61 62 6c 65 3d 22
g = " U T F - 8 " ? > < n a m e s o r t a b l e = "
f0 9d 9a a4 f0 9d 9a a4 f0 9d 9a a4 22 2f 3e
ð ¤ ð ¤ ð ¤ " / >
which contains the character 3 times, although the input has it only twice.
So yes, I'm afraid it's the old bug in the parser built into the JDK, not something Saxon is doing.
I ran it with Java 8:
Saxon-EE 9.7.0.1J from Saxonica
Java version 1.8.0_25
Stylesheet compilation time: 361.745129ms
Processing file:/Users/mike/bugs/2015/nunes/input.xml
Using parser com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser
and it seems the bug is still there:
3c 3f 78 6d 6c 20 76 65 72 73 69 6f 6e 3d 22 31 2e 30 22 20 65 6e 63 6f 64 69 6e
< ? x m l v e r s i o n = " 1 . 0 " e n c o d i n
67 3d 22 55 54 46 2d 38 22 3f 3e 3c 6e 61 6d 65 20 73 6f 72 74 61 62 6c 65 3d 22
g = " U T F - 8 " ? > < n a m e s o r t a b l e = "
f0 9d 9a a4 f0 9d 9a a4 f0 9d 9a a4 22 2f 3e
ð ¤ ð ¤ ð ¤ " / >
Just use Apache Xerces!
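If you want to check which parser you are getting, and what it reports for the attribute without Saxon in the picture, something like the following sketch may help. It assumes nothing beyond JAXP; the class name SortableAttributeCheck and its method are hypothetical, and the input string is just the file from the dump above with U+1D6A4 written as a surrogate pair. I have not shown expected output, because whether a document held in memory like this triggers the problem may depend on the JDK build.

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class SortableAttributeCheck {

    // Same content as input.xml in the hex dump; \uD835\uDEA4 is U+1D6A4.
    static final String INPUT =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
        + "<name sortable=\"\uD835\uDEA4\uD835\uDEA4\"/>";

    // Parse INPUT with the given factory and return the number of code
    // points in the sortable attribute (-1 if the attribute is not seen).
    static int countSortableCodePoints(SAXParserFactory factory) throws Exception {
        final int[] count = { -1 };
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                String value = atts.getValue("sortable");
                if (value != null) {
                    count[0] = value.codePointCount(0, value.length());
                }
            }
        };
        byte[] bytes = INPUT.getBytes("UTF-8");
        factory.newSAXParser().parse(new ByteArrayInputStream(bytes), handler);
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        // newInstance() honours the javax.xml.parsers.SAXParserFactory
        // system property and parsers found on the classpath.
        SAXParserFactory factory = SAXParserFactory.newInstance();
        System.out.println("Factory: " + factory.getClass().getName());
        System.out.println("Code points in @sortable: "
                + countSortableCodePoints(factory));
    }
}
```

A correct parser should report 2. With xercesImpl.jar on the classpath the factory should normally resolve to the Apache implementation; to force it, run with -Djavax.xml.parsers.SAXParserFactory=org.apache.xerces.jaxp.SAXParserFactoryImpl.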