
Bug #2534

closed

For some Unicode characters, Saxon produces incorrect output when they are defined as XML entities in the source document

Added by Peter Ross almost 9 years ago. Updated almost 9 years ago.

Status: Closed
Priority: Normal
Category: Third-party product
Start date: 2015-12-10
% Done: 0%

Description

The affected Unicode characters are rare Chinese characters.

If an affected character is defined as an XML entity in the source document and used inside an attribute value, Saxon produces garbage output.

e.g. ...

By contrast, if an affected character is written inline in the source document as a numeric character reference (ampersand notation), Saxon produces correct output.

e.g.

Please use the attached files to reproduce the problem. The XSLT stylesheet performs a simple transformation of the source document.

% java -cp saxon9he.jar net.sf.saxon.Transform -s:inline.xml -xsl:test.xsl -o:out-inline.xml

% java -cp saxon9he.jar net.sf.saxon.Transform -s:entity.xml -xsl:test.xsl -o:out-entity.xml

% diff -u out-inline.xml out-entity.xml

I expect the two output files to be identical. With a different XSLT processor, such as xsltproc (libxslt), I do get identical output.
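A line-based `diff -u` shows which lines differ but not which characters. As a sketch, assuming both outputs are UTF-8 encoded (the XSLT default for XML output unless test.xsl specifies another encoding), the following Java program reports the first code point at which two files diverge, with a few code points of context, so a duplicated character shows up as an extra `U+XXXXX` entry. The class and method names are hypothetical; the default file names come from the transform commands above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class CodePointDiff {

    /** Returns the index (in code points) of the first position where x and y differ, or -1 if equal. */
    static int firstDifference(int[] x, int[] y) {
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            if (x[i] != y[i]) {
                return i;
            }
        }
        // One array is a prefix of the other: they differ at the shorter length unless equal in length.
        return x.length == y.length ? -1 : n;
    }

    /** Formats up to count code points starting at index from as space-separated "U+XXXX" tokens. */
    static String window(int[] cps, int from, int count) {
        StringBuilder sb = new StringBuilder();
        for (int i = from; i < Math.min(cps.length, from + count); i++) {
            sb.append(String.format("U+%04X ", cps[i]));
        }
        return sb.toString().trim();
    }

    // Reads a whole file and decodes it as UTF-8 (an assumption about the output encoding).
    static int[] readCodePoints(String path) throws IOException {
        byte[] bytes = Files.readAllBytes(Paths.get(path));
        return new String(bytes, StandardCharsets.UTF_8).codePoints().toArray();
    }

    public static void main(String[] args) throws IOException {
        String left = args.length > 0 ? args[0] : "out-inline.xml";
        String right = args.length > 1 ? args[1] : "out-entity.xml";
        int[] x = readCodePoints(left);
        int[] y = readCodePoints(right);
        int i = firstDifference(x, y);
        if (i < 0) {
            System.out.println("identical");
            return;
        }
        // Show a little context before and after the divergence point in each file.
        int from = Math.max(0, i - 2);
        System.out.println("first difference at code point " + i);
        System.out.println(left + ":  " + window(x, from, 6));
        System.out.println(right + ": " + window(y, from, 6));
    }
}
```

Usage: `javac CodePointDiff.java && java CodePointDiff out-inline.xml out-entity.xml`. Comparing code points rather than Java `char`s keeps a surrogate pair together as one entry, which makes a doubled supplementary character easier to spot.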

Below is a snippet of the output to give you an idea of what is going on. It seems that the affected characters are duplicated at the output stage.

out-inline.xml
==============

...

out-entity.xml
==============

...
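Because the issue is filed under "Third-party product", it may help to check whether the XML parser, rather than Saxon, already delivers different attribute values for the two forms. The sketch below does this without Saxon: it parses two tiny in-memory documents with the JDK's default JAXP SAX parser (which Saxon typically uses unless another parser is configured) and prints the code points of the attribute value. The test character U+20000, a CJK Extension B ideograph outside the Basic Multilingual Plane, is an assumption, since the characters in the attached files are not shown here; the entity name `ch`, the attribute `a`, and the class name are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class EntityAttributeCheck {

    // Hypothetical test character: U+20000, outside the BMP, so Java stores it as a surrogate pair.
    static final String REF = "&#x20000;";

    // Form 1: the character written inline in the attribute as a numeric character reference.
    static final String INLINE_DOC = "<doc><e a=\"" + REF + "\"/></doc>";

    // Form 2: the same character defined as an internal entity ("ch" is a made-up name)
    // and referenced from the attribute value.
    static final String ENTITY_DOC =
        "<!DOCTYPE doc [<!ENTITY ch \"" + REF + "\">]>"
        + "<doc><e a=\"&ch;\"/></doc>";

    /** Parses xml with the JDK's default SAX parser and returns the code points of attribute "a". */
    static int[] attributeCodePoints(String xml) {
        List<String> values = new ArrayList<>();
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                new InputSource(new StringReader(xml)),
                new DefaultHandler() {
                    @Override
                    public void startElement(String uri, String localName,
                                             String qName, Attributes atts) {
                        String v = atts.getValue("a");
                        if (v != null) {
                            values.add(v);
                        }
                    }
                });
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return values.get(0).codePoints().toArray();
    }

    /** Formats code points as space-separated "U+XXXX" tokens. */
    static String describe(int[] cps) {
        StringBuilder sb = new StringBuilder();
        for (int cp : cps) {
            sb.append(String.format("U+%04X ", cp));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        int[] inline = attributeCodePoints(INLINE_DOC);
        int[] entity = attributeCodePoints(ENTITY_DOC);
        System.out.println("inline: " + describe(inline));
        System.out.println("entity: " + describe(entity));
        // If these differ, the parser already disagrees before any XSLT processor sees the data.
        System.out.println(Arrays.equals(inline, entity) ? "same value" : "values differ");
    }
}
```

If both lines print a single `U+20000`, the parser is probably not at fault for this particular character; if the entity form shows a doubled or broken code point, the problem lies upstream of Saxon. To mirror the attached files more closely, the in-memory strings could be replaced by reading entity.xml and inline.xml from disk.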


Files

test.xsl (438 bytes), Peter Ross, 2015-12-10 06:48
inline.xml (374 bytes), Peter Ross, 2015-12-10 06:48
entity.xml (743 bytes), Peter Ross, 2015-12-10 06:48
