Bug #424: Serialization of Japanese characters corrupts XML - Saxon - Saxonica Developer Community

closed

Serialization of Japanese characters corrupts XML

Added by Anonymous about 20 years ago. Updated over 12 years ago.

Status: Rejected
Priority: Normal
Category: Serialization
Legacy ID: sf-966759

Description

SourceForge user: kcritz

I am constructing a DOM from Java which includes Japanese characters. When I try to serialize this DOM, the "<" character of a close-tag after certain Japanese text is not written properly. Also, the text itself is not written properly.

I have attached several files which demonstrate the issue:

A simplified Java test file
A screenshot of the Japanese section of the file
An example of the result file
A screenshot of the result file

Interestingly enough, the result file is parseable by Xerces, though JADE has trouble reading it.

Am I doing something wrong in my serialization, or is this a legit bug in SAXON?
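
For readers following along: the attached EncodingTest.java is not reproduced in this thread, so the following is only a minimal sketch of the kind of check described above, not the reporter's code. It assumes the standard JAXP identity transformer (which uses Saxon only if Saxon is the configured TransformerFactory); the class name EncodingRoundTrip and the element name "text" are hypothetical. It builds a DOM containing Japanese text, serializes it as UTF-8, and re-parses the bytes to see whether the text and the close tag survive.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class EncodingRoundTrip {
    // Three Japanese characters ("nihongo"), written as Unicode escapes so
    // the literal does not depend on the encoding of this source file.
    static final String JAPANESE = "\u65E5\u672C\u8A9E";

    // Build a one-element DOM whose text content is the Japanese string.
    static Document build() throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = db.newDocument();
        Element root = doc.createElement("text"); // hypothetical element name
        root.appendChild(doc.createTextNode(JAPANESE));
        doc.appendChild(root);
        return doc;
    }

    // Serialize the DOM to bytes with an identity transform, requesting UTF-8.
    static byte[] serialize(Document doc) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toByteArray();
    }

    // Re-parse the serialized bytes and return the text that comes back.
    static String roundTrip() throws Exception {
        byte[] bytes = serialize(build());
        Document parsed = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(bytes));
        return parsed.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // Reports whether the re-parsed text equals the original string.
        System.out.println(JAPANESE.equals(roundTrip()));
    }
}
```

If the round trip preserves the text, the serialized bytes are probably fine, and any apparent corruption is more likely to come from the tool used to view the file or from the encoding of the Java source.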

Files

EncodingTest.java (9.56 KB), Anonymous, 2004-06-04 19:35
EncodingTest.java.png (9.56 KB), Anonymous, 2004-06-04 19:39
EncodingTestResult.xml (9.56 KB), Anonymous, 2004-06-04 19:40
EncodingTestResult.xml.png (9.56 KB), Anonymous, 2004-06-04 19:40

Updated by Anonymous about 20 years ago

SourceForge user: kcritz

Using SAXON 6.5.3, if you're interested

Updated by Anonymous about 20 years ago

SourceForge user: mhkay

PLEASE do not enter suspected bugs in this area of the site until they have been confirmed. There is a bright yellow notice asking you not to do this on the "Submit New" page - I fail to see how people can fail to see this.

I want people to be able to browse the bugs area knowing that it only contains real bugs.

I'm afraid I can't see what's wrong with the output. It appears to be correctly encoded UTF-8, and is a well-formed XML file. I can't tell whether the output is correct, because I don't know what the encoding used in your Java source file is - it doesn't appear to be UTF-8, as far as I can see.
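
To illustrate why the source file's encoding matters here, the following is a rough sketch, not a diagnosis of the attached files. It simulates a compiler reading the bytes of a string literal with the wrong charset. The charset pair (UTF-8 bytes read as ISO-8859-1) is chosen purely for illustration and is an assumption; the encoding actually used for EncodingTest.java is unknown. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class SourceEncodingMismatch {
    // The text the author intended, given as Unicode escapes so this file
    // itself compiles the same way under any source encoding.
    static final String INTENDED = "\u65E5\u672C\u8A9E";

    // Simulate a compiler decoding the literal's bytes with the wrong charset:
    // the bytes are UTF-8, but they are read as ISO-8859-1 (illustrative pair).
    static String misread(String literal) {
        byte[] sourceBytes = literal.getBytes(StandardCharsets.UTF_8);
        return new String(sourceBytes, StandardCharsets.ISO_8859_1);
    }

    // Encode a string as UTF-8 and decode it again; true if nothing is lost.
    static boolean survivesUtf8(String s) {
        byte[] out = s.getBytes(StandardCharsets.UTF_8);
        return new String(out, StandardCharsets.UTF_8).equals(s);
    }

    public static void main(String[] args) {
        String garbled = misread(INTENDED);
        // The garbled string has one character per source byte.
        System.out.println(INTENDED.length() + " chars intended, "
                + garbled.length() + " chars after misreading");
        // The garbled string still serializes to perfectly valid UTF-8.
        System.out.println("valid UTF-8 output: " + survivesUtf8(garbled));
    }
}
```

The point of the sketch is that a serializer can emit well-formed, correctly encoded UTF-8 and still contain the wrong characters, because the string was already altered when the source was compiled. Writing non-ASCII literals as Unicode escapes, or passing the real source encoding to javac with the -encoding option, avoids that ambiguity.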

I am closing this bug because you raised it in the wrong place. Please use the saxon-help list or forum.

Michael Kay
