Bug #2533

Character Duplication during Serialization

Added by Nick Nunes over 8 years ago. Updated almost 6 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: JAXP Java API
Sprint/Milestone: -
Start date: 2015-12-07
Due date:
% Done: 0%
Estimated time:
Legacy ID:
Applies to branch:
Fix Committed on Branch:
Fixed in Maintenance Release:
Platforms:

Description

Hi,

We've seen this bug a few times over the years and were finally able to isolate it. When certain Unicode characters from the supplementary planes (above U+FFFF) appear in attribute values, they are duplicated during serialization. Because our pipeline serializes multiple times, the problem shows up for us as exponentially ballooning file sizes. In the attached example the character is U+1D6A4 "MATHEMATICAL ITALIC SMALL DOTLESS I". It appears twice in input.xml; after a basic identity transform, the output contains it three times.
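For readers without the attachment, here is a minimal sketch of how the check could be scripted through JAXP. It is not the attached test case: the element name `doc`, the attribute name `a`, and the class name `CharacterDuplicationCheck` are made up for illustration, and the real input.xml may differ. `TransformerFactory.newInstance()` returns whichever JAXP implementation is configured, so this only exercises Saxon's serializer if Saxon is on the classpath or selected via the `javax.xml.transform.TransformerFactory` system property.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class CharacterDuplicationCheck {

    // U+1D6A4 MATHEMATICAL ITALIC SMALL DOTLESS I, the character named in this report.
    static final int DOTLESS_I = 0x1D6A4;

    // Counts literal occurrences of a code point; a surrogate pair counts as one character.
    static long countCodePoint(String text, int codePoint) {
        return text.codePoints().filter(cp -> cp == codePoint).count();
    }

    // Runs a JAXP identity transform (no stylesheet) and returns the serialized result.
    static String identityTransform(String xml) throws Exception {
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        StringWriter out = new StringWriter();
        identity.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String ch = new String(Character.toChars(DOTLESS_I));
        // Hypothetical input: the character appears twice in one attribute value.
        String input = "<doc a=\"" + ch + " and " + ch + "\"/>";
        String output = identityTransform(input);
        System.out.println("input count:  " + countCodePoint(input, DOTLESS_I));
        System.out.println("output count: " + countCodePoint(output, DOTLESS_I));
    }
}
```

If the two counts differ, the serializer has added or dropped a copy of the character. The count only sees literal characters, so a serializer that writes the character as a numeric character reference would need a different check.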

I've been able to reproduce this in multiple versions of Saxon, from as far back as 8.9 EE to as recent as 9.7.0.1J PE. Interestingly, I am not able to reproduce it in Oxygen 16.1.

If and when this is fixed, a maintenance release of 9.5 (the version we use in our processing pipeline) would be very helpful.

Thank you for your assistance.


Files

CharacterDuplicationBug.zip (845 Bytes), added by Nick Nunes, 2015-12-07 20:17
