Support #4638

closed

Output xsl3 transform to UTF-16LE with BOM

Added by ofer benoliel about 4 years ago. Updated over 2 years ago.

Status:
Closed
Priority:
Normal
Category:
C++ API
Start date:
2020-07-09
Due date:
% Done:

0%

Estimated time:
Applies to branch:
Fix Committed on Branch:
Fixed in Maintenance Release:
Found in version:
1.2.1
SaxonC Languages:
SaxonC Platforms:
SaxonC Architecture:

Description

Hi all, I'm testing Saxon/C (PE edition) from C++. A transformation without xsl:output creates a file encoded in UTF-8. When I try to output UTF-16LE (Windows) with <xsl:output encoding="utf-16le" byte-order-mark="yes"/>, the output is encoded in UTF-16 but has no BOM. What am I doing wrong? My XML and XSL files are encoded in UTF-16LE with a BOM. Using "utf-16" produces UTF-16BE. Thank you.


Related issues

Related to SaxonC - Bug #4467: transformToString() Encoding issue (Closed, O'Neil Delpratt, 2020-02-26)

Actions #1

Updated by ofer benoliel about 4 years ago

Saxon/C version 1.2.1

Actions #2

Updated by Michael Kay about 4 years ago

First thing to do is to check that it's actually Saxon doing both (a) the serialization, and (b) the encoding. That's going to depend on which APIs you are using to invoke the transformation. If the API returns the result as a string rather than a byte stream, then Saxon has no control over the encoding.
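One way to see what actually ended up on disk is to inspect the first bytes of the output file. The following is a minimal sketch, not part of Saxon/C: the names detectBom and detectBomInFile are hypothetical helpers introduced here for illustration, and only the UTF-8 and UTF-16 byte-order marks are recognised (UTF-32 is ignored).

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical helper: classify the byte-order mark at the start of a buffer.
// Returns "UTF-8", "UTF-16LE", "UTF-16BE", or "none".
std::string detectBom(const std::string& bytes) {
    auto at = [&bytes](std::size_t i) {
        return static_cast<unsigned char>(bytes[i]);
    };
    if (bytes.size() >= 3 && at(0) == 0xEF && at(1) == 0xBB && at(2) == 0xBF) {
        return "UTF-8";
    }
    if (bytes.size() >= 2 && at(0) == 0xFF && at(1) == 0xFE) {
        return "UTF-16LE";
    }
    if (bytes.size() >= 2 && at(0) == 0xFE && at(1) == 0xFF) {
        return "UTF-16BE";
    }
    return "none";
}

// Hypothetical helper: read up to four bytes of a file in binary mode and
// classify them. A file that cannot be opened is reported as "none".
std::string detectBomInFile(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    char head[4] = {0, 0, 0, 0};
    in.read(head, sizeof head);
    return detectBom(std::string(head, static_cast<std::size_t>(in.gcount())));
}
```

Calling detectBomInFile on the file written by the transformation, and on a file saved from the string result, would show whether the two paths differ in the BOM they produce.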

Actions #3

Updated by ofer benoliel about 4 years ago

If I transform to a string and save it to a file, it's OK. If I use the Transform.exe command line tool or the Saxon/C API, it does not work:

    proc->setInitialMatchSelectionAsFile(xmlFile);
    proc->compileFromFile(xslFile);
    proc->setGlobalContextFromFile(xmlFile);
    proc->setOutputFile(outFile);
    proc->applyTemplatesReturningFile(NULL, outFile);

Actions #4

Updated by O'Neil Delpratt about 4 years ago

  • Found in version changed from 9.8,9.9 to 1.2.1

Hi Ofer,

Thanks for reporting this issue. This may be similar to the bug issue #4467.

Just to add to Mike's post in comment 2: encoding and decoding can become confusing if the text supplied to Saxon is not in the encoding the user intended. In the C++ code we use the JNI function GetStringUTFChars, which I think is the culprit for the corruption when the string returned from the transformation is encoded as UTF-8.

We are confident that transformToString() returns a UTF-8 encoded string; you can then convert it to the encoding you want, in this case UTF-16.
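As a rough illustration of that conversion step, the sketch below turns a standard UTF-8 string into UTF-16LE bytes with a leading BOM and writes them to a file in binary mode. It is a sketch under stated assumptions, not Saxon/C code: utf8ToUtf16leWithBom and writeBinary are hypothetical names, the decoder does not reject overlong UTF-8 forms, and it assumes standard UTF-8 input (JNI "modified UTF-8", which encodes surrogates as separate three-byte sequences, would be rejected for supplementary characters).

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: convert UTF-8 to UTF-16LE bytes prefixed with FF FE.
// Throws std::runtime_error on malformed input.
std::string utf8ToUtf16leWithBom(const std::string& utf8) {
    std::string out("\xFF\xFE", 2);  // UTF-16LE byte-order mark
    // Append one 16-bit code unit, low byte first.
    auto put = [&out](std::uint16_t u) {
        out.push_back(static_cast<char>(u & 0xFF));
        out.push_back(static_cast<char>(u >> 8));
    };
    std::size_t i = 0;
    while (i < utf8.size()) {
        const unsigned char b0 = static_cast<unsigned char>(utf8[i]);
        std::uint32_t cp = 0;
        std::size_t len = 0;
        if (b0 < 0x80) { cp = b0; len = 1; }
        else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; len = 2; }
        else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; len = 3; }
        else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; len = 4; }
        else throw std::runtime_error("invalid UTF-8 lead byte");
        if (i + len > utf8.size()) {
            throw std::runtime_error("truncated UTF-8 sequence");
        }
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(utf8[i + k]);
            if ((b & 0xC0) != 0x80) {
                throw std::runtime_error("invalid UTF-8 continuation byte");
            }
            cp = (cp << 6) | (b & 0x3F);
        }
        i += len;
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            throw std::runtime_error("code point is not a Unicode scalar value");
        }
        if (cp < 0x10000) {
            put(static_cast<std::uint16_t>(cp));
        } else {
            // Supplementary characters become a surrogate pair.
            cp -= 0x10000;
            put(static_cast<std::uint16_t>(0xD800 | (cp >> 10)));
            put(static_cast<std::uint16_t>(0xDC00 | (cp & 0x3FF)));
        }
    }
    return out;
}

// Hypothetical helper: write raw bytes without newline translation.
void writeBinary(const std::string& path, const std::string& bytes) {
    std::ofstream f(path, std::ios::binary);
    if (!f) throw std::runtime_error("cannot open output file");
    f.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
}
```

Usage would be along the lines of writeBinary(outFile, utf8ToUtf16leWithBom(result)), where result holds the string returned by the transformation. Note that this only re-encodes the bytes; any encoding named in an XML declaration inside the string is not rewritten.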

But I am a little confused by comment 3. You state: "transform to string and save to file, its OK". Please can you send me a reproducible example of this working, such as the C++ code, XSLT stylesheet, etc.? You can send it either privately or on this bug issue if the content is not confidential.

I am surprised the Transform.exe does not work. Hopefully I can try this out at my end.

If your situation is the same as bug issue #4467, then I would expect the methods that save the transformation result to a file to work, i.e. transformToFile or, as in your case, applyTemplatesReturningFile.

Actions #5

Updated by O'Neil Delpratt over 2 years ago

  • Related to Bug #4467: transformToString() Encoding issue added
Actions #6

Updated by O'Neil Delpratt over 2 years ago

  • Status changed from New to Closed

I am closing this bug issue as it looks similar to bug issue #4467, which was resolved in SaxonC 11. Feel free to reopen this bug issue if the problem occurs in SaxonC 11.
