Support #4638

closed

Output xsl3 transform to UTF-16LE with BOM

Added by ofer benoliel almost 4 years ago. Updated about 2 years ago.

Status:
Closed
Priority:
Normal
Category:
C++ API
Start date:
2020-07-09
Due date:
% Done:

0%

Estimated time:
Found in version:
1.2.1
Platforms:

Description

Hi all, I'm testing Saxon/C (PE edition) with C++. A transform without xsl:output creates a file encoded in UTF-8. When I try to output UTF-16LE (Windows) with <xsl:output encoding="utf-16le" byte-order-mark="yes"/>, the output is encoded in UTF-16 without a BOM. What am I doing wrong? My XML and XSL files are encoded in UTF-16LE with a BOM. Using "utf-16" creates UTF-16BE. Thank you


Related issues

Related to SaxonC - Bug #4467: transformToString() Encoding issue (Closed, O'Neil Delpratt, 2020-02-26)

Actions #1

Updated by ofer benoliel almost 4 years ago

Saxon/C version 1.2.1

Actions #2

Updated by Michael Kay almost 4 years ago

First thing to do is to check that it's actually Saxon doing both (a) the serialization, and (b) the encoding. That's going to depend on which APIs you are using to invoke the transformation. If the API returns the result as a string rather than a byte stream, then Saxon has no control over the encoding.

Actions #3

Updated by ofer benoliel almost 4 years ago

If I transform to a string and save it to a file, it's OK. If I use the Transform.exe command-line tool or the Saxon/C API, it does not work:

    proc->setInitialMatchSelectionAsFile(xmlFile);
    proc->compileFromFile(xslFile);
    proc->setGlobalContextFromFile(xmlFile);
    proc->setOutputFile(outFile);
    proc->applyTemplatesReturningFile(NULL, outFile);
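One way to confirm what actually ends up in the output file is to look at its first bytes. The following is a minimal sketch, not part of the Saxon/C API: the function name detectBom is hypothetical, and it only distinguishes the UTF-8, UTF-16LE and UTF-16BE byte-order marks.

```cpp
#include <cstddef>
#include <string>

// Classify the byte-order mark at the start of a byte buffer.
// Hypothetical helper for diagnosis; it does not inspect the rest of the data
// and does not distinguish UTF-32 (whose LE BOM also begins with FF FE).
std::string detectBom(const std::string& bytes) {
    auto at = [&bytes](std::size_t i) {
        return static_cast<unsigned char>(bytes[i]);
    };
    if (bytes.size() >= 3 && at(0) == 0xEF && at(1) == 0xBB && at(2) == 0xBF) {
        return "UTF-8";
    }
    if (bytes.size() >= 2 && at(0) == 0xFF && at(1) == 0xFE) {
        return "UTF-16LE";
    }
    if (bytes.size() >= 2 && at(0) == 0xFE && at(1) == 0xFF) {
        return "UTF-16BE";
    }
    return "none";
}
```

Reading the first few bytes of outFile in binary mode and passing them to this helper shows whether a BOM was written and which byte order it declares.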

Actions #4

Updated by O'Neil Delpratt almost 4 years ago

  • Found in version changed from 9.8,9.9 to 1.2.1

Hi Ofer,

Thanks for reporting this issue. This may be similar to bug issue #4467.

Just to add to Mike's post in comment 2: encoding and decoding can become confusing if the text supplied to Saxon is not in the encoding the user intended. We found that the C++ code uses the JNI function GetStringUTFChars, which I think is the culprit behind the corruption of the UTF-8 encoding of the string returned from the transformation.

We are confident that transformToString() returns a UTF-8 encoded string; you can then convert it to whatever encoding you need, in this case UTF-16.
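That conversion can be done in plain C++ without any Saxon calls. The following is a minimal sketch, assuming the input is the UTF-8 string returned by transformToString(); the function name utf8ToUtf16leWithBom is hypothetical and not part of the Saxon/C API, and the decoder does not reject overlong UTF-8 sequences.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Convert UTF-8 text to UTF-16LE bytes, prefixed with the BOM (FF FE).
// Hypothetical helper; throws std::runtime_error on malformed input.
std::string utf8ToUtf16leWithBom(const std::string& utf8) {
    std::string out("\xFF\xFE", 2);  // UTF-16LE byte-order mark
    auto put16 = [&out](std::uint32_t unit) {
        out.push_back(static_cast<char>(unit & 0xFF));         // low byte first
        out.push_back(static_cast<char>((unit >> 8) & 0xFF));  // then high byte
    };
    for (std::size_t i = 0; i < utf8.size();) {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);
        std::uint32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::runtime_error("invalid UTF-8 lead byte");
        if (extra >= utf8.size() - i) {
            throw std::runtime_error("truncated UTF-8 sequence");
        }
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(utf8[i + k]);
            if ((c & 0xC0) != 0x80) {
                throw std::runtime_error("invalid UTF-8 continuation byte");
            }
            cp = (cp << 6) | (c & 0x3F);
        }
        i += extra + 1;
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            throw std::runtime_error("code point outside the Unicode range");
        }
        if (cp >= 0x10000) {
            // Code points above the BMP are encoded as a surrogate pair.
            cp -= 0x10000;
            put16(0xD800 + (cp >> 10));
            put16(0xDC00 + (cp & 0x3FF));
        } else {
            put16(cp);
        }
    }
    return out;
}
```

The returned bytes should be written with a std::ofstream opened in std::ios::binary mode; in text mode on Windows the line-ending translation would insert stray bytes into the UTF-16 data.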

But I am a little confused by comment 3. You state the following: "transform to string and save to file, its OK". Could you please send me a reproducible example of this working, such as the C++ code, XSLT stylesheet, etc.? Either privately or on this bug issue if the content is not confidential.

I am surprised that Transform.exe does not work. Hopefully I can try this out at my end.

If your situation is the same as bug issue #4467, then I would expect the methods that save the transformation result to a file to work, i.e. transformToFile or, as in your case, applyTemplatesReturningFile.

Actions #5

Updated by O'Neil Delpratt about 2 years ago

  • Related to Bug #4467: transformToString() Encoding issue added
Actions #6

Updated by O'Neil Delpratt about 2 years ago

  • Status changed from New to Closed

I am closing this bug issue as it looks similar to bug issue #4467, which was resolved in SaxonC 11. Feel free to reopen it if the problem occurs in SaxonC 11.
