Support #4638

Output xsl3 transform to UTF-16LE with BOM

Added by ofer benoliel about 1 month ago. Updated about 1 month ago.

Status: New
Priority: Normal
Category: C++ API
Start date: 2020-07-09
Due date:
% Done: 0%
Estimated time:
Found in version: 1.2.1

Description

Hi all, I'm testing Saxon/C (PE edition) with C++. A transform without xsl:output creates a file encoded in UTF-8. When I try to output UTF-16LE (Windows) with <xsl:output encoding="utf-16le" byte-order-mark="yes"/>, the output is encoded in UTF-16 without a BOM. What am I doing wrong? My XML and XSL files are encoded in UTF-16LE with a BOM. Using "utf-16" creates UTF-16BE. Thank you

History

#1 Updated by ofer benoliel about 1 month ago

Saxon/C version 1.2.1

#2 Updated by Michael Kay about 1 month ago

First thing to do is to check that it's actually Saxon doing both (a) the serialization, and (b) the encoding. That's going to depend on which APIs you are using to invoke the transformation. If the API returns the result as a string rather than a byte stream, then Saxon has no control over the encoding.
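One way to see what a given API call actually wrote is to look at the first bytes of the output. The following is a minimal sketch of such a check; `detectBom` is a hypothetical helper, not part of Saxon/C, and it only recognises the UTF-8 and UTF-16 byte order marks.

```cpp
#include <cstddef>
#include <string>

// Hypothetical diagnostic helper (not part of Saxon/C): name the byte order
// mark, if any, at the start of a byte buffer.
// Note: a UTF-32LE BOM (FF FE 00 00) also starts with FF FE, so this sketch
// would report it as UTF-16LE.
std::string detectBom(const std::string& bytes) {
    auto startsWith = [&bytes](const char* bom, std::size_t n) {
        return bytes.size() >= n && bytes.compare(0, n, bom, n) == 0;
    };
    if (startsWith("\xEF\xBB\xBF", 3)) return "UTF-8";
    if (startsWith("\xFF\xFE", 2)) return "UTF-16LE";
    if (startsWith("\xFE\xFF", 2)) return "UTF-16BE";
    return "none";
}
```

To use it, read the first few bytes of the output file with a `std::ifstream` opened in `std::ios::binary` mode and pass them in. A result of "none" on a file that is otherwise UTF-16 would match the symptom described in this issue.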

#3 Updated by ofer benoliel about 1 month ago

If I transform to a string and save it to a file, it's OK. If I use the Transform.exe command-line tool or the Saxon/C API, it does not work:

    proc->setInitialMatchSelectionAsFile(xmlFile);
    proc->compileFromFile(xslFile);
    proc->setGlobalContextFromFile(xmlFile);
    proc->setOutputFile(outFile);
    proc->applyTemplatesReturningFile(NULL, outFile);

#4 Updated by O'Neil Delpratt about 1 month ago

  • Found in version changed from 9.8,9.9 to 1.2.1

Hi Ofer,

Thanks for reporting this issue. This may be similar to the bug issue #4467.

Just to add to Mike's post in comment 2: encoding and decoding can become confusing if the text supplied to Saxon is not in the encoding the user intended. In the C++ code we use the JNI function GetStringUTFChars, which I think is the culprit: it corrupts the UTF-8 encoding of the string returned from the transformation.
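The thread does not say what the corruption looks like. One documented property of GetStringUTFChars is that it returns "modified UTF-8": U+0000 is written as the two bytes C0 80, and each character outside the Basic Multilingual Plane is written as a surrogate pair, with each surrogate encoded in three bytes. Assuming that is the cause here, a rough sketch of a repair step follows. `modifiedUtf8ToUtf8` is a hypothetical helper, not a Saxon/C function, and it assumes otherwise well-formed input.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of Saxon/C or JNI): convert JNI "modified
// UTF-8" to standard UTF-8. Bytes that match neither special case are copied
// through unchanged; no further validation is done.
std::string modifiedUtf8ToUtf8(const std::string& in) {
    auto byteAt = [&in](std::size_t k) {
        return static_cast<unsigned char>(in[k]);
    };
    std::string out;
    std::size_t i = 0;
    while (i < in.size()) {
        // Modified UTF-8 encodes U+0000 as C0 80.
        if (i + 1 < in.size() && byteAt(i) == 0xC0 && byteAt(i + 1) == 0x80) {
            out.push_back('\0');
            i += 2;
            continue;
        }
        // Supplementary character: ED A0..AF xx (high surrogate) followed by
        // ED B0..BF xx (low surrogate), six bytes in total.
        if (i + 5 < in.size() &&
            byteAt(i) == 0xED && (byteAt(i + 1) & 0xF0) == 0xA0 &&
            byteAt(i + 3) == 0xED && (byteAt(i + 4) & 0xF0) == 0xB0) {
            unsigned hi = 0xD000u | ((byteAt(i + 1) & 0x3Fu) << 6) | (byteAt(i + 2) & 0x3Fu);
            unsigned lo = 0xD000u | ((byteAt(i + 4) & 0x3Fu) << 6) | (byteAt(i + 5) & 0x3Fu);
            unsigned cp = 0x10000u + ((hi - 0xD800u) << 10) + (lo - 0xDC00u);
            // Re-encode the code point as a standard 4-byte UTF-8 sequence.
            out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
            i += 6;
            continue;
        }
        out.push_back(in[i]);
        ++i;
    }
    return out;
}
```

Text that contains only Basic Multilingual Plane characters and no U+0000 passes through this function unchanged, so it would only matter for documents with supplementary characters such as emoji.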

We are confident that transformToString() returns a UTF-8 encoded string; you can then convert it to the encoding you want, in this case UTF-16.
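As an illustration of that conversion step, here is a minimal sketch that re-encodes a UTF-8 string as UTF-16LE bytes with a leading BOM. `utf8ToUtf16leWithBom` is a hypothetical helper, not part of the Saxon/C API. It assumes the input is the UTF-8 string returned by the transformation, and it does not reject overlong forms or encoded surrogates.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of Saxon/C): re-encode a UTF-8 string as
// UTF-16LE bytes, prefixed with the byte order mark FF FE.
std::string utf8ToUtf16leWithBom(const std::string& utf8) {
    std::string out("\xFF\xFE", 2);
    auto put16 = [&out](std::uint32_t unit) {
        out.push_back(static_cast<char>(unit & 0xFF));         // low byte first
        out.push_back(static_cast<char>((unit >> 8) & 0xFF));  // then high byte
    };
    std::size_t i = 0;
    while (i < utf8.size()) {
        unsigned char b0 = static_cast<unsigned char>(utf8[i]);
        std::uint32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (b0 < 0x80) { cp = b0; }
        else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; extra = 1; }
        else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; extra = 2; }
        else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; extra = 3; }
        else throw std::runtime_error("invalid UTF-8 lead byte");
        if (extra > utf8.size() - i - 1)
            throw std::runtime_error("truncated UTF-8 sequence");
        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char b = static_cast<unsigned char>(utf8[i + k]);
            if ((b & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (b & 0x3F);
        }
        i += extra + 1;
        if (cp >= 0x10000) {
            // Characters above the BMP become a surrogate pair.
            cp -= 0x10000;
            put16(0xD800 + (cp >> 10));
            put16(0xDC00 + (cp & 0x3FF));
        } else {
            put16(cp);
        }
    }
    return out;
}
```

The returned `std::string` is used as a byte buffer. Writing it with a `std::ofstream` opened in `std::ios::binary` mode avoids newline translation on Windows, which would otherwise insert stray bytes into the UTF-16 stream.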

But I am a little confused by comment 3. You state: "transform to string and save to file, its OK". Please can you send me a reproducible example of this working, such as the C++ code, XSLT stylesheet, etc.? You can send it either privately or on this bug issue if the content is not confidential.

I am surprised that Transform.exe does not work. Hopefully I can try this out at my end.

If your situation is the same as bug issue #4467, then I would expect the methods that save the transformation result to a file to work, i.e. transformToFile or, as in your case, applyTemplatesReturningFile.
