Bug #4467

closed

transformToString() Encoding issue

Added by O'Neil Delpratt over 2 years ago. Updated 6 months ago.

Status:
Closed
Priority:
Normal
Category:
Saxon-C Internals
Start date:
2020-02-26
Due date:
% Done:

100%

Estimated time:
Found in version:
1.2.1
Fixed in version:
11.1
Platforms:

Description

Issue reported by Saxon/C user in the PHP extension:

transformToString() => encoding issue with ISO-8859-1 specific characters (output utf-8?).

The string from the transformation comes back in the ISO-8859-1 encoding, but the C++ code decodes it as UTF-8 via the JNI function GetStringUTFChars.

I also noticed that NewStringUTF can potentially cause encoding issues too.
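
The mismatch described above can be reproduced without JNI. The following is a minimal C++ sketch, not SaxonC code: the helper names latin1ToUtf8, isValidUtf8 and checkLatin1Example are hypothetical, and the validity check is deliberately simplified (it does not reject overlong forms or surrogates). It illustrates that ISO-8859-1 bytes above 0x7F are not valid UTF-8 on their own and have to be re-encoded before they are handed to an API that expects UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Re-encode ISO-8859-1 bytes as UTF-8. Each ISO-8859-1 byte maps to the
// Unicode code point with the same value (U+0000..U+00FF).
std::string latin1ToUtf8(const std::string& latin1) {
    std::string utf8;
    utf8.reserve(latin1.size() * 2);
    for (unsigned char c : latin1) {
        if (c < 0x80) {
            utf8.push_back(static_cast<char>(c));                  // ASCII: unchanged
        } else {
            utf8.push_back(static_cast<char>(0xC0 | (c >> 6)));    // lead byte
            utf8.push_back(static_cast<char>(0x80 | (c & 0x3F)));  // continuation byte
        }
    }
    return utf8;
}

// Simplified structural UTF-8 check (assumption: enough for illustration only).
bool isValidUtf8(const std::string& bytes) {
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char c = static_cast<unsigned char>(bytes[i]);
        std::size_t extra = 0;
        if (c < 0x80) extra = 0;
        else if ((c & 0xE0) == 0xC0) extra = 1;
        else if ((c & 0xF0) == 0xE0) extra = 2;
        else if ((c & 0xF8) == 0xF0) extra = 3;
        else return false;                            // stray continuation or invalid lead
        if (i + extra >= bytes.size()) return false;  // sequence is truncated
        for (std::size_t k = 1; k <= extra; ++k) {
            if ((static_cast<unsigned char>(bytes[i + k]) & 0xC0) != 0x80) return false;
        }
        i += extra + 1;
    }
    return true;
}

// "café" in ISO-8859-1 ends with the single byte 0xE9.
void checkLatin1Example() {
    const std::string latin1 = "caf\xE9";
    assert(!isValidUtf8(latin1));                    // raw Latin-1 is not UTF-8
    assert(latin1ToUtf8(latin1) == "caf\xC3\xA9");   // é becomes 0xC3 0xA9
    assert(isValidUtf8(latin1ToUtf8(latin1)));
}
```

In this sketch the byte 0xE9 looks like the start of a three-byte UTF-8 sequence, so a UTF-8 decoder either rejects it or produces a replacement character, which matches the kind of corruption reported here.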


Related issues

Related to SaxonC - Support #4638: Output xsl3 transform to UTF-16LE with BOM (Closed, O'Neil Delpratt, 2020-07-09)

Actions #1

Updated by O'Neil Delpratt over 2 years ago

It looks like the JNI function GetStringUTFChars does successfully encode the jstring as a UTF-8 char array. However, the output contains the meta element:

<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"

The browser will therefore try to render the document as ISO-8859-1, when in fact transformToString() has encoded the string as UTF-8.
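
One possible caller-side workaround, sketched here under assumptions and not the fix that shipped in SaxonC 11, is to re-encode the returned UTF-8 bytes so that they match the declared charset. The helper names utf8ToLatin1 and checkUtf8Example are hypothetical. The conversion fails for any character outside U+0000..U+00FF, since ISO-8859-1 cannot represent it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Convert UTF-8 to ISO-8859-1. Returns false if the input is malformed or
// contains a code point above U+00FF, which ISO-8859-1 cannot represent.
bool utf8ToLatin1(const std::string& utf8, std::string& latin1) {
    latin1.clear();
    for (std::size_t i = 0; i < utf8.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(utf8[i]);
        if (c < 0x80) {
            latin1.push_back(static_cast<char>(c));  // ASCII: unchanged
        } else if ((c == 0xC2 || c == 0xC3) && i + 1 < utf8.size()) {
            // U+0080..U+00FF are encoded as 0xC2 0x80 .. 0xC3 0xBF.
            const unsigned char next = static_cast<unsigned char>(utf8[i + 1]);
            if ((next & 0xC0) != 0x80) return false;  // not a continuation byte
            latin1.push_back(static_cast<char>(((c & 0x03) << 6) | (next & 0x3F)));
            ++i;  // skip the continuation byte just consumed
        } else {
            return false;  // malformed, truncated, or outside ISO-8859-1
        }
    }
    return true;
}

void checkUtf8Example() {
    std::string out;
    assert(utf8ToLatin1("caf\xC3\xA9", out));     // "café" in UTF-8
    assert(out == "caf\xE9");                     // "café" in ISO-8859-1
    assert(!utf8ToLatin1("\xE2\x82\xAC", out));   // U+20AC (euro sign) does not fit
}
```

The alternative is to leave the bytes as UTF-8 and change the declared charset instead; which side to adjust depends on what the consumer of the document expects.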

Actions #2

Updated by O'Neil Delpratt 7 months ago

  • Status changed from In Progress to Resolved

We have redesigned the handling of string encoding in SaxonC 11.

Actions #3

Updated by O'Neil Delpratt 6 months ago

  • Status changed from Resolved to Closed
  • % Done changed from 0 to 100
  • Fixed in version set to 11.1

Bug fix patched in the SaxonC 11.1 release.

Actions #4

Updated by O'Neil Delpratt 5 months ago

  • Related to Support #4638: Output xsl3 transform to UTF-16LE with BOM added
