Bug #4467

transformToString() Encoding issue

Added by O'Neil Delpratt 6 months ago. Updated 6 months ago.

Status: In Progress
Priority: Normal
Category: Saxon-C Internals
Start date: 2020-02-26
Due date:
% Done: 0%
Estimated time:
Found in version: 1.2.1
Fixed in version:

Description

Issue reported by a Saxon/C user of the PHP extension:

transformToString() => encoding issue with ISO-8859-1 specific characters (output utf-8?).

The string from the transformation comes back in ISO-8859-1 encoding, but the C++ code decodes it as UTF-8 via the JNI function GetStringUTFChars.

I also noticed that NewStringUTF can potentially cause encoding issues too.
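
To illustrate the NewStringUTF concern: that function expects (modified) UTF-8 input, so raw ISO-8859-1 bytes above 0x7F would be malformed if passed to it directly. Below is a minimal sketch of transcoding first, assuming the input really is ISO-8859-1. `latin1ToUtf8` is a hypothetical helper, not part of Saxon/C.

```cpp
#include <string>

// Hypothetical helper (not part of Saxon/C): re-encode ISO-8859-1 bytes as UTF-8.
// Every ISO-8859-1 byte maps to the Unicode code point of the same value,
// so bytes >= 0x80 become a two-byte UTF-8 sequence.
// Note: an embedded 0x00 byte is not handled specially here; JNI's modified
// UTF-8 encodes U+0000 as 0xC0 0x80.
std::string latin1ToUtf8(const std::string& latin1) {
    std::string utf8;
    utf8.reserve(latin1.size() * 2);
    for (char c : latin1) {
        const unsigned char b = static_cast<unsigned char>(c);
        if (b < 0x80) {
            // ASCII range is identical in both encodings.
            utf8.push_back(static_cast<char>(b));
        } else {
            // Code points U+0080..U+00FF: 110000xx 10xxxxxx.
            utf8.push_back(static_cast<char>(0xC0 | (b >> 6)));
            utf8.push_back(static_cast<char>(0x80 | (b & 0x3F)));
        }
    }
    return utf8;
}
```

The result could then be handed to JNI as `env->NewStringUTF(latin1ToUtf8(input).c_str())`, where `env` and `input` are placeholders for the caller's JNIEnv pointer and source string.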

History

#1 Updated by O'Neil Delpratt 6 months ago

So it looks like the JNI function GetStringUTFChars successfully encodes the jstring as a UTF-8 char array. But the output contains the meta element:

<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">

The browser will then try to render the document as ISO-8859-1, when in fact transformToString() has encoded the string as UTF-8.
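
One possible way to make the returned bytes agree with the declared charset is to transcode the UTF-8 buffer to ISO-8859-1 before handing it back. This is only a sketch, not necessarily the fix adopted in Saxon/C, and it assumes the caller wants ISO-8859-1 output. `utf8ToLatin1` is a hypothetical helper name.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of Saxon/C): transcode UTF-8 to ISO-8859-1.
// Returns false if the input is malformed or contains a character above
// U+00FF, which ISO-8859-1 cannot represent; `out` is then incomplete.
bool utf8ToLatin1(const std::string& utf8, std::string& out) {
    out.clear();
    out.reserve(utf8.size());
    for (std::size_t i = 0; i < utf8.size(); ++i) {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);
        if (lead < 0x80) {
            // ASCII passes through unchanged.
            out.push_back(static_cast<char>(lead));
        } else if ((lead == 0xC2 || lead == 0xC3) && i + 1 < utf8.size()) {
            // Only lead bytes 0xC2 and 0xC3 encode U+0080..U+00FF.
            const unsigned char cont = static_cast<unsigned char>(utf8[i + 1]);
            if ((cont & 0xC0) != 0x80) {
                return false;  // not a valid continuation byte
            }
            out.push_back(static_cast<char>(((lead & 0x03) << 6) | (cont & 0x3F)));
            ++i;  // skip the continuation byte just consumed
        } else {
            return false;  // outside ISO-8859-1, or truncated/malformed input
        }
    }
    return true;
}
```

The alternative is to leave the bytes as UTF-8 and make the declared charset match them. Which direction is appropriate depends on what encoding the caller actually requested.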
