Bug #6372

Unable to parse Windows-1252 encoded XML files on Linux

Added by Matt Patterson about 2 months ago. Updated about 2 months ago.

Status: In Progress
Priority: Normal
Category: -
Start date: 2024-03-15
Due date:
% Done: 0%
Estimated time:
Found in version: 12.4.2
Fixed in version:
Platforms:

Description

From the forum (https://saxonica.plan.io/boards/4/topics/9617):

I am working with version 12.4.2 on Linux. This simplified C++ code shows what I do:

  SaxonProcessor *processor = new SaxonProcessor(true);
  Xslt30Processor *trans = processor->newXslt30Processor();
  XsltExecutable *executable = trans->compileFromFile("/tmp/test.xsl");
  executable->setInitialMatchSelectionAsFile("/tmp/file.xml");
  const char *output = executable->applyTemplatesReturningString();

My file.xml header is like this:

<?xml version="1.0" encoding="windows-1252" standalone="no"?>

I get the following exception running my program:

  SXXP0003  I/O error reported by XML parser processing
  file:///tmp/file1.xml. Caused by
  java.io.UnsupportedEncodingException: Cp1252

My Linux locale is en_US.UTF-8. XML files with utf-8 or iso-8859-1 encodings all work fine.

The same program and the same windows-1252 encoded input files work on Windows, though. I face this problem only on Linux.
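
A possible workaround, until the parser accepts this encoding on Linux, might be to transcode the input to UTF-8 in the application before handing it to Saxon. The following is only a minimal sketch under stated assumptions: the whole file has already been read into a std::string, the encoding declaration uses exactly the spelling shown above, and the function names cp1252ToUtf8 and declareUtf8 are hypothetical helpers, not part of the SaxonC API. The five byte values that Windows-1252 leaves undefined are mapped to the C1 control code points with the same value, which is one common convention.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Unicode code points for Windows-1252 bytes 0x80..0x9F.
// Bytes 0x81, 0x8D, 0x8F, 0x90 and 0x9D are undefined in Windows-1252;
// this sketch maps them to the C1 control code point of the same value.
static const std::uint16_t kCp1252High[32] = {
    0x20AC, 0x0081, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008D, 0x017D, 0x008F,
    0x0090, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x009D, 0x017E, 0x0178};

// Hypothetical helper: converts a Windows-1252 byte string to UTF-8.
std::string cp1252ToUtf8(const std::string &in) {
    std::string out;
    out.reserve(in.size());
    for (unsigned char b : in) {
        // Bytes outside 0x80..0x9F have the same value as their code point.
        std::uint32_t cp = b;
        if (b >= 0x80 && b <= 0x9F) {
            cp = kCp1252High[b - 0x80];
        }
        if (cp < 0x80) {
            // ASCII: one byte.
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            // Two-byte UTF-8 sequence.
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            // Three-byte UTF-8 sequence (all remaining code points are below U+10000).
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// Hypothetical helper: rewrites encoding="windows-1252" to encoding="UTF-8",
// but only if it occurs inside the XML declaration (before the first "?>").
// Assumption: the declaration uses exactly this spelling and quoting.
std::string declareUtf8(std::string xml) {
    const std::string from = "encoding=\"windows-1252\"";
    const std::string to = "encoding=\"UTF-8\"";
    const std::size_t declEnd = xml.find("?>");
    const std::size_t pos = xml.find(from);
    if (declEnd != std::string::npos && pos != std::string::npos && pos < declEnd) {
        xml.replace(pos, from.size(), to);
    }
    return xml;
}
```

Calling declareUtf8(cp1252ToUtf8(bytes)) on the raw file contents yields a UTF-8 document that can be written to a temporary file or passed to the processor as a string. The declaration is ASCII-only, so the order of the two calls does not matter.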


Files

saxonc12xmlparse-test1.py (379 Bytes), Martin Honnen, 2024-03-15 16:14
windows-notepad-ansi-sample1.xml (131 Bytes), Martin Honnen, 2024-03-15 16:14
sample1.xml (18 Bytes), Martin Honnen, 2024-03-15 16:14
