Bug #6182

open

UTF-8 in string based C API functions

Added by Omar Siam about 1 year ago. Updated 11 months ago.

Status: In Progress
Priority: Normal
Category: Saxon-C Internals
Start date: 2023-08-22
Due date:
% Done: 90%
Estimated time:
Applies to branch:
Fix Committed on Branch:
Fixed in Maintenance Release:
Found in version: 12.3
Fixed in version:
SaxonC Languages:
SaxonC Platforms:
SaxonC Architecture:

Description

I tried to run the following code, and the encoding of the returned value seems off.

void testUTF8StringTemplate(SaxonProcessor *proc, Xslt30Processor *trans,
                         sResultCount *sresult) {

  const char *source =
      "<?xml version='1.0' encoding='UTF8'?>  <xsl:stylesheet "
      "xmlns:xsl='http://www.w3.org/1999/XSL/Transform'  "
      "xmlns:xs='http://www.w3.org/2001/XMLSchema'  version='3.0'>  "
      "<xsl:template match='*'>     <xsl:sequence select='&apos;تيست&apos;'/>  </xsl:template>  </xsl:stylesheet>";
  cout << endl << "Test:testUTF8StringTemplate" << endl;
  XsltExecutable *executable = trans->compileFromString(source);
  if (executable == nullptr) {
    if (trans->exceptionOccurred()) {
      cout << "Error: " << trans->getErrorMessage() << endl;
    }
    return;
  }
  const char* _in = "<?xml version='1.0' encoding='UTF8'?><e>تيست</e>";
  XdmNode *node = proc->parseXmlFromString(_in);
  executable->setResultAsRawValue(false);
  std::map<std::string, XdmValue *> parameterValues;

  executable->setInitialTemplateParameters(parameterValues, false);
  executable->setInitialMatchSelection(node);
  XdmValue *result = executable->applyTemplatesReturningValue();
  if (result != nullptr) {
    sresult->success++;
    cout << "Input=" << _in;
    cout << "Result=" << result->getHead()->getStringValue() << endl << node->toString() << endl;
    delete result;
  } else {
    sresult->failure++;
    sresult->failureList.push_back("testUTF8StringTemplate");
  }
  delete executable;
  delete node;
  parameterValues.clear();
}
Compiled with VS 2017:
cl /utf-8 /EHsc "-I%graalvmdir%"  testXSLT30.cpp ../../Saxon.C.API/SaxonCGlue.c ../../Saxon.C.API/SaxonCXPath.c  ../../Saxon.C.API/SaxonProcessor.cpp ../../Saxon.C.API/XdmValue.cpp ../../Saxon.C.API/XdmItem.cpp ../../Saxon.C.API/XdmAtomicValue.cpp ../../Saxon.C.API/DocumentBuilder.cpp ../../Saxon.C.API/XdmNode.cpp ../../Saxon.C.API/XdmFunctionItem.cpp ../../Saxon.C.API/XdmArray.cpp ../../Saxon.C.API/XdmMap.cpp ../../Saxon.C.API/SaxonApiException.cpp ../../Saxon.C.API/XQueryProcessor.cpp ../../Saxon.C.API/Xslt30Processor.cpp ../../Saxon.C.API/XsltExecutable.cpp ../../Saxon.C.API/XPathProcessor.cpp ../../Saxon.C.API/SchemaValidator.cpp /link ..\..\libs\win\libsaxon-hec-12.3.lib

Result in a UTF-8-enabled PowerShell window:

Test:testUTF8StringTemplate
Input=<?xml version='1.0' encoding='UTF8'?><e>تيست</e>Result=تيست
<e>تيست</e>

Any ideas how to fix this? It seems the UTF-8 string is encoded twice.
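
To make the "encoded twice" hypothesis checkable, here is a minimal standalone sketch that does not use the SaxonC API at all. It assumes the second encoding step treats each UTF-8 byte as an ISO-8859-1 (Latin-1) code point; whether that is what actually happens inside SaxonC or on the way to the console is only a guess (Windows-1252 would behave differently for bytes 0x80 to 0x9F). The helper names `toHex`, `encodeLatin1AsUtf8` and `undoDoubleEncoding` are made up for this illustration.

```cpp
#include <cstddef>
#include <string>

// Render a byte string as space-separated hex, e.g. "D8 AA", so the raw
// bytes can be inspected independently of how the console displays them.
std::string toHex(const std::string &bytes) {
  static const char digits[] = "0123456789ABCDEF";
  std::string out;
  for (unsigned char c : bytes) {
    if (!out.empty()) out.push_back(' ');
    out.push_back(digits[c >> 4]);
    out.push_back(digits[c & 0x0F]);
  }
  return out;
}

// Simulate the suspected fault (an assumption, not confirmed SaxonC
// behaviour): every input byte is taken as a Latin-1 code point and
// encoded to UTF-8 again.
std::string encodeLatin1AsUtf8(const std::string &bytes) {
  std::string out;
  for (unsigned char c : bytes) {
    if (c < 0x80) {
      out.push_back(static_cast<char>(c));
    } else {
      out.push_back(static_cast<char>(0xC0 | (c >> 6)));    // 0xC2 or 0xC3
      out.push_back(static_cast<char>(0x80 | (c & 0x3F)));  // continuation
    }
  }
  return out;
}

// Try to reverse the simulated fault. Returns false when the input holds
// anything other than ASCII or two-byte sequences for U+0080..U+00FF,
// i.e. when it does not look double-encoded under the Latin-1 assumption.
bool undoDoubleEncoding(const std::string &twice, std::string &once) {
  once.clear();
  for (std::size_t i = 0; i < twice.size(); ++i) {
    unsigned char c = static_cast<unsigned char>(twice[i]);
    if (c < 0x80) {
      once.push_back(static_cast<char>(c));
    } else if ((c == 0xC2 || c == 0xC3) && i + 1 < twice.size() &&
               (static_cast<unsigned char>(twice[i + 1]) & 0xC0) == 0x80) {
      unsigned char d = static_cast<unsigned char>(twice[i + 1]);
      once.push_back(static_cast<char>(((c & 0x03) << 6) | (d & 0x3F)));
      ++i;  // skip the continuation byte that was just consumed
    } else {
      return false;
    }
  }
  return true;
}
```

For the first character of the test string, "ت" (U+062A, UTF-8 bytes D8 AA), the simulated double encoding gives C3 98 C2 AA, which a UTF-8 console shows as "Øª". Passing the string returned by getStringValue() to toHex and comparing it with the hex dump of the input literal should show whether the bytes themselves are wrong or only their display in the console.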
