Bug #4704

parse-xml fails in browser on string that has XML declaration in CDATA section

Added by Martin Honnen 24 days ago. Updated 9 days ago.

Status: New
Priority: Normal
Category: XPath Conformance
Sprint/Milestone: -
Start date: 2020-09-01
Due date:
% Done: 0%
Estimated time:
Applies to JS Branch:
Fix Committed on JS Branch:
2.0
Fixed in JS Release:
SEF Generated with:
Company: -
Contact person: -
Additional contact persons: -

Description

With Saxon JS 2 in the browser (tested with Chrome 84), the JavaScript code

SaxonJS.XPath.evaluate(`parse-xml($xml)`, [], { params : { 'xml' : `<root><![CDATA[<?xml version="1.0" encoding="UTF-8"?>
<a/>
<a/>]]></root>` }})

fails for me with

SaxonJS2.rt.js:548 Uncaught q {message: "Misplaced or malformed XML", stack: "Error↵    at new q (http://saxonica.com/saxon-js/d…S/SaxonJS2.rt.js:789:494)↵    at <anonymous>:1:15", name: "XError", code: "FODC0006"}

I think the parsing should work without giving any error.

History

#1 Updated by Martin Honnen 24 days ago

With Node the code runs fine and doesn't give a parse error.

#2 Updated by Michael Kay 24 days ago

In the browser, Saxon uses the browser-supplied XML parser and these have many limitations. We don't know exactly what the limitations are, but we have no way of fixing them other than switching to use our own XML parser.

On Node.js, Saxon-JS uses a modified version of the SAX2 open-source parser. Some of the modifications we've made are to fix non-conformances like this.

Looking at the code, NodeJSPlatform.js line 232 has

                if (/^.+<\?xml/i.test(str)) {
                    //  throw new Error();
                }

which is pretty useless; more intelligently, line 115 has

        parser["onprocessinginstruction"] = function (tag) {
            if (tag.name !== "xml") {
                appendElement(document.createProcessingInstruction(tag.name, tag.body));
            }
        };

which is Saxon adding a check at application level that should have been done by the parser. But it's still not right, because (a) the check should be case-insensitive, and (b) we should report the construct as an error rather than ignoring it. There are many little details like this where the XML parsing technology on the JavaScript platform is grossly inadequate.
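For illustration only, points (a) and (b) might be sketched roughly as follows. This is not Saxon-JS code: checkProcessingInstruction and its seenContent flag are hypothetical names, and the sketch assumes the parser reports the XML declaration through the same callback as ordinary processing instructions, as the quoted check suggests. The XML specification reserves every case variant of the target "xml", so only a lowercase "xml" before any other content is treated as the declaration.

```javascript
// Hypothetical helper, not part of Saxon-JS.
// name:        the processing instruction target reported by the parser
// seenContent: assumed flag, true once anything has already been parsed
function checkProcessingInstruction(name, seenContent) {
    if (/^xml$/i.test(name)) {
        if (name === "xml" && !seenContent) {
            // XML declaration at the very start of the document: skip it
            return "declaration";
        }
        // Misplaced declaration, or a reserved target such as "XML" or "Xml"
        throw new Error(`Reserved processing instruction target "${name}"`);
    }
    // Ordinary processing instruction: the caller adds it to the tree
    return "pi";
}

console.log(checkProcessingInstruction("xml", false));           // "declaration"
console.log(checkProcessingInstruction("xml-stylesheet", true)); // "pi"
try {
    checkProcessingInstruction("XML", true);
} catch (e) {
    console.log(e.message); // Reserved processing instruction target "XML"
}
```

The handler would then call appendElement only for the "pi" result.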

#3 Updated by Michael Kay 24 days ago

In fact, BrowserPlatform.js line 443 has the same check, without the commenting out:

                if (/^.+<\?xml/i.test(str)) {
                    throw new Error();
                }

and this is leading to the error you observe. It's a half-hearted and incorrect attempt to fix a bug in the browser that we're not in a position to fix properly.
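A small sketch of why the reported input trips this check: the regular expression looks only at the raw string, so "<?xml" inside a CDATA section counts as a misplaced declaration. The helper hasMisplacedXmlDeclaration below is hypothetical, not a Saxon-JS function. It assumes it is acceptable to blank out CDATA sections and comments before testing, and it is not a full tokenizer. Unlike the original check, it also looks beyond the first line and requires whitespace after "xml", so targets such as xml-stylesheet are not flagged.

```javascript
// The check as quoted from BrowserPlatform.js.
const originalCheck = /^.+<\?xml/i;

// Hypothetical helper (an assumption, not Saxon-JS code): remove CDATA
// sections and comments, where "<?xml" is plain character data, then test.
function hasMisplacedXmlDeclaration(str) {
    const stripped = str.replace(/<!\[CDATA\[[\s\S]*?\]\]>|<!--[\s\S]*?-->/g, "");
    // At least one character before "<?xml" followed by whitespace
    return /^[\s\S]+<\?xml\s/i.test(stripped);
}

const cdataCase = `<root><![CDATA[<?xml version="1.0" encoding="UTF-8"?>
<a/>
<a/>]]></root>`;
const misplaced = `<root/><?xml version="1.0"?>`;
const wellFormed = `<?xml version="1.0"?><root/>`;

console.log(originalCheck.test(cdataCase));          // true (false positive)
console.log(hasMisplacedXmlDeclaration(cdataCase));  // false
console.log(hasMisplacedXmlDeclaration(misplaced));  // true
console.log(hasMisplacedXmlDeclaration(wellFormed)); // false
```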

If we had access to a fully-functional and conformant XML parser written in JS, then I think we'd be offering a choice so you can decide whether to use that in place of the browser-vendor's parser. But we don't, sadly.

#4 Updated by Martin Honnen 24 days ago

Michael Kay wrote:

In the browser, Saxon uses the browser-supplied XML parser and these have many limitations. We don't know exactly what the limitations are, but we have no way of fixing them other than switching to use our own XML parser.

I understand but

new DOMParser().parseFromString(`<root><![CDATA[<?xml version="1.0" encoding="UTF-8"?>
<a/>
<a/>]]></root>`, 'application/xml')

with the browser's XML parser does work.

I guess the check if (/^.+<\?xml/i.test(str)) is there to compensate for browser parsers not being strict enough and allowing content before an XML declaration.

I can't really judge which issue is more important so feel free to close this as not fixable.

#5 Updated by Debbie Lockett 9 days ago

  • Assignee set to Debbie Lockett
