Project

Profile

Help

Bug #4704

closed

parse-xml fails in browser on string that has XML declaration in CDATA section

Added by Martin Honnen about 2 years ago. Updated 26 days ago.

Status:
Resolved
Priority:
Normal
Category:
XPath Conformance
Sprint/Milestone:
Start date:
2020-09-01
Due date:
% Done:

0%

Estimated time:
Applies to JS Branch:
2, Trunk
Fix Committed on JS Branch:
2, Trunk
Fixed in JS Release:
SEF Generated with:
Platforms:
Company:
-
Contact person:
-
Additional contact persons:
-

Description

With Saxon JS 2 in the browser (tested with Chrome 84), the JavaScript code

SaxonJS.XPath.evaluate(`parse-xml($xml)`, [], { params : { 'xml' : `<root><![CDATA[<?xml version="1.0" encoding="UTF-8"?>
<a/>
<a/>]]></root>` }})

fails for me with

SaxonJS2.rt.js:548 Uncaught q {message: "Misplaced or malformed XML", stack: "Error↵    at new q (http://saxonica.com/saxon-js/d…S/SaxonJS2.rt.js:789:494)↵    at <anonymous>:1:15", name: "XError", code: "FODC0006"}

I think the parsing should work without giving any error.

Actions #1

Updated by Martin Honnen about 2 years ago

With Node the code runs fine and doesn't give a parse error.

Actions #2

Updated by Michael Kay about 2 years ago

In the browser, Saxon uses the browser-supplied XML parser and these have many limitations. We don't know exactly what the limitations are, but we have no way of fixing them other than switching to use our own XML parser.

On Node.js, Saxon-JS uses a modified version of the SAX2 open-source parser. Some of the modifications we've made are to fix non-conformances like this.

Looking at the code, NodeJSPlatform.js linie 232 has

                if (/^.+<\?xml/i.test(str)) {
                    //  throw new Error();
                }

which is pretty useless; more intelligently, line 115 has

       parser["onprocessinginstruction"] = function (tag) {
            if (tag.name !== "xml") {
                appendElement(document.createProcessingInstruction(tag.name, tag.body));
            }
        };

which is Saxon adding a check at application level that should have been done by the parser. But it's still not right, because (a) the check should be case-blind, and (b) we should error the construct rather than ignoring it. There are many little details like this where the XML parsing technology on the Javascript platform is grossly inadequate.

Actions #3

Updated by Michael Kay about 2 years ago

In fact, BrowserPlatform.js line 443 has the same check, without the commenting out:

                if (/^.+<\?xml/i.test(str)) {
                    throw new Error();
                }

and this is leading to the error you observe. It's a half-hearted and incorrect attempt to fix a bug in the browser that we're not in a position to fix properly.

If we had access to a fully-functional and conformant XML parser written in JS, then I think we'd be offering a choice so you can decide whether to use that in place of the browser-vendor's parser. But we don't, sadly.

Actions #4

Updated by Martin Honnen about 2 years ago

Michael Kay wrote:

In the browser, Saxon uses the browser-supplied XML parser and these have many limitations. We don't know exactly what the limitations are, but we have no way of fixing them other than switching to use our own XML parser.

I understand but

new DOMParser().parseFromString(`<root><![CDATA[<?xml version="1.0" encoding="UTF-8"?>
<a/>
<a/>]]></root>`, 'application/xml')

with the browser's XML parser does work.

I guess the check if (/^.+<\?xml/i.test(str)) is in there to fix some lack of strictness of the parsers in browsers allowing content before an XML declaration.

I can't really judge which issue is more important so feel free to close this as not fixable.

Actions #5

Updated by Debbie Lockett about 2 years ago

  • Assignee set to Debbie Lockett
Actions #6

Updated by Community Admin over 1 year ago

  • Fix Committed on JS Branch 2 added
  • Fix Committed on JS Branch deleted (2.0)
Actions #7

Updated by Debbie Lockett over 1 year ago

  • Applies to JS Branch 2 added
  • Fix Committed on JS Branch deleted (2)
Actions #8

Updated by Norm Tovey-Walsh 4 months ago

  • Sprint/Milestone set to SaxonJS 3.0
Actions #9

Updated by Debbie Lockett 26 days ago

Returning to this after it has been forgotten about for some time. It does look like it would be worthwhile tidying up this code.

I assumed the /^.+<\?xml/i.test(str) check in BrowserPlatform (which produces the observed error) was there to ensure other tests passed, but it looks like we can simply remove this check without causing test failures. Being liberal here is probably the better way to go.

We suspect that the check actually originated from trying to handle XML parsing on the Node.js side; but was replaced there once the sax2parse function handled this in a better way, with the parser["onprocessinginstruction"] code:

       parser["onprocessinginstruction"] = function (tag) {
            if (tag.name !== "xml") {
                appendElement(document.createProcessingInstruction(tag.name, tag.body));
            }
        };

Above, Mike had suggested improving this code:

(a) the check should be case-blind, and (b) we should error the construct rather than ignoring it

For (a), it seems that tag.name is always lower-case, so I don't think there's anything to do. For (b), I propose adding:

 else if (document.hasChildNodes()) {
                // Check for misplaced XML/text declarations
                throw new XError("XML parse error - misplaced XML declaration", "FODC0006");
            }

We only want to produce an error for misplaced XML declarations; I believe we should continue to ignore them otherwise.

Actions #10

Updated by Debbie Lockett 26 days ago

  • Status changed from New to Resolved
  • Sprint/Milestone changed from SaxonJS 3.0 to SaxonJS 2.5
  • Applies to JS Branch Trunk added
  • Fix Committed on JS Branch 2, Trunk added

Code changes (in BrowserPlatform.js and NodeJSPlatform.js) committed on saxonjs2 and main branches. Browser and nodejs unit tests (iss4704) added.

Please register to edit this issue

Also available in: Atom PDF Tracking page