Bug #4704: parse-xml fails in browser on string that has XML declaration in CDATA section - SaxonJS - Saxonica Developer Community

Actions

Send by e-mail Copy link

Bug #4704

closed

parse-xml fails in browser on string that has XML declaration in CDATA section

Added by Martin Honnen over 4 years ago. Updated over 2 years ago.

Status:

Closed

Priority:

Normal

Assignee:

Debbie Lockett

Category:

XPath Conformance

Sprint/Milestone:

SaxonJS 2.5

Start date:

2020-09-01

Due date:

% Done:

100%

Estimated time:

Applies to JS Branch:

2, Trunk

Fix Committed on JS Branch:

2, Trunk

Fixed in JS Release:

SaxonJS 2.5

SEF Generated with:

Platforms:

Company:

Contact person:

Additional contact persons:

Description

With Saxon JS 2 in the browser (tested with Chrome 84), the JavaScript code

SaxonJS.XPath.evaluate(`parse-xml($xml)`, [], { params : { 'xml' : `<root><![CDATA[<?xml version="1.0" encoding="UTF-8"?>
<a/>
<a/>]]></root>` }})

fails for me with

SaxonJS2.rt.js:548 Uncaught q {message: "Misplaced or malformed XML", stack: "Error↵    at new q (http://saxonica.com/saxon-js/d…S/SaxonJS2.rt.js:789:494)↵    at <anonymous>:1:15", name: "XError", code: "FODC0006"}

I think the parsing should work without giving any error.

Actions

Copy link

Updated by Martin Honnen over 4 years ago

With Node the code runs fine and doesn't give a parse error.

Actions

Copy link

Updated by Michael Kay over 4 years ago

In the browser, Saxon uses the browser-supplied XML parser and these have many limitations. We don't know exactly what the limitations are, but we have no way of fixing them other than switching to use our own XML parser.

On Node.js, Saxon-JS uses a modified version of the SAX2 open-source parser. Some of the modifications we've made are to fix non-conformances like this.

Looking at the code, NodeJSPlatform.js linie 232 has

                if (/^.+<\?xml/i.test(str)) {
                    //  throw new Error();
                }

which is pretty useless; more intelligently, line 115 has

       parser["onprocessinginstruction"] = function (tag) {
            if (tag.name !== "xml") {
                appendElement(document.createProcessingInstruction(tag.name, tag.body));
            }
        };

which is Saxon adding a check at application level that should have been done by the parser. But it's still not right, because (a) the check should be case-blind, and (b) we should error the construct rather than ignoring it. There are many little details like this where the XML parsing technology on the Javascript platform is grossly inadequate.

Actions

Copy link

Updated by Michael Kay over 4 years ago

In fact, BrowserPlatform.js line 443 has the same check, without the commenting out:

                if (/^.+<\?xml/i.test(str)) {
                    throw new Error();
                }

and this is leading to the error you observe. It's a half-hearted and incorrect attempt to fix a bug in the browser that we're not in a position to fix properly.

If we had access to a fully-functional and conformant XML parser written in JS, then I think we'd be offering a choice so you can decide whether to use that in place of the browser-vendor's parser. But we don't, sadly.

Actions

Copy link

Updated by Martin Honnen over 4 years ago

Michael Kay wrote:

In the browser, Saxon uses the browser-supplied XML parser and these have many limitations. We don't know exactly what the limitations are, but we have no way of fixing them other than switching to use our own XML parser.

I understand but

new DOMParser().parseFromString(`<root><![CDATA[<?xml version="1.0" encoding="UTF-8"?>
<a/>
<a/>]]></root>`, 'application/xml')

with the browser's XML parser does work.

I guess the check if (/^.+<\?xml/i.test(str)) is in there to fix some lack of strictness of the parsers in browsers allowing content before an XML declaration.

I can't really judge which issue is more important so feel free to close this as not fixable.

Actions

Copy link

Updated by Debbie Lockett over 4 years ago

Assignee set to Debbie Lockett

Actions

Copy link

Updated by Community Admin almost 4 years ago

Fix Committed on JS Branch 2 added
Fix Committed on JS Branch deleted (~~2.0~~)

Actions

Copy link

Updated by Debbie Lockett almost 4 years ago

Applies to JS Branch 2 added
Fix Committed on JS Branch deleted (2)

Actions

Copy link

Updated by Norm Tovey-Walsh over 2 years ago

Sprint/Milestone set to SaxonJS 3.0

Actions

Copy link

Updated by Debbie Lockett over 2 years ago

Returning to this after it has been forgotten about for some time. It does look like it would be worthwhile tidying up this code.

I assumed the /^.+<\?xml/i.test(str) check in BrowserPlatform (which produces the observed error) was there to ensure other tests passed, but it looks like we can simply remove this check without causing test failures. Being liberal here is probably the better way to go.

We suspect that the check actually originated from trying to handle XML parsing on the Node.js side; but was replaced there once the sax2parse function handled this in a better way, with the parser["onprocessinginstruction"] code:

       parser["onprocessinginstruction"] = function (tag) {
            if (tag.name !== "xml") {
                appendElement(document.createProcessingInstruction(tag.name, tag.body));
            }
        };

Above, Mike had suggested improving this code:

(a) the check should be case-blind, and (b) we should error the construct rather than ignoring it

For (a), it seems that tag.name is always lower-case, so I don't think there's anything to do. For (b), I propose adding:

 else if (document.hasChildNodes()) {
                // Check for misplaced XML/text declarations
                throw new XError("XML parse error - misplaced XML declaration", "FODC0006");
            }

We only want to produce an error for misplaced XML declarations; I believe we should continue to ignore them otherwise.

Actions

Copy link

#10

Updated by Debbie Lockett over 2 years ago

Status changed from New to Resolved
Sprint/Milestone changed from SaxonJS 3.0 to SaxonJS 2.5
Applies to JS Branch Trunk added
Fix Committed on JS Branch 2, Trunk added

Code changes (in BrowserPlatform.js and NodeJSPlatform.js) committed on saxonjs2 and main branches. Browser and nodejs unit tests (iss4704) added.

Actions

Copy link

#11

Updated by Norm Tovey-Walsh over 2 years ago

Status changed from Resolved to Closed
% Done changed from 0 to 100
Fixed in JS Release set to SaxonJS 2.5

Fixed in SaxonJS 2.5.

Please register to edit this issue

Actions

Send by e-mail Copy link

Also available in: Atom PDF Tracking page

Project

Profile

Help

Saxon » SaxonJS

Planio Inbox

Bug #4704

parse-xml fails in browser on string that has XML declaration in CDATA section

Updated by Martin Honnen over 4 years ago

Updated by Michael Kay over 4 years ago

Updated by Michael Kay over 4 years ago

Updated by Martin Honnen over 4 years ago

Updated by Debbie Lockett over 4 years ago

Updated by Community Admin almost 4 years ago

Updated by Debbie Lockett almost 4 years ago

Updated by Norm Tovey-Walsh over 2 years ago

Updated by Debbie Lockett over 2 years ago

Updated by Debbie Lockett over 2 years ago

Updated by Norm Tovey-Walsh over 2 years ago