https://saxonica.plan.io/https://saxonica.plan.io/favicon.ico2020-09-01T17:01:37ZSaxonica Developer CommunitySaxonJS - Bug #4704: parse-xml fails in browser on string that has XML declaration in CDATA sectionhttps://saxonica.plan.io/issues/4704?journal_id=161492020-09-01T17:01:37ZMartin Honnenmartin.honnen@gmx.de
<ul></ul><p>With Node the code runs fine and doesn't give a parse error.</p> SaxonJS - Bug #4704: parse-xml fails in browser on string that has XML declaration in CDATA sectionhttps://saxonica.plan.io/issues/4704?journal_id=161502020-09-01T17:38:26ZMichael Kaymike@saxonica.com
<ul></ul><p>In the browser, Saxon uses the browser-supplied XML parser and these have many limitations. We don't know exactly what the limitations are, but we have no way of fixing them other than switching to use our own XML parser.</p>
<p>On Node.js, Saxon-JS uses a modified version of the SAX2 open-source parser. Some of the modifications we've made are to fix non-conformances like this.</p>
<p>Looking at the code, NodeJSPlatform.js linie 232 has</p>
<pre><code> if (/^.+<\?xml/i.test(str)) {
// throw new Error();
}
</code></pre>
<p>which is pretty useless; more intelligently, line 115 has</p>
<pre><code> parser["onprocessinginstruction"] = function (tag) {
if (tag.name !== "xml") {
appendElement(document.createProcessingInstruction(tag.name, tag.body));
}
};
</code></pre>
<p>which is Saxon adding a check at application level that should have been done by the parser. But it's still not right, because (a) the check should be case-blind, and (b) we should error the construct rather than ignoring it. There are many little details like this where the XML parsing technology on the Javascript platform is grossly inadequate.</p> SaxonJS - Bug #4704: parse-xml fails in browser on string that has XML declaration in CDATA sectionhttps://saxonica.plan.io/issues/4704?journal_id=161512020-09-01T17:45:09ZMichael Kaymike@saxonica.com
<ul></ul><p>In fact, BrowserPlatform.js line 443 has the same check, without the commenting out:</p>
<pre><code> if (/^.+<\?xml/i.test(str)) {
throw new Error();
}
</code></pre>
<p>and this is leading to the error you observe. It's a half-hearted and incorrect attempt to fix a bug in the browser that we're not in a position to fix properly.</p>
<p>If we had access to a fully-functional and conformant XML parser written in JS, then I think we'd be offering a choice so you can decide whether to use that in place of the browser-vendor's parser. But we don't, sadly.</p> SaxonJS - Bug #4704: parse-xml fails in browser on string that has XML declaration in CDATA sectionhttps://saxonica.plan.io/issues/4704?journal_id=161532020-09-01T18:29:39ZMartin Honnenmartin.honnen@gmx.de
<ul></ul><p>Michael Kay wrote:</p>
<blockquote>
<p>In the browser, Saxon uses the browser-supplied XML parser and these have many limitations. We don't know exactly what the limitations are, but we have no way of fixing them other than switching to use our own XML parser.</p>
</blockquote>
<p>I understand but</p>
<pre><code class="javascript syntaxhl" data-language="javascript"><span class="k">new</span> <span class="nx">DOMParser</span><span class="p">().</span><span class="nx">parseFromString</span><span class="p">(</span><span class="s2">`<root><![CDATA[<?xml version="1.0" encoding="UTF-8"?>
<a/>
<a/>]]></root>`</span><span class="p">,</span> <span class="dl">'</span><span class="s1">application/xml</span><span class="dl">'</span><span class="p">)</span>
</code></pre>
<p>with the browser's XML parser does work.</p>
<p>I guess the check <code>if (/^.+<\?xml/i.test(str))</code> is in there to fix some lack of strictness of the parsers in browsers allowing content before an XML declaration.</p>
<p>I can't really judge which issue is more important so feel free to close this as not fixable.</p> SaxonJS - Bug #4704: parse-xml fails in browser on string that has XML declaration in CDATA sectionhttps://saxonica.plan.io/issues/4704?journal_id=163182020-09-16T11:49:31ZDebbie Lockettdebbie@saxonica.com
<ul><li><strong>Assignee</strong> set to <i>Debbie Lockett</i></li></ul> SaxonJS - Bug #4704: parse-xml fails in browser on string that has XML declaration in CDATA sectionhttps://saxonica.plan.io/issues/4704?journal_id=172962021-02-11T12:30:57ZCommunity Adminsupport@saxonica.com
<ul><li><strong>Fix Committed on JS Branch</strong> <i>2</i> added</li><li><strong>Fix Committed on JS Branch</strong> deleted (<del><i>2.0</i></del>)</li></ul> SaxonJS - Bug #4704: parse-xml fails in browser on string that has XML declaration in CDATA sectionhttps://saxonica.plan.io/issues/4704?journal_id=173212021-02-11T13:05:41ZDebbie Lockettdebbie@saxonica.com
<ul><li><strong>Applies to JS Branch</strong> <i>2</i> added</li><li><strong>Fix Committed on JS Branch</strong> deleted (<del><i>2</i></del>)</li></ul> SaxonJS - Bug #4704: parse-xml fails in browser on string that has XML declaration in CDATA sectionhttps://saxonica.plan.io/issues/4704?journal_id=210032022-06-15T14:25:25ZNorm Tovey-Walsh
<ul><li><strong>Sprint/Milestone</strong> set to <i>SaxonJS 3.0</i></li></ul> SaxonJS - Bug #4704: parse-xml fails in browser on string that has XML declaration in CDATA sectionhttps://saxonica.plan.io/issues/4704?journal_id=216982022-09-02T12:03:28ZDebbie Lockettdebbie@saxonica.com
<ul></ul><p>Returning to this after it has been forgotten about for some time. It does look like it would be worthwhile tidying up this code.</p>
<p>I assumed the <code>/^.+<\?xml/i.test(str)</code> check in BrowserPlatform (which produces the observed error) was there to ensure other tests passed, but it looks like we can simply remove this check without causing test failures. Being liberal here is probably the better way to go.</p>
<p>We suspect that the check actually originated from trying to handle XML parsing on the Node.js side; but was replaced there once the <code>sax2parse</code> function handled this in a better way, with the <code>parser["onprocessinginstruction"]</code> code:</p>
<pre><code class="javascript syntaxhl" data-language="javascript"> <span class="nx">parser</span><span class="p">[</span><span class="dl">"</span><span class="s2">onprocessinginstruction</span><span class="dl">"</span><span class="p">]</span> <span class="o">=</span> <span class="kd">function</span> <span class="p">(</span><span class="nx">tag</span><span class="p">)</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="nx">tag</span><span class="p">.</span><span class="nx">name</span> <span class="o">!==</span> <span class="dl">"</span><span class="s2">xml</span><span class="dl">"</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">appendElement</span><span class="p">(</span><span class="nb">document</span><span class="p">.</span><span class="nx">createProcessingInstruction</span><span class="p">(</span><span class="nx">tag</span><span class="p">.</span><span class="nx">name</span><span class="p">,</span> <span class="nx">tag</span><span class="p">.</span><span class="nx">body</span><span class="p">));</span>
<span class="p">}</span>
<span class="p">};</span>
</code></pre>
<p>Above, Mike had suggested improving this code:</p>
<blockquote>
<p>(a) the check should be case-blind, and (b) we should error the construct rather than ignoring it</p>
</blockquote>
<p>For (a), it seems that <code>tag.name</code> is always lower-case, so I don't think there's anything to do.
For (b), I propose adding:</p>
<pre><code class="javascript syntaxhl" data-language="javascript"> <span class="k">else</span> <span class="k">if</span> <span class="p">(</span><span class="nb">document</span><span class="p">.</span><span class="nx">hasChildNodes</span><span class="p">())</span> <span class="p">{</span>
<span class="c1">// Check for misplaced XML/text declarations</span>
<span class="k">throw</span> <span class="k">new</span> <span class="nx">XError</span><span class="p">(</span><span class="dl">"</span><span class="s2">XML parse error - misplaced XML declaration</span><span class="dl">"</span><span class="p">,</span> <span class="dl">"</span><span class="s2">FODC0006</span><span class="dl">"</span><span class="p">);</span>
<span class="p">}</span>
</code></pre>
<p>We only want to produce an error for misplaced XML declarations; I believe we should continue to ignore them otherwise.</p> SaxonJS - Bug #4704: parse-xml fails in browser on string that has XML declaration in CDATA sectionhttps://saxonica.plan.io/issues/4704?journal_id=216992022-09-02T12:33:11ZDebbie Lockettdebbie@saxonica.com
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>Resolved</i></li><li><strong>Sprint/Milestone</strong> changed from <i>SaxonJS 3.0</i> to <i>SaxonJS 2.5</i></li><li><strong>Applies to JS Branch</strong> <i>Trunk</i> added</li><li><strong>Fix Committed on JS Branch</strong> <i>2, Trunk</i> added</li></ul><p>Code changes (in BrowserPlatform.js and NodeJSPlatform.js) committed on saxonjs2 and main branches. Browser and nodejs unit tests (iss4704) added.</p> SaxonJS - Bug #4704: parse-xml fails in browser on string that has XML declaration in CDATA sectionhttps://saxonica.plan.io/issues/4704?journal_id=219352022-10-04T09:53:32ZNorm Tovey-Walsh
<ul><li><strong>Status</strong> changed from <i>Resolved</i> to <i>Closed</i></li><li><strong>% Done</strong> changed from <i>0</i> to <i>100</i></li><li><strong>Fixed in JS Release</strong> set to <i>SaxonJS 2.5</i></li></ul><p>Fixed in SaxonJS 2.5.</p>