Bug #6596: Failure upgrading to Axiom 1.4.0 - CDATA nodes - Saxon - Saxonica Developer Community

closed

Added by Norm Tovey-Walsh 28 days ago. Updated 25 days ago.

Status: Resolved
Priority: Normal
Assignee: Michael Kay
Category: Saxon extensions
Sprint/Milestone:
Start date: 2024-11-25
Due date:
% Done:
Estimated time:
Legacy ID:
Applies to branch: 12, trunk
Fix Committed on Branch:
Fixed in Maintenance Release:
Platforms: Java

Description

As I said in Slack:

I was trying to chase up a Maven failure in a shell script. I think org.apache.ws.commons.axiom:axiom:{version} doesn't exist. I don't know why the build doesn't fail. I think that should be org.apache.ws.commons.axiom:axiom-api:{version}. Also, we're loading 1.2.x and the latest is 1.4.0. I'm going to bump the dependencies.

We don't distribute these jars, so the risk is smaller, but we should document the versions we test against.

Updated by Norm Tovey-Walsh 28 days ago

Casual experiments with upgrading to 1.4.0 were unsuccessful. It will still build if we reduce the dependencies to only axiom-dom and axiom-impl, but 1.4.0 appears to introduce a new node type for CDATA. Attempting to treat CDATA as text is only partially successful as the resulting text nodes don't get merged.
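
To make the merging problem concrete, here is a minimal sketch of what "merged" would mean for a list of sibling nodes. It does not use the Axiom or Saxon APIs: the `Kind` enum, the `Child` class and the `mergeText` method are all hypothetical names invented for illustration. The idea is only that a run of adjacent text and CDATA children should surface as a single text node.

```java
import java.util.ArrayList;
import java.util.List;

public class MergeAdjacentText {

    // Hypothetical node kinds; a real tree model would have more.
    enum Kind { ELEMENT, TEXT, CDATA, COMMENT }

    // Hypothetical stand-in for a child node: a kind plus its string value (or name).
    static final class Child {
        final Kind kind;
        final String value;

        Child(Kind kind, String value) {
            this.kind = kind;
            this.value = value;
        }
    }

    /**
     * Collapses each run of adjacent TEXT/CDATA children into one TEXT child.
     * Other children are passed through unchanged, in document order.
     */
    static List<Child> mergeText(List<Child> children) {
        List<Child> merged = new ArrayList<>();
        StringBuilder pending = null; // text accumulated for the current run, if any
        for (Child c : children) {
            if (c.kind == Kind.TEXT || c.kind == Kind.CDATA) {
                if (pending == null) {
                    pending = new StringBuilder();
                }
                pending.append(c.value);
            } else {
                flush(pending, merged);
                pending = null;
                merged.add(c);
            }
        }
        flush(pending, merged);
        return merged;
    }

    // Emits the accumulated run as a single text node, skipping empty runs.
    private static void flush(StringBuilder pending, List<Child> out) {
        if (pending != null && pending.length() > 0) {
            out.add(new Child(Kind.TEXT, pending.toString()));
        }
    }

    public static void main(String[] args) {
        List<Child> input = List.of(
                new Child(Kind.TEXT, "a"),
                new Child(Kind.CDATA, "b"),
                new Child(Kind.TEXT, "c"),
                new Child(Kind.ELEMENT, "e"),
                new Child(Kind.TEXT, "d"));
        for (Child c : mergeText(input)) {
            System.out.println(c.kind + " " + c.value);
        }
    }
}
```

Without a step like this, treating CDATA as text gives three separate text children for `a<![CDATA[b]]>c` rather than one, which matches the symptom described above.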

Updated by Norm Tovey-Walsh 28 days ago

Looking at the JDOM2 interface, I see that managing adjacent text nodes is spread across a few different methods. It's not immediately clear if the Axiom model can be approached in the same way.

Updated by Norm Tovey-Walsh 28 days ago

Curiously, 1.2.15 has an OMNode.CDATA_SECTION_NODE constant but doesn't seem to use it. Or maybe the Axiom API has methods for merging adjacent nodes and those have changed in 1.4.0?

Updated by Norm Tovey-Walsh 28 days ago

According to the Axiom docs,

Preserving CDATA sections during parsing

By default, StAXUtils creates StAX parsers in coalescing mode. In this mode, the parser will never return two character data events in sequence, while in non-coalescing mode, the parser is allowed to break up character data into smaller chunks and to return multiple consecutive character events, which may improve throughput for documents containing large text nodes. It should be noted that StAXUtils overrides the default settings mandated by the StAX specification, which specifies that by default, a StAX parser must be in non-coalescing mode. The primary reason is compatibility: older versions of Woodstox had coalescing switched on by default.

A side effect of the default settings chosen by Axiom is that by default, CDATA sections are not reported by parsers created by StAXUtils. The reason is that in coalescing mode, the parser will not only coalesce adjacent text nodes, but also CDATA sections. Applications that require correct reporting of CDATA sections should therefore disable coalescing. This can be achieved by creating an XMLInputFactory.properties file with the following content:

javax.xml.stream.isCoalescing=false

But using System.setProperty to change values of javax.xml.stream.isCoalescing doesn't seem to have any effect in either version of the API.
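
For comparison, here is a minimal sketch that uses only the JDK's own StAX API, not Axiom's StAXUtils, and sets the coalescing property directly on the XMLInputFactory instead of through a system property or properties file. The class name `CoalescingDemo` and the method `textChunks` are made up for this example. It only illustrates what the coalescing switch does to the event stream; it is not a claim about how Axiom configures its parsers.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class CoalescingDemo {

    /**
     * Parses the document and returns the text of every character-data event
     * (CHARACTERS, CDATA or SPACE), one list entry per event, in document order.
     */
    static List<String> textChunks(String xml, boolean coalescing) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Standard StAX property, "javax.xml.stream.isCoalescing".
        factory.setProperty(XMLInputFactory.IS_COALESCING, coalescing);
        List<String> chunks = new ArrayList<>();
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    int event = reader.next();
                    if (event == XMLStreamConstants.CHARACTERS
                            || event == XMLStreamConstants.CDATA
                            || event == XMLStreamConstants.SPACE) {
                        chunks.add(reader.getText());
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            // Rethrow unchecked to keep the sketch short.
            throw new IllegalStateException("Failed to parse: " + xml, e);
        }
        return chunks;
    }

    public static void main(String[] args) {
        String xml = "<r>a<![CDATA[b]]>c</r>";
        System.out.println("coalescing:     " + textChunks(xml, true));
        System.out.println("non-coalescing: " + textChunks(xml, false));
    }
}
```

With coalescing on, the content of `<r>` should arrive as a single chunk. With it off, how the text is split, and whether the CDATA section is reported as its own event, depends on the StAX implementation in use.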

Updated by Michael Kay 26 days ago

Subject changed from How is axiom used? to Failure upgrading to Axiom 1.4.0 - CDATA nodes
Category set to Saxon extensions
Assignee set to Michael Kay
Priority changed from Low to Normal
Applies to branch 12, trunk added
Platforms Java added

Updated by Michael Kay 26 days ago

Changed AxiomLeafNodeWrapper to treat a CDATA node on the child axis in the same way as a text node (i.e., wrapping it in a NodeInfo of type text).

Changed AxiomTreeTests to set mergesAdjacentTextNodes to false, so the test no longer expects adjacent text nodes to be merged (the Axiom tree wrapper, like the DOM4J tree wrapper, makes no attempt to do this).
