StAXSource and Saxon >= 11 with indented XML issues (using Camel)

Added by Jonas N 2 months ago

Hi! I'm testing Saxon through Camel, and in the latest Camel 3.20.x release they updated Saxon-HE to version 11.4 (previously it used 9.9 or something like that).

Environment: Spring Boot 2.7.7, Camel 3.20.1 (Saxon-HE 11.4) and Red Hat Java 11.

After upgrading, I started to see a lot of errors in different tests, which seem to be related to Saxon when the XML input is indented (pretty-printed).

java.lang.ArrayIndexOutOfBoundsException: Index 8192 out of bounds for length 8192
    at net.sf.saxon.str.StringTool.compress( ~[Saxon-HE-11.4.jar:na]
    at net.sf.saxon.str.CompressedWhitespace.compressWS( ~[Saxon-HE-11.4.jar:na]
    at net.sf.saxon.str.StringTool.compress( ~[Saxon-HE-11.4.jar:na]
    at net.sf.saxon.pull.StaxBridge.getStringValue( ~[Saxon-HE-11.4.jar:na]
    at net.sf.saxon.pull.PullPushTee.copyEvent( ~[Saxon-HE-11.4.jar:na]
    at ~[Saxon-HE-11.4.jar:na]
    at net.sf.saxon.pull.PullConsumer.consume( ~[Saxon-HE-11.4.jar:na]
    at net.sf.saxon.pull.PullPushCopier.copy( ~[Saxon-HE-11.4.jar:na]
    at net.sf.saxon.pull.PullSource.deliver( ~[Saxon-HE-11.4.jar:na]
    at net.sf.saxon.pull.ActiveStAXSource.deliver( ~[Saxon-HE-11.4.jar:na]
    at net.sf.saxon.event.Sender.send( ~[Saxon-HE-11.4.jar:na]
    at net.sf.saxon.Controller.makeSourceTree( ~[Saxon-HE-11.4.jar:na]
    at net.sf.saxon.s9api.XsltTransformer.transform( ~[Saxon-HE-11.4.jar:na]
    at net.sf.saxon.jaxp.TransformerImpl.transform( ~[Saxon-HE-11.4.jar:na]
    at org.apache.camel.component.xslt.XsltBuilder.process( ~[camel-xslt-3.20.1.jar:3.20.1]
    at ~[camel-support-3.20.1.jar:3.20.1]
    at org.apache.camel.component.xslt.XsltEndpoint.onExchange( ~[camel-xslt-3.20.1.jar:3.20.1]

Depending on the XMLInputFactory in use, the error message and behaviour are slightly different.

Using the default I get "Index 8192 out of bounds for length 8192", and most indented files seem to fail if they are larger than around 4500 bytes. This is pretty easy to reproduce.

Changing to com.ctc.wstx.stax.WstxInputFactory (Woodstox 6.4.0) I get "Index 4000 out of bounds for length 4000" instead (otherwise exactly the same error).

With Woodstox it's possible, but harder, to reproduce. I think whether it works may depend on the positions at which whitespace (tabs or spaces) appears in the file, but I'm not sure about that.
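
Since the behaviour differs per StAX implementation, it can help to confirm which XMLInputFactory is actually being picked up. The snippet below is only a small sketch using the standard JAXP lookup; the class name StaxFactoryInfo is made up for illustration, and whether Camel resolves its factory in exactly the same way is an assumption.

```java
import javax.xml.stream.XMLInputFactory;

// Hypothetical helper (not part of Camel or Saxon): reports which StAX
// implementation the standard JAXP lookup resolves on this classpath.
final class StaxFactoryInfo {

    static String implementationName() {
        // newFactory() honours the javax.xml.stream.XMLInputFactory system
        // property and service-provider entries before the JDK default.
        return XMLInputFactory.newFactory().getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println("XMLInputFactory in use: " + implementationName());
    }
}
```

Running it with -Djavax.xml.stream.XMLInputFactory=com.ctc.wstx.stax.WstxInputFactory (and Woodstox on the classpath) should report the Woodstox factory instead of the JDK one.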

I noticed that Camel uses StAXSource by default, which seems to be the root cause of the issue.

By default, Camel seems to log this before sending the payload to Saxon:

o.a.camel.component.xslt.XsltBuilder: Using javax.xml.transform.stax.StAXSource as source

Forcing Camel to disable StAXSource does "fix" the problem: Camel logs this instead, and the indented XML file goes through.

o.a.camel.component.xslt.XsltBuilder: Using as source

I also tried switching between javax.xml.transform.Source and javax.xml.transform.stax.StAXSource myself: with the same input XML, StAXSource failed where Source did not.
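
For anyone who wants to try the same comparison outside Camel, here is a rough sketch using only the JAXP API. It is not the code from my tests: the class StaxSourceRepro, the helper names and the generated document are all made up for illustration, and StreamSource stands in for the non-StAX source. With only the JDK on the classpath the built-in transformer is used and both paths should succeed. To exercise Saxon, put Saxon-HE on the classpath so that TransformerFactory.newInstance() resolves to it (assuming the usual service lookup applies). I can't promise this exact input triggers the exception.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stax.StAXSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical harness: feeds the same indented XML through an identity
// transform, once as a StreamSource and once as a StAXSource.
final class StaxSourceRepro {

    // Builds a pretty-printed document; 500 items is well above the
    // ~4500 bytes at which the failures were observed.
    static String indentedXml(int items) {
        StringBuilder sb = new StringBuilder("<root>\n");
        for (int i = 0; i < items; i++) {
            sb.append("    <item>\n        <value>").append(i)
              .append("</value>\n    </item>\n");
        }
        return sb.append("</root>\n").toString();
    }

    // Identity transform with whatever TransformerFactory JAXP resolves.
    static String transform(Source source) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            StringWriter out = new StringWriter();
            t.transform(source, new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    static String viaStream(String xml) {
        return transform(new StreamSource(new StringReader(xml)));
    }

    static String viaStax(String xml) {
        try {
            XMLStreamReader reader = XMLInputFactory.newFactory()
                    .createXMLStreamReader(new StringReader(xml));
            return transform(new StAXSource(reader));
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = indentedXml(500);
        System.out.println("input chars: " + xml.length());
        System.out.println("StreamSource ok, output chars: " + viaStream(xml).length());
        try {
            System.out.println("StAXSource ok, output chars: " + viaStax(xml).length());
        } catch (RuntimeException e) {
            // An ArrayIndexOutOfBoundsException would end up here.
            System.out.println("StAXSource failed: " + e);
        }
    }
}
```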

I tried different versions of Saxon-HE and anything below 11 seems fine; with Saxon-HE 10.8 I don't see any issues. But all versions after that (even Saxon-HE 12.0) seem to do this.

Not sure exactly where this issue belongs, but it would be interesting to hear if anyone else has come across this, or if you have any idea what could be causing it.

Replies (2)

RE: StAXSource and Saxon >= 11 with indented XML issues (using Camel) - Added by Michael Kay 2 months ago

Thanks for reporting it.

Logged here

Please watch the issue to follow this up.

