selecting wrong values

Added by Mart Seedre over 8 years ago

I am using Saxon 9.6.0.7 from the Linux command line to process a 150 MB XML file:

net.sf.saxon.Transform -s:input.xml -xsl:template2_1.xsl -o:output.xml

I'm selecting many nodes, and one of them is a date: <xsl:value-of select="billduedate"/>. The input file includes 34711 such tags, and they all hold the value 2015-12-25. Somehow, in the result file, 42 tags hold a non-date value such as:

t>15-12-25
t015-12-25
ta15-12-25
tamou12-25
total12-25
unt5-12-25
voi5-12-25

All other 34669 tags have the proper value 2015-12-25. At the time of running the transformation I have at least 4 GB of free memory.

Any idea what could be the problem?


Replies (9)


RE: selecting wrong values - Added by Michael Kay over 8 years ago

As a first step, I would suggest running it with the Apache Xerces parser rather than the built-in JDK parser. You haven't said which version of the JDK you are using, but certainly in the past I've seen it deliver corrupt data like this (though usually from attribute values rather than text nodes).

If you add the -t option, Saxon will tell you which parser is in use. To use the Apache Xerces parser, it should be enough to have it on the classpath.
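Independently of Saxon, it can help to see which parser class JAXP resolves on a given classpath. The following is a minimal sketch, not part of Saxon; the class name `ParserReport` is made up for illustration, and the -t output remains the authoritative statement of what Saxon itself used.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.XMLReader;

// Hypothetical helper: reports the XMLReader class that JAXP picks by default.
public class ParserReport {

    /** Returns the class name of the XMLReader that JAXP resolves on this classpath. */
    static String defaultReaderClass() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        XMLReader reader = factory.newSAXParser().getXMLReader();
        return reader.getClass().getName();
    }

    public static void main(String[] args) throws Exception {
        // The printed name depends on the JDK and on which jars are on the classpath.
        System.out.println("Using parser " + defaultReaderClass());
    }
}
```

Running it once with and once without the Apache Xerces jar on the classpath should show whether the lookup result changes.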

If that doesn't solve the problem I would want to run diagnostics by inserting a filter between the XML parser and Saxon and seeing what data is being passed. You could try that yourself if you want.
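One way such a filter could look is sketched below, using the standard SAX `XMLFilterImpl` class. This is an assumption about the approach, not the diagnostic code actually used in this thread: the class name `TextAuditFilter`, the `audit` helper and the date pattern are made up for illustration. It collects the text of one watched element and records any value that does not look like a yyyy-mm-dd date, while passing every event on unchanged.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical diagnostic filter: sits between the XML parser and its consumer.
public class TextAuditFilter extends XMLFilterImpl {
    private final String watched;                       // element name to inspect
    private final StringBuilder text = new StringBuilder();
    private boolean inside = false;
    final List<String> suspicious = new ArrayList<String>();

    TextAuditFilter(XMLReader parent, String watchedElement) {
        super(parent);
        this.watched = watchedElement;
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts)
            throws SAXException {
        if (qName.equals(watched)) {
            inside = true;
            text.setLength(0);
        }
        super.startElement(uri, local, qName, atts);    // forward unchanged
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        // The parser may deliver one text node in several chunks, so accumulate.
        if (inside) {
            text.append(ch, start, length);
        }
        super.characters(ch, start, length);
    }

    @Override
    public void endElement(String uri, String local, String qName) throws SAXException {
        if (qName.equals(watched)) {
            inside = false;
            String value = text.toString();
            // Assumed check: anything that is not yyyy-mm-dd is recorded.
            if (!value.matches("\\d{4}-\\d{2}-\\d{2}")) {
                suspicious.add(value);
            }
        }
        super.endElement(uri, local, qName);
    }

    /** Parses the given XML with the default JAXP parser and returns the odd values. */
    static List<String> audit(String xml, String element) throws Exception {
        XMLReader parser = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        TextAuditFilter filter = new TextAuditFilter(parser, element);
        filter.setContentHandler(new DefaultHandler()); // stand-in for the real consumer
        filter.parse(new InputSource(new StringReader(xml)));
        return filter.suspicious;
    }
}
```

To put it in front of a transformation, the filter can be wrapped in a `javax.xml.transform.sax.SAXSource` together with the `InputSource` and passed to the transformer as its source; any value that shows up in `suspicious` was then already wrong when it left the parser.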

Michael Kay Saxonica

RE: selecting wrong values - Added by Mart Seedre over 8 years ago

Hi

here's the output with -t parameter:

Saxon-HE 9.6.0.7J from Saxonica
Java version 1.7.0_55
Stylesheet compilation time: 397.672ms
Processing file:/mart/xml/proper_swed.xml
Using parser com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser
Building tree for file:/mart/xml/proper_swed.xml using class net.sf.saxon.tree.tiny.TinyBuilder
Tree built in 2.603799s (2603.799ms)
Tree size: 6379706 nodes, 13208902 characters, 477710 attributes
Writing to file:/mart/xml/swed_20151119_0.xml
Writing to file:/mart/xml/swed_20151119_1.xml
Writing to file:/mart/xml/swed_20151119_2.xml
Writing to file:/mart/xml/swed_20151119_3.xml
Writing to file:/mart/xml/swed_20151119_4.xml
Execution time: 5.723637s (5723.637ms)
Memory used: 259863784
NamePool contents: 60 entries in 56 chains. 6 URIs

Anyway, I kind of accidentally figured out what was the issue. In my XSL file I had specified that the output should be XML version 1.1. After I changed it to XML version 1.0, no more corrupt data was produced.
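For anyone driving the transformation from Java rather than from the command line, the same setting can be applied through the standard JAXP output properties. The sketch below is only an illustration of where the output version is set, using the JDK's identity transformer instead of the stylesheet from this thread; the class name `OutputVersion` is made up, and nothing here explains why the version setting affected the corruption.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical example: sets the XML version used when serializing the result.
public class OutputVersion {

    /** Copies the input through an identity transformer, serializing with the given version. */
    static String serialize(String xml, String version) throws Exception {
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        // JAXP counterpart of the version attribute on xsl:output.
        identity.setOutputProperty(OutputKeys.VERSION, version);
        StringWriter out = new StringWriter();
        identity.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(serialize("<billduedate>2015-12-25</billduedate>", "1.0"));
    }
}
```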

RE: selecting wrong values - Added by Michael Kay over 8 years ago

Mart Seedre wrote: "I kind of accidentally figured out what was the issue"

Well, you found an empirical solution. I don't think you found out what the problem was... It would be nice to know. If you can construct a repro, I would certainly like to investigate it further.

RE: selecting wrong values - Added by Mart Seedre over 8 years ago

No problem. Do you want me to send you input xml and xsl?

RE: selecting wrong values - Added by Michael Kay over 8 years ago

Yes please. If you can reproduce it with less than 150Mb that would be nice, but otherwise we'll cope.

RE: selecting wrong values - Added by Mart Seedre over 8 years ago

Hi

I tried to reproduce the issue with a small file but didn't succeed, so I compressed the original file. It is password protected, and I will send the password to Michael's email.

Attachment: XMLT.zip (6.53 MB)

RE: selecting wrong values - Added by Michael Kay over 8 years ago

I have reproduced the problem (with the built-in JDK parser com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser), and I have established that the data is corrupt at the point where it is passed from the XML parser to Saxon. I was using JDK 1.6.0.27 here.

The problem is not present when I switch to JDK 1.8.0_25.

I have also established that the problem goes away when you use the Xerces parser from Apache (org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser).

Conclusion: bug in JDK XML parser.

RE: selecting wrong values - Added by Mart Seedre over 8 years ago

Thank you, Michael!
