Mysterious "document must be well-formed" Failure
Added by Eliot Kimber over 12 years ago
I have a custom DITA Open Toolkit transform (the DITA for Publishers EPUB transform) that runs correctly when run from within OxygenXML but fails consistently when run from the command line.
The failure reported is:
@[xslt] + [DEBUG] after generate-index
[xslt] /Applications/oxygen/frameworks/dita/DITA-OT/plugins/net.sourceforge.dita4publishers.epub/xsl/map2epubImpl.xsl:366: Fatal Error! When 'standalone' or 'doctype-system' is specified, the document must be well-formed; but this document contains a top-level text node
[xslt] Failed to process null@
From what I found researching this failure on MarkMail, it indicates an attempt to serialize a document that is not well-formed. However, my code is not doing that, or it would fail under Oxygen as well.
The Toolkit uses Saxon 9.1.0.5J, which I believe is the last free version that supports Java extension functions (which the Toolkit depends on).
By inspection of the code, there is in fact no serialization happening at the time of the reported failure at all.
The failure is reported against this bit:
@<xsl:message> + [DEBUG] after generate-index</xsl:message> -->
<xsl:apply-templates select="." mode="generate-book-lists">
  <xsl:with-param name="collected-data" as="element()" select="$collected-data" tunnel="yes"/>
</xsl:apply-templates>
<xsl:message> + [DEBUG] after generate-book-lists</xsl:message>@
I put debug messages in all the templates in the generate-book-lists mode and none of those messages are emitted before the failure.
If I comment out the generate-book-lists template, the failure does not occur; if I change its order in the transform, the failure still occurs. So the failure is caused directly or indirectly by that code, but I can't find any obvious reason why.
I tried installing Saxon9ee using an evaluation license, but when I do that, I get a failure on the Java extension functions:
@[xslt] /Applications/oxygen/frameworks/dita/DITA-OT/plugins/net.sourceforge.dita4publishers.common.html/xsl/commonHtmlOverrides.xsl:34:81: Fatal Error! Cannot find a matching 2-argument function named {org.dita.dost.util.ImgUtils}getWidth()
[xslt] /Applications/oxygen/frameworks/dita/DITA-OT/plugins/net.sourceforge.dita4publishers.common.html/xsl/commonHtmlOverrides.xsl:42:82: Fatal Error! Cannot find a matching 2-argument function named {org.dita.dost.util.ImgUtils}getHeight()@
I have the evaluation license in the same directory as the jar file and added it to the class path as well, so I'm not sure what I'm doing wrong here.
What could cause this particular error in the 9.1.0.5 version and what could Oxygen be doing to cause it to not happen?
Unfortunately, because this is an Open Toolkit transform, it's non-trivial to package it up except by providing the entire Toolkit with my plugins deployed to it.
This feels like either silly user error on my part that Oxygen is somehow suppressing or a subtle bug in 9.1.0.5 that the EE version installed with Oxygen has fixed or otherwise avoids, but I'm at a loss as to what it might be.
Thanks,
Eliot
Replies (2)
RE: Mysterious "document must be well-formed" Failure [SOLVED] - Added by Eliot Kimber over 12 years ago
Found my user error: I wasn't actually including the module containing the templates whose debug messages I wasn't seeing.
When I include it, the failure of course goes away.
That explains the failure: I was emitting text to the main output (which normally gets nothing).
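To make the mechanism concrete, here is a minimal sketch (not the actual DITA for Publishers code) of how a missing module can lead to this error. It relies on the standard XSLT built-in template rules: when no template rule matches in a given mode, elements are recursed into and text nodes are copied to the current output. The @doctype-system@ value and the @result@ element are hypothetical stand-ins.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumption: specifying doctype-system is what switches on the
       serializer's well-formedness check named in the error message. -->
  <xsl:output method="xml" doctype-system="example.dtd"/>

  <xsl:template match="/">
    <!-- No template rule in this stylesheet handles the mode
         "generate-book-lists" (imagine the module defining those rules
         was never included). The built-in rules therefore apply: they
         walk down through the elements and copy every text node to the
         current output destination. -->
    <xsl:apply-templates select="*" mode="generate-book-lists"/>

    <!-- Hypothetical root element; any text copied above precedes it. -->
    <result/>
  </xsl:template>

</xsl:stylesheet>
```

Run against an input whose elements contain text, such as @<map><title>My Book</title></map>@, the text "My Book" would reach the serializer ahead of the root element, which is the situation the "top-level text node" message describes. Including the module that defines the @generate-book-lists@ templates replaces the built-in behaviour, so the stray text disappears.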
However, that doesn't explain why I didn't get a failure in Oxygen. Something to do with how Oxygen configures Ant to run the transform perhaps?
I have asked Oxygen support that specific question.
Cheers,
Eliot
RE: Mysterious "document must be well-formed" Failure - Added by Michael Kay over 12 years ago
First, the issue with the evaluation license: I haven't worked out under what circumstances putting the license next to the JAR files succeeds/fails, but experience shows that it isn't 100% reliable. The only 100% reliable approach is to make sure that the directory containing the license file is on the class path.
Now the substantive issue.
You're right about the meaning of the message: it means a non-whitespace text node is being sent to the serializer before or after the root element of the result tree.
The most obvious explanation for oXygen not reporting the error is that it isn't using the Saxon serializer; we would have to ask oXygen to find out if that is the case.
I'm not sure you can deduce much about whether xsl:message output comes before or after the failure; order of processing can be very different from the source code order; and the location information for a serialization error is rather unreliable because of buffering. On the other hand if removing a line of code makes the problem go away, that is much stronger evidence (though still not 100% reliable; changing code in one area can affect the optimization applied to apparently unrelated code).
I would start by using "!standalone:omit" and "!doctype-system:" (nothing after the colon) on the command line. This should suppress the check that is failing, and produce serialized output which can then be checked for well-formedness. It should be easy enough to do the check visually, bearing in mind, of course, that some non-whitespace characters are invisible, such as NBSP.
It might also be worth trying it with Saxon-B 9.1.0.8, just in case there's a known bug fixed after 9.1.0.5, though I think that's unlikely.
Another angle would be to capture the result of the offending apply-templates call in a variable, and examine its contents fairly carefully. Use as="item()*" on the xsl:variable declaration to prevent any document node being added. Use string-to-codepoints() to output any text nodes so we can see exactly what the text node contents are.
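The variable-capture approach just described could look something like the following. This is only a sketch: the variable name @book-lists@ and the message wording are made up for illustration, and the tunnel parameter from the original call is left out for brevity.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:template match="/*">
    <!-- as="item()*" keeps the raw result sequence; without it the
         variable would hold a document node wrapped around the content. -->
    <xsl:variable name="book-lists" as="item()*">
      <xsl:apply-templates select="." mode="generate-book-lists"/>
    </xsl:variable>

    <!-- Report every top-level item the call produced. -->
    <xsl:for-each select="$book-lists">
      <xsl:choose>
        <!-- Text nodes and atomic values: show the exact codepoints, so
             invisible characters such as NBSP (160) become visible. -->
        <xsl:when test=". instance of text() or . instance of xs:anyAtomicType">
          <xsl:message>
            <xsl:text>item </xsl:text>
            <xsl:value-of select="position()"/>
            <xsl:text>: text, codepoints = </xsl:text>
            <xsl:value-of select="string-to-codepoints(string(.))" separator=" "/>
          </xsl:message>
        </xsl:when>
        <!-- Anything else: just say what kind of item it is. -->
        <xsl:otherwise>
          <xsl:message>
            <xsl:text>item </xsl:text>
            <xsl:value-of select="position()"/>
            <xsl:text>: </xsl:text>
            <xsl:value-of select="if (. instance of element())
                                  then concat('element ', name())
                                  else 'other node'"/>
          </xsl:message>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:for-each>

    <!-- Pass the captured items on unchanged so the output is unaffected. -->
    <xsl:sequence select="$book-lists"/>
  </xsl:template>

</xsl:stylesheet>
```

If the stylesheet has no template rules for the @generate-book-lists@ mode, the built-in rules apply and the messages should list the text nodes of the input one by one, which would point directly at the source of the stray text.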
After that my next step would probably be to insert a tracing Receiver into the serialization pipeline, but unfortunately I think that requires a little Java coding - I can't see a way to switch it on from the command line, although the code is all there.
Michael Kay, Saxonica