SXXP0003:...Document root element is missing.

Added by Anonymous over 18 years ago

Legacy ID: #3352535 Legacy Poster: xristy (xristy)

I'm getting the subject error from SaxonB 8.5.1 with the following pared-down document:

    <?xml version="1.0" encoding="UTF-8"?>
    <outline:outline xmlns:outline="http://www.tbrc.org/models/outline#" RID="O01CT0003">
      <outline:name>gsung ‘bum / ngag dbang blo bzang bsam gtan</outline:name>
      <outline:isOutlineOf work="W29486" type="subjectCollection">gsung ‘bum / ngag dbang blo bzang bsam gtan</outline:isOutlineOf>
      <outline:node type="section">
        <outline:name>volume 1</outline:name>
      </outline:node>
    </outline:outline>

This document is well-formed and valid per schema as far as oXygen 6.2 is concerned. Further, the following nearly identical pared-down document is processed with no problem:

    <?xml version="1.0" encoding="UTF-8"?>
    <outline:outline xmlns:outline="http://www.tbrc.org/models/outline#" RID="O01CT0004">
      <outline:name>gsung 'bum / ngag dbang chos grags</outline:name>
      <outline:isOutlineOf work="W10205" type="subjectCollection">gsung 'bum / ngag dbang chos grags. -- darjeeling :sakya choepheling monastery, 2000.</outline:isOutlineOf>
      <outline:node type="section">
        <outline:name>volume 1 (ka), 2140</outline:name>
      </outline:node>
    </outline:outline>

I've stared at these and chopped them to the bone so that there would be a succinct test case, but I'm at a loss to see what's causing it. I've tried pasting pieces from the O01CT0004 document into O01CT0003, to no avail.
The stylesheet I'm using is:

    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0"
        xmlns:outline="http://www.tbrc.org/models/outline#"
        xpath-default-namespace="http://www.tbrc.org/models/outline#">

      <!-- rebuild the root element, keeping its RID -->
      <xsl:template match="/outline:outline">
        <outline:outline RID="{@RID}">
          <xsl:apply-templates/>
        </outline:outline>
      </xsl:template>

      <!-- copy each node's attributes and give it a generated RID -->
      <xsl:template match="node">
        <outline:node>
          <xsl:for-each select="@*">
            <xsl:attribute name="{local-name()}">
              <xsl:value-of select="."/>
            </xsl:attribute>
          </xsl:for-each>
          <xsl:attribute name="RID">
            <xsl:value-of select="generate-id()"/>
          </xsl:attribute>
          <xsl:apply-templates/>
        </outline:node>
      </xsl:template>

      <!-- copy every other element into the outline namespace with its attributes -->
      <xsl:template match="*">
        <xsl:element name="outline:{local-name()}">
          <xsl:for-each select="@*">
            <xsl:attribute name="{local-name()}">
              <xsl:value-of select="."/>
            </xsl:attribute>
          </xsl:for-each>
          <xsl:apply-templates/>
        </xsl:element>
      </xsl:template>
    </xsl:stylesheet>

The stylesheet works on a number of these kinds of documents, with many, many more nodes, except for this one document. I'm sure it's something obvious. Thanks.


Replies (3)

RE: SXXP0003:...Document root element is missing. - Added by Anonymous over 18 years ago

Legacy ID: #3352579 Legacy Poster: xristy (xristy)

I copied the entire contents of the full version of the suspect document into a new file and it works!? Permissions on the suspect file are the same as for the other document files that had no problems. What kind of conditions would cause this message?

RE: SXXP0003:...Document root element is miss - Added by Anonymous over 18 years ago

Legacy ID: #3352760 Legacy Poster: Michael Kay (mhkay)

I would look carefully at that character between "gsung" and "bum". Is it really a properly encoded UTF-8 character? What does it look like in a hex editor? In the email it's hex 91, which is a typographical opening quote in the proprietary Microsoft cp1252 encoding. The "nearly identical" document has an ordinary ASCII apostrophe in this position. I agree that if this is the problem, it's not a very helpful error message, but the message comes from the XML parser, so there's nothing much I can do about it (except to suggest that you use a different parser).
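To make the byte-level difference concrete, here is a minimal Python sketch (Python is not used anywhere in this thread; it is an assumption for illustration). It encodes U+2018 both as UTF-8 and as cp1252, inserts the raw bytes into a tiny document that declares UTF-8, and asks an XML parser whether it accepts each one. Python's bundled expat parser, used through `xml.etree.ElementTree`, stands in for the Java parser Saxon calls, so the error wording will differ from SXXP0003. The helpers `as_document` and `parses` are hypothetical names.

```python
import xml.etree.ElementTree as ET

# U+2018 LEFT SINGLE QUOTATION MARK, the character between "gsung" and "bum".
QUOTE = "\u2018"


def as_document(quote_bytes: bytes) -> bytes:
    """Build a tiny document that declares UTF-8, with the quote inserted as raw bytes."""
    return (b'<?xml version="1.0" encoding="UTF-8"?>'
            b"<name>gsung " + quote_bytes + b"bum</name>")


def parses(doc: bytes) -> bool:
    """Return True if the XML parser accepts the document, False on a parse error."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


utf8_bytes = QUOTE.encode("utf-8")     # three bytes, E2 80 98: valid UTF-8
cp1252_bytes = QUOTE.encode("cp1252")  # one byte, 91: meaningful only in cp1252

print(utf8_bytes.hex(), parses(as_document(utf8_bytes)))
print(cp1252_bytes.hex(), parses(as_document(cp1252_bytes)))
```

The lone 0x91 byte is a UTF-8 continuation byte with no lead byte, so a parser reading the file as UTF-8 must reject it. An editor that re-saves the same text as proper UTF-8 writes the three-byte sequence instead, which would be consistent with the copy into a new file working.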

RE: SXXP0003:...Document root element is miss - Added by Anonymous over 18 years ago

Legacy ID: #3355872 Legacy Poster: Dr. Frank Mabry (drmabry)

The character in question is code point U+2018. Its name in the Unicode database is LEFT SINGLE QUOTATION MARK (also listed as "single turned comma quotation mark"). The ordinary single quote (apostrophe) is code point U+0027. The source (as displayed from SourceForge) appears to be valid Unicode (in UTF-8). To verify that the contents are valid UTF-8, I would need access to the original file. I have a Perl (5.8) script that checks UTF-8 validity, and I would be happy to provide it if there is any interest. This alternative representation of a single quote is not appropriate for delimiting information from the standpoint of parsing. I suspect that the token parser is being too generous in noticing this character. As an alternative, the character could be written as a hexadecimal character reference (&#x2018;) to see at what level the problem is occurring. Just my 2¢ worth. (alt0162 from the keypad).
Frank
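Frank's Perl script is not shown in the thread, so the following is not it. It is a minimal Python sketch of the same kind of check, assuming the goal is to list the byte offset of every sequence in a file that fails to decode as UTF-8; the function name `find_invalid_utf8` is hypothetical. It repeatedly decodes the remaining bytes, records where the decoder stops, and resumes just past the offending bytes.

```python
import sys


def find_invalid_utf8(data: bytes) -> list[tuple[int, bytes]]:
    """Return (byte offset, offending bytes) for every span that is not valid UTF-8."""
    problems = []
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode("utf-8")
            break  # everything from pos onward decodes cleanly
        except UnicodeDecodeError as err:
            # err.start/err.end are relative to the slice, so shift them back.
            start, end = pos + err.start, pos + err.end
            problems.append((start, data[start:end]))
            pos = end  # err.end > err.start, so the loop always makes progress
    return problems


if __name__ == "__main__":
    # Usage: pass one or more file paths on the command line.
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            for offset, bad in find_invalid_utf8(f.read()):
                print(f"{path}: byte offset {offset}: {bad.hex()}")
```

Once the offset is known, replacing the raw character with the character reference `&#x2018;` keeps that part of the file pure ASCII, so the parser sees the same character regardless of how the file's bytes are interpreted.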
