accept non-UTF8 characters in XML?

Added by Anonymous almost 16 years ago

Legacy ID: #5131918 Legacy Poster: y10k (y10k)

When using Saxon (s9api) to run XQuery against XML files, I have one file that contains some non-UTF8 characters, and Saxon reports the following error:

Error on line 1401270 column 141 of file:/D:/workspace/MEET/bin/temp/ast.xml:
  SXXP0003: Error reported by XML parser: Invalid byte 1 of 1-byte UTF-8 sequence.
MeetException: net.sf.saxon.s9api.SaxonApiException: org.xml.sax.SAXParseException: Invalid byte 1 of 1-byte UTF-8 sequence.
    at XQueryEngine.query(Unknown Source)
    at XQueryEngine.execute(Unknown Source)
    at MeetEngine.analyzePLSQL(Unknown Source)
    at GenerateReportThread.run(Unknown Source)

The word on that line is: proc¿dure

What do I need to do to parse this XML (other than changing the content of the XML itself)?


Replies (1)

RE: accept non-UTF8 characters in XML? - Added by Anonymous almost 16 years ago

Legacy ID: #5132248 Legacy Poster: Michael Kay (mhkay)

As the message tells you, this error is reported by the XML parser: Saxon is merely passing it on. To be precise, there's no such thing as a non-UTF8 character; what you have is a sequence of bytes that isn't the correct UTF8 encoding of any character. Usually this means your file is in some encoding other than UTF8 (perhaps iso-8859-1 or cp1252) and you neglected to specify this encoding in an XML declaration at the start of the file, so the parser assumed UTF8.
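To illustrate the point, here is a minimal sketch that uses only the JDK's built-in XML parser, not Saxon. It assumes the file is really ISO-8859-1 and that the problem word was meant to be "procédure"; neither is confirmed in this thread. The class name EncodingDemo and both method names are made up for the example. The first method hands the parser raw bytes with no XML declaration, so UTF8 is assumed and the parse fails. The second decodes the bytes with an explicitly chosen charset and hands the parser characters instead, which leaves the file content untouched.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class EncodingDemo {

    // Hand the parser raw bytes. With no XML declaration naming an
    // encoding (and no byte order mark), the parser assumes UTF-8.
    public static String parseBytes(byte[] xml) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new ByteArrayInputStream(xml)));
        return doc.getDocumentElement().getTextContent();
    }

    // Decode the bytes ourselves with a charset we believe is correct
    // (an assumption: the real encoding of the file must be known) and
    // give the parser a character stream, so it does no byte decoding.
    public static String parseWithCharset(byte[] xml, Charset charset) throws Exception {
        Reader reader = new InputStreamReader(new ByteArrayInputStream(xml), charset);
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(reader));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // "é" is the single byte 0xE9 in ISO-8859-1, which is not valid UTF-8 here.
        byte[] latin1 = "<word>proc\u00e9dure</word>"
                .getBytes(StandardCharsets.ISO_8859_1);

        try {
            parseBytes(latin1);
        } catch (Exception e) {
            System.out.println("Parsing as UTF-8 failed: " + e.getMessage());
        }

        System.out.println("Parsing as ISO-8859-1 gave: "
                + parseWithCharset(latin1, StandardCharsets.ISO_8859_1));
    }
}
```

The other fix, as the reply suggests, is to put a declaration such as <?xml version="1.0" encoding="iso-8859-1"?> at the very start of the file. With s9api, the Reader approach should carry over by wrapping the Reader in a javax.xml.transform.stream.StreamSource and passing that as the source document, but that has not been tested against the poster's setup.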
