Character converts to junk by XSL Transform
Added by Anonymous over 14 years ago
Legacy ID: #8568951 Legacy Poster: Vinothkumar M R (mrvinoth1)
Problem: When converting an XHTML document with an XSL transformation, a few characters are turned into junk. In the code below, my input contains Á and Â. After the transformation, Â comes out correctly but Á comes out as ??. I need to use the UTF-8 character set. Please help me resolve the issue: how can I get all the special characters to appear correctly in the output?

JAVA CODE: (XSL is pasted below the java code)

import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.OutputStream;
import java.io.StringReader;
import java.nio.charset.Charset;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamSource;
import net.sf.saxon.s9api.Processor;
import net.sf.saxon.s9api.SaxonApiException;
import net.sf.saxon.s9api.Serializer;
import net.sf.saxon.s9api.XdmNode;
import net.sf.saxon.s9api.XsltCompiler;
import net.sf.saxon.s9api.XsltExecutable;
import net.sf.saxon.s9api.XsltTransformer;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

public class XHTMLConvertor {
    public static final String XSL="C:/copy.xsl";
    public static void main(String[] args) {
        String input = "<html xmlns="http://www.w3.org/1999/xhtml\" version="-//W3C//DTD XHTML 1.1//EN">
special character 193 Á 194 Â
Replies (3)
RE: Character converts to junk by XSL Transform - Added by Anonymous over 14 years ago
Legacy ID: #8568996 Legacy Poster: Vinothkumar M R (mrvinoth1)
More info on the problem: I am using saxon9he.jar and running the code on Windows XP.
RE: Character converts to junk by XSL Transform - Added by Anonymous over 14 years ago
Legacy ID: #8569276 Legacy Poster: Michael Kay (mhkay)
The characters are not being turned into junk by your XSLT transformation, but rather when you call the toString() method on your ByteArrayOutputStream. The spec for this method says "Converts the buffer's contents into a string decoding bytes using the platform's default character set." The platform's default character set [it means encoding] is probably ISO-8859-1; the method has no idea that your ByteArrayOutputStream actually contains the characters encoded in UTF-8. Java lets you specify the encoding when you convert a byte array to a string. You could do that; or you could ask Saxon to write directly to a StringWriter, in which case encoding is not an issue.
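To make the two options concrete, here is a minimal sketch using only JDK classes, so it runs without Saxon on the classpath. The class name Utf8DecodeDemo and its helper methods are made up for illustration and are not part of the original poster's code. writeUtf8 stands in for a serializer that emits UTF-8 bytes, decodeAsLatin1 simulates a platform whose default encoding is ISO-8859-1, and transformToString uses the JDK's built-in identity transformer as a stand-in for Saxon to show the StringWriter route.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class Utf8DecodeDemo {

    // Hypothetical stand-in for the serializer: puts UTF-8 encoded bytes into the stream.
    public static ByteArrayOutputStream writeUtf8(String text) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(text.getBytes(StandardCharsets.UTF_8));
        return out;
    }

    // Option 1: name the encoding explicitly when turning the bytes back into a String.
    public static String decodeAsUtf8(ByteArrayOutputStream out) throws IOException {
        return out.toString("UTF-8");
    }

    // Simulates the bug: the bytes are UTF-8 but are decoded as ISO-8859-1,
    // which is what a no-argument toString() does on a platform with that default.
    public static String decodeAsLatin1(ByteArrayOutputStream out) {
        return new String(out.toByteArray(), StandardCharsets.ISO_8859_1);
    }

    // Option 2: serialize to a Writer, so no bytes (and no byte decoding) are involved.
    // Assumption: the JDK's identity transformer is used here instead of Saxon.
    public static String transformToString(String xml) throws TransformerException {
        StringWriter writer = new StringWriter();
        TransformerFactory.newInstance()
                .newTransformer()
                .transform(new StreamSource(new StringReader(xml)), new StreamResult(writer));
        return writer.toString();
    }

    public static void main(String[] args) throws Exception {
        String text = "\u00C1\u00C2"; // the two characters from the question: Á and Â
        ByteArrayOutputStream out = writeUtf8(text);

        System.out.println("UTF-8 decode round-trips: " + decodeAsUtf8(out).equals(text));
        System.out.println("ISO-8859-1 decode round-trips: " + decodeAsLatin1(out).equals(text));
        System.out.println("Writer output keeps the characters: "
                + transformToString("<p>" + text + "</p>").contains(text));
    }
}
```

With Saxon the same idea should apply: either keep the ByteArrayOutputStream and decode it with an explicit "UTF-8", or give the serializer a Writer as its destination rather than an OutputStream. Check the s9api Serializer documentation for your Saxon version for the exact method to use.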
RE: Character converts to junk by XSL Transform - Added by Anonymous over 14 years ago
Legacy ID: #8571403 Legacy Poster: Vinothkumar M R (mrvinoth1)
Thank you very much for the solution. It worked.