UTF-8 instead of ANSI as result file

Added by Anonymous almost 15 years ago

Legacy ID: #7819600 Legacy Poster: JohnSea (johnsea)

Hi, I run a test.bat file containing this single command line:

java -jar c:\saxon9he.jar -o result.html test.xml test.xsl

The transformation works and I get result.html as expected, but the file is in ANSI format instead of UTF-8, even though the xsl:output declaration specifies UTF-8 for the result file and the file has an XML declaration saying UTF-8 as its first line. I also tried xsl:result-document, with the same result, and I ran the same test in oXygen, again with the same result. Both input files, test.xml and test.xsl, are in UTF-8 format. I'm on WinXP, using saxon9he with Java version 1.6.0_17. Is there a way to make sure the result file of a transformation is in UTF-8 format instead of ANSI on a Windows system? Thanks all, JohnSea


Replies (7)

RE: UTF-8 instead of ANSI as result file - Added by Anonymous almost 15 years ago

Legacy ID: #7819792 Legacy Poster: Michael Kay (mhkay)

Are you sure there is no xsl:output declaration in the stylesheet specifying "ANSI" encoding? (It would probably be called something like cp1250; "ANSI" is an unofficial and rather misleading name formerly used by Microsoft.) Note that the XML declaration in the stylesheet defines the encoding of the stylesheet, not of the result document. The encoding of the result is specified using xsl:output.

RE: UTF-8 instead of ANSI as result file - Added by Anonymous almost 15 years ago

Legacy ID: #7820013 Legacy Poster: JohnSea (johnsea)

Hi M. Kay, No, there is no output declaration with encoding ANSI or cp1250 in the stylesheet. I specify the encoding myself using xsl:output but cannot produce a UTF-8 result file. I've tried both xsl:output method="xml" encoding="UTF-8" version="1.0" indent="yes"... and xsl:result-document encoding="UTF-8" version="1.0"... I get the same result using saxon9he in the test.bat file and Saxon-B 9.1.0.7 in oXygen, both under WinXP.

RE: UTF-8 instead of ANSI as result file - Added by Anonymous almost 15 years ago

Legacy ID: #7820046 Legacy Poster: Michael Kay (mhkay)

I will need a complete stylesheet and source document, and exact details of how you are running it, so that I can attempt to reproduce the problem. Please try and cut it down to the minimum needed to show the effect.

RE: UTF-8 instead of ANSI as result file - Added by Anonymous almost 15 years ago

Legacy ID: #7820409 Legacy Poster: JohnSea (johnsea)

Thanks. Here's the code.

test.xml (edited with TextPad, saved as UTF-8):
*** I put the tags in brackets so you can see the code here.

[?xml version="1.0" encoding="UTF-8"?]
[test]
[welcome]Hello world![/welcome]
[/test]

test.xsl (edited with TextPad, saved as UTF-8):

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema" exclude-result-prefixes="xs" version="2.0">
<xsl:output encoding="UTF-8" method="xhtml" version="1.0" doctype-public="-//W3C//DTD XHTML 1.0 Strict//EN" doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd" indent="yes" omit-xml-declaration="yes"/>
<xsl:template match="test">
<xsl:apply-templates select="welcome"/>
</xsl:template>
<xsl:template match="welcome">

<title><xsl:value-of select="."/></title>
<xsl:value-of select="."/>
</xsl:template>
</xsl:stylesheet>

result.html:
*** I put the tags in brackets so you can see the code here.

[!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"]
[html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"]
[head]
[title]Hello world![/title]
[/head]
[body]
[p]Hello world![/p]
[/body]
[/html]

When I open result.html with TextPad and look in the File/Save as... menu, it shows that the file is in ANSI format. I get the same end result with a scenario in oXygen using Saxon-B 9.1.0.7 and with a test.bat file containing this line:

java -jar c:\saxon9he.jar -o result.html test.xml test.xsl

Could it be a setting in Java output under Windows?

RE: UTF-8 instead of ANSI as result file - Added by Anonymous almost 15 years ago

Legacy ID: #7820456 Legacy Poster: David Lee (daldei)

I suggest your problem is believing TextPad. TextPad has no way of knowing the encoding of a file unless the file starts with the UTF header bytes (a byte order mark). If there are no UTF header bytes and it encounters no non-ASCII characters, it will assume "ANSI". A test would be to put a real non-ASCII Unicode character (> 0x7F) in the data and look at the output with a binary editor.
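If no binary editor is at hand, the same inspection can be sketched in a few lines of Python. This is only an illustration of the check described above, not how TextPad itself works; the function name `encoding_evidence` is made up for this example.

```python
import codecs


def encoding_evidence(data: bytes) -> dict:
    """Report what the raw bytes can tell us about a file's encoding.

    Hypothetical helper for illustration; it only looks at bytes and
    does not claim to reproduce any editor's detection logic.
    """
    return {
        # A UTF-8 byte order mark is the three bytes EF BB BF.
        "has_utf8_bom": data.startswith(codecs.BOM_UTF8),
        # Pure ASCII content never uses a byte above 0x7F.
        "has_non_ascii": any(b > 0x7F for b in data),
        # First bytes in hex, as a binary editor would show them.
        "hex_preview": " ".join(f"{b:02x}" for b in data[:16]),
    }


# To inspect a real file: encoding_evidence(open("result.html", "rb").read())
sample = "<p>Hello world!</p>".encode("utf-8")
print(encoding_evidence(sample))
```

When both flags come back false, the bytes alone cannot distinguish UTF-8 from "ANSI", so whatever an editor displays is a guess.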

RE: UTF-8 instead of ANSI as result file - Added by Anonymous almost 15 years ago

Legacy ID: #7820561 Legacy Poster: Michael Kay (mhkay)

As far as I can see all the characters involved in this transformation are ASCII characters, and there is no difference between the ANSI and UTF-8 encodings of ASCII characters. The UTF-8 encoding for this result is byte-for-byte identical with the ANSI encoding.
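A quick way to convince yourself of this is to encode the text both ways and compare the bytes. The sketch below assumes that "ANSI" here means Windows-1252 (`cp1252`), the usual default on a Western European Windows system; the thread does not say which code page is actually in use.

```python
text = "Hello world!"

utf8_bytes = text.encode("utf-8")
# Assumption: "ANSI" is taken to mean Windows-1252 for this comparison.
ansi_bytes = text.encode("cp1252")
ascii_bytes = text.encode("ascii")

# For ASCII-only text all three encodings produce the same byte sequence.
same = utf8_bytes == ansi_bytes == ascii_bytes
print(same)
```

The same holds for the other Windows code pages such as cp1250, since they all agree with ASCII in the range 0x00 to 0x7F.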

RE: UTF-8 instead of ANSI as result file - Added by Anonymous almost 15 years ago

Legacy ID: #7820610 Legacy Poster: JohnSea (johnsea)

Thanks, that solved it. I replaced "Hello world!" in test.xml with "Hello î ï œ" and the file now appears as UTF-8 in TextPad. I won't believe TextPad anymore...!
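For anyone wondering why the new test string makes a difference, here is a small sketch. It assumes again that "ANSI" means Windows-1252 (`cp1252`), and the strict-decode check in `is_valid_utf8` is just one plausible heuristic, not necessarily what TextPad does.

```python
# "Hello î ï œ", written with escapes so the snippet itself stays ASCII.
text = "Hello \u00ee \u00ef \u0153"

utf8_bytes = text.encode("utf-8")
# Assumption: "ANSI" is taken to mean Windows-1252.
ansi_bytes = text.encode("cp1252")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as strict UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Each accented character takes two bytes in UTF-8 but one in cp1252,
# so the two encodings are no longer byte-for-byte identical.
print(utf8_bytes.hex(), len(utf8_bytes))
print(ansi_bytes.hex(), len(ansi_bytes))
print(is_valid_utf8(utf8_bytes), is_valid_utf8(ansi_bytes))
```

Once the data contains characters outside ASCII, the byte sequences diverge and a tool has something to base its guess on.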
