Command line parameter encoding
Added by Anonymous about 19 years ago
Legacy ID: #3439222 Legacy Poster: Will McCutchen (mccutchen)
I'm calling Saxon on the command line from a Python program. I've run into a situation where I want to pass an em dash in to Saxon as a parameter, but I can't figure out how to make it work. Here are some of the parameters I've passed in on the command line so far:

schedule-title="Enrolling Now—Spring 2006"

That makes it through okay, but it comes out rendered as a u with an accent grave over it (I think).

schedule-title="Enrolling Now—Spring 2006"

The ampersand is escaped, so the entity reference doesn't work. Any suggestions?
Replies (7)
RE: Command line parameter encoding - Added by Anonymous about 19 years ago
Legacy ID: #3440867 Legacy Poster: Michael Kay (mhkay)
I'm afraid I don't know much about how character encoding is handled by command line processors, and I suspect it depends strongly on which particular command line processor you are using. An XML character reference is very unlikely to work in this context, as it isn't XML. Is a command line really the best way of invoking Saxon from Python? Isn't there some way of invoking the Java API directly (or wrapping it in a Python API)? If all else fails you could pass in a piece of XML and use saxon:parse within the stylesheet to interpret it. Michael Kay
RE: Command line parameter encoding - Added by Anonymous almost 19 years ago
Legacy ID: #3447963 Legacy Poster: Will McCutchen (mccutchen)
Thanks for the response, Michael. I was afraid that might be the case, but I thought I would check first. And you're probably right that invoking Saxon from Python via the command line is not the best way to go about things, but it works for my purposes (up till now, at least), and I don't have a lot of time to research the alternatives. Thanks again for your help, and for Saxon, which is really an amazing piece of work. Will.
RE: Command line parameter encoding - Added by Anonymous almost 19 years ago
Legacy ID: #3450301 Legacy Poster: Kevin Rodgers (notorious_kev)
I've tried to find something in the Java specs that specifies what encoding the main() method assumes for its String args[] parameter, but the most relevant bit I've found so far is http://java.sun.com/docs/books/jls/third_edition/html/lexical.html#100850 Have you tried using the Java hexadecimal escape sequence: schedule-title="Enrolling Now\u2014Spring 2006"
RE: Command line parameter encoding - Added by Anonymous almost 19 years ago
Legacy ID: #3450342 Legacy Poster: Will McCutchen (mccutchen)
> Have you tried using the Java hexadecimal
> escape sequence:
>
> schedule-title="Enrolling Now\u2014Spring 2006"

No, I had not tried that. Unfortunately, it did not work, either. Thanks for the help, though.

Will.
RE: Command line parameter encoding - Added by Anonymous almost 19 years ago
Legacy ID: #3452637 Legacy Poster: Kevin Rodgers (notorious_kev)
In your original attempt, specifying the character directly, what encoding did you use (e.g. UTF-8)? And how is your locale specified (e.g. what are the LANG and LC_* environment variables set to)? -- Kevin
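One way to answer the encoding and locale question from the Python side is to print what the calling process itself reports. The following is only a minimal sketch using the standard library; the function name `describe_encodings` is hypothetical, and the values it reports will differ from machine to machine.

```python
import locale
import os
import sys


def describe_encodings():
    """Collect the encoding-related settings of the current process."""
    info = {
        # Encoding suggested by the current locale settings.
        "preferred": locale.getpreferredencoding(False),
        # Encoding Python uses to convert file names to and from bytes.
        "filesystem": sys.getfilesystemencoding(),
        # Encoding of standard output, or None if it cannot be determined.
        "stdout": getattr(sys.stdout, "encoding", None),
    }
    # Locale environment variables; these are often unset on Windows.
    for name in ("LANG", "LC_ALL", "LC_CTYPE"):
        info[name] = os.environ.get(name)
    return info


if __name__ == "__main__":
    for key, value in describe_encodings().items():
        print(f"{key}: {value}")
```

This only describes the Python process. The Java process started from it may use a different default, so the output is a starting point for diagnosis rather than a definitive answer.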
RE: Command line parameter encoding - Added by Anonymous almost 19 years ago
Legacy ID: #3452678 Legacy Poster: Michael Kay (mhkay)
I suspect that Java assumes the same encoding for command line arguments as it assumes for reading text files with an unspecified encoding: i.e. it's platform dependent. If you can't call Saxon in a more efficient way using a direct call, I'd suggest you consider passing parameters as XML encoded in US-ASCII, and then decoding them from within the XSLT code using saxon:parse. Michael Kay
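The Python half of the suggestion above (pass the parameter as US-ASCII XML) might look like the sketch below. It assumes the stylesheet will hand the received string to saxon:parse, as described in the post. The helper name `to_ascii_xml` and the wrapper element name `value` are hypothetical and not part of Saxon.

```python
from xml.sax.saxutils import escape


def to_ascii_xml(value, element="value"):
    """Wrap value in an element and write every non-ASCII character as a
    numeric character reference, so the result is pure US-ASCII."""
    escaped = escape(value)  # escapes &, < and >
    ascii_text = escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")
    return f"<{element}>{ascii_text}</{element}>"


param = to_ascii_xml("Enrolling Now\u2014Spring 2006")
print(param)  # <value>Enrolling Now&#8212;Spring 2006</value>
```

Because the resulting string contains `&`, `<` and `>`, passing the arguments to `subprocess` as a list rather than as a single shell string should keep a command shell from interpreting those characters.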
RE: Command line parameter encoding - Added by Anonymous almost 19 years ago
Legacy ID: #3452877 Legacy Poster: Will McCutchen (mccutchen)
Unfortunately, this is a work project and I'm confined to Windows XP. So I assume (but I may be wrong) that I specified the character in ISO-8859 or Windows-1252. When I do that, Saxon interprets the character as a "latin small letter u with grave" (whose decimal reference is &#249;).
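The symptom described above is consistent with a code page mismatch, which can be checked in Python itself. This is a sketch under the assumption that the em dash left Python as a single Windows-1252 byte. It does not show where in the chain between Python and Saxon the reinterpretation would happen.

```python
import unicodedata

EM_DASH = "\u2014"

# Windows-1252 stores the em dash as a single byte.
as_bytes = EM_DASH.encode("cp1252")
print(as_bytes)  # b'\x97'

# Reading that same byte under an OEM console code page gives another character.
for codepage in ("cp437", "cp850"):
    decoded = as_bytes.decode(codepage)
    print(codepage, f"U+{ord(decoded):04X}", unicodedata.name(decoded))
    # both print: U+00F9 LATIN SMALL LETTER U WITH GRAVE
```

If this is what is happening, the byte 0x97 is being written under one code page and read under another, which suggests (but does not prove) that the console code page and the Windows "ANSI" code page disagree on this machine.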