Requirements for Collator Flexibility

Added by Anonymous almost 19 years ago

Legacy ID: #3141649 Legacy Poster: W. Eliot Kimber (drmacro)

Most of my XSLT work revolves around transforms that generate XSL-FO for technical documents in many different national languages. As part of this, locale-specific collation is of vital importance. With Saxon 6 I implemented a package that provides a generic collator factory driven by an external configuration system. It is integrated with Saxon 6 through its simple collator extension mechanism, which binds locale codes to collator classes. With XSLT 2 and Saxon 8 this approach isn't really appropriate.

I've been thinking about how best to get what I want, which is the ability to easily extend Saxon 8's collators with my collator factory. The factory is currently bound to ICU4J, which provides good collators for most languages out of the box. Unfortunately, I see two problems with Saxon 8 as currently implemented:

1. Using the collator-URI-to-class method isn't ideal because it requires the XSLT to be processor-specific, which I'd rather avoid (even though in practice we will probably only ever use Saxon). That is, I'd rather have the locale-specific information on xsl:sort drive the selection of collators, with the binding between Saxon and my collator factory handled outside the XSLT itself. As an alternative or refinement, I might use a collator URI to identify a general collator factory, which then uses the other properties on xsl:sort to construct a specific comparator instance. But I would not want, or be able, to use collator URIs to point to locale-specific comparator instances, for the simple reason that a single XSLT processor might need to handle 50 or more locales, often in the same input document (such as a multi-language "getting started" manual). The condition that controls collator selection is therefore the locale of the data being processed at that moment, so it makes more sense to delegate the details of collator selection to the collator factory than to put them into the XSLT. In addition, in my use cases the person implementing the XSLT and the person managing the collator configurations are often two different people operating in different parts of the business process, so I need this separation of concerns to avoid an unnecessary dependency on XSLT engineers.

2. The makeUsingProperties() method of Saxon's CollationFactory class won't work for ICU4J, because ICU4J does not subclass java.text.RuleBasedCollator (although it does implement Comparator). This poses a problem for the possibility of replacing the built-in CollationFactory: makeUsingProperties() does just what I want, but I can't use it with ICU4J. Doh.

I think the best approach would be as follows:

1. Revise the API for CollationFactory so that makeUsingProperties() returns a Comparator, not a java.text.Collator.

2. Provide a run-time option for supplying a CollationFactory implementation that replaces the built-in CollationFactory.

Given these two changes I could quickly adapt my existing ICU4J comparator factory (which underlies my current Saxon 6.2 support code) to work with Saxon 8. I'm happy to take a stab at these changes to the code, but only if you think this approach is appropriate.

Cheers,
Eliot Kimber
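
To make the proposal above concrete, here is a minimal sketch of a factory that hands back a plain Comparator selected by language code, so the caller never depends on java.text.Collator as a return type. The names ComparatorFactory and CachingLocaleComparatorFactory are hypothetical and are not part of Saxon's API, and java.text.Collator is used only as a stand-in for whatever collator library sits behind the factory.

```java
import java.text.Collator;
import java.util.Comparator;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical interface (not Saxon's): the factory promises only a
// Comparator, so any collator library that implements Comparator fits.
interface ComparatorFactory {
    Comparator<Object> makeComparator(String lang);
}

// Hypothetical implementation: picks a collator from the language tag of
// the data being sorted and caches one instance per tag.
final class CachingLocaleComparatorFactory implements ComparatorFactory {
    private final Map<String, Comparator<Object>> cache = new ConcurrentHashMap<>();

    @Override
    public Comparator<Object> makeComparator(String lang) {
        // java.text.Collator implements Comparator<Object>; it stands in
        // here for the externally configured collator library.
        return cache.computeIfAbsent(
                lang, tag -> Collator.getInstance(Locale.forLanguageTag(tag)));
    }
}
```

Because the return type is only Comparator, an ICU4J-backed implementation could presumably replace the body of makeComparator() with a call such as com.ibm.icu.text.Collator.getInstance(...), without any change to calling code.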


Replies (1)

RE: Requirements for Collator Flexibility - Added by Anonymous almost 19 years ago

Legacy ID: #3142554 Legacy Poster: Michael Kay (mhkay)

Thanks for the suggestions. I think the future direction should be in terms of collation URIs, because that's the way the spec is written. The mechanism needs to handle all uses of collations, not just xsl:sort; I think one should treat the xsl:sort attributes such as lang as historic.

I think it would be entirely appropriate to generalize the CollationFactory class so that a user-defined CollationFactory can be registered with the Configuration, with the main method being essentially along the lines of makeCollationFromURI. This already returns a Comparator, so it doesn't seem to be very far away from what you need. Because it takes the Configuration as a parameter, very little change is needed in the code that calls this method if it changes from CollationFactory.makeCollationFromURI(uri, config) to config.getCollationFactory().makeCollationFromURI(uri). Although makeUsingProperties is declared public, it's currently called only from within the CollationFactory itself, so changes here are no problem.

Currently the CollationFactory.makeCollationFromURI() method, in XSLT at any rate, is used only as a fallback, after first trying to look up the collation URI in a table maintained in the Executable (which is populated using saxon:collation). I think I would want to integrate these mechanisms, using a similar approach to the FunctionLibrary mechanism used for binding external functions: allow a sequence of CollationFactories to search for a URI in some kind of priority order, and put the current Executable table into such a CollationFactory that's first on the list.
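
The priority-ordered lookup described above could look roughly like the following sketch. Everything here is an assumption made for illustration: the interface UriCollationFactory, the three classes, and the http://example.org/collation?lang= URI form are hypothetical and do not reflect Saxon's actual classes or its collation URI syntax.

```java
import java.text.Collator;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical interface: returns null when the URI is not recognized,
// so that the next factory in the chain gets a chance.
interface UriCollationFactory {
    Comparator<Object> makeCollationFromURI(String uri);
}

// Stands in for the table of explicitly declared collations.
final class TableCollationFactory implements UriCollationFactory {
    private final Map<String, Comparator<Object>> table;

    TableCollationFactory(Map<String, Comparator<Object>> table) {
        this.table = Map.copyOf(table);
    }

    @Override
    public Comparator<Object> makeCollationFromURI(String uri) {
        return table.get(uri);
    }
}

// Fallback that builds a collator from a language tag carried in the URI.
// The URI prefix is an invented example, not a real scheme.
final class LangUriCollationFactory implements UriCollationFactory {
    static final String PREFIX = "http://example.org/collation?lang=";

    @Override
    public Comparator<Object> makeCollationFromURI(String uri) {
        if (!uri.startsWith(PREFIX)) {
            return null;
        }
        String tag = uri.substring(PREFIX.length());
        return Collator.getInstance(Locale.forLanguageTag(tag));
    }
}

// Tries each factory in list order; the first non-null result wins.
final class CollationFactoryChain implements UriCollationFactory {
    private final List<UriCollationFactory> factories;

    CollationFactoryChain(List<UriCollationFactory> factories) {
        this.factories = List.copyOf(factories);
    }

    @Override
    public Comparator<Object> makeCollationFromURI(String uri) {
        for (UriCollationFactory factory : factories) {
            Comparator<Object> comparator = factory.makeCollationFromURI(uri);
            if (comparator != null) {
                return comparator;
            }
        }
        return null;
    }
}
```

Placing the table-backed factory first in the list reproduces the "declared collations win, generic factory is the fallback" ordering, and a user-supplied factory can be slotted in at any position.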
