format-date and format-integer are taking alot of time at initial use in xslt

Added by Micha Hagg almost 4 years ago

I am using Saxon HE 9.9.1 and have an XSLT stylesheet in which I call format-date multiple times. The date argument is typed differently across the calls: current-date(), xs:date(substring-before(*/aanvraagdatum,'T')), */ingangsdatum, etc. The format argument is mostly the same: '[D] [Mn] [Y]'.

When I profile the transformation, the template that contains the first format-date call takes a lot of time. After that first call, all further calls to format-date return quickly. By "a lot of time" I mean 200 ms; further calls take half a millisecond.

Is that to be expected?

Furthermore, after debugging the source code, I found that the FormatDate class makes use of the FormatInteger class, which, when loaded, instantiates three ARegularExpression instances, the first of which takes a lot of time to instantiate.

Thank you for your interest and support.


Replies (3)

RE: format-date and format-integer are taking alot of time at initial use in xslt - Added by Michael Kay almost 4 years ago

This is not entirely unexpected; the first time a regular expression is used, we initialize a lot of data from tables derived from the Unicode database, such as lists of character categories.

However, we do try to avoid gratuitous use of regular expressions that trigger this behaviour, and I will take a look at this path to see whether the overhead is easily avoidable.

RE: format-date and format-integer are taking alot of time at initial use in xslt - Added by Michael Kay almost 4 years ago

First, I think we need to put this into perspective. This cost is static initialization, which means it only happens once for the Java VM; it's only going to affect stylesheet execution time if you load a new Java VM for each transformation, and the cost of 200ms is pretty negligible compared with the overall cost of loading the Java VM. If it's important for you to save 200ms, then you need to be looking at ways of running transformations without loading a new JVM each time.
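To illustrate the "once per Java VM" point, here is a minimal sketch in plain Java. It is not Saxon code: the class `StaticInitDemo` and its nested `CategoryTables` are hypothetical stand-ins for a class that builds large character tables in a static initializer. The table is built on the first call only, and every later call in the same JVM reuses it.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class StaticInitDemo {
    // Counts how many times the expensive table is built in this JVM.
    public static final AtomicInteger BUILD_COUNT = new AtomicInteger();

    // Hypothetical stand-in for a class holding tables of character categories.
    // Its static initializer runs the first time IS_LETTER is accessed.
    static final class CategoryTables {
        static final boolean[] IS_LETTER = build();

        private static boolean[] build() {
            BUILD_COUNT.incrementAndGet();
            boolean[] table = new boolean[Character.MAX_CODE_POINT + 1];
            for (int cp = 0; cp < table.length; cp++) {
                table[cp] = Character.isLetter(cp);
            }
            return table;
        }
    }

    // The first call pays for building the table; later calls are array lookups.
    public static boolean isLetter(int codePoint) {
        return CategoryTables.IS_LETTER[codePoint];
    }

    public static void main(String[] args) {
        for (int i = 1; i <= 3; i++) {
            long start = System.nanoTime();
            boolean result = isLetter('a');
            long micros = (System.nanoTime() - start) / 1_000;
            System.out.println("call " + i + ": " + result + " in " + micros + " us");
        }
        System.out.println("tables built " + BUILD_COUNT.get() + " time(s)");
    }
}
```

The same reasoning applies to a long-running process that performs many transformations: only the first one in that JVM pays the initialization cost.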

We could try to avoid the initialisation of the regex character tables for simple uses of format-date, for example for cases where the format picture consists entirely of ASCII characters. But I'm becoming a bit resistant to such optimisations; they increase the size of the product (which itself increases Java loading times); they create new opportunities for bugs; and they often end up benefiting very few users.
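For concreteness, the kind of check such an optimisation would hinge on might look like the sketch below. This is purely illustrative and not something Saxon does: the class `PictureCheck` and the method `isAsciiOnly` are hypothetical names, and how a fast path would be wired in after the check is left open.

```java
public class PictureCheck {
    // Returns true if every char of the picture string is in the ASCII range (0..127).
    // Surrogate halves of non-BMP characters are above 0x7F, so they also return false.
    public static boolean isAsciiOnly(String picture) {
        for (int i = 0; i < picture.length(); i++) {
            if (picture.charAt(i) > 0x7F) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // The picture from the original question contains only ASCII characters.
        System.out.println(isAsciiOnly("[D] [Mn] [Y]"));
        // A picture containing ARABIC-INDIC DIGIT ONE (U+0661) does not.
        System.out.println(isAsciiOnly("[D\u0661]"));
    }
}
```

The check itself is cheap; the costs mentioned above come from maintaining a second code path behind it.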

We could also try to speed up the initialisation by changing the format in which the data files are held. But again, I'm not convinced the benefits justify the effort.

RE: format-date and format-integer are taking alot of time at initial use in xslt - Added by Michael Kay almost 4 years ago

One other point: the profile you provided doesn't distinguish the costs of loading the regex tables from other JVM initialisation costs. Everything in Java costs more the first time you do it, because so much of the initialisation is done lazily. It would require very careful analysis to show how much of the 200ms is actually accounted for by the initialisation of these tables.
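As a rough sketch of why a simple measurement cannot settle this, the harness below times consecutive calls of the same task. The class `FirstCallTiming` and the method `timeCalls` are hypothetical names, and `java.util.regex` is used only as a stand-in workload, not as Saxon's regex engine. The first timing it reports bundles class loading, static initialization, and interpreted execution together, so it shows that the first call is slower without showing which of those causes is responsible.

```java
import java.util.regex.Pattern;

public class FirstCallTiming {
    // Runs the task n times and returns the elapsed nanoseconds of each run.
    public static long[] timeCalls(Runnable task, int n) {
        long[] nanos = new long[n];
        for (int i = 0; i < n; i++) {
            long start = System.nanoTime();
            task.run();
            nanos[i] = System.nanoTime() - start;
        }
        return nanos;
    }

    public static void main(String[] args) {
        // Stand-in workload: compile and apply a regex that uses a Unicode category.
        long[] timings = timeCalls(
                () -> Pattern.compile("\\p{L}+").matcher("abc").matches(), 5);
        for (int i = 0; i < timings.length; i++) {
            System.out.println("call " + (i + 1) + ": " + timings[i] / 1_000 + " us");
        }
    }
}
```

Attributing the 200ms to one cause would need more than this, for example comparing runs in which unrelated code has already been exercised in the same JVM.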
