
NamePoolLimitException

Added by Anonymous about 17 years ago

Legacy ID: #4368472 Legacy Poster: Ole Lensmar (omatzura)

Hi all, I'm parsing fairly large XML documents (>15 MB) with an extreme number of namespace prefixes. After upgrading from Saxon 8.6.1 to 8.8, I now get a NamePoolLimitException when trying to parse them, and looking at the sources, these limits seem to be hard-coded. Can I just increase these limits and recompile the NamePool, or is there another recommended way of getting around this? Maybe subclass NamePool and set the subclass as the default NamePool? But which methods should I override? Since NamePool isn't an interface, it's a bit hard to know what clients require (all public methods?). Thanks for any reply! Regards, /Ole eviware.com


Replies (5)


RE: NamePoolLimitException - Added by Anonymous about 17 years ago

Legacy ID: #4368533 Legacy Poster: Michael Kay (mhkay)

At first sight this seems surprising, since some changes were made in 8.7 that actually increased the limits (from 256 prefixes for a given URI to 1024 prefixes for a given URI). However, it's possible that code was added at the same time to check more rigorously for the limits being exceeded, and that the code previously happened to work, perhaps by using the "wrong" prefix (which would be non-conformant, but you'd probably get away with it).

As you say, the limits are hard-coded, and they can't be changed simply by changing a constant here and there, because they are built into the way Saxon allocates integer fingerprints for names, for fast comparisons. It's not a design objective for Saxon that it should be able to handle "pathological" documents: products are allowed to impose limits, and I try to set them reasonably, but there will always be some cases that can't be handled.

You say you have an extreme number of prefixes. Do you also have an extreme number of namespace URIs? If not, the obvious solution is to write a SAX filter that processes the input and reduces the number of prefixes by always using the same prefix with the same URI. This assumes, of course, that you are not using namespace prefixes in the content of elements or attributes. If the number of namespace URIs is also extreme (i.e. above 1000 or so), then it's going to be harder to find a solution: one might have to look at mapping the names into a smaller number of namespaces on the way in, and then mapping them back on the way out.
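The prefix-normalizing SAX filter suggested in this reply could look roughly like the following. This is a minimal sketch, not part of Saxon: the class name PrefixNormalizingFilter and the generated prefixes p0, p1, ... are hypothetical, the parent XMLReader is assumed to be namespace-aware (with the namespace-prefixes feature left at its default of false), and, as noted above, it only renames element and attribute names, so any prefixes used inside text or attribute values are left untouched.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

/**
 * Hypothetical filter: gives every namespace URI one generated prefix ("p0", "p1", ...),
 * so thousands of prefixes bound to the same URI collapse into one before the
 * downstream ContentHandler (e.g. a tree builder) sees them.
 * Assumes a namespace-aware parent and no prefixes inside text or attribute values.
 */
public class PrefixNormalizingFilter extends XMLFilterImpl {
    private static final String XML_NS = "http://www.w3.org/XML/1998/namespace";

    private final Map<String, String> canonical = new HashMap<>();    // URI -> generated prefix
    private List<String> pending = new ArrayList<>();                 // prefixes declared for the next element
    private final Deque<List<String>> declared = new ArrayDeque<>();  // prefixes declared per open element

    String canonicalPrefix(String uri) {
        if (XML_NS.equals(uri)) {
            return "xml";                      // the xml prefix is fixed and never redeclared
        }
        String prefix = canonical.get(uri);
        if (prefix == null) {
            prefix = "p" + canonical.size();   // one new prefix per distinct URI
            canonical.put(uri, prefix);
        }
        return prefix;
    }

    private String rename(String qName, String uri, String localName) {
        // Unprefixed names (no namespace, or the default namespace) are left alone.
        return qName.indexOf(':') < 0 ? qName : canonicalPrefix(uri) + ":" + localName;
    }

    @Override
    public void startPrefixMapping(String prefix, String uri) throws SAXException {
        String p = prefix.isEmpty() ? "" : canonicalPrefix(uri);
        if (!pending.contains(p)) {            // several old prefixes may map to one new prefix
            pending.add(p);
            super.startPrefixMapping(p, uri);
        }
    }

    @Override
    public void endPrefixMapping(String prefix) {
        // Ignored: the rewritten mappings are closed in endElement instead.
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        declared.push(pending);
        pending = new ArrayList<>();
        AttributesImpl renamed = new AttributesImpl(atts);
        for (int i = 0; i < renamed.getLength(); i++) {
            renamed.setQName(i, rename(atts.getQName(i), atts.getURI(i), atts.getLocalName(i)));
        }
        super.startElement(uri, localName, rename(qName, uri, localName), renamed);
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        super.endElement(uri, localName, rename(qName, uri, localName));
        for (String p : declared.pop()) {      // SAX requires endPrefixMapping after endElement
            super.endPrefixMapping(p);
        }
    }
}
```

To use it, one would set a namespace-aware XMLReader as the filter's parent with setParent and give the filter to Saxon as the source parser, for example through a javax.xml.transform.sax.SAXSource; how that wiring looks in Saxon 8.x specifically is an assumption here. Because every declaration of a URI is re-emitted on the same element with its generated prefix, the in-scope bindings stay correct even when the original prefixes were scattered across siblings.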

RE: NamePoolLimitException - Added by Anonymous about 17 years ago

Legacy ID: #4368605 Legacy Poster: Ole Lensmar (omatzura)

Thanks for your reply. I could probably create a SAX filter as you describe; the problem is that our application is a generic SOAP client (soapUI), which may encounter all kinds of "strange" documents, and I would rather not add code that solves one specific user's issues. I've reverted to 8.6.1 for now. If I just increase the constants and recompile, would that work? Or is the NamePool optimized for the existing constant values? Regards! /Ole eviware.com

RE: NamePoolLimitException - Added by Anonymous about 17 years ago

Legacy ID: #4368657 Legacy Poster: Michael Kay (mhkay)

No, as I thought I explained, you can't simply change the constants and recompile. These limits are based on the number of bits available in a fingerprint, and there is no practical way of increasing the number of bits in a fingerprint. The bottom 20 bits are used for the (URI, localName) combination, and the 10 bits above them for the prefix; the meaning of the prefix code is specific to the URI, so you are allowed 1024 distinct prefixes for every distinct URI, which is enough for anything that isn't completely pathological.

I suspect you're dealing with the same source of documents that caused me to increase the limit from 256 to 1024 in Saxon 8.7. I thought at the time it was probably pointless, since anyone who has that many prefixes is doing something pretty weird and could equally have a million prefixes, but I had a couple of bits still spare, so I allocated them. To be honest, I think it might have been wiser to keep them in reserve to add to the 20 used for the URI/local name, since I would have expected that limit to blow first. Do you know how many prefixes this document actually uses?

I don't think reverting to 8.6.1 will work. The limits are there in 8.6.1 (in fact they are smaller); it's just that 8.6.1 doesn't detect when they are exceeded, and produces wrong (unreliable) output instead. If you're lucky it won't be badly wrong, it will just use the wrong prefix, and it's possible the user doesn't care about that, but it's still non-conformant.

What one could try to do here is formalize the old behaviour and say that if the limits are exceeded, we will "fail soft": rather than crashing, we will lose the namespace prefix information. I don't think that's actually conformant, so it would have to be switched on as a non-default option. But I'd like a good rationale before doing this: who is generating such weird XML, and why?
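To make the arithmetic concrete: with 20 bits for the (URI, local name) fingerprint and 10 bits above them for the prefix index, a name code fits in 30 bits, and each URI can have at most 2^10 = 1024 prefixes. The Java sketch below packs and unpacks a code with that layout. It is only an illustration of the scheme described in this reply; the class and method names are hypothetical and this is not Saxon's NamePool implementation.

```java
/**
 * Illustration only (hypothetical names, not Saxon's code): a name code with the
 * (URI, localName) fingerprint in the low 20 bits and a per-URI prefix index in
 * the 10 bits above it.
 */
public final class NameCodeLayout {
    static final int FINGERPRINT_BITS = 20;
    static final int PREFIX_BITS = 10;
    static final int FINGERPRINT_MASK = (1 << FINGERPRINT_BITS) - 1;  // 0xFFFFF
    static final int PREFIX_MASK = (1 << PREFIX_BITS) - 1;            // 0x3FF
    static final int MAX_PREFIXES_PER_URI = 1 << PREFIX_BITS;         // 1024

    private NameCodeLayout() {}

    static int nameCode(int prefixIndex, int fingerprint) {
        if (prefixIndex < 0 || prefixIndex >= MAX_PREFIXES_PER_URI) {
            // A version that silently masked the index here would yield the wrong prefix;
            // this sketch fails loudly instead.
            throw new IllegalArgumentException(
                "prefix index " + prefixIndex + " does not fit in " + PREFIX_BITS + " bits");
        }
        if (fingerprint < 0 || fingerprint > FINGERPRINT_MASK) {
            throw new IllegalArgumentException(
                "fingerprint " + fingerprint + " does not fit in " + FINGERPRINT_BITS + " bits");
        }
        return (prefixIndex << FINGERPRINT_BITS) | fingerprint;
    }

    static int fingerprint(int nameCode) {
        return nameCode & FINGERPRINT_MASK;   // same (URI, localName) regardless of prefix
    }

    static int prefixIndex(int nameCode) {
        return (nameCode >>> FINGERPRINT_BITS) & PREFIX_MASK;
    }
}
```

Comparing two names for equality then only needs fingerprint(a) == fingerprint(b), which is the fast comparison the fixed layout buys; widening either field would change the meaning of every code, which is why a constant cannot simply be bumped.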

RE: NamePoolLimitException - Added by Anonymous about 17 years ago

Legacy ID: #4369049 Legacy Poster: Ole Lensmar (omatzura)

Hi, OK. The documents are generated by the SOAP API for JIRA (a popular issue-tracking system), which has a request for searching issues. The response contains a list of issues that are subclassed in the associated schema, thus allowing different types of issues, each distinguished via the xsi:type mechanism. Here is the problem: JIRA creates a new namespace declaration with a new prefix for each issue/type (although this isn't necessary). Below is a cut-out of a result:

...
<multiRef id="id19222" xsi:type="ns23107:RemoteCustomFieldValue" xmlns:ns23107="http://beans.soap.rpc.jira.atlassian.com">
  ...
</multiRef>
<multiRef id="id7112" xsi:type="ns23108:RemoteVersion" xmlns:ns23108="http://beans.soap.rpc.jira.atlassian.com">
  ...
</multiRef>
...

As you can see, the prefix is always bound to the same namespace and is incremented for each item, so the larger the result, the more prefixes. I don't know if their API has some paging mechanism that could be used to avoid getting everything at once. In our specific case, the user isn't at all interested in this, which is probably why 8.6.1 has worked OK. Maybe adding a "soft limit" would be a good idea. How about subclassing NamePool: if I create a (not very optimal) implementation of all public methods and set it as the default NamePool, would that work as a temporary workaround? Again, thanks for your help! Regards, /Ole eviware.com

RE: NamePoolLimitException - Added by Anonymous about 17 years ago

Legacy ID: #4369109 Legacy Poster: Michael Kay (mhkay)

I don't think the NamePool is the right place to fix this. Nothing you do in the NamePool can change the value of the xsi:type attribute. Because these prefixes are in fact used in attribute content (xsi:type), getting them wrong will affect the application. It therefore needs an application-layer filter to modify the prefixes, because it's only at that layer that you can know where the prefixes appear. Michael Kay
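The application-layer fix described here has to rewrite QName-valued attribute content, not just element and attribute names. A minimal sketch of that step, under stated assumptions: the filter tracks the document's original prefix bindings in an org.xml.sax.helpers.NamespaceSupport (pushContext and declarePrefix for an element's own declarations before its attributes are examined, popContext after its endElement), and it already has some URI-to-new-prefix mapping. The class name QNameValueRewriter is hypothetical, and only single QName values such as xsi:type are handled, not QNames embedded in other attributes or text.

```java
import java.util.function.UnaryOperator;
import org.xml.sax.helpers.NamespaceSupport;

/**
 * Hypothetical helper: rewrites a QName-valued attribute such as
 * xsi:type="ns23107:RemoteVersion" so its prefix matches the renamed bindings.
 */
public final class QNameValueRewriter {
    public static final String XSI_NS = "http://www.w3.org/2001/XMLSchema-instance";

    private QNameValueRewriter() {}

    /**
     * @param value        the attribute value as written in the source document
     * @param original     the ORIGINAL in-scope prefix bindings at this element
     * @param prefixForUri the filter's URI -> new prefix mapping
     */
    public static String rewrite(String value, NamespaceSupport original,
                                 UnaryOperator<String> prefixForUri) {
        String v = value.trim();
        int colon = v.indexOf(':');
        if (colon < 0) {
            return value;  // unprefixed QName: resolved via the default namespace, nothing to rename
        }
        String uri = original.getURI(v.substring(0, colon));
        if (uri == null) {
            return value;  // undeclared prefix: leave it for downstream validation to report
        }
        return prefixForUri.apply(uri) + ":" + v.substring(colon + 1);
    }
}
```

Inside such a filter's startElement, one would call it for each attribute whose namespace URI is XSI_NS and whose local name is type, and store the result with AttributesImpl.setValue. The lookup must use the original bindings because the prefix in the value (ns23107 in the JIRA output) is only meaningful in the source document's scope.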
