Saxon Configuration synchronization issue
Added by Anonymous over 13 years ago
Legacy ID: #9778290 Legacy Poster: Bartek (bkowal)
It seems that net.sf.saxon.Configuration is a little bit incorrectly synchronized which sometimes causes performance issues. getSourceParser() and reuseSourceParser() should not be synchronized. They could (for example) look like this: [code] private volatile ConcurrentLinkedQueue sourceParserPool = new ConcurrentLinkedQueue(); private final int maxPoolSize; @Override public XMLReader getSourceParser() throws TransformerFactoryConfigurationError { XMLReader parser = sourceParserPool.poll(); if (parser != null) { return parser; } if (getSourceParserClass() != null) { parser = makeParser(getSourceParserClass()); } else { parser = loadParser(); } try { Sender.configureParser(parser); } catch (XPathException err) { throw new TransformerFactoryConfigurationError(err); } if (isValidation()) { try { parser.setFeature("http://xml.org/sax/features/validation", true); } catch (SAXException err) { throw new TransformerFactoryConfigurationError("The XML parser does not support validation"); } } return parser; } @Override public void reuseSourceParser(XMLReader parser) { try { // give things back to the garbage collecter parser.setContentHandler(null); parser.setEntityResolver(null); parser.setDTDHandler(null); parser.setErrorHandler(null); // Unfortunately setting the lexical handler to null doesn't work on Xerces, because // it tests "value instanceof LexicalHandler". So we set it to a lexical handler that // holds no references parser.setProperty("http://xml.org/sax/properties/lexical-handler", dummyLexicalHandler); } catch (Exception err) { // } // this condition isn't safe, some fluctuations in pool size might be possible; however, this should not do any harm :) if (sourceParserPool.size() < maxPoolSize) { sourceParserPool.offer(parser); } } [/code] Please also note that I added a (configurable) limit (a not really strict one). Anyway, it seems that currently there's no limit - parser pool can be extremely huge. Thanks, Bartek
Replies (4)
Please register to reply
RE: Saxon Configuration synchronization issue - Added by Anonymous over 13 years ago
Legacy ID: #9778697 Legacy Poster: Michael Kay (mhkay)
Thanks for the contribution. When you say the current code is "incorrect", I assume you only mean that it is sub-optimal? Do you have any evidence of this causing a bottleneck, or of the improvements produced by your change? I would expect it to make a difference only when you are parsing very large numbers of very small documents (which of course is the scenario that caused the parser pool to be introduced in the first place). Do you have any unit tests for your changed code? Finally (and sorry about this, I know it's boring) can you please confirm that you and your employer are happy either to license this code under MPL, or to relinquish all copyright claims.
RE: Saxon Configuration synchronization issue - Added by Anonymous over 13 years ago
Legacy ID: #9778770 Legacy Poster: Michael Kay (mhkay)
A further question: putting a limit on the pool size. This only seems necessary if you think people are likely to return a parser to the pool that was not obtained from the pool. This seems improbable to me, and not worth the cost of calling the size() method which is expensive on ConcurrentLinkedQueue.
RE: Saxon Configuration synchronization issue - Added by Anonymous over 13 years ago
Legacy ID: #9779015 Legacy Poster: Bartek (bkowal)
Hi, Yes, I should have said that is suboptimal. It's not incorrect. Similarly to what I wrote on the second thread, for this issue I only have thread dumps with plenty of threads waiting on these synchronization. I haven't prepared any unit tests, this was just quick hacking. If I were to prepare a clean patch with unit tests, I would send you a diff and not an ugly snippet (based on 9.1.08 and not the trunk) :). On limiting the size: This was an extremely simple mechanism that was implemented for our tests. However, I feel that limiting the size makes sense as parsers might be "idle" for a long period of time which would prevent them from being gc-ed. A real pool with min/max/keepalive would be required. I'm ok with licensing this code under MPL. However, I'll need to refactor it anyway :). Thanks, Bartek
RE: Saxon Configuration synchronization issue - Added by Anonymous over 13 years ago
Legacy ID: #9784622 Legacy Poster: Bartek (bkowal)
Oops, I meant to write: "I'm ok with licensing this code under MPL. However, you'll need to refactor it anyway :). " Bartek
Please register to reply