Project

Profile

Help

Bug #3366

closed

Performance regression (9.6 - 9.8) building document using pull parser

Added by Artur Chylek almost 7 years ago. Updated over 6 years ago.

Status:
Closed
Priority:
Normal
Assignee:
Category:
Performance
Sprint/Milestone:
-
Start date:
2017-07-28
Due date:
% Done:

100%

Estimated time:
Legacy ID:
Applies to branch:
9.8
Fix Committed on Branch:
9.8
Fixed in Maintenance Release:
Platforms:

Description

Hi,

after upgrading from Saxon-HE 9.6.0-5 to 9.8.0-3 I observed during regression tests of our application that there is about 30% of performance loss when transforming an XML document of about 5KB size from byte array to NodeInfo.

I didn't change my code, only upgraded Saxon but I walked through the documentation of the latest version and didn't find any information that would suggest I should change my code.

I tested with the following Oracle JVMs:

  • jdk1.8.0_66

  • jdk1.8.0_121

  • jdk1.8.0_141

and the results are the same - performance degradation is still in place.

I created a performance benchmark that shows the problem. It takes about a minute and a half to finish for each version of Saxon.

It uses JMH(Java Microbenchmark Harness) library which generates csv(number of iterations with percentiles) and jfr(Java Flight Records - detailed stats about the Java process) reports and can be run using Maven with the following commands:

  • mvn clean install -Pold

  • mvn clean install -Pnew

where -Pold runs the benchmark with Saxon 9.6.0-5 and -Pnew with 9.8.0-3.

The reports are generated in a target directory of a project and are overwritten every time you run the test.

You can compare csv files for old and new Saxon version and you should observe that the old version of Saxon performs a lot better.

On my machine, which is Windows 7 Enterprise SP1 with 16GB of RAM and Intel Core i7 2.7GHZ, the old version transformed about 1 700 000 documents and the new one only 1 100 000 documents.

Artur


Files

saxon-perf.zip (7.55 KB) saxon-perf.zip Artur Chylek, 2017-07-28 10:38

Related issues

Related to Saxon - Bug #3394: NullPointerException thrown in samples PullExamples-j - StaxBridgeDuplicateO'Neil Delpratt2017-08-15

Actions
Actions #1

Updated by Michael Kay almost 7 years ago

  • Category set to Performance
  • Assignee set to Michael Kay
  • Priority changed from High to Normal

Thanks for reporting it.

Unfortunately you've packaged it up with dependencies on third-party libraries that don't fit naturally into our normal testing and benchmarking environment, so I'm going to have to unravel the code to see if I can find a way to reproduce it using our usual tools.

Am I right in think that the path you are testing is this one:

       public static NodeInfo toNodeInfo(XMLStreamReader xmlStreamReader) throws XPathException {
            StaxBridge parser = new StaxBridge();
            parser.setXMLStreamReader(xmlStreamReader);
            parser.setPipelineConfiguration(CONFIGURATION.makePipelineConfiguration());
            TinyBuilder builder = new TinyBuilder(parser.getPipelineConfiguration());
            new PullPushCopier(parser, builder).copy();
            return builder.getCurrentRoot();
        }

If so, what implementation class of XMLStreamReader is your environment picking up?

I'm puzzled as to why you have written the parsing operation in this way. Why aren't you using the more obvious

NodeInfo node = Configuration.buildDocument(new StreamSource(new ByteArrayInputStream(bytes)));

Reading from a pull parser is something that Saxon supports, but it's never been a major focus of our performance objectives.

Actions #2

Updated by Michael Kay almost 7 years ago

I put together a much simplified benchmark measuring the times for push and pull parsing for this input file. Results:

9.8

Pull time: 1093ms

Push time: 983ms

Pull time: 1102ms

Push time: 934ms

Pull time: 1115ms

Push time: 884ms

9.7

Pull time: 1040ms

Push time: 1078ms

Pull time: 1198ms

Push time: 1025ms

Pull time: 1151ms

Push time: 999ms

9.6

Pull time: 793ms

Push time: 1003ms

Pull time: 773ms

Push time: 958ms

Pull time: 899ms

Push time: 928ms

The pull parser used is com.sun.xml.stream.XMLReaderImpl (which we haven't normally used much because in the past it was unreliable: most of our pull testing is done using Woodstox).

The push code here does

config.buildDocument(new StreamSource(new ByteArrayInputStream(xmlBytes)))

while the pull code does

        StaxBridge parser = new StaxBridge();
        XMLStreamReader reader = XML_INPUT_FACTORY.createXMLStreamReader(new ByteArrayInputStream(xmlBytes));
        parser.setXMLStreamReader(reader);
        parser.setPipelineConfiguration(config.makePipelineConfiguration());
        TinyBuilder builder = new TinyBuilder(parser.getPipelineConfiguration());
        new PullPushCopier(parser, builder).copy();
        return builder.getCurrentRoot();

Each measurement is obtained by parsing 20,000 times and timing the last 10,000.

To try and get more repeatable numbers I increased the number of iterations. I also tried changing the pull code to:

        XMLStreamReader reader = XML_INPUT_FACTORY.createXMLStreamReader(new ByteArrayInputStream(xmlBytes));
        return config.buildDocument(new StAXSource(reader));

but this isn't recognized on 9.6, so I changed it back.

results for 9.8

Pull time: 827ms

Push time: 774ms

Pull time: 798ms

Push time: 704ms

Pull time: 859ms

Push time: 998ms

Pull time: 1067ms

Push time: 1009ms

Pull time: 1091ms

Push time: 970ms

Pull time: 1085ms

Push time: 963ms

Results for 9.6:

Pull time: 871ms

Push time: 940ms

Pull time: 729ms

Push time: 846ms

Pull time: 710ms

Push time: 901ms

Pull time: 783ms

Push time: 746ms

Pull time: 720ms

Push time: 854ms

Pull time: 811ms

Push time: 836ms

As so often with Java benchmarking, the results are frustratingly variable from one run to the next, however, there does seem to be a reasonably consistent indication that pull parsing was faster under 9.6.

Performance differences under 25% can be very difficult to investigate because of the variability of results.

Actions #3

Updated by Michael Kay almost 7 years ago

Debugging the two releases side-by-side, the only significant change seems to be that 9.8 is copying location information (line number etc) from the pull parser through to the TinyBuilder, whereas 9.6 dropped all location information. (PullPushTee lines 85-88.) If I experimentally remove this code, I get the timings:

Pull time: 799ms

Push time: 688ms

Pull time: 719ms

Push time: 674ms

Pull time: 698ms

Push time: 680ms

Pull time: 723ms

Push time: 687ms

Pull time: 742ms

Push time: 661ms

Pull time: 743ms

Push time: 683ms

which looks much better. But the push timings have improved too, so it may just be that there was less background activity on the machine. Ran again with the location code enabled:

Pull time: 772ms

Push time: 824ms

Pull time: 765ms

Push time: 651ms

Pull time: 764ms

Push time: 695ms

Pull time: 731ms

Push time: 666ms

Pull time: 746ms

Push time: 645ms

Pull time: 766ms

Push time: 637ms

which is not convincingly different.

Back on 9.6, I'm now getting:

Pull time: 706ms

Push time: 697ms

Pull time: 589ms

Push time: 679ms

Pull time: 608ms

Push time: 669ms

Pull time: 622ms

Push time: 695ms

Pull time: 608ms

Push time: 651ms

Pull time: 604ms

Push time: 674ms

So there's definitely some other effect occurring.

Actions #4

Updated by Michael Kay almost 7 years ago

Looking at the Java profile, for 9.8 the top 10 hits are

CPU SAMPLES BEGIN (total = 1972) Fri Jul 28 12:29:02 2017
rank   self  accum   count trace method
   1 49.95% 49.95%     985 300064 java.net.SocketInputStream.socketRead0
   2  9.84% 59.79%     194 300118 net.sf.saxon.pull.PullPushTee.next
   3  9.43% 69.22%     186 300121 net.sf.saxon.pull.PullPushTee.copyEvent
   4  4.56% 73.78%      90 300130 com.sun.xml.stream.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next
   5  2.89% 76.67%      57 300133 com.sun.xml.stream.XMLDocumentScannerImpl.next
   6  2.59% 79.26%      51 300119 com.sun.xml.stream.XMLEntityReaderImpl.scanQName
   7  2.13% 81.39%      42 300138 com.sun.xml.stream.XMLEntityReaderImpl.scanQName
   8  1.77% 83.16%      35 300120 com.sun.xml.stream.XMLEntityReaderImpl.scanQName
   9  1.47% 84.63%      29 300122 net.sf.saxon.om.NamePool.allocateFingerprint
  10  0.91% 85.55%      18 300123 com.sun.xml.stream.XMLEntityReaderImpl.scanQName

whereas for 9.6 we see

CPU SAMPLES BEGIN (total = 1664) Fri Jul 28 12:29:51 2017
rank   self  accum   count trace method
   1 49.94% 49.94%     831 300072 java.net.SocketInputStream.socketRead0
   2 10.04% 59.98%     167 300126 net.sf.saxon.pull.PullPushTee.next
   3  5.05% 65.02%      84 300108 com.sun.xml.stream.XMLNSDocumentScannerImpl.scanStartElement
   4  4.63% 69.65%      77 300119 com.sun.xml.stream.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next
   5  3.97% 73.62%      66 300112 com.sun.xml.stream.XMLEntityReaderImpl.scanQName
   6  3.79% 77.40%      63 300125 net.sf.saxon.pull.StaxBridge.getNameCode
   7  2.46% 79.87%      41 300128 com.sun.xml.stream.XMLNSDocumentScannerImpl.scanEndElement
   8  1.86% 81.73%      31 300123 com.sun.xml.stream.xerces.util.SymbolTable.addSymbol
   9  1.62% 83.35%      27 300127 net.sf.saxon.pull.PullPushTee.copyEvent
  10  1.38% 84.74%      23 300120 com.sun.xml.stream.XMLEntityReaderImpl.load
Actions #5

Updated by Michael Kay almost 7 years ago

Tried modifying the XML file to use only one namespace. Timings now:

9.6

Pull time: 652ms

Pull time: 689ms

Pull time: 733ms

Pull time: 653ms

Pull time: 684ms

Pull time: 707ms

9.8

Pull time: 905ms

Pull time: 855ms

Pull time: 825ms

Pull time: 840ms

Pull time: 818ms

Pull time: 812ms

So the issue doesn't seem to be related to namespaces.

Actions #6

Updated by Michael Kay almost 7 years ago

Although the NamePool design has changed between releases, the number of calls to allocate a name code is the same in both releases (11879999).

Note: on the push() path, the ReceivingContentHandler maintains a local cache of allocated name codes, but there is no equivalent on the pull path. It's possible that allocating name codes has become more expensive as a result of the new NamePool design, which was designed to eliminate concurrency bottlenecks rather than to improve the speed of allocation. I tried dummying-out the name code allocation in both releases (which we can do because we never use the tree after it has been built) and this appears to narrow the gap a little:

9.6

Pull time: 738ms

Pull time: 653ms

Pull time: 671ms

Pull time: 630ms

Pull time: 679ms

Pull time: 644ms

9.8

Pull time: 715ms

Pull time: 699ms

Pull time: 779ms

Pull time: 730ms

Pull time: 772ms

Pull time: 741ms

Actions #7

Updated by Artur Chylek almost 7 years ago

Thanks for the quick response.

We have been using pull approach since the beginning Saxon was added to our project. We upgraded Saxon several times and apart from bumping the version we didn't need to change anything and our regression tests passed. But was is more important I wasn't aware of these push and pull approaches so I didn't test the alternative way first.

After your suggestion I changed the benchmark to use push approach and the results are roughly the same as in 9.6 for pull approach.

I guess I will refactor my application to use push approach.

Artur

Actions #8

Updated by Michael Kay almost 7 years ago

I tried adding the name caching code from ReceivingContentHandler into StaxBridge to reduce the number of name objects allocated, but it made no useful difference - which is not really surprising for this source document as the majority of element names only appear once.

Actions #9

Updated by Michael Kay almost 7 years ago

Discovered an inefficiency in TinyBuilder - when adding a compressed whitespace text node to the tree, we calculate the length of the node (TinyBuilder.characters(), line 360). But we do this in both releases. Whitespace compression might be an opportunity for speed-up, but it's not accounting for the present regression.

I tried dummying the code that allocates a StructuredQName object for each element name (to use the same SQN each time). This appears to make a significant difference (730ms -> 530ms). In 9.6 we were calling the Namepool to allocate a CodedName, so this is an area of change.

OK, I have changed StaxBridge so that on startElement it (a) calls the namePool to allocate a fingerprint, and (b) keeps a cache mapping fingerprints to NodeName objects. It looks in the cache to find a node name for the fingerprint, and if it finds one -- and if the prefix is the same -- then it uses the cached object rather than allocating a new one. With this change (and with some configuration changes to make sure the 9.6 and 9.8 projects are as comparable as possible) I'm now seeing:

9.6

Pull time: 737ms

Pull time: 666ms

Pull time: 670ms

Pull time: 639ms

Pull time: 674ms

Pull time: 635ms

9.8

Pull time: 747ms

Pull time: 682ms

Pull time: 686ms

Pull time: 686ms

Pull time: 680ms

Pull time: 677ms

which is not a significant difference. So I think that (subject to successful regression testing) we can say "problem solved".

I have (a) added back in the 9.8 code to copy location information across, and (b) made a change to make compressed whitespace work as intended, and we still get acceptable results.

Actions #10

Updated by Michael Kay almost 7 years ago

  • Subject changed from Performance degradation after upgrading from Saxon-HE 9.6.0-5 and 9.8.0-3 Java version to Performance regression (9.6 - 9.8) building document using pull parser
  • Status changed from New to Resolved
  • Applies to branch 9.8 added
  • Fix Committed on Branch 9.8 added
Actions #11

Updated by O'Neil Delpratt over 6 years ago

  • Related to Bug #3394: NullPointerException thrown in samples PullExamples-j - StaxBridge added
Actions #12

Updated by O'Neil Delpratt over 6 years ago

  • % Done changed from 0 to 100

A further fix made to deal with a regression resulting from the previous fix. In the class StaxBridge and method getNodeName() we still have to handle the case when the method reader.getNamespaceURI returns null.

Actions #13

Updated by O'Neil Delpratt over 6 years ago

  • Status changed from Resolved to Closed
  • Fixed in Maintenance Release 9.8.0.4 added

Bug fix applied in the Saxon 9.8.0.4 maintenance release.

Please register to edit this issue

Also available in: Atom PDF