
processing of large files with many id-refs

Added by Anonymous over 19 years ago

Legacy ID: #2931665 Legacy Poster: Thomas (zabel123)

Hello, I am trying to process a quite large XML file with many id-refs and pointers. Up to a size of 2 MB this works fine, but a 4 MB file cannot be processed any more (errors, etc.). I tried this by invoking Saxon 8.1 on the command line (I reserved 10 * 4 MB = 40 MB for the transformation process, but that didn't help) and via the JAXP interface. What can I do to be able to process large XML files with a lot of pointers and id-refs? A workaround would also help, or a way to split the transformation (which seems very difficult to me because of the pointers and id-refs). Many thanks in advance! Regards, Thomas


Replies (6)


RE: processing of large files with many id-refs - Added by Anonymous over 19 years ago

Legacy ID: #2931748 Legacy Poster: Michael Kay (mhkay)

To help you with this I will probably need to see what your XML and XSLT look like, and I will certainly need to know what errors you were getting. (You can post your source files under "Support Requests", or if you want to keep them private, you can mail them to support (at) saxonica.com.) 4 MB is not especially large provided you are using efficient access methods such as id() and key(). If you try to follow the references using filter expressions such as //rec[@id=$val], it will become very time-consuming. Michael Kay
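The contrast Michael describes can be sketched as follows. This is a hypothetical fragment, not taken from Thomas's stylesheet: the element name `rec`, the key name `rec-by-id`, and the variable `$val` are all invented for illustration.

```xml
<!-- Sketch only: declare an index over rec elements, keyed on their id attribute.
     Saxon builds this index once, when the document is first loaded. -->
<xsl:key name="rec-by-id" match="rec" use="@id"/>

<!-- Slow: this filter expression scans the whole document
     every time it is evaluated. -->
<xsl:variable name="hit-slow" select="//rec[@id = $val]"/>

<!-- Fast: a single lookup in the prebuilt index. -->
<xsl:variable name="hit-fast" select="key('rec-by-id', $val)"/>
```

If the references are declared as type ID in a DTD, the built-in id() function gives the same one-lookup behaviour without declaring a key.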

RE: processing of large files with many id-refs - Added by Anonymous over 19 years ago

Legacy ID: #2931834 Legacy Poster: Thomas (zabel123)

Hi Michael, first I must say that I always thought I got an error, but I am not sure about this any more (I stopped the processing because I thought I would get an error). It seems that the processing time of large XML files increases exponentially with the file size. Up to now I have the following processing times:

    source (MB)   processing time (seconds)
    -----------   -------------------------
    1             27
    1.67          109
    3             342
    4             ?????
    5             ?????

Our solution should be able to process 4-5 MB XML source files. Do you think the performance could be increased significantly, or that the transformation of these files should work? The XSLT sheets that we wrote use the more performant access methods id() and key(). I will now send a 5 MB XML source file and the corresponding XSLT sheet to the address you specified (support (at) saxonica.com). Thanks for taking a look at my sheets, and for your GREAT support in general! Regards, Thomas

RE: processing of large files with many id-refs - Added by Anonymous over 19 years ago

Legacy ID: #2932051 Legacy Poster: Michael Kay (mhkay)

Measuring the time with increasing data sizes is definitely the right thing to do: your figures seem to show that the elapsed time increases quadratically (not exponentially!). The obvious things in your stylesheet that are likely to cause quadratic behaviour are

    <xsl:for-each select="//General_classification[Classification_association/Classified_element = $ItemVersion]">

and

    <xsl:for-each select="//*[Property_value_representation/Item_property_association/Described_element = $ItemVersion]">

and these look like constructs that are very easily replaced by calls on the key() function. These expressions cause quadratic performance because the time taken to execute the expression is linear in the document size, and the number of times the expression is executed is also linear in the document size. There may be other similar things that I have missed; I only took a very quick glance through the code. Michael Kay http://www.saxonica.com/
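For the first of the two expressions Michael quotes, the key-based replacement might look like this. This is a sketch under assumptions: the key name is invented, and the match pattern assumes the document structure implied by the quoted path.

```xml
<!-- Index General_classification elements by the element they classify.
     (Key name is hypothetical; the path is taken from the quoted expression.) -->
<xsl:key name="classification-by-element"
         match="General_classification"
         use="Classification_association/Classified_element"/>

<!-- Was: //General_classification[Classification_association/Classified_element = $ItemVersion]
     Now: one indexed lookup instead of a full document scan. -->
<xsl:for-each select="key('classification-by-element', $ItemVersion)">
  <!-- ... body unchanged ... -->
</xsl:for-each>
```

The second expression can be handled the same way, with a key whose match pattern is `*[Property_value_representation]` and whose use expression is the rest of the quoted path. Each such change turns one linear factor into a constant, which is why the overall behaviour drops from quadratic to roughly linear.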

RE: processing of large files with many id-refs - Added by Anonymous over 19 years ago

Legacy ID: #2933672 Legacy Poster: Thomas (zabel123)

:-) OK, I will have a go at my script to improve its performance. Now I have another XSLT stylesheet, which is used to transform the same XML source. Performance tests of this stylesheet gave the following values (with a precompiled stylesheet):

    source (MB)   time (seconds)
    -----------   --------------
    1.00          1.864
    1.67          3.034
    2.85          4.078
    5.00          still running after 15 hours ?!

The behaviour in this case does not seem to increase quadratically, and for the 5 MB source file the processor has not finished the transformation yet, but this should work. I do not know why it doesn't. Could you take another quick glance at my sheet (which I will now send)? Regards, Thomas

RE: processing of large files with many id-refs - Added by Anonymous over 19 years ago

Legacy ID: #2934911 Legacy Poster: Michael Kay (mhkay)

This doesn't run indefinitely for me, it crashes with a stack full error. I strongly suspect the reason for this is that there is a cycle in your 5Mb data, and your stylesheet is looping on it. Michael Kay http://www.saxonica.com/
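One way to make such a cycle fail fast with a clear message, instead of recursing until the stack is exhausted, is to thread the set of already-visited ids through the recursion. This is a hypothetical sketch in XSLT 2.0 terms; the element name `rec`, the attributes `id`/`idref`, and the key name `rec-by-id` are all invented, not taken from Thomas's stylesheet.

```xml
<!-- Sketch: follow idref chains, carrying the ids seen so far.
     A reference back to a visited id is reported instead of recursed into. -->
<xsl:key name="rec-by-id" match="rec" use="@id"/>

<xsl:template match="rec" mode="follow-refs">
  <xsl:param name="visited" as="xs:string*" select="()"/>
  <xsl:choose>
    <xsl:when test="@id = $visited">
      <xsl:message>Cycle detected at id <xsl:value-of select="@id"/></xsl:message>
    </xsl:when>
    <xsl:otherwise>
      <xsl:apply-templates select="key('rec-by-id', @idref)" mode="follow-refs">
        <xsl:with-param name="visited" select="($visited, string(@id))"/>
      </xsl:apply-templates>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>
```

Alternatively, a standalone validation pass over the id/idref pairs before the real transformation would confirm whether the 5 MB file really does contain a reference cycle.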

RE: processing of large files with many id-refs - Added by Anonymous over 19 years ago

Legacy ID: #2935794 Legacy Poster: Thomas (zabel123)

I will check whether the source file is corrupt. Thank you, Michael! Note: your advice on improving the performance of my stylesheet made it 100(!) times faster. Thanks a lot! Regards, Thomas
