Bug #2271 » Bug #3873 - 2014-12-23T18_29_11Z.eml

Tomaž Erjavec, 2014-12-23 19:29

 
Return-Path: <Tomaz.Erjavec@ijs.si>
Received: from mi025.mc1.hosteurope.de ([80.237.138.230]) by wp245.webpack.hosteurope.de running ExIM with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) id 1Y3UCc-00007z-TV; Tue, 23 Dec 2014 19:29:02 +0100
Received: from mail.ijs.si ([193.2.4.66]) by mx0.webpack.hosteurope.de (mi025.mc1.hosteurope.de) with esmtps (TLSv1.1:DHE-RSA-AES256-SHA:256) id 1Y3UCb-0001BQ-Ed for dropbox+saxonica+f38e@plan.io; Tue, 23 Dec 2014 19:29:02 +0100
Received: from amavis-proxy-ori.ijs.si (localhost [IPv6:::1]) by mail.ijs.si (Postfix) with ESMTP id 3k6Qzx1Drczp0 for <dropbox+saxonica+f38e@plan.io>; Tue, 23 Dec 2014 19:29:01 +0100
Received: from mail.ijs.si ([IPv6:::1]) by amavis-proxy-ori.ijs.si (mail.ijs.si [IPv6:::1]) (amavisd-new, port 10012) with ESMTP id 3GePXrDg4XI2 for <dropbox+saxonica+f38e@plan.io>; Tue, 23 Dec 2014 19:28:57 +0100
Received: from mildred.ijs.si (mailbox.ijs.si [IPv6:2001:1470:ff80::143:1]) by mail.ijs.si (Postfix) with ESMTP for <dropbox+saxonica+f38e@plan.io>; Tue, 23 Dec 2014 19:28:56 +0100
Received: from [192.168.1.136] (188-230-155-248.dynamic.t-2.net [188.230.155.248]) (using TLSv1 with cipher DHE-RSA-AES128-SHA (128/128 bits)) (No client certificate requested) by mildred.ijs.si (Postfix) with ESMTPSA id 3k6Qzr45BXz1Lx for <dropbox+saxonica+f38e@plan.io>; Tue, 23 Dec 2014 19:28:56 +0100
Date: Tue, 23 Dec 2014 19:29:06 +0100
From: =?UTF-8?B?VG9tYcW+IEVyamF2ZWM=?= <Tomaz.Erjavec@ijs.si>
To: Saxonica Developer Community <dropbox+saxonica+f38e@plan.io>
Message-ID: <5499B472.5030004@ijs.si>
In-Reply-To: <redmine.journal-3851.20141222173745@plan.io>
References: <redmine.issue-2271.20141221205103@plan.io>
<redmine.journal-3851.20141222173745@plan.io>
Subject: Re: [Saxon - Bug #2271] AIOOBE with large xml file
Mime-Version: 1.0
Content-Type: multipart/alternative;
boundary=------------060604000301060009090905;
charset=UTF-8
Content-Transfer-Encoding: 7bit
Delivery-date: Tue, 23 Dec 2014 19:29:02 +0100
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=ijs.si; h=
content-type:content-type:in-reply-to:references:subject:subject
:mime-version:user-agent:from:from:date:date:message-id:received
:received:received; s=jakla4; t=1419359337; x=1421951338; bh=uiE
gUOO4/wUjnxYyLzex+CE/MiT+/t3Axcj8nr4I5t8=; b=L5ti66IUZTjWdvlwpsi
ulu/jFZGS2ODk4LJ3z//WSFdCJ2ZHbebJFyvweTQ5qzBUagQ15LRs6lQLauoI/Qx
Z9P0Z9Og2HP1HFYB4YiNCYTqwzq+RBRuFTItUTp/uzlJx4HgXMA6QeIPGl1at59r
kCKFY+cJNVP5/7GbwxFLbdYk=
X-Virus-Scanned: amavisd-new at ijs.si
User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:31.0) Gecko/20100101
Thunderbird/31.3.0
X-HE-Spam-Level: -----
X-HE-Spam-Score: -5.0
X-HE-Spam-Report: Content analysis details: (-5.0 points) pts rule name
description ---- ----------------------
-------------------------------------------------- -5.0 RCVD_IN_DNSWL_HI RBL:
Sender listed at http://www.dnswl.org/, high trust [193.2.4.66 listed in
list.dnswl.org] 0.1 HTML_MESSAGE BODY: HTML included in message -0.1
DKIM_VALID_AU Message has a valid DKIM or DK signature from author's domain
-0.1 DKIM_VALID Message has at least one valid DKIM or DK signature 0.1
DKIM_SIGNED Message has a DKIM or DK signature, not necessarily valid
X-HE-SPF: PASSED
Envelope-to: dropbox+saxonica+f38e@plan.io

This is a multi-part message in MIME format.
--------------060604000301060009090905
Content-Type: text/plain;
charset=utf-8;
format=flowed
Content-Transfer-Encoding: quoted-printable

Thanks for running the tests and making the patch. OK, so the file is simply
too big; still, it is nice, I guess, that a misleading error is no longer
reported.
I tried processing with the linked tree model; after one hour my machine
also ran out of heap space.
So the moral is to work with smaller files, at least until Java handles
larger structures.
All the best,
Tomaž

On 22.12.2014 at 18:37, Saxonica Developer Community wrote:
>
> --- In your reply, please do not write below this line ---
>
> Issue #2271 has been updated by Michael Kay.
>
> * *Found in version* changed from /1.8.0_25/ to /9.6/
>
> I've reproduced the problem, and it's nothing to do with Java 8. It's
> simply hitting the limit of 1G characters allowed in text nodes in a
> TinyTree document. I'm committing a patch that makes it fail cleanly
> when this limit is reached.
>
> (The source document is 2.9 GB, and it appears to consist largely of
> text with very little markup).
>
> It's possible that you would get further with the linked tree model (I
> tried it and ran out of memory). In theory you would be able to create
> the tree successfully, and would only hit problems if you try and get
> the string value of the root node (which would blow the Java limit for
> a String or CharSequence).
>
> As memory gets larger we're probably going to have to think about how
> to handle larger source documents. It won't be easy unless Java gives
> us some help: we would have to avoid data structures involving large
> strings or arrays, and we would have to change some APIs, which would
> all be rather painful.
>
> ------------------------------------------------------------------------
>
>
> Bug #2271: AIOOBE with large xml file
> <https://saxonica.plan.io/issues/2271#change-3851>
>
> * Author: Tomaž Erjavec
> * Status: In Progress
> * Priority: Normal
> * Assignee: Tomaž Erjavec
> * Category: Internals
> * Sprint/Milestone:
> * Legacy ID:
> * Found in version: 9.6
> * Fixed in version:
>
> Hi,
> Saxon gives me an array index out of bounds when I try to process a
> large file and this happens even with an empty stylesheet. I can
> understand that it wouldn't work, but I would expect an exception saying
> out of memory, not an AIOOBE.
> I'm using Saxon 9.6.0.3 (I tried with some older versions, same
> problem) with java 1.8.0_25:
> Java(TM) SE Runtime Environment (build 1.8.0_25-b17)
> Java HotSpot(TM) 64-Bit Server VM (build 25.25-b02, mixed mode)
> Below is the trace.
> All the best,
> Tomaž
> PS: I can send the file if it would help.
>
> $ du -h blog.bug.xml
> 2,8G blog.bug.xml
> $ java -jar /usr/local/bin/saxon9he.jar -xsl:empty.xsl blog.bug.xml > bug.vert
> java.lang.ArrayIndexOutOfBoundsException: -32768
> at net.sf.saxon.tree.tiny.LargeStringBuffer.append(LargeStringBuffer.java:90)
> at net.sf.saxon.tree.tiny.TinyTree.appendChars(TinyTree.java:405)
> at net.sf.saxon.tree.tiny.TinyBuilder.makeTextNode(TinyBuilder.java:380)
> at net.sf.saxon.tree.tiny.TinyBuilder.characters(TinyBuilder.java:362)
> at net.sf.saxon.event.ReceivingContentHandler.flush(ReceivingContentHandler.java:544)
> at net.sf.saxon.event.ReceivingContentHandler.endElement(ReceivingContentHandler.java:435)
> at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.endElement(AbstractSAXParser.java:609)
> at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanEndElement(XMLDocumentFragmentScannerImpl.java:1782)
> at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2973)
> at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:606)
> at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:117)
> at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:510)
> at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:848)
> at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:777)
> at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
> at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1213)
> at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:649)
> at net.sf.saxon.event.Sender.sendSAXSource(Sender.java:440)
> at net.sf.saxon.event.Sender.send(Sender.java:171)
> at net.sf.saxon.Controller.transform(Controller.java:1690)
> at net.sf.saxon.s9api.XsltTransformer.transform(XsltTransformer.java:547)
> at net.sf.saxon.Transform.processFile(Transform.java:1056)
> at net.sf.saxon.Transform.doTransform(Transform.java:659)
> at net.sf.saxon.Transform.main(Transform.java:80)
> Fatal error during transformation: java.lang.ArrayIndexOutOfBoundsException: -32768
>
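
Editorial note: the thread describes a misleading ArrayIndexOutOfBoundsException being replaced by a clean failure once a character limit is reached. The following is a minimal sketch of that general idea, not Saxon's actual code. The class name, the 65536-character segment size, and the exception type are all assumptions made for illustration; only the "1G characters" limit comes from the message above. It shows one way a 32-bit length counter can wrap to a negative value and yield a negative segment index, and how doing the sum in 64-bit arithmetic allows an explicit error instead.

```java
// Illustrative sketch only; names and sizes are hypothetical, not taken from Saxon.
class SegmentedLengthSketch {
    // Assumed segment size of 65536 chars (2^16); a real implementation may differ.
    static final int SEGMENT_BITS = 16;
    // The "1G characters" limit mentioned in the thread.
    static final long MAX_CHARS = 1L << 30;

    // Unchecked: Java int addition wraps silently past Integer.MAX_VALUE.
    static int uncheckedAdd(int length, int extra) {
        return length + extra;
    }

    // Checked: compute the sum as a long and fail with a clear message.
    static int checkedAdd(int length, int extra) {
        long total = (long) length + extra;
        if (total > MAX_CHARS) {
            throw new IllegalStateException(
                "Text content exceeds the limit of " + MAX_CHARS + " characters");
        }
        return (int) total;
    }

    // Index of the segment that holds the given character offset.
    static int segmentIndex(int offset) {
        return offset >> SEGMENT_BITS;
    }

    public static void main(String[] args) {
        int wrapped = uncheckedAdd(Integer.MAX_VALUE, 1); // wraps to Integer.MIN_VALUE
        System.out.println(segmentIndex(wrapped));        // a negative array index
        try {
            checkedAdd(1 << 30, 1);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Under these assumed sizes the wrapped offset maps to a negative segment index, which is the kind of value an array access would reject with an AIOOBE. Whether this is the mechanism behind the reported trace has not been checked against the Saxon source.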


--------------060604000301060009090905--