Bug #2271 » Bug #3873 - 2014-12-23T18_29_11Z.eml

Tomaž Erjavec, 2014-12-23 19:29

 
Return-Path: <Tomaz.Erjavec@ijs.si>
Received: from mi025.mc1.hosteurope.de ([80.237.138.230]) by wp245.webpack.hosteurope.de running ExIM with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) id 1Y3UCc-00007z-TV; Tue, 23 Dec 2014 19:29:02 +0100
Received: from mail.ijs.si ([193.2.4.66]) by mx0.webpack.hosteurope.de (mi025.mc1.hosteurope.de) with esmtps (TLSv1.1:DHE-RSA-AES256-SHA:256) id 1Y3UCb-0001BQ-Ed for dropbox+saxonica+f38e@plan.io; Tue, 23 Dec 2014 19:29:02 +0100
Received: from amavis-proxy-ori.ijs.si (localhost [IPv6:::1]) by mail.ijs.si (Postfix) with ESMTP id 3k6Qzx1Drczp0 for <dropbox+saxonica+f38e@plan.io>; Tue, 23 Dec 2014 19:29:01 +0100
Received: from mail.ijs.si ([IPv6:::1]) by amavis-proxy-ori.ijs.si (mail.ijs.si [IPv6:::1]) (amavisd-new, port 10012) with ESMTP id 3GePXrDg4XI2 for <dropbox+saxonica+f38e@plan.io>; Tue, 23 Dec 2014 19:28:57 +0100
Received: from mildred.ijs.si (mailbox.ijs.si [IPv6:2001:1470:ff80::143:1]) by mail.ijs.si (Postfix) with ESMTP for <dropbox+saxonica+f38e@plan.io>; Tue, 23 Dec 2014 19:28:56 +0100
Received: from [192.168.1.136] (188-230-155-248.dynamic.t-2.net [188.230.155.248]) (using TLSv1 with cipher DHE-RSA-AES128-SHA (128/128 bits)) (No client certificate requested) by mildred.ijs.si (Postfix) with ESMTPSA id 3k6Qzr45BXz1Lx for <dropbox+saxonica+f38e@plan.io>; Tue, 23 Dec 2014 19:28:56 +0100
Date: Tue, 23 Dec 2014 19:29:06 +0100
From: Tomaž Erjavec <Tomaz.Erjavec@ijs.si>
To: Saxonica Developer Community <dropbox+saxonica+f38e@plan.io>
Message-ID: <5499B472.5030004@ijs.si>
In-Reply-To: <redmine.journal-3851.20141222173745@plan.io>
References: <redmine.issue-2271.20141221205103@plan.io>
 <redmine.journal-3851.20141222173745@plan.io>
Subject: Re: [Saxon - Bug #2271] AIOOBE with large xml file
Mime-Version: 1.0
Content-Type: multipart/alternative;
 boundary=------------060604000301060009090905;
 charset=UTF-8
Content-Transfer-Encoding: 7bit
Delivery-date: Tue, 23 Dec 2014 19:29:02 +0100
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=ijs.si; h=
 content-type:content-type:in-reply-to:references:subject:subject
 :mime-version:user-agent:from:from:date:date:message-id:received
 :received:received; s=jakla4; t=1419359337; x=1421951338; bh=uiE
 gUOO4/wUjnxYyLzex+CE/MiT+/t3Axcj8nr4I5t8=; b=L5ti66IUZTjWdvlwpsi
 ulu/jFZGS2ODk4LJ3z//WSFdCJ2ZHbebJFyvweTQ5qzBUagQ15LRs6lQLauoI/Qx
 Z9P0Z9Og2HP1HFYB4YiNCYTqwzq+RBRuFTItUTp/uzlJx4HgXMA6QeIPGl1at59r
 kCKFY+cJNVP5/7GbwxFLbdYk=
X-Virus-Scanned: amavisd-new at ijs.si
User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:31.0) Gecko/20100101
 Thunderbird/31.3.0
X-HE-Spam-Level: -----
X-HE-Spam-Score: -5.0
X-HE-Spam-Report: Content analysis details: (-5.0 points) pts rule name
 description ---- ----------------------
 -------------------------------------------------- -5.0 RCVD_IN_DNSWL_HI RBL:
 Sender listed at http://www.dnswl.org/, high trust [193.2.4.66 listed in
 list.dnswl.org] 0.1 HTML_MESSAGE BODY: HTML included in message -0.1
 DKIM_VALID_AU Message has a valid DKIM or DK signature from author's domain
 -0.1 DKIM_VALID Message has at least one valid DKIM or DK signature 0.1
 DKIM_SIGNED Message has a DKIM or DK signature, not necessarily valid
X-HE-SPF: PASSED
Envelope-to: dropbox+saxonica+f38e@plan.io

    
This is a multi-part message in MIME format.
--------------060604000301060009090905
Content-Type: text/plain;
 charset=utf-8;
 format=flowed
Content-Transfer-Encoding: quoted-printable

    
Thanks for running the tests and making the patch. OK, so the file is
simply too big; still, I guess it is nice that a misleading error is no
longer reported.
I tried processing with the linked tree model; after one hour my
machine also ran out of heap space.
So, the moral is to work with smaller files, at least until Java handles
larger structures.
All the best,
Tomaž

    
On 22.12.2014 at 18:37, Saxonica Developer Community wrote:
>
> Issue #2271 has been updated by Michael Kay.
>
>   * *Found in version* changed from /1.8.0_25/ to /9.6/
>
> I've reproduced the problem, and it's nothing to do with Java 8. It's
> simply hitting the limit of 1G characters allowed in text nodes in a
> TinyTree document. I'm committing a patch that makes it fail cleanly
> when this limit is reached.
>
> (The source document is 2.9 GB, and it appears to consist largely of
> text with very little markup.)
>
> It's possible that you would get further with the linked tree model (I
> tried it and ran out of memory). In theory you would be able to create
> the tree successfully, and would only hit problems if you tried to get
> the string value of the root node (which would blow the Java limit for
> a String or CharSequence).
>
> As memory gets larger we're probably going to have to think about how
> to handle larger source documents. It won't be easy unless Java gives
> us some help: we would have to avoid data structures involving large
> strings or arrays, and we would have to change some APIs, which would
> all be rather painful.
>
> ------------------------------------------------------------------------
>
>   Bug #2271: AIOOBE with large xml file
>   <https://saxonica.plan.io/issues/2271#change-3851>
>
>   * Author: Tomaž Erjavec
>   * Status: In Progress
>   * Priority: Normal
>   * Assignee: Tomaž Erjavec
>   * Category: Internals
>   * Sprint/Milestone:
>   * Legacy ID:
>   * Found in version: 9.6
>   * Fixed in version:
>
> Hi,
> Saxon gives me an array index out of bounds exception when I try to
> process a large file, and this happens even with an empty stylesheet.
> I can understand that it wouldn't work, but I would expect an
> out-of-memory exception, not an AIOOBE.
> I'm using Saxon 9.6.0.3 (I tried some older versions, same problem)
> with Java 1.8.0_25:
> Java(TM) SE Runtime Environment (build 1.8.0_25-b17)
> Java HotSpot(TM) 64-Bit Server VM (build 25.25-b02, mixed mode)
> Below is the trace.
> All the best,
> Tomaž
> PS: I can send the file if it would help.
>
> $ du -h blog.bug.xml
> 2,8G blog.bug.xml
> $ java -jar /usr/local/bin/saxon9he.jar -xsl:empty.xsl blog.bug.xml > bug.vert
> java.lang.ArrayIndexOutOfBoundsException: -32768
> at net.sf.saxon.tree.tiny.LargeStringBuffer.append(LargeStringBuffer.java:90)
> at net.sf.saxon.tree.tiny.TinyTree.appendChars(TinyTree.java:405)
> at net.sf.saxon.tree.tiny.TinyBuilder.makeTextNode(TinyBuilder.java:380)
> at net.sf.saxon.tree.tiny.TinyBuilder.characters(TinyBuilder.java:362)
> at net.sf.saxon.event.ReceivingContentHandler.flush(ReceivingContentHandler.java:544)
> at net.sf.saxon.event.ReceivingContentHandler.endElement(ReceivingContentHandler.java:435)
> at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.endElement(AbstractSAXParser.java:609)
> at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanEndElement(XMLDocumentFragmentScannerImpl.java:1782)
> at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2973)
> at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:606)
> at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:117)
> at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:510)
> at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:848)
> at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:777)
> at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
> at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1213)
> at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:649)
> at net.sf.saxon.event.Sender.sendSAXSource(Sender.java:440)
> at net.sf.saxon.event.Sender.send(Sender.java:171)
> at net.sf.saxon.Controller.transform(Controller.java:1690)
> at net.sf.saxon.s9api.XsltTransformer.transform(XsltTransformer.java:547)
> at net.sf.saxon.Transform.processFile(Transform.java:1056)
> at net.sf.saxon.Transform.doTransform(Transform.java:659)
> at net.sf.saxon.Transform.main(Transform.java:80)
> Fatal error during transformation: java.lang.ArrayIndexOutOfBoundsException: -32768
--------------060604000301060009090905--
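
A note on the index -32768 in the quoted trace: this is the value a 32-bit int wrap-around would give if a character buffer were split into 65536-character segments and the running character count overflowed past Integer.MAX_VALUE. That mechanism is only an assumption for illustration. The thread says the limit is 1G characters and does not show the LargeStringBuffer source. The minimal Java sketch below uses hypothetical names (SegmentedBufferSketch, segmentIndex, checkedSegmentIndex), a hypothetical segment size, and reads "1G" as 2^30. It shows the wrap-around and a check that fails cleanly instead, in the spirit of the patch Michael Kay describes.

```java
public class SegmentedBufferSketch {
    // Hypothetical segment size of 2^16 = 65536 chars (an assumption, not Saxon's code).
    static final int SEGMENT_BITS = 16;

    // Hypothetical limit: "1G characters" from the thread, read here as 2^30.
    static final long MAX_CHARS = 1L << 30;

    // Unchecked variant: derives the segment number from an int offset.
    // If the offset has wrapped to a negative value, so does the result.
    static int segmentIndex(int offset) {
        return offset >> SEGMENT_BITS;
    }

    // Checked variant: tracks the offset as a long and fails with a clear
    // message once the limit is exceeded, instead of indexing an array
    // with a negative number.
    static int checkedSegmentIndex(long offset) {
        if (offset < 0 || offset > MAX_CHARS) {
            throw new IllegalStateException(
                "Text content exceeds the limit of " + MAX_CHARS + " characters");
        }
        return (int) (offset >> SEGMENT_BITS);
    }

    public static void main(String[] args) {
        int length = Integer.MAX_VALUE;
        length += 1; // int arithmetic wraps silently to Integer.MIN_VALUE

        // Using this as an array index would raise
        // ArrayIndexOutOfBoundsException: -32768
        System.out.println(segmentIndex(length)); // prints -32768

        try {
            checkedSegmentIndex(MAX_CHARS + 1);
        } catch (IllegalStateException e) {
            System.out.println("Clean failure: " + e.getMessage());
        }
    }
}
```

The same int ceiling is behind the String and CharSequence limit mentioned in the quoted reply: CharSequence.length() returns an int, so a single string value cannot exceed Integer.MAX_VALUE characters.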