Tagsoup throws error, any ideas?
Added by Anonymous over 17 years ago
Legacy ID: #4302511 Legacy Poster: iWantToKeepAnon (iwanttokeepanon)
I posted this at the Yahoo group for tagsoup-friends, but got no help. Maybe somebody here can enlighten me.

Every time I access a second document with TagSoup, it barfs (see below). Well, not EVERY time. If I load sufficiently small pages it can load more than one, but what is the fun in that? ;-)

This test template gives the stack trace I put at the end of this post:

    <xsl:template match="/">
      <debug>
        <xsl:copy-of select="document('http://www.yahoo.com/')" />
        <xsl:copy-of select="document('http://www.msn.com/')" />
      </debug>
    </xsl:template>

It blows up on the 4th line (loading the MSN document). The last time this worked was with Saxon 8.4 (IIRC). I don't know if it's a Saxon change or a TagSoup one.

Saxon version and stats:

    Saxon 8.9J from Saxonica
    Java version 1.4.2-03
    Stylesheet compilation time: 367 milliseconds
    Processing file:test.xsl

TagSoup:

    -r--r--r-- 1 root root 59139 May 3 14:14 tagsoup-1.1.2.jar

Command line:

    java net.sf.saxon.Transform -x org.ccil.cowan.tagsoup.Parser \
        test.xsl test.xsl

The stack trace is this:

    Building tree for http://www.msn.com/ using class net.sf.saxon.tinytree.TinyBuilder
    java.lang.NullPointerException
        at org.ccil.cowan.tagsoup.Parser.push(Parser.java:631)
        at org.ccil.cowan.tagsoup.Parser.rectify(Parser.java:874)
        at org.ccil.cowan.tagsoup.Parser.stagc(Parser.java:829)
        at org.ccil.cowan.tagsoup.HTMLScanner.scan(HTMLScanner.java:586)
        at org.ccil.cowan.tagsoup.Parser.parse(Parser.java:399)
        at net.sf.saxon.event.Sender.sendSAXSource(Sender.java:300)
        at net.sf.saxon.event.Sender.send(Sender.java:142)
        at net.sf.saxon.event.Sender.send(Sender.java:43)
        at net.sf.saxon.functions.Document.makeDoc(Document.java:262)
        at net.sf.saxon.functions.Document$DocumentMappingFunction.map(Document.java:134)
        at net.sf.saxon.expr.ItemMappingIterator.next(ItemMappingIterator.java:46)
        at net.sf.saxon.instruct.CopyOf.processLeavingTail(CopyOf.java:139)
        at net.sf.saxon.instruct.Block.processLeavingTail(Block.java:365)
        at net.sf.saxon.instruct.Instruction.process(Instruction.java:91)
        at net.sf.saxon.instruct.ElementCreator.processLeavingTail(ElementCreator.java:240)
        at net.sf.saxon.instruct.Template.applyLeavingTail(Template.java:98)
        at net.sf.saxon.instruct.ApplyTemplates.applyTemplates(ApplyTemplates.java:317)
        at net.sf.saxon.Controller.transformDocument(Controller.java:1705)
        at net.sf.saxon.Controller.transform(Controller.java:1513)
        at net.sf.saxon.Transform.processFile(Transform.java:860)
        at net.sf.saxon.Transform.doTransform(Transform.java:504)
        at net.sf.saxon.Transform.main(Transform.java:60)
    Fatal error during transformation: java.lang.NullPointerException:(no message)

Thanks ... -- Rodman
Replies (7)
RE: Tagsoup throws error, any ideas? - Added by Anonymous over 17 years ago
Legacy ID: #4302528 Legacy Poster: Michael Kay (mhkay)
Saxon attempts to reuse the same parser for multiple documents because this gives a significant performance advantage. SAX parsers are supposed to be serially reusable so this looks like a TagSoup bug to me. It is triggered by a Saxon change, however, because in the past Saxon didn't attempt to reuse the parser. I think your easiest workaround is probably to write a URIResolver to handle the input documents rather than relying on the -x switch. You can then create a new SAXSource with a newly allocated parser for each document. Michael Kay
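A minimal sketch of the workaround described above, assuming nothing beyond the standard JAXP and SAX interfaces. The class name `FreshParserResolver` and the idea of passing the parser in as a `Supplier` are illustrative choices, not anything taken from Saxon or TagSoup. For the scenario in this thread the supplier would presumably be `org.ccil.cowan.tagsoup.Parser::new`, with the TagSoup jar on the classpath.

```java
import java.net.URI;
import java.util.function.Supplier;
import javax.xml.transform.Source;
import javax.xml.transform.TransformerException;
import javax.xml.transform.URIResolver;
import javax.xml.transform.sax.SAXSource;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

/**
 * Hypothetical URIResolver that hands the XSLT processor a SAXSource
 * backed by a newly allocated parser for every document() call, so no
 * parser instance is ever reused between documents.
 */
public class FreshParserResolver implements URIResolver {

    // Factory for parser instances; called once per resolved document.
    private final Supplier<XMLReader> parserFactory;

    public FreshParserResolver(Supplier<XMLReader> parserFactory) {
        this.parserFactory = parserFactory;
    }

    @Override
    public Source resolve(String href, String base) throws TransformerException {
        try {
            // Resolve a relative href against the base URI, if one is given.
            URI uri = (base == null || base.isEmpty())
                    ? URI.create(href)
                    : URI.create(base).resolve(href);
            // A fresh XMLReader each time; the processor then parses the
            // document through it instead of through its own cached parser.
            return new SAXSource(parserFactory.get(), new InputSource(uri.toString()));
        } catch (IllegalArgumentException e) {
            throw new TransformerException("Cannot resolve " + href, e);
        }
    }
}
```

With a JAXP `Transformer`, such a resolver could be registered through `setURIResolver(...)` before calling `transform(...)`. Allocating a parser per document gives up the performance benefit of reuse, which seems an acceptable trade for personal experimentation.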
RE: Tagsoup throws error, any ideas? - Added by Anonymous over 17 years ago
Legacy ID: #4302586 Legacy Poster: iWantToKeepAnon (iwanttokeepanon)
Thanks Dr. Kay! You truly do spend a lot of time helping people. I wonder sometimes that there is any new development on Saxon at all! <g> This is just for personal 'hacking' around. Thanks for the tips. I'll crosspost this over to the tagsoup-friends. Thx again, -- Rodman
RE: Tagsoup throws error, any ideas? - Added by Anonymous over 17 years ago
Legacy ID: #4303667 Legacy Poster: John Cowan (johnwcowan)
I don't think parser reuse can be the problem. My regression tests run several hundred documents through a single TagSoup parser object. I've just downloaded the latest Saxon-B to see if I can figure out what the problem is. Note to iWantToKeepAnon: It's a really really bad idea to use two dynamically changing documents when sending in a test case.
RE: Tagsoup throws error, any ideas? - Added by Anonymous over 17 years ago
Legacy ID: #4305209 Legacy Poster: John Cowan (johnwcowan)
Okay, I've nailed the problem. The NullPointerException arises because the value of the instance variable that holds the entity resolver in TagSoup is null, and it's null because net.sf.saxon.Configuration.reuseSourceParser (Configuration.java:1445) sets it to null explicitly. There's nothing in the Javadoc at http://www.saxproject.org/apidoc/org/xml/sax/XMLReader.html#setEntityResolver(org.xml.sax.EntityResolver) which justifies calling setEntityResolver with a null argument, though getEntityResolver does return null if no entity resolver has been set.
RE: Tagsoup throws error, any ideas? - Added by Anonymous over 17 years ago
Legacy ID: #4305605 Legacy Poster: Michael Kay (mhkay)
Thanks for the diagnostics. Are you going to fix it? Although the spec might have a gap, I think that since the parser is supposed to be reusable, and since it is supposed to be able to operate without a user-supplied EntityResolver, and since there's no explicit "@throws NullPointerException" in the Javadoc, it's a reasonable expectation that one should be able to revert the setting to its initial state by passing null. I don't think it's essential here for Saxon to unset the EntityResolver, it's more of a safety measure: since the application that's using the parser the second time might be quite unrelated to the application that used it the first time, I think the parser it gets should be a clean one. It also has memory benefits since parsing tends to happen early in the life of the application and hanging on to the EntityResolver might lock a lot of stuff into memory for the rest of the application's life. Michael Kay
RE: Tagsoup throws error, any ideas? - Added by Anonymous over 17 years ago
Legacy ID: #4306197 Legacy Poster: John Cowan (johnwcowan)
TagSoup 1.1.3 now accepts null as the argument to setEntityResolver, set*Handler, and as a lexical handler set by setProperty, with the effect of restoring default behavior.
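The behavior described above (null restores the default) can be sketched roughly as follows. This is not TagSoup's actual source: the class `NullSafeReaderSketch` and its `effectiveResolver` method are hypothetical, and only the entity resolver is shown, using the standard `org.xml.sax` types.

```java
import org.xml.sax.EntityResolver;
import org.xml.sax.helpers.DefaultHandler;

/**
 * Hypothetical illustration of a null-tolerant setter: passing null
 * reverts to a built-in default instead of leaving a null field that
 * would later cause a NullPointerException during parsing.
 */
public class NullSafeReaderSketch {

    // DefaultHandler implements EntityResolver with do-nothing behavior.
    private static final EntityResolver DEFAULT_RESOLVER = new DefaultHandler();

    // Never null internally, so parsing code can call it unconditionally.
    private EntityResolver entityResolver = DEFAULT_RESOLVER;

    public void setEntityResolver(EntityResolver resolver) {
        // null means "restore default behavior".
        this.entityResolver = (resolver == null) ? DEFAULT_RESOLVER : resolver;
    }

    public EntityResolver getEntityResolver() {
        // Report null when no user-supplied resolver is registered,
        // matching what the SAX Javadoc says about getEntityResolver.
        return (entityResolver == DEFAULT_RESOLVER) ? null : entityResolver;
    }

    // The resolver the parser would actually consult; always non-null.
    public EntityResolver effectiveResolver() {
        return entityResolver;
    }
}
```

Keeping a non-null default internally means a caller such as Saxon can reset the parser with `setEntityResolver(null)` between documents without the parser having to null-check at every use.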
RE: Tagsoup throws error, any ideas? - Added by Anonymous over 17 years ago
Legacy ID: #4311081 Legacy Poster: iWantToKeepAnon (iwanttokeepanon)
Thanks John and Michael, TagSoup works in my stylesheet now.

John writes:

    Note to iWantToKeepAnon: It's a really really bad idea to use two dynamically changing documents when sending in a test case.

You are correct of course, but it was the shortest and simplest stylesheet demonstrating the problem. I've found bug reports are more likely to be investigated when the reporter boils the problem down to the smallest amount of code that still demonstrates the issue.

Thanks for the quick turnaround!