Tagsoup throws error, any ideas?

Added by Anonymous about 17 years ago

Legacy ID: #4302511 Legacy Poster: iWantToKeepAnon (iwanttokeepanon)

I posted this at the yahoo group for tagsoup-friends, but got no help. Maybe somebody here can enlighten me.

Every time I access a second document w/ tagsoup it barfs (see below). Well, not EVERY time. If I load sufficiently small pages it can load more than one, but what is the fun in that? ;-)

This test template gives the stack trace I put at the end of this post:

    <xsl:template match="/">
      <debug>
        <xsl:copy-of select="document('http://www.yahoo.com/')" />
        <xsl:copy-of select="document('http://www.msn.com/')" />
      </debug>
    </xsl:template>

It blows up on the 4th line (loading the msn document). Last time this worked was with Saxon 8.4 (IIRC). I don't know if it's a Saxon change or a tagsoup one.

Saxon version and stats:

    Saxon 8.9J from Saxonica
    Java version 1.4.2-03
    Stylesheet compilation time: 367 milliseconds
    Processing file:test.xsl

Tagsoup:

    -r--r--r-- 1 root root 59139 May 3 14:14 tagsoup-1.1.2.jar

Command line:

    java net.sf.saxon.Transform -x org.ccil.cowan.tagsoup.Parser \
        test.xsl test.xsl

The stack trace is this:

    Building tree for http://www.msn.com/ using class net.sf.saxon.tinytree.TinyBuilder
    java.lang.NullPointerException
        at org.ccil.cowan.tagsoup.Parser.push(Parser.java:631)
        at org.ccil.cowan.tagsoup.Parser.rectify(Parser.java:874)
        at org.ccil.cowan.tagsoup.Parser.stagc(Parser.java:829)
        at org.ccil.cowan.tagsoup.HTMLScanner.scan(HTMLScanner.java:586)
        at org.ccil.cowan.tagsoup.Parser.parse(Parser.java:399)
        at net.sf.saxon.event.Sender.sendSAXSource(Sender.java:300)
        at net.sf.saxon.event.Sender.send(Sender.java:142)
        at net.sf.saxon.event.Sender.send(Sender.java:43)
        at net.sf.saxon.functions.Document.makeDoc(Document.java:262)
        at net.sf.saxon.functions.Document$DocumentMappingFunction.map(Document.java:134)
        at net.sf.saxon.expr.ItemMappingIterator.next(ItemMappingIterator.java:46)
        at net.sf.saxon.instruct.CopyOf.processLeavingTail(CopyOf.java:139)
        at net.sf.saxon.instruct.Block.processLeavingTail(Block.java:365)
        at net.sf.saxon.instruct.Instruction.process(Instruction.java:91)
        at net.sf.saxon.instruct.ElementCreator.processLeavingTail(ElementCreator.java:240)
        at net.sf.saxon.instruct.Template.applyLeavingTail(Template.java:98)
        at net.sf.saxon.instruct.ApplyTemplates.applyTemplates(ApplyTemplates.java:317)
        at net.sf.saxon.Controller.transformDocument(Controller.java:1705)
        at net.sf.saxon.Controller.transform(Controller.java:1513)
        at net.sf.saxon.Transform.processFile(Transform.java:860)
        at net.sf.saxon.Transform.doTransform(Transform.java:504)
        at net.sf.saxon.Transform.main(Transform.java:60)
    Fatal error during transformation: java.lang.NullPointerException:(no message)

Thanks ...
-- Rodman


Replies (7)

RE: Tagsoup throws error, any ideas? - Added by Anonymous about 17 years ago

Legacy ID: #4302528 Legacy Poster: Michael Kay (mhkay)

Saxon attempts to reuse the same parser for multiple documents because this gives a significant performance advantage. SAX parsers are supposed to be serially reusable so this looks like a TagSoup bug to me. It is triggered by a Saxon change, however, because in the past Saxon didn't attempt to reuse the parser. I think your easiest workaround is probably to write a URIResolver to handle the input documents rather than relying on the -x switch. You can then create a new SAXSource with a newly allocated parser for each document. Michael Kay
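For anyone reading this later, here is a minimal sketch of the workaround described above: a URIResolver that hands back a SAXSource wrapping a newly allocated parser on every call, so no parser instance is ever reused between documents. The class name FreshParserResolver and the helper newJdkReader are made up for illustration and are not part of Saxon or TagSoup. The sketch uses Java 8 syntax (Supplier, method references), which is newer than the Java 1.4.2 mentioned in the original post. The helper builds the JDK's own XML parser only so that the example is self-contained; for tag-soup HTML you would presumably pass a supplier that creates an org.ccil.cowan.tagsoup.Parser instead.

```java
import java.net.URI;
import java.util.function.Supplier;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Source;
import javax.xml.transform.TransformerException;
import javax.xml.transform.URIResolver;
import javax.xml.transform.sax.SAXSource;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

// Hypothetical helper class (not part of Saxon or TagSoup): resolves every
// document() URI to a SAXSource that carries its own, freshly created parser.
final class FreshParserResolver implements URIResolver {

    private final Supplier<XMLReader> readerFactory;

    FreshParserResolver(Supplier<XMLReader> readerFactory) {
        this.readerFactory = readerFactory;
    }

    @Override
    public Source resolve(String href, String base) throws TransformerException {
        try {
            // Resolve a relative href against the stylesheet's base URI;
            // an absolute href is returned unchanged by URI.resolve.
            String systemId = (base == null || base.isEmpty())
                    ? href
                    : URI.create(base).resolve(href).toString();
            // A new XMLReader per call: nothing is shared between documents.
            return new SAXSource(readerFactory.get(), new InputSource(systemId));
        } catch (IllegalArgumentException e) {
            throw new TransformerException(
                    "Cannot resolve " + href + " against " + base, e);
        }
    }

    // Stand-in factory using the JDK's built-in parser so the sketch runs
    // without extra jars. Assumption: with TagSoup on the classpath you would
    // use () -> new org.ccil.cowan.tagsoup.Parser() instead.
    static XMLReader newJdkReader() {
        try {
            return SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        } catch (Exception e) {
            throw new IllegalStateException("Could not create an XMLReader", e);
        }
    }
}
```

With the standard JAXP API, a resolver like this is registered through Transformer.setURIResolver(...) before calling transform(), which replaces the -x switch for documents loaded via document().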

RE: Tagsoup throws error, any ideas? - Added by Anonymous about 17 years ago

Legacy ID: #4302586 Legacy Poster: iWantToKeepAnon (iwanttokeepanon)

Thanks Dr. Kay! You truly do spend a lot of time helping people. I wonder sometimes that there is any new development on Saxon at all! <g> This is just for personal 'hacking' around. Thanks for the tips. I'll crosspost this over to the tagsoup-friends. Thx again, -- Rodman

RE: Tagsoup throws error, any ideas? - Added by Anonymous about 17 years ago

Legacy ID: #4303667 Legacy Poster: John Cowan (johnwcowan)

I don't think parser reuse can be the problem. My regression tests run several hundred documents through a single TagSoup parser object. I've just downloaded the latest Saxon-B to see if I can figure out what the problem is. Note to iWantToKeepAnon: It's a really really bad idea to use two dynamically changing documents when sending in a test case.

RE: Tagsoup throws error, any ideas? - Added by Anonymous about 17 years ago

Legacy ID: #4305209 Legacy Poster: John Cowan (johnwcowan)

Okay, I've nailed the problem. The NullPointerException arises because the value of the instance variable that holds the entity resolver in TagSoup is null, and it's null because net.sf.saxon.Configuration.reuseSourceParser (Configuration.java:1,445) sets it to null explicitly. There's nothing in the Javadoc at http://www.saxproject.org/apidoc/org/xml/sax/XMLReader.html#setEntityResolver(org.xml.sax.EntityResolver) which justifies calling setEntityResolver with a null argument, though getEntityResolver does return null if no entity resolver has been set.

RE: Tagsoup throws error, any ideas? - Added by Anonymous about 17 years ago

Legacy ID: #4305605 Legacy Poster: Michael Kay (mhkay)

Thanks for the diagnostics. Are you going to fix it?

Although the spec might have a gap, I think that since the parser is supposed to be reusable, and since it is supposed to be able to operate without a user-supplied EntityResolver, and since there's no explicit "@throws NullPointerException" in the Javadoc, it's a reasonable expectation that one should be able to revert the setting to its initial state by passing null.

I don't think it's essential here for Saxon to unset the EntityResolver, it's more of a safety measure: since the application that's using the parser the second time might be quite unrelated to the application that used it the first time, I think the parser it gets should be a clean one. It also has memory benefits since parsing tends to happen early in the life of the application and hanging on to the EntityResolver might lock a lot of stuff into memory for the rest of the application's life.

Michael Kay

RE: Tagsoup throws error, any ideas? - Added by Anonymous about 17 years ago

Legacy ID: #4306197 Legacy Poster: John Cowan (johnwcowan)

TagSoup 1.1.3 now accepts null as the argument to setEntityResolver, set*Handler, and as a lexical handler set by setProperty, with the effect of restoring default behavior.
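To illustrate the null-tolerant behavior described in this reply, here is a rough sketch of the general pattern. It is not TagSoup's actual source: the class NullTolerantResolverHolder and its default resolver are invented for illustration. The idea is that the setter maps null back to an internal default, so later code can call the resolver without a null check, while the getter still reports null when no user resolver is set (matching what the SAX Javadoc says about getEntityResolver).

```java
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical illustration only; not taken from TagSoup.
final class NullTolerantResolverHolder {

    // Default used when the caller has not supplied a resolver. Returning
    // null from resolveEntity asks the parser to open the system identifier
    // itself, which is the standard SAX convention.
    private static final EntityResolver DEFAULT = (publicId, systemId) -> null;

    private EntityResolver resolver = DEFAULT;

    // Passing null restores the default instead of storing a null reference.
    void setEntityResolver(EntityResolver r) {
        resolver = (r == null) ? DEFAULT : r;
    }

    // Reports null when only the internal default is in place.
    EntityResolver getEntityResolver() {
        return (resolver == DEFAULT) ? null : resolver;
    }

    // Internal use: safe to call without a null check, because the field
    // always holds either the default or a user-supplied resolver.
    InputSource resolve(String publicId, String systemId) throws Exception {
        return resolver.resolveEntity(publicId, systemId);
    }
}
```

A caller such as Saxon can then reset the parser between documents with setEntityResolver(null) without triggering a NullPointerException on the next parse.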

RE: Tagsoup throws error, any ideas? - Added by Anonymous almost 17 years ago

Legacy ID: #4311081 Legacy Poster: iWantToKeepAnon (iwanttokeepanon)

Thanks John and Michael, tagsoup works in my stylesheet now.

John writes: "Note to iWantToKeepAnon: It's a really really bad idea to use two dynamically changing documents when sending in a test case."

You are correct of course, but it was the shortest and simplest stylesheet demonstrating the problem. I've found bug reports are more likely to be investigated when the reporter boils the problem down to the smallest amount of code that still demonstrates the issue.

Thanks for the quick turnaround!
