Getting TagSoup to work with Saxon

Added by Anonymous over 19 years ago

Legacy ID: #3191270 Legacy Poster: Marc van Grootel (mvgrootel)

I'm trying to get TagSoup to work as a source document parser for Saxon 8.4. I've tried doing this:

    java -cp tagsoup-1.0rc3.jar -jar saxon8.jar -x org.ccil.cowan.tagsoup.Parser -o foo.xml foo.html identity.xsl

    Transformation failed: net.sf.saxon.trans.DynamicError: Failed to load org.ccil.cowan.tagsoup.Parser

The same happens with 6.5.3.

I also tried using the Ant xslt task:

    <xslt processor="trax" in="foo.html" out="foo.xml" style="identity.xsl">
      <factory name="net.sf.saxon.TransformerFactoryImpl">
        <attribute name="http://saxon.sf.net/feature/sourceParserClass" value="org.ccil.cowan.tagsoup.Parser"/>
      </factory>
    </xslt>

AFAIK this should be more or less equivalent to the first example. However, it results in a SAXParseException on an HTML meta tag (<meta> without an end tag), so it seems the TagSoup parser isn't doing the parsing here.

Eventually I want to be able to do an XSLT transform using HTML as the source document without having to serialize the TagSoup result first.

Any ideas?

--Marc


Replies (6)


RE: Getting TagSoup to work with Saxon - Added by Anonymous over 19 years ago

Legacy ID: #3191297 Legacy Poster: Brett Knights (bknights)

What if you try:

    java -cp tagsoup-1.0rc3.jar:saxon8.jar net.sf.saxon.Transform -x org.ccil.cowan.tagsoup.Parser -o foo.xml foo.html identity.xsl

Keep in mind that any documents loaded by a call to document(...) will also be parsed by TagSoup.

RE: Getting TagSoup to work with Saxon - Added by Anonymous over 19 years ago

Legacy ID: #3191316 Legacy Poster: Marc van Grootel (mvgrootel)

Yes, that works! (Any idea why using -jar doesn't?) I think I'm going to post the Ant fragment to ant-users, because I think that's a problem related to Ant's xslt task.

Thanks a lot,
--Marc

RE: Getting TagSoup to work with Saxon - Added by Anonymous over 19 years ago

Legacy ID: #3191320 Legacy Poster: DD (ddevienne)

Your command line was wrong. -cp is always ignored when you use -jar. It's unlikely to be a problem with Ant's <xslt> (which you didn't even show usage of). I've used Saxon with Ant just fine. I don't think you can pass a custom parser to Saxon through <xslt> anyway.

--DD
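To see the effect of -cp being ignored under -jar, a small check like the one below may help. It is only a sketch: the class name ClasspathCheck and its method are made up for illustration and are not part of Saxon or TagSoup. It prints the effective classpath and reports whether each class named on the command line can be located.

```java
public class ClasspathCheck {

    /** Returns true if the named class can be located by this class's class loader. */
    public static boolean canLoad(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Under "java -jar x.jar" this property holds only x.jar; any -cp value is ignored.
        System.out.println("java.class.path = " + System.getProperty("java.class.path"));
        for (String name : args) {
            System.out.println(name + " -> " + (canLoad(name) ? "found" : "NOT found"));
        }
    }
}
```

Run it with the same -cp value as the failing command and pass org.ccil.cowan.tagsoup.Parser as the argument. If the class is reported as not found, the parser jar is not on the classpath that the JVM is actually using.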

RE: Getting TagSoup to work with Saxon - Added by Anonymous over 19 years ago

Legacy ID: #3192464 Legacy Poster: Marc van Grootel (mvgrootel)

Thanks, I didn't realize that -cp was ignored with -jar. I did provide an example of the xslt task in my original post, though. I use Saxon with Ant's xslt task all the time. But according to the Ant docs, my sample should also set a custom parser for source docs in Saxon (using a <factory> nested element with the http://saxon.sf.net/feature/sourceParserClass attribute). Therefore I think it may be a problem with Ant's xslt task.
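One way to narrow this down outside Ant is to set the same attribute directly through JAXP and see whether the factory accepts it. The sketch below assumes nothing beyond the standard javax.xml.transform API; the class and method names are hypothetical, and whether Saxon 8.4 actually honours the attribute when parsing is not something this check can confirm. It only reports whether the factory recognizes the attribute name.

```java
import javax.xml.transform.TransformerFactory;

public class FactoryAttributeCheck {

    // Attribute name taken from the Ant fragment quoted in this thread.
    public static final String SOURCE_PARSER_CLASS =
            "http://saxon.sf.net/feature/sourceParserClass";

    /** Returns true if the factory accepted the attribute, false if it rejected the name or value. */
    public static boolean trySetAttribute(TransformerFactory factory, String name, Object value) {
        try {
            factory.setAttribute(name, value);
            return true;
        } catch (IllegalArgumentException e) {
            // JAXP factories signal an unrecognized attribute this way.
            return false;
        }
    }

    public static void main(String[] args) {
        // Which factory you get depends on the classpath and the
        // javax.xml.transform.TransformerFactory system property.
        TransformerFactory factory = TransformerFactory.newInstance();
        System.out.println("factory: " + factory.getClass().getName());
        boolean accepted = trySetAttribute(
                factory, SOURCE_PARSER_CLASS, "org.ccil.cowan.tagsoup.Parser");
        System.out.println("sourceParserClass accepted: " + accepted);
    }
}
```

If the printed factory is not net.sf.saxon.TransformerFactoryImpl, the attribute never reaches Saxon at all, which would point at configuration rather than at Saxon itself.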

RE: Getting TagSoup to work with Saxon - Added by Anonymous over 19 years ago

Legacy ID: #3192694 Legacy Poster: DD (ddevienne)

My apologies. You're right that it does not work with <xslt>. Saxon calls on Xerces, the default SAX parser, to process the (broken) input HTML file, and Xerces of course chokes on it (net.sf.saxon.event.Sender.sendSAXSource calls org.apache.xerces.parsers.AbstractSAXParser.parse).

Either Saxon ignores the feature passed in, or Ant does not supply it. It could also be that, since Ant initialized the SAX factory to use Xerces for its own parsing, Saxon does not (or cannot) re-initialize a different custom SAX parser.

BTW, Saxon's code mentions that the sourceParserClass feature is kind of deprecated, and that the SAX way to initialize the parser is preferred. This doesn't jibe with your use case of using a custom SAX parser like TagSoup with Saxon when Saxon is invoked by Ant, which requires a normal SAX parser itself.

--DD
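For reference, the "SAX way" presumably means handing the transformer a SAXSource that already carries the XMLReader to use, so no parser lookup is needed. Here is a minimal sketch of that idea using only standard JAXP and SAX classes. The class ExplicitReaderTransform and its methods are hypothetical names, and loading the reader by class name is an assumption made so the example compiles without TagSoup on the classpath.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

public class ExplicitReaderTransform {

    /** A plain XSLT 1.0 identity stylesheet, standing in for identity.xsl. */
    public static final String IDENTITY =
            "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
          + "<xsl:template match=\"@*|node()\">"
          + "<xsl:copy><xsl:apply-templates select=\"@*|node()\"/></xsl:copy>"
          + "</xsl:template>"
          + "</xsl:stylesheet>";

    /** Creates the named XMLReader, or the platform default parser when readerClass is null. */
    public static XMLReader newReader(String readerClass) throws Exception {
        if (readerClass == null) {
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true);
            return spf.newSAXParser().getXMLReader();
        }
        // Assumes the class has a public no-argument constructor and implements XMLReader.
        return (XMLReader) Class.forName(readerClass).getDeclaredConstructor().newInstance();
    }

    /** Parses input with the chosen reader and feeds the SAX events straight into the transform. */
    public static String transform(String readerClass, String input, String stylesheet)
            throws Exception {
        XMLReader reader = newReader(readerClass);
        SAXSource source = new SAXSource(reader, new InputSource(new StringReader(input)));
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(stylesheet)));
        StringWriter out = new StringWriter();
        transformer.transform(source, new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Pass org.ccil.cowan.tagsoup.Parser as the first argument when TagSoup is on the classpath.
        String readerClass = args.length > 0 ? args[0] : null;
        System.out.println(transform(readerClass, "<p>hello</p>", IDENTITY));
    }
}
```

Because the reader is supplied explicitly, the source HTML is never serialized to an intermediate file, which is what the original question asked for. This does not help with Ant's xslt task directly, since that task builds its own source; it would need a small custom task or a wrapper program.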

RE: Getting TagSoup to work with Saxon - Added by Anonymous over 19 years ago

Legacy ID: #3273668 Legacy Poster: jim collins (jimcollins)

Or you could just use tsaxon (http://mercury.ccil.org/~cowan/XML/tagsoup/tsaxon/), which already has this set up.
