Bug #1813
closed
Absent XHTML DTD entities
100%
Description
From Ricki Brown ricki.w.brown@gmail.com in direct email:
I've been trying out Saxon as a replacement for the standard Java transformer and I was wanting to transform XHTML documents (possibly my first mistake) to obtain (say) a list of image src attributes. I'm not sure if this is a good idea exactly; I did consider using alternatives like JSoup but the rest of my code uses XSLT in some form.
So my documents look like
<title>Hello World</title>Some text
with
<xsl:stylesheet
    version="1.0"
    xmlns:xhtml="http://www.w3.org/1999/xhtml"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    exclude-result-prefixes="xhtml">
  <xsl:output method="xml"
      indent="yes"
      encoding="UTF-8"
      standalone="yes"/>
  <xsl:template match="/">
    <images>
      <xsl:apply-templates/>
    </images>
  </xsl:template>
  <xsl:template match="text()"/>
  <xsl:template match="//xhtml:img">
    <image><xsl:value-of select="@src"/></image>
  </xsl:template>
</xsl:stylesheet>
and were taking a long time to transform. After reading around on the subject, I understand that Saxon uses its own entity resolver to fetch common entities from within the Saxon jar file, but when I paused the process there were HTTP connections active.
When I disabled my internet connection and ran something like
TransformerFactory factory = TransformerFactory.newInstance();
Transformer transformer = factory.newTransformer(myStylesheet);
Source src = new StreamSource(myFile);
Result res = new StreamResult(System.out);
transformer.transform(src, res);
I got
Exception in thread "main" net.sf.saxon.trans.XPathException: I/O error reported by XML parser processing file:/: www.w3.org
    at net.sf.saxon.event.Sender.sendSAXSource(Sender.java:427)
    at net.sf.saxon.event.Sender.send(Sender.java:169)
    at net.sf.saxon.Controller.transform(Controller.java:1890)
Caused by: java.net.UnknownHostException: www.w3.org
I downloaded the source, set a breakpoint on the last line of StandardEntityResolver's resolveEntity method, and found that the following entities aren't mapped:
-//W3C//ELEMENTS XHTML Inline Style 1.0//EN
http://www.w3.org/TR/xhtml-modularization/DTD/xhtml-inlstyle-1.mod
-//W3C//ELEMENTS XHTML Editing Elements 1.0//EN
http://www.w3.org/TR/xhtml-modularization/DTD/xhtml-edit-1.mod
-//W3C//ELEMENTS XHTML BIDI Override Element 1.0//EN
http://www.w3.org/TR/xhtml-modularization/DTD/xhtml-bdo-1.mod
-//W3C//ELEMENTS XHTML Style Sheets 1.0//EN
http://www.w3.org/TR/xhtml-modularization/DTD/xhtml-style-1.mod
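A finding like this can also be reproduced without a debugger. The sketch below is a plain JAXP wrapper and not part of Saxon: it records every entity for which a delegate resolver returns null, since those are the ones the parser would then fetch by system ID. The class name UnresolvedEntityLogger and its unresolved() accessor are hypothetical, and how the delegate resolver is obtained is left open.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical helper: wraps another resolver and notes what it could not map. */
final class UnresolvedEntityLogger implements EntityResolver {

    private final EntityResolver delegate;
    private final List<String> unresolved = new ArrayList<>();

    UnresolvedEntityLogger(EntityResolver delegate) {
        this.delegate = delegate;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId)
            throws SAXException, IOException {
        InputSource local = delegate.resolveEntity(publicId, systemId);
        if (local == null) {
            // null tells the parser to open the system ID itself,
            // which for an http:// URL means a network request.
            unresolved.add(publicId + " -> " + systemId);
        }
        return local;
    }

    /** Entities the delegate did not map, in the order they were requested. */
    List<String> unresolved() {
        return unresolved;
    }
}
```

After a parse, unresolved() would hold one "public ID -> system ID" entry per entity that fell through to the network.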
I can register these entities myself by calling StandardEntityResolver.register with the appropriate arguments and then everything works without an internet connection.
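As an alternative to the Saxon-specific registration described above, the four modules could also be served through a standard org.xml.sax.EntityResolver. This is a minimal sketch using only JDK classes. The class name LocalXhtmlModuleResolver and the idea of keeping copies of the .mod files in a local directory are assumptions for illustration, not something the report describes.

```java
import java.io.File;
import java.util.LinkedHashMap;
import java.util.Map;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

/** Hypothetical helper: maps the four unmapped XHTML modules to local files. */
final class LocalXhtmlModuleResolver implements EntityResolver {

    // Public ID -> file name, taken from the list in the report.
    private static final Map<String, String> MODULES = new LinkedHashMap<>();
    static {
        MODULES.put("-//W3C//ELEMENTS XHTML Inline Style 1.0//EN", "xhtml-inlstyle-1.mod");
        MODULES.put("-//W3C//ELEMENTS XHTML Editing Elements 1.0//EN", "xhtml-edit-1.mod");
        MODULES.put("-//W3C//ELEMENTS XHTML BIDI Override Element 1.0//EN", "xhtml-bdo-1.mod");
        MODULES.put("-//W3C//ELEMENTS XHTML Style Sheets 1.0//EN", "xhtml-style-1.mod");
    }

    // Assumed location of locally stored copies of the module files.
    private final File moduleDir;

    LocalXhtmlModuleResolver(File moduleDir) {
        this.moduleDir = moduleDir;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        String fileName = MODULES.get(publicId);
        if (fileName == null) {
            return null; // not one of the four: leave it to the parser's default resolution
        }
        // Point the parser at the local copy instead of www.w3.org.
        InputSource source = new InputSource(new File(moduleDir, fileName).toURI().toString());
        source.setPublicId(publicId);
        return source;
    }
}
```

An instance could be set on an XMLReader with setEntityResolver and the reader handed to the transformer in a SAXSource. Returning null leaves every other entity to the parser's default resolution, so this covers only the four modules listed, and how it interacts with Saxon's built-in resolver depends on how the parser is supplied, which the sketch does not address.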