Bug #1813

Absent XHTML DTD entities

Added by Michael Kay over 11 years ago. Updated about 11 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: Performance
Sprint/Milestone: -
Start date: 2013-06-18
Due date:
% Done: 100%
Estimated time:
Legacy ID:
Applies to branch:
Fix Committed on Branch:
Fixed in Maintenance Release:
Platforms:

Description

From Ricki Brown in direct email:

I've been trying out Saxon as a replacement for the standard Java transformer and I was wanting to transform XHTML documents (possibly my first mistake) to obtain (say) a list of image src attributes. I'm not sure if this is a good idea exactly; I did consider using alternatives like JSoup but the rest of my code uses XSLT in some form.

So my documents look like

<title>Hello World</title>

Some text

with

<xsl:stylesheet
  version="1.0"
  xmlns:xhtml="http://www.w3.org/1999/xhtml"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  exclude-result-prefixes="xhtml">

<xsl:output method="xml"
    indent="yes"
    encoding="UTF-8"
    standalone="yes"/>

<xsl:template match="/">
    <images>
    <xsl:apply-templates/>
    </images>
</xsl:template>

<xsl:template match="text()"/>

<xsl:template match="//xhtml:img">
    <image><xsl:value-of select="@src"/></image>
</xsl:template>

</xsl:stylesheet>

and were taking a long time to transform. After reading around on the subject, I understand that Saxon uses its own entity resolver to fetch common entities from within the Saxon jar file, but when I paused the process there were HTTP connections active.

When I disabled my internet connection and ran something like

TransformerFactory factory = TransformerFactory.newInstance();
Transformer transformer = factory.newTransformer(myStylesheet);
Source src = new StreamSource(myFile);
Result res = new StreamResult(System.out);
transformer.transform(src, res);

I got

Exception in thread "main" net.sf.saxon.trans.XPathException: I/O error reported by XML parser processing file:/: www.w3.org
    at net.sf.saxon.event.Sender.sendSAXSource(Sender.java:427)
    at net.sf.saxon.event.Sender.send(Sender.java:169)
    at net.sf.saxon.Controller.transform(Controller.java:1890)
Caused by: java.net.UnknownHostException: www.w3.org

I downloaded the source, set a breakpoint on the last line of StandardEntityResolver's resolveEntity method, and found that the following entities aren't mapped:

-//W3C//ELEMENTS XHTML Inline Style 1.0//EN
http://www.w3.org/TR/xhtml-modularization/DTD/xhtml-inlstyle-1.mod

-//W3C//ELEMENTS XHTML Editing Elements 1.0//EN
http://www.w3.org/TR/xhtml-modularization/DTD/xhtml-edit-1.mod

-//W3C//ELEMENTS XHTML BIDI Override Element 1.0//EN
http://www.w3.org/TR/xhtml-modularization/DTD/xhtml-bdo-1.mod

-//W3C//ELEMENTS XHTML Style Sheets 1.0//EN
http://www.w3.org/TR/xhtml-modularization/DTD/xhtml-style-1.mod
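
A list like the one above could also be collected without a debugger. The following is only a sketch using the plain SAX API, not Saxon code: it wraps an existing resolver, assumes that resolver returns null for entities it does not know (which the SAX EntityResolver contract allows), records those requests, and answers them with an empty entity so that no HTTP connection is opened. The class name UnmappedEntityRecorder and the method findUnmapped are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

/** Hypothetical helper: records the entities that a delegate resolver cannot map. */
class UnmappedEntityRecorder implements EntityResolver {
    private final EntityResolver delegate;
    final List<String> unmapped = new ArrayList<>();

    UnmappedEntityRecorder(EntityResolver delegate) {
        this.delegate = delegate;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId)
            throws SAXException, IOException {
        InputSource known = delegate.resolveEntity(publicId, systemId);
        if (known != null) {
            return known; // the delegate has a local copy
        }
        // Assumption: null from the delegate means "not mapped".
        unmapped.add(publicId + " -> " + systemId);
        // Answer with an empty entity so the parser never goes to the network.
        InputSource empty = new InputSource(new StringReader(""));
        empty.setPublicId(publicId);
        empty.setSystemId(systemId);
        return empty;
    }

    /** Parses the XML text and returns every entity the delegate failed to map. */
    static List<String> findUnmapped(String xml, EntityResolver delegate) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        XMLReader reader = spf.newSAXParser().getXMLReader();
        UnmappedEntityRecorder recorder = new UnmappedEntityRecorder(delegate);
        reader.setEntityResolver(recorder);
        reader.parse(new InputSource(new StringReader(xml)));
        return recorder.unmapped;
    }
}
```

Because unmapped modules are replaced by empty text, any declarations they would have contributed are missing from the parse, so this is suitable for diagnosis rather than for production transforms.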

I can register these entities myself by calling StandardEntityResolver.register with the appropriate arguments and then everything works without an internet connection.
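
The issue does not show the arguments passed to StandardEntityResolver.register, so the following is a plain-SAX analogue of that workaround rather than Saxon's own API. It is a minimal sketch under stated assumptions: the class LocalEntityResolver, its register method, and the choice to treat unknown entities as empty are all hypothetical. It shows how registered entities can be served locally and how the resolver can be wired into a JAXP transformation through a SAXSource.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

/** Hypothetical resolver: serves registered entities from memory, never from the network. */
class LocalEntityResolver implements EntityResolver {
    private final Map<String, String> byPublicId = new HashMap<>();
    private final Map<String, String> bySystemId = new HashMap<>();

    /** Registers the text of an entity under both its public and system identifier. */
    void register(String publicId, String systemId, String content) {
        byPublicId.put(publicId, content);
        bySystemId.put(systemId, content);
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        String content = publicId != null ? byPublicId.get(publicId) : null;
        if (content == null) {
            content = bySystemId.get(systemId);
        }
        // Assumption: an unregistered entity is treated as empty instead of being fetched.
        if (content == null) {
            content = "";
        }
        InputSource source = new InputSource(new StringReader(content));
        source.setPublicId(publicId);
        source.setSystemId(systemId);
        return source;
    }

    /** Runs a stylesheet over a document, resolving external entities through this resolver. */
    String transform(String stylesheet, String document) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        XMLReader reader = spf.newSAXParser().getXMLReader();
        reader.setEntityResolver(this);
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(stylesheet)));
        StringWriter out = new StringWriter();
        transformer.transform(
                new SAXSource(reader, new InputSource(new StringReader(document))),
                new StreamResult(out));
        return out.toString();
    }
}
```

In a real setup the registered content would more likely be read from files or classpath resources than held as strings; strings are used here only to keep the sketch self-contained.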
