Issues with FOP+Saxon generating the FOP intermediate format (area tree)

Added by Nico Kutscherauer about 1 month ago

Hi,

As the title says, I have a problem generating the FOP intermediate XML format when using FOP together with Saxon. I have reproduced the issue and described it in this GitHub repository.

The user perspective is:

I have the following in my XSL-FO input:

<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format" xmlns:fox="http://xmlgraphics.apache.org/fop/extensions">
    <!-- ... -->
    <fo:declarations>
        <pdf:catalog xmlns:pdf="http://xmlgraphics.apache.org/fop/extensions/pdf">
            <pdf:dictionary type="normal" key="ViewerPreferences">
                <pdf:boolean key="DisplayDocTitle">true</pdf:boolean>
            </pdf:dictionary>
        </pdf:catalog>
    </fo:declarations>
    <!-- ... -->
</fo:root>

If Saxon is on the classpath, the resulting FOP area tree contains this:

<document xmlns="http://xmlgraphics.apache.org/fop/intermediate"
          version="2.0">
   <header>
      <pdf:catalog>
         <!-- ... -->
      </pdf:catalog>

The namespace declaration for the prefix pdf is missing, so the result is not namespace-well-formed!

If Xalan is on the classpath, the result is well-formed:

<document xmlns="http://xmlgraphics.apache.org/fop/intermediate" version="2.0">
<header>
<pdf:catalog xmlns:pdf="apache:fop:extensions:pdf">
<!-- ... -->
</pdf:catalog>
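
The difference between the two outputs above can be checked mechanically. The following is a minimal sketch, assuming the JDK's built-in SAX parser; the class name NamespaceCheck and the method parsesWithNamespaces are hypothetical and not part of FOP or Saxon. A namespace-aware parse is expected to fail when a prefix such as pdf is used without a declaration.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper: not part of FOP or Saxon. */
class NamespaceCheck {
    /** Returns true if a namespace-aware parse of the given XML succeeds. */
    static boolean parsesWithNamespaces(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            // DefaultHandler rethrows fatal errors, so a bad document ends in SAXException.
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            return false; // for example, an unbound prefix such as pdf
        } catch (Exception e) {
            throw new IllegalStateException("parser setup or I/O failed", e);
        }
    }
}
```

Passing a shortened version of the Saxon output (pdf:catalog without xmlns:pdf) should give false, and the same document with the declaration added should give true.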

I did some debugging and provided the details here. I'm not sure whether that is useful.

I don't expect that this is a Saxon bug, but I'm not deep enough into SAX/JAXP to argue that or to identify the problem in FOP. Do you have a hint or a guess as to how this could happen?

Thanks & Best Regards,
Nico


Replies (4)

RE: Issues with FOP+Saxon generating the FOP intermediate format (area tree) - Added by Michael Kay about 1 month ago

Without looking too deeply into the gory detail, I suspect the problem is caused by known weaknesses in the design of the JAXP ContentHandler interface. Specifically, the details of the events that are passed across, especially for namespaces, depend on the property settings of the XML "Parser" (that is, the component that issues the events). But in an application like this, Saxon has no opportunity to configure the parser, nor even to discover what these property settings are. We decided therefore to mandate that if an application wants to supply the input using this interface, it is required to conform to our expectations on these settings, and that we wouldn't incur the (significant) cost of verifying what is passed over.

See in particular the Javadoc comments on ReceivingContentHandler:

 * <p>The {@code ReceivingContentHandler} is written on the assumption that it is receiving events
 * from a parser configured with {@code http://xml.org/sax/features/namespaces} set to true
 * and {@code http://xml.org/sax/features/namespace-prefixes} set to false.</p>
 * <p>When running as a {@code TransformerHandler}, we have no control over the feature settings
 * of the sender of the events, and if the events do not follow this pattern then the class may
 * fail in unpredictable ways.</p>

There's also a comment on the startElement method:

 * <p>This event allows up to three name components for each
 * element:</p>
 *
 * <ol>
 * <li>the Namespace URI;</li>
 * <li>the local name; and</li>
 * <li>the qualified (prefixed) name.</li>
 * </ol>
 *
 * <p>Saxon expects all three of these to be provided.

We also rely on startPrefixMapping() and endPrefixMapping() calls happening.
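
The expectations quoted above can be observed with the JDK's built-in SAX parser. The following sketch is an illustration only and is not Saxon or FOP code; the class name ExpectedSaxEvents and the method record are hypothetical. It sets the two features named in the Javadoc and logs the namespace-related events that a ContentHandler then receives.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical demo class: not part of Saxon or FOP. */
class ExpectedSaxEvents {
    /** Parses the XML and returns a readable log of the namespace-related events. */
    static List<String> record(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        // The feature settings that the ReceivingContentHandler Javadoc assumes.
        reader.setFeature("http://xml.org/sax/features/namespaces", true);
        reader.setFeature("http://xml.org/sax/features/namespace-prefixes", false);

        List<String> log = new ArrayList<>();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startPrefixMapping(String prefix, String uri) {
                log.add("startPrefixMapping(" + prefix + ", " + uri + ")");
            }

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                // All three name components: namespace URI, local name, qualified name.
                log.add("startElement(" + uri + ", " + localName + ", " + qName + ")");
            }

            @Override
            public void endPrefixMapping(String prefix) {
                log.add("endPrefixMapping(" + prefix + ")");
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return log;
    }
}
```

For an input such as a single pdf:catalog element that declares xmlns:pdf, the startPrefixMapping call for pdf arrives before the startElement call, which is the pattern a sender of events would have to reproduce.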

RE: Issues with FOP+Saxon generating the FOP intermediate format (area tree) - Added by Michael Kay about 1 month ago

I strongly suspect that the startPrefixMapping() and endPrefixMapping() methods on the ContentHandler are not being called.

RE: Issues with FOP+Saxon generating the FOP intermediate format (area tree) - Added by Nico Kutscherauer 30 days ago

Thanks for the hints, Michael! Now I know what to look out for. Maybe I can provide a patch for FOP (though I have my doubts that they will accept it :-/).

RE: Issues with FOP+Saxon generating the FOP intermediate format (area tree) - Added by Michael Kay 30 days ago

It's probably best fixed by implementing a ContentHandler filter class that converts the events as FOP emits them to the events as Saxon expects them.
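
A filter of that kind could look roughly like the sketch below. It assumes, without having verified it against FOP, that the sender supplies a correct namespace URI and qualified name on startElement but omits the prefix-mapping calls. The class name PrefixMappingRepairFilter is hypothetical; it builds on the standard org.xml.sax.helpers.XMLFilterImpl and synthesizes startPrefixMapping() and endPrefixMapping() for any prefix that is not yet in scope.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.XMLConstants;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.XMLFilterImpl;

/** Hypothetical filter: adds prefix-mapping events that the sender left out. */
class PrefixMappingRepairFilter extends XMLFilterImpl {
    // In-scope declarations, innermost open element first.
    private final Deque<Map<String, String>> scopes = new ArrayDeque<>();
    // Prefixes this filter declared itself, one list per open element.
    private final Deque<List<String>> synthesized = new ArrayDeque<>();
    // Mappings the sender announced for the next startElement.
    private Map<String, String> pending = new HashMap<>();

    @Override
    public void startPrefixMapping(String prefix, String uri) throws SAXException {
        pending.put(prefix, uri);
        super.startPrefixMapping(prefix, uri);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        scopes.push(pending);
        pending = new HashMap<>();
        List<String> added = new ArrayList<>();
        ensureDeclared(prefixOf(qName), uri, added);
        for (int i = 0; i < atts.getLength(); i++) {
            String prefix = prefixOf(atts.getQName(i));
            if (!prefix.isEmpty()) { // unprefixed attributes are in no namespace
                ensureDeclared(prefix, atts.getURI(i), added);
            }
        }
        synthesized.push(added);
        super.startElement(uri, localName, qName, atts);
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        super.endElement(uri, localName, qName);
        // Close the mappings this filter opened for the element.
        for (String prefix : synthesized.pop()) {
            super.endPrefixMapping(prefix);
        }
        scopes.pop();
    }

    /** Emits startPrefixMapping if the prefix is not already bound to the URI. */
    private void ensureDeclared(String prefix, String uri, List<String> added) throws SAXException {
        String wanted = (uri == null) ? "" : uri;
        if (!lookup(prefix).equals(wanted)) {
            scopes.peek().put(prefix, wanted);
            super.startPrefixMapping(prefix, wanted);
            added.add(prefix);
        }
    }

    /** Returns the URI currently bound to the prefix, or "" if there is none. */
    private String lookup(String prefix) {
        for (Map<String, String> scope : scopes) {
            if (scope.containsKey(prefix)) {
                return scope.get(prefix);
            }
        }
        return "xml".equals(prefix) ? XMLConstants.XML_NS_URI : "";
    }

    private static String prefixOf(String qName) {
        int colon = (qName == null) ? -1 : qName.indexOf(':');
        return colon < 0 ? "" : qName.substring(0, colon);
    }
}
```

In use, the receiving handler (for example a TransformerHandler obtained from Saxon) would be passed to setContentHandler() on the filter, and the filter itself would be handed to the sender in place of that handler. The sketch does not strip xmlns attributes that a sender might pass as ordinary attributes, and where such a wrapper would be plugged into FOP is not established in this thread.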

I feel sure though that this has worked in the past. In fact, in the distant past Saxon used to ship with some kind of FOP integration. But I seem to remember we always had trouble making it work across FOP releases.
