Bug #5957

DTD Validation disabling

Added by Mark Hansen about 1 year ago. Updated 12 months ago.

Status: Closed
Priority: High
Assignee:
Category: JAXP Java API
Sprint/Milestone: -
Start date: 2023-04-05
Due date: 2023-04-06
% Done: 100%
Estimated time:
Legacy ID:
Applies to branch: 12, trunk
Fix Committed on Branch: 12, trunk
Fixed in Maintenance Release:
Platforms: Java

Description

Hi,

I am trying to transform an XML file with an XSLT 3.0 stylesheet. The XML file has a DOCTYPE declaration, but for this workflow it is not necessary to download the DTD and validate the source file. I tried a couple of things, but the error was always the same: the DTD file could not be downloaded.

Here is my code:

 EnterpriseConfiguration configuration = new EnterpriseConfiguration();
 configuration.setConfigurationProperty(
     Feature.ENTITY_RESOLVER_CLASS, IgnoreDoctypeEntityResolver.class.getCanonicalName());
 configuration.setConfigurationProperty(Feature.DTD_VALIDATION, false);
 configuration.setValidation(false);
 StreamingTransformerFactory streamingTransformerFactory =
     new StreamingTransformerFactory(configuration);

 streamingTransformerFactory.setErrorListener(new StandardErrorListener());
 Transformer strTransformer =
     streamingTransformerFactory.newTransformer(new StreamSource(xslFile));

 StreamSource xml = new StreamSource(onixStream);

 okFile = new File(targetDirectory.toString(), "dummy.xml");

 strTransformer.transform(xml, new StreamResult(okFile));

Can you help me to solve this problem?

Actions #1

Updated by Mark Hansen about 1 year ago

public class IgnoreDoctypeEntityResolver implements EntityResolver {

  private static final Logger log = LoggerFactory.getLogger(IgnoreDoctypeEntityResolver.class);

  public static final String DOCTYPE_SUFFIX = ".dtd";

  @Override
  public InputSource resolveEntity(String publicId, String systemId) {
    if (systemId.endsWith(DOCTYPE_SUFFIX)) {
      // resolve to empty result to ignore
      log.info("Ignoring external xml entity: {}", systemId);
      return new InputSource(new StringReader(""));
    }
    // otherwise use default behaviour
    return null;
  }
}
Actions #2

Updated by Michael Kay about 1 year ago

You can ask Saxon to switch off DTD validation, but you can't prevent it from reading the external DTD, because that's needed for entity expansion. What you have to do is redirect it to use a different (perhaps dummy) DTD.

But you're already doing that - why isn't it working? Offhand, I don't know, I'll have to try your code in the debugger. One of the problems is that there are too many places you can set such options, and they sometimes interfere with each other.
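The "redirect it to a dummy DTD" approach can be tried in isolation from Saxon, using only the JAXP parser that ships with the JDK. The following is a minimal sketch under that assumption; the class and method names (DummyDtdDemo, rootNameWithDummyDtd) are made up for illustration and are not part of Saxon or of the reporter's code.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: plain JAXP, no Saxon involved.
class DummyDtdDemo {

    /** Parses xml, answering every external-entity request with an empty stream, and returns the root element name. */
    static String rootNameWithDummyDtd(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            // The parser still asks for the external DTD, but receives an empty one
            // instead of attempting a download.
            reader.setEntityResolver((publicId, systemId) -> new InputSource(new StringReader("")));
            String[] root = new String[1];
            reader.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName, String qName, Attributes atts) {
                    if (root[0] == null) {
                        root[0] = localName; // remember only the first (root) element
                    }
                }
            });
            reader.parse(new InputSource(new StringReader(xml)));
            return root[0];
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<!DOCTYPE Book SYSTEM \"http://doesnotexist.com/BookProduct_3.0_short.dtd\">"
                + "<Book release=\"3.0\"><Entry>...</Entry></Book>";
        System.out.println(rootNameWithDummyDtd(xml));
    }
}
```

If this works standalone but the same resolver has no effect when registered through the Saxon Configuration, that points at how the resolver is passed on to the parser rather than at the resolver itself.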

Actions #3

Updated by Martin Honnen about 1 year ago

Assuming Xerces is the underlying XML parser then for me

configuration.setParseOptions(configuration.getParseOptions().withParserFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false));

prevents loading of an external DTD. I think I have only used that once or twice, and probably only with Saxon HE, but you could give it a try for EE; I don't think the parser use is specific to the Saxon edition.
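To check what that parser feature does independently of Saxon, it can be set directly on a JAXP SAXParserFactory. This is only a sketch, assuming the JDK's built-in Xerces-derived parser (which recognizes this feature URI); the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: plain JAXP, no Saxon involved.
class LoadExternalDtdDemo {

    static final String LOAD_EXTERNAL_DTD =
            "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    /** Parses xml without loading the external DTD and returns the number of elements seen. */
    static int countElements(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(false);
            // Xerces-specific feature: a non-validating parser skips the external DTD subset.
            factory.setFeature(LOAD_EXTERNAL_DTD, false);
            int[] count = new int[1];
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName, String qName, Attributes atts) {
                    count[0]++;
                }
            });
            return count[0];
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<!DOCTYPE Book SYSTEM \"http://doesnotexist.com/BookProduct_3.0_short.dtd\">"
                + "<Book release=\"3.0\"><Entry>...</Entry></Book>";
        System.out.println(countElements(xml));
    }
}
```

Note that with the external DTD skipped, any entities declared only in that DTD are not available to the document, which is the trade-off mentioned earlier in this thread.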

Actions #4

Updated by Mark Hansen about 1 year ago

First of all, thanks for the quick responses.

The error changes from:

Error I/O error reported by XML parser processing null. Caused by java.io.FileNotFoundException: http://doesnotexist.com/BookProduct_3.0_short.dtd
Error I/O error reported by XML parser processing null. Caused by java.io.IOException: net.sf.saxon.s9api.SaxonApiException: I/O error reported by XML parser processing null: I/O error reported by XML parser processing null: I/O error reported by XML parser processing null.

To:

Error I/O error reported by XML parser processing null. Caused by java.io.FileNotFoundException: D:\Projekte\service\import-metadaten\BookProduct_3.0_short.dtd (The system cannot find the file specified)
Error I/O error reported by XML parser processing null. Caused by java.io.IOException: net.sf.saxon.s9api.SaxonApiException: I/O error reported by XML parser processing null: I/O error reported by XML parser processing null: I/O error reported by XML parser

Now the parser tries to find the DTD on my own system, but there is no such file there either.

Actions #5

Updated by Martin Honnen about 1 year ago

I get the following code to compile and run with Java 8, Saxon EE 12.1 without any errors:

import com.saxonica.config.EnterpriseConfiguration;
import com.saxonica.config.StreamingTransformerFactory;
import net.sf.saxon.lib.Feature;

import javax.xml.transform.Templates;
import javax.xml.transform.TransformerException;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class Main {
    public static void main(String[] args) throws TransformerException {

        EnterpriseConfiguration configuration = new EnterpriseConfiguration();
        configuration.setConfigurationProperty(Feature.DTD_VALIDATION, false);
        configuration.setValidation(false);

        configuration.setParseOptions(configuration.getParseOptions().withParserFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false));

        StreamingTransformerFactory streamingTransformerFactory = new StreamingTransformerFactory(configuration);

        Templates streamingTemplates = streamingTransformerFactory.newTemplates(new StreamSource("xslt-test1.xsl"));

        streamingTemplates.newTransformer().transform(new StreamSource("input1.xml"), new StreamResult("result1.xml"));

        streamingTemplates.newTransformer().transform(new StreamSource("input2.xml"), new StreamResult("result2.xml"));

        streamingTemplates.newTransformer().transform(new StreamSource("input3.xml"), new StreamResult("result3.xml"));

    }
}

input2.xml has e.g. <!DOCTYPE root SYSTEM "http://example.com/example1.dtd"> but doesn't give any error, input3.xml has e.g. <!DOCTYPE root SYSTEM "example1.dtd"> and doesn't give any error either.

Actions #6

Updated by Mark Hansen about 1 year ago

It is still not working for me.

I also have Saxon-EE 12.1, but my Java version is 17.

I copied your code.

Here is my XML:

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE Book PUBLIC "-//MeineFirma Solutions//DTD EMail V 1.0//DE"
        "http://doesnotexist.com/BookProduct_3.0_short.dtd">
<Book release="3.0">
   <Entry>...</Entry>
</Book>

and here is my XSL:

<xsl:stylesheet version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:so="http://www.book.org/3.0/short">
    <xsl:mode streamable="yes" on-no-match="shallow-copy"/>

    <xsl:template match="so:Book">
        <xsl:iterate
                select="so:Entry">
            <xsl:result-document href="{concat(position(),'.xml')}" method="xml">
                <xsl:apply-templates select="."/>
            </xsl:result-document>
        </xsl:iterate>
    </xsl:template>

    <xsl:template match="so:Entry">
        <xsl:copy-of select="."/>
    </xsl:template>
</xsl:stylesheet>
Actions #7

Updated by Martin Honnen about 1 year ago

That XSLT code, which tries to match Entry or Book elements in a namespace, doesn't seem to fit the XML sample, which does not use any namespace.
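The mismatch can be seen without running any XSLT: a lookup for Book in the namespace http://www.book.org/3.0/short finds nothing in a document whose elements are in no namespace. A small sketch using the JDK's DOM parser follows; the class and method names are invented for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class illustrating namespace-sensitive element matching.
class NamespaceMatchDemo {

    static final String BOOK_NS = "http://www.book.org/3.0/short";

    /** Counts Book elements that are in BOOK_NS, as the pattern so:Book would require. */
    static int countNamespacedBooks(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // without this, namespace lookups are meaningless
            Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagNameNS(BOOK_NS, "Book").getLength();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String plain = "<Book release=\"3.0\"><Entry>...</Entry></Book>";
        String namespaced = "<Book xmlns=\"" + BOOK_NS + "\" release=\"3.0\"><Entry>...</Entry></Book>";
        System.out.println(countNamespacedBooks(plain));
        System.out.println(countNamespacedBooks(namespaced));
    }
}
```

With the stylesheet as posted, the un-namespaced input would therefore fall through to the shallow-copy default rather than reaching the so:Book template.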

Actions #8

Updated by Martin Honnen about 1 year ago

I tried a slight adaptation of your XML

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE Book PUBLIC "-//MeineFirma Solutions//DTD EMail V 1.0//DE"
        "http://doesnotexist.com/BookProduct_3.0_short.dtd">
<Book xmlns="http://www.book.org/3.0/short" release="3.0">
    <Entry>...</Entry>
</Book>

and XSLT

<xsl:stylesheet version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:so="http://www.book.org/3.0/short">

    <xsl:mode streamable="yes" on-no-match="shallow-copy"/>

    <xsl:template match="so:Book">
        <xsl:iterate
                select="so:Entry">
            <xsl:result-document href="Entry-{position()}.xml" method="xml">
                <xsl:apply-templates select="."/>
            </xsl:result-document>
        </xsl:iterate>
    </xsl:template>

    <xsl:template match="so:Entry">
        <xsl:copy-of select="."/>
    </xsl:template>

</xsl:stylesheet>

(I also had to change the Java code to use e.g. streamingTemplates.newTransformer().transform(new StreamSource("input4.xml"), new StreamResult(new File("result4.xml")));, as otherwise xsl:result-document would throw an error about a wrong relative URI.) With those changes, under Java 8, the DTD is ignored.
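The reason the File-based StreamResult helps is presumably that the href of xsl:result-document is resolved against the base output URI, which is taken from the system ID of the principal result. The sketch below only shows how the two JAXP constructors differ in the system ID they record; the class and method names are hypothetical.

```java
import java.io.File;
import javax.xml.transform.stream.StreamResult;

// Hypothetical demo class comparing the system IDs recorded by two StreamResult constructors.
class ResultSystemIdDemo {

    /** System ID when the result is created from a plain string: stored as given. */
    static String systemIdFromString(String name) {
        return new StreamResult(name).getSystemId();
    }

    /** System ID when the result is created from a File: converted to a file: URI. */
    static String systemIdFromFile(String name) {
        return new StreamResult(new File(name)).getSystemId();
    }

    public static void main(String[] args) {
        System.out.println(systemIdFromString("result4.xml"));
        System.out.println(systemIdFromFile("result4.xml"));
    }
}
```

A bare "result4.xml" is a relative reference with nothing to resolve against, whereas the File variant yields an absolute file: URI that relative href values such as Entry-1.xml can be resolved against.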

That looks as if the setting I suggested is not working with Java 17 although I have not tested whether that is the relevant difference.

I guess you will have to wait until Michael finds the time to test and tell you how to change your code to have the external DTD ignored.

Actions #9

Updated by Martin Honnen about 1 year ago

For a test, I have installed Adoptium/Eclipse Temurin openjdk 17.0.6 and have now run the program using that Java version but it behaves the same for me, no errors related to the DTDs are given, the XSLT transformations run through with any external DTDs being ignored.

Actions #10

Updated by Mark Hansen about 1 year ago

Hello Martin,

I made a mistake or had a misunderstanding.

I have chained two transformations, and the old one worked with the Xslt30Transformer and the configuration property ENTITY_RESOLVER_CLASS, which ignored the DTD. That worked with Saxon-HE version 10.5.

I have now added

 configuration.setParseOptions(configuration.getParseOptions().withParserFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false));

and it worked fine.

Thanks, that helped. I would never have figured that out with the configuration property alone.

Actions #11

Updated by Michael Kay about 1 year ago

I can confirm that the entity resolver supplied using Feature.ENTITY_RESOLVER_CLASS is not being used.

What seems to be happening is that Configuration.getSourceParser() returns an XMLReader whose entity resolver is an EntityResolverWrappingResourceResolver which invokes the Configuration's CatalogResolver. On return, ActiveStreamSource.deliver() does

            if (options.getEntityResolver() != null && parser.getEntityResolver() == null) {
                parser.setEntityResolver(options.getEntityResolver());
            }

which has no effect because parser.getEntityResolver() is not null. This code is there to stop us stomping over an EntityResolver explicitly registered with the XMLReader, but it has the effect that the EntityResolver held in the ParseOptions is ignored (at least on this path).

If I take out the condition && parser.getEntityResolver() == null, the test case works, but I'm not convinced that is the right solution.
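The effect of that guard can be reproduced outside Saxon. The sketch below is not Saxon's code: it uses org.xml.sax.helpers.XMLFilterImpl as a stand-in XMLReader that already carries a resolver (playing the role of the one the Configuration installs), and all class and method names are invented for illustration.

```java
import org.xml.sax.EntityResolver;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical demo class; it mirrors only the shape of the condition quoted above.
class ResolverGuardDemo {

    /** Installs the resolver from the options only if the parser has none yet (the guarded form). */
    static void installGuarded(XMLReader parser, EntityResolver fromOptions) {
        if (fromOptions != null && parser.getEntityResolver() == null) {
            parser.setEntityResolver(fromOptions);
        }
    }

    /** Installs the resolver from the options whenever one is supplied (guard removed). */
    static void installUnguarded(XMLReader parser, EntityResolver fromOptions) {
        if (fromOptions != null) {
            parser.setEntityResolver(fromOptions);
        }
    }

    /** Returns true if the user's resolver ends up on a parser that already had a preset resolver. */
    static boolean userResolverInstalled(boolean guarded) {
        EntityResolver preset = new DefaultHandler();      // stand-in for the pre-installed resolver
        EntityResolver fromOptions = new DefaultHandler(); // stand-in for the user's resolver
        XMLReader parser = new XMLFilterImpl();            // stand-in for the source parser
        parser.setEntityResolver(preset);
        if (guarded) {
            installGuarded(parser, fromOptions);
        } else {
            installUnguarded(parser, fromOptions);
        }
        return parser.getEntityResolver() == fromOptions;
    }

    public static void main(String[] args) {
        System.out.println("guarded:   " + userResolverInstalled(true));
        System.out.println("unguarded: " + userResolverInstalled(false));
    }
}
```

The sketch also shows the downside of simply dropping the guard: a resolver that was explicitly registered on the XMLReader would then be overwritten, which is the concern raised above.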

Actions #12

Updated by Michael Kay about 1 year ago

  • Tracker changed from Support to Bug
  • Category set to JAXP Java API
  • Status changed from New to In Progress
  • Assignee set to Michael Kay
Actions #13

Updated by Michael Kay about 1 year ago

I've committed this change as a partial/provisional solution that handles this test case.

Actions #14

Updated by Michael Kay 12 months ago

  • Status changed from In Progress to Resolved
  • Applies to branch trunk added
  • Fix Committed on Branch trunk added
  • Platforms Java added

Marking this resolved because we produced a patch that fixes the immediate problem, although there are wider issues still to consider.

Actions #15

Updated by O'Neil Delpratt 12 months ago

  • Status changed from Resolved to Closed
  • % Done changed from 0 to 100
  • Fixed in Maintenance Release 12.2 added

Bug fix applied in the Saxon 12.2 maintenance release.
