Bug #5800

closed

Saxon-HE:9.5.5 and higher causes NullPointerException in gatherUsePackageDeclarations method when using xsl:include and applying custom URIResolver

Added by Jan Strakos over 1 year ago. Updated about 1 year ago.

Status: Closed
Priority: Normal
Assignee:
Category: Resolvers
Sprint/Milestone: -
Start date: 2023-01-06
Due date:
% Done: 100%
Estimated time:
Legacy ID:
Applies to branch: 10, 11, 9.5, 9.6, 9.7, 9.8, 9.9
Fix Committed on Branch: 10, 11, trunk
Fixed in Maintenance Release:
Platforms: Java

Description

Hello, first of all I would like to thank you for the great work your team is doing.

I am facing the following problem, which occurs in Saxon-HE:9.5.5 and all later versions. The last version that works for us is Saxon-HE:9.5.1-4. I am currently using the latest version, Saxon 11.4. I was able to isolate and reproduce the problem. I am convinced that the error is related to our custom URIResolver, but it is not clear to me why. I also know which line of code makes everything work when it is commented out, but I don't know why.

This is the exception:

java.lang.NullPointerException
	at net.sf.saxon.style.PrincipalStylesheetModule.gatherUsePackageDeclarations(PrincipalStylesheetModule.java:522)
	at net.sf.saxon.style.PrincipalStylesheetModule.gatherUsePackageDeclarations(PrincipalStylesheetModule.java:523)
	at net.sf.saxon.style.PrincipalStylesheetModule.spliceUsePackages(PrincipalStylesheetModule.java:487)
	at net.sf.saxon.style.PrincipalStylesheetModule.preprocess(PrincipalStylesheetModule.java:330)
	at net.sf.saxon.style.Compilation.compilePackage(Compilation.java:290)
	at net.sf.saxon.style.StylesheetModule.loadStylesheet(StylesheetModule.java:249)
	at net.sf.saxon.style.Compilation.compileSingletonPackage(Compilation.java:113)
	at net.sf.saxon.s9api.XsltCompiler.compile(XsltCompiler.java:936)
	at net.sf.saxon.jaxp.SaxonTransformerFactory.newTemplates(SaxonTransformerFactory.java:155)
	at net.sf.saxon.jaxp.SaxonTransformerFactory.newTransformer(SaxonTransformerFactory.java:112)
	at com.netledger.dcs.NLComponentXSLTransformationTest.tempTest(NLComponentXSLTransformationTest.java:180)
	at com.netledger.test.NLTestBaseRunner$NLFailOnTimeout$1.run(NLTestBaseRunner.java:458)

This is the test:

		final TransformerFactory transformerFactory = TransformerFactory.newInstance();
		File base = new File("/webdev/tempTest/oramw/apache"); // Directory, that contains other directories with xslt templates
		File[] dependencyLocations = { new File("/webdev/tempTest2/_dist/cmd/shared/rule") }; // Directory, that contains potential directories with xslt templates - its for fallback
		transformerFactory.setURIResolver(new RelativePathUriResolver(base, dependencyLocations)); // Setting our custom URI resolver
		final Transformer transformer = transformerFactory.newTransformer(new StreamSource("/webdev/tempTest/oramw/apache/conf-templates/create-rota-dedicated-baseline.xslt")); // The transformer is not created and an exception occurs

Something about the test data: there is a "main" template at "/webdev/tempTest/oramw/apache/conf-templates/create-rota-dedicated-baseline.xslt". This template has an include element: <xsl:include href="./conf-templates/base-defs.xslt"/>. The base-defs.xslt has two include elements: <xsl:include href="../../../config/shared/rule/output-content.xslt"/> and <xsl:include href=".../../../config/shared/rule/functions.xslt"/>. For simplicity, that's all. I do not see anything wrong with this.

What I have found: if I comment out the resolvedSource.setSystemId(resolvedFile.getAbsolutePath()); line in the custom URIResolver (provided below), everything suddenly works. With that line in place, the problem occurs when the method gatherUsePackageDeclarations is called and the line TreeInfo includedTree = compilation.getStylesheetModules().get(key); is executed. At that point the stylesheetModules HashMap contains three key-value pairs with the keys [/webdev/tempTest/config/shared/rule/output-content.xslt], [/webdev/tempTest/config/shared/rule/functions.xslt] and [file:/webdev/tempTest/oramw/apache/conf-templates/conf/templates/base-defs.xslt], whereas the DocumentKey used for the lookup is [file:/../config/shared/rule/output-content.xslt]. The lookup therefore returns null, and this causes the NullPointerException. To be honest, I don't know why the systemID has an impact on this, but I am really curious about it... I think it has something to do with the systemID, the calculation of the DocumentKey and using the element

This is our custom URIResolver:

/**
 * Implementation of javax.xml.transform.URIResolver that allows the user to explicitly specify the
 * base directory against which relative hrefs are resolved when processing stylesheets, and a list of
 * dependency locations to fall back to for resolution.
 *
 * The href is resolved against the following locations, returning the first match:
 * - The location of the file containing the relative href, if available
 * - The resolution base, if non-null
 * - Each dependency directory, in the order in which they are listed
 */
public class RelativePathUriResolver implements URIResolver
{
	private final File resolutionBase;
	private final File[] dependencyDirectories;

	public RelativePathUriResolver(@Nullable File resolutionBase, File[] dependencyDirectories)
	{
		this.resolutionBase = resolutionBase;
		this.dependencyDirectories = dependencyDirectories;
	}

	@Override
	public Source resolve(String href, String base) throws TransformerException
	{
		if (Strings.isNullOrEmpty(href))
			throw new TransformerException("Passed HREF was empty.");

		// When a non-null base is passed to this method, it is typically the path to a file in a resolution chain.
		// If this is the case, attempt to resolve against the parent directory of base rather than base itself.
		File baseFile = Strings.isNullOrEmpty(base) ? null : new File(base);
		if (baseFile != null && baseFile.isFile())
			baseFile = baseFile.getParentFile();

		List<File> potentialRoots = Lists.asList(baseFile, resolutionBase, dependencyDirectories);
		File resolvedFile = findFile(href, potentialRoots);
		if (resolvedFile == null)
		{
			// Some of our config template reference includes using relative paths as though they were in a source tree, e.g.
			//    <xsl:include href="../../../config/shared/rule/functions.xslt"/>
			// These should be updated, but until they are, we will retain the filename-only lookup as a fallback.
			resolvedFile = findFile(NLIOUtil.getNameOnlyGivenPath(href), potentialRoots);
		}

		if (resolvedFile == null)
		{
			String potentialRootsString = potentialRoots
					.stream()
					.filter(Objects::nonNull)
					.map(File::getAbsolutePath)
					.collect(Collectors.joining("\n\t"));
			throw new TransformerException(String.format("Unable to resolve HREF [ %s ] for any of the potential base directories:\n"
					+ "\t%s\n", href, potentialRootsString));
		}

		try (InputStream fileStream = new FileInputStream(resolvedFile))
		{
			Source resolvedSource = new JDOMSource(NLJdomUtils.buildDocument(fileStream));
			resolvedSource.setSystemId(resolvedFile.getAbsolutePath()); // IF I COMMENT OUT THIS LINE, EVERYTHING WORKS FINE
			return resolvedSource;
		}
		catch (JDOMException je)
		{
			throw new TransformerException("An exception occured while converting requested object [ " +
					resolvedFile.getAbsolutePath() + " ] for HREF [ " + href + " ] to a Source object.", je);
		}
		catch (IOException ioe)
		{
			throw new TransformerException(
					String.format("Problem accessing file [ %s ] for HREF [ %s ].", resolvedFile.getAbsolutePath(), href), ioe);
		}
	}

	private File findFile(String href, List<File> potentialRoots)
	{
		return potentialRoots.stream()
				.filter(Objects::nonNull)
				.filter(File::isDirectory)
				.map(file -> file.toPath().resolve(href).toFile())
				.filter(File::isFile)
				.findFirst()
				.orElse(null);
	}
}


Actions #1

Updated by Michael Kay over 1 year ago

I'll start by adding some diagnostics, so that if get(documentKey) returns null, we produce some kind of diagnostic message rather than crashing out with an NPE.

Actions #2

Updated by Michael Kay over 1 year ago

Saxon's internal design here is as follows.

Firstly, every stylesheet module is first read in a streamed filter pass by the UseWhenFilter. As the name suggests the original purpose of this class was to evaluate xsl:use-when attributes and remove sections of the stylesheet before being processed any further. But this first pass also looks for, and resolves, xsl:include declarations - the included modules are themselves processed using a UseWhenFilter so the process is recursive. As each module is processed, a key is calculated by calling DocumentFn.computeDocumentKey(), and the built stylesheet document is saved in a hash table under this key.

In a second phase of processing, these constructed documents are assembled into a complete stylesheet. The xsl:include declarations are processed a second time, the document key is computed again, and used to retrieve the included module from the hash table. The NPE is occurring because the module wasn't found in the hash table, which must mean that the document key computed the second time round is different from the original.

Although DocumentFn.computeDocumentKey() has four arguments, only the first two (href and baseURI) should be used when computing a key for a stylesheet module.
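
As a rough illustration of the two-phase lookup described above, the sketch below stores a module under a key computed from one base URI and then looks it up using a key computed from another. It is not Saxon's code: ModuleKeyDemo, computeKey and storeThenLookup are hypothetical names, the key is simplified to "href resolved against base URI", and the file paths are made up.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// Illustrative stand-in for the module map; none of these names are Saxon's.
final class ModuleKeyDemo {

    // Simplified document key: the href resolved against the base URI.
    // With an empty base there is nothing to resolve against, so the href is used as-is.
    static String computeKey(String href, String baseUri) {
        if (baseUri == null || baseUri.isEmpty()) {
            return href;
        }
        return URI.create(baseUri).resolve(href).normalize().toString();
    }

    // Phase 1 stores the module under a key computed from storeBase;
    // phase 2 looks it up under a key computed from lookupBase.
    static String storeThenLookup(String href, String storeBase, String lookupBase) {
        Map<String, String> modules = new HashMap<>();
        modules.put(computeKey(href, storeBase), "tree for " + href);
        return modules.get(computeKey(href, lookupBase));
    }

    public static void main(String[] args) {
        String base = "file:/styles/conf/inc1.xslt";
        // Same base URI in both phases: the module is found.
        System.out.println(storeThenLookup("inc2.xslt", base, base));
        // Empty base URI in the second phase: the key differs and the lookup returns null.
        System.out.println(storeThenLookup("inc2.xslt", base, ""));
    }
}
```

Dereferencing the null returned by the second lookup is the kind of failure that would surface as the NullPointerException in the report.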

My suspicion would be that the base URI used during the first (use-when) phase of processing differs from the base URI used during the second phase; and this is almost certainly related to your use of resolvedSource.setSystemId(resolvedFile.getAbsolutePath()) in the URIResolver. But the details need further examination.

Actions #3

Updated by Michael Kay over 1 year ago

I have managed to reproduce the problem.

For the first included stylesheet module (inc1), it is stored in the module map using a document key computed directly from the base URI and href, ignoring anything returned by the URI resolver.

For the second included stylesheet module (inc2, included by inc1), it is stored in the module map using a document key computed from the base URI of inc1, which is the value returned in the systemId property by the URIResolver. But gatherUsePackageDeclarations is trying to find it using a key computed with the help of use.getBaseURI(), where use is the node representing the xsl:include element, and this is an empty string.

I suspect the problem is that the JDOM document returned by the URIResolver needs to be copied to a Saxon LinkedTree, and the base URI has got lost in the course of this process. Note that the JDOM document itself does not know about the systemId allocated by the URIResolver; this is a property of the JDOMSource object, but not of the JDOM Document.

Actions #4

Updated by Michael Kay over 1 year ago

Indeed, tracing this in the debugger, it appears that when the JDOM document is copied to a LinkedTree (with the appropriate filters to strip comments, whitespace, and processing instructions, plus a UseWhenFilter), the resulting document node has the right base URI, but the element nodes all have a base URI of "" (zero-length string).

(Is there any good reason for returning the stylesheet module from your URIResolver as a JDOM document? We ought to support this, but it's an expensive thing to do unless you have a good reason.)

Actions #5

Updated by Michael Kay over 1 year ago

  • Status changed from New to Resolved
  • Fix Committed on Branch 10, 11, trunk added

The JDOMSource class extends SAXSource, and as far as Saxon is concerned, it is treating the input as if it were an ordinary SAXSource. However, the JDOMSource differs from the usual kind of SAXSource in that its XMLReader (which is pretending to be an XML parser but actually reads nodes off a depth-first walk of the JDOM tree) does not supply any location information; and this is why the base URI of the element nodes is unknown.

The UseWhenFilter, which is on the pipeline between the JDOMSource and the LinkedTree Builder, gets around this problem: if the XMLReader does not supply location information, then it takes the base URI of xsl:include declarations from the systemId property of the pipeline (taking a quick look first to see if there is an xml:base attribute present).

Solution: in LinkedTreeBuilder.startElement(), if location.getSystemId() is null, set location to a location taken from the system ID of the builder itself, overriding that supplied by the caller.
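
The shape of that fallback can be sketched roughly as follows. This is only an illustration under stated assumptions, not the committed patch: LocationFallbackDemo, Loc and effectiveLocation are hypothetical names, and Saxon's real Location and builder classes carry more information than shown here.

```java
// Hypothetical minimal types for illustration; Saxon's Location and LinkedTreeBuilder are richer.
final class LocationFallbackDemo {

    static final class Loc {
        final String systemId;   // may be null when the XMLReader supplies no location information
        final int lineNumber;

        Loc(String systemId, int lineNumber) {
            this.systemId = systemId;
            this.lineNumber = lineNumber;
        }
    }

    // If the supplied location has no system ID, substitute one that uses
    // the builder's own system ID; otherwise keep the supplied location unchanged.
    static Loc effectiveLocation(Loc supplied, String builderSystemId) {
        if (supplied == null) {
            return new Loc(builderSystemId, -1);
        }
        if (supplied.systemId == null) {
            return new Loc(builderSystemId, supplied.lineNumber);
        }
        return supplied;
    }
}
```

With a fallback of this kind, element nodes built from a source that reports no location still get a non-empty base URI, so the key computed in the second phase can match the one stored in the first.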

Test case added to URIResolverTest.

Actions #6

Updated by Jan Strakos over 1 year ago

Hello, first of all, thank you very much for your investigation! Am I to understand that the fix for this problem will be in the next version of Saxon? Until then, I have to wait, right? Or is there a workaround?

Actions #7

Updated by Michael Kay over 1 year ago

Yes, I have committed patches for the development branch, the 11.x branch, and the 10.x branch, and unless you want to try pulling the source code from the open source repository and rebuilding the product yourself (which isn't very easy...) you'll need to wait until the next maintenance release on the relevant branch for a fix. We're very close to a 12.0 release, and when that's out of the way we'll be looking at making new maintenance releases for 11.x and 10.x.

As an interim workaround, I'd suggest returning something other than a JDOMSource from the URIResolver.
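
One possible shape for that workaround is a resolver that returns a StreamSource. The sketch below is a simplified, hypothetical variant of the resolver in the description: StreamSourceResolver is an invented name, and it resolves against a single root directory only, omitting the fallback directories and the filename-only lookup. It has not been tested against the affected Saxon releases.

```java
import java.io.File;
import javax.xml.transform.Source;
import javax.xml.transform.TransformerException;
import javax.xml.transform.URIResolver;
import javax.xml.transform.stream.StreamSource;

// Hypothetical, simplified resolver: one root directory, no fallback locations.
final class StreamSourceResolver implements URIResolver {

    private final File root;

    StreamSourceResolver(File root) {
        this.root = root;
    }

    @Override
    public Source resolve(String href, String base) throws TransformerException {
        if (href == null || href.isEmpty()) {
            throw new TransformerException("Passed HREF was empty.");
        }
        File resolved = new File(root, href);
        if (!resolved.isFile()) {
            throw new TransformerException("Unable to resolve HREF [ " + href + " ] against [ "
                    + root.getAbsolutePath() + " ]");
        }
        // StreamSource(File) sets the system ID to the file's URI (file:...),
        // and the stylesheet is then read by an ordinary XML parser rather than from a JDOM tree.
        return new StreamSource(resolved);
    }
}
```

Because StreamSource(File) derives the system ID from the file itself, there is no separate setSystemId call to keep consistent with the document content.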

Actions #8

Updated by Community Admin over 1 year ago

  • % Done changed from 0 to 100
  • Fixed in Maintenance Release 12.0 added

Bug fix applied in the Saxon 12.0 major release. Leaving this bug marked as Resolved until the fix is applied.

Actions #9

Updated by O'Neil Delpratt about 1 year ago

  • Fixed in Maintenance Release 11.5 added

Bug fix applied in the Saxon 11.5 maintenance release.

Actions #10

Updated by O'Neil Delpratt about 1 year ago

  • Status changed from Resolved to Closed
  • Fixed in Maintenance Release 10.9 added

Bug fix applied in the Saxon 10.9 maintenance release.
