Bug #6222

closed

XMLResolver fails on Windows paths in system identifiers that are reachable but can't be resolved, even if "-Dxml.catalog.fixWindowsSystemIdentifiers=true" was given.

Added by Stefan Krause 7 months ago. Updated 6 months ago.

Status: Closed
Priority: Low
Category: -
Sprint/Milestone: -
Start date: 2023-10-12
Due date:
% Done: 0%
Estimated time:
Legacy ID:
Applies to branch: 11, 12
Fix Committed on Branch:
Fixed in Maintenance Release:
Platforms: Java

Description

XMLResolver fails on Windows paths in system identifiers that are reachable but can't be resolved, even if "-Dxml.catalog.fixWindowsSystemIdentifiers=true" was given.

stack trace:

java.lang.IllegalArgumentException: Illegal character in path at index 1: .\test.dtd
        at java.base/java.net.URI.create(Unknown Source)
        at java.base/java.net.URI.resolve(Unknown Source)
        at org.xmlresolver.Resolver.openConnection(Resolver.java:250)
        at org.xmlresolver.Resolver.resolveEntity(Resolver.java:204)
        at net.sf.saxon.lib.CatalogResourceResolver.resolveEntity(CatalogResourceResolver.java:194)
        at net.sf.saxon.lib.EntityResolverWrappingResourceResolver.resolveEntity(EntityResolverWrappingResourceResolver.java:46)
[…]

Steps to reproduce (see attached ZIP file):

xml-inputfile.xml:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE root SYSTEM ".\test.dtd">
<root id="id12345"/>

test.dtd

  • is a valid DTD
  • is located in the same directory as xml-inputfile.xml

command line

java -cp "path\to\saxon\*;path\to\saxon\lib\*" -Dxml.catalog.fixWindowsSystemIdentifiers=true net.sf.saxon.Transform -xsl:"main.xsl" -s:"xml-inputfile.xml" -o:"xml-outputfile.xml" [-catalog:catalog_0X.xml]

Current Results

  • no catalog given (run_01.bat)
    • fails
  • catalog is given and system identifier can be resolved (run_02.bat uses catalog_01.xml)
    • works
  • catalog is given and system identifier can't be resolved (run_03.bat uses catalog_02.xml)
    • fails

Expected results: (like run_02.bat)

  • input file is loaded
  • no exception is thrown
  • no warning appears

Files

#6222_xmlresolver.zip (3.34 KB), Stefan Krause, 2023-10-12 09:34
Actions #2

Updated by Norm Tovey-Walsh 7 months ago

  • Assignee set to Norm Tovey-Walsh
Actions #3

Updated by Norm Tovey-Walsh 6 months ago

  • Status changed from New to AwaitingInfo

I think this is the expected behavior.

If I'm understanding this correctly, what's happening is that run_03.bat attempts to resolve .\test.dtd with catalog_02.xml. There is no matching catalog entry for the DTD, so no match is found. The resolver reports failure and Saxon goes on to try the original URI, .\test.dtd, which isn't a valid URI.

You can see this if you set -Dxml.catalog.defaultLoggerLogLevel=debug:

...
config: Fix Windows system identifiers: true
config: Default logger log level: debug
config: Searching for catalogs on classpath:
config: Catalog: jar:file:/C:/Users/norm/Desktop/%236222_xmlresolver/SaxonHE12-3J/lib/xmlresolver-5.2.0-data.jar!/org/xmlresolver/catalog.xml
config: Throw URI exceptions: true
config: Catalog list cleared
config: Catalog: file:/C:/Users/norm/Desktop/%236222_xmlresolver/catalog_02.xml
request: resolveEntity: ./test.dtd (baseURI: file:/C:/Users/norm/Desktop/%236222_xmlresolver/xml-inputfile.xml, publicId: null)
config: Loaded catalog: file:/C:/Users/norm/Desktop/%236222_xmlresolver/catalog_02.xml
config: Loaded catalog: jar:file:/C:/Users/norm/Desktop/%236222_xmlresolver/SaxonHE12-3J/lib/xmlresolver-5.2.0-data.jar!/org/xmlresolver/catalog.xml
response: resolveEntity: null
java.lang.IllegalArgumentException: Illegal character in path at index 1: .\test.dtd
        at java.base/java.net.URI.create(URI.java:883)
        at java.base/java.net.URI.resolve(URI.java:1066)

Note that the resolver is looking for ./test.dtd indicating that it has fixed the Windows system identifier.
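As a rough illustration of what that fix amounts to, here is a minimal sketch in plain Java. It is not the XMLResolver implementation: the class and method names are hypothetical, and it only swaps backslash separators for forward slashes before resolving against the base URI from the log above.

```java
import java.net.URI;

public class WindowsSystemIdFix {
    // Hypothetical helper: turn Windows path separators into URI separators.
    // It does not handle drive letters ("C:\dir\file"), spaces, or other
    // characters that would need escaping in a URI.
    static String toUriSyntax(String systemId) {
        return systemId.replace('\\', '/');
    }

    // Resolve the normalized system identifier against the document's base URI.
    static URI resolveFixed(URI base, String systemId) {
        return base.resolve(toUriSyntax(systemId));
    }

    public static void main(String[] args) {
        URI base = URI.create(
            "file:/C:/Users/norm/Desktop/%236222_xmlresolver/xml-inputfile.xml");
        // ".\test.dtd" becomes "./test.dtd", which is a valid relative reference.
        System.out.println(resolveFixed(base, ".\\test.dtd"));
    }
}
```

With the normalized form, java.net.URI can resolve the reference to a sibling of the input file. Whether a fallback like this is appropriate outside the catalog lookup is a separate design question.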

The point at which the resolver responds null is the point at which the resolver exits the picture. At this point, Saxon has no choice but to ask for the original URI and that...isn't a URI.

XML system identifiers are not filenames or file paths; they are URIs, and "\" is not a valid character in a URI.
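The failure can be reproduced without Saxon or the resolver. The sketch below uses only java.net.URI; the class name, helper method, and base URI are made up for illustration.

```java
import java.net.URI;

public class BackslashSystemId {
    // Returns the exception message if resolving fails, or null if it succeeds.
    static String tryResolve(String base, String systemId) {
        try {
            // URI.resolve(String) parses the argument with URI.create(),
            // which rejects characters that are not legal in a URI.
            URI.create(base).resolve(systemId);
            return null;
        } catch (IllegalArgumentException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        String base = "file:/C:/work/xml-inputfile.xml"; // hypothetical base URI
        // Backslash form: rejected, as in the stack trace.
        System.out.println(tryResolve(base, ".\\test.dtd"));
        // Forward-slash form: accepted, so the helper returns null.
        System.out.println(tryResolve(base, "./test.dtd"));
    }
}
```

The first call reports the same "Illegal character in path at index 1" message as the stack trace; the second call succeeds.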

If you can't fix the input documents to contain valid system identifiers as URIs, then your best bet is to make sure that the catalog successfully resolves the identifiers.

Actions #4

Updated by Stefan Krause 6 months ago

Sorry for the delayed response, I was out of office.

I see your point here, and you can close this ticket.

For now, this is just a problem with our test cases. Maybe in the future we will need an easy solution to make run_01.bat and run_03.bat work, since we cannot know in advance which DTDs were referenced. Please consider that many applications, including OxygenXML, XMetaL, and Saxon, have accepted and/or produced such entity references during the last 25 years.

Actions #5

Updated by Norm Tovey-Walsh 6 months ago

  • Status changed from AwaitingInfo to Closed

Thank you.
