Saxon-PE 11.3 fails at resolving external entity
I'm testing Saxon-PE 10.8 and 11.3 for a user project. When I convert an XML file using 10.8, it works without any problem. However, 11.3 reports java.net.URISyntaxException. Here is the screen shot. The test has been done on Windows 10 + PowerShell.
It seems that the Windows path notation "..\master\glossary\gls.ent" in ahfsm-custom.ent is not handled properly. I attached the ZIP data archive.
- Unzip diff-2022-06-23.zip
- Set the JDK path and Saxon-PE path in xmllist/test-pe-10.8.ps1 and test-pe-11.3.ps1
- Open PowerShell in the xmllist folder
- Enter command "./test-pe-10.8.ps1". This command will end normally.
- Enter command "./test-pe-11.3.ps1". This command will end with an exception.
Hope this helps to fix the 11.3 problem.
Updated by Norm Tovey-Walsh 9 days ago
I'm testing Saxon-PE 10.8 and 11.3 for a user project. When I convert
an XML file using 10.8, it works without any problem. However, 11.3
reports java.net.URISyntaxException. Here is the screen shot. The test
has been done on Windows 10 + PowerShell.
Thanks for the test case. I’ll take a look. I’m suspicious that there’s
an issue with the local encoding. From the screen shot, apparently
“..\master\glossary\gls.ent” is rendered by Windows as
“..￥master￥glossary￥gls.ent” which is a little worrisome.
Be seeing you,
Updated by Toshihiko Makita 9 days ago
Thank you for your notification.
I’m suspicious that there’s an issue with the local encoding.
It is a known problem specific to the Japanese fonts used for display on Windows: the backslash is drawn as a yen sign.
We (Japanese) are so accustomed to this text output. But people outside Japan will worry about the encoding.
Hope this helps your understanding.
Updated by Norm Tovey-Walsh 6 days ago
On closer inspection, I can see what the problem is. In XML, system identifiers aren't filenames, they're URIs. Unescaped backslashes are not valid characters in a URI.
In Saxon 11, we updated the XML resolver used in the product and made catalogs available by default. The new resolver treats the URIs as java.net.URI objects where the old resolver just carried them around as strings. Since the Java URI class is enforcing constraints imposed by the URI specification, it's not clear that there's much we can change.
The easiest workaround is to replace the "\" characters in your system identifiers with either "/" characters or encoded backslashes, "%5C".
If neither of those workarounds is practical, let me know and I'll try to think of another answer.
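The behaviour described above can be reproduced without Saxon. The following is a minimal sketch, not the resolver's actual code: it only checks whether a system identifier parses as a java.net.URI, before and after applying the two suggested workarounds. The class and method names (SystemIdCheck, isValidUri, toForwardSlashes, encodeBackslashes) are hypothetical and chosen for illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class SystemIdCheck {

    /** Returns true if the string can be parsed by java.net.URI. */
    static boolean isValidUri(String systemId) {
        try {
            new URI(systemId);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    /** Workaround 1: replace each backslash with a forward slash. */
    static String toForwardSlashes(String systemId) {
        return systemId.replace('\\', '/');
    }

    /** Workaround 2: percent-encode each backslash as %5C. */
    static String encodeBackslashes(String systemId) {
        return systemId.replace("\\", "%5C");
    }

    public static void main(String[] args) {
        // The system identifier from ahfsm-custom.ent, written as a Java string literal.
        String original = "..\\master\\glossary\\gls.ent";

        System.out.println(isValidUri(original));                    // false
        System.out.println(isValidUri(toForwardSlashes(original)));  // true
        System.out.println(isValidUri(encodeBackslashes(original))); // true
    }
}
```

Note that the two workarounds are not equivalent once the URI is resolved: the forward-slash form names a path with three directory steps, while the %5C form is a single path segment that contains encoded backslashes.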
Updated by Toshihiko Makita 6 days ago
In XML, system identifiers aren't filenames, they're URIs.
You are absolutely right. The problem is Adobe FrameMaker. The relevant user has been using Adobe FrameMaker for over 20 years. As a result, there are tons of these path notations in the CMS. So it is very difficult to tell the user that this notation is not valid as a URI. This is my biggest headache.
Updated by Norm Tovey-Walsh 5 days ago
- Status changed from In Progress to Resolved
I've published XML Resolver 4.4.0 which includes an option to address this problem. Use the "FIX_WINDOWS_SYSTEM_IDENTIFIERS" feature.
For example, you can set it with a system property:
java "-Dxml.catalog.fixWindowsSystemIdentifiers=true" -cp ...
You can also set it in a configuration file or via the API, depending on what makes the most sense in your environment. You'll need to swap out the XML Resolver 4.2.0 library for the 4.4.0 version. Instructions about that are now on the Saxonica website: https://www.saxonica.com/html/documentation11/about/installationjava/jarfiles.html
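If changing the java command line is inconvenient, the same system property can in principle be set from code. This is a sketch under assumptions: it assumes the resolver reads system properties when its configuration is created, so the call has to run before any Saxon or XML Resolver object is constructed. The property name is the one from the command line above; the class and method names are hypothetical, and the XML Resolver API itself is deliberately not used here.

```java
public class ResolverPropertySetup {

    // Property name taken from the -D option shown in the comment above.
    static final String FIX_WINDOWS_IDS = "xml.catalog.fixWindowsSystemIdentifiers";

    /**
     * Sets the property to "true" unless it was already defined
     * (for example with -D on the command line), and returns the
     * value that is in effect afterwards.
     */
    static String enableFixWindowsSystemIdentifiers() {
        if (System.getProperty(FIX_WINDOWS_IDS) == null) {
            System.setProperty(FIX_WINDOWS_IDS, "true");
        }
        return System.getProperty(FIX_WINDOWS_IDS);
    }

    public static void main(String[] args) {
        System.out.println(FIX_WINDOWS_IDS + "=" + enableFixWindowsSystemIdentifiers());
        // Assumption: create the Saxon processor and run the transformation
        // only after this point, so the resolver sees the property.
    }
}
```

The command-line form is the one confirmed in this thread; prefer it when the launch script can be edited.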
Please let me know if you continue to have difficulty.
Updated by Norm Tovey-Walsh 4 days ago
Thank you for your quick fix!!! Much appreciated.
You'll need to swap out the XML Resolver 4.2.0 library for the 4.4.0 version.
Is the 4.4.0 version already published on the Saxonica web site?
No, I hadn’t considered copying it to the Saxonica web site. You can get
it from Maven or from
Be seeing you,