Bug #5580

closed

Saxon-PE 11.3 fails at resolving external entity

Added by Toshihiko Makita 9 days ago. Updated 4 days ago.

Status: Resolved
Priority: Normal
Category: Third-party product
Sprint/Milestone: -
Start date: 2022-06-24
Due date:
% Done: 0%
Estimated time:
Legacy ID:
Applies to branch: 11, trunk
Fix Committed on Branch:
Fixed in Maintenance Release:
Platforms:

Description

I'm testing Saxon-PE 10.8 and 11.3 for a user project. When I convert an XML file using 10.8, it works without any problem. However, 11.3 reports java.net.URISyntaxException. Here is the screen shot. The test was done on Windows 10 + PowerShell.

PowerShell screen shot

It seems that the Windows path notation "..\master\glossary\gls.ent" in ahfsm-custom.ent is not handled properly. I have attached the ZIP data archive.

diff-2022-06-23.zip

Reproducing procedure:

  1. Unzip diff-2022-06-23.zip
  2. Set the JDK path and Saxon-PE path in xmllist/test-pe-10.8.ps1 and test-pe-11.3.ps1
  3. At folder xmllist, open PowerShell
  4. Enter the command "./test-pe-10.8.ps1". This command ends normally.
  5. Enter the command "./test-pe-11.3.ps1". This command ends with an exception.

Hope this helps to fix the 11.3 problem.


Files

2022-06-24-2.png (92.8 KB) PowerShell screen shot, Toshihiko Makita, 2022-06-24 03:45
diff-2022-06-23.zip (35.3 KB) Test data archive, Toshihiko Makita, 2022-06-24 03:51
2022-06-29-9.png (60.5 KB) Toshihiko Makita, 2022-06-29 13:32
Actions #1

Updated by Michael Kay 9 days ago

  • Category set to Third-party product
  • Assignee set to Norm Tovey-Walsh
  • Applies to branch 11, trunk added
Actions #2

Updated by Norm Tovey-Walsh 9 days ago

I'm testing Saxon-PE 10.8 and 11.3 for a user project. When I convert
an XML file using 10.8, it works without any problem. However, 11.3
reports java.net.URISyntaxException. Here is the screen shot. The test
was done on Windows 10 + PowerShell.

Thanks for the test case. I’ll take a look. I’m suspicious that there’s
an issue with the local encoding. From the screen shot, apparently
“..\master\glossary\gls.ent” is rendered by Windows as
“..¥master¥glossary¥gls.ent” which is a little worrisome.

Be seeing you,
norm

--
Norm Tovey-Walsh
Saxonica

Actions #3

Updated by Toshihiko Makita 9 days ago

Thank you for your notification.

I’m suspicious that there’s an issue with the local encoding.

It is a known font problem specific to the Japanese fonts used for display on Windows.

Backslash & Yen sign behavior

We (Japanese) are so accustomed to this text output that we do not notice it. But people outside Japan will worry about the encoding.

Hope this helps your understanding.

Regards,

Actions #4

Updated by Norm Tovey-Walsh 6 days ago

On closer inspection, I can see what the problem is. In XML, system identifiers aren't filenames, they're URIs. Unescaped backslashes are not valid characters in a URI.

In Saxon 11, we updated the XML resolver used in the product and made catalogs available by default. The new resolver treats the URIs as java.net.URI objects where the old resolver just carried them around as strings. Since the Java URI class is enforcing constraints imposed by the URI specification, it's not clear that there's much we can change.
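The rejection can be reproduced with the plain JDK, outside Saxon. The following is a minimal sketch that uses only java.net.URI; the class name SystemIdCheck and the helper parsesAsUri are hypothetical names chosen for illustration, and this is not the resolver's actual code.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class SystemIdCheck {
    // Returns true if java.net.URI accepts the string as a URI reference.
    public static boolean parsesAsUri(String systemId) {
        try {
            new URI(systemId);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Windows-style system identifier from this report: prints false.
        System.out.println(parsesAsUri("..\\master\\glossary\\gls.ent"));
        // The same path written with forward slashes: prints true.
        System.out.println(parsesAsUri("../master/glossary/gls.ent"));
    }
}
```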

The easiest workaround is to replace the "\" characters in your system identifiers with either "/" characters or encoded backslashes, "%5C".
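As an illustration of the two replacements, here is a small sketch showing that both rewritten forms are accepted by java.net.URI. The class SystemIdFix and its two helper methods are hypothetical; the sketch only checks URI syntax and does not show how the resolver then locates the file.

```java
import java.net.URI;

public class SystemIdFix {
    // Workaround 1: use "/" as the path separator.
    public static String withForwardSlashes(String systemId) {
        return systemId.replace('\\', '/');
    }

    // Workaround 2: percent-encode each backslash as %5C.
    public static String withEncodedBackslashes(String systemId) {
        return systemId.replace("\\", "%5C");
    }

    public static void main(String[] args) {
        String original = "..\\master\\glossary\\gls.ent";
        // URI.create throws IllegalArgumentException on invalid input,
        // so reaching each println means that form parsed successfully.
        System.out.println(URI.create(withForwardSlashes(original)));
        System.out.println(URI.create(withEncodedBackslashes(original)));
    }
}
```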

If neither of those workarounds is practical, let me know and I'll try to think of another answer.

Actions #5

Updated by Norm Tovey-Walsh 6 days ago

  • Status changed from New to AwaitingInfo
  • Priority changed from Low to Normal
Actions #6

Updated by Toshihiko Makita 6 days ago

In XML, system identifiers aren't filenames, they're URIs.

You are absolutely right. The problem is Adobe FrameMaker. The user in question has been using Adobe FrameMaker for over 20 years. As a result, there are tons of these path notations in the CMS. So it is very difficult to tell the user that this notation is not valid as a URI. This is the biggest headache for me.

Actions #7

Updated by Norm Tovey-Walsh 5 days ago

  • Status changed from AwaitingInfo to In Progress

Okay. I'll add a feature to the XML Resolver to fix this problem. I won't be surprised if it happens to other users as well.

Actions #8

Updated by Norm Tovey-Walsh 5 days ago

  • Status changed from In Progress to Resolved

I've published XML Resolver 4.4.0 which includes an option to address this problem. Use the "FIX_WINDOWS_SYSTEM_IDENTIFIERS" feature.

For example, you can set it with a system property:

java "-Dxml.catalog.fixWindowsSystemIdentifiers=true" -cp ...

You can also set it in a configuration file or via the API, depending on what makes the most sense in your environment. You'll need to swap out the XML Resolver 4.2.0 library for the 4.4.0 version. Instructions about that are now on the Saxonica website: https://www.saxonica.com/html/documentation11/about/installationjava/jarfiles.html
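When the JVM command line cannot be changed, the same system property can be set from Java code. This is a sketch only: the class ResolverProps and its method are hypothetical, and it assumes the property is set before Saxon or the XML Resolver first reads its configuration. It uses only the property name given above, not the resolver's own API.

```java
public class ResolverProps {
    // Property name taken from the -D example in this comment.
    public static final String FIX_WINDOWS_IDS =
            "xml.catalog.fixWindowsSystemIdentifiers";

    // Assumption: call this at the very start of main(), before any
    // Saxon or XML Resolver class has been initialised.
    public static void enableWindowsSystemIdFix() {
        System.setProperty(FIX_WINDOWS_IDS, "true");
    }

    public static void main(String[] args) {
        enableWindowsSystemIdFix();
        System.out.println(System.getProperty(FIX_WINDOWS_IDS));
    }
}
```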

Please let me know if you continue to have difficulty.

Actions #9

Updated by Toshihiko Makita 4 days ago

Thank you for your quick fix!!! Much appreciated.

You'll need to swap out the XML Resolver 4.2.0 library for the 4.4.0 version.

Is the 4.4.0 version already published on the Saxonica web site?

Actions #10

Updated by Norm Tovey-Walsh 4 days ago

Thank you for your quick fix!!! Very appreciated.

You'll need to swap out the XML Resolver 4.2.0 library for the 4.4.0 version.

Is 4.4.0 version already published in Saxonica Web site?

No, I hadn’t considered copying it to the Saxonica web site. You can get
it from Maven or from

https://github.com/xmlresolver/xmlresolver/releases/tag/4.4.0

Be seeing you,
norm

--
Norm Tovey-Walsh
Saxonica

Actions #11

Updated by Toshihiko Makita 4 days ago

Thank you, it worked like a charm!

VSCode terminal window
