Support #5601

closed

Latter defined entity is honored rather than former one in Saxon-PE 11.3

Added by Toshihiko Makita almost 2 years ago. Updated almost 2 years ago.

Status: Resolved
Priority: Low
Assignee: -
Category: -
Sprint/Milestone: -
Start date: 2022-07-13
Due date:
% Done: 0%
Estimated time:
Legacy ID:
Applies to branch:
Fix Committed on Branch:
Fixed in Maintenance Release:
Platforms:

Description

I'm testing Saxon-PE 10.8 and 11.3 for a user project. When I convert an XML file that contains an entity reference, Saxon-PE 11.3 honors the later entity definition rather than the earlier one.

I attached the sample file archive: diff-2022-07-13.zip

Reproducing procedure

  1. Unzip diff-2022-07-13.zip
  2. Set the JDK path and the Saxon-PE path in xmllist/test-pe-10.8.ps1 and test-pe-11.3-edit.ps1
  3. Open PowerShell in the xmllist folder
  4. Enter the command "./test-pe-10.8.ps1". This command generates output-10.8.xml from input.xml.
  5. Enter the command "./test-pe-11.3-edit.ps1". This command generates output-11.3.xml from input.xml.
  6. Compare the two results.

My test results

Input XML file

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE book PUBLIC "-//AHF//DTD J2008 4.2-based AHF XML DTD for SM//EN" "ahfsm2-4.dtd">
<book>
    Ohm=&ohrm;
</book>

Entity definition

Entity ohrm is defined in iso-num.ent and AHFproc.ent.

iso-num.ent

<!ENTITY ohrm	"&#x2126;"> <!-- OHM SIGN -->

AHFproc.ent

<!ENTITY ohrm	"&#x03A9;"> <!-- Omega -->

AHFproc.ent is also included after iso-num.ent.

ahfsm2-4.dtd

<!ENTITY % ISOnum PUBLIC "ISO 8879:1986//ENTITIES Numeric and Special Graphic//EN//XML" "iso-num.ent">
%ISOnum;
...
<!ENTITY % AHFproc PUBLIC "-//AHF//ENTITIES for Process//EN//XML" "AHFproc.ent">
%AHFproc;

Output results

output-10.8.xml (The former defined one is adopted)

output-11.3.xml (The latter defined one is adopted)

I think that Saxon-PE 11.3 does not conform to the XML rule for entity definitions: when an entity is declared more than once, the first declaration is the one adopted. If I have misunderstood something, please let me know.
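The first-declaration rule mentioned above can be checked in isolation with a small sketch. It uses Python's standard-library parser purely for illustration (Saxon itself hands parsing to a Java XML parser), and it places both declarations of ohrm in the internal DTD subset, in the same order as the two entity files, instead of loading the real ahfsm2-4.dtd.

```python
import xml.etree.ElementTree as ET

# Both declarations of "ohrm" are placed in the internal DTD subset, in the
# same order in which ahfsm2-4.dtd includes iso-num.ent and AHFproc.ent.
# This is a simplification: the real entity files are external.
DOC = """<?xml version="1.0"?>
<!DOCTYPE book [
  <!ENTITY ohrm "&#x2126;"> <!-- OHM SIGN, declared first -->
  <!ENTITY ohrm "&#x03A9;"> <!-- Omega, declared second -->
]>
<book>Ohm=&ohrm;</book>"""

book = ET.fromstring(DOC)
text = book.text

# True when the parser kept the first declaration (U+2126 OHM SIGN).
first_wins = text == "Ohm=\u2126"
print(first_wins)
```

If the parser follows the rule, the expanded text ends with U+2126 (OHM SIGN) and not U+03A9 (Omega), which matches what output-10.8.xml shows.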

Regards,


Files

saxon-10.8.png (35.8 KB) Toshihiko Makita, 2022-07-13 08:51
saxon-11.3.png (35 KB) Toshihiko Makita, 2022-07-13 08:52
diff-2022-07-13.zip (10.9 KB) Toshihiko Makita, 2022-07-13 08:53
Actions #1

Updated by Michael Kay almost 2 years ago

I haven't checked what the spec says on this, but I'm afraid it's not a Saxon issue, it's an XML parser issue. Saxon hands this off entirely to the XML parser. If you don't like the results of one XML parser, you can always try a different one.

If there's a difference between two Saxon releases, then it seems likely that you're using different XML parsers for some reason.

Actions #2

Updated by Michael Kay almost 2 years ago

On further thought, this could be a side-effect of changes to the resolver mechanism, which mean that external entities are probably being cached by default.

Actions #3

Updated by Toshihiko Makita almost 2 years ago

but I'm afraid it's not a Saxon issue, it's an XML parser issue.

I fully accept your point, but as far as I can see, the two PowerShell script files have no significant differences. Saxon-PE is invoked from the command line with JDK 17.0.3.1 in both cases.

Actions #4

Updated by Norm Tovey-Walsh almost 2 years ago

On further thought, this could be a side-effect of changes to the
resolver mechanism, which mean that external entities are probably
being cached by default.

Perhaps, but that would only happen if the first and second entities were
external parsed entities with the same URI, in which case, I wouldn’t
think it would matter.

Be seeing you,
norm

--
Norm Tovey-Walsh
Saxonica

Actions #5

Updated by Norm Tovey-Walsh almost 2 years ago

I think I've reproduced it. But I definitely don't understand it.

Actions #6

Updated by Norm Tovey-Walsh almost 2 years ago

I get exactly the same result with 10.8 as I do with 11.2, so I don't think the resolver is the culprit. But I also get the second definition, which confuses me.

Actions #7

Updated by Norm Tovey-Walsh almost 2 years ago

It is actually resolver related. The problem, I think, is that you've changed the contents of iso-num.ent without changing its public identifier. Saxon 11 uses a jar file containing a fairly broad collection of standard, unchanging DTDs, entity collections, and schemas. Because

<!ENTITY % ISOnum PUBLIC "ISO 8879:1986//ENTITIES Numeric and Special Graphic//EN//XML" "iso-num.ent">

identifies itself as the ISO 8879:1986//ENTITIES Numeric and Special Graphic//EN//XML entity set, the XML Resolver uses the definition from its data jar file instead of attempting to access it via its system identifier.

The "official" version of the numeric and special graphic entities does not include ohrm at all, so the only definition present is the one in AHFproc.ent.

You can fix this by removing or changing the public identifier on ISOnum, or by putting the new entity definition in its own entity set and including that one in addition to ISOnum.

Saxon 10 has a different mechanism for attempting to resolve standard entity sets, but it appears not to have had that public identifier in its lookup table and you're not attempting to use one of the standard W3C URIs that it would have recognized, so you don't experience the same issue there.
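The effect described above can be sketched with a toy model. This is not the XML Resolver's code or API. The dictionaries stand in for the bundled data jar and for the local files, every function name is made up for illustration, and the replacement public identifier used for the fix is hypothetical.

```python
# Toy model only: NOT the XML Resolver implementation or its API.

ISONUM = "ISO 8879:1986//ENTITIES Numeric and Special Graphic//EN//XML"
AHFPROC = "-//AHF//ENTITIES for Process//EN//XML"

# Stand-in for the bundled data jar: public identifier -> declarations.
# Per the thread, the standard ISOnum set does not declare "ohrm".
BUNDLED = {ISONUM: {}}

# Stand-in for the user's local files: system identifier -> declarations.
LOCAL = {
    "iso-num.ent": {"ohrm": "\u2126"},  # OHM SIGN, added locally
    "AHFproc.ent": {"ohrm": "\u03a9"},  # Omega
}

def resolve(public_id, system_id):
    """Prefer a bundled copy matched by public identifier, else the local file."""
    if public_id in BUNDLED:
        return BUNDLED[public_id]
    return LOCAL[system_id]

def effective_entities(entity_sets):
    """Merge entity sets in inclusion order; the first declaration is binding."""
    merged = {}
    for public_id, system_id in entity_sets:
        for name, value in resolve(public_id, system_id).items():
            merged.setdefault(name, value)
    return merged

# As reported: the standard public identifier hides the edited iso-num.ent.
reported = effective_entities([(ISONUM, "iso-num.ent"), (AHFPROC, "AHFproc.ent")])

# After the suggested fix: a private public identifier (hypothetical value).
fixed = effective_entities([
    ("-//AHF//ENTITIES Customized Numeric and Special Graphic//EN//XML", "iso-num.ent"),
    (AHFPROC, "AHFproc.ent"),
])
```

In this model, reported["ohrm"] is the Omega from AHFproc.ent because the locally edited iso-num.ent is never read, while fixed["ohrm"] is the OHM SIGN because the private identifier has no bundled match and the system identifier is used.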

Hope that's helpful!

Actions #8

Updated by Norm Tovey-Walsh almost 2 years ago

  • Status changed from New to Resolved
Actions #9

Updated by Toshihiko Makita almost 2 years ago

The "official" version of the numeric and special graphic entities does not include ohrm at all, so the only definition present is the one in AHFproc.ent.

OH! My god!

This entity definition file was created by my user, who may have added the "ohrm" entry. I will tell the user to change the public identifier so that it no longer uses the "official" one.

Thank you very much.

Actions #10

Updated by Michael Kay almost 2 years ago

  • Tracker changed from Bug to Support

Reclassifying as "Support" so it doesn't appear on the "resolved bugs" list.
