Bug #5884

closed

SaxonCS 12 doesn't find elements in HTML DOM based on [@class = 'foo'] predicate

Added by Martin Honnen about 1 year ago. Updated about 1 year ago.

Status: Closed
Priority: Normal
Assignee:
Category: Features new in 4.0
Sprint/Milestone: -
Start date: 2023-02-16
Due date:
% Done: 100%
Estimated time:
Legacy ID:
Applies to branch: 12, trunk
Fix Committed on Branch: 12, trunk
Fixed in Maintenance Release:
Platforms: .NET

Description

I get a strange failure in SaxonCS when selecting elements with predicates based on the class attribute in HTML DOMs (i.e. those returned by saxon:parse-html or fn:parse-html). Running a command like 'C:\Program Files\Saxonica\SaxonCS-12.0\SaxonCS.exe' query -q:.\saxon-parse-html-test1.xq !indent=yes returns nothing but <?xml version="1.0" encoding="UTF-8"?>.

XQuery sample

saxon:parse-html(unparsed-text('test2.html'))//*:h2[contains(@class, 'foo')]

HTML document:

<!doctype html>
<html>
  <head>
    <title>Test</title>
  </head>
  <body>
    <h2>h2 1</h2>
    <h2 class=foo>h2 2</h2>
  </body>
</html>

Saxon EE 12.0 (Java) does find the element, e.g.

<?xml version="1.0" encoding="UTF-8"?>
<h2 xmlns="http://www.w3.org/1999/xhtml" class="foo">h2 2</h2>
Actions #1

Updated by Michael Kay about 1 year ago

Problem reproduced.

Looks like AttributeGetter is calling an overload of getAttributeValue() that isn't implemented on HtmlNodeWrapper, so the generic implementation on the superclass is used, and that returns null.
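
To illustrate the kind of failure described here, the following is a minimal Java sketch, not Saxon's actual code. All class names (NodeWrapperSketch, BuggyHtmlNodeSketch, FixedHtmlNodeSketch, AttributeOverloadDemo) and both method signatures are hypothetical stand-ins; the only thing taken from the report is the pattern: a subclass implements one overload of getAttributeValue(), the caller uses another, and the inherited generic version returns null.

```java
import java.util.Map;

// Hypothetical stand-in for a generic node wrapper; NOT Saxon's real class.
abstract class NodeWrapperSketch {
    // Overload 1: look up an attribute by local name only.
    abstract String getAttributeValue(String localName);

    // Overload 2: look up by namespace URI plus local name.
    // The generic default knows nothing about the underlying DOM, so it returns null.
    String getAttributeValue(String uri, String localName) {
        return null;
    }
}

// Assumed shape of the bug: only the one-argument overload is implemented.
class BuggyHtmlNodeSketch extends NodeWrapperSketch {
    private final Map<String, String> attributes;

    BuggyHtmlNodeSketch(Map<String, String> attributes) {
        this.attributes = attributes;
    }

    @Override
    String getAttributeValue(String localName) {
        return attributes.get(localName);
    }
    // The two-argument overload is inherited unchanged and always yields null.
}

// Assumed shape of a fix: also override the overload that the caller actually uses.
class FixedHtmlNodeSketch extends BuggyHtmlNodeSketch {
    FixedHtmlNodeSketch(Map<String, String> attributes) {
        super(attributes);
    }

    @Override
    String getAttributeValue(String uri, String localName) {
        // In this sketch, attributes live in no namespace only.
        return uri.isEmpty() ? getAttributeValue(localName) : null;
    }
}

class AttributeOverloadDemo {
    // Mimics a predicate such as [@class = 'foo'] evaluated via the two-argument overload.
    static boolean classEquals(NodeWrapperSketch node, String expected) {
        return expected.equals(node.getAttributeValue("", "class"));
    }

    public static void main(String[] args) {
        Map<String, String> attrs = Map.of("class", "foo");
        System.out.println(classEquals(new BuggyHtmlNodeSketch(attrs), "foo")); // false
        System.out.println(classEquals(new FixedHtmlNodeSketch(attrs), "foo")); // true
    }
}
```

With the buggy wrapper the predicate silently matches nothing, which is consistent with the empty query result in the description.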

While tracking this in the debugger, I also found an inefficiency in the elaboration code: the default implementation of Elaborator.elaborateForString() calls elaborateForUnicodeString() on each call, rather than doing it only once and reusing the result.
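
The inefficiency can be pictured with a small Java sketch. This is an assumption about the general shape of the problem, not Saxon's Elaborator: the class ElaboratorSketch, the use of Function as the evaluator type, CharSequence as a stand-in for a Unicode string, and the counter field are all hypothetical. The slow variant re-runs the elaboration step inside the returned evaluator, so it is repeated on every evaluation; the fast variant elaborates once and captures the result.

```java
import java.util.function.Function;

// Hypothetical sketch; the names and signatures are NOT Saxon's real API.
class ElaboratorSketch {
    // Counts how many times the (notionally expensive) elaboration step runs.
    int elaborations = 0;

    // Stand-in for elaborateForUnicodeString(): builds an evaluator closure.
    Function<String, CharSequence> elaborateForUnicodeString() {
        elaborations++;
        return input -> input.toUpperCase();
    }

    // Inefficient shape: elaboration happens inside the returned evaluator,
    // so it is repeated every time the evaluator is applied.
    Function<String, String> elaborateForStringSlow() {
        return input -> elaborateForUnicodeString().apply(input).toString();
    }

    // Intended shape: elaborate once, then reuse the captured evaluator.
    Function<String, String> elaborateForStringFast() {
        Function<String, CharSequence> inner = elaborateForUnicodeString();
        return input -> inner.apply(input).toString();
    }

    public static void main(String[] args) {
        ElaboratorSketch slow = new ElaboratorSketch();
        Function<String, String> slowEval = slow.elaborateForStringSlow();
        for (int i = 0; i < 3; i++) slowEval.apply("foo");
        System.out.println(slow.elaborations); // 3

        ElaboratorSketch fast = new ElaboratorSketch();
        Function<String, String> fastEval = fast.elaborateForStringFast();
        for (int i = 0; i < 3; i++) fastEval.apply("foo");
        System.out.println(fast.elaborations); // 1
    }
}
```

Both variants return the same strings; only the number of elaboration steps differs, which is why this kind of issue tends to show up in a debugger or profiler rather than in test results.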

Actions #2

Updated by Michael Kay about 1 year ago

  • Category changed from Saxon extensions to Features new in 4.0
  • Status changed from New to Resolved
  • Assignee set to Michael Kay
  • Applies to branch trunk added
  • Fix Committed on Branch 12, trunk added
Actions #3

Updated by O'Neil Delpratt about 1 year ago

  • Status changed from Resolved to Closed
  • % Done changed from 0 to 100
  • Fixed in Maintenance Release 12.1 added

Bug fix applied in the Saxon 12.1 maintenance release.
