Bug #5884
closed
SaxonCS 12 doesn't find elements in HTML DOM based on [@class = 'foo'] predicate
100%
Description
SaxonCS fails to select elements with predicates based on the class attribute in HTML DOMs (i.e. those returned by saxon:parse-html or fn:parse-html). A query run as
'C:\Program Files\Saxonica\SaxonCS-12.0\SaxonCS.exe' query -q:.\saxon-parse-html-test1.xq !indent=yes
returns nothing but <?xml version="1.0" encoding="UTF-8"?>.
XQuery sample
saxon:parse-html(unparsed-text('test2.html'))//*:h2[contains(@class, 'foo')]
HTML document:
<!doctype html>
<html>
<head>
<title>Test</title>
</head>
<body>
<h2>h2 1</h2>
<h2 class=foo>h2 2</h2>
</body>
</html>
Saxon EE 12.0 (Java) does find the element, e.g.
<?xml version="1.0" encoding="UTF-8"?>
<h2 xmlns="http://www.w3.org/1999/xhtml" class="foo">h2 2</h2>
Updated by Michael Kay almost 2 years ago
Problem reproduced.
Looks like AttributeGetter is calling a method overload getAttributeValue() that isn't implemented on HtmlNodeWrapper, and the generic implementation on the superclass returns null.
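For readers unfamiliar with this failure mode, here is a minimal Java sketch of how a missing overload can silently return null. It is not Saxon's actual code: every class name below (AttrName, NodeWrapperSketch, BrokenHtmlWrapper, FixedHtmlWrapper, OverloadDemo) is hypothetical, and the two-overload shape of getAttributeValue() is an assumption made only for illustration.

```java
import java.util.Map;

// Hypothetical stand-in for a pre-resolved attribute name.
final class AttrName {
    final String uri;
    final String local;

    AttrName(String uri, String local) {
        this.uri = uri;
        this.local = local;
    }
}

// Hypothetical superclass with two overloads of the same lookup.
abstract class NodeWrapperSketch {
    // Overload 1: every subclass must implement this one.
    abstract String getAttributeValue(String uri, String local);

    // Overload 2: generic default that knows nothing about the node.
    String getAttributeValue(AttrName name) {
        return null;
    }
}

// Implements only overload 1, so overload 2 falls through to the default.
class BrokenHtmlWrapper extends NodeWrapperSketch {
    private final Map<String, String> attrs;

    BrokenHtmlWrapper(Map<String, String> attrs) {
        this.attrs = attrs;
    }

    @Override
    String getAttributeValue(String uri, String local) {
        // Attributes in this sketch are all in no namespace.
        return uri.isEmpty() ? attrs.get(local) : null;
    }
}

// One possible repair: override overload 2 and delegate to overload 1.
class FixedHtmlWrapper extends BrokenHtmlWrapper {
    FixedHtmlWrapper(Map<String, String> attrs) {
        super(attrs);
    }

    @Override
    String getAttributeValue(AttrName name) {
        return getAttributeValue(name.uri, name.local);
    }
}

class OverloadDemo {
    public static void main(String[] args) {
        Map<String, String> attrs = Map.of("class", "foo");
        AttrName cls = new AttrName("", "class");
        // The caller picks overload 2; only the fixed wrapper answers it.
        System.out.println(new BrokenHtmlWrapper(attrs).getAttributeValue(cls));
        System.out.println(new FixedHtmlWrapper(attrs).getAttributeValue(cls));
    }
}
```

In this sketch a caller that happens to use the name-object overload sees no class attribute at all on the broken wrapper, so a predicate such as [@class = 'foo'] would select nothing, which matches the symptom reported above.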
In the course of tracking this in the debugger, I also found an inefficiency in the elaboration code: the default implementation of Elaborator.elaborateForString() is calling elaborateForUnicodeString() on each call, rather than only doing it once and reusing the result.
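The inefficiency can be illustrated with a rough Java sketch, assuming an elaborator is something that builds an evaluator function once so it can be run many times. This is one reading of the comment above, not Saxon's real code: ElaboratorSketch, SuffixElaborator, ElaborationDemo, the two elaborateForString variants, and the elaborations counter are all hypothetical.

```java
import java.util.function.Function;

// Hypothetical elaborator: turns an expression into a reusable evaluator.
abstract class ElaboratorSketch {
    // Counts how often the (potentially costly) elaboration step runs.
    int elaborations = 0;

    abstract Function<String, CharSequence> elaborateForUnicodeString();

    // Inefficient default: re-elaborates inside the evaluator, on every call.
    Function<String, String> elaborateForStringSlow() {
        return ctx -> elaborateForUnicodeString().apply(ctx).toString();
    }

    // Intended default: elaborate once, capture the result, reuse it.
    Function<String, String> elaborateForStringCached() {
        Function<String, CharSequence> inner = elaborateForUnicodeString();
        return ctx -> inner.apply(ctx).toString();
    }
}

// Trivial concrete elaborator whose evaluator appends "!" to its input.
class SuffixElaborator extends ElaboratorSketch {
    @Override
    Function<String, CharSequence> elaborateForUnicodeString() {
        elaborations++;
        return ctx -> ctx + "!";
    }
}

class ElaborationDemo {
    public static void main(String[] args) {
        SuffixElaborator slow = new SuffixElaborator();
        Function<String, String> f = slow.elaborateForStringSlow();
        f.apply("a");
        f.apply("b");
        f.apply("c");

        SuffixElaborator cached = new SuffixElaborator();
        Function<String, String> g = cached.elaborateForStringCached();
        g.apply("a");
        g.apply("b");
        g.apply("c");

        // Compare how many times each variant ran the elaboration step.
        System.out.println(slow.elaborations + " vs " + cached.elaborations);
    }
}
```

Both variants return the same strings; the difference is only in how many times the elaboration step runs, which is why the problem shows up as an inefficiency rather than as wrong output.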
Updated by Michael Kay almost 2 years ago
- Category changed from Saxon extensions to Features new in 4.0
- Status changed from New to Resolved
- Assignee set to Michael Kay
- Applies to branch trunk added
- Fix Committed on Branch 12, trunk added
Updated by O'Neil Delpratt over 1 year ago
- Status changed from Resolved to Closed
- % Done changed from 0 to 100
- Fixed in Maintenance Release 12.1 added
Bug fix applied in the Saxon 12.1 maintenance release.