Bug #5883

closed

SaxonCS 11.5 gives System.NullReferenceException: Object reference not set to an instance of an object on using XPath //*:h2[contains(@class, 'foo')] against DOM constructed from saxon:parse-html

Added by Martin Honnen about 1 year ago. Updated 8 months ago.

Status: Closed
Priority: Normal
Assignee:
Category: .NET API
Sprint/Milestone:
Start date: 2023-02-16
Due date:
% Done: 100%
Estimated time:
Legacy ID:
Applies to branch: 11
Fix Committed on Branch: 11
Fixed in Maintenance Release:
Platforms: .NET

Description

I have an HTML document like

<!doctype html>
<html>
  <head>
    <title>Test</title>
  </head>
  <body>
    <h2>h2 1</h2>
    <h2 class=foo>h2 2</h2>
  </body>
</html>

that I try to process with the SaxonCS 11.5 query command from the command line, e.g.

'C:\Program Files\Saxonica\SaxonCS-11.5\SaxonCS.exe' query -qversion:4.0 -qs:"xquery version '4.0'; saxon:parse-html(unparsed-text('test1.html'))//*:h2[contains(@class, 'foo')]" --allowSyntaxExtensions:on !indent=yes

That gives a NullReferenceException (after the output of the XML declaration <?xml version="1.0" encoding="UTF-8"?>):

System.NullReferenceException: Object reference not set to an instance of an object.
   at Saxon.Impl.Dom.HtmlNodeWrapper.getAttributeValue(String uri, String local)
   at Saxon.Hej.expr.AttributeGetter.evaluateItem(XPathContext context)
   at Saxon.Hej.expr.AtomicSequenceConverter.evaluateItem(XPathContext context)
   at Saxon.Hej.expr.Expression.evaluateAsString(XPathContext context)
   at Saxon.Hej.functions.Contains.Inner_Optimized_2.effectiveBooleanValue(XPathContext context)
   at Saxon.Hej.expr.FilterIterator.NonNumeric.matches()
   at Saxon.Hej.expr.FilterIterator.getNextMatchingItem()
   at Saxon.Hej.expr.FilterIterator.next()
   at Saxon.Hej.om.SequenceTool.supply(SequenceIterator iter, ItemConsumer`1 consumer)
   at Saxon.Hej.expr.Expression.process(Outputter output, XPathContext context)
   at Saxon.Hej.query.XQueryExpression.run(DynamicQueryContext env, Result result, Properties outputProperties)
   at Saxon.Hej.s9api.XQueryEvaluator.run(Destination destination)
   at Saxon.Hej.Query.runQuery(XQueryExecutable exp, XQueryEvaluator evaluator, Source input, Destination destination)
   at Saxon.Hej.Query.doQuery(String[] args, String command)
Fatal error during query: NullReferenceException: Object reference not set to an instance of an object.
Exiting with code 2

I can't test this directly with 12.0, as that release has a bug (a known one, I think) where it cannot resolve the file name/relative file path given in -qs.

I will later test in an .xq file with 12.0.

Actions #1

Updated by Martin Honnen about 1 year ago

I have now tested with SaxonCS 12.0. It doesn't give a NullReferenceException, but strangely it fails to select based on a predicate [@class = 'foo']. I have filed that as a separate bug: https://saxonica.plan.io/issues/5884.

Actions #2

Updated by Michael Kay about 1 year ago

I'll be honest, it's hard to know what to do about parse-html() in SaxonCS 11.x. During development of 12.x, we created a large number of new tests for parse-html(), and these revealed that the implementation based on HtmlAgilityPack was badly broken - partly because the spec had changed as part of the QT4 effort, partly because HtmlAgilityPack doesn't actually implement the HTML5 parsing algorithm, and partly because we had run far too few tests. So I wrote a brand new implementation based on AngleSharp.

I shelved this problem to focus on getting it right for 12.x. The options for 11.x are to withdraw the feature, to retrofit the 12.x implementation, or to "make do and mend". None of these options is particularly attractive.

Actions #3

Updated by Michael Kay about 1 year ago

The HtmlAgilityPack API documentation is appallingly bad. We were calling node.GetAttributes("class") and it returns a list containing a single null. Lacking any documentation for the method, I've no idea why it does that or whether it does it deliberately, but in any case we fall in a heap.

I've changed it to call node.GetAttributeValue("class", "improbable-default-value") and this is working, though it will give the wrong answer if the attribute actually has that value.
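
The caveat about the default value can be illustrated with a small, language-neutral sketch. This is not the HtmlAgilityPack API or the actual SaxonCS code: attributes are modelled as a plain dictionary, and every function name below is hypothetical. It only shows why a reserved default string misreports an attribute that really holds that string, and how an explicit presence check would avoid reserving any value.

```python
# Hypothetical model of an element's attributes; NOT the HtmlAgilityPack API.
SENTINEL = "improbable-default-value"


def get_attribute_value(attrs, name, default):
    """Return the attribute value, or `default` when the attribute is absent."""
    return attrs.get(name, default)


def attribute_or_none_sentinel(attrs, name):
    """Sentinel approach: a result equal to the default is read as 'absent'."""
    value = get_attribute_value(attrs, name, SENTINEL)
    return None if value == SENTINEL else value


def attribute_or_none_checked(attrs, name):
    """Alternative: test for presence first, so no value has to be reserved."""
    return attrs[name] if name in attrs else None


plain_h2 = {}                  # models <h2>h2 1</h2>
foo_h2 = {"class": "foo"}      # models <h2 class=foo>h2 2</h2>
odd_h2 = {"class": SENTINEL}   # attribute that really holds the sentinel

print(attribute_or_none_sentinel(plain_h2, "class"))  # absent attribute
print(attribute_or_none_sentinel(foo_h2, "class"))    # ordinary value
print(attribute_or_none_sentinel(odd_h2, "class"))    # misreported as absent
print(attribute_or_none_checked(odd_h2, "class"))     # value preserved
```

Whether an equivalent presence check is practical against the real library is not established in this thread; the sketch only makes the stated trade-off concrete.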

Actions #4

Updated by Michael Kay about 1 year ago

  • Status changed from New to Resolved
  • Assignee set to Michael Kay
  • Applies to branch 11 added
  • Fix Committed on Branch 11 added

Actions #5

Updated by Debbie Lockett 8 months ago

  • Status changed from Resolved to Closed
  • % Done changed from 0 to 100
  • Fixed in Maintenance Release 11.6 added

Bug fix applied in the Saxon 11.6 maintenance release.
