Support #6554

closed

The contains (text()) function does not return all results

Added by Ati Wolf 3 months ago. Updated about 1 month ago.

Status: Closed
Priority: Normal
Assignee:
Category: -
Sprint/Milestone: -
Start date: 2024-10-01
Due date:
% Done: 0%
Estimated time:
Legacy ID:
Applies to branch: 12
Fix Committed on Branch:
Fixed in Maintenance Release:
Platforms: Java

Description

Version: 12.4. Gradle dependency: group: 'net.sf.saxon', name: 'Saxon-HE', version: '12.4'

XPath: //li[contains(text(), 'Text 3')]

In version 12.3 this brings 2 records, while in version 12.4 it brings only 1 piece for the same XPath. I have attached my test file.


Files

test.xml (225 Bytes) Ati Wolf, 2024-10-01 12:59
Actions #1

Updated by Michael Kay 3 months ago

  • Tracker changed from Bug to Support

You haven't said how you are running the XPath expression.

The correct result, for XPath 2.0+, is an error, and this is the result I am getting: the contains() function does not allow a sequence of multiple items to be supplied as the first argument. That's because the second li element has multiple text node children.
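
To make the "multiple text node children" point concrete, here is a minimal sketch that counts the text node children of each li. The sample document is hypothetical (it is not the attached test.xml), and the sketch uses the JDK's built-in XPath engine rather than Saxon, so it only illustrates the node structure, not Saxon's error behaviour.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class TextNodeCount {
    // Hypothetical sample document; an assumption for illustration only.
    static final String SAMPLE =
          "<ul>"
        + "<li>Text 1</li>"
        + "<li>\n  Text 2\n  <b>bold</b>\n</li>"
        + "</ul>";

    // Returns the number of text node children of the li at the given 1-based position.
    static int textChildren(int liPosition) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        String expr = "count(/ul/li[" + liPosition + "]/text())";
        try {
            Double n = (Double) xpath.evaluate(
                    expr, new InputSource(new StringReader(SAMPLE)), XPathConstants.NUMBER);
            return n.intValue();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(textChildren(1)); // prints 1
        System.out.println(textChildren(2)); // prints 2: text before <b>, whitespace after it
    }
}
```

Note that the whitespace-only text after the child element counts as a text node too, so an li that looks like it holds a single piece of text can still make text() return more than one item.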

If you want to tell if the string value of the li element contains a particular substring, the correct expression is //li[contains(., 'Text 3')]. If you want to tell whether any of the text node children contains a particular substring, the correct expression is //li[text()[contains(., 'Text 3')]].

It's possible that you were running in XPath 1.0 compatibility mode, in which case you wouldn't get an error, rather it would ignore all text nodes except the first.
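
As a rough illustration of how the three expressions differ under XPath 1.0 rules, the sketch below runs them with the JDK's built-in XPath 1.0 engine, used here as a stand-in for Saxon's 1.0 compatibility mode. The sample document and the class and method names are hypothetical; this is not the attached test.xml.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ContainsVariants {
    // Hypothetical sample document; an assumption for illustration only.
    static final String SAMPLE =
          "<ul>"
        + "<li>Text 1</li>"
        + "<li>Text 2 <b>bold</b> Text 3</li>"
        + "<li>Text 3</li>"
        + "<li><b>Text 3</b></li>"
        + "</ul>";

    // Returns how many nodes the expression selects in SAMPLE (XPath 1.0 semantics).
    static int matches(String expression) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            NodeList nodes = (NodeList) xpath.evaluate(
                    expression, new InputSource(new StringReader(SAMPLE)), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Only the first text node child of each li is tested.
        System.out.println(matches("//li[contains(text(), 'Text 3')]"));     // prints 1
        // The whole string value of each li is tested, including descendants.
        System.out.println(matches("//li[contains(., 'Text 3')]"));          // prints 3
        // Each text node child is tested individually.
        System.out.println(matches("//li[text()[contains(., 'Text 3')]]"));  // prints 2
    }
}
```

In this sample the second li is missed by the first expression because its first text node is "Text 2 ", which is the silent truncation described above.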

I'm not sure what you mean in your question about a "record" or a "piece" - I think you're using non-technical terms here - and it would help to know more precisely exactly what you were doing and exactly what the results were.

I can't think of any change between 12.3 and 12.4 that would affect this, and indeed, I can't think of any scenario where your expression would deliver multiple results: it should deliver an error in 2.0+ mode or a single item (the third li element) in 1.0 mode.

Actions #2

Updated by Ati Wolf 3 months ago

How can I switch between XPath 1.0 and XPath 2.0?

The goal is to process an XHTML document in a way that works similarly to JS libraries.

Here's how it's used:

net.sf.saxon.xpath.XPathFactoryImpl saxon = new net.sf.saxon.xpath.XPathFactoryImpl();
XPath newXPath = saxon.newXPath();
// Keep the compiled expression; evaluation happens on it, not on the XPath object
XPathExpression compiled = newXPath.compile(expression);

NodeList nodeList = (NodeList) compiled.evaluate(contextNode, XPathConstants.NODESET);

12.3 - This version throws an error

net.sf.saxon.trans.UncheckedXPathException: A sequence of more than one item is not allowed as the first argument of fn:contains() ("
		Text 1
		", "
	") 

12.4 - In this version, the size of the nodeList is 1

Actions #3

Updated by Michael Kay 3 months ago

How can I switch between XPath 1.0 and XPath 2.0?

Using the JAXP XPath API, you can switch to XPath 1.0 compatibility mode by doing

net.sf.saxon.xpath.XPathFactoryImpl saxon = new net.sf.saxon.xpath.XPathFactoryImpl();
XPath newXPath = saxon.newXPath();
((net.sf.saxon.xpath.XPathEvaluator)newXPath).getStaticContext().setBackwardsCompatibilityMode(true);
newXPath.compile(expression);
Actions #4

Updated by Michael Kay 3 months ago

I ran the following Java program against both 12.3 and 12.4:

public static void main(String[] args) throws InterruptedException {

    try {
        XPathFactoryImpl saxon = new XPathFactoryImpl();
        XPath newXPath = saxon.newXPath();
        XPathExpression exp = newXPath.compile("//li[contains(text(), 'Text 3')]");

        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Node root = builder.parse(new File("/Users/mike/...../test.xml"));

        NodeList nodeList = (NodeList) exp.evaluate(root, XPathConstants.NODESET);
        System.out.println(nodeList.getLength());
    } catch (XPathExpressionException | ParserConfigurationException | SAXException | IOException e) {
        throw new RuntimeException(e);
    }

}

In both cases it failed, as expected, saying

Exception in thread "main" net.sf.saxon.trans.UncheckedXPathException
Caused by: net.sf.saxon.trans.XPathException: A sequence of more than one item is not allowed as the first argument of fn:contains() ("
		Text 1
		", "
	")

If I change it to set 1.0 compatibility mode as described above, it outputs "1" under either 12.3 or 12.4.

These results are all correct according to the spec.

If you still think there is a problem, please supply a repro that shows exactly what you are doing so that I can reproduce your results.

Actions #5

Updated by Michael Kay 3 months ago

  • Status changed from New to AwaitingInfo
  • Assignee set to Michael Kay
  • Applies to branch 12 added
Actions #6

Updated by Michael Kay about 1 month ago

  • Status changed from AwaitingInfo to Closed

Closing this as it has gone dormant.
