Basic XPath...

Added by Anonymous about 19 years ago

Legacy ID: #3099535 Legacy Poster: Oliver Cole (stormeagle)

I am trying to get some basic XPath working, and failing. I created the following code based on XPathExample:

    private void go() throws Exception {
        System.setProperty("javax.xml.xpath.XPathFactory:" + NamespaceConstant.OBJECT_MODEL_SAXON,
                "net.sf.saxon.xpath.XPathFactoryImpl");
        XPathFactory xpf = XPathFactory.newInstance(NamespaceConstant.OBJECT_MODEL_SAXON);
        XPath xpe = xpf.newXPath();
        InputSource is = new InputSource(new File("tnt2.html").toURL().toString());
        SAXSource ss = new SAXSource(is);
        NodeInfo doc = ((XPathEvaluator) xpe).setSource(ss);
        xpe.setXPathVariableResolver(this);
        //XPathExpression findStatus = xpe.compile("/HTML/BODY/TABLE[3]/TBODY/TR/TD/TABLE[2]/TBODY/TR[3]/TD[4]");
        XPathExpression findStatus = xpe.compile("/BODY");
        List matchedLines = (List) findStatus.evaluate(doc, XPathConstants.NODESET);
        System.out.println(matchedLines);
    }

The long XPath is what I eventually plan to use, but I can't get anything working at the moment. If I run "/" it returns a TinyDocumentImpl, but if I put in any other query, it returns null... What am I doing wrong? Thanks in advance, Oli


Replies (9)

RE: Basic XPath... - Added by Anonymous about 19 years ago

Legacy ID: #3099612 Legacy Poster: Michael Kay (mhkay)

Could it be that all your elements in the source document are in the XHTML namespace? Try a query without any names to see if that works, e.g. /* or //*:BODY. If the names are in the XHTML namespace (note: this is implicitly declared in the XHTML DTD), then you will need to prefix names in the XPath expression (e.g. /x:HTML/x:BODY) and call setNamespaceContext to bind the namespace prefix. (However, I'm not convinced this is the problem, because your example uses upper case and XHTML requires lower case. I would need to see the source document.) Michael Kay
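Following the suggestion to try a query without names, here is a minimal sketch that reports which namespace the document element is actually in. It assumes the JDK's built-in javax.xml.xpath engine (XPath 1.0) rather than Saxon, and the class name, helper name and two inline sample documents are hypothetical. A real page with a DOCTYPE may also trigger a DTD fetch, which this sketch does not deal with.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class RootNamespaceCheck {
    // Hypothetical helper: returns "{namespace-uri}local-name" for the document element.
    static String describeRoot(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // JAXP parsers are not namespace-aware by default
        Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        XPath xp = XPathFactory.newInstance().newXPath();
        // "/*" uses no names, so it selects the root element whatever its namespace.
        String ns = xp.evaluate("namespace-uri(/*)", doc); // "" means no namespace
        String local = xp.evaluate("local-name(/*)", doc);
        return "{" + ns + "}" + local;
    }

    public static void main(String[] args) throws Exception {
        // prints {http://www.w3.org/1999/xhtml}html
        System.out.println(describeRoot("<html xmlns='http://www.w3.org/1999/xhtml'><body/></html>"));
        // prints {}HTML
        System.out.println(describeRoot("<HTML><BODY/></HTML>"));
    }
}
```

If the first part comes back non-empty, unprefixed names such as /HTML/BODY will not match, which is the situation described above.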

RE: Basic XPath... - Added by Anonymous about 19 years ago

Legacy ID: #3099655 Legacy Poster: Oliver Cole (stormeagle)

Thanks for the quick reply! The source document can be found at
http://www.tnt.com/webtracker/tracking.do?requestType=GEN&searchType=CON&navigation=1&respLang=en&respCountry=GB&genericSiteIdent=&cons=865183640
Yes, it is in the XHTML namespace; I don't know why I deviated into using caps in my XPath. What do I call setNamespaceContext on? Thanks, Oli

RE: Basic XPath... - Added by Anonymous about 19 years ago

Legacy ID: #3099669 Legacy Poster: Michael Kay (mhkay)

setNamespaceContext() is a method on the XPath interface. You need to supply a NamespaceContext object. The only method on the NamespaceContext that Saxon is likely to call is getNamespaceURI(String prefix) which needs to return the namespace URI corresponding to any prefix you have used in the XPath expression. Michael Kay
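A minimal sketch of such a NamespaceContext, binding the hypothetical prefix x to the XHTML namespace. It is written against the JDK's default XPath 1.0 engine so that it runs without Saxon on the classpath; the inline document and helper names are assumptions for illustration. Besides getNamespaceURI, it also fills in getPrefix and getPrefixes, which costs little in case some other implementation does call them.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XhtmlPrefixDemo {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Binds the single (hypothetical) prefix "x" to the XHTML namespace.
    static final NamespaceContext XHTML_CONTEXT = new NamespaceContext() {
        @Override
        public String getNamespaceURI(String prefix) {
            if (prefix == null) throw new IllegalArgumentException("null prefix");
            if ("x".equals(prefix)) return XHTML_NS;
            if (XMLConstants.XML_NS_PREFIX.equals(prefix)) return XMLConstants.XML_NS_URI;
            return XMLConstants.NULL_NS_URI; // unbound prefix
        }

        @Override
        public String getPrefix(String uri) {
            return XHTML_NS.equals(uri) ? "x" : null;
        }

        @Override
        public Iterator<String> getPrefixes(String uri) {
            return XHTML_NS.equals(uri)
                    ? Collections.singletonList("x").iterator()
                    : Collections.<String>emptyIterator();
        }
    };

    // Counts the nodes selected by expr, optionally with a namespace context installed.
    static int countMatches(String xml, String expr, NamespaceContext ctx) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        XPath xp = XPathFactory.newInstance().newXPath();
        if (ctx != null) xp.setNamespaceContext(ctx);
        NodeList nodes = (NodeList) xp.evaluate(expr, doc, XPathConstants.NODESET);
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        String page = "<html xmlns='http://www.w3.org/1999/xhtml'><body><table/></body></html>";
        // Unprefixed names mean "no namespace" in XPath 1.0, so this selects nothing: prints 0
        System.out.println(countMatches(page, "/html/body/table", null));
        // With x bound to the XHTML namespace the table is found: prints 1
        System.out.println(countMatches(page, "/x:html/x:body/x:table", XHTML_CONTEXT));
    }
}
```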

RE: Basic XPath... - Added by Anonymous about 19 years ago

Legacy ID: #3099681 Legacy Poster: Oliver Cole (stormeagle)

OK, so I am now using:

    private void go() throws Exception {
        System.setProperty("javax.xml.xpath.XPathFactory:" + NamespaceConstant.OBJECT_MODEL_SAXON,
                "net.sf.saxon.xpath.XPathFactoryImpl");
        XPathFactory xpf = XPathFactory.newInstance(NamespaceConstant.OBJECT_MODEL_SAXON);
        XPath xpe = xpf.newXPath();
        InputSource is = new InputSource(new File("tnt2.html").toURL().toString());
        SAXSource ss = new SAXSource(is);
        NodeInfo doc = ((XPathEvaluator) xpe).setSource(ss);
        xpe.setXPathVariableResolver(this);
        xpe.setNamespaceContext(new NamespaceContext() {
            public String getNamespaceURI(String prefix) {
                if (prefix.equals("x"))
                    return "http://www.w3.org/1999/xhtml";
                else
                    return null;
            }
            public String getPrefix(String arg0) { return null; }
            public Iterator getPrefixes(String arg0) { return null; }
        });
        //XPathExpression findStatus = xpe.compile("/HTML/BODY/TABLE[3]/TBODY/TR/TD/TABLE[2]/TBODY/TR[3]/TD[4]");
        XPathExpression findStatus = xpe.compile("/x:html/x:body/x:table");
        List matchedLines = (List) findStatus.evaluate(doc, XPathConstants.NODESET);
        System.out.println(matchedLines);
    }

That query string works, but as soon as I add my [3] it starts returning null again. Have I got my syntax wrong? Oli

RE: Basic XPath... - Added by Anonymous about 19 years ago

Legacy ID: #3099720 Legacy Poster: Michael Kay (mhkay)

> That query string works, but as soon as I add my [3] it starts returning null again. Have I got my syntax wrong?

No, it would give you an exception if the syntax were wrong. If it returns null, that's because the expression didn't select any nodes. You haven't shown me the source document, so I can't tell you why that is... Michael Kay
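To illustrate the difference between a syntax error and an expression that is valid but selects nothing, here is a minimal sketch using the JDK's default XPath 1.0 engine rather than Saxon. With this engine a NODESET result comes back as an org.w3c.dom.NodeList, so "nothing selected" shows up as an empty list rather than as null; the sample document and the helper names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class EmptyVersusError {
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // Returns the number of selected nodes, or -1 if the expression is not valid XPath.
    static int countOrError(Document doc, String expr) {
        XPath xp = XPathFactory.newInstance().newXPath();
        try {
            XPathExpression compiled = xp.compile(expr); // syntax errors surface here
            NodeList nodes = (NodeList) compiled.evaluate(doc, XPathConstants.NODESET);
            return nodes.getLength(); // 0 means "valid, but nothing matched"
        } catch (XPathExpressionException e) {
            return -1;
        }
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<body><table/><table/></body>");
        System.out.println(countOrError(doc, "/body/table[2]")); // 1
        System.out.println(countOrError(doc, "/body/table[3]")); // 0: only two tables exist
        System.out.println(countOrError(doc, "/body/table[3"));  // -1: missing ']'
    }
}
```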

RE: Basic XPath... - Added by Anonymous about 19 years ago

Legacy ID: #3099728 Legacy Poster: Oliver Cole (stormeagle)

I showed you the source document in my third post: http://www.tnt.com/webtracker/tracking.do?requestType=GEN&searchType=CON&navigation=1&respLang=en&respCountry=GB&genericSiteIdent=&cons=865183640 Thanks, Oli

RE: Basic XPath... - Added by Anonymous about 19 years ago

Legacy ID: #3099775 Legacy Poster: Michael Kay (mhkay)

It's a very peculiar source document because some of the elements are in a namespace and some aren't (look for the xmlns=""). When you've got a mess like this to cope with, the best way might be to use the construct *:table which matches an element with local name "table" in any namespace (or none). Better, of course, would be to sort the source document out. Michael Kay
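The *:table wildcard is XPath 2.0 syntax, which Saxon supports; the JDK's built-in engine implements only XPath 1.0, where the usual equivalent is a predicate on local-name(). A minimal sketch, assuming a small hypothetical page in which one table has been taken out of the XHTML namespace with xmlns="":

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class AnyNamespaceMatch {
    static int count(String xml, String expr) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // needed so namespace-uri() reflects the xmlns declarations
        Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        XPath xp = XPathFactory.newInstance().newXPath();
        return ((NodeList) xp.evaluate(expr, doc, XPathConstants.NODESET)).getLength();
    }

    public static void main(String[] args) throws Exception {
        // The middle table sits inside xmlns='', so it is in no namespace.
        String page = "<html xmlns='http://www.w3.org/1999/xhtml'><body>"
                + "<table/>"
                + "<div xmlns=''><table/></div>"
                + "<table/>"
                + "</body></html>";
        // XPath 1.0 stand-in for //*:table: any namespace or none -> 3
        System.out.println(count(page, "//*[local-name()='table']"));
        // Only the XHTML-namespace tables -> 2
        System.out.println(count(page,
                "//*[local-name()='table' and namespace-uri()='http://www.w3.org/1999/xhtml']"));
        // Only the no-namespace table -> 1
        System.out.println(count(page, "//*[local-name()='table' and namespace-uri()='']"));
    }
}
```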

RE: Basic XPath... - Added by Anonymous about 19 years ago

Legacy ID: #3099834 Legacy Poster: Oliver Cole (stormeagle)

Hmm, I won't argue with that; the source is definitely far from ideal. However, fixing it isn't really an option, as I'm going to be grabbing it live from the site each time. OK, so I turned off setNamespaceContext, and my query now gets one step further: "/*:html/*:body/*:table[3]" returns a TinyElementImpl, but "/*:html/*:body/*:table[3]/*:tbody" doesn't. I am looking at the DOM in the Firefox DOM inspector, so I don't need to look at the source to know that *:tbody should refer to the only child of that table element. Thanks, Oli
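One possible explanation, offered as an assumption rather than a diagnosis from this thread: when a page is parsed as HTML, the browser inserts an implied tbody wherever a tr appears directly inside a table, so the tree shown by a DOM inspector can contain elements that are not in the source text, whereas an XML parser reports only what is literally there. A minimal sketch with the JDK's XPath 1.0 engine and two hypothetical tables, one written without tbody and one with it, shows how a path copied from the inspector can miss, and how a descendant step (//) tolerates either form:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class TbodyCheck {
    static int count(String xml, String expr) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        XPath xp = XPathFactory.newInstance().newXPath();
        return ((NodeList) xp.evaluate(expr, doc, XPathConstants.NODESET)).getLength();
    }

    public static void main(String[] args) throws Exception {
        // Source text as an XML parser sees it: no tbody element.
        String noTbody = "<html><body><table><tr><td>a</td></tr></table></body></html>";
        // What an HTML parser's DOM would typically contain for the same table.
        String withTbody = "<html><body><table><tbody><tr><td>a</td></tr></tbody></table></body></html>";
        System.out.println(count(noTbody, "/html/body/table/tbody/tr"));  // 0: path copied from the DOM misses
        System.out.println(count(noTbody, "/html/body/table//tr"));       // 1
        System.out.println(count(withTbody, "/html/body/table//tr"));     // 1
    }
}
```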

RE: Basic XPath... - Added by Anonymous about 19 years ago

Legacy ID: #3100236 Legacy Poster: Michael Kay (mhkay)

I'm surprised you've got as far as you have, because your XHTML document is not even well-formed:

    c:\temp>java -jar c:\MyJava\saxon8.jar "http://www.tnt.com/webtracker/tracking.do?requestType=GEN&searchType=CON&navigation=1&respLang=en&respCountry=GB&genericSiteIdent=&cons=865183640" c:\MyJava\samples\styles\identity.xsl !indent=yes
    Warning: Running an XSLT 1.0 stylesheet with an XSLT 2.0 processor
    Error on line 48 column 48 of http://www.tnt.com/webtracker/tracking.do?requestType=GEN&searchType=CON&navigation=1&respLang=en&respCountry=GB&genericSiteIdent=&cons=865183640:
      SXXP0003: Error reported by XML parser: The entity name must immediately follow the '&' in the entity reference.
    Transformation failed: Run-time errors were reported

You can be pretty sure that if your XPath expression isn't selecting anything, that's because it doesn't match anything.
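Errors like the one quoted here come from a bare & that does not start a well-formed entity reference; in XML it has to be written as &amp;, including inside attribute values such as URLs. A minimal sketch with the JDK's built-in parser, using hypothetical snippets rather than the actual TNT page, shows the distinction:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class WellFormedCheck {
    // Returns true if the text parses as XML, false on a well-formedness error.
    static boolean isWellFormed(String xml) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // DefaultHandler ignores warnings and recoverable errors and rethrows fatal ones.
        db.setErrorHandler(new DefaultHandler());
        try {
            db.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isWellFormed("<p>Terms & Conditions</p>"));         // false: bare &
        System.out.println(isWellFormed("<p>Terms &amp; Conditions</p>"));     // true
        System.out.println(isWellFormed("<a href='track.do?a=1&b=2'>x</a>"));  // false: & in a URL
    }
}
```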
