Regex problem

Added by Anonymous about 16 years ago

Legacy ID: #4972867 Legacy Poster: Dennis Brothers (dbrothers)

I'm not sure whether the problem is in my wetware or in Saxon, but I've got a regex problem that's driving me crazy. I'm using xsl:analyze-string to find and remove empty HTML markup from XML documents.

The regex "&lt;([pP])&gt;[\r\n\s]&lt;/\1&gt;" works fine, locating all empty <P> tags. When I change it to "&lt;([bBpP])&gt;[\r\n\s]&lt;/\1&gt;", to detect <B> as well as <P> tags, it no longer recognizes the <P> tags (nor the <B> tags either).

The above is a simplified test case; I was originally using a regex of "&lt;([bBpPhH]\d?)&gt;[\r\n\s]*&lt;/\1&gt;" to detect <P>, <B>, and <Hn> tags and found it only recognized the <Hn> tags. - Dennis Brothers
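As a sanity check outside Saxon, the full pattern can be tried in a different regex engine. The sketch below uses Python's `re` module purely as an illustration; it is not Saxon's regex engine, and XPath regex syntax differs from Python's in some details, although this pattern only uses constructs the two have in common. The function name `strip_empty_tags` and the sample string are made up for the example. The pattern is the original one with `&lt;` and `&gt;` unescaped.

```python
import re

# The poster's full pattern after XML unescaping (&lt; -> <, &gt; -> >).
# Group 1 captures the tag name; \1 requires the closing tag to repeat it exactly.
EMPTY_TAG = re.compile(r"<([bBpPhH]\d?)>[\r\n\s]*</\1>")


def strip_empty_tags(text: str) -> str:
    """Remove <P>, <B>, and <Hn> elements whose content is only whitespace."""
    return EMPTY_TAG.sub("", text)


# Hypothetical sample data, not taken from the thread.
sample = "<P> </P><B>\n</B><H2></H2><P>kept</P><p> </P>"
print(strip_empty_tags(sample))  # prints: <P>kept</P><p> </P>
```

Note that the back-reference is case-sensitive: `<p> </P>` is left alone because `\1` captured a lowercase `p`. If the pattern behaves here but not in the stylesheet, that points at the surrounding transform or the data rather than at the regex itself.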


Replies (3)

RE: Regex problem - Added by Anonymous about 16 years ago

Legacy ID: #4973918 Legacy Poster: Michael Kay (mhkay)

When reporting a problem like this, it's best to provide your code in a form where anyone can run it. I can't reproduce the problem. The stylesheet

    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
      <xsl:output indent="yes" />
      <xsl:variable name="data"><![CDATA[<P> </P>]]></xsl:variable>
      <xsl:template match="/">
        <a match="{matches($data, '&lt;([bBpP])&gt;[\r\n\s]*&lt;/\1&gt;')}"/>
      </xsl:template>
    </xsl:stylesheet>

(with any input) produces the output

    <?xml version="1.0" encoding="UTF-8"?>
    <a match="true"/>

What platform are you on? Java or .NET? Which Java version? Note that regular expression support in GCJ is pretty fragile, or was when I last tried it.

RE: Regex problem - Added by Anonymous about 16 years ago

Legacy ID: #4973996 Legacy Poster: Dennis Brothers (dbrothers)

I think I'm on Java - I'm using Stylus Studio Enterprise 2008 R2. The stylesheet is large and complex; it processes about 400 MB of data broken up into 19 files. This is the 8th phase of a "cleanup" process; the previous phases, which also used analyze-string in various ways, appear to have functioned correctly. FWIW, this is the first regex in the series that uses a back-reference. Also FWIW, the data contains the escaped entity references (&lt; etc.) rather than using CDATA. When I get to the office in an hour or so I'll try to extract a simple reproducible case. - Dennis Brothers
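On the escaped-entities-versus-CDATA point: once an XML parser has read the document, both forms yield the same text, so on its own that difference should not change what a regex sees. The sketch below illustrates this with Python's standard `xml.etree.ElementTree` parser rather than Saxon; the `<data>` wrapper element is a hypothetical stand-in for the real records.

```python
import xml.etree.ElementTree as ET

# Same content written two ways: escaped entity references vs. a CDATA section.
escaped = ET.fromstring("<data>&lt;P&gt; &lt;/P&gt;</data>")
cdata = ET.fromstring("<data><![CDATA[<P> </P>]]></data>")

# After parsing, both elements hold the identical string "<P> </P>".
print(escaped.text == cdata.text)  # prints: True
```

Double-escaped data (for example `&amp;lt;P&amp;gt;`) would parse to different text; the thread does not say whether that applies here.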

RE: Regex problem - Added by Anonymous about 16 years ago

Legacy ID: #4974271 Legacy Poster: Dennis Brothers (dbrothers)

Well, I eliminated everything in the transform unrelated to the failing analyze-string, and set up a couple of records of test data, and of course it now works. I'll start putting stuff back and see if I can get it to fail again. - Dennis Brothers
