Different regex behaviour on Windows & Linux

Added by Anonymous over 16 years ago

Legacy ID: #4492664 Legacy Poster: aascon (aascon)

I have a strange problem here. I am using SaxonB 8.9 (Java version) on both Linux and Windows. The code is being developed on a Windows PC but will eventually run in a production environment on a Linux box. However, I get different output from the same stylesheet depending on which machine I run it on.

The input is something like

    <kwd>stars: individual (RX J0052.9-7158, 2E0053.7-7227, SMC X-2)</kwd>

The desired output for this would be

    <kwd>stars: individual<ind>RX J0052.9-7158</ind><ind>2E0053.7-7227</ind><ind>SMC X-2</ind></kwd>

And the relevant code I am using is

    <xsl:analyze-string select="." regex="\s*\(([^)]+)\)s*">
      <xsl:matching-substring>
        <xsl:for-each select="tokenize(regex-group(1),'\s*,\s*')">
          <ind><xsl:value-of select="."/></ind>
        </xsl:for-each>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>

which does indeed produce the desired output on both platforms. However, if the string being matched contains an entity or character reference, the string is still matched on the Windows machine but not on the Linux one! For example,

    <kwd>stars: individual (RX J0052.9&#8722;7158, 2E0053.7-7227, SMC X-2)</kwd>

and

    <kwd>stars: individual (RX J0052.9&minus;7158, 2E0053.7-7227, SMC X-2)</kwd>

produce output of

    <kwd>stars: individual<ind>RX J0052.9&#8722;7158</ind><ind>2E0053.7-7227</ind><ind>SMC X-2</ind></kwd>

on the Windows box, but on the Linux box they are not matched and are passed through as the non-matching substring, e.g.

    <kwd>stars: individual (RX J0052.9&#8722;7158, 2E0053.7-7227, SMC X-2)</kwd>

Has anyone got a clue as to why this is happening?

cheers,
Bruce


Replies (3)

RE: Different regex behaviour on Windows & Li - Added by Anonymous over 16 years ago

Legacy ID: #4492801 Legacy Poster: aascon (aascon)

Whoops, sorry. There's a typo in the regex. I missed off the backslash before the last 's'. It should have read

    regex="\s*\(([^)]+)\)\s*"

RE: Different regex behaviour on Windows & Li - Added by Anonymous over 16 years ago

Legacy ID: #4492893 Legacy Poster: aascon (aascon)

Problem solved (thanks to David Carlisle!). It was down to a difference in Java VMs on the two machines. Upgraded both to 1.6 and this solved it.
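For anyone hitting something similar, a quick way to compare two machines is to print the JVM details and run the same pattern over both the plain-hyphen and the U+2212 (minus sign) input. The following is only a minimal sketch under stated assumptions: it uses java.util.regex directly, so it does not exercise Saxon's own translation of XPath regular expressions, and the class name KwdRegexCheck and the method items are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Saxon: mirrors the stylesheet's
// match-then-tokenize logic using java.util.regex.
final class KwdRegexCheck {
    // Java-regex form of the corrected pattern (literal parentheses escaped).
    static final Pattern GROUP = Pattern.compile("\\s*\\(([^)]+)\\)\\s*");

    // Returns the comma-separated items inside the first parenthesised group,
    // or an empty list when the pattern does not match at all.
    static List<String> items(String kwd) {
        List<String> out = new ArrayList<>();
        Matcher m = GROUP.matcher(kwd);
        if (m.find()) {
            for (String item : m.group(1).split("\\s*,\\s*")) {
                out.add(item);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Report which JVM is actually running, so two machines can be compared.
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println("java.vm.name = " + System.getProperty("java.vm.name"));

        String ascii = "stars: individual (RX J0052.9-7158, 2E0053.7-7227, SMC X-2)";
        // Same keyword, but with U+2212 MINUS SIGN in place of the first hyphen.
        String minus = "stars: individual (RX J0052.9\u22127158, 2E0053.7-7227, SMC X-2)";

        System.out.println("ascii items: " + items(ascii).size());
        System.out.println("minus items: " + items(minus).size());
    }
}
```

If both machines report the same item counts here but the stylesheet output still differs, that would point at the XSLT processor's regex handling on that JVM rather than at java.util.regex itself.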

RE: Different regex behaviour on Windows & Li - Added by Anonymous over 16 years ago

Legacy ID: #4493033 Legacy Poster: Michael Kay (mhkay)

Glad you've solved the problem. But the reason Saxon has different regex code for the two JDK versions is to mask any differences between them, so the results should always be the same. That's the theory. I'll look into what's happening when I get back home. Michael Kay
