Bug #5877

open

Java-CustomGUID generator is producing the same result when used inside foreach loop

Added by Thirupathi Molugoori about 1 year ago. Updated about 1 year ago.

Status: New
Priority: Normal
Assignee:
Category: Saxon extensions
Sprint/Milestone: -
Start date: 2023-02-10
Due date:
% Done: 0%
Estimated time:
Legacy ID:
Applies to branch:
Fix Committed on Branch:
Fixed in Maintenance Release:
Platforms: Java

Description

Input XML:

<?xml version="1.0"?>
<TestXML>
	<Guids>
		<serialNum>1</serialNum>
	</Guids>
	<Guids>
		<serialNum>2</serialNum>
	</Guids>
</TestXML>

Case 1: Using a function inside the XSLT

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
	xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
	xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:func="http://www.xsltfunctions.com"
	xmlns:helper="com.test.xslt.function" extension-element-prefixes="helper" version="2.0">
	<xsl:template match="TestXML">
		<xsl:element name="TestElement">
			<xsl:for-each select="./Guids">
				<xsl:element name="serialNum">
					<xsl:value-of select="serialNum"/>
				</xsl:element>
				<xsl:element name="TestGuid">
					<xsl:value-of select="func:genGuid()"/>
				</xsl:element>
			</xsl:for-each>
		</xsl:element>
	</xsl:template>
	<xsl:function name="func:genGuid">
		<xsl:value-of select="helper:genGUID()"/>
	</xsl:function>
</xsl:stylesheet>

Output:

<?xml version="1.0" encoding="UTF-8"?>
<TestElement>
	<serialNum>1</serialNum>
	<TestGuid>D5CAF5D4-8C05-408A-B56A-A277A8F33975</TestGuid>
	<serialNum>2</serialNum>
	<TestGuid>D5CAF5D4-8C05-408A-B56A-A277A8F33975</TestGuid>
</TestElement>

Case 2: If I call the custom function directly inside the for-each (instead of via a template or function), it also produces the same GUID as above, which is not expected.

Case 3: Using a named template inside the XSLT produces the correct result

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
	xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
	xmlns:xs="http://www.w3.org/2001/XMLSchema"
	xmlns:helper="com.test.xslt.function" extension-element-prefixes="helper" version="2.0">
	<xsl:template match="TestXML">
		<xsl:element name="TestElement">
			<xsl:for-each select="./Guids">
				<xsl:element name="serialNum">
					<xsl:value-of select="serialNum"/>
				</xsl:element>
				<xsl:element name="TestGuid">
					<xsl:call-template name="genGUID"/>
				</xsl:element>
			</xsl:for-each>
		</xsl:element>
	</xsl:template>
	<xsl:template name="genGUID">
		<xsl:value-of select="helper:genGUID()"/>
	</xsl:template>
</xsl:stylesheet>

Output:

<?xml version="1.0" encoding="UTF-8"?>
<TestElement>
	<serialNum>1</serialNum>
	<TestGuid>E75AE452-4B7F-4CF8-A6CE-64DE81687384</TestGuid>
	<serialNum>2</serialNum>
	<TestGuid>771D2C44-8216-4CA7-B43A-2CBEB03AF7F3</TestGuid>
</TestElement>

Custom function implementation:

package com.test.xslt.function;

import java.util.UUID;

import net.sf.saxon.s9api.ExtensionFunction;
import net.sf.saxon.s9api.QName;
import net.sf.saxon.s9api.SaxonApiException;
import net.sf.saxon.s9api.SequenceType;
import net.sf.saxon.s9api.XdmAtomicValue;
import net.sf.saxon.s9api.XdmValue;

public class GUIDGenFunction implements ExtensionFunction {

   @Override
   public QName getName() {
      return new QName( "com.test.xslt.function", "genGUID" );
   }
   @Override
   public SequenceType[] getArgumentTypes() {
      return new SequenceType[]{};
   }

   @Override
   public SequenceType getResultType() {
      return SequenceType.ANY;
   }

   @Override
   public XdmValue call( XdmValue[] arguments ) throws SaxonApiException {
      String result = UUID.randomUUID().toString().toUpperCase();
      System.out.println( "GUID Result " + result );
      return new XdmAtomicValue( result );
   }
}

Registration:

ExtensionFunction genGUIDFunction = new GUIDGenFunction();
processor.registerExtensionFunction( genGUIDFunction );

Please let me know why it is not working as expected in Cases 1 and 2.

Actions #1

Updated by Thirupathi Molugoori about 1 year ago

Actions #2

Updated by Michael Kay about 1 year ago

  • Description updated

Actions #3

Updated by Michael Kay about 1 year ago

Please note the Javadoc for ExtensionFunction:

Extension functions implemented using this interface are expected to be free of side-effects, and to have no dependencies on the static or dynamic context. A richer interface for extension functions is provided via the net.sf.saxon.lib.ExtensionFunctionDefinition class.

A GUID generator is not a pure function because when you call it twice with the same arguments, it produces different results. In that sense it is considered to have side-effects. Integrating impure functions into a functional language like XSLT is challenging.

If you use the richer interface ExtensionFunctionDefinition, you have the opportunity to declare that your function has side-effects. If you do this, the Saxon optimizer will treat it more gently: for example, it will try to avoid function inlining and loop-lifting. However, this isn't a total guarantee, especially if your function is called several layers deep inside other functions.

We've also introduced the instruction saxon:do as a way of invoking functions known to have side-effects, though that's designed primarily for functions that return no result. Again, it's not a complete remedy.

The only clean solution to this problem is to follow the design pattern of fn:random-number-generator(), where making a call on the function returns a new random number generator. This establishes a functional dependency between successive calls. (This is essentially an implementation of the "monad" concept from languages like Haskell.)
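
A minimal Java sketch of that pattern applied to GUIDs may help. Everything in it (GuidGenerator, seeded, guid, next, GuidDemo) is a hypothetical name invented for illustration and is not part of Saxon. It assumes a seeded java.util.Random is an acceptable source of randomness, which it would not be where cryptographic strength matters. The point is only that each generator is immutable and its successor is derived from it, so every call is pure and the caller threads the generator through the loop:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.UUID;

// Hypothetical illustration, not a Saxon API: an immutable generator whose
// successor is derived only from its own state, so every method is pure.
final class GuidGenerator {
    private final String guid;
    private final long nextSeed;

    private GuidGenerator(long seed) {
        Random r = new Random(seed);   // the same seed always yields the same sequence
        long most = r.nextLong();
        long least = r.nextLong();
        // Set the version (4) and variant bits so the value has the layout of a random UUID.
        most = (most & ~0xF000L) | 0x4000L;
        least = (least & 0x3FFFFFFFFFFFFFFFL) | 0x8000000000000000L;
        this.guid = new UUID(most, least).toString().toUpperCase();
        this.nextSeed = r.nextLong();
    }

    static GuidGenerator seeded(long seed) { return new GuidGenerator(seed); }

    // The GUID held by this generator; repeated calls return the same value.
    String guid() { return guid; }

    // A new generator holding the next GUID; this object is left unchanged.
    GuidGenerator next() { return new GuidGenerator(nextSeed); }
}

final class GuidDemo {
    // Threads the generator through the loop, one step per item.
    static List<String> guidsFor(int count, long seed) {
        List<String> out = new ArrayList<>();
        GuidGenerator g = GuidGenerator.seeded(seed);
        for (int i = 0; i < count; i++) {
            out.add(g.guid());
            g = g.next();
        }
        return out;
    }

    public static void main(String[] args) {
        guidsFor(2, System.nanoTime()).forEach(System.out::println);
    }
}
```

Because each value depends on the generator produced by the previous step, there is no call in the loop body that an optimizer could treat as loop-invariant.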

The reason it works with templates and not with functions is that Saxon optimises functions much more aggressively. It is able to do so because functions (by design) have far less context dependency.

Actions #4

Updated by Thirupathi Molugoori about 1 year ago

Thank you, I appreciate your quick response. I have also tried ExtensionFunctionDefinition, but it made no difference to the outcome.

With the same input XML and XSLT (using the function inside the xsl:for-each loop), Saxon 9.1.0.8 produced the correct result, i.e. a different GUID for each iteration. Is there any difference in the implementation between Saxon 9.1.0.8 and Saxon-HE 11.3? Why does it behave differently? Please clarify.

Actions #5

Updated by Martin Honnen about 1 year ago

Did you make sure you implemented https://www.saxonica.com/html/documentation11/javadoc/net/sf/saxon/lib/ExtensionFunctionDefinition.html#hasSideEffects() to return true, to implement Michael's hint "you have the opportunity to declare that your function has side-effects"?

Actions #6

Updated by Thirupathi Molugoori about 1 year ago

I have tried it now. The custom implementation is as follows:

package com.test.xslt.function;

import java.util.UUID;

import net.sf.saxon.expr.XPathContext;
import net.sf.saxon.lib.ExtensionFunctionCall;
import net.sf.saxon.lib.ExtensionFunctionDefinition;
import net.sf.saxon.om.Sequence;
import net.sf.saxon.om.StructuredQName;
import net.sf.saxon.trans.XPathException;
import net.sf.saxon.value.SequenceType;
import net.sf.saxon.value.StringValue;

public class EFDGUIDGenFunction extends ExtensionFunctionDefinition {

   @Override
   public boolean hasSideEffects() {
      return true;
   }
   
   @Override
   public StructuredQName getFunctionQName() {
      return new StructuredQName( "helper", "com.test.xslt.function", "genGUID" );
   }

   @Override
   public SequenceType[] getArgumentTypes() {
      return new SequenceType[]{};
   }

   @Override
   public SequenceType getResultType( SequenceType[] suppliedArgumentTypes ) {
      return SequenceType.ANY_SEQUENCE;
   }

   @Override
   public ExtensionFunctionCall makeCallExpression() {
      return new ExtensionFunctionCall() {
         @Override
         public Sequence call( XPathContext context, Sequence[] arguments ) throws XPathException {
            String result = UUID.randomUUID().toString().toUpperCase();
            System.out.println( "GUID Result From EFD Function " + result );
            return StringValue.makeStringValue( result );
         }
      };
   }

}

ExtensionFunctionDefinition genGUIDFunction = new EFDGUIDGenFunction();
processor.registerExtensionFunction( genGUIDFunction );

It produced the following output:

<?xml version="1.0" encoding="UTF-8"?>
<TestElement>
	<serialNum>1</serialNum>
	<TestGuid>1B35CC86-1D2F-4EDE-B9FC-5F5123F6BF76</TestGuid>
	<serialNum>2</serialNum>
	<TestGuid>1B35CC86-1D2F-4EDE-B9FC-5F5123F6BF76</TestGuid>
</TestElement>

It still produces the same GUID for both iterations. Please let me know if I am missing something here.

Actions #7

Updated by Martin Honnen about 1 year ago

I am not sure what is going wrong. I have tried with the current version, 11.5 HE, and the code

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="3.0"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                exclude-result-prefixes="#all"
                xmlns:guid="com.test.xslt.function"
                expand-text="yes">

    <xsl:mode on-no-match="shallow-copy"/>

    <xsl:output indent="yes"/>

    <xsl:template match="root">
        <xsl:copy>
            <xsl:for-each select="item">
                <xsl:copy>
                    <value>{.}</value>
                    <guid>{guid:genGUID()}</guid>
                </xsl:copy>
            </xsl:for-each>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="/" name="xsl:initial-template">
        <xsl:next-match/>
        <xsl:comment>Run with {system-property('xsl:product-name')} {system-property('xsl:product-version')} {system-property('Q{http://saxon.sf.net/}platform')}</xsl:comment>
    </xsl:template>

</xsl:stylesheet>

and the Java code

package org.example;

import net.sf.saxon.lib.ExtensionFunctionDefinition;
import net.sf.saxon.s9api.Processor;
import net.sf.saxon.s9api.SaxonApiException;
import net.sf.saxon.s9api.Xslt30Transformer;
import net.sf.saxon.s9api.XsltCompiler;

import javax.xml.transform.stream.StreamSource;
import java.io.File;

public class Main {
    public static void main(String[] args) throws SaxonApiException {
        Processor processor = new Processor(false);

        ExtensionFunctionDefinition genGUIDFunction = new EFDGUIDGenFunction();
        processor.registerExtensionFunction( genGUIDFunction );

        XsltCompiler xsltCompiler = processor.newXsltCompiler();

        Xslt30Transformer xslt30Transformer = xsltCompiler.compile(new StreamSource(new File("guid-generator-test2.xsl"))).load30();

        xslt30Transformer.applyTemplates(new StreamSource(new File("sample1.xml")), xslt30Transformer.newSerializer(System.out));
    }
}

and the result output is e.g.

GUID Result From EFD Function 9C2DAE54-B95C-4C0A-A29A-091C4136E713
GUID Result From EFD Function DF570537-85D6-4848-904A-B9F153CE4438
GUID Result From EFD Function 2FCD166B-662B-42D3-A316-977813FCF76A
<?xml version="1.0" encoding="UTF-8"?>
<root>
   <item>
      <value>a</value>
      <guid>9C2DAE54-B95C-4C0A-A29A-091C4136E713</guid>
   </item>
   <item>
      <value>b</value>
      <guid>DF570537-85D6-4848-904A-B9F153CE4438</guid>
   </item>
   <item>
      <value>c</value>
      <guid>2FCD166B-662B-42D3-A316-977813FCF76A</guid>
   </item>
</root>
<!--Run with SAXON HE 11.5 -->

So three different GUIDs.

Actions #8

Updated by Thirupathi Molugoori about 1 year ago

Hi Martin Honnen, yes, with templates it gives different GUIDs, as I mentioned under Case 3 in my first comment. The problem occurs when using a function, or when calling helper:genGUID directly, inside the for-each loop. As I mentioned in my previous comment, this worked with Saxon 9.1.0.8.

Actions #9

Updated by Martin Honnen about 1 year ago

I kind of don't see why you need that wrapper function but can you live with implementing it as

    <xsl:function name="func:genGuid" as="xs:string">
        <xsl:sequence select="guid:genGUID()"/>
    </xsl:function>

? In that case I get new, different GUIDs.

For your case of

	<xsl:function name="func:genGuid">
		<xsl:value-of select="helper:genGUID()"/>
	</xsl:function>

you will need to wait for Michael Kay to say what to do. I would have thought that

	<xsl:function name="func:genGuid" new-each-time="yes">
		<xsl:value-of select="helper:genGUID()"/>
	</xsl:function>

should ensure you get a new value on each call but it doesn't seem to happen.

Actions #10

Updated by Michael Kay about 1 year ago

You can get an insight into how Saxon is optimizing this stylesheet by using the -explain option on the command line. This shows:

OPT : At line 12 of file:/Users/mike/Desktop/temp/test.xsl
OPT : Lifted (convert(mergeAdj(func:genGuid()))) above (<TestElement {Guids!(<serialNum {xsl:value-of}/>, ...)}/>) on line 6
OPT : Expression after rewrite: let $Q{http://saxon.sf.net/generated-variable}v0 := convertTo_xs:string(data(mergeAdj(Q{http://www.xsltfunctions.com}genGuid()))) 

which shows that the call on func:genGuid() has been lifted out of the for-each loop, because Saxon is assuming it will return the same result each time.

One way to prevent that is to switch off loop-lifting entirely, using the option -opt:-l (that's lower-case letter ell).

I'm surprised that setting new-each-time="yes" on the function doesn't prevent loop-lifting -- I'll look into that.

The simplest way of defeating the optimizer on this one is to make the function take a parameter, and call it supplying "position()" as the value of the parameter.
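
The lifting shown in the -explain output, and why supplying position() defeats it, can be modelled roughly in plain Java. This is only an analogy, under the assumption that the optimizer hoists any call that does not depend on the current item; the class and method names are invented for illustration and none of this is Saxon code.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.UUID;
import java.util.function.IntFunction;
import java.util.function.Supplier;

// Hypothetical model of loop-lifting; not Saxon's implementation.
final class LoopLiftingSketch {

    // Stand-in for the impure extension function helper:genGUID().
    static String genGuid() {
        return UUID.randomUUID().toString().toUpperCase();
    }

    // What the stylesheet author expects: one evaluation per item.
    static List<String> perIteration(int items, Supplier<String> call) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < items; i++) {
            out.add(call.get());
        }
        return out;
    }

    // What lifting does: the zero-argument call does not depend on the item,
    // so it is evaluated once into a variable (compare the generated $v0)
    // and that value is reused on every iteration.
    static List<String> lifted(int items, Supplier<String> call) {
        String v0 = call.get();
        List<String> out = new ArrayList<>();
        for (int i = 0; i < items; i++) {
            out.add(v0);
        }
        return out;
    }

    // The workaround: the call takes the position as an argument, so it
    // depends on the loop and has to be evaluated inside it.
    static List<String> withPosition(int items, IntFunction<String> call) {
        List<String> out = new ArrayList<>();
        for (int position = 1; position <= items; position++) {
            out.add(call.apply(position));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println("per iteration: " + new HashSet<>(perIteration(2, LoopLiftingSketch::genGuid)).size() + " distinct");
        System.out.println("lifted:        " + new HashSet<>(lifted(2, LoopLiftingSketch::genGuid)).size() + " distinct");
        System.out.println("with position: " + new HashSet<>(withPosition(2, p -> genGuid())).size() + " distinct");
    }
}
```

In this model the lifted variant always yields a single distinct value, matching the repeated GUID in the reported output, while the other two variants call the generator once per item.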

As to your question regarding Saxon 9.1.0.8, the optimizer has become far more ambitious in the intervening 13 years. Calling extension functions with side-effects has never been fully supported in a reliable way, but the more ambitious the optimizer becomes, the more likely it is that code with side-effects won't work.

Actions #11

Updated by Martin Honnen about 1 year ago

I have had a look at some W3C test cases on new-each-time, such as function-1028. When I adapt it so that two functions are involved, e.g.

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0"
    xmlns:x="http://xxx.com/" xmlns:xs="http://www.w3.org/2001/XMLSchema" exclude-result-prefixes="#all">

    <!-- proactive functions -->

    <xsl:function name="x:test" new-each-time="yes" as="xs:string">
        <xsl:variable name="new-node" as="element()">
            <e></e>
        </xsl:variable>
        <xsl:sequence select="generate-id($new-node)" />
    </xsl:function>

    <xsl:function name="x:genGuid" new-each-time="yes">
        <xsl:value-of select="x:test()"/>
    </xsl:function>

    <xsl:template match="/*">
        <out>
          <xsl:variable name="ids" as="xs:string*">
            <xsl:for-each
                select="(1,4,6,8,3,5,6,2,1,3)">
                <xsl:value-of select="x:genGuid()" />
            </xsl:for-each>
          </xsl:variable>
          <xsl:value-of select="count(distinct-values($ids))"/>
        </out>
    </xsl:template>

</xsl:stylesheet>

Saxon (tested with CS 12 and HE 11.5) gives <?xml version="1.0" encoding="UTF-8"?><out>1</out>, i.e. all ten calls return the same value.

Actions #12

Updated by Michael Kay about 1 year ago

  • Category set to Saxon extensions
  • Assignee set to Michael Kay
  • Priority changed from High to Normal

I'm experimenting with changing the system properties on a user function call so that if the function specifies new-each-time="yes", the function call will be marked with the HAS_SIDE_EFFECTS property, which inhibits loop-lifting.

The only problem is that new-each-time="yes" is the default, which will cause loop-lifting to be inhibited for all user function calls unless we can establish (by analysis of the function body) that the call is actually deterministic.

There are two questions here: (a) is Saxon's current behaviour (loop-lifting the function call) actually conformant with the spec, given that user functions are by default "proactive"? (b) if not, would it cause too many user performance problems if we changed it?
