Project

Profile

Help

How random is a sequence produced with calls to random-number-generator()?

Added by Kai Weber over 1 year ago

A colleague today observed that sequences of numbers or permutations produced with Saxon's implementation of the XPath 3.1 function random-number-generator() look not as random as he would have expected. That got us into discussions on how to properly use this function or pseudo random generators in general.

When calling the function four times with slightly changed seeds, we get quite similar resulting numbers, e. g.

random-number-generator('jkasfghjuaelafsdjflasdf-1')?number,
random-number-generator('jkasfghjuaelafsdjflasdf-2')?number,
random-number-generator('jkasfghjuaelafsdjflasdf-3')?number,
random-number-generator('jkasfghjuaelafsdjflasdf-4')?number

returns

0.7320934054462759
0.7323621507628908
0.7322725689906858
0.731824660129661

I think that's understandable, as Saxon uses Java's standard java.util.Random class, which says that its pseudo random generator is not cryptographically secure. I understand this as saying: Unlike with cryptographically secure RNGs with a non-secure RNG you might get similar outputs for similar inputs / seeds.

So, if we want to get a sequence of random looking numbers, I think we should not call random-number-generator() consecutively with an increasing counter of seeds. Instead I think we should use consecutive calls to the next() function returned by a random-number-generator() call. However, to my understanding of pseudo random number generators, if I want a high probability of a random-looking distribution of a number sequence, I should seed the RNG once and then stick with this seed. (Just like somebody quoted on this Stack Overflow thread: "The 'randomness' of the sequence does not depend on the seed chosen, though it does depend on not reseeding the sequence".

Now, when I was looking at the source code of Saxon-HE's random-number-generator implementation I see that every call to the next() function does actually reseed with a new instance of java.util.Random instead of returning the next number of the previously seeded RNG. Doesn't that potentially weaken the perceived randomness of the generated sequence? Or is my intuition about the proper use of Java's standard RNG Random wrong?

Best regards, Kai Weber


Replies (7)

Please register to reply

RE: How random is a sequence produced with calls to random-number-generator()? - Added by Michael Kay over 1 year ago

Certainly it would be wrong to have an expectation that two random sequences generated from very similar seeds start with very different values; the idea is that any one sequence is random within itself, not that it's completely different from any other sequence starting. (Nevertheless, I'm a little surprised at the behaviour you have demonstrated).

You're right that we appear to be calling new Random(x) on each call of next(), where x is the previous number in the sequence, rather than calling nextDouble() on the original instance of Random. It's possible that this is inefficient (I don't know), but I can't see why it should be incorrect: unlike your example, it's not as if all the calls on new Random(x) are using very similar values of x.

RE: How random is a sequence produced with calls to random-number-generator()? - Added by Kai Weber over 1 year ago

Thanks for your answer!

I am not worried about the overhead of creating new objects the the Random() class, I guess that's negligible. I was rather wondering about contractual guarantees of RNGs. To my understanding an RNG guarantees that a sequence of 1000 numbers looks random if you seed it once and then call next() a 1000 times, but it does not guarantee that a sequence of numbers looks random if you seed it a 1000 times with a random number and then always take the first number. However, this is an abstract and untested subjective notion, not something I have actually confirmed with Saxon's implementation of random-number-generator. It may well be that for all practical purposes the output of your implementation looks random enough, and it may also well be that the Java implementation of java.util.Random.nextDouble() will actually produce a random distribution of "first numbers" when re-seeded again and again, no matter what a general RNG contract would guarantee. Anyway, that's not a question specific to Saxon's XSLT processor, that's something I might try to figure out elsewhere, looking at Java's implementation of the Random class.

Best regards, Kai

RE: How random is a sequence produced with calls to random-number-generator()? - Added by Martin Honnen over 1 year ago

Just for comparison, here is the output of BaseX 10.7 (which with XQuery 3.1 also implements the XPath 3.1 function random-number-generator):

random-number-generator('jkasfghjuaelafsdjflasdf-1')?number,
random-number-generator('jkasfghjuaelafsdjflasdf-2')?number,
random-number-generator('jkasfghjuaelafsdjflasdf-3')?number,
random-number-generator('jkasfghjuaelafsdjflasdf-4')?number

gives e.g.

0.2617995323684167
0.26188912904178285
0.26197871081398794
0.2620682925861929

RE: How random is a sequence produced with calls to random-number-generator()? - Added by Kai Weber over 1 year ago

Interesting. Looking at their source code they seem to have copied some functionality over from Java's Random class, rather than just using it (cf. https://github.com/BaseXdb/basex/blob/10.7/basex-core/src/main/java/org/basex/query/func/fn/FnRandomNumberGenerator.java).

They also seem to calculate some hash from the seed passed to the XPath function, then using the hash as their seed rather than the original value. Their hashing function probably produces similar outputs for similar inputs as well, therefore their actual number output for the same seeds is different from Saxon's implementation, but still similar seeds lead to similar first random numbers.

What most interested me in their implementation was their implementation of the returned next() function, and if I understand it correctly, they're using the same approach as Saxon: They return a new (or at least re-seeded) instance of an RNG. So, that's at least common practice, and most of all: It's in accordance with the W3C specification, which says, next() returns "another random number generator", not that it returns the next number from the currently initialized / seeded RNG. So I need to realize that it's not the implementations that run against my RNG user's intuitions, but it's the spec.

RE: How random is a sequence produced with calls to random-number-generator()? - Added by Vladimir Nesterovsky over 1 year ago

Similar problem with Random was reported in java in the past, see:

https://stackoverflow.com/questions/12282628/why-are-initial-random-numbers-similar-when-using-similar-seeds

There people respond why, and don't recommend to use it in sensitive applications.

In addition author observes that this behavior pertains only to initial value in random sequence but not to next values, so one simple solution would be to drop first value.

RE: How random is a sequence produced with calls to random-number-generator()? - Added by Martin Honnen over 1 year ago

In the context of XSLT 3 I find storing the "current" random-number-generator in an accumulator that is changed when you match the nodes where you need a new "random" number easily allows you to start with e.g. random-number-generator(current-dateTime()) and then use next ?next() to access the ?number using code like e.g.

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="3.0"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  exclude-result-prefixes="#all"
  expand-text="yes">
  
  <xsl:accumulator name="rng" as="map(*)" initial-value="random-number-generator(current-dateTime())">
    <xsl:accumulator-rule match="*" select="$value?next()"/>
  </xsl:accumulator>

  <xsl:mode on-no-match="shallow-copy" use-accumulators="rng"/>
  
  <xsl:template match="*">
    <xsl:comment>Random number: {accumulator-before('rng')?number}</xsl:comment>
    <xsl:next-match/>
  </xsl:template>

  <xsl:template match="/" name="xsl:initial-template">
    <xsl:copy>
      <xsl:apply-templates/>
      <xsl:comment>Run with {system-property('xsl:product-name')} {system-property('xsl:product-version')} at {current-dateTime()}</xsl:comment>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>

RE: How random is a sequence produced with calls to random-number-generator()? - Added by Martin Honnen over 1 year ago

In a pure XPath 3.1 environment you can use random-number-generator with fold-left to generate a sequence of n random numbers e.g. 20 numbers with

fold-left(1 to 20, random-number-generator(current-dateTime()), function ($a, $i) { let $rng := head($a) return ($rng?next(), $rng?number, tail($a)) }) => tail()
    (1-7/7)

    Please register to reply