Project

Profile

Help

Bug #5668

closed

Generated bytecode for string() fails with NullPointerException when argument is ()

Added by Michael Kay over 1 year ago. Updated about 1 year ago.

Status:
Closed
Priority:
Normal
Assignee:
Category:
Byte code generation
Sprint/Milestone:
-
Start date:
2022-08-30
Due date:
% Done:

100%

Estimated time:
Legacy ID:
Applies to branch:
11, trunk
Fix Committed on Branch:
11, trunk
Fixed in Maintenance Release:
Platforms:
Java

Description

A user reports a failure on 11.3, showing an internal error caused by a NullPointerException in generated bytecode.

Although the user reported that the failure occurred on some (but not all) Windows platforms, I have reproduced the failure in the IDE on the 11.x development branch, on MacOS.


Related issues

Has duplicate Saxon - Bug #6115: NullPointerException on StringConverterDuplicateMichael Kay2023-07-042023-07-05

Actions
Actions #1

Updated by Michael Kay over 1 year ago

The failure occurs while executing the generated bytecode for a fairly complex stylesheet function lbs:days_for_nth_weekday_occurrence - the bytecode makes a call to UntypedSequenceConverter.untypedConverter() supplying a StringValue with type annotation xs:untypedAtomic, and content = null.

The offending StringValue appears to be constructed by a call on StringValue.makeUntypedAtomic(UnicodeString value) from within the same bytecode method, supplying a null UnicodeString.

There are a dozen places where the bytecode compiler generates calls to this method. In a number of them, we avoid performing a run-time check on the string being null if static analysis indicates that the check is unnecessary. So a possible cause is that the static type analysis is wrong (inferring a non-null type where in fact the value is nullable).

Actions #2

Updated by Michael Kay over 1 year ago

Need to think about how to approach solving this one. Switching on bytecode tracing seems unpromising, because the stylesheet does a great deal of processing before the problem occurs, which would make the amount of trace output totally unmanageable.

One approach might be to try and cut down the stylesheet to the simplest possible form that generates the failure. Perhaps this could be done by identifying the parameters to the stylesheet function that cause the failure, and generating a standalone call to this function with the same parameter values?

Another angle of attack might be to inject more selective tracing of the specific bytecode locations that call the failing method. At each of the dozen places where calls on makeUntypedAtomic() are generated, we could inject code to output the line number of the expression being compiled.

Or we could explore the theory that the failure is the result of an incorrect type inference. On the paths where we don't check that the value is null because static type inference says it can't be, we could inject a call on a diagnostic method that performs this check, together with information about the expression being compiled.

Actions #3

Updated by Michael Kay over 1 year ago

I tried adding declarations of the required parameter types:

<xsl:param name="date" as="xs:date"></xsl:param>
<xsl:param name="weekday" as="xs:string"></xsl:param>
<xsl:param name="occurrence" as="xs:string"></xsl:param>

to see if this changed anything, but it didn't.

I tried tracing the parameters to the function using xsl:message. The failure occurred after a call with

Date 2022-09-07 weekday Wed occurrence 5

which are values not at all dissimilar to many previous successful calls on the function. Indeed, there has been a previous call with exactly the same parameters (though it's possible that that call was before bytecode generation kicked in).

So I experimented with the option --thresholdForHotspotByteCode and yes, the failure occurs the first time the method is called using generated bytecode, irrespective of the actual function parameters.

Actions #4

Updated by Michael Kay over 1 year ago

I added selective diagnostics to the bytecode at the places where we call makeUntypedAtomic(), and ran with --debugByteCode:on. Unfortunately this causes bytecode generation to fail because the method becomes too large (it exceeds the JVM bytecode limits).

So this is going to be a tough one to debug.

Actions #5

Updated by Michael Kay over 1 year ago

OK, by injecting trace code more selectively I've identified the call site as

Message: StringToUntyped2.makeUntypedAtomic Q{http://www.w3.org/2001/XMLSchema}untypedAtomic(string(convert(data($weekday_dates[Q{http://www.w3.org/2001/XMLSchema}integer($currOccurrence)]))))

That gives us something to be going on, but it will probably have to wait till tomorrow.

Actions #6

Updated by Michael Kay over 1 year ago

On the failure path, it seems that $occurrence is '5', while $weekday_dates is a sequence of 4 text nodes.

So $currOccurrenceDate should be a document node with no content, which would mean that the result of the function should a zero-length text node, which gets converted to a zero-length string by virtue of the fact that the function return type is declared as xs:string. (The code is heavily using xsl:value-of rather than xsl:sequence to return a function result, so we have to look very carefully at code that constructs text nodes and converts them back to strings).

But we don't get that far. The failure occurs while evaluating

xs:untypedAtomic(string(convert(data($weekday_dates[xs:integer($currOccurrence)]))))

apparently because the string() call returns an empty sequence, represented internally as null. The converter to untypedAtomic doesn't allow for this possibility, because string() is never supposed to return null.

The "convert()" function shown here is an AtomicSequenceConverter, and injecting further diagnostics, we see that it is actually converting to xs:string. Since the data() call returns an empty sequence, this converter should also return an empty sequence, which the string() call should convert to a zero-length string.

Debugging into the bytecode generator for string(), it is called with onEmpty=returnNull which is incorrect (this setting means that if the argument is an empty sequence, the result will be an empty sequence, whereas it should be a zero-length string).

The value onEmpty=returnNull is set by StringToUntypedAtomicCompiler.compileToPrimitive which uses the existing value unchanged.

A change to StringFnCompiler so it always returns a zero-length string if the argument is empty, ignoring the caller's onEmpty setting, makes the test case work.

Of course this will need careful regression testing to ensure it has no side-effects.

Actions #7

Updated by Michael Kay over 1 year ago

A note about the way the code is written: the evaluation is very contorted because of the constant conversions between strings and text nodes (and this is where the Saxon bug kicks in). It would be much better to write the function like this:

<xsl:function name="lbs:days_for_nth_weekday_occurrence" as="xs:string">
		<xsl:param name="date" as="xs:date"></xsl:param>
		<xsl:param name="weekday" as="xs:string"></xsl:param>
		<xsl:param name="occurrence" as="xs:string"></xsl:param>
		
		<!--this can likely be optimized for performance so that the dates in a given month don't need to be recomputed each time-->
		<xsl:variable name="weekday_dates" as="xs:string*" select="lbs:dates_in_month(substring(xs:string($date),6,2), substring(xs:string($date),1,4))/dates/date[lower-case(substring(format-date(.,'[F]'),1,3)) = lower-case(substring($weekday,1,3))]" />

		<xsl:variable name="currOccurrence" select="tokenize(normalize-space($occurrence),',')[1]" as="xs:string?"/>
		<xsl:variable name="currOccurrenceDate" as="xs:string?>
			<xsl:choose>
				<xsl:when test="$currOccurrence = 'last'">
					<xsl:sequence select="$weekday_dates[count($weekday_dates)]"/>		
				</xsl:when>
				<xsl:otherwise>
					<xsl:sequence select="$weekday_dates[xs:integer($currOccurrence)]"/>		
				</xsl:otherwise>
			</xsl:choose>									
		</xsl:variable>
		
		<xsl:choose>
			<xsl:when test="contains($occurrence,',')">
				<xsl:variable name="remainingOccurrence" select="normalize-space(substring($occurrence,string-length($currOccurrence)+2))" as="xs:string"/>

				<!--using recursion here to compute the concatenated string of dates separate by an underscore-->
				<xsl:sequence select="concat($currOccurrenceDate,'_',lbs:days_for_nth_weekday_occurrence($date,$weekday,$remainingOccurrence))"></xsl:value-of>
			</xsl:when>
			
			<xsl:otherwise>
				<xsl:sequence select="$currOccurrenceDate"/>
			</xsl:otherwise>
		</xsl:choose>

	</xsl:function>
Actions #8

Updated by Michael Kay over 1 year ago

  • Subject changed from Generated bytecode fails with NullPointerException to Generated bytecode for string() fails with NullPointerException when argument is ()
  • Status changed from New to In Progress
Actions #9

Updated by Michael Kay over 1 year ago

  • Status changed from In Progress to Resolved
  • Fix Committed on Branch 11, trunk added
  • Platforms Java added

The patch is now regression tested so it will go into the next maintenance release.

Actions #10

Updated by Community Admin over 1 year ago

  • % Done changed from 0 to 100
  • Fixed in Maintenance Release 12.0 added

Bug issue fix applied in the Saxon 12.0 Major Release. Leaving this bug marked as Resolved until fix applied

Actions #11

Updated by O'Neil Delpratt about 1 year ago

  • Fixed in Maintenance Release 11.5 added

Bug applied in the Saxon 11.5 Maintenance release.

Actions #12

Updated by O'Neil Delpratt about 1 year ago

  • Status changed from Resolved to Closed
Actions #13

Updated by Michael Kay 10 months ago

  • Has duplicate Bug #6115: NullPointerException on StringConverter added

Please register to edit this issue

Also available in: Atom PDF