Saxon's HttpUrlConnection usages during doc calls for remote resources have no timeout?

Added by Kean Erickson over 5 years ago

Hi there,

Using Saxon EE 9.7.0-2, we've noticed that HttpURLConnection calls for remote resources can hang indefinitely when there are network issues. This is a "feature" of HttpURLConnection that arises when no timeouts (connect timeout and read timeout) are set on the connection. As a result, the thread calling Saxon to run the transformation can hang indefinitely, even long after the connection comes back. I've attached one such thread as it appears in a dump.

The issue is straightforward to reproduce: switch to a wired connection, make a doc() call to a page that takes a minute to respond, and manually disconnect your Ethernet cable (preferably at a switch rather than directly at your machine). I've attached the simple JSP I used for testing; it just calls Thread.sleep() for one minute.

<xsl:copy-of select="doc('http://yoursite.com/wait.jsp')"/>

I was wondering two things:

  1. Is there any way to make Saxon apply timeouts to these kinds of calls via configuration?
  2. Are there any changes to this behavior in newer versions of Saxon?

Thanks!


Replies (4)


RE: Saxon's HttpUrlConnection usages during doc calls for remote resources have no timeout? - Added by Michael Kay over 5 years ago

Saxon relies entirely on the Java runtime for access to remote resources over HTTP (for example, when executing the doc() function, or when fetching stylesheet modules via xsl:include/xsl:import). In fact, in most cases (when we're retrieving XML resources) we don't even fetch them ourselves; we leave this to the XML parser, so the detailed behaviour may depend on which XML parser you are using. Your stack trace shows a case where Apache Xerces is creating the HTTP connection.

This isn't the case for other media types: with unparsed-text(), for example, we fetch the document ourselves, but we use the simplest of calls:

URLConnection connection = absoluteURL.openConnection();
inputStream = connection.getInputStream();

You can intervene in this process at a number of levels.

  • There are Java system properties you can set to influence HTTP client behaviour, though I have never become familiar with them.

  • When the XML parser is fetching the resource, you can write an EntityResolver to take over the job of getting the input stream.

  • When Saxon is fetching the resource, you can write a URIResolver or UnparsedTextResolver to take over the job of getting the input stream.
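
As an illustration of the resolver approach, here is a minimal sketch of a JAXP URIResolver that opens the connection itself and sets both timeouts. The class name TimeoutURIResolver and the timeout values are hypothetical and not part of Saxon; the sketch assumes only http and https URIs need special handling and returns null for everything else, which under the JAXP contract asks the processor to resolve the URI itself.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URLConnection;
import javax.xml.transform.Source;
import javax.xml.transform.TransformerException;
import javax.xml.transform.URIResolver;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper: fetches http/https resources with explicit timeouts.
final class TimeoutURIResolver implements URIResolver {
    private final int connectTimeoutMs;
    private final int readTimeoutMs;

    TimeoutURIResolver(int connectTimeoutMs, int readTimeoutMs) {
        this.connectTimeoutMs = connectTimeoutMs;
        this.readTimeoutMs = readTimeoutMs;
    }

    @Override
    public Source resolve(String href, String base) throws TransformerException {
        try {
            // Resolve href against the base URI when one is supplied.
            URI uri = (base == null || base.isEmpty())
                    ? URI.create(href)
                    : URI.create(base).resolve(href);
            String scheme = uri.getScheme();
            if (scheme == null
                    || !(scheme.equalsIgnoreCase("http") || scheme.equalsIgnoreCase("https"))) {
                return null; // not HTTP: let the processor resolve it as usual
            }
            URLConnection connection = uri.toURL().openConnection();
            connection.setConnectTimeout(connectTimeoutMs); // limit on establishing the connection
            connection.setReadTimeout(readTimeoutMs);       // limit on each blocking read
            InputStream in = connection.getInputStream();
            // The system ID lets the parser resolve relative references inside the document.
            return new StreamSource(in, uri.toString());
        } catch (IOException | IllegalArgumentException e) {
            // Include the URI so a timeout can be traced back to the failing resource.
            throw new TransformerException("Failed to fetch " + href + ": " + e.getMessage(), e);
        }
    }
}
```

Such a resolver could be registered through the standard JAXP call, for example transformer.setURIResolver(new TimeoutURIResolver(5000, 30000)). A timeout then surfaces as a java.net.SocketTimeoutException wrapped in the TransformerException instead of a hung thread.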

RE: Saxon's HttpUrlConnection usages during doc calls for remote resources have no timeout? - Added by Kean Erickson over 5 years ago

Setting a system-wide default is a great idea; I didn't know that was possible. Thanks!
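
For anyone else reading: the JDK's built-in HTTP protocol handler recognises two networking properties, sun.net.client.defaultConnectTimeout and sun.net.client.defaultReadTimeout, both in milliseconds. A minimal sketch of setting them programmatically follows; the class name HttpTimeoutDefaults and the values are hypothetical, and the sketch assumes the properties are set before the first HTTP connection is opened, since the handler may read them only once.

```java
// Hypothetical helper: sets JVM-wide default timeouts for the built-in HTTP handler.
final class HttpTimeoutDefaults {
    private HttpTimeoutDefaults() {
    }

    // Call early in application startup, before any URLConnection is opened.
    static void apply(int connectTimeoutMs, int readTimeoutMs) {
        System.setProperty("sun.net.client.defaultConnectTimeout",
                Integer.toString(connectTimeoutMs));
        System.setProperty("sun.net.client.defaultReadTimeout",
                Integer.toString(readTimeoutMs));
    }

    public static void main(String[] args) {
        apply(5000, 30000); // 5 s to connect, 30 s per read (illustrative values)
        System.out.println(System.getProperty("sun.net.client.defaultConnectTimeout"));
        System.out.println(System.getProperty("sun.net.client.defaultReadTimeout"));
    }
}
```

The same effect can be had without code changes by passing -Dsun.net.client.defaultConnectTimeout=5000 -Dsun.net.client.defaultReadTimeout=30000 on the java command line. These defaults apply to every connection made by the built-in handler in the JVM, including those opened by the XML parser.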

RE: Saxon's HttpUrlConnection usages during doc calls for remote resources have no timeout? - Added by Kean Erickson over 5 years ago

Hey there, one more question on this. I noticed that, after setting the global timeout, a doc() call that hits the timeout results in an IOException, which the Sender class rethrows as an XPathException that includes the problem URI. For an unparsed-text() call, however, the StandardUnparsedTextResolver catches the IOException and throws an XPathException whose message does not include the URI that failed to respond. Is there any reason for that? I know we can write our own unparsed text resolver, but it would be nice if this information were shown to users, as it is for the doc() call.

RE: Saxon's HttpUrlConnection usages during doc calls for remote resources have no timeout? - Added by Michael Kay over 5 years ago

Thanks for pointing this out; I have made some improvements to the error messages for the next release.

Note that you can call setDebugging(true) on the StandardUnparsedTextResolver, in which case Logger.info() and Logger.error() messages are sent to the Logger registered with the Configuration. By default these are output to System.err.
