
Saxon (.net) started to fail on one of our Production stacks - worked on the other stacks

Added by Ryan Ternier almost 11 years ago

We have 3 identical Production stacks. Our brokers (C1, C2, C3), which process business logic, use Saxon to apply custom XSLT transformations to outgoing web messages.

On Saturday, 27 April 2013, our C2 environment started throwing errors on every XSLT transformation. C1 and C3 processed identical messages without error.

The issue started at 1:00 pm. At 2:30 pm I followed our standard procedure and restarted IIS. That didn't fix it, so I rebooted the server. After the server came back up, it ran properly for 47 minutes, then started failing again.

This is the first time this has ever happened. The server processes around 50k to 400k messages a day, depending on the day of the week.

The XSLT masks specific fields in an XML document. It has nothing to do with dates, so I'm not sure why Saxon failed on the dates (see the error message below). Because the problem was confined to our C2 environment, I know it has nothing to do with our code or our XSLT. I was also able to run the same XML message through multiple servers using our testing scripts.

Environment:
OS: Microsoft Windows Server 2003
Framework: Microsoft .NET 2.0
RAM: 64 GB
Cores: 16

This server also runs JNBridge to allow .NET to communicate with other Java components. I am using the Saxon .NET API.

Any thoughts?

This is the XSLT that Saxon is using:

[The XSLT markup was lost when this post was archived; the only surviving text is the literal "MSK".]

This is the error message we got:




Message: Index was outside the bounds of the array.

   at java.security.AccessController.doPrivileged(Object , AccessControlContext , CallerID )
   at java.security.AccessController.doPrivileged(PrivilegedAction action, CallerID )
   at sun.util.resources.LocaleData.getBundle(String , Locale )
   at sun.util.resources.LocaleData.getCalendarData(Locale locale)
   at java.util.Calendar.setWeekCountData(Locale )
   at java.util.Calendar..ctor(TimeZone zone, Locale aLocale)
   at java.util.GregorianCalendar..ctor(TimeZone zone, Locale aLocale)
   at java.util.GregorianCalendar..ctor()
   at net.sf.saxon.value.DateTimeValue.getCurrentDateTime(XPathContext context)
   at net.sf.saxon.style.UseWhenFilter..ctor(Receiver next)
   at net.sf.saxon.PreparedStylesheet.loadStylesheetModule(Source styleSource)
   at net.sf.saxon.PreparedStylesheet.prepare(Source styleSource)
   at net.sf.saxon.TransformerFactoryImpl.newTemplates(Source source, CompilerInfo info)
   at Saxon.Api.XsltCompiler.Compile(TextReader input)
   at HCIM.DataFilter.DataRestrictionFilter.Filter(XmlDocument toFilter)
   at HCIM.Messaging.MessagingBase.run(XmlNode node)


Replies (12)


RE: Saxon (.net) started to fail on one of our Production stacks - worked on the other stacks - Added by Ryan Ternier almost 11 years ago

Forgot to add:

Saxon version:
saxon9he, Version=9.4.0.2, Culture=neutral, PublicKeyToken=e1fdd002d5083fe6
saxon9he-api, Version=9.4.0.2, Culture=neutral, PublicKeyToken=e1fdd002d5083fe6

RE: Saxon (.net) started to fail on one of our Production stacks - worked on the other stacks - Added by Michael Kay almost 11 years ago

Interesting, and tricky!

It's a crash inside OpenJDK code. Looking at the relevant class here:

http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/security/AccessController.java

it seems to contain native code, which in this case will be IKVM code that bridges the Java AccessController code to methods in the .NET framework. So it seems like an IKVM issue. No doubt the reason it fails on one of your servers and not others is that there is some difference in the .NET configuration.

I'll raise it on the IKVM developers list and hope that Jeroen or Volker can come up with something. They're usually very responsive.

RE: Saxon (.net) started to fail on one of our Production stacks - worked on the other stacks - Added by Ryan Ternier almost 11 years ago

Hey Michael,

Thanks for the quick reply.

I also suspected a .NET configuration issue, so I double-checked all my .config files, and they're similar across the servers.

One thought was a memory issue. We've never encountered this before, and when I restarted the machine it worked fine for 47 minutes, then the same error started again. I'll keep digging.

RE: Saxon (.net) started to fail on one of our Production stacks - worked on the other stacks - Added by Ryan Ternier almost 11 years ago

Last night at 2:00 am it happened on our C3 stack, so it's not a configuration issue. Our C3 stack has the lightest load of all the servers, only around 10k to 30k messages a day. Last night our C1 and C2 stacks worked perfectly fine.



The error message (Index was outside the bounds of the array.) and the stack trace were identical to those in my original post.

RE: Saxon (.net) started to fail on one of our Production stacks - worked on the other stacks - Added by Michael Kay almost 11 years ago

It's very strange because all that Saxon is doing is getting the current date and time! Nothing clever that could involve multithreading or anything like that, and nothing that isn't done on every single run.
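To illustrate why "just getting the current date and time" reaches locale code: the trace runs from DateTimeValue.getCurrentDateTime into the no-argument GregorianCalendar constructor, then Calendar.setWeekCountData(Locale) and LocaleData.getCalendarData. Below is a minimal Java sketch (the trace frames are OpenJDK classes, cross-compiled to .NET by IKVM) showing that the no-argument constructor silently uses the default time zone and default locale, and that week data such as the first day of the week is looked up per locale at construction time. The class name is hypothetical, and Locale.Category assumes Java 7 or later; it is only a demonstration of the JDK behaviour, not a reproduction of the failure.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical demo class; not part of Saxon.
public class CalendarLocaleDemo {

    // What the frame in the trace does: no arguments, so the JDK supplies
    // the default time zone and the default (format) locale itself.
    static Calendar implicitDefaults() {
        return new GregorianCalendar();
    }

    // The same construction with the defaults spelled out (Java 7+ uses the FORMAT category).
    static Calendar explicitDefaults() {
        return new GregorianCalendar(TimeZone.getDefault(),
                                     Locale.getDefault(Locale.Category.FORMAT));
    }

    // Week data comes from per-locale calendar data, read when the calendar is built.
    static int[] weekData(Locale locale) {
        Calendar c = new GregorianCalendar(TimeZone.getTimeZone("UTC"), locale);
        return new int[] { c.getFirstDayOfWeek(), c.getMinimalDaysInFirstWeek() };
    }

    public static void main(String[] args) {
        Calendar a = implicitDefaults();
        Calendar b = explicitDefaults();
        System.out.println("time zone: " + a.getTimeZone().getID() + " / " + b.getTimeZone().getID());
        System.out.println("first day of week: " + a.getFirstDayOfWeek() + " / " + b.getFirstDayOfWeek());
        int[] us = weekData(Locale.US);
        int[] de = weekData(Locale.GERMANY);
        System.out.println("US: " + us[0] + "," + us[1] + "  DE: " + de[0] + "," + de[1]);
    }
}
```

Even a stylesheet that never mentions dates goes through this locale lookup, because the calendar is created whether or not its week data is ever used.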

I raised it on the ikvm-developers list here:

https://sourceforge.net/mailarchive/forum.php?thread_name=27e09687e431497fa1b6c579fcf6302c%40mane.sumatrasoftware.com&forum_name=ikvm-developers

and it seems we're going to have to try to get a more detailed stack trace. I'm not immediately sure how to achieve that. I'm going to ask follow-up questions on the IKVM list, and we're going to conduct some experiments. I don't think it can be a Saxon problem, and Jeroen thinks it's unlikely to be an IKVM problem, but we'll do our best to get you some diagnostics.

RE: Saxon (.net) started to fail on one of our Production stacks - worked on the other stacks - Added by Ryan Ternier almost 11 years ago

It's quite possible it has something to do with the heap.

This has been running smoothly for 3 months, and now it seems to be hitting random servers during batch loads. (All our C brokers are load balanced.)

I looked at the post. Restarting IIS did not fix the problem; sorry if I wasn't clear about that. The only way to resolve it was a full restart of the server.

The code we use to apply the XSLTs:



            // Get the XSLT filters for this organization/environment. There might be many.
            Configuration.DataFilter dfConfig = new HCIM.Configuration.DataFilter(organization, environment);
            // The generic type arguments were lost from the post; <string, string> is assumed here
            // (the value must be the stylesheet text, since it is passed to a StringReader).
            Dictionary<string, string> filters = dfConfig.GetXSLTFilters();

            XmlDocument filteredXML = new XmlDocument();
            filteredXML.LoadXml(toFilter.OuterXml);

            Saxon.Api.Processor proc = new Saxon.Api.Processor();

            // Apply each filter in turn; each one transforms the output of the previous one.
            foreach (KeyValuePair<string, string> filter in filters)
            {
                XdmNode node = proc.NewDocumentBuilder().Build(filteredXML);

                // The stylesheet is compiled again for every message; the reported
                // exception is thrown from inside this Compile call.
                XsltTransformer t = proc.NewXsltCompiler().Compile(new System.IO.StringReader(filter.Value)).Load();
                t.InitialContextNode = node;

                using (System.IO.MemoryStream ms = new System.IO.MemoryStream())
                {
                    Serializer s = new Serializer();
                    s.SetOutputStream(ms);
                    t.Run(s);

                    // Rewind and reload the transformed output as the input to the next filter.
                    ms.Seek(0, System.IO.SeekOrigin.Begin);
                    filteredXML.Load(ms);
                    s.Close();
                }
            }

            return filteredXML;
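The code above compiles every filter stylesheet at least once per message, so the compile path that appears in the stack trace (XsltCompiler.Compile, then PreparedStylesheet.prepare) runs tens of thousands of times a day. A common alternative is to compile each distinct stylesheet once and reuse the compiled form, creating a fresh transformer per message. The sketch below shows only that generic caching pattern in Java; the class name is hypothetical, and in Saxon's .NET API the cached value would presumably be the XsltExecutable returned by Compile (calling Load() per message), which is an assumption not shown or tested here. Caching would reduce the number of compilations; nothing in this thread suggests it would explain or fix the exception.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical helper: runs an expensive "compile" step once per distinct source text.
public final class CompiledCache<T> {
    private final Map<String, T> cache = new ConcurrentHashMap<>();
    private final Function<String, T> compiler;

    public CompiledCache(Function<String, T> compiler) {
        this.compiler = compiler;
    }

    // The first call for a given source compiles it; later calls return the stored result.
    public T get(String source) {
        return cache.computeIfAbsent(source, compiler);
    }

    public static void main(String[] args) {
        // Stand-in "compiler" that reports each time it actually runs.
        CompiledCache<String> cache = new CompiledCache<>(src -> {
            System.out.println("compiling " + src.length() + " chars");
            return "compiled:" + src.hashCode();
        });
        String stylesheet = "stylesheet text";
        for (int message = 0; message < 3; message++) {
            System.out.println(cache.get(stylesheet)); // compiles only on the first message
        }
    }
}
```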

RE: Saxon (.net) started to fail on one of our Production stacks - worked on the other stacks - Added by Ryan Ternier almost 11 years ago

As an update, here are the versions of Java on those boxes:

C1 - Java 1.5
C2 - Java 1.5 and 1.6
C3 - Java 1.5 and 1.6

This has only happened on C2 and C3. I'm not saying this is definitely the cause, but it is an interesting detail. Does Saxon/IKVM compile the JDK at run time, or is it precompiled and shipped with Saxon?

Is there a way to tell it to use a specific version of Java?

Thanks for the help, Michael.

RE: Saxon (.net) started to fail on one of our Production stacks - worked on the other stacks - Added by Michael Kay almost 11 years ago

I think we can be pretty sure that there's nothing wrong with your application code and nothing wrong with the Saxon code. Saxon's asking for the date and time, and that results in an attempt to fetch locale information, and that request hits some kind of security/resource problem deep in the system. The only thing we can do at application level is to collect better diagnostics.

RE: Saxon (.net) started to fail on one of our Production stacks - worked on the other stacks - Added by Michael Kay almost 11 years ago

There's no evidence that your Java JDK has anything to do with the problem. This is all .NET code. It might have originated as Java source code (Saxon plus the OpenJDK library), but it's all cross-compiled to run under .NET, and the JDK isn't involved.

RE: Saxon (.net) started to fail on one of our Production stacks - worked on the other stacks - Added by O'Neil Delpratt almost 11 years ago

Hi Ryan,

Please see the comments raised on the ikvm-developers list:

https://sourceforge.net/mailarchive/forum.php?thread_name=27e09687e431497fa1b6c579fcf6302c%40mane.sumatrasoftware.com&forum_name=ikvm-developers

In summary, you should be able to get the full Java stack trace by wrapping the code in question in the following try/catch statement:


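// At the top of the source file:
using ikvm.extensions;   // IKVM extension methods, including Exception.printStackTrace()

// Around the code in question:
try
{
    ...
}
catch (Exception x)
{
    // Prints the full Java-side stack trace to the console.
    x.printStackTrace();
    throw;   // rethrow so the original failure is still reported to the caller
}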

The Java stack trace is written only to the console window. Could you please post it in this thread so that we can try to establish the cause of the exception you reported?
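An IIS-hosted service usually has no console window to watch, so it may be more practical to capture the trace as text and write it to the application's existing log. In Java the standard way is Throwable.printStackTrace(PrintWriter) into a StringWriter, sketched below; the helper name is hypothetical, and whether IKVM's ikvm.extensions exposes an equivalent overload to C# code is an assumption not confirmed in this thread.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Hypothetical helper: renders a full stack trace, including the "Caused by:" chain, as text.
public final class StackTraceText {

    static String of(Throwable t) {
        StringWriter sw = new StringWriter();
        try (PrintWriter pw = new PrintWriter(sw)) {
            t.printStackTrace(pw);   // same output as printStackTrace(), but into the writer
        }
        return sw.toString();
    }

    public static void main(String[] args) {
        try {
            throw new IllegalStateException("outer", new ArrayIndexOutOfBoundsException("inner"));
        } catch (IllegalStateException e) {
            // In a service, pass this string to the existing logger instead of printing it.
            System.out.println(of(e));
        }
    }
}
```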

Kind regards, O'Neil

RE: Saxon (.net) started to fail on one of our Production stacks - worked on the other stacks - Added by Ryan Ternier almost 11 years ago

Hey Michael,

Thanks,

I've been watching that thread. I'll make the changes during our next iteration.

One thing I've been wondering: the XSLT I'm using doesn't reference dates at all, so why is Saxon loading the Calendar for that transform? There are date fields in the XML, but since the XSLT doesn't target them, I'm wondering why it goes down that path at all.

Cheers,

Ryan

RE: Saxon (.net) started to fail on one of our Production stacks - worked on the other stacks - Added by Michael Kay almost 11 years ago

Saxon gets the current date and time as part of the initialization of the Controller. It could do it lazily on the first call to current-dateTime() or implicit-timezone(), but it doesn't. No particular reason; if it were known to be a high-cost or high-risk operation then we would probably have done it differently.
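For comparison, here is a minimal Java sketch of the lazy alternative described above: the clock is read only the first time the current date/time is requested, and the value is then reused, because XPath defines current-dateTime() as stable for the duration of a transformation. In this sketch, a stylesheet that never asks for the date never reads the clock. The class and method names are hypothetical and do not reflect Saxon's internals; note also that the reported trace reaches getCurrentDateTime from UseWhenFilter during compilation, which a change like this in one place would not necessarily cover.

```java
import java.time.ZonedDateTime;
import java.util.function.Supplier;

// Hypothetical per-transformation state; not Saxon's Controller.
// Not thread-safe: intended as one instance per transformation.
public final class TransformContext {
    private final Supplier<ZonedDateTime> clock;
    private ZonedDateTime currentDateTime;   // null until first requested
    private int clockReads;                  // how many times the clock was actually read

    public TransformContext(Supplier<ZonedDateTime> clock) {
        this.clock = clock;
    }

    // Read the clock on first use only, then return the same value for the rest
    // of the transformation, matching the stability rule for current-dateTime().
    public ZonedDateTime currentDateTime() {
        if (currentDateTime == null) {
            currentDateTime = clock.get();
            clockReads++;
        }
        return currentDateTime;
    }

    public int clockReads() {
        return clockReads;
    }

    public static void main(String[] args) {
        TransformContext ctx = new TransformContext(ZonedDateTime::now);
        System.out.println("clock reads before use: " + ctx.clockReads());
        ZonedDateTime t1 = ctx.currentDateTime();
        ZonedDateTime t2 = ctx.currentDateTime();
        System.out.println("same value: " + t1.equals(t2) + ", clock reads: " + ctx.clockReads());
    }
}
```

The trade-off is a null check on each call in exchange for never touching the clock (and the calendar/locale machinery behind it) when the value isn't needed.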
