Bug #4792

closed

Python program using SaxonC 1.2.1 HE module with xquery_processor.run_query_to_string() crashes

Added by Anton Shchetikhin over 3 years ago. Updated about 2 years ago.

Status:
Closed
Priority:
Normal
Category:
Python
Start date:
2020-10-08
Due date:
% Done:

100%

Estimated time:
Found in version:
1.2.1
Fixed in version:
11.1
Platforms:

Description

I managed to compile the SaxonC 1.2.1 HE module for Python 3.7 on 64-bit Ubuntu 16.04, but I am having trouble getting some XQuery expressions to run (see attachments).

  1. without_func_call.xquery works fine
Creating new xquery processor -> OK
Prepare xml -> OK
Prepare xquery -> OK
Setup context -> OK
Setup `$document` variable -> OK
Running XQuery -> 
<БлокПроверок xmlns="http://пф.рф/ВС/СЗВ-М/2017-01-01"
              ID="ВСЗЛ.Б-АНКЕТА.1"
              name="Блок проверок по БД анкетных данных">
   <Проверка ID="1">
      <Описание>Указывается СНИЛС, содержащийся в страховом свидетельстве</Описание>
      <РезультатЗапроса>0</РезультатЗапроса>
      <КодРезультата>30</КодРезультата>
   </Проверка>
   <Проверка ID="2">
      <Описание>Указывается ФИО, содержащееся в страховом свидетельстве</Описание>
      <РезультатЗапроса>0</РезультатЗапроса>
      <КодРезультата>30</КодРезультата>
   </Проверка>
   <Проверка ID="3">
      <Описание>Статус ИЛС в реестре 'Застрахованные лица' на дату проверяемого документа не должен быть равен значению 'УПРЗ'</Описание>
      <РезультатЗапроса>0</РезультатЗапроса>
      <КодРезультата>30</КодРезультата>
   </Проверка>
</БлокПроверок>
  2. func_call_with_empty_param.xquery works fine too
Creating new xquery processor -> OK
Prepare xml -> OK
Prepare xquery -> OK
Setup context -> OK
Setup `$document` variable -> OK
Running XQuery -> 
<БлокПроверок xmlns="http://пф.рф/ВС/СЗВ-М/2017-01-01"
              ID="АФ.КСФ.1"
              name="Проверка структуры файла">
   <Проверка ID="1">
      <Описание>Проверяемый файл должен быть корректно заполненным XML-документом</Описание>
      <РезультатЗапроса>
         <Результат xmlns="http://пф.рф/ВС/СЗВ-М/2017-01-01">0</Результат>
      </РезультатЗапроса>
      <КодРезультата>50</КодРезультата>
   </Проверка>
</БлокПроверок>
  3. func_call_with_non_empty_param.xquery doesn't work as expected
Creating new xquery processor -> OK
Prepare xml -> OK
Prepare xquery -> OK
Setup context -> OK
Setup `$document` variable -> OK
Running XQuery -> 
Syntax error on line 124 at column 7 near {...у[x443]л[x43b]ь[x44c]т[x442]а[x430]т[x442]а[x430]> </П[x41f]р[x440]о[x43e]в[x432]е[x435]р[x440]к[x43a]а[x430]> </Б[x411]л[x43b]о[x43e]к[x43a]...} 
  XPST0003: End of input encountered while parsing direct constructor
None

JET RUNTIME HAS DETECTED UNRECOVERABLE ERROR: system exception at 0x00007f3456e437c6
Please, contact the vendor of the application.
Core dump will be piped to "/usr/share/apport/apport %p %s %c %d %P %E"
Extra information about error is saved in the "jet_err_25848.txt" file.

Aborted (core dumped)

A text file with the error and the Python script (test_saxon.py) are attached.

What am I doing wrong? Maybe there's a problem with Cyrillic letters in the function call? I would also like to note that BaseX works as expected in all three cases.


Files

func_call_with_empty_param.xquery (2.25 KB), Anton Shchetikhin, 2020-10-08 19:44
func_call_with_non_empty_param.xquery (4.73 KB), Anton Shchetikhin, 2020-10-08 19:44
test_saxon.py (1.19 KB), Anton Shchetikhin, 2020-10-08 19:44
jet_err_25848.txt (74.4 KB), Anton Shchetikhin, 2020-10-08 19:44
without_func_call.xquery (1.55 KB), Anton Shchetikhin, 2020-10-08 19:44
test.xml (2.4 KB), Anton Shchetikhin, 2020-10-09 14:29
gdb_out.txt (3.78 KB), Anton Shchetikhin, 2020-10-09 14:37
Actions #1

Updated by Martin Honnen over 3 years ago

Can you show the input XML as well?

Also, if you have the XML and XQuery as files, why do you go through etree and the file APIs to feed strings to Saxon? Can't you just let Saxon handle the files, e.g. with parse_xml(xml_file_name='/home/mrslow/dev/scripts_py3/test.xml') and set_query_file('/home/mrslow/dev/scripts_py3/func_call_with_non_empty_param.xquery')? Perhaps Saxon is better able to cope with the file content that way.

Actions #2

Updated by O'Neil Delpratt over 3 years ago

  • Priority changed from Low to Normal
  • Found in version set to 1.2.1

Thanks for reporting your issue.

My initial thought, as mentioned in comment #1, is: why not pass the files directly to Saxon? However, it is good to see how Saxon/C can be used with ElementTree.

This is indeed a bug, as we should not be getting the Jet runtime crash. Jet is probably intercepting a null pointer exception somewhere. I will investigate with the repro you have sent us. I usually run under the gdb debugger, which prevents this interception by Jet and so exposes the underlying problem.

Actions #3

Updated by Anton Shchetikhin over 3 years ago

Thanks for the responses.

  1. My mistake, I forgot to attach the input XML.
  2. I usually get XML and XQuery in binary form, so I read them in the script first.
Actions #4

Updated by O'Neil Delpratt over 3 years ago

Thanks for sharing the gdb output. Very useful indeed. I can see the error:

Syntax error on line 124 at column 7 near {...у[x443]л[x43b]ь[x44c]т[x442]а[x430]т[x442]а[x430]> </П[x41f]р[x440]о[x43e]в[x432]е[x435]р[x440]к[x43a]а[x430]> </Б[x411]л[x43b]о[x43e]к[x43a]...} 
50	
  XPST0003: End of input encountered while parsing direct constructor

It looks like there is a problem with the XQuery string. I am not sure whether this is a legitimate error or an encoding/decoding issue with what is produced from the binary. The encoding you have used to convert the binary to a string (i.e. encoding='utf-8') should work well in Saxon.

What happens next is that you rightly try to print the error message from the XQuery processor. This is where the crash happens, somewhere in get_error_message, because the argument index+1 should be index: xqp.get_error_message(index+1) should be xqp.get_error_message(index). The documentation is not clear on whether the argument is 0-based or 1-based. We will raise a bug against the documentation too: getErrorMessage should use zero-based indexing.
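As a rough illustration of the zero-based convention described above, here is a minimal sketch. StubProcessor is a hypothetical stand-in written for this example, not the SaxonC class, and the exception_count name is an assumption about how the number of pending errors is obtained; only get_error_message is taken from this thread.

```python
class StubProcessor:
    """Hypothetical stand-in for the XQuery processor (not the SaxonC class)."""

    def __init__(self, messages):
        self._messages = list(messages)

    def exception_count(self):
        # Assumed name for "how many errors are pending".
        return len(self._messages)

    def get_error_message(self, index):
        # Zero-based, as described in this comment. This stub raises
        # IndexError when out of range; a native binding may crash instead.
        return self._messages[index]


def collect_errors(xqp):
    """Return all pending error messages using zero-based indexes."""
    return [xqp.get_error_message(i) for i in range(xqp.exception_count())]


xqp = StubProcessor(
    ["XPST0003: End of input encountered while parsing direct constructor"]
)
for message in collect_errors(xqp):
    print(message)
```

Iterating with range(count) keeps every index inside 0..count-1, so the loop never asks for the message one past the end.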

We will investigate further if the query should be accepted or not.

Actions #5

Updated by Anton Shchetikhin over 3 years ago

I have tried to pass the XQuery file path directly to the processor (set_query_file('/home/mrslow/dev/scripts_py3/func_call_with_non_empty_param.xquery')) as Martin Honnen suggested and it works as expected.

Thanks for your advice.

However, working with files isn't a solution to the original problem. For instance, if I want to reuse the same XQuery expression, it will definitely be better to use an in-memory representation rather than reading it from a file each time. But maybe I'm missing something, and there is a 'prepare' statement counterpart for the expressions.

Actions #6

Updated by Anton Shchetikhin over 3 years ago

Hello to everyone. Is there any progress on the investigation?

Actions #7

Updated by O'Neil Delpratt over 3 years ago

Sorry for the delay on this issue. I will look at this later in the week.

Actions #8

Updated by O'Neil Delpratt over 3 years ago

  • Status changed from New to In Progress

Sorry for the delay on this issue. I am currently looking at this bug, running with Python 3.5.

I managed to reproduce the error (XPST0003: End of input encountered while parsing direct constructor), but not the Jet crash.

However, when I upgrade my Python version to 3.7, I can see the Jet crash.

As mentioned in comment #4, if I change the indexing to zero-based (i.e. xqp.get_error_message(i-1)), the Jet crash goes away. We will fix this in the code and the documentation.

Now looking at the query.

Actions #9

Updated by O'Neil Delpratt over 3 years ago

Update:

I can confirm the query looks good. I got the expected result when I ran it with the Saxon/C command-line tool (i.e. in the command directory), and also when I ran the query on Java.

It looks like a Saxon/C API issue. I am investigating it further.

Actions #10

Updated by Anton Shchetikhin over 3 years ago

Thanks for the information.

Actions #11

Updated by O'Neil Delpratt over 3 years ago

  • Category changed from Python to C++ API
Actions #12

Updated by O'Neil Delpratt over 3 years ago

  • Category changed from C++ API to Python

Hi,

I am getting closer to resolving this issue. I can confirm that this bug only applies to the Python API. There seems to be an encoding problem in the internal function make_c_str(), which is used by set_query_content.

The following workaround should be enough to solve the original problem:

In your Python script, where you have the code:

xqp.set_query_content(xq)

Replace it with the following:

xqp.set_property("qs", xq)

The make_c_str function is not used within set_property, so this avoids the encoding issue.
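For a query that arrives as bytes (as described in comment #3), the workaround might be wrapped roughly as follows. This is only a sketch: set_query_from_bytes and RecordingProcessor are hypothetical names introduced here so the snippet runs without SaxonC installed; the only call taken from this thread is set_property("qs", ...).

```python
def set_query_from_bytes(xqp, query_bytes, encoding="utf-8"):
    """Decode the query bytes and pass the text via the "qs" property."""
    query_text = query_bytes.decode(encoding)
    # Workaround from this comment: use set_property instead of
    # set_query_content, which goes through make_c_str.
    xqp.set_property("qs", query_text)
    return query_text


class RecordingProcessor:
    """Hypothetical stand-in that only records set_property calls."""

    def __init__(self):
        self.properties = {}

    def set_property(self, name, value):
        self.properties[name] = value


xqp = RecordingProcessor()
set_query_from_bytes(xqp, "<Проверка ID='1'/>".encode("utf-8"))
print(xqp.properties["qs"])
```

With a real processor object, the same helper would be called in place of xqp.set_query_content(xq).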

Actions #13

Updated by Anton Shchetikhin over 3 years ago

Thanks, it works as expected. Can I expect Python API fixes in the future or help with them?

Actions #14

Updated by O'Neil Delpratt over 3 years ago

Yes indeed. We hope to get a fix for this bug out in the next maintenance release.

Actions #15

Updated by O'Neil Delpratt over 3 years ago

  • Status changed from In Progress to Resolved
  • % Done changed from 0 to 100

Bug fixed. I changed the make_c_str function so that it properly handles UTF-8 when converting to C strings.

Fix available in the next maintenance release.
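The actual change to make_c_str is not shown in this thread. Purely as an illustration of why Cyrillic text is sensitive to this kind of conversion, the sketch below shows that a query's UTF-8 byte length exceeds its character count, so a buffer sized by characters (an assumed failure mode, not a confirmed description of the bug) would cut the query short, which would be consistent with an "End of input encountered" error.

```python
query = "<Блок><Проверка ID='1'/></Блок>"

encoded = query.encode("utf-8")
char_count = len(query)      # number of Unicode code points
byte_count = len(encoded)    # number of UTF-8 bytes (2 per Cyrillic letter)

# Assumed failure mode: copy only char_count bytes into the C buffer.
# errors="ignore" drops a trailing half-character, if any.
truncated = encoded[:char_count].decode("utf-8", errors="ignore")

print(char_count, byte_count)
print(truncated)
```

For pure ASCII queries the two lengths are equal, which would explain why a problem of this kind only shows up with non-ASCII content.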

Actions #16

Updated by Anton Shchetikhin over 3 years ago

Thank you very much.

Actions #17

Updated by O'Neil Delpratt about 2 years ago

  • Tracker changed from Support to Bug
  • Status changed from Resolved to Closed
  • Fixed in version set to 11.1

Bug fix available in the SaxonC 11.1 release
