Bug #6274

parse_json on PySaxonProcessor is broken: has a documented keyword "encoding" argument but throws error for more than one keyword parameter, gives error with json_text

Added by Martin Honnen about 1 year ago. Updated 12 months ago.

Status:
Closed
Priority:
Normal
Category:
Python API
Start date:
2023-12-02
Due date:
% Done:
100%

Estimated time:
Applies to branch:
Fix Committed on Branch:
Fixed in Maintenance Release:
Found in version:
12.4.1
Fixed in version:
12.4.2
SaxonC Languages:
SaxonC Platforms:
SaxonC Architecture:

Description

https://www.saxonica.com/saxon-c/doc12/html/saxonc.html#PySaxonProcessor-parse_json says:

parse_json(self, **kwds)
parse_xml(self, **kwds) [sic!]
Parse a source JSON document supplied as a lexical representation, source file or uri, and return it as an XDM value
Args:
    **kwds: Possible keyword arguments: must be one of the following (json_file_name|json_text).
    Also accept the keyword 'encoding' (str) to specify the encoding of the json_text string. If not specified then the platform default encoding is used.
 
Returns:
    PyXdmValue: The XDM value representation of the JSON document
Raises:
    Exception: Error if the keyword argument is not one of json_file_name|xml_text|xml_uri. [sic!]

However, when I try to use the encoding keyword argument together with the json_text keyword argument, I get the error "Error: parse_json should only contain one of the following keyword arguments: (json_file_name|json_text)", probably because the code checks:

      py_error_message = "Error: parse_json should only contain one of the following keyword arguments: (json_file_name|json_text)"
        if len(kwds) != 1:
          raise Exception(py_error_message)
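
A minimal sketch of a check that would accept the documented combination, assuming the intent is exactly one source keyword plus an optional encoding. The helper name validate_parse_json_kwds is hypothetical and not part of saxonche; it only models the keyword check in plain Python.

```python
# Hypothetical helper, not part of saxonche: models the keyword check only.
SOURCE_KEYS = ("json_file_name", "json_text")
ALLOWED_KEYS = SOURCE_KEYS + ("encoding",)


def validate_parse_json_kwds(**kwds):
    """Return the single source keyword, or raise if the combination is invalid."""
    unknown = [key for key in kwds if key not in ALLOWED_KEYS]
    if unknown:
        raise ValueError(f"parse_json got unexpected keyword arguments: {unknown}")
    # Count only the source keywords, so 'encoding' may accompany one of them.
    sources = [key for key in SOURCE_KEYS if key in kwds]
    if len(sources) != 1:
        raise ValueError(
            "parse_json needs exactly one of the following keyword arguments: "
            "(json_file_name|json_text)"
        )
    return sources[0]
```

With this shape, json_text plus encoding passes, while encoding on its own or both source keywords together are still rejected.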

Further down, several lines handle an optional encoding argument, but frustratingly execution never reaches them when more than one keyword is passed.

        cdef char * c_json_string = NULL
        cdef char * c_encoding_string = NULL
        encoding = None

        if "encoding" in kwds:
            encoding = kwds["encoding"]
        py_encoding_string = encoding.encode(encoding)
        c_encoding_string = py_encoding_string

        if "json_text" in kwds:
          py_value = kwds["json_text"]
          if py_value is None:
              raise Exception("JSON text is None")
          py_text_string = py_value.encode(encoding if encoding is not None else sys.getdefaultencoding()) if py_value is not None else None
          c_json_string = py_text_string if py_value is not None else ""
          if c_json_string == NULL:
              raise Exception("Error converting JSON text")
          val = PyXdmValue()
          val.thisvptr = self.thisptr.parseJsonFromString(c_json_string, c_encoding_string if encoding is not None else NULL)
          if val.thisvptr is NULL:
              return None
          val.thisvptr.incrementRefCount()
          return val

Worse, even when I pass only json_text and no second keyword argument, the call raises the exception 'NoneType' object has no attribute 'encode'. Sample test program:

from saxonche import *

with PySaxonProcessor() as saxon_proc:
    print(saxon_proc.version)

    json1 = """{ "test" : "This is a test. Price is higher than 25 €. " }"""

    try:
        parsed_json1 = saxon_proc.parse_json(json_text=json1)
        print(parsed_json1)
    except PySaxonApiError as e:
        print(e.message)
    except Exception as e:
        print(e)

    parse_json_fn = PyXdmFunctionItem().get_system_function(saxon_proc, '{http://www.w3.org/2005/xpath-functions}parse-json', 1)

    try:
        parsed_json1 = parse_json_fn.call(saxon_proc, [saxon_proc.make_string_value(json1, encoding="UTF-8")])
        print(parsed_json1)
    except PySaxonApiError as e:
        print(e.message)
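
The exception can be reproduced without saxonche. The following is a plain-Python model of the quoted control flow, assuming the encode call really sits outside the if block as the quoted indentation suggests. Both function names are hypothetical, and the guarded variant is only one plausible shape for a fix.

```python
def buggy_encoding_bytes(**kwds):
    # Mirrors the quoted control flow: the encode call is NOT inside the `if`.
    encoding = None
    if "encoding" in kwds:
        encoding = kwds["encoding"]
    return encoding.encode(encoding)  # AttributeError when encoding is None


def guarded_encoding_bytes(**kwds):
    # Assumed fix: only encode the name when the caller supplied one.
    encoding = kwds.get("encoding")
    if encoding is None:
        return None
    return encoding.encode("UTF-8")


try:
    buggy_encoding_bytes(json_text="{}")
except AttributeError as exc:
    print(exc)  # 'NoneType' object has no attribute 'encode'

print(guarded_encoding_bytes(json_text="{}"))  # None
print(guarded_encoding_bytes(json_text="{}", encoding="UTF-8"))  # b'UTF-8'
```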

The documentation also needs some cleanup: the name is parse_json, but the text then says parse_xml. The names of the possible keyword arguments in the documentation alternate between json_file_name|xml_text|xml_uri and json_file_name|json_text, probably due to copying and pasting from parse_xml without adapting all the names for parse_json.

Actions #1

Updated by O'Neil Delpratt about 1 year ago

  • Status changed from New to In Progress
  • Assignee changed from O'Neil Delpratt to Debbie Lockett
  • Found in version set to 12.4.1
Actions #2

Updated by Debbie Lockett about 1 year ago

  • Assignee changed from Debbie Lockett to O'Neil Delpratt

I have committed changes which fix the errors in the documentation, and which fix the py_error_message content and the conditions under which it is raised.

Tests should be added, i.e. for calling parse_json() with json_text, both with and without encoding.
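
A sketch of what such tests might look like, using only calls that appear in this issue (PySaxonProcessor as a context manager and parse_json with json_text and encoding). The class name, the sample text, and the non-None expectation are assumptions; the tests are skipped when saxonche is not installed.

```python
import unittest

try:
    from saxonche import PySaxonProcessor
except ImportError:  # saxonche not installed: the tests below are skipped
    PySaxonProcessor = None

# Assumed sample input, modelled on the test program in the description.
JSON_TEXT = '{ "test" : "Price is higher than 25 €." }'


@unittest.skipIf(PySaxonProcessor is None, "saxonche is not installed")
class ParseJsonTextTest(unittest.TestCase):
    def test_json_text_without_encoding(self):
        with PySaxonProcessor() as proc:
            # Assumption: a successful parse returns a value rather than None.
            self.assertIsNotNone(proc.parse_json(json_text=JSON_TEXT))

    def test_json_text_with_encoding(self):
        with PySaxonProcessor() as proc:
            self.assertIsNotNone(
                proc.parse_json(json_text=JSON_TEXT, encoding="UTF-8")
            )
```

Saved to a file, this can be run with python -m unittest; on 12.4.1 both tests would be expected to fail with the errors described above.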

Comparing the code for parse_json and parse_xml, I'm not convinced that the code is correct yet. parse_json has:

        if "encoding" in kwds:
            encoding = kwds["encoding"]
        py_encoding_string = encoding.encode(encoding)
        c_encoding_string = py_encoding_string

but parse_xml has:

        if "encoding" in kwds:
            encoding = kwds["encoding"]
            py_encoding_string = encoding.encode('UTF-8')
            c_encoding_string = py_encoding_string if py_encoding_string is not None else ""
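
Besides the indentation, the two snippets differ in the codec used to turn the encoding name into bytes. A small plain-Python sketch of why that can matter, assuming the bytes end up as a NUL-terminated C string (as the cdef char * declarations quoted earlier suggest): encoding the name with itself is only safe for ASCII-compatible codecs.

```python
name = "utf-16"

# parse_json style: the encoding name is encoded with itself.
self_encoded = name.encode(name)
# parse_xml style: the encoding name is always encoded as UTF-8.
utf8_encoded = name.encode("UTF-8")

print(utf8_encoded)             # b'utf-16'
print(b"\x00" in self_encoded)  # True: embedded NUL bytes would truncate a C string
```

For a name such as "UTF-8" both variants give the same bytes, so the difference only shows up with codecs like UTF-16 or UTF-32.
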
Actions #3

Updated by O'Neil Delpratt about 1 year ago

  • Status changed from In Progress to Resolved
  • % Done changed from 0 to 100

I have fixed the indentation problem. The code samples above now run without problem.

Actions #4

Updated by O'Neil Delpratt 12 months ago

  • Status changed from Resolved to Closed
  • Fixed in version set to 12.4.2

Fix applied in SaxonC 12.4.2 maintenance release
