Bug #6274
[Closed] parse_json on PySaxonProcessor is broken: it has a documented "encoding" keyword argument but throws an error for more than one keyword parameter, and also fails when given json_text alone
Description
https://www.saxonica.com/saxon-c/doc12/html/saxonc.html#PySaxonProcessor-parse_json says:
parse_json(self, **kwds)
parse_xml(self, **kwds) [sic!]
Parse a source JSON document supplied as a lexical representation, source file or uri, and return it as an XDM value
Args:
**kwds: Possible keyword arguments: must be one of the following (json_file_name|json_text).
Also accept the keyword 'encoding' (str) to specify the encoding of the json_text string. If not specified then the platform default encoding is used.
Returns:
PyXdmValue: The XDM value representation of the JSON document
Raises:
Exception: Error if the keyword argument is not one of json_file_name|xml_text|xml_uri. [sic!]
However, when I try to use the encoding keyword argument together with the json_text keyword argument, I get the error "Error: parse_json should only contain one of the following keyword arguments: (json_file_name|json_text)", probably because the code checks:
    py_error_message = "Error: parse_json should only contain one of the following keyword arguments: (json_file_name|json_text)"
    if len(kwds) != 1:
        raise Exception(py_error_message)
Further down, several lines of code handle a possible encoding argument, but frustratingly execution never gets there:
    cdef char * c_json_string = NULL
    cdef char * c_encoding_string = NULL
    encoding = None
    if "encoding" in kwds:
        encoding = kwds["encoding"]
    py_encoding_string = encoding.encode(encoding)
    c_encoding_string = py_encoding_string
    if "json_text" in kwds:
        py_value = kwds["json_text"]
        if py_value is None:
            raise Exception("JSON text is None")
        py_text_string = py_value.encode(encoding if encoding is not None else sys.getdefaultencoding()) if py_value is not None else None
        c_json_string = py_text_string if py_value is not None else ""
        if c_json_string == NULL:
            raise Exception("Error converting JSON text")
        val = PyXdmValue()
        val.thisvptr = self.thisptr.parseJsonFromString(c_json_string, c_encoding_string if encoding is not None else NULL)
        if val.thisvptr is NULL:
            return None
        val.thisvptr.incrementRefCount()
        return val
But even worse, even if I pass only the json_text keyword argument, the code raises the exception 'NoneType' object has no attribute 'encode'. Sample test program:
    from saxonche import *

    with PySaxonProcessor() as saxon_proc:
        print(saxon_proc.version)
        json1 = """{ "test" : "This is a test. Price is higher than 25 €. " }"""
        try:
            parsed_json1 = saxon_proc.parse_json(json_text=json1)
            print(parsed_json1)
        except PySaxonApiError as e:
            print(e.message)
        except Exception as e:
            print(e)
        parse_json_fn = PyXdmFunctionItem().get_system_function(saxon_proc, '{http://www.w3.org/2005/xpath-functions}parse-json', 1)
        try:
            parsed_json1 = parse_json_fn.call(saxon_proc, [saxon_proc.make_string_value(json1, encoding="UTF-8")])
            print(parsed_json1)
        except PySaxonApiError as e:
            print(e.message)
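The € sign in json1 is why the encoding keyword matters. The plain-Python sketch below (no SaxonC involved) assumes, as the quoted Cython code suggests, that json_text is turned into bytes before being handed to the C layer: the same character yields different bytes depending on the codec, so the receiving side has to be told which codec was used. It also prints sys.getdefaultencoding(), the fallback in the quoted code, which in Python 3 is always 'utf-8' rather than the platform/locale encoding the documentation mentions. The helper euro_bytes is hypothetical.

```python
import sys

# JSON text from the test program above; "€" (U+20AC) is its only non-ASCII character.
json1 = """{ "test" : "This is a test. Price is higher than 25 €. " }"""


def euro_bytes(encoding: str) -> bytes:
    """Hypothetical helper: the bytes the euro sign in json1 becomes under a codec."""
    euro = json1[json1.index("€")]
    return euro.encode(encoding)


# Same character, different byte sequences: whoever decodes the bytes
# (here, the C layer) must know which codec produced them.
for codec in ("utf-8", "cp1252", "utf-16-le"):
    print(codec, euro_bytes(codec))

# Fallback used by the quoted code when no 'encoding' keyword is supplied.
print("sys.getdefaultencoding():", sys.getdefaultencoding())
```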
The documentation also needs some cleanup: the method is named parse_json, but the signature line then says parse_xml. And the listed keyword argument names alternate between json_file_name|xml_text|xml_uri and json_file_name|json_text, probably because the text was copied from parse_xml without adapting all the names for parse_json.
Updated by O'Neil Delpratt about 1 year ago
- Status changed from New to In Progress
- Assignee changed from O'Neil Delpratt to Debbie Lockett
- Found in version set to 12.4.1
Updated by Debbie Lockett about 1 year ago
- Assignee changed from Debbie Lockett to O'Neil Delpratt
I have committed changes which fix the errors in the documentation, and fix the py_error_message content and the conditions under which it is raised.
Tests should be added, i.e. for using parse_json() and supplying json_text, with and without encoding.
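A minimal sketch of the argument check implied by the documented contract: exactly one of json_file_name or json_text, optionally accompanied by encoding. It is plain Python, the helper name check_parse_json_kwds is hypothetical, and it raises ValueError for illustration; it is not the change committed to SaxonC.

```python
# Hypothetical helper illustrating the documented keyword contract of parse_json;
# it is not taken from the SaxonC sources.
SOURCE_KEYWORDS = ("json_file_name", "json_text")
OPTIONAL_KEYWORDS = ("encoding",)


def check_parse_json_kwds(**kwds) -> str:
    """Validate keyword arguments and return the name of the source keyword used."""
    unknown = set(kwds) - set(SOURCE_KEYWORDS) - set(OPTIONAL_KEYWORDS)
    if unknown:
        raise ValueError(f"Unexpected keyword argument(s): {sorted(unknown)}")
    # Count only the source keywords, so an extra 'encoding' is not rejected.
    sources = [k for k in SOURCE_KEYWORDS if k in kwds]
    if len(sources) != 1:
        raise ValueError(
            "parse_json requires exactly one of the keyword arguments "
            "(json_file_name|json_text), optionally with 'encoding'"
        )
    return sources[0]


print(check_parse_json_kwds(json_text="{}", encoding="UTF-8"))
```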
Comparing the code for parse_json and parse_xml, I'm not convinced that the code is correct yet. parse_json has:
    if "encoding" in kwds:
        encoding = kwds["encoding"]
    py_encoding_string = encoding.encode(encoding)
    c_encoding_string = py_encoding_string
but parse_xml has:
    if "encoding" in kwds:
        encoding = kwds["encoding"]
        py_encoding_string = encoding.encode('UTF-8')
        c_encoding_string = py_encoding_string if py_encoding_string is not None else ""
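The difference matters for encodings that are not ASCII-compatible. The parse_json version encodes the encoding name with itself, so for "UTF-16" the resulting bytes start with a byte-order mark and contain NUL bytes, and C string handling treats the first NUL as the end of the string. The parse_xml version encodes the name as UTF-8 and yields plain ASCII bytes. A plain-Python illustration (no SaxonC involved; both function names are hypothetical):

```python
def name_bytes_parse_json_style(encoding: str) -> bytes:
    # Mirrors encoding.encode(encoding) from the quoted parse_json code.
    return encoding.encode(encoding)


def name_bytes_parse_xml_style(encoding: str) -> bytes:
    # Mirrors encoding.encode('UTF-8') from the quoted parse_xml code.
    return encoding.encode("UTF-8")


for enc in ("UTF-8", "UTF-16"):
    json_style = name_bytes_parse_json_style(enc)
    xml_style = name_bytes_parse_xml_style(enc)
    # Embedded NUL bytes would truncate the name once it is read as a C string.
    print(enc, json_style, "contains NUL:", b"\x00" in json_style, xml_style)
```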
Updated by O'Neil Delpratt about 1 year ago
- Status changed from In Progress to Resolved
- % Done changed from 0 to 100
I have fixed the indentation problem. The code samples above now run without problem.
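A plain-Python sketch of how an indentation slip of this kind produces the reported 'NoneType' object has no attribute 'encode' when only json_text is passed. It assumes the encoding lines had ended up outside the if "encoding" in kwds: block; both functions are hypothetical stand-ins that model only the encoding handling, not the real parse_json, and the fixed variant also encodes the name as UTF-8, as parse_xml does.

```python
def encoding_bytes_buggy(**kwds):
    encoding = None
    if "encoding" in kwds:
        encoding = kwds["encoding"]
    # Assumed mis-indentation: this runs even when no 'encoding' keyword was given,
    # so it evaluates None.encode and raises AttributeError.
    py_encoding_string = encoding.encode(encoding)
    return py_encoding_string


def encoding_bytes_fixed(**kwds):
    py_encoding_string = None
    if "encoding" in kwds:
        encoding = kwds["encoding"]
        # Only executed when the caller actually supplied an encoding.
        py_encoding_string = encoding.encode("UTF-8")
    return py_encoding_string


try:
    encoding_bytes_buggy(json_text="{}")
except AttributeError as e:
    print("buggy:", e)
print("fixed:", encoding_bytes_fixed(json_text="{}"))
print("fixed with encoding:", encoding_bytes_fixed(json_text="{}", encoding="UTF-8"))
```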
Updated by O'Neil Delpratt 12 months ago
- Status changed from Resolved to Closed
- Fixed in version set to 12.4.2
Fix applied in SaxonC 12.4.2 maintenance release