XQuery XML entities and doctype

Added by Anonymous over 17 years ago

Legacy ID: #4142556 Legacy Poster: David R Pratten (lismorelad)

Hi. Saxon XQuery is fantastic. Having great results with the B parser. The output document is a large XHTML file ready for printing with Prince (www.princexml.com) based on XML source data. I have a couple of questions:

1) Saxon generates an error when I include a &nbsp; into the .xql query.

    XPST0003: XQuery syntax error in #...iv class="addressblock">&nbsp;#: invalid entity reference &nbsp;
    Failed to compile query

==> How do I include XHTML entities into the output document?

2) When I include a doctype into the .xql file Saxon generates the following error:

    XPST0003: XQuery syntax error in #<!D#: Expected '--' or '[CDATA[' after '<!'
    Failed to compile query

==> How do I get Saxon to include the XHTML 1.0 Strict doctype into the output document?

Thanks
David


Replies (9)


RE: XQuery XML entities and doctype - Added by Anonymous over 17 years ago

Legacy ID: #4142583 Legacy Poster: David R Pratten (lismorelad)

Found http://www.saxonica.com/documentation/extensions/instructions/entity-ref.html, and &#160; works as a substitute for &nbsp;. Still not clear on the doctype question.

RE: XQuery XML entities and doctype - Added by Anonymous over 17 years ago

Legacy ID: #4142636 Legacy Poster: Michael Kay (mhkay)

XQuery doesn't allow named entity references to appear in the query text, apart from the five predefined ones (&lt; &gt; &amp; &quot; &apos;); anything else must be written as a numeric character reference. So you can write the NBSP character as &#xa0; which is its numeric value in Unicode.

How it appears in the serialized output depends on the serialization options, notably the encoding. Most likely it will appear as itself, that is, as a character that is visually indistinguishable from an ordinary space, but the browser will understand it all the same. If you want it to show up as a character reference, select us-ascii as your output encoding.

The only way of getting a DOCTYPE into your serialized output is to use the serialization parameters doctype-system and doctype-public. (This is because DOCTYPE isn't part of the XDM tree model; it can only be added when the result tree is serialized.) You can set these options within the query using the option declaration

    declare option saxon:output "doctype=......";

etc., or you can set them from the command line or the Java API.
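For readers following along, the numeric-reference advice can be sketched as a small query. This is only an illustration under stated assumptions: the variable names and sample values are hypothetical, and us-ascii is selected purely so that the non-breaking space becomes visible as a character reference in the serialized output.

```xquery
xquery version "1.0";
declare namespace saxon = "http://saxon.sf.net/";

(: Assumption: us-ascii is chosen only to make the NBSP visible
   in the output as a character reference. :)
declare option saxon:output "encoding=us-ascii";

(: Hypothetical sample values standing in for real merge data. :)
let $first := "Jane"
let $last := "Citizen"
return
  (: &#xa0; is the numeric character reference for NBSP (U+00A0);
     writing &nbsp; here would raise XPST0003. :)
  <div class="addressblock">{ $first }&#xa0;{ $last }</div>
```

With a UTF-8 output encoding the same query should emit the NBSP as a literal character instead, which renders identically in a browser.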

RE: XQuery XML entities and doctype - Added by Anonymous over 17 years ago

Legacy ID: #4143473 Legacy Poster: David R Pratten (lismorelad)

Dear Michael,

Thank you for your quick reply. I must be thick or something, but I cannot find in the Saxon documentation how to complete the doctype instruction. I am confused between doctype, doctype-public and doctype-system, and saxon:doctype and saxon:output. Could I trouble you again to recommend an option instruction that will generate the following doctype at the top of the serialised output?

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

Thanks in advance.
David

RE: XQuery XML entities and doctype - Added by Anonymous over 17 years ago

Legacy ID: #4146004 Legacy Poster: Michael Kay (mhkay)

Please try the following query:

    declare namespace saxon="http://saxon.sf.net/";
    declare option saxon:output "doctype-public=-//W3C//DTD XHTML 1.0 Strict//EN";
    declare option saxon:output "doctype-system=http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd";

    <html>
      <head><title>sample</title></head>
      <body><p>document</p></body>
    </html>

Michael Kay
Saxonica Limited

RE: XQuery XML entities and doctype - Added by Anonymous over 17 years ago

Legacy ID: #4146026 Legacy Poster: David R Pratten (lismorelad)

Thank you, extremely helpful.
David

PHP character entities filter for saxon - Added by Anonymous over 17 years ago

Legacy ID: #4146051 Legacy Poster: David R Pratten (lismorelad)

//
// Remove named entities from source before feeding to saxon
// Source of character entities table: http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references
//
// $contents holds the source text. Each named entity in $namedentities is
// replaced by the numeric reference at the same index in $numericentities.
$namedentities = array(
    "&quot;", "&amp;", "&lt;", "&gt;",
    "&nbsp;", "&iexcl;", "&cent;", "&pound;", "&curren;", "&yen;", "&brvbar;", "&sect;",
    "&uml;", "&copy;", "&ordf;", "&laquo;", "&not;", "&shy;", "&reg;", "&macr;",
    "&deg;", "&plusmn;", "&sup2;", "&sup3;", "&acute;", "&micro;", "&para;", "&middot;",
    "&cedil;", "&sup1;", "&ordm;", "&raquo;", "&frac14;", "&frac12;", "&frac34;", "&iquest;",
    "&Agrave;", "&Aacute;", "&Acirc;", "&Atilde;", "&Auml;", "&Aring;", "&AElig;", "&Ccedil;",
    "&Egrave;", "&Eacute;", "&Ecirc;", "&Euml;", "&Igrave;", "&Iacute;", "&Icirc;", "&Iuml;",
    "&ETH;", "&Ntilde;", "&Ograve;", "&Oacute;", "&Ocirc;", "&Otilde;", "&Ouml;", "&times;",
    "&Oslash;", "&Ugrave;", "&Uacute;", "&Ucirc;", "&Uuml;", "&Yacute;", "&THORN;", "&szlig;",
    "&agrave;", "&aacute;", "&acirc;", "&atilde;", "&auml;", "&aring;", "&aelig;", "&ccedil;",
    "&egrave;", "&eacute;", "&ecirc;", "&euml;", "&igrave;", "&iacute;", "&icirc;", "&iuml;",
    "&eth;", "&ntilde;", "&ograve;", "&oacute;", "&ocirc;", "&otilde;", "&ouml;", "&divide;",
    "&oslash;", "&ugrave;", "&uacute;", "&ucirc;", "&uuml;", "&yacute;", "&thorn;", "&yuml;",
    "&OElig;", "&oelig;", "&Scaron;", "&scaron;", "&Yuml;", "&fnof;", "&circ;", "&tilde;",
    "&Alpha;", "&Beta;", "&Gamma;", "&Delta;", "&Epsilon;", "&Zeta;", "&Eta;", "&Theta;",
    "&Iota;", "&Kappa;", "&Lambda;", "&Mu;", "&Nu;", "&Xi;", "&Omicron;", "&Pi;",
    "&Rho;", "&Sigma;", "&Tau;", "&Upsilon;", "&Phi;", "&Chi;", "&Psi;", "&Omega;",
    "&alpha;", "&beta;", "&gamma;", "&delta;", "&epsilon;", "&zeta;", "&eta;", "&theta;",
    "&iota;", "&kappa;", "&lambda;", "&mu;", "&nu;", "&xi;", "&omicron;", "&pi;",
    "&rho;", "&sigmaf;", "&sigma;", "&tau;", "&upsilon;", "&phi;", "&chi;", "&psi;",
    "&omega;", "&thetasym;", "&upsih;", "&piv;",
    "&ensp;", "&emsp;", "&thinsp;", "&zwnj;", "&zwj;", "&lrm;", "&rlm;",
    "&ndash;", "&mdash;", "&lsquo;", "&rsquo;", "&sbquo;", "&ldquo;", "&rdquo;", "&bdquo;",
    "&dagger;", "&Dagger;", "&bull;", "&hellip;", "&permil;", "&prime;", "&Prime;", "&lsaquo;",
    "&rsaquo;", "&oline;", "&frasl;", "&euro;", "&image;", "&weierp;", "&real;", "&trade;",
    "&alefsym;", "&larr;", "&uarr;", "&rarr;", "&darr;", "&harr;", "&crarr;",
    "&lArr;", "&uArr;", "&rArr;", "&dArr;", "&hArr;",
    "&forall;", "&part;", "&exist;", "&empty;", "&nabla;", "&isin;", "&notin;", "&ni;",
    "&prod;", "&sum;", "&minus;", "&lowast;", "&radic;", "&prop;", "&infin;", "&ang;",
    "&and;", "&or;", "&cap;", "&cup;", "&int;", "&there4;", "&sim;", "&cong;",
    "&asymp;", "&ne;", "&equiv;", "&le;", "&ge;", "&sub;", "&sup;", "&nsub;",
    "&sube;", "&supe;", "&oplus;", "&otimes;", "&perp;", "&sdot;",
    "&lceil;", "&rceil;", "&lfloor;", "&rfloor;", "&lang;", "&rang;", "&loz;",
    "&spades;", "&clubs;", "&hearts;", "&diams;");
$numericentities = array(
    "&#34;", "&#38;", "&#60;", "&#62;",
    "&#160;", "&#161;", "&#162;", "&#163;", "&#164;", "&#165;", "&#166;", "&#167;",
    "&#168;", "&#169;", "&#170;", "&#171;", "&#172;", "&#173;", "&#174;", "&#175;",
    "&#176;", "&#177;", "&#178;", "&#179;", "&#180;", "&#181;", "&#182;", "&#183;",
    "&#184;", "&#185;", "&#186;", "&#187;", "&#188;", "&#189;", "&#190;", "&#191;",
    "&#192;", "&#193;", "&#194;", "&#195;", "&#196;", "&#197;", "&#198;", "&#199;",
    "&#200;", "&#201;", "&#202;", "&#203;", "&#204;", "&#205;", "&#206;", "&#207;",
    "&#208;", "&#209;", "&#210;", "&#211;", "&#212;", "&#213;", "&#214;", "&#215;",
    "&#216;", "&#217;", "&#218;", "&#219;", "&#220;", "&#221;", "&#222;", "&#223;",
    "&#224;", "&#225;", "&#226;", "&#227;", "&#228;", "&#229;", "&#230;", "&#231;",
    "&#232;", "&#233;", "&#234;", "&#235;", "&#236;", "&#237;", "&#238;", "&#239;",
    "&#240;", "&#241;", "&#242;", "&#243;", "&#244;", "&#245;", "&#246;", "&#247;",
    "&#248;", "&#249;", "&#250;", "&#251;", "&#252;", "&#253;", "&#254;", "&#255;",
    "&#338;", "&#339;", "&#352;", "&#353;", "&#376;", "&#402;", "&#710;", "&#732;",
    "&#913;", "&#914;", "&#915;", "&#916;", "&#917;", "&#918;", "&#919;", "&#920;",
    "&#921;", "&#922;", "&#923;", "&#924;", "&#925;", "&#926;", "&#927;", "&#928;",
    "&#929;", "&#931;", "&#932;", "&#933;", "&#934;", "&#935;", "&#936;", "&#937;",
    "&#945;", "&#946;", "&#947;", "&#948;", "&#949;", "&#950;", "&#951;", "&#952;",
    "&#953;", "&#954;", "&#955;", "&#956;", "&#957;", "&#958;", "&#959;", "&#960;",
    "&#961;", "&#962;", "&#963;", "&#964;", "&#965;", "&#966;", "&#967;", "&#968;",
    "&#969;", "&#977;", "&#978;", "&#982;",
    "&#8194;", "&#8195;", "&#8201;", "&#8204;", "&#8205;", "&#8206;", "&#8207;",
    "&#8211;", "&#8212;", "&#8216;", "&#8217;", "&#8218;", "&#8220;", "&#8221;", "&#8222;",
    "&#8224;", "&#8225;", "&#8226;", "&#8230;", "&#8240;", "&#8242;", "&#8243;", "&#8249;",
    "&#8250;", "&#8254;", "&#8260;", "&#8364;", "&#8465;", "&#8472;", "&#8476;", "&#8482;",
    "&#8501;", "&#8592;", "&#8593;", "&#8594;", "&#8595;", "&#8596;", "&#8629;",
    "&#8656;", "&#8657;", "&#8658;", "&#8659;", "&#8660;",
    "&#8704;", "&#8706;", "&#8707;", "&#8709;", "&#8711;", "&#8712;", "&#8713;", "&#8715;",
    "&#8719;", "&#8721;", "&#8722;", "&#8727;", "&#8730;", "&#8733;", "&#8734;", "&#8736;",
    "&#8743;", "&#8744;", "&#8745;", "&#8746;", "&#8747;", "&#8756;", "&#8764;", "&#8773;",
    "&#8776;", "&#8800;", "&#8801;", "&#8804;", "&#8805;", "&#8834;", "&#8835;", "&#8836;",
    "&#8838;", "&#8839;", "&#8853;", "&#8855;", "&#8869;", "&#8901;",
    "&#8968;", "&#8969;", "&#8970;", "&#8971;", "&#9001;", "&#9002;", "&#9674;",
    "&#9824;", "&#9827;", "&#9829;", "&#9830;");
$contents = str_replace($namedentities, $numericentities, $contents);

RE: PHP character entities filter for saxon - Added by Anonymous over 17 years ago

Legacy ID: #4146072 Legacy Poster: Michael Kay (mhkay)

What exactly are you trying to do here? If you're trying to take an XML document that uses entity references and convert it into one that doesn't, then the query "." (yes, a one-character valid XQuery) will do the job for you.

RE: XQuery XML entities and doctype - Added by Anonymous over 17 years ago

Legacy ID: #4146099 Legacy Poster: David R Pratten (lismorelad)

I am using XQuery to create a mail merge. The source document is a standalone XHTML document with fields in it like: {fn:data($familyname)}&nbsp;{fn:data($surname)}, etc.

A PHP preprocessor takes a prototype .xql file and inserts the "//body/*" of the XHTML source document into the middle of the XQuery to create a generated .xql file. The generated .xql file is then fed to Saxon to create the XHTML mail-merged document, which is in turn fed to Prince and converted to PDF. The PHP preprocessor replaces named character entities as part of creating the generated .xql file.

David

RE: XQuery XML entities and doctype - Added by Anonymous over 17 years ago

Legacy ID: #4146100 Legacy Poster: David R Pratten (lismorelad)

One more note. The PHP preprocessor recognises tags like: <xnews_include href="newsletter.html" path="//body/*"> in the middle of the prototype .xql and replaces it with the serialized XML of the content of the source document. David
