XML->HTML: No newline after doctype declaration

Added by Denis Maier 11 months ago

Hi, First post here.

I'm converting XML to HTML, and in my output files there is no newline after the document type declaration, i.e. the root element is on the same line as the doctype declaration, like so:

<!DOCTYPE HTML><html lang="de">

Is this normal? Do I need to use some setting to add the newline? Or is there something wrong with my xsl? I can, of course, prepare a MWE if that's necessary, but I thought maybe I'm missing something fundamental.


Replies (11)


RE: XML->HTML: No newline after doctype declaration - Added by Michael Kay 11 months ago

The serialization specification says:

If the HTML output method MUST output a document type declaration, it MUST be serialized immediately before the first element, if any, and the name following <!DOCTYPE MUST be HTML or html.

I guess we're interpreting "immediately" as meaning that intervening whitespace isn't allowed. Perhaps that's an excessively pedantic interpretation of the spec, but we generally prefer to do exactly what it says unless there's a good reason otherwise.

Is there any particular reason you want the newline here?

Have you tried indent="yes"?

RE: XML->HTML: No newline after doctype declaration - Added by Denis Maier 11 months ago

Yes, I have indent="yes" on xsl:output.

"Is there any particular reason you want the newline here?"

No, nothing in particular. It's just that the doctype declaration appears with a newline after it in just about every HTML tutorial you'll find online, e.g.

So, I assumed that this is the somewhat more official way of doing things.

But, if it isn't a problem to have the declaration and the root element on the same line, I can live with that. I just wanted to make sure I'm not missing anything.

RE: XML->HTML: No newline after doctype declaration - Added by Michael Kay 11 months ago

It seems SaxonJ does the same. I would have expected the newline with indent="yes".

RE: XML->HTML: No newline after doctype declaration - Added by Denis Maier 11 months ago

Oh, it looks like I’ve posted to the wrong forum. I’m actually using the Java version (through Oxygen) and the .NET version (via transform on the command line). There’s no JS involved.

RE: XML->HTML: No newline after doctype declaration - Added by Denis Maier 11 months ago

Well, that's weird. I've tested with transform on the command line. There, I seem to be getting the result I want. Is that possible?

RE: XML->HTML: No newline after doctype declaration - Added by Michael Kay 11 months ago

If you can tell us precisely WHAT you tested with transform on the command line....?

RE: XML->HTML: No newline after doctype declaration - Added by Denis Maier 11 months ago

catalog = schema/catalog-bits-v2-1-with-base.xml
xsl = bits2html.xsl

html:
	transform input.xml -xsl:$(xsl) -catalog:$(catalog) -expand:off -o:output.html

-> make html

the start of the xsl looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="3.0" 
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xlink="http://www.w3.org/1999/xlink"
  exclude-result-prefixes="xlink"
  >

  <xsl:output method="html" indent="yes" version="5" html-version="5" encoding="UTF-8"/>
  <xsl:preserve-space elements="p div"/>

  <xsl:strip-space elements="*"/>

RE: XML->HTML: No newline after doctype declaration - Added by Denis Maier 11 months ago

Do you need more info from me?

As I've said in the original post: I can cook it down to a MWE if that is necessary.
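
For reference, a minimal stylesheet along the following lines might be enough to reproduce the behaviour. It is only a sketch: the xsl:output declaration is copied from bits2html.xsl above, while the template and its literal content are hypothetical placeholders rather than anything from the real stylesheet.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="3.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Same serialization settings as in bits2html.xsl -->
  <xsl:output method="html" indent="yes" version="5" html-version="5" encoding="UTF-8"/>

  <!-- Hypothetical template: emits a fixed HTML skeleton for any input document -->
  <xsl:template match="/">
    <html lang="de">
      <head>
        <title>Test</title>
      </head>
      <body>
        <p>Test</p>
      </body>
    </html>
  </xsl:template>

</xsl:stylesheet>
```

Running it through the same make target, with xsl pointing at this file and any well-formed XML as input, should show whether the root element ends up on the same line as the doctype declaration.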

RE: XML->HTML: No newline after doctype declaration - Added by Martin Honnen 11 months ago

I wonder whether Saxon so far behaves as it does because https://www.w3.org/TR/xslt-xquery-serialization-31/#HTML_DOCTYPE states:

If the HTML output method MUST output a document type declaration, it MUST be serialized immediately before the first element

RE: XML->HTML: No newline after doctype declaration - Added by Michael Kay 11 months ago

I have changed the HTML5 serializer in SaxonJ (which will also affect SaxonCS and SaxonC) so that if indentation is on, a newline is output after the DOCTYPE declaration. This change will apply from 12.2; I'm not retrofitting it to the 11.x or earlier branches.
