Bug #3877
closed
Revised HTML indentation rules give poor results for meta and title elements
Fixed in Maintenance Release:
Description
See https://saxonica.plan.io/boards/3/topics/7277
The indentation rules for HTML and XHTML were revised (see bugs #3839 and #3842) to make them conformant with the spec and to improve the handling of documents that already contain whitespace. In some cases this has led to poor results, as illustrated by the help topic above.
The first question is: why is no indentation added before the title start tag?
This is because the flag "inFormattedTag" is true. It seems we are being over-enthusiastic about applying the rule:
Whitespace MUST NOT be added or removed inside a formatted element, the formatted elements being those recognized as HTML elements with local names pre, script, style, title, and textarea.
We are also applying it to prevent whitespace being added before or after such an element, which the rule does not require.
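A minimal Java sketch of the distinction the rule draws. The class and method names (`FormattedElements`, `isFormatted`, `mayIndentBeforeStartTag`) are hypothetical and are not Saxon's serializer code: only the content of a formatted element is protected, so whether whitespace may go before an element's start tag depends on its ancestors, not on the element itself.

```java
import java.util.Locale;
import java.util.Set;

/**
 * Hypothetical sketch, not Saxon's implementation: the serialization rule
 * forbids adding or removing whitespace INSIDE a formatted element, but says
 * nothing about the whitespace around the element.
 */
final class FormattedElements {
    // The formatted elements named in the rule quoted above.
    private static final Set<String> FORMATTED =
            Set.of("pre", "script", "style", "title", "textarea");

    static boolean isFormatted(String localName) {
        // HTML element names are compared case-insensitively here.
        return FORMATTED.contains(localName.toLowerCase(Locale.ROOT));
    }

    /**
     * May whitespace be added immediately before an element's start tag?
     * @param insideFormatted true if some ancestor is a formatted element
     */
    static boolean mayIndentBeforeStartTag(boolean insideFormatted) {
        // The start tag of a formatted element lies outside its content,
        // so TITLE itself can still be indented within HEAD.
        return !insideFormatted;
    }

    public static void main(String[] args) {
        System.out.println(isFormatted("TITLE"));           // true
        System.out.println(mayIndentBeforeStartTag(false)); // true: <title> directly inside <head>
        System.out.println(mayIndentBeforeStartTag(true));  // false: e.g. an element inside <pre>
    }
}
```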
The reason no whitespace is added before or after the link tag is that link is an inline element, and we are not allowed to add whitespace before or after an inline element. I need to do some further research to establish whether we are correct to include link in the list of inline elements; it seems that its role depends on context and we may need a more subtle rule here.
For HTML 4, a list of inline elements is given at http://www.htmlhelp.com/reference/html40/inline.html - it does not include LINK.
For HTML 5, a LINK element is a phrasing element if and only if it is "allowed in the body", which is true only if it "has a rel attribute that contains only keywords that are body-ok", where the body-ok keywords are "prefetch" and "stylesheet".
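To show what a context-sensitive rule for LINK would involve, here is a Java sketch of the HTML5 body-ok test described above. It uses only the two keywords quoted in this report (the current HTML standard lists more), the names are hypothetical, and treating a missing or empty rel attribute as not body-ok is an assumption.

```java
import java.util.Locale;
import java.util.Set;

/**
 * Hypothetical sketch of the HTML5 condition: LINK is allowed in the body
 * (and so is phrasing content) only if its rel attribute contains only
 * body-ok keywords. Keyword set as quoted in this report.
 */
final class LinkRule {
    private static final Set<String> BODY_OK = Set.of("prefetch", "stylesheet");

    static boolean isBodyOk(String rel) {
        if (rel == null || rel.trim().isEmpty()) {
            return false; // assumption: no keywords means not body-ok
        }
        // rel is a whitespace-separated list of case-insensitive keywords.
        for (String keyword : rel.trim().split("\\s+")) {
            if (!BODY_OK.contains(keyword.toLowerCase(Locale.ROOT))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isBodyOk("stylesheet"));      // true
        System.out.println(isBodyOk("stylesheet icon")); // false: "icon" is not body-ok
    }
}
```

Because the answer depends on the attribute value rather than the element name alone, a simple name-based inline list cannot express it, which supports the simpler choice below.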
On balance, I think it's probably best if we exclude LINK from the list of inline elements for indentation purposes.
With this change, we still get no whitespace added between the TITLE and LINK elements. This is due to the afterFormatted flag, whose purpose seems somewhat obscure. The use of this flag hasn't changed, but its effect has, now that TITLE is (correctly) classified as a formatted element. I don't see anything in the spec that says we shouldn't indent after a formatted element, so I shall experiment with taking this out.
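A hypothetical Java sketch of the resulting decision for whitespace before a start tag, with LINK left out of the inline list and no afterFormatted check. The names and the short inline-element list are illustrative assumptions, not Saxon's serializer code, and only the previous sibling is considered.

```java
import java.util.Locale;
import java.util.Set;

/**
 * Hypothetical sketch: may whitespace be added before a start tag?
 * Not inside a formatted element, and not next to an inline element.
 * Whether the previous sibling is formatted is deliberately ignored.
 */
final class IndentDecision {
    // Illustrative subset only; LINK is intentionally absent.
    private static final Set<String> INLINE =
            Set.of("a", "b", "em", "i", "span", "strong");

    static boolean isInline(String localName) {
        return INLINE.contains(localName.toLowerCase(Locale.ROOT));
    }

    static boolean indentBeforeStartTag(String localName, String previousSibling,
                                        boolean insideFormatted) {
        if (insideFormatted) {
            return false; // content of pre/script/style/title/textarea is untouchable
        }
        if (isInline(localName)) {
            return false; // no whitespace before an inline element
        }
        if (previousSibling != null && isInline(previousSibling)) {
            return false; // ...or after one
        }
        // No test on whether previousSibling is formatted: this is the
        // effect of removing the afterFormatted flag.
        return true;
    }

    public static void main(String[] args) {
        System.out.println(indentBeforeStartTag("title", "meta", false)); // true
        System.out.println(indentBeforeStartTag("link", "title", false)); // true, even right after </title>
        System.out.println(indentBeforeStartTag("span", "p", false));     // false: inline
    }
}
```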
- Status changed from New to Closed
I have run some ad-hoc tests and I am happy with the indented output that is now being produced, so I am closing this bug.