Bug #3877

closed

Revised HTML indentation rules give poor results for meta and title elements

Added by Michael Kay over 6 years ago. Updated over 5 years ago.

Status:
Closed
Priority:
Normal
Assignee:
Category:
Serialization
Sprint/Milestone:
-
Start date:
2018-08-17
Due date:
% Done:

0%

Estimated time:
Legacy ID:
Applies to branch:
Fix Committed on Branch:
Fixed in Maintenance Release:
Platforms:

Description

See https://saxonica.plan.io/boards/3/topics/7277

The indentation rules for HTML and XHTML were revised (see bugs #3839 and #3842) to make them conformant with the spec and to improve the handling where there is existing whitespace in the document being reformatted. This has led to poor results in some cases, as illustrated by the help topic linked above.

Actions #1

Updated by Michael Kay over 6 years ago

The first question is: why is no indentation added before the title start tag?

This is because the flag "inFormattedTag" is true. It seems we are being over-enthusiastic about applying the rule:

Whitespace MUST NOT be added or removed inside a formatted element, the formatted elements being those recognized as HTML elements with local names pre, script, style, title, and textarea.

and are also preventing whitespace being added before or after such an element.
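The distinction between "inside" and "before or after" can be sketched as follows. This is a minimal Python illustration, not Saxon's code: the names `FORMATTED` and `may_indent_before_start_tag` are hypothetical, and the only rule it encodes is the one quoted above (it ignores the separate rule for inline elements).

```python
# Hypothetical sketch; not Saxon's implementation.
# Local names of the formatted elements, as listed in the rule quoted above.
FORMATTED = {"pre", "script", "style", "title", "textarea"}


def may_indent_before_start_tag(open_elements):
    """Return True if whitespace may be added before a new start tag.

    open_elements holds the names of the ancestors of the element that is
    about to be opened. Whitespace is forbidden only when one of those
    ancestors is a formatted element, i.e. when the new tag lies inside
    a formatted element. The name of the new element itself is irrelevant.
    """
    return not any(name.lower() in FORMATTED for name in open_elements)


# A title start tag directly under head: no formatted ancestor.
print(may_indent_before_start_tag(["html", "head"]))
# A start tag nested inside pre: the ancestor pre blocks indentation.
print(may_indent_before_start_tag(["html", "body", "pre"]))
```

Under this reading, the title start tag in the help topic should be indented, because its ancestors (html, head) are not formatted elements.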

Actions #2

Updated by Michael Kay over 6 years ago

No whitespace is added before or after the link tag because link is classified as an inline element, and we are not allowed to add whitespace before or after an inline element. I need to do some further research to establish whether we are correct to include link in the list of inline elements; it seems that its role depends on context and we may need a more subtle rule here.

Actions #3

Updated by Michael Kay over 6 years ago

For HTML 4, a list of inline elements is given at http://www.htmlhelp.com/reference/html40/inline.html - it does not include LINK.

For HTML 5, a LINK element is a phrasing element if and only if it is "allowed in the body", which is true only if it "has a rel attribute that contains only keywords that are body-ok", where the body-ok keywords are "prefetch" and "stylesheet".

On balance, I think it's probably best if we exclude LINK from the list of inline elements for indentation purposes.
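For comparison, the context-dependent HTML 5 rule described above could be sketched like this. It is an illustration only: `BODY_OK` contains just the two keywords named in this comment, `link_is_phrasing` is a hypothetical name, and treating an empty rel value as not body-ok is an assumption of the sketch.

```python
# Hypothetical sketch of the HTML 5 rule as described in this comment.
# The body-ok keyword list is the one quoted above.
BODY_OK = {"prefetch", "stylesheet"}


def link_is_phrasing(rel):
    """Return True if a link element with this rel value counts as phrasing.

    rel is the attribute value, or None if the attribute is absent.
    Keywords are whitespace-separated and compared case-insensitively.
    """
    if rel is None:
        return False
    keywords = rel.lower().split()
    # Assumption: an empty rel value is treated as not body-ok.
    return bool(keywords) and all(k in BODY_OK for k in keywords)


print(link_is_phrasing("stylesheet"))       # only body-ok keywords
print(link_is_phrasing("stylesheet icon"))  # "icon" is not in BODY_OK
```

A serializer following this rule would have to inspect attribute values before deciding on indentation, which is part of why simply excluding LINK from the inline list is the simpler option.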

Actions #4

Updated by Michael Kay over 6 years ago

With this change, we still get no whitespace added between the TITLE and LINK elements. This is due to the afterFormatted flag, whose purpose seems somewhat obscure. The use of this flag hasn't changed, but its effect has changed as a result of TITLE now being (correctly) classified as a formatted element. I don't see anything in the spec that says we shouldn't indent after a formatted element, so I shall experiment with taking this out.
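The effect of such a flag can be reproduced with a toy indenter. The sketch below is not Saxon's serializer: the event format, the `indent` function, and the `suppress_after_formatted` switch are all hypothetical, and the flag's behaviour is a guess based only on the symptom described here. It assumes element-only content outside formatted elements.

```python
# Toy indenter; hypothetical, not Saxon's implementation.
FORMATTED = {"pre", "script", "style", "title", "textarea"}
VOID = {"link", "meta"}  # written without an end tag in this toy


def indent(events, suppress_after_formatted):
    """Serialize (kind, value) events, adding newlines and indentation."""
    out = []
    depth = 0
    formatted_depth = 0      # > 0 while inside a formatted element
    after_formatted = False  # True just after a formatted element closes
    last_was_end = False
    for kind, value in events:
        if kind == "text":
            out.append(value)
            last_was_end = False
        elif kind == "start":
            blocked = formatted_depth > 0 or (
                suppress_after_formatted and after_formatted)
            if out and not blocked:
                out.append("\n" + "  " * depth)
            out.append("<" + value + ">")
            depth += 1
            if value in FORMATTED:
                formatted_depth += 1
            after_formatted = False
            last_was_end = False
        else:  # "end"
            depth -= 1
            # Whitespace before an end tag lies inside the element being
            # closed, so test formatted_depth before decrementing it.
            blocked = formatted_depth > 0 or (
                suppress_after_formatted and after_formatted)
            if value not in VOID:
                if last_was_end and not blocked:
                    out.append("\n" + "  " * depth)
                out.append("</" + value + ">")
            if value in FORMATTED:
                formatted_depth -= 1
            after_formatted = value in FORMATTED
            last_was_end = True
    return "".join(out)


head = [("start", "head"), ("start", "title"), ("text", "Demo"),
        ("end", "title"), ("start", "link"), ("end", "link"),
        ("end", "head")]
print(indent(head, suppress_after_formatted=True))
print(indent(head, suppress_after_formatted=False))
```

With the switch on, `</title>` and `<link>` run together on one line; with it off, `<link>` starts on its own indented line, while the content of `title` is left untouched in both cases.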

Actions #5

Updated by Michael Kay over 5 years ago

  • Status changed from New to Closed

I have run some ad-hoc tests and I am happy with the indented output that is now being produced, so I am closing this bug.
