Bug #5852

closed

XML Transformer generates invalid XHTML

Added by Janne Wulf about 1 year ago. Updated about 1 year ago.

Status: Closed
Priority: Normal
Assignee: -
Category: Serialization
Sprint/Milestone: -
Start date: 2023-01-24
Due date:
% Done: 100%
Estimated time:
Legacy ID:
Applies to branch: 11, 12, trunk
Fix Committed on Branch: 11, 12, trunk
Fixed in Maintenance Release:
Platforms: .NET, Java

Description

I am using Saxon-HE 11.4 to serialize an XHTML document. The document already contains this meta element:

<meta charset="utf-8"/>

When I serialize it through the XML transformer, the MetaTagAdjuster adds an additional meta element:

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>

Having these two charset declarations in the XHTML document is illegal.

We can avoid inserting the content type at all with this setting:

transformer.setOutputProperty(SaxonOutputKeys.INCLUDE_CONTENT_TYPE, "no")

However, I think the MetaTagAdjuster should check if the charset is already defined.
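The check suggested here could look roughly like the sketch below. It is not Saxon's MetaTagAdjuster code: the class CharsetMetaCheck and its methods parse and declaresCharset are hypothetical names, and it works on a plain JAXP DOM rather than on Saxon's serialization pipeline. It only shows what "the charset is already defined" might mean for an XHTML document, namely a meta element carrying either a charset attribute or http-equiv="Content-Type".

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Saxon.
final class CharsetMetaCheck {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Parse an XML string into a namespace-aware DOM document.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // True if any XHTML meta element already declares a character encoding,
    // either via a charset attribute or via http-equiv="Content-Type".
    // Assumption: every meta element in the document is inspected,
    // not only those inside head.
    static boolean declaresCharset(Document doc) {
        NodeList metas = doc.getElementsByTagNameNS(XHTML_NS, "meta");
        for (int i = 0; i < metas.getLength(); i++) {
            Element meta = (Element) metas.item(i);
            if (meta.hasAttribute("charset")) {
                return true;
            }
            if ("Content-Type".equalsIgnoreCase(meta.getAttribute("http-equiv").trim())) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(
            "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
            + "<head><meta charset=\"utf-8\"/></head></html>");
        System.out.println("charset already declared: " + declaresCharset(doc));
    }
}
```

On affected versions, a caller could use such a check to decide whether to apply the INCLUDE_CONTENT_TYPE workaround only to documents that already carry a declaration.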

Example: given the following document in a Document object:

<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
    <head>
        <meta charset="utf-8" />
    </head>
</html>

transforming it to serialized XHTML results in:

<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
    <head>
        <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
        <meta charset="utf-8" />
    </head>
</html>
Actions #1

Updated by Janne Wulf about 1 year ago

Actions #2

Updated by Michael Kay about 1 year ago

Interesting. Doing this makes sense, but would not conform with the 3.1 serialization spec. The spec says that an existing meta element with http-equiv="Content-Type" is discarded, but not one with a charset attribute.

Perhaps the meta/@charset attribute wasn't in the HTML5 spec at the time the 3.1 serialization spec was published, in which case it would be sensible to support it in the way suggested. We'll look into it.
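To make the gap concrete, here is a minimal sketch of the rule as described in this comment: discard any existing meta with http-equiv="Content-Type", then insert a fresh one as the first child of head. It is not Saxon's implementation. The class SpecMetaRuleSketch and its methods are hypothetical, it operates on a JAXP DOM, and matching http-equiv case-insensitively after trimming is an assumption. Under this rule a meta with only a charset attribute survives, which is how the duplicate declaration in the report arises.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical illustration, not part of Saxon.
final class SpecMetaRuleSketch {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    static boolean isContentTypeMeta(Element meta) {
        return "Content-Type".equalsIgnoreCase(meta.getAttribute("http-equiv").trim());
    }

    // Rule as described in the comment: only http-equiv="Content-Type"
    // metas are discarded; a meta with a charset attribute is left alone.
    static void applyRule(Document doc) {
        NodeList heads = doc.getElementsByTagNameNS(XHTML_NS, "head");
        if (heads.getLength() == 0) {
            return;
        }
        Element head = (Element) heads.item(0);

        // The NodeList is live, so collect first and remove afterwards.
        NodeList metas = head.getElementsByTagNameNS(XHTML_NS, "meta");
        List<Element> discard = new ArrayList<>();
        for (int i = 0; i < metas.getLength(); i++) {
            Element meta = (Element) metas.item(i);
            if (isContentTypeMeta(meta)) {
                discard.add(meta);
            }
        }
        for (Element meta : discard) {
            meta.getParentNode().removeChild(meta);
        }

        Element added = doc.createElementNS(XHTML_NS, "meta");
        added.setAttribute("http-equiv", "Content-Type");
        added.setAttribute("content", "text/html; charset=UTF-8");
        head.insertBefore(added, head.getFirstChild());
    }

    // Counts meta elements that declare an encoding in either form.
    static int countCharsetDeclarations(Document doc) {
        NodeList metas = doc.getElementsByTagNameNS(XHTML_NS, "meta");
        int count = 0;
        for (int i = 0; i < metas.getLength(); i++) {
            Element meta = (Element) metas.item(i);
            if (meta.hasAttribute("charset") || isContentTypeMeta(meta)) {
                count++;
            }
        }
        return count;
    }
}
```

Applied to the document from the description, countCharsetDeclarations reports two declarations after applyRule. Applied to a document whose only declaration uses http-equiv, it reports one.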

Actions #3

Updated by Michael Kay about 1 year ago

I have raised an issue against the W3C spec here: https://github.com/qt4cg/qtspecs/issues/318

I think this is a case where we should do what is right, rather than doing what the spec says.

Actions #4

Updated by Michael Kay about 1 year ago

  • Category set to Serialization
  • Status changed from New to Resolved
  • Applies to branch 12, trunk added
  • Fix Committed on Branch 11, 12, trunk added
  • Platforms .NET added

Fixed on the 11.x and 12.x branches.

Test cases added to the QT4 test suite: Serialization-html-60 and Serialization-xhtml-68.

Actions #5

Updated by O'Neil Delpratt about 1 year ago

  • % Done changed from 0 to 100
  • Fixed in Maintenance Release 11.5 added

Bug fix applied in the Saxon 11.5 maintenance release.

Actions #6

Updated by O'Neil Delpratt about 1 year ago

  • Status changed from Resolved to Closed
  • Fixed in Maintenance Release 12.1 added

Bug fix applied in the Saxon 12.1 maintenance release.
