XML's many applications include one traditionally handled by HTML,
document source. Sometimes you want to include some HTML in your XML
source; particularly when styling your XML with XSL to HTML, you may
want some literal HTML in the output. In some cases, you'd just like to
include some well-formed mixed-content HTML in your XML, and not worry
too much about controlling its structure.
Writing explicit XSL rules for each HTML tag would be tedious and hard
to maintain. This week, I'll present a quick shortcut that allows you
to include arbitrary mixed-content HTML in your XSL stylesheet's output.
The XSL stylesheet's input is an XML document constrained by (that is,
validated against) a particular DTD. Before you can define the content
model of the element that will contain the arbitrary HTML, declare each
HTML element you want to use, along with its attribute list. For
example, declare the <a> tag for hyperlinks like this:
<!ELEMENT a (#PCDATA)*>
<!ATTLIST a
  href CDATA #IMPLIED
  target CDATA #IMPLIED
  name CDATA #IMPLIED>
(This example is simplified, since it allows only unformatted text
inside the hyperlink.) Create a definition like this for each
HTML tag you wish to use. After you've defined the tags, define a
parameter entity that includes all of the HTML "pass-through" tags
you've defined:
<!ENTITY % HTMLpassthru
  "a|i|b|code|br|tr|td|th|img|font|em">
Anywhere in the DTD you want to include mixed-content HTML, use this
parameter entity in the DTD. For example:
<!ELEMENT HTMLHelpText (#PCDATA|%HTMLpassthru;)*>
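
To make this concrete, here is a sketch of an input fragment that the
declarations above are meant to permit. It assumes that <b> has been
declared along the same lines as <a> (only the <a> declaration is shown
in this article); the URL and the help text are made up for
illustration.

```xml
<!-- Hypothetical input: mixed text and pass-through HTML elements -->
<HTMLHelpText>
  See the <a href="faq.html" target="_blank">FAQ</a> for
  <b>details</b> on configuring your account.
</HTMLHelpText>
```

Note that the parameter entity is referenced inside an element
declaration, which XML allows only in an external DTD, not in a
document's internal DTD subset.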
Now for the stylesheet. Defining a separate rule for each possible
HTML element within an HTMLHelpText element would be extremely tedious.
Fortunately, a single rule can handle all such elements. Define an XSL
rule that matches any of the pass-through tag names and uses
<xsl:copy-of> to copy the entire node (including its attributes and
children) to the output.
<!-- EXAMPLE XSL stylesheet -->
<xsl:template match="a|i|b|code|br|tr|td|th|img|font|em">
  <xsl:copy-of select="."/>
</xsl:template>
<!-- End EXAMPLE -->
Any nodes in the XML input tree with these tags will be copied to the
output structure unmodified, attributes and all.
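
For context, the rule might sit in a complete stylesheet along the
following lines. This is only a minimal sketch: it assumes the XSLT 1.0
namespace, and the HTMLHelpText template that wraps each help text in a
<p> element is a hypothetical addition, not part of the technique
itself.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html"/>

  <!-- Hypothetical wrapper: emit each help text as an HTML paragraph.
       Text children fall through to the built-in rule, which copies
       their text to the output. -->
  <xsl:template match="HTMLHelpText">
    <p><xsl:apply-templates/></p>
  </xsl:template>

  <!-- Pass-through rule: copy the matched element, its attributes,
       and everything inside it to the output unchanged. -->
  <xsl:template match="a|i|b|code|br|tr|td|th|img|font|em">
    <xsl:copy-of select="."/>
  </xsl:template>

</xsl:stylesheet>
```

Given an HTMLHelpText element containing text, an <a> with an href
attribute, and a <b>, a stylesheet like this should emit a <p>
containing the same text with the <a> and <b> elements intact. Because
<xsl:copy-of> copies the whole subtree, anything nested inside a
matched element is copied as-is and is not processed by other
templates.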
This technique is somewhat "quick-and-dirty". Choosing XHTML, which
reformulates HTML as XML and brings XML's extensibility and structure
control into the world of HTML, would be a better solution. But that's
a topic for another newsletter.