An explanation of the abbreviations in a DTD

If you’ve never looked at a Document Type Definition (DTD), you’ve missed one of a web designer’s most interesting experiences. I’m only kidding a little bit. You can download several flavors of DTD from the W3C and read them for your own edification.

You see a lot of abbreviations and not much explanation of what they mean, so I’ll explain a few of them for you. Take a look at the declarations in xhtml1-transitional.dtd for the html element:

<!ELEMENT html (head, body)>
<!ATTLIST html
%i18n;
id ID #IMPLIED
xmlns %URI; #FIXED 'http://www.w3.org/1999/xhtml'
>

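The two names in parentheses, head and body, are child elements that must be included, and the comma between them means they must appear in that order. If you see a question mark after an element name in parentheses, that element is optional: it may appear once or not at all. If you see a plus sign, at least one of that element must be included, and it may be repeated.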
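To make the comma, question mark, and plus sign concrete, here is a minimal Python sketch. It assumes a tiny subset of content-model syntax: comma-separated sequences plus the ?, +, and * (zero or more) indicators, with no choices (|) or nested groups. The function names are hypothetical, and a real validator handles far more. The sketch turns a content model into a regular expression over the sequence of child element names.

```python
import re

def content_model_to_regex(model: str) -> re.Pattern:
    """Translate a simple model like "(head, body)" or "(LI)+" into a regex.

    Supports comma sequences and the ?, +, * indicators only; choices (|)
    and nested groups are out of scope for this sketch.
    """
    model = model.strip()
    group_indicator = ""
    if model[-1] in "?+*":  # indicator applied to the whole group
        group_indicator, model = model[-1], model[:-1].strip()
    inner = model.lstrip("(").rstrip(")")
    parts = []
    for item in inner.split(","):
        item = item.strip()
        indicator = ""
        if item[-1] in "?+*":  # indicator applied to one child element
            indicator, item = item[-1], item[:-1].strip()
        # Each child name is matched as a whole token followed by a space.
        parts.append(f"(?:{re.escape(item)} ){indicator}")
    return re.compile(f"(?:{''.join(parts)}){group_indicator}")

def matches(model: str, children: list[str]) -> bool:
    """True if the sequence of child element names satisfies the model."""
    text = "".join(name + " " for name in children)
    return content_model_to_regex(model).fullmatch(text) is not None

print(matches("(head, body)", ["head", "body"]))  # True
print(matches("(head, body)", ["body", "head"]))  # False: wrong order
print(matches("(LI)+", []))                      # False: needs at least one
```

A made-up model such as "(head, title?, body)" accepts both head, body and head, title, body, which is exactly what the question mark promises.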

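ATTLIST stands for attribute list. What follows is a list of the attributes this particular element can have. %i18n; is not a single attribute but a reference to a group of internationalization attributes (lang, xml:lang, and dir) declared elsewhere in the DTD, so the element’s language and text direction can be specified for different locales. The first attribute spelled out is id, which is declared with the type ID (its value must be a name that is unique within the document) and the default #IMPLIED. #IMPLIED means the attribute is legal to include but not required. If it were required, the declaration would say #REQUIRED.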

An example of a #REQUIRED attribute is the src attribute of the img element.
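The difference between #IMPLIED and #REQUIRED boils down to a presence check. The sketch below assumes a small, hand-copied table of declarations; ATTLISTS and missing_required are hypothetical names, and only a few attributes are listed (in the XHTML 1.0 DTDs, alt is also #REQUIRED on img).

```python
# Hypothetical, hand-copied subset of attribute defaults. Only a few
# attributes are listed; the real DTD declares many more for each element.
ATTLISTS = {
    "html": {"id": "#IMPLIED"},
    "img": {"src": "#REQUIRED", "alt": "#REQUIRED", "id": "#IMPLIED"},
}

def missing_required(element: str, attrs: dict[str, str]) -> list[str]:
    """Return the #REQUIRED attributes that are absent from attrs."""
    declared = ATTLISTS.get(element, {})
    return [
        name
        for name, default in declared.items()
        # #IMPLIED attributes may be left out; #REQUIRED ones may not.
        if default == "#REQUIRED" and name not in attrs
    ]

print(missing_required("img", {"alt": "Logo"}))  # ['src']
print(missing_required("html", {}))              # []
```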

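The next attribute is xmlns (the XML namespace), whose type is given as %URI;. Because URI is preceded by a percent sign and followed by a semicolon, it is a parameter entity reference: the parser replaces it with the text declared for that entity elsewhere in the DTD (in the XHTML 1.0 DTDs, that text is CDATA, with a comment noting that the value should be a URI). The keyword #FIXED then says this attribute can only ever have the value 'http://www.w3.org/1999/xhtml'. Most attributes that take a URI, such as href, are not fixed.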
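Parameter entity expansion is essentially text substitution, repeated until no references remain, since entities can refer to other entities. Here is a minimal Python sketch under that assumption. The entity table is a simplified, hand-copied excerpt of the XHTML 1.0 declarations, and expand and fixed_value_ok are hypothetical helpers, not part of any DTD tool.

```python
import re
from typing import Optional

# Simplified, hand-copied parameter entities from the XHTML 1.0 DTDs.
PARAMETER_ENTITIES = {
    "URI": "CDATA",
    "LanguageCode": "NMTOKEN",
    "i18n": "lang %LanguageCode; #IMPLIED xml:lang %LanguageCode; #IMPLIED "
            "dir (ltr|rtl) #IMPLIED",
}

PE_REF = re.compile(r"%([A-Za-z_][\w.-]*);")  # matches %name;

def expand(text: str) -> str:
    """Replace %name; references until none remain (entities may nest).

    Assumes the table has no self-referencing entities, which would loop forever.
    """
    previous = None
    while previous != text:
        previous = text
        text = PE_REF.sub(lambda m: PARAMETER_ENTITIES[m.group(1)], text)
    return text

def fixed_value_ok(value: Optional[str], fixed: str) -> bool:
    """A #FIXED attribute may be omitted, but if present it must match exactly."""
    return value is None or value == fixed

print(expand("xmlns %URI; #FIXED 'http://www.w3.org/1999/xhtml'"))
# prints: xmlns CDATA #FIXED 'http://www.w3.org/1999/xhtml'
print(expand("%i18n;"))
```

The second call shows why %i18n; is a group of attributes rather than one: it expands to declarations for lang, xml:lang, and dir.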

Two other abbreviations you may see are CDATA and PCDATA. The first, CDATA, means character data. In plain English, that means whatever string of characters you put there. For example, class CDATA #IMPLIED tells you that the class attribute takes character data as its value. This kind of CDATA appears in attribute declarations. PCDATA, written #PCDATA, stands for parsed character data and appears in element content models. It means text that the parser (in a browser, for example) scans for markup, so entity references such as &lt; are interpreted and characters like < and & must be escaped. So you see declarations in a DTD like this: <!ELEMENT script (#PCDATA)>.
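You can watch a parser treat element content as parsed character data with Python’s built-in xml.etree.ElementTree. Note the assumption here: ElementTree does not read or validate against a DTD; this sketch only shows what an XML parser does with text inside an element such as script.

```python
import xml.etree.ElementTree as ET

# Parsed character data: the parser turns the entity reference &lt;
# into the character "<" before handing the text to the application.
script = ET.fromstring("<script>if (a &lt; b) { go(); }</script>")
print(script.text)  # if (a < b) { go(); }

# An unescaped "<" would look like the start of a tag, so parsing fails.
try:
    ET.fromstring("<script>if (a < b) { go(); }</script>")
    well_formed = True
except ET.ParseError:
    well_formed = False
print(well_formed)  # False
```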

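Finally, in the older HTML 4.01 DTDs, which are written in SGML rather than XML, you may see hyphens and the letter O (not zero). For example: <!ELEMENT UL - - (LI)+>. The hyphens and Os travel in pairs and describe the rules for the start tag and the end tag, in that order. A hyphen means the tag is required, and O means it may be omitted. So - - means both a start tag and an end tag are required, while - O means a start tag is required but the end tag is optional. In the example <!ELEMENT UL - - (LI)+>, a ul requires both tags. The br element, <!ELEMENT BR - O EMPTY>, is declared EMPTY, so it has no content and takes no end tag at all. You won’t find these markers in the XHTML DTDs, because XML always requires an end tag or the empty-element form, as in <br />.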
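Reading those flags is mechanical enough to sketch in a few lines of Python. This assumes a simplified form of the HTML 4.01 declarations: one element name, two flags, a content model, and optional SGML comments between double hyphens. tag_rules is a hypothetical helper, not a real SGML parser.

```python
import re

# Simplified pattern for HTML 4.01-style SGML declarations such as
# "<!ELEMENT UL - - (LI)+>". Each flag is "-" (required) or "O" (omissible).
DECL = re.compile(r"<!ELEMENT\s+(\S+)\s+([-O])\s+([-O])\s+(.+?)\s*>")
COMMENT = re.compile(r"\s*--.*?--")  # SGML comments like "-- unordered list --"

def tag_rules(declaration: str) -> dict:
    """Report the start-tag and end-tag rules encoded in a declaration."""
    m = DECL.fullmatch(COMMENT.sub("", declaration).strip())
    if m is None:
        raise ValueError(f"unrecognized declaration: {declaration}")
    name, start, end, content = m.groups()
    return {
        "element": name,
        "start_tag_required": start == "-",
        "end_tag_required": end == "-",
        # EMPTY elements have no content, so they never take an end tag.
        "end_tag_forbidden": content == "EMPTY",
    }

print(tag_rules("<!ELEMENT UL - - (LI)+ -- unordered list -->"))
print(tag_rules("<!ELEMENT BR - O EMPTY -- forced line break -->"))
```

For UL both tags come back as required; for BR the end tag is not required and, because the content is EMPTY, is flagged as forbidden.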
