DocBook

From SureLogic

We encode our documentation using DocBook, an XML application for writing technical documentation. In general, XSLT style sheets are used to translate the documentation to other formats, such as HTML and Eclipse Help.

I found Writing Documentation Using DocBook: A Crash Course to be an effective tutorial on DocBook markup. For reference, an older version of DocBook: The Definitive Guide is available. It seems to be up to date enough for creating DocBook 4.5 documents.

Version

The most recent version of DocBook is 5.0. However, we are using DocBook 4.5 because the normative definition of DocBook 5.0 is in RELAX NG and the DTD definition is still being finalized. As a result, most of the tools discussed below, which operate off of XML DTDs, are unhappy with DocBook 5.0 documents.

Tools

DocBook DTD

We use a local copy of the DocBook 4.5 DTD as retrieved from the DocBook home page. This is described in more detail below.

DocBook Stylesheets

The DocBook definition includes an extensive set of XSLT style sheets. The style sheets have many parameters that control the transformations. The documentation for the style sheets can be found on SourceForge.

Saxon

We use the Saxon XSLT processor to apply style sheets to the DocBook documents. It is a 100% Java implementation of XSLT, so we can run it from Ant scripts on various platforms. We currently use Saxon 6.5.5 even though the most recent version of Saxon is 9.1.0.5: the recent versions are XSLT 2.0 processors, whereas the DocBook style sheets are XSLT 1.0 applications. Saxon 6.5.5 is the most recent stable release of the XSLT 1.0 processor.

We also use Saxon extensions from the DocBook Stylesheets distribution. These enable things like automatic line numbering.
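
As a rough sketch of how the extensions are usually switched on: a small customization layer imports the stock style sheet and sets the use.extensions and linenumbering.extension parameters. The import path below is an assumption about where the style sheets are unpacked, and the Saxon extension jar from the distribution's extensions directory also has to be on the classpath.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Hypothetical customization layer; the href is an assumed location. -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:import href="../lib/docbook-xsl/html/profile-docbook.xsl"/>
  <!-- Allow the style sheets to call processor-specific extension code. -->
  <xsl:param name="use.extensions" select="1"/>
  <!-- Request the line-numbering extension for verbatim elements. -->
  <xsl:param name="linenumbering.extension" select="1"/>
</xsl:stylesheet>
```

The same two parameters could instead be passed to Saxon as name=value arguments; a customization layer just keeps them in one place.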

Xerces

We use the Xerces XML Parser with Saxon. This replaces the Ælfred XML parser that Saxon normally uses. We make this substitution so that we can use XInclude functionality. We currently use version 2.9.1.
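
One commonly documented way to make this substitution is to set JAXP and Xerces system properties when launching Saxon. The Ant fragment below is only a sketch: the classpath id and all file paths are assumptions, not our actual build script.

```xml
<!-- Sketch of an Ant fragment; paths and the classpath id are assumptions. -->
<java classname="com.icl.saxon.StyleSheet" fork="true" failonerror="true">
  <classpath refid="docbook.classpath"/>
  <!-- Ask JAXP to hand out Xerces parsers instead of Saxon's bundled parser. -->
  <sysproperty key="javax.xml.parsers.SAXParserFactory"
               value="org.apache.xerces.jaxp.SAXParserFactoryImpl"/>
  <sysproperty key="javax.xml.parsers.DocumentBuilderFactory"
               value="org.apache.xerces.jaxp.DocumentBuilderFactoryImpl"/>
  <!-- Select the Xerces parser configuration that resolves xi:include elements. -->
  <sysproperty key="org.apache.xerces.xni.parser.XMLParserConfiguration"
               value="org.apache.xerces.parsers.XIncludeParserConfiguration"/>
  <arg value="-o"/>
  <arg value="build/book.html"/>
  <arg value="src/book.xml"/>
  <arg value="lib/docbook-xsl/html/profile-docbook.xsl"/>
</java>
```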

XML Resolver

We use the Apache commons XML resolver with Saxon to handle XML catalog files. (I'm not sure what catalog files are all about, but they are described here.)
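
For what it is worth, a catalog file is essentially a lookup table that maps public and system identifiers to local resources, so the parser does not have to fetch them over the network. A minimal sketch of an OASIS XML catalog for the DocBook 4.5 DTD might look like the following; the file name and the uri values are assumptions.

```xml
<?xml version="1.0"?>
<!-- Hypothetical catalog.xml; the uri values are assumed local paths. -->
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <!-- Map the DocBook 4.5 public identifier to a local copy of the DTD. -->
  <public publicId="-//OASIS//DTD DocBook XML V4.5//EN"
          uri="lib/docbook-xml-4.5/docbookx.dtd"/>
  <!-- Map the canonical system identifier to the same local copy. -->
  <system systemId="http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd"
          uri="lib/docbook-xml-4.5/docbookx.dtd"/>
</catalog>
```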

FOP

Apache FOP is used to convert from a Formatting Objects file to a PDF file. The OFFO library is used with it to control hyphenation.

Specific Transformations

We use the profile- version of the style sheets so that we can use conditions in the document.

DocBook to HTML

We use the html/profile-docbook.xsl style sheet from the DocBook definition to convert from DocBook to HTML.

DocBook to Eclipse Help

We use the eclipse/profile-eclipse.xsl style sheet from the DocBook definition to convert from DocBook to Eclipse Help.

DocBook to PDF

This is a two-step conversion:

  1. We use the fo/profile-docbook.xsl style sheet from the DocBook definition to convert from DocBook to FO (Formatting Objects).
  2. We use Apache FOP to convert from FO to PDF.
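
The two steps above can be sketched as a single Ant target. This is an illustration under assumptions, not our actual build file: the project and target names, every path, and the classpath id are made up, and the parser settings needed for XInclude are left out for brevity.

```xml
<project name="docs" default="pdf">
  <!-- Assumed layout: all required jars live somewhere under lib/. -->
  <path id="docbook.classpath">
    <fileset dir="lib" includes="**/*.jar"/>
  </path>

  <!-- Register the Ant task that ships with Apache FOP. -->
  <taskdef name="fop" classname="org.apache.fop.tools.anttasks.Fop"
           classpathref="docbook.classpath"/>

  <target name="pdf">
    <!-- Step 1: DocBook to FO with Saxon and the profiling FO style sheet. -->
    <java classname="com.icl.saxon.StyleSheet" fork="true" failonerror="true">
      <classpath refid="docbook.classpath"/>
      <arg value="-o"/>
      <arg value="build/book.fo"/>
      <arg value="src/book.xml"/>
      <arg value="lib/docbook-xsl/fo/profile-docbook.xsl"/>
      <arg value="profile.condition=isFO"/>
    </java>
    <!-- Step 2: FO to PDF with FOP. -->
    <fop format="application/pdf" fofile="build/book.fo"
         outfile="build/book.pdf"/>
  </target>
</project>
```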

Notes about DocBook Markup

DOCTYPE Declarations

When processing DocBook files, Saxon+Xerces retrieves the DocBook DTD many times. This is slow, and makes it impossible to build the documents when disconnected from the network. So we use a local copy of the DTD stored in lib/docbook-xml-4.5 in the surelogic-docs project. Instead of using the normal DOCTYPE header

<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
"http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd">

our documents use a relative URL in the DOCTYPE declaration to refer to our local copy of the DTD. In particular, a DocBook file located at the root of a document subdirectory (for example, at the root of src/sierra-guide) would use the following instead:

<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
"../../lib/docbook-xml-4.5/docbookx.dtd">

Because the URL is relative, you must be careful to update the reference when creating DocBook source files in nested directories. For example, the file src/jsure-guide/chapter-getting-started/chapter.xml has the following DOCTYPE declaration:

<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
"../../../lib/docbook-xml-4.5/docbookx.dtd">

Paragraphs

It turns out that the para element, which is used to indicate a paragraph, allows other block-structured elements to be nested within it. This means, in particular, that list and programlisting elements may be nested within a paragraph. This is convenient because it means a paragraph doesn't have to be closed just because a list or example needs to be inserted. Paragraphs can be marked up based on their logical flow, rather than on the needs of the markup.
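
A small sketch of what this looks like in practice. The content is invented for illustration (including the ant docs command); the point is only that a single para holds a list and a programlisting and then carries on.

```xml
<!-- Illustrative content only; the command shown is hypothetical. -->
<para>The build relies on three tools:
  <itemizedlist>
    <listitem><para>Saxon</para></listitem>
    <listitem><para>Xerces</para></listitem>
    <listitem><para>FOP</para></listitem>
  </itemizedlist>
  and is started with the following command:
  <programlisting>ant docs</programlisting>
  after which the same paragraph simply continues.</para>
```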

Titles

Titles, that is, text marked up with the title element, should not end with a period “.”. DocBook transformations add these when they are necessary (such as after a title in a formalpara).

Program Elements

DocBook supports a wide range of elements for the inline markup of programming language entities. I have been using the following elements:

Element       Purpose
varname       Local variables
parameter     Method/constructor parameters, including the implicit parameter this and qualified receivers
classname     Class names, including the names of JSure annotation classes when referenced in text
function      Method/constructor names
structfield   Field names; the names of other field-like class members including regions and locks
package       Name of a Java package
code          Inline code fragments; references to modifiers such as synchronized, static, and public

In some cases, a judgment call between using structfield/function and code is required. For example, when referring to the method foo(), I would use function, but if I am really referring to a particular expression that invokes the method, I would lean towards using code. Similarly, if I want to refer to the field f, I would use structfield, but if what I am really referring to is the expression x.y.f, I would use code.
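
To make the distinctions concrete, here is a sketch of a paragraph that uses each of these elements. The class, method, field, and package names are all hypothetical.

```xml
<!-- All program entity names below are invented for illustration. -->
<para>The method <function>deposit</function> of class
<classname>Account</classname> in package
<package>com.example.bank</package> adds its parameter
<parameter>amount</parameter> to the field
<structfield>balance</structfield>, after saving the old value in the
local variable <varname>previous</varname>. The method is declared
<code>synchronized</code>, and the update itself is the expression
<code>this.balance += amount</code>.</para>
```

Note that the last two uses are code rather than structfield: the first names a modifier, and the second refers to an expression rather than to the field as such.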

Description Lists

DocBook doesn't provide an immediate counterpart to the HTML description list element <dl>. The <variablelist> element seems similar, but it is actually meant to be used for the more specialized purpose of defining terms. This intended use yields horrible formatting when a sentence or phrase is used as the <term> in a <varlistentry>.

The Right Way

A better solution that I have been using is to use an <itemizedlist> or an <orderedlist>, as context demands, and then to use a <formalpara> in the <listitem> instead of a <para>. By using a <formalpara> you can provide a <title> element to be used as the header. For example:

<para>To make a peanut butter and jelly sandwich follow these steps:
  <orderedlist>
    <listitem>
      <formalpara>
        <title>Lay out the bread</title>
        <para>Place two slices of bread side-by-side on a plate.</para>
      </formalpara>
    </listitem>
    <listitem>
      <formalpara>
        <title>Spread the peanut butter</title>
        <para>Use a butter knife to spread peanut butter on one of the
        slices of bread.</para>
      </formalpara>
    </listitem>
    <listitem>
      <formalpara>
        <title>Spread the jelly</title>
        <para>Use a second butter knife to spread the jelly on the other
        slice of bread.</para>
      </formalpara>
    </listitem>
    <listitem>
      <formalpara>
        <title>Join the slices</title>
        <para>Form the sandwich by placing the slice with the peanut butter
        on the slice with the jelly, making sure that the peanut butter faces
        the jelly.</para>
      </formalpara>
    </listitem>
  </orderedlist>
</para>

Observe that the title of the paragraph does not end with a period: The DocBook formatter will insert the necessary punctuation.

Dl-good.png

The Wrong Ways

The above solution is superior to either of the two following solutions, which should be avoided.

Using <emphasis>
<!-- Bad solution to a description list.  Do not do this. -->
<para>To make a peanut butter and jelly sandwich follow these steps:
  <orderedlist>
    <listitem>
      <para><emphasis>Lay out the bread.</emphasis> Place two slices of
      bread side-by-side on a plate.</para>
    </listitem>
    <listitem>
      <para><emphasis>Spread the peanut butter.</emphasis> Use a butter
      knife to spread peanut butter on one of the slices of
      bread.</para>
    </listitem>
    <listitem>
      <para><emphasis>Spread the jelly.</emphasis> Use a second butter
      knife to spread the jelly on the other slice of bread.</para>
    </listitem>
    <listitem>
      <para><emphasis>Join the slices.</emphasis> Form the sandwich by
      placing the slice with the peanut butter on the slice with the
      jelly, making sure that the peanut butter faces the jelly.</para>
    </listitem>
  </orderedlist>
</para>

While the rendering for this is not very objectionable

Dl-bad1.png

this style of markup loses the semantics of the list.

Using <variablelist>
<!-- Bad solution to a description list.  Do not do this. -->
<para>To make a peanut butter and jelly sandwich follow these steps:
  <variablelist>
    <varlistentry>
      <term>Lay out the bread</term>
      <listitem>
        <para>Place two slices of bread side-by-side on a plate.</para>
      </listitem>
    </varlistentry>
    <varlistentry>
      <term>Spread the peanut butter</term>
      <listitem>
        <para>Use a butter knife to spread peanut butter on one of the
        slices of bread.</para>
      </listitem>
    </varlistentry>
    <varlistentry>
      <term>Spread the jelly</term>
      <listitem>
        <para>Use a second butter knife to spread the jelly on the other
        slice of bread.</para>
      </listitem>
    </varlistentry>
    <varlistentry>
      <term>Join the slices</term>
      <listitem>
        <para>Form the sandwich by placing the slice with the peanut butter
        on the slice with the jelly, making sure that the peanut butter faces
        the jelly.</para>
      </listitem>
    </varlistentry>
  </variablelist>
</para>

This usage does better capture the semantics of the situation than using <emphasis> tags, but is still an abuse of the intended use of variable lists. It also yields an unattractive rendering:

Dl-bad2.png

Cross References

How to use them...

Notes about some neat things:

  • Can refer to list items
  • Can refer to callouts
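
As a tentative sketch of the two cases in the notes above: an xref element points at the id of its target through the linkend attribute, and the style sheets generate the link text. The ids and content here are hypothetical.

```xml
<!-- Hypothetical ids; xref generates its own text from the target. -->
<orderedlist>
  <listitem id="step-lay-out-bread">
    <para>Place two slices of bread side-by-side on a plate.</para>
  </listitem>
  <listitem>
    <para>Spread peanut butter on one of the slices from step
    <xref linkend="step-lay-out-bread"/>.</para>
  </listitem>
</orderedlist>

<programlisting>synchronized (lock) { <co id="co-acquire-lock"/>
  counter++;
}</programlisting>
<para>The lock is acquired at <xref linkend="co-acquire-lock"/>.</para>
```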

Automatic Line Numbering

The DocBook elements that present literal text layout, such as programlisting, synopsis, and screen, support automatic line numbering. This is turned on by adding the attribute linenumbering="numbered" to the element. The starting line number can be set by using the attribute startinglinenumber. This feature requires that the Saxon extensions provided with the DocBook stylesheets be used.

A word of caution: the literal block begins immediately after the end of the opening element tag. Typically we follow this tag with a new line. This new line will result in a numbered empty line in the generated output. For example, given the input

 <programlisting linenumbering="numbered">
 line 1
 line 2
 line 3
 line 4
 </programlisting>

the generated document will contain the text

   1:
   2: line 1
   3: line 2
   4: line 3
   5: line 4
   6:

You need to use

 <programlisting linenumbering="numbered">line 1
 line 2
 line 3
 line 4</programlisting>

to get the expected output

   1: line 1
   2: line 2
   3: line 3
   4: line 4

Nevertheless, now that I have figured out how to make the automatic line numbering work, I find it very useful.
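
For completeness, a sketch of the startinglinenumber attribute mentioned above, for example when a listing shows an excerpt from the middle of a file. The starting value 10 is arbitrary, and the tags hug the text for the reason just described.

```xml
<!-- Numbering is requested to start at 10 instead of 1. -->
<programlisting linenumbering="numbered" startinglinenumber="10">line 10
line 11
line 12</programlisting>
```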

Conditional Elements

If you want to have an element appear only in the HTML (or Eclipse Help) version of the document or only in the PDF version of the document, you can add the attribute condition to the element:

  • When condition="isHTML" the element (and its children) only appears when HTML or Eclipse Help is generated.
  • When condition="isFO" the element (and its children) only appears when FO is generated. (FO is the first step in PDF generation.)

For example:

 …
 <para condition="isHTML">This paragraph only appears when the document
 is compiled to HTML.</para>
 …
 <para>This document is generated as
   <phrase condition="isHTML">HTML.</phrase>
   <phrase condition="isFO">Formatting Objects.</phrase></para>
 …

The condition is selected when a style sheet is applied to the DocBook document by setting the profile.condition style sheet parameter. The Ant script that builds the documents passes Saxon

  • profile.condition="isHTML" when building HTML and Eclipse Help
  • profile.condition="isFO" when building FO/PDF
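
A sketch of how a build script might pass these values, using an Ant macro so that the HTML and FO builds differ only in style sheet, condition, and output file. The macro name, paths, and classpath id are assumptions and not taken from our actual script.

```xml
<project name="docs" default="html">
  <!-- Assumed layout: all required jars live somewhere under lib/. -->
  <path id="docbook.classpath">
    <fileset dir="lib" includes="**/*.jar"/>
  </path>

  <!-- Hypothetical macro wrapping one Saxon run. -->
  <macrodef name="docbook">
    <attribute name="stylesheet"/>
    <attribute name="condition"/>
    <attribute name="out"/>
    <sequential>
      <java classname="com.icl.saxon.StyleSheet" fork="true" failonerror="true">
        <classpath refid="docbook.classpath"/>
        <arg value="-o"/>
        <arg value="@{out}"/>
        <arg value="src/book.xml"/>
        <arg value="lib/docbook-xsl/@{stylesheet}"/>
        <!-- Trailing name=value arguments become style sheet parameters. -->
        <arg value="profile.condition=@{condition}"/>
      </java>
    </sequential>
  </macrodef>

  <target name="html">
    <docbook stylesheet="html/profile-docbook.xsl" condition="isHTML"
             out="build/book.html"/>
  </target>

  <target name="fo">
    <docbook stylesheet="fo/profile-docbook.xsl" condition="isFO"
             out="build/book.fo"/>
  </target>
</project>
```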

XInclude

XInclude is an XML standard for modularizing files, that is, dividing up a single file into multiple files. It is reasonably simple to use:

  1. Add xmlns:xi="http://www.w3.org/2001/XInclude" as an attribute of the root-level element of the document to bind the xi prefix to the XInclude namespace.
  2. Use <xi:include href="url"/> to include the file referenced by url.
  3. The referenced file should also declare a DocBook document type, and its root element should be applicable at the point of inclusion.

For example, we can have a book document in one file, and its chapters in separate files:

book.xml
 <?xml version="1.0" encoding="utf-8"?>
 
 <!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
 "../../lib/docbook-xml-4.5/docbookx.dtd">
 
 <book id="my-book" lang="en-US"
     xmlns:xi="http://www.w3.org/2001/XInclude">
   <title>My Great Book</title>
 
   <xi:include href="chapter1.xml"/>
   <xi:include href="chapter2.xml"/>
 </book>
chapter1.xml
 <?xml version="1.0" encoding="utf-8"?>
 
 <!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
 "../../lib/docbook-xml-4.5/docbookx.dtd">
 
 <chapter id="intro" lang="en-US">
   <title>Introduction</title>
 
   <para>This is the introduction.</para>
 </chapter>
chapter2.xml
 <?xml version="1.0" encoding="utf-8"?>
 
 <!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
 "../../lib/docbook-xml-4.5/docbookx.dtd">
 
 <chapter id="getting-started" lang="en-US">
   <title>Getting Started</title>
 
   <para>Here we get started learning!</para>
 </chapter>

Naming Conventions

I found it useful to use a consistent naming convention when using XInclude. The convention I have been using is to identify the included file based on the root-level element it contains together with the id attribute of that element. I have used two forms of this, depending on whether the included file is in the same directory as the referring file, or whether it is in a subdirectory:

  • When the file is in the same directory, I name it <root-element>-<id>.xml.
  • When the file is in a subdirectory, I name the subdirectory <root-element>-<id> and the file itself <root-element>.xml. Typically I do this when the file I am including also includes files. Don't forget to update the relative URL in the DOCTYPE declaration.

(Of course this naming scheme only makes sense when file inclusion is strictly used for organization of subelements, and not for sharing/repetition of content across the document.) Using the convention for files in the same directory, the above example becomes

book.xml
 <?xml version="1.0" encoding="utf-8"?>
 
 <!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
 "../../lib/docbook-xml-4.5/docbookx.dtd">
 
 <book id="my-book" lang="en-US"
     xmlns:xi="http://www.w3.org/2001/XInclude">
   <title>My Great Book</title>
 
   <xi:include href="chapter-intro.xml"/>
   <xi:include href="chapter-getting-started.xml"/>
 </book>
chapter-intro.xml
 <?xml version="1.0" encoding="utf-8"?>
 
 <!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
 "../../lib/docbook-xml-4.5/docbookx.dtd">
 
 <chapter id="intro" lang="en-US">
   <title>Introduction</title>
 
   <para>This is the introduction.</para>
 </chapter>
chapter-getting-started.xml
 <?xml version="1.0" encoding="utf-8"?>
 
 <!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
 "../../lib/docbook-xml-4.5/docbookx.dtd">
 
 <chapter id="getting-started" lang="en-US">
   <title>Getting Started</title>
 
   <para>Here we get started learning!</para>
 </chapter>

Using the convention for files in subdirectories, the example would be

book.xml
 <?xml version="1.0" encoding="utf-8"?>
 
 <!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
 "../../lib/docbook-xml-4.5/docbookx.dtd">
 
 <book id="my-book" lang="en-US"
     xmlns:xi="http://www.w3.org/2001/XInclude">
   <title>My Great Book</title>
 
   <xi:include href="chapter-intro/chapter.xml"/>
   <xi:include href="chapter-getting-started/chapter.xml"/>
 </book>
chapter-intro/chapter.xml
 <?xml version="1.0" encoding="utf-8"?>
 
 <!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
 "../../../lib/docbook-xml-4.5/docbookx.dtd">
 
 <chapter id="intro" lang="en-US">
   <title>Introduction</title>
 
   <para>This is the introduction.</para>
 </chapter>
chapter-getting-started/chapter.xml
 <?xml version="1.0" encoding="utf-8"?>
 
 <!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
 "../../../lib/docbook-xml-4.5/docbookx.dtd">
 
 <chapter id="getting-started" lang="en-US">
   <title>Getting Started</title>
 
   <para>Here we get started learning!</para>
 </chapter>

Issues with Generated PDF

Images in HTML and PDF

Images may be too large in the generated PDF files. One reason for this is the possibility that the images have missing or incorrect embedded size and DPI information. This can be controlled by using multiple imageobject elements within a mediaobject element, and using the condition attribute as described above to control which imageobject element is actually used in the final document. Currently, we have not been using different images, but rather different image attributes for the different formats. Specifically we have been setting various scaling attributes for FO.

Consider the following example:

 <figure id="fig-quick-start-license-menu">
   <title>The menu option to install a license for Sierra.</title>
   <mediaobject>
     <imageobject condition="isHTML">
       <imagedata fileref="images/quick-start-license-menu.png" />
     </imageobject>
     <imageobject condition="isFO">
       <imagedata width="50%" scalefit="1" fileref="images/quick-start-license-menu.png" />
     </imageobject>
   </mediaobject>
 </figure>

In this case, the unadulterated image images/quick-start-license-menu.png is used when converting to HTML (and Eclipse Help). When converting to FO (and then PDF), the same image is used, but it is proportionally scaled so that its width is 50% of the current document area.

(Tim) This approach has problems with the Eclipse Help, which scales the images as the PDF does. To avoid this I've used:

 <figure id="fig-quick-start-license-menu">
   <title>The menu option to install a license for Sierra.</title>
   <mediaobject>
     <imageobject>
       <imagedata fileref="images/quick-start-license-menu.png" />
     </imageobject>
   </mediaobject>
 </figure>

Large Program Examples

Figures that contain large programlisting elements run over the page dimensions when converted to PDF. This is probably a consequence of limitations of FO. I have not yet figured out how to deal with this in a reasonable manner.