<?xml version="1.0"?>
<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
    "http://docbook.org/xml/4.1.2/docbookx.dtd" [
<!ENTITY howto         "http://tldp.org/HOWTO/">
<!ENTITY mini-howto    "http://tldp.org/HOWTO/mini/">
]>

<article>
<articleinfo>
  <title>DocBook Demystification HOWTO</title>

  <author>
     <firstname>Eric</firstname>
     <surname>Raymond</surname>
     <affiliation>
        <address>
           <email>esr@thyrsus.com</email>
        </address>
     </affiliation>
  </author>

  <revhistory>
     <revision>
        <revnumber>v1.1</revnumber>
        <date>2002-10-01</date>
        <authorinitials>esr</authorinitials>
        <revremark>
          Correct inadvertent misrepresentation of FSF's position.
          Added pointer to the DocBook FAQ.
        </revremark>
     </revision>
     <revision>
        <revnumber>v1.0</revnumber>
        <date>2002-09-20</date>
        <authorinitials>esr</authorinitials>
        <revremark>
          Initial version.
        </revremark>
     </revision>
  </revhistory>

  <abstract><para>
  This HOWTO attempts to clear the fog and mystery surrounding the
  DocBook markup system and the tools that go with it.  It is aimed at
  authors of technical documentation for open-source projects hosted
  on Linux, but should be useful for people composing other kinds of
  documents on other Unixes as well.
  </para></abstract>

</articleinfo>

<sect1 id="intro"><title>Introduction</title>

<para>A great many major open-source projects are converging on
DocBook as a standard format for their documentation &mdash; projects
including the Linux kernel, GNOME, KDE, Samba, and the Linux
Documentation Project.  The advocates of XML-based "structural markup"
(as opposed to the older style of "presentation markup" exemplified by
troff, TeX, and Texinfo) seem to have won the theoretical
battle.</para>

<para>Nevertheless, a lot of confusion surrounds DocBook and the
programs that support it.  Its devotees speak an argot that is dense
and forbidding even by computer-science standards, slinging around
acronyms that have no obvious relationship to the things you need to
do to write markup and make HTML or PostScript from it.  XML standards
and technical papers are notoriously obscure.  Most DocBook-related
tools are very poorly documented, and their documentation is
especially prone to assume way too much prior knowledge on the
reader's part.</para>

<para>This HOWTO will attempt to clear up the major mysteries
surrounding DocBook and its application to open-source documentation
&mdash; both the technical and political ones.  Our objective is to equip
you to understand not just what you need to do to make documents, but
why the process is as complex as it is &mdash; and how it can be
expected to change as newer DocBook-related tools become
available.</para>

</sect1>
<sect1><title>Why care about DocBook at all?</title>

<para>There are two possibilities that make DocBook really
interesting.  One is <emphasis>multi-mode rendering</emphasis> and the
other is <emphasis>searchable documentation
databases</emphasis>.</para>

<para>Multi-mode rendering is the easier, nearer-term possibility; it's
the ability to write a document in a single master format that can be
rendered in many different display modes (in particular, both as HTML
for on-line viewing and as PostScript for high-quality printed
output).  This capability is pretty well implemented now.</para>

<para><emphasis>Searchable documentation databases</emphasis> is
shorthand for the possibility that DocBook might help get us to a
world in which all the documentation on your open-source operating
system is one rich, searchable, cross-indexed and hyperlinked
database (rather than being scattered across several different formats
in multiple locations as it is now).</para>

<para>Ideally, whenever you install a software package on your machine,
it would register its DocBook documentation in your system's
catalog.  HTML, properly indexed and cross-linked to the HTML in the
rest of your catalog, would be generated.  The new package's
documentation would then be available through your browser.  All
your documentation would be searchable through an interface
resembling a good Web search engine.</para>

<para>HTML itself is not quite rich enough a format to get us to that
world.  To name just one lack, you can't explicitly declare index
entries in HTML.  DocBook <emphasis>does</emphasis> have the semantic
richness to support structured documentation databases.  Fundamentally,
that's why so many projects are adopting it.</para>
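<para>As a minimal sketch of that extra richness, here is how an index
entry is declared in DocBook.  The sentence being indexed is invented
for the example.  The &lt;indexterm&gt; elements do not show up in the
running text.  Instead, a stylesheet can collect them into a generated
index wherever the document places an &lt;index&gt; element.  Plain HTML
has no markup for either.</para>

<programlisting><![CDATA[<para>Every formatter needs a Document Type
Definition<indexterm><primary>DTD</primary></indexterm> and a
stylesheet<indexterm>
  <primary>stylesheet</primary>
  <secondary>role in formatting</secondary>
</indexterm> to work from.</para>

<!-- Placed near the end of the article.  With the DocBook XSL
     stylesheets, an empty index element marks where an index built
     from the indexterm elements in the document should go. -->
<index/>]]></programlisting>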

<para>DocBook has the vices that go with its virtues.  Some people
find it unpleasantly heavyweight, and too verbose to be really
comfortable as a composition format.  That's OK; as long as the markup
tools they like (things like Perl POD or GNU Texinfo) can generate
DocBook out of their back ends, we can all still get what we want.  It
doesn't matter whether or not everybody writes in DocBook &mdash; as
long as it becomes the common document interchange format that
everyone uses, we'll still get unified searchable documentation
databases.</para>

</sect1>
<sect1><title>Structural markup: a primer</title>

<para>Older formatting languages like TeX, Texinfo, and troff
supported <firstterm>presentation
markup</firstterm><indexterm><primary>presentation
markup</primary></indexterm>.  In these systems, the instructions you
gave were about the appearance and physical layout of the text (font
changes, indentation changes, that sort of thing).</para>

<para>Presentation markup was adequate as long as your objective was
to print to a single medium or type of display device.  You run into
its limits, however, when you want to mark up a document so that (a)
it can be formatted for very different display media (such as printing
vs. Web display), or (b) it can be searched and indexed by its logical
structure (as you are likely to want to do, for example, if you are
incorporating it into a hypertext system).</para>

<para>To support these capabilities properly, you need a system of
<firstterm>structural markup</firstterm><indexterm><primary>structural
markup</primary></indexterm>.  In structural markup, you describe not
the physical appearance of the document but the logical properties of
its parts.</para>

<para>As an example: in a presentation-markup language, if you want to
emphasize a word, you might instruct the formatter to set it in
boldface.  In
<citerefentry><refentrytitle>troff</refentrytitle><manvolnum>1</manvolnum></citerefentry>
this would look like so:</para>

<programlisting>
All your base
.B are
belong to us!
</programlisting>

<para>In a structural-markup language, you would tell the formatter to
emphasize the word:</para>

<programlisting>
All your base &lt;emphasis&gt;are&lt;/emphasis&gt; belong to us!
</programlisting>

<para>The &lt;emphasis&gt; and &lt;/emphasis&gt; in the line above
are called <firstterm>markup
tags</firstterm><indexterm><primary>markup tags</primary></indexterm>,
or just <firstterm>tags</firstterm> for short.  They are the
instructions to your formatter.</para>

<para>In a structural-markup language, the physical appearance of the
final document would be controlled by a <firstterm>stylesheet</firstterm>
<indexterm><primary>stylesheet</primary></indexterm>.  It is the
stylesheet that would tell the formatter "render emphasis as a font
change to boldface".  One advantage of structural-markup languages
is that by changing a stylesheet you can globally change the
presentation of the document (to use different fonts, for example)
without having to hack all the individual instances of (say)
<markup>.B</markup> in the document itself.</para>
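<para>Here is a minimal sketch of such a stylesheet rule, written as an
XSLT customization layer for the DocBook XSL stylesheets used by the
XML toolchain described later in this document.  It assumes those
stylesheets are installed and that their canonical URI resolves
through your system's XML catalog.  The file itself is hypothetical.
Its one template renders every &lt;emphasis&gt; as HTML boldface.
Editing that single template would therefore restyle every emphasized
word in every document it is applied to.</para>

<programlisting><![CDATA[<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

  <!-- Start from the stock DocBook-to-HTML rules. -->
  <xsl:import href="http://docbook.sourceforge.net/release/xsl/current/html/docbook.xsl"/>

  <!-- Override one rule: emphasis becomes boldface.  Templates in
       this file take precedence over the imported ones. -->
  <xsl:template match="emphasis">
    <b><xsl:apply-templates/></b>
  </xsl:template>

</xsl:stylesheet>]]></programlisting>

<para>With <command>xsltproc</command>, you might apply it as
<command>xsltproc emphasis.xsl foo.xml &gt; foo.html</command>.  Both
file names are placeholders.</para>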

</sect1>
<sect1><title>Document Type Definitions</title>

<para>(Note: to keep the explanation simple, most of this
section is going to tell some lies, mainly by omitting a lot of
history.  Truthfulness will be fully restored in a following
section.)</para>

<para>DocBook is a structural markup language.  Specifically, it
is a dialect of XML.  A DocBook document is a hunk of XML that uses
XML tags for structural markup.</para>

<para>In order for a document formatter to apply a stylesheet to your
document and make it look good, it needs to know things about the
overall structure of your document.  For example, it needs to know
that a book manuscript normally consists of front matter, a sequence
of chapters, and back matter in order to physically format chapter
headers properly.  In order for it to know this sort of thing, you
need to give it a <firstterm>Document Type
Definition</firstterm><indexterm><primary>Document Type
Definition</primary><secondary>DTD</secondary></indexterm> or DTD.  The
DTD tells your formatter what sorts of elements can be in the document
structure, and in what orders they can appear.</para>
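<para>Here is a toy illustration of what a DTD says.  It is written as
an internal DTD subset so that the whole thing fits in one file.  The
element names are hypothetical and much simpler than DocBook's.  The
point is that the declarations spell out which elements exist and in
what order they may appear: front matter, then one or more chapters,
then optional back matter.</para>

<programlisting><![CDATA[<?xml version="1.0"?>
<!DOCTYPE book [
  <!-- A book is front matter, one or more chapters,
       and optional back matter, in that order. -->
  <!ELEMENT book        (frontmatter, chapter+, backmatter?)>
  <!ELEMENT frontmatter (title)>
  <!ELEMENT chapter     (title, para+)>
  <!ELEMENT backmatter  (para+)>
  <!ELEMENT title       (#PCDATA)>
  <!ELEMENT para        (#PCDATA)>
]>
<book>
  <frontmatter><title>A Tiny Book</title></frontmatter>
  <chapter>
    <title>Chapter One</title>
    <para>This document obeys the rules declared above.</para>
  </chapter>
</book>]]></programlisting>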

<para>What we mean by calling DocBook an "application" of XML is
actually that DocBook is a DTD &mdash; a rather large DTD, with
somewhere around 400 tags in it.</para>
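<para>A document announces which DTD it follows in its DOCTYPE
declaration, just as the top of this HOWTO does.  Here is a minimal
sketch of a complete, valid DocBook XML article.  The title and text
are placeholders.</para>

<programlisting><![CDATA[<?xml version="1.0"?>
<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
    "http://docbook.org/xml/4.1.2/docbookx.dtd">
<article>
  <title>A Minimal Article</title>
  <sect1>
    <title>First Section</title>
    <para>All your base <emphasis>are</emphasis> belong to us!</para>
  </sect1>
</article>]]></programlisting>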

<para>Lurking behind DocBook is a kind of program called a
<firstterm>validating parser</firstterm><indexterm><primary>validating
parser</primary></indexterm>.  When you format a DocBook document, the
first step is to pass it through a validating parser (the front end of
the DocBook formatter).  This program checks your document against the
DocBook DTD to make sure you aren't breaking any of the DTD's
structural rules (otherwise the back end of the formatter, the part
that applies your stylesheet, might become quite confused).</para>

<para>The validating parser will either bomb out, giving you error
messages about places where the document structure is broken, or translate
the document into a stream of <firstterm>formatting events</firstterm>
which the formatter's back end combines with the information in your
stylesheet to produce formatted output.</para>
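<para>Here is a deliberately broken, hypothetical article that shows
what "breaking the DTD's structural rules" means.  It is well-formed
XML, because every tag is properly closed.  It is not valid DocBook,
however, because the DocBook DTD requires every &lt;sect1&gt; to start
with a &lt;title&gt;.  A validating parser should report a validity
error for that &lt;sect1&gt; instead of passing the document on to the
back end.  One such parser is <command>xmllint</command> from libxml2,
run with its <option>--valid</option> option.</para>

<programlisting><![CDATA[<?xml version="1.0"?>
<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
    "http://docbook.org/xml/4.1.2/docbookx.dtd">
<article>
  <title>A Broken Article</title>
  <sect1>
    <!-- Invalid: the DocBook DTD requires a <title> here,
         before any paragraphs. -->
    <para>This section has no title.</para>
  </sect1>
</article>]]></programlisting>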

<para>Here is a diagram of the whole process:</para>

<mediaobject>
<imageobject> <imagedata fileref="figure1.png" format="PNG"/></imageobject>
</mediaobject>

<para>The part of the diagram inside the dotted box is your formatting
software, or <firstterm>toolchain</firstterm>.  Besides the obvious and
visible input to the formatter (the document source), you'll need to
keep the formatter's two "hidden" inputs (DTD and stylesheet) in
mind to understand what follows.</para>
</sect1>
<sect1><title>Other DTDs</title>

<para>A brief digression into other DTDs may help make clear what parts of
the previous section were specific to DocBook and what parts are general to
all structural-markup languages.</para>

<para><ulink url="http://www.tei-c.org/">TEI</ulink> (Text Encoding
Initiative) is a large, elaborate DTD used primarily in academia for
computer transcription of literary texts.  TEI's Unix-based toolchains
use many of the same tools that are involved with DocBook, but with
different stylesheets and (of course) a different DTD.</para>

<para>XHTML, the latest version of HTML, is also an XML application
described by a DTD, which explains the family resemblance between
XHTML and DocBook tags.  The XHTML toolchain consists of web browsers
and a number of ad-hoc HTML-to-print utilities.</para>
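<para>The family resemblance is easy to see side by side.  Here is a
minimal sketch of an XHTML 1.0 document.  Its DOCTYPE names the W3C's
XHTML DTD in the same way this HOWTO's DOCTYPE names the DocBook DTD.
Its &lt;p&gt; and &lt;em&gt; elements play roughly the roles DocBook
gives to &lt;para&gt; and &lt;emphasis&gt;.</para>

<programlisting><![CDATA[<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>A Minimal Page</title>
  </head>
  <body>
    <!-- DocBook would say <para> and <emphasis> here. -->
    <p>All your base <em>are</em> belong to us!</p>
  </body>
</html>]]></programlisting>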

<para>Many other XML DTDs are maintained to help people exchange
structured information in fields as diverse as bioinformatics and
banking.  You can look at a <ulink
url="http://www.xml.com/pub/rg/DTD_Repositories">list of
repositories</ulink> to get some idea of the variety out
there.</para>

</sect1>
<sect1><title>The DocBook toolchain</title>

<para>Normally, what you'll do to make XHTML from your
DocBook sources will look like this:</para>

<screen>
bash$ xmlto xhtml foo.xml
Convert to XHTML
bash$ ls *.html
ar01s02.html ar01s03.html ar01s04.html index.html
</screen>

<para>In this example, you converted an XML-DocBook document named
<filename>foo.xml</filename> with four top-level sections into an
index page (which also holds the first section) and three further
pages, one for each of the other sections.  Making one big page is
just as easy:</para>

<screen>
bash$ xmlto xhtml-nochunks foo.xml
Convert to XHTML
bash$ ls *.html
foo.html
</screen>

<para>Finally, here is how you make PostScript for printing:</para>

<screen>
bash$ xmlto ps foo.xml       # To make Postscript
Convert to XSL-FO
Making portrait pages on A4 paper (210mmx297mm)
Post-process XSL-FO to DVI
Post-process DVI to PS
bash$ ls *.ps
foo.ps
</screen>
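<para>The paper size and body font used for print are parameters of
the DocBook XSL print (FO) stylesheets rather than properties of the
document.  Here is a minimal sketch of a customization layer that
changes them, applied with an XSLT processor such as
<command>xsltproc</command>.  It assumes the FO stylesheets are
installed and reachable through your XML catalog.
<literal>paper.type</literal> and <literal>body.font.family</literal>
are parameters those stylesheets define.  The values chosen here are
only examples.</para>

<programlisting><![CDATA[<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

  <!-- Start from the stock DocBook-to-FO stylesheet. -->
  <xsl:import href="http://docbook.sourceforge.net/release/xsl/current/fo/docbook.xsl"/>

  <!-- Lay pages out for US letter paper ("A4" is the other
       common value). -->
  <xsl:param name="paper.type" select="'USletter'"/>

  <!-- Set body text in a sans-serif face instead of the default. -->
  <xsl:param name="body.font.family" select="'sans-serif'"/>

</xsl:stylesheet>]]></programlisting>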

<para>To turn your documents into HTML or PostScript, you need an
engine that can apply the combination of DocBook DTD and
a suitable stylesheet to your document.  Here is how the
open-source tools for doing this fit together:</para>

<mediaobject>
<imageobject> <imagedata fileref="figure2.png" format="PNG"/></imageobject>
</mediaobject>

<para>Parsing your document and applying the stylesheet transformation
will be handled by one of three programs.  The most likely one is
<application>xsltproc</application><indexterm><primary>xsltproc</primary></indexterm>,
the XSLT processor that ships with Red Hat 7.3.  The other possibilities are
two Java programs,
<application>Saxon</application><indexterm><primary>Saxon</primary></indexterm>
and
<application>Xalan</application><indexterm><primary>Xalan</primary></indexterm>.</para>

<para>It is relatively easy to generate high-quality XHTML from
DocBook; the fact that XHTML is simply another XML DTD helps a lot.
Translation to HTML is done by applying a rather simple stylesheet,
and that's the end of the story.  RTF is also simple to generate in
this way, and from XHTML or RTF it's easy to generate a flat ASCII
text approximation in a pinch.</para>
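<para>That HTML stylesheet is also easy to adjust.  Here is a minimal
sketch of a customization layer for the chunking (multi-page) HTML
stylesheet.  It assumes the DocBook XSL distribution is installed and
resolvable through your XML catalog.  The parameter names are ones the
DocBook XSL stylesheets define.  The file name
<filename>mystyle.xsl</filename> and the CSS file it points to are
hypothetical.  You could apply it with
<command>xsltproc mystyle.xsl foo.xml</command>.</para>

<programlisting><![CDATA[<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

  <!-- Start from the stock multi-page (chunking) HTML stylesheet. -->
  <xsl:import href="http://docbook.sourceforge.net/release/xsl/current/html/chunk.xsl"/>

  <!-- Give each sect2, as well as each sect1, a page of its own. -->
  <xsl:param name="chunk.section.depth" select="2"/>

  <!-- Name each page after its section's id attribute when it has
       one (a sect1 with id="install" becomes install.html). -->
  <xsl:param name="use.id.as.filename" select="1"/>

  <!-- Link every generated page to a CSS file; docbook.css is a
       placeholder name. -->
  <xsl:param name="html.stylesheet" select="'docbook.css'"/>

</xsl:stylesheet>]]></programlisting>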

<para>The awkward case is print.  Generating high-quality printed
output (which means, in practice, Adobe's
PDF<indexterm><primary>PDF</primary></indexterm>, or Portable Document
Format) is difficult.  Doing it right requires
algorithmically duplicating the delicate judgments of a human
typesetter moving from content to presentation level.</para>

<para>So, first, a stylesheet translates DocBook's structural markup
into another dialect of XML &mdash;
FO<indexterm><primary>FO</primary></indexterm>
(Formatting Objects).  FO markup is very much presentation-level; you
can think of it as a sort of XML functional equivalent of troff.  It
has to be translated to PostScript for packaging in a PDF.</para>
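<para>Here is a minimal hand-written sketch of an FO document that sets
the running example sentence on an A4 page.  It gives a feel for how
presentation-level FO is.  Real FO generated from DocBook is far
longer, and the page-master name here is arbitrary.  Notice that the
markup talks about page sizes, margins, fonts, and weights rather than
about emphasis or sections.</para>

<programlisting><![CDATA[<?xml version="1.0"?>
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <!-- Page geometry: A4, with 20mm margins on every side. -->
  <fo:layout-master-set>
    <fo:simple-page-master master-name="a4-page"
                           page-width="210mm" page-height="297mm"
                           margin="20mm">
      <fo:region-body/>
    </fo:simple-page-master>
  </fo:layout-master-set>

  <!-- The content itself, poured into the body region. -->
  <fo:page-sequence master-reference="a4-page">
    <fo:flow flow-name="xsl-region-body">
      <fo:block font-family="serif" font-size="12pt">
        All your base <fo:inline font-weight="bold">are</fo:inline>
        belong to us!
      </fo:block>
    </fo:flow>
  </fo:page-sequence>
</fo:root>]]></programlisting>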

<para>In the toolchain shipped with Red Hat, this job is handled by a
TeX macro package called
<application>PassiveTeX</application><indexterm><primary>PassiveTeX</primary></indexterm>.  It
translates the formatting objects generated by
<command>xsltproc</command> into Donald Knuth's TeX language.  TeX was
one of the earliest open-source projects, an old but powerful
presentation-level formatting language much beloved of mathematicians
(to whom it provides particularly elaborate facilities for describing
mathematical notation).  TeX is also famously good at basic
typesetting tasks like kerning, line filling, and hyphenating.  TeX's
output, in what's called DVI<indexterm><primary>DVI</primary></indexterm>
(DeVice Independent) format, is then massaged into PDF.</para>

<para>If you think this bucket chain of XML to TeX macros to DVI to
PDF sounds like an awkward kludge, you're right.  It clanks, it
wheezes, and it has ugly warts.  Fonts are a significant problem,
since XML and TeX and PDF have very different models of how fonts
work; also, handling internationalization and localization is a
nightmare.  About the only thing this code path has going for it is
that it works.</para>

<para>The elegant way will be
FOP<indexterm><primary>FOP</primary></indexterm>, a direct
FO-to-PostScript translator being developed by the Apache project.
With FOP, the internationalization problem is, if not solved, at least
well confined; XML tools handle Unicode all the way through to FOP.
Glyph-to-font mapping is also strictly FOP's problem.  The only
trouble with this approach is that it doesn't work &mdash; yet.  As of
August 2002 FOP is in an unfinished alpha state &mdash; usable, but
with rough edges and missing features.</para>

<para>Here is what the FOP toolchain looks like:</para>

<mediaobject>
<imageobject> <imagedata fileref="figure3.png" format="PNG"/></imageobject>
</mediaobject>

<para>FOP has competition.  There is another project called
<application>xsl-fo-proc</application><indexterm><primary>xsl-fo-proc</primary></indexterm>
which aims to do the same things as FOP, but in C++ (and is therefore
expected to be faster than Java code and free of any dependence on the
Java environment).  As of August 2002 xsl-fo-proc is in an unfinished
alpha state, not as far along as FOP.</para>

</sect1>
<sect1><title>Who are the projects and the players?</title>

<para>The DocBook DTD itself is maintained by the DocBook Technical
Committee, headed by Norman Walsh.  Norm is the principal author of
the DocBook stylesheets, a man who has focused remarkable energy and
talent over many years on the extremely complex problems DocBook
addresses.  He is as universally respected in the DocBook/SGML/XML
community as Linus Torvalds is in the Linux world.</para>

<para>The <ulink url="http://sources.redhat.com/docbook-tools/">docbook-tools</ulink>
project provides open-source tools for
converting SGML DocBook to HTML, PostScript, and other formats.  This
package is shipped with Red Hat and other Linux distributions.  It is
maintained by Mark Galassi.</para>

<para><ulink url="http://www.jclark.com/jade/">Jade</ulink> is an
engine used to apply DSSSL stylesheets to SGML documents.  It is
maintained by James Clark.</para>

<para><ulink url="http://openjade.sourceforge.net/">OpenJade</ulink>
is a community project undertaken because the founders thought James
Clark's maintenance of Jade was spotty.  The docbook-tools programs
use OpenJade.</para>

<para><ulink url="http://xmlsoft.org/XSLT/">libxslt</ulink> is a C
library that interprets XSLT, applying stylesheets to XML documents.
It includes a wrapper program, <command>xsltproc</command>, that can be
used as an XML formatter.  The code was written by Daniel Veillard
under the auspices of the GNOME project, but does not require any
GNOME code to run.  I hear it's blazingly fast compared to the
Java alternatives, which is not a surprising claim.</para>

<para><ulink url="http://cyberelk.net/tim/xmlto/">xmlto</ulink> is the
user interface of the XML toolchain that Red Hat ships.  It's written
and maintained by Tim Waugh.</para>

<para><ulink url="http://users.iclway.co.uk/mhkay/saxon/">Saxon</ulink>
and <ulink url="http://xml.apache.org/xalan-j/">Xalan</ulink> are Java
programs that interpret XSLT.  Saxon seems to be designed to work
under Windows.  Xalan is part of the Apache XML project and native to
Linux and BSD; it's designed to work with FOP.</para>

<para><ulink
url="http://users.ox.ac.uk/~rahtz/passivetex/">PassiveTeX</ulink> is the
package of LaTeX macros that <application>xmlto</application> uses for
producing DVI from XML-DocBook.  <ulink
url="http://jadetex.sourceforge.net/">JadeTeX</ulink> is the package
of LaTeX macros that OpenJade uses for producing DVI from
SGML-DocBook.</para>

<para><ulink url="http://xml.apache.org/fop/">FOP</ulink> translates
XML Formatting Objects to PDF.  It is part of the Apache XML project
and is designed to work with Xalan.</para>

</sect1>
<sect1><title>Migration tools</title>

<para>The second biggest problem with DocBook is the effort needed to
convert old-style presentation markup to DocBook markup.  Human beings
can usually parse the presentation of a document into logical
structure automatically, because (for example) they can tell from
context when an italic font means "emphasis" and when it means
something else, such as "this is a foreign phrase".</para>

<para>Somehow, in converting documents to DocBook, those
sorts of distinctions need to be made explicit.  Sometimes
they're present in the old markup; often they are not, and the
missing structural information has to be either deduced by
clever heuristics or added by a human.</para>
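<para>DocBook has separate elements for distinctions like these.  In
the sketch below, both marked-up phrases would probably come out in
italics on the printed page.  The sentence is invented for the
example.  Only the DocBook source records that one phrase is emphasis
and the other is a foreign phrase.  That is exactly the information a
converter has to recover from the old markup.</para>

<programlisting><![CDATA[<!-- In print, both phrases below might well be set in italics;
     the markup records why each one is special. -->
<para>You <emphasis>must</emphasis> validate your documents;
skipping that step is a
<foreignphrase lang="fr">faux pas</foreignphrase>.</para>]]></programlisting>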

<para>Here is a summary of the state of conversion tools from
various other formats:</para>

<variablelist>
<varlistentry>
<term>GNU Texinfo</term>
<listitem>
<para>The Free Software Foundation has made a policy decision to
support DocBook as an interchange format.  Texinfo has enough
structure to make reasonably good automatic conversion possible, and
the 4.x versions of <command>makeinfo</command> feature a
<option>--docbook</option> switch that generates DocBook.  More at the
<ulink url="http://www.gnu.org/directory/texinfo.html">makeinfo
project page</ulink>.</para>
</listitem>
</varlistentry>

<varlistentry>
<term>POD</term>
<listitem>
<para>There is a <ulink
url="http://www.cpan.org/modules/by-module/Pod/">Pod::DocBook</ulink>
module that translates Plain Old Documentation markup to DocBook.  It
claims to support every POD formatting code except L&lt;&gt;, the link
code.  The man page also says "Nested =over/=back lists are not
supported within DocBook." but notes that the module has been heavily
tested.</para>
</listitem>
</varlistentry>

<varlistentry>
<term>LaTeX</term>
<listitem>
<para>LaTeX is a (mostly) structural markup macro language built on
top of the TeX formatter.  There is a project called <ulink
url="http://www.lrz-muenchen.de/services/software/sonstiges/tex4ht/mn.html">TeX4ht</ulink>
that (according to the author of PassiveTeX) can
generate DocBook from LaTeX.</para>
</listitem>
</varlistentry>

<varlistentry>
<term>man pages and other troff-based markups</term>
<listitem>
<para>This is generally considered the biggest and nastiest conversion
problem.  And indeed, the basic
<citerefentry><refentrytitle>troff</refentrytitle>
<manvolnum>1</manvolnum></citerefentry> markup is at too low a presentation
level for automatic conversion tools to do much of any good.  However,
the gloom in the picture lightens significantly if we consider
translation from sources of documents written in macro packages like
<citerefentry><refentrytitle>man</refentrytitle>
<manvolnum>7</manvolnum></citerefentry>.  These have enough structural
features for automatic translation to get some traction.</para>

<para>I wrote a tool to do this myself, because I couldn't find
anything else that did a half-decent job of it (and the problem is
interesting).  It's called <ulink
url="http://www.tuxedo.org/~esr/doclifter/">doclifter</ulink>.  It will
translate to either SGML or XML DocBook from
<citerefentry><refentrytitle>man</refentrytitle>
<manvolnum>7</manvolnum></citerefentry>,
<citerefentry><refentrytitle>mdoc</refentrytitle>
<manvolnum>7</manvolnum></citerefentry>,
<citerefentry><refentrytitle>ms</refentrytitle>
<manvolnum>7</manvolnum></citerefentry>, or
<citerefentry><refentrytitle>me</refentrytitle>
<manvolnum>7</manvolnum></citerefentry> macros.  See the documentation
for details.</para>
</listitem>
</varlistentry>
</variablelist>
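<para>Here is a minimal sketch of a man page expressed directly in
DocBook, as a &lt;refentry&gt;.  It gives a sense of what such a
converter has to produce.  The command <command>frobnicate</command>
and its option are hypothetical, and this is not the output of any
particular tool.  Note how the name, section number, synopsis, and
arguments become explicitly labelled structure instead of font
changes.</para>

<programlisting><![CDATA[<?xml version="1.0"?>
<!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
    "http://docbook.org/xml/4.1.2/docbookx.dtd">
<refentry id="frobnicate">
  <!-- frobnicate and its options are hypothetical. -->
  <refmeta>
    <refentrytitle>frobnicate</refentrytitle>
    <manvolnum>1</manvolnum>
  </refmeta>
  <refnamediv>
    <refname>frobnicate</refname>
    <refpurpose>adjust a widget</refpurpose>
  </refnamediv>
  <refsynopsisdiv>
    <cmdsynopsis>
      <command>frobnicate</command>
      <arg choice="opt"><option>-v</option></arg>
      <arg choice="plain"><replaceable>file</replaceable></arg>
    </cmdsynopsis>
  </refsynopsisdiv>
  <refsect1>
    <title>Description</title>
    <para><command>frobnicate</command> adjusts the widget described
    in <replaceable>file</replaceable>.</para>
  </refsect1>
</refentry>]]></programlisting>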

</sect1>
<sect1><title>Editing tools</title>

<para>One thing we presently do not have is a good open-source
structure editor for SGML/XML documents.</para>

<para><ulink url="http://www.lyx.org/">LyX</ulink> is a GUI word processor
that uses LaTeX for printing and supports structural editing of LaTeX
markup.  There is a LaTeX package that generates DocBook, and a
<ulink url="http://bgu.chez.tiscali.fr/doc/db4lyx/">how-to document</ulink>
describing how to write SGML and XML in the LyX GUI.</para>

<para><ulink url="http://idx-getox.idealx.org/">GeTox</ulink>, the
GNOME XML Editor, aims at nontechnical users.  But the software is
still (as of August 2002) alpha, more a proof of concept than anything
useful, and the project group seems not to be very active; there were
no updates to the website between May 2001 and August 2002 (time of
writing).</para>

<para><ulink
url="http://www.math.u-psud.fr/~anh/TeXmacs/TeXmacs.html">GNU
TeXmacs</ulink> is a project aimed at producing an editor that is good
for technical and mathematical material, including displayed formulas.
Version 1.0 was released in April 2002.  The developers plan XML support in
the future, but it's not there yet.</para>

<para><ulink url="http://www.freesoftware.fsf.org/thotbook/">ThotBook</ulink>
is a project to put together a GUI editor for DocBook based on
the Thot toolkit.  It may be moribund; the web page was not updated
from November 2001 to August 2002 (time of writing).</para>

<para>Most people still hack the tags by hand in either vi or Emacs,
using psgml to validate the results.</para>

</sect1>
<sect1><title>Related standards and practices</title>

<para>The tools are coming together, if slowly, to edit and format
DocBook markup.  But DocBook itself is a means, not an end.  We'll need
other standards besides DocBook itself to accomplish the
searchable-documentation-database objective I laid out at the
beginning of this document.  There are two big issues: document
cataloguing and metadata.</para>

<para>The <ulink
url="http://scrollkeeper.sourceforge.net/">Scrollkeeper</ulink>
project aims directly to meet the cataloguing need.  It provides a
simple set of script hooks that package install and uninstall scripts
can call to register and unregister their documentation.</para>

<para>Scrollkeeper uses the <ulink
url="http://www.ibiblio.org/osrt/omf/">Open Metadata Format</ulink>.
This is a standard for indexing open-source documentation analogous to
a library card-catalog system.  The idea is to support rich search
facilities that use the card-catalog metadata as well as the source
text of the documentation itself.</para>
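<para>Here is a rough sketch of the kind of catalog record a package
might register, modelled on the OMF files that Scrollkeeper-using
packages install.  It makes the card-catalog analogy concrete.  Treat
the exact element and attribute names as an assumption to check
against the OMF and Scrollkeeper documentation.  Every value,
including the creator, URL, and series identifier, is a
placeholder.</para>

<programlisting><![CDATA[<?xml version="1.0"?>
<!-- Sketch of an OMF catalog record; element names follow common
     Scrollkeeper-era OMF files, and all values are placeholders. -->
<omf>
  <resource>
    <creator>Jane Hacker</creator>
    <title>Frobnicate User's Guide</title>
    <date>2002-08-01</date>
    <subject category="Applications"/>
    <description>How to install and use frobnicate.</description>
    <type>user's guide</type>
    <format mime="text/xml" dtd="-//OASIS//DTD DocBook XML V4.1.2//EN"/>
    <identifier url="file:///usr/share/doc/frobnicate/guide.xml"/>
    <language code="C"/>
    <relation seriesid="00000000-0000-0000-0000-000000000000"/>
    <rights type="GNU FDL" holder="Jane Hacker"/>
  </resource>
</omf>]]></programlisting>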

</sect1>

<sect1><title>SGML and SGML-Tools</title>

<para>In previous sections, I have thrown away a lot of DocBook's
history.  XML has an older brother,
SGML<indexterm><primary>SGML</primary></indexterm> or Standard Generalized
Markup Language.</para>

<para>Until mid-2002, no discussion of DocBook would have been
complete without a long excursion into SGML, the differences between
SGML and XML, and detailed descriptions of the SGML DocBook toolchain.
Life can be simpler now; an XML DocBook toolchain is available in open
source, works as well as the SGML toolchain ever did, and is easier to
use.  If you don't think you'll ever have to deal with old SGML-DocBook
documents, you can skip the remainder of this section.</para>

<sect2><title>DocBook SGML</title>

<para>DocBook was originally an SGML application, and there was an
SGML-based DocBook toolchain that is now moribund.  There are minor
differences between the DocBook SGML DTD and the DocBook XML DTD, but
for an introductory discussion we can ignore them.  The only one that's
normally user-visible is that in SGML, contentless tags did not need to
have a trailing slash added to them before the closing &gt;.
(Requiring the trailing / means XML parsers can be a lot simpler,
because they don't have to know about the DTD to know which opening
tags need closers.)</para>
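<para>The trailing-slash difference shows up in elements such as
&lt;imagedata&gt;, which this document uses for its figures.  In the
sketch below, the SGML spellings appear only inside comments, so the
fragment stays well-formed XML.  The public identifiers shown are
those of SGML DocBook 4.1 and XML DocBook 4.1.2.</para>

<programlisting><![CDATA[<!-- SGML DocBook document type:
       <!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook V4.1//EN">
     XML DocBook document type, as used at the top of this HOWTO:
       <!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN" ...> -->

<mediaobject>
  <imageobject>
    <!-- SGML DocBook accepted the empty element without a slash:
           <imagedata fileref="figure1.png" format="PNG">
         XML requires the trailing slash: -->
    <imagedata fileref="figure1.png" format="PNG"/>
  </imageobject>
</mediaobject>]]></programlisting>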
 610 
 611 <para>Versions of HTML up to 4.01 (before XHTML) were SGML
 612 applications.  TEI was originally an SGML application, too.  The
 613 groups managing all three DTDs jumped to XML for the same reason
 614 DocBook's developers did &mdash; it's drastically simpler.  SGML was
 615 extremely complex; unmanageably so, as it turns out.  The
 616 specification was a dense 150 pages and it is not reliably reported
 617 that any software ever fully implemented it.</para>
 618 
 619 <para>The toolchain diagram I gave earlier was simplified; it
 620 only showed the XML toolchain.  Here is the historically
 621 correct version:</para>
 622 
 623 <mediaobject>
 624 <imageobject><imagedata fileref="figure4.png" format="PNG"/></imageobject>
 625 </mediaobject>
 626 
 627 <para>The DSSSL toolchain is what processed DocBook SGML.
 628 Under it, a document goes from DocBook format through one of two
 629 closely-related stylesheet engines called Jade and OpenJade.  These
 630 turn it into a TeX-macro markup. which is processed by a package called
 631 JadeTeX, into DVIs, which then get turned into Postscript.</para>
 632 </sect2>
 633 
 634 <sect2><title>Why SGML DocBook is dead</title>
 635 
 636 <para>The DSSSL toolchain is, as far as new development goes,
 637 effectively dead.  The XSLT toolchain has just reached production
 638 status as I write in August 2002; a working version shipped in Red Hat
 639 7.3.  It's where DocBook developers are putting almost all of their
 640 effort.</para>
 641 
 642 <para>The reason for the change to XML was threefold.  First,
 643 SGML turned out to be too complicated to use; then, DSSSL turned out
 644 to be too complicated to live with; then, significant parts of the
 645 DSSSL toolchain turned out to be weak and irredeemably messy.</para>

<para>Relative to SGML, XML has a reduced feature set that is
sufficient for almost all purposes but much easier to understand and
build parsers for.  SGML-processing tools (such as validating parsers) have
to carry around support for a lot of features that DocBook and other
text markup systems never actually used.  Removing these features
made XML simpler and XML-processing tools faster.</para>

<para>The language used to describe SGML DTDs was so spiky
and forbidding that composing SGML DTDs was something of a black art.
The structure of an XML document type, on the other hand, can be
described in a dialect of XML itself, so no separate DTD language is
needed.  An XML-syntax description of a document type is called a
<firstterm>schema</firstterm><indexterm><primary>schema</primary></indexterm>;
the term DTD itself will probably pass out of use as the standards for
schemas firm up.</para>
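
<para>As a rough illustration, here is one hypothetical element
declared both ways.  The first declaration uses DTD syntax.  The second
uses W3C XML Schema, one schema language among several, which is itself
an XML document.  The element names (<sgmltag>recipe</sgmltag>,
<sgmltag>title</sgmltag>, <sgmltag>step</sgmltag>) are made up for this
sketch and are not taken from the DocBook DTD.</para>

<programlisting><![CDATA[<!-- DTD syntax: a recipe holds one title followed by one or more steps -->
<!ELEMENT recipe (title, step+)>

<!-- The same content model written as a W3C XML Schema, which is plain XML -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="recipe">
    <xs:complexType>
      <xs:sequence>
        <!-- exactly one title: minOccurs and maxOccurs both default to 1 -->
        <xs:element ref="title"/>
        <!-- the DTD's "step+": at least one, no upper limit -->
        <xs:element ref="step" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <!-- a complete schema must also declare title and step globally -->
</xs:schema>]]></programlisting>

<para>The schema form is wordier.  However, it can be read, written,
and checked with the same XML tools as any other XML document, and that
is why no separate declaration language is needed.</para>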

<para>But mostly the DSSSL toolchain is dead because DSSSL itself, the
SGML stylesheet description language in that toolchain, proved just too
arcane for most human beings, and made stylesheets too difficult to
write and modify. (It was a dialect of Scheme.  Your humble editor, a
LISP-head from way back, shakes his head in sad bemusement that
this should drive people away.)</para>

<para>XML fans like to sum up all these changes with "XML: tastes great, less
filling."</para>
</sect2>

<sect2><title>SGML-Tools</title>

<para>SGML-Tools was the name of a DTD (declared in documents as
<quote>linuxdoc</quote>) and the toolchain that processed it, used by the <ulink
url="http://www.linuxdoc.org">Linux Documentation Project</ulink>.
Both were developed a few years ago, before today's DocBook toolchains
existed.  SGML-Tools markup was simpler, but also much less flexible than
DocBook.  The original SGML-Tools formatter/DTD/stylesheet(s)
toolchain has been dead for some time now, but a successor called <ulink
url="http://sourceforge.net/projects/sgmltools-lite/">SGML-tools
Lite</ulink> is still maintained.</para>

<para>The LDP has been phasing out SGML-Tools in favor of DocBook, but
you might still take over an old HOWTO written in it.  Such HOWTOs can be
recognized by the identifying header "&lt;!doctype linuxdoc
system&gt;".  If this happens to you, convert the thing to XML DocBook
and give the old version a quick burial.</para>
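
<para>The two kinds of file are easy to tell apart at a glance.  The
first fragment below is a rough sketch of how an old SGML-Tools HOWTO
typically opens, assuming the common LinuxDoc article layout.  Many end
tags are omitted, and the title, author, and date are placeholders
rather than text from any real HOWTO.  The second fragment is an XML
DocBook skeleton.  It reuses the document type declaration at the top
of this HOWTO, and every element is explicitly closed.</para>

<programlisting><![CDATA[<!doctype linuxdoc system>
<article>
<title>Example HOWTO
<author>A. N. Author
<date>v1.0, 2002-01-01
<sect>Introduction
<p>
Body text goes here.
</article>]]></programlisting>

<programlisting><![CDATA[<?xml version="1.0"?>
<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
    "http://docbook.org/xml/4.1.2/docbookx.dtd">
<article>
<articleinfo>
  <title>Example HOWTO</title>
</articleinfo>
<sect1><title>Introduction</title>
<para>Body text goes here.</para>
</sect1>
</article>]]></programlisting>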
</sect2>
</sect1>

<sect1><title>References</title>

<para>One of the things that makes learning DocBook difficult is that
the sites related to it tend to overwhelm the newbie with long lists
of W3C standards, massive exercises in SGML theology, and dense
thickets of abstract terminology.  We're going to try to avoid that
here by giving you just a few selected references to look at.</para>

<para>Michael Smith's <ulink
url="http://xml.oreilly.com/news/dontlearn_0701.html">
Take My Advice: Don't Learn XML</ulink> surveys the XML world from
an angle similar to this document's.</para>

<para>Norman Walsh's <citetitle>DocBook: The Definitive
Guide</citetitle> is available <ulink
url="http://www.oreilly.com/catalog/docbook/">in print</ulink> and
<ulink url="http://www.docbook.org/tdg/en/html/docbook.html">on the
web</ulink>.  This is indeed the definitive reference, but as an
introduction or tutorial it's a disaster.  Instead, read this:</para>

<para><ulink url="http://www.bureau-cornavin.com/opensource/crash-course/index.html">Writing
Documentation Using DocBook: A Crash Course</ulink>.  This is an excellent
tutorial.</para>

<para>There is an excellent <ulink
url="http://www.dpawson.co.uk/docbook/">DocBook FAQ</ulink> with a lot
of material on styling HTML output.  There is also a DocBook <ulink
url="http://docbook.org/wiki/moin.cgi">wiki</ulink>.</para>

<para>If you're writing for the Linux Documentation Project, read the
<ulink url="http://www.linuxdoc.org/LDP/LDP-Author-Guide/index.html">
LDP Author Guide</ulink>.</para>

<para>The best general introduction to SGML and XML that I've
personally read all the way through is David Megginson's <ulink
url="http://vig.pearsoned.com/store/product/0,,store-562_banner-0_isbn-0136422993,00.html">Structuring
XML Documents</ulink> (Prentice-Hall, ISBN: 0-13-642299-3).</para>

<para>For XML only, <ulink
url="http://www.oreilly.com/catalog/xmlnut2/">XML In A Nutshell</ulink>
by W. Scott Means and Elliotte "Rusty" Harold is very good.</para>

<para><ulink url="http://www.ibiblio.org/xml/books/bible/">The XML
Bible</ulink> looks like a pretty comprehensive reference on XML and
related standards (including Formatting Objects).</para>

<para>Finally, <ulink url="http://xml.coverpages.org/">The XML
Cover Pages</ulink> will take you into the jungle of XML standards
if you really want to go there.</para>

</sect1>
</article>

<!-- Keep this comment at the end of the file
Local variables:
mode: sgml
sgml-omittag:t
sgml-shorttag:t
sgml-namecase-general:t
sgml-general-insert-case:lower
sgml-minimize-attributes:nil
sgml-always-quote-attributes:t
sgml-indent-step:1
sgml-indent-data:nil
sgml-parent-document:nil
sgml-exposed-tags:nil
sgml-local-catalogs:nil
sgml-local-ecat-files:nil
End:
-->
