Misplaced Pages

DocBook

Article snapshot taken from Wikipedia under the Creative Commons Attribution-ShareAlike license.

DocBook is a semantic markup language for technical documentation. It was originally intended for writing technical documents related to computer hardware and software, but it can be used for any other sort of documentation.


56-570: As a semantic language, DocBook enables its users to create document content in a presentation-neutral form that captures the logical structure of the content; that content can then be published in a variety of formats, including HTML, XHTML, EPUB, PDF, man pages, WebHelp, and HTML Help, without requiring users to make any changes to the source. In other words, when a document is written in DocBook format it becomes easily portable into other formats, rather than needing to be rewritten. DocBook

112-493: A RELAX NG schema across multiple validators requires either providing those user-defined data types to that validator or using only the two basic types. In practice, however, most RELAX NG processors support the W3C XML Schema set of data types. Schematron is a fairly unusual schema language. Unlike the main three, it defines an XML file's syntax as a list of XPath-based rules. If the document passes these rules, then it

168-411: A degree of modularity in their languages, including, for example, splitting the schema into multiple files. And both of them are, or can be, defined in an XML language. RELAX NG does not have any analog to PSVI. Unlike W3C XML Schema, RELAX NG was designed so that validation and augmentation (adding type information and default values) are separate. W3C XML Schema has a formal mechanism for attaching

224-627: A joint project of HAL Computer Systems and O'Reilly & Associates and eventually spawned its own maintenance organization (the Davenport Group) before moving in 1998 to the SGML Open consortium, which subsequently became OASIS. DocBook is currently maintained by the DocBook Technical Committee at OASIS. DocBook is available in both SGML and XML forms, as a DTD. RELAX NG and W3C XML Schema forms of

280-510: A key group of software companies used DocBook since their representatives were involved in its initial design. Eventually, however, DocBook was adopted by the open source community where it has become a standard for creating documentation for many projects, including FreeBSD, KDE, GNOME desktop documentation, the GTK+ API references, the Linux kernel documentation (which, as of July 2016,

336-475: A pane that appears as a frameset, but is actually implemented with div tags and cookies (so that it is progressive). DocBook offers a large number of features that may be overwhelming to a new user. For those who want the convenience of DocBook without a steep learning curve, Simplified DocBook was designed. It is a small subset of DocBook designed for single documents such as articles or white papers (i.e., "books" are not supported). The Simplified DocBook DTD

392-527: A potential security problem. For WXS validators that will follow a URI to an arbitrary online location, there is the potential for reading something malicious from the other side of the stream. W3C XML Schema does not implement most of the DTD ability to provide data elements to a document. Although W3C XML Schema's ability to add default attributes to elements is an advantage, it is a disadvantage in some ways as well. It means that an XML file may not be usable in

448-588: A schema can do so for DocBook. Many graphical or WYSIWYG XML editors come with the ability to edit DocBook like a word processor. Tables, list items, and other stylized content can be copied and pasted into the DocBook editor and will be preserved in the DocBook XML output. Because DocBook conforms to a well-defined XML schema, documents can be validated and processed using any tool or programming language that includes XML support. DocBook began in 1991 in discussion groups on Usenet and eventually became

504-452: A schema language is to specify what the structure of an XML document can be. This means which elements can reside in which other elements, which attributes are and are not legal to have on a particular element, and so forth. A schema is analogous to a grammar for a language; a schema defines what the vocabulary for the language may be and what a valid "sentence" is. There are historic and current XML schema languages: The main ones (see also

560-587: A schema to an XML document, while RELAX NG intentionally avoids such mechanisms for security and interoperability reasons. RELAX NG has no ability to apply default attribute data to an element's list of attributes (i.e., changing the XML info set), while W3C XML Schema does. Again, this design is intentional and is to separate validation and augmentation. W3C XML Schema has a rich "simple type" system built-in (xs:number, xs:date, etc., plus derivation of custom types), while RELAX NG has an extremely simplistic one because it

616-498: A text-based processor could use bold instead of italics. Semantically, this document is a "book", with a "title", that contains two "chapters" each with their own "titles". Those "chapters" contain "paragraphs" that have text in them. The markup is fairly readable in English. In more detail, the root element of the document is book. All DocBook elements are in an XML Namespace, so the root element has an xmlns attribute to set


672-510: A vast number of semantic element tags. They are divided into three broad categories, namely structural, block-level, and inline. Structural tags specify broad characteristics of their contents. The book element, for example, specifies that its child elements represent the parts of a book. This includes a title, chapters, glossaries, appendices, and so on. DocBook's structural tags include, but are not limited to: Structural elements can contain other structural elements. Structural elements are

728-472: Is actually an XSLT transformation that transforms the Schematron document into an XSLT that validates the XML file. As such, Schematron's potential toolset is any XSLT processor, though libxml2 provides an implementation that does not require XSLT. Sun Microsystems's Multiple Schema Validator for Java has an add-on that allows it to validate RELAX NG schemas that have embedded Schematron rules. This

784-410: Is an XML language. In its current version (5.x), DocBook's language is formally defined by a RELAX NG schema with integrated Schematron rules. (There are also W3C XML Schema + Schematron and Document Type Definition (DTD) versions of the schema available, but these are considered non-standard.) Because DocBook is a semantic language, its documents do not describe what their contents "look like", but rather

840-416: Is complex and hard to learn, although that is partially because it tries to do more than mere validation (see PSVI). Although being written in XML is an advantage, it is also a disadvantage in some ways. The W3C XML Schema language, in particular, can be quite verbose, while a DTD can be terse and relatively easily editable. Likewise, WXS's formal mechanism for associating a document with a schema can pose

896-424: Is currently at version 1.1. Ingo Schwarze, the author of OpenBSD's mandoc, considers DocBook inferior to the semantic mdoc macro for man pages. In an attempt to write a DocBook-to-mdoc converter (previous converters like docbook-to-man do not cover semantic elements), he finds the semantic parts "bloated, redundant, and incomplete at the same time" compared to elements covered in mdoc. Moreover, Schwarze finds

952-459: Is designed to make manipulation of the XML instance easier in application programs. This may be by mapping the XSD-defined types to types in a programming language such as Java ("data binding") or by enriching the type system of XML processing languages such as XSLT and XQuery (known as "schema-awareness"). RELAX NG and W3C XML Schema allow for similar mechanisms of specificity. Both allow for

1008-458: Is known as the compact syntax. Tools can easily convert between these forms with no loss of features or even commenting. Even arbitrary elements specified between RELAX NG XML elements can be converted into the compact form. RELAX NG provides very strong support for unordered content. That is, it allows the schema to state that a sequence of patterns may appear in any order. RELAX NG also allows for non-deterministic content models. What this means

1064-558: Is meant to use type libraries developed independently of RELAX NG, rather than grow its own. This is seen by some as a disadvantage. In practice it is common for a RELAX NG schema to use the predefined "simple types" and "restrictions" (pattern, maxLength, etc.) of W3C XML Schema. In W3C XML Schema a specific number or range of repetitions of a pattern can be expressed, whereas RELAX NG offers only <oneOrMore> and <zeroOrMore>, which makes a specific count or range impractical to express. W3C XML Schema

1120-438: Is native to the XML specification, is a schema language that is of relatively limited capability, but that also has other uses in XML aside from the expression of schemas. Two more expressive XML schema languages in widespread use are XML Schema (with a capital S) and RELAX NG. The mechanism for associating an XML document with a schema varies according to the schema language. The association may be achieved via markup within

1176-462: Is not technically a schema language. Its sole purpose is to direct parts of documents to individual schemas based on the namespace of the encountered elements. An NRL is merely a list of XML namespaces and a path to a schema that each corresponds to. This allows each schema to be concerned with only its own language definition, and the NRL file routes the schema validator to the correct schema file based on
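The routing idea described here can be sketched in a few lines of Python. This is only an illustration of namespace-based dispatch, not NRL syntax or an NRL processor: the `ROUTES` table, the schema file names, and the `route` function are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical routing table: namespace URI -> schema file.
# The file paths are placeholders, not files shipped with any real distribution.
ROUTES = {
    "http://docbook.org/ns/docbook": "schemas/docbook.rng",
    "http://www.w3.org/1998/Math/MathML": "schemas/mathml.rng",
}


def route(xml_text, routes=ROUTES):
    """Return (local name, schema path) for every element, in document order.

    Elements in an unknown namespace are paired with None.
    """
    root = ET.fromstring(xml_text)
    plan = []
    for elem in root.iter():
        # ElementTree spells qualified names as "{namespace}local".
        if elem.tag.startswith("{"):
            namespace, local = elem.tag[1:].split("}", 1)
        else:
            namespace, local = "", elem.tag
        plan.append((local, routes.get(namespace)))
    return plan


MIXED = (
    '<book xmlns="http://docbook.org/ns/docbook">'
    '<para><math xmlns="http://www.w3.org/1998/Math/MathML"/></para>'
    "</book>"
)
```

Calling `route(MIXED)` pairs the `book` and `para` elements with the DocBook schema and the `math` element with the MathML schema, so each schema only has to describe its own language.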


1232-503: Is provided as part of the distribution of the DocBook 5 schema and specification package. DocBook files are used to prepare output files in a wide variety of formats. Nearly always, this is accomplished using DocBook XSL stylesheets. These are XSLT stylesheets that transform DocBook documents into a number of formats (HTML, XSL-FO for later conversion into PDF, etc.). These stylesheets can be sophisticated enough to generate tables of contents, glossaries, and indexes. They can oversee

1288-424: Is that RELAX NG allows the specification of a sequence like the following: When the validator encounters something that matches the "odd" pattern, it is unknown whether this is the optional last "odd" reference or simply one in the zeroOrMore sequence without looking ahead at the data. RELAX NG allows this kind of specification. W3C XML Schema requires all of its sequences to be fully deterministic, so mechanisms like
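The sequence referred to above did not survive in this snapshot. One pattern consistent with the description is "zero or more (odd, even) pairs, then an optional odd"; that reading is an assumption. The Python sketch below uses a regular expression as an analogy for that content model, not a RELAX NG validator, to show why a matcher cannot classify an "odd" item without looking ahead. The names `PATTERN` and `matches` are hypothetical.

```python
import re

# Assumed shape of the content model, written in RELAX NG XML syntax:
#   <zeroOrMore> <ref name="odd"/> <ref name="even"/> </zeroOrMore>
#   <optional>   <ref name="odd"/> </optional>
#
# Regex analogy over a stream of space-terminated tokens.
PATTERN = re.compile(r"(?:odd even )*(?:odd )?")


def matches(tokens):
    """Return True if the token list fits '(odd even)* odd?'."""
    stream = "".join(token + " " for token in tokens)
    return PATTERN.fullmatch(stream) is not None
```

For `["odd", "even", "odd"]` the final "odd" can only be the optional trailing item, but the matcher learns that only after failing to find a following "even". A backtracking or lookahead-capable matcher handles this; a strictly deterministic one cannot.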

1344-488: Is that, because it is a direct child of the book, it does not need to be named specially for a human reader. However, because the format was defined by a DTD, it did have to be named as such. The root element does not have or need a version, as the version is built into the DTD declaration at the top of a pre-DocBook 5 document. DocBook 4.x documents are not compatible with DocBook 5, but can be converted into DocBook 5 documents via an XSLT stylesheet. One (db4-upgrade.xsl)

1400-406: Is the separation of concerns design principle as applied to the authoring and presentation of content. Under this principle, visual and design aspects (presentation and style) are separated from the core material and structure (content) of a document. A typical analogy used to explain this principle is the distinction between the human skeleton (as the structural component) and human flesh (as

1456-516: Is transitioning to Sphinx/reStructuredText), and the work of the Linux Documentation Project. Until DocBook 5, DocBook was defined normatively by a Document Type Definition (DTD). Because DocBook was built originally as an application of SGML, the DTD was the only available schema language. DocBook 4.x formats can be SGML or XML, but the XML version does not have its own namespace. DocBook 4.x formats had to live within

1512-444: Is valid. Because of its rule-based nature, Schematron's specificity is very strong. It can require that the content of an element be controlled by one of its siblings. It can also request or require that the root element, regardless of what element that happens to be, have specific attributes. It can even specify required relationships between multiple XML files. While Schematron is good at relational constructs, its ability to specify
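To make the rule-based style concrete, here is a minimal Python sketch in the spirit of Schematron. It is not Schematron itself: it uses the limited path syntax of the standard library's ElementTree rather than full XPath, and the `RULES` list and `run_rules` function are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules: (context path, path that must exist under it, message).
RULES = [
    (".//chapter", "title", "chapter must have a title"),
    (".//chapter", "para", "chapter must contain at least one para"),
]


def run_rules(xml_text, rules=RULES):
    """Return the messages of all failed rules; an empty list means the document passes."""
    root = ET.fromstring(xml_text)
    failures = []
    for context, required, message in rules:
        # Every node matching the context path must satisfy the assertion.
        for node in root.findall(context):
            if node.find(required) is None:
                failures.append(message)
    return failures
```

A document passes when every rule holds for every context node it applies to. Describing a full content model this way would need one rule per constraint, which illustrates why a rule list becomes verbose for basic structure.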

1568-464: The don't repeat yourself (DRY) principle. LaTeX is a document markup language that focuses primarily on the content and structure of a document. When a document is prepared using the LaTeX system, the source code of the document can be divided into two parts: the document body and the preamble (and the style sheets). The document body can be likened to the body of an HTML document, where one specifies

1624-1197: The ISO 19757's endorsed languages) are described below. Though there are a number of schema languages available, the primary three languages are Document Type Definitions, W3C XML Schema, and RELAX NG. Each language has its own advantages and disadvantages. DTDs are perhaps the most widely supported schema language for XML. Because DTDs are one of the earliest schema languages for XML, defined before XML even had namespace support, they are widely supported. Internal DTDs are often supported in XML processors; external DTDs are less often supported, but only slightly. Most large XML parsers, ones that support multiple XML technologies, will provide support for DTDs as well. Features available in XSD that are missing from DTDs include: XSD schemas are conventionally written as XML documents, so familiar editing and transformation tools can be used. As well as validation, XSD allows XML instances to be annotated with type information (the Post-Schema-Validation Infoset (PSVI)) which

1680-548: The DocBook Project development team maintain the key application for producing output from DocBook source documents: A set of XSLT stylesheets (as well as a legacy set of DSSSL stylesheets) that can generate high-quality HTML and print (FO/PDF) output, as well as output in other formats, including RTF, man pages and HTML Help. Web help is a chunked HTML output format in the DocBook XSL stylesheets that

1736-541: The DocBook specification not specific enough about the use of tags, the language non-portable across versions, rough in details and overall inconsistent. Norman Walsh is the principal author of the book DocBook: The Definitive Guide, the official documentation of DocBook. This book is available online under the GFDL, and also as a print publication. Separation of presentation and content Separation of content and presentation (or separation of content and style)


1792-422: The XML document itself, or via some external means. The XML Schema Definition is commonly referred to as XSD. The process of checking to see if an XML document conforms to a schema is called validation, which is separate from XML's core concept of syntactic well-formedness. All XML documents must be well-formed, but it is not required that a document be valid unless the XML parser is "validating", in which case
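The difference between well-formedness and validity can be shown with a short Python sketch. The standard library parser used here is non-validating, so it checks well-formedness only; the `has_leading_title` function is a hypothetical stand-in for a real schema, included just to show that a well-formed document can still break a structural rule.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the text parses as XML at all."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


def has_leading_title(xml_text):
    """Toy 'validity' rule (an assumption, not a real schema):
    the root element's first child must be a title element."""
    root = ET.fromstring(xml_text)
    children = list(root)
    return bool(children) and children[0].tag == "title"


BROKEN = "<book><title>T</book>"       # mismatched tags: not well-formed
UNTITLED = "<book><chapter/></book>"   # well-formed, but fails the toy rule
TITLED = "<book><title>T</title><chapter/></book>"
```

`BROKEN` is rejected by any XML parser, while `UNTITLED` parses without complaint and is only rejected once a schema-like rule is applied to it.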

1848-573: The XML version are available. Starting with DocBook 5, the RELAX NG version is the "normative" form from which the other formats are generated. DocBook originally started out as an SGML application, but an equivalent XML application was developed and has now replaced the SGML one for most uses. (Starting with version 4 of the SGML DTD, the XML DTD continued with this version numbering scheme.) Initially,

1904-405: The above must be either specified in a different way or omitted altogether. RELAX NG allows attributes to be treated as elements in content models. In particular, this means that one can provide the following: This block states that the element "some_element" must have an attribute named "has_name". This attribute can only take true or false as values, and if it is true, the first child element of

1960-424: The absence of its schema, even if the document would validate against that schema. In effect, all users of such an XML document must also implement the W3C XML Schema specification, thus ruling out minimalist or older XML parsers. It can also slow down the processing of the document, as the processor must potentially download and process a second XML file (the schema); however, a schema would normally then be cached, so

2016-453: The basic structure of a document, that is, which elements can go where, results in a very verbose schema. The typical way to solve this is to combine Schematron with RELAX NG or W3C XML Schema. There are several schema processors available for both languages that support this combined form. This allows Schematron rules to specify additional constraints to the structure defined by W3C XML Schema or RELAX NG. Schematron's reference implementation

2072-468: The columns running from right to left, so "after" in that case would be to the left. DocBook semantics are entirely neutral to these kinds of language-based concepts. Inline-level tags are elements like emphasis, hyperlinks, etc. They wrap text within a block-level element. These elements do not cause the text to break when rendered in a paragraph format, but typically they cause the document processor to apply some kind of distinct typographical treatment to

2128-550: The content and the structure of the document, whereas the preamble (and the style sheets) can be likened to the CSS portion of an HTML document, where the formatting, document specifications and other visual attributes are specified. Under this methodology, academic writings and publications can be structured, styled and typeset with minimal effort by their creators. In fact, it also prevents the end-users, who are usually not trained as designers themselves, from alternating between tweaking

2184-444: The cost comes only on the first use. WXS support exists in a number of large XML parsing packages. Xerces and the .NET Framework's Base Class Library both provide support for WXS validation. RELAX NG provides for most of the advantages that W3C XML Schema does over DTDs. While the language of RELAX NG can be written in XML, it also has an equivalent form that is much more like a DTD, but with greater specifying power. This form

2240-432: The current namespace. Also, the root element of a DocBook document must have a version that specifies the version of the format that the document is built on. (XML documents can include elements from multiple namespaces at once, like the id attributes in the example.) A book element must contain a title, or an info element containing a title. This must be before any child structural elements. Following

2296-402: The definition of many more. In theory, the lack of a specific list allows a processor to support data types that are very problem-domain specific. Most RELAX NG schemas can be algorithmically converted into W3C XML Schemas and even DTDs (except when using RELAX NG features not supported by those languages, as above). The reverse is not true. As such, RELAX NG can be used as a normative version of


2352-437: The document fails to conform to that schema. XML editing tools can also use schema information to avoid creating non-conforming documents in the first place. Because DocBook is XML, documents can be created and edited with any text editor. A dedicated XML editor is likewise a functional DocBook editor. DocBook provides schema files for popular XML schema languages, so any XML editor that can provide content completion based on

2408-521: The document is also checked for conformance with its associated schema. DTD-validating parsers are most common, but some support XML Schema or RELAX NG as well. Validation of an instance document against a schema can be regarded as a conceptually separate operation from XML parsing. In practice, however, many schema validators are integrated with an XML parser. There are several different languages available for specifying an XML schema. Each language has its strengths and weaknesses. The primary purpose of

2464-524: The element must be "name", which stores text. If "name" did not need to be the first element, then the choice could be wrapped in an "interleave" element along with other elements. The order of the specification of attributes in RELAX NG has no meaning, so this block need not be the first block in the element definition. W3C XML Schema cannot specify such a dependency between the content of an attribute and child elements. RELAX NG's specification only lists two built-in types (string and token), but it allows for
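The dependency described here, where an attribute value constrains the child elements, can be checked by hand in Python. This is a hand-written check of the stated rule only, not a RELAX NG validator, and the function name `check_some_element` is hypothetical. It leaves the `has_name="false"` case unconstrained because the text does not say what content is allowed there.

```python
import xml.etree.ElementTree as ET


def check_some_element(xml_text):
    """Check the rule described above for a single some_element.

    - the has_name attribute is required and must be "true" or "false"
    - if it is "true", the first child element must be <name>
    """
    elem = ET.fromstring(xml_text)
    if elem.tag != "some_element":
        return False
    has_name = elem.get("has_name")
    if has_name not in ("true", "false"):
        return False  # attribute missing or value not allowed
    if has_name == "true":
        children = list(elem)
        return bool(children) and children[0].tag == "name"
    return True  # "false": no further constraint is assumed in this sketch
```

For example, `check_some_element('<some_element has_name="true"><name>x</name></some_element>')` accepts the element, while the same element without its `name` child is rejected.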

2520-403: The enclosed text, by changing the font, size, or similar attributes. (The DocBook specification does say that it expects different typographical treatment, but it does not offer specific requirements as to what this treatment may be.) That is, a DocBook processor doesn't have to transform an emphasis tag into italics. A reader-based DocBook processor could increase the size of the words, or,

2576-399: The formatting and working on the document itself. Similar to the case with HTML and CSS, the separation between content and style also allows a document to be quickly reformatted for different purposes, or a style to be re-purposed across multiple documents as well. XML schema An XML schema is a description of a type of XML document, typically expressed in terms of constraints on

2632-413: The meaning of those contents. For example, rather than explaining how the abstract for an article might be visually formatted, DocBook simply says that a particular section is an abstract. It is up to an external processing tool or application to decide where on a page the abstract should go and what it should look like or whether or not it should be included in the final output at all. DocBook provides

2688-479: The namespace of that element. This XML format is schema-language agnostic and works for just about any schema language. Capitalization in the schema word: there is some confusion as to when to use the capitalized spelling "Schema" and when to use the lowercase spelling. The lowercase form is a generic term and may refer to any type of schema, including DTD, XML Schema (aka XSD), RELAX NG, or others, and should always be written using lowercase except when appearing at

2744-589: The only permitted top-level elements in a DocBook document. Block-level tags are elements like paragraph, lists, etc. Not all these elements can directly contain text. Sequential block-level elements render one "after" another. After, in this case, can differ depending on the language. In most Western languages, "after" means below: text paragraphs are printed down the page. Other languages' writing systems can have different directionality; for example, in Japanese, paragraphs are often printed in downward columns, with

2800-519: The restrictions of being defined by a DTD. The most significant restriction was that an element name uniquely defines its possible contents. That is, an element named info must contain the same information no matter where it is in the DocBook file. As such, there are many kinds of info elements in DocBook 4.x: bookinfo, chapterinfo, etc. Each has a slightly different content model, but they do share some of their content model. Additionally, they repeat context information. The book's info element

2856-402: The schema, and the user can convert it to other forms for tools that do not support RELAX NG. Most of RELAX NG's disadvantages are covered under the section on W3C XML Schema's advantages over RELAX NG. Though RELAX NG's ability to support user-defined data types is useful, it comes at the disadvantage of only having two data types that the user can rely upon. Which, in theory, means that using


2912-409: The selection of particular designated portions of a master document to produce different versions of the same document (such as a "tutorial" or a "quick-reference guide", where each of these consist of a subset of the material). Users can write their own customized stylesheets or even a full-fledged program to process the DocBook into an appropriate output format as their needs dictate. Norman Walsh and

2968-569: The structure and content of documents of that type, above and beyond the basic syntactical constraints imposed by XML itself. These constraints are generally expressed using some combination of grammatical rules governing the order of elements, Boolean predicates that the content must satisfy, data types governing the content of elements and attributes, and more specialized rules such as uniqueness and referential integrity constraints. There are languages developed specifically to express XML schemas. The document type definition (DTD) language, which

3024-538: The title are the structural children, in this case, two chapter elements. Each of these must have a title. They contain para block elements, which can contain free text and other inline elements like the emphasis in the second paragraph of the first chapter. Rules are formally defined in the DocBook XML schema. Appropriate programming tools can validate an XML document (DocBook or otherwise) against its corresponding schema, to determine if (and where)
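The example document discussed here is not included in this snapshot. The Python sketch below uses a hypothetical stand-in with the same shape (a book with a title and two titled chapters, with one emphasis element in the second paragraph of the first chapter) and reads it with the standard library. The sample text, `SAMPLE`, and `summarize` are assumptions for illustration; only the DocBook 5 namespace URI and the root `version` attribute come from the DocBook format itself.

```python
import xml.etree.ElementTree as ET

DOCBOOK_NS = "http://docbook.org/ns/docbook"

# Hypothetical sample document; titles and paragraph text are placeholders.
SAMPLE = """<book xmlns="http://docbook.org/ns/docbook" version="5.0">
  <title>A Very Simple Book</title>
  <chapter>
    <title>Chapter 1</title>
    <para>Hello world!</para>
    <para>This paragraph has <emphasis>emphasized</emphasis> text.</para>
  </chapter>
  <chapter>
    <title>Chapter 2</title>
    <para>Hello again, world!</para>
  </chapter>
</book>
"""


def summarize(xml_text):
    """Extract the structural facts the prose describes from a DocBook 5 book."""
    root = ET.fromstring(xml_text)
    # ElementTree spells qualified names as "{namespace}local".
    namespace, local = root.tag[1:].split("}", 1)
    prefixes = {"db": DOCBOOK_NS}
    return {
        "root": local,
        "namespace": namespace,
        "version": root.get("version"),
        "title": root.findtext("db:title", namespaces=prefixes),
        "chapters": [
            chapter.findtext("db:title", namespaces=prefixes)
            for chapter in root.findall("db:chapter", prefixes)
        ],
        "emphasis": [e.text for e in root.iter("{%s}emphasis" % DOCBOOK_NS)],
    }
```

Because every element lives in the DocBook namespace, lookups must be namespace-qualified; a bare `root.find("title")` would return nothing for this document.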

3080-615: The visual component) which makes up the body's appearance. Common applications of this principle are seen in Web design (HTML vs. CSS) and document typesetting (LaTeX's document body vs. its preamble). This principle is not a rigid guideline, but serves more as best practice for keeping appearance and structure separate. In many cases, the design and development aspects of a project are performed by different people, so keeping both aspects separated ensures both initial production accountability and later maintenance simplification, as in

3136-477: Was introduced in version 1.76.1. The documentation for web help also provides an example of web help and is part of the DocBook XSL distribution. The major features are its fully CSS-based page layout, search of the help content, and a table of contents in collapsible-tree form. Search has stemming, match highlighting, explicit page-scoring, and the standard multilingual tokenizer. The search and TOC are in
