Misplaced Pages

XML schema

Article snapshot taken from Wikipedia with creative commons attribution-sharealike license. Give it a read and then ask your questions in the chat. We can research this topic together.

An XML schema is a description of a type of XML document, typically expressed in terms of constraints on the structure and content of documents of that type, above and beyond the basic syntactical constraints imposed by XML itself. These constraints are generally expressed using some combination of grammatical rules governing the order of elements, Boolean predicates that the content must satisfy, data types governing the content of elements and attributes, and more specialized rules such as uniqueness and referential integrity constraints.

#931068

79-507: There are languages developed specifically to express XML schemas. The document type definition (DTD) language, which is native to the XML specification, is a schema language that is of relatively limited capability, but that also has other uses in XML aside from the expression of schemas. Two more expressive XML schema languages in widespread use are XML Schema (with a capital S ) and RELAX NG . The mechanism for associating an XML document with

158-400: A XML Catalog ) the schema used in the parsed XML document and that is validated in another language. A common misconception holds that a non-validating XML parser does not have to read document type declarations, when in fact, the document type declarations must still be scanned for correct syntax as well as validity of declarations, and the parser must still parse all entity declarations in

237-516: A macro . The entity declaration assigns it a value that is retained throughout the document. A common use is to have a name more recognizable than a numeric character reference for an unfamiliar character. Entities help to improve legibility of an XML text. In general, there are two types: internal and external. An example of internal entity declarations (here in an internal DTD subset of an SGML document) is: Internal entities may be defined in any order, as long as they are not referenced and parsed in

316-424: A MIME type, interpreted as a relative URI, but it could be an absolute URI to a specific renderer, or a URN indicating an OS-specific object identifier such as a UUID). The declared notation name must be unique within all the document type declaration, i.e. in the external subset as well as the internal subset, at least for conformance with XML. Notations can be associated to unparsed external entities included in

395-493: A RELAX NG schema across multiple validators requires either providing those user-defined data types to that validator or using only the two basic types. In practice, however, most RELAX NG processors support the W3C XML Schema set of data types. Schematron is a fairly unusual schema language. Unlike the main three, it defines an XML file's syntax as a list of XPath -based rules. If the document passes these rules, then it

474-408: A better way to validate XML structure. A DTD is associated with an XML or SGML document by means of a document type declaration (DOCTYPE). The DOCTYPE appears in the syntactic fragment doctypedecl near the start of an XML document. The declaration establishes that the document is an instance of the type defined by the referenced DTD. DOCTYPEs make two sorts of declarations: The declarations in

553-520: A complex type may be constrained by assertions— XPath 2.0 expressions evaluated against the content that must evaluate to true. After XML Schema-based validation, it is possible to express an XML document's structure and content in terms of the data model that was implicit during validation. The XML Schema data model includes: This collection of information is called the Post-Schema-Validation Infoset (PSVI). The PSVI gives

632-410: A degree of modularity in their languages, including, for example, splitting the schema into multiple files. And both of them are, or can be, defined in an XML language. RELAX NG does not have any analog to PSVI . Unlike W3C XML Schema, RELAX NG was designed so that validation and augmentation (adding type information and default values) are separate. W3C XML Schema has a formal mechanism for attaching

711-459: A document, to assure it adheres to the description of the element it is placed in. Like all XML schema languages , XSD can be used to express a set of rules to which an XML document must conform to be considered "valid" according to that schema. However, unlike most other schema languages, XSD was also designed with the intent that determination of a document's validity would produce a collection of information adhering to specific data types . Such

790-408: A fixed value ( #FIXED ), or which value should be used as a default value ("…") in case the given attribute is left out in an XML tag. Attribute list declarations are ignored by non-validating SGML and XML parsers (in which cases any attribute is accepted within all elements of the parsed document), but these declarations are still checked for well-formedness and validity. An entity is similar to

869-408: A notation name "type-image-svg". However, notation names usually follow a naming convention that is specific to the application generating or using the notation: notations are interpreted as additional meta-data whose effective content is an external entity and either a PUBLIC FPI, registered in the catalogs used by XML or SGML parsers, or a SYSTEM URI, whose interpretation is application dependent (here

SECTION 10

#1732783002932

948-446: A notation named "type-image-svg" that references the standard public FPI and the system identifier (the standard URI) of an SVG 1.1 document, instead of specifying just a system identifier as in the first example (which was a relative URI interpreted locally as a MIME type). This annotation is referenced directly within the unparsed "type" attribute of the "img" element, but its content is not retrieved. It also declares another notation for

1027-560: A post-validation infoset can be useful in the development of XML document processing software. XML Schema , published as a W3C recommendation in May 2001, is one of several XML schema languages . It was the first separate schema language for XML to achieve Recommendation status by the W3C. Because of confusion between XML Schema as a specific W3C specification, and the use of the same term to describe schema languages in general, some parts of

1106-526: A potential security problem. For WXS validators that will follow a URI to an arbitrary online location, there is the potential for reading something malicious from the other side of the stream. W3C XML Schema does not implement most of the DTD ability to provide data elements to a document. Although W3C XML Schema's ability to add default attributes to elements is an advantage, it is a disadvantage in some ways as well. It means that an XML file may not be usable in

1185-467: A resolvable location. SGML allows mapping public identifiers to system identifiers in catalogs that are optionally available to the URI resolvers used by document parsing software. This DOCTYPE can only appear after the optional XML declaration , and before the document body, if the document syntax conforms to XML. This includes XHTML documents: An additional internal subset can also be provided after

1264-452: A schema language is to specify what the structure of an XML document can be. This means which elements can reside in which other elements, which attributes are and are not legal to have on a particular element, and so forth. A schema is analogous to a grammar for a language; a schema defines what the vocabulary for the language may be and what a valid "sentence" is. There are historic and current XML schema languages: The main ones (see also

1343-537: A schema to an XML document, while RELAX NG intentionally avoids such mechanisms for security and interoperability reasons. RELAX NG has no ability to apply default attribute data to an element's list of attributes (i.e., changing the XML info set), while W3C XML Schema does. Again, this design is intentional and is to separate validation and augmentation. W3C XML Schema has a rich "simple type" system built-in (xs:number, xs:date, etc., plus derivation of custom types), while RELAX NG has an extremely simplistic one because it

1422-428: A schema varies according to the schema language. The association may be achieved via markup within the XML document itself, or via some external means. The XML Schema Definition is commonly referred to as XSD. The process of checking to see if a XML document conforms to a schema is called validation , which is separate from XML's core concept of syntactic well-formedness . All XML documents must be well-formed, but it

1501-435: A set of schema components : chiefly element and attribute declarations and complex and simple type definitions. These components are usually created by processing a collection of schema documents , which contain the source language definitions of these components. In popular usage, however, a schema document is often referred to as a schema. Schema documents are organized by namespace: all the named schema components belong to

1580-404: A target namespace, and the target namespace is a property of the schema document as a whole. A schema document may include other schema documents for the same namespace, and may import schema documents for a different namespace. When an instance document is validated against a schema (a process known as assessment ), the schema to be used for validation can either be supplied as a parameter to

1659-468: A valid XML document its "type" and facilitates treating the document as an object, using object-oriented programming (OOP) paradigms. The primary reason for defining an XML schema is to formally describe an XML document; however the resulting schema has a number of other uses that go beyond simple validation. The schema can be used to generate code, referred to as XML Data Binding . This code allows contents of XML documents to be treated as objects within

SECTION 20

#1732783002932

1738-406: A vendor-specific application, to annotate the "sgml" root element in the document. In both cases, the declared notation named is used directly in a declared "type" attribute, whose content is specified in the DTD with the "NOTATION" attribute type (this "type" attribute is declared for the "sgml" element, as well as for the "img" element). However, the "title" attribute of the "img" element specifies

1817-433: A way that it becomes possible to parse XML documents with non-validating XML parsers (if the only purpose of the external DTD subset was to define the schema). In addition, documents for these XML schema languages must be parsed separately, so validating the schema of XML documents in pure standalone mode is not really possible with these languages: the document type declaration remains necessary for at least identifying (with

1896-495: Is a specification file that contains set of markup declarations that define a document type for an SGML -family markup language ( GML , SGML , XML , HTML ). The DTD specification file can be used to validate documents. A DTD defines the valid building blocks of an XML document. It defines the document structure with a list of validated elements and attributes. A DTD can be declared inline inside an XML document, or as an external reference. A namespace-aware version of DTDs

1975-472: Is actually an XSLT transformation that transforms the Schematron document into an XSLT that validates the XML file. As such, Schematron's potential toolset is any XSLT processor, though libxml2 provides an implementation that does not require XSLT. Sun Microsystems 's Multiple Schema Validator for Java has an add-on that allows it to validate RELAX NG schemas that have embedded Schematron rules. This

2054-653: Is also a compromise among them. Of those languages, XDR and SOX continued to be used and supported for a while after XML Schema was published. A number of Microsoft products supported XDR until the release of MSXML 6.0 (which dropped XDR in favor of XML Schema) in December 2006. Commerce One , Inc. supported its SOX schema language until declaring bankruptcy in late 2004. The most obvious features offered in XSD that are not available in XML's native Document Type Definitions (DTDs) are namespace awareness and datatypes, that is,

2133-446: Is also declared with its own notation). Notations are also completely opaque for XML and SGML parsers, so they are not differentiated by the type of the external entity that they may reference (for these parsers they just have a unique name associated to a public identifier (an FPI) and/or a system identifier (a URI)). Some applications (but not XML or SGML parsers themselves) also allow referencing notations indirectly by naming them in

2212-528: Is being developed as Part 9 of ISO DSDL . DTDs persist in applications that need special publishing characters, such as the XML and HTML Character Entity References , which derive from larger sets defined as part of the ISO SGML standard effort. XML uses a subset of SGML DTD. As of 2009 , newer XML namespace -aware schema languages (such as W3C XML Schema and ISO RELAX NG ) have largely superseded DTDs as

2291-416: Is complex and hard to learn, although that is partially because it tries to do more than mere validation (see PSVI ). Although being written in XML is an advantage, it is also a disadvantage in some ways. The W3C XML Schema language, in particular, can be quite verbose, while a DTD can be terse and relatively easily editable. Likewise, WXS's formal mechanism for associating a document with a schema can pose

2370-459: Is designed to make manipulation of the XML instance easier in application programs. This may be by mapping the XSD-defined types to types in a programming language such as Java ("data binding") or by enriching the type system of XML processing languages such as XSLT and XQuery (known as "schema-awareness"). RELAX NG and W3C XML Schema allow for similar mechanisms of specificity. Both allow for

2449-458: Is known as the compact syntax. Tools can easily convert between these forms with no loss of features or even commenting. Even arbitrary elements specified between RELAX NG XML elements can be converted into the compact form. RELAX NG provides very strong support for unordered content. That is, it allows the schema to state that a sequence of patterns may appear in any order. RELAX NG also allows for non-deterministic content models. What this means

XML schema - Misplaced Pages Continue

2528-548: Is located by its defined SYSTEM identifier "example1.svg" (also interpreted as a relative URI). The effective content for the "img" element be the content of this second external resource. The difference with the GIF image, is that the SVG image is parsed within the SGML document, according to the declarations in the DTD, where the GIF image is just referenced as an opaque external object (which

2607-558: Is meant to use type libraries developed independently of RELAX NG, rather than grow its own. This is seen by some as a disadvantage. In practice it is common for a RELAX NG schema to use the predefined "simple types" and "restrictions" (pattern, maxLength, etc.) of W3C XML Schema. In W3C XML Schema a specific number or range of repetitions of patterns can be expressed whereas it is practically not possible to specify at all in RELAX NG (<oneOrMore> or <zeroOrMore>). W3C XML Schema

2686-410: Is not accessible). However, such documents are still fully parsable in the non -standalone mode of validating parsers, which signals an error if it can not locate these external entities with their specified public identifier (FPI) or system identifier (a URI), or are inaccessible. (Notations declared in the DTD are also referencing external entities, but these unparsed entities are not needed for

2765-488: Is not parsable with SGML) via its "data" attribute (whose value type is an opaque ENTITY). Only one notation name may be specified in the value of ENTITY attributes (there is no support in SGML, XML 1.0 or XML 1.1 for multiple notation names in the same declared external ENTITY, so separate attributes are needed). However multiple external entities may be referenced (in a space-separated list of names) in attributes declared with type ENTITIES, and where each named external entity

2844-615: Is not required that a document be valid unless the XML parser is "validating", in which case the document is also checked for conformance with its associated schema. DTD-validating parsers are most common, but some support XML Schema or RELAX NG as well. Validation of an instance document against a schema can be regarded as a conceptually separate operation from XML parsing. In practice, however, many schema validators are integrated with an XML parser. There are several different languages available for specifying an XML schema. Each language has its strengths and weaknesses. The primary purpose of

2923-462: Is not technically a schema language. Its sole purpose is to direct parts of documents to individual schemas based on the namespace of the encountered elements. An NRL is merely a list of XML namespaces and a path to a schema that each corresponds to. This allows each schema to be concerned with only its own language definition, and the NRL file routes the schema validator to the correct schema file based on

3002-424: Is parsed as if it was: Reference to the "author" internal entity is not substituted in the replacement text of the "signature" internal entity. Instead, it is replaced only when the "signature" entity reference is parsed within the content of the "sgml" element, but only by validating parsers (non-validating parsers do not substitute entity references occurring within contents of element or within attribute values, in

3081-403: Is specified between "&" and ";") are not replaced like usual named entities (defined with a CDATA value), but are left as distinct unparsed tokens that may be used either as the value of an element attribute (like above) or within the element contents, provided that either the DTD allows such external entities in the declared content type of elements or in the declared type of attributes (here

3160-441: Is successful in that it has been widely adopted and largely achieves what it set out to, it has been the subject of a great deal of severe criticism, perhaps more so than any other W3C Recommendation. Good summaries of the criticisms are provided by James Clark, Anders Møller and Michael Schwartzbach, Rick Jelliffe and David Webber. General problems: Practical limitations of expressibility: Technical problems: XSD 1.1 became

3239-424: Is that RELAX NG allows the specification of a sequence like the following: When the validator encounters something that matches the "odd" pattern, it is unknown whether this is the optional last "odd" reference or simply one in the zeroOrMore sequence without looking ahead at the data. RELAX NG allows this kind of specification. W3C XML Schema requires all of its sequences to be fully deterministic, so mechanisms like

XML schema - Misplaced Pages Continue

3318-444: Is valid. Because of its rule-based nature, Schematron's specificity is very strong. It can require that the content of an element be controlled by one of its siblings. It can also request or require that the root element, regardless of what element that happens to be, have specific attributes. It can even specify required relationships between multiple XML files. While Schematron is good at relational constructs, its ability to specify

3397-434: The "URN:''name''" value of a standard CDATA attribute, everywhere a URI can be specified. However this behaviour is application-specific, and requires that the application maintains a catalog of known URNs to resolve them into the notations that have been parsed in a standard SGML or XML parser. This use allows notations to be defined only in a DTD stored as an external entity and referenced only as

3476-488: The ENTITY type for the data attribute), or the SGML parser is not validating the content. Notations may also be associated directly to elements as additional meta-data, without associating them to another external entity, by giving their names as possible values of some additional attributes (also declared in the DTD within the <!ATTLIST ...> declaration of the element). For example: The example above shows

3555-1148: The ISO 19757's endorsed languages ) are described below. Though there are a number of schema languages available, the primary three languages are Document Type Definitions , W3C XML Schema , and RELAX NG . Each language has its own advantages and disadvantages. DTDs are perhaps the most widely supported schema language for XML. Because DTDs are one of the earliest schema languages for XML, defined before XML even had namespace support, they are widely supported. Internal DTDs are often supported in XML processors; external DTDs are less often supported, but only slightly. Most large XML parsers, ones that support multiple XML technologies, will provide support for DTDs as well. Features available in XSD that are missing from DTDs include: XSD schemas are conventionally written as XML documents, so familiar editing and transformation tools can be used. As well as validation, XSD allows XML instances to be annotated with type information (the Post-Schema-Validation Infoset (PSVI) ) which

3634-604: The internal subset , and substitute the replacement texts of internal entities occurring anywhere in the document type declaration or in the document body. PSVI 1.0, Part 2 Datatypes (Recommendation) , 1.1, Part 1 Structures (Recommendation) , XSD ( XML Schema Definition ), a recommendation of the World Wide Web Consortium ( W3C ), specifies how to formally describe the elements in an Extensible Markup Language ( XML ) document. It can be used by programmers to verify each piece of item content in

3713-655: The type of each attribute value, if not an explicit set of valid values. DTD markup declarations declare which element types , attribute lists , entities , and notations are allowed in the structure of the corresponding class of XML documents. An element type declaration defines an element and its possible content. A valid XML document contains only elements that are defined in the DTD. Various keywords and characters specify an element's content: For example: Element type declarations are ignored by non-validating SGML and XML parsers (in which cases, any elements are accepted in any order, and in any number of occurrences in

3792-400: The DTD loses its special role outside the DTD and it becomes a literal character. However, the references to predefined character entities are substituted wherever they occur, without needing a validating parser (they are only introduced by the "&" character). Notations are used in SGML or XML. They provide a complete reference to unparsed external entities whose interpretation is left to

3871-444: The DTD or in the body of the document, in their order of parsing: it is valid to include a reference to a still undefined entity within the content of a parsed entity, but it is invalid to include anywhere else any named entity reference before this entity has been fully defined, including all other internal entities referenced in its defined content (this also prevents circular or recursive definitions of internal entities). This document

3950-745: The DTD or in the document body but not declared: The XML DTD syntax is one of several XML schema languages. However, many of the schema languages do not fully replace the XML DTD. Notably, the XML DTD allows defining entities and notations that have no direct equivalents in DTD-less XML (because internal entities and parsable external entities are not part of XML schema languages, and because other unparsed external entities and notations have no simple equivalent mappings in most XML schema languages). Most XML schema languages are only replacements for element declarations and attribute list declarations, in such

4029-450: The ability to define element and attribute content as containing values such as integers and dates rather than arbitrary text. The XSD 1.0 specification was originally published in 2001, with a second edition following in 2004 to correct large numbers of errors. XSD 1.1 became a W3C Recommendation in April 2012 . Technically, a schema is an abstract collection of metadata, consisting of

SECTION 50

#1732783002932

4108-405: The above must be either specified in a different way or omitted altogether. RELAX NG allows attributes to be treated as elements in content models. In particular, this means that one can provide the following: This block states that the element "some_element" must have an attribute named "has_name". This attribute can only take true or false as values, and if it is true, the first child element of

4187-424: The absence of its schema, even if the document would validate against that schema. In effect, all users of such an XML document must also implement the W3C XML Schema specification, thus ruling out minimalist or older XML parsers. It can also slow down the processing of the document, as the processor must potentially download and process a second XML file (the schema); however, a schema would normally then be cached, so

4266-471: The application (which interprets them directly or retrieves the external entity themselves), by assigning them a simple name, which is usable in the body of the document. For example, notations may be used to reference non-XML data in an XML 1.1 document. For example, to annotate SVG images to associate them with a specific renderer: This declares the TEXT of external images with this type, and associates it with

4345-453: The basic structure of a document, that is, which elements can go where, results in a very verbose schema. The typical way to solve this is to combine Schematron with RELAX NG or W3C XML Schema. There are several schema processors available for both languages that support this combined form. This allows Schematron rules to specify additional constraints to the structure defined by W3C XML Schema or RELAX NG. Schematron's reference implementation

4424-537: The body of the SGML or XML document. The PUBLIC or SYSTEM parameter of these external entities specifies the FPI and/or the URI where the unparsed data of the external entity is located, and the additional NDATA parameter of these defined entities specifies the additional notation (i.e., effectively the MIME type here). For example: Within the body of the SGML document, these referenced external entities (whose name

4503-504: The body of the document. This is possible because the replacement text specified in the internal entity definitions permits a distinction between parameter entity references (that are introduced by the "%" character and whose replacement applies to the parsed DTD contents) and general entity references (that are introduced by the "&" character and whose replacement is delayed until they are effectively parsed and validated). The "%" character for introducing parameter entity references in

4582-466: The content model of these documents. The following example of a DOCTYPE contains both public and system identifiers: All HTML 4.01 documents conform to one of three SGML DTDs. The public identifiers of these DTDs are constant and are as follows: The system identifiers of these DTDs, if present in the DOCTYPE, are URI references . A system identifier usually points to a specific set of declarations in

4661-444: The cost comes only on the first use. WXS support exists in a number of large XML parsing packages. Xerces and the .NET Framework 's Base Class Library both provide support for WXS validation. RELAX NG provides for most of the advantages that W3C XML Schema does over DTDs. While the language of RELAX NG can be written in XML, it also has an equivalent form that is much more like a DTD, but with greater specifying power. This form

4740-402: The definition of many more. In theory, the lack of a specific list allows a processor to support data types that are very problem-domain specific. Most RELAX NG schemas can be algorithmically converted into W3C XML Schemas and even DTDs (except when using RELAX NG features not supported by those languages, as above). The reverse is not true. As such, RELAX NG can be used as a normative version of

4819-524: The element must be "name", which stores text. If "name" did not need to be the first element, then the choice could be wrapped in an "interleave" element along with other elements. The order of the specification of attributes in RELAX NG has no meaning, so this block need not be the first block in the element definition. W3C XML Schema cannot specify such a dependency between the content of an attribute and child elements. RELAX NG's specification only lists two built-in types (string and token), but it allows for

SECTION 60

#1732783002932

4898-462: The external subset of documents, and allows these documents to remain compatible with validating XML or SGML parsers that have no direct support for notations. Notations are not used in HTML, or in basic profiles for XHTML and SVG, because: Even in validating SGML or XML 1.0 or XML 1.1 parsers, the external entities referenced by an FPI and/or URI in declared notations are not retrieved automatically by

4977-406: The external subset: Alternatively, only the internal subset may be provided: Finally, the document type definition may include no subset at all; in that case, it just specifies that the document has a single top-level element (this is an implicit requirement for all valid XML and HTML documents, but not for document fragments or for all SGML documents, whose top-level elements may be different from

5056-413: The filename extension ".xsd". A unique Internet Media Type is not yet registered for XSDs, so "application/xml" or "text/xml" should be used, as per RFC 3023. The main components of a schema are: Other more specialized components include annotations, assertions, notations, and the schema component which contains information about the schema as a whole. Simple types (also called data types) constrain

5135-467: The implied root element), and it indicates the type name of the root element: DTDs describe the structure of a class of documents via element and attribute-list declarations. Element declarations name the allowable set of elements within the document, and specify whether and how declared elements and runs of character data may be contained within each element. Attribute-list declarations name the allowable set of attributes for each declared element, including

5214-401: The internal entity "example1SVGTitle" whose declaration that does not define an annotation, so it is parsed by validating parsers and the entity replacement text is "Title of example1.svg". The content of the "img" element references another external entity "example1SVG" whose declaration also does not define an notation, so it is also parsed by validating parsers and the entity replacement text

5293-789: The internal subset form part of the DOCTYPE in the document itself. The declarations in the external subset are located in a separate text file. The external subset may be referenced via a public identifier and/or a system identifier . Programs for reading documents may not be required to read the external subset. Any valid SGML or XML document that references an external subset in its DTD, or whose body contains references to parsed external entities declared in its DTD (including those declared within its internal subset ), may only be partially parsed but cannot be fully validated by validating SGML or XML parsers in their standalone mode (this means that these validating parsers do not attempt to retrieve these external entities, and their replacement text

5372-479: The namespace of that element. This XML format is schema-language agnostic and works for just about any schema language. Capitalization in the schema word: there is some confusion as to when to use the capitalized spelling "Schema" and when to use the lowercase spelling. The lowercase form is a generic term and may refer to any type of schema, including DTD, XML Schema (aka XSD), RELAX NG, or others, and should always be written using lowercase except when appearing at

5451-441: The parsed document), but these declarations are still checked for form and validity. An attribute list specifies for a given element type the list of all possible attribute associated with that type. For each possible attribute, it contains: For example: Here are some attribute types supported by both SGML and XML: A default value can define whether an attribute must occur ( #REQUIRED ) or not ( #IMPLIED ), or whether it has

5530-418: The parsers themselves. Instead, these parsers just provide to the application the parsed FPI and/or URI associated to the notations found in the parsed SGML or XML document, and with a facility for a dictionary containing all notation names declared in the DTD; these validating parsers also check the uniqueness of notation name declarations, and report a validation error if some notation names are used anywhere in

5609-451: The permitted content of an element, including its element and text children and its attributes. A complex type definition consists of a set of attribute uses and a content model. Varieties of content model include: A complex type can be derived from another complex type by restriction (disallowing some elements, attributes, or values that the base type permits) or by extension (allowing additional attributes and elements to appear). In XSD 1.1,

5688-469: The programming environment. The schema can be used to generate human-readable documentation of an XML file structure; this is especially useful where the authors have made use of the annotation elements. No formal standard exists for documentation generation, but a number of tools are available, such as the Xs3p stylesheet, that will produce high-quality readable HTML and printed material. Although XML Schema

5767-402: The schema, and the user can convert it to other forms for tools that do not support RELAX NG. Most of RELAX NG's disadvantages are covered under the section on W3C XML Schema's advantages over RELAX NG. Though RELAX NG's ability to support user-defined data types is useful, it comes at the disadvantage of only having two data types that the user can rely upon. Which, in theory, means that using

5846-470: The specification itself, and further derived types can be defined by users in their own schemas. The mechanisms available for restricting data types include the ability to specify minimum and maximum values, regular expressions, constraints on the length of strings, and constraints on the number of digits in decimal values. XSD 1.1 again adds assertions, the ability to specify an arbitrary constraint by means of an XPath 2.0 expression. Complex types describe

5925-581: The start of a sentence. The form "Schema" (capitalized) in common use in the XML community always refers to W3C XML Schema . The focus of the schema definition is structure and some semantics of documents. However, schema design, just like design of databases, computer program, and other formal constructs, also involve many considerations of style, convention, and readability. Extensive discussions of schema design issues can be found in (for example) Maler (1995) and DeRose (1997). Languages: Document type definition A document type definition ( DTD )

6004-668: The textual values that may appear in an element or attribute. This is one of the more significant ways in which XML Schema differs from DTDs. For example, an attribute might be constrained to hold only a valid date or a decimal number. XSD provides a set of 19 primitive data types ( anyURI , base64Binary , boolean , date , dateTime , decimal , double , duration , float , hexBinary , gDay , gMonth , gMonthDay , gYear , gYearMonth , NOTATION , QName , string , and time ). It allows new data types to be constructed from these primitives by three mechanisms: Twenty-five derived types are defined within

6083-510: The user community referred to this language as WXS , an initialism for W3C XML Schema, while others referred to it as XSD , an initialism for XML Schema Definition. In Version 1.1 the W3C has chosen to adopt XSD as the preferred name, and that is the name used in this article. In its appendix of references, the XSD specification acknowledges the influence of DTDs and other early XML schema efforts such as DDML , SOX , XML-Data, and XDR . It has adopted features from each of these proposals but

6162-469: The validation engine, or it can be referenced directly from the instance document using two special attributes, xsi:schemaLocation and xsi:noNamespaceSchemaLocation . (The latter mechanism requires the client invoking validation to trust the document sufficiently to know that it is being validated against the correct schema. "xsi" is the conventional prefix for the namespace " http://www.w3.org/2001/XMLSchema-instance ".) XML Schema Documents usually have

6241-409: The validation of documents in the standalone mode of these parsers: the validation of all external entities referenced by notations is left to the application using the SGML or XML parser). Non-validating parsers may eventually attempt to locate these external entities in the non -standalone mode (by partially interpreting the DTD only to resolve their declared parsable entities), but do not validate

#931068