VoiceXML ( VXML ) is a digital document standard for specifying interactive media and voice dialogs between humans and computers. It is used for developing audio and voice response applications, such as banking systems and automated customer service portals. VoiceXML applications are developed and deployed in a manner analogous to how a web browser interprets and visually renders the Hypertext Markup Language (HTML) it receives from a web server . VoiceXML documents are interpreted by a voice browser and in common deployment architectures, users interact with voice browsers via the public switched telephone network (PSTN).
58-667: The VoiceXML document format is based on Extensible Markup Language (XML). It is a standard developed by the World Wide Web Consortium (W3C). VoiceXML applications are commonly used in many industries and segments of commerce. These applications include order inquiry, package tracking, driving directions, emergency notification, wake-up, flight tracking, voice access to email, customer relationship management, prescription refilling, audio news magazines, voice dialing, real-estate information and national directory assistance applications. VoiceXML has tags that instruct
116-549: A numeric character reference . Consider the Chinese character "中", whose numeric code in Unicode is hexadecimal 4E2D, or decimal 20,013. A user whose keyboard offers no method for entering this character could still insert it in an XML document encoded either as 中 or 中 . Similarly, the string "I <3 Jörg" could be encoded for inclusion in an XML document as I <3 Jörg . �
174-403: A control console to drive these parameters, as well as sending commands from the media server to for instance control RGB or moving lighting and other stage effects. A major feature of high-end media servers is projection mapping, where multiple video projectors can project images onto irregularly shaped and positioned set pieces and even track their motion. This is usually done beforehand by
232-448: A list of syntax rules provided in the specification. Some key points in the fairly lengthy list include: The definition of an XML document excludes texts that contain violations of well-formedness rules; they are simply not XML. An XML processor that encounters such a violation is required to report such errors and to cease normal processing. This policy, occasionally referred to as " draconian error handling", stands in notable contrast to
290-522: A mechanism whereby an XML processor can reliably, without any prior knowledge, determine which encoding is being used. Encodings other than UTF-8 and UTF-16 are not necessarily recognized by every XML parser (and in some cases not even UTF-16, even though the standard mandates it to also be recognized). XML provides escape facilities for including characters that are problematic to include directly. For example: There are five predefined entities : All permitted Unicode characters may be represented with
348-423: A media server may require large amounts of RAM , or a powerful multicore CPU . A RAID array may be used to create a large amount of storage, though it is generally not necessary in a home media server to use a RAID array that gives a performance increase because current home network transfer speeds are slower than that of most current hard drives . However, a RAID configuration may be used to prevent loss of
406-555: A more compact non-XML syntax; the two syntaxes are isomorphic and James Clark 's conversion tool— Trang —can convert between them without loss of information. RELAX NG has a simpler definition and validation framework than XML Schema, making it easier to use and implement. It also has the ability to use datatype framework plug-ins ; a RELAX NG schema author, for example, can require values in an XML document to conform to definitions in XML Schema Datatypes. Schematron
464-554: A new protocol, MediaCTRL, is under development at the IETF . The IP Multimedia Subsystem (IMS), the blueprint for next generation networks, defines a component called the MRF (Media Resource Function), which is a kind of media server. In the case of IMS, the 'controlling logic' is provided by the MRFC (MRF controller), which, along with layers above, constitutes an 'application server'. Although
522-575: A relatively small set of additional features to VoiceXML 2.0, based on feedback from implementations of the 2.0 standard. It is backward compatible with VoiceXML 2.0 and reached W3C Recommendation status in June 2007. VoiceXML 3.0 was slated to be the next major release of VoiceXML, with new major features. However, with the disbanding of the VoiceXML Forum in May 2022, the development of the new standard
580-506: A rich datatyping system and allow for more detailed constraints on an XML document's logical structure. XSDs also use an XML-based format, which makes it possible to use ordinary XML tools to help process them. xs:schema element that defines a schema: RELAX NG (Regular Language for XML Next Generation) was initially specified by OASIS and is now a standard (Part 2: Regular-grammar-based validation of ISO/IEC 19757 – DSDL ). RELAX NG schemas may be written in either an XML based syntax or
638-421: A validity error must be able to report it, but may continue normal processing. A DTD is an example of a schema or grammar . Since the initial publication of XML 1.0, there has been substantial work in the area of schema languages for XML. Such schema languages typically constrain the set of elements that may be used in a document, which attributes may be applied to them, the order in which they may appear, and
SECTION 10
#1732791088722696-527: A vocabulary to refer to the constructs within an XML document, but does not provide any guidance on how to access this information. A variety of APIs for accessing XML have been developed and used, and some have been standardized. Existing APIs for XML processing tend to fall into these categories: Stream-oriented facilities require less memory and, for certain tasks based on a linear traversal of an XML document, are faster and simpler than other alternatives. Tree-traversal and data-binding APIs typically require
754-461: Is a lexical , event-driven API in which a document is read serially and its contents are reported as callbacks to various methods on a handler object of the user's design. SAX is fast and efficient to implement, but difficult to use for extracting information at random from the XML, since it tends to burden the application author with keeping track of what part of the document is being processed. It
812-456: Is a markup language and file format for storing, transmitting, and reconstructing arbitrary data. It defines a set of rules for encoding documents in a format that is both human-readable and machine-readable . The World Wide Web Consortium 's XML 1.0 Specification of 1998 and several other related specifications —all of them free open standards —define XML. The design goals of XML emphasize simplicity, generality, and usability across
870-726: Is a language for making assertions about the presence or absence of patterns in an XML document. It typically uses XPath expressions. Schematron is now a standard (Part 3: Rule-based validation of ISO/IEC 19757 – DSDL ). DSDL (Document Schema Definition Languages) is a multi-part ISO/IEC standard (ISO/IEC 19757) that brings together a comprehensive set of small schema languages, each targeted at specific problems. DSDL includes RELAX NG full and compact syntax, Schematron assertion language, and languages for defining datatypes, character repertoire constraints, renaming and entity expansion, and namespace-based routing of document fragments to different validators. DSDL schema languages do not have
928-578: Is an XML industry data standard. XML is used extensively to underpin various publishing formats. One of the applications of XML is in the transfer of Operational meteorology (OPMET) information based on IWXXM standards. The material in this section is based on the XML Specification . This is not an exhaustive list of all the constructs that appear in XML; it provides an introduction to the key constructs most often encountered in day-to-day use. XML documents consist entirely of characters from
986-498: Is better suited to situations in which certain types of information are always handled the same way, no matter where they occur in the document. Pull parsing treats the document as a series of items read in sequence using the iterator design pattern . This allows for writing of recursive descent parsers in which the structure of the code performing the parsing mirrors the structure of the XML being parsed, and intermediate parsed results can be used and accessed as local variables within
1044-499: Is meant to be used by both speech recognizers and speech synthesizers in voice browsing applications. The Call Control eXtensible Markup Language (CCXML) is a complementary W3C standard. A CCXML interpreter is used on some VoiceXML platforms to handle the initial call setup between the caller and the voice browser, and to provide telephony services like call transfer and disconnect to the voice browser. CCXML can also be used in non-VoiceXML contexts. In media server applications, it
1102-442: Is not permitted because the null character is one of the control characters excluded from XML, even when using a numeric character reference. An alternative encoding mechanism such as Base64 is needed to represent such characters. Comments may appear anywhere in a document outside other markup. Comments cannot appear before the XML declaration. Comments begin with <!-- and end with --> . For compatibility with SGML ,
1160-528: Is now owned by Dialogic and Convedia is now owned by Radisys. These languages also contain 'hooks' so that external scripts (like VoiceXML) can run on call legs where IVR functionality is required. There was an IETF working group called mediactrl ("media control") that was working on a successor for these scripting systems, which it is hoped will progress to an open and widely adopted standard. The mediactrl working group concluded in 2013. Extensible Markup Language Extensible Markup Language ( XML )
1218-475: Is often necessary for several call legs to interact with each other, for example in a multi-party conference. Some deficiencies were identified in VoiceXML for this application and so companies designed specific scripting languages to deal with this environment. The Media Server Markup Language (MSML) was Convedia's solution, and Media Server Control Markup Language (MSCML) was Snowshore's solution. Snowshore
SECTION 20
#17327910887221276-455: The .NET Framework , and the DOM traversal API (NodeIterator and TreeWalker). Media server A media server is a computer appliance or an application software that stores digital media (video, audio or images) and makes it available over a network. Media servers range from servers that provide video on demand to smaller personal computers or NAS (Network Attached Storage) for
1334-509: The Internet . It is a textual data format with strong support via Unicode for different human languages . Although the design of XML focuses on documents, the language is widely used for the representation of arbitrary data structures , such as those used in web services . Several schema systems exist to aid in the definition of XML-based languages, while programmers have developed many application programming interfaces (APIs) to aid
1392-458: The Unicode repertoire. Except for a small number of specifically excluded control characters , any character defined by Unicode may appear within the content of an XML document. XML includes facilities for identifying the encoding of the Unicode characters that make up the document, and for expressing characters that, for one reason or another, cannot be used directly. Unicode code points in
1450-410: The infoset augmentation facility and attribute defaults. RELAX NG and Schematron intentionally do not provide these. A cluster of specifications closely related to XML have been developed, starting soon after the initial publication of XML 1.0. It is frequently the case that the term "XML" is used to refer to XML together with one or more of these other technologies that have come to be seen as part of
1508-601: The voice browser to provide speech synthesis , automatic speech recognition , dialog management, and audio playback. The following is an example of a VoiceXML document: When interpreted by a VoiceXML interpreter this will output "Hello world" with synthesized speech. Typically, HTTP is used as the transport protocol for fetching VoiceXML pages. Some applications may use static VoiceXML pages, while others rely on dynamic VoiceXML page generation using an application server like Tomcat , Weblogic , IIS , or WebSphere . Historically, VoiceXML platform vendors have implemented
1566-748: The Media server computer or by hardware on the TV tuner card. Digital TV tuner cards take input from broadcast digital TV. In North America and in South Korea, the ATSC standard is used. In most of the rest of the world, DVB-T is the accepted standard. Since these transmissions are already digital, they do not need to be encoded. A variety of packages are available to run a home theater or media center. The growing use of motion graphics in environments such as theatre , dance , corporate events and concerts has led to
1624-607: The VoiceXML Forum in March 1999, in order to develop a standard markup language for specifying voice dialogs. By September 1999 the Forum released VoiceXML 0.9 for member comment, and in March 2000 they published VoiceXML 1.0. Soon afterwards, the Forum turned over the control of the standard to the W3C. The W3C produced several intermediate versions of VoiceXML 2.0, which reached the final "Recommendation" stage in March 2004. VoiceXML 2.1 added
1682-429: The XML core. Some other specifications conceived as part of the "XML Core" have failed to find wide adoption, including XInclude , XLink , and XPointer . The design goals of XML include, "It shall be easy to write programs which process XML documents." Despite this, the XML specification contains almost no information about how programmers might go about doing such processing. The XML Infoset specification provides
1740-506: The XML processor inserts in the DTD itself and in the XML document wherever they are referenced, like character escapes. DTD technology is still used in many applications because of its ubiquity. A newer schema language, described by the W3C as the successor of DTDs, is XML Schema , often referred to by the initialism for XML Schema instances, XSD (XML Schema Definition). XSDs are far more powerful than DTDs in describing XML languages. They use
1798-434: The allowable parent/child relationships. The oldest schema language for XML is the document type definition (DTD), inherited from SGML. DTDs have the following benefits: DTDs have the following limitations: Two peculiar features that distinguish DTDs from other schema types are the syntactic support for embedding a DTD within XML documents and for defining entities , which are arbitrary fragments of text or markup that
VoiceXML - Misplaced Pages Continue
1856-621: The base language for communication protocols such as SOAP and XMPP . It is one of the message exchange formats used in the Asynchronous JavaScript and XML (AJAX) programming technique. Many industry data standards, such as Health Level 7 , OpenTravel Alliance , FpML , MISMO , and National Information Exchange Model are based on XML and the rich features of the XML schema specification. In publishing, Darwin Information Typing Architecture
1914-401: The behavior of programs that process HTML , which are designed to produce a reasonable result even in the presence of severe markup errors. XML's policy in this area has been criticized as a violation of Postel's law ("Be conservative in what you send; be liberal in what you accept"). The XML specification defines a valid XML document as a well-formed XML document which also conforms to
1972-423: The case of C1 characters, this restriction is a backwards incompatibility; it was introduced to allow common encoding errors to be detected. The code point U+0000 (Null) is the only character that is not permitted in any XML 1.1 document. The Unicode character set can be encoded into bytes for storage or transmission in a variety of different ways, called "encodings". Unicode itself defines encodings that cover
2030-484: The core responsibility of a media server. With telephony networks moving towards VoIP technology, and using Session Initiation Protocol (SIP), the idea of media servers has started to gain some traction. Typically, an application (the 'application server') has the controlling logic, and controls a remote media server (or multiple servers) over an IP connection, possibly using SIP. Protocols such as Netann, MSCML and MSML have been created for this way of working, and
2088-429: The data structure and contain metadata . What is within the tags is data, encoded in the way the XML standard specifies. An additional XML schema (XSD) defines the necessary metadata for interpreting and validating XML. (This is also referred to as the canonical schema.) An XML document that adheres to basic XML rules is "well-formed"; one that adheres to its schema is "valid." IETF RFC 7303 (which supersedes
2146-424: The designer mapping their full set out in 3D in a pre-visualization tool. In the world of telephony, a media server is the computing component that processes the audio or video streams associated with telephone calls or connections. Conference services are a particular example of how media servers can be used, as a special 'engine' is needed to mix audio streams together so that conference participants can hear all of
2204-471: The development of media servers designed specifically for live events. These machines are often high-spec PC computers with high-speed hard drive technologies such as RAID arrays or solid-state drives , multiple GPUs and optionally a video capture board to allow mixing live video with recorded content and real-time effects. The supplied software is usually a suite of tools starting with the main VJ environment where
2262-442: The direct use of almost any Unicode character in element names, attributes, comments, character data, and processing instructions (other than the ones that have special symbolic meaning in XML itself, such as the less-than sign, "<"). The following is a well-formed XML document including Chinese , Armenian and Cyrillic characters: The XML specification defines an XML document as a well-formed text, meaning that it satisfies
2320-523: The entire repertoire; well-known ones include UTF-8 (which the XML standard recommends using, without a BOM ) and UTF-16 . There are many other text encodings that predate Unicode, such as ASCII and various ISO/IEC 8859 ; their character repertoires are in every case subsets of the Unicode character set. XML allows the use of any of the Unicode-defined encodings and any other encodings whose characters also appear in Unicode. XML also provides
2378-498: The following ranges are valid in XML 1.0 documents: XML 1.1 extends the set of allowed characters to include all the above, plus the remaining characters in the range U+0001–U+001F. At the same time, however, it restricts the use of C0 and C1 control characters other than U+0009 (Horizontal Tab), U+000A (Line Feed), U+000D (Carriage Return), and U+0085 (Next Line) by requiring them to be written in escaped form (for example U+0001 must be written as  or its equivalent). In
VoiceXML - Misplaced Pages Continue
2436-708: The functions performing the parsing, or passed down (as function parameters) into lower-level functions, or returned (as function return values) to higher-level functions. Examples of pull parsers include Data::Edit::Xml in Perl , StAX in the Java programming language, XMLPullParser in Smalltalk , XMLReader in PHP , ElementTree.iterparse in Python , SmartXML in Red , System.Xml.XmlReader in
2494-602: The home. By definition, a media server is a device that simply stores and shares media. This definition is vague, and can allow several different devices to be called media servers. It may be a NAS drive, a home theater PC running Windows XP Media Center Edition , MediaPortal or MythTV , or a commercial web server that hosts media for a large web site. In a home setting, a media server acts as an aggregator of information: video , audio , photos, books, etc. These different types of media (whether they originated on DVD , CD , digital camera , or in physical form) are stored on
2552-426: The media files due to disk failure as well. Many media servers also have the ability to capture media. This is done with specialized hardware such as TV tuner cards. Analog TV tuner cards can capture video from analog broadcast TV and output from cable/satellite set top boxes. This analog video then needs to be encoded in digital format to be stored on the media server. This encoding can be done with software running on
2610-425: The media server's hard drive . Access to these is then available from a central location. It may also be used to run special applications that allow the user(s) to access the media from a remote location via the internet. The only requirement for a media server is a method of storing media and a network connection with enough bandwidth to allow access to that media. Depending on the uses and applications that it runs,
2668-550: The older RFC 3023 ), provides rules for the construction of media types for use in XML message. It defines three media types: application/xml ( text/xml is an alias), application/xml-external-parsed-entity ( text/xml-external-parsed-entity is an alias) and application/xml-dtd . They are used for transmitting raw XML files without exposing their internal semantics . RFC 7303 further recommends that XML-based languages be given media types ending in +xml , for example, image/svg+xml for SVG . Further guidelines for
2726-415: The other participants. Conferencing servers may also need other specialized functions like "loudest talker" detection, or transcoding of audio streams, and also interpreting DTMF tones used to navigate menus. For video processing, it may be needed to change video streams, for example transcode from one video codec to another or rescale a picture from one size to another. These media processing functions are
2784-449: The processing of XML data. The main purpose of XML is serialization , i.e. storing, transmitting, and reconstructing arbitrary data. For two disparate systems to exchange information, they need to agree upon a file format. XML standardizes this process. It is therefore analogous to a lingua franca for representing information. As a markup language , XML labels, categorizes, and structurally organizes information. XML tags represent
2842-487: The rules of a Document Type Definition (DTD). In addition to being well formed, an XML document may be valid . This means that it contains a reference to a Document Type Definition (DTD), and that its elements and attributes are declared in that DTD and follow the grammatical rules for them that the DTD specifies. XML processors are classified as validating or non-validating depending on whether or not they check XML documents for validity. A processor that discovers
2900-425: The semantic structure returned by the speech recognizer. The Speech Synthesis Markup Language (SSML) is used to decorate textual prompts with information on how best to render them in synthetic speech, for example which speech synthesizer voice to use or when to speak louder or softer. The Pronunciation Lexicon Specification (PLS) is used to define how words are pronounced. The generated pronunciation information
2958-501: The speech recognizer determines the most likely sentence it heard, it needs to extract the semantic meaning from that sentence and return it to the VoiceXML interpreter. This semantic interpretation is specified via the Semantic Interpretation for Speech Recognition (SISR) standard. SISR is used inside SRGS to specify the semantic results associated with the grammars, i.e., the set of ECMAScript assignments that create
SECTION 50
#17327910887223016-415: The standard in different ways, and added proprietary features. But the VoiceXML 2.0 standard, adopted as a W3C Recommendation on 16 March 2004, clarified most areas of difference. The VoiceXML Forum, an industry group promoting the use of the standard, provides a conformance testing process that certifies vendors' implementations as conformant. AT&T Corporation , IBM , Lucent , and Motorola formed
3074-469: The string "--" (double-hyphen) is not allowed inside comments; this means comments cannot be nested. The ampersand has no special significance within comments, so entity and character references are not recognized as such, and there is no way to represent characters outside the character set of the document encoding. An example of a valid comment: <!--no need to escape <code> & such in comments--> XML 1.0 (Fifth Edition) and XML 1.1 support
3132-530: The use of XML in a networked context appear in RFC 3470 , also known as IETF BCP 70, a document covering many aspects of designing and deploying an XML-based language. XML has come into common use for the interchange of data over the Internet. Hundreds of document formats using XML syntax have been developed, including RSS , Atom , Office Open XML , OpenDocument , SVG , COLLADA , and XHTML . XML also provides
3190-472: The use of much more memory, but are often found more convenient for use by programmers; some include declarative retrieval of document components via the use of XPath expressions. XSLT is designed for declarative description of XML document transformations, and has been widely implemented both in server-side packages and Web browsers. XQuery overlaps XSLT in its functionality, but is designed more for searching of large XML databases . Simple API for XML (SAX)
3248-439: The user defines a number of set layers, each switchable between live, stored media or real-time generated, and layering modes between them. All of these introduce parameters of which manipulation of values in these parameters over time become the performance. High-end media server systems include support for DMX512-A , MIDI , Art-Net or similar control protocols to allow multiple remote sources like sliders, buttons and knobs on
3306-426: The vendor support of XML Schemas yet, and are to some extent a grassroots reaction of industrial publishers to the lack of utility of XML Schemas for publishing . Some schema languages not only describe the structure of a particular XML format but also offer limited facilities to influence processing of individual XML files that conform to this format. DTDs and XSDs both have this ability; they can for instance provide
3364-399: Was scrapped. As of December 2022, there are few VoiceXML 2.0/2.1 platform implementations being offered. The W3C's Speech Interface Framework also defines these other standards closely associated with VoiceXML. The Speech Recognition Grammar Specification (SRGS) is used to tell the speech recognizer what sentence patterns it should expect to hear: these patterns are called grammars. Once
#721278