MHTML, an initialism of "MIME encapsulation of aggregate HTML documents", is a Web archive file format used to combine, in a single computer file, the HTML code and its companion resources (such as images) that are represented by external hyperlinks in the web page's HTML code. The content of an MHTML file is encoded using the same techniques that were first developed for HTML email messages, using the MIME content type multipart/related. MHTML files use an .mhtml or .mht filename extension.
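Because an MHTML file is a MIME multipart/related message, general-purpose email tooling can usually build and read one. The following is a minimal sketch using Python's standard `email` package; the helper names `build_mhtml` and `list_parts`, the example URLs, and the placeholder image bytes are assumptions made for illustration, and real browsers add further headers that are not reproduced here.

```python
from email import message_from_bytes, policy
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

PAGE_URL = "http://www.example.com/index.html"   # hypothetical source page
LOGO_URL = "http://www.example.com/logo.png"     # hypothetical resource


def build_mhtml(html: str, image: bytes) -> bytes:
    """Pack an HTML page and one image into a multipart/related message."""
    root = MIMEMultipart("related")       # a boundary string is generated
    root["Subject"] = "Example page"      # stands in for the page title

    page = MIMEText(html, "html", "utf-8")
    page["Content-Location"] = PAGE_URL   # original location of this part
    root.attach(page)

    logo = MIMEImage(image, "png")        # body is base64-encoded by default
    logo["Content-Location"] = LOGO_URL
    root.attach(logo)
    return root.as_bytes()


def list_parts(mhtml: bytes) -> list:
    """Return (MIME type, original location) for every part in the file."""
    msg = message_from_bytes(mhtml, policy=policy.default)
    return [(p.get_content_type(), str(p["Content-Location"]))
            for p in msg.iter_parts()]


raw = build_mhtml('<html><body><img src="logo.png"></body></html>',
                  b"\x89PNG\r\n\x1a\n")   # placeholder bytes, not a real image
parsed = message_from_bytes(raw, policy=policy.default)
parts = list_parts(raw)
```

Here `parts` lists the HTML page first and the image second, each with the URL it was originally loaded from.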
51-483: The first part of the file is an e-mail header . The second part is normally HTML code. Subsequent parts are additional resources identified by their original uniform resource locators (URLs) and encoded in base64 binary-to-text encoding . MHTML was proposed as an open standard, then circulated in a revised edition in 1999 as RFC 2557. The .mhtml (Web archive) and .eml (email) filename extensions are interchangeable: either filename extension can be changed from one to
102-688: A ZIP archive file consisting of XHTML files carrying the content, along with images and other supporting files. EPUB is the most widely supported vendor-independent XML -based e-book format; it is supported by almost all hardware readers and many software readers and mobile apps . A successor to the Open eBook Publication Structure , EPUB 2.0 was approved in October 2007, with a maintenance update (2.0.1) approved in September 2010. The EPUB 3.0 specification became effective in October 2011, superseded by
153-473: A secure connection to the website . Internet users are distributed throughout the world using a wide variety of languages and alphabets, and expect to be able to create URLs in their own local alphabets. An Internationalized Resource Identifier (IRI) is a form of URL that includes Unicode characters. All modern browsers support IRIs. The parts of the URL requiring special treatment for different alphabets are
204-472: A DRM scheme to their liking. However, future versions of EPUB (specifically OCF) may specify a format for DRM. The EPUB specification does not enforce or suggest a particular DRM scheme. This could affect the level of support for various DRM systems on devices and the portability of purchased e-books. Consequently, such DRM incompatibility may segment the EPUB format along the lines of DRM systems, undermining
255-511: A boundary string that is not followed by any data. Some browsers support the MHTML format, either directly or through third-party extensions, but the process for saving a web page along with its resources as an MHTML file is not standardized. As a result, a web page saved as an MHTML file in one browser may render differently in another. With version 5.0, Internet Explorer became the first browser to support reading and saving web pages and external resources to
306-517: A double slash ( // ). Berners-Lee later expressed regret at the use of dots to separate the parts of the domain name within URIs , wishing he had used slashes throughout, and also said that, given the colon following the first component of a URI, the two slashes before the domain name were unnecessary. Early WorldWideWeb collaborators including Berners-Lee originally proposed the use of UDIs: Universal Document Identifiers. An early (1993) draft of
357-560: A fixed version in time. The W3C announced version 3.3 on May 25, 2023. Changes included stricter security and privacy standards; and the adoption of the WebP and Opus media formats. The format and many readers support the following: An EPUB file can optionally contain DRM as an additional layer, but it is not required by the specifications. In addition, the specification does not name any particular DRM system to use, so publishers can choose
408-622: A minor maintenance update (3.0.1) in June 2014. New major features include support for precise layout or specialized formatting (Fixed Layout Documents), such as for comic books, and MathML support. EPUB 3.2 became effective on May 8, 2019. The text of the format specification was reorganized and cleaned up; the format supports remotely hosted resources and new font formats (WOFF 2.0 and SFNT) and relies more directly on standard HTML and CSS. In May 2016 IDPF members approved a merger with the World Wide Web Consortium (W3C), "to fully align
459-504: A researcher noted that attackers could build malicious documents by creating an MHT file, appending an MSO object at the end (MSO is a file format used by the Microsoft Outlook e-mail application), and renaming the resulting file with a .doc extension. The delivery method would be by spam emails. In April 2019, a security researcher published details about an XML external entity (XXE) vulnerability that could be exploited when
510-474: A root element package and four child elements: metadata , manifest , spine , and guide . Furthermore, the package node must have the unique-identifier attribute. The .opf file's mimetype is application/oebps-package+xml . The metadata element contains all the metadata information for a particular EPUB file. Three metadata tags are required (though many more are available): title , language , and identifier . title contains
561-746: A single MHTML file. Since switching to the Chromium source code, Edge has supported saving as MHTML. Support for saving web pages as MHTML files was made available in the Opera 9.0 web browser. From Opera 9.50 through the rest of the Presto-based Opera product line (at Opera 12.16 as of 19 July 2013), the default format for saving pages is MHTML. The initial release of the new Webkit/Blink-based Opera (Opera 15) did not support MHTML, but subsequent releases (Opera 16 onwards) do. MHTML can be enabled by typing "opera://flags#save-page-as-mhtml" at
612-507: A subset of XHTML. There are, however, a few restrictions on certain elements. The mimetype for XHTML documents in EPUB is application/xhtml+xml. Styling and layout are performed using a subset of CSS 2.0, referred to as OPS Style Sheets. This specialized syntax requires that reading systems support only a portion of CSS properties and adds a few custom properties. Custom properties include oeb-page-head, oeb-page-foot, and oeb-column-number. Font-embedding can be accomplished using
663-542: A table of all required mimetypes, see Section 1.3.7 of the specification. Unicode is required, and content producers must use either UTF-8 or UTF-16 encoding. This is to support international and multilingual books. However, reading systems are not required to provide the fonts necessary to display every Unicode character, though they are required to display at least a placeholder for characters that cannot be displayed fully. An example skeleton of an XHTML file for EPUB looks like this: The OPF specification's purpose
714-514: A text document in ASCII that contains the string application/epub+zip . It must also be uncompressed, unencrypted, and the first file in the ZIP archive. This file provides a more reliable way for applications to identify the mimetype of the file than just the .epub extension. Also, there must be a folder named META-INF , which contains the required file container.xml . This XML file points to
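The container rules described here (a `mimetype` entry that comes first, is stored uncompressed and unencrypted, and contains exactly `application/epub+zip`, plus a `META-INF/container.xml` file) can be checked mechanically. The sketch below uses Python's standard `zipfile` module; the function names and the sample `container.xml` content are illustrative assumptions, and the check uses the archive's directory order as an approximation of "first file in the ZIP archive".

```python
import io
import zipfile

# Illustrative container.xml pointing at an OPF file in the OEBPS directory.
CONTAINER_XML = """<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf"
              media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def write_container(buf, mimetype_first: bool = True) -> None:
    """Write the two required container files into a new ZIP archive."""
    with zipfile.ZipFile(buf, "w") as zf:
        if not mimetype_first:
            zf.writestr("META-INF/container.xml", CONTAINER_XML)
        # ZIP_STORED means the entry is written without compression.
        zf.writestr("mimetype", "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        if mimetype_first:
            zf.writestr("META-INF/container.xml", CONTAINER_XML)


def check_container(buf) -> list[str]:
    """Return a list of problems found; an empty list means the checks pass."""
    problems = []
    with zipfile.ZipFile(buf) as zf:
        entries = zf.infolist()
        if not entries or entries[0].filename != "mimetype":
            problems.append("mimetype is not the first entry")
        else:
            first = entries[0]
            if first.compress_type != zipfile.ZIP_STORED:
                problems.append("mimetype is compressed")
            if first.flag_bits & 0x1:          # bit 0 marks an encrypted entry
                problems.append("mimetype is encrypted")
            if zf.read(first) != b"application/epub+zip":
                problems.append("mimetype has unexpected content")
        if "META-INF/container.xml" not in zf.namelist():
            problems.append("META-INF/container.xml is missing")
    return problems


good, bad = io.BytesIO(), io.BytesIO()
write_container(good)
write_container(bad, mimetype_first=False)
good_problems = check_container(good)
bad_problems = check_container(bad)
```

A real validator would check much more (for example, that the file named in `container.xml` exists), so this covers only the rules stated above.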
765-588: A third-party extension to be installed in the browser. The Mozilla Archive Format (MAFF) is a legacy Web archive file format that was supported by Firefox from 2004 to 2018 through an add-on. Unlike both MHTML and data URIs, MAFF uses a ZIP container to preserve both the HTML file and its external elements. In October 2017 the add-on developer announced the format would no longer be supported in future versions of Firefox. URL A uniform resource locator ( URL ), colloquially known as an address on
816-400: A unique randomized boundary string for separating resources contained within the file. The boundary string is defined at the beginning and used throughout the file. Then, the page resources are contained sequentially, starting with the page's rendered HTML source code. Each resource has its own metadata header which specifies its MIME type and the original location. The MHTML file ends with
867-634: A user opens an MHT file. Since the Windows operating system is set to automatically open all MHT files, by default, in Internet Explorer, the exploit could be triggered when a user double-clicked on a file that they received via email, instant messaging, or another vector, including a different browser. The data URI scheme offers an alternative for including separate elements such as images, style-sheets and scripts in-line when serving an HTML request or saving an HTML resource for offline use. Like
918-437: Is empty if it has no characters; the scheme component is always non-empty. The authority component consists of an optional userinfo subcomponent, a host, and an optional port. In the notation of the generic syntax, a URI has the form scheme ":" ["//" authority] path ["?" query] ["#" fragment]. A web browser will usually dereference a URL by performing an HTTP request to the specified host, by default on port number 80. URLs using the https scheme require that requests and responses be made over
969-399: Is freely available, MozArchiver, a fork of the Mozilla Archive Format extension. GNOME Web has supported reading and saving web pages in MHTML since version 3.14.1, released in September 2014. There are commercial software products for viewing MHTML files and converting them to other formats, such as PDF and ePub. Some HTML editor programs can view and edit MHTML files. MIME type for MHTML
1020-443: Is not well agreed upon. Used MIME types include: Problem Steps Recorder for Windows can save its output to MHT format. The "Save to Google Drive" extension for Google Chrome can save as MHTML as one of its outputs. Microsoft OneNote , starting with OneNote 2010, emails individual pages as .mht files. Evernote for Windows can export notes as MHT format, as an alternative to HTML or its own native .enex format. In May 2015,
1071-404: Is to "[define] the mechanism by which the various components of an OPS publication are tied together and provides additional structure and semantics to the electronic publication". This is accomplished by two XML files with the extensions .opf and .ncx . The OPF file, traditionally named content.opf , houses the EPUB book's metadata, file manifest, and linear reading order. This file has
1122-468: The @font-face property, as well as including the font file in the OPF's manifest (see below). The mimetype for CSS documents in EPUB is text/css. EPUB also requires that PNG, JPEG, GIF, and SVG images be supported using the mimetypes image/png, image/jpeg, image/gif, and image/svg+xml. Other media types are allowed, but creators must include alternative renditions using supported types. For
1173-595: The HTML5 , JavaScript , CSS, SVG formats, making EPUB readers use the same technology as web browsers. Such formats are associated with various types of security issues and privacy-breaching behaviors e.g. Web beacons , CSRF , XSHM due to their complexity and flexibility. Such vulnerabilities can be used to implement web tracking and cross-device tracking on EPUB files. Security researchers also identified attacks leading to local files and other user data being uploaded. The "EPUB 3.1 Overview" document provides
1224-546: The International Digital Publishing Forum (IDPF). It became an official standard of the IDPF in September 2007, superseding the older Open eBook (OEB) standard. The Book Industry Study Group endorses EPUB 3 as the format of choice for packaging content and has stated that the global book publishing industry should rally around a single standard. Technically, a file in the EPUB format is
1275-587: The Internet Engineering Task Force (IETF), as an outcome of collaboration started at the IETF Living Documents birds of a feather session in 1992. The format combines the pre-existing system of domain names (created in 1985) with file path syntax, where slashes are used to separate directory and filenames . Conventions already existed where server names could be prefixed to complete file paths, preceded by
1326-517: The Web , is a reference to a resource that specifies its location on a computer network and a mechanism for retrieving it. A URL is a specific type of Uniform Resource Identifier (URI), although many people use the two terms interchangeably. URLs occur most commonly to reference web pages ( HTTP / HTTPS ) but are also used for file transfer ( FTP ), email ( mailto ), database access ( JDBC ), and many other applications. Most web browsers display
1377-610: The macOS version includes a print-to- PDF feature. As with most other modern web browsers, support for MHTML files can be added to Safari via various third-party extensions. As of version 3.5.7, KDE 's Konqueror web browser does not support MHTML files. An extension project, mhtconv, can be used to allow saving and viewing of MHTML files. NetFront 3.4 (on devices such as the Sony Ericsson K850 ) can view and save MHTML files. Pale Moon requires an extension to be installed to read and write MHT files. One extension
1428-426: The "vivaldi://flags/#save-page-as-mhtml" option. Mozilla Firefox does not support MHTML. Until the advent of version 57 ("Firefox Quantum") , MHT files could be read and written by installing a browser extension , such as Mozilla Archive Format or UnMHT. From version 3.1.1 onwards, Apple Inc. 's Safari web browser does not natively support the MHTML format. Instead, Safari supports the webarchive format, and
1479-523: The HTML Specification referred to "Universal" Resource Locators. This was dropped some time between June 1994 ( RFC 1630 ) and October 1994 (draft-ietf-uri-url-08.txt). In his book Weaving the Web , Berners-Lee emphasizes his preference for the original inclusion of "universal" in the expansion rather than the word "uniform", to which it was later changed, and he gives a brief account of
1530-767: The NCX specification as used in EPUB is in Section 2.4.1 of the specification. The complete specification for NCX can be found in Section 8 of the Specifications for the Digital Talking Book . An example .ncx file: An EPUB file is a group of files that conform to the OPS/OPF standards and are wrapped in a ZIP file. The OCF specifies how to organize these files in the ZIP, and defines two additional files that must be included. The mimetype file must be
1581-506: The OPF file. Also, the meta name="dtb:depth" element is set equal to the depth of the navMap element. navPoint elements can be nested to create a hierarchical table of contents. navLabel 's content is the text that appears in the table of contents generated by reading systems that use the .ncx. navPoint 's content element points to a content document listed in the manifest and can also include an element identifier (e.g. #section1 ). A description of certain exceptions to
MHTML - Misplaced Pages Continue
1632-558: The URL of a web page above the page in an address bar. A typical URL could have the form http://www.example.com/index.html, which indicates a protocol (http), a hostname (www.example.com), and a file name (index.html). Uniform Resource Locators were defined in RFC 1738 in 1994 by Tim Berners-Lee, the inventor of the World Wide Web, and the URI working group of
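The split of a typical URL into protocol, hostname, and file name can be reproduced with Python's standard `urllib.parse` module. This is a small sketch: the `describe` helper and the `DEFAULT_PORTS` table are names invented for this example, and the port 443 default for https is an addition not stated in the text.

```python
from urllib.parse import urlsplit

# Ports used when the URL does not name one (443 for https is an addition
# made for this sketch).
DEFAULT_PORTS = {"http": 80, "https": 443}


def describe(url: str) -> dict:
    """Break a URL into its main components."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port or DEFAULT_PORTS.get(parts.scheme),
        "path": parts.path,
        "query": parts.query,
        "fragment": parts.fragment,
    }


simple = describe("http://www.example.com/index.html")
full = describe("https://www.example.com:8443/docs/page.html?lang=en#intro")
```

For the first URL the scheme is `http`, the host is `www.example.com`, and the path is `/index.html`; the second shows an explicit port, a query, and a fragment.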
1683-467: The XHTML content documents in their linear reading order. Also, any content document that can be reached through linking or the table of contents must be listed as well. The toc attribute of spine must contain the id of the NCX file listed in the manifest. Each itemref element's idref is set to the id of its respective content document. The guide element is an optional element for
1734-581: The address bar. Creating MHTML files in Google Chrome is enabled by default in version 86. Creating MHTML (multipart/related) files in Yandex Browser is enabled by default in version 22.7.4.960 (July 2022). Similarly to Google Chrome, the Chromium -based Vivaldi browser can save webpages as MHTML files since the 2.3 release. It supports both reading and writing MHTML files by toggling
1785-490: The advantages of a single standard format and confusing the consumer. DRMed EPUB files must contain a file called rights.xml within the META-INF directory at the root level of the ZIP container. EPUB is widely used on software readers such as Google Play Books on Android and Apple Books on iOS and macOS and Amazon Kindle 's e-readers, but not by associated apps for other platforms. iBooks also supports
1836-411: The content document, and a subset of CSS to provide layout and formatting. XML is used to create the document manifest, table of contents , and EPUB metadata . Finally, the files are bundled in a zip file as a packaging format. An EPUB file uses XHTML 1.1 (or DTBook) to construct the content of a book as of version 2.0.1. This is different from previous versions ( OEBPS 1.2 and earlier), which used
1887-406: The contention that led to the change. Every HTTP URL conforms to the syntax of a generic URI. The URI generic syntax consists of five components organized hierarchically in order of decreasing significance from left to right: A component is undefined if it has an associated delimiter and the delimiter does not appear in the URI; the scheme and path components are always defined. A component
1938-665: The domain name and path. The domain name in the IRI is known as an Internationalized Domain Name (IDN). Web and Internet software automatically convert the domain name into punycode usable by the Domain Name System ; for example, the Chinese URL http://例子.卷筒纸 becomes http://xn--fsqu00a.xn--3lr804guic/ . The xn-- indicates that the character was not originally ASCII . The URL path name can also be specified by
1989-470: The embedded content within MHTML, data URIs use Base64 encoding of the external resources (which may be binary or text) to embed them in-line within the HTML markup. HTML pages saved with external elements embedded using the data URI scheme are standard web pages, and can be opened by any modern browser, including browsers not supporting MHTML such as Mozilla Firefox. Unlike MHTML, saving web pages with their external resources embedded using data URIs requires
2040-429: The file defining the contents of the book. This is the OPF file, though additional alternative rootfile elements are allowed. Apart from mimetype and META-INF/container.xml , the other files (OPF, NCX, XHTML, CSS and images files) are traditionally put in a directory named OEBPS . An example file structure: An example container.xml, given the above file structure: The EPUB 3.0 Recommended Specification
2091-404: The files contained in the package. Each file is represented by an item element, and has the attributes id , href , media-type . All XHTML (content documents), stylesheets, images or other media, embedded fonts, and the NCX file should be listed here. Only the .opf file itself, the container.xml , and the mimetype files should not be included. The spine element lists all
2142-480: The hierarchical table of contents for the EPUB file. The specification for NCX was developed for Digital Talking Book (DTB), is maintained by the DAISY Consortium , and is not a part of the EPUB specification. The NCX file has a mimetype of application/x-dtbncx+xml . Of note here is that the values for the docTitle , docAuthor , and meta name="dtb:uid" elements should match their analogs in
2193-406: The other. An .eml message can be sent by e-mail, and it can be displayed by an email client . An email message can be saved using a .mhtml or .mht filename extension and then opened for display in a web browser or for editing other programs, including word processors and text editors . The header of an MHTML file contains metadata such as a date and time stamp , page title, the source URL, and
2244-605: The proprietary iBook format, which is based on the EPUB format but depends upon code from the iBooks app to function. EPUB is a popular format for electronic data interchange because it can be an open format and is based on HTML, as opposed to Amazon's proprietary format for Kindle readers. Popular EPUB producers of public domain and open licensed content include Project Gutenberg , Standard Ebooks , PubMed Central , SciELO and others. In 2022, Amazon 's Send to Kindle service removed support for its own Kindle File Format in favor of EPUB. EPUB requires readers to support
2295-402: The protocol of the current page, typically HTTP or HTTPS. EPub EPUB is an e-book file format that uses the ".epub" file extension . The term is short for electronic publication and is sometimes stylized as ePUB . EPUB is supported by many e-readers , and compatible software is available for most smartphones, tablets, and computers. EPUB is a technical standard published by
2346-473: The publishing industry and core Web technology". EPUB 2.0 was approved in October 2007, with a maintenance update (2.0.1) intended to clarify and correct errata in the specifications being approved in September 2010. EPUB version 2.0.1 consists of three specifications: EPUB internally uses XHTML or DTBook (an XML standard provided by the DAISY Consortium) to represent the text and structure of
2397-416: The purpose of identifying fundamental structural components of the book. Each reference element has the attributes type , title , href . Files referenced in href must be listed in the manifest, and are allowed to have an element identifier (e.g. #figures in the example). An example OPF file: The NCX file ( N avigation C ontrol file for X ML), traditionally named toc.ncx , contains
2448-448: The title of the book, language contains the language of the book's contents in RFC 3066 format or its successors, such as the newer RFC 4646 and identifier contains a unique identifier for the book, such as its ISBN or a URL . The identifier 's id attribute should equal the unique-identifier attribute from the package element. The manifest element lists all
2499-626: The user in the local writing system. If not already encoded, it is converted to UTF-8 , and any characters not part of the basic URL character set are escaped as hexadecimal using percent-encoding ; for example, the Japanese URL http://example.com/引き割り.html becomes http://example.com/%E5%BC%95%E3%81%8D%E5%89%B2%E3%82%8A.html . The target computer decodes the address and displays the page. Protocol-relative links (PRL), also known as protocol-relative URLs (PRURL), are URLs that have no protocol specified. For example, //example.com will use
2550-408: Was approved on 11 October 2011. On June 26, 2014, EPUB 3.0.1 was approved as a minor maintenance update to EPUB 3.0. EPUB 3.0 supersedes the previous release 2.0.1. EPUB 3 consists of a set of four specifications: The EPUB 3.0 format was intended to address the following criticisms: On June 26, 2014, the IDPF published EPUB 3.0.1 as a final Recommended Specification. In November 2014, EPUB 3.0
2601-511: Was published by the ISO / IEC as ISO/IEC TS 30135 (parts 1–7). In January 2020, EPUB 3.0.1 was published by the ISO / IEC as ISO/IEC 23736 (parts 1–6). EPUB 3.2 was announced in 2018, and the final specification was released in 2019. A notable change is the removal of a specialized subset of CSS, enabling the use of non-epub-prefixed properties. The references to HTML and SVG standards are also updated to "newest version available", as opposed to