e-text (from "electronic text"; sometimes written as etext) is a general term for any document that is read in digital form, and especially a document that is mainly text. For example, a computer-based book of art with minimal text, or a set of photographs or scans of pages, would not usually be called an "e-text". An e-text may be a binary or a plain text file, viewed with any open source or proprietary software. An e-text may have markup or other formatting information, or not. An e-text may be an electronic edition of a work originally composed or published in other media, or may be created in electronic form originally. The term is usually synonymous with e-book.
42-721: E-texts, or electronic documents, have been around since long before the Internet, the Web, and specialized e-book reading hardware. Roberto Busa began developing an electronic edition of Aquinas in the 1940s, while large-scale electronic text editing, hypertext, and online reading platforms such as Augment and FRESS appeared in the 1960s. These early systems made extensive use of formatting, markup, automatic tables of contents, hyperlinks, and other information in their texts, and some (such as FRESS) supported not just text but also graphics. In some communities, "e-text"
84-412: A critical edition with footnotes, commentary, critical apparatus, cross-references , or even the simplest tables. This leads to endless practical problems: for example, if the computer cannot reliably distinguish footnotes, it cannot find a phrase that a footnote interrupts. Even raw scanner OCR output usually produces more information than this, such as the use of bold and italic. If this information
126-641: A double line break. In recent decades, the resulting appearance and the lack of a markup possibility have often been perceived as bland and as a drawback of this format. Project Gutenberg attempts to address this by making many texts available in HTML, ePub, and PDF versions as well. HTML versions of older texts are autogenerated versions. Another not-for-profit project, Standard Ebooks , aims to address these issues with its collection of public domain titles that are formatted and styled. It corrects issues related to design and typography. In December 1994, Project Gutenberg
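The plain-text convention described here (lines wrapped at a fixed width, paragraphs separated by a blank line) can be illustrated with a minimal Python sketch. The function name `reflow` and the sample text are hypothetical, and the sketch assumes that a blank line always marks a paragraph boundary, which is exactly the kind of implicit convention a given file may or may not follow.

```python
import re
import textwrap


def reflow(text: str, width: int = 40) -> str:
    """Re-wrap blank-line-separated paragraphs to a new line width."""
    # Split on one or more blank lines (assumed paragraph separator).
    paragraphs = [p for p in re.split(r"\n\s*\n", text.strip()) if p.strip()]
    # Join each paragraph's hard-wrapped lines, then wrap again at `width`.
    return "\n\n".join(
        textwrap.fill(" ".join(p.split()), width=width) for p in paragraphs
    )


sample = "This is the first\nparagraph of text.\n\nSecond one."
print(reflow(sample, width=20))
```

Because the sketch treats everything between blank lines as running prose, it would also re-wrap verse, tables, or indented title pages, which is one reason unmarked plain text is hard to reformat reliably.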
168-733: A few non-text items such as audio files and music-notation files. Most releases are in English, but there are also significant numbers in many other languages. As of April 2016, the non-English languages most represented were French, German, Finnish, Dutch, Italian, and Portuguese. Whenever possible, Gutenberg releases are available in plain text, mainly using US-ASCII character encoding but frequently extended to ISO-8859-1 (needed to represent accented characters in French and the sharp s (ß) in German, for example). Besides being copyright-free,
210-415: A scanner, or has it been proofread and corrected? Metadata relating to the text is sometimes included with an e-text, but there is by this definition no way to say whether or where it is present. At best, the text of the title page might be included (or not), perhaps with centering imitated by indentation. Fifth, texts with more complicated information cannot really be handled at all. A bilingual edition, or
252-418: A work. For example, page numbers, page headers, and footnotes might be omitted, or might simply appear as additional lines of text, perhaps with blank lines before and after (or not). An ornate separator line might be represented instead by a line of asterisks (or not). Chapter and section titles, likewise, are just additional lines of text: they might be detectable by capitalization if they were all caps in
294-491: Is a trademark of the organization, and the mark cannot be used in commercial or modified redistributions of public domain texts from the project. There is no legal impediment to the reselling of works in the public domain if all references to Project Gutenberg are removed, but Gutenberg contributors have questioned the appropriateness of directly and commercially reusing content that has been formatted by volunteers. There have been instances of books being stripped of attribution to
336-724: Is a volunteer effort to digitize and archive cultural works, as well as to "encourage the creation and distribution of eBooks." It was founded in 1971 by American writer Michael S. Hart and is the oldest digital library. Most of the items in its collection are the full texts of books or individual stories in the public domain. All files can be accessed for free under an open format layout, available on almost any computer. As of 13 February 2024, Project Gutenberg had reached 70,000 items in its collection of free eBooks. The releases are available in plain text as well as other formats, such as HTML, PDF, EPUB, MOBI, and Plucker wherever possible. Most releases are in
378-574: Is backed-up regularly and mirrored on servers in many different locations. Project Gutenberg is careful to verify the status of its ebooks according to United States copyright law . Material is added to the Project Gutenberg archive only after it has received a copyright clearance, and records of these clearances are saved for future reference. Project Gutenberg does not claim new copyright on titles it publishes. Instead, it encourages their free reproduction and distribution. Most books in
420-401: Is easy on both the eyes and the computer". Hart correctly pointed out that proprietary word-processor formats made texts grossly inaccessible, but that objection does not apply to standard, open data formats. The narrow sense of "e-text" is now uncommon, because the notion of "just vanilla ASCII" (attractive at first glance) has turned out to have serious difficulties: First, this narrow type of "e-text"
462-504: Is limited to the English letters. Not even Spanish ñ or the accented vowels used in many European languages can be represented (unless awkwardly and ambiguously as "~n" or "a'"). Asian, Slavic, Greek, and other writing systems are impossible. Second, diagrams and pictures cannot be accommodated, and many books have at least some such material; often it is essential to the book. Third, "e-texts" in this narrow sense have no reliable way to distinguish "the text" from other things that occur in
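The character-set limitation can be checked directly. The following Python sketch is only an illustration: the helper name `representable` and the sample strings are hypothetical, and it relies on the standard `ascii`, `latin-1` (ISO-8859-1), and `utf-8` codecs that ship with Python.

```python
def representable(text: str, encoding: str) -> bool:
    """Return True if every character of `text` can be encoded in `encoding`."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# Hypothetical samples: English, Spanish, Greek, and Japanese words.
samples = ["plain English", "señor", "Ελληνικά", "日本語"]
for s in samples:
    # For each sample, report whether ASCII, ISO-8859-1, and UTF-8 can hold it.
    print(s, [representable(s, enc) for enc in ("ascii", "latin-1", "utf-8")])
```

Only the first sample fits in ASCII; ISO-8859-1 additionally covers the Spanish word, while the Greek and Japanese samples need an encoding such as UTF-8.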
504-648: Is not kept, it is expensive and time-consuming to reconstruct it; more sophisticated information, such as what edition you have, may not be recoverable at all. In actuality, even "plain text" uses some kind of "markup", usually control characters, spaces, tabs, and the like: spaces between words, or two returns and five spaces to mark a paragraph. The main difference from more formal markup is that "plain texts" use implicit, usually undocumented conventions, which are therefore inconsistent and difficult to recognize. The narrow sense of e-text as "plain vanilla ASCII" has fallen out of favor. Nevertheless, many such texts are freely available on
546-444: Is simple: 'To encourage the creation and distribution of ebooks ' ". His goal was "to provide as many e-books in as many formats as possible for the entire world to read in as many languages as possible". Likewise, a project slogan is to "break down the bars of ignorance and illiteracy", because its volunteers aim to continue spreading public literacy and appreciation for the literary heritage just as public libraries began to do in
588-402: Is used much more narrowly, to refer to electronic documents that are, so to speak, "plain vanilla ASCII ". By this is meant not only that the document is a plain text file, but that it has no information beyond "the text itself"—no representation of bold or italics, paragraph, page, chapter, or footnote boundaries, etc. Michael S. Hart, for example, argued that this "is the only text mode that
630-514: The English language , but many non-English works are also available. There are multiple affiliated projects that provide additional content, including region- and language-specific works. Project Gutenberg is closely affiliated with Distributed Proofreaders , an Internet-based community for proofreading scanned texts. Project Gutenberg is named after the inventor Johannes Gutenberg , whose works in developing printing technology led to an increase in
672-536: The Internet. Hart believed that one day the general public would be able to access computers, and he decided to make works of literature available in electronic form for free. He used a copy of the United States Declaration of Independence that he had in his backpack, and this became the first Project Gutenberg e-text. He named the project for Johannes Gutenberg, the fifteenth-century German printer who propelled
714-454: The internet . Originally, any computer data were considered as something internal—the final data output was always on paper. However, the development of computer networks has made it so that in most cases it is much more convenient to distribute electronic documents than printed ones. The improvements in electronic visual display technologies made it possible to view documents on a screen instead of printing them (thus saving paper and
756-496: The movable type printing press revolution. By the mid-1990s, Hart was running Project Gutenberg from Illinois Benedictine College . More volunteers had joined the effort. He manually entered all of the text until 1989 when image scanners and optical character recognition software improved and became more available, making book scanning more feasible. Hart later came to an arrangement with Carnegie Mellon University , which agreed to administer Project Gutenberg's finances. As
798-511: The "best" e-books from the collection. The CD is available for download as an ISO image . When users are unable to download the CD, they can request to have a copy sent to them, free of charge. In December 2003, a DVD was created containing nearly 10,000 items. At the time, this represented almost the entire collection. In early 2004, the DVD also became available by mail. In July 2007, a new edition of
840-607: The DVD was released containing over 17,000 books, and in April 2010, a dual-layer DVD was released, containing nearly 30,000 items. The majority of the DVDs, and all of the CDs mailed by the project, were recorded on recordable media by volunteers. However, the new dual-layer DVDs were manufactured, as this proved more economical than having volunteers burn them. As of October 2010, the project had mailed approximately 40,000 discs. As of 2017,
882-463: The Federal Court of Justice. As of 4 October 2020 that application was still pending (Federal Court of Justice I ZR 97/19). According to Project Gutenberg Literary Archive Foundation, "In October 2021, the parties reached a settlement agreement. Under the terms of the agreement, Project Gutenberg eBooks by the three authors will be blocked from Germany until their German copyright expires. Under
924-425: The Project Gutenberg collection are distributed as public domain under United States copyright law. There are also a few copyrighted texts, such as those of science fiction author Cory Doctorow, that Project Gutenberg distributes with permission. These are subject to further restrictions as specified by the copyright holder, although they generally tend to be licensed under Creative Commons. "Project Gutenberg"
966-409: The Web, perhaps as much because they are easily produced as because of any purported portability advantage. For many years Project Gutenberg strongly favored this model of text, but over time it has begun to develop and distribute more capable forms such as HTML. Electronic document An electronic document is a document that can be sent by non-physical means, such as telex, email, and
1008-423: The collection, where UTF-8 is used instead. Other formats may be released as well when submitted by volunteers. The most common non-ASCII format is HTML , which allows markup and illustrations to be included. Some project members and users have requested more advanced formats, believing them to be easier to read. But some formats that are not easily editable, such as PDF , are generally not considered to fit with
1050-644: The delivery of free CDs has been discontinued, though the ISO image is still available for download. As of August 2015 , Project Gutenberg claimed over 72,500 items in its collection, with an average of over 50 new e-books being added each week. These are primarily works of literature from the Western cultural tradition . In addition to literature such as novels, poetry, short stories and drama, Project Gutenberg also has cookbooks , reference works and issues of periodicals. The Project Gutenberg collection also has
1092-691: The different code pages have always been a source of trouble. Even more problems are connected with the complex file formats of various word processors, spreadsheets, and graphics software. To alleviate the problem, many software companies distribute free file viewers for their proprietary file formats (one example is Adobe's Acrobat Reader). The other solution is the development of standardized non-proprietary file formats (such as HTML and OpenDocument). Electronic documents for specialized uses have specialized formats; for example, electronic articles in physics use TeX or PostScript. Project Gutenberg Project Gutenberg (PG)
1134-581: The goals of Project Gutenberg. Also Project Gutenberg has two options for master formats that can be submitted (from which all other files are generated): customized versions of the Text Encoding Initiative standard (since 2005) and reStructuredText (since 2011). Beginning in 2009, the Project Gutenberg catalog began offering auto-generated alternate file formats, including HTML (when not already provided), EPUB and plucker . Michael Hart said in 2004, "The mission of Project Gutenberg
1176-623: The infringement of copyrights still active in Germany, and asserted that the Project Gutenberg website was under German jurisdiction because it hosts content in the German language and is accessible in Germany. This judgment was confirmed by the Frankfurt Court of Appeal on 30 April 2019 (11 U 27/18). The Frankfurt Court of Appeal did not give permission for a further appeal to the Federal Court of Justice (Bundesgerichtshof); however, an application for permission to appeal was filed with
1218-408: The late 19th century. Project Gutenberg is intentionally decentralized; there is no selection policy dictating what texts to add. Instead, individual volunteers work on what they are interested in, or have available. The Project Gutenberg collection is intended to preserve items for the long term, so they cannot be lost by any one localized accident. In an effort to ensure this, the entire collection
1260-780: The mass availability of books and other text. Michael S. Hart began Project Gutenberg in 1971 with the digitization of the United States Declaration of Independence. Hart, a student at the University of Illinois, obtained access to a Xerox Sigma V mainframe computer in the university's Materials Research Lab. Through friendly operators, he received an account with a virtually unlimited amount of computer time; its value at that time has since been variously estimated at $100,000 or $100,000,000. Hart explained he wanted to "give back" this gift by doing something one could consider to be of great value. His initial goal
1302-402: The original (or not). Even to discover what conventions (if any) were used, makes each book a new research or reverse-engineering project. In consequence of this, such texts cannot be reliably re-formatted. A program cannot reliably tell where footnotes, headers or footers are, or perhaps even paragraphs, so it cannot re-arrange the text, for example to fit a narrower screen, or read it aloud for
E-text - Misplaced Pages Continue
1344-700: The project and sold for profit through the Kindle Store and other booksellers, one being the 1906 book Fox Trapping. From 2018 to 2021, the Project Gutenberg website was not accessible within Germany, as a result of a court order obtained by S. Fischer Verlag regarding the works of Heinrich Mann, Thomas Mann and Alfred Döblin. Although these works were in the public domain in the United States, the German court (Frankfurt am Main Regional Court) recognized
1386-465: The project's popularity. Starting in 2004, an improved online catalog made Project Gutenberg content easier to browse, access and hyperlink . Project Gutenberg is now hosted by ibiblio at the University of North Carolina at Chapel Hill . Hart died on 6 September 2011 at his home in Urbana, Illinois, at the age of 64. In August 2003, Project Gutenberg created a CD containing approximately 600 of
1428-459: The requirement for a Latin ( character set ) text version of the release had been a criterion of Michael Hart's since the founding of Project Gutenberg, as he believed it was the format most likely to be readable in the extended future. Out of necessity, this criterion has had to be extended further for the sizable collection of texts in East Asian languages such as Chinese and Japanese now in
1470-400: The space required to store the printed copies). However, using electronic documents for the final presentation instead of paper has created the problem of multiple incompatible file formats. Even plain text computer files are not free from this problem: for example, under MS-DOS, most programs could not work correctly with UNIX-style text files (see newline), and for non-English speakers,
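The newline and code-page incompatibilities mentioned here can be shown in a few lines of Python. This is a minimal sketch, not a complete converter: the function name `normalize_newlines` is hypothetical, and the byte value `0xF1` is just one convenient example of a byte that different code pages interpret differently.

```python
def normalize_newlines(data: bytes) -> bytes:
    """Convert MS-DOS (CRLF) and lone-CR line endings to UNIX-style LF."""
    # Replace CRLF first so that it does not turn into two line feeds.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


dos_text = b"first line\r\nsecond line\r\n"
print(normalize_newlines(dos_text))

# The same byte decodes to different characters under different code pages.
raw = b"\xf1"
print(raw.decode("latin-1"))  # ISO-8859-1 interpretation
print(raw.decode("cp437"))    # original IBM PC (MS-DOS) code page interpretation
```

A file that does not declare its line-ending convention or code page leaves the reading program to guess, which is the interoperability problem the passage describes.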
1512-620: The terms of the settlement, the all-Germany block is no longer in place. Other terms of the settlement are confidential." The Project Gutenberg website has been blocked in Italy since May 2020, as part of a larger effort to block websites that publish newspapers and journals that are protected by copyright in Italy. The text files use the format of plain text encoded in UTF-8 and are typically wrapped at 65–70 characters, with paragraphs separated by
1554-400: The visually impaired. Programs might apply heuristics to guess at the structure, but this can easily fail. Fourth, and a perhaps surprisingly important issue, a "plain-text" e-text affords no way to represent information about the work. For example, is it the first or the tenth edition? Who prepared it, and what rights do they reserve or grant to others? Is this the raw version straight off
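The fragility of structural heuristics is easy to demonstrate. The sketch below is a hypothetical example, not a method used by any particular project: the function `guess_headings` assumes that a short line written entirely in capitals is a chapter or section title.

```python
def guess_headings(lines, max_len=60):
    """Heuristic: treat short, non-empty, all-capital lines as headings."""
    return [
        ln for ln in lines
        if ln.strip() and ln.strip().isupper() and len(ln.strip()) <= max_len
    ]


# Hypothetical excerpt: one real heading and one shouted line of dialogue.
sample = ["CHAPTER I", "", "It was a dark night.", "", "STOP!", "", "He stopped."]
print(guess_headings(sample))
```

The heuristic picks up the chapter title but also the capitalized exclamation, and it would miss any heading that the transcriber set in mixed case, so its output cannot be trusted without checking each book by hand.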
1596-488: The volume of e-texts increased, volunteers began to take over the project's day-to-day operations that Hart had run. Italian volunteer Pietro Di Miceli developed and administered the first Project Gutenberg website and started the development of the Project online Catalog. In his ten years in this role (1994–2004), the Project web pages won a number of awards, often being featured in "best of the Web" listings, contributing to
1638-524: Was begun by Michael Hart and John S. Guagliardo to provide low-cost intellectual properties. The initial name for this project was Project Gutenberg 2 (PG II), which created controversy among PG volunteers because of the re-use of the project's trademarked name for a commercial venture. In 2000, a non-profit corporation, the Project Gutenberg Literary Archive Foundation, Inc. 501(c)(3) EIN: 64-6221541
1680-555: Was chartered in Mississippi, United States, to handle the project's legal needs. Donations to it are tax-deductible. Gregory B. Newby, a long-time Project Gutenberg volunteer and at the time an assistant professor at the UNC School of Information and Library Science, became the foundation's first CEO in 2001; he later served as director of the Arctic Region Supercomputing Center and as Chief Technology Officer of Compute Canada. All sister projects are independent organizations that share
1722-503: Was criticized by the Text Encoding Initiative for failing to include documentation or discussion of the decisions unavoidable in preparing a text, or in some cases, not documenting which of several (conflicting) versions of a text has been the one digitized. The selection of works (and editions) available has been determined by popularity, ease of scanning, being out of copyright, and other factors; this would be difficult to avoid in any crowd-sourced project. In March 2004, an initiative
1764-437: Was to make the 10,000 most consulted books available to the public at little or no charge by the end of the 20th century. On July 4, 1971, after being inspired by a free printed copy of the U.S. Declaration of Independence, he decided to type the text into a computer, and to transmit it to other users on the computer network. This particular computer was one of the 15 nodes on ARPANET , the computer network that would become