Electronic discovery (also ediscovery or e-discovery) refers to discovery in legal proceedings such as litigation, government investigations, or Freedom of Information Act requests, where the information sought is in electronic format (often referred to as electronically stored information or ESI). Electronic discovery is subject to rules of civil procedure and agreed-upon processes, often involving review for privilege and relevance before data are turned over to the requesting party.
142-481: Electronic information is considered different from paper information because of its intangible form, volume, transience and persistence. Electronic information is usually accompanied by metadata that is not found in paper documents and that can play an important part as evidence (e.g. the date and time a document was written could be useful in a copyright case). The preservation of metadata from electronic documents creates special challenges to prevent spoliation . In
284-631: A page description . PDF has (as of version 2.0) 25 graphics state properties, of which some of the most important are: As in PostScript, vector graphics in PDF are constructed with paths . Paths are usually composed of lines and cubic Bézier curves , but can also be constructed from the outlines of text. Unlike PostScript, PDF does not allow a single path to mix text outlines with lines and curves. Paths can be stroked, filled, fill then stroked, or used for clipping . Strokes and fills can use any color set in
426-427: A relational database to categorize cultural works and their images. Relational databases and metadata work to document and describe the complex relationships amongst cultural objects and multi-faceted works of art, as well as between objects and places, people, and artistic movements. Relational database structures are also beneficial within collecting institutions and museums because they allow for archivists to make
568-454: A "data element" registry, its purpose is to support describing and registering metadata content independently of any particular application, lending the descriptions to being discovered and reused by humans or computers in developing new applications, databases, or for analysis of data collected in accordance with the registered metadata content. This standard has become the general basis for other kinds of metadata registries, reusing and extending
710-618: A 10-page Microsoft Word document) with a load file for use in image-based discovery review database applications. Increasingly, database review applications have embedded native file viewers with TIFF capabilities. With both native and image file capabilities, it could either increase or decrease the total necessary storage since there may be multiple formats and files associated with each individual native file. Deployment, storage, and best practices are becoming especially critical and necessary to maintain cost-effective strategies. Structured data are most often produced in delimited text format. When
852-496: A PDF are: In later PDF revisions, a PDF document can also support links (inside document or web page), forms, JavaScript (initially available as a plugin for Acrobat 3.0), or any other types of embedded contents that can be handled using plug-ins. PDF combines three technologies: PostScript is a page description language run in an interpreter to generate an image. It can handle graphics and has standard features of programming languages such as branching and looping . PDF
994-455: A PDF. Within text strings, characters are shown using character codes (integers) that map to glyphs in the current font using an encoding . There are several predefined encodings, including WinAnsi , MacRoman , and many encodings for East Asian languages and a font can have its own built-in encoding. (Although the WinAnsi and MacRoman encodings are derived from the historical properties of
1136-420: A PostScript file could be accurately rendered only as the cumulative result of executing all preceding commands to draw all previous pages—any of which could affect subsequent pages—plus the commands to draw that particular page, and there was no easy way to bypass that process to skip around to different pages. Traditionally, to go from PostScript to PDF, a source PostScript file (that is, an executable program)
1278-477: A Web browser plugin without waiting for the entire file to download, since all objects required for the first page to display are optimally organized at the start of the file. PDF files may be optimized using Adobe Acrobat software or QPDF. Page dimensions are not limited by the format itself. However, Adobe Acrobat imposes a limit of 15 million by 15 million inches, or 225 trillion in² (145,161 km²). The basic design of how graphics are represented in PDF
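The page-size figures quoted for Adobe Acrobat can be checked with a few lines of exact arithmetic. The sketch below is only a verification aid, not anything taken from Acrobat itself; the variable names are illustrative, and the sole outside fact used is that one inch is defined as exactly 0.0254 m.

```python
from fractions import Fraction

# One inch is exactly 0.0254 metres by definition.
INCH_IN_METRES = Fraction(254, 10_000)

side_in = 15_000_000                        # stated limit per side, in inches
area_in2 = side_in ** 2                     # area in square inches
side_km = side_in * INCH_IN_METRES / 1000   # side length in kilometres
area_km2 = side_km ** 2                     # area in square kilometres

print(area_in2)           # 225000000000000, i.e. 225 trillion
print(side_km, area_km2)  # 381 145161
```

Using `Fraction` rather than floating point keeps the conversion exact, so the result matches the quoted 145,161 figure without rounding.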
1420-772: A class-attribute-value triple. The first 2 elements of the triple (class, attribute) are pieces of some structural metadata having a defined semantic. The third element is a value, preferably from some controlled vocabulary, some reference (master) data. The combination of the metadata and master data elements results in a statement which is a metacontent statement i.e. "metacontent = metadata + master data". All of these elements can be thought of as "vocabulary". Both metadata and master data are vocabularies that can be assembled into metacontent statements. There are many sources of these vocabularies, both meta and master data: UML, EDIFACT, XSD, Dewey/UDC/LoC, SKOS, ISO-25964, Pantone, Linnaean Binomial Nomenclature, etc. Using controlled vocabularies for
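The "metacontent = metadata + master data" idea can be sketched as a small data structure. This is a minimal illustration, assuming a toy controlled vocabulary; the names `Statement`, `make_statement`, and `DEWEY` are hypothetical, and the only vocabulary entry used is the Dewey class 514 (Topology) example from the text.

```python
from dataclasses import dataclass

# Toy controlled vocabulary (master data). Only the 514 entry comes from
# the text; a real vocabulary would be far larger.
DEWEY = {"514": "Topology"}


@dataclass(frozen=True)
class Statement:
    cls: str        # structural metadata: the class, e.g. "book"
    attribute: str  # structural metadata: the attribute, e.g. "subject heading"
    value: str      # master data: a value drawn from a controlled vocabulary


def make_statement(cls: str, attribute: str, value: str, vocabulary: dict) -> Statement:
    """Build a class-attribute-value triple, rejecting uncontrolled values."""
    if value not in vocabulary:
        raise ValueError(f"{value!r} is not in the controlled vocabulary")
    return Statement(cls, attribute, value)


stmt = make_statement("book", "subject heading", "514", DEWEY)
print(stmt, "->", DEWEY[stmt.value])
```

Rejecting values outside the vocabulary is one possible design choice; it reflects the stated preference that the third element come from controlled reference data.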
1562-417: A clear distinction between cultural objects and their images; an unclear distinction could lead to confusing and inaccurate searches. An object's materiality, function, and purpose, as well as the size (e.g., measurements, such as height, width, weight), storage requirements (e.g., climate-controlled environment), and focus of the museum and collection, influence the descriptive depth of the data attributed to
1704-415: A complete identification of data sources. Since the scope of data can be overwhelming or uncertain in this phase, attempts are made to reasonably reduce the overall scope, such as by limiting the identification of documents to a certain date range or set of custodians. A duty to preserve begins upon the reasonable anticipation of litigation. Data identified as potentially relevant during preservation
1846-916: A finding of spoliation of evidence and the imposition of one or more sanctions, including adverse inference jury instructions, summary judgment , monetary fines, and other sanctions. In some cases, such as Qualcomm v. Broadcom , attorneys can be brought before the bar. Structured data typically resides in databases or datasets. It is organized in tables with columns, rows, and defined data types. The most common are Relational Database Management Systems ( RDBMS ) that are capable of handling large volumes of data such as Oracle , IBM Db2 , Microsoft SQL Server , Sybase , and Teradata . The structured data domain also includes spreadsheets (not all spreadsheets contain structured data, but those that have data organized in database-like tables), desktop databases like FileMaker Pro and Microsoft Access , structured flat files , XML files, data marts , data warehouses , etc. Voicemail
1988-485: A key topic in efforts toward international standardization . Standards for metadata in digital libraries include Dublin Core , METS , MODS , DDI , DOI , URN , PREMIS schema, EML , and OAI-PMH . Leading libraries in the world give hints on their metadata standards strategies. The use and creation of metadata in library and information science also include scientific publications: Metadata for scientific publications
2130-408: A library might hold in its collection. Until the 1980s, many library catalogs used 3x5 inch cards in file drawers to display a book's title, author, subject matter, and an abbreviated alpha-numeric string ( call number ) which indicated the physical location of the book within the library's shelves. The Dewey Decimal System employed by libraries for the classification of library materials by subject
2272-473: A manner independent of application software , hardware , and operating systems . Based on the PostScript language, each PDF file encapsulates a complete description of a fixed-layout flat document, including the text, fonts , vector graphics , raster images and other information needed to display it. PDF has its roots in "The Camelot Project" initiated by Adobe co-founder John Warnock in 1991. PDF
2414-488: A method for document review since around 2004. Because it requires the review of documents in their original file formats, applications and toolkits capable of opening multiple file formats have also become popular. This is also true in the ECM (Enterprise Content Management) storage markets, which converge quickly with ESI technologies. Petrification involves the conversion of native files into an image format that does not require
2556-452: A page description as an inline image .) Images are typically filtered for compression purposes. Image filters supported in PDF include the following general-purpose filters: Normally all image content in a PDF is embedded in the file. But PDF allows image data to be stored in external files by the use of external streams or Alternate Images . Standardized subsets of PDF, including PDF/A and PDF/X , prohibit these features. Text in PDF
2698-417: A paper for a project then code-named Camelot, in which he proposed the creation of a simplified version of PostScript called Interchange PostScript (IPS). Unlike traditional PostScript, which was tightly focused on rendering print jobs to output devices, IPS would be optimized for displaying pages to any screen and any platform. Adobe Systems made the PDF specification available free of charge in 1993. In
2840-620: A party to decline to use TAR. We are not there yet. Thus, despite what the Court might want a responding party to do, Sedona Principle 6 controls. Hyles' application to force the City to use TAR is DENIED. Grossman and Cormack define TAR in Federal Courts Law Review as: A process for Prioritizing or Coding a Collection of Documents using a computerized system that harnesses human judgments of one or more Subject Matter Expert(s) on
2982-416: A petrified format (such as PDF or TIFF) alongside metadata. Displaying and explaining evidence before audiences (at depositions, hearings, trials, etc.). The idea is that the audience understands the presentation, and non-professionals can follow the interpretation. Clarity and ease of understanding are the focus here. The native form of data needs to be abstracted, visualized, and brought into context for
3124-790: A petrified, paper-like format (such as PDF or TIFF) at this stage to allow for easier redaction and Bates-labeling. Modern processing tools can also employ advanced analytic tools to help document review attorneys more accurately identify potentially relevant documents. During the review phase, documents are reviewed for responsiveness to discovery requests and for privilege. Different document review platforms and services can assist in many tasks related to this process, including rapidly identifying potentially relevant documents and culling documents according to various criteria (such as keyword, date range, etc.). Most review tools also make it easy for large groups of document review attorneys to work on cases, featuring collaborative tools and batches to speed up
3266-439: A printing device. PostScript was not intended for long-term storage and real-time interactive rendering of electronic documents to computer monitors , so there was no need to support anything other than consecutive rendering of pages. If there was an error in the final printed output, the user would correct it at the application level and send a new print job in the form of an entirely new PostScript file. Thus, any given page in
3408-511: A problem with alternative approaches: Here's a new language we want you to learn, and now you need to output these additional files on your server. It's a hassle. (Microformats) lower the barrier to entry. Most common types of computer files can embed metadata, including documents (e.g. Microsoft Office files, OpenDocument files, PDF), images (e.g. JPEG, PNG), video files (e.g. AVI, MP4), and audio files (e.g. WAV, MP3). Metadata may be added to files by users, but some metadata
3550-427: A resource. Statistical data repositories have their own requirements for metadata in order to describe not only the source and quality of the data but also what statistical processes were used to create the data, which is of particular importance to the statistical community in order to both validate and improve the process of statistical data production. An additional type of metadata beginning to be more developed
3692-469: A result, files that use a small amount of transparency might be viewed acceptably by older viewers, but files making extensive use of transparency could be viewed incorrectly by an older viewer. The transparency extensions are based on the key concepts of transparency groups , blending modes , shape , and alpha . The model is closely aligned with the features of Adobe Illustrator version 9. The blend modes were based on those used by Adobe Photoshop at
3834-568: A smaller set of Documents and then extrapolates those judgments to the remaining Document Collection. Some TAR methods use Machine Learning Algorithms to distinguish Relevant from Non-Relevant Documents, based on Training Examples Coded as Relevant or Non-Relevant by the Subject Matter Experts(s), while other TAR methods derive systematic Rules that emulate the expert(s)’ decision-making process. TAR processes generally incorporate Statistical Models and/or Sampling techniques to guide
3976-473: A stream may be used instead of the ASCII cross-reference table and contains the offsets and other information in binary format. The format is flexible in that it allows for integer width specification (using the /W array), so that for example, a document not exceeding 64 KiB in size may dedicate only 2 bytes for object offsets. At the end of a PDF file is a footer containing If a cross-reference stream
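The relationship between file size and offset width can be illustrated with a short sketch. It assumes the usual three-field layout of a cross-reference stream entry (type, offset, generation) stored as big-endian integers with widths taken from the /W array, and it assumes all widths are nonzero; the function names are hypothetical and do not belong to any PDF library.

```python
def min_offset_width(file_size: int) -> int:
    """Smallest number of bytes able to hold any byte offset in the file."""
    max_offset = max(file_size - 1, 0)
    return max(1, (max_offset.bit_length() + 7) // 8)


def decode_entry(raw: bytes, widths) -> list:
    """Split one fixed-width entry into big-endian integer fields."""
    fields, pos = [], 0
    for width in widths:
        fields.append(int.from_bytes(raw[pos:pos + width], "big"))
        pos += width
    return fields


# A file of at most 64 KiB has offsets up to 65535, which fit in 2 bytes.
print(min_offset_width(64 * 1024))                   # 2
# Entry with /W [1 2 1]: type 1, offset 0x0100 = 256, generation 0.
print(decode_entry(bytes([1, 1, 0, 0]), [1, 2, 1]))  # [1, 256, 0]
```

A file even one byte larger than 64 KiB would need a 3-byte offset field under this scheme, which is why the width is specified per document rather than fixed by the format.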
4118-591: A unique code to each archived message or chat. The systems prevent alterations to original messages, messages cannot be deleted, and unauthorized persons cannot access the messages. The formalized changes to the Federal Rules of Civil Procedure in December 2006 and 2007 effectively forced civil litigants into a compliance mode with respect to their proper retention and management of electronically stored information (ESI). Improper management of ESI can result in
4260-461: A year, regardless of whether or not they [ever] were persons of interest to the agency. Geospatial metadata relates to Geographic Information Systems (GIS) files, maps, images, and other data that is location-based. Metadata is used in GIS to document the characteristics and attributes of geographic data, such as database files and data that is developed within a GIS. It includes details like who developed
4402-676: Is accessibility metadata . Accessibility metadata is not a new concept to libraries; however, advances in universal design have raised its profile. Projects like Cloud4All and GPII identified the lack of common terminologies and models to describe the needs and preferences of users and information that fits those needs as a major gap in providing universal access solutions. Those types of information are accessibility metadata. Schema.org has incorporated several accessibility properties based on IMS Global Access for All Information Model Data Element Specification. The Wiki page WebSchemas/Accessibility lists several properties and their values. While
4544-471: Is "raw data", which forensic investigators can review for hidden evidence. The original file format is known as the "native" format. Litigators may review material from ediscovery in one of several formats: printed paper, "native file", or a petrified, paper-like format, such as PDF files or TIFF images. Modern document review platforms accommodate the use of native files and allow for them to be converted to TIFF and Bates-stamped for use in court. In 2006,
4686-415: Is a subset of PostScript, simplified to remove such control flow features, while graphics commands remain. PostScript was originally designed for a drastically different use case : transmission of one-way linear print jobs in which the PostScript interpreter would collect a series of commands until it encountered the showpage command, then execute all the commands to render a page as a raster image to
4828-532: Is addressed in an article entitled "Better Ediscovery: Unified Governance and the IGRM," published by the American Bar Association. Metadata Metadata (or metainformation ) is " data that provides information about other data", but not the content of the data itself, such as the text of a message or the image itself. There are many distinct types of metadata, including: Metadata
4970-464: Is also a shading pattern , which draws continuously varying colors. There are seven types of shading patterns of which the simplest are the axial shading (Type 2) and radial shading (Type 3). Raster images in PDF (called Image XObjects ) are represented by dictionaries with an associated stream. The dictionary describes the properties of the image, and the stream contains the image data. (Less commonly, small raster images may be embedded directly in
5112-438: Is an early example of metadata usage. The early paper catalog had information regarding whichever item was described on said card: title, author, subject, and a number as to where to find said item. Beginning in the 1980s and 1990s, many libraries replaced these paper file cards with computer databases. These computer databases make it much easier and faster for users to do keyword searches. Another form of older metadata collection
5254-491: Is being accomplished in the national and international standards communities, especially ANSI (American National Standards Institute) and ISO (International Organization for Standardization) to reach a consensus on standardizing metadata and registries. The core metadata registry standard is ISO / IEC 11179 Metadata Registries (MDR), the framework for the standard is described in ISO/IEC 11179-1:2004. A new edition of Part 1
5396-469: Is called an embedded font while the former is called an unembedded font . The font files that may be embedded are based on widely used standard digital font formats: Type 1 (and its compressed variant CFF), TrueType , and (beginning with PDF 1.6) OpenType . Additionally PDF supports the Type 3 variant in which the components of the font are described by PDF graphic operators. Fourteen typefaces, known as
5538-486: Is clear that he uses the term in the ISO 11179 "traditional" sense, which is "structural metadata" i.e. "data about the containers of data"; rather than the alternative sense "content about individual instances of data content" or metacontent, the type of data usually found in library catalogs. Since then the fields of information management, information science, information technology, librarianship, and GIS have widely adopted
5680-470: Is completely discrete from other elements and classified according to one dimension only. An example of a linear metadata schema is the Dublin Core schema, which is one-dimensional. Metadata schemata are often two-dimensional, or planar, where each element is completely discrete from other elements but classified according to two orthogonal dimensions. The degree to which the data or metadata is structured
5822-560: Is in its final stage for publication in 2015 or early 2016. It has been revised to align with the current edition of Part 3, ISO/IEC 11179-3:2013 which extends the MDR to support the registration of Concept Systems. (see ISO/IEC 11179 ). This standard specifies a schema for recording both the meaning and technical structure of the data for unambiguous usage by humans and computers. ISO/IEC 11179 standard refers to metadata as information objects about data, or "data about data". In ISO/IEC 11179 Part-3,
5964-454: Is more work to be done. Metadata (metacontent) or, more correctly, the vocabularies used to assemble metadata (metacontent) statements, is typically structured according to a standardized concept using a well-defined metadata scheme, including metadata standards and metadata models . Tools such as controlled vocabularies , taxonomies , thesauri , data dictionaries , and metadata registries can be used to apply further standardization to
6106-608: Is most commonly used in museum contexts for object identification and resource recovery purposes. Metadata is developed and applied within collecting institutions and museums in order to: Many museums and cultural heritage centers recognize that given the diversity of artworks and cultural objects, no single model or standard suffices to describe and catalog cultural works. For example, a sculpted Indigenous artifact could be classified as an artwork, an archaeological artifact, or an Indigenous heritage item. The early stages of standardization in archiving, description and cataloging within
6248-533: Is no intelligence or "inferencing" occurring, just the illusion thereof. Metadata schemata can be hierarchical in nature where relationships exist between metadata elements and elements are nested so that parent-child relationships exist between the elements. An example of a hierarchical metadata schema is the IEEE LOM schema, in which metadata elements may belong to a parent metadata element. Metadata schemata can also be one-dimensional, or linear, where each element
6390-418: Is not being used, the footer is preceded by the trailer keyword followed by a dictionary containing information that would otherwise be contained in the cross-reference stream object's dictionary: Within each page, there are one or multiple content streams that describe the text, vector and images being drawn on the page. The content stream is stack-based , similar to PostScript. There are two layouts to
6532-636: Is not only on creation and capture, but moreover on maintenance costs. As soon as the metadata structures become outdated, so too is the access to the referred data. Hence granularity must take into account the effort to create the metadata as well as the effort to maintain it. In all cases where the metadata schemata exceed the planar depiction, some type of hypermapping is required to enable display and view of metadata according to chosen aspect and to serve special views. Hypermapping frequently applies to layering of geographical and geological information overlays. International standards apply to metadata. Much work
6674-495: Is not required in situations where a PDF file is intended only for print. Since the feature is optional, and since the rules for tagged PDF were relatively vague in ISO 32000-1, support for tagged PDF among consuming devices, including assistive technology (AT), is uneven as of 2021. ISO 32000-2, however, includes an improved discussion of tagged PDF which is anticipated to facilitate further adoption. An ISO-standardized subset of PDF specifically targeted at accessibility, PDF/UA ,
6816-628: Is not strictly bound to one of these categories, as it can describe a piece of data in many other ways. Metadata has various purposes. It can help users find relevant information and discover resources . It can also help organize electronic resources, provide digital identification, and archive and preserve resources. Metadata allows users to access resources by "allowing resources to be found by relevant criteria, identifying resources, bringing similar resources together, distinguishing dissimilar resources, and giving location information". Metadata of telecommunication activities including Internet traffic
6958-494: Is often automatically added to files by authoring applications or by devices used to produce the files, without user intervention. While metadata in files are useful for finding them, they can be a privacy hazard when the files are shared. Using metadata removal tools to clean files before sharing them can mitigate this risk. Metadata may be written into a digital photo file that will identify who owns it, copyright and contact information, what brand or model of camera created
7100-403: Is often created by journal publishers and citation databases such as PubMed and Web of Science . The data contained within manuscripts or accompanying them as supplementary material is less often subject to metadata creation, though they may be submitted to e.g. biomedical databases after publication. The original authors and database curators then become responsible for metadata creation, with
7242-457: Is often discoverable under electronic discovery rules. Employers may have a duty to retain voicemail if there is an anticipation of litigation involving that employee. Data from voice assistants like Amazon Alexa and Siri have been used in criminal cases. Although petrifying documents to static image formats (TIFF and JPEG) had become the standard document review method for almost two decades, native format review has increased in popularity as
7384-434: Is organized using ASCII characters, except for certain elements that may have binary content. The file starts with a header containing a magic number (as a readable string) and the version of the format, for example %PDF-1.7. The format is a subset of a COS ("Carousel" Object Structure) format. A COS tree file consists primarily of objects, of which there are nine types: Comments using 8-bit characters prefixed with
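The header described above (a readable magic string followed by the format version) can be recognised with a small sketch. It assumes the header sits at the very first byte of the data, which real-world readers may relax; `parse_pdf_header` is a hypothetical helper, not part of any PDF library.

```python
import re

# Matches the readable magic string and the version that follows it.
HEADER_RE = re.compile(rb"%PDF-(\d+)\.(\d+)")


def parse_pdf_header(data: bytes):
    """Return (major, minor) from a leading %PDF-x.y header, or None."""
    match = HEADER_RE.match(data)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


print(parse_pdf_header(b"%PDF-1.7\n"))   # (1, 7)
print(parse_pdf_header(b"not a pdf"))    # None
```

Working on bytes rather than text avoids decoding errors, since the rest of the file may contain binary content.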
7526-521: Is particularly so, according to research studies (cited in Rio Tinto), where the TAR methodology uses continuous active learning ("CAL") which eliminates issues about the seed set and stabilizing the TAR tool. The Court would have liked the City to use TAR in this case. But the Court cannot, and will not, force the City to do so. There may come a time when TAR is so widely used that it might be unreasonable for
7668-531: Is placed in a legal hold . This ensures that data cannot be destroyed. Care is taken to ensure this process is defensible, while the end goal is to reduce the possibility of data spoliation or destruction. Failure to preserve can lead to sanctions. Even if a court does not rule that the failure to preserve is negligence, they can force the accused to pay fines if the lost data puts the defense "at an undue disadvantage in establishing their defense." Once documents have been preserved, collection can begin. The collection
7810-412: Is referred to as "granularity" . "Granularity" refers to how much detail is provided. Metadata with a high granularity allows for deeper, more detailed, and more structured information and enables a greater level of technical manipulation. A lower level of granularity means that metadata can be created for considerably lower costs but will not provide as detailed information. The major impact of granularity
7952-402: Is represented by text elements in page content streams. A text element specifies that characters should be drawn at certain positions. The characters are specified using the encoding of a selected font resource . A font object in PDF is a description of a digital typeface . It may either describe the characteristics of a typeface, or it may include an embedded font file . The latter case
8094-774: Is saved as persistent repository and describe business objects in various enterprise systems and applications. Structural metadata commonality is also important to support data virtualization. Standardization and harmonization work has brought advantages to industry efforts to build metadata systems in the statistical community. Several metadata guidelines and standards such as the European Statistics Code of Practice and ISO 17369:2013 ( Statistical Data and Metadata Exchange or SDMX) provide key principles for how businesses, government bodies, and other entities should manage statistical data and metadata. Entities such as Eurostat , European System of Central Banks , and
8236-523: Is stored in the integrated library management system, ILMS , using the MARC metadata standard. The purpose is to direct patrons to the physical or electronic location of items or areas they seek as well as to provide a description of the item/s in question. More recent and specialized instances of library metadata include the establishment of digital libraries including e-print repositories and digital image libraries. While often based on library principles,
8378-485: Is the bibliographic classification, the subject, the Dewey Decimal class number . There is always an implied statement in any "classification" of some object. To classify an object as, for example, Dewey class number 514 (Topology) (i.e. books having the number 514 on their spine) the implied statement is: "<book><subject heading><514>". This is a subject-predicate-object triple, or more importantly,
8520-414: Is the transfer of data from a company to its legal counsel, who will determine the relevance and disposition of data. Some companies that deal with frequent litigation have software in place to quickly place legal holds on certain custodians when an event (such as legal notice) is triggered and begin the collection process immediately. Other companies may need to call in a digital forensics expert to prevent
8662-484: Is the use by the US Census Bureau of what is known as the "Long Form". The Long Form asks questions that are used to create demographic data to find patterns of distribution. Libraries employ metadata in library catalogues , most commonly as part of an Integrated Library Management System . Metadata is obtained by cataloging resources such as books, periodicals, DVDs, web pages or digital images. This data
8804-439: Is used as the basis for generating PostScript-like PDF code (see, e.g., Adobe Distiller). This is done by applying standard compiler techniques like loop unrolling, inlining and removing unused branches, resulting in code that is purely declarative and static. The end result is then packaged into a container format, together with all necessary dependencies for correct rendering (external files, graphics, or fonts to which
8946-567: Is usually expressed as a set of keywords in a natural language. According to Ralph Kimball , metadata can be divided into three categories: technical metadata (or internal metadata), business metadata (or external metadata), and process metadata . NISO distinguishes three types of metadata: descriptive, structural, and administrative. Descriptive metadata is typically used for discovery and identification, as information to search and locate an object, such as title, authors, subjects, keywords, and publisher. Structural metadata describes how
9088-447: Is very similar to that of PostScript, except for the use of transparency, which was added in PDF 1.4. PDF graphics use a device-independent Cartesian coordinate system to describe the surface of a page. A PDF page description can use a matrix to scale , rotate , or skew graphical elements. A key concept in PDF is that of the graphics state , which is a collection of graphical parameters that may be changed, saved, and restored by
9230-420: Is very widely collected by various national governmental organizations. This data is used for the purposes of traffic analysis and can be used for mass surveillance . Metadata was traditionally used in the card catalogs of libraries until the 1980s when libraries converted their catalog data to digital databases . In the 2000s, as data and information were increasingly stored digitally, this digital data
9372-540: The Civil Procedure Rules and Practice Direction 31B on Disclosure of Electronic Documents apply. Other jurisdictions around the world also have rules relating to electronic discovery. The Electronic Discovery Reference Model (EDRM) is a ubiquitous diagram that represents a conceptual view of the stages involved in the ediscovery process. The identification phase is when potentially responsive documents are identified for further analysis and review. In
9514-420: The U.S. Environmental Protection Agency have implemented these and other such standards and guidelines with the goal of improving "efficiency when managing statistical business processes". Metadata has been used in various ways as a means of cataloging items in libraries in both digital and analog formats. Such data helps classify, aggregate, identify, and locate a particular book, DVD, magazine, or any object
9656-515: The U.S. Supreme Court 's amendments to the Federal Rules of Civil Procedure created a category for electronic records that, for the first time, explicitly named emails and instant message chats as likely records to be archived and produced when relevant. One type of preservation problem arose during the Zubulake v. UBS Warburg LLC lawsuit. Throughout the case, the plaintiff claimed that
9798-538: The Windows and Macintosh operating systems, fonts using these encodings work equally well on any platform.) PDF can specify a predefined encoding to use, the font's built-in encoding or provide a lookup table of differences to a predefined or built-in encoding (not recommended with TrueType fonts). The encoding mechanisms in PDF were designed for Type 1 fonts, and the rules for applying them to TrueType fonts are complex. For large fonts or fonts with non-standard glyphs,
9940-504: The acquisition of digital media), which can lead to confusion. While attorneys involved in case litigation try their best to understand the companies and organizations they represent, they may fail to understand the policies and practices that are in place in the company's IT department. As a result, some data may be destroyed after a legal hold has been issued by unknowing technicians performing their regular duties. Many companies are deploying software that properly preserves data across
10082-419: The contents and context of data or data files increases its usefulness. For example, a web page may include metadata specifying what software language the page is written in (e.g., HTML), what tools were used to create it, what subjects the page is about, and where to find more information about the subject. This metadata can automatically improve the reader's experience and make it easier for users to find
10224-470: The ontologies of the systems from which they were created. Often the processes through which cultural objects are described and categorized through metadata in museums do not reflect the perspectives of the maker communities. PDF This is an accepted version of this page Portable Document Format ( PDF ), standardized as ISO 32000 , is a file format developed by Adobe in 1992 to present documents , including text formatting and images, in
10366-419: The standard 14 fonts , have a special significance in PDF documents: These fonts are sometimes called the base fourteen fonts . These fonts, or suitable substitute fonts with the same metrics, should be available in most PDF readers, but they are not guaranteed to be available in the reader, and may only display correctly if the system has them installed. Fonts may be substituted if they are not embedded in
10508-499: The CCO, are integrated within a Museum's Collections Management System (CMS), a database through which museums are able to manage their collections, acquisitions, loans and conservation. Scholars and professionals in the field note that the "quickly evolving landscape of standards and technologies" creates challenges for cultural documentarians, specifically non-technically trained professionals. Most collecting institutions and museums use
10650-563: The Library of Congress Controlled Vocabularies are reputable within the museum community and are recommended by CCO standards. Museums are encouraged to use controlled vocabularies that are contextual and relevant to their collections and enhance the functionality of their digital information systems. Controlled Vocabularies are beneficial within databases because they provide a high level of consistency, improving resource retrieval. Metadata structures, including controlled vocabularies, reflect
10792-429: The PDF files: non-linearized (not "optimized") and linearized ("optimized"). Non-linearized PDF files can be smaller than their linear counterparts, though they are slower to access because portions of the data required to assemble pages of the document are scattered throughout the PDF file. Linearized PDF files (also called "optimized" or "web optimized" PDF files) are constructed in a manner that enables them to be read in
10934-471: The United States, at the federal level, electronic discovery is governed by common law , case law and specific statutes, but primarily by the Federal Rules of Civil Procedure (FRCP), including amendments effective December 1, 2006, and December 1, 2015. In addition, state law and regulatory agencies increasingly also address issues relating to electronic discovery. In England and Wales , Part 31 of
11076-476: The United States, in Zubulake v. UBS Warburg , Hon. Shira Scheindlin ruled that failure to issue a written legal hold notice whenever litigation is reasonably anticipated will be deemed grossly negligent. This holding brought additional focus to the concepts of legal holds, eDiscovery, and electronic preservation. Custodians who are in possession of potentially relevant information or documents are identified. Data mapping techniques are often employed to ensure
11218-455: The analysis from a client-based perspective; here, each investigator looks at one agent included in the evidence additional Patterns like discussions or network analysis around people can be done. Documents are turned over to opposing counsel based on agreed-upon specifications. Often this production is accompanied by a load file, which is used to load documents into a document review platform. Documents can be produced either as native files or in
11360-812: The assistance of automated processes. Comprehensive metadata for all experimental data is the foundation of the FAIR Guiding Principles , or the standards for ensuring research data are findable , accessible , interoperable , and reusable . Such metadata can then be utilized, complemented, and made accessible in useful ways. OpenAlex is a free online index of over 200 million scientific documents that integrates and provides metadata such as sources, citations , author information , scientific fields , and research topics. Its API and open source website can be used for metascience, scientometrics , and novel tools that query this semantic web of papers . Another project under development, Scholia , uses
11502-563: The author is, when the document was written, and a short summary of the document. Metadata within web pages can also contain descriptions of page content, as well as key words linked to the content. These links are often called "Metatags", which were used as the primary factor in determining order for a web search until the late 1990s. The reliance on metatags in web searches was decreased in the late 1990s because of "keyword stuffing", whereby metatags were being largely misused to trick search engines into thinking some websites had more relevance in
11644-435: The collected data. Currently the two main approaches for identifying responsive material on custodian machines are: (1) where physical access to the organizations network is possible - agents are installed on each custodian machine which push large amounts of data for indexing across the network to one or more servers that have to be attached to the network or (2) for instances where it is impossible or impractical to attend
11786-574: The components of an object are organized. An example of structural metadata would be how pages are ordered to form chapters of a book. Finally, administrative metadata gives information to help manage the source. Administrative metadata refers to the technical information, such as file type, or when and how the file was created. Two sub-types of administrative metadata are rights management metadata and preservation metadata. Rights management metadata explains intellectual property rights , while preservation metadata contains information to preserve and save
11928-428: The components of metacontent statements, whether for indexing or finding, is endorsed by ISO 25964 : "If both the indexer and the searcher are guided to choose the same term for the same concept, then relevant documents will be retrieved." This is particularly relevant when considering search engines of the internet, such as Google. The process indexes pages and then matches text strings using its complex algorithm; there
12070-741: The content is desirable. This is particularly useful in video applications such as Automatic Number Plate Recognition and Vehicle Recognition Identification software, wherein license plate data is saved and used to create reports and alerts. There are 2 sources in which video metadata is derived: (1) operational gathered metadata, that is information about the content produced, such as the type of equipment, software, date, and location; (2) human-authored metadata, to improve search engine visibility, discoverability, audience engagement, and providing advertising opportunities to video publishers. Avid's MetaSync and Adobe's Bridge are examples of professional video editing software with access to metadata. Information on
12212-508: The contents. PDF 2.0 defines 256-bit AES encryption as the standard for PDF 2.0 files. The PDF Reference also defines ways that third parties can define their own encryption systems for PDF. PDF files may be digitally signed, to provide secure authentication; complete details on implementing digital signatures in PDF are provided in ISO 32000-2. PDF files may also contain embedded DRM restrictions that provide further controls that limit copying, editing, or printing. These restrictions depend on
12354-464: The data, when it was collected, how it was processed, and what formats it's available in, and then delivers the context for the data to be used effectively. Metadata can be created either by automated information processing or by manual work. Elementary metadata captured by computers can include information about when an object was created, who created it, when it was last updated, file size, and file extension. In this context an object refers to any of
12496-413: The data; it is used to summarize basic information about data that can make tracking and working with specific data easier. Some examples include: For example, a digital image may include metadata that describes the size of the image, its color depth, resolution, when it was created, the shutter speed, and other data. A text document's metadata may contain information about how long the document is, who
12638-411: The document refers), and compressed . Modern applications write to printer drivers that directly generate PDF rather than going through PostScript first. As a document format, PDF has several advantages over PostScript: Its disadvantages are: PDF since v1.6 supports embedding of interactive 3D documents: 3D drawings can be embedded using U3D or PRC and various other data formats. A PDF file
12780-411: The document root. This dictionary contains an array of Optional Content Groups (OCGs), each describing a set of information and each of which may be individually displayed or suppressed, plus a set of Optional Content Configuration Dictionaries, which give the status (Displayed or Suppressed) of the given OCGs. A PDF file may be encrypted , for security, in which case a password is needed to view or edit
12922-538: The early years PDF was popular mainly in desktop publishing workflows, and competed with several other formats, including DjVu , Envoy , Common Ground Digital Paper, Farallon Replica and even Adobe's own PostScript format. PDF was a proprietary format controlled by Adobe until it was released as an open standard on July 1, 2008, and published by the International Organization for Standardization as ISO 32000-1:2008, at which time control of
13064-534: The effective and efficient use of information in enabling an organization to achieve its goals." As compared to eDiscovery, information governance as a discipline is relatively new. Yet, there is traction for convergence. eDiscovery—a multi-billion-dollar industry—is rapidly evolving, ready to embrace optimized solutions that strengthen cybersecurity (for cloud computing). Since the early 2000s, eDiscovery practitioners have developed skills and techniques that can be applied to information governance. Organizations can apply
13206-467: The efforts to describe and standardize the varied accessibility needs of information seekers are beginning to become more robust, their adoption into established metadata schemas has not been as developed. For example, while Dublin Core (DC)'s "audience" and MARC 21's "reading level" could be used to identify resources suitable for users with dyslexia and DC's "format" could be used to identify resources available in braille, audio, or large print formats, there
13348-530: The evidence needed to prove the case existed in emails stored on UBS' own computer systems. Because the emails requested were either never found or destroyed, the court found that they were more likely to exist than not. The court found that while the corporation's counsel directed that all potential discovery evidence, including emails, be preserved, the staff that the directive applied to did not follow through. This resulted in significant sanctions against UBS. To establish authenticity, some archiving systems apply
13490-506: The file, along with exposure information (shutter speed, f-stop, etc.) and descriptive information, such as keywords about the photo, making the file or image searchable on a computer and/or the Internet. Some metadata is created by the camera such as, color space, color channels, exposure time, and aperture (EXIF), while some is input by the photographer and/or software after downloading to a computer. Most digital cameras write metadata about
13632-446: The focus on non-librarian use, especially in providing metadata, means they do not follow traditional or common cataloging approaches. Given the custom nature of included materials, metadata fields are often specially created e.g. taxonomic classification fields, location fields, keywords, or copyright statement. Standard file information such as file size and format are usually automatically included. Library operation has for decades been
13774-425: The following: A metadata engine collects, stores and analyzes information about data and metadata in use within a domain. Data virtualization emerged in the 2000s as the new software technology to complete the virtualization "stack" in the enterprise. Metadata is used in data virtualization servers which are enterprise infrastructure components, alongside database and application servers. Metadata in these servers
13916-406: The full implementation of the ISO 32000-1 specification. These proprietary technologies are not standardized, and their specification is published only on Adobe's website. Many of them are not supported by popular third-party implementations of PDF. ISO published version 2.0 of PDF, ISO 32000-2 in 2017, available for purchase, replacing the free specification provided by Adobe. In December 2020,
14058-402: The graphics state, including patterns . PDF supports several types of patterns. The simplest is the tiling pattern in which a piece of artwork is specified to be drawn repeatedly. This may be a colored tiling pattern , with the colors specified in the pattern object, or an uncolored tiling pattern , which defers color specification to the time the pattern is drawn. Beginning with PDF 1.3 there
14200-494: The imaging model. A tagged PDF (see clause 14.8 in ISO 32000) includes document structure and semantics information to enable reliable text extraction and accessibility . Technically speaking, tagged PDF is a stylized use of the format that builds on the logical structure framework introduced in PDF 1.3. Tagged PDF defines a set of standard structure types and attributes that allow page content (text, graphics, and images) to be extracted and reused for other purposes. Tagged PDF
14342-638: The information objects are data about Data Elements, Value Domains, and other reusable semantic and representational information objects that describe the meaning and technical details of a data item. This standard also prescribes the details for a metadata registry, and for registering and administering the information objects within a Metadata Registry. ISO/IEC 11179 Part 3 also has provisions for describing compound structures that are derivations of other data elements, for example through calculations, collections of one or more data elements, or other forms of derived data. While this standard describes itself originally as
14484-504: The lessons learned from eDiscovery to accelerate their path to a sophisticated information governance framework. The Information Governance Reference Model (IGRM) illustrates the relationship between key stakeholders and the Information Lifecycle and highlights the transparency required to enable effective governance. The updated IGRM v3.0 emphasizes that Privacy & Security Officers are essential stakeholders. This topic
14626-417: The level of contribution and the responsibilities. Moreover, various metadata about scientific outputs can be created or complemented – for instance, scite.ai attempts to track and link citations of papers as 'Supporting', 'Mentioning' or 'Contrasting' the study. Other examples include developments of alternative metrics – which, beyond providing help for assessment and findability, also aggregate many of
14768-434: The location the photo was taken from may also be included. Photographic Metadata Standards are governed by organizations that develop the following standards. They include, but are not limited to: Metadata is particularly useful in video, where information about its contents (such as transcripts of conversations and text descriptions of its scenes) is not directly understandable by a computer, but where an efficient search of
14910-438: The metadata application is manifold, covering a large variety of fields, there are specialized and well-accepted models to specify types of metadata. Bretherton & Singley (1994) distinguish between two distinct classes: structural/control metadata and guide metadata. Structural metadata describes the structure of database objects such as tables, columns, keys and indexes. Guide metadata helps humans find specific items and
15052-463: The metadata of scientific publications for various visualizations and aggregation features such as providing a simple user interface summarizing literature about a specific feature of the SARS-CoV-2 virus using Wikidata 's "main subject" property. In research labor, transparent metadata about authors' contributions to works have been proposed – e.g. the role played in the production of the paper,
15194-524: The metadata. Structural metadata commonality is also of paramount importance in data model development and in database design . Metadata (metacontent) syntax refers to the rules created to structure the fields or elements of metadata (metacontent). A single metadata scheme may be expressed in a number of different markup or programming languages, each of which requires a different syntax. For example, Dublin Core may be expressed in plain text, HTML , XML , and RDF . A common example of (guide) metacontent
15336-408: The method adopted to collect and process data there are few resources available for practitioners to evaluate the different tools. This is an issue due to the significant cost of eDiscovery solutions. Notwithstanding the limited options for obtaining trial licences for the tools, a significant barrier to the evaluation process is creating a suitable environment in which to test such tools. Adams suggests
15478-720: The model number, shutter speed, etc., and some enable you to edit it; this functionality has been available on most Nikon DSLRs since the Nikon D3 , on most new Canon cameras since the Canon EOS 7D , and on most Pentax DSLRs since the Pentax K-3. Metadata can be used to make organizing in post-production easier with the use of key-wording. Filters can be used to analyze a specific set of photographs and create selections on criteria like rating or capture time. On devices with geolocation capabilities like GPS (smartphones in particular),
15620-683: The museum community began in the late 1990s with the development of standards such as Categories for the Description of Works of Art (CDWA), Spectrum, CIDOC Conceptual Reference Model (CRM), Cataloging Cultural Objects (CCO) and the CDWA Lite XML schema. These standards use HTML and XML markup languages for machine processing, publication and implementation. The Anglo-American Cataloguing Rules (AACR), originally developed for characterizing books, have also been applied to cultural objects, works of art and architecture. Standards, such as
15762-593: The network to combat this trend, preventing inadvertent data spoliation. Given the complexities of modern litigation and the wide variety of information systems on the market, electronic discovery often requires IT professionals from both the attorney's office (or vendor) and the parties to the litigation to communicate directly to address technology incompatibilities and agree on production formats. Failure to get expert advice from knowledgeable personnel often leads to additional time and unforeseen costs in acquiring new technology or adapting existing technologies to accommodate
15904-416: The network. This process has been patented and embodied in a tool that has been the subject of a conference paper. In relation to the second approach, despite self-collection being a hot topic in eDiscovery, concerns are being addressed by limiting the involvement of the custodian to simply plugging in a device and running an application to create an encrypted container of responsive documents. Regardless of
16046-450: The number of tables subject to discovery is large or relationships between the tables are of essence, the data are produced in native database format or as a database backup file. A number of different people may be involved in an electronic discovery project: lawyers for both parties, forensic specialists, IT managers, and records managers, amongst others. Forensic examination often uses specialized terminology (for example, "image" refers to
16188-427: The numbers themselves can be perceived as the data. But if given the context that this database is a log of a book collection, those 13-digit numbers may now be identified as ISBNs – information that refers to the book, but is not itself the information within the book. The term "metadata" was coined in 1968 by Philip Bagley, in his book "Extension of Programming Language Concepts" where it
16330-635: The object by cultural documentarians. The established institutional cataloging practices, goals, and expertise of cultural documentarians and database structure also influence the information ascribed to cultural objects and the ways in which cultural objects are categorized. Additionally, museums often employ standardized commercial collection management software that prescribes and limits the ways in which archivists can describe artworks and cultural objects. As well, collecting institutions and museums use Controlled Vocabularies to describe cultural objects and artworks in their collections. Getty Vocabularies and
16472-440: The objects in the file, and also allows for small changes to be made without rewriting the entire file ( incremental update ). Before PDF version 1.5, the table would always be in a special ASCII format, be marked with the xref keyword, and follow the main body composed of indirect objects. Version 1.5 introduced optional cross-reference streams , which have the form of a standard stream object, possibly with filters applied. Such
16614-552: The percent sign ( % ) may be inserted. Objects may be either direct (embedded in another object) or indirect . Indirect objects are numbered with an object number and a generation number and defined between the obj and endobj keywords if residing in the document root. Beginning with PDF version 1.5, indirect objects (except other streams) may also be located in special streams known as object streams (marked /Type /ObjStm ). This technique enables non-stream objects to have standard stream filters applied to them, reduces
16756-423: The physical location of the custodian system - storage devices are attached to custodian machines (or company servers) and then each collection instance is manually deployed. In relation to the first approach there are several issues: New technology is able to address problems created by the first approach by running an application entirely in memory on each custodian machine and only pushing responsive data across
16898-673: The presentation. The results of the analysis should be the subject of the presentation. The clear documentation should provide reproducibility. Any data that is stored in an electronic form may be subject to production under common eDiscovery rules. This type of data has historically included email and office documents (spreadsheets, presentations, documents, PDFs, etc.) but can also include photos, video, instant messaging, collaboration tools, text (SMS), messaging apps, social media, ephemeral messaging, Internet of things (smart devices like Fitbits, smart watches, Alexa Alexa, Apple Siri, Nest), databases, and other file types. Also included in ediscovery
17040-439: The process and to measure overall system effectiveness. Anecdotal evidence for this emerging trend points to the business value of information governance (IG), defined by Gartner as "the specification of decision rights and an accountability framework to encourage desirable behavior in the valuation, creation, storage, use, archival, and deletion of information. It includes the processes, roles, standards, and metrics that ensure
17182-588: The public discussions about a scientific paper on social media such as Reddit , citations on Misplaced Pages , and reports about the study in the news media – and a call for showing whether or not the original findings are confirmed or could get reproduced . Metadata in a museum context is the information that trained cultural documentation specialists, such as archivists , librarians , museum registrars and curators , create to index, structure, describe, identify, or otherwise specify works of art, architecture, cultural objects and their images. Descriptive metadata
17324-516: The purposes of discovery. The original set of 15 classic metadata terms, known as the Dublin Core Metadata Element Set are endorsed in the following standards documents: The W3C Data Catalog Vocabulary (DCAT) is an RDF vocabulary that supplements Dublin Core with classes for Dataset, Data Service, Catalog, and Catalog Record. DCAT also uses elements from FOAF, PROV-O, and OWL-Time. DCAT provides an RDF model to support
17466-452: The registration and administration portion of the standard. The Geospatial community has a tradition of specialized geospatial metadata standards, particularly building on traditions of map- and image-libraries and catalogs. Formal metadata is usually essential for geospatial data, as common text-processing approaches are not applicable. The Dublin Core metadata terms are a set of vocabulary terms that can be used to describe resources for
17608-411: The review process and eliminate work duplication. Qualitative analysis of the content discovered in the collection phase and after being reduced by the preprocessing phase. The Evidence is looked at in context. Correlation analysis or contextual analysis to extract structured information relevant to the case. Structuring like Timelineing or Clustering into Topics can be done. An example structure could be
17750-493: The search than they really did. Metadata can be stored and managed in a database , often called a metadata registry or metadata repository . However, without context and a point of reference, it might be impossible to identify metadata just by looking at it. For example: by itself, a database containing several numbers, all 13 digits long could be the results of calculations or a list of numbers to plug into an equation – without any other context,
17892-502: The second edition of PDF 2.0, ISO 32000-2:2020, was published, with clarifications, corrections, and critical updates to normative references (ISO 32000-2 does not include any proprietary technologies as normative references). In April 2023 the PDF Association made ISO 32000-2 available for download free of charge. A PDF file is often a combination of vector graphics , text, and bitmap graphics . The basic types of content in
18034-423: The size of files that have large numbers of small indirect objects and is especially useful for Tagged PDF . Object streams do not support specifying an object's generation number (other than 0). An index table, also called the cross-reference table, is located near the end of the file and gives the byte offset of each indirect object from the start of the file. This design allows for efficient random access to
18176-432: The special encodings Identity-H (for horizontal writing) and Identity-V (for vertical) are used. With such fonts, it is necessary to provide a ToUnicode table if semantic information about the characters is to be preserved. A text document which is scanned to PDF without the text being recognised by optical character recognition (OCR) is an image, with no fonts or text properties. The original imaging model of PDF
18318-577: The specification passed to an ISO Committee of volunteer industry experts. In 2008, Adobe published a Public Patent License to ISO 32000-1 granting royalty-free rights for all patents owned by Adobe necessary to make, use, sell, and distribute PDF-compliant implementations. PDF 1.7, the sixth edition of the PDF specification that became ISO 32000-1, includes some proprietary technologies defined only by Adobe, such as Adobe XML Forms Architecture (XFA) and JavaScript extension for Acrobat, which are referenced by ISO 32000-1 as normative and indispensable for
18460-449: The spoliation of data. The size and scale of this collection are determined by the identification phase. During the processing phase, native files are prepared to be loaded into a document review platform. Often, this phase also involves the extraction of text and metadata from the native files. Various data culling techniques are employed during this phase, such as deduplication and de-NISTing. Sometimes native files will be converted to
18602-411: The term. In these fields, the word metadata is defined as "data about data". While this is the generally accepted definition, various disciplines have adopted their own more specific explanations and uses of the term. Slate reported in 2013 that the United States government's interpretation of "metadata" could be broad, and might include message content such as the subject lines of emails. While
18744-448: The time. When the PDF 1.4 specification was published, the formulas for calculating blend modes were kept secret by Adobe. They have since been published. The concept of a transparency group in PDF specification is independent of existing notions of "group" or "layer" in applications such as Adobe Illustrator. Those groupings reflect logical relationships among objects that are meaningful when editing those objects, but they are not part of
18886-601: The times, origins and destinations of phone calls, electronic messages, instant messages, and other modes of telecommunication, as opposed to message content, is another form of metadata. Bulk collection of this call detail record metadata by intelligence agencies has proven controversial after disclosures by Edward Snowden of the fact that certain Intelligence agencies such as the NSA had been (and perhaps still are) keeping online metadata on millions of internet users for up to
19028-450: The typical structure of a catalog that contains records, each describing a dataset or service. Although not a standard, Microformat (also mentioned in the section metadata on the internet below) is a web-based approach to semantic markup which seeks to re-use existing HTML/XHTML tags to convey metadata. Microformat follows XHTML and HTML standards but is not a standard in itself. One advocate of microformats, Tantek Çelik , characterized
19170-571: The use of native applications. This is useful in the redaction of privileged or sensitive information since redaction tools for images are traditionally more mature and easier to apply on uniform image types by non-technical people. Efforts to redact similarly petrified PDF files by incompetent personnel have removed redacted layers and exposed redacted information, such as social security numbers and other private information. Traditionally, electronic discovery vendors had been contracted to convert native files into TIFF images (for example, 10 images for
19312-609: The use of the Microsoft Deployment Lab which automatically creates a small virtual network running under HyperV Technology-assisted review (TAR)—also known as computer-assisted review or predictive coding—involves the application of supervised machine learning or rule-based approaches to infer the relevance (or responsiveness, privilege, or other categories of interest) of ESI. Technology-assisted review has evolved rapidly since its inception circa 2005. Following research studies indicating its effectiveness, TAR
19454-439: The web page online. A CD may include metadata providing information about the musicians, singers, and songwriters whose work appears on the disc. In many countries, government organizations routinely store metadata about emails, telephone calls, web pages, video traffic, IP connections, and cell phone locations. Metadata means "data about data". Metadata is defined as the data providing information about one or more aspects of
19596-481: Was opaque, similar to PostScript, where each object drawn on the page completely replaced anything previously marked in the same location. In PDF 1.4 the imaging model was extended to allow transparency. When transparency is used, new objects interact with previously marked objects to produce blending effects. The addition of transparency to PDF was done by means of new extensions that were designed to be ignored in products written to PDF 1.3 and earlier specifications. As
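The contrast between the opaque model and the transparent one described above can be illustrated with a small sketch. It makes simplifying assumptions that are not taken from the text: the backdrop is fully opaque, only the Normal blend mode is modelled, and colors are plain RGB tuples in the range 0.0 to 1.0. The function names are hypothetical, and the full compositing rules in the PDF specification are considerably more general.

```python
Color = tuple[float, float, float]


def paint_opaque(backdrop: Color, source: Color) -> Color:
    """Opaque model: the new object completely replaces what was there."""
    return source


def blend_normal(backdrop: Color, source: Color, alpha: float) -> Color:
    """Simplified transparent model over an opaque backdrop.

    Each channel is a weighted mix of the new object and the
    previously marked color, weighted by the source opacity `alpha`.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0.0 and 1.0")
    return tuple(alpha * s + (1.0 - alpha) * b for b, s in zip(backdrop, source))


if __name__ == "__main__":
    white = (1.0, 1.0, 1.0)
    red = (1.0, 0.0, 0.0)
    print(paint_opaque(white, red))       # red replaces white entirely
    print(blend_normal(white, red, 0.5))  # half-transparent red over white
```

With `alpha` set to 1.0 the blended result equals the source color, which matches the behaviour of the original opaque model.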
19738-628: Was described using metadata standards . The first description of "meta data" for computer systems is purportedly noted by MIT's Center for International Studies experts David Griffel and Stuart McIntosh in 1967: "In summary then, we have statements in an object language about subject descriptions of data and token codes for the data. We also have statements in a meta language describing the data relationships and transformations, and ought/is relations between norm and data." Unique metadata standards exist for different disciplines (e.g., museum collections, digital audio files , websites , etc.). Describing
19880-514: Was first published in 2012. With the introduction of PDF version 1.5 (2003) came the concept of Layers. Layers, more formally known as Optional Content Groups (OCGs), refer to sections of content in a PDF document that can be selectively viewed or hidden by document authors or viewers. This capability is useful in CAD drawings, layered artwork, maps, multi-language documents, etc. Basically, it consists of an Optional Content Properties Dictionary added to
20022-429: Was first recognized by a U.S. court in 2012, by an Irish court in 2015, and by a U.K. court in 2016. Recently a U.S. court has declared that it is "black letter law that where the producing party wants to utilize TAR for document review, courts will permit it." In a subsequent matter, the same court stated: "To be clear, the Court believes that for most cases today, TAR is the best and most efficient search tool. That
20164-635: Was standardized as ISO 32000 in 2008. The latest edition, ISO 32000-2:2020, was published in December 2020. PDF files may contain a variety of content besides flat text and graphics including logical structuring elements, interactive elements such as annotations and form-fields, layers, rich media (including video content), three-dimensional objects using U3D or PRC , and various other data formats . The PDF specification also provides for encryption and digital signatures , file attachments, and metadata to enable workflows requiring these features. The development of PDF began in 1991 when John Warnock wrote