Misplaced Pages

Interchange File Format

Article snapshot taken from Wikipedia with creative commons attribution-sharealike license.

Interchange File Format (IFF) is a generic digital container file format originally introduced by Electronic Arts (in cooperation with Commodore) in 1985 to facilitate transfer of data between software produced by different companies.

IFF files do not have any standard filename extension. On many systems that generate IFF files, file extensions are not important because the operating system stores file format metadata separately from the file name. The .iff filename extension is commonly used for the ILBM image file format, which uses the IFF container format. Resource Interchange File Format is a format developed by Microsoft and IBM in 1991 that

a FourCC). This is followed by a 32-bit signed integer (all integers in IFF file structure are big-endian) specifying the size of the following data (the chunk content) in bytes. Because the specification includes explicit lengths for each chunk, a parser can skip over chunks that it either cannot or does not need to process. This structure is closely related to the type–length–value (TLV) representation. There are predefined group chunks, with type IDs FORM, LIST and CAT. A FORM chunk
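The chunk header layout described here (a four-character type ID followed by a big-endian signed 32-bit size) can be decoded with a few lines of Python. This is a minimal sketch, not a complete parser; the function name `read_chunk_header` and the sample bytes are hypothetical and chosen only for illustration.

```python
import struct

def read_chunk_header(data: bytes, offset: int = 0):
    """Return (type_id, size) for the chunk header that starts at offset."""
    # ">4si" means: big-endian, a 4-byte type ID, then a signed 32-bit size.
    type_id, size = struct.unpack_from(">4si", data, offset)
    return type_id.decode("ascii"), size

# Hypothetical sample: a NAME chunk whose content is the 4 bytes "Demo".
sample = b"NAME" + struct.pack(">i", 4) + b"Demo"
type_id, size = read_chunk_header(sample)
content = sample[8:8 + size]
print(type_id, size, content)
```

Because the size is known before the content is read, a parser that does not recognize `type_id` can simply advance past `size` bytes instead of interpreting them.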

207-670: A Uniform Type Identifier by which to identify the file type internally. The use of a filename extension in a command name appears occasionally, usually as a side effect of the command having been implemented as a script, e.g., for the Bourne shell or for Python , and the interpreter name being suffixed to the command name, a practice common on systems that rely on associations between filename extension and interpreter, but sharply deprecated in Unix-like systems, such as Linux , Oracle Solaris , BSD -based systems, and Apple's macOS , where

276-439: A full stop (period), but in some systems it is separated with spaces. Some file systems implement filename extensions as a feature of the file system itself and may limit the length and format of the extension, while others treat filename extensions as part of the filename without special distinction. The Multics file system stores the file name as a single string, not split into base name and extension components, allowing

a pathname to be the character string that must be entered into a file system by a user in order to identify a file. On early personal computers using the CP/M operating system, filenames were always 11 characters. This was referred to as the 8.3 filename, with a maximum of an 8-byte name and a maximum of a 3-byte extension. Utilities and applications allowed users to specify filenames without trailing spaces and include
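The fixed 11-character layout can be illustrated with a short Python sketch. It assumes space padding and upper-casing of the user's input, which is a simplification made here for illustration; the helper name `to_8_3` is hypothetical.

```python
def to_8_3(filename: str) -> str:
    """Pack a user-typed name such as 'readme.txt' into a fixed 11-character field."""
    name, _, ext = filename.partition(".")
    if not 1 <= len(name) <= 8 or len(ext) > 3 or "." in ext:
        raise ValueError(f"{filename!r} does not fit the 8.3 layout")
    # Assumption: both parts are upper-cased and padded with trailing spaces.
    return name.upper().ljust(8) + ext.upper().ljust(3)

packed = to_8_3("readme.txt")
print(repr(packed), len(packed))
```

The separator typed by the user does not appear in the packed field; only the position within the 11 characters distinguishes the name from the extension.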

414-414: A Fortran compiler might use the extension FOR for source input file, OBJ for the object output and LST for the listing. Although there are some common extensions, they are arbitrary and a different application might use REL and RPT . Extensions have been restricted, at least historically on some systems, to a length of 3 characters, but in general can have any length, e.g., html . There

483-570: A consequence of being derived from the UNIX-like NeXTSTEP operating system, in addition to using type and creator codes. In Commodore systems, files can only have four extensions: PRG, SEQ, USR, REL. However, these are used to separate data types used by a program and are irrelevant for identifying their contents. With the advent of graphical user interfaces , the issue of file management and interface behavior arose. Microsoft Windows allowed multiple applications to be associated with

552-401: A dot before the extension. The dot was not actually stored in the directory. Using only 7 bit characters allowed several file attributes to be included in the actual filename by using the high-order-bit; these attributes included Readonly, Archive, and System. Eventually this was too restrictive and the number of characters allowed increased. The attribute bits were moved to a special block of

621-503: A file with its media type as an extended attribute. Some desktop environments , such as KDE Plasma and GNOME , associate a media type with a file by examining both the filename suffix and the contents of the file, in the fashion of the file command, as a heuristic . They choose the application to launch when a file is opened based on that media type, reducing the dependency on filename extensions. macOS uses both filename extensions and media types, as well as file type codes , to select

690-589: A file: additionally, the exact byte representation of the filename on the storage device is needed. This can be solved at the application level, with some tricky normalization calls. The issue of Unicode equivalence is known as "normalized-name collision". A solution is the Non-normalizing Unicode Composition Awareness used in the Subversion and Apache technical communities. This solution does not normalize paths in

759-459: A filename, although most utilities do not handle them well. Filenames may include things like a revision or generation number of the file, a numerical sequence number (widely used by digital cameras through the DCF standard ), a date and time (widely used by smartphone camera software and for screenshots ), or a comment such as the name of a subject or a location or any other text to help identify

828-406: A filesystem to storing components of names, so increasing limits often requires an incompatible change, as well as reserving more space. A particular issue with filesystems that store information in nested directories is that it may be possible to create a file with a complete pathname that exceeds implementation limits, since length checking may apply only to individual parts of the name rather than

897-612: A given extension, and different actions were available for selecting the required application, such as a context menu offering a choice between viewing, editing or printing the file. The assumption was still that any extension represented a single file type; there was an unambiguous mapping between extension and icon. When the Internet age first arrived, those using Windows systems that were still restricted to 8.3 filename formats had to create web pages with names ending in .HTM , while those using Macintosh or UNIX computers could use

966-421: A maximum of eight plus three characters was a filename alias of " long file name.??? " as a way to conform to 8.3 limitations for older programs. This property was used by the move command algorithm that first creates a second filename and then only removes the first filename. Other filesystems, by design, provide only one filename per file, which guarantees that alteration of one filename's file does not alter

a number of formats, such as CMAP, which holds the color palette in ILBM, ANIM and DR2D files (pictures, animations and vector pictures). There are chunks that have a common name but hold different data, such as BODY, which could store an image in an ILBM file and sound in an 8SVX file. And finally, there are chunks unique to their file type. Some programs that create IFF files add chunks to them with their internal data; these same files can later be read by other programs without any disruption (because their parsers can skip uninteresting chunks), which

a period must occur at least once each 8 characters, two consecutive periods could not appear in the name, and must end with a letter or digit. By convention, the letters and numbers before the first period were the account number of the owner or the project it belonged to, but there was no requirement to use this convention. On the McGill University MUSIC/SP system, file names consisted of The Univac VS/9 operating system had file names consisting of In 1985, RFC 959 officially defined

1173-480: A program always has the same extension-less name, with only the interpreter directive and/or magic number changing, and references to the program from other programs remain valid. The default behavior of File Explorer , the file browser provided with Microsoft Windows , is for filename extensions to not be displayed. Malicious users have tried to spread computer viruses and computer worms by using file names formed like LOVE-LETTER-FOR-YOU.TXT.vbs . The hope

a so-called "pad byte" after their regular end. The top-level structure of an IFF file consists of exactly one of the group chunks: FORM, LIST or CAT, where FORM is by far the most common one. Each type of chunk typically has a different internal structure, which could be numerical data, text, or raw data. It is also possible to include other IFF files as if they are chunks (note that they have

1311-547: A three-character extension. The period character is not stored. The High Performance File System (HPFS), used in Microsoft and IBM 's OS/2 stores the file name as a single string, with the "." character as just another character in the file name. The convention of using suffixes continued, even though HPFS supports extended attributes for files, allowing a file's type to be stored in the file as an extended attribute. Microsoft's Windows NT 's native file system, NTFS , and

a variety of ways, filename extensions started to become closely associated with certain products, even specific product versions. For example, early WordStar files used .WS or .WSn, where n was the program's version number. Also, conflicting uses of some filename extensions developed. One example is .rpm, used for both RPM Package Manager packages and RealPlayer Media files. Others are .qif, shared by DESQview fonts, Quicken financial ledgers, and QuickTime pictures; .gba, shared by GrabIt scripts and Game Boy Advance ROM images; .sb, used for SmallBasic and Scratch; and .dts, used for Dynamix Three Space and DTS. In many Internet protocols, such as HTTP and MIME email,

is a great advantage of IFF and similar formats.

Filename extension

A filename extension, file name extension or file extension is a suffix to the name of a computer file (for example, .txt, .docx, .md). The extension indicates a characteristic of the file contents or its intended use. A filename extension is typically delimited from the rest of the filename with

is based on IFF, except the byte order has been changed to little-endian to match the x86 microprocessor architecture. Apple's Audio Interchange File Format (AIFF) is a big-endian audio file format developed from IFF. The TIFF image file format is not related to IFF. An IFF file is built up from chunks. Each chunk begins with what the specification calls a "Type ID" (what the Macintosh called an OSType, and Windows developers might call
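The practical effect of the byte-order difference can be seen in a minimal Python sketch. The value 1000 is an arbitrary example, and decoding the little-endian side as an unsigned integer is an assumption made here for illustration.

```python
import struct

# A chunk size of 1000 as an IFF writer would store it: four bytes, big-endian.
size_field = struct.pack(">i", 1000)

# A reader using the same (big-endian) byte order recovers the value.
big_endian_value = struct.unpack(">i", size_field)[0]

# A reader that assumes little-endian order interprets the same bytes differently.
little_endian_value = struct.unpack("<I", size_field)[0]

print(size_field.hex(), big_endian_value, little_endian_value)
```

A parser written for one byte order therefore cannot read size fields written in the other without swapping the bytes first.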

1587-465: Is changed, the command name extension is changed as well, and the OS provides a consistent API by allowing the same extensionless version of the command to be used in both cases. This method suffers somewhat from the essentially global nature of the association mapping, as well as from developers' incomplete avoidance of extensions when calling programs, and that developers can not force that avoidance. Windows

1656-476: Is just a marker and the content of the file does not have to match it. This can be used to disguise malicious content. When trying to identify a file for security reasons, it is therefore considered dangerous to rely on the extension alone and a proper analysis of the content of the file is preferred. For example, on UNIX-like systems, it is not uncommon to find files with no extensions at all, as commands such as file are meant to be used instead, and will read

is like a record structure, containing a type ID (indicating the record type) followed by nested chunks specifying the record fields. A LIST is a factoring structure containing a series of PROP (property) chunks plus nested group chunks to which those properties apply. A CAT is just a collection of nested chunks with no special semantics. Group chunks can contain other group chunks, depending on
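The record-like layout of a FORM can be sketched in Python as follows. This is an illustration under stated assumptions rather than a full reader: it handles a single FORM only, it does not recurse into nested groups, and the form type `DEMO` and the helper names `parse_form` and `make_chunk` are hypothetical.

```python
import struct

def make_chunk(chunk_id: bytes, content: bytes) -> bytes:
    """Build one chunk: ID, big-endian size, content, and a pad byte if the size is odd."""
    pad = b"\x00" if len(content) % 2 else b""
    return chunk_id + struct.pack(">i", len(content)) + content + pad

def parse_form(data: bytes):
    """Return (form_type, [(chunk_id, content), ...]) for a single FORM chunk."""
    group_id, size = struct.unpack_from(">4si", data, 0)
    if group_id != b"FORM":
        raise ValueError("not a FORM chunk")
    form_type = data[8:12].decode("ascii")
    end = 8 + size  # the FORM size counts the form type and all nested chunks
    offset, fields = 12, []
    while offset + 8 <= end:
        chunk_id, chunk_size = struct.unpack_from(">4si", data, offset)
        start = offset + 8
        fields.append((chunk_id.decode("ascii"), data[start:start + chunk_size]))
        # Advance past the content, plus one pad byte when the size is odd.
        offset = start + chunk_size + (chunk_size & 1)
    return form_type, fields

# Hypothetical FORM of type "DEMO" with two nested chunks acting as record fields.
body = b"DEMO" + make_chunk(b"NAME", b"test") + make_chunk(b"ANNO", b"abc")
sample = make_chunk(b"FORM", body)
print(parse_form(sample))
```

The nested chunks play the role of record fields, and the form type tells the reader which fields to expect.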

1794-606: Is no general encoding standard for filenames. File names have to be exchanged between software environments for network file transfer, file system storage, backup and file synchronization software, configuration management, data compression and archiving, etc. It is thus very important not to lose file name information between applications. This led to wide adoption of Unicode as a standard for encoding file names, although legacy software might not be Unicode-aware. Traditionally, filenames allowed any character in their filenames as long as they were file system safe. Although this permitted

1863-747: Is that different instances of the script or program can use different files. This makes an absolute or relative path composed of a sequence of filenames. Unix-like file systems allow a file to have more than one name; in traditional Unix-style file systems, the names are hard links to the file's inode or equivalent. Windows supports hard links on NTFS file systems, and provides the command fsutil in Windows XP, and mklink in later versions, for creating them. Hard links are different from Windows shortcuts , classic Mac OS/macOS aliases , or symbolic links . The introduction of LFNs with VFAT allowed filename aliases. For example, longfi~1.??? with

1932-649: Is that this will appear as LOVE-LETTER-FOR-YOU.TXT , a harmless text file, without alerting the user to the fact that it is a harmful computer program, in this case, written in VBScript . Default behavior for ReactOS is to display filename extensions in ReactOS Explorer . Later Windows versions (starting with Windows XP Service Pack 2 and Windows Server 2003 ) included customizable lists of filename extensions that should be considered "dangerous" in certain "zones" of operation, such as when downloaded from

2001-400: Is the only remaining widespread employer of this mechanism. On systems with interpreter directives , including virtually all versions of Unix, command name extensions have no special significance, and are by standard practice not used, since the primary method to set interpreters for scripts is to start them with a single line specifying the interpreter to use. In these environments, including

2070-431: The file type . Some other file systems, such as Unix file systems, VFAT , and NTFS , treat a filename as a single string; a convention often used on those file systems is to treat the characters following the last period in the filename, in a filename containing periods, as the extension part of the filename. Multiple output files created by an application may use the same basename and various extensions. For example,

2139-416: The web or received as an e-mail attachment. Modern antivirus software systems also help to defend users against such attempted attacks where possible. Some viruses take advantage of the similarity between the " .com " top-level domain and the ".COM" filename extension by emailing malicious, executable command-file attachments under names superficially similar to URLs ( e.g. , "myparty.yahoo.com"), with

2208-496: The "." to be just another character allowed in file names. It allows for variable-length filenames, permitting more than one dot, and hence multiple suffixes, as well as no dot, and hence no suffix. Some components of Multics, and applications running on it, use suffixes to indicate file types, but not all files are required to have a suffix — for example, executables and ordinary text files usually have no suffixes in their names. File systems for UNIX-like operating systems also store

2277-401: The 8.3 name/extension split in file names from non-NT Windows. The classic Mac OS disposed of filename-based extension metadata entirely; it used, instead, a distinct file type code to identify the file format. Additionally, a creator code was specified to determine which application would be launched when the file's icon was double-clicked . macOS , however, uses filename suffixes as

2346-483: The Internet. For instance, a content author may specify the extension svgz for a compressed Scalable Vector Graphics file, but a web server that does not recognize this extension may not send the proper content type application/svg+xml and its required compression header, leaving web browsers unable to correctly interpret and display the image. BeOS , whose BFS file system supports extended attributes, would tag

2415-460: The Unicode version in use. For instance, UDF is limited to Unicode 2.0; macOS's HFS+ file system applies NFD Unicode normalization and is optionally case-sensitive (case-insensitive by default.) Filename maximum length is not standard and might depend on the code unit size. Although it is a serious issue, in most cases this is a limited one. On Linux, this means the filename is not enough to open

2484-527: The attributes separately from the file name. Around 1995, VFAT , an extension to the MS-DOS FAT filesystem, was introduced in Windows 95 and Windows NT . It allowed mixed-case long filenames (LFNs), using Unicode characters, in addition to classic "8.3" names. Programs and devices may automatically assign names to files such as a numerical counter (for example IMG_0001.JPG ) or a time stamp with

the clock of their camera. Internet-connected devices such as smartphones may synchronize their clock from an NTP server. An absolute reference includes all directory levels. In some systems, a filename reference that does not include the complete directory path defaults to the current working directory. This is a relative reference. One advantage of using a relative reference in program configuration files or scripts

2622-459: The current date and time. The benefit of a time stamped file name is that it facilitates searching files by date, given that file managers usually feature file searching by name. In addition, files from different devices can be merged in one folder without file naming conflicts. Numbered file names, on the other hand, do not require that the device has a correctly set internal clock. For example, some digital camera users might not bother setting

2691-404: The effect that unaware users click on email-embedded links that they think lead to websites but actually download and execute the malicious attachments. There have been instances of malware crafted to exploit vulnerabilities in some Windows applications which could cause a stack-based buffer overflow when opening a file with an overly long, unhandled filename extension. The filename extension

2760-474: The encoding used for a filename as part of the extended file information. This forced costly filename encoding guessing with each file access. A solution was to adopt Unicode as the encoding for filenames. In the classic Mac OS, however, encoding of the filename was stored with the filename attributes. The Unicode standard solves the encoding determination issue. Nonetheless, some limited interoperability issues remain, such as normalization (equivalence), or

2829-480: The entire name. Many Windows applications are limited to a MAX_PATH value of 260, but Windows file names can easily exceed this limit. From Windows 10, version 1607 , MAX_PATH limitations have been removed. Filenames in some file systems, such as FAT and the ODS-1 and ODS-2 levels of Files-11 , are composed of two parts: a base name or stem and an extension or suffix used by some applications to indicate

2898-670: The exact capitalization by which it is named. On a case-insensitive, case-preserving file system, on the other hand, only one of "MyName.Txt", "myname.txt" and "Myname.TXT" can be the name of a file in a given directory at a given time, and a file with one of these names can be referenced by any capitalization of the name. From its original inception, the file systems on Unix and its derivative systems were case-sensitive and case-preserving. However, not all file systems on those systems are case-sensitive; by default, HFS+ and APFS in macOS are case-insensitive but case-preserving, and SMB servers usually provide case-insensitive behavior (even when

2967-454: The extension in a command name unnecessarily exposes an implementation detail which puts all references to the commands from other programs at future risk if the implementation changes. For example, it would be perfectly normal for a shell script to be reimplemented in Python or Ruby, and later in C or C++, all of which would change the name of the command were extensions used. Without extensions,

3036-411: The extension is a separate namespace from the filename. Under Microsoft's DOS and Windows , extensions such as EXE , COM or BAT indicate that a file is a program executable . In OS/360 and successors , the part of the dataset name following the last period, called the low level qualifier, is treated as an extension by some software, e.g., TSO EDIT, but it has no special significance to

3105-633: The file including additional information. The original File Allocation Table (FAT) file system, used by Standalone Disk BASIC-80 , had a 6.3 file name, with a maximum of 6 bytes in the name and a maximum of 3 bytes in the extension. The FAT12 and FAT16 file systems in IBM PC DOS / MS-DOS and Microsoft Windows prior to Windows 95 used the same 8.3 convention as the CP/M file system. The FAT file systems supported 8-bit characters, allowing them to support non-ASCII characters in file names, and stored

3174-443: The file is a tar archive of one or more files, and the .gz indicates that the tar archive file is compressed with gzip ). Programs transforming or creating files may add the appropriate extension to names inferred from input file names (unless explicitly given an output file name), but programs reading files usually ignore the information; it is mostly intended for the human user. It is more common, especially in binary files, for

the file name as a single string, with "." as just another character in the file name. A file with more than one suffix is sometimes said to have more than one extension, although terminology varies in this regard, and most authors define extension in a way that does not allow more than one in the same file name. More than one extension usually represents nested transformations, such as files.tar.gz (the .tar indicates that
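Python's standard `pathlib` module exposes both views of a name with several suffixes, which makes the nested-transformation reading of files.tar.gz easy to see. The snippet below is a small illustration; the variable names are arbitrary.

```python
from pathlib import PurePosixPath

path = PurePosixPath("files.tar.gz")

# All suffixes, outermost transformation last.
all_suffixes = path.suffixes
# Only the final suffix, which matches the single-extension definition.
last_suffix = path.suffix
# Removing one suffix at a time undoes one naming layer at a time.
without_gz = path.stem
without_tar = PurePosixPath(without_gz).stem

print(all_suffixes, last_suffix, without_gz, without_tar)
```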

3312-440: The file to contain internal or external metadata describing its contents. This model generally requires the full filename to be provided in commands, whereas the metadata approach often allows the extension to be omitted. In DOS and 16-bit Windows , file names have a maximum of 8 characters, a period, and an extension of up to three letters. The FAT file system for DOS and Windows stores file names as an 8-character name and

the file's header to determine its content.

Filename

A filename or file name is a name used to uniquely identify a computer file in a file system. Different file systems impose different restrictions on filename lengths. A filename may (depending on the file system) include: The components required to identify a file by utilities and applications vary across operating systems, as does

3450-535: The file. Some people use the term filename when referring to a complete specification of device, subdirectories and filename such as the Windows C:\Program Files\Microsoft Games\Chess\Chess.exe . The filename in this case is Chess.exe . Some utilities have settings to suppress the extension as with MS Windows Explorer. During the 1970s, some mainframe and minicomputers had operating systems where files on

3519-420: The interpreter is normally specified as a header in the script (" shebang "). On association-based systems, the filename extension is generally mapped to a single, system-wide selection of interpreter for that extension (such as ".py" meaning to use Python), and the command itself is runnable from the command line even if the extension is omitted (assuming appropriate setup is done). If the implementation language

3588-454: The introduction of VFAT , store filenames as upper-case regardless of the letter case used to create them. For example, a file created with the name "MyName.Txt" or "myname.txt" would be stored with the filename "MYNAME.TXT" (VFAT preserves the letter case). Any variation of upper and lower case can be used to refer to the same file. These kinds of file systems are called case-insensitive and are not case-preserving . Some filesystems prohibit

3657-480: The later ReFS , also store the file name as a single string; again, the convention of using suffixes to simulate extensions continued, for compatibility with existing versions of Windows. In Windows NT 3.5 , a variant of the FAT file system, called VFAT appeared; it supports longer file names, with the file name being treated as a single string. Windows 95 , with VFAT, introduced support for long file names, and removed

the needs of the application. Group chunks, like their simpler counterparts, contain a length element. Skipping over a group can thus be done with a simple relative seek operation. Chunks must begin on even file offsets, as befits the origins of IFF on the Motorola 68000 processor, which could not address quantities larger than a byte on odd addresses. Thus chunks with odd lengths will be "padded" to an even byte boundary by adding
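The combination of explicit lengths and even alignment makes skipping straightforward. The following Python sketch searches a flat sequence of chunks for one type ID and steps over everything else; the function name `find_chunk` and the sample data are hypothetical, and the sketch assumes the padding is a single extra byte that is not counted in the stored size.

```python
import struct

def find_chunk(data: bytes, wanted: str):
    """Return the content of the first chunk with the given type ID, or None."""
    offset = 0
    while offset + 8 <= len(data):
        type_id, size = struct.unpack_from(">4si", data, offset)
        if size < 0:
            raise ValueError("negative chunk size")
        start = offset + 8
        if type_id == wanted.encode("ascii"):
            return data[start:start + size]
        # Relative skip: content length, rounded up to the next even offset.
        offset = start + size + (size & 1)
    return None

# Hypothetical data: a 3-byte NAME chunk (followed by one pad byte), then ANNO.
sample = (b"NAME" + struct.pack(">i", 3) + b"abc" + b"\x00"
          + b"ANNO" + struct.pack(">i", 4) + b"demo")
print(find_chunk(sample, "ANNO"))
```

Without the `(size & 1)` adjustment the second header would be read one byte too early and the parse would fail.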

3795-415: The new Unicode encoding. Mac OS X 10.3 marked Apple's adoption of Unicode 3.2 character decomposition, superseding the Unicode 2.1 decomposition used previously. This change caused problems for developers writing software for Mac OS X. Within a single directory, filenames must be unique. Since the filename syntax also applies for directories, it is not possible to create a file and directory entries with

3864-518: The operating system itself; the same applies to Unix files in MVS. The filename extension was originally used to determine the file's generic type. The need to condense a file's type into three characters frequently led to abbreviated extensions. Examples include using .GFX for graphics files, .TXT for plain text , and .MUS for music. However, because many different software programs have been made that all handle these data types (and others) in

3933-838: The other filename's file. Some filesystems restrict the length of filenames. In some cases, these lengths apply to the entire file name, as in 44 characters in IBM z/OS . In other cases, the length limits may apply to particular portions of the filename, such as the name of a file in a directory, or a directory name. For example, 9 (e.g., 8-bit FAT in Standalone Disk BASIC ), 11 (e.g. FAT12 , FAT16 , FAT32 in DOS), 14 (e.g. early Unix), 21 ( Human68K ), 31, 30 (e.g. Apple DOS 3.2 and 3.3), 15 (e.g. Apple ProDOS ), 44 (e.g. IBM S/370), or 255 (e.g. early Berkeley Unix) characters or bytes. Length limits often result from assigning fixed space in

4002-467: The recommended .html filename extension. This also became a problem for programmers experimenting with the Java programming language , since it requires the four-letter suffix .java for source code files and the five-letter suffix .class for Java compiler object code output files. Filename extensions may be considered a type of metadata . They are commonly used to imply information about

4071-482: The repository. Paths are only normalized for the purpose of comparisons. Nonetheless, some communities have patented this strategy, forbidding its use by other communities. To limit interoperability issues, some ideas described by Sun are to: Those considerations create a limitation not allowing a switch to a future encoding different from UTF-8. One issue was migration to Unicode. For this purpose, several software companies provided software for migrating filenames to

4140-558: The same character set for composing a filename. Before Unicode became a de facto standard, file systems mostly used a locale-dependent character set. By contrast, some new systems permit a filename to be composed of almost any character of the Unicode repertoire, and even some non-Unicode byte sequences. Limitations may be imposed by the file system, operating system, application, or requirements for interoperability with other systems. Many file system utilities prohibit control characters from appearing in filenames. In Unix-like file systems,

the same name in a single directory. Multiple files in different directories may have the same name. The approach to uniqueness may differ in both case sensitivity and Unicode normalization form, such as NFC or NFD. This means two separate files might be created with the same text filename and a different byte implementation of the filename, such as L"\x00C0.txt" (UTF-16, NFC) (Latin capital A with grave) and L"\x0041\x0300.txt" (UTF-16, NFD) (Latin capital A, grave combining). Some filesystems, such as FAT prior to
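The two spellings in the example can be compared with Python's standard `unicodedata` module. This is a small illustration of the normalization issue, not a description of how any particular file system compares names.

```python
import unicodedata

nfc_name = "\u00C0.txt"    # precomposed: Latin capital A with grave
nfd_name = "A\u0300.txt"   # decomposed: Latin capital A + combining grave accent

# The names render identically but are different code point sequences.
same_raw = nfc_name == nfd_name

# After normalizing both to one form, they compare equal.
same_normalized = (unicodedata.normalize("NFC", nfc_name)
                   == unicodedata.normalize("NFC", nfd_name))

print(same_raw, same_normalized, len(nfc_name), len(nfd_name))
```

A file system that compares raw bytes would treat these as two distinct files, while one that normalizes before comparing would treat them as the same name.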

the same structure: four letters followed by a length), and some formats use this. There are standard chunks that could be present in any IFF file, such as AUTH (containing text with information about the author of the file), ANNO (containing text with annotation, usually the name of the program that created the file), NAME (containing text with the name of the work in the file), VERS (containing the file version), and (c) (containing text with copyright information). There are also chunks that are common among

4347-514: The syntax and format for a valid filename. The characters allowed in filenames depend on the file system. The letters A–Z and digits 0–9 are allowed by most file systems; many file systems support additional characters, such as the letters a–z, special characters, and other printable characters such as accented letters, symbols in non-Roman alphabets, and symbols in non-alphabetic scripts. Some file systems allow even unprintable characters, including Bell , Null , Return and Linefeed , to be part of

4416-505: The system were identified by a user name, or account number. For example, on the TOPS-10 and RSTS/E operating systems from Digital Equipment Corporation , files were identified by On the OS/VS1 , MVS , and OS/390 operating systems from IBM , a file name was up to 44 characters, consisting of upper case letters, digits, and the period. A file name must start with a letter or number,

the type of a bitstream is stated as the media type, or MIME type, of the stream, rather than a filename extension. This is given in a line of text preceding the stream, such as Content-type: text/plain. There is no standard mapping between filename extensions and media types, resulting in possible mismatches in interpretation between authors, web servers, and client software when transferring files over
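Because the mapping is not standardized, each piece of software carries its own table. Python's standard `mimetypes` module is one such table, and the sketch below uses it to build a header line; the helper name `content_type_header` and the fallback to `application/octet-stream` for unknown extensions are choices made here for illustration.

```python
import mimetypes

def content_type_header(filename: str) -> str:
    """Build a Content-Type header line by guessing the media type from the filename."""
    media_type, _encoding = mimetypes.guess_type(filename)
    # Assumption: unknown extensions fall back to a generic binary type.
    return "Content-Type: " + (media_type or "application/octet-stream")

print(content_type_header("notes.txt"))
```

A server with a different or incomplete table could produce a different header for the same file, which is the kind of mismatch described above.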

4554-439: The underlying file system is case-sensitive, e.g. Samba on most Unix-like systems), and SMB client file systems provide case-insensitive behavior. File system case sensitivity is a considerable challenge for software such as Samba and Wine , which must interoperate efficiently with both systems that treat uppercase and lowercase files as different and with systems that treat them the same. File systems have not always provided

4623-405: The use of any encoding, and thus allowed the representation of any local text on any local system, it caused many interoperability issues. A filename could be stored using different byte strings in distinct systems within a single country, such as if one used Japanese Shift JIS encoding and another Japanese EUC encoding. Conversion was not possible as most systems did not expose a description of

the use of lower case letters in filenames altogether. Some file systems store filenames in the form that they were originally created; these are referred to as case-retentive or case-preserving. Such a file system can be case-sensitive or case-insensitive. If case-sensitive, then "MyName.Txt" and "myname.txt" may refer to two different files in the same directory, and each file must be referenced by
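When files move from a case-sensitive file system to a case-insensitive one, names that differ only in case collide. The Python sketch below finds such groups in a list of names. It uses `str.casefold` as an approximation; real file systems apply their own case-mapping rules, so this is an assumption, and the function name `case_collisions` is hypothetical.

```python
from collections import defaultdict

def case_collisions(names):
    """Group names that would be the same file on a case-insensitive file system."""
    groups = defaultdict(list)
    for name in names:
        # Approximation: casefold() stands in for the file system's own case mapping.
        groups[name.casefold()].append(name)
    return [group for group in groups.values() if len(group) > 1]

print(case_collisions(["MyName.Txt", "myname.txt", "other.txt"]))
```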

the way data might be stored in the file. The exact definition, giving the criteria for deciding what part of the file name is its extension, belongs to the rules of the specific file system used; usually the extension is the substring which follows the last occurrence, if any, of the dot character (example: txt is the extension of the filename readme.txt, and html the extension of index.html). On file systems of some mainframe systems such as CMS in VM, VMS, and of PC systems such as CP/M and derivative systems such as MS-DOS,
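The usual "substring after the last dot" rule can be written in a few lines of Python. This is a minimal sketch of that convention only; treating a name that starts with its only dot (such as .profile) as having no extension is an assumption made here, and the function name `extension` is hypothetical.

```python
def extension(filename: str) -> str:
    """Return the substring after the last dot, or '' if there is no extension."""
    base, dot, ext = filename.rpartition(".")
    # Assumption: a leading dot with nothing before it does not start an extension.
    return ext if dot and base else ""

for name in ("readme.txt", "index.html", "README", ".profile"):
    print(name, "->", repr(extension(name)))
```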
