Misplaced Pages

UTF-16

Article snapshot taken from Wikipedia under the Creative Commons Attribution-ShareAlike license.

UTF-16 (16-bit Unicode Transformation Format) is a character encoding method capable of encoding all 1,112,064 valid code points of Unicode. The encoding is variable-length, as code points are encoded with one or two 16-bit code units. UTF-16 arose from an earlier, now obsolete fixed-width 16-bit encoding known as UCS-2 (for 2-byte Universal Character Set), once it became clear that more than 2^16 (65,536) code points were needed, including most emoji and important CJK characters such as those used for personal and place names.


94-535: UTF-16 is used by systems such as the Microsoft Windows API, the Java programming language and JavaScript/ECMAScript. It is also sometimes used for plain text and word-processing data files on Microsoft Windows. It is used by more modern implementations of SMS. UTF-16 is the only encoding (still) allowed on the web that is incompatible with 8-bit ASCII. However it has never gained popularity on

188-1138: A pair of 16-bit codes: one High Surrogate and one Low Surrogate. A single surrogate code point will never be assigned a character. 65,520 of the 65,536 code points in this plane have been allocated to a Unicode block, leaving just 16 code points in a single unallocated range (2FE0..2FEF). As of Unicode 16.0, the BMP comprises the following 164 blocks: Plane 1, the Supplementary Multilingual Plane (SMP), contains historic scripts (except CJK ideographic), and symbols and notation used within certain fields. Scripts include Linear B, Egyptian hieroglyphs, and cuneiform scripts. It also includes English reform orthographies like Shavian and Deseret, and some modern scripts like Osage, Warang Citi, Adlam, Wancho and Toto. Symbols and notations include historic and modern musical notation; mathematical alphanumerics; shorthands; Emoji and other pictographic sets; and game symbols for playing cards, mahjong, and dominoes. As of Unicode 16.0,

282-741: A web browser . The new service is an attempt at capitalizing on the growing trend, fostered during the COVID-19 pandemic , for businesses to adopt a hybrid remote work environment, in which "employees split their time between the office and home". As the service will be accessible through web browsers, Microsoft will be able to bypass the need to publish the service through Google Play or the Apple App Store . Microsoft announced Windows 365 availability to business and enterprise customers on August 2, 2021. Multilingual support has been built into Windows since Windows 3.0. The language for both

376-550: A change which Microsoft promised would provide better performance over its DOS-based predecessors. Windows XP would also introduce a redesigned user interface (including an updated Start menu and a "task-oriented" Windows Explorer ), streamlined multimedia and networking features, Internet Explorer 6 , integration with Microsoft's .NET Passport services, a " compatibility mode " to help provide backwards compatibility with software designed for previous versions of Windows, and Remote Assistance functionality. At retail, Windows XP

470-411: A code unit starts a character can be determined without examining earlier code units (i.e. the type of code unit can be determined by the ranges of values in which it falls). UTF-8 shares these advantages, but many earlier multi-byte encoding schemes (such as Shift JIS and other Asian multi-byte encodings) did not allow unambiguous searching and could only be synchronized by re-parsing from the start of
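The self-synchronizing property described here can be illustrated with a minimal Python sketch. The function names `classify_code_unit` and `next_character_start` are hypothetical helpers introduced for illustration, and the input is assumed to be a list of 16-bit integers. Each code unit is classified purely by the range its value falls in, so a reader that lands in the middle of a stream never needs to look backwards.

```python
def classify_code_unit(unit: int) -> str:
    """Classify a 16-bit code unit using only its own value."""
    if 0xD800 <= unit <= 0xDBFF:
        return "high"   # first half of a surrogate pair
    if 0xDC00 <= unit <= 0xDFFF:
        return "low"    # second half of a surrogate pair
    return "bmp"        # a complete BMP character on its own


def next_character_start(units, index: int) -> int:
    """Return the first position >= index where a character begins.

    A low surrogate can never start a character, so it is skipped.
    In well-formed UTF-16 at most one unit is skipped.
    """
    while index < len(units) and classify_code_unit(units[index]) == "low":
        index += 1
    return index


# 'A', U+1D11E as the pair D834 DD1E, then 'B'
units = [0x0041, 0xD834, 0xDD1E, 0x0042]
resync = next_character_start(units, 2)  # landed on the low surrogate
```

Starting at index 2 (the low surrogate) the sketch moves forward to index 3, the next character boundary, without inspecting index 1.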

564-2067: A fixed size. The 338 blocks defined in Unicode 16.0 cover 27% of the possible code point space, and range in size from a minimum of 16 code points (sixteen blocks) to a maximum of 65,536 code points (Supplementary Private Use Area-A and -B, which constitute the entirety of planes 15 and 16). For future usage, ranges of characters have been tentatively mapped out for most known current and ancient writing systems. [Plane map: code point ranges in steps of 0x1000, listing allocated ranges in planes 0 to 3 (between 0000 and 32FFF), plane 14 (E0000–E0FFF), plane 15, SPUA-A (F0000–FFFFF), and plane 16, SPUA-B (100000–10FFFF).] The first plane, plane 0,

658-502: A hint to perform byte-swapping for the remaining values. If the BOM is missing, RFC 2781 recommends that big-endian (BE) encoding be assumed. In practice, due to Windows using little-endian (LE) order by default, many applications assume little-endian encoding. It is also reliable to detect endianness by looking for null bytes, on the assumption that characters less than U+0100 are very common. If more even bytes (starting at 0) are null, then it
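The detection strategy in this fragment (BOM first, then the null-byte heuristic, then the RFC 2781 default) might look as follows in Python. This is only a sketch: `guess_utf16_byte_order` is a hypothetical name, and it assumes the input is already known to be UTF-16 (it does not, for example, distinguish a UTF-32LE BOM, which starts with the same two bytes as the UTF-16LE one).

```python
def guess_utf16_byte_order(data: bytes) -> str:
    """Guess 'big' or 'little' for a byte string assumed to hold UTF-16."""
    # 1. A BOM (U+FEFF) at the start settles the question.
    if data[:2] == b"\xfe\xff":
        return "big"
    if data[:2] == b"\xff\xfe":
        return "little"

    # 2. Heuristic: characters below U+0100 have a zero high byte.
    #    In big-endian data that zero sits at even offsets (0, 2, 4, ...),
    #    in little-endian data at odd offsets.
    even_nulls = data[0::2].count(0)
    odd_nulls = data[1::2].count(0)
    if even_nulls > odd_nulls:
        return "big"
    if odd_nulls > even_nulls:
        return "little"

    # 3. No evidence either way: fall back to the RFC 2781 recommendation.
    return "big"
```

An application that follows the Windows convention mentioned above would return "little" in the final fallback instead; that choice is a policy decision rather than part of the heuristic.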

752-513: A large number of new features, Windows 7 was intended to be a more focused, incremental upgrade to the Windows line, with the goal of being compatible with applications and hardware with which Windows Vista was already compatible. Windows 7 has multi-touch support, a redesigned Windows shell with an updated taskbar with revealable jump lists that contain shortcuts to files frequently used with specific applications and shortcuts to tasks within

846-467: A larger 31-bit space and an encoding (UCS-4) that would require 4 bytes per character. This was resisted by the Unicode Consortium, both because 4 bytes per character wasted a lot of memory and disk space, and because some manufacturers were already heavily invested in 2-byte-per-character technology. The UTF-16 encoding scheme was developed as a compromise and introduced with version 2.0 of

940-511: A mix of UTF-16, UTF-8, and legacy byte encodings. Although some UTF-8 support has existed since Windows XP, it was improved (in particular the ability to name a file using UTF-8) in Windows 10 insider build 17035 and the May 2019 update. As of May 2019, Microsoft recommends that software use UTF-8, on Windows and Xbox, instead of other 8-bit encodings. It is unclear whether Microsoft is recommending UTF-8 over UTF-16, though it does state "UTF-16 [..]

1034-610: A modular, portable kernel with preemptive multitasking and support for multiple processor architectures. However, following the successful release of Windows 3.0 , the NT development team decided to rework the project to use an extended 32-bit port of the Windows API known as Win32 instead of those of OS/2. Win32 maintained a similar structure to the Windows APIs (allowing existing Windows applications to easily be ported to


1128-400: A new Windows 365 service in the following month. The new service will allow for cross-platform usage , aiming to make the operating system available for both Apple and Android users. It is a separate service and offers several variations including Windows 365 Frontline, Windows 365 Boot, and the Windows 365 app. The subscription service will be accessible through any operating system with

1222-579: A program called "Interface Manager". The name "Windows" comes from the fact that the system was one of the first to use graphical boxes to represent programs; in the industry, at the time, these were called "windows" and the underlying software was called "windowing software." It was announced in November 1983 (after the Apple Lisa, but before the Macintosh) under the name "Windows", but Windows 1.0

1316-510: A redesigned, object oriented user interface, replacing the previous Program Manager with the Start menu , taskbar , and Windows Explorer shell . Windows 95 was a major commercial success for Microsoft; Ina Fried of CNET remarked that "by the time Windows 95 was finally ushered off the market in 2001, it had become a fixture on computer desktops around the world." Microsoft published four OEM Service Releases (OSR) of Windows 95, each of which

1410-465: A special version with integrated peer-to-peer networking features and a version number of 3.11, was released. It was sold along with Windows 3.1. Support for Windows 3.1 ended on December 31, 2001. Windows 3.2, released in 1994, is an updated version of the Chinese version of Windows 3.1. The update was limited to this language version, as it fixed only issues related to the complex writing system of

1504-543: A specific base language and are commonly used for more popular languages such as French or Chinese. These languages cannot be downloaded through the Download Center, but are available as optional updates through the Windows Update service (except Windows 8). The interface language of installed applications is not affected by changes in the Windows interface language. The availability of languages depends on

1598-416: A successor to NT 4.0. The Windows NT name was dropped at this point in order to put a greater focus on the Windows brand. The next major version of Windows NT, Windows XP , was released to manufacturing (RTM) on August 24, 2001, and to the general public on October 25, 2001. The introduction of Windows XP aimed to unify the consumer-oriented Windows 9x series with the architecture introduced by Windows NT,

1692-573: Is "constructed from a pair of Unicode scalar values" (and those values are outside the BMP and require 4 bytes each). UTF-16 in no way assists in "counting characters" or in "measuring the width of a string". UTF-16 is often claimed to be more space-efficient than UTF-8 for East Asian languages, since it uses two bytes for characters that take 3 bytes in UTF-8. Since real text contains many spaces, numbers, punctuation, markup (e.g. for web pages), and control characters, which take only one byte in UTF-8, this

1786-549: Is a product line of proprietary graphical operating systems developed and marketed by Microsoft. It is grouped into families and sub-families that cater to particular sectors of the computing industry – Windows (unqualified) for a consumer or corporate workstation, Windows Server for a server and Windows IoT for an embedded system. Windows is sold as either a consumer retail product or licensed to third-party hardware manufacturers who sell products bundled with Windows. The first version of Windows, Windows 1.0,

1880-731: Is a unique burden that Windows places on code that targets multiple platforms." The IBM i operating system designates CCSID (code page) 13488 for UCS-2 encoding and CCSID 1200 for UTF-16 encoding, though the system treats them both as UTF-16. UTF-16 is used by the Qualcomm BREW operating systems; the .NET environments; and the Qt cross-platform graphical widget toolkit. Symbian OS, used in Nokia S60 handsets and Sony Ericsson UIQ handsets, uses UCS-2. iPhone handsets use UTF-16 for Short Message Service instead of UCS-2 described in

1974-435: Is an edition of Windows that runs on minimalistic computers , like satellite navigation systems and some mobile phones. Windows Embedded Compact is based on its own dedicated kernel, dubbed Windows CE kernel. Microsoft licenses Windows CE to OEMs and device makers. The OEMs and device makers can modify and create their own user interfaces and experiences, while Windows CE provides the technical foundation to do so. Windows CE


2068-450: Is an unofficial name given to the version of Windows that runs on Xbox consoles. From Xbox One onwards it is an implementation with an emphasis on virtualization (using Hyper-V ) as it is three operating systems running at once, consisting of the core operating system , a second implemented for games and a more Windows-like environment for applications. Microsoft updates Xbox One's OS every month, and these updates can be downloaded from

2162-484: Is big-endian. The standard also allows the byte order to be stated explicitly by specifying UTF-16BE or UTF-16LE as the encoding type. When the byte order is specified explicitly this way, a BOM is specifically not supposed to be prepended to the text, and a U+FEFF at the beginning should be handled as a ZWNBSP character. Most applications ignore a BOM in all cases despite this rule. For Internet protocols, IANA has approved "UTF-16", "UTF-16BE", and "UTF-16LE" as
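As one concrete illustration of the distinction between the "UTF-16" label and the explicit "UTF-16BE"/"UTF-16LE" labels, the following sketch uses Python's built-in codecs (`utf-16`, `utf-16-be`, `utf-16-le`). It shows the behaviour of this one implementation only and is not a statement about how other decoders treat a leading U+FEFF.

```python
text = "A"

# Explicit byte order: no BOM is written.
be = text.encode("utf-16-be")   # b'\x00A'
le = text.encode("utf-16-le")   # b'A\x00'

# Plain "utf-16": Python writes a BOM followed by native-order code units.
with_bom = text.encode("utf-16")

# Plain "utf-16" decoding consumes the BOM and uses it to pick the byte order.
decoded = (b"\xfe\xff" + be).decode("utf-16")

# Explicit "utf-16-be" decoding keeps a leading U+FEFF as a character,
# in line with the rule described above.
kept = (b"\xfe\xff" + be).decode("utf-16-be")
```

Here `decoded` is the one-character string "A", while `kept` has two code points, the first being U+FEFF.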

2256-562: Is only true for artificially constructed dense blocks of text. A more serious claim can be made for Devanagari and Bengali, which use multi-letter words in which all the letters take 3 bytes in UTF-8 and only 2 in UTF-16. In addition, the Chinese Unicode encoding standard GB 18030 always produces files the same size or smaller than UTF-16 for all languages, not just for Chinese (it does this by sacrificing self-synchronization). UTF-16
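The size trade-off discussed here can be checked with a small Python sketch. The sample word and the surrounding `<p>` markup are illustrative choices, not taken from the article; the word is written with escapes so that its six Devanagari code points are explicit.

```python
# Six Devanagari code points (U+0928 U+092E U+0938 U+094D U+0924 U+0947).
word = "\u0928\u092e\u0938\u094d\u0924\u0947"

# Bare text: 3 bytes per letter in UTF-8, 2 bytes per letter in UTF-16.
word_utf8 = len(word.encode("utf-8"))
word_utf16 = len(word.encode("utf-16-le"))

# The same word wrapped in seven ASCII markup characters.
wrapped = "<p>" + word + "</p>"
wrapped_utf8 = len(wrapped.encode("utf-8"))
wrapped_utf16 = len(wrapped.encode("utf-16-le"))

print(word_utf8, word_utf16)        # 18 12
print(wrapped_utf8, wrapped_utf16)  # 25 26
```

For the bare word UTF-16 is a third smaller, but even a single pair of ASCII tags is enough to tip this short sample in favour of UTF-8, which matches the argument that the advantage depends on how dense the non-ASCII text is.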

2350-409: Is rarely tested), has led to many bugs in software, including in Windows itself. The solution is usually to adopt UTF-8, as most software has done, including (partially) Windows itself, Java, and JavaScript. In the late 1980s, work began on developing a uniform encoding for a "Universal Character Set" (UCS) that would replace earlier language-specific encodings with one coordinated system. The goal

2444-529: Is said to be available to update from qualified Windows 7 with SP1, Windows 8.1 and Windows Phone 8.1 devices from the Get Windows 10 Application (for Windows 7 , Windows 8.1 ) or Windows Update ( Windows 7 ). In February 2017, Microsoft announced the migration of its Windows source code repository from Perforce to Git . This migration involved 3.5 million separate files in a 300-gigabyte repository. By May 2017, 90 percent of its engineering team

2538-413: Is still used. JavaScript may use UCS-2 or UTF-16. As of ES2015, string methods and regular expression flags have been added to the language that permit handling strings from an encoding-agnostic perspective. UEFI uses UTF-16 to encode strings by default. Swift, Apple's preferred application language, used UTF-16 to store strings until version 5, which switched to UTF-8. Quite a few languages make

2632-573: Is the most popular desktop operating system in the world, with a 70% market share as of March 2023, according to StatCounter; however, when mobile OSes are included, it is not the most used, falling behind Android. As of today, the most recent version of Windows is Windows 11 for consumer PCs and tablets, Windows 11 Enterprise for corporations, and Windows Server 2025 for servers. Still supported are some editions of Windows 10, Windows Server 2016 or later (and exceptionally with paid support down to Windows Server 2008). As of today,

2726-528: Is the Basic Multilingual Plane (BMP), which contains most commonly used characters. The higher planes 1 through 16 are called "supplementary planes". The last code point in Unicode is the last code point in plane 16, U+10FFFF. As of Unicode version 16.0, five of the planes have assigned code points (characters), and seven are named. The limit of 17 planes is due to UTF-16, which can encode 2^20 code points (16 planes) as pairs of words, plus

2820-447: Is the last Windows client operating system to support Itanium. Windows Server line continues to support this platform until Windows Server 2012 ; Windows Server 2008 R2 is the last Windows operating system to support Itanium architecture. On April 25, 2005, Microsoft released Windows XP Professional x64 Edition and Windows Server 2003 x64 editions to support x86-64 (or simply x64), the 64-bit version of x86 architecture. Windows Vista

2914-474: Is used for text in the OS API of all currently supported versions of Microsoft Windows (and including at least all since Windows CE / 2000 / XP / 2003 / Vista / 7) including Windows 10. In Windows XP, no code point above U+FFFF is included in any font delivered with Windows for European languages. Older Windows NT systems (prior to Windows 2000) only support UCS-2. Files and network data tend to be


3008-456: The 3GPP TS 23.038 (GSM) and IS-637 (CDMA) standards. The Joliet file system, used in CD-ROM media, encodes file names using UCS-2BE (up to sixty-four Unicode characters per file name). Python version 2.0 officially only used UCS-2 internally, but the UTF-8 decoder to "Unicode" produced correct UTF-16. There was also the ability to compile Python so that it used UTF-32 internally, this

3102-588: The Basic Multilingual Plane (BMP), contains characters for almost all modern languages, and a large number of symbols. A primary objective for the BMP is to support the unification of prior character sets as well as characters for writing. Most of the assigned code points in the BMP are used to encode Chinese, Japanese, and Korean (CJK) characters. The High Surrogate (U+D800–U+DBFF) and Low Surrogate (U+DC00–U+DFFF) codes are reserved for encoding non-BMP characters in UTF-16 by using

3196-585: The Start screen , which uses large tiles that are more convenient for touch interactions and allow for the display of continually updated information, and a new class of apps which are designed primarily for use on touch-based devices. The new Windows version required a minimum resolution of 1024×768 pixels, effectively making it unfit for netbooks with 800×600-pixel screens. Other changes include increased integration with cloud services and other online platforms (such as social networks and Microsoft's own OneDrive (formerly SkyDrive) and Xbox Live services),

3290-404: The Unicode Consortium, the latter representing mostly manufacturers of computing equipment. The two groups attempted to synchronize their character assignments so that the developing encodings would be mutually compatible. The early 2-byte encoding was originally called "Unicode", but is now called "UCS-2". When it became increasingly clear that 2^16 characters would not suffice, IEEE introduced

3384-974: The Windows Driver Model , support for USB composite devices , support for ACPI , hibernation , and support for multi-monitor configurations. Windows 98 also included integration with Internet Explorer 4 through Active Desktop and other aspects of the Windows Desktop Update (a series of enhancements to the Explorer shell which was also made available for Windows 95). In May 1999, Microsoft released Windows 98 Second Edition , an updated version of Windows 98. Windows 98 SE added Internet Explorer 5.0 and Windows Media Player 6.2 amongst other upgrades. Mainstream support for Windows 98 ended on June 30, 2002, and extended support for Windows 98 ended on July 11, 2006. On September 14, 2000, Microsoft released Windows Me (Millennium Edition),

3478-505: The Windows Image Acquisition framework for retrieving images from scanners and digital cameras), additional system utilities such as System File Protection and System Restore , and updated home networking tools. However, Windows Me was faced with criticism for its speed and instability, along with hardware compatibility issues and its removal of real mode DOS support. PC World considered Windows Me to be one of

3572-578: The Windows Store service for software distribution, and a new variant known as Windows RT for use on devices that utilize the ARM architecture , and a new keyboard shortcut for screenshots . An update to Windows 8, called Windows 8.1 , was released on October 17, 2013, and includes features such as new live tile sizes, deeper OneDrive integration, and many other revisions. Windows 8 and Windows 8.1 have been subject to some criticism, such as

3666-401: The high surrogates (0xD800–0xDBFF), low surrogates (0xDC00–0xDFFF), and valid BMP characters (0x0000–0xD7FF, 0xE000–0xFFFF) are disjoint, it is not possible for a surrogate to match a BMP character, or for two adjacent code units to look like a legal surrogate pair. This simplifies searches a great deal. It also means that UTF-16 is self-synchronizing on 16-bit words: whether

3760-451: The x86 -based personal computer became dominant in the professional world. Windows NT 4.0 and its predecessors supported PowerPC , DEC Alpha and MIPS R4000 (although some of the platforms implement 64-bit computing , the OS treated them as 32-bit). Windows 2000 dropped support for all platforms, except the third generation x86 (known as IA-32 ) or newer in 32-bit mode. The client line of

3854-446: The "Tablet PC" edition (designed for mobile devices meeting its specifications for a tablet computer , with support for stylus pen input and additional pen-enabled applications). Mainstream support for Windows XP ended on April 14, 2009. Extended support ended on April 8, 2014. After Windows 2000, Microsoft also changed its release schedules for server operating systems; the server counterpart of Windows XP, Windows Server 2003 ,


3948-615: The BMP as a single word. UTF-8 was designed with a much larger limit of 2^31 (2,147,483,648) code points (32,768 planes), and would still be able to encode 2^21 (2,097,152) code points (32 planes) even under the current limit of 4 bytes. The 17 planes can accommodate 1,114,112 code points. Of these, 2,048 are surrogates (used to make the pairs in UTF-16), 66 are non-characters, and 137,468 are reserved for private use, leaving 974,530 for public assignment. Planes are further subdivided into Unicode blocks, which, unlike planes, do not have
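The figures quoted in this fragment can be reproduced with a few lines of arithmetic. The Python sketch below takes the non-character and private-use counts directly from the text above; the variable names are illustrative.

```python
PLANES = 17
CODE_POINTS_PER_PLANE = 0x10000            # 65,536

total = PLANES * CODE_POINTS_PER_PLANE     # whole code space
surrogates = 0xDFFF - 0xD800 + 1           # U+D800..U+DFFF
noncharacters = 66                         # figure quoted in the text
private_use = 137_468                      # figure quoted in the text

# Code points left for public assignment.
public = total - surrogates - noncharacters - private_use

# Code points that an encoding form can actually represent.
scalar_values = total - surrogates

# What UTF-16 reaches: the BMP as single units plus 2^20 surrogate pairs.
utf16_reach = 2**16 + 2**20

print(total, public, scalar_values)  # 1114112 974530 1112064
```

The last two values line up with the rest of the article: 1,112,064 is the number of valid code points mentioned in the introduction, and `utf16_reach` equals `total`, which is why the code space stops at 17 planes.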

4042-610: The BMP") are encoded with a single 16-bit code unit equal to the numerical value of the code point, as in the older UCS-2. Code points greater than or equal to 2^16 ("above the BMP") are encoded using two 16-bit code units. These two 16-bit code units are chosen from the UTF-16 surrogate range 0xD800–0xDFFF, which had not previously been assigned to characters. Values in this range are not used as characters, and UTF-16 provides no legal way to code them as individual code points. A UTF-16 stream, therefore, consists of single 16-bit codes outside

4136-736: The C development environment, which included numerous windows samples. Windows 2.0 was released in December 1987, and was more popular than its predecessor. It features several improvements to the user interface and memory management. Windows 2.03 changed the OS from tiled windows to overlapping windows. This change led to Apple Computer filing a suit against Microsoft alleging infringement on Apple's copyrights (eventually settled in court in Microsoft's favor in 1993). Windows 2.0 also introduced more sophisticated keyboard shortcuts and could make use of expanded memory. Windows 2.1

4230-589: The Chinese language. Windows 3.2 was generally sold by computer manufacturers with a ten-disk version of MS-DOS that also had Simplified Chinese characters in basic output and some translated utilities. The next major consumer-oriented release of Windows, Windows 95 , was released on August 24, 1995. While still remaining MS-DOS-based, Windows 95 introduced support for native 32-bit applications , plug and play hardware, preemptive multitasking , long file names of up to 255 characters, and provided increased stability over its predecessors. Windows 95 also introduced

4324-664: The SMP comprises the following 161 blocks: Plane 2, the Supplementary Ideographic Plane (SIP), is used for CJK Ideographs, mostly CJK Unified Ideographs, that were not included in earlier character encoding standards. As of Unicode 16.0, the SIP comprises the following seven blocks: Plane 3 is the Tertiary Ideographic Plane (TIP). CJK Unified Ideographs Extension G was added to

4418-504: The TIP in Unicode 13.0, released in March 2020. It is also tentatively allocated for Oracle Bone script and Small Seal Script. As of Unicode 16.0, the TIP comprises the following two blocks: Planes 4 to 13 (planes 4 to D in hexadecimal): No characters have yet been assigned, or proposed for assignment, to Planes 4 through 13. Plane 14 (E in hexadecimal) is designated as

4512-651: The Unicode standard in July 1996. It is fully specified in RFC 2781, published in 2000 by the IETF. UTF-16 is specified in the latest versions of both the international standard ISO/IEC 10646 and the Unicode Standard. "UCS-2 should now be considered obsolete. It no longer refers to an encoding form in either 10646 or the Unicode Standard." UTF-16 will never be extended to support a larger number of code points or to support

4606-588: The Windows NT family still ran on IA-32 up to Windows 10 (the server line of the Windows NT family still ran on IA-32 up to Windows Server 2008 ). With the introduction of the Intel Itanium architecture ( IA-64 ), Microsoft released new versions of Windows to support it. Itanium versions of Windows XP and Windows Server 2003 were released at the same time as their mainstream x86 counterparts. Windows XP 64-Bit Edition (Version 2003), released in 2003,

4700-430: The Windows interface, and require a certain base language (the language which Windows originally shipped with). This is used for most languages in emerging markets. Full Language Packs, which translate the complete operating system, are only available for specific editions of Windows (Ultimate and Enterprise editions of Windows Vista and 7, and all editions of Windows 8, 8.1 and RT except Single Language). They do not require

4794-620: The Xbox 360's system is backwards compatible with the original Xbox. Up to and including every version before Windows 2000 , Microsoft used an in-house version control system named Source Library Manager (SLM). Shortly after Windows 2000 was released, Microsoft switched to a fork of Perforce named Source Depot. This system was used up until 2017 once the system could not keep up with the size of Windows. Microsoft had begun to integrate Git into Team Foundation Server in 2013, but Windows (and Office) continued to rely on Source Depot. The Windows code


4888-527: The Xbox Live service to the Xbox and subsequently installed, or by using offline recovery images downloaded via a PC. It was originally based on NT 6.2 (Windows 8) kernel, and the latest version runs on an NT 10.0 base. This system is sometimes referred to as "Windows 10 on Xbox One". Xbox One and Xbox Series operating systems also allow limited (due to licensing restrictions and testing resources) backward compatibility with previous generation hardware, and

4982-457: The application developers themselves. Windows 8 and Windows Server 2012 introduce a new Language Control Panel where both the interface and input languages can be simultaneously changed, and language packs, regardless of type, can be downloaded from a central location. The PC Settings app in Windows 8.1 and Windows Server 2012 R2 also includes a counterpart settings page for this. Changing

5076-437: The application, a home networking system called HomeGroup , and performance improvements. Windows 8 , the successor to Windows 7, was released generally on October 26, 2012. A number of significant changes were made on Windows 8, including the introduction of a user interface based around Microsoft's Metro design language with optimizations for touch-based devices such as tablets and all-in-one PCs. These changes include

5170-477: The byte order of code units, UTF-16 allows a byte order mark (BOM), a code point with the value U+FEFF, to precede the first actual coded value. (U+FEFF is the invisible zero-width non-breaking space, or ZWNBSP, character.) If the endian architecture of the decoder matches that of the encoder, the decoder detects the 0xFEFF value, but an opposite-endian decoder interprets the BOM as the noncharacter value U+FFFE reserved for this purpose. This incorrect result provides

5264-426: The code point are distributed among the UTF-16 bytes. Additional bits added by the UTF-16 encoding process are shown in black. UTF-16 and UCS-2 produce a sequence of 16-bit code units. Since most communication and storage protocols are defined for bytes, and each unit thus takes two 8-bit bytes, the order of the bytes may depend on the endianness (byte order) of the computer architecture. To assist in recognizing

5358-557: The code points that were replaced by surrogates, as this would violate the Unicode Stability Policy with respect to general category or surrogate code points. (Any scheme that remains a self-synchronizing code would require allocating at least one Basic Multilingual Plane (BMP) code point to start a sequence. Changing the purpose of a code point is disallowed.) Each Unicode code point is encoded either as one or two 16-bit code units. Code points less than 2^16 ("in

5452-409: The design, mostly because of virtual memory and loadable virtual device drivers ( VxDs ) that allow Windows to share arbitrary devices between multi-tasked DOS applications. Windows 3.0 applications can run in protected mode , which gives them access to several megabytes of memory without the obligation to participate in the software virtual memory scheme. They run inside the same address space, where

5546-549: The encoding part of the string object, and thus store and support a large set of encodings including UTF-16. Most consider UTF-16 and UCS-2 to be different encodings. Examples are the PHP language and MySQL. A method to determine what encoding a system is using internally is to ask for the "length" of a string containing a single non-BMP character. If the length is 2 then UTF-16 is being used. 4 indicates UTF-8. 3 or 6 may indicate CESU-8. 1 may indicate UTF-32, but more likely indicates
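The "length" probe described here can be tried in Python. The sketch below assumes Python 3.3 or later, where `len()` counts code points, so the probe itself returns 1; the other counts are computed by encoding the string explicitly. Python has no CESU-8 codec, so that case is emulated by encoding the two surrogates separately with the `surrogatepass` error handler, which is an approximation introduced here for illustration.

```python
s = "\U0001D11E"  # a single non-BMP character, U+1D11E

code_points = len(s)                              # what Python itself reports
utf16_units = len(s.encode("utf-16-le")) // 2     # 16-bit code units
utf8_bytes = len(s.encode("utf-8"))               # bytes in UTF-8
utf32_units = len(s.encode("utf-32-le")) // 4     # 32-bit code units

# CESU-8 emulation: each UTF-16 surrogate is encoded as its own 3-byte sequence.
cesu8_bytes = len("\ud834\udd1e".encode("utf-8", "surrogatepass"))

print(code_points, utf16_units, utf8_bytes, utf32_units, cesu8_bytes)
# 1 2 4 1 6
```

The printed values correspond to the cases listed above: 2 for UTF-16, 4 for UTF-8, 6 for CESU-8, and 1 for either UTF-32 or a language that counts code points.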

5640-516: The full Windows feature set. The early versions of Windows are often thought of as graphical shells, mostly because they ran on top of MS-DOS and used it for file system services. However, even the earliest Windows versions already assumed many typical operating system functions; notably, having their own executable file format and providing their own device drivers (timer, graphics, printer, mouse, keyboard and sound). Unlike MS-DOS, Windows allowed users to execute multiple graphical applications at

5734-436: The interface language also changes the language of preinstalled Windows Store apps (such as Mail, Maps and News) and certain other Microsoft-developed apps (such as Remote Desktop). The above limitations for language packs are however still in effect, except that full language packs can be installed for any edition except Single Language, which caters to emerging markets. Windows NT included support for several platforms before


5828-515: The keyboard and the interface can be changed through the Region and Language Control Panel. Components for all supported input languages, such as Input Method Editors , are automatically installed during Windows installation (in Windows XP and earlier, files for East Asian languages, such as Chinese, and files for right-to-left scripts, such as Arabic, may need to be installed separately, also from

5922-820: The language decodes the string to code points before measuring the "length". In many languages, quoted strings need a new syntax for quoting non-BMP characters, as the C-style "\uXXXX" syntax explicitly limits itself to 4 hex digits. The following examples illustrate the syntax for the non-BMP character U+1D11E 𝄞 MUSICAL SYMBOL G CLEF: Microsoft Windows release versions: 24H2 (10.0.26100.2454) (November 21, 2024); 23H2 (10.0.22635.4515) (November 22, 2024); 24H2 (10.0.26120.2415) (November 22, 2024). Microsoft Windows
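As one instance of such extended syntax, Python adds an eight-digit `\UXXXXXXXX` escape next to the four-digit `\uXXXX` form. The sketch below assumes Python 3.4 or later (for `surrogatepass` support in the UTF-16 codecs) and is meant only as an illustration of the escape forms.

```python
import unicodedata

# Eight hex digits: reaches any code point, including non-BMP ones.
clef = "\U0001D11E"

# Four-digit escapes can only spell the two surrogate halves. In Python 3
# these stay two separate (lone) surrogate code points; they are not merged.
halves = "\uD834\uDD1E"

# Round-tripping through UTF-16 with "surrogatepass" joins the halves.
joined = halves.encode("utf-16-le", "surrogatepass").decode("utf-16-le")

print(unicodedata.name(clef))  # MUSICAL SYMBOL G CLEF
```

After the round trip `joined` compares equal to `clef`, whereas `halves` has length 2 and does not.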

6016-433: The last DOS-based version of Windows. Windows Me incorporated visual interface enhancements from its Windows NT-based counterpart Windows 2000 , had faster boot times than previous versions (which however, required the removal of the ability to access a real mode DOS environment, removing compatibility with some older programs), expanded multimedia functionality (including Windows Media Player 7, Windows Movie Maker , and

6110-489: The names for these encodings (the names are case insensitive). The aliases UTF_16 or UTF16 may be meaningful in some programming languages or software applications, but they are not standard names in Internet protocols. Similar designations, UCS-2BE and UCS-2LE, are used to show versions of UCS-2. A "character" may use any number of Unicode code points. For instance, an emoji flag character takes 8 bytes, since it
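The 8-byte figure for a flag can be checked with a short Python sketch. It assumes the flag of Japan as the illustrative character (two regional indicator symbols, U+1F1EF and U+1F1F5, both outside the BMP); the variable names are hypothetical and chosen only for this example.

```python
# Illustrative assumption: the flag of Japan, written as the two regional
# indicator symbols U+1F1EF and U+1F1F5, both of which lie outside the BMP.
flag = "\U0001F1EF\U0001F1F5"

# The "utf-16-le" codec is used so that no byte order mark is prepended.
encoded = flag.encode("utf-16-le")

code_points = len(flag)          # Python 3 strings count code points
code_units = len(encoded) // 2   # each UTF-16 code unit occupies 2 bytes
byte_count = len(encoded)

print(code_points, code_units, byte_count)
```

Each of the two code points needs a surrogate pair, so one user-perceived "character" ends up as four 16-bit code units.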

6204-622: The only active top-level family is Windows NT . The first version, Windows NT 3.1 , was intended for server computing and corporate workstations . It grew into a product line of its own and now consists of four sub-families that tend to be released almost simultaneously and share the same kernel. These top-level Windows families are no longer actively developed: The term Windows collectively describes any or all of several generations of Microsoft operating system products. These products are generally categorized as follows: The history of Windows dates back to 1981 when Microsoft started work on

6298-400: The other planes are encoded as two 16-bit code units called a surrogate pair. The first code unit is a high surrogate and the second is a low surrogate. (These are also known as "leading" and "trailing" surrogates, respectively, analogous to the leading and trailing bytes of UTF-8.) Illustrated visually, the distribution of U' between W1 and W2 looks like: Since the ranges for

6392-403: The platform), but also supported the capabilities of the existing NT kernel . Following its approval by Microsoft's staff, development continued on what was now Windows NT, the first 32-bit version of Windows. However, IBM objected to the changes, and ultimately continued OS/2 development on its own. Windows NT was the first Windows operating system based on a hybrid kernel . The hybrid kernel

6486-495: The removal of the Start menu . On September 30, 2014, Microsoft announced Windows 10 as the successor to Windows 8.1. It was released on July 29, 2015, and addresses shortcomings in the user interface first introduced with Windows 8. Changes on PC include the return of the Start Menu, a virtual desktop system, and the ability to run Windows Store apps within windows on the desktop rather than in full-screen mode. Windows 10

6580-754: The said Control Panel). Third-party IMEs may also be installed if a user feels that the provided one is insufficient for their needs. Since Windows 2000, English editions of Windows NT have East Asian IMEs (such as Microsoft Pinyin IME and Microsoft Japanese IME) bundled, but files for East Asian languages may be manually installed through Control Panel. Interface languages for the operating system are free for download, but some languages are limited to certain editions of Windows. Language Interface Packs (LIPs) are redistributable and may be downloaded from Microsoft's Download Center and installed for any edition of Windows (XP or later) – they translate most, but not all, of

6674-420: The same time, through cooperative multitasking. Windows implemented an elaborate, segment-based, software virtual memory scheme, which allowed it to run applications larger than available memory: code segments and resources were swapped in and thrown away when memory became scarce; data segments moved in memory when a given application had relinquished processor control. Windows 3.0, released in 1990, improved

6768-439: The segmented memory provides a degree of protection. Windows 3.0 also featured improvements to the user interface. Microsoft rewrote critical operations from C into assembly . Windows 3.0 was the first version of Windows to achieve broad commercial success, selling 2 million copies in the first six months. Windows 3.1, made generally available on March 1, 1992, featured a facelift. In August 1993, Windows for Workgroups,

6862-628: The standard states that such arrangements should be treated as encoding errors. It is possible to unambiguously encode an unpaired surrogate (a high surrogate code point not followed by a low one, or a low one not preceded by a high one) in the format of UTF-16 by using a code unit equal to the code point. The result is not valid UTF-16, but the majority of UTF-16 encoder and decoder implementations do this when translating between encodings. To encode U+10437 (𐐷) to UTF-16: To decode U+10437 (𐐷) from UTF-16: The following table summarizes this conversion, as well as others. The colors indicate how bits from
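The encode and decode steps for U+10437 named above can be sketched in Python. This is a minimal illustration of the usual surrogate arithmetic (subtract 0x10000, split the 20-bit result into two 10-bit halves, add 0xD800 and 0xDC00), not a reference implementation; the function names are hypothetical.

```python
from typing import Tuple


def encode_surrogate_pair(code_point: int) -> Tuple[int, int]:
    """Split a supplementary code point (U+10000..U+10FFFF) into two code units."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not in a supplementary plane")
    offset = code_point - 0x10000      # 20-bit value, called U' in the text
    high = 0xD800 + (offset >> 10)     # top 10 bits -> high (leading) surrogate
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits -> low (trailing) surrogate
    return high, low


def decode_surrogate_pair(high: int, low: int) -> int:
    """Recombine a high/low surrogate pair into the original code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid high/low surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


high, low = encode_surrogate_pair(0x10437)
print(f"{high:04X} {low:04X}")                    # D801 DC37
print(f"U+{decode_surrogate_pair(high, low):X}")  # U+10437
```

The result agrees with Python's built-in codec: `"\U00010437".encode("utf-16-be")` yields the bytes D8 01 DC 37.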

6956-478: The string. UTF-16 is not self-synchronizing if one byte is lost or if traversal starts at a random byte. Because the most commonly used characters are all in the BMP, handling of surrogate pairs is often not thoroughly tested. This leads to persistent bugs and potential security holes, even in popular and well-reviewed application software (e.g. CVE - 2008-2938 , CVE- 2012-2135 ). The official Unicode standard says that no UTF forms, including UTF-16, can encode

7050-445: The surrogate code points. Since these will never be assigned a character, there should be no reason to encode them. However, Windows allows unpaired surrogates in filenames and other places, which generally means they have to be supported by software in spite of their exclusion from the Unicode standard. UCS-2, UTF-8, and UTF-32 can encode these code points in trivial and obvious ways, and a large amount of software does so, even though
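As a hedged illustration of software that has to tolerate unpaired surrogates, the following Python sketch uses the standard `surrogatepass` error handler, assuming CPython 3.4 or later, where that handler is accepted by the UTF-16 codecs. The variable names are hypothetical, and the sketch does not reflect how any particular Windows API behaves.

```python
# An unpaired high surrogate, as might appear in a filename (illustrative only).
lone = "\ud800"

# The strict codec follows the standard and refuses to encode it.
try:
    lone.encode("utf-16-le")
    strict_accepts = True
except UnicodeEncodeError:
    strict_accepts = False

# With "surrogatepass", the code unit is written out equal to the code point.
passed = lone.encode("utf-16-le", errors="surrogatepass")
round_trip = passed.decode("utf-16-le", errors="surrogatepass")

print(strict_accepts, passed, round_trip == lone)
```

The output bytes are not valid UTF-16, but they round-trip losslessly, which is the property such software usually needs.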

7144-612: The surrogate range, and pairs of 16-bit values that are within the surrogate range. Both UTF-16 and UCS-2 encode code points in this range as single 16-bit code units that are numerically equal to the corresponding code points. These code points in the Basic Multilingual Plane (BMP) are the only code points that can be represented in UCS-2. As of Unicode 9.0, some modern non-Latin Asian, Middle-Eastern, and African scripts fall outside this range, as do most emoji characters. Code points from

7238-459: The web, where it is declared by under 0.003% of public web pages. UTF-8, by comparison, accounts for over 98% of all web pages. The Web Hypertext Application Technology Working Group (WHATWG) considers UTF-8 "the mandatory encoding for all [text]" and states that, for security reasons, browser applications should not use UTF-16. The variable-length character of UTF-16, combined with the fact that most characters are not variable length (so variable length

7332-465: The worst operating systems Microsoft had ever released, and the fourth worst tech product of all time. In November 1988, a new development team within Microsoft (which included former Digital Equipment Corporation developers Dave Cutler and Mark Lucovsky ) began work on a revamped version of IBM and Microsoft's OS/2 operating system known as "NT OS/2". NT OS/2 was intended to be a secure, multi-user operating system with POSIX compatibility and

7426-407: Was announced as the successor to Windows 10 during a livestream. The new operating system was designed to be more user-friendly and understandable. It was released on October 5, 2021. As of May 2022, Windows 11 is a free upgrade for Windows 10 users who meet the system requirements. In July 2021, Microsoft announced it would start selling subscriptions to virtualized Windows desktops as part of

7520-492: Was available in a number of different editions, and has been subject to some criticism, such as reduced performance, longer boot times, the new UAC, and a stricter license agreement. Vista's server counterpart, Windows Server 2008, was released in early 2008. On July 22, 2009, Windows 7 and Windows Server 2008 R2 were released to manufacturing (RTM) and released to the public three months later on October 22, 2009. Unlike its predecessor, Windows Vista, which introduced

7614-466: Was designed as a modified microkernel , influenced by the Mach microkernel developed by Richard Rashid at Carnegie Mellon University, but without meeting all of the criteria of a pure microkernel. The first release of the resulting operating system, Windows NT 3.1 (named to associate it with Windows 3.1 ) was released in July 1993, with versions for desktop workstations and servers . Windows NT 3.5

7708-518: Was divided among 65 different repositories with a kind of virtualization layer to produce a unified view of all of the code. Plane (Unicode)#Basic Multilingual Plane In the Unicode standard, a plane is a contiguous group of 65,536 (2^16) code points. There are 17 planes, identified by the numbers 0 to 16, which correspond to the possible values 00–10 (hexadecimal) of the first two positions in the six-position hexadecimal format (U+hhhhhh). Plane 0

7802-462: Was marketed in two main editions : the "Home" edition was targeted towards consumers, while the "Professional" edition was targeted towards business environments and power users , and included additional security and networking features. Home and Professional were later accompanied by the "Media Center" edition (designed for home theater PCs , with an emphasis on support for DVD playback, TV tuner cards , DVR functionality, and remote controls), and

7896-698: Was not released until November 1985. Windows 1.0 was to compete with Apple's operating system, but achieved little popularity. Windows 1.0 is not a complete operating system; rather, it extends MS-DOS. The shell of Windows 1.0 is a program known as the MS-DOS Executive. Components included Calculator, Calendar, Cardfile, Clipboard Viewer, Clock, Control Panel, Notepad, Paint, Reversi, Terminal and Write. Windows 1.0 does not allow overlapping windows. Instead, all windows are tiled. Only modal dialog boxes may appear over other windows. Microsoft sold Windows development libraries, included with

7990-462: Was released in April 2003. It was followed in December 2005, by Windows Server 2003 R2. After a lengthy development process , Windows Vista was released on November 30, 2006, for volume licensing and January 30, 2007, for consumers. It contained a number of new features , from a redesigned shell and user interface to significant technical changes , with a particular focus on security features . It

8084-577: Was released in September 1994, focusing on performance improvements and support for Novell 's NetWare , and was followed up by Windows NT 3.51 in May 1995, which included additional improvements and support for the PowerPC architecture. Windows NT 4.0 was released in June 1996, introducing the redesigned interface of Windows 95 to the NT series. On February 17, 2000, Microsoft released Windows 2000 ,

8178-675: Was released in two different versions: Windows/286 and Windows/386 . Windows/386 uses the virtual 8086 mode of the Intel 80386 to multitask several DOS programs and the paged memory model to emulate expanded memory using available extended memory . Windows/286, in spite of its name, runs on both Intel 8086 and Intel 80286 processors. It runs in real mode but can make use of the high memory area . In addition to full Windows packages, there were runtime-only versions that shipped with early Windows software from third parties and made it possible to run their Windows software on MS-DOS and without

8272-509: Was released on November 20, 1985, as a graphical operating system shell for MS-DOS in response to the growing interest in graphical user interfaces (GUIs). The name "Windows" is a reference to the windowing system in GUIs. The 1990 release of Windows 3.0 catapulted its market success and led to various other product families, including the now-defunct Windows 9x , Windows Mobile , Windows Phone , and Windows CE/Embedded Compact . Windows

8366-401: Was roughly equivalent to a service pack . The first OSR of Windows 95 was also the first version of Windows to be bundled with Microsoft's web browser , Internet Explorer . Mainstream support for Windows 95 ended on December 31, 2000, and extended support for Windows 95 ended on December 31, 2001. Windows 95 was followed up with the release of Windows 98 on June 25, 1998, which introduced

8460-533: Was sometimes done on Unix. Python 3.3 switched internal storage to use one of ISO-8859-1, UCS-2, or UTF-32 depending on the largest code point in the string. Python 3.12 drops some functionality (for CPython extensions) to make it easier to migrate to UTF-8 for all strings. Java originally used UCS-2, and added UTF-16 supplementary character support in J2SE 5.0. More recently, Java has encouraged dropping support for any 8-bit encoding other than UTF-8, but internally UTF-16

8554-586: Was the first client version of Windows NT to be released simultaneously in IA-32 and x64 editions. As of 2024, x64 is still supported. An edition of Windows 8 known as Windows RT was specifically created for computers with ARM architecture, and while ARM is still used for Windows smartphones with Windows 10, tablets with Windows RT will not be updated. Starting with the Windows 10 Fall Creators Update (version 1709), Windows includes support for ARM-based PCs. Windows CE (officially known as Windows Embedded Compact),

8648-425: Was to include all required characters from most of the world's languages, as well as symbols from technical domains such as science, mathematics, and music. The original idea was to replace the typical 256-character encodings, which required 1 byte per character, with an encoding using 65,536 (2^16) values, which would require 2 bytes (16 bits) per character. Two groups worked on this in parallel, ISO/IEC JTC 1/SC 2 and

8742-556: Was used in the Dreamcast along with Sega's own proprietary OS for the console. Windows CE was the core from which Windows Mobile was derived. Its successor, Windows Phone 7, was based on components from both Windows CE 6.0 R3 and Windows CE 7.0. Windows Phone 8, however, is based on the same NT kernel as Windows 8. Windows Embedded Compact is not to be confused with Windows XP Embedded or Windows NT 4.0 Embedded, modular editions of Windows based on the Windows NT kernel. Xbox OS

8836-420: Was using Git, in about 8500 commits and 1760 Windows builds per day. In June 2021, shortly before Microsoft's announcement of Windows 11, Microsoft updated their lifecycle policy pages for Windows 10, revealing that support for their last release of Windows 10 will end on October 14, 2025. On April 27, 2023, Microsoft announced that version 22H2 would be the last of Windows 10. On June 24, 2021, Windows 11
