The Hong Kong Supplementary Character Set (香港增補字符集; commonly abbreviated to HKSCS) is a set of Chinese characters, 4,702 in total in the initial release, used in Cantonese, as well as when writing the names of some places in Hong Kong (whether in written Cantonese or standard written Chinese sentences).
It evolved from the preceding Government Chinese Character Set (政府通用字庫), or GCCS. GCCS is a set of supplementary Chinese characters coded in the user-defined areas of the Big5 character set. It was originally used within the Hong Kong Government and was later adopted by the public. It became the Hong Kong Supplementary Character Set when the characters in the set were submitted to ISO 10646 for coding. Due to
A Private Use Area (PUA) is a range of code points that, by definition, will not be assigned characters by the standard. Three private use areas are defined: one in the Basic Multilingual Plane (U+E000–U+F8FF), and one each in, and nearly covering, planes 15 and 16 (U+F0000–U+FFFFD, U+100000–U+10FFFD). They are intentionally left undefined so that third parties may assign their own characters without conflicting with Unicode Consortium assignments. Under
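The three ranges listed above can be checked mechanically. The following is a minimal sketch, assuming Python; the names `PUA_RANGES`, `is_private_use`, and `range_size` are illustrative and do not come from the Unicode Standard.

```python
import unicodedata

# Private Use Area ranges as listed in the text (inclusive bounds).
PUA_RANGES = (
    (0xE000, 0xF8FF),      # Basic Multilingual Plane
    (0xF0000, 0xFFFFD),    # plane 15
    (0x100000, 0x10FFFD),  # plane 16
)


def is_private_use(ch: str) -> bool:
    """Return True if the single character ch lies in one of the PUA ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in PUA_RANGES)


def range_size(lo: int, hi: int) -> int:
    """Number of code points in an inclusive range."""
    return hi - lo + 1


if __name__ == "__main__":
    for lo, hi in PUA_RANGES:
        print(f"U+{lo:04X}..U+{hi:04X}: {range_size(lo, hi)} code points")
    # Python's Unicode database reports general category "Co" for these.
    print(unicodedata.category("\ue000"))
```

The last two code points of planes 15 and 16 (for example U+FFFFE and U+FFFFF) are noncharacters, which is why the supplementary ranges stop at `...FFFD` and only "nearly" cover their planes.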
321-548: A | and single-storey | ɑ | forms both representing the Latin letter ⟨ A ⟩ . Variants also emerge for aesthetic reasons, to make handwriting easier, or to correct what the writer perceives to be errors in a character's form. Individual components may be replaced with visually, phonetically, or semantically similar alternatives. The boundary between character structure and style—and thus whether forms represent different characters, or are merely variants of
428-437: A brush onto silk, bamboo, or paper, and being printed using woodblocks and moveable type . Technologies invented since the 19th century allowing for wider use of characters include telegraph codes and typewriters , as well as input methods and text encodings on computers. Chinese characters are accepted as representing one of four independent inventions of writing in human history. In each instance, writing evolved from
A character's meaning. Examples of phono-semantic compounds include 河 (hé; 'river'), 湖 (hú; 'lake'), 流 (liú; 'stream'), 沖 (chōng; 'surge'), and 滑 (huá; 'slippery'). Each of these characters has three short strokes on its left-hand side: 氵, a simplified combining form of ⽔ 'WATER'. This component serves
642-428: A few characters in length at their shortest, to several dozen at their longest. The Shang king would communicate with his ancestors by means of scapulimancy , inquiring about subjects such as the royal family, military success, and the weather. Inscriptions were made in the divination material itself before and after it had been cracked by exposure to heat; they generally include a record of the questions posed, as well as
749-690: A given position in the compound. Components within a character may serve a specific function: phonetic components provide a hint for the character's pronunciation, and semantic components indicate some element of the character's meaning. Components that serve neither function may be classified as pure signs with no particular meaning, other than their presence distinguishing one character from another. A straightforward structural classification scheme may consist of three pure classes of semantographs , phonographs and signs —having only semantic, phonetic, and form components respectively, as well as classes corresponding to each combination of component types. Of
856-558: A language. Specifically, characters represent the smallest units of meaning in a language, which are referred to as morphemes . Morphemes in Chinese—and therefore the characters used to write them—are nearly always a single syllable in length. In some special cases, characters may denote non-morphemic syllables as well; due to this, written Chinese is often characterized as morphosyllabic . Logographs may be contrasted with letters in an alphabet , which generally represent phonemes ,
963-406: A line, and later evolved into their present forms with less potential for graphical ambiguity in context. More complex indicatives include 凸 ('convex'), 凹 ('concave'), and 平 ('flat and level'). Compound ideographs ( 会意 ; 會意 ; huìyì )—also called logical aggregates , associative idea characters , or syssemantographs —combine other characters to convey
1070-542: A mature form, also called 八分 ( bāfēn ). Bamboo slips discovered during the late 20th century point to this maturation being completed during the reign of Emperor Wu of Han ( r. 141–87 BCE ). This process, called libian ( 隶变 ; 隸變 ), involved character forms being mutated and simplified, with many components being consolidated, substituted, or omitted. In turn, the components themselves were regularized to use fewer, straighter, and more well-defined strokes. The resulting clerical forms largely lacked any of
1177-419: A model first popularized in the 2nd-century Shuowen Jiezi dictionary. More recent models have analysed the methods used to create characters, how characters are structured, and how they function in a given writing system. Most characters can be analysed structurally as compounds made of smaller components ( 部件 ; bùjiàn ), which are often independent characters in their own right, adjusted to occupy
A new, synthetic meaning. A canonical example is 明 ('bright'), interpreted as the juxtaposition of the two brightest objects in the sky: ⽇ 'SUN' and ⽉ 'MOON', together expressing their shared quality of brightness. Other examples include 休 ('rest'), composed of pictographs ⼈ 'MAN' and ⽊ 'TREE', and 好 ('good'), composed of ⼥ 'WOMAN' and ⼦ 'CHILD'. Many traditional examples of compound ideographs are now believed to have actually originated as phono-semantic compounds, made obscure by subsequent changes in pronunciation. For example,
A result of unification, and their Big5 code points are reserved for compatibility. Retired "not verifiable" GCCS characters are found in UTC Sources (UTC-00877–UTC-00898), where they are sourced from Adobe-CNS1-1, an Adobe-CNS1 supplement implemented to support GCCS. The HKSCS is encoded in Big5 (Big5-HKSCS, big5hk) and ISO 10646 (Unicode). Starting from HKSCS-2004, all characters previously using
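As a rough illustration of the two encodings mentioned above, the sketch below uses Python's standard `big5hkscs` codec to convert between Big5-HKSCS bytes and Unicode text. The helper names `encodable` and `to_big5hkscs` are hypothetical, and which HKSCS edition the codec implements is an assumption that depends on the Python version in use.

```python
from typing import Optional


def encodable(text: str, encoding: str) -> bool:
    """Return True if every character in text can be encoded with encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


def to_big5hkscs(text: str) -> Optional[bytes]:
    """Encode text as Big5-HKSCS, or return None if a character is not covered."""
    if not encodable(text, "big5hkscs"):
        return None
    return text.encode("big5hkscs")


if __name__ == "__main__":
    data = to_big5hkscs("香港")
    print(data, data.decode("big5hkscs") if data else None)
    # A Cantonese-specific character: coverage may differ between the plain
    # Big5 codec and the HKSCS extension, so the result is only reported here.
    for enc in ("big5", "big5hkscs"):
        print(enc, encodable("嘢", enc))
```

Reporting coverage instead of assuming it keeps the sketch honest about codec differences; a program that must handle arbitrary input would normally store text as Unicode and use Big5-HKSCS only at legacy boundaries.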
1498-719: A semantic component. Pictographs have often been extended from their original meanings to take on additional layers of metaphor and synecdoche , which sometimes displace the character's original sense. When this process results in excessive ambiguity between distinct senses written with the same character, it is usually resolved by new compounds being derived to represent particular senses. Indicatives ( 指事 ; zhǐshì ), also called simple ideographs or self-explanatory characters , are visual representations of abstract concepts that lack any tangible form. Examples include 上 ('up') and 下 ('down')—these characters were originally written as dots placed above and below
1605-571: A semantic function in each example, indicating the character has some meaning related to water. The remainder of each character is its phonetic component: 湖 ( hú ) is pronounced identically to 胡 ( hú ) in Standard Chinese, 河 ( hé ) is pronounced similarly to 可 ( kě ), and 沖 ( chōng ) is pronounced similarly to 中 ( zhōng ). The phonetic components of most compounds may only provide an approximate pronunciation, even before subsequent sound shifts in
1712-429: A stylus in clay moulds used to cast ritual bronzes . Characters have also been incised into stone, or written in ink onto slips of silk, wood, and bamboo. The invention of paper for use as a writing medium occurred during the 1st century CE, and is traditionally credited to Cai Lun ( d. 121 CE ). There are numerous styles, or scripts ( 书 ; 書 ; shū ) in which characters can be written, including
1819-540: A system using two distinct types of ideographs . Ideographs could either be pictographs visually depicting objects or concepts, or fixed signs representing concepts only by shared convention. These systems are classified as proto-writing , because the techniques they used were insufficient to carry the meaning of spoken language by themselves. Various innovations were required for Chinese characters to emerge from proto-writing. Firstly, pictographs became distinct from simple pictures in use and appearance: for example,
1926-545: A time and without indicating any greater context. Qiu concludes, "We simply possess no basis for saying that they were already being used to record language." A historical connection with the symbols used by the late Neolithic Dawenkou culture ( c. 4300 – c. 2600 BCE ) in Shandong has been deemed possible by palaeographers, with Qiu concluding that they "cannot be definitively treated as primitive writing, nevertheless they are symbols which resemble most
2033-832: A transitional form between clerical and regular script which remained in use through the Three Kingdoms period (220–280 CE) and beyond. Cursive script ( 草书 ; 草書 ; cǎoshū ) was in use as early as 24 BCE, synthesizing elements of the vulgar writing that had originated in Qin with flowing cursive brushwork. By the Jin dynasty (266–420), the Han cursive style became known as 章草 ( zhāngcǎo ; 'orderly cursive'), sometimes known in English as 'clerical cursive', 'ancient cursive', or 'draft cursive'. Some attribute this name to
A village near Anyang in Henan, discovered to be the site of Yin, the final Shang capital, which was excavated by a team led by Li Ji (1896–1979) from the Academia Sinica between 1928 and 1937. To date, over 150,000 oracle bone fragments have been found. Oracle bone inscriptions recorded divinations undertaken to communicate with the spirits of royal ancestors. The inscriptions range from
2247-485: A well-developed writing system, which suggests an initial emergence predating the late 2nd millennium BCE. Although written Chinese is first attested in official divinations, it is widely believed that writing was also used for other purposes during the Shang, but that the media used in other contexts—likely bamboo and wooden slips —were less durable than bronzes or oracle bones, and have not been preserved. As early as
A word is used to indicate a different word with a similar pronunciation, depending on context. This allowed for words that lacked a plausible pictographic representation to be written down for the first time. This technique preempted more sophisticated methods of character creation that would further expand the lexicon. The process whereby writing emerged from proto-writing took place over a long period; when
A writing system comprising thousands of distinct characters was non-trivial. Chinese characters are predominantly input on computers using a standard keyboard. Many input methods (IMEs) are phonetic, where typists enter characters according to schemes like pinyin or bopomofo for Mandarin, Jyutping for Cantonese, or Hepburn for Japanese. For example, 香港 ('Hong Kong') could be input as xiang1gang3 using pinyin, or as hoeng1gong2 using Jyutping.

In Unicode,
2568-608: Is " Other, private use (Co) ", and no character names are specified. No representative glyphs are provided, and character semantics are left to private agreement. Private-use characters are assigned Unicode code points whose interpretation is not specified by this standard and whose use may be determined by private agreement among cooperating users. These characters are designated for private use and do not have defined, interpretable semantics except by private agreement. ... No charts are provided for private-use characters, as any such characters are, by their very nature, defined only outside
2675-481: Is given by Xu as 轉注 ( zhuǎnzhù ; 'reversed and refocused'); however, its definition is unclear, and it is generally disregarded by modern scholars. Modern scholars agree that the theory presented in the Shuowen Jiezi is problematic, failing to fully capture the nature of Chinese writing, both in the present, as well as at the time Xu was writing. Traditional Chinese lexicography as embodied in
2782-483: Is in no hurry to encode them. Some, such as unrepresented languages, are likely to end up encoded in the future. Some unusual cases such as fictional languages are outside the usual scope of Unicode but not explicitly ruled out by the principles of Unicode, and may show up eventually (such as the Star Trek and Tolkien writing systems). In other cases, the proposed encoding violates one or more Unicode principles and hence
2889-581: Is maintained by the ConScript Unicode Registry (CSUR). The CSUR, which is not officially endorsed or associated with the Unicode Consortium, provides a mapping for constructed scripts, such as Klingon pIqaD and Ferengi script (Star Trek), Tengwar and Cirth (J.R.R. Tolkien's cursive and runic scripts), Alexander Melville Bell's Visible Speech , and Dr. Seuss' alphabet from On Beyond Zebra . The CSUR previously encoded
2996-399: Is not required, and character forms may be accentuated to evoke a variety of aesthetic effects. Traditional ideals of calligraphic beauty often tie into broader philosophical concepts native to East Asia. For example, aesthetics can be conceptualized using the framework of yin and yang , where the extremes of any number of mutually reinforcing dualities are balanced by the calligrapher—such as
3103-402: Is now written with five strokes instead of eight, and a system of five basic stroke types is commonly employed in analysis—with certain compound strokes treated as sequences of basic strokes made in a single motion. Characters are constructed according to predictable visual patterns. Some components have distinct combining forms when occupying specific positions within a character—for example,
3210-457: Is regularly done with corporate brand names: for example, Coca-Cola 's Chinese name is 可口可乐 ; 可口可樂 ( Kěkǒu Kělè ; 'delicious enjoyable'). Some characters and components are pure signs , whose meaning merely derives from their having a fixed and distinct form. Basic examples of pure signs are found with the numerals beyond four, e.g. 五 ('five') and 八 ('eight'), whose forms do not give visual hints to
3317-575: Is unlikely to ever be officially recognized by Unicode—mostly where users want to directly encode alternate forms, ligatures, or base-character-plus-diacritic combinations (such as the TUNE scheme). Informally, the range U+F000 through U+F8FF is known as the Corporate Use Area. This originates from early versions of Unicode, which defined an "End User Zone" extending from U+E000 upward and a "Corporate Use Zone" extending from U+F8FF downward, with
Hong Kong Supplementary Character Set - Misplaced Pages Continue
3424-450: The I ;Ching . According to one tradition, Chinese characters were invented during the 3rd millennium BCE by Cangjie , a scribe of the legendary Yellow Emperor . Cangjie is said to have invented symbols called 字 ( zì ) due to his frustration with the limitations of knotting, taking inspiration from his study of the tracks of animals, landscapes, and the stars in the sky. On
3531-548: The ⼑ 'KNIFE' component appears as 刂 on the right side of characters, but as ⺈ at the top of characters. The order in which components are drawn within a character is fixed. The order in which the strokes of a component are drawn is also largely fixed, but may vary according to several different standards. This is summed up in practice with a few rules of thumb, including that characters are generally assembled from left to right, then from top to bottom, with "enclosing" components started before, then closed after,
3638-646: The 3500 characters that are frequently used in Standard Chinese, pure semantographs are estimated to be the rarest, accounting for about 5% of the lexicon, followed by pure signs with 18%, and semantic–form and phonetic–form compounds together accounting for 19%. The remaining 58% are phono-semantic compounds. The Chinese palaeographer Qiu Xigui ( b. 1935 ) presents three principles of character function adapted from earlier proposals by Tang Lan [ zh ] (1901–1979) and Chen Mengjia (1911–1966), with semantographs describing all characters whose forms are wholly related to their meaning, regardless of
3745-616: The Macao Supplementary Character Set was developed, building on HKSCS with additional Unicode-mapped characters. The first batch of 121 MSCS characters were submitted for addition to or horizontal extension in Unicode (as appropriate) in 2009, and the first final version of MSCS was established in 2020. The HKSCS has gone through a few iterations. The last edition of HKSCS to encode all of its characters in Big5
3852-757: The Ming (1368–1644) and Qing dynasties (1644–1912) led to considerable standardization in character forms, which prefigured later script reforms during the 20th century. This print orthography , exemplified by the 1716 Kangxi Dictionary , was later dubbed the jiu zixing ('old character shapes'). Printed Chinese characters may use different typefaces , of which there are four broad classes in use: Before computers became ubiquitous, earlier electro-mechanical communications devices like telegraphs and typewriters were originally designed for use with alphabets, often by means of alphabetic text encodings like Morse code and ASCII . Adapting these technologies for use with
3959-583: The MingLiU font, and these characters can be entered via the keyboard. The patch that provides Big5 encoding of HKSCS is unsupported in Windows Vista and later. A utility provided by Microsoft is available to convert HKSCS and Unicode PUA-encoded characters to Unicode 4.1 version. In 2010, Microsoft published a HKSCS-2004 patch for Windows XP and Windows Server 2003. It replaces Windows XP version of MingLiU, PMingLiU, and MingLiU_HKSCS (if HKSCS-2001 patch
4066-657: The Private Use Area section of Unicode are remapped, with many of them reassigned to Extension B Block or Supplementary Ideographic Plane Compatibility Block. However, to preserve compatibility with programs that generated PUA code points, the allocated code points are reserved, and no new characters will be mapped to PUA . Since around 2005, many Hong Kong and Macau website had switched encoding from Big5-HKSCS to Unicode, included HKGolden . Similarly to Hong Kong's situation, there are also characters that are needed by Macao but included in neither Big5 nor HKSCS, hence,
4173-467: The Shuowen Jiezi describes 信 ('trust') as an ideographic compound of ⼈ 'MAN' and ⾔ 'SPEECH' , but modern analyses instead identify it as a phono-semantic compound—though with disagreement as to which component is phonetic. Peter A. Boodberg and William G. Boltz go so far as to deny that any compound ideographs were devised in antiquity, maintaining that secondary readings that are now lost are responsible for
4280-462: The Shuowen Jiezi has suggested implausible etymologies for some characters. Moreover, several categories are considered to be ill-defined: for example, it is unclear whether characters like 大 ('large') should be classified as pictographs or indicatives. However, awareness of the 'six writings' model has remained a common component of character literacy, and often serves as a tool for students memorizing characters. The broadest trend in
4387-582: The Shuowen Jiezi . For nearly two millennia, this scheme was the primary framework for character analysis used throughout the Sinosphere. Xu based most of his analysis on examples of Qin seal script that were written down several centuries before his time—these were usually the oldest specimens available to him, though he stated he was aware of the existence of even older forms. The first five categories are pictographs, indicatives, compound ideographs, phono-semantic compounds, and loangraphs. The sixth category
4494-517: The Sinosphere . In Japanese , Korean , and Vietnamese , Chinese characters are known as kanji , hanja , and chữ Hán respectively. Writing traditions also emerged for some of the other languages of China , like the sawndip script used to write the Zhuang languages of Guangxi . Each of these written vernaculars used existing characters to write the language's native vocabulary, as well as
4601-568: The Sui dynasty (581–618) required test takers to write in Literary Chinese using regular script, which contributed to the prevalence of both throughout later Chinese history. Each character of a text is written within a uniform square allotted for it. As part of the evolution from seal script into clerical script, character components became regularized as discrete series of strokes ( 笔画 ; 筆畫 ; bǐhuà ). Strokes can be considered both
4708-470: The loanwords it borrowed from Chinese . In addition, each invented characters for local use. In written Korean and Vietnamese, Chinese characters have largely been replaced with alphabets, leaving Japanese as the only major non-Chinese language still written using them. At the most basic level, characters are composed of strokes that are written in a fixed order. Methods of writing characters have historically included being carved into stone, being inked with
4815-526: The 13th century BCE in what is now Anyang , Henan, as part of divinations conducted by the Shang dynasty royal house. Character forms were originally highly pictographic in style, but evolved over time as writing spread across China. Numerous attempts have been made to reform the script, including the promotion of small seal script by the Qin dynasty (221–206 BCE). Clerical script , which had matured by
4922-505: The Buddhist terminology introduced to China in antiquity, as well as contemporary non-Chinese words and names. For example, each character in the name 加拿大 ( Jiānádà ; 'Canada') is often used as a loangraph for its respective syllable. However, the barrier between a character's pronunciation and meaning is never total: when transcribing into Chinese, loangraphs are often chosen deliberately as to create certain connotations. This
5029-522: The Chinese languages and others from regions historically influenced by Chinese culture . Chinese characters have a documented history spanning over three millennia, representing one of the four independent inventions of writing accepted by scholars; of these, they comprise the only writing system continuously used since its invention. Over time, the function, style, and means of writing characters have evolved greatly. Unlike letters in alphabets that reflect
5136-605: The HKSCS extensions. The table supports all code points in HKSCS-2001, except for the compatibility code points specified by the standard. In addition, the MingLiU font is altered using Microsoft's patch. This patch is known to create conflicts in applications such as Microsoft Office , or any application using fonts supporting simplified Chinese characters (e.g.: SimSun ). If the target environment contains custom font mapped to
5243-657: The HKSCS-2001 Big5 code page (with CPGID 1374 as CCSID 5470 as the double byte component), CCSID 9567 to the HKSCS-2004 code page (with CPGID 1374 as CCSID 9566 as the double byte component), and CCSID 13663 to the HKSCS-2008 code page (with CPGID 1374 as CCSID 13662 as the double byte component), while CCSID 1375 (with CPGID 1374 as CCSID 1374 as its double byte component) is assigned to a growing HKSCS code page, currently equivalent to CCSID 13663. HKSCS support
5350-401: The Latin alphabet. The express purpose of MUFI is to experimentally determine which characters are necessary to represent these texts, and to have those characters officially encoded in Unicode. As of Unicode version 5.1, 152 MUFI characters have been incorporated into the official Unicode encoding. Some agreed-upon PUA character collections exist in part or whole because the Unicode Consortium
5457-455: The PUA. Some of these private use agreements are published, so other PUA implementers can aim for unused or less-used code points to prevent overlaps. Several characters and scripts previously encoded in private use agreements have actually been fully encoded in Unicode, necessitating mappings from the PUA to other Unicode code points. One of the more well-known and broadly implemented PUA agreements
the Qin small seal script was standardized for use throughout the entire country under the direction of Chancellor Li Si (c. 280 – 208 BCE). It was traditionally believed that Qin scribes only used small seal script, and the later clerical script was a sudden invention during the early Han. However, more than one script was used by Qin scribes: a rectilinear vulgar style had also been in use in Qin for centuries prior to
5671-410: The Shang royal house. Contemporaneous inscriptions in a related but distinct style were also made on ritual bronze vessels. This oracle bone script ( 甲骨文 ; jiǎgǔwén ) was first documented in 1899, after specimens were discovered being sold as "dragon bones" for medicinal purposes, with the symbols carved into them identified as early character forms. By 1928, the source of the bones had been traced to
5778-586: The Shang, the oracle bone script existed as a simplified form alongside another that was used in bamboo books, in addition to elaborate pictorial forms often used in clan emblems. These other forms have been preserved in what is called bronze script ( 金文 ; jīnwén ), where inscriptions were made using a stylus in a clay mould, which was then used to cast ritual bronzes . These differences in technique generally resulted in character forms that were less angular in appearance than their oracle bone script counterparts. Study of these bronze inscriptions has revealed that
5885-620: The Sinosphere during the 20th century as a result of Western influence. Many publications outside mainland China continue to use the traditional vertical writing direction. Western influence also resulted in the generalized use of punctuation being widely adopted in print during the 19th and 20th centuries. Prior to this, the context of a passage was considered adequate to guide readers; this was enabled by characters being easier than alphabets to read when written scriptio continua , due to their more discretized shapes. The earliest attested Chinese characters were carved into bone, or marked using
5992-580: The Unicode Stability Policy, the Private Use Areas will remain allocated for that purpose in all future Unicode versions. Assignments to Private Use Area characters need not be "private" in the sense of strictly internal to an organisation; a number of assignment schemes have been published by several organisations. Such publication may include a font that supports the definition (showing the glyphs), and software making use of
6099-462: The above-mentioned patch, Mozilla uses its own code page table. However, the fix for bug 343129 does not support characters mapped to code points above Basic Multilingual Plane. QT 3.x-based applications (e.g.: KDE ) only support characters mapped to code points FFFF or lower. In QT4, characters outside BMP are supported via surrogates. Big5-HKSCS Text Codec supports HKSCS-1999 back in Qt-2.3.x, but it
6206-669: The ancient pictographic script discovered thus far in China... They undoubtedly can be viewed as the forerunners of primitive writing." The oldest attested Chinese writing comprises a body of inscriptions produced during the Late Shang period ( c. 1250 – 1050 BCE), with the very earliest examples from the reign of Wu Ding dated between 1250 and 1200 BCE. Many of these inscriptions were made on oracle bones —usually either ox scapulae or turtle plastrons—and recorded official divinations carried out by
6313-436: The answers as interpreted in the cracks. A minority of bones feature characters that were inked with a brush before their strokes were incised; the evidence of this also shows that the conventional stroke orders used by later calligraphers had already been established for many characters by this point. Oracle bone script is the direct ancestor of later forms of written Chinese. The oldest known inscriptions already represent
6420-419: The apparent absence of phonetic indicators, but their arguments have been rejected by other scholars. Phono-semantic compounds ( 形声 ; 形聲 ; xíngshēng ) are composed of at least one semantic component and one phonetic component. They may be formed by one of several methods, often by adding a phonetic component to disambiguate a loangraph, or by adding a semantic component to represent a specific extension of
6527-549: The basic unit of handwriting, as well as the writing system's basic unit of graphemic organization. In clerical and regular script, individual strokes traditionally belong to one of eight categories according to their technique and graphemic function. In what is known as the Eight Principles of Yong , calligraphers practice their technique using the character 永 ( yǒng ; 'eternity'), which can be written with one stroke of each type. In ordinary writing, 永
the boundary between the two left undefined. The concept of reserving specific code points for Private Use is based on similar earlier usage in other character sets. In particular, many otherwise obsolete characters in East Asian scripts continue to be used in specific names or other situations, and so some character sets for those scripts made allowance for private-use characters (such as the user-defined planes of CNS 11643, or gaiji in certain Japanese encodings). The Unicode standard references these uses under
6741-555: The box when Traditional Chinese Language support is selected during installation. They can also be installed manually at a later time. Mac OS X 10.0–10.2 supports HKSCS-1999. 10.3–10.4 supports HKSCS-2001. Some of the letters added to HKSCS-2004 is supported via Unicode PUA in OS X 10.4. Starting with OS X 10.5, all the HKSCS-2004 characters are supported via standard Unicode 4.1 code points. Mozilla 1.5 and above supports HKSCS, with HKSCS-2004 support added into Gecko 1.8.1 code base. Unlike
6848-492: The calligrapher Zhong Yao ( c. 151 – 230), who was living in the state of Cao Wei (220–266); he is often called the "father of regular script". The earliest surviving writing in regular script comprises copies of Zhong Yao's work, including at least one copy by Wang Xizhi. Characteristics of regular script include the 'pause' ( 頓 ; dùn ) technique used to end horizontal strokes, as well as heavy tails on diagonal strokes made going down and to
6955-639: The character as 明 . However, the increased usage of 朙 was followed by the proliferation of a third variant: 眀 , with ⽬ 'EYE' on the left—likely derived as a contraction of 朙 . Ultimately, 明 became the character's standard form. From the earliest inscriptions until the 20th century, texts were generally laid out vertically—with characters written from top to bottom in columns, arranged from right to left. Word boundaries are generally not indicated with spaces . A horizontal writing direction—with characters written from left to right in rows, arranged from top to bottom—only became predominant in
7062-562: The code points affected by Microsoft's patch, the custom fonts can undo Microsoft's patch. Furthermore, the patch breaks EUDC Editor supplied with the affected versions of Windows. Starting with Windows Vista , HKSCS-2004 characters are only supported as Unicode 4.1 or later; however, HKSCS-2001 and HKSCS-1999 characters are supported as Big5-HKSCS and Unicode, but Big5-HKSCS is available only if set "Language for non-Unicode programs" to "Hong Kong" or "Macau". All characters are assigned standard, non- PUA codepoints. The characters are displayed with
7169-403: The components they enclose. For example, 永 is drawn in the following order: Over a character's history, variant character forms ( 异体字 ; 異體字 ; yìtǐzì ) emerge via several processes. Variant forms have distinct structures, but represent the same morpheme; as such, they can be considered instances of the same underlying character. This is comparable to visually distinct double-storey |
7276-584: The context of this standard. In the Basic Multilingual Plane (plane 0), the block titled Private Use Area has 6400 code points. Planes 15 and 16 are almost entirely assigned to two further Private Use Areas, Supplementary Private Use Area-A and Supplementary Private Use Area-B respectively. In UTF-16 a subset of the high surrogates (U+DB80..U+DBFF) is used for these and only these planes, and are called High Private Use Surrogates . There are three PUA blocks in Unicode. In Unicode 1.0.0,
7383-616: The day that these first characters were created, grain rained down from the sky; that night, the people heard the wailing of ghosts and demons, lamenting that humans could no longer be cheated. Collections of graphs and pictures have been discovered at the sites of several Neolithic settlements throughout the Yellow River valley, including Jiahu ( c. 6500 BCE ), Dadiwan and Damaidi (6th millennium BCE), and Banpo (5th millennium BCE). Symbols at each site were inscribed or drawn onto artefacts, appearing one at
7490-417: The distinct process of semantic extension, where a word acquires additional senses, which often remain written with the same character. As both processes often result in a single character form being used to write several distinct meanings, loangraphs are often misidentified as being the result of semantic extension, and vice versa. Loangraphs are also used to write words borrowed from other languages, such as
7597-519: The distinct units of sound used by speakers of a language. Despite their origins in picture-writing, Chinese characters are no longer ideographs capable of representing ideas directly; their comprehension relies on the reader's knowledge of the particular language being written. The areas where Chinese characters were historically used—sometimes collectively termed the Sinosphere —have a long tradition of lexicography attempting to explain and refine their use; for most of history, analysis revolved around
7704-439: The duality between strokes made quickly or slowly, between applying ink heavily or lightly, between characters written with symmetrical or asymmetrical forms, and between characters representing concrete or abstract concepts. Woodblock printing was invented in China between the 6th and 9th centuries, followed by the invention of moveable type by Bi Sheng (972–1051) during the 11th century. The increasing use of print during
7811-826: The early Han dynasty (202 BCE – 220 CE), abstracted the forms of characters—obscuring their pictographic origins in favour of making them easier to write. Following the Han, regular script emerged as the result of cursive influence on clerical script, and has been the primary style used for characters since. Informed by a long tradition of lexicography , states using Chinese characters have standardized their forms: broadly, simplified characters are used to write Chinese in mainland China , Singapore , and Malaysia , while traditional characters are used in Taiwan , Hong Kong , and Macau . After being introduced in order to write Literary Chinese , characters were often adapted to write local languages spoken throughout
7918-504: The evolution of Chinese characters over their history has been simplification, both in graphical shape ( 字形 ; zìxíng ), the "external appearances of individual graphs", and in graphical form ( 字体 ; 字體 ; zìtǐ ), "overall changes in the distinguishing features of graphic[al] shape and calligraphic style, [...] in most cases refer[ring] to rather obvious and rather substantial changes". The traditional notion of an orderly procession of script styles, each suddenly appearing and displacing
8025-406: The extent that the original objects represented are no longer obvious. This proto-writing system was limited to representing a relatively narrow range of ideas with a comparatively small library of symbols. This compelled innovations that allowed for symbols to directly encode spoken language. In each historical case, this was accomplished by some form of the rebus technique, where the symbol for
8132-557: The fact that the style was considered more orderly than a later form referred to as 今草 ( jīncǎo ; 'modern cursive'), which had first emerged during the Jin and was influenced by semi-cursive and regular script. This later form was exemplified by the work of figures like Wang Xizhi (303–361), who is often regarded as the most important calligrapher in Chinese history. An early form of semi-cursive script ( 行书 ; 行書 ; xíngshū ; 'running script') can be identified during
8239-460: The forms of pictographs have been simplified in order to make them easier to write. As a result, modern readers generally cannot deduce what many pictographs were originally meant to resemble; without knowing the context of their origin in picture-writing, they may be interpreted instead as pure signs. However, if a pictograph's use in compounds still reflects its original meaning, as with 日 in 晴 ('clear sky'), it can still be analysed as
8346-598: The historical forms like seal script and clerical script. Most styles used throughout the Sinosphere originated within China, though they may display regional variation. Styles that have been created outside of China tend to remain localized in their use: these include the Japanese edomoji and Vietnamese lệnh thư scripts. Calligraphy was traditionally one of the four arts to be mastered by Chinese scholars, considered to be an artful means of expressing thoughts and teachings. Chinese calligraphy typically makes use of an ink brush to write characters. Strict regularity
8453-590: The inherent differences between standard written Chinese and written Cantonese , the Government of Hong Kong recognised the need for a standardised set of proprietary characters that would allow for the streamlining of electronic communication; at the time, the Big5 Chinese encoding scheme did not contain a vast majority of these characters (some were erroneously cross-listed with similar characters). The Government Chinese Character Set ( 政府通用字庫 ) or GCCS
8560-401: The initial development of Chinese writing, and has remained common throughout its subsequent history. Some loangraphs ( 假借 ; jiǎjiè ; 'borrowing') are introduced to represent words previously lacking another written form—this is often the case with abstract grammatical particles such as 之 and 其 . The process of characters being borrowed as loangraphs should not be conflated with
8667-488: The late Han, with its development stemming from a cursive form of neo-clerical script. Liu Desheng ( 劉德升 ; c. 147 – 188 CE) is traditionally recognized as the inventor of the semi-cursive style, though accreditations of this kind often indicate a given style's early masters, rather than its earliest practitioners. Later analysis has suggested popular origins for semi-cursive, as opposed to it being an invention of Liu. It can be characterized partly as
8774-507: The mainstream script underwent slow, gradual evolution during the late Shang, which continued during the Zhou dynasty ( c. 1046 – 256 BCE) until assuming the form now known as small seal script ( 小篆 ; xiǎozhuàn ) within the Zhou state of Qin . Other scripts in use during the late Zhou include the bird-worm seal script ( 鸟虫书 ; 鳥蟲書 ; niǎochóngshū ), as well as
8881-548: The method by which the meaning was originally depicted, phonographs that include a phonetic component, and loangraphs encompassing existing characters that have been borrowed to write other words. Qiu also acknowledges the existence of character classes that fall outside of these principles, such as pure signs. Most of the oldest characters are pictographs ( 象形 ; xiàngxíng ), representational pictures of physical objects. Examples include 日 ('Sun'), 月 ('Moon'), and 木 ('tree'). Over time,
8988-723: The name "End User Character Definition" (EUCD). Additionally, the C1 control block contains two codes intended for private use "control functions" by ECMA-48 : 0x91 private use one (PU1) and 0x92 private use two (PU2). Unicode includes these at U+0091 <control-0091> and U+0092 <control-0092> but defines them as control characters (category Cc ), not private-use characters (category Co ). Encodings which do not have private use areas but have more or less unused areas, such as ISO/IEC 8859 and Shift JIS , have seen uncontrolled variants of these encodings evolve. For Unicode, software companies can use
9095-518: The one previous, has been disproven by later scholarship and archaeological work. Instead, scripts evolved gradually, with several coexisting in a given area. Several of the Chinese classics indicate that knotted cords were used to keep records prior to the invention of writing. Works that reference the practice include chapter 80 of the Tao Te Ching and the " Xici II" commentary to
9202-405: The phonetic series of characters using 余 ( yú ; jyu4 ), a literary first-person pronoun. The Old Chinese pronunciations of these characters were similar, but the phonetic component no longer serves as a useful hint for their pronunciation due to subsequent sound shifts. The phenomenon of existing characters being adapted to write other words with similar pronunciations was necessary in
9309-428: The pictograph 大 , meaning 'large', was originally a picture of a large man, but one would need to be aware of its specific meaning in order to interpret the sequence 大鹿 as signifying 'large deer', rather than being a picture of a large man and a deer next to one another. Due to this process of abstraction, as well as to make characters easier to write, pictographs gradually became more simplified and regularized—often to
9416-495: The pictorial qualities that remained in seal script. Around the midpoint of the Eastern Han (25–220 CE), a simplified and easier form of clerical script appeared, which Qiu terms 'neo-clerical' ( 新隶体 ; 新隸體 ; xīnlìtǐ ). By the end of the Han, this had become the dominant script used by scribes, though clerical script remained in use for formal works, such as engraved stelae . Qiu describes neo-clerical as
9523-407: The plain Big5 label). However, only its decoder uses all HKSCS extensions, while its encoder explicitly excludes those with lead bytes below 0xA1 (thus excluding most of the HKSCS extensions but including, for example, those inherited from Big5 ETEN ). Newer browsers follow this standard, including Firefox . Chinese character Chinese characters are logographs used to write
9630-494: The private use area extended from U+E800 to U+FDFF (i.e. did not include U+E000..E7FF, but additionally included the U+F900..FDFF range now occupied by CJK Compatibility Ideographs , Alphabetic Presentation Forms and Arabic Presentation Forms-A ). This was changed to U+E000..F8FF in Unicode 1.0.1, and remained so in Unicode 1.1. Contrary to misconception, the range U+D800..DFFF (reserved for UTF-16 surrogates since Unicode 2.0)
9737-439: The private-use characters (e.g. a graphics character for a "print document" function). By definition, multiple private parties may assign different characters to the same code point, with the consequence that a user may see one private character from an installed font where a different one was intended. Under the Unicode definition, code points in the Private Use Areas are not noncharacters, reserved, or unassigned. Their category
9844-529: The purely pictorial use of symbols disappeared, leaving only those representing spoken words, the process was complete. Chinese characters have been used in several different writing systems throughout history. The concept of a writing system includes both the written symbols themselves, called graphemes —which may include characters, numerals, or punctuation—as well as the rules by which they are used to record language. Chinese characters are logographs , which are graphemes that represent units of meaning in
9951-525: The quantities they represent. The Shuowen Jiezi is a character dictionary authored c. 100 CE by the scholar Xu Shen ( c. 58 – c. 148 CE ). In its postface, Xu analyses what he sees as all the methods by which characters are created. Later authors iterated upon Xu's analysis, developing a categorization scheme known as the 'six writings' ( 六书 ; 六書 ; liùshū ), which identifies every character with one of six categories that had previously been mentioned in
10058-423: The regional forms used in non-Qin states. Examples of these styles were preserved as variants in the Shuowen Jiezi . Historically, Zhou forms were collectively referred to as large seal script ( 大篆 ; dàzhuàn ), a term which has fallen out of favour due to its lack of precision. Following Qin's conquest of the other Chinese states that culminated in the founding of the imperial Qin dynasty in 221 BCE,
10165-472: The result of clerical forms being written more quickly, without formal rules of technique or composition: what would be discrete strokes in clerical script frequently flow together instead. The semi-cursive style is commonly adopted in contemporary handwriting. Regular script ( 楷书 ; 楷書 ; kǎishū ), based on clerical and semi-cursive forms, is the predominant form in which characters are written and printed. Its innovations have traditionally been credited to
10272-545: The right. It developed further during the Eastern Jin (317–420) in the hands of Wang Xizhi and his son Wang Xianzhi (344–386). However, most Jin-era writers continued to use neo-clerical and semi-cursive styles in their daily writing. It was not until the Northern and Southern period (420–589) that regular script became the predominant form. The system of imperial examinations for the civil service established during
10379-426: The same character—is often non-trivial or unclear. For example, prior to the Qin dynasty the character meaning 'bright' was written as either 明 or 朙 —with either ⽇ 'SUN' or 囧 'WINDOW' on the left, and ⽉ 'MOON' on the right. As part of the Qin programme to standardize small seal script across China, the 朙 form was promoted. Some scribes ignored this, and continued to write
10486-428: The sounds of speech, Chinese characters generally represent morphemes , the units of meaning in a language. Writing a language's entire vocabulary requires thousands of different characters. Characters are created according to several different principles, where aspects of both shape and pronunciation may be used to indicate the character's meaning. The first attested characters are oracle bone inscriptions made during
10593-414: The spoken language. Some characters may only have the same initial or final sound of a syllable in common with phonetic components. A phonetic series comprises all the characters created using the same phonetic component, which may have diverged significantly in their pronunciations over time. For example, 茶 ( chá ; caa4 ; 'tea') and 途 ( tú ; tou4 ; 'route') are part of
10700-569: The undeciphered Phaistos characters, as well as the Shavian and Deseret alphabets, which have all been accepted for official encoding in Unicode. Another common PUA agreement is maintained by the Medieval Unicode Font Initiative (MUFI). This project is attempting to support all of the scribal abbreviations, ligatures, precomposed characters , symbols, and alternate letterforms found in medieval texts written in
10807-589: The wars of unification. The popularity of this form grew as writing became more widespread. By the Warring States period ( c. 475 – 221 BCE), an immature form of clerical script ( 隶书 ; 隸書 ; lìshū ) had emerged based on the vulgar form developed within Qin, often called "early clerical" or "proto-clerical". The proto-clerical script evolved gradually; by the Han dynasty (202 BCE – 220 CE), it had arrived at
10914-463: was HKSCS-2008, while the characters added in HKSCS-2016 are mapped to Unicode only (as a CJK Unified Ideographs horizontal glyph extension where appropriate). In Microsoft Windows 98, NT 4.0, 2000, and XP, HKSCS support can be enabled using Microsoft's patch. In Microsoft's implementation, applications using code page 950 automatically use a hidden code page 951 table for the Big5 encoding of
11021-416: was added to glibc in 2000, but it has not been updated since then. HKSCS-2004 support is handled as Unicode 4.1 and later. For freedesktop.org setups, the AR PL ShanHeiSun Uni font has fully supported HKSCS-2004 since 0.1-0.dot.1, with the latest revision of HKSCS-2004 supported in version 0.1.20060903-1. Modern desktop distributions (e.g. Ubuntu) include Arphic Technology's HKSCS-compliant UKai and UMing fonts out of
11128-523: was applied) with the Windows 7 versions of MingLiU, PMingLiU and MingLiU_HKSCS. In addition, the MingLiU-ExtB, MingLiU_HKSCS-ExtB and PMingLiU-ExtB fonts are added to the target system. However, the IME is not updated as it was in the case of the HKSCS-2001 patch, and the fonts are from a pre-release of Windows 7. For earlier versions of the OS, HKSCS support requires the use of Microsoft's patch, or the Hong Kong government's Digital 21 utilities. IBM assigns CCSID 5471 to
11235-690: was not included in the private use range of any Unicode 1.x version. Historically, planes E0 (224) through FF (255), and groups 60 (96) through 7F (127) of the Universal Coded Character Set (i.e. U+E00000 through U+FFFFFF and U+60000000 through U+7FFFFFFF) were also designated as private use. These ranges were removed from the specified private-use ranges when the UCS was restricted to the seventeen planes reachable in UTF-16. Many people and institutions have created character collections for
11342-625: Was thus developed by the government. The character set consists of Chinese characters commonly used in Hong Kong. Some characters are Cantonese -specific, while some are alternative forms of characters. The set is not well-organised and the characters are not closely examined. Subsequently, the HKSCS-1999 (HKSCS 1999 specification) was developed. Following its acceptance, newer revisions were released in 2001 (adding 116 new characters) and in 2004 (adding 123 new characters), totalling 4,941 characters. 106 GCCS characters were removed in HKSCS-1999 as
11449-614: was too late in the Qt development schedule to be officially included in the Qt-2.3.x series, so it was officially supported in Qt-3.0.1. HKSCS-2001 support was added in Qt-3.0.5. GNOME supports HKSCS characters in Unicode ranges, except those mapped to the Basic Multilingual Plane compatibility block. Patches to support characters mapped above the Basic Multilingual Plane were introduced during Pango 1.1. The WHATWG Encoding Standard (used by HTML5) includes HKSCS in its definition of Big5 (used even with