Misplaced Pages

Punycode

Article snapshot taken from Wikipedia with creative commons attribution-sharealike license. Give it a read and then ask your questions in the chat. We can research this topic together.

Punycode is a representation of Unicode with the limited ASCII character subset used for Internet hostnames . Using Punycode, host names containing Unicode characters are transcoded to a subset of ASCII consisting of letters, digits, and hyphens, which is called the letter–digit–hyphen (LDH) subset. For example, München ( German name for Munich ) is encoded as Mnchen-3ya .

#426573

68-601: While the Domain Name System (DNS) technically supports arbitrary sequences of octets in domain name labels, the DNS standards recommend the use of the LDH subset of ASCII conventionally used for host names, and require that string comparisons between DNS domain names should be case-insensitive. The Punycode syntax is a method of encoding strings containing Unicode characters, such as internationalized domain names (IDNA), into

136-446: A label and zero or more resource records (RR), which hold information associated with the domain name. The domain name itself consists of the label, concatenated with the name of its parent node on the right, separated by a dot. The tree sub-divides into zones beginning at the root zone . A DNS zone may consist of as many domains and subdomains as the zone manager chooses. DNS can also be partitioned according to class where

204-483: A "com" server, and finally an "example.com" server. Name servers in delegations are identified by name, rather than by IP address. This means that a resolving name server must issue another DNS request to find out the IP address of the server to which it has been referred. If the name given in the delegation is a subdomain of the domain for which the delegation is being provided, there is a circular dependency . In this case,

272-487: A Punycode decoding, the string xn-- is prepended to Punycode sequences in internationalized domain names. This is called ACE (ASCII Compatible Encoding). Thus the domain name "bücher.tld" would be represented in a URL as "xn--bcher-kva.tld". The following table shows examples of Punycode encodings for different types of input. Domain Name System Early research and development: Merging

340-408: A cache of data. An authoritative name server can either be a primary server or a secondary server. Historically the terms master/slave and primary/secondary were sometimes used interchangeably but the current practice is to use the latter form. A primary server is a server that stores the original copies of all zone records. A secondary server uses a special automatic updating mechanism in

408-416: A combination of these methods. In a non-recursive query , a DNS resolver queries a DNS server that provides a record either for which the server is authoritative, or it provides a partial result without querying other servers. In case of a caching DNS resolver , the non-recursive query of its local DNS cache delivers a result and reduces the load on upstream DNS servers by caching DNS resource records for

476-755: A compromise between five competing proposals of solutions to Paul Mockapetris . Mockapetris instead created the Domain Name System in 1983 while at the University of Southern California . The Internet Engineering Task Force published the original specifications in RFC 882 and RFC 883 in November 1983. These were updated in RFC 973 in January 1986. In 1984, four UC Berkeley students, Douglas Terry, Mark Painter, David Riggle, and Songnian Zhou, wrote

544-399: A dataset from a reliable source. Assuming the resolver has no cached records to accelerate the process, the resolution process starts with a query to one of the root servers. In typical operation, the root servers do not answer directly, but respond with a referral to more authoritative servers, e.g., a query for "www.wikipedia.org" is referred to the org servers. The resolver now queries

612-403: A function is defined in lowercase, it can be called in uppercase, but if a variable is defined in lowercase, it cannot be referred to in uppercase. Nim is case-insensitive and ignores underscores, as long as the first characters match. A text search operation could be case-sensitive or case-insensitive, depending on the system, application, or context. The user can in many cases specify whether

680-550: A general purpose database, the DNS has also been used in combating unsolicited email (spam) by storing a real-time blackhole list (RBL). The DNS database is traditionally stored in a structured text file, the zone file , but other database systems are common. The Domain Name System originally used the User Datagram Protocol (UDP) as transport over IP. Reliability, security, and privacy concerns spawned

748-407: A period of time after an initial response from upstream DNS servers. In a recursive query , a DNS resolver queries a single DNS server, which may in turn query other DNS servers on behalf of the requester. For example, a simple stub resolver running on a home router typically makes a recursive query to the DNS server run by the user's ISP . A recursive query is one for which the DNS server answers

SECTION 10

#1732772624427

816-475: A search is sensitive to case, e.g. in most text editors, word processors, and Web browsers. A case-insensitive search is more comprehensive, finding "Language" (at the beginning of a sentence), "language", and "LANGUAGE" (in a title in capitals); a case-sensitive search will find the computer language "BASIC" but exclude most of the many unwanted instances of the word. For example, the Google Search engine

884-469: A service's location on the network to change without affecting the end users, who continue to use the same hostname. Users take advantage of this when they use meaningful Uniform Resource Locators ( URLs ) and e-mail addresses without having to know how the computer actually locates the services. An important and ubiquitous function of the DNS is its central role in distributed Internet services such as cloud services and content delivery networks . When

952-407: A source code tree for software for Unix-like systems might have both a file named Makefile and a file named makefile in the same directory. In addition, some Mac Installers assume case insensitivity and fail on case-sensitive file systems. The older MS-DOS filesystems FAT12 and FAT16 were case-insensitive and not case-preserving, so that a file whose name is entered as readme.txt or ReadMe.txt

1020-555: A time to live (TTL), which indicates how long the information remains valid before it needs to be discarded or refreshed. This TTL is determined by the administrator of the authoritative DNS server and can range from a few seconds to several days or even weeks. Case sensitivity In computers, case sensitivity defines whether uppercase and lowercase letters are treated as distinct ( case-sensitive ) or equivalent ( case-insensitive ). For instance, when users interested in learning about dogs search an e-book , "dog" and "Dog" are of

1088-477: A type of error called a "lame delegation" or "lame response". Domain name resolvers determine the domain name servers responsible for the domain name in question by a sequence of queries starting with the right-most (top-level) domain label. For proper operation of its domain name resolver, a network host is configured with an initial cache ( hints ) of the known addresses of the root name servers. The hints are updated periodically by an administrator by retrieving

1156-458: A user accesses a distributed Internet service using a URL, the domain name of the URL is translated to the IP address of a server that is proximal to the user. The key functionality of the DNS exploited here is that different users can simultaneously receive different translations for the same domain name, a key point of divergence from a traditional phone-book view of the DNS. This process of using

1224-441: A weight of 1 equals 10. After this, the weight of the next digit depends on the first threshold: generally, for any n , the weight of the ( n +1)-th digit is w × (36 − t ), where w is the previous weight and t is the threshold of the n -th digit. So in this case, the second symbol has a place value of 36 minus the previous threshold value of 1, which equals 35. Therefore, the sum of the first two symbols 'k' (=10) and 'v' (=21)

1292-666: Is 10 × 1 + 21 × 35. Since the second symbol is not less than its threshold value of 1, there is more to come. However, since the third symbol in this example is 'a' (=0), we may ignore calculating its weight. Therefore, "kva" represents the decimal number (10 × 1) + (21 × 35) = 745. Number 745 will be encoded as 10 + 21 × 35 + 0 (base 35 used for second digit, the most significant digit 0 needed as terminator), 10 → 'k', 21 → 'v', 0 → 'a', so "bücher" → "bcher-kva". The thresholds themselves are determined for each successive encoded character by an algorithm keeping them between 1 and 26 inclusive. The case can then be used to provide information about

1360-583: Is basically case-insensitive, with no option for case-sensitive search. In Oracle SQL, most operations and searches are case-sensitive by default, while in most other DBMSes , SQL searches are case-insensitive by default. Case-insensitive operations are sometimes said to fold case , from the idea of folding the character code table so that upper- and lowercase letters coincide. In filesystems in Unix-like systems, filenames are usually case-sensitive (there can be separate readme.txt and Readme.txt files in

1428-570: Is designed to work across all scripts, and to be self-optimizing by attempting to adapt to the character set ranges within the string as it operates. It is optimized for the case where the string is composed of zero or more ASCII characters and in addition characters from only one other script system, but will cope with any arbitrary Unicode string. Note that for DNS use, the domain name string is assumed to have been normalized using nameprep and (for top-level domains ) filtered against an officially registered language table before being punycoded, and that

SECTION 20

#1732772624427

1496-492: Is known as the LDH rule (letters, digits, hyphen). Domain names are interpreted in a case-independent manner. Labels may not start or end with a hyphen. An additional rule requires that top-level domain names should not be all-numeric. The limited set of ASCII characters permitted in the DNS prevented the representation of names and words of many languages in their native alphabets or scripts. To make this possible, ICANN approved

1564-488: Is only achieved with at least 6 labels (counting the last null label). Although no technical limitation exists to prevent domain name labels from using any character that is representable by an octet, hostnames use a preferred format and character set. The characters allowed in labels are a subset of the ASCII character set, consisting of characters a through z , A through Z , digits 0 through 9 , and hyphen. This rule

1632-613: Is saved as README.TXT. Later, with VFAT in Windows 95 the FAT file systems became case-preserving as an extension of supporting long filenames . Later Windows file systems such as NTFS are internally case-sensitive, and a readme.txt and a Readme.txt can coexist in the same directory. However, for practical purposes filenames behave as case-insensitive as far as users and most software are concerned. This can cause problems for developers or software coming from Unix-like environments, similar to

1700-399: Is served by the root name servers , the servers to query when looking up ( resolving ) a TLD . An authoritative name server is a name server that only gives answers to DNS queries from data that have been configured by an original source, for example, the domain administrator or by dynamic DNS methods, in contrast to answers obtained via a query to another name server that only maintains

1768-406: Is used which allows variable-length codes without separate delimiters: a digit lower than a threshold value marks that it is the most-significant digit, hence the end of the number. The threshold value depends on the position in the number and also on previous insertions, to increase efficiency. Correspondingly the weights of the digits vary. In this case a number system with 36 symbols is used, with

1836-551: The Internationalizing Domain Names in Applications (IDNA) system, by which user applications, such as web browsers, map Unicode strings into the valid DNS character set using Punycode . In 2009, ICANN approved the installation of internationalized domain name country code top-level domains ( ccTLD s) . In addition, many registries of the existing top-level domain names ( TLD s ) have adopted

1904-414: The case-insensitive 'a' through 'z' equal to the decimal numbers 0 through 25, and '0' through '9' equal to the decimal numbers 26 through 35. Thus "kva", corresponds to the decimal number string "10 21 0". To decode this string of symbols, a sequence of thresholds will be needed, in this case it's (1, 1, 26, 26, ...). The weight (or place value ) of the least-significant digit is always 1: 'k' (=10) with

1972-478: The top-level domain ; for example, the domain name www.example.com belongs to the top-level domain com . The hierarchy of domains descends from right to left; each label to the left specifies a subdivision, or subdomain of the domain to the right. For example, the label example specifies a subdomain of the com domain, and www is a subdomain of example.com. This tree of subdivisions may have up to 127 levels. A label may contain zero to 63 characters, because

2040-404: The " Authoritative Answer " ( AA ) bit in its responses. This flag is usually reproduced prominently in the output of DNS administration query tools, such as dig , to indicate that the responding name server is an authority for the domain name in question. When a name server is designated as the authoritative server for a domain name for which it does not have authoritative data, it presents

2108-540: The ARPANET. Elizabeth Feinler developed and maintained the first ARPANET directory. Maintenance of numerical addresses, called the Assigned Numbers List, was handled by Jon Postel at the University of Southern California 's Information Sciences Institute (ISI), whose team worked closely with SRI. Addresses were assigned manually. Computers, including their hostnames and addresses, were added to

Punycode - Misplaced Pages Continue

2176-462: The DNS database are for start of authority ( SOA ), IP addresses ( A and AAAA ), SMTP mail exchangers (MX), name servers (NS), pointers for reverse DNS lookups (PTR), and domain name aliases (CNAME). Although not intended to be a general purpose database, DNS has been expanded over time to store records for other types of data for either automatic lookups, such as DNSSEC records, or for human queries such as responsible person (RP) records. As

2244-401: The DNS protocol in communication with its primary to maintain an identical copy of the primary records. Every DNS zone must be assigned a set of authoritative name servers. This set of servers is stored in the parent domain zone with name server (NS) records. An authoritative server indicates its status of supplying definitive answers, deemed authoritative , by setting a protocol flag, called

2312-503: The DNS protocol sets limits on the acceptable lengths of the output Punycode string. First, all ASCII characters in the string are copied from input to output, skipping over any other characters. For example, "bücher" is copied to "bcher". If any characters were copied, i.e. if there was at least one ASCII character in the input, an ASCII hyphen is appended to the output (e.g., "bücher" → "bcher-", but "ü" → ""). Note that hyphens are themselves ASCII characters. Thus, they can be present in

2380-473: The DNS to assign proximal servers to users is key to providing faster and more reliable responses on the Internet and is widely used by most major Internet services. The DNS reflects the structure of administrative responsibility on the Internet. Each subdomain is a zone of administrative autonomy delegated to a manager. For zones operated by a registry , administrative information is often complemented by

2448-460: The IDNA system, guided by RFC 5890, RFC 5891, RFC 5892, RFC 5893. The Domain Name System is maintained by a distributed database system, which uses the client–server model . The nodes of this database are the name servers . Each domain has at least one authoritative DNS server that publishes information about that domain and the name servers of any domains subordinate to it. The top of the hierarchy

2516-430: The IP address spaces . The Domain Name System maintains the domain name hierarchy and provides translation services between it and the address spaces. Internet name servers and a communication protocol implement the Domain Name System. A DNS name server is a server that stores the DNS records for a domain; a DNS name server responds with answers to queries against its database. The most common types of records stored in

2584-402: The Internet, and increase performance in end-user applications, the Domain Name System supports DNS cache servers which store DNS query results for a period of time determined in the configuration ( time-to-live ) of the domain name record in question. Typically, such caching DNS servers also implement the recursive algorithm necessary to resolve a given name starting with the DNS root through to

2652-471: The LDH subset of ASCII favored by DNS. It is specified in IETF Request for Comments 3492. As stated in RFC 3492, "Punycode is an instance of a more general algorithm called Bootstring , which allows strings composed from a small set of 'basic' code points to uniquely represent any string of code points drawn from a larger set." Punycode defines parameters for the general Bootstring algorithm to match

2720-704: The associated entities. Most prominently, it translates readily memorized domain names to the numerical IP addresses needed for locating and identifying computer services and devices with the underlying network protocols . The Domain Name System has been an essential component of the functionality of the Internet since 1985. The Domain Name System delegates the responsibility of assigning domain names and mapping those names to Internet resources by designating authoritative name servers for each domain. Network administrators may delegate authority over subdomains of their allocated name space to other name servers. This mechanism provides distributed and fault-tolerant service and

2788-543: The authoritative name servers of the queried domain. With this function implemented in the name server, user applications gain efficiency in design and operation. The combination of DNS caching and recursive functions in a name server is not mandatory; the functions can be implemented independently in servers for special purposes. Internet service providers typically provide recursive and caching name servers for their customers. In addition, many home networking routers implement DNS caches and recursion to improve efficiency in

Punycode - Misplaced Pages Continue

2856-456: The characteristics of Unicode text. This section demonstrates the procedure for Punycode encoding, using the example of the string "bücher" ( Bücher is German for books ), which is translated into the label "bcher-kva". To make the encoding and decoding algorithms simple, no attempt has been made to prevent some encoded values from encoding inadmissible Unicode values: however, these should be checked for and detected during decoding. Punycode

2924-475: The computer. Computers at educational institutions would have the domain edu , for example. She and her team managed the Host Naming Registry from 1972 to 1989. By the early 1980s, maintaining a single, centralized host table had become slow and unwieldy and the emerging network required an automated naming system to address technical and personnel issues. Postel directed the task of forging

2992-438: The delegation for example.org. The glue records are address records that provide IP addresses for ns1.example.org. The resolver uses one or more of these IP addresses to query one of the domain's authoritative servers, which allows it to complete the DNS query. A common approach to reduce the burden on DNS servers is to cache the results of name resolution locally or on intermediary resolver hosts. Each DNS query result comes with

3060-407: The end of ASCII) and ü (code point 252 = 0xFC , see Unicode's Latin-1 Supplement ). The ü is inserted at position 1, after the b . Thus the encoder will add the number (6 × 124) + 1 = 745 , and the decoder can retrieve these by ⌊745 ÷ 6⌋ = 124 and 745 mod 6 = 1 . These numbers are strictly increasing. For the second and subsequent inserted characters, the difference between the number and

3128-754: The first Unix name server implementation for the Berkeley Internet Name Domain, commonly referred to as BIND . In 1985, Kevin Dunlap of DEC substantially revised the DNS implementation. Mike Karels , Phil Almquist, and Paul Vixie then took over BIND maintenance. Internet Systems Consortium was founded in 1994 by Rick Adams , Paul Vixie , and Carl Malamud , expressly to provide a home for BIND development and maintenance. BIND versions from 4.9.3 onward were developed and maintained by ISC, with support provided by ISC's sponsors. As co-architects/programmers, Bob Halley and Paul Vixie released

3196-456: The first production-ready version of BIND version 8 in May 1997. Since 2000, over 43 different core developers have worked on BIND. In November 1987, RFC 1034 and RFC 1035 superseded the 1983 DNS specifications. Several additional Request for Comments have proposed extensions to the core DNS protocols. The domain name space consists of a tree data structure . Each node or leaf in the tree has

3264-465: The input and, if so, they will be copied to the output. This causes no ambiguity: if the output contains hyphens, the one that got added is always the last one. It marks the end of the ASCII characters. The non-ASCII characters are sorted by Unicode value, lowest first (if a character occurs more than once they are sorted by position). Each is then encoded as a single number. This single number defines both

3332-433: The length is only allowed to take 6 bits. The null label of length zero is reserved for the root zone. The full domain name may not exceed the length of 253 characters in its textual representation (or 254 with the trailing dot). In the internal binary representation of the DNS this maximum length of 253 requires 255 octets of storage, as it also stores the length of the first of many labels and adds last null byte. 255 length

3400-422: The local network. The client side of the DNS is called a DNS resolver. A resolver is responsible for initiating and sequencing the queries that ultimately lead to a full resolution (translation) of the resource sought, e.g., translation of a domain name into an IP address. DNS resolvers are classified by a variety of query methods, such as recursive , non-recursive , and iterative . A resolution process may use

3468-416: The location to insert the character at and which character to insert. The encoded number is n × j + i . By dividing by n and also getting the remainder, a decoder can determine j and i . There are six possible places to insert a character in the string "bcher" (including before the first character and after the last one). There are 124 code points between the last ASCII code point (127 = 0x7F ,

SECTION 50

#1732772624427

3536-408: The name server and IP address. For example, if the authoritative name server for example.org is ns1.example.org, a computer trying to resolve www.example.org first resolves ns1.example.org. As ns1 is contained in example.org, this requires resolving example.org first, which presents a circular dependency. To break the dependency, the name server for the top level domain org includes glue along with

3604-409: The name server providing the delegation must also provide one or more IP addresses for the authoritative name server mentioned in the delegation. This information is called glue . The delegating name server provides this glue in the form of records in the additional section of the DNS response, and provides the delegation in the authority section of the response. A glue record is a combination of

3672-570: The networks and creating the Internet: Commercialization, privatization, broader access leads to the modern Internet: Examples of Internet services: The Domain Name System ( DNS ) is a hierarchical and distributed name service that provides a naming system for computers , services, and other resources on the Internet or other Internet Protocol (IP) networks. It associates various information with domain names ( identification strings ) assigned to each of

3740-626: The original case of the string. Because special characters are sorted by their code points by encoding algorithm, for the insertion of a second special character in "bücher", the first possibility is "büücher" with code "bcher-kvaa", the second "bücüher" with code "bcher-kvab", etc. After "bücherü" with code "bcher-kvae" comes codes representing insertion of ý, the Unicode character following ü, starting with "ýbücher" with code "bcher-kvaf" (different from "übücher" coded "bcher-jvab"), etc. To prevent hyphens in non-international domain names from triggering

3808-456: The previous one is written. The number is encoded using the letters a through z and the digits 0 through 9 . It is not base-36 but a more complex scheme described below, which allows the numbers to be concatenated, with nothing separating them. Punycode uses generalized variable-length integers to represent these values. For example, this is how "kva" is used to represent the code number 745: A number system with little-endian ordering

3876-547: The primary file by contacting the SRI Network Information Center (NIC), directed by Feinler, via telephone during business hours. Later, Feinler set up a WHOIS directory on a server in the NIC for retrieval of information about resources, contacts, and entities. She and her team developed the concept of domains. Feinler suggested that domains should be based on the location of the physical address of

3944-402: The query completely by querying other name servers as needed. In typical operation, a client issues a recursive query to a caching recursive DNS server, which subsequently issues non-recursive queries to determine the answer and send a single answer back to the client. The resolver, or another DNS server acting recursively on behalf of the resolver, negotiates use of recursive service using bits in

4012-404: The query headers. DNS servers are not required to support recursive queries. The iterative query procedure is a process in which a DNS resolver queries a chain of one or more DNS servers. Each server refers the client to the next server in the chain, until the current server can fully resolve the request. For example, a possible resolution of www.example.com would query a global root server, then

4080-477: The registry's RDAP and WHOIS services. That data can be used to gain insight on, and track responsibility for, a given host on the Internet. Using a simpler, more memorable name in place of a host's numerical address dates back to the ARPANET era. The Stanford Research Institute (now SRI International ) maintained a text file named HOSTS.TXT that mapped host names to the numerical addresses of computers on

4148-516: The root servers, and as a result, root name servers actually are involved in only a relatively small fraction of all requests. In theory, authoritative name servers are sufficient for the operation of the Internet. However, with only authoritative name servers operating, every DNS query must start with recursive queries at the root zone of the Domain Name System and each user system would have to implement resolver software capable of recursive operation. To improve efficiency, reduce DNS traffic across

SECTION 60

#1732772624427

4216-506: The same directory). MacOS is somewhat unusual in that, by default, it uses HFS+ and APFS in a case-insensitive (so that there cannot be a readme.txt and a Readme.txt in the same directory) but case-preserving mode (so that a file created as readme.txt is shown as readme.txt and a file created as Readme.txt is shown as Readme.txt) by default. This causes some issues for developers and power users , because most file systems in other Unix-like environments are case-sensitive, and, for example,

4284-770: The same significance to them. Thus, they request a case-insensitive search. But when they search an online encyclopedia for information about the United Nations , for example, or something with no ambiguity regarding capitalization and ambiguity between two or more terms cut down by capitalization, they may prefer a case-sensitive search. Case sensitivity may differ depending on the situation: Some programming languages are case-sensitive for their identifiers ( C , C++ , Java , C# , Verilog , Ruby , Python and Swift ). Others are case-insensitive (i.e., not case-sensitive), such as ABAP , Ada , most BASICs (an exception being BBC BASIC ), Common Lisp , Fortran , SQL (for

4352-629: The separate classes can be thought of as an array of parallel namespace trees. Administrative responsibility for any zone may be divided by creating additional zones. Authority over the new zone is said to be delegated to a designated name server. The parent zone ceases to be authoritative for the new zone. The definitive descriptions of the rules for forming domain names appear in RFC 1035, RFC 1123, RFC 2181, and RFC 5892. A domain name consists of one or more parts, technically called labels , that are conventionally concatenated , and delimited by dots, such as example.com. The right-most label conveys

4420-423: The servers referred to, and iteratively repeats this process until it receives an authoritative answer. The diagram illustrates this process for the host that is named by the fully qualified domain name "www.wikipedia.org". This mechanism would place a large traffic burden on the root servers, if every resolution on the Internet required starting at the root. In practice caching is used in DNS servers to off-load

4488-511: The syntax, and for some vendor implementations, e.g. Microsoft SQL Server , the data itself) Pascal , Rexx and ooRexx . There are also languages, such as Haskell , Prolog , and Go , in which the capitalisation of an identifier encodes information about its semantics . Some other programming languages have varying case sensitivity; in PHP , for example, variable names are case-sensitive but function names are not case-sensitive. This means that if

4556-604: The use of the Transmission Control Protocol (TCP) as well as numerous other protocol developments. An often-used analogy to explain the DNS is that it serves as the phone book for the Internet by translating human-friendly computer hostnames into IP addresses. For example, the hostname www.example.com within the domain name example.com translates to the addresses 93.184.216.34 ( IPv4 ) and 2606:2800:220:1:248:1893:25c8:1946 ( IPv6 ). The DNS can be quickly and transparently updated, allowing

4624-467: Was designed to avoid a single large central database. In addition, the DNS specifies the technical functionality of the database service that is at its core. It defines the DNS protocol, a detailed specification of the data structures and data communication exchanges used in the DNS, as part of the Internet protocol suite . The Internet maintains two principal namespaces , the domain name hierarchy and

#426573