26.6.1 Introduction

Unicode assigns a single number to each code element defined by the Standard. Each of these numbers is called a code point and, when referred to in text, is listed in hexadecimal form following the prefix U+. For example, the code point U+0041 is the hexadecimal number 0041 (equal to the decimal number 65); it represents the character A in the Unicode Standard.
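The U+ notation described above can be sketched in a few lines of Python. This is only an illustration: the helper names `format_code_point` and `parse_code_point` are hypothetical and are not part of this library or of the Unicode Standard.

```python
def format_code_point(ch: str) -> str:
    """Return the U+ notation for a single character, e.g. 'A' -> 'U+0041'."""
    if len(ch) != 1:
        raise ValueError("expected exactly one character")
    # Upper-case hexadecimal, zero padded to at least four digits.
    return f"U+{ord(ch):04X}"


def parse_code_point(text: str) -> str:
    """Return the character named by a U+ string, e.g. 'U+0041' -> 'A'."""
    if not text.upper().startswith("U+"):
        raise ValueError("expected a string starting with 'U+'")
    # The digits after the prefix are a hexadecimal number.
    return chr(int(text[2:], 16))


if __name__ == "__main__":
    print(format_code_point("A"))      # U+0041
    print(int("0041", 16))             # 65
    print(parse_code_point("U+0041"))  # A
```

The zero padding to four digits follows the convention used in the text; code points above U+FFFF simply print with more digits.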

Each character is also assigned a unique name that specifies it and no other. For example, U+0041 is assigned the character name LATIN CAPITAL LETTER A. U+0A1B is assigned the character name GURMUKHI LETTER CHA. These Unicode names are identical to the ISO/IEC 10646 names for the same characters.
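As a quick check of the two names quoted above, Python's standard `unicodedata` module can map a code point to its character name and back. The function `character_name` is a hypothetical wrapper written for this sketch, and the names returned depend on the Unicode version bundled with the Python interpreter.

```python
import unicodedata


def character_name(code_point: int) -> str:
    """Return the Unicode character name assigned to a code point."""
    return unicodedata.name(chr(code_point))


if __name__ == "__main__":
    for code_point in (0x0041, 0x0A1B):
        name = character_name(code_point)
        # Names are unique, so looking the name up gives back the same character.
        assert unicodedata.lookup(name) == chr(code_point)
        print(f"U+{code_point:04X} {name}")
```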

For a general overview of the Unicode Standard see (6):

http://en.wikipedia.org/wiki/Unicode

For the complete reference of Unicode code points, consult the Unicode Character Database (7):

ftp://ftp.unicode.org/Public/UNIDATA/UnicodeData.txt
ftp://ftp.unicode.org/Public/UNIDATA/Blocks.txt

which is partly documented, at an introductory level, by (8):

ftp://ftp.unicode.org/Public/UNIDATA/UCD.html

The same directory on the unicode.org site offers other documents on the interpretation of the database.
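To give an idea of what UnicodeData.txt contains, the following Python sketch parses one record of the file. It assumes the semicolon-separated, 15-field layout described in the database documentation; the type `UnicodeRecord`, the function `parse_record` and the selection of fields are choices made for this example only, and the sample line is the entry for U+0041.

```python
from typing import NamedTuple, Optional


class UnicodeRecord(NamedTuple):
    """A few fields of one UnicodeData.txt line (hypothetical helper type)."""
    code_point: int
    name: str
    general_category: str
    simple_lowercase: Optional[int]


def parse_record(line: str) -> UnicodeRecord:
    """Split one semicolon-separated record and convert selected fields."""
    fields = line.rstrip("\n").split(";")
    if len(fields) != 15:
        raise ValueError(f"expected 15 fields, got {len(fields)}")
    # Field 13 holds the simple lowercase mapping; it is empty when there is none.
    lower = int(fields[13], 16) if fields[13] else None
    return UnicodeRecord(int(fields[0], 16), fields[1], fields[2], lower)


# The record for U+0041 as it appears in UnicodeData.txt.
SAMPLE = "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;"

if __name__ == "__main__":
    record = parse_record(SAMPLE)
    print(f"U+{record.code_point:04X} {record.name} ({record.general_category})")
```

Only four of the fifteen fields are kept here; the remaining ones (combining class, bidirectional class, decomposition and so on) are explained in the database documentation.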

For an explanation of ASCII coding, see (9):

http://en.wikipedia.org/wiki/Ascii

Footnotes

(6) URL last verified Tue Jun 23, 2009.

(7) URLs last verified Tue Jun 23, 2009.

(8) URL last verified Tue Jun 23, 2009.

(9) URL last verified Tue Jun 23, 2009.