Vicare Scheme: baselib characters

4.10 Characters

The characters are objects that represent Unicode scalar values.

Unicode defines a standard mapping between sequences of Unicode scalar values and human-readable “characters”. In the latest version of the standard, the scalar values are the integers in the range 0 to #x10FFFF, excluding the range #xD800 to #xDFFF.

More precisely, Unicode distinguishes between glyphs, which are printed for humans to read, and characters, which are abstract entities that map to glyphs (sometimes in a way that is sensitive to surrounding characters). Furthermore, different sequences of scalar values sometimes correspond to the same character. The relationships among scalar values, characters, and glyphs are subtle and complex.

Despite this complexity, most things that a literate human would call a “character” can be represented by a single Unicode scalar value (although several sequences of Unicode scalar values may represent that same character). For example, Roman letters, Cyrillic letters, Hebrew consonants, and most Chinese characters fall into this category.
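
To illustrate how several sequences of scalar values can represent the same character, the following sketch compares two spellings of “é”. It assumes that the R6RS procedure string-normalize-nfc is available through (rnrs); the names precomposed and decomposed are chosen here for illustration only.

```scheme
(import (rnrs))

;; "é" as a single scalar value: U+00E9.
(define precomposed (string #\xE9))

;; "é" as two scalar values: U+0065 (e) followed by
;; U+0301 (combining acute accent).
(define decomposed (string #\x65 #\x301))

(string-length precomposed)             ; ⇒ 1
(string-length decomposed)              ; ⇒ 2
(string=? precomposed decomposed)       ; ⇒ #f

;; After normalization to NFC the two spellings coincide.
(string=? precomposed
          (string-normalize-nfc decomposed))    ; ⇒ #t
```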

Unicode scalar values exclude the range #xD800 to #xDFFF, which is part of the range of Unicode code points. However, the code points in this range, the so-called surrogates, are an artifact of the UTF-16 encoding: they can appear only in specific Unicode encodings, and even then only in pairs that encode scalar values. Consequently, every character represents a code point, but the surrogate code points have no representation as characters.
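
The range restriction can be sketched as a predicate. The procedure unicode-scalar-value? below is a hypothetical helper written for this illustration; it is not part of R6RS or of Vicare.

```scheme
(import (rnrs))

;; Hypothetical helper, not a standard binding: return #t if OBJ is
;; an exact integer that is a Unicode scalar value, #f otherwise.
(define (unicode-scalar-value? obj)
  (and (integer? obj)
       (exact? obj)
       (or (<= 0 obj #xD7FF)
           (<= #xE000 obj #x10FFFF))))

(unicode-scalar-value? #x41)        ; ⇒ #t
(unicode-scalar-value? #xD800)      ; ⇒ #f, a surrogate code point
(unicode-scalar-value? #x10FFFF)    ; ⇒ #t
(unicode-scalar-value? #x110000)    ; ⇒ #f, beyond the code point range
```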

Procedure: char? obj

Return #t if obj is a character, #f otherwise.
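
As a brief illustration, assuming a standard R6RS environment imported through (rnrs), char? distinguishes a character from a string of length one and from an integer:

```scheme
(import (rnrs))

(char? #\a)     ; ⇒ #t
(char? "a")     ; ⇒ #f, a string of length 1 is not a character
(char? 97)      ; ⇒ #f, a scalar value is an integer, not a character
```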

Procedure: char->integer char

Procedure: integer->char sv

sv must be a Unicode scalar value, i.e., a non-negative exact integer object in [0, #xD7FF] union [#xE000, #x10FFFF].

Given a character, char->integer returns its Unicode scalar value as an exact integer object. For a Unicode scalar value sv, integer->char returns its associated character.

(integer->char 32)                      ⇒ #\space
(char->integer (integer->char 5000))    ⇒ 5000
(integer->char #xD800)                  ⇒ exception &assertion
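
Because the two procedures are inverses on scalar values, they can be combined to do arithmetic on characters. The following is a minimal sketch; char-offset is a hypothetical helper introduced for this example, and it assumes that the caller keeps the result inside the range of Unicode scalar values.

```scheme
(import (rnrs))

;; Hypothetical helper: return the character whose scalar value is
;; DELTA positions after that of CH.  The sum must still be a Unicode
;; scalar value, otherwise integer->char is called with an invalid
;; argument.
(define (char-offset ch delta)
  (integer->char (+ (char->integer ch) delta)))

(char->integer #\A)      ; ⇒ 65
(char-offset #\a 1)      ; ⇒ #\b
(char-offset #\A 25)     ; ⇒ #\Z
```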

Procedure: char=? char1 char2 char3 …

Procedure: char<? char1 char2 char3 …

Procedure: char>? char1 char2 char3 …

Procedure: char<=? char1 char2 char3 …

Procedure: char>=? char1 char2 char3 …

These procedures impose a total ordering on the set of characters according to their Unicode scalar values.

(char<? #\z #\Z)                ⇒ #f
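
With more than two arguments, each comparison holds only if it holds for every adjacent pair. The sketch below also sorts a list with char<?, assuming that list-sort from the R6RS sorting library is available through (rnrs); uppercase Roman letters come first because their scalar values are smaller.

```scheme
(import (rnrs))

(char<? #\a #\b #\c)     ; ⇒ #t
(char<? #\a #\c #\b)     ; ⇒ #f
(char=? #\a #\a #\a)     ; ⇒ #t

;; Order by scalar value: A = 65, Z = 90, a = 97, z = 122.
(list-sort char<? (list #\z #\Z #\a #\A))    ; ⇒ (#\A #\Z #\a #\z)
```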