Vicare Scheme: iklib chars unicode utf32

6.22.3.4 Unicode’s UTF-32 encoding

UTF-32, also called UCS-4, is a multi-octet character encoding for Unicode that can represent every character in the Unicode set, that is, every code point in the ranges ‘[0, #xD800)’ and ‘(#xDFFF, #x10FFFF]’. It uses exactly 32 bits per Unicode code point.

This makes UTF-32 a fixed-length encoding, in contrast to the other Unicode Transformation Formats, which are variable-length encodings. The UTF-32 form of a character is a direct representation of its code point.
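The fixed-length property can be illustrated outside of Vicare with a short Python sketch. It is not part of the library: the helper name `utf32_word` is hypothetical, and the big-endian codec `utf-32-be` is chosen only so that no byte-order mark is emitted.

```python
def utf32_word(ch: str) -> int:
    """Return the single 32-bit UTF-32 word for a one-character string."""
    data = ch.encode("utf-32-be")   # big-endian, no byte-order mark
    if len(data) != 4:              # one 4-octet word per code point
        raise ValueError("expected exactly one code point")
    return int.from_bytes(data, "big")

# U+0041, U+00E9, U+20AC, U+1D11E: four code points of growing magnitude.
for ch in ["A", "\u00e9", "\u20ac", "\U0001d11e"]:
    print(f"U+{ord(ch):04X}: UTF-32 word #x{utf32_word(ch):08X}, "
          f"UTF-8 length {len(ch.encode('utf-8'))} octets")
```

For every sample the UTF-32 word is numerically equal to the code point and always occupies 4 octets, while the UTF-8 length varies with the code point.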

The following syntactic bindings are exported by the library (vicare unsafe unicode). These macros assume that the word arguments are fixnums representing 32-bit words, in the range ‘[0, #xFFFFFFFF]’, and that the code-point arguments are fixnums representing Unicode code points, in the range ‘[0, #x10FFFF]’ but outside the range ‘[#xD800, #xDFFF]’.

Encoding

Syntax: utf-32-code-point? code-point: Evaluate to #t if code-point is a Unicode code point representable in UTF-32 encoding; otherwise evaluate to #f.

Syntax: utf-32-encode code-point: Encode a Unicode code point as a UTF-32 word.
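As a rough model of the encoding side, the following Python sketch mirrors the ranges stated in this section. The names `is_utf32_code_point` and `utf32_encode` are hypothetical stand-ins for the macros, and the argument check inside `utf32_encode` is an addition of the sketch: the macros themselves come from an unsafe library and assume valid arguments.

```python
def is_utf32_code_point(code_point: int) -> bool:
    """Model of utf-32-code-point?: accept [0, #xD800) and (#xDFFF, #x10FFFF]."""
    return 0 <= code_point < 0xD800 or 0xDFFF < code_point <= 0x10FFFF

def utf32_encode(code_point: int) -> int:
    """Model of utf-32-encode: the 32-bit word is the code point itself."""
    # Validation added by this sketch; an unsafe macro would skip it.
    if not is_utf32_code_point(code_point):
        raise ValueError(f"not an encodable code point: {code_point:#x}")
    return code_point

print(is_utf32_code_point(0x41))      # LATIN CAPITAL LETTER A
print(is_utf32_code_point(0xD800))    # first surrogate, excluded
print(is_utf32_code_point(0x110000))  # one past the last code point
print(hex(utf32_encode(0x1D11E)))     # MUSICAL SYMBOL G CLEF
```

Because the encoding is the identity on valid code points, the only real work is the range test that excludes the surrogate block and values above #x10FFFF.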

Decoding

Syntax: utf-32-word? word: Evaluate to #t if word is a valid 32-bit UTF-32 encoding of a Unicode character; otherwise evaluate to #f.

Syntax: utf-32-decode word: Decode a valid UTF-32 word into the corresponding Unicode code point.
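The decoding side can be modelled in the same spirit. In this Python sketch, `is_utf32_word`, `utf32_decode` and `decode_buffer` are hypothetical names, the big-endian byte order is an assumption, and the validity test assumes that a word is valid exactly when its value is an encodable code point.

```python
import struct

def is_utf32_word(word: int) -> bool:
    """Model of utf-32-word?: the word must be an encodable code point."""
    return 0 <= word < 0xD800 or 0xDFFF < word <= 0x10FFFF

def utf32_decode(word: int) -> int:
    """Model of utf-32-decode: a valid word is its own code point."""
    return word

def decode_buffer(data: bytes) -> list:
    """Split big-endian UTF-32 octets into words and decode each one."""
    if len(data) % 4 != 0:
        raise ValueError("UTF-32 data must be a multiple of 4 octets")
    words = struct.unpack(f">{len(data) // 4}I", data)
    code_points = []
    for word in words:
        if not is_utf32_word(word):
            raise ValueError(f"invalid UTF-32 word: {word:#010x}")
        code_points.append(utf32_decode(word))
    return code_points

# Two words: U+0041 followed by U+1D11E.
print([hex(cp) for cp in decode_buffer(bytes.fromhex("000000410001D11E"))])
```

Note that a word such as #xFFFFFFFF lies within the 32-bit range accepted as an argument but is rejected by the predicate, since it is not a Unicode code point.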