Vicare Scheme: iklib chars unicode latin1

6.22.3.6 ISO/IEC 8859-1 also known as Latin-1 encoding

Latin-1 encoding uses 1 octet per character. For an introduction to Latin-1 see:

http://en.wikipedia.org/wiki/ISO/IEC_8859-1

and for Unicode’s “C1 Controls and Latin-1 Supplement” see:

https://en.wikipedia.org/wiki/Latin-1_Supplement_%28Unicode_block%29

http://www.unicode.org/charts/PDF/U0080.pdf

Strictly speaking, the Latin-1 encoding only defines code points in the ranges ‘[#x20, #x7E]’ and ‘[#xA0, #xFF]’; notice that the control characters are excluded.

In the range ‘[#x20, #x7E]’ the Latin-1 code points are equal to the corresponding ASCII code points.

In both ranges ‘[#x20, #x7E]’ and ‘[#xA0, #xFF]’, Latin-1’s code points are equal to Unicode’s code points: the first range falls in Unicode’s “C0 Controls and Basic Latin” block, the second in “C1 Controls and Latin-1 Supplement”.

Notice that:

Unicode’s “C0 Controls and Basic Latin” specifies the code points in the range ‘[#x00, #x7F]’ so that they are equal to ASCII’s code points in the same range. This Unicode block includes the control characters in the range ‘[#x00, #x1F]’ and the code point ‘#x7F’, which are left undefined by Latin-1.
Unicode’s “C1 Controls and Latin-1 Supplement” specifies control characters in the range ‘[#x80, #x9F]’. This range is left undefined by Latin-1.

This library defines an extended Latin-1 encoding spanning the whole ‘[#x00, #xFF]’ range with the following blocks:

[#x00, #x1F]    C0 Controls
[#x20, #x7E]    Latin-1 code points
#x7F            C0 Controls
[#x80, #x9F]    C1 Controls
[#xA0, #xFF]    Latin-1 code points

The following syntactic bindings are exported by the library (vicare unsafe unicode). In the following macros the argument latin-1-code-point is meant to be a fixnum representing a Latin-1 code point, while the argument unicode-code-point is meant to be a fixnum representing a Unicode code point.

Encoding Unicode code points as Latin-1 code points

Syntax: unicode-code-point-representable-as-latin-1-code-point? unicode-code-point: Evaluate to #t if unicode-code-point is a Unicode code point in a range that can be encoded in Latin-1; otherwise evaluate to #f.

Syntax: latin-1-encode unicode-code-point: Encode a Unicode code point into a Latin-1 code point.
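As an illustration of the encoding step, here is a minimal sketch in portable R6RS Scheme. It does not use the macros above: the names sketch-representable? and sketch-string->latin-1 are hypothetical, and the sketch assumes that a Unicode code point is representable exactly when it lies in the extended range ‘[#x00, #xFF]’ and that encoding leaves the numeric value unchanged, as the equality of code points described above suggests.

```scheme
(import (rnrs))

;; Hypothetical helper, not part of (vicare unsafe unicode).
;; Assumption: a Unicode code point fits the extended Latin-1
;; encoding when it lies in [#x00, #xFF].
(define (sketch-representable? cp)
  (and (fixnum? cp) (fx<=? 0 cp #xFF)))

;; Hypothetical helper: encode STR as a bytevector holding one
;; Latin-1 octet per character.  Return #f if any character is
;; outside the representable range.
(define (sketch-string->latin-1 str)
  (let ((cps (map char->integer (string->list str))))
    (and (for-all sketch-representable? cps)
         (u8-list->bytevector cps))))

;; (sketch-string->latin-1 "caf\xE9;")  => #vu8(99 97 102 233)
;; (sketch-string->latin-1 "\x3BB;")    => #f
```

Returning #f for the whole string, rather than substituting a replacement octet, is only one possible policy; it keeps the sketch free of any silent data loss.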

Decoding Unicode code points from Latin-1 code points

Syntax: latin-1-code-point? octet: Assume octet is the fixnum representation of an octet. Evaluate to #t if octet is a valid Latin-1 code point; otherwise evaluate to #f.

Syntax: latin-1-decode latin-1-code-point: Decode a Latin-1 code point to a Unicode code point.
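The decoding direction can be sketched in the same way with portable R6RS Scheme. The name sketch-latin-1->string is hypothetical and the code does not call the macros above; it assumes the extended encoding, in which every octet in ‘[#x00, #xFF]’ maps to the Unicode code point with the same numeric value.

```scheme
(import (rnrs))

;; Hypothetical helper, not part of (vicare unsafe unicode).
;; Assumption: every octet of the extended Latin-1 encoding decodes
;; to the Unicode code point with the same numeric value, so no
;; validity check is needed on the input octets.
(define (sketch-latin-1->string bv)
  (list->string
   (map integer->char (bytevector->u8-list bv))))

;; (sketch-latin-1->string '#vu8(99 97 102 233))  => "café"
```

Under strict ISO/IEC 8859-1, octets in ‘[#x00, #x1F]’, ‘#x7F’ and ‘[#x80, #x9F]’ would have to be rejected before the conversion; the sketch accepts them because it follows the extended table.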

Classification

Syntax: latin-1-C0-control? latin-1-code-point: Evaluate to #t if the argument is a Latin-1 code point in the range of C0 Control characters.

Syntax: latin-1-C1-control? latin-1-code-point: Evaluate to #t if the argument is a Latin-1 code point in the range of C1 Control characters.

Syntax: latin-1-control? latin-1-code-point: Evaluate to #t if the argument is a Latin-1 code point in the range of C0 Control or C1 Control characters.

Syntax: latin-1-graphic? latin-1-code-point: Evaluate to #t if the argument is a Latin-1 code point in the range of graphics (non-control) characters.
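To make the classification concrete, the following sketch restates the block table of the extended encoding as plain R6RS procedures. All the sketch- names are hypothetical stand-ins, written here only for illustration; they are not the library's macros, and the ranges are taken from the table above, with ‘#x7F’ counted among the C0 controls.

```scheme
(import (rnrs))

;; Hypothetical predicates mirroring the block table of the
;; extended Latin-1 encoding; CP must be a fixnum.
(define (sketch-c0-control? cp)
  (or (fx<=? #x00 cp #x1F) (fx=? cp #x7F)))

(define (sketch-c1-control? cp)
  (fx<=? #x80 cp #x9F))

(define (sketch-control? cp)
  (or (sketch-c0-control? cp) (sketch-c1-control? cp)))

(define (sketch-graphic? cp)
  (or (fx<=? #x20 cp #x7E) (fx<=? #xA0 cp #xFF)))

;; Count the octets in [#x00, #xFF] that satisfy PRED.
(define (sketch-count pred)
  (let loop ((cp 0) (n 0))
    (if (fx>? cp #xFF)
        n
        (loop (fx+ cp 1) (if (pred cp) (fx+ n 1) n)))))

;; (sketch-count sketch-control?)  => 65
;; (sketch-count sketch-graphic?)  => 191
```

The two counts add up to 256, which confirms that under these ranges every octet is either a control or a graphic character, and never both.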