Next: objects strings, Previous: objects structs, Up: objects [Index]
A Scheme character has two representations: as a standalone machine word, and as a 32-bit value when stored in a string; the least significant 32 bits of the two representations are equal (see unicode).
Let’s say that machine words are 32-bit values, which means the word size is 4 bytes; then the representation of a character is:

    |    Unicode code point    |char tag|
    |--------|--------|--------|--------|
       byte3    byte2    byte1    byte0

the least significant byte is set to #x0F: this “tags” the machine words which embed characters. On 64-bit machines, the layout is:

                Unused           | Unicode code point |char tag
    |...........................|....................|......|
    |------|------|------|------|------|------|------|------|
      byte7  byte6  byte5  byte4  byte3  byte2  byte1  byte0
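To make the layout concrete, here is a minimal sketch that builds the 32-bit character word for a given code point. The names `LAYOUT_CHAR_TAG`, `LAYOUT_CHAR_SHIFT` and `layout_char_word` are hypothetical, and the shift of 8 bits is an assumption inferred from the diagrams (tag in byte0, code point starting at byte1); it is not copied from the actual headers.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values, inferred from the layout diagrams; hypothetical names. */
#define LAYOUT_CHAR_TAG    0x0Fu  /* value of the least significant byte */
#define LAYOUT_CHAR_SHIFT  8      /* the code point starts at byte1 */

/* Hypothetical helper: build the 32-bit character word for a code point. */
static uint32_t
layout_char_word (uint32_t code_point)
{
  assert(code_point <= 0x10FFFFu);  /* largest valid Unicode code point */
  return (code_point << LAYOUT_CHAR_SHIFT) | LAYOUT_CHAR_TAG;
}
```

Under these assumptions the letter `A` (code point #x41) becomes the word #x0000410F, and the Greek small lambda (code point #x3BB) becomes #x0003BB0F.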
At the C language level: standalone characters are moved around as ikptr_t values, but when characters are stored in a string the ikptr_t value is converted to a 32-bit integer of type ikchar_t.
Standalone characters are encoded into ikptr_t values as follows:

    ikuword_t   unicode_code_point = the_code_point;
    ikptr_t     s_char;

    s_char = (unicode_code_point << char_shift) | char_tag;

decoded to ikuword_t values as follows:

    ikptr_t     s_char = the_character;
    ikuword_t   unicode_code_point;

    unicode_code_point = s_char >> char_shift;

and identified as follows:

    ikptr_t     X = the_value;

    if (char_tag == (char_mask & X))
      it_is_a_character();
    else
      it_is_not();
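The three fragments above can be collected into a small self-contained sketch. Everything prefixed with `sk_` or `SK_` is a hypothetical stand-in: `uintptr_t` is assumed to play the role of both `ikptr_t` and `ikuword_t`, and the constants 8, #xFF and #x0F are assumptions inferred from the layout described earlier, not values taken from the real headers.

```c
#include <assert.h>
#include <stdint.h>

typedef uintptr_t sk_ptr_t;    /* stand-in for ikptr_t: one machine word */
typedef uintptr_t sk_uword_t;  /* stand-in for ikuword_t */

/* Assumed constants, inferred from the documented layout. */
#define SK_CHAR_SHIFT  8
#define SK_CHAR_MASK   ((sk_ptr_t)0xFF)
#define SK_CHAR_TAG    ((sk_ptr_t)0x0F)

/* Encode a code point into a tagged machine word. */
static sk_ptr_t
sk_char_from_code_point (sk_uword_t code_point)
{
  assert(code_point <= 0x10FFFF);  /* largest valid Unicode code point */
  return (code_point << SK_CHAR_SHIFT) | SK_CHAR_TAG;
}

/* Decode a tagged machine word back into a code point. */
static sk_uword_t
sk_char_to_code_point (sk_ptr_t s_char)
{
  return s_char >> SK_CHAR_SHIFT;
}

/* Return non-zero if the tag bits of X match the character tag. */
static int
sk_is_char (sk_ptr_t X)
{
  return SK_CHAR_TAG == (SK_CHAR_MASK & X);
}
```

Encoding and then decoding returns the original code point, and the predicate accepts only words whose least significant byte equals the tag.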
Characters from a Scheme string are decoded from ikchar_t to ikuword_t as follows:

    ikchar_t    ch = the_32bit_character;
    ikuword_t   unicode_code_point;

    unicode_code_point = ch >> char_shift;
and encoded from ikuword_t to ikchar_t as follows:

    ikuword_t   unicode_code_point = the_code_point;
    ikchar_t    ch;

    ch = (ikchar_t)((unicode_code_point << char_shift) | char_tag);
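The relation between the standalone and the in-string representations can be sketched as follows. The `str_` names are hypothetical, `uintptr_t` and `uint32_t` are assumed stand-ins for `ikptr_t` and `ikchar_t`, and the shift of 8 bits and tag #x0F are assumptions inferred from the documented layout. Truncating a standalone character to its low 32 bits is assumed to be a valid conversion because the two representations share their least significant 32 bits.

```c
#include <assert.h>
#include <stdint.h>

typedef uintptr_t str_ptr_t;     /* stand-in for ikptr_t */
typedef uint32_t  str_char32_t;  /* stand-in for ikchar_t */

/* Assumed constants, inferred from the documented layout. */
#define STR_CHAR_SHIFT  8
#define STR_CHAR_TAG    0x0Fu

/* Encode a code point as a 32-bit tagged character for string storage. */
static str_char32_t
str_char32_from_code_point (uint32_t code_point)
{
  assert(code_point <= 0x10FFFFu);  /* largest valid Unicode code point */
  return (str_char32_t)((code_point << STR_CHAR_SHIFT) | STR_CHAR_TAG);
}

/* Untag a 32-bit string character, returning the code point. */
static uint32_t
str_char32_to_code_point (str_char32_t ch)
{
  return ch >> STR_CHAR_SHIFT;
}

/* Keep only the low 32 bits of a standalone character word. */
static str_char32_t
str_char32_from_standalone (str_ptr_t s_char)
{
  return (str_char32_t)(s_char & 0xFFFFFFFFu);
}
```

With these definitions, a standalone word #x3BB0F and the string character built from code point #x3BB compare equal after truncation.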
ikchar_t is an alias for uint32_t, used to store a Unicode code point tagged as a character.
Integer values used to tag and recognise ikptr_t values representing characters. char_mask isolates the tag bits from an ikptr_t and char_tag represents the tag bits.
char_shift is the integer value representing the number of bits we must shift left to turn a C language long into a machine word ready to be tagged as a Scheme character.
Evaluate to true if X is a Scheme character.
Convert a Scheme character to and from an ikuword_t value representing the Unicode code point.
Convert an ikuword_t value representing the Unicode code point into a 32-bit integer representing a Scheme character to be stored into a string.
Given a 32-bit value representing a Scheme character: untag it and return a 32-bit value representing the Unicode code point.
Return an unsigned integer representing the Unicode code point corresponding to the given ASCII character.
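Since the ASCII codes 0 through 127 coincide with the first 128 Unicode code points, the conversion from ASCII amounts to a widening cast. A minimal sketch, in which `ascii_to_code_point` is a hypothetical name and not the actual macro:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: widen an ASCII character to its Unicode code point. */
static uint32_t
ascii_to_code_point (char ascii)
{
  unsigned char byte = (unsigned char)ascii;
  assert(byte <= 0x7F);  /* only 7-bit ASCII maps directly */
  return (uint32_t)byte;
}
```

The result can then be tagged like any other code point to obtain a Scheme character.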