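A Scheme character has two representations: a standalone machine word of type ikptr, and a 32-bit integer of type ikchar when the character is stored in a string.
The least significant 32 bits of the two representations are equal. See unicode for details on Unicode.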
Let's say that machine words are 32-bit values, which means the word size is 4 bytes; then the representation of a character is:
    |    Unicode code point    |char tag|
    |--------|--------|--------|--------|
      byte3    byte2    byte1    byte0
The least significant byte is set to #x0F: this “tags” the
machine words which embed characters. On 64-bit machines, the layout
is:
               Unused           |Unicode code point  |char tag
    |...........................|....................|......|
    |------|------|------|------|------|------|------|------|
     byte7  byte6  byte5  byte4  byte3  byte2  byte1  byte0
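As a worked illustration of the 32-bit layout, the sketch below builds a tagged word and picks out its individual bytes. The names `LAYOUT_CHAR_TAG`, `LAYOUT_CHAR_SHIFT`, `layout_tagged_word` and `layout_byte` are hypothetical, and the values 0x0F and 8 are assumptions inferred from the diagrams (tag in byte 0, code point starting at byte 1); they are not copied from the actual header files.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values, inferred from the layout: the tag fills byte 0
   and the code point starts at byte 1. */
#define LAYOUT_CHAR_TAG    0x0Fu
#define LAYOUT_CHAR_SHIFT  8

/* Build the 32-bit tagged word for a Unicode code point. */
uint32_t layout_tagged_word(uint32_t code_point)
{
  assert(code_point <= 0x10FFFFu);  /* largest Unicode code point */
  return (code_point << LAYOUT_CHAR_SHIFT) | LAYOUT_CHAR_TAG;
}

/* Return byte number idx of a word; byte 0 is the least significant. */
uint8_t layout_byte(uint32_t word, unsigned idx)
{
  assert(idx < 4u);
  return (uint8_t)(word >> (8u * idx));
}
```

Under these assumptions, the code point U+03BB yields the word 0x0003BB0F: byte0 is the tag 0x0F, while byte1, byte2 and byte3 hold 0xBB, 0x03 and 0x00.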
At the Scheme level: standalone characters are moved around as ikptr
values, but when characters are stored in a string the ikptr value is
converted to a 32-bit integer of type ikchar.
Standalone characters are encoded into ikptr values as follows:
     unsigned long   unicode_code_point = the_code_point;
     ikptr           s_char;

     s_char = (unicode_code_point << char_shift) | char_tag;

decoded to unsigned long values as follows:
     ikptr           s_char = the_character;
     unsigned long   unicode_code_point;

     unicode_code_point = s_char >> char_shift;

and identified as follows:
     ikptr   X = the_value;

     if (char_tag == (char_mask & X))
       it_is_a_character();
     else
       it_is_not();
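The three fragments above can be collected into small helper functions, as in the following sketch. Everything prefixed with `demo_` or `DEMO_` is hypothetical: `demo_ikptr` stands in for ikptr as an `unsigned long`, and the values 0x0F, 0xFF and 8 for the tag, mask and shift are assumptions based on the byte layout described earlier, not the definitions from the real headers.

```c
#include <assert.h>

/* Hypothetical stand-ins for ikptr, char_tag, char_mask, char_shift. */
typedef unsigned long demo_ikptr;
#define DEMO_CHAR_TAG    0x0FUL   /* assumed: least significant byte is #x0F */
#define DEMO_CHAR_MASK   0xFFUL   /* assumed: the tag occupies one full byte */
#define DEMO_CHAR_SHIFT  8        /* assumed: code point starts at byte 1 */

/* Encode a Unicode code point as a tagged machine word. */
demo_ikptr demo_char_encode(unsigned long code_point)
{
  assert(code_point <= 0x10FFFFUL);  /* largest Unicode code point */
  return (code_point << DEMO_CHAR_SHIFT) | DEMO_CHAR_TAG;
}

/* Decode a tagged machine word back to the code point. */
unsigned long demo_char_decode(demo_ikptr s_char)
{
  return s_char >> DEMO_CHAR_SHIFT;
}

/* Return non-zero if the machine word carries the character tag. */
int demo_is_char(demo_ikptr x)
{
  return DEMO_CHAR_TAG == (DEMO_CHAR_MASK & x);
}
```

With these assumed values, `demo_char_encode(0x41)` gives 0x410F, decoding it gives back 0x41, and a word such as 0x4100, whose low byte is not the tag, is rejected by `demo_is_char`.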
Characters from a Scheme string are decoded from ikchar to
unsigned long as follows:
     ikchar          ch = the_32bit_character;
     unsigned long   unicode_code_point;

     /* shift out the tag bits of the string character */
     unicode_code_point = ch >> char_shift;

and encoded from unsigned long to ikchar as follows:
     unsigned long   unicode_code_point = the_code_point;
     ikchar          ch;

     ch = (ikchar)((unicode_code_point << char_shift) | char_tag);
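A minimal sketch of how the string conversions might be used to fill a buffer of tagged characters follows. The names `str_ikchar`, `STR_CHAR_TAG`, `STR_CHAR_SHIFT`, `str_char_encode`, `str_char_decode` and `str_fill` are hypothetical, the tag and shift values are assumed from the byte layout, and the plain array only stands in for string storage; the real string object layout is not described here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; the tag and shift values are assumptions. */
typedef uint32_t str_ikchar;
#define STR_CHAR_TAG    0x0Fu
#define STR_CHAR_SHIFT  8

/* Encode a code point as a 32-bit tagged string character. */
str_ikchar str_char_encode(unsigned long code_point)
{
  assert(code_point <= 0x10FFFFUL);  /* largest Unicode code point */
  return (str_ikchar)((code_point << STR_CHAR_SHIFT) | STR_CHAR_TAG);
}

/* Decode a 32-bit tagged string character to its code point. */
unsigned long str_char_decode(str_ikchar ch)
{
  return (unsigned long)(ch >> STR_CHAR_SHIFT);
}

/* Store len code points into dst as tagged string characters. */
void str_fill(str_ikchar *dst, const unsigned long *code_points, size_t len)
{
  for (size_t i = 0; i < len; ++i)
    dst[i] = str_char_encode(code_points[i]);
}
```

Because a tagged code point needs at most 29 bits under these assumptions, the cast to a 32-bit type loses nothing, which is consistent with the two representations sharing their least significant 32 bits.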
ikchar
     An alias for uint32_t, used to store a Unicode code point tagged as a character.
char_mask, char_tag
     Integer values used to tag and recognise ikptr values representing characters. char_mask isolates the tag bits from an ikptr and char_tag represents the tag bits.
char_shift
     Integer value representing the number of bits we must shift left to turn a C language long into a machine word ready to be tagged as a Scheme character.