

13.10 Character objects

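A Scheme character has two representations: a standalone character is a machine word of type ikptr_t, while a character stored in a string is a 32-bit integer of type ikchar_t. The least significant 32 bits of the two representations are equal (see unicode).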

Let’s say that machine words are 32-bit values, which means the word size is 4 bytes; then the representation of a character is:

|    Unicode code point    | char tag

|--------|--------|--------|--------|
  byte3    byte2    byte1    byte0

The least significant byte is set to #x0F; this tag marks the machine words that embed characters. On 64-bit machines, the layout is:

        Unused              |Unicode code point  |char tag
|...........................|....................|......|

|------|------|------|------|------|------|------|------|
 byte7  byte6  byte5  byte4  byte3  byte2  byte1  byte0

At the Scheme level: standalone characters are moved around as ikptr_t values, but when characters are stored in a string the ikptr_t value is converted to a 32-bit integer of type ikchar_t.
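
The relation between the two representations can be checked with a small C sketch. The typedefs and constants below are stand-ins, not the definitions from the Vicare headers: word-sized types are assumed to behave like uintptr_t, and the tag and shift values are inferred from the layout described above. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for illustration; the real definitions live in the Vicare
   headers.  Word-sized types are assumed to behave like uintptr_t, and
   the tag and shift values are inferred from the layout above. */
typedef uintptr_t ikptr_t;
typedef uintptr_t ikuword_t;
typedef uint32_t  ikchar_t;

#define char_tag    0x0F
#define char_shift  8

/* Representation of a standalone character (hypothetical helper). */
static ikptr_t tag_standalone(ikuword_t code_point)
{
  return (code_point << char_shift) | char_tag;
}

/* Representation of a character stored in a string (hypothetical helper). */
static ikchar_t tag_for_string(ikuword_t code_point)
{
  return (ikchar_t)((code_point << char_shift) | char_tag);
}

/* U+03BB (Greek small letter lambda) becomes #x0003BB0F in both forms. */
static void check_low_32_bits_agree(void)
{
  ikptr_t  s_char = tag_standalone(0x03BB);
  ikchar_t ch     = tag_for_string(0x03BB);

  assert(ch == 0x0003BB0F);
  assert((uint32_t)s_char == ch);
}
```

Since the largest Unicode code point, #x10FFFF, shifted left by 8 bits still fits in 32 bits, truncating the machine word to 32 bits loses no information.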

Basic operations

Standalone characters are encoded into ikptr_t values as follows:

ikuword_t  unicode_code_point = the_code_point;
ikptr_t    s_char;

s_char = (unicode_code_point << char_shift) | char_tag;

decoded to ikuword_t values as follows:

ikptr_t    s_char = the_character;
ikuword_t  unicode_code_point;

unicode_code_point = s_char >> char_shift;

and identified as follows:

ikptr_t   X = the_value;

if (char_tag == (char_mask & X))
  it_is_a_character();
else
  it_is_not();
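
Put together, the three fragments above amount to something like the following sketch. The function names are hypothetical, and the typedefs and the values of char_tag, char_mask and char_shift are assumptions derived from the byte layout shown earlier rather than copied from the actual headers.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed stand-ins: machine words modelled as uintptr_t, constants
   inferred from "least significant byte set to #x0F". */
typedef uintptr_t ikptr_t;
typedef uintptr_t ikuword_t;

#define char_tag    0x0F
#define char_mask   0xFF
#define char_shift  8

/* Encode a Unicode code point as a standalone character. */
static ikptr_t encode_char(ikuword_t code_point)
{
  return (code_point << char_shift) | char_tag;
}

/* Recover the code point from a standalone character. */
static ikuword_t decode_char(ikptr_t s_char)
{
  return s_char >> char_shift;
}

/* Non-zero when the low byte of X carries the character tag. */
static int is_char(ikptr_t X)
{
  return char_tag == (char_mask & X);
}

static void check_round_trip(void)
{
  ikptr_t s_char = encode_char(0x41);       /* 'A' is tagged as #x410F */

  assert(s_char == 0x410F);
  assert(is_char(s_char));
  assert(decode_char(s_char) == 0x41);
  assert(!is_char((ikptr_t)0x4100));        /* low byte is not the tag */
}
```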

Characters from a Scheme string are decoded from ikchar_t to ikuword_t as follows:

ikchar_t   ch = the_32bit_character;
ikuword_t  unicode_code_point;

unicode_code_point = ch >> char_shift;

and encoded from ikuword_t to ikchar_t as follows:

ikuword_t  unicode_code_point = the_code_point;
ikchar_t   ch;

ch = (ikchar_t)((unicode_code_point << char_shift) | char_tag);

Type Definition: ikchar_t

An alias for uint32_t used to store a Unicode code point tagged as a character.
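
As an illustration of how ikchar_t values might be filled in and read back, the sketch below tags the bytes of an ASCII C string into a plain array. The array only stands in for the character data of a string; the actual layout of Scheme string objects is not modelled here. The helper names are hypothetical, and the typedefs and constants are assumptions consistent with the layout described in this section.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed stand-ins for the definitions in the Vicare headers. */
typedef uintptr_t ikuword_t;
typedef uint32_t  ikchar_t;

#define char_tag    0x0F
#define char_shift  8

/* Tag the first n bytes of an ASCII C string into dst (hypothetical helper). */
static void tag_ascii(const char *src, ikchar_t *dst, size_t n)
{
  for (size_t i = 0; i < n; ++i) {
    ikuword_t code_point = (ikuword_t)(unsigned char)src[i];
    dst[i] = (ikchar_t)((code_point << char_shift) | char_tag);
  }
}

/* Untag the i-th element and return its Unicode code point. */
static ikuword_t code_point_at(const ikchar_t *chars, size_t i)
{
  return (ikuword_t)(chars[i] >> char_shift);
}

static void check_string_chars(void)
{
  ikchar_t chars[2];

  tag_ascii("hi", chars, 2);
  assert(chars[0] == 0x680F);               /* 'h' is #x68 */
  assert(code_point_at(chars, 1) == 0x69);  /* 'i' is #x69 */
}
```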

Preprocessor Symbol: char_mask
Preprocessor Symbol: char_tag

Integer values used to tag and recognise ikptr_t values representing characters. char_mask isolates the tag bits from an ikptr_t and char_tag represents the tag bits.

Preprocessor Symbol: char_shift

Integer value representing the number of bits we must shift left to turn a C language long into a machine word ready to be tagged as a Scheme character.

Convenience preprocessor macros

Preprocessor Macro: int IK_IS_CHAR (ikptr_t X)

Evaluate to true if X is a Scheme character.

Preprocessor Macro: ikptr_t IK_CHAR_FROM_INTEGER (ikuword_t X)
Preprocessor Macro: ikuword_t IK_CHAR_TO_INTEGER (ikptr_t X)

IK_CHAR_FROM_INTEGER converts an ikuword_t value representing a Unicode code point into a Scheme character; IK_CHAR_TO_INTEGER performs the inverse conversion.

Preprocessor Macro: ikchar_t IK_CHAR32_FROM_INTEGER (ikuword_t X)

Convert an ikuword_t value representing the Unicode code point into a 32-bit integer representing a Scheme character to be stored into a string.

Preprocessor Macro: uint32_t IK_CHAR32_TO_INTEGER (ikchar_t ch)

Given a 32-bit value representing a Scheme character: untag it and return a 32-bit value representing the Unicode code point.

Preprocessor Macro: ikuword_t IK_UNICODE_FROM_ASCII (char ch)

Return an unsigned integer representing the Unicode code point corresponding to the given ASCII character.
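
To show how these macros fit together, here is a sketch with possible definitions written from the descriptions above. The macro bodies are assumptions, not the contents of the real header, which may differ; the typedefs and the tag, mask and shift values are likewise inferred from this section.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed stand-ins for the definitions in the Vicare headers. */
typedef uintptr_t ikptr_t;
typedef uintptr_t ikuword_t;
typedef uint32_t  ikchar_t;

#define char_tag    0x0F
#define char_mask   0xFF
#define char_shift  8

/* Possible macro bodies, reconstructed from the documented behaviour. */
#define IK_IS_CHAR(X) \
  (char_tag == (char_mask & (ikptr_t)(X)))
#define IK_CHAR_FROM_INTEGER(X) \
  ((ikptr_t)((((ikuword_t)(X)) << char_shift) | char_tag))
#define IK_CHAR_TO_INTEGER(X) \
  ((ikuword_t)(((ikptr_t)(X)) >> char_shift))
#define IK_CHAR32_FROM_INTEGER(X) \
  ((ikchar_t)((((ikuword_t)(X)) << char_shift) | char_tag))
#define IK_CHAR32_TO_INTEGER(CH) \
  ((uint32_t)(((ikchar_t)(CH)) >> char_shift))
/* For ASCII the code point equals the character's numeric value. */
#define IK_UNICODE_FROM_ASCII(CH) \
  ((ikuword_t)(CH))

static void check_macros(void)
{
  ikptr_t  s_char = IK_CHAR_FROM_INTEGER(IK_UNICODE_FROM_ASCII('A'));
  ikchar_t ch     = IK_CHAR32_FROM_INTEGER(0x03BB);

  assert(IK_IS_CHAR(s_char));
  assert(IK_CHAR_TO_INTEGER(s_char) == 0x41);
  assert(IK_CHAR32_TO_INTEGER(ch) == 0x03BB);
}
```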

