

13.11 String objects

Strings are variable–length blocks of memory referenced by machine words tagged as strings. Characters are stored as 32-bit tagged words, so the data area is not in plain UTF-32 format. The first word in the memory block is a fixnum representing the number of characters in the data area; a string can hold at most as many characters as the value returned by greatest-fixnum.

|------------------------|-------------| reference to string
      heap pointer         string tag

|------------------------|-------------| string first word
     number of chars       fixnum tag

All the remaining space in the memory block is filled with 32-bit unsigned integers whose least significant bits are set to the character tag and whose most significant bits are set to the character’s Unicode code point:

 len ch0 ch1 ch2 ch3 ch4 ch5 ch6 ch7
|---|---|---|---|---|---|---|---|---| string memory block

Character indexes are zero–based.
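
The character encoding described above can be sketched in isolation as follows. This is only an illustration: the names prefixed with sketch_ and SKETCH_ are hypothetical, and the tag, mask and shift values are assumptions chosen for the example; the real values are defined by Vicare's internal header and may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values, for illustration only: the real definitions live in
   Vicare's internal header and may differ. */
#define SKETCH_CHAR_TAG    0x0Fu  /* hypothetical tag stored in the low bits */
#define SKETCH_CHAR_MASK   0xFFu  /* hypothetical mask isolating the tag */
#define SKETCH_CHAR_SHIFT  8      /* hypothetical position of the code point */

typedef uint32_t sketch_char_t;   /* stand-in for ikchar_t */

/* Build a tagged 32-bit character from a Unicode code point. */
sketch_char_t
sketch_char_from_code_point (uint32_t code_point)
{
  assert(code_point <= 0x10FFFFu);
  return (code_point << SKETCH_CHAR_SHIFT) | SKETCH_CHAR_TAG;
}

/* Extract the code point from a tagged 32-bit character. */
uint32_t
sketch_char_to_code_point (sketch_char_t ch)
{
  assert((ch & SKETCH_CHAR_MASK) == SKETCH_CHAR_TAG);
  return ch >> SKETCH_CHAR_SHIFT;
}
```

Under these assumptions the largest code point, 0x10FFFF, still fits in the bits above the tag, which is why a 32-bit slot per character is enough.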

Basic operations

String objects are allocated on the heap. To perform the allocation we compute the size of the data area, add room for the metadata, and finally round the result up to the aligned block size:

ikpcb_t *  pcb            = ik_the_pcb();
ikuword_t  length         = the_number_of_chars;
ikuword_t  requested_size = sizeof(ikchar_t) * length;
ikuword_t  block_size     = disp_string_data + requested_size;
ikuword_t  align_size     = IK_ALIGN(block_size);
ikptr_t    s_str = ik_safe_alloc(pcb, align_size) | string_tag;

ik_safe_alloc() returns an ikptr_t value representing the aligned pointer, whose 3 least significant bits are set to zero; we add to it the string tag (an integer value fitting in 3 bits), which allows us to recognise strings among all the other built in objects.
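
The alignment and tagging arithmetic can be tried out on plain integers. The sketch below is not Vicare's code: the sketch_ names are hypothetical, the tag and mask values are assumptions, and the 8-byte granularity is chosen only because it is the smallest one that frees 3 bits; the granularity used by the real IK_ALIGN() may be larger.

```c
#include <assert.h>
#include <stdint.h>

typedef uintptr_t sketch_ptr_t;            /* stand-in for ikptr_t */

/* Assumed values, for illustration only. */
#define SKETCH_ALIGNMENT    ((sketch_ptr_t)8)  /* hypothetical granularity */
#define SKETCH_STRING_TAG   ((sketch_ptr_t)6)  /* hypothetical string tag */
#define SKETCH_STRING_MASK  ((sketch_ptr_t)7)  /* isolates the 3 tag bits */

/* Round a size up to the next multiple of the alignment. */
sketch_ptr_t
sketch_align (sketch_ptr_t size)
{
  return (size + SKETCH_ALIGNMENT - 1) & ~(SKETCH_ALIGNMENT - 1);
}

/* Turn an aligned pointer into a tagged string reference. */
sketch_ptr_t
sketch_tag_string (sketch_ptr_t aligned_pointer)
{
  assert((aligned_pointer & SKETCH_STRING_MASK) == 0);
  return aligned_pointer | SKETCH_STRING_TAG;
}

/* Return non-zero if the tag bits of obj are the string tag. */
int
sketch_is_string (sketch_ptr_t obj)
{
  return (obj & SKETCH_STRING_MASK) == SKETCH_STRING_TAG;
}

/* Recover the aligned pointer from a tagged string reference. */
sketch_ptr_t
sketch_untag_string (sketch_ptr_t s_str)
{
  assert(sketch_is_string(s_str));
  return s_str - SKETCH_STRING_TAG;
}
```

Because the low bits of an aligned pointer are zero, OR-ing the tag in loses no information: subtracting the tag gives back the original pointer.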

We have to explicitly store the string length in the memory block as a fixnum, so usually a full allocation looks like this:

ikptr_t
ika_string_alloc (ikpcb_t * pcb, ikuword_t number_of_chars)
{
  /* Room for the length word plus the characters, rounded up. */
  ikuword_t align_size = IK_ALIGN(disp_string_data +
    number_of_chars * sizeof(ikchar_t));
  ikptr_t   s_str = ik_safe_alloc(pcb, align_size) | string_tag;
  /* Store the length as a fixnum in the first word. */
  IK_STRING_LENGTH_FX(s_str) = IK_FIX(number_of_chars);
  return s_str;
}

This leaves the characters uninitialised, which is not a problem from the garbage collector's point of view. Strings are allocated in the nursery of the Scheme heap, which is not a garbage collection root; this means we can also leave uninitialised the padding words that ik_safe_alloc() allocates to round the block size up to the aligned size.

To fill a string holding 3 characters we do:

ikptr_t     s_str = the_string;
ikchar_t *  ch    = IK_STRING_DATA_IKCHARP(s_str);

ch[0] = IK_CHAR32_FROM_INTEGER(10);
ch[1] = IK_CHAR32_FROM_INTEGER(20);
ch[2] = IK_CHAR32_FROM_INTEGER(30);

to retrieve the character at index 2 we do:

ikuword_t   index  = 2;
ikptr_t     s_str  = the_string;
ikchar_t *  ch     = IK_STRING_DATA_IKCHARP(s_str);
ikptr_t     s_char = (ikptr_t)ch[index];

and to retrieve the string length:

ikptr_t    s_str    = the_string;
ikptr_t    s_length = IK_STRING_LENGTH_FX(s_str);
ikuword_t  length   = IK_UNFIX(s_length);

or, shorter:

ikptr_t    s_str  = the_string;
ikuword_t  length = IK_STRING_LENGTH(s_str);

Preprocessor Symbol: string_char_size

Integer value representing the number of bytes in a Scheme character stored in a Scheme string.

Preprocessor Symbol: string_mask
Preprocessor Symbol: string_tag

Integer values used to tag and recognise ikptr_t values representing string references. string_mask isolates the tag bits from an ikptr_t and string_tag represents the tag bits.

Preprocessor Symbol: disp_string_length

Displacement of the length field. The number of bytes to add to an untagged pointer to a string to get the pointer to the first byte in the word holding the string length as a fixnum.

Preprocessor Symbol: disp_string_data

Displacement of the data area. The number of bytes to add to an untagged pointer to a string to get the pointer to the first byte in the data area.

Preprocessor Symbol: off_string_length

An integer to add to a tagged ikptr_t string reference to retrieve the pointer to the first byte of the word holding the string length as a fixnum.

Preprocessor Symbol: off_string_data

An integer to add to a tagged ikptr_t string reference to retrieve the pointer to the first byte of the data area.
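
The relation between the disp_ and off_ symbols can be checked with plain integer arithmetic. In the sketch below every SKETCH_ and sketch_ name is hypothetical and the tag value is an assumption; the only facts taken from the text are that the length occupies the first word and that the data area follows it. The offsets are obtained by subtracting the tag from the displacements, so that one addition both removes the tag and reaches the field.

```c
#include <assert.h>
#include <stdint.h>

typedef uintptr_t sketch_ptr_t;   /* stand-in for ikptr_t */

/* Assumed tag value, for illustration only. */
#define SKETCH_STRING_TAG          ((intptr_t)6)

/* Displacements from the untagged pointer: the length is the first
   word, the data area starts right after it. */
#define SKETCH_DISP_STRING_LENGTH  ((intptr_t)0)
#define SKETCH_DISP_STRING_DATA    ((intptr_t)sizeof(sketch_ptr_t))

/* Offsets from the tagged reference: displacement minus tag. */
#define SKETCH_OFF_STRING_LENGTH   (SKETCH_DISP_STRING_LENGTH - SKETCH_STRING_TAG)
#define SKETCH_OFF_STRING_DATA     (SKETCH_DISP_STRING_DATA   - SKETCH_STRING_TAG)

/* Address of the data area computed from the untagged pointer. */
sketch_ptr_t
sketch_data_address_from_untagged (sketch_ptr_t untagged)
{
  return untagged + (sketch_ptr_t)SKETCH_DISP_STRING_DATA;
}

/* Address of the data area computed from the tagged reference. */
sketch_ptr_t
sketch_data_address_from_tagged (sketch_ptr_t s_str)
{
  return (sketch_ptr_t)((intptr_t)s_str + SKETCH_OFF_STRING_DATA);
}

/* Address of the length word computed from the tagged reference. */
sketch_ptr_t
sketch_length_address_from_tagged (sketch_ptr_t s_str)
{
  return (sketch_ptr_t)((intptr_t)s_str + SKETCH_OFF_STRING_LENGTH);
}
```

Both routes reach the same address; the off_ form simply saves the explicit untagging step.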

Convenience preprocessor macros

Preprocessor Macro: int IK_IS_STRING (ikptr_t obj)

Return true if obj is a reference to a string object.

Preprocessor Macro: ikptr_t IK_STRING_LENGTH_FX (ikptr_t str)

Return a fixnum representing the number of characters in the string str.

Preprocessor Macro: ikuword_t IK_STRING_LENGTH (ikptr_t str)

Return an integer representing the number of characters in the string str.
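
The difference between the two length macros is only the fixnum encoding: one yields the raw fixnum word, the other the decoded integer. A minimal sketch of that relation follows, assuming a fixnum is the integer shifted left by a fixed number of bits with zeros in the low bits; the shift value and all the sketch_ names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef uintptr_t sketch_ptr_t;    /* stand-in for ikptr_t */
typedef uintptr_t sketch_uword_t;  /* stand-in for ikuword_t */

/* Hypothetical shift: the real fixnum encoding is defined by Vicare. */
#define SKETCH_FX_SHIFT  3

/* Encode a non-negative integer as a fixnum (the role played by IK_FIX()). */
sketch_ptr_t
sketch_fix (sketch_uword_t n)
{
  /* The value must survive the shift without losing high bits. */
  assert(n <= (UINTPTR_MAX >> SKETCH_FX_SHIFT));
  return n << SKETCH_FX_SHIFT;
}

/* Decode a non-negative fixnum (the role played by IK_UNFIX()). */
sketch_uword_t
sketch_unfix (sketch_ptr_t fx)
{
  /* The low bits of a fixnum are zero under this sketch's encoding. */
  assert((fx & (((sketch_ptr_t)1 << SKETCH_FX_SHIFT) - 1)) == 0);
  return fx >> SKETCH_FX_SHIFT;
}

/* Read the number of characters from the first word of a string
   memory block, given the untagged pointer to the block. */
sketch_uword_t
sketch_string_length (const sketch_ptr_t * block)
{
  return sketch_unfix(block[0]);
}
```

This sketch handles only non-negative values, which is all a string length needs.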

Preprocessor Macro: ikchar_t IK_CHAR32 (ikptr_t str, ikuword_t idx)

Evaluate to the 32-bit character representation at index idx in the string str. A use of this macro can appear both as an operand and as the left-hand side of an assignment; example:

ikuword_t  idx   = the_index;
ikptr_t    s_str = the_string;
ikchar_t   ch;

IK_CHAR32(s_str, idx) = IK_CHAR32_FROM_INTEGER(10);
ch = IK_CHAR32(s_str, idx);

Preprocessor Macro: void * IK_STRING_DATA_VOIDP (ikptr_t str)

Given a tagged reference to string object str, return a pointer to the first byte in the data area.

Preprocessor Macro: ikchar_t * IK_STRING_DATA_IKCHARP (ikptr_t str)

Given a tagged reference to string object str, return a pointer to the first Scheme character in the data area.

Operations on strings

Function: ikptr_t ika_string_alloc (ikpcb_t * pcb, ikuword_t number_of_chars)
Function: ikptr_t iku_string_alloc (ikpcb_t * pcb, ikuword_t number_of_chars)

Allocate and return a new string object capable of holding the specified number of characters. The length field is initialised; the characters in the data area are left uninitialised.

Function: ikptr_t ika_string_from_cstring (ikpcb_t * pcb, const char * cstr)
Function: ikptr_t iku_string_from_cstring (ikpcb_t * pcb, const char * cstr)

Allocate a new string object and fill it with the ASCII characters from the given ASCIIZ string.
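
The filling step of such a conversion might look like the sketch below, which works on a caller-supplied array rather than on a real string object. It is not the implementation of the functions above: sketch_fill_from_cstring() is a hypothetical name, and the character tag and shift are assumed values used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t sketch_char_t;   /* stand-in for ikchar_t */

/* Assumed values, for illustration only. */
#define SKETCH_CHAR_TAG    0x0Fu
#define SKETCH_CHAR_SHIFT  8

/* Store the ASCII characters of cstr, as tagged 32-bit characters,
   in the first slots of data; capacity is the number of slots
   available.  Return the number of characters stored. */
size_t
sketch_fill_from_cstring (sketch_char_t * data, size_t capacity,
                          const char * cstr)
{
  size_t len = strlen(cstr);
  assert(len <= capacity);
  for (size_t i = 0; i < len; ++i) {
    unsigned char byte = (unsigned char)cstr[i];
    assert(byte < 128);   /* only ASCII input is handled here */
    data[i] = ((sketch_char_t)byte << SKETCH_CHAR_SHIFT) | SKETCH_CHAR_TAG;
  }
  return len;
}
```

The terminating zero byte of the C string is not copied: the length is kept in the first word of the string block, so no terminator is needed.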

Function: ikptr_t iku_string_to_symbol (ikpcb_t * pcb, ikptr_t str)

Return a Scheme symbol object whose name is the Scheme string str. This function is the same as iku_symbol_from_string().

