

12.11 String objects

Strings are variable-length blocks of memory referenced by machine words tagged as strings. Strings are not stored in memory in plain UTF-32 format: each character occupies 32 bits, but its low bits hold a character tag (see below). The first word in the memory block is a fixnum representing the number of characters in the data area; a string can hold at most as many characters as the value returned by greatest-fixnum.

     |------------------------|-------------| reference to string
           heap pointer         string tag
     
     |------------------------|-------------| string first word
          number of chars       fixnum tag

All the remaining space in the memory block is filled with 32-bit unsigned integers whose least significant bits are set to the character tag and whose most significant bits are set to the character's Unicode code point:

      tag ch0 ch1 ch2 ch3 ch4 ch5 ch6 ch7
     |---|---|---|---|---|---|---|---|---| string memory block
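To make the character layout concrete, here is a minimal C sketch of packing a Unicode code point into a tagged 32-bit character word and unpacking it again. The tag value 0x0F and the 8-bit shift are assumptions chosen for illustration, not values read from the Vicare headers, and the demo_ names are hypothetical stand-ins; in real code IK_CHAR32_FROM_INTEGER() appears to play the packing role.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration only; the real ones live in the
   Vicare headers and may differ. */
#define DEMO_CHAR_TAG   0x0Fu   /* low byte of every character word */
#define DEMO_CHAR_SHIFT 8       /* code point starts at bit 8 */

typedef uint32_t demo_ikchar;   /* hypothetical stand-in for ikchar */

/* Pack a Unicode code point into a tagged 32-bit character word. */
static demo_ikchar demo_char32_from_code_point (uint32_t cp)
{
  assert(cp <= 0x10FFFFu);      /* 0x10FFFF << 8 still fits in 32 bits */
  return (demo_ikchar)((cp << DEMO_CHAR_SHIFT) | DEMO_CHAR_TAG);
}

/* Drop the tag bits to get the code point back. */
static uint32_t demo_char32_to_code_point (demo_ikchar ch)
{
  return ch >> DEMO_CHAR_SHIFT;
}
```

Under these assumptions the character "A" (code point 0x41) is stored as the word 0x410F.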

Character indexes are zero-based.

Basic operations

String objects are allocated on the heap; to perform the allocation we compute the total size of the data area, add room for the metadata (the length word) and finally compute the aligned block size:

     ikpcb * pcb            = ik_the_pcb();
     long    length         = the_number_of_chars;
     long    requested_size = sizeof(ikchar) * length;
     long    block_size     = disp_string_data + requested_size;
     long    align_size     = IK_ALIGN(block_size);
     ikptr   str = ik_safe_alloc(pcb, align_size) | string_tag;

ik_safe_alloc() returns an ikptr value representing the aligned pointer, with its 3 least significant bits set to zero; we combine it with the string tag (an integer value fitting in 3 bits), which allows strings to be recognised among all the other built-in objects. Since those bits are zero, OR-ing in the tag gives the same result as adding it.
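A rough sketch of the size and tagging arithmetic, assuming a 64-bit build where disp_string_data is one 8-byte word, IK_ALIGN() rounds up to 16 bytes (two words) and string_tag is 6. These values are illustrative assumptions, and the demo_ names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed 64-bit values for illustration only. */
#define DEMO_WORDSIZE         8
#define DEMO_DISP_STRING_DATA DEMO_WORDSIZE       /* data follows the length word */
#define DEMO_ALIGN_SIZE       (2 * DEMO_WORDSIZE) /* assumed block alignment */
#define DEMO_STRING_TAG       6u
#define DEMO_STRING_MASK      7u

/* Round a byte count up to the next multiple of DEMO_ALIGN_SIZE,
   in the spirit of IK_ALIGN(). */
static size_t demo_align (size_t n)
{
  return (n + DEMO_ALIGN_SIZE - 1) & ~(size_t)(DEMO_ALIGN_SIZE - 1);
}

/* Aligned block size for a string of LENGTH 32-bit characters. */
static size_t demo_string_block_size (size_t length)
{
  return demo_align(DEMO_DISP_STRING_DATA + length * sizeof(uint32_t));
}

/* Tag an aligned address; its 3 low bits must be zero. */
static uintptr_t demo_tag_string (uintptr_t aligned_address)
{
  assert((aligned_address & DEMO_STRING_MASK) == 0);
  return aligned_address | DEMO_STRING_TAG;
}
```

For example, a 3-character string needs 8 + 3 * 4 = 20 bytes, which rounds up to a 32-byte block under these assumptions.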

We have to explicitly store the string length in the memory block as a fixnum, so usually a full allocation looks like this:

     ikptr
     ik_string_alloc (ikpcb * pcb, long number_of_chars)
     {
       long  align_size;
       ikptr str;
       align_size = IK_ALIGN(disp_string_data
                             + number_of_chars * sizeof(ikchar));
       str        = ik_safe_alloc(pcb, align_size) | string_tag;
       IK_REF(str, off_string_length) = IK_FIX(number_of_chars);
       return str;
     }
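The same allocation can be mimicked outside the Scheme heap to see where the fixnum length ends up. This is only a sketch: aligned_alloc() replaces ik_safe_alloc(), there is no garbage collector, and the fixnum shift of 3, string tag of 6 and 16-byte alignment are assumed values; all demo_ names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed 64-bit values for illustration only. */
#define DEMO_FX_SHIFT           3    /* fixnum = integer << 3, tag bits zero */
#define DEMO_STRING_TAG         6u
#define DEMO_DISP_STRING_LENGTH 0
#define DEMO_DISP_STRING_DATA   8
#define DEMO_ALIGN_SIZE         16

typedef uintptr_t demo_ikptr;   /* hypothetical stand-in for ikptr */

#define DEMO_FIX(n)   ((demo_ikptr)(n) << DEMO_FX_SHIFT)
#define DEMO_UNFIX(x) ((long)((intptr_t)(x) >> DEMO_FX_SHIFT))

/* Mimic ik_string_alloc() with aligned_alloc() instead of the Scheme
   heap; release with free((void *)(str - DEMO_STRING_TAG)). */
static demo_ikptr demo_string_alloc (long number_of_chars)
{
  size_t size = DEMO_DISP_STRING_DATA
                + (size_t)number_of_chars * sizeof(uint32_t);
  /* Round up as IK_ALIGN() does; aligned_alloc() wants a multiple. */
  size = (size + DEMO_ALIGN_SIZE - 1) & ~(size_t)(DEMO_ALIGN_SIZE - 1);
  void * block = aligned_alloc(DEMO_ALIGN_SIZE, size);
  assert(block != NULL);
  demo_ikptr str = (demo_ikptr)block | DEMO_STRING_TAG;
  /* Store the length as a fixnum in the first word of the block. */
  *(demo_ikptr *)(str - DEMO_STRING_TAG + DEMO_DISP_STRING_LENGTH)
    = DEMO_FIX(number_of_chars);
  return str;
}
```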

To fill a string of 3 characters we do:

     ikptr     s_str = the_string;
     ikchar *  ch    = (ikchar*)(s_str + off_string_data);
     
     ch[0] = IK_CHAR32_FROM_INTEGER(10);
     ch[1] = IK_CHAR32_FROM_INTEGER(20);
     ch[2] = IK_CHAR32_FROM_INTEGER(30);

to retrieve the character at index 2 we do:

     long      index  = 2;
     ikptr     s_str  = the_string;
     ikchar *  ch     = (ikchar*)(s_str + off_string_data);
     ikptr     s_char = (ikptr)ch[index];

and to retrieve the string length:

     ikptr  s_str    = the_string;
     ikptr  s_length = IK_REF(s_str, off_string_length);
     long   length   = IK_UNFIX(s_length);
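The three snippets can be combined into one self-contained sketch that builds the string "abc" by hand, then reads back its length and a character through the tagged reference. It assumes a string tag of 6, a fixnum shift of 3, a character tag of 0x0F with an 8-bit code point shift, and a data area 8 bytes into the block; these values and the demo_ names are hypothetical, and calloc() stands in for the Scheme heap.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed 64-bit values for illustration only. */
#define DEMO_STRING_TAG      6
#define DEMO_FX_SHIFT        3
#define DEMO_CHAR_TAG        0x0Fu
#define DEMO_CHAR_SHIFT      8
#define DEMO_OFF_STRING_LEN  (0 - DEMO_STRING_TAG)  /* disp 0 minus tag */
#define DEMO_OFF_STRING_DATA (8 - DEMO_STRING_TAG)  /* disp 8 minus tag */

typedef uintptr_t demo_ikptr;
typedef uint32_t  demo_ikchar;

/* Build "abc" in a heap block; free it with
   free((void *)(s_str - DEMO_STRING_TAG)). */
static demo_ikptr demo_make_abc (void)
{
  void * block = calloc(1, 8 + 3 * sizeof(demo_ikchar));
  assert(block != NULL && ((uintptr_t)block & 7u) == 0);
  demo_ikptr    s_str = (demo_ikptr)block | DEMO_STRING_TAG;
  demo_ikchar * ch    = (demo_ikchar *)(s_str + DEMO_OFF_STRING_DATA);
  /* Length word: the fixnum 3. */
  *(demo_ikptr *)(s_str + DEMO_OFF_STRING_LEN) = (demo_ikptr)3 << DEMO_FX_SHIFT;
  ch[0] = ((demo_ikchar)'a' << DEMO_CHAR_SHIFT) | DEMO_CHAR_TAG;
  ch[1] = ((demo_ikchar)'b' << DEMO_CHAR_SHIFT) | DEMO_CHAR_TAG;
  ch[2] = ((demo_ikchar)'c' << DEMO_CHAR_SHIFT) | DEMO_CHAR_TAG;
  return s_str;
}

/* Read the length word and strip the fixnum shift. */
static long demo_string_length (demo_ikptr s_str)
{
  return (long)(*(demo_ikptr *)(s_str + DEMO_OFF_STRING_LEN) >> DEMO_FX_SHIFT);
}

/* Widen the stored 32-bit character word to a full machine word. */
static demo_ikptr demo_string_ref (demo_ikptr s_str, long index)
{
  const demo_ikchar * ch = (const demo_ikchar *)(s_str + DEMO_OFF_STRING_DATA);
  return (demo_ikptr)ch[index];
}
```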

— Preprocessor Symbol: string_char_size

Integer value representing the number of bytes in a Scheme character stored in a Scheme string.

— Preprocessor Symbol: string_mask
— Preprocessor Symbol: string_tag

Integer values used to tag and recognise ikptr values representing string references: string_mask isolates the tag bits of an ikptr, and string_tag is the value those bits have in a string reference, so obj references a string when (obj & string_mask) == string_tag.
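As a sketch of how the two symbols combine, assuming string_mask is 7 and string_tag is 6 (illustrative values; the demo_ names are hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration only. */
#define DEMO_STRING_MASK 7u   /* the 3 low bits of a reference */
#define DEMO_STRING_TAG  6u   /* their value in a string reference */

/* The tag must be representable within the mask bits. */
static_assert((DEMO_STRING_TAG & ~DEMO_STRING_MASK) == 0, "tag must fit in the mask");

typedef uintptr_t demo_ikptr;

/* Nonzero when OBJ carries the string tag, in the spirit of IK_IS_STRING(). */
static int demo_is_string (demo_ikptr obj)
{
  return (obj & DEMO_STRING_MASK) == DEMO_STRING_TAG;
}
```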

— Preprocessor Symbol: disp_string_length

Displacement of length. The number of bytes to add to an untagged pointer to a string to get the pointer to the first byte of the word holding the string length as a fixnum.

— Preprocessor Symbol: disp_string_data

Displacement of data area. The number of bytes to add to an untagged pointer to a string to get the pointer to the first byte of the data area.

— Preprocessor Symbol: off_string_length

An integer to add to a tagged ikptr string reference to retrieve the pointer to the first byte of the string length as a fixnum.

— Preprocessor Symbol: off_string_data

An integer to add to a tagged ikptr string reference to retrieve the pointer to the first byte of the data area.
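The off_ and disp_ symbols are related: a tagged reference equals the untagged pointer plus the tag, so each offset is the matching displacement minus string_tag. A small sketch checking this, with assumed values (tag 6, length at displacement 0, data at displacement 8) and hypothetical demo_ names:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed 64-bit values for illustration only. */
#define DEMO_STRING_TAG         6
#define DEMO_DISP_STRING_LENGTH 0   /* length word at the block start */
#define DEMO_DISP_STRING_DATA   8   /* data right after the length word */

/* Offsets are displacements minus the tag, so adding one to a tagged
   reference reaches the same byte as adding the displacement to the
   untagged pointer. */
#define DEMO_OFF_STRING_LENGTH (DEMO_DISP_STRING_LENGTH - DEMO_STRING_TAG)
#define DEMO_OFF_STRING_DATA   (DEMO_DISP_STRING_DATA - DEMO_STRING_TAG)

static_assert(DEMO_OFF_STRING_LENGTH == -6, "length offset");
static_assert(DEMO_OFF_STRING_DATA == 2, "data offset");

/* Data-area address computed from the untagged pointer. */
static uintptr_t demo_data_from_untagged (uintptr_t untagged)
{
  return untagged + DEMO_DISP_STRING_DATA;
}

/* The same address computed from the tagged reference. */
static uintptr_t demo_data_from_tagged (uintptr_t tagged)
{
  return tagged + DEMO_OFF_STRING_DATA;
}

/* Length-word address from the tagged reference; unsigned wrap-around
   turns the negative offset into tagged - 6. */
static uintptr_t demo_length_from_tagged (uintptr_t tagged)
{
  return tagged + (uintptr_t)DEMO_OFF_STRING_LENGTH;
}
```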

Convenience preprocessor macros
— Preprocessor Macro: int IK_IS_STRING (ikptr obj)

Return true if obj is a reference to a string object.

— Preprocessor Macro: ikptr IK_STRING_LENGTH_FX (ikptr str)

Return a fixnum representing the number of characters in the string str.

— Preprocessor Macro: long IK_STRING_LENGTH (ikptr str)

Return an integer representing the number of characters in the string str.

— Preprocessor Macro: ikchar IK_CHAR32 (ikptr str, long idx)

Evaluate to the 32-bit character representation at index idx in the string str. A use of this macro can appear both as an operand and as the left-hand side of an assignment; for example:

          long    idx   = the_index;
          ikptr   s_str = the_string;
          ikchar  ch;
          
          IK_CHAR32(s_str, idx) = IK_CHAR32_FROM_INTEGER(10);
          ch = IK_CHAR32(s_str, idx);
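One way a macro can be both read and assigned is to expand into an array subscript, which is an lvalue. The sketch below is not Vicare's definition: DEMO_CHAR32, the helpers and the data offset (8 - 6 = 2) are hypothetical assumptions, and calloc() stands in for the Scheme heap. It uses the macro on both sides of an assignment to reverse a string in place:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed values for illustration only. */
#define DEMO_STRING_TAG      6
#define DEMO_OFF_STRING_DATA (8 - DEMO_STRING_TAG)

typedef uintptr_t demo_ikptr;
typedef uint32_t  demo_ikchar;

/* Expands to an array element, so it works as operand and as lvalue. */
#define DEMO_CHAR32(STR, IDX) \
  (((demo_ikchar *)((STR) + DEMO_OFF_STRING_DATA))[(IDX)])

/* Hypothetical helper: zeroed block with a length word and room for N
   characters, tagged as a string; free((void *)(s - DEMO_STRING_TAG)). */
static demo_ikptr demo_mock_string (size_t n)
{
  void * block = calloc(1, 8 + n * sizeof(demo_ikchar));
  assert(block != NULL && ((uintptr_t)block & 7u) == 0);
  return (demo_ikptr)block | DEMO_STRING_TAG;
}

/* Reverse the first LEN characters, reading and assigning via the macro. */
static void demo_reverse (demo_ikptr str, long len)
{
  for (long i = 0, j = len - 1; i < j; ++i, --j) {
    demo_ikchar tmp     = DEMO_CHAR32(str, i);
    DEMO_CHAR32(str, i) = DEMO_CHAR32(str, j);
    DEMO_CHAR32(str, j) = tmp;
  }
}
```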

Operations on strings
— Function: ikptr ika_string_alloc (ikpcb * pcb, long number_of_chars)

Allocate, initialise and return a new string object capable of holding the specified number of chars.