Next: objects symbols, Previous: objects chars, Up: objects [Index]
Strings are variable–length blocks of memory referenced by machine
words tagged as strings; strings are not stored in memory in
UTF-32 format. The first word in the memory block is a fixnum
representing the number of characters in the data area; a string is
capable of holding at most a number of characters equal to the return
value of greatest-fixnum.
    |------------------------|-------------| reference to string
          heap pointer         string tag

    |------------------------|-------------| string first word
         number of words        fixnum tag
All the remaining space in the memory block is filled with 32-bit unsigned integers whose least significant bits are set to the character tag and whose most significant bits are set to the character’s Unicode code point:
     tag ch0 ch1 ch2 ch3 ch4 ch5 ch6 ch7
    |---|---|---|---|---|---|---|---|---| string memory block
Character indexes are zero–based.
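As a rough illustration of the character layout described above, the following sketch packs a code point and a tag into a single 32-bit word. The tag value 0x0F, the 8-bit tag width and every demo_ name are assumptions made for this example only; the actual values are those defined by the library headers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical values, chosen for illustration only. */
#define DEMO_CHAR_TAG    ((uint32_t)0x0F)  /* assumed character tag bits */
#define DEMO_CHAR_MASK   ((uint32_t)0xFF)  /* assumed mask isolating the tag */
#define DEMO_CHAR_SHIFT  8                 /* assumed tag width in bits */

typedef uint32_t demo_char_t;

/* Pack a Unicode code point into a tagged 32-bit character:
   code point in the most significant bits, tag in the least. */
static demo_char_t
demo_char_from_code_point (uint32_t code_point)
{
  assert(code_point <= 0x10FFFF);  /* largest Unicode code point */
  return (code_point << DEMO_CHAR_SHIFT) | DEMO_CHAR_TAG;
}

/* Extract the code point by discarding the tag bits. */
static uint32_t
demo_char_to_code_point (demo_char_t ch)
{
  return ch >> DEMO_CHAR_SHIFT;
}

/* Return non-zero if the least significant bits hold the character tag. */
static int
demo_is_char (demo_char_t ch)
{
  return (ch & DEMO_CHAR_MASK) == DEMO_CHAR_TAG;
}
```

Under these assumptions the letter A (code point 0x41) is stored as 0x410F, which is why the data area cannot be read directly as UTF-32.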
String objects are allocated on the heap; to perform the allocation we compute the whole size of the data area, add to it room for metadata and finally compute the aligned block size:
    ikpcb_t *  pcb            = ik_the_pcb();
    ikuword_t  length         = the_number_of_chars;
    ikuword_t  requested_size = sizeof(ikchar_t) * length;
    ikuword_t  block_size     = disp_string_data + requested_size;
    ikuword_t  align_size     = IK_ALIGN(block_size);
    ikptr_t    s_str          = ik_safe_alloc(pcb, align_size) | string_tag;
ik_safe_alloc() returns an ikptr_t value representing the aligned
pointer, having the 3 least significant bits set to zero; we add
to it the string tag (an integer value fitting in 3 bits), which
makes it possible to recognise strings among all the other built in
objects.
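The size computation and the tagging step can be sketched in isolation as follows. This is only a sketch: the 8-byte alignment, the tag value 2, the one-word metadata area and all the demo_ names are assumptions for the example, not the definitions used by the library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical values, chosen for illustration only. */
#define DEMO_ALIGNMENT         ((size_t)8)       /* keeps 3 low bits at zero */
#define DEMO_STRING_TAG        ((uintptr_t)2)    /* assumed 3-bit string tag */
#define DEMO_STRING_MASK       ((uintptr_t)7)    /* isolates the 3 tag bits */
#define DEMO_DISP_STRING_DATA  (sizeof(uintptr_t)) /* one word for the length */

/* Round SIZE up to the next multiple of DEMO_ALIGNMENT. */
static size_t
demo_align (size_t size)
{
  return (size + DEMO_ALIGNMENT - 1) & ~(DEMO_ALIGNMENT - 1);
}

/* Data area plus metadata, rounded up to the aligned block size. */
static size_t
demo_string_block_size (size_t number_of_chars)
{
  return demo_align(DEMO_DISP_STRING_DATA + number_of_chars * sizeof(uint32_t));
}

/* Turn an aligned block pointer into a tagged string reference. */
static uintptr_t
demo_tag_string (void * block)
{
  uintptr_t p = (uintptr_t)block;
  assert((p & DEMO_STRING_MASK) == 0);  /* the 3 low bits must be free */
  return p | DEMO_STRING_TAG;
}

/* Recover the untagged pointer from a tagged reference. */
static void *
demo_untag (uintptr_t ref)
{
  return (void *)(ref & ~DEMO_STRING_MASK);
}

/* Return non-zero if REF carries the string tag. */
static int
demo_is_string (uintptr_t ref)
{
  return (ref & DEMO_STRING_MASK) == DEMO_STRING_TAG;
}
```

Because the block pointer is aligned, OR-ing the tag into it loses no information: masking the 3 low bits away gives back the original pointer.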
We have to explicitly store the string length in the memory block as a fixnum, so usually a full allocation looks like this:
    ikptr_t
    ika_string_alloc (ikpcb_t * pcb, ikuword_t number_of_chars)
    {
      ikuword_t align_size = IK_ALIGN(disp_string_data +
                                      number_of_chars * sizeof(ikchar_t));
      ikptr_t   s_str      = ik_safe_alloc(pcb, align_size) | string_tag;
      IK_STRING_LENGTH_FX(s_str) = IK_FIX(number_of_chars);
      return s_str;
    }
which leaves the characters uninitialised: this is not a problem
from the garbage collector’s point of view. Strings are allocated on the
Scheme heap’s nursery, which is not a garbage collection root; this
means we can leave uninitialised the memory words allocated by
ik_safe_alloc() to round the block size up to the aligned size.
To fill a string of 3 characters we can do:
    ikptr_t     s_str = the_string;
    ikchar_t *  ch    = IK_STRING_DATA_IKCHARP(s_str);
    ch[0] = IK_CHAR32_FROM_INTEGER(10);
    ch[1] = IK_CHAR32_FROM_INTEGER(20);
    ch[2] = IK_CHAR32_FROM_INTEGER(30);
to retrieve the character at index 2 we do:
    ikuword_t   index  = 2;
    ikptr_t     s_str  = the_string;
    ikchar_t *  ch     = IK_STRING_DATA_IKCHARP(s_str);
    ikptr_t     s_char = (ikptr_t)ch[index];
and to retrieve the string length:
    ikptr_t    s_str    = the_string;
    ikptr_t    s_length = IK_STRING_LENGTH_FX(s_str);
    ikuword_t  length   = IK_UNFIX(s_length);
or, shorter:
    ikptr_t    s_str  = the_string;
    ikuword_t  length = IK_STRING_LENGTH(s_str);
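To make the role of the fixnum length word concrete, here is a simplified stand-alone model of a string block. It uses a plain C struct and malloc() instead of tagged references and the Scheme heap, and the fixnum shift of 2 bits is an assumption; all the demo_ names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed fixnum encoding: value shifted left, tag bits all zero. */
#define DEMO_FX_SHIFT   2
#define DEMO_FIX(n)     ((uintptr_t)(n) << DEMO_FX_SHIFT)
#define DEMO_UNFIX(fx)  ((uintptr_t)(fx) >> DEMO_FX_SHIFT)

/* Simplified model of the memory block: length word, then data area. */
typedef struct {
  uintptr_t length_fx;  /* number of characters, encoded as a fixnum */
  uint32_t  data[];     /* one 32-bit slot per character */
} demo_string_t;

/* Allocate a block for N characters and store the length as a fixnum.
   The characters themselves are left uninitialised.  Return NULL on
   allocation failure. */
static demo_string_t *
demo_string_alloc (size_t n)
{
  demo_string_t * s = malloc(sizeof(demo_string_t) + n * sizeof(uint32_t));
  if (s != NULL)
    s->length_fx = DEMO_FIX(n);
  return s;
}

/* Decode the fixnum length word into a plain integer. */
static size_t
demo_string_length (const demo_string_t * s)
{
  assert(s != NULL);
  return (size_t)DEMO_UNFIX(s->length_fx);
}

/* Return a pointer to the slot at zero-based index IDX. */
static uint32_t *
demo_string_slot (demo_string_t * s, size_t idx)
{
  assert(idx < demo_string_length(s));
  return &s->data[idx];
}
```

With this model a string of 3 characters stores the word 12 (3 shifted left by 2 bits) in its first field, and demo_string_length() recovers 3 from it.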
Integer value representing the number of bytes in a Scheme character stored in a Scheme string.
Integer values used to tag and recognise ikptr_t values representing
string references. string_mask isolates the tag bits from an
ikptr_t and string_tag represents the tag bits.
Displacement of length. The number of bytes to add to an untagged pointer to a string to get the pointer to the first byte in the word holding the string length as a fixnum.
Displacement of data area. The number of bytes to add to an untagged pointer to a string to get the pointer to the first byte in the data area.
An integer to add to a tagged ikptr_t string reference to retrieve the
pointer to the first byte of the string length as a fixnum.
An integer to add to a tagged ikptr_t string reference to retrieve the
pointer to the first byte of the data area.
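The difference between a displacement (added to an untagged pointer) and an offset (added to a tagged reference) can be sketched as below. From the descriptions above it follows that an offset equals the matching displacement minus the tag; the concrete tag value 2, the length word placed at displacement 0 and the demo_ names are assumptions for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical values, chosen for illustration only. */
#define DEMO_STRING_TAG          ((intptr_t)2)
#define DEMO_STRING_MASK         ((uintptr_t)7)

/* Displacements: relative to the untagged pointer. */
#define DEMO_DISP_STRING_LENGTH  ((intptr_t)0)
#define DEMO_DISP_STRING_DATA    ((intptr_t)sizeof(uintptr_t))

/* Offsets: relative to the tagged reference, so the tag is subtracted. */
#define DEMO_OFF_STRING_LENGTH   (DEMO_DISP_STRING_LENGTH - DEMO_STRING_TAG)
#define DEMO_OFF_STRING_DATA     (DEMO_DISP_STRING_DATA   - DEMO_STRING_TAG)

/* Address of the length word, computed from a tagged reference. */
static uintptr_t
demo_length_address (uintptr_t tagged_ref)
{
  assert((tagged_ref & DEMO_STRING_MASK) == (uintptr_t)DEMO_STRING_TAG);
  return (uintptr_t)((intptr_t)tagged_ref + DEMO_OFF_STRING_LENGTH);
}

/* Address of the data area, computed from a tagged reference. */
static uintptr_t
demo_data_address (uintptr_t tagged_ref)
{
  assert((tagged_ref & DEMO_STRING_MASK) == (uintptr_t)DEMO_STRING_TAG);
  return (uintptr_t)((intptr_t)tagged_ref + DEMO_OFF_STRING_DATA);
}
```

Using offsets avoids a separate untagging step: a single addition takes a tagged reference straight to the field of interest.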
Return true if obj is a reference to a string object.
Return a fixnum representing the number of characters in the string str.
Return an integer representing the number of characters in the string str.
Evaluate to the 32-bit character representation at index idx in the string str. A use of this macro can appear both as operand and as left–side of an assignment; example:
    ikuword_t  idx   = the_index;
    ikptr_t    s_str = the_string;
    ikchar_t   ch;

    IK_CHAR32(s_str, idx) = IK_CHAR32_FROM_INTEGER(10);
    ch = IK_CHAR32(s_str, idx);
Given a tagged reference to string object str, return a pointer to the first byte in the data area.
Given a tagged reference to string object str, return a pointer to the first Scheme character in the data area.
Allocate, initialise and return a new string object capable of holding the specified number of chars.
Allocate a new string object and fill it with the ASCII characters from the given ASCIIZ string.
Return a Scheme symbol object whose name is the Scheme string
str. This function is the same as iku_symbol_from_string().