12.2 Built in object references

Values of type ikptr at the C language level are machine words: they are the values passed around as arguments and return values at the Scheme level. An ikptr value has one of two major interpretations:

Immediate values
Objects that fit in a single machine word: special constants, fixnums, characters and input/output port transcoders.
Reference values
Objects allocated on the heap and subject to garbage collection; they are represented by tagged pointers: symbols, pairs, vectors, bytevectors, structures, ports, bignums, ratnums, flonums, compnums, cflonums, strings, closures, continuations, codes, pointers.

Immediate ikptr values have two minor interpretations:

Immediate special constants
These are #t, #f, nil, EOF, void, unbound, BWP.
Immediate variable values
These are fixnums, characters and transcoders.

Reference ikptr values have two minor interpretations:

Vector tagged references
Memory pointer values whose 3 least significant bits are set to the vector tag. They reference multiword objects allocated on the heap: vectors, bignums, structures, flonums, ratnums, compnums, cflonums, continuations, code, ports, symbols, pointers.
Specially tagged references
Pointer values whose 3 least significant bits are set to a type–specific tag. They reference multiword objects allocated on the heap: pairs, bytevectors, closures, strings.
— Object Reference: ikptr

An immediate built in object or a reference to a built in object. It is implemented as an unsigned long int and is meant to be the size of a machine word. The definition assumes that:

— Preprocessor Macro: int IK_TAGOF (ikptr ref)

Return an integer representing the 3 least significant bits of an ikptr value.

— Preprocessor Macro: ikptr IK_REF (ikptr value_ref, long byte_offset)
— Preprocessor Macro: ikptr ref (ikptr value_ref, long byte_offset)

Getter and setter for machine words. Interpret value_ref as a pointer to an array of ikptr values and access the word located byte_offset bytes from its start (the offset is zero–based). A use of this macro can appear both as an operand and as the left–hand side of an assignment.

          ikptr   P, Q;
          
          Q = IK_REF(P, 2*wordsize); /* retrieve the 3rd word */
          IK_REF(P, 0) = 123L;       /* store a value in the 1st word */

Both value_ref and byte_offset are first cast to long values, then added and the sum is cast to ikptr *.
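The cast-add-cast sequence described above can be sketched as follows. This is a minimal illustration, not the real definition of IK_REF: the names sk_word, SK_REF, sk_load_word and sk_store_word are hypothetical, and the sketch uses uintptr_t where the text mentions long so that the arithmetic stays well defined on every platform.

```c
#include <stdint.h>

/* Stand-in for ikptr: an unsigned integer wide enough to hold a pointer. */
typedef uintptr_t sk_word;

/* Hypothetical look-alike of IK_REF (not the real definition): add the
   byte offset to the integer value of the reference, reinterpret the sum
   as a pointer to a word and dereference it.  The expansion is an lvalue,
   so it can be read from or assigned to. */
#define SK_REF(value_ref, byte_offset) \
  (*(sk_word *)((sk_word)(value_ref) + (sk_word)(intptr_t)(byte_offset)))

/* Read the word at zero-based position `index` in the block at `p`. */
static sk_word sk_load_word(sk_word p, long index)
{
  return SK_REF(p, index * (long)sizeof(sk_word));
}

/* Overwrite the word at zero-based position `index` in the block at `p`. */
static void sk_store_word(sk_word p, long index, sk_word value)
{
  SK_REF(p, index * (long)sizeof(sk_word)) = value;
}
```

Because the macro expands to a dereference expression, the same spelling serves as getter and as setter, which is the property the documented macro relies on.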

There are two categories of values for byte_offset: offsets and displacements; both are usually precomputed at compile time and are predefined for the built in Scheme values.

Displacements
They are plain numbers of bytes to be added to an untagged pointer to obtain the memory address of a machine word.
Offsets
They are numbers of bytes from which a Scheme value's tag is subtracted: adding an offset to a tagged pointer removes the tag and computes the memory address of a machine word, in a single step.

Given an untagged pointer to a vector, the fixnum representing the length of the vector can be obtained with:

          ikptr   p_vector = ...;
          ikptr   s_length = IK_REF(p_vector, disp_vector_length);

Predefined displacements have names prefixed with disp_. Given a tagged pointer to a vector, the fixnum representing the length of the vector can be obtained with:

          ikptr   s_vector = ...;
          ikptr   s_length = IK_REF(s_vector, off_vector_length);

Predefined offsets have names prefixed with off_. An offset can be computed from a displacement simply by subtracting the tag:

          off_vector_length = disp_vector_length - vector_tag

This is because we can build a tagged pointer from an untagged and aligned one with:

          s_vector = p_vector | vector_tag = p_vector + vector_tag

and vice versa we can compute an untagged pointer from a tagged one with:

          p_vector = s_vector - vector_tag

and so:

          s_vector + off_vector_length = p_vector + disp_vector_length
NOTE: ref() is defined only in the internal header file and its use is deprecated.
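The relation between displacements and offsets can be checked with plain integer arithmetic. The sketch below is only illustrative: the SK_ and sk_ names are hypothetical, the vector tag value is taken from the tag table in this section, and the displacement of the length field is assumed to be zero because the first word of a vector holds its length.

```c
#include <stdint.h>

typedef uintptr_t sk_word;   /* stand-in for ikptr */

enum {
  SK_VECTOR_TAG         = 5,   /* #b101, from the tag table */
  SK_DISP_VECTOR_LENGTH = 0,   /* assumption: the length is the first word */
  SK_OFF_VECTOR_LENGTH  = SK_DISP_VECTOR_LENGTH - SK_VECTOR_TAG
};

/* Build a tagged reference from an untagged, 8-byte aligned address. */
static sk_word sk_tag_vector(sk_word p)
{
  return p | SK_VECTOR_TAG;
}

/* Recover the untagged address from a vector-tagged reference. */
static sk_word sk_untag_vector(sk_word s)
{
  return s - SK_VECTOR_TAG;
}

/* Address reached by adding a (possibly negative) number of bytes. */
static sk_word sk_address(sk_word base, long byte_count)
{
  return base + (sk_word)byte_count;
}
```

With these definitions, sk_address(s, SK_OFF_VECTOR_LENGTH) and sk_address(p, SK_DISP_VECTOR_LENGTH) name the same memory word whenever s is sk_tag_vector(p).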

Immediate values

All immediate values except fixnums have the 3 least significant bits set to 1; to distinguish between immediate values and references we can do:

     ikptr   X;
     
     if (IK_IS_FIXNUM(X) || (immediate_tag == IK_TAGOF(X)))
       it_is_immediate();
     else
       it_is_not();

where:

     immediate_tag = 7 = #b111
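
The test above can be reproduced outside the runtime with a few stand-in definitions. This is a sketch under stated assumptions: sk_word, sk_tagof, sk_is_fixnum and sk_is_immediate are hypothetical names mirroring IK_TAGOF and IK_IS_FIXNUM, and the fixnum mask is chosen from the word size as described later in this section (2 bits on 32-bit platforms, 3 bits on 64-bit ones).

```c
#include <stdint.h>

typedef uintptr_t sk_word;   /* stand-in for ikptr */

enum { SK_IMMEDIATE_TAG = 7 };   /* #b111 */

/* The 3 least significant bits of a word, as IK_TAGOF is documented to return. */
static int sk_tagof(sk_word x)
{
  return (int)(x & 7);
}

/* A fixnum has all of its tag bits set to zero. */
static int sk_is_fixnum(sk_word x)
{
  const sk_word fx_mask = (sizeof(sk_word) == 4) ? 3 : 7;
  return (x & fx_mask) == 0;
}

/* Immediate values are fixnums plus every word tagged #b111. */
static int sk_is_immediate(sk_word x)
{
  return sk_is_fixnum(x) || (SK_IMMEDIATE_TAG == sk_tagof(x));
}
```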
— Macro: IK_FALSE_OBJECT 0x2F
— Macro: IK_FALSE
— Macro: IK_TRUE_OBJECT 0x3F
— Macro: IK_TRUE
— Macro: IK_NULL_OBJECT 0x4F
— Macro: IK_NULL
— Macro: IK_EOF_OBJECT 0x5F
— Macro: IK_EOF
— Macro: IK_VOID_OBJECT 0x7F
— Macro: IK_VOID

Special machine words of type ikptr representing, respectively: #f; #t; nil, the empty list; EOF, the end of file; #<void>, the return value of functions returning no value.

— Macro: IK_UNBOUND_OBJECT 0x6F
— Macro: IK_UNBOUND

Special machine word value stored in the value and proc fields of Scheme symbol memory blocks to signal that these fields are unset.

— Macro: IK_BWP_OBJECT 0x8F
— Macro: IK_BWP

Special machine word value stored in locations that used to hold weak references to values which have already been garbage collected.
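
For debugging it can help to turn the special constants back into readable names. The helper below is a hypothetical sketch, not part of the API: it hard-codes the numeric values listed in the macro definitions above and returns NULL for any other word.

```c
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t sk_word;   /* stand-in for ikptr */

/* Map a special constant to a printable name; NULL for any other word.
   The numeric values are the ones documented for the IK_*_OBJECT macros. */
static const char *sk_special_name(sk_word x)
{
  switch (x) {
  case 0x2F: return "#f";
  case 0x3F: return "#t";
  case 0x4F: return "nil";
  case 0x5F: return "EOF";
  case 0x6F: return "unbound";
  case 0x7F: return "void";
  case 0x8F: return "BWP";
  default:   return NULL;
  }
}
```

All seven values end with the bits #b111, so each of them also passes the immediate tag test.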

The variable values that fit in a single machine word are fixnums, characters and port transcoders. The least significant byte of these machine words is tagged as follows:

        object      |  tag bits  | tag hex | mask bits
     ---------------+------------+---------+------------
     fixnums 32-bit | #b??????00 |   --    | #b00000011
     fixnums 64-bit | #b?????000 |   --    | #b00000111
     characters     | #b00001111 |  #x0F   | #b11111111
     transcoders    | #b01111111 |  #x7F   | #b11111111

to identify a fixnum we can do:

     ikptr   X;
     
     if (fx_tag == (X & fx_mask))
       it_is_a_fixnum();
     else
       it_is_not();

or just use the macro IK_IS_FIXNUM(); similarly for the other immediate variable values.
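
The same mask-and-compare pattern covers characters and transcoders. The following sketch applies the table above literally; the enum and function names are hypothetical, and only the tag and mask values come from the table.

```c
#include <stdint.h>

typedef uintptr_t sk_word;   /* stand-in for ikptr */

enum {
  SK_CHAR_TAG        = 0x0F, SK_CHAR_MASK       = 0xFF,
  SK_TRANSCODER_TAG  = 0x7F, SK_TRANSCODER_MASK = 0xFF
};

enum sk_immediate_kind {
  SK_IMM_FIXNUM, SK_IMM_CHAR, SK_IMM_TRANSCODER, SK_IMM_OTHER
};

/* Classify a word using the tag and mask values from the table. */
static enum sk_immediate_kind sk_classify_immediate(sk_word x)
{
  /* Fixnum tag width depends on the word size: 2 or 3 zero bits. */
  const sk_word fx_mask = (sizeof(sk_word) == 4) ? 3 : 7;

  if ((x & fx_mask) == 0)
    return SK_IMM_FIXNUM;
  if ((x & SK_CHAR_MASK) == SK_CHAR_TAG)
    return SK_IMM_CHAR;
  if ((x & SK_TRANSCODER_MASK) == SK_TRANSCODER_TAG)
    return SK_IMM_TRANSCODER;
  return SK_IMM_OTHER;   /* special constant or heap reference */
}
```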

Notice that a NULL pointer stored in an ikptr, with zero bits as tag, represents the fixnum zero. Also, the number of zero tag bits for fixnums is chosen so that a tagged ikptr fixnum can be read as the number of bytes needed to hold as many machine words as the number represented by the fixnum itself; that is, the following holds true:

     long    number_of_words = ...;
     
     number_of_words * wordsize == number_of_words << fx_shift;

where fx_shift is the number of bits in the fixnum's tag.
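
A small sketch of the encoding makes the identity concrete. The names sk_fix and sk_unfix are hypothetical, the shift is derived from the word size as in the table above, and decoding assumes the usual two's complement conversion from unsigned to signed words.

```c
#include <stdint.h>

typedef uintptr_t sk_word;   /* stand-in for ikptr */

/* Fixnum tag width: 2 bits on 32-bit words, 3 bits on 64-bit words. */
#define SK_FX_SHIFT (sizeof(sk_word) == 4 ? 2 : 3)

/* Encode an integer as a fixnum: shift the payload past the zero tag bits. */
static sk_word sk_fix(long n)
{
  return (sk_word)n << SK_FX_SHIFT;
}

/* Decode a fixnum.  The tag bits are zero, so dividing by the word size is
   exact; this relies on two's complement conversion for negative values. */
static intptr_t sk_unfix(sk_word x)
{
  return (intptr_t)x / (intptr_t)sizeof(sk_word);
}
```

Under these definitions sk_fix(n), read as a plain integer, equals n * sizeof(sk_word): the fixnum holding a word count is also the byte count.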

Values allocated on the heap

The values that do not fit into a single machine word are composed of a reference machine word and an array of machine words on the heap; they are: symbols, pairs, vectors, bytevectors, structures, ports, bignums, ratnums, flonums, compnums, cflonums, strings, closures, continuations, codes, pointers.

The machine words used as reference have the 3 least significant bits used as tag and the remaining most significant bits used to store a pointer in memory; on 32-bit platforms the layout of such machine words is:

      PPPPPPPP PPPPPPPP PPPPPPPP PPPPPTTT   P = bit of pointer
     |--------|--------|--------|--------|  T = bit of tag
       byte 3   byte 2   byte 1   byte 0

the following tags are used:

       object    | tag bits | tag hex | mask bits
     ------------+----------+---------+------------
     pairs       |   #b001  |   #x1   | #b00000111
     bytevectors |   #b010  |   #x2   | #b00000111
     closure     |   #b011  |   #x3   | #b00000111
     vectors     |   #b101  |   #x5   | #b00000111
     strings     |   #b110  |   #x6   | #b00000111

Notice how none of the tags for reference words is #b111, which is reserved for immediate values; also notice how #b100 must not be used as a tag, because on 32-bit platforms, where fixnums are identified by their 2 least significant bits being zero, a word ending in #b100 would be indistinguishable from a fixnum.
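
The primary tags can be dispatched on with a single switch. This is a hypothetical sketch: the enum and function names are illustrative, while the tag values come from the table above. It also shows why #b100 is left unassigned: on a 32-bit build such a word is already caught by the fixnum test.

```c
#include <stdint.h>

typedef uintptr_t sk_word;   /* stand-in for ikptr */

enum sk_kind {
  SK_FIXNUM, SK_PAIR, SK_BYTEVECTOR, SK_CLOSURE,
  SK_VECTOR_TAGGED, SK_STRING, SK_IMMEDIATE, SK_UNUSED
};

/* Classify a word by its primary tag, using the values from the table. */
static enum sk_kind sk_primary_kind(sk_word x)
{
  const sk_word fx_mask = (sizeof(sk_word) == 4) ? 3 : 7;

  if ((x & fx_mask) == 0)
    return SK_FIXNUM;            /* includes #b100 on 32-bit platforms */
  switch (x & 7) {
  case 1:  return SK_PAIR;
  case 2:  return SK_BYTEVECTOR;
  case 3:  return SK_CLOSURE;
  case 5:  return SK_VECTOR_TAGGED;
  case 6:  return SK_STRING;
  case 7:  return SK_IMMEDIATE;
  default: return SK_UNUSED;     /* only #b100 on 64-bit platforms */
  }
}
```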

The vector tag is used to tag machine word references to multiple object types: vectors, bignums, structures, flonums, ratnums, compnums, cflonums, continuations, code, ports, symbols, pointers, system continuations. The first word in the memory block of these types has the least significant bits set to a secondary tag.

All the possible values for 3-bit tags in reference values are already allocated; new object types can be added only by defining a new secondary tag with references tagged as vector.

While the API defines predicates to recognise values, to identify a type–specific reference we can do:

     ikptr   X;
     
     if (pair_tag == (X & pair_mask))
       it_is_a_pair();
     else
       it_is_not();

similarly for the other types. The vector tag acts as primary tag; a secondary tag is stored in the least significant bits of the referenced vector of words on the heap; to recognise such values we can do:

     ikptr  X;
     
     if ((vector_tag    == IK_TAGOF(X)) &&
         (secondary_tag == (secondary_mask
                            & IK_REF(X, -vector_tag))))
       it_is();
     else
       it_is_not();

where secondary_tag and secondary_mask are type–specific. The secondary tags and the associated masks are:

        object           |  tag bits   | tag hex | tag mask
                         |    76543210 |         |    76543210
     --------------------+-------------+---------+-------------
     vector              |  #b??????00 | fixnum  | #b00000111
     bignum              |  #b????s011 |   #x03  | #b00000111
     structure           |  #b?????101 |   #x05  | #b00000111
     flonum              |  #b00010111 |   #x17  |    --
     ratnum              |  #b00100111 |   #x27  |    --
     compnum             |  #b00110111 |   #x37  |    --
     cflonum             |  #b01000111 |   #x47  |    --
     continuation        |  #b00011111 |   #x1F  |    --
     code                |  #b00101111 |   #x2F  |    --
     port                |  #b??111111 |   #x3F  | #b00111111
     symbol              |  #b01011111 |   #x5F  |    --
     pointer             | #b100000111 |  #x107  |    --
     system continuation | #b100011111 |  #x11F  |    --

Notice how the port secondary tag has all the 6 least significant bits set to 1: no other tag may have all of these bits set to 1. Secondary tags for new types can be allocated by setting the least significant byte to #x0F and reserving a specific bit pattern in the most significant bytes.
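
The two-level check can be exercised on a hand-built memory block. The sketch below is illustrative only: all sk_ and SK_ names are hypothetical, the tag values come from the tables above, and for the types whose mask column shows "--" it assumes that the whole first word must equal the secondary tag.

```c
#include <stdint.h>

typedef uintptr_t sk_word;   /* stand-in for ikptr */

enum { SK_PRIMARY_MASK = 7, SK_VECTOR_TAG = 5 };

/* First word of the heap block referenced by a vector-tagged word. */
static sk_word sk_first_word(sk_word x)
{
  return *(const sk_word *)(x - SK_VECTOR_TAG);
}

/* Primary tag must be the vector tag, then the masked first word must
   match the secondary tag.  The && keeps us from dereferencing a word
   that is not a vector-tagged reference. */
static int sk_has_secondary(sk_word x, sk_word tag, sk_word mask)
{
  return ((x & SK_PRIMARY_MASK) == SK_VECTOR_TAG)
      && ((sk_first_word(x) & mask) == tag);
}

/* Assumption: no mask in the table means "compare the whole word". */
static int sk_is_flonum(sk_word x) { return sk_has_secondary(x, 0x17, ~(sk_word)0); }
static int sk_is_bignum(sk_word x) { return sk_has_secondary(x, 0x03, 0x07); }
static int sk_is_port(sk_word x)   { return sk_has_secondary(x, 0x3F, 0x3F); }
```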

The only tags having an associated mask are the ones of objects storing additional information in the first word of the heap vector:

Vectors
The first word of a vector is a fixnum representing the number of elements.
Bignums
The first word uses the 3 least significant bits as tag, the 4th bit representing the sign (0 for positive, 1 for negative) and the remaining bits representing the number of words in the bignum data area.
Structures
The first word is tagged as vector, because the first word of a structure is itself a reference to a structure: the type descriptor.
Ports
The most significant bits of the first word are used for port attributes.
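
As an illustration of how such a first word packs several fields, the bignum layout described above can be decoded as follows. This is a sketch under assumptions: the function names are hypothetical, and the word count is assumed to start at bit 4, immediately above the sign bit, as the description suggests.

```c
#include <stdint.h>

typedef uintptr_t sk_word;   /* stand-in for ikptr */

enum { SK_BIGNUM_TAG = 3, SK_BIGNUM_SIGN_BIT = 8, SK_BIGNUM_COUNT_SHIFT = 4 };

/* Compose a bignum first word from a word count and a sign flag. */
static sk_word sk_bignum_first_word(sk_word word_count, int negative)
{
  return (word_count << SK_BIGNUM_COUNT_SHIFT)
       | (negative ? SK_BIGNUM_SIGN_BIT : 0)
       | SK_BIGNUM_TAG;
}

/* 4th bit: 0 for positive, 1 for negative. */
static int sk_bignum_is_negative(sk_word first_word)
{
  return (first_word & SK_BIGNUM_SIGN_BIT) != 0;
}

/* Remaining bits: number of words in the bignum data area. */
static sk_word sk_bignum_word_count(sk_word first_word)
{
  return first_word >> SK_BIGNUM_COUNT_SHIFT;
}
```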