

13.2 Built in object references

Values of type ikptr_t at the C language level are the ones we move around as arguments and return values at the Scheme level; they represent machine words. ikptr_t values have two major interpretations:

Immediate values

Objects that fit in a single machine word: special constants (like #t and #f), fixnums, characters and input/output port transcoders.

Reference values

Objects allocated on the heap and subject to garbage collection; they are represented by tagged pointers: symbols, pairs, vectors, bytevectors, structures, ports, bignums, ratnums, flonums, compnums, cflonums, strings, closures, continuations, code objects, pointers.

Immediate ikptr_t values have two minor interpretations:

Immediate special constants

These are #t, #f, nil, void, unbound, BWP.

Immediate variable values

These are fixnums, characters and transcoders.

Reference ikptr_t values have two minor interpretations:

Vector tagged references

Memory pointer values whose 3 least significant bits are set to the vector tag. They reference multiword objects allocated on the heap: vectors, bignums, structures, flonums, ratnums, compnums, cflonums, continuations, code, ports, symbols, pointers.

Specially tagged references

Pointer values whose 3 least significant bits are set to a type–specific tag. They reference multiword objects allocated on the heap: pairs, bytevectors, closures, strings.

Object Reference: ikptr_t

An immediate built in object or a reference to a built in object; it is defined as follows:

Preprocessor Macro: int IK_TAGOF (ikptr_t ref)

Return an integer representing the 3 least significant bits of an ikptr_t value.

Preprocessor Macro: ikptr_t IK_REF (ikptr_t value_ref, ikuword_t byte_offset)

Getter and setter for machine words. Interpret value_ref as a pointer to an array of ikptr_t values and locate the value at the zero–based byte_offset. A use of this macro can appear both as an operand and as the left–hand side of an assignment.

ikptr_t         P, Q;

Q = IK_REF(P, 2*wordsize); /* retrieve the 3rd word */
IK_REF(P, 0) = 123L;       /* store a value in the 1st word */

Both value_ref and byte_offset are first cast to long values, then added and the sum is cast to ikptr_t *.
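
The cast sequence just described can be sketched as follows. This is a minimal illustration under stated assumptions, not Vicare's actual definition: word_t and SKETCH_REF are hypothetical stand-ins for ikptr_t and IK_REF(), and uintptr_t is used in place of long to keep the sketch portable.

```c
#include <stdint.h>

/* Hypothetical stand-in for ikptr_t: one machine word. */
typedef uintptr_t word_t;

/* Sketch of an IK_REF-like macro: add the byte offset to the word,
   treat the sum as a pointer to a word and dereference it.  The
   result is an lvalue, so it can be read or assigned. */
#define SKETCH_REF(X, N) \
  (*((word_t *)(((uintptr_t)(X)) + ((uintptr_t)(N)))))

/* Store VALUE in the 1st word of BLOCK and return its 3rd word. */
static word_t
store_first_fetch_third (word_t *block, word_t value)
{
  word_t P = (word_t)block;
  SKETCH_REF(P, 0) = value;
  return SKETCH_REF(P, 2 * sizeof(word_t));
}
```

Because the macro expands to a dereference expression, the same spelling works on both sides of an assignment.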

There are two categories of values for byte_offset: offsets and displacements; both are usually precomputed at compile time and are predefined for the built in Scheme values.

Displacements

They are plain numbers of bytes to be added to an untagged pointer to obtain the memory address of a machine word.

Offsets

They are numbers of bytes from which a Scheme value’s tag is subtracted: adding an offset to a tagged pointer removes the tag and computes the memory address of a machine word, in a single step.

Given an untagged pointer to a vector, the fixnum representing the length of the vector can be obtained with:

ikptr_t   p_vector = ...;
ikptr_t   s_length = IK_REF(p_vector, disp_vector_length);

Predefined displacements have names prefixed with disp_. Given a tagged pointer to a vector, the fixnum representing the length of the vector can be obtained with:

ikptr_t   s_vector = ...;
ikptr_t   s_length = IK_REF(s_vector, off_vector_length);

Predefined offsets have names prefixed with off_. An offset can be computed from a displacement simply by subtracting the tag:

off_vector_length = disp_vector_length - vector_tag

This is because we can build a tagged pointer from an untagged and aligned one with:

s_vector = p_vector | vector_tag = p_vector + vector_tag

and vice versa we can compute an untagged pointer from a tagged one with:

p_vector = s_vector - vector_tag

and so:

s_vector + off_vector_length = p_vector + disp_vector_length
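
The identity above can be checked numerically. In this sketch the vector tag is the value #b101 listed in the table of reference tags; the displacement of the length word is assumed to be 0, because the length is stored in the first word of a vector; word_t and all the SKETCH_ names are hypothetical.

```c
#include <stdint.h>

typedef uintptr_t word_t;       /* hypothetical stand-in for ikptr_t */

enum {
  SKETCH_VECTOR_TAG         = 5,  /* #b101 */
  SKETCH_DISP_VECTOR_LENGTH = 0,  /* assumed: length is the first word */
  SKETCH_OFF_VECTOR_LENGTH  = SKETCH_DISP_VECTOR_LENGTH - SKETCH_VECTOR_TAG
};

/* Tag an untagged pointer whose 3 least significant bits are zero;
   for such pointers OR and addition give the same result. */
static word_t
sketch_tag_vector (word_t p_vector)
{
  return p_vector | SKETCH_VECTOR_TAG;
}

/* Address of the length word computed from the untagged pointer. */
static word_t
length_address_from_untagged (word_t p_vector)
{
  return p_vector + SKETCH_DISP_VECTOR_LENGTH;
}

/* Address of the length word computed from the tagged pointer: the
   negative offset removes the tag and adds the displacement in a
   single addition (unsigned arithmetic wraps as intended). */
static word_t
length_address_from_tagged (word_t s_vector)
{
  return s_vector + (word_t)(intptr_t)SKETCH_OFF_VECTOR_LENGTH;
}
```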

Preprocessor Macro: ikptr_t * IK_PTR (ikptr_t value_ref, ikuword_t byte_offset)

Like IK_REF(), but rather than returning the machine word at offset byte_offset from value_ref, return a pointer to it. This is especially useful to build the second argument in a call to ik_signal_dirt_in_page_of_pointer().
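
The difference from IK_REF() can be sketched as follows: the same address computation, without the final dereference. Again word_t and SKETCH_PTR are hypothetical stand-ins, not the definitions found in Vicare's headers.

```c
#include <stdint.h>

typedef uintptr_t word_t;       /* hypothetical stand-in for ikptr_t */

/* Sketch of an IK_PTR-like macro: compute the address of the word at
   byte offset N from X and return it as a pointer, undereferenced. */
#define SKETCH_PTR(X, N) \
  ((word_t *)(((uintptr_t)(X)) + ((uintptr_t)(N))))

/* Return the location of the 2nd word of BLOCK. */
static word_t *
second_word_location (word_t *block)
{
  return SKETCH_PTR((word_t)block, sizeof(word_t));
}
```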

Immediate values

All the immediate values but fixnums have the 3 least significant bits set to 1; to distinguish between immediate values and references we can do:

ikptr_t   X;

if (IK_IS_FIXNUM(X) || (immediate_tag == IK_TAGOF(X)))
  it_is_immediate();
else
  it_is_not();

where:

immediate_tag = 7 = #b111
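
A self-contained version of the test above might look like the following sketch. The names are hypothetical stand-ins for ikptr_t, IK_TAGOF() and IK_IS_FIXNUM(); the fixnum mask is assumed to cover 2 bits on 32-bit platforms and 3 bits on 64-bit platforms, as in the table of immediate tags in this section.

```c
#include <stdint.h>

typedef uintptr_t word_t;       /* hypothetical stand-in for ikptr_t */

enum { SKETCH_IMMEDIATE_TAG = 7 };      /* #b111 */

/* Assumed fixnum mask: #b11 on 32-bit, #b111 on 64-bit platforms. */
#define SKETCH_FX_MASK ((word_t)(sizeof(word_t) == 8 ? 7 : 3))

/* The 3 least significant bits of a word. */
static int
sketch_tagof (word_t X)
{
  return (int)(X & 7);
}

/* A fixnum has all its tag bits set to zero. */
static int
sketch_is_fixnum (word_t X)
{
  return 0 == (X & SKETCH_FX_MASK);
}

/* Immediate values are fixnums or words tagged with #b111. */
static int
sketch_is_immediate (word_t X)
{
  return sketch_is_fixnum(X) || (SKETCH_IMMEDIATE_TAG == sketch_tagof(X));
}
```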

Macro: IK_FALSE_OBJECT 0x2F
Macro: IK_FALSE
Macro: IK_TRUE_OBJECT 0x3F
Macro: IK_TRUE
Macro: IK_NULL_OBJECT 0x4F
Macro: IK_NULL
Macro: IK_EOF_OBJECT 0x5F
Macro: IK_EOF
Macro: IK_VOID_OBJECT 0x7F
Macro: IK_VOID

Special machine words of type ikptr_t representing, respectively: #f; #t; nil, the empty list; EOF, the end of file; #!void, the return value of functions returning no value.

Macro: IK_UNBOUND_OBJECT 0x6F
Macro: IK_UNBOUND

Special machine word value stored in the value and proc fields of Scheme symbol memory blocks to signal that these fields are unset.

Macro: IK_BWP_OBJECT 0x8F
Macro: IK_BWP

Special machine word value stored in locations that used to hold weak references to values which have been already garbage collected. ‘BWP’ stands for “broken weak pointer”.
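
As an illustration, the special constants listed so far can be told apart by comparing the whole machine word. The function below is a hypothetical helper, not part of Vicare's API; it uses only the numeric values given in the macro definitions above, and word_t stands in for ikptr_t.

```c
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t word_t;       /* hypothetical stand-in for ikptr_t */

/* Return a descriptive name for a special constant, or NULL when X
   is not one of them.  Every constant has the 3 least significant
   bits set to 1, that is the immediate tag. */
static const char *
sketch_special_name (word_t X)
{
  switch (X) {
  case 0x2F: return "false";
  case 0x3F: return "true";
  case 0x4F: return "null";
  case 0x5F: return "eof";
  case 0x6F: return "unbound";
  case 0x7F: return "void";
  case 0x8F: return "bwp";
  default:   return NULL;
  }
}
```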

Macro: IK_FORWARD_PTR ((ikptr_t)-1)

When a Scheme object’s memory block is moved by the garbage collector, the first word of the old memory block is overwritten with a special value, the “forward pointer”, which is the constant IK_FORWARD_PTR.

Notice that when the garbage collector scans, word by word, memory that should contain the data area of a Scheme object, it interprets every machine word with all the bits set to 1 as IK_FORWARD_PTR.

Newly allocated memory is initialised by Vicare to a sequence of IK_FORWARD_PTR words, which, most likely, will trigger an assertion violation if the garbage collector scans a machine word we have not explicitly initialised to something valid. Whenever we reserve a portion of a memory page, with aligned size, for a Scheme object we must initialise all its words to something valid.

When we convert a requested size to an aligned size with IK_ALIGN(), either zero or one machine word is allocated beyond the requested size. When such an additional machine word is allocated, we have to initialise it to something valid. Usually the safe value to which we should initialise memory is the fixnum zero: a machine word with all the bits set to 0.
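
The initialisation rule can be sketched as follows. The sketch assumes that aligned sizes are multiples of two machine words, which is consistent with the "zero or one additional word" statement above when the requested size is a whole number of words; sketch_align() is a hypothetical stand-in for IK_ALIGN(), whose real definition is not reproduced here.

```c
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t word_t;       /* hypothetical stand-in for ikptr_t */

/* Assumed alignment: two machine words. */
#define SKETCH_ALIGN_SIZE (2 * sizeof(word_t))

/* Round a size in bytes up to the next multiple of the alignment. */
static size_t
sketch_align (size_t requested_bytes)
{
  return (requested_bytes + SKETCH_ALIGN_SIZE - 1)
    / SKETCH_ALIGN_SIZE * SKETCH_ALIGN_SIZE;
}

/* Fill every word of the aligned block, padding word included, with
   the fixnum zero; return the number of words initialised. */
static size_t
sketch_init_block (word_t *block, size_t requested_bytes)
{
  size_t nwords = sketch_align(requested_bytes) / sizeof(word_t);
  for (size_t i = 0; i < nwords; ++i)
    block[i] = 0;
  return nwords;
}
```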

The variable values that fit in a single machine word are fixnums, characters and port transcoders. The least significant byte of these machine words is tagged as follows:

   object      |  tag bits  | tag hex | mask bits
---------------+------------+---------+------------
fixnums 32-bit | #b??????00 |   --    | #b00000011
fixnums 64-bit | #b?????000 |   --    | #b00000111
characters     | #b00001111 |  #x0F   | #b11111111
transcoders    | #b01111111 |  #x7F   | #b11111111

to identify a fixnum we can do:

ikptr_t   X;

if (fx_tag == (X & fx_mask))
  it_is_a_fixnum();
else
  it_is_not();

or just use the macro IK_IS_FIXNUM(); similarly for the other immediate variable values.
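
For characters the same pattern applies with the tag and mask from the table above. In the sketch below the names are hypothetical, and the placement of the code point in the bits above the tag byte (a shift of 8 bits) is an assumption suggested by the 8-bit mask, not something stated in this section.

```c
#include <stdint.h>

typedef uintptr_t word_t;       /* hypothetical stand-in for ikptr_t */

enum {
  SKETCH_CHAR_TAG   = 0x0F,     /* from the table */
  SKETCH_CHAR_MASK  = 0xFF,     /* from the table */
  SKETCH_CHAR_SHIFT = 8         /* assumption: payload above the tag byte */
};

/* Recognise a character by its least significant byte. */
static int
sketch_is_char (word_t X)
{
  return SKETCH_CHAR_TAG == (X & SKETCH_CHAR_MASK);
}

/* Build a character word from a code point (assumed layout). */
static word_t
sketch_char_from_code (uint32_t code_point)
{
  return ((word_t)code_point << SKETCH_CHAR_SHIFT) | SKETCH_CHAR_TAG;
}

/* Extract the code point from a character word (assumed layout). */
static uint32_t
sketch_char_to_code (word_t X)
{
  return (uint32_t)(X >> SKETCH_CHAR_SHIFT);
}
```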

Notice that a NULL pointer stored in an ikptr_t, having zero bits as tag, represents the fixnum zero. Also, the number of zero tag bits for fixnums is such that a tagged ikptr_t fixnum can be interpreted as the number of bytes needed to hold a number of machine words equal to the number represented by the fixnum itself; that is, the following holds true:

long    number_of_words = ...;

number_of_words * wordsize == number_of_words << fx_shift;

where fx_shift is the number of bits in the fixnum’s tag.
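
The relation can be verified with a small sketch. The names are hypothetical; the shift is assumed to be 2 bits on 32-bit platforms and 3 bits on 64-bit platforms, matching the fixnum rows of the table above.

```c
#include <stdint.h>

typedef uintptr_t word_t;       /* hypothetical stand-in for ikptr_t */

#define SKETCH_WORDSIZE  (sizeof(word_t))
#define SKETCH_FX_SHIFT  (sizeof(word_t) == 8 ? 3 : 2)

/* Encode a C integer as a fixnum: shift the payload above the tag
   bits, which stay zero. */
static word_t
sketch_fix (intptr_t n)
{
  return (word_t)n << SKETCH_FX_SHIFT;
}

/* Decode a fixnum.  The tag bits are zero, so the division is exact;
   it is used instead of a right shift because shifting a negative
   signed value is implementation defined in C. */
static intptr_t
sketch_unfix (word_t X)
{
  return (intptr_t)X / ((intptr_t)1 << SKETCH_FX_SHIFT);
}
```

With this encoding the fixnum for N, read as a plain integer, is also the size in bytes of N machine words.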

Values allocated on the heap

The values that do not fit into a single machine word are composed of a reference machine word and an array of machine words on the heap; they are: symbols, pairs, vectors, bytevectors, structures, ports, bignums, ratnums, flonums, compnums, cflonums, strings, closures, continuations, code objects, pointers.

The machine words used as reference have the 3 least significant bits used as tag and the remaining most significant bits used to store a pointer in memory; on 32-bit platforms the layout of such machine words is:

 PPPPPPPP PPPPPPPP PPPPPPPP PPPPPTTT   P = bit of pointer
|--------|--------|--------|--------|  T = bit of tag
  byte 3   byte 2   byte 1   byte 0

the following tags are used:

  object    | tag bits | tag hex | mask bits
------------+----------+---------+------------
pairs       |   #b001  |   #x1   | #b00000111
bytevectors |   #b010  |   #x2   | #b00000111
closures    |   #b011  |   #x3   | #b00000111
vectors     |   #b101  |   #x5   | #b00000111
strings     |   #b110  |   #x6   | #b00000111

notice how none of the tags for reference words is #b111, which is reserved for immediate values; also notice how #b100 must not be used as tag, because on 32-bit platforms, where only the 2 least significant bits of a fixnum are zero, it would match the odd fixnums: the ones whose payload has the least significant bit set to one.
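
The table can be turned into a dispatch on the 3 least significant bits, as in the following sketch. The function and the strings it returns are hypothetical; the handling of #b100 follows the remark above, treating it as a fixnum pattern on 32-bit platforms and as unassigned otherwise.

```c
#include <stdint.h>

typedef uintptr_t word_t;       /* hypothetical stand-in for ikptr_t */

/* Describe a machine word according to its primary tag. */
static const char *
sketch_primary_kind (word_t X)
{
  switch (X & 7) {
  case 1:  return "pair";
  case 2:  return "bytevector";
  case 3:  return "closure";
  case 5:  return "vector-tagged";
  case 6:  return "string";
  case 7:  return "immediate";
  case 4:  return (sizeof(word_t) == 4) ? "fixnum" : "unassigned";
  default: return "fixnum";     /* #b000 */
  }
}
```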

The vector tag is used to tag machine word references to multiple object types: vectors, bignums, structures, flonums, ratnums, compnums, cflonums, continuations, code, ports, symbols, pointers, system continuations. The first word in the memory block of these types has the least significant bits set to a secondary tag.

All the possible values for 3-bit tags in reference values are already allocated; new object types can be added only by defining a new secondary tag with references tagged as vector.

While the API defines predicates to recognise values, to identify a type–specific reference we can do:

ikptr_t   X;

if (pair_tag == (X & pair_mask))
  it_is_a_pair();
else
  it_is_not();

similarly for the other types. The vector tag acts as primary tag; a secondary tag is stored in the least significant bits of the referenced vector of words on the heap; to recognise such values we can do:

ikptr_t  X;

if ((vector_tag    == (X & vector_mask)) &&
    (secondary_tag == (secondary_mask & IK_REF(X, -vector_tag))))
  it_is();
else
  it_is_not();

where secondary_tag and secondary_mask are type–specific. The secondary tags and the associated masks are:

   object           |  tag bits   | tag hex | tag mask
                    |    76543210 |         |    76543210
--------------------+-------------+---------+-------------
vector              |  #b??????00 | fixnum  |    --
bignum              |  #b????s011 |   #x03  | #b00000111
structure           |  #b?????101 |   #x05  | #b00000111
flonum              |  #b00010111 |   #x17  |    --
ratnum              |  #b00100111 |   #x27  |    --
compnum             |  #b00110111 |   #x37  |    --
cflonum             |  #b01000111 |   #x47  |    --
continuation        |  #b00011111 |   #x1F  |    --
code                |  #b00101111 |   #x2F  |    --
port                |  #b??111111 |   #x3F  | #b00111111
symbol              |  #b01011111 |   #x5F  |    --
pointer             | #b100000111 |  #x107  |    --
system continuation | #b100011111 |  #x11F  |    --

notice how the port secondary tag has all the 6 least significant bits set to 1: no other tag must have all such bits set to 1. Secondary tags for new types can be allocated by setting the least significant byte to #x0F and reserving a specific bit pattern in the most significant bytes.
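
The two-level test described above can be sketched with values taken from the table. The names are hypothetical; a tag listed without a mask is compared against the whole first word, while the bignum tag is compared after applying its mask.

```c
#include <stdint.h>

typedef uintptr_t word_t;       /* hypothetical stand-in for ikptr_t */

enum {
  SKETCH_VECTOR_TAG  = 5,    SKETCH_VECTOR_MASK = 7,
  SKETCH_FLONUM_TAG  = 0x17,
  SKETCH_BIGNUM_TAG  = 0x03, SKETCH_BIGNUM_MASK = 0x07
};

/* First word of the memory block referenced by a vector tagged word:
   subtracting the tag yields the untagged pointer. */
static word_t
sketch_first_word (word_t X)
{
  return *(word_t *)(X - SKETCH_VECTOR_TAG);
}

/* Flonum: vector primary tag, whole first word equal to #x17. */
static int
sketch_is_flonum (word_t X)
{
  return (SKETCH_VECTOR_TAG == (X & SKETCH_VECTOR_MASK))
    &&   (SKETCH_FLONUM_TAG == sketch_first_word(X));
}

/* Bignum: vector primary tag, masked first word equal to #b011. */
static int
sketch_is_bignum (word_t X)
{
  return (SKETCH_VECTOR_TAG == (X & SKETCH_VECTOR_MASK))
    &&   (SKETCH_BIGNUM_TAG == (SKETCH_BIGNUM_MASK & sketch_first_word(X)));
}
```

The primary tag is checked first, so the first word is read only when the reference is known to be vector tagged.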

The only tags having an associated mask are the ones of objects storing additional information in the first word of the heap vector:

Vectors

The first word of a vector is a fixnum representing the number of elements.

Bignums

The first word uses the 3 least significant bits as tag, the 4th bit representing the sign (0 for positive, 1 for negative) and the remaining bits representing the number of words in the bignum data area.

Structures

The first word is tagged as vector, because the first word of a structure is itself a reference to a structure: the type descriptor.

Ports

The most significant bits of the first word are used for port attributes.
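
As an example of such additional information, the bignum layout described above can be decoded as in this sketch. The names are hypothetical; the bit positions follow the description: 3 tag bits, then the sign bit, then the number of words.

```c
#include <stdint.h>

typedef uintptr_t word_t;       /* hypothetical stand-in for ikptr_t */

enum {
  SKETCH_BIGNUM_TAG          = 3,  /* #b011 */
  SKETCH_BIGNUM_MASK         = 7,
  SKETCH_BIGNUM_SIGN_SHIFT   = 3,  /* 4th bit: 0 positive, 1 negative */
  SKETCH_BIGNUM_LENGTH_SHIFT = 4   /* remaining bits: number of words */
};

/* Compose the first word of a bignum memory block. */
static word_t
sketch_bignum_first_word (int negative, word_t nwords)
{
  return (nwords << SKETCH_BIGNUM_LENGTH_SHIFT)
    | ((word_t)(negative ? 1 : 0) << SKETCH_BIGNUM_SIGN_SHIFT)
    | SKETCH_BIGNUM_TAG;
}

/* Extract the sign bit from the first word. */
static int
sketch_bignum_is_negative (word_t first_word)
{
  return (int)((first_word >> SKETCH_BIGNUM_SIGN_SHIFT) & 1);
}

/* Extract the number of words in the data area. */
static word_t
sketch_bignum_word_count (word_t first_word)
{
  return first_word >> SKETCH_BIGNUM_LENGTH_SHIFT;
}
```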

