Block objects are represented by a pointer to a C language structure with a header and a data block; slightly simplified, it looks like this:
    #define C_uword  unsigned C_word
    #define C_header C_uword

    typedef struct
    {
      C_header header;
      C_word data[];    /* Variable-length array: header determines length */
    } C_SCHEME_BLOCK;
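As a rough illustration of how a structure with a trailing variable-length array is used, here is a minimal sketch. The names `word`, `uword`, `block` and `alloc_block` are hypothetical stand-ins, and `malloc` is used only for simplicity; this is not how the CHICKEN runtime allocates objects.

```c
#include <stdlib.h>

typedef long word;              /* stand-in for C_word (assumption) */
typedef unsigned long uword;    /* stand-in for C_uword (assumption) */

typedef struct {
    uword header;
    word data[];                /* flexible array member: length lives in header */
} block;

/* Hypothetical allocator: reserves room for `slots` word-sized slots and
   stores only the slot count in the header (no type or GC bits). */
static block *alloc_block(size_t slots)
{
    block *b = malloc(sizeof(block) + slots * sizeof(word));
    if (b != NULL)
        b->header = (uword)slots;
    return b;
}
```

The point of the sketch is that the structure itself carries no length field other than the header, so every consumer has to consult the header before touching `data`.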
The header’s bit pattern is broken up into three parts:
In a machine word, the GC and type bits occupy the most significant byte, and the remaining bits hold the size (shown here for a 32-bit word):

    XXXXYYYY ZZZZZZZZ ZZZZZZZZ ZZZZZZZZ
    X: GC bits   Y: type bits   Z: size of object (slot or byte count)

The GC bits have the following meaning:
C_GC_FORWARDING_BIT
Indicates this object has been forwarded elsewhere. To find the object at its new location, the entire header is shifted to the left (which shifts out this bit). Then, the value is reinterpreted as a pointer. Remember, the lowest two bits of word pointers are always zero, so we can do this with impunity!
C_BYTEBLOCK_BIT
Indicates this is a byte blob (size bits are interpreted in bytes, not words).
C_SPECIALBLOCK_BIT
Indicates that the first slot is special and should be skipped by the garbage collector.
C_8ALIGN_BIT
Indicates that for this object, alignment must be maintained at an 8-byte boundary.
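The forwarding scheme described for C_GC_FORWARDING_BIT can be sketched as follows. This follows the simplified description in the text rather than the runtime's actual macros; the helper names are hypothetical, and the forwarding bit is assumed to be the most significant bit of the header word.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the forwarding bit is the topmost bit of a header word. */
#define WORD_BITS       (sizeof(uintptr_t) * 8)
#define FORWARDING_BIT  ((uintptr_t)1 << (WORD_BITS - 1))

/* Hypothetical helper: encode the new location of a moved object
   in the header of the old copy. */
static uintptr_t make_forwarding_header(void *new_location)
{
    uintptr_t p = (uintptr_t)new_location;
    assert((p & 1) == 0);              /* word pointers have zero low bits */
    return (p >> 1) | FORWARDING_BIT;  /* the vacated top bit holds the flag */
}

static int is_forwarded(uintptr_t header)
{
    return (header & FORWARDING_BIT) != 0;
}

/* Shifting left pushes the forwarding bit out and restores the pointer. */
static void *forwarded_location(uintptr_t header)
{
    return (void *)(header << 1);
}
```

No information is lost in the round trip because the bit dropped by the right shift is known to be zero for any word-aligned pointer.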
The type bits are assigned incrementally. There is room for 16 types, only 2 of which are currently unused. Let’s look at the definitions, which should also help to explain the practical use of the latter 3 GC bits:
    #define C_SYMBOL_TYPE            (0x01000000L)
    #define C_STRING_TYPE            (0x02000000L | C_BYTEBLOCK_BIT)
    #define C_PAIR_TYPE              (0x03000000L)
    #define C_CLOSURE_TYPE           (0x04000000L | C_SPECIALBLOCK_BIT)
    #define C_FLONUM_TYPE            (0x05000000L | C_BYTEBLOCK_BIT | C_8ALIGN_BIT)
    /*      unused                   (0x06000000L ...) */
    #define C_PORT_TYPE              (0x07000000L | C_SPECIALBLOCK_BIT)
    #define C_STRUCTURE_TYPE         (0x08000000L)
    #define C_POINTER_TYPE           (0x09000000L | C_SPECIALBLOCK_BIT)
    #define C_LOCATIVE_TYPE          (0x0a000000L | C_SPECIALBLOCK_BIT)
    #define C_TAGGED_POINTER_TYPE    (0x0b000000L | C_SPECIALBLOCK_BIT)
    #define C_LAMBDA_INFO_TYPE       (0x0d000000L | C_BYTEBLOCK_BIT)
    /*      unused                   (0x0e000000L ...) */
    #define C_BUCKET_TYPE            (0x0f000000L)
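To make the layout concrete, here is a small sketch that takes a 32-bit header apart. The GC bit values and the mask and helper names are assumptions chosen to match the diagram (GC bits in the top nibble, in the order listed earlier); only the two type values are taken from the definitions above.

```c
#include <stdint.h>

/* Assumed 32-bit layout: GC bits in the top nibble, type bits below them,
   size in the low 24 bits. The GC bit values are assumptions. */
#define GC_FORWARDING_BIT  0x80000000u
#define BYTEBLOCK_BIT      0x40000000u
#define SPECIALBLOCK_BIT   0x20000000u
#define ALIGN8_BIT         0x10000000u
#define TYPE_MASK          0x0f000000u
#define SIZE_MASK          0x00ffffffu

#define STRING_TYPE  (0x02000000u | BYTEBLOCK_BIT)
#define PAIR_TYPE    (0x03000000u)

static uint32_t header_type_number(uint32_t header)
{
    return (header & TYPE_MASK) >> 24;
}

static uint32_t header_size(uint32_t header)
{
    return header & SIZE_MASK;
}

static int header_is_byteblock(uint32_t header)
{
    return (header & BYTEBLOCK_BIT) != 0;
}

/* Payload size in bytes, ignoring any padding: the size field counts
   bytes for byte blocks and word-sized slots otherwise. */
static uint32_t data_bytes(uint32_t header)
{
    uint32_t n = header_size(header);
    return header_is_byteblock(header) ? n : n * (uint32_t)sizeof(uint32_t);
}
```

For example, a pair header built as `PAIR_TYPE | 2` describes two slots (8 bytes of payload on a 32-bit machine), while `STRING_TYPE | 5` describes five bytes.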
Most of the types should be self-explanatory to a seasoned Schemer, but a few things deserve further explanation.
In the STRING type, C_BYTEBLOCK_BIT is also set, for obvious reasons: strings do not consist of slots containing Scheme values, but of bytes, which are opaque. Because the header’s size bits store the length in bytes instead of in words, we can spot a very important limitation: CHICKEN strings can only hold 16 MiB of data on a 32-bit machine (on a 64-bit machine, strings are “limited” to 65536 TiB).
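The two limits follow directly from the number of size bits left after the 8 GC and type bits are taken out of the word. A minimal sketch of the arithmetic, with a hypothetical helper name:

```c
#include <assert.h>
#include <stdint.h>

/* Largest byte count that fits in the size field of a header:
   the word width minus the 8 bits used for GC and type bits. */
static uint64_t max_byteblock_size(unsigned word_bits)
{
    assert(word_bits > 8 && word_bits <= 64);
    return ((uint64_t)1 << (word_bits - 8)) - 1;
}
```

With 32-bit words this leaves 24 size bits, so the limit is just under 2^24 bytes (16 MiB); with 64-bit words it is just under 2^56 bytes, which is 65536 TiB.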
The CLOSURE type uses C_SPECIALBLOCK_BIT. This indicates to the garbage collector that the first slot contains a raw non-Scheme value. In the case of a closure, it contains a pointer to a C function. The other slots contain free variables that were closed over (“captured”) by the lambda, which are normal Scheme objects. The compiled C function “knows” which variable lives in which slot.
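The closure layout can be illustrated with a small sketch. Everything here is hypothetical: the struct and function names are invented for illustration, and values are plain integers rather than tagged Scheme objects, to keep the example short.

```c
#include <stdint.h>

typedef intptr_t word;

/* Hypothetical closure block: slot 0 is a raw C function pointer (the
   "special" slot the GC skips), the remaining slots are captured values. */
typedef struct closure {
    uintptr_t header;
    word (*code)(struct closure *self, word arg);   /* slot 0: raw pointer */
    word captured[1];                               /* slot 1: free variable n */
} closure;

/* Compiled body of something like (lambda (x) (+ x n)):
   it "knows" that n lives in captured[0]. */
static word add_n(closure *self, word x)
{
    return x + self->captured[0];
}

/* Calling a closure means jumping to its code and passing the closure
   itself, so the code can reach its captured variables. */
static word call_closure(closure *c, word arg)
{
    return c->code(c, arg);
}
```

A closure initialised with `add_n` as its code and 10 in its captured slot would return 15 when called with 5.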
The FLONUM type uses C_BYTEBLOCK_BIT, because an unboxed C double value is not a Scheme object: we want to treat the data as an opaque blob. On a 32-bit system, the double will take up two machine words, so we can’t use C_SPECIALBLOCK_BIT. The header will therefore hold the value 8 as its size. It also has another GC bit: C_8ALIGN_BIT. This ensures that the 64-bit double is aligned on an 8-byte boundary, to avoid unaligned access on 32-bit systems. This adds some complexity to garbage collection and memory allocation.
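One way to picture the extra work that C_8ALIGN_BIT causes is the following sketch of an allocator decision. It is an illustration under assumptions, not the runtime's actual allocation code, and the function name is hypothetical: given the next free address and the header size, it picks a header address such that the data after the header starts on an 8-byte boundary.

```c
#include <stdint.h>

/* Returns the address at which to place the header so that the data
   following a header of `header_bytes` bytes is 8-byte aligned.
   Any gap between `next_free` and the result is padding. */
static uintptr_t place_8aligned_block(uintptr_t next_free, uintptr_t header_bytes)
{
    uintptr_t data = next_free + header_bytes;
    uintptr_t aligned_data = (data + 7u) & ~(uintptr_t)7;  /* round up to multiple of 8 */
    return aligned_data - header_bytes;
}
```

With a 4-byte header, a free address of 0x1000 yields a header at 0x1004 (data at 0x1008), so one word of padding is needed; with an 8-byte header no padding is needed at that address.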
The STRUCTURE type refers to a SRFI-9 type of record object. Its slots hold the record’s fields, and the accessors and constructors “know” which field is stored at which index.
The POINTER type holds a raw C pointer inside a Scheme object. Again, because C pointers are not Scheme objects, the object’s first (and only) slot is treated specially, via C_SPECIALBLOCK_BIT.
The LOCATIVE type represents a rather complicated object. It acts a bit like a pointer into a slab of memory. You can use it as a single value which represents a location inside another block object. This can then be used as an argument to a foreign function that expects a pointer. Its first slot holds a raw pointer. The other slots hold the offset, the type of pointer (encoded as a fixnum) and the original object, unless it is a weak reference.
The TAGGED_POINTER type is exactly like POINTER, but it has an extra user-defined tag. This can make it easier for code to identify the pointer’s type. The tag is a Scheme value held in its second slot.
The LAMBDA_INFO type stores procedure introspection information (mostly for debugging).
The BUCKET type is a special internal pair-like object which is used in the linked list of symbols under a hash table bucket in the symbol table. It does not count as a reference, so that symbols can be garbage collected when only the symbol table still refers to them.
So far, the only numeric types we’ve seen are fixnums and flonums. What about the other numeric types? After all, CHICKEN 5 has a full numeric tower!
In CHICKEN 5, rational and complex numbers are viewed as two simpler numbers stuck together. They’re stored as records with a special tag, which the runtime system recognises. Bignums are also represented as a record with a special tag and a slot that refers to the byte blob containing the actual bignum value.