Jan 05 (Marco’s 2019 Weblog)

More experiments with the C language: CCNames

Posted on Sat Jan 5, 2019

It is established that my brain is getting old… sigh! I need to adapt: I need some new scheme to organise ideas and complexity; I need some new abstraction; I need some new notation. In these situations I am in danger of valuing what is new over what is more effective; I know, so I will be careful… right?!

Most likely this experiment will result in failure. But failure is… failure! So… I do it because I feel like it. Life is hard…

I started experimenting with something new in my CCLibraries projects. I do not want to fight the language: I do not want to introduce syntaxes that are out–of–place in a C language source base. I will just abuse the preprocessor a little (it is my understanding that some people abuse it much more than I want to…). I want everything to be standard C11.

“Well known” functions and variables

I want every struct type to have one or more of the following functions:

init(): Constructor function. Initialises an already allocated struct instance: it can be on the stack or embedded into another, enclosing, struct. Acquires all the asynchronous resources associated with the fields of the struct.
final(): Destructor function. Finalises a struct instance allocated on the stack or embedded into another, enclosing, struct. Releases all the asynchronous resources associated with the fields of the struct.
new(): Constructor function. Allocates a block of memory on the heap to hold a new struct instance. Acquires all the asynchronous resources associated with the fields of the struct, usually by calling the init() function.
delete(): Destructor function. Finalises a struct instance allocated on the heap. Releases all the asynchronous resources associated with the fields of the struct, usually by calling the final() function. Releases the allocated block of memory.
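
The four roles above can be sketched in plain C as follows. This is only an illustration under assumptions: the struct my_buffer_t, its fields, the plain function names and the int return value of init() are hypothetical and are not taken from the CCLibraries projects.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical struct, used only for this sketch. */
typedef struct my_buffer_t      my_buffer_t;

struct my_buffer_t {
  size_t        len;
  char *        data;   /* the resource owned by the struct */
};

/* init(): initialise an already allocated instance and acquire its
   resources.  Returns 0 on success, -1 on allocation failure. */
int
my_buffer_init (my_buffer_t * self, size_t len)
{
  self->data = calloc(len, 1);
  if (NULL == self->data) {
    self->len = 0;
    return -1;
  }
  self->len = len;
  return 0;
}

/* final(): release the resources; the memory of the struct itself is
   left alone, because it is on the stack or embedded elsewhere. */
void
my_buffer_final (my_buffer_t * self)
{
  free(self->data);
  self->data = NULL;
  self->len  = 0;
}

/* new(): allocate the struct on the heap, then delegate to init(). */
my_buffer_t *
my_buffer_new (size_t len)
{
  my_buffer_t * self = malloc(sizeof(my_buffer_t));
  if ((NULL != self) && (0 != my_buffer_init(self, len))) {
    free(self);
    self = NULL;
  }
  return self;
}

/* delete(): delegate to final(), then release the block of memory. */
void
my_buffer_delete (my_buffer_t * self)
{
  if (NULL != self) {
    my_buffer_final(self);
    free(self);
  }
}

/* Usage: one instance on the stack, one on the heap. */
static void
sketch_lifecycle_usage (void)
{
  my_buffer_t   A[1];
  my_buffer_t * B;

  assert(0 == my_buffer_init(A, 16));
  my_buffer_final(A);

  B = my_buffer_new(8);
  assert(NULL != B);
  my_buffer_delete(B);
}
```

The pairing is what matters here: init() goes with final(), new() goes with delete(), and the heap variants are thin wrappers around the other two.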

I have in mind more “well known” functions, with defined behaviour and defined type signature. There are also some “well known” variables, mostly tables of pointers to functions.

Naming “well known” stuff

How do I name these functions and variables? Let’s say that a library’s namespace is my and a struct is called coords; the full type name is my_coords_t. What’s the name of the init() function? One among:

my_coords_init
my_init_coords
init_my_coords
my_coords_t_init

or another variant. I want a way to define such a function name that is both descriptive and easy to replicate in different code repositories: every struct in every code base I author must follow the same naming convention, easily and recognisably. The fact that I cannot decide between the first three of these alternatives, and just stick with one, is a sure sign of impending senility.

My ageing mind has thought of defining a preprocessor macro that is used as follows:

ccname_init(my_coords_t)

the result of expanding such a macro use is a function name, for example:

my_coords_t__init

I will never see the actual name in the source code; I will always write the macro use. The actual function name will show itself in compiler messages: I will try to live with it.

Now let’s say we need two init() constructors with different arguments: we have a rec variant and a pol variant; how do we generate the names for the init() constructors? Here:

ccname_init(my_coords_t, rec)
ccname_init(my_coords_t, pol)

the first operand of the macro use is the struct type name, the second operand is the name of the constructor variant. Notice that, when appropriate, the ccname_ macros are variadic.
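
One way such a variadic macro could be written in standard C11 is to count the operands and dispatch to a fixed-arity helper. The following is a minimal sketch, not the actual CCNames definition: every sketch_ and SKETCH_ name is hypothetical, and the variant spelling my_coords_t__init__rec is an assumption extrapolated from the my_coords_t__init example above.

```c
#include <assert.h>
#include <string.h>

/* Count the operands of a macro use; this sketch supports 1 to 3. */
#define SKETCH_NARGS_(_1, _2, _3, N, ...)       N
#define SKETCH_NARGS(...)       SKETCH_NARGS_(__VA_ARGS__, 3, 2, 1, 0)

/* Two-level concatenation: the operands are macro-expanded before
   being pasted together. */
#define SKETCH_CAT_(A, B)       A ## B
#define SKETCH_CAT(A, B)        SKETCH_CAT_(A, B)

/* Fixed-arity helpers that build the actual names. */
#define sketch_ccname_init_1(TYPE)              TYPE ## __init
#define sketch_ccname_init_2(TYPE, VARIANT)     TYPE ## __init__ ## VARIANT

/* The variadic front end: select the helper by operand count. */
#define sketch_ccname_init(...) \
  SKETCH_CAT(sketch_ccname_init_, SKETCH_NARGS(__VA_ARGS__))(__VA_ARGS__)

/* Stringify after expansion, to inspect the generated name. */
#define SKETCH_STR_(X)          #X
#define SKETCH_STR(X)           SKETCH_STR_(X)

typedef struct my_coords_t      my_coords_t;

struct my_coords_t {
  double        X;
  double        Y;
};

/* The macro use stands where the function name would be. */
void
sketch_ccname_init(my_coords_t, rec) (my_coords_t * self, double x, double y)
{
  self->X = x;
  self->Y = y;
}

/* Self-check: compare the generated names with the expected spelling. */
static void
sketch_check_names (void)
{
  assert(0 == strcmp("my_coords_t__init",
                     SKETCH_STR(sketch_ccname_init(my_coords_t))));
  assert(0 == strcmp("my_coords_t__init__rec",
                     SKETCH_STR(sketch_ccname_init(my_coords_t, rec))));
}
```

Counting the operands keeps a single public macro name for both the plain and the variant form; the price is one extra layer of helper macros for every “well known” role.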

Let’s look at an actual definition and use:

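#include <math.h>       /* cos(), sin() */
#include <stdlib.h>     /* exit() */
/* The ccname_init() macro is assumed to be defined by the CCNames header. */

typedef struct my_coords_t       my_coords_t;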

struct my_coords_t {
  double        X;
  double        Y;
};

void
ccname_init(my_coords_t, rec) (my_coords_t * self, double x, double y)
{
  self->X = x;
  self->Y = y;
}

void
ccname_init(my_coords_t, pol) (my_coords_t * self, double rho, double theta)
{
  self->X = rho * cos(theta);
  self->Y = rho * sin(theta);
}

int
main (void)
{
  my_coords_t    A[1];
  my_coords_t    B[1];

  ccname_init(my_coords_t, rec)(A, 1.0, 2.0);
  ccname_init(my_coords_t, pol)(B, 2.0, 1.0);

  exit(0);
}

A more complex example

I want to define struct types that act as “interfaces” for the struct types representing data. So an instance of my_serialisable_I implements a serialisation API for a specific instance of type my_coords_t. An interface struct is little more than a table of pointers to functions implementing the specialised API.

What is the name of the “well known” constructor for my_serialisable_I acting upon an instance of my_coords_t?

ccname_iface_new(my_serialisable_I, my_coords_t)

If the interface has a write() “method”: what is its name?

ccname_iface_method(my_serialisable_I, my_coords_t, write)

And the name of the struct representing the table of pointers?

ccname_iface_table(my_serialisable_I, my_coords_t)

And if there are multiple variants of serialisable interface for the same my_coords_t?

ccname_iface_new(my_serialisable_I, my_coords_t, rec)
ccname_iface_new(my_serialisable_I, my_coords_t, pol)
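
To make the idea of an interface struct concrete, here is a minimal sketch written with plain names, each one annotated with the macro use whose role it plays. All the spelled-out names, the layout of my_serialisable_I, the signature of write() and the output format are assumptions made for illustration; the constructor returns the interface by value only to keep the sketch short.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef struct my_coords_t { double X; double Y; } my_coords_t;

typedef struct my_serialisable_I                my_serialisable_I;
typedef struct my_serialisable_I_table_t        my_serialisable_I_table_t;

/* The table of pointers to functions: one slot for each "method". */
struct my_serialisable_I_table_t {
  int (*write) (my_serialisable_I const * iface, char * buf, size_t buflen);
};

/* An interface instance: the table plus the data instance it acts upon. */
struct my_serialisable_I {
  my_serialisable_I_table_t const *     table;
  void const *                          self;
};

/* Role of: ccname_iface_method(my_serialisable_I, my_coords_t, write) */
static int
my_serialisable_I__my_coords_t__write (my_serialisable_I const * iface,
                                       char * buf, size_t buflen)
{
  my_coords_t const * self = iface->self;
  return snprintf(buf, buflen, "(%.1f, %.1f)", self->X, self->Y);
}

/* Role of: ccname_iface_table(my_serialisable_I, my_coords_t) */
static my_serialisable_I_table_t const my_serialisable_I__my_coords_t__table = {
  .write = my_serialisable_I__my_coords_t__write
};

/* Role of: ccname_iface_new(my_serialisable_I, my_coords_t)
   Returned by value here, for brevity. */
my_serialisable_I
my_serialisable_I__my_coords_t__new (my_coords_t const * self)
{
  my_serialisable_I iface = {
    .table = &my_serialisable_I__my_coords_t__table,
    .self  = self
  };
  return iface;
}

/* Usage: after construction, the caller only sees the interface. */
static void
sketch_iface_usage (void)
{
  my_coords_t           A = { .X = 1.0, .Y = 2.0 };
  my_serialisable_I     iface = my_serialisable_I__my_coords_t__new(&A);
  char                  buf[32];

  iface.table->write(&iface, buf, sizeof(buf));
  assert(0 == strcmp("(1.0, 2.0)", buf));
}
```

A rec or pol variant of the interface would, under the same assumptions, get its own table and its own constructor; the struct definitions would stay the same.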

Achievement: the role of function names is explicitly stated in the code

Given that a struct’s API is partly composed of functions with a well defined role, building names with macros tells us directly which role a given function has. The definition:

my_coords_t const *
ccname_new(my_coords_t, rec) (cce_destination_t L, double X, double Y)
{
  ...
}

tells us that the function is the rec variant of a constructor for the type my_coords_t; the function call in:

cce_location_t          L[1];
my_coords_t const *      A;

...
A = ccname_new(my_coords_t, rec)(L, 1.0, 2.0);
...

tells us that we are calling the rec variant of the constructor for the type my_coords_t.

Achievement: replacing the type name automatically replaces the function names

Let’s say we rename the type my_coords_t to your_coords_t; by just using the editor’s facilities to replace strings we automatically change:

ccname_init(my_coords_t)

into:

ccname_init(your_coords_t)

there is no need to perform another pass to replace my_coords_init() with your_coords_init(), and so on for the other functions in the API.

Problem: how do I document such an API?

I use GNU Texinfo for my documentation needs. How do I document a function whose name is hidden and accessible only as the output of a macro use? Given a function name that is built with:

ccname_init(my_coords_t, rec)
ccname_init(my_coords_t, pol)

here is an attempt using Texinfo’s @deftypefun environment (wrapping the macro use between braces in the source file):

Function: void ccname_init(my_coords_t, rec) (my_coords_t * SELF, double X, double Y)
Function: void ccname_init(my_coords_t, pol) (my_coords_t * SELF, double RHO, double THETA)

    Initialise an already allocated struct instance using rectangular or polar coordinates.

Technologically it works; index search when browsing the documentation on the terminal with the Info reader also works (which is of paramount importance for me). How does it appear in the documentation browser? Long and stuffy, which makes it a pain for the eyes to scan the page; but descriptive and readable function names are also long and stuffy (raise your hand if you find LAPACK function names to be readable, easy to type without mistakes, easy to remember).

Problem: how do I complete a name or search for a definition using tags?

I use GNU Emacs for my editing needs and the GNU Autotools to manage building projects. How do I customise a feature like complete-symbol to auto–complete a macro–built name? How do I customise a feature like xref-find-definitions to search the definition point of a function whose name is macro–built?

GNU Automake has built–in support for creating a TAGS file that we can use from Emacs with complete-symbol and xref-find-definitions; these features work fine with identifiers defined in C language code. But the following are not identifiers:

ccname_init(my_coords_t, rec)
ccname_init(my_coords_t, pol)

the symbols ccname_init and my_coords_t are automatically picked up by the tags infrastructure, but the whole macro–use needs customisation.

Using GNU Automake’s support for tags, we can add the following variable definition in Makefile.am:

AM_ETAGSFLAGS = --regex='{c}/"\\_<\\(ccname_\\(?:alloc\\|delete\\|final\\|i\\(?:face_\\(?:method\\(?:_type\\)?\\|new\\|table\\(?:_type\\)?\\)\\|nit\\)\\|method\\(?:_type\\)?\\|new\\|\\(?:releas\\|tabl\\(?:e_typ\\)?\\)e\\)\\)\\_>"/'

where the regular expression was generated under Emacs with:

(regexp-opt
     '("ccname_alloc"
       "ccname_delete"
       "ccname_final"
       "ccname_iface_method"
       "ccname_iface_method_type"
       "ccname_iface_new"
       "ccname_iface_table"
       "ccname_iface_table_type"
       "ccname_init"
       "ccname_method"
       "ccname_method_type"
       "ccname_new"
       "ccname_release"
       "ccname_table"
       "ccname_table_type")
     'symbols)

Et voilà! Everything works! Most likely, in the future, I will further develop and refine this regular expression.

Problem: how do I configure syntax highlighting?

Nowadays every source code editor has features for syntax highlighting, or font locking in GNU Emacs jargon. My current, imperfect, setup, loaded from .emacs, is this:

(defconst my-ccnames-macros
  (eval-when-compile
    (regexp-opt
     '("ccname_alloc"
       "ccname_delete"
       "ccname_final"
       "ccname_iface_method"
       "ccname_iface_method_type"
       "ccname_iface_new"
       "ccname_iface_table"
       "ccname_iface_table_type"
       "ccname_init"
       "ccname_method"
       "ccname_method_type"
       "ccname_new"
       "ccname_release"
       "ccname_table"
       "ccname_table_type")
     'symbols)))

;;We perform this call to  `font-lock-add-keywords' at the top-level, so
;;the configuration is done only once at file-loading time.
;;
(font-lock-add-keywords
    ;;This  argument  MODE is  set  to  `c-mode'  because this  call  is
    ;;performed   at   the   top-level.   See   the   documentation   of
    ;;`font-lock-add-keywords' for details.
    'c-mode

  ;;Here we need  to remember that "(regexp-opt  ... 'symbols)" encloses
  ;;the generated regular expression between  '\_<\(' and '\)\_>' so the
  ;;SUBEXP number must be 1 to match the actual symbol.
  ;;
  `(...
    ;;Abuse the keyword face to fontify some CCNames macro names.
    ;;
    ;;We  use t  as OVERRIDE  argument to  override an  already existent
    ;;fontification with this specification.
    (,my-ccnames-macros 1 font-lock-keyword-face t)
    ...)

  ;;This  true value  as HOW  argument causes  this specification  to be
  ;;appended to the value of `font-lock-keywords'.
  ;;
  ;;We need it  to allow correct fontification of  known function names,
  ;;which must happen after the fontification built into `c-mode'.
  t)

This needs further development, and coordination with other fontification specifications I use for C mode.

Problem: error messages become very messy

There is really nothing we can do about this!

Achievement: fewer identifiers to remember. Problem: more positional arguments to remember

What is the cognitive load required to write and read code like this? Less or more than plain C? I have no answer yet.