Am I going too far with gnu Bash?

Posted on Mon Dec 10, 2018

Years ago I stopped doing significant development on my mbfl project, a library of functions for the Unix shell gnu Bash. First reason: I felt it had enough features (I already had a script that sends email); second reason: running mbfl scripts on my old computer had become dog slow.

Lately, I found that the library does not have enough features after all, so I launched a restyling and development project for it. It’s been a while since I upgraded to a much faster machine, so speed is no longer a problem.

About the syntax

When I was younger, I found it really fun to use programming languages with peculiar syntaxes; in a very immature way, I felt that just being able to code with them was already an achievement. Today I am old. So I’m replacing most of the uses of:

test -n "$STRING"
test -z "$STRING"
test -f "$PATHNAME"

with mbfl function calls:

mbfl_string_is_not_empty "$STRING"
mbfl_string_is_empty "$STRING"
mbfl_file_is_file "$PATHNAME"

Good bye weird syntaxes! Welcome descriptive function names!
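
A minimal sketch of how such descriptive wrappers could be defined. The demo_ names are hypothetical stand-ins used only for illustration; they are not the definitions shipped with mbfl, which may perform more checks.

```bash
# Hypothetical stand-ins, NOT mbfl's own definitions.
# Each wrapper only gives a readable name to a "test" expression.
function demo_string_is_not_empty () { test -n "$1"; }
function demo_string_is_empty     () { test -z "$1"; }
function demo_file_is_file        () { test -f "$1"; }

STRING='hello'
if demo_string_is_not_empty "$STRING"
then echo 'the string is not empty'
fi
```

The wrappers return the exit status of test, so they can be used wherever the original expressions were used.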

I’m also getting rid of most logic operator uses like:

some_predicate && {
  consequent_action
}

some_predicate || {
  alternate_action
}

for the more descriptive:

if some_predicate
then consequent_action
fi

if ! some_predicate
then alternate_action
fi

Fewer symbols, more words. The sign of old age…
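
Besides readability, the two forms differ in one practical detail: when the predicate fails, the && form leaves a non-zero exit status behind, while an if statement whose condition is false and that has no else branch returns zero. The sketch below shows this with two hypothetical functions, demo_with_and and demo_with_if; it matters when the construct is the last command of a function.

```bash
# Hypothetical functions; "false" stands in for a failing predicate.
function demo_with_and () {
    false && {
        echo 'consequent action'
    }
}

function demo_with_if () {
    if false
    then echo 'consequent action'
    fi
}

if demo_with_and
then echo 'demo_with_and returned zero'
else echo 'demo_with_and returned non-zero'
fi

if demo_with_if
then echo 'demo_with_if returned zero'
else echo 'demo_with_if returned non-zero'
fi
```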

I know that I should not touch code that already works, but I do have some obsessive-compulsive behaviour.

Variable references

A typical idiom in Bash programming is running a command or function and collecting the textual output in a variable as follows:

RESULT=$(some_command --with --options)
RESULT=$(some_function with argu ments)

this syntax runs a subshell to execute the command. I assume that running subshells is expensive; what is certain is that it causes unexpected behaviour when a called function attempts to mutate uplevel variables: the mutation happens in the subshell’s copy of the variable and is lost when the subshell exits (and side effects are unavoidable).
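
A small sketch of the lost mutation, using a hypothetical function demo_next_id that both prints a result and increments a global counter:

```bash
# Hypothetical example: the function prints a result AND mutates a global.
declare -i CALLS=0

function demo_next_id () {
    CALLS=$(( CALLS + 1 ))
    printf 'id-%d\n' "$CALLS"
}

# The function body runs in a subshell: the increment is applied to the
# subshell's copy of CALLS and is discarded when the subshell exits.
RESULT=$(demo_next_id)
echo "RESULT=$RESULT CALLS=$CALLS"    # prints: RESULT=id-1 CALLS=0
```

The output is collected correctly, but the caller never sees the counter change.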

This is not the only way to return a result from a function call: we can also use variables with the NAMEREF attribute. So I am adding a number of function variants that use variable references to return values; for example, the function mbfl_file_extension_var now exists: it does the same as mbfl_file_extension but, rather than printing the result on stdout, it mutates a variable in the scope of the caller.
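
The two calling styles can be sketched side by side. The functions demo_file_extension and demo_file_extension_var below are hypothetical stand-ins written for this illustration; the argument order (result variable first) and the way the extension is extracted are assumptions, not mbfl's actual implementation. The local -n syntax requires Bash 4.3 or later.

```bash
# Hypothetical stand-in: prints the extension on stdout.
function demo_file_extension () {
    local demo_NAME=${1:?}
    demo_NAME=${demo_NAME##*/}            # strip the directory part
    if [[ $demo_NAME == ?*.* ]]           # a dot after at least one character
    then printf '%s\n' "${demo_NAME##*.}"
    fi
}

# Hypothetical stand-in: stores the extension in the caller's variable.
function demo_file_extension_var () {
    local -n demo_RESULT_VARREF=${1:?}
    local demo_NAME=${2:?}
    demo_NAME=${demo_NAME##*/}
    if [[ $demo_NAME == ?*.* ]]
    then demo_RESULT_VARREF=${demo_NAME##*.}
    else demo_RESULT_VARREF=
    fi
}

EXT1=$(demo_file_extension /tmp/archive.tar.gz)     # runs a subshell
demo_file_extension_var EXT2 /tmp/archive.tar.gz    # no subshell
echo "EXT1=$EXT1 EXT2=$EXT2"    # prints: EXT1=gz EXT2=gz
```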

Dangers of result variables

We must be careful when using this feature! Let’s consider this script:

function main () {
    local -i X=0
    mbfl_func X
    printf 'X=%d\n' $X
}

function mbfl_func () {
    local -n Y=${1:?}
    Y=1
}

main

everything is fine: the script will print X=1 because the call to mbfl_func will mutate the variable X in its execution environment, and such variable happens to be defined in the scope of main.

Now let’s consider this script:

function main () {
    local -i X=0
    mbfl_func X
    printf 'X=%d\n' $X
}

function mbfl_func () {
    local -n Y=${1:?}
    local X
    Y=1
}

main

it will print X=0 because mbfl_func accesses the variable X in its execution environment, and such variable is defined by mbfl_func itself; the local definition of X shadows the upper level definition.

There is no true escape from this problem! There is no definitive way to avoid “fishing” a local variable in a lower function from an upper function. mbfl attempts to mitigate the problem by prefixing its variables with mbfl_ when a function uses reference variables. To avoid name conflicts, our own code must never use a variable name with the prefix mbfl_.

So we should write the script as follows:

function main () {
    local -i X=0
    mbfl_func X
    printf 'X=%d\n' $X
}

function mbfl_func () {
    local -n mbfl_Y=${1:?}
    local mbfl_X
    mbfl_Y=1
}

main

Problem solved? No.
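
One scenario in which a prefix convention alone still fails, offered as an illustration rather than as the author's stated reason: two functions that follow the same convention and call each other. The names lib_outer, lib_inner and the lib_ prefix are hypothetical.

```bash
# Both functions belong to the same hypothetical library and both
# follow its prefix convention, so their local names can still collide.
function lib_outer () {
    local -i lib_X=0
    lib_inner lib_X
    printf 'lib_X=%d\n' "$lib_X"
}

function lib_inner () {
    local -n lib_Y=${1:?}
    local lib_X            # shadows the caller's lib_X
    lib_Y=1                # mutates the local lib_X, not the caller's
}

lib_outer    # prints: lib_X=0
```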

Using the preprocessor

mbfl’s preprocessor (gnu m4 plus a library of macros) has facilities to help us create variables with unique names that we can safely use as arguments to functions. With these facilities, we can write the demo script as:

function main () {
    mbfl_local_varref(X, 0, -i)
    mbfl_func mbfl_varname(X)
    printf 'X=%d\n' $X
}

function mbfl_func () {
    local -n Y=${1:?}
    Y=1
}

main

the preprocessor transforms the input script into (edited with added comments):

function main () {
    # Declare a local variable.
    local mbfl_a_varname_X

    # Allocate a unique variable name and store it into "mbfl_a_varname_X".
    mbfl_variable_alloc mbfl_a_varname_X

    # Declare a new variable using the unique name.  Initialise it.
    local -i $mbfl_a_varname_X=0

    # Make "X" an alias for the unique name.
    local -n X=$mbfl_a_varname_X

    # Hand the unique name to the function as argument.
    mbfl_func $mbfl_a_varname_X

    printf 'X=%d\n' $X
}

function mbfl_func () {
    local -n Y=${1:?}
    Y=1
}

main

The use of unique names removes the problem of name collisions.

Using such preprocessor facilities consumes some computation time; they are worth using when we do not care about execution time, or when their cost is significantly less than that of running a subshell. Otherwise we should just run the functions in a subshell.
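
To give an idea of what allocating a unique name can involve, here is a minimal sketch based on a global counter. The function demo_variable_alloc, the counter DEMO_NAME_COUNTER and the generated name format are assumptions made for this illustration; mbfl_variable_alloc may work differently.

```bash
# Hypothetical allocator: every call yields a name never handed out before.
declare -i DEMO_NAME_COUNTER=0

function demo_variable_alloc () {
    local -n demo_RESULT_VARREF=${1:?}
    DEMO_NAME_COUNTER=$(( DEMO_NAME_COUNTER + 1 ))
    demo_RESULT_VARREF="demo_u_variable_${DEMO_NAME_COUNTER}"
}

function demo_func () {
    local -n Y=${1:?}
    Y=1
}

function demo_main () {
    local NAME
    demo_variable_alloc NAME       # NAME now holds a unique variable name
    local -i "${NAME}=0"           # declare the variable with that name
    local -n X=$NAME               # make X an alias for it
    demo_func "$NAME"              # hand the unique name to the function
    printf 'X=%d\n' "$X"
}

demo_main    # prints: X=1
```

Note that the allocator itself must be called directly, not through $(...): the counter update would otherwise be lost in the subshell.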

Location handlers and atexit commands

I have added a very simple infrastructure to register atexit commands to be run when trap ... EXIT fires. Obviously, scripts using this api must not set the EXIT trap by themselves.
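
A minimal sketch of what such an infrastructure can look like. All the demo_ names are hypothetical, and running the commands in registration order is an assumption; mbfl's actual api and ordering may differ.

```bash
# Hypothetical atexit registry: a global array of command strings.
declare -a DEMO_ATEXIT_COMMANDS=()

function demo_atexit_register () {
    DEMO_ATEXIT_COMMANDS+=("${1:?}")
}

function demo_atexit_run () {
    local demo_COMMAND
    for demo_COMMAND in "${DEMO_ATEXIT_COMMANDS[@]}"
    do eval "$demo_COMMAND"
    done
}

# The registry owns the EXIT trap; scripts must not set it themselves.
function demo_atexit_enable () {
    trap demo_atexit_run EXIT
}

# Demonstration in a subshell, so that the trap fires when it exits.
(
    demo_atexit_enable
    demo_atexit_register "echo 'removing temporary files'"
    demo_atexit_register "echo 'closing log'"
    echo 'doing work'
    exit 0
)
```

The subshell prints "doing work" first, then the output of the two registered commands.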

Also, location handlers are now available. In the following script the call to main prints ‘0342516’:

function handler_append () {
    local THING=${1:?}
    RESULT+=$THING
}

function main () {
    local RESULT

    handler_append 0
    mbfl_location_enter
    {
        mbfl_location_handler "handler_append 1"
        mbfl_location_enter
        {
            mbfl_location_handler "handler_append 2"
            mbfl_location_enter
            {
                mbfl_location_handler "handler_append 3"
            }
            mbfl_location_leave
            mbfl_location_handler "handler_append 4"
        }
        mbfl_location_leave
        mbfl_location_handler "handler_append 5"
    }
    mbfl_location_leave
    handler_append 6

    echo "$RESULT"
}

main

every call to mbfl_location_enter initialises a new location; every call to mbfl_location_handler registers a handler in the current location; every call to mbfl_location_leave finalises a location by running its handlers, most recently registered first (which is why 4 runs before 2, and 5 before 1, in the output above).

It is possible to register a special “run all handlers” function as an atexit command: when we exit a script from a nested location, all the handlers are executed.
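
The behaviour described above can be reproduced with two stacks: one holding the handlers, one recording where each location starts. This is only a sketch under that assumption; every demo_ name is hypothetical and mbfl's real implementation may be organised differently.

```bash
declare -a DEMO_LOCATION_HANDLERS=()   # stack of handler commands
declare -a DEMO_LOCATION_MARKS=()      # index at which each location starts

function demo_location_enter () {
    DEMO_LOCATION_MARKS+=("${#DEMO_LOCATION_HANDLERS[@]}")
}

function demo_location_handler () {
    DEMO_LOCATION_HANDLERS+=("${1:?}")
}

function demo_location_leave () {
    local -i demo_TOP=$(( ${#DEMO_LOCATION_MARKS[@]} - 1 ))
    if (( demo_TOP < 0 ))
    then return 1                      # no location to leave
    fi
    local -i demo_START=${DEMO_LOCATION_MARKS[$demo_TOP]}
    unset "DEMO_LOCATION_MARKS[$demo_TOP]"

    # Pop and run this location's handlers, most recent first.
    local -i demo_I=$(( ${#DEMO_LOCATION_HANDLERS[@]} - 1 ))
    local demo_COMMAND
    while (( demo_I >= demo_START ))
    do
        demo_COMMAND=${DEMO_LOCATION_HANDLERS[$demo_I]}
        unset "DEMO_LOCATION_HANDLERS[$demo_I]"
        demo_I=$(( demo_I - 1 ))
        eval "$demo_COMMAND"
    done
}

# "Run all handlers": leave every location that is still open.
function demo_location_run_all () {
    while (( ${#DEMO_LOCATION_MARKS[@]} > 0 ))
    do demo_location_leave
    done
}

function demo_handler_append () {
    RESULT+=${1:?}
}

function demo_main () {
    local RESULT
    demo_handler_append 0
    demo_location_enter
    {
        demo_location_handler "demo_handler_append 1"
        demo_location_enter
        {
            demo_location_handler "demo_handler_append 2"
            demo_location_enter
            {
                demo_location_handler "demo_handler_append 3"
            }
            demo_location_leave
            demo_location_handler "demo_handler_append 4"
        }
        demo_location_leave
        demo_location_handler "demo_handler_append 5"
    }
    demo_location_leave
    demo_handler_append 6
    echo "$RESULT"
}

demo_main    # prints: 0342516
```

To cover an exit from a nested location, demo_location_run_all could be installed with trap demo_location_run_all EXIT, or registered through whatever atexit mechanism the script already uses.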

What for?

I’ve been using mbfl scripts for years for my personal automation needs: backup procedures, system administration, user operations (my window manager is Fvwm, but I do not use the “system menu” to launch programs and do stuff: I run an mbfl script from an X terminal).

Now I have more needs, as a terminal die-hard. I started the development of a personal message notifier (it’s working but horrible, right now); I want a contacts book (on top of SQLite) and some automation for my personal finance bookkeeping (also on top of SQLite). More scripts! More scripts!