2.8.2.1 Strings are code–point sequences

This SRFI considers a string to be simply a sequence of “code points”, or character encodings. Operations such as comparison or reversal are always performed code point by code point. See the comments below on super–ASCII character types for the implications that follow.

It’s entirely possible that a legal string might not be a sensible “text” sequence. For example, consider a string composed entirely of zero–width Unicode accent characters with no preceding base character to modify; this is a legal string, albeit one that does not make much sense when interpreted as natural–language text. The routines in this SRFI do not handle these “text” concerns; they restrict themselves to the underlying view of a string as merely a sequence of “code points”.
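
The following is a minimal sketch of what “code point by code point” means in practice. It assumes an R6RS implementation with Unicode characters (under R7RS the import would be `(scheme base)` and `(scheme write)` instead); the helper names `code-points` and `reverse-code-points` are hypothetical and are not part of this SRFI.

```scheme
(import (rnrs))

;; Hypothetical helper: the code points of a string, as a list of integers.
(define (code-points str)
  (map char->integer (string->list str)))

;; Hypothetical helper: reverse a string one code point at a time,
;; with no knowledge of which code points belong together as "text".
(define (reverse-code-points str)
  (list->string (reverse (string->list str))))

;; "e" followed by U+0301 COMBINING ACUTE ACCENT: two code points
;; that a reader would normally perceive as one accented letter.
(define s (string #\e #\x301))

(display (code-points s))                        ; prints (101 769)
(newline)

;; After reversal the accent comes first and has no base character
;; to modify; the result is still a legal string.
(display (code-points (reverse-code-points s)))  ; prints (769 101)
(newline)

;; A string made only of combining accents is also a legal string.
(define accents (string #\x301 #\x300))
(display (string-length accents))                ; prints 2
(newline)
```

The reversed string is legal but no longer sensible as text, which is exactly the kind of concern the routines in this SRFI leave aside.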

This SRFI defines string operations that are locale–independent and context–independent. While it is certainly important to have a locale–sensitive comparison or collation procedure when processing text, it is also important to have a suite of operations that are reliably invariant for basic string processing. Otherwise, a change of locale could alter the ordering or equality of strings on which existing data was built, corrupting data structures such as hash tables, B–trees, symbol tables, directories of filenames, etc.
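
As an illustration of an ordering that cannot change with the locale, here is a small sketch of a comparison based only on code–point values. It assumes R6RS; the procedure name `code-point<?` is hypothetical and is shown only to make the idea concrete, not as this SRFI's own comparison procedure.

```scheme
(import (rnrs))

;; Hypothetical procedure: lexicographic "less than" that looks only
;; at the integer value of each code point, never at the locale.
(define (code-point<? a b)
  (let loop ((xs (string->list a))
             (ys (string->list b)))
    (cond ((null? ys) #f)                 ; b exhausted: a is not less
          ((null? xs) #t)                 ; a is a proper prefix of b
          ((< (char->integer (car xs))
              (char->integer (car ys))) #t)
          ((> (char->integer (car xs))
              (char->integer (car ys))) #f)
          (else (loop (cdr xs) (cdr ys))))))

;; #\Z is code point 90 and #\a is code point 97, so "Zebra" sorts
;; before "apple" regardless of any locale setting.
(display (code-point<? "Zebra" "apple"))  ; prints #t
(newline)
(display (code-point<? "apple" "Zebra"))  ; prints #f
(newline)
(display (code-point<? "app" "apple"))    ; prints #t
(newline)
```

A dictionary–style, locale–sensitive collation would typically place "apple" before "Zebra". A sorted structure built under one such ordering and searched under another could fail to find keys that are present, which is the corruption the paragraph above warns about.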

Locale–sensitive and context–sensitive text operations, such as collation, are explicitly deferred to a subsequent, companion “text” SRFI.