Vicare Scheme: stdlib unicode strings

5.1.2 Strings

Procedure: string-upcase string

Procedure: string-downcase string

Procedure: string-titlecase string

Procedure: string-foldcase string

These procedures take a string argument and return a string result. They are defined in terms of Unicode’s locale-independent case mappings from Unicode scalar-value sequences to scalar-value sequences. In particular, the length of the result string can be different from the length of the input string. When the specified result is equal in the sense of string=? to the argument, these procedures may return the argument instead of a newly allocated string.
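Because the result may be the argument itself, code that wants to mutate the result should copy it first. The following is a minimal sketch. It assumes an R6RS-conforming implementation such as Vicare, with the `(rnrs)` library imported. The variable names are illustrative only.

```scheme
#!r6rs
(import (rnrs))

;; string-copy returns a fresh, mutable string (literals may be immutable).
(define original (string-copy "hello"))

;; "hello" is already lower case, so string-downcase may return ORIGINAL
;; itself; whether (eq? lowered original) is #t is unspecified.
(define lowered (string-downcase original))

;; Copy before mutating, so ORIGINAL can never be changed by accident.
(define fresh (string-copy (string-downcase original)))
(string-set! fresh 0 #\j)

(display fresh)    (newline)   ; prints jello
(display original) (newline)   ; prints hello
```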

The string-upcase procedure converts a string to upper case; string-downcase converts a string to lower case. The string-foldcase procedure converts the string to its case-folded counterpart, using the full case-folding mapping, but without the special mappings for Turkic languages. The string-titlecase procedure converts the first cased character of each word to title case, as char-titlecase does, and downcases all other cased characters.

(string-upcase "Hi")                    ⇒ "HI"
(string-downcase "Hi")                  ⇒ "hi"
(string-foldcase "Hi")                  ⇒ "hi"

(string-titlecase "kNock KNoCK")        ⇒ "Knock Knock"
(string-titlecase "who's there?")       ⇒ "Who's There?"
(string-titlecase "r6rs")               ⇒ "R6rs"
(string-titlecase "R6RS")               ⇒ "R6rs"
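The examples above keep the string length unchanged. The German sharp s (U+00DF) shows a case where the length changes, and it also shows how string-downcase differs from string-foldcase. The following sketch assumes an R6RS-conforming implementation that applies the full Unicode mappings described above. The `\xDF;` string escape keeps the source pure ASCII.

```scheme
#!r6rs
(import (rnrs))

;; "\xDF;" is U+00DF LATIN SMALL LETTER SHARP S. It has no one-character
;; upper-case form, so the full upper-case mapping produces "SS".
(define word "Stra\xDF;e")              ; "Straße", 6 characters

(define upper  (string-upcase word))    ; "STRASSE", 7 characters
(define lower  (string-downcase word))  ; "straße", sharp s is kept
(define folded (string-foldcase word))  ; "strasse", folding expands it

(display upper) (newline)                  ; prints STRASSE
(display (string-length word))  (newline)  ; prints 6
(display (string-length upper)) (newline)  ; prints 7
```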

NOTE The case mappings needed for implementing these procedures can be extracted from UnicodeData.txt, SpecialCasing.txt, WordBreakProperty.txt (the “MidLetter” property partly defines case-ignorable characters), and CaseFolding.txt from the Unicode Consortium.

Since these procedures are locale-independent, they may not be appropriate for some locales; for example, they do not apply the Turkic mappings for dotted and dotless i.

NOTE Word breaking, as needed for the correct casing of the Greek capital letter sigma (U+03A3) and for string-titlecase, is specified in Unicode Standard Annex #29.
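The sigma case is context-sensitive. At the end of a word, capital sigma downcases to the final form U+03C2. Elsewhere, it downcases to U+03C3. Case folding ignores word context and always produces U+03C3. The following sketch assumes an R6RS-conforming implementation that applies the Final_Sigma rule. It uses hex escapes so the source stays ASCII.

```scheme
#!r6rs
(import (rnrs))

;; "ΧΑΟΣ" written with escapes: CHI, ALPHA, OMICRON, capital SIGMA (U+03A3).
(define chaos "\x3A7;\x391;\x39F;\x3A3;")

;; The sigma ends the word, so downcasing yields final sigma U+03C2.
(define lowered (string-downcase chaos))   ; "\x3C7;\x3B1;\x3BF;\x3C2;"

;; Case folding ignores word context and always yields U+03C3.
(define folded (string-foldcase chaos))    ; "\x3C7;\x3B1;\x3BF;\x3C3;"

(display (char->integer (string-ref lowered 3))) (newline)  ; prints 962
(display (char->integer (string-ref folded 3)))  (newline)  ; prints 963
```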

Procedure: string-ci=? string1 string2 string3 …

Procedure: string-ci<? string1 string2 string3 …

Procedure: string-ci>? string1 string2 string3 …

Procedure: string-ci<=? string1 string2 string3 …

Procedure: string-ci>=? string1 string2 string3 …

These procedures are similar to string=?, etc., but operate on the case-folded versions of the strings.

(string-ci<? "z" "Z")                   ⇒ #f
(string-ci=? "z" "Z")                   ⇒ #t
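Comparing "on the case-folded versions" means that string-ci=? behaves like string=? applied to the results of string-foldcase. Comparing the downcased strings would not give the same results. The sketch below illustrates this point. `fold-then-compare` is a hypothetical helper written for illustration and is not part of `(rnrs)`. The code assumes an R6RS-conforming implementation.

```scheme
#!r6rs
(import (rnrs))

;; Hypothetical helper (not part of (rnrs)) mirroring the two-argument
;; case of string-ci=?: case-fold both strings, then compare them.
(define (fold-then-compare a b)
  (string=? (string-foldcase a) (string-foldcase b)))

(define sharp "Stra\xDF;e")   ; "Straße"

(display (fold-then-compare sharp "STRASSE")) (newline)  ; prints #t
(display (string-ci=? sharp "STRASSE")) (newline)        ; prints #t

;; Lower-casing is not enough: "straße" and "strasse" still differ.
(display (string=? (string-downcase sharp)
                   (string-downcase "STRASSE"))) (newline) ; prints #f

;; Like string<? and friends, the -ci variants accept more than two strings.
(display (string-ci<? "apple" "Banana" "CHERRY")) (newline) ; prints #t
```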

Procedure: string-normalize-nfd string

Procedure: string-normalize-nfkd string

Procedure: string-normalize-nfc string

Procedure: string-normalize-nfkc string

These procedures take a string argument and return a string result, which is the input string normalized to Unicode normalization form D, KD, C, or KC, respectively. When the specified result is equal in the sense of string=? to the argument, these procedures may return the argument instead of a newly allocated string.

(string-normalize-nfd "\xE9;")          ⇒ "\x65;\x301;"
(string-normalize-nfc "\xE9;")          ⇒ "\xE9;"
(string-normalize-nfd "\x65;\x301;")    ⇒ "\x65;\x301;"
(string-normalize-nfc "\x65;\x301;")    ⇒ "\xE9;"
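A common use of these procedures is to compare strings up to canonical equivalence by normalizing both strings before calling string=?. The compatibility forms (KD, KC) go further than the canonical ones. The sketch below shows both points. `canonical=?` is a hypothetical helper and is not part of `(rnrs)`. The code assumes an R6RS-conforming implementation.

```scheme
#!r6rs
(import (rnrs))

;; Hypothetical helper: compare strings up to canonical equivalence by
;; normalizing both to NFC first.
(define (canonical=? a b)
  (string=? (string-normalize-nfc a) (string-normalize-nfc b)))

(define precomposed "caf\xE9;")    ; "café" with U+00E9
(define decomposed  "cafe\x301;")  ; "e" followed by U+0301 COMBINING ACUTE ACCENT

(display (string=? precomposed decomposed))     (newline) ; prints #f
(display (canonical=? precomposed decomposed))  (newline) ; prints #t

;; U+FB01 LATIN SMALL LIGATURE FI has only a compatibility decomposition:
;; NFC leaves it alone, NFKC replaces it with the two letters "fi".
(display (string-length (string-normalize-nfc "\xFB01;")))  (newline) ; prints 1
(display (string-normalize-nfkc "\xFB01;")) (newline)                 ; prints fi
```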