Previous: stdlib bytevector flonum, Up: stdlib bytevector [Index]
This section describes procedures that convert between strings and
bytevectors containing Unicode encodings of those strings. When
decoding bytevectors, encoding errors are handled as with the replace
semantics of textual I/O: if an invalid or incomplete character
encoding is encountered, then the replacement character U+FFFD is
appended to the string being generated, an appropriate number of bytes
are ignored, and decoding continues with the following bytes.
Return a newly allocated (unless empty) bytevector that contains the UTF-8 encoding of the given string.
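As a minimal sketch of the UTF-8 encoding, assuming only the standard R6RS `(rnrs)` library: the name `encoded` is illustrative and not part of the library.

```scheme
(import (rnrs))

;; "A" is U+0041 (one byte in UTF-8); "\x3BB;" is U+03BB, the Greek
;; small letter lambda (two bytes in UTF-8).
(define encoded (string->utf8 "A\x3BB;"))

(bytevector-length encoded)    ; ⇒ 3
(bytevector->u8-list encoded)  ; ⇒ (65 206 187), that is #x41 #xCE #xBB
```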
If endianness is specified, it must be the symbol big or the symbol
little. The string->utf16 procedure returns a newly allocated (unless
empty) bytevector that contains the UTF-16BE or UTF-16LE encoding of
the given string (with no byte-order mark). If endianness is not
specified or is big, then UTF-16BE is used. If endianness is little,
then UTF-16LE is used.
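The effect of the endianness argument can be sketched as follows, assuming the standard R6RS `(rnrs)` library. The names `be`, `le` and `surrogates` are illustrative. The last case shows a character outside the Basic Multilingual Plane, which UTF-16 encodes as a surrogate pair.

```scheme
(import (rnrs))

;; Default endianness is big: U+0041 becomes #x00 #x41.
(define be (string->utf16 "A"))

;; Little endian swaps the two bytes of each 16-bit code unit.
(define le (string->utf16 "A" (endianness little)))

;; U+1F600 is encoded as the surrogate pair #xD83D #xDE00.
(define surrogates (string->utf16 "\x1F600;" (endianness big)))

(bytevector->u8-list be)          ; ⇒ (0 65)
(bytevector->u8-list le)          ; ⇒ (65 0)
(bytevector-length surrogates)    ; ⇒ 4
```

In none of the results is a byte-order mark prepended.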
If endianness is specified, it must be the symbol big or the symbol
little. The string->utf32 procedure returns a newly allocated (unless
empty) bytevector that contains the UTF-32BE or UTF-32LE encoding of
the given string (with no byte-order mark). If endianness is not
specified or is big, then UTF-32BE is used. If endianness is little,
then UTF-32LE is used.
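A comparable sketch for UTF-32, again assuming the standard R6RS `(rnrs)` library and using the illustrative names `be32`, `le32` and `astral`. Every character occupies exactly four bytes.

```scheme
(import (rnrs))

;; Default endianness is big: U+0041 becomes #x00 #x00 #x00 #x41.
(define be32 (string->utf32 "A"))

;; Little endian reverses the four bytes of each code point.
(define le32 (string->utf32 "A" (endianness little)))

;; U+1F600 needs no surrogates in UTF-32: #x00 #x01 #xF6 #x00.
(define astral (string->utf32 "\x1F600;" (endianness big)))

(bytevector->u8-list be32)   ; ⇒ (0 0 0 65)
(bytevector->u8-list le32)   ; ⇒ (65 0 0 0)
(bytevector-length astral)   ; ⇒ 4
```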
Return a newly allocated (unless empty) string whose character sequence is encoded by the given bytevector.
As a Vicare extension: the optional argument handling-mode must be a
symbol representing an error handling mode, as validated by
error-handling-mode (see error-handling-mode); when not given, it
defaults to ‘raise’.
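A short sketch of decoding valid UTF-8, assuming the standard R6RS `(rnrs)` library. It uses only the one-argument form, so the Vicare-specific handling-mode argument is not exercised. The names `decoded` and `sample` are illustrative.

```scheme
(import (rnrs))

;; #x41 is "A"; #xCE #xBB is the two-byte UTF-8 encoding of U+03BB.
(define decoded (utf8->string #vu8(#x41 #xCE #xBB)))

;; Encoding and then decoding a string yields an equal string.
(define sample "A\x3BB;")

(string-length decoded)                               ; ⇒ 2
(string=? sample (utf8->string (string->utf8 sample))) ; ⇒ #t
```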
The argument endianness must be the symbol big or the symbol little.
The utf16->string procedure returns a newly allocated (unless empty)
string whose character sequence is encoded by the given bytevector.

bytevector is decoded according to UTF-16, UTF-16BE, UTF-16LE, or a
fourth encoding scheme that differs from all three of those as
follows: If endianness-mandatory is absent or #f, utf16->string
determines the endianness according to a UTF-16 Byte Order Mark (BOM)
at the beginning of bytevector if a BOM is present; in this case, the
BOM is not decoded as a character. Also in this case, if no UTF-16 BOM
is present, endianness specifies the endianness of the encoding. If
endianness-mandatory is a true value, endianness specifies the
endianness of the encoding, and any UTF-16 BOM in the encoding is
decoded as a regular character.
NOTE A UTF-16 BOM is either a sequence of bytes #xFE, #xFF specifying
big and UTF-16BE, or #xFF, #xFE specifying little and UTF-16LE.
(utf16->string '#vu8(#xAA #xBB) (endianness big))    ⇒ "\xAABB;"
(utf16->string '#vu8(#xAA #xBB) (endianness little)) ⇒ "\xBBAA;"

;;In all the following tests: the endianness argument is
;;ignored; the BOM is processed; an empty string is generated.

;;Big endian BOM.
(utf16->string '#vu8(#xFE #xFF) (endianness big) #f)    ⇒ ""
(utf16->string '#vu8(#xFE #xFF) (endianness little) #f) ⇒ ""

;;Little endian BOM.
(utf16->string '#vu8(#xFF #xFE) (endianness big) #f)    ⇒ ""
(utf16->string '#vu8(#xFF #xFE) (endianness little) #f) ⇒ ""

;;In all the following tests: the endianness argument is
;;ignored; the BOM is processed; a string of 1 character is
;;generated.

;;Big endian BOM.
(utf16->string '#vu8(#xFE #xFF #xAA #xBB) (endianness big) #f)    ⇒ "\xAABB;"
(utf16->string '#vu8(#xFE #xFF #xAA #xBB) (endianness little) #f) ⇒ "\xAABB;"

;;Little endian BOM.
(utf16->string '#vu8(#xFF #xFE #xAA #xBB) (endianness big) #f)    ⇒ "\xBBAA;"
(utf16->string '#vu8(#xFF #xFE #xAA #xBB) (endianness little) #f) ⇒ "\xBBAA;"
As a Vicare extension: the optional argument handling-mode must be a
symbol representing an error handling mode, as validated by
error-handling-mode (see error-handling-mode); when not given, it
defaults to ‘raise’.
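The case where endianness-mandatory is a true value can be sketched as follows, assuming the standard R6RS `(rnrs)` library and the three-argument form described above. The names `plain` and `kept-bom` are illustrative. The Vicare-specific handling-mode argument is not used here.

```scheme
(import (rnrs))

;; No BOM is present: the endianness argument selects the byte order.
(define plain (utf16->string #vu8(#x00 #x41) (endianness big)))

;; endianness-mandatory is true: the leading #xFE #xFF is not treated
;; as a BOM, it is decoded as the regular character U+FEFF.
(define kept-bom
  (utf16->string #vu8(#xFE #xFF #x00 #x41) (endianness big) #t))

(string-length plain)                    ; ⇒ 1
(string-length kept-bom)                 ; ⇒ 2
(char->integer (string-ref kept-bom 0))  ; ⇒ 65279, that is #xFEFF
```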
endianness must be the symbol big or the symbol little.
The utf32->string procedure returns a newly allocated (unless empty)
string whose character sequence is encoded by the given bytevector.

bytevector is decoded according to UTF-32, UTF-32BE, UTF-32LE, or a
fourth encoding scheme that differs from all three of those as
follows: If endianness-mandatory is absent or #f, utf32->string
determines the endianness according to a UTF-32 Byte Order Mark (BOM)
at the beginning of bytevector if a BOM is present; in this case, the
BOM is not decoded as a character. Also in this case, if no UTF-32 BOM
is present, endianness specifies the endianness of the encoding. If
endianness-mandatory is a true value, endianness specifies the
endianness of the encoding, and any UTF-32 BOM in the encoding is
decoded as a regular character.
NOTE A UTF-32 BOM is either a sequence of bytes #x00, #x00, #xFE, #xFF
specifying big and UTF-32BE, or #xFF, #xFE, #x00, #x00 specifying
little and UTF-32LE.
As a Vicare extension: the optional argument handling-mode must be a
symbol representing an error handling mode, as validated by
error-handling-mode (see error-handling-mode); when not given, it
defaults to ‘raise’.
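The two ways of treating a UTF-32 BOM can be sketched as follows, assuming the standard R6RS `(rnrs)` library. The names `input`, `by-bom` and `kept-bom` are illustrative, and the Vicare-specific handling-mode argument is not used.

```scheme
(import (rnrs))

;; A little-endian BOM (#xFF #xFE #x00 #x00) followed by U+0041 in
;; little-endian byte order.
(define input #vu8(#xFF #xFE #x00 #x00 #x41 #x00 #x00 #x00))

;; endianness-mandatory is absent: the BOM overrides the big argument
;; and is not decoded as a character.
(define by-bom (utf32->string input (endianness big)))

;; endianness-mandatory is true: the first four bytes are decoded as
;; the regular character U+FEFF.
(define kept-bom (utf32->string input (endianness little) #t))

(string-length by-bom)    ; ⇒ 1
(string-length kept-bom)  ; ⇒ 2
```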