Several different Unicode encoding schemes describe standard ways to encode characters and strings as byte sequences and to decode those sequences. Within this document, a codec is an immutable Scheme object that represents a Unicode or similar encoding scheme.
An end–of–line style is a symbol that, if it is not none, describes how a textual port transcodes representations of line endings.
A transcoder is an immutable Scheme object that combines a codec with an end–of–line style and a method for handling decoding errors. Each transcoder represents some specific bidirectional (but not necessarily lossless), possibly stateful translation between byte sequences and Unicode characters and strings. Every transcoder can operate in the input direction (bytes to characters) or in the output direction (characters to bytes). A transcoder argument name means that the corresponding argument must be a transcoder.
A binary port is a port that supports binary I/O, does not have an associated transcoder and does not support textual I/O. A textual port is a port that supports textual I/O, and does not support binary I/O. A textual port may or may not have an associated transcoder.
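The distinction between binary and textual ports can be observed directly. The following is a minimal sketch, assuming an R6RS implementation that provides the (rnrs) composite library; the procedures open-bytevector-input-port, binary-port?, textual-port? and port-transcoder are taken from R6RS and are not described in this section.

```scheme
(import (rnrs))

;; The bytes of "hi" in UTF-8.
(define bytes (u8-list->bytevector '(104 105)))

;; Without a transcoder argument the port is binary.
(define bin (open-bytevector-input-port bytes))

;; With a transcoder argument the port is textual.
(define txt
  (open-bytevector-input-port bytes (make-transcoder (utf-8-codec))))

(display (binary-port? bin))    (newline)   ; #t
(display (textual-port? txt))   (newline)   ; #t
(display (port-transcoder bin)) (newline)   ; #f: binary ports have no transcoder
```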
These are predefined codecs for the ISO 8859-1, UTF-8, and UTF-16 encoding schemes.
A call to any of these procedures returns a value that is equal in the sense of eqv? to the result of any other call to the same procedure.
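As a small illustration of the eqv? guarantee, the sketch below calls two of the codec procedures twice each. It assumes the R6RS names utf-8-codec and latin-1-codec for the procedures described above.

```scheme
(import (rnrs))

;; Two calls to the same codec procedure yield eqv? values.
(define same-utf-8?   (eqv? (utf-8-codec) (utf-8-codec)))
(define same-latin-1? (eqv? (latin-1-codec) (latin-1-codec)))

(display same-utf-8?)   (newline)   ; #t
(display same-latin-1?) (newline)   ; #t
```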
?eol-style-symbol should be a symbol whose name is one of lf, cr, crlf, nel, crnel, ls, and none.
The form evaluates to the corresponding symbol. If the name of ?eol-style-symbol is not one of these symbols, the effect and result are implementation–dependent; in particular, the result may be an eol–style symbol acceptable as an eol-style argument to make-transcoder. Otherwise, an exception is raised.
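A short sketch of the form in use, assuming it is spelled eol-style as in R6RS (the form's name does not appear in the text above):

```scheme
(import (rnrs))

;; The form evaluates to the symbol it names; no quoting is needed.
(define style (eol-style crlf))

(display (symbol? style))    (newline)   ; #t
(display (eq? style 'crlf))  (newline)   ; #t
```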
All eol–style symbols except none describe a specific line–ending encoding:

lf      ?linefeed
cr      ?carriage-return
crlf    ?carriage-return ?linefeed
nel     ?next-line
crnel   ?carriage-return ?next-line
ls      ?line-separator
For a textual port with a transcoder, and whose transcoder has an eol–style symbol none, no conversion occurs. For a textual input port, any eol–style symbol other than none means that all of the above line-ending encodings are recognized and are translated into a single linefeed. For a textual output port, none and lf are equivalent. Linefeed characters are encoded according to the specified eol-style symbol, and all other characters that participate in possible line endings are encoded as is.
NOTE Only the name of ?eol-style-symbol is significant.
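The conversion rules can be sketched in both directions. This example assumes the R6RS procedures bytevector->string and string->bytevector as a convenient way to exercise a transcoder without opening a file; the input bytes are an arbitrary sample mixing two line-ending encodings.

```scheme
(import (rnrs))

;; Bytes for: a CR LF b CR c
(define input-bytes (u8-list->bytevector '(97 13 10 98 13 99)))

;; Input direction, style other than none: every recognized
;; line ending should be translated into a single linefeed.
(define decoded
  (bytevector->string input-bytes
                      (make-transcoder (utf-8-codec) (eol-style lf))))

;; Output direction, style crlf: each linefeed should be
;; written as carriage return followed by linefeed.
(define encoded
  (string->bytevector "a\nb"
                      (make-transcoder (utf-8-codec) (eol-style crlf))))

(write decoded)                        (newline)
(write (bytevector->u8-list encoded))  (newline)
```

Under the rules above, decoded should contain the three letters separated by single linefeeds, and encoded should contain the bytes 97 13 10 98.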
Return the default end–of–line style of the underlying platform, e.g., lf on Unix and crlf on Windows.
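A minimal sketch of querying the platform default, assuming the procedure is named native-eol-style as in R6RS. The printed symbol depends on the platform, so no particular output is shown.

```scheme
(import (rnrs))

;; The result is an eol-style symbol chosen by the implementation.
(define native (native-eol-style))

(display native) (newline)
```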
This condition type could be defined by:
(define-condition-type &i/o-decoding &i/o-port
  make-i/o-decoding-error i/o-decoding-error?)
An exception with this type is raised when one of the operations for textual input from a port encounters a sequence of bytes that cannot be translated into a character or string by the input direction of the port’s transcoder.
When such an exception is raised, the port’s position is past the invalid encoding.
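The sketch below shows one way to detect this condition. It assumes a transcoder built with the raise error-handling mode and reads from a bytevector port; the byte #xFF is chosen because it can never occur in well-formed UTF-8.

```scheme
(import (rnrs))

;; a, an invalid byte, b
(define bad-bytes (u8-list->bytevector '(97 #xFF 98)))

(define strict
  (make-transcoder (utf-8-codec)
                   (eol-style none)
                   (error-handling-mode raise)))

;; Reading through the strict transcoder should raise a condition
;; that satisfies i/o-decoding-error?, which guard turns into a symbol.
(define outcome
  (guard (c ((i/o-decoding-error? c) 'decoding-error))
    (get-string-all (open-bytevector-input-port bad-bytes strict))))

(display outcome) (newline)
```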
This condition type could be defined by:
(define-condition-type &i/o-encoding &i/o-port
  make-i/o-encoding-error i/o-encoding-error?
  (char i/o-encoding-error-char))
An exception with this type is raised when one of the operations for textual output to a port encounters a character that cannot be translated into bytes by the output direction of the port’s transcoder. char is the character that could not be encoded.
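As a hedged illustration, the following sketch writes a character with no Latin-1 representation through a transcoder that uses the raise mode, then recovers the offending character from the condition. The sample string and the use of call-with-bytevector-output-port (an R6RS procedure not covered in this section) are choices made for the example.

```scheme
(import (rnrs))

(define strict-latin-1
  (make-transcoder (latin-1-codec)
                   (eol-style none)
                   (error-handling-mode raise)))

;; U+03BB (Greek small letter lambda) cannot be encoded in Latin-1.
;; The guard covers both the write and the final flush of the port.
(define outcome
  (guard (c ((i/o-encoding-error? c) (i/o-encoding-error-char c)))
    (call-with-bytevector-output-port
      (lambda (port) (put-string port "a\x3BB;b"))
      strict-latin-1)))

(write outcome) (newline)
```

If the exception is raised as described, outcome is the character U+03BB rather than a bytevector.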
?error-handling-mode-symbol should be a symbol whose name is one of ignore, raise, and replace.
The form evaluates to the corresponding symbol. If ?error-handling-mode-symbol is not one of these identifiers, effect and result are implementation–dependent: the result may be an error–handling–mode symbol acceptable as a handling-mode argument to make-transcoder. If it is not acceptable as a handling-mode argument to make-transcoder, an exception is raised.
NOTE Only the name of ?error-handling-mode-symbol is significant.
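A brief sketch, assuming the form is spelled error-handling-mode as in R6RS:

```scheme
(import (rnrs))

;; The form evaluates to the symbol it names.
(define mode (error-handling-mode replace))

(display (eq? mode 'replace)) (newline)   ; #t
```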
The error–handling mode of a transcoder specifies the behavior of textual I/O operations in the presence of encoding or decoding errors.
If a textual input operation encounters an invalid or incomplete character encoding, and the error–handling mode is ignore, an appropriate number of bytes of the invalid encoding are ignored and decoding continues with the following bytes. If the error–handling mode is replace, the replacement character U+FFFD is injected into the data stream, an appropriate number of bytes are ignored, and decoding continues with the following bytes. If the error–handling mode is raise, an exception with condition type &i/o-decoding is raised.
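The ignore and replace modes for input can be compared on the same malformed data. This is a sketch under the assumption that a lone #xFF byte counts as one invalid encoding; decode-with is a hypothetical helper defined only for this example.

```scheme
(import (rnrs))

;; a, an invalid UTF-8 byte, b
(define bad-bytes (u8-list->bytevector '(97 #xFF 98)))

;; Hypothetical helper: decode bad-bytes as UTF-8 with the given mode.
(define (decode-with mode)
  (bytevector->string
    bad-bytes
    (make-transcoder (utf-8-codec) (eol-style none) mode)))

(define ignored  (decode-with (error-handling-mode ignore)))
(define replaced (decode-with (error-handling-mode replace)))

(write ignored)  (newline)
(write replaced) (newline)
```

With ignore the invalid byte should simply disappear, leaving the two letters; with replace a U+FFFD character should appear between them.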
If a textual output operation encounters a character it cannot encode, and the error–handling mode is ignore, the character is ignored and encoding continues with the next character. If the error–handling mode is replace, a codec–specific replacement character is emitted by the transcoder, and encoding continues with the next character. The replacement character is U+FFFD for transcoders whose codec is one of the Unicode encodings, but is the ? character for the Latin–1 encoding. If the error–handling mode is raise, an exception with condition type &i/o-encoding is raised.
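The output-side behavior can be sketched with the Latin-1 codec and a character outside its range. The helper encode-with is hypothetical and exists only for this example; the sample string is arbitrary.

```scheme
(import (rnrs))

;; Hypothetical helper: encode a string containing U+03BB as Latin-1
;; with the given mode and return the resulting bytes as a list.
(define (encode-with mode)
  (bytevector->u8-list
    (string->bytevector
      "a\x3BB;b"
      (make-transcoder (latin-1-codec) (eol-style none) mode))))

(define ignored  (encode-with (error-handling-mode ignore)))
(define replaced (encode-with (error-handling-mode replace)))

(write ignored)  (newline)
(write replaced) (newline)
```

Under the rules above, ignored should be the bytes 97 98, and replaced should be 97 63 98, where 63 is the code of the ? character.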
codec must be a codec; eol-style, if present, an eol–style symbol; and handling-mode, if present, an error–handling–mode symbol.
eol-style may be omitted, in which case it defaults to the native end–of–line style of the underlying platform. handling-mode may be omitted, in which case it defaults to replace. The result is a transcoder with the behavior specified by its arguments.
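The defaults can be checked with a one-argument call. This sketch assumes the R6RS names native-eol-style, transcoder-eol-style and transcoder-error-handling-mode.

```scheme
(import (rnrs))

;; Only the codec is given; the other two arguments take their defaults.
(define t (make-transcoder (utf-8-codec)))

(define default-eol-is-native?
  (eq? (transcoder-eol-style t) (native-eol-style)))

(define default-mode-is-replace?
  (eq? (transcoder-error-handling-mode t) (error-handling-mode replace)))

(display default-eol-is-native?)   (newline)
(display default-mode-is-replace?) (newline)
```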
Return an implementation–dependent transcoder that represents a possibly locale-dependent “native” transcoding.
These are accessors for transcoder objects; when applied to a transcoder returned by make-transcoder, they return the codec, eol-style, and handling-mode arguments, respectively.
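A sketch of the accessors applied to a fully specified transcoder, assuming the R6RS names transcoder-codec, transcoder-eol-style and transcoder-error-handling-mode:

```scheme
(import (rnrs))

(define t
  (make-transcoder (latin-1-codec)
                   (eol-style crlf)
                   (error-handling-mode raise)))

;; Each accessor returns the corresponding constructor argument.
(define codec-ok? (eqv? (transcoder-codec t) (latin-1-codec)))
(define eol-ok?   (eq?  (transcoder-eol-style t) 'crlf))
(define mode-ok?  (eq?  (transcoder-error-handling-mode t) 'raise))

(display (list codec-ok? eol-ok? mode-ok?)) (newline)
```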
Return the string that results from transcoding the bytevector according to the input direction of the transcoder.
Return the bytevector that results from transcoding the string according to the output direction of the transcoder.
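The two conversions are inverses for text the codec can represent. The following sketch assumes the R6RS names string->bytevector and bytevector->string and uses an arbitrary sample string containing U+00E9.

```scheme
(import (rnrs))

(define t (make-transcoder (utf-8-codec) (eol-style none)))

;; "caf" followed by U+00E9, which UTF-8 encodes as two bytes.
(define text  "caf\xE9;")
(define bytes (string->bytevector text t))   ; output direction
(define back  (bytevector->string bytes t))  ; input direction

(write (bytevector->u8-list bytes)) (newline)   ; (99 97 102 195 169)
(write (string=? back text))        (newline)   ; #t
```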