iconv library
The Recode library is able to use the capabilities of an
external, pre-installed iconv library, usually as provided by GNU
libc or the portable libiconv written by Bruno Haible. In
fact, many capabilities of the Recode library are duplicated in
an external iconv library, as they likely share many charsets.
Here we discuss the issues related to this duplication, and other
peculiarities specific to the iconv library.
The RECODE_STRICT_MAPPING_FLAG option, corresponding to the
‘--strict’ flag, is implemented by adding iconv option
//IGNORE to the ‘after’ encoding. This has the side effect
that untranslatable input is only signalled at the end of the
conversion, whereas with Recode’s built-in conversion routines the error
will be signalled immediately.
If the string -translit is appended to the after encoding,
characters being converted are transliterated when needed and possible.
This means that when a character cannot be represented in the target
character set, it can be approximated through one or several similar
looking characters. Characters that are outside of the target character
set and cannot be transliterated are replaced with a question mark (?)
in the output. This corresponds to the iconv option
//TRANSLIT.
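As a small sketch of the naming convention just described, the helpers below only build the two strings: a Recode request with -translit appended to the after encoding, and the corresponding iconv target with //TRANSLIT. The function names, the sample text, and the UTF-8 and ASCII charset names are assumptions made for illustration; whether a given request is accepted depends on the installed Recode and iconv.

```shell
#!/bin/sh
# Hypothetical helpers: they only build strings and run nothing themselves.

# usage: recode_translit_request BEFORE AFTER
# Appends "-translit" to the after encoding of a Recode request.
recode_translit_request() {
    printf '%s..%s-translit\n' "$1" "$2"
}

# usage: iconv_translit_target AFTER
# Builds the corresponding iconv target encoding with //TRANSLIT.
iconv_translit_target() {
    printf '%s//TRANSLIT\n' "$1"
}

# Guarded demonstration. The sample text ("cafe" with an acute accent,
# written as UTF-8 octal escapes) and the charset names are assumptions.
if command -v recode >/dev/null 2>&1; then
    printf 'caf\303\251\n' |
        recode "$(recode_translit_request UTF-8 ASCII)" ||
        echo "request rejected by this build of recode" >&2
fi
```

The result of the transliteration is deliberately not shown, since the approximation chosen for a character may differ between iconv implementations.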
To check whether iconv is used for a particular conversion,
use the ‘-v’ or ‘--verbose’ option (see Controlling how files are recoded) and
check whether ‘:iconv:’ appears as an intermediate charset.
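The check described above could be scripted roughly as follows. This is a sketch only: the helper name mentions_iconv is hypothetical, and the script captures both output streams because it makes no assumption about which one carries the verbose report.

```shell
#!/bin/sh
# Hypothetical helper: reads a verbose report on standard input and
# succeeds if the :iconv: pivot charset is mentioned anywhere in it.
mentions_iconv() {
    grep -q ':iconv:'
}

# Guarded usage: feed an empty input so that only the report is produced,
# then look for :iconv: in whatever 'recode -v' printed.
if command -v recode >/dev/null 2>&1; then
    if recode -v l1..1250 </dev/null 2>&1 | mentions_iconv; then
        echo "iconv is used for l1..1250"
    else
        echo "iconv is not used for l1..1250"
    fi
fi
```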
The :iconv: charset represents a conceptual pivot charset within
the external iconv library (in fact, this pivot exists, but is
not directly reachable). This charset has : (a mere colon) and
:libiconv: as aliases. It is not allowed to recode from or to
this charset directly. But when this charset is selected as an
intermediate, usually by automatic means, then the external iconv
library is called to handle the transformations. By using an
‘--ignore=:iconv:’ option on the recode call or
equivalently, but more simply, ‘-x:’, Recode is instructed to avoid
this charset as an intermediate, with the consequence that the external
iconv library is not used. You can also use
--prefer-iconv to use iconv if possible. Consider these
calls:
    recode l1..1250 < input > output
    recode -x: l1..1250 < input > output
    recode --prefer-iconv l1..1250 < input > output
All should transform input from ISO-8859-1 to CP1250
on output. The first call might use the external iconv
library, while the second call definitely avoids it. The third call
will use the external iconv library if it supports the required
conversion. Whatever the path used, the results should normally be
identical. However, there might be observable differences. Most of
them might result from reversibility issues, as the external
iconv engine likely does not address reversibility in the same
way. Much less likely, some differences might result from slight
errors in the tables used; such differences should then be
reported as bugs.
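One way to look for such differences is to run the same request along both paths and compare the results byte for byte. The following is a minimal sketch, not part of Recode itself: the function name compare_paths and its return codes are assumptions, and it relies on mktemp and cmp being available.

```shell
#!/bin/sh
# Hypothetical helper: convert one file with and without the external
# iconv library and compare the two results.
# usage: compare_paths REQUEST INPUT-FILE
# Prints "same" (returns 0) or "different" (returns 1); returns 2 if
# recode is missing or one of the conversions fails.
compare_paths() {
    request=$1
    input=$2
    command -v recode >/dev/null 2>&1 || {
        echo "recode not found" >&2
        return 2
    }
    with_iconv=$(mktemp) || return 2
    without_iconv=$(mktemp) || { rm -f "$with_iconv"; return 2; }
    status=0
    # Prefer the external iconv library where it supports the request.
    recode --prefer-iconv "$request" <"$input" >"$with_iconv" || status=2
    # Avoid the :iconv: pivot, hence the external library, altogether.
    recode -x: "$request" <"$input" >"$without_iconv" || status=2
    if [ "$status" -eq 0 ]; then
        if cmp -s "$with_iconv" "$without_iconv"; then
            echo same
        else
            echo different
            status=1
        fi
    fi
    rm -f "$with_iconv" "$without_iconv"
    return "$status"
}
```

For instance, `compare_paths l1..1250 input` would report whether the two paths agree on the file named input.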
Discrepancies might be seen in the area of error detection and recovery.
The Recode library usually tries to detect canonicity errors in
input, and production of ambiguous output, but the external iconv
library does not necessarily do it the same way. Moreover, the
Recode library may not always recover as nicely as possible when
the external iconv has no translation for a given character.
The external iconv libraries may offer different sets of charsets
and aliases from one library to another, and also between successive
versions of a single library. It is best to check the documentation of
the external iconv library, as of the time Recode was
installed, to know which charsets and aliases are being provided.
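Where the documentation is not at hand, the listing printed by the iconv command (its -l option) can give a rough idea. The sketch below assumes that the iconv command found on the system belongs to the same library Recode was built against, which is not guaranteed; the helper name charset_listed is hypothetical. It accepts listings that separate names with commas, spaces, or newlines and strips a trailing // from each name.

```shell
#!/bin/sh
# Hypothetical helper: reads a charset listing on standard input and
# succeeds if NAME appears in it as a whole name, ignoring case.
# usage: iconv -l | charset_listed NAME
charset_listed() {
    # One name per line, drop any trailing "//", then match exactly.
    tr ', ' '\n\n' | sed 's|//.*$||' | grep -qixF -- "$1"
}

# Guarded usage; CP1250 is only an example name.
if command -v iconv >/dev/null 2>&1; then
    if iconv -l | charset_listed CP1250; then
        echo "CP1250 is listed by this iconv"
    else
        echo "CP1250 is not listed by this iconv"
    fi
fi
```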
The ‘--ignore=:iconv:’ or ‘-x:’ options might be useful when
there is a need to make a recoding more exactly repeatable between
machines or installations, the idea being here to remove the variance
possibly introduced by the various implementations of an external
iconv library. These options might also help decide whether
some recoding problem is inherent to Recode, or is induced by the
external iconv library.