character_sets
This library provides a character_set_protocol protocol plus
concrete objects for converting between lists of character codes and
lists of bytes. It also provides metadata predicates
preferred_mime_name/1, name/1, alias/1, and mibenum/1
based on the IANA character set registry:
https://www.iana.org/assignments/character-sets/character-sets.xhtml
The currently provided objects are:
- us_ascii
- iso_8859_1
- iso_8859_2
- iso_8859_3
- iso_8859_4
- iso_8859_9
- iso_8859_10
- iso_8859_13
- iso_8859_14
- iso_8859_15
- iso_8859_16
- windows_1250
- windows_1251
- windows_1252
- windows_1253
- windows_1254
- windows_1257
- utf_8
- utf_16le
- utf_16be
- utf_32le
- utf_32be
Object names are derived from the preferred IANA MIME names by
lowercasing them and replacing hyphens with underscores. When a registry
entry has no distinct preferred MIME alias, the registered IANA name is
used instead. A compatibility alias object named utf16be is also
provided for utf_16be.
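As a rough illustration of the naming rule and the metadata predicates, the queries below ask the utf_8 object (whose name follows from the preferred MIME name UTF-8) for its registry data. This is only a sketch: the representation of the returned values and whether alias/1 enumerates several solutions on backtracking are assumptions, so no answers are shown.

```logtalk
% Assumes the library was loaded with logtalk_load(character_sets(loader)).
% Each metadata predicate is sent as a message to a character set object.
| ?- utf_8::preferred_mime_name(MIMEName).
| ?- utf_8::name(Name).
| ?- utf_8::mibenum(MIBenum).
% Assumption: alias/1 may yield more than one solution on backtracking.
| ?- utf_8::alias(Alias).
```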
The Unicode character set objects work with Unicode scalar values and do not emit or consume a byte order mark (BOM).
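Because no BOM is emitted, an application that needs one has to add it itself. The following sketch prepends the UTF-16 big-endian BOM bytes (0xFE, 0xFF) to the encoded output. It assumes that codes_to_bytes/2 takes the list of codes first and the list of bytes second.

```logtalk
% Assumed argument order: codes_to_bytes(Codes, Bytes).
% The BOM bytes are added by the caller, not by the library.
| ?- utf_16be::codes_to_bytes([0'a, 0'b], Bytes0),
     Bytes = [0xFE, 0xFF| Bytes0].
```

When decoding, the same reasoning applies in reverse: strip any leading BOM bytes before calling bytes_to_codes/2.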
This library intentionally does not currently provide Shift_JIS or
GB18030 objects because portable mapping tables for those multibyte
encodings are not yet included.
No input validation is performed when converting between character codes
and bytes. When necessary, use the types library validation and
checking predicates before calling the codes_to_bytes/2 and
bytes_to_codes/2 predicates.
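The queries below sketch a conversion round trip and one possible way of validating input beforehand. The argument order of both conversion predicates is assumed from their names. The validation step uses the type::valid/2 predicate from the types library with the list(byte) type; whether that check is sufficient for a given encoding is left to the application.

```logtalk
% Encode character codes as UTF-8 bytes (assumed order: Codes, Bytes).
| ?- utf_8::codes_to_bytes([0'a, 0'b, 0'c], Bytes).

% Decode bytes back to character codes (assumed order: Bytes, Codes).
| ?- utf_8::bytes_to_codes([97, 98, 99], Codes).

% Load the types library in case it is not already loaded.
| ?- logtalk_load(types(loader)).

% Check that the input is a list of bytes before decoding it.
| ?- Input = [97, 98, 99],
     type::valid(list(byte), Input),
     utf_8::bytes_to_codes(Input, Codes).
```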
API documentation
Open the ../../apis/library_index.html#character_sets link in a web browser.
Loading
To load all entities in this library, load the loader.lgt file:
| ?- logtalk_load(character_sets(loader)).
Testing
To test this library's predicates, load the tester.lgt file:
| ?- logtalk_load(character_sets(tester)).
Usage
The UTF, ISO 8859, and Windows character set objects are grouped in three main files:
- utf_character_sets.lgt
- iso_8859_character_sets.lgt
- windows_character_sets.lgt
This allows some customization of the character set objects loaded by
your application. Note that the character_set_protocol.lgt and
character_sets.lgt base files must always be loaded (they include
the us_ascii character set, which is thus always loaded).
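For example, an application that only needs the Unicode objects might load the two base files followed by the UTF file. This is a sketch that assumes the character_sets library alias resolves these files in the same way as it resolves the loader file, and that no extra compiler flags from the loader file are required.

```logtalk
% Load the base files first, then only the UTF character set objects.
| ?- logtalk_load([
         character_sets(character_set_protocol),
         character_sets(character_sets),
         character_sets(utf_character_sets)
     ]).
```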