character_sets

This library provides a character_set_protocol protocol plus concrete objects for converting between lists of character codes and lists of bytes. It also provides the metadata predicates preferred_mime_name/1, name/1, alias/1, and mibenum/1, whose values are based on the IANA character set registry:

https://www.iana.org/assignments/character-sets/character-sets.xhtml
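The queries below illustrate the conversion and metadata predicates with a round trip through the iso_8859_1 object. They assume that codes_to_bytes/2 takes the character codes as its first argument and returns the bytes in its second, and that bytes_to_codes/2 does the reverse, as the predicate names suggest. In ISO-8859-1, every code point from 0 to 255 is encoded as the byte with the same value. The expected bytes for "café" (é is code point 233) are therefore equal to the codes. The expected MIBenum, 4, is the value IANA assigns to ISO-8859-1.

```logtalk
| ?- logtalk_load(character_sets(loader)).

% "café" as character codes; 233 is the code point of é
| ?- iso_8859_1::codes_to_bytes([99, 97, 102, 233], Bytes).
% expected: Bytes = [99, 97, 102, 233]

| ?- iso_8859_1::bytes_to_codes([99, 97, 102, 233], Codes).
% expected: Codes = [99, 97, 102, 233]

% IANA registry metadata for the same character set
| ?- iso_8859_1::mibenum(MIBenum).
% expected: MIBenum = 4

| ?- iso_8859_1::preferred_mime_name(Name).
```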

The currently provided objects are:

  • us_ascii

  • iso_8859_1

  • iso_8859_2

  • iso_8859_3

  • iso_8859_4

  • iso_8859_9

  • iso_8859_10

  • iso_8859_13

  • iso_8859_14

  • iso_8859_15

  • iso_8859_16

  • windows_1250

  • windows_1251

  • windows_1252

  • windows_1253

  • windows_1254

  • windows_1257

  • utf_8

  • utf_16le

  • utf_16be

  • utf_32le

  • utf_32be
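All these objects conform to character_set_protocol, so the character sets loaded at runtime can be listed with the standard Logtalk reflection built-ins conforms_to_protocol/2 and current_object/1. This is only a sketch. The exact output depends on which files are loaded, and it may also include any alias or helper objects that conform to the protocol, such as utf16be.

```logtalk
| ?- logtalk_load(character_sets(loader)).

% print every loaded object that conforms to the protocol;
% current_object/1 skips any category that may also conform to it
| ?- forall(
         ( conforms_to_protocol(Object, character_set_protocol),
           current_object(Object)
         ),
         ( write(Object), nl )
     ).
```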

Object names are derived from the preferred IANA MIME names by converting them to lowercase and replacing hyphens with underscores; for example, ISO-8859-1 becomes iso_8859_1. When a registry entry has no distinct preferred MIME name, the registered IANA name is used instead. A compatibility alias object, utf16be, is also provided for utf_16be.
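The naming rule can be expressed as a small conversion over the character codes of the IANA name. The charset_naming object below is a hypothetical sketch and is not part of the library. It handles only ASCII letters, which is enough for IANA names, and it uses only ISO Prolog built-ins.

```logtalk
% Hypothetical helper (not part of the library): derives an object name
% from an IANA MIME name by lowercasing it and mapping '-' to '_'.
:- object(charset_naming).

	:- public(object_name/2).

	object_name(MimeName, ObjectName) :-
		atom_codes(MimeName, Codes),
		convert(Codes, ConvertedCodes),
		atom_codes(ObjectName, ConvertedCodes).

	% convert each character code in turn
	convert([], []).
	convert([Code| Codes], [Converted| ConvertedCodes]) :-
		convert_code(Code, Converted),
		convert(Codes, ConvertedCodes).

	% a hyphen becomes an underscore
	convert_code(0'-, 0'_) :-
		!.
	% ASCII uppercase letters become lowercase
	convert_code(Code, Lower) :-
		Code >= 0'A,
		Code =< 0'Z,
		!,
		Lower is Code + 32.
	% any other code is kept unchanged
	convert_code(Code, Code).

:- end_object.
```

For example, charset_naming::object_name('ISO-8859-1', Name) binds Name to iso_8859_1, and 'windows-1252' maps to windows_1252.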

The Unicode character set objects work with Unicode scalar values and do not emit or consume a byte order mark (BOM).
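The queries below show the expected bytes for a Basic Multilingual Plane (BMP) character and for a supplementary character (U+1F600, code point 128512). They assume that codes_to_bytes/2 takes the character codes as its first argument and returns the bytes in its second. The expected values follow directly from the UTF-8, UTF-16, and UTF-32 encoding rules. Note that the UTF-16LE output has no [255, 254] byte order mark prefix.

```logtalk
| ?- logtalk_load(character_sets(loader)).

% "A" (U+0041) in UTF-16LE: low byte first, no [255, 254] BOM prefix
| ?- utf_16le::codes_to_bytes([65], Bytes).
% expected: Bytes = [65, 0]

% U+1F600 needs a surrogate pair in UTF-16: D83D DE00
| ?- utf_16be::codes_to_bytes([128512], Bytes).
% expected: Bytes = [216, 61, 222, 0]

% the same scalar value takes four bytes in UTF-8: F0 9F 98 80
| ?- utf_8::codes_to_bytes([128512], Bytes).
% expected: Bytes = [240, 159, 152, 128]

% in UTF-32LE: four bytes, least significant byte first
| ?- utf_32le::codes_to_bytes([128512], Bytes).
% expected: Bytes = [0, 246, 1, 0]
```

Because these objects work with Unicode scalar values, the code list holds 128512 itself, not its two surrogate halves. Decoding [216, 61, 222, 0] with utf_16be is expected to give back [128512].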

This library does not currently provide Shift_JIS or GB18030 objects, as portable mapping tables for these multibyte encodings are not yet included.

No input validation is performed when converting between character codes and bytes. When necessary, validate the input using the validation and checking predicates from the types library before calling the codes_to_bytes/2 and bytes_to_codes/2 predicates.
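One way to follow this advice is to wrap the unchecked predicates. The checked_us_ascii object below is a hypothetical sketch and is not part of the library. Before delegating to us_ascii, it calls type::check/2 from the types library with the type list(between(integer, 0, 127)). That range covers all US-ASCII code points, and check/2 throws an error when a term is not of the given type. The sketch assumes the types and character_sets libraries are already loaded and that codes_to_bytes/2 takes the codes as its first argument.

```logtalk
% Hypothetical wrapper (not part of the library): validates the input
% with the types library before calling the unchecked conversions.
% Load first with: logtalk_load([types(loader), character_sets(loader)])
:- object(checked_us_ascii).

	:- public(codes_to_bytes/2).
	codes_to_bytes(Codes, Bytes) :-
		% throws an error unless Codes is a list of US-ASCII code points
		type::check(list(between(integer, 0, 127)), Codes),
		us_ascii::codes_to_bytes(Codes, Bytes).

	:- public(bytes_to_codes/2).
	bytes_to_codes(Bytes, Codes) :-
		% bytes above 127 are not valid US-ASCII either
		type::check(list(between(integer, 0, 127)), Bytes),
		us_ascii::bytes_to_codes(Bytes, Codes).

:- end_object.
```

With this wrapper, checked_us_ascii::codes_to_bytes([72, 105], Bytes) behaves like the plain call. A list containing 233 raises an exception before us_ascii is called. Use type::valid/2 instead of type::check/2 if failure is preferred over an exception.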

API documentation

Open the ../../apis/library_index.html#character_sets link in a web browser.

Loading

To load all entities in this library, load the loader.lgt file:

| ?- logtalk_load(character_sets(loader)).

Testing

To test this library's predicates, load the tester.lgt file:

| ?- logtalk_load(character_sets(tester)).

Usage

The UTF, ISO 8859, and Windows character set objects are grouped in three main files:

  • utf_character_sets.lgt

  • iso_8859_character_sets.lgt

  • windows_character_sets.lgt

This allows an application to load only the character set objects it needs. Note that the character_set_protocol.lgt and character_sets.lgt base files must always be loaded; as they define the us_ascii character set, that object is always available.
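For example, an application that needs only the Unicode character sets might load the two base files plus utf_character_sets.lgt instead of loader.lgt. This sketch assumes the files need nothing else that loader.lgt would otherwise load. Check the loader.lgt file in your Logtalk installation to confirm. The files are listed in this order so that the protocol is defined before the objects that use it.

```logtalk
% base files first, then only the UTF character sets
| ?- logtalk_load([
         character_sets(character_set_protocol),
         character_sets(character_sets),
         character_sets(utf_character_sets)
     ]).

% us_ascii comes with the base files; utf_8 with utf_character_sets.lgt
| ?- current_object(us_ascii), current_object(utf_8).
```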