The ISO 8859 Character Set

ISO 8859 is a full series of standardized multilingual single-byte coded (8bit) graphic character sets for writing in alphabetic languages:

  1. Latin1 (West European)
  2. Latin2 (East European)
  3. Latin3 (South European)
  4. Latin4 (North European)
  5. Cyrillic
  6. Arabic
  7. Greek
  8. Hebrew
  9. Latin5 (Turkish)
  10. Latin6 (Nordic)

The ISO 8859 charsets were designed in the mid-1980s by the European Computer Manufacturers Association (ECMA) and endorsed by the International Organization for Standardization (ISO).

The following bitmap GIFs show only the upper G1 portions of the respective charsets. Characters 0 to 127 are always identical to US-ASCII, and positions 128 to 159 hold rarely used control characters: the so-called C1 set from ISO 6429.
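The shared lower half and C1 block can be checked mechanically. The sketch below assumes Python 3 and its built-in codecs; the codec names (`iso8859_1` … `iso8859_10`) and the helper functions are Python conventions and illustrative choices, not part of the standard.

```python
import unicodedata

# Python's codec names for the ten parts listed at the top of this page
PARTS = [f"iso8859_{n}" for n in range(1, 11)]

def lower_half_is_ascii(codec: str) -> bool:
    """Bytes 0-127 decode exactly as they would in US-ASCII."""
    low = bytes(range(128))
    return low.decode(codec) == low.decode("ascii")

def c1_block_is_controls(codec: str) -> bool:
    """Bytes 128-159 decode to U+0080..U+009F, all control characters (Cc)."""
    text = bytes(range(0x80, 0xA0)).decode(codec)
    return ([ord(c) for c in text] == list(range(0x80, 0xA0))
            and all(unicodedata.category(c) == "Cc" for c in text))

for codec in PARTS:
    print(codec, lower_half_is_ascii(codec), c1_block_is_controls(codec))
```

Only the upper G1 half (160 to 255) differs between the parts, which is why the charts show nothing else.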

ISO-8859-1 (Latin1)


charset=ISO-8859-1

Latin1 covers most West European languages, such as French (fr), Spanish (es), Catalan (ca), Basque (eu), Portuguese (pt), Italian (it), Albanian (sq), Rhaeto-Romanic (rm), Dutch (nl), German (de), Danish (da), Swedish (sv), Norwegian (no), Finnish (fi), Faroese (fo), Icelandic (is), Irish (ga), Scottish Gaelic (gd) and English (en), and incidentally also Afrikaans (af) and Swahili (sw). In effect it thus also serves the entire American continent, Australia and much of Africa. Latin1 has also been adopted as the first page of ISO 10646 (Unicode): its 256 characters are the code points U+0000 to U+00FF.
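Because Latin1 is the first page of ISO 10646, decoding a Latin1 byte amounts to reading its value as a code point. A minimal check, assuming Python 3 (the sample string is an arbitrary illustration):

```python
def latin1_is_unicode_prefix() -> bool:
    """Check that byte n decodes to code point U+00nn for all 256 bytes."""
    text = bytes(range(256)).decode("latin-1")
    return [ord(ch) for ch in text] == list(range(256))

sample = "Ça déjà, Grüße, Þórður"    # French, German and Icelandic letters
encoded = sample.encode("latin-1")

print(latin1_is_unicode_prefix())          # True
print(len(encoded) == len(sample))         # True: one byte per character
print(encoded.decode("latin-1") == sample) # True
```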

DEC-MCS

ISO-8859-1 was derived from the DEC Multinational Character Set used on the standard DEC VT-220 terminals:


charset=DEC-MCS

CP1252 (WinLatin1)

You often see Microsoft Windows users (check out my code page survey) announce their texts as ISO-8859-1 when they in fact contain characters from the CP1252 superset, which fills positions 128 to 159 (the C1 controls in ISO-8859-1) with printable characters such as curly quotes and dashes. Such texts may become more common now that Microsoft has also added the Euro sign to its code pages. So here is a Unix font for them:


charset=Windows-1252 BDF
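The difference shows up as soon as the same bytes are decoded both ways. This is a sketch assuming Python 3; the `has_c1_bytes` heuristic is a hypothetical helper, not a standard detection method, and it cannot tell CP1252 from other supersets that also use 128 to 159.

```python
def has_c1_bytes(data: bytes) -> bool:
    """Rough heuristic (an assumption, not a standard test): genuine
    ISO-8859-1 text almost never uses the C1 controls at 0x80-0x9F,
    whereas CP1252 puts printable characters there."""
    return any(0x80 <= b <= 0x9F for b in data)

raw = b"\x93Caf\xe9\x94 costs \x80 5"  # bytes as written on a CP1252 system

print(raw.decode("cp1252"))         # “Café” costs € 5
print(repr(raw.decode("latin-1")))  # quotes and Euro come out as C1 controls
print(has_c1_bytes(raw))            # True
```

Note that Python's `cp1252` codec raises `UnicodeDecodeError` for the five byte values CP1252 leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D).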

ISO-8859-2 (Latin2)


charset=ISO-8859-2

Latin2 covers the languages of Central and Eastern Europe: Czech (cs), Hungarian (hu), Polish (pl), Romanian (ro), Croatian (hr), Slovak (sk), Slovenian (sl) and Sorbian.

The German umlauts are found at exactly the same positions in Latin1, Latin2, Latin3, Latin4, Latin5 and Latin6. Thus you can write German+Polish with Latin2 or German+Turkish with Latin5, but there is no 8-bit charset that can properly mix German+Russian, for instance.
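Both claims can be verified with Python's built-in codecs. The sketch below assumes Python 3.8+ (for `bytes.hex` with a separator); the sample phrases and the `fits` helper are illustrative choices.

```python
# The six Latin parts, by Python codec name
LATIN_PARTS = {
    "Latin1": "iso8859_1", "Latin2": "iso8859_2", "Latin3": "iso8859_3",
    "Latin4": "iso8859_4", "Latin5": "iso8859_9", "Latin6": "iso8859_10",
}
ALL_PARTS = [f"iso8859_{n}" for n in range(1, 11)]
UMLAUTS = "ÄÖÜäöüß"

def fits(text: str, codec: str) -> bool:
    """True if every character of text exists in the given charset."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

for name, codec in LATIN_PARTS.items():
    print(name, UMLAUTS.encode(codec).hex(" "))  # c4 d6 dc e4 f6 fc df each time

print(fits("Grüße aus Łódź", "iso8859_2"))   # True: German+Polish in Latin2
print(fits("Grüße, ağaçlı", "iso8859_9"))    # True: German+Turkish in Latin5
print([c for c in ALL_PARTS if fits("Grüße, Привет", c)])  # []: no part fits
```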

ISO-8859-3 (Latin3)


charset=ISO-8859-3

Latin3 is popular with authors of Esperanto (eo) and Maltese (mt), and it covered Turkish before the introduction of Latin5 in 1988.
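A quick coverage check, assuming Python 3; the letter lists below are samples of each language's special letters, not exhaustive alphabets.

```python
def fits(text: str, codec: str) -> bool:
    """True if every character of text exists in the given charset."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

ESPERANTO = "ĉĝĥĵŝŭ ĈĜĤĴŜŬ"
MALTESE = "ċġħż ĊĠĦŻ"
TURKISH = "ğış İŞĞ çöü"

print(fits(ESPERANTO, "iso8859_3"))  # True
print(fits(ESPERANTO, "iso8859_1"))  # False: Latin1 has no circumflexed consonants
print(fits(MALTESE, "iso8859_3"))    # True
print(fits(TURKISH, "iso8859_3"))    # True: Turkish before Latin5
```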

ISO-8859-4 (Latin4)


charset=ISO-8859-4

Latin4 introduced letters for Estonian (et), the Baltic languages Latvian (lv, Lettish) and Lithuanian (lt), Greenlandic (kl) and Lappish. Note that Latvian requires the cedilla on ģ (byte 0xBB, U+0123 LATIN SMALL LETTER G WITH CEDILLA) to jump on top. Latin4 was followed by Latin6.
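The position of ģ, and the coverage of the three Baltic languages, can be checked with Python 3's `iso8859_4` codec. The sample strings are illustrative letter lists. The second print also shows that the same byte 0xBB reads as » when the text is mistaken for Latin1.

```python
def fits(text: str, codec: str) -> bool:
    """True if every character of text exists in the given charset."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

print("\u0123".encode("iso8859_4"))  # b'\xbb'  (ģ)
print(b"\xbb".decode("iso8859_1"))   # »  the same byte read as Latin1

samples = {
    "Latvian": "ģ ķ ļ ņ ŗ ā ē ī ū",
    "Lithuanian": "ą č ę ė į š ų ū ž",
    "Estonian": "õ ä ö ü š ž",
}
for lang, text in samples.items():
    print(lang, fits(text, "iso8859_4"))  # True for all three
```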

ISO-8859-5 (Cyrillic)


charset=ISO-8859-5

With these Cyrillic letters you can type Bulgarian (bg), Byelorussian (be), Macedonian (mk), Russian (ru), Serbian (sr) and pre-1990 Ukrainian (uk), since the ghe with upturn is missing. The ordering is based on the (incompatibly) revised GOST 19768 of 1987, with all Russian letters except ё sorted in Russian alphabetical order (ABVGDE…).
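One practical consequence of this ordering: sorting ISO-8859-5 byte strings is almost, but not quite, Russian alphabetical sorting, because ё sits apart at 0xF1. A sketch assuming Python 3; `byte_key` and `alphabet_key` are hypothetical helpers, and `alphabet_key` only handles words made of Russian letters.

```python
RUSSIAN = "абвгдеёжзийклмнопрстуфхцчшщъыьэюя"  # 33 letters, ё after е

def byte_key(word: str) -> bytes:
    """Sort by raw ISO-8859-5 byte values."""
    return word.encode("iso8859_5")

def alphabet_key(word: str) -> list:
    """Sort by position in the Russian alphabet (Russian letters only)."""
    return [RUSSIAN.index(ch) for ch in word.lower()]

words = ["жук", "ёж", "еда"]
print(sorted(words, key=byte_key))      # ['еда', 'жук', 'ёж']  ё lands last
print(sorted(words, key=alphabet_key))  # ['еда', 'ёж', 'жук']
```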

Note that several other Cyrillic charsets are used on the net. Have a look at my neighboring Cyrillic charsets page.

ISO-8859-6 (Arabic)


charset=ISO-8859-6

This is the Arabic alphabet, but unfortunately only the basic alphabet for the Arabic (ar) language: it lacks the four extra letters for Persian (fa) and the eight extra letters for Pakistani Urdu (ur). This fixed font is not well suited for text display, because each Arabic letter occurs in up to four presentation forms: initial, medial, final or isolated. To make Arabic text legible you'll need a display engine that analyses the context and combines the appropriate glyphs, on top of a handler for the right-to-left writing direction shared with Hebrew. The rendering algorithm is described in the Unicode book, and I have implemented it in my arabjoin perl script.
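To illustrate the idea of contextual form selection, here is a heavily simplified sketch in Python 3. It is not the arabjoin script and not the full Unicode algorithm: the joining table is a tiny hypothetical subset of Unicode's ArabicShaping.txt, and it ignores the mandatory lam-alef ligature, vowel marks and display direction. It maps each letter to one of the Unicode presentation-form characters via `unicodedata.lookup`.

```python
import unicodedata

# Hypothetical, deliberately tiny joining table. "D" = dual-joining,
# "R" = joins only to the preceding letter.
JOINING = {
    "\u0627": "R",  # ALEF
    "\u062f": "R",  # DAL
    "\u0631": "R",  # REH
    "\u0648": "R",  # WAW
    "\u0628": "D",  # BEH
    "\u062a": "D",  # TEH
    "\u0644": "D",  # LAM
    "\u0645": "D",  # MEEM
    "\u0646": "D",  # NOON
    "\u064a": "D",  # YEH
}

def shape(word: str) -> str:
    """Map each letter of a logical-order word to a presentation form."""
    out = []
    for i, ch in enumerate(word):
        if ch not in JOINING:
            out.append(ch)  # leave unknown characters alone
            continue
        prev = word[i - 1] if i > 0 else ""
        nxt = word[i + 1] if i + 1 < len(word) else ""
        # Connects backward if the preceding letter is dual-joining.
        joins_prev = JOINING.get(prev) == "D"
        # Connects forward if this letter is dual-joining and a joining letter follows.
        joins_next = JOINING[ch] == "D" and nxt in JOINING
        if joins_prev and joins_next:
            form = "MEDIAL"
        elif joins_prev:
            form = "FINAL"
        elif joins_next:
            form = "INITIAL"
        else:
            form = "ISOLATED"
        out.append(unicodedata.lookup(f"{unicodedata.name(ch)} {form} FORM"))
    return "".join(out)

word = bytes([0xC8, 0xC7, 0xC8]).decode("iso8859_6")  # BEH ALEF BEH
print([f"U+{ord(c):04X}" for c in shape(word)])  # ['U+FE91', 'U+FE8E', 'U+FE8F']
```

The right-joining alef breaks the chain, so the second beh stands isolated even though it follows a letter.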

ISO-8859-7 (Greek)


charset=ISO-8859-7 BDF

This is (modern monotonic) Greek (el) to me. ISO-8859-7 was formerly known as ELOT 928 or ECMA-118:1986.
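"Monotonic" means the charset carries the single tonos accent but none of the polytonic breathings and accents. A quick check assuming Python 3; the two sample words are arbitrary and written as escapes so the exact code points are unambiguous.

```python
def fits(text: str, codec: str) -> bool:
    """True if every character of text exists in the given charset."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

monotonic = "\u039a\u03b1\u03bb\u03b7\u03bc\u03ad\u03c1\u03b1"  # Καλημέρα, tonos only
polytonic = "\u1f00\u03c1\u03c7\u03ae"                          # ἀρχή, breathing on alpha

print(fits(monotonic, "iso8859_7"))  # True
print(fits(polytonic, "iso8859_7"))  # False: U+1F00 is not in ISO-8859-7
```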

ISO-8859-8 (Hebrew)


charset=ISO-8859-8 BDF

And this is the Hebrew script used by Hebrew (iw) and Yiddish (ji). Like Arabic, it is written from right to left.

ISO-8859-9 (Latin5)


charset=ISO-8859-9 BDF

Latin5 replaces the rarely needed Icelandic letters in Latin1 with the Turkish ones.
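Comparing the two charsets byte by byte shows exactly which positions were swapped. A sketch assuming Python 3; `differing_positions` is an illustrative helper.

```python
def differing_positions(a: str, b: str) -> list:
    """Byte values that decode to different characters in codecs a and b."""
    return [n for n in range(256)
            if bytes([n]).decode(a) != bytes([n]).decode(b)]

for n in differing_positions("iso8859_1", "iso8859_9"):
    raw = bytes([n])
    print(hex(n), raw.decode("iso8859_1"), "->", raw.decode("iso8859_9"))
# 0xd0 Ð -> Ğ
# 0xdd Ý -> İ
# 0xde Þ -> Ş
# 0xf0 ð -> ğ
# 0xfd ý -> ı
# 0xfe þ -> ş
```

Only six positions change; everything else, including all other accented letters, stays where Latin1 has it.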

ISO-8859-10 (Latin6)


charset=ISO-8859-10

Introduced in 1992, Latin6 rearranged the Latin4 characters, dropped some symbols and the Latvian ŗ, added the last missing Inuit (Greenlandic Eskimo) and non-Skolt Sami (Lappish) letters, and reintroduced the Icelandic letters to cover the entire Nordic area. Skolt Sami still needs a few more accents. Note that RFC 1345 and GNU recode contain errors and use a preliminary, different version of latin6.
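The changes from Latin4 to Latin6 can be spot-checked with Python 3's `iso8859_4` and `iso8859_10` codecs. The test strings are illustrative picks, not a complete comparison of the two tables.

```python
def fits(text: str, codec: str) -> bool:
    """True if every character of text exists in the given charset."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

# Rearranged: the same letter ģ sits at a different byte
print("\u0123".encode("iso8859_4"), "\u0123".encode("iso8859_10"))  # b'\xbb' b'\xb3'

for label, text in [("Icelandic", "Þórður"), ("Latvian ŗ", "ŗ"), ("Sami ŋ", "ŋ")]:
    print(label, fits(text, "iso8859_4"), fits(text, "iso8859_10"))
# Icelandic False True
# Latvian ŗ True False
# Sami ŋ True True
```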