International Characters Support

From Octave
Revision as of 08:52, 1 April 2014 by CdeMills (talk | contribs)

ANSI

The first widely used character set was 7-bit ASCII, standardized by ANSI, with values ranging from 0 to 127. Having been developed for English, it uses the Latin alphabet, but without accented letters or other special signs.

In the '80s, extensions were provided through 8-bit character tables, in which the values 128 to 255 were used to encode the missing characters. But there were so many missing characters that those 128 values were not enough, so a number of maps were defined. For instance, ISO-8859-1 covers Western European languages, with letters for French (é), Nordic languages (Ø), a few symbols (½), and so on. Typical computer support consisted of loading the appropriate character map early on, after which glyphs were rendered correctly.
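As a minimal sketch of the mapping described above, the following Python snippet looks up the ISO-8859-1 byte value of the three sample characters and checks whether each fits in the 7-bit range. The helper names `latin1_byte` and `fits_in_7_bits` are hypothetical and chosen only for this illustration.

```python
# Sample characters taken from the paragraph above.
SAMPLES = ["é", "Ø", "½"]


def latin1_byte(ch):
    """Return the single ISO-8859-1 byte value (0 to 255) that encodes ch."""
    encoded = ch.encode("iso-8859-1")
    return encoded[0]


def fits_in_7_bits(ch):
    """Return True if ch can be encoded with the 7-bit set (values 0 to 127)."""
    try:
        ch.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


for ch in SAMPLES:
    # Each sample lands in the upper half of the table (128 to 255).
    print(ch, latin1_byte(ch), fits_in_7_bits(ch))
```

All three characters map to a single byte above 127, which is exactly the half of the table that the 8-bit extensions added.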

The first issue with this approach is conversion. Viewing text in Greek or Cyrillic on a display configured for Western European languages requires switching back and forth between code pages.
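The conversion problem can be illustrated with a short Python sketch: one and the same byte value is decoded under three different maps. The choice of ISO-8859-7 (Greek) and ISO-8859-5 (Cyrillic) as counterparts to ISO-8859-1 is an assumption made for this example, and the `render` helper is hypothetical.

```python
# Code pages to compare; the labels are for display only.
CODE_PAGES = {
    "Western European": "iso-8859-1",
    "Greek": "iso-8859-7",
    "Cyrillic": "iso-8859-5",
}


def render(raw, code_page):
    """Decode raw bytes the way a display configured for code_page would."""
    return raw.decode(code_page)


raw = bytes([0xE9])  # a single byte with value 233

for label, code_page in CODE_PAGES.items():
    # The byte never changes; only its interpretation does.
    print(f"{label:17s} {code_page:11s} -> {render(raw, code_page)}")
```

The byte itself carries no information about which map it belongs to, so a text can only be displayed correctly if the reader knows in advance which code page the writer used.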

Unicode

Unicode is a standard, and an ongoing effort, to encode the symbols of every language that exists or has existed on Earth. It currently contains 190000 signs from 93 languages. Unicode is equivalent to ISO/IEC 10646.
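As a rough illustration of what a single, shared numbering buys, the Python sketch below places a Latin, a Greek and a Cyrillic letter in one string and lists the Unicode number (code point) and official name of each. The `describe` helper is hypothetical, and the three letters are arbitrary picks for this example.

```python
import unicodedata


def describe(text):
    """Return (character, code point, Unicode name) for each character."""
    return [(ch, ord(ch), unicodedata.name(ch, "<unnamed>")) for ch in text]


# One string holding e-acute (Latin), iota (Greek) and shcha (Cyrillic),
# written with escapes so the source file encoding does not matter.
mixed = "\u00e9\u03b9\u0449"

for ch, code_point, name in describe(mixed):
    print(f"{ch}  U+{code_point:04X}  {name}")
```

Because every character has its own number, the three letters coexist in one text without any code page switching.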