International Characters Support

=ANSI=


The first widely used character set was 7-bit ASCII (standardized by ANSI), with values ranging from 0 to 127. Having been developed for English, it uses the Latin alphabet, but without accented letters or other diacritical marks.


In the '80s, extensions were provided by using 8-bit character tables, whose code numbers 128 to 255 were used to encode the missing characters. But there were so many characters that those additional 128 values were not enough, so a number of maps (code pages) were defined. For instance, ISO-8859-1 covers Western European languages, with letters for French (é), German (ä) and Nordic languages (Ø), a few symbols (°, µ, ½), and so on.
Typical computer support consisted of loading the appropriate character map beforehand, so that glyphs were rendered correctly.
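
As a rough illustration of the paragraph above, the following Python sketch looks up the single byte that ISO-8859-1 assigns to some of the characters mentioned, and checks that a 7-bit encoder rejects them. The names <code>samples</code>, <code>latin1_codes</code> and <code>ascii_ok</code> are made up for this example; only Python's built-in codecs are used.

```python
# Characters mentioned in the text, written as escapes so the example does
# not depend on how this page itself is encoded.
samples = {
    "e acute": "\u00e9",        # é (French)
    "a diaeresis": "\u00e4",    # ä (German)
    "O with stroke": "\u00d8",  # Ø (Nordic languages)
    "degree sign": "\u00b0",    # °
    "one half": "\u00bd",       # ½
}

def latin1_codes(chars):
    """Return {name: byte value} for each character under ISO-8859-1."""
    return {name: ch.encode("iso-8859-1")[0] for name, ch in chars.items()}

codes = latin1_codes(samples)
for name, value in codes.items():
    print(f"{name}: {value} (0x{value:02X})")

# A 7-bit character set has no slot for these characters at all.
try:
    samples["e acute"].encode("ascii")
except UnicodeEncodeError:
    ascii_ok = False
else:
    ascii_ok = True
print("fits in 7 bits:", ascii_ok)
```

Every value printed falls in the 128 to 255 range, which is the half of the table that each 8-bit code page fills differently.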


The first issue with this approach is conversion. Viewing text in Greek or Cyrillic on a display configured for Western European languages requires switching back and forth between code pages.
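
The conversion problem can be sketched in Python by decoding one and the same byte under three code pages. This is only an illustration; the names <code>raw</code>, <code>code_pages</code>, <code>decoded</code> and <code>convertible</code> are hypothetical, and the codec names are the aliases accepted by Python's standard library.

```python
# One byte from the upper half of the table (128 to 255).
raw = bytes([0xE9])

# Western European, Greek and Cyrillic code pages.
code_pages = ["iso-8859-1", "iso-8859-7", "iso-8859-5"]

# The byte stays the same; only the table used to interpret it changes.
decoded = {cp: raw.decode(cp) for cp in code_pages}
for cp, ch in decoded.items():
    print(f"{cp}: U+{ord(ch):04X} {ch}")

# Converting in the other direction fails when the target code page has no
# slot for the character, e.g. a Greek letter into the Western European table.
greek_letter = decoded["iso-8859-7"]
try:
    greek_letter.encode("iso-8859-1")
    convertible = True
except UnicodeEncodeError:
    convertible = False
print("Greek letter fits in ISO-8859-1:", convertible)
```

A byte stream by itself does not say which table it was written with, so the reader has to know the code page in advance and switch to it.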


=Unicode=