International Characters Support: Difference between revisions
Line 16:
=Storage and binary representation=
* UTF-32 stores each symbol in 4 bytes, in one of two byte orders: big-endian or little-endian.
* UTF-16 stores most symbols in 2 bytes; rarely used symbols (those outside the Basic Multilingual Plane) are stored as a "prefix-value" sequence of two 2-byte units, known as a surrogate pair. It also has two byte orders: big-endian and little-endian.
* UTF-8 was designed to be compatible with ASCII: every ASCII string is also valid UTF-8. Each symbol is stored in 1, 2, 3, or 4 bytes. Because the encoding is defined as a sequence of bytes, it has no endianness ambiguity.
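The storage rules listed above can be illustrated with a small C sketch. The function names <code>utf8_encode</code> and <code>utf16_encode</code> are hypothetical helpers written for this page, not part of any standard library; the sketch assumes the caller supplies a 4-byte output buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode code point as UTF-8 (hypothetical helper).
   Writes 1 to 4 bytes into out and returns the number written,
   or 0 if cp is a surrogate or lies above U+10FFFF. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    assert(out != NULL);
    if (cp < 0x80) {                      /* ASCII range: 1 byte */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* 2 bytes */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                   /* 3 bytes */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                     /* surrogates are not symbols */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                 /* 4 bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Encode one code point as UTF-16 in the requested byte order
   (hypothetical helper). Returns 2 or 4 bytes written, or 0 if
   cp is invalid. */
size_t utf16_encode(uint32_t cp, int big_endian, unsigned char out[4])
{
    uint16_t units[2];
    size_t n, i;

    assert(out != NULL);
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
    if (cp < 0x10000) {
        units[0] = (uint16_t)cp;          /* common case: one 2-byte unit */
        n = 1;
    } else {
        uint32_t v = cp - 0x10000;
        units[0] = (uint16_t)(0xD800 | (v >> 10));    /* high surrogate ("prefix") */
        units[1] = (uint16_t)(0xDC00 | (v & 0x3FF));  /* low surrogate ("value") */
        n = 2;
    }
    for (i = 0; i < n; i++) {
        unsigned char hi = (unsigned char)(units[i] >> 8);
        unsigned char lo = (unsigned char)(units[i] & 0xFF);
        out[2 * i]     = big_endian ? hi : lo;
        out[2 * i + 1] = big_endian ? lo : hi;
    }
    return 2 * n;
}
```

For example, U+20AC (the euro sign) becomes the three bytes E2 82 AC in UTF-8, but either 20 AC or AC 20 in UTF-16 depending on the byte order. Only the UTF-16 result changes with endianness.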
=C and C++ support=
Line 50:
| number of symbols <= storage length
| number of symbols <= storage length
| number of symbols proportional to storage length
|-
| tests
| must be aware of UTF-8 and locales
| must be aware of UTF-16, locales, and endianness
| must be aware of UTF-32, locales, and endianness
|-
| finding symbols
| must implement a sequential machine for prefix codes, or be 8-bit clean and able to locate byte sequences
| must implement a sequential machine for prefix codes; must be aware of endianness
| must be aware of endianness
|-
| concatenating strings
| must be 8-bit compatible
| must verify that both strings have the same endianness
| must verify that both strings have the same endianness
|-
| displaying strings
| must pass the string to the external application without truncating the 8th bit
| must ensure the external application uses UTF-16; must check endianness
| must ensure the external application uses UTF-32; must check endianness
|-
|}
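Two of the requirements in the table can be sketched in C: walking UTF-8 prefix codes sequentially to count symbols, and checking UTF-16 endianness. Both functions are hypothetical helpers, not standard library calls. The endianness check assumes the data starts with a byte order mark (BOM), which real input may lack.

```c
#include <stddef.h>

/* Count the symbols (code points) in a UTF-8 buffer of len bytes.
   Returns 0 and stores the count in *count on success, or -1 if a
   sequence is truncated or has a bad lead or continuation byte.
   Simplification: overlong forms and encoded surrogates are not
   rejected, so this is not a full validator. */
int utf8_count_symbols(const unsigned char *s, size_t len, size_t *count)
{
    size_t i = 0, n = 0;

    while (i < len) {
        unsigned char c = s[i];
        size_t need;                      /* continuation bytes expected */

        if (c < 0x80)
            need = 0;
        else if ((c & 0xE0) == 0xC0)
            need = 1;
        else if ((c & 0xF0) == 0xE0)
            need = 2;
        else if ((c & 0xF8) == 0xF0)
            need = 3;
        else
            return -1;                    /* stray continuation or invalid lead byte */

        i++;
        while (need > 0) {
            if (i >= len || (s[i] & 0xC0) != 0x80)
                return -1;                /* truncated or malformed sequence */
            i++;
            need--;
        }
        n++;
    }
    *count = n;
    return 0;
}

/* Guess the byte order of data already known to be UTF-16 from a
   leading BOM. Returns 1 for big-endian (FE FF), 0 for little-endian
   (FF FE), and -1 when no BOM is present. */
int utf16_bom_is_big_endian(const unsigned char *s, size_t len)
{
    if (len >= 2) {
        if (s[0] == 0xFE && s[1] == 0xFF)
            return 1;
        if (s[0] == 0xFF && s[1] == 0xFE)
            return 0;
    }
    return -1;
}
```

On the 6-byte UTF-8 string "héllo" the counter reports 5 symbols, consistent with "number of symbols <= storage length". When no BOM is present, the byte order has to come from elsewhere, such as the protocol or the file format.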