International Characters Support
=Storage and binary representation=  
* UTF-32 stores each symbol in 4 bytes, using one of two byte orders: big-endian or little-endian.
* UTF-16 stores most symbols in 2 bytes. Rarely used characters (code points above U+FFFF) are stored as a "prefix-value" sequence of two 16-bit units, called a surrogate pair (4 bytes in total). There are two byte orders: big-endian and little-endian.
* UTF-8 was designed to be compatible with ASCII: the 128 ASCII characters keep their single-byte values, and all other symbols take 2, 3 or 4 bytes. Because the encoding is defined as a sequence of bytes, it has no endianness ambiguity.
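The three storage schemes above can be illustrated with a small C sketch (not part of the original page). It assumes a Unicode code point held in a <code>uint32_t</code>; the function names <code>utf8_encode</code>, <code>utf16_encode</code>, <code>utf32_encode</code> and the <code>big_endian</code> flag are hypothetical, chosen only for illustration. Each function writes into a caller-supplied buffer of at least 4 bytes and returns the number of bytes written, or 0 for surrogate values and values above U+10FFFF.

```c
#include <stddef.h>
#include <stdint.h>

/* UTF-8: 1 to 4 bytes; the lead byte's high bits give the sequence length.
   Hypothetical helper, returns bytes written or 0 if cp is not encodable. */
size_t utf8_encode(uint32_t cp, unsigned char *out) {
    if (cp <= 0x7F) {                 /* ASCII: unchanged single byte */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp <= 0x7FF) {                /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF) /* surrogates are not characters */
        return 0;
    if (cp <= 0xFFFF) {               /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {             /* 11110xxx followed by 3 continuation bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Write one 16-bit code unit in the requested byte order. */
static void put16(uint16_t u, int big_endian, unsigned char *out) {
    if (big_endian) {
        out[0] = (unsigned char)(u >> 8);
        out[1] = (unsigned char)(u & 0xFF);
    } else {
        out[0] = (unsigned char)(u & 0xFF);
        out[1] = (unsigned char)(u >> 8);
    }
}

/* UTF-16: 2 bytes, or a surrogate pair (4 bytes) above U+FFFF. */
size_t utf16_encode(uint32_t cp, int big_endian, unsigned char *out) {
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;
    if (cp <= 0xFFFF) {
        put16((uint16_t)cp, big_endian, out);
        return 2;
    }
    if (cp <= 0x10FFFF) {
        uint32_t v = cp - 0x10000;                                     /* 20 bits left */
        put16((uint16_t)(0xD800 | (v >> 10)), big_endian, out);        /* high surrogate */
        put16((uint16_t)(0xDC00 | (v & 0x3FF)), big_endian, out + 2);  /* low surrogate */
        return 4;
    }
    return 0;
}

/* UTF-32: always 4 bytes; only the byte order varies. */
size_t utf32_encode(uint32_t cp, int big_endian, unsigned char *out) {
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
    for (int i = 0; i < 4; i++) {
        int shift = big_endian ? 24 - 8 * i : 8 * i;
        out[i] = (unsigned char)((cp >> shift) & 0xFF);
    }
    return 4;
}
```

For example, U+0041 ("A") is the single byte 41 in UTF-8 but 00 41 in UTF-16BE and 41 00 in UTF-16LE, while U+1F600 takes 4 bytes in every scheme: F0 9F 98 80 in UTF-8 and the surrogate pair D8 3D DE 00 in UTF-16BE.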


=C and C++ support=
| number of symbols <= storage length in bytes
| number of symbols <= storage length in bytes / 2
| number of symbols = storage length in bytes / 4 (proportional)
|-
| tests
| must be aware of UTF-8 and locales
| must be aware of UTF-16, locales AND endianness
| must be aware of UTF-32, locales AND endianness
|-
| finding symbols
| must implement a sequential machine for the prefix codes, or be 8-bit clean and able to locate byte sequences
| must implement a sequential machine for the prefix codes (surrogate pairs); must be aware of endianness
| must be aware of endianness
|-
| concatenating strings
| must be 8-bit clean
| must verify that both strings use the same endianness
| must verify that both strings use the same endianness
|-
| displaying strings
| must pass the string to the external application without stripping the 8th bit
| must ensure the external application expects UTF-16; must check endianness
| must ensure the external application expects UTF-32; must check endianness
|-
|}
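The rows on counting and finding symbols and on endianness can be illustrated with a short C sketch (not from the original page). It assumes well-formed input, and the function names are hypothetical. UTF-8 symbols are counted by skipping continuation bytes (10xxxxxx), UTF-16 needs the byte order to recognise a high surrogate, and UTF-32 is fixed-width. Detecting the byte order through a byte order mark (BOM, U+FEFF) is one common approach that the table itself does not mention.

```c
#include <stddef.h>
#include <stdint.h>

/* UTF-8: every byte that is not a continuation byte (10xxxxxx)
   starts a new symbol. Assumes valid UTF-8 input. */
size_t utf8_count(const unsigned char *s, size_t len) {
    size_t n = 0;
    for (size_t i = 0; i < len; i++)
        if ((s[i] & 0xC0) != 0x80)
            n++;
    return n;
}

/* Read the 16-bit code unit at byte offset i in the given byte order. */
static uint16_t get16(const unsigned char *s, size_t i, int big_endian) {
    return big_endian ? (uint16_t)((s[i] << 8) | s[i + 1])
                      : (uint16_t)((s[i + 1] << 8) | s[i]);
}

/* UTF-16: a high surrogate (D800-DBFF) starts a 4-byte pair,
   any other unit is a 2-byte symbol. The byte order must be known. */
size_t utf16_count(const unsigned char *s, size_t len, int big_endian) {
    size_t n = 0;
    for (size_t i = 0; i + 1 < len; n++) {
        uint16_t u = get16(s, i, big_endian);
        i += (u >= 0xD800 && u <= 0xDBFF) ? 4 : 2;
    }
    return n;
}

/* UTF-32: fixed width, so the count is proportional to the length. */
size_t utf32_count(size_t len) {
    return len / 4;
}

/* Inspect a leading UTF-16 BOM: 1 = big-endian (FE FF),
   0 = little-endian (FF FE), -1 = no BOM, byte order unknown. */
int utf16_bom(const unsigned char *s, size_t len) {
    if (len < 2)
        return -1;
    if (s[0] == 0xFE && s[1] == 0xFF)
        return 1;
    if (s[0] == 0xFF && s[1] == 0xFE)
        return 0;
    return -1;
}
```

Before concatenating two UTF-16 buffers, a caller could compare the results of <code>utf16_bom</code> for both and convert one of them when they differ. When no BOM is present, the byte order has to come from somewhere else, for example from the protocol or file format.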