International Characters Support: Difference between revisions

Correct wchar_t description
m (typo fix)
(Correct wchar_t description)
Line 28:
What is important here is that ordinary characters should be declared as "char" or "signed char". With "unsigned char", the eighth bit MAY be truncated; this is implementation-dependent.


In order to support wide-characters, the two-byte storage wchar_t was added to the C standard. Functions whose argument is wchar instead of char are generally prefixed by "w".
To support "wide" characters with an extended range of values, the storage type wchar_t was added to the C standard. The size of wchar_t is system-dependent: it is 2 bytes on Windows and 4 bytes on Linux and macOS. Functions that take wchar_t arguments instead of char are generally prefixed by "w".
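
As a rough illustration of the paragraph above, the following sketch reports the platform's wchar_t size and compares a narrow-string function with its wide counterpart. The helper names (wide_char_size, narrow_length, wide_length) are hypothetical and exist only for this example; strlen and wcslen are the standard C library functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Size of one wchar_t in bytes on the current platform
   (typically 2 on Windows, 4 on Linux and macOS). */
static size_t wide_char_size(void)
{
    return sizeof(wchar_t);
}

/* Length of a narrow string in bytes, using the char-based strlen. */
static size_t narrow_length(const char *text)
{
    assert(text != NULL);
    return strlen(text);
}

/* Length of a wide string in wchar_t units (not bytes), using wcslen,
   the wide counterpart of strlen. */
static size_t wide_length(const wchar_t *text)
{
    assert(text != NULL);
    return wcslen(text);
}
```

For instance, wide_length(L"h\u00e9llo") counts five wchar_t units, while the storage taken by those units is five times wide_char_size() bytes and therefore differs between platforms.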


=Character functions=
Line 106:
== Development ==
* short term: tests to ensure every string processing is 8-bit clean
* middle and long term: there are a number of options to fully support whatever symbol existing in Unicode:
* medium and long term: there are several options for fully supporting every symbol that exists in Unicode:
** make use of C wchar_t type
** make use of C wchar_t, char16_t, or char32_t types
** make use of ICU [http://site.icu-project.org/], an open-source lib with various Unicode support functions
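
The short-term item above (tests for 8-bit cleanliness) could be approached along the following lines. This is only a sketch: copy_string is a hypothetical stand-in for whichever string routine is being tested, and is_8bit_clean is an illustrative name, not an existing function of the project.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical routine under test: copies src into dst (at most
   dst_size - 1 bytes) and always NUL-terminates the result. */
static void copy_string(char *dst, size_t dst_size, const char *src)
{
    size_t n = strlen(src);
    if (dst_size == 0)
        return;
    if (n >= dst_size)
        n = dst_size - 1;
    memcpy(dst, src, n);
    dst[n] = '\0';
}

/* Returns 1 if every byte value from 0x01 to 0xFF passes through
   copy_string unchanged, 0 otherwise. */
static int is_8bit_clean(void)
{
    char input[256];
    char output[256];
    unsigned char *bytes = (unsigned char *)input;
    int i;

    /* Fill the input with all non-zero byte values, including those
       with the eighth bit set, then terminate it. */
    for (i = 0; i < 255; i++)
        bytes[i] = (unsigned char)(i + 1);
    input[255] = '\0';

    copy_string(output, sizeof output, input);

    /* Compare raw bytes, terminator included. */
    return memcmp(input, output, sizeof input) == 0;
}
```

A routine that masks or drops the eighth bit would make is_8bit_clean return 0, so the same pattern can be repeated for each string-processing function to be checked.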


[[Category:Development]]