Unicode: Difference between revisions

 
Characters continue to be added to already-encoded scripts, as do symbols, in particular for [[mathematics]] and [[music]] (in the form of notes and rhythmic symbols). The [http://www.unicode.org/roadmaps/ Unicode Roadmap] lists scripts not yet in Unicode with tentative assignments to code blocks. Invented scripts, most of which do not qualify for inclusion in Unicode due to lack of real-world usage, are listed in the [[ConScript Unicode Registry]], along with unofficial but widely used [[Private Use Area]] code assignments. Similarly, many medieval letter variants and ligatures not in Unicode are encoded by the [[Medieval Unicode Font Initiative]].
 
== Issues ==
 
Some people, mostly in [[Japan]], oppose Unicode in general, claiming technical limitations and political problems in its operation. People working on the Unicode standard regard such claims as misunderstandings of the standard and of the process by which it has evolved. The most common mistake, according to this view, is confusing abstract [[character (computing)|characters]] with their highly variable visual forms ([[glyphs]]). On the other hand, whereas [[Chinese language|Chinese]] readers can readily read most glyph variants used by Japanese or [[Korea]]ns, Japanese readers can often recognize only one particular variant.
 
Some have decried Unicode as a plot against Asian cultures perpetrated by Westerners with no understanding of the characters as used in Chinese, Korean, and Japanese, even though experts from all three regions make up the majority of the [[Ideographic Rapporteur Group]] (IRG). The IRG advises the consortium and ISO on additions to the repertoire and on [[Han unification]], the identification of forms in the three languages that can be treated as stylistic variations of the same historical character. Han unification has become one of the most controversial aspects of Unicode.
 
Unicode is criticized for failing to allow for older and alternate forms of [[kanji]], which, critics argue, complicates the processing of ancient Japanese texts and uncommon Japanese names, although Unicode follows the recommendations of Japanese language scholars and of the Japanese government. There have been several attempts to create an alternative to Unicode. [http://www-106.ibm.com/developerworks/unicode/library/u-secret.html] Among them are [[TRON (encoding)|TRON]] and [[UTF-2000]]. TRON is not widely adopted in Japan, but some users favor it, particularly those who need to handle historical Japanese text.
 
It is true that many older forms were not included in early versions of the Unicode standard, but Unicode 4.0 contains more than 70,000 Han characters, far more than any dictionary or any other standard, and work continues on adding characters from the early literature of China, Korea, and Japan. Some argue, however, that this is still not satisfactory. They point, for example, to the need to create new characters for words in various [[Chinese dialects]], more of which may be invented in the future.
 
An alternative approach, pursued by people such as [[Chu Bong-Foo]], uses an encoding that records the radicals making up each Han character. For example, a 1991 Chinese computing system by Chu already supported 60,000 Han characters and needed only 80 KB of memory to generate glyphs from raw [[Cangjie method|Cangjie]] codes.
 
Proponents of this approach argue that Unicode's treatment of Han characters is equivalent to assigning a separate code to every English word.
 
[[Thai language]] support has been criticized for its illogical ordering of Thai characters. Unicode inherited this ordering from the [[TIS-620|Thai Industrial Standard 620]], which worked in the same way, and it complicates the Unicode collation process. [http://www-106.ibm.com/developerworks/unicode/library/u-secret.html]
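
One way to see why this ordering complicates collation is the following minimal Python sketch. It assumes that the ordering in question is the storage of the preposed vowels (U+0E40 to U+0E44) before the consonant they belong to, and it uses a deliberately naive sort key that swaps such a vowel with the following consonant. The names <code>LEADING_VOWELS</code>, <code>is_thai_consonant</code> and <code>thai_sort_key</code> are hypothetical helpers for this illustration; real collation would use the Unicode Collation Algorithm with Thai tailoring rather than this shortcut.

```python
# Thai vowels that are written, and also stored, before their consonant.
LEADING_VOWELS = {chr(cp) for cp in range(0x0E40, 0x0E45)}


def is_thai_consonant(ch):
    """Return True for the Thai consonants KO KAI .. HO NOKHUK."""
    return "\u0E01" <= ch <= "\u0E2E"


def thai_sort_key(text):
    """Naive key: move each leading vowel after the consonant that follows it."""
    chars = list(text)
    i = 0
    while i < len(chars) - 1:
        if chars[i] in LEADING_VOWELS and is_thai_consonant(chars[i + 1]):
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # skip the pair that was just swapped
        else:
            i += 1
    return "".join(chars)


# Two sample strings: SARA E + KO KAI, and KHO KHAI + SARA AA.
words = ["\u0E40\u0E01", "\u0E02\u0E32"]

by_code_point = sorted(words)                    # plain code point order
by_consonant = sorted(words, key=thai_sort_key)  # consonant-first order

print(by_code_point)
print(by_consonant)
```

Plain code point order places the word beginning with the vowel sign last, because U+0E40 is greater than every consonant, whereas the consonant-first key places it first, because KO KAI precedes KHO KHAI.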
 
[[Indic script|Indic scripts]] such as [[Tamil script|Tamil]] and [[Telugu script|Telugu]] are each allocated only 128 code points in the Unicode space, matching the [[ISCII]] standard. Correct rendering of Unicode Indic text requires transforming the characters, which are stored in logical order, into visual order, and forming compound characters out of their components. Local scholars argue in favor of assigning Unicode code points to compound characters. This will most likely not happen, as can be seen from the case of the Tibetan script, where even the Chinese National Standard organization failed to achieve a similar change.
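
The difference between logical and visual order can be illustrated with a short Python sketch that uses only the standard <code>unicodedata</code> module. The block ranges and the sample syllable are chosen here for illustration and are not taken from the text above; the variable names are hypothetical.

```python
import unicodedata

# Each block spans 128 code points.
TAMIL_BLOCK = range(0x0B80, 0x0C00)
TELUGU_BLOCK = range(0x0C00, 0x0C80)

# The Tamil syllable "ke": the consonant KA is stored first, followed by
# VOWEL SIGN E, although a renderer draws the vowel sign to the left of KA.
syllable = "\u0B95\u0BC6"

stored = [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in syllable]

print(len(TAMIL_BLOCK), len(TELUGU_BLOCK))
for code_point, name in stored:
    print(code_point, name)
```

The stored sequence lists the consonant before the vowel sign, so producing the visual order is left to the rendering engine rather than to the encoding.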
 
Opponents of Unicode sometimes still claim, erroneously, that it cannot handle more than 65,536 characters, even though this limitation was removed in Unicode 2.0.
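
As a minimal sketch of how code points beyond the 16-bit range are represented, the following Python function computes the UTF-16 surrogate pair for a supplementary code point. The function name <code>to_surrogate_pair</code> is a hypothetical helper for this illustration, and the two sample code points are chosen here rather than taken from the text above.

```python
def to_surrogate_pair(code_point):
    """Split a code point in U+10000..U+10FFFF into two UTF-16 code units."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = code_point - 0x10000      # 20-bit value
    high = 0xD800 + (offset >> 10)     # upper 10 bits
    low = 0xDC00 + (offset & 0x3FF)    # lower 10 bits
    return high, low


# U+20000 is a Han character outside the 16-bit range;
# U+1D11E is MUSICAL SYMBOL G CLEF.
han_pair = to_surrogate_pair(0x20000)
clef_pair = to_surrogate_pair(0x1D11E)

print([hex(unit) for unit in han_pair])   # ['0xd840', '0xdc00']
print([hex(unit) for unit in clef_pair])  # ['0xd834', '0xdd1e']
```

The result agrees with Python's built-in codec: <code>"\U00020000".encode("utf-16-be")</code> yields the bytes D8 40 DC 00.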
 
== Trivia ==