Source: www.cs.cmu.edu
8.3 The Perso-Arabic Standard for Information Interchange

The standard proposed by C-DAC GIST is an extension to the standard 8-bit ASCII. It complements the symbol set of the Latin script by adding the symbols of the Perso-Arabic scripts. The standard supports storage for Perso-Arabic languages like Urdu, Persian, Sindhi, Kashmiri, and Arabic.

Characteristics
i. It is an 8-bit standard.
ii. Supports letters for Urdu, Arabic, Sindhi and Kashmiri.
iii. Defines Perso-Arabic alphabets in the upper ASCII range. (This leaves the lower ASCII range free; it can be used for English alphabets, e.g. to give bilingual font support.)
iv. Defines numerals other than the ASCII numerals (48 to 57). (This may help in supporting both Arabic numerals 0-9 and language-specific numerals.)
v. Maintains the order of alphabets for Perso-Arabic languages.
vi. Alphabets / letters are placed in ascending order. Letters like "bhey" are not provided for Urdu but are kept for languages like Sindhi. Urdu may use the digraph "be" and "choTi-he" for that.
vii. Minimal erabs are provided. Tanveen, for example do-zabar, can be formed with the help of double zabar.
viii. Unicode compatibility can be achieved by having a converter from PASCII to Unicode and vice versa.

Superscripts
i. Place for superscripts like khaRa-alif is provided.
ii. Place for superscripts for Arabic is provided.
iii. Place for superscripts like "re-ze", "ain", etc. is provided.
iv. Numerals are placed after erabs and superscripts. (This is provided only to support display of language-specific numerals; the standard numerals, i.e. the ASCII numerals, remain available.)

Standardization of Perso-Arabic Fonts

Characteristics of Perso-Arabic languages:
Perso-Arabic languages are written in the Naskh and Nastaliq scripts. Urdu and Kashmiri are traditionally written in the Nastaliq script, while Sindhi is written in the Naskh script. Although the script employs the basic letters of the language, the rendering of these letters in a word is extremely complex. The reason for this complexity is that the text has traditionally been composed through calligraphy, a medium whose precepts are based on the aesthetic sense of the calligrapher rather than on any formula. So great is the variation in calligraphy that it is often difficult to recognize the letters in a constituent word. This is because, in their calligraphed form, the individual letters are partially or completely fused into each other, thereby losing their identity. A degree of fusion is purposely introduced to make the resulting fused glyph visually appealing.

Another characteristic of the Perso-Arabic languages is the use of diacritics. Diacritics, although sparingly used, help in the proper pronunciation of the constituent word. The diacritics appear above or below a character to specify a vowel or emphasize a particular sound. These are essential for the removal of ambiguities, natural language processing and speech synthesis.

Standardization of Glyph Set
The following was taken into consideration while designing fonts for the Perso-Arabic languages. Considering the complexities of the script, it was not possible to accommodate all the glyphs / ligatures in an 8-bit code space. Hence a 16-bit font code space was considered.
1. Alphabet
2. Numerals
3. Special characters
4. Diacritics
5. Religious and linguistic symbols
6. Control characters

58 October 2002

The 16-bit Nastaliq font for Urdu & Kashmiri
Fonts developed by C-DAC for Urdu & Kashmiri are 16-bit. The glyphs are defined in the User Area of the Unicode range. The ASCII range is not used and can be used for different purposes (for example, to support English).
• Includes all the basic shapes
• Includes all the starting shapes and variations
• Includes all the middle shapes and variations
• Includes all the ending shapes and variations
• Includes levels for erabs (short vowels)
• Includes Complete ligatures
• Includes Beginning ligatures
• Includes Middle ligatures
• Includes Ending ligatures
• Includes dotted circle glyph

The 16-bit Naskh font for Sindhi, Urdu & Kashmiri
Fonts developed by C-DAC for Sindhi, Urdu & Kashmiri are 16-bit. The glyphs are defined in the User Area of the Unicode range. The ASCII range is not used and can be used for different purposes (for example, to support English).
• Includes all the basic shapes
• Includes all the starting shapes
• Includes all the middle shapes
• Includes all the ending shapes
• Includes levels for erabs (short vowels)
• Includes Complete ligatures
• Includes Beginning ligatures
• Includes Middle ligatures
• Includes Ending ligatures
• Includes dotted circle glyph

India
India is a paradise at the foot of the great Himalayas at its northern end and lies cocooned by huge oceans on the other three sides. While the Arabian Sea borders the southwest side, the southeast is lulled by the Bay of Bengal, and the southern tip, Kanya Kumari (Cape Comorin), is washed by the Indian Ocean. Protected by such natural barriers as mountains and water, it is separated from the rest of Asia. For geographers, it lies to the north of the equator between 8.4 and 37.6 degrees north latitude and 68.7 and 97.25 degrees east longitude. India measures 3214 km from north to south and 2933 km from east to west. It has a land frontier of 15,200 km and a coastline of 7516.5 km.
India shares its political borders with Pakistan and Afghanistan on the west; Bangladesh and Myanmar in the east; Nepal, China, Tibet and Bhutan in the north. The capital of India is New Delhi.

Languages
India has 18 officially recognized languages among about 200 languages as enumerated in the census.

Names of Languages
The following languages are listed in the 8th Schedule of the Constitution (given in Devanagari order):
• Assamese
• Urdu
• Oriya
• Kannada
• Kashmiri
• Konkani
• Gujarati
• Tamil
• Telugu
• Nepali
• Punjabi
• Bengali
• Manipuri
• Marathi
• Malayalam
• Sanskrit
• Sindhi
• Hindi

Urdu Design Guide : General Information

Introduction
This document provides general information about the Urdu language and some conventions of its usage in India.
The information presented in this document is intended to assist in understanding the nature and problems of Urdu implementation in the digital medium. It contains the generic description of Urdu.
Urdu is one of the official languages of India. It is the official language of Pakistan, and is spoken in various countries around the world.

Language Description
Urdu belongs to the Indo-Aryan subgroup of the Indo-European family of languages. It has developed under the heavy influence of the Arabic, Persian and Turkish languages. The Urdu writing system is a superset of Arabic and Persian and contains 39 characters. Urdu is written from right to left. Unlike English, the characters do not have upper and lower cases. Further, the shape assumed by a character in a word is context-sensitive, i.e. the shape differs depending on whether the character is at the beginning, in the middle or at the end of the constituent word.
Urdu is traditionally written in Nastaliq, a script rich in calligraphic content. Owing to the complexities of rendering, a number of alternate shapes are possible for a single letter, considering its position in the word and the letter next to it. This nature of Nastaliq increases the glyph set for the language.
The characters of Urdu also need diacritics to help in the proper pronunciation of the constituent word. There are a number of diacritics, the common ones being Zabar, Zer, and Pesh.

History of Urdu language
The word Urdu, derived from the Turkish language, means 'Lashkar' ('armies'). In the south of India it flourished under the name Dakhani and in the southwest as Gurjari, while in Delhi its name changed from Hindi to Hindavi and Hindustani. Alternate names of Urdu are DAKHINI (DAKANI, DECCAN, DESIA, MIRGAN), PINJARI, REKHTA (REKHTI).

Population using the Urdu Language
48,062,000 in India (1997 IMA);
10,719,000 in Pakistan (1993), or 7.57% of the population;
600,000 in Bangladesh;
64,000 in Mauritius (1993 Johnstone);
170,000 in South Africa (1987);
18,500 in Bahrain (1979 WA);
17,800 in Oman (1980 WA);
15,400 in Qatar;
382,000 in Saudi Arabia;
3,562 in Fiji (1980 WA);
23,000 in Germany;
14,000 in Norway;
Totals:
60,290,000 or more in all countries
104,000,000 including second language users (1999 WA).

PASCII (Perso-Arabic Standard for Information Interchange) Version 1.0
[Code chart. Columns: high nibble 8, 9, A, B, C, D, E, F (decimal 128, 144, 160, 176, 192, 208, 224, 240). Rows: low nibble 0 to F. The glyph cells did not survive text extraction. Legible cell labels: Kasheeda (row 1), Reserved (rows A, B, E and F), ATR (row F).]
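The 8-bit layout described in section 8.3 (lower ASCII left free for English, Perso-Arabic letters in the upper range, Unicode compatibility through a converter) can be sketched as a table-driven converter. This is a minimal sketch, not the C-DAC converter: the three byte-to-character pairs are hypothetical placeholders, since the actual assignments must be read from the PASCII 1.0 code chart, and the function names are illustrative only.

```python
# HYPOTHETICAL mapping: these byte values are placeholders for illustration,
# NOT assignments taken from the PASCII 1.0 code chart.
HYPOTHETICAL_PASCII_TO_UNICODE = {
    0x80: "\u0627",  # assumed slot for ARABIC LETTER ALEF
    0x81: "\u0628",  # assumed slot for ARABIC LETTER BEH
    0x82: "\u067E",  # assumed slot for ARABIC LETTER PEH
}
# Reverse table for the Unicode-to-PASCII direction.
UNICODE_TO_PASCII = {ch: b for b, ch in HYPOTHETICAL_PASCII_TO_UNICODE.items()}


def classify_byte(b: int) -> str:
    """Say which half of the 8-bit code space a byte value falls in."""
    if not 0 <= b <= 0xFF:
        raise ValueError("PASCII is an 8-bit code: expected 0..255")
    # Values 0..127 are plain ASCII; 128..255 are the Perso-Arabic range.
    return "ascii" if b < 0x80 else "perso-arabic"


def pascii_to_unicode(data: bytes) -> str:
    """Convert PASCII bytes to a Unicode string using the table above."""
    out = []
    for b in data:
        if b < 0x80:
            out.append(chr(b))  # lower ASCII passes through unchanged
        else:
            # Unmapped upper-range bytes become U+FFFD REPLACEMENT CHARACTER.
            out.append(HYPOTHETICAL_PASCII_TO_UNICODE.get(b, "\uFFFD"))
    return "".join(out)


def unicode_to_pascii(text: str) -> bytes:
    """Convert a Unicode string back to PASCII bytes; fail on unmapped input."""
    out = bytearray()
    for ch in text:
        if ord(ch) < 0x80:
            out.append(ord(ch))
        elif ch in UNICODE_TO_PASCII:
            out.append(UNICODE_TO_PASCII[ch])
        else:
            raise ValueError(f"no PASCII code for U+{ord(ch):04X}")
    return bytes(out)
```

Because ASCII bytes pass through untouched, a bilingual English and Urdu string round-trips through both functions as long as every Perso-Arabic character has an entry in the table.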
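The context-sensitive shaping described under Language Description (a different shape at the beginning, in the middle and at the end of a word) can be illustrated with a simplified position classifier. This is a sketch under stated assumptions: it looks only at position, it ignores the fact that some letters do not join to the following letter, it does not skip diacritics, and the "isolated" label for single-letter words is an addition not named in the text. A real Nastaliq renderer also considers the neighbouring letter, as noted above.

```python
from typing import List, Tuple


def positional_forms(word: str) -> List[Tuple[str, str]]:
    """Label each letter of a word with the shape class its position calls for.

    The string is in logical (typing) order, so index 0 is the letter that
    appears rightmost when the word is displayed right to left.
    """
    n = len(word)
    forms = []
    for i, ch in enumerate(word):
        if n == 1:
            form = "isolated"   # assumption: a lone letter takes its basic shape
        elif i == 0:
            form = "beginning"
        elif i == n - 1:
            form = "ending"
        else:
            form = "middle"
        forms.append((ch, form))
    return forms


# Usage: the four letters kaf, te, alef, be (logical order).
for letter, form in positional_forms("\u06A9\u062A\u0627\u0628"):
    print(f"U+{ord(letter):04X} {form}")
```

A font with separate starting, middle and ending glyphs per letter, as in the 16-bit fonts listed earlier, would use such a label to choose which glyph to draw.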
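The diacritics named above (Zabar, Zer, Pesh) are stored in Unicode text as separate combining marks after the base letter, which makes it easy to list or remove them, for example before a dictionary lookup. The sketch below uses Python's standard unicodedata module. The mapping of the Urdu names to the Unicode characters FATHA, KASRA and DAMMA is a gloss added here, not something the original text states, and the function names are illustrative.

```python
import unicodedata

ZABAR = "\u064E"  # Unicode name: ARABIC FATHA
ZER = "\u0650"    # Unicode name: ARABIC KASRA
PESH = "\u064F"   # Unicode name: ARABIC DAMMA


def diacritics_in(text: str) -> list:
    """Return the Unicode names of all nonspacing marks found in the text."""
    return [unicodedata.name(ch) for ch in text
            if unicodedata.category(ch) == "Mn"]


def strip_diacritics(text: str) -> str:
    """Drop every nonspacing mark (general category Mn), keeping base letters."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


# Usage: the letter be carrying a zabar, followed by be carrying a zer.
sample = "\u0628" + ZABAR + "\u0628" + ZER
print(diacritics_in(sample))
print(len(strip_diacritics(sample)))
```

Removing the marks discards pronunciation information, so for speech synthesis or disambiguation the marked text should be kept and only a stripped copy used for matching.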