8.3 The Perso-Arabic Standard for Information Interchange

The standard proposed by C-DAC GIST is an extension of the standard 8-bit ASCII. It complements the symbol set of the Latin script by adding the symbols of the Perso-Arabic scripts. The standard supports storage for Perso-Arabic languages such as Urdu, Persian, Sindhi, Kashmiri, and Arabic.

Characteristics

i.    It is an 8-bit standard.
ii.   It supports letters for Urdu, Arabic, Sindhi, and Kashmiri.
iii.  It defines the Perso-Arabic alphabets in the upper ASCII range (codes 128 to 255). This leaves the lower ASCII range free, so it can be used for English letters, e.g. to give bilingual font support.
iv.   It defines numerals other than the ASCII digits (codes 48 to 57). This may help in supporting both the Arabic numerals 0-9 and language-specific numerals.
v.    It maintains the alphabetical order of the Perso-Arabic languages.
vi.   Letters are placed in ascending order. Letters like "bhey" are not provided for Urdu but are kept for languages like Sindhi; Urdu can use the digraph "be" + "choTi-he" instead.
vii.  Minimal erabs are provided. Tanveen, for example do-zabar, can be formed by combining two zabars.
viii. Unicode compatibility can be achieved with a PASCII-to-Unicode and Unicode-to-PASCII converter.

Superscripts

i.    A place is provided for superscripts like khaRa-alif.
ii.   A place is provided for Arabic superscripts.
iii.  A place is provided for superscripts like "re-ze", "ain", etc.
iv.   Numerals are placed after the erabs and superscripts. (Language-specific numerals are provided only to support display; the standard ASCII numerals remain available.)

Standardization of Perso-Arabic Fonts

Characteristics of Perso-Arabic languages:

Perso-Arabic languages are written in the Naskh and Nastaliq scripts. Urdu and Kashmiri are traditionally written in Nastaliq, while Sindhi is written in Naskh. Although the script employs the basic letters of the language, rendering these letters within a word is extremely complex. The reason is that the text has traditionally been composed through calligraphy, a medium whose precepts rest on the aesthetic sense of the calligrapher rather than on any formula. So great is the variation in calligraphy that it is often difficult to recognize the letters in a word: in their calligraphed form, the individual letters are partially or completely fused into each other, thereby losing their identity. A degree of fusion is purposely introduced to make the resulting fused glyph visually appealing.

Another characteristic of the Perso-Arabic languages is the use of diacritics. Diacritics, although used sparingly, help in the proper pronunciation of a word. They appear above or below a character to specify a vowel or to emphasize a particular sound, and they are essential for removing ambiguities, for natural language processing, and for speech synthesis.

Standardization of Glyph Set

Considering the complexity of the script, it was not possible to accommodate all the glyphs and ligatures in an 8-bit code space; hence a 16-bit font code space was adopted. The following were taken into consideration while designing fonts for the Perso-Arabic languages:

  1. Alphabet
                               58                                                                                                                                         October 2002
  2. Numerals
  3. Special characters
  4. Diacritics
  5. Religious and linguistic symbols
  6. Control characters

The 16-bit Nastaliq font for Urdu & Kashmiri
The fonts developed by C-DAC for Urdu and Kashmiri are 16-bit. The glyphs are defined in the User Area (Private Use Area) of the Unicode range. The ASCII range is not used and remains free for other purposes (for example, to support English).

  • Includes all the basic shapes
  • Includes all the starting shapes and variations
  • Includes all the middle shapes and variations
  • Includes all the ending shapes and variations
  • Includes levels for erabs (short vowels)
  • Includes complete ligatures
  • Includes beginning ligatures
  • Includes middle ligatures
  • Includes ending ligatures
  • Includes the dotted circle glyph

The 16-bit Naskh font for Sindhi, Urdu & Kashmiri
The fonts developed by C-DAC for Sindhi, Urdu and Kashmiri are 16-bit. The glyphs are defined in the User Area (Private Use Area) of the Unicode range. The ASCII range is not used and remains free for other purposes (for example, to support English).

  • Includes all the basic shapes
  • Includes all the starting shapes
  • Includes all the middle shapes
  • Includes all the ending shapes
  • Includes levels for erabs (short vowels)
  • Includes complete ligatures
  • Includes beginning ligatures
  • Includes middle ligatures
  • Includes ending ligatures
  • Includes the dotted circle glyph

India
India is a paradise at the foot of the great Himalayas in the north and lies cocooned by vast oceans on the other three sides. The Arabian Sea borders the southwest, the southeast is lulled by the Bay of Bengal, and the southern tip, Kanya Kumari (Cape Comorin), is washed by the Indian Ocean. Protected by these natural barriers of mountains and water, India is set apart from the rest of Asia. Geographically, it lies north of the equator, between 8.4 and 37.6 degrees north latitude and between 68.7 and 97.25 degrees east longitude. India measures 3,214 km from north to south and 2,933 km from east to west. It has a land frontier of 15,200 km and a coastline of 7,516.5 km. India shares its political borders with Pakistan and Afghanistan on the west; Bangladesh and Myanmar on the east; and Nepal, China, Tibet and Bhutan on the north. The capital of India is New Delhi.

Languages
India has 18 officially recognized languages among the roughly 200 languages enumerated in the census.

Names of Languages
The following languages are listed in the Eighth Schedule of the Constitution (given in Devanagari order):
  • Assamese
  • Urdu
  • Oriya
  • Kannada
  • Kashmiri
  • Konkani
  • Gujarati
  • Tamil
  • Telugu
  • Nepali
  • Punjabi
  • Bengali
  • Manipuri
  • Marathi
  • Malayalam
  • Sanskrit
  • Sindhi
  • Hindi

Urdu Design Guide: General Information

Introduction
This document provides general information about the Urdu language and some conventions of its usage in India. The information presented here is intended to help in understanding the nature and problems of implementing Urdu in the digital medium. It contains a general description of Urdu.
Urdu is one of the official languages of India. It is the official language of Pakistan and is spoken in various countries around the world.

Language Description
Urdu belongs to the Indo-Aryan subgroup of the Indo-European family of languages. It developed under the heavy influence of Arabic, Persian and Turkish. The Urdu writing system is a superset of the Arabic and Persian ones and contains 39 characters. Urdu is written from right to left. Unlike English, its characters do not have upper and lower case. Further, the shape a character assumes in a word is context-sensitive, i.e. it differs depending on whether the character is at the beginning, in the middle or at the end of the word.
Urdu is traditionally written in Nastaliq, a script rich in calligraphic content. Owing to the complexities of rendering, a number of alternate shapes are possible for a single letter, depending on its position in the word and the letter next to it. This property of Nastaliq enlarges the glyph set for the language.
The characters of Urdu also need diacritics to help in the proper pronunciation of a word. There are a number of diacritics, the common ones being Zabar, Zer, and Pesh.

History of Urdu language
The word Urdu, derived from the Turkish language, means 'Lashkar' ('armies'). In the south of India it flourished under the name Dakhani and in the southwest as Gurjari, while in Delhi its name changed from Hindi to Hindavi and Hindustani.
Alternate names of Urdu are DAKHINI (DAKANI, DECCAN, DESIA, MIRGAN), PINJARI, and REKHTA (REKHTI).

Population using the Urdu Language
48,062,000 in India (1997 IMA);
10,719,000 in Pakistan (1993), or 7.57% of the population;
600,000 in Bangladesh;
64,000 in Mauritius (1993 Johnstone);
170,000 in South Africa (1987);
18,500 in Bahrain (1979 WA);
17,800 in Oman (1980 WA);
15,400 in Qatar;
382,000 in Saudi Arabia;
3,562 in Fiji (1980 WA);
23,000 in Germany;
14,000 in Norway.
Totals:
60,290,000 or more in all countries;
104,000,000 including second-language users (1999 WA).
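The context-sensitive shaping described under Language Description can be sketched in Python. This is a simplified model, assuming a small, hand-picked subset of Urdu letters grouped by their Unicode joining behaviour: dual-joining letters connect on both sides, while right-joining letters such as alif, dal, re and wao never connect to the letter that follows them. The two sets, the function name `positional_forms` and the four form labels are illustrative assumptions, not part of the C-DAC fonts; a real renderer uses the complete Unicode joining data, and a Nastaliq font adds many more contextual variants on top of these four basic forms.

```python
from typing import List

# Illustrative subset of Urdu letters, grouped by joining behaviour.
# Dual-joining letters connect to both the preceding and the following letter.
DUAL_JOINING = {
    "\u0628",  # be
    "\u067E",  # pe
    "\u062A",  # te
    "\u0679",  # Te (retroflex)
    "\u062C",  # jim
    "\u0686",  # che
    "\u0633",  # sin
    "\u0644",  # lam
    "\u0645",  # mim
    "\u0646",  # nun
    "\u06A9",  # kaf
    "\u06AF",  # gaf
    "\u06C1",  # choTi-he
    "\u06CC",  # choTi-ye
}
# Right-joining letters connect only to the preceding letter.
RIGHT_JOINING = {
    "\u0627",  # alif
    "\u062F",  # dal
    "\u0688",  # Dal (retroflex)
    "\u0631",  # re
    "\u0691",  # Re (retroflex)
    "\u0632",  # ze
    "\u0648",  # wao
    "\u06D2",  # baRi-ye
}


def links_forward(ch: str) -> bool:
    """True if ch can connect to the letter that follows it."""
    return ch in DUAL_JOINING


def links_backward(ch: str) -> bool:
    """True if ch can connect to the letter that precedes it."""
    return ch in DUAL_JOINING or ch in RIGHT_JOINING


def positional_forms(word: str) -> List[str]:
    """Return 'isolated', 'initial', 'medial' or 'final' for each character,
    in logical (reading) order. Characters outside both sets are treated as
    non-joining and always come out 'isolated'."""
    forms = []
    for i, ch in enumerate(word):
        joined_before = i > 0 and links_backward(ch) and links_forward(word[i - 1])
        joined_after = (
            i + 1 < len(word) and links_forward(ch) and links_backward(word[i + 1])
        )
        if joined_before and joined_after:
            forms.append("medial")
        elif joined_after:
            forms.append("initial")
        elif joined_before:
            forms.append("final")
        else:
            forms.append("isolated")
    return forms


# kitab ("book"): kaf, te, alif, be
print(positional_forms("\u06A9\u062A\u0627\u0628"))
# ['initial', 'medial', 'final', 'isolated']
```

In کتاب the final be comes out isolated because alif, being right-joining, does not connect forward. A font such as the 16-bit fonts described earlier would then map each (letter, form) pair to its own glyph, which is one reason the glyph set grows well beyond the 39 characters of the alphabet.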
PASCII (Perso-Arabic Standard for Information Interchange) Version 1.0

[Code chart: upper half of the 8-bit code space, columns 8 to F (decimal 128, 144, 160, 176, 192, 208, 224, 240) by rows 0 to F. The character glyphs in the chart did not survive text extraction; the legible entries are Kasheeda at 0x81 and several positions in the last column (F) marked Reserved.]
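The split described in the PASCII characteristics (ASCII in the lower half, Perso-Arabic in the upper half) and the PASCII-to-Unicode converter they mention can be sketched as a byte-level codec in Python. This is a minimal sketch, not the C-DAC converter: the function names are hypothetical, and the mapping table is deliberately incomplete because the chart's glyphs did not survive extraction. Only Kasheeda is filled in, at 0x81 as read from the chart, and mapping it to U+0640 ARABIC TATWEEL is an assumption about the intended Unicode equivalent; every other upper-half entry would have to be copied from the PASCII 1.0 chart.

```python
# Hypothetical, deliberately partial PASCII -> Unicode table for the upper
# half (0x80-0xFF). Only Kasheeda is filled in; the remaining entries must be
# copied from the PASCII 1.0 code chart. Reserved positions stay unmapped.
PASCII_TO_UNICODE = {
    0x81: "\u0640",  # Kasheeda -> U+0640 ARABIC TATWEEL (assumed equivalent)
}
UNICODE_TO_PASCII = {ch: code for code, ch in PASCII_TO_UNICODE.items()}


def pascii_to_unicode(data: bytes, errors: str = "strict") -> str:
    """Decode PASCII bytes. The lower half (0x00-0x7F) is plain ASCII."""
    chars = []
    for byte in data:
        if byte < 0x80:
            chars.append(chr(byte))  # ASCII passes through unchanged
        elif byte in PASCII_TO_UNICODE:
            chars.append(PASCII_TO_UNICODE[byte])
        elif errors == "replace":
            chars.append("\uFFFD")  # U+FFFD REPLACEMENT CHARACTER
        else:
            raise ValueError(f"unmapped PASCII byte 0x{byte:02X}")
    return "".join(chars)


def unicode_to_pascii(text: str) -> bytes:
    """Encode text to PASCII; fails on characters with no PASCII code."""
    out = bytearray()
    for ch in text:
        if ord(ch) < 0x80:
            out.append(ord(ch))  # ASCII stays in the lower half
        elif ch in UNICODE_TO_PASCII:
            out.append(UNICODE_TO_PASCII[ch])
        else:
            raise ValueError(f"no PASCII code for U+{ord(ch):04X}")
    return bytes(out)


# Mixed English/Perso-Arabic round trip using the one known entry.
encoded = unicode_to_pascii("Urdu\u0640")
print(encoded)  # b'Urdu\x81'
print(pascii_to_unicode(encoded) == "Urdu\u0640")  # True
```

Keeping the lower half identical to ASCII is what lets a single PASCII byte stream carry English and Perso-Arabic text side by side, which is the bilingual use the characteristics list describes.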