Transliterator
Transliterator converts a string between Latin and other scripts. For example:
| Source | Transliteration |
|---|---|
| | kyanpasu |
| Αλφαβητικός Κατάλογος | |
| биологическом | biologichyeskom |
It is important to note that transliteration is not translation. Rather, transliteration is the conversion of letters from one script to another without translating the underlying words.
Note: Standard transliteration methods often do not follow the pronunciation rules of any particular language in the target script.
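To make the letter-by-letter idea concrete, here is a minimal Python sketch that reproduces the Cyrillic example above with a small hand-written table. The names `CYRILLIC_TO_LATIN` and `transliterate` are hypothetical, and the table covers only the letters needed for that one word; it is an illustration, not the Transliterator stage's actual rule set.

```python
# Illustrative only: a tiny hand-written mapping, not the stage's real rules.
CYRILLIC_TO_LATIN = {
    "б": "b", "г": "g", "е": "ye", "и": "i", "к": "k",
    "л": "l", "м": "m", "о": "o", "с": "s", "ч": "ch",
}


def transliterate(text: str) -> str:
    """Replace each mapped letter; leave unmapped characters unchanged."""
    return "".join(CYRILLIC_TO_LATIN.get(ch, ch) for ch in text)


print(transliterate("биологическом"))  # biologichyeskom
```

Note that the function never consults a dictionary of words: it converts letters only, which is why the result is a respelling of the Russian word rather than an English translation.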
The Transliterator stage supports the following scripts. In general, the Transliterator stage follows the UNGEGN Working Group on Romanization Systems guidelines. For more information, see www.eki.ee/wgrs.
- Arabic
- The script used by several Asian and African languages, including Arabic, Persian, and Urdu.
- Cyrillic
- The script used by Eastern European and Asian languages, including Slavic languages such as Russian. The Transliterator stage generally follows ISO 9 for the base Cyrillic set.
- Devanagari
- The script used by several Indian languages, including Hindi and Sanskrit. This script is a descendant of the Brahmi script, one of the oldest writing systems of ancient India and the ancestor of many scripts used in South and Central Asia today.
- Greek
- The script used by the Greek language, which belongs to the Hellenic branch of the Indo-European language family.
- Gujarati
- The script used for Gujarati, the language of the state of Gujarat in western India. It is one of the modern scripts of India and was adapted from the Devanagari script.
- Gurmukhi
- The script used by the Indian language Punjabi. This script was considerably influenced by the Nagari script, an earlier form of the Devanagari script.
- Hangul
- The script used by the Korean language. The Transliterator stage follows the Korean Ministry of Culture and Tourism Transliteration regulations. For more information, see the website of The National Institute of the Korean Language.
- Han
- The script used to write the Chinese language, which belongs to the Sino-Tibetan language family.
- Traditional/Simplified Chinese
- The Transliterator stage supports both Traditional and Simplified Chinese.
- Kannada
- The script used by several South Indian languages, such as Kannada and Konkani. This script is a descendant of the Brahmi script of ancient India.
- Katakana and Hiragana
- Two of the scripts that can be used to write Japanese. The Transliterator stage uses a slight variant of the Hepburn system. In the Hepburn system, both ZI (ジ) and DI (ヂ) are represented by "ji", and both ZU (ズ) and DU (ヅ) are represented by "zu". The variant amends this slightly for reversibility by using "dji" for DI and "dzu" for DU, so the Katakana-Latin transliteration is reversible. Hiragana-Katakana transliteration is not completely reversible because several Katakana letters have no Hiragana equivalents and because the length mark is not used with Hiragana. The Hiragana-Latin transliteration is also not reversible because internally it is a combination of Katakana-Hiragana and Katakana-Latin.
- Half width/Full width
- The Transliterator stage can convert between narrow half-width characters and wider full-width characters.
- Latin
- The script used by most languages of Europe, such as English. It was originally used by the ancient Romans to write the Latin language.
- Malayalam
- The script used by the Malayalam language, the official language of the Indian state of Kerala. Malayalam was first written with the Vatteluttu alphabet, whose name means 'round writing' and which developed from the Brahmi script of ancient India.
- Oriya
- The script used by the Oriya language, the official language of the Indian state of Odisha. The Oriya script was developed from the Kalinga script, one of the many descendants of the Brahmi script of ancient India.
- Tamil
- The script used by the Tamil language, which is spoken in several states of India and in Sri Lanka and Malaysia. Tamil was originally written with a version of the Brahmi script known as Tamil Brahmi.
- Telugu
- The script used by several languages of South India. This script is a descendant of the Brahmi script of ancient India.
- Thai
- The script used by the Thai language. This script was influenced by the Brahmi script of ancient India and by the Khmer alphabet.
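The reversibility point in the Katakana and Hiragana entry can be sketched in Python. The two tables and the function names below are hypothetical and cover only the four letters in question, assumed here to be ジ for ZI, ヂ for DI, ズ for ZU, and ヅ for DU. The sketch shows why distinct Latin spellings are needed for a round trip; it is not the stage's implementation.

```python
# Plain Hepburn: two Katakana letters share each Latin spelling.
HEPBURN = {"ジ": "ji", "ヂ": "ji", "ズ": "zu", "ヅ": "zu"}

# Amended variant from the entry above: DI and DU get distinct spellings.
AMENDED = {"ジ": "ji", "ヂ": "dji", "ズ": "zu", "ヅ": "dzu"}


def is_reversible(table: dict[str, str]) -> bool:
    """A mapping can be inverted only if no two keys share a value."""
    return len(set(table.values())) == len(table)


def invert(table: dict[str, str]) -> dict[str, str]:
    """Build the Latin-to-Katakana table; requires a one-to-one mapping."""
    if not is_reversible(table):
        raise ValueError("mapping is not one-to-one")
    return {latin: kana for kana, latin in table.items()}


print(is_reversible(HEPBURN))  # False
print(is_reversible(AMENDED))  # True
print(invert(AMENDED)["dji"])  # ヂ
```

With the plain Hepburn table, "ji" could stand for either of two letters, so the original text cannot be recovered from the Latin output alone.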
Transliterator is a part of Data Normalization. For a listing of other stages, see Spectrum Data Normalization.
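As a rough illustration of the half-width/full-width conversion listed among the supported scripts, the Python sketch below converts printable ASCII characters to and from their full-width Unicode forms. It relies on the full-width forms U+FF01 through U+FF5E sitting at a fixed offset of 0xFEE0 from ASCII U+0021 through U+007E, with the space mapped to the ideographic space U+3000. The function names are hypothetical, and the sketch does not cover half-width Katakana or any other characters the stage may handle.

```python
OFFSET = 0xFEE0  # distance between ASCII and full-width code points


def to_fullwidth(text: str) -> str:
    """Widen printable ASCII; leave every other character unchanged."""
    out = []
    for ch in text:
        code = ord(ch)
        if ch == " ":
            out.append("\u3000")  # ideographic (full-width) space
        elif 0x21 <= code <= 0x7E:
            out.append(chr(code + OFFSET))
        else:
            out.append(ch)
    return "".join(out)


def to_halfwidth(text: str) -> str:
    """Narrow full-width ASCII forms; leave every other character unchanged."""
    out = []
    for ch in text:
        code = ord(ch)
        if ch == "\u3000":
            out.append(" ")
        elif 0xFF01 <= code <= 0xFF5E:
            out.append(chr(code - OFFSET))
        else:
            out.append(ch)
    return "".join(out)


print(to_fullwidth("ABC 123"))                # ＡＢＣ　１２３
print(to_halfwidth(to_fullwidth("ABC 123")))  # ABC 123
```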