indic_transliteration.detect¶
Example usage:
from indic_transliteration import detect
detect.detect('pitRRIn') == detect.Scheme.ITRANS
detect.detect('pitRRn') == detect.Scheme.HK
When handling a Sanskrit string, it’s almost always best to explicitly
state its transliteration scheme. This avoids embarrassing errors with
words like pitRRIn, which is a valid string in more than one scheme. But
most of the time, it’s possible to infer the scheme from the text itself.
detect.py automatically detects a string’s transliteration scheme:
detect('pitRRIn') == Scheme.ITRANS
detect('pitRRn') == Scheme.HK
detect('pitFn') == Scheme.SLP1
detect('पितॄन्') == Scheme.Devanagari
detect('পিতৄন্') == Scheme.Bengali
Supported schemes¶
All schemes are attributes on the Scheme class. You can also just
use the scheme name:
Scheme.IAST == 'IAST'
Scheme.Devanagari == 'Devanagari'
Scripts:
- Bengali ('Bengali')
- Devanagari ('Devanagari')
- Gujarati ('Gujarati')
- Gurmukhi ('Gurmukhi')
- Kannada ('Kannada')
- Malayalam ('Malayalam')
- Oriya ('Oriya')
- Tamil ('Tamil')
- Telugu ('Telugu')
Romanizations:
- Harvard-Kyoto ('HK')
- IAST ('IAST')
- ITRANS ('ITRANS')
- Kolkata ('Kolkata')
- SLP1 ('SLP1')
- Velthuis ('Velthuis')
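The equality between an attribute and its name, as in Scheme.IAST == 'IAST', can be pictured with a small sketch. This is an assumption about how such a holder of constants could be built, not a confirmed copy of the library's code; the names SCHEME_NAMES and SchemeSketch are hypothetical.

```python
from collections import namedtuple

# Hypothetical list of names, copied from the two lists above.
SCHEME_NAMES = [
    'Bengali', 'Devanagari', 'Gujarati', 'Gurmukhi', 'Kannada',
    'Malayalam', 'Oriya', 'Tamil', 'Telugu',
    'HK', 'IAST', 'ITRANS', 'Kolkata', 'SLP1', 'Velthuis',
]

# A namedtuple instance whose field values equal its field names, so each
# attribute compares equal to the plain string naming the scheme.
SchemeSketch = namedtuple('SchemeSketch', SCHEME_NAMES)(*SCHEME_NAMES)

print(SchemeSketch.IAST == 'IAST')
print(SchemeSketch.Devanagari == 'Devanagari')
```

With this shape, code that receives a scheme can accept either the attribute or the bare string without converting between them.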
indic_transliteration.detect.BLOCKS = [('Malayalam', 3328), ('Kannada', 3200), ('Telugu', 3072), ('Tamil', 2944), ('Oriya', 2816), ('Gujarati', 2688), ('Gurmukhi', 2560), ('Bengali', 2432), ('Devanagari', 2304)]¶
Brahmic scripts paired with the first code point of their Unicode block, sorted from highest to lowest. Schemes with no code point defined are ignored.

indic_transliteration.detect.BRAHMIC_FIRST_CODE_POINT = 2304¶
Start of the Devanagari block (U+0900).

indic_transliteration.detect.BRAHMIC_LAST_CODE_POINT = 3455¶
End of the Malayalam block (U+0D7F).
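One way these three constants can work together is sketched below: because BLOCKS is ordered from the highest starting code point to the lowest, the first entry whose start does not exceed a character's code point names its script. The helper script_of is hypothetical and only illustrates the lookup; it is not presented as the library's own function.

```python
# Values copied from the module data documented above.
BLOCKS = [
    ('Malayalam', 3328), ('Kannada', 3200), ('Telugu', 3072),
    ('Tamil', 2944), ('Oriya', 2816), ('Gujarati', 2688),
    ('Gurmukhi', 2560), ('Bengali', 2432), ('Devanagari', 2304),
]
BRAHMIC_FIRST_CODE_POINT = 2304
BRAHMIC_LAST_CODE_POINT = 3455


def script_of(char):
    """Return the script name for a single character, or None.

    Hypothetical helper: characters outside the Brahmic range yield None.
    """
    code = ord(char)
    if not BRAHMIC_FIRST_CODE_POINT <= code <= BRAHMIC_LAST_CODE_POINT:
        return None
    # BLOCKS is sorted in descending order, so the first block that starts
    # at or below the code point is the block containing it.
    for name, start in BLOCKS:
        if code >= start:
            return name
    return None


print(script_of('प'))  # Devanagari
print(script_of('a'))  # None
```

The descending order matters here: with an ascending list, every Brahmic character would match Devanagari first.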
class indic_transliteration.detect.Regex¶
- IAST_OR_KOLKATA_ONLY = re.compile('[āīūṛṝḷḹēōṃḥṅñṭḍṇśṣḻ]')¶
  Match on special Roman characters.
- ITRANS_ONLY = re.compile('ee|oo|\\^[iI]|RR[iI]|L[iI]|~N|N\\^|Ch|chh|JN|sh|Sh|\\.a')¶
  Match on ITRANS-only.
- ITRANS_OR_VELTHUIS_ONLY = re.compile('aa|ii|uu|~n')¶
  Match on chars shared by ITRANS and Velthuis.
- KOLKATA_ONLY = re.compile('[ēō]')¶
  Match on Kolkata-specific Roman characters.
- SLP1_ONLY = re.compile('[fFxXEOCYwWqQPB]|kz|Nk|Ng|tT|dD|Sc|Sn|[aAiIuUfFxXeEoO]R|G[yr]|(\\W|^)G')¶
  Match on SLP1-only characters and bigrams.
- VELTHUIS_ONLY = re.compile('\\.[mhnrltds]|"n|~s')¶
  Match on Velthuis-only characters.
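The patterns above suggest a cascade for romanized text: test for the most distinctive markers first and fall back to a default when nothing matches. The sketch below is one plausible arrangement, not a confirmed copy of the library's logic. The function name guess_romanization, the order of the checks, the ITRANS result for the shared ITRANS/Velthuis pattern, and the HK default are all assumptions, chosen so that the examples on this page come out as documented.

```python
import re

# Patterns copied from the Regex class documented above.
IAST_OR_KOLKATA_ONLY = re.compile('[āīūṛṝḷḹēōṃḥṅñṭḍṇśṣḻ]')
KOLKATA_ONLY = re.compile('[ēō]')
ITRANS_ONLY = re.compile(
    r'ee|oo|\^[iI]|RR[iI]|L[iI]|~N|N\^|Ch|chh|JN|sh|Sh|\.a')
ITRANS_OR_VELTHUIS_ONLY = re.compile('aa|ii|uu|~n')
SLP1_ONLY = re.compile(
    r'[fFxXEOCYwWqQPB]|kz|Nk|Ng|tT|dD|Sc|Sn|[aAiIuUfFxXeEoO]R|G[yr]|(\W|^)G')
VELTHUIS_ONLY = re.compile(r'\.[mhnrltds]|"n|~s')


def guess_romanization(text):
    """Guess a romanization scheme name for `text` (hypothetical helper)."""
    # Diacritics settle the question between IAST and Kolkata.
    if IAST_OR_KOLKATA_ONLY.search(text):
        return 'Kolkata' if KOLKATA_ONLY.search(text) else 'IAST'
    if ITRANS_ONLY.search(text):
        return 'ITRANS'
    if SLP1_ONLY.search(text):
        return 'SLP1'
    if VELTHUIS_ONLY.search(text):
        return 'Velthuis'
    # Assumption: shared ITRANS/Velthuis markers resolve to ITRANS.
    if ITRANS_OR_VELTHUIS_ONLY.search(text):
        return 'ITRANS'
    # Assumption: Harvard-Kyoto is the default when no marker is found.
    return 'HK'


for word in ('pitRRIn', 'pitRRn', 'pitFn'):
    print(word, guess_romanization(word))
```

Running the loop prints ITRANS, HK, and SLP1 for the three words, matching the examples at the top of this page. Note that 'pitRRn' is labeled HK only because no pattern fires, which is why stating the scheme explicitly remains the safer choice.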
indic_transliteration.detect.Scheme¶
Enum for Sanskrit schemes. Alias of indic_transliteration.detect.Enum.