indic_transliteration.detect
Example usage:
from indic_transliteration import detect
detect.detect('pitRRIn') == detect.Scheme.ITRANS
detect.detect('pitRRn') == detect.Scheme.HK
When handling a Sanskrit string, it’s almost always best to explicitly
state its transliteration scheme. This avoids embarrassing errors with
words like pitRRIn, which reads differently under ITRANS and
Harvard-Kyoto. But most of the time, it’s possible to infer the
encoding from the text itself.
detect.py automatically detects a string’s transliteration scheme:
detect('pitRRIn') == Scheme.ITRANS
detect('pitRRn') == Scheme.HK
detect('pitFn') == Scheme.SLP1
detect('पितॄन्') == Scheme.Devanagari
detect('পিতৄন্') == Scheme.Bengali
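To make the pitRRIn ambiguity concrete, here is a minimal sketch that reads the same string under two schemes. The tables HK_TO_IAST and ITRANS_TO_IAST and the helper to_iast are hypothetical, hand-written for this illustration; they cover only the tokens needed here and are not the library's data or API.

```python
# Tiny fragments of two romanization tables (illustrative assumptions,
# not the library's data). Keys are scheme tokens, values are IAST.
HK_TO_IAST = {'RR': 'ṝ', 'R': 'ṛ', 'I': 'ī'}
ITRANS_TO_IAST = {'RRI': 'ṝ', 'RRi': 'ṛ', 'I': 'ī'}


def to_iast(text, table):
    """Hypothetical helper: greedy longest-match conversion to IAST."""
    tokens = sorted(table, key=len, reverse=True)  # try longest tokens first
    out, i = [], 0
    while i < len(text):
        for token in tokens:
            if text.startswith(token, i):
                out.append(table[token])
                i += len(token)
                break
        else:
            out.append(text[i])  # no token matched: pass the character through
            i += 1
    return ''.join(out)


print(to_iast('pitRRIn', ITRANS_TO_IAST))  # pitṝn
print(to_iast('pitRRIn', HK_TO_IAST))      # pitṝīn
```

Read as ITRANS, RRI is a single long vocalic r; read as Harvard-Kyoto, the same letters split into RR followed by I, which adds a spurious long i. Stating the scheme explicitly removes that risk.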
Supported schemes
All schemes are attributes on the Scheme class. You can also just
use the scheme name:
Scheme.IAST == 'IAST'
Scheme.Devanagari == 'Devanagari'
Scripts:
- Bengali ('Bengali')
- Devanagari ('Devanagari')
- Gujarati ('Gujarati')
- Gurmukhi ('Gurmukhi')
- Kannada ('Kannada')
- Malayalam ('Malayalam')
- Oriya ('Oriya')
- Tamil ('Tamil')
- Telugu ('Telugu')
Romanizations:
- Harvard-Kyoto ('HK')
- IAST ('IAST')
- ITRANS ('ITRANS')
- Kolkata ('Kolkata')
- SLP1 ('SLP1')
- Velthuis ('Velthuis')
indic_transliteration.detect.BLOCKS = [('Malayalam', 3328), ('Kannada', 3200), ('Telugu', 3072), ('Tamil', 2944), ('Oriya', 2816), ('Gujarati', 2688), ('Gurmukhi', 2560), ('Bengali', 2432), ('Devanagari', 2304)]
    Schemes paired with the first code point of their Unicode block,
    sorted from highest to lowest. Schemes with no code point defined
    are ignored.
indic_transliteration.detect.BRAHMIC_FIRST_CODE_POINT = 2304
    Start of the Devanagari block.
indic_transliteration.detect.BRAHMIC_LAST_CODE_POINT = 3455
    End of the Malayalam block.
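As an illustration of how these constants could be used together, the sketch below maps a single character to a script name. The values are copied from the entries above; the helper script_of is hypothetical and is not part of the module, and the lookup strategy is an assumption about how a descending table like BLOCKS might be consumed.

```python
# Values copied from the documented constants.
BLOCKS = [
    ('Malayalam', 3328), ('Kannada', 3200), ('Telugu', 3072),
    ('Tamil', 2944), ('Oriya', 2816), ('Gujarati', 2688),
    ('Gurmukhi', 2560), ('Bengali', 2432), ('Devanagari', 2304),
]
BRAHMIC_FIRST_CODE_POINT = 2304  # 0x0900
BRAHMIC_LAST_CODE_POINT = 3455   # 0x0D7F


def script_of(char):
    """Hypothetical helper: return the script name for one character, or None."""
    code = ord(char)
    if not BRAHMIC_FIRST_CODE_POINT <= code <= BRAHMIC_LAST_CODE_POINT:
        return None
    # BLOCKS is sorted from highest to lowest start, so the first block
    # whose start is <= code is the block that contains the character.
    for name, start in BLOCKS:
        if code >= start:
            return name
    return None


print(script_of('प'))  # Devanagari
print(script_of('a'))  # None
```

Because the table is in descending order, a single linear scan is enough and no block end points need to be stored.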
class indic_transliteration.detect.Regex
    IAST_OR_KOLKATA_ONLY = re.compile('[āīūṛṝḷḹēōṃḥṅñṭḍṇśṣḻ]')
        Match on special Roman characters.
    ITRANS_ONLY = re.compile('ee|oo|\\^[iI]|RR[iI]|L[iI]|~N|N\\^|Ch|chh|JN|sh|Sh|\\.a')
        Match on ITRANS-only sequences.
    ITRANS_OR_VELTHUIS_ONLY = re.compile('aa|ii|uu|~n')
        Match on chars shared by ITRANS and Velthuis.
    KOLKATA_ONLY = re.compile('[ēō]')
        Match on Kolkata-specific Roman characters.
    SLP1_ONLY = re.compile('[fFxXEOCYwWqQPB]|kz|Nk|Ng|tT|dD|Sc|Sn|[aAiIuUfFxXeEoO]R|G[yr]|(\\W|^)G')
        Match on SLP1-only characters and bigrams.
    VELTHUIS_ONLY = re.compile('\\.[mhnrltds]|"n|~s')
        Match on Velthuis-only characters.
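The patterns above can be combined into a simple classifier for romanized text. The following is a minimal sketch, not the module's implementation: the patterns are copied from the Regex class, but the helper guess_romanization, the order of the checks, and the Harvard-Kyoto fallback are assumptions made for this example.

```python
import re

# Patterns copied from the Regex class documented above.
IAST_OR_KOLKATA_ONLY = re.compile('[āīūṛṝḷḹēōṃḥṅñṭḍṇśṣḻ]')
KOLKATA_ONLY = re.compile('[ēō]')
ITRANS_ONLY = re.compile(
    r'ee|oo|\^[iI]|RR[iI]|L[iI]|~N|N\^|Ch|chh|JN|sh|Sh|\.a')
ITRANS_OR_VELTHUIS_ONLY = re.compile('aa|ii|uu|~n')
SLP1_ONLY = re.compile(
    r'[fFxXEOCYwWqQPB]|kz|Nk|Ng|tT|dD|Sc|Sn|[aAiIuUfFxXeEoO]R|G[yr]|(\W|^)G')
VELTHUIS_ONLY = re.compile(r'\.[mhnrltds]|"n|~s')


def guess_romanization(text):
    """Hypothetical helper: guess a romanization name for `text`.

    The order of the checks is an assumption, not taken from the library.
    """
    if IAST_OR_KOLKATA_ONLY.search(text):
        return 'Kolkata' if KOLKATA_ONLY.search(text) else 'IAST'
    if ITRANS_ONLY.search(text):
        return 'ITRANS'
    if SLP1_ONLY.search(text):
        return 'SLP1'
    if VELTHUIS_ONLY.search(text):
        return 'Velthuis'
    if ITRANS_OR_VELTHUIS_ONLY.search(text):
        return 'ITRANS'
    return 'HK'  # assumed fallback when no scheme-specific marker is found


print(guess_romanization('pitRRIn'))  # ITRANS
print(guess_romanization('pitRRn'))   # HK
print(guess_romanization('pitFn'))    # SLP1
```

Checking the most distinctive markers first (diacritics, then scheme-only sequences) keeps a shared sequence such as aa from deciding the result before a more specific marker has had a chance to match.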
indic_transliteration.detect.Scheme
    Enum for Sanskrit schemes.
    alias of indic_transliteration.detect.Enum