The task was complex. “When we looked at Hindi, there were multiple challenges. The first thing is the unavailability of data. Then, unlike most other global languages, neither the spelling nor the pronunciation is standardised in Hindi,” said Natarajan.
Take the word for love – pyaar. While it is spelt pyar in the movie Dil Vil Pyar Vyar, it is spelt pyaar in Pyaar Ka Punchnama. This means that when someone runs a search, the search engine struggles to connect the query with the content, because there is no standardised representation. Similarly, in the Hindi script, Devanagari, the same word can be spelt differently in different places, leading to huge variability. For example, while some people write dariya (river), others spell it daria. Then come local dialects and accent variation, which make things even more fiendishly difficult.
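To make the search problem concrete, here is a minimal sketch of one way such variants could be matched: reduce each word to a rough "spelling key" before comparing. The function names and the two rules are illustrative assumptions, not the method described by Natarajan, and a real system would need far more than this (collapsing doubled letters, for instance, can merge words that are genuinely different).

```python
import re


def spelling_key(word: str) -> str:
    """Reduce a romanised Hindi word to a rough key (illustrative rules only)."""
    key = word.lower()
    # Assumption: doubled letters are treated as single, so "pyaar" -> "pyar".
    key = re.sub(r"(.)\1+", r"\1", key)
    # Assumption: a "y" between "i" and a vowel is optional, so "dariya" -> "daria".
    key = re.sub(r"iy(?=[aeiou])", "i", key)
    return key


def matching_titles(query: str, titles: list[str]) -> list[str]:
    """Return the titles containing a word whose key equals the query's key."""
    query_key = spelling_key(query)
    return [
        title
        for title in titles
        if query_key in {spelling_key(word) for word in title.split()}
    ]


titles = ["Dil Vil Pyar Vyar", "Pyaar Ka Punchnama"]
print(matching_titles("pyaar", titles))
```

With these two rules, a search for either pyar or pyaar finds both films, and dariya and daria reduce to the same key.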