DEVANAGARI DIGITIZATION: A MACHINE LEARNING PARADIGM FOR ACCURATE WORD CONVERSION

Shilpa Tyagi, Chiranjit Dutta, Manu Singh

Abstract

Digitizing handwritten or printed Devanagari documents and converting them into an editable Word format has become increasingly important in the digital age. This article presents an approach that uses machine learning techniques to perform this conversion accurately. The proposed system is designed to handle Devanagari text from a wide range of sources and formats. It combines Optical Character Recognition (OCR) with machine learning algorithms to recognize the characters, words, and structure of the text. The model is trained on a diverse dataset of handwritten and printed Devanagari text covering different writing styles and qualities. The character recognition module employs convolutional neural networks (CNNs) and recurrent neural networks (RNNs) to identify Devanagari characters. Word segmentation techniques split sentences into meaningful units, taking into account the specific rules and characteristics of the Devanagari script. In addition to transcription, the system recognizes contextual information and preserves the logical structure of the text. The proposed system is shown to convert Devanagari script into an editable Word format effectively and accurately, surpassing traditional OCR methods. It has considerable potential for preserving and promoting the cultural and linguistic heritage contained in Devanagari documents, and it opens new possibilities for research, translation, and content accessibility in a digital context.
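
To illustrate what script-aware segmentation can involve, the following is a minimal sketch and not the authors' implementation. It operates on already-transcribed Unicode text rather than on images, and the function names `split_words` and `split_aksharas` are hypothetical. It assumes that words are delimited by whitespace or danda punctuation, and that a syllable-like unit consists of a base character, its dependent vowel signs and marks, and any consonant joined through a virama.

```python
import re
import unicodedata

VIRAMA = "\u094D"  # Devanagari sign virama (halant), which joins consonants

def split_words(text):
    """Split text on whitespace and on danda / double danda punctuation."""
    return [w for w in re.split(r"[\s\u0964\u0965]+", text) if w]

def split_aksharas(word):
    """Group code points into syllable-like clusters (simplified rule)."""
    clusters = []
    for ch in word:
        prev = clusters[-1] if clusters else ""
        # Dependent vowel signs, anusvara, virama, etc. are combining marks.
        is_mark = unicodedata.category(ch) in ("Mn", "Mc")
        # A character following a virama continues the same conjunct.
        joins_conjunct = prev.endswith(VIRAMA)
        if prev and (is_mark or joins_conjunct):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters

# "namaste" written with escapes: na, ma, sa + virama + ta + vowel sign e
word = "\u0928\u092E\u0938\u094D\u0924\u0947"
print(split_aksharas(word))  # three clusters: na, ma, and the conjunct "ste"
```

This simplified rule ignores cases such as zero-width joiners and is meant only to show why splitting Devanagari text code point by code point would break conjuncts and vowel signs apart.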
