Bonfring International Journal of Software Engineering and Soft Computing

Impact Factor: 0.375 | International Scientific Indexing (ISI), calculated based on the International Citation Report (ICR)


A Survey of Automatic Text Summarization System for Different Regional Language in India

Virat V. Giri, Dr. M.M. Math and Dr. U.P. Kulkarni


Abstract:

Automatic text summarization is the technique of compressing an original text into a shorter form that conveys the same meaning and information as the original. The brief summary produced by a summarization system lets readers quickly and easily understand the content of the original documents without having to read each document in full. The overall goal of text summarization is to convey the meaning of a text using fewer words and sentences. Summaries are of two types: abstractive and extractive. Extractive summaries are built by extracting relevant sentences from the source text and arranging them in proper order; the relevant sentences are selected by applying statistical and language-dependent features to the input text. Abstractive summaries, on the other hand, are produced by applying natural language understanding.

The Marathi summarization system discussed here comprises two main phases: preprocessing and processing. The preprocessing phase represents the Marathi text in a structured form. In the processing phase, the features that determine the importance of each sentence are calculated. Statistical features include Marathi keyword identification, the relative sentence length feature, and the numbered data feature. Linguistic features used to select important sentences for the summary include Marathi headline identification, identification of lines immediately following headlines, identification of Marathi nouns, Marathi proper nouns, common English-Marathi nouns, and Marathi cue phrases, and identification of title keywords in sentences. Sentence scores are computed from a sentence-feature-weight equation, and the feature weights are determined using mathematical regression. This paper concentrates on a survey and performance analysis of automatic text summarizers for the Marathi language.

Keywords: Marathi Text Summarizer, Extractive Summarization, Named Entity Recognition, Keywords Identification, Headlines Identification.
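To make the extractive pipeline described in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes whitespace tokenization, a small hypothetical stop-word list (STOP_WORDS), and four illustrative features: keyword density, relative sentence length, numbered data, and title-keyword overlap. The exact feature definitions, the helper names (split_sentences, tokenize, sentence_features, fit_weights, summarize) and the training labels are all hypothetical; the linguistic features (nouns, proper nouns, cue phrases, headlines) are omitted because they need Marathi part-of-speech and named-entity resources. Each sentence score is the weighted sum score(s) = w_1 f_1(s) + ... + w_k f_k(s), and the weights are fitted by ordinary least squares with a small ridge term, which is one plausible reading of "mathematical regression".

```python
import re
from collections import Counter

# Hypothetical toy stop-word list; a real system needs a curated Marathi list.
STOP_WORDS = {"आणि", "हे", "ही", "आहे", "या", "व", "की"}
PUNCT = ".,?!।:;\"'()"

def split_sentences(text):
    """Pre-processing: split after '.', '?', '!' or the Devanagari danda '।'."""
    return [s for s in re.split(r"(?<=[.?!।])\s+", text.strip()) if s]

def tokenize(sentence):
    """Whitespace tokenization (keeps Devanagari vowel signs attached),
    punctuation stripping, and stop-word removal. No stemming is done."""
    tokens = (t.strip(PUNCT) for t in sentence.split())
    return [t for t in tokens if t and t not in STOP_WORDS]

def sentence_features(sentences, title, n_keywords=5):
    """Return one [keyword, rel_length, numeric, title] vector per sentence.
    All four definitions are illustrative assumptions, each scaled to [0, 1]."""
    tokenized = [tokenize(s) for s in sentences]
    freq = Counter(t for toks in tokenized for t in toks)
    keywords = {w for w, _ in freq.most_common(n_keywords)}
    title_words = set(tokenize(title))
    max_len = max((len(t) for t in tokenized), default=1) or 1
    rows = []
    for toks in tokenized:
        n = len(toks) or 1
        rows.append([
            sum(t in keywords for t in toks) / n,                    # keyword density
            len(toks) / max_len,                                     # relative sentence length
            sum(any(c.isdigit() for c in t) for t in toks) / n,      # numbered data (incl. ०-९)
            len(title_words & set(toks)) / (len(title_words) or 1),  # title-keyword overlap
        ])
    return rows

def fit_weights(X, y, ridge=1e-3):
    """Least-squares feature weights: solve (X^T X + ridge*I) w = X^T y
    by Gauss-Jordan elimination with partial pivoting."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) + (ridge if i == j else 0.0)
          for j in range(k)] for i in range(k)]
    b = [sum(r[i] * t for r, t in zip(X, y)) for i in range(k)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * c for a, c in zip(A[r], A[col])]
                b[r] -= f * b[col]
    return [b[i] / A[i][i] for i in range(k)]  # A is now diagonal

def summarize(text, title, weights, n=2):
    """Score sentences with the weighted feature sum; keep the top n in source order."""
    sentences = split_sentences(text)
    feats = sentence_features(sentences, title)
    scores = [sum(w * f for w, f in zip(weights, fv)) for fv in feats]
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:n]
    return [sentences[i] for i in sorted(top)]

if __name__ == "__main__":
    title = "पुणे पाऊस"
    text = ("पुणे शहरात आज पाऊस झाला। हवामान विभागाने २० मिमी पावसाची नोंद केली। "
            "नागरिक घरी राहिले।")
    feats = sentence_features(split_sentences(text), title)
    labels = [1.0, 1.0, 0.0]  # hypothetical human relevance judgements
    weights = fit_weights(feats, labels)
    print(summarize(text, title, weights, n=2))
```

The ridge term keeps the normal equations solvable even when there are fewer labeled sentences than features, as in the toy usage above. Selected sentences are returned in their source order, which is one common way of keeping an extractive summary readable.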

Volume: 6 | Issue: Special Issue on Advances in Computer Science and Engineering and Workshop on Big Data Analytics | Editors: Dr. S.B. Kulkarni, Dr. U.P. Kulkarni, Dr. S.M. Joshi and J.V. Vadavi

Pages: 52-57

Issue Date: October 2016

DOI: 10.9756/BIJSESC.8242


