Text Mining Techniques, Applications, and Issues

This review of current literature explores text mining techniques and their industry-specific applications. Selecting and using the right techniques and tools for the domain makes the text mining process easier and more efficient. As you read this article, note that this includes applying specific sequences and patterns to extract useful information and removing irrelevant details before predictive analysis. Major issues that may arise during the text mining process include domain knowledge integration, varying concepts of granularity, multilingual text refinement, and natural language processing ambiguity. Figure 3 shows the inter-relationships among text mining techniques and their core functionalities. Using this as a blueprint, apply one example from your industry to each part of the Venn diagram.

1. Introduction

The volume of data is growing exponentially. Almost all types of institutions, organizations, and businesses store their data electronically. A huge amount of text flows over the internet in the form of digital libraries, repositories, and other textual sources such as blogs, social media networks, and e-mails. Determining appropriate patterns and trends to extract valuable knowledge from this large volume of data is a challenging task. Traditional data mining tools are incapable of handling textual data, since extracting information from it requires considerable time and effort.

Text mining is the process of extracting interesting and significant patterns from textual data sources in order to discover knowledge. It is a multi-disciplinary field based on information retrieval, data mining, machine learning, statistics, and computational linguistics. Figure 1 shows a Venn diagram of text mining and its interaction with other fields. Several text mining techniques, such as summarization, classification, and clustering, can be applied to extract knowledge. Text mining deals with natural language text stored in semi-structured and unstructured formats. Text mining techniques are widely applied in industry, academia, web applications, the internet, and other fields. Application areas such as search engines, customer relationship management systems, email filtering, product suggestion analysis, fraud detection, and social media analytics use text mining for opinion mining, feature extraction, and sentiment, predictive, and trend analysis.

Fig. 1. Venn diagram of text mining and its interaction with other fields

The generic text mining process consists of the following steps (Figure 2):

  • Unstructured data is collected from different sources and in different file formats, such as plain text, web pages, and PDF files.
  • Pre-processing and cleansing operations are performed to detect and remove anomalies. Cleansing ensures that the real essence of the available text is captured; it includes removing stop words, stemming (identifying the root of a word), and indexing the data.
  • Processing and controlling operations are applied to audit and further clean the data set through automatic processing.
  • Pattern analysis is implemented by a Management Information System (MIS).
  • The information processed in the above steps is used to extract valuable and relevant knowledge for effective and timely decision making and trend analysis.

Fig. 2. Text mining process
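
The pre-processing step of this process (stop-word removal, stemming, and indexing) can be illustrated with a minimal Python sketch. It is not the implementation of any particular tool discussed in this paper: the stop-word list, the suffix-stripping rule, and the function names `tokenize`, `naive_stem`, `preprocess`, and `build_index` are assumptions made for illustration only, and the stemmer is a crude stand-in for an established algorithm such as Porter's.

```python
import re
from collections import defaultdict

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "and", "are", "from", "in", "is", "of", "the", "to"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def naive_stem(word):
    """Strip one common English suffix, keeping at least three characters.

    This is a deliberately simple stand-in for a real stemmer; it is not
    an implementation of Porter's algorithm.
    """
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Tokenize, drop stop words, and stem the remaining tokens."""
    return [naive_stem(t) for t in tokenize(text) if t not in STOP_WORDS]


def build_index(docs):
    """Build an inverted index: stem -> set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in preprocess(text):
            index[term].add(doc_id)
    return index


# Two hypothetical documents used only to exercise the functions above.
docs = {
    1: "Mining patterns from text documents",
    2: "The documents are indexed and mined",
}
index = build_index(docs)
print(sorted(index["document"]))
```

In this sketch, "mining" and "mined" are reduced to the same stem, so a lookup on that stem retrieves both documents. This is the effect that stemming is meant to have on indexing; a production system would typically replace the hand-written rules with a library stemmer and a much fuller stop-word list.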

Extracting valuable information from a corpus of different documents is a tedious task. Selecting an appropriate text mining technique reduces the time and effort needed to find relevant patterns for analysis and decision making. The objective of this paper is to analyze different text mining techniques that help to perform text analytics effectively and efficiently on large amounts of data. Moreover, the issues that arise during the text mining process are identified. The paper is organized as follows. Previous work is discussed in Section II. Section III explains different text mining techniques. Section IV presents the application areas of these techniques. Section V highlights issues and challenges in the text mining field. Section VI concludes the paper.