This review of current literature explores text mining techniques and their industry-specific applications. Selecting the right techniques and tools for the domain makes the text mining process easier and more efficient. As you read this article, note that text mining involves applying specific sequences and patterns to extract useful information, removing irrelevant details so that the result can support predictive analysis. Major issues that may arise during the text mining process include domain knowledge integration, varying concepts of granularity, multilingual text refinement, and natural language processing ambiguity. Figure 3 shows the inter-relationships among text mining techniques and their core functionalities. Using it as a blueprint, apply one example from your industry to each part of the Venn diagram.
2. Review of Literature
S.-H. Liao, P.-H. Chu, and P.-Y. Hsiao described the text mining process as a sequence of steps: gathering, extraction, pre-processing, text transformation, feature extraction, pattern selection, and evaluation. In addition, widely used text mining techniques, i.e., clustering, categorization, and decision tree categorization, and their applications in diverse fields are surveyed. N. Zhong, Y. Li, and S.-T. Wu highlighted the issues in text mining applications and techniques. They noted that unstructured text is harder to handle with traditional mining tools and techniques than structured or tabular data. They demonstrated applications of the text mining process in bioinformatics, business intelligence, and national security systems. Natural language processing and entity recognition techniques have reduced the issues that occur during the text mining process. However, some issues still need attention.
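The pre-processing, text transformation, and feature extraction steps listed above can be illustrated with a minimal sketch. It is not the method of any surveyed paper: the stop-word list, the function names, and the document-frequency threshold `min_docs` are all assumptions chosen for illustration.

```python
import re
from collections import Counter

# Illustrative stop-word list (an assumption; real systems use larger lists).
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to"}

def preprocess(text):
    """Pre-processing: lowercase, keep alphabetic tokens, drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def transform(tokens):
    """Text transformation: turn a token list into bag-of-words counts."""
    return Counter(tokens)

def extract_features(bags, min_docs=2):
    """Feature extraction: keep terms found in at least `min_docs` documents."""
    doc_freq = Counter(term for bag in bags for term in bag)
    return sorted(t for t, df in doc_freq.items() if df >= min_docs)

def run_pipeline(documents, min_docs=2):
    """Chain the three steps and return (vocabulary, count vectors)."""
    bags = [transform(preprocess(d)) for d in documents]
    vocab = extract_features(bags, min_docs)
    return vocab, [[bag[t] for t in vocab] for bag in bags]

documents = [
    "Text mining extracts patterns in the text.",
    "Clustering is a text mining technique.",
    "Decision tree categorization of documents.",
]
vocab, vectors = run_pipeline(documents)
print(vocab)    # terms shared by at least two documents
print(vectors)  # one count vector per document
```

The resulting count vectors are the kind of structured input that clustering or categorization techniques would then consume.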
A. Henriksson, H. Moen, M. Skeppstedt, V. Daudaravicius, and M. Duneld explored the MEDLINE biomedical database by integrating a framework for named entity recognition, text classification, hypothesis generation and testing, relationship and synonym extraction, and abbreviation extraction. This framework helps to eliminate unnecessary details and extract valuable information. B. Laxman and D. Sujatha analyzed text using text mining patterns and showed that term-based approaches cannot properly handle synonymy and polysemy. Moreover, they designed a prototype model that specifies patterns by assigning weights according to their distribution. This approach helps to enhance the efficiency of the text mining process. C. P. Chen and C.-Y. Zhang presented a crime detection system built with text mining tools, in which a relation discovery algorithm was designed to correlate terms with their abbreviations.
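To make the idea of correlating a term with its abbreviation concrete, the following is a minimal sketch. It is not the relation discovery algorithm of the cited work, whose details are not given here. It assumes the simple case where an uppercase abbreviation appears in parentheses directly after its long form and each letter matches the initial of one word; the function name `find_abbreviations` is hypothetical.

```python
import re

def find_abbreviations(text):
    """Return {abbreviation: long form} for 'long form (ABBR)' patterns."""
    pairs = {}
    for match in re.finditer(r"\(([A-Z]{2,})\)", text):
        abbr = match.group(1)
        # Take the words just before the parenthesis, one word per letter.
        words = re.findall(r"[A-Za-z]+", text[:match.start()])
        candidate = words[-len(abbr):]
        initials = "".join(w[0].upper() for w in candidate)
        # Accept the pair only when the initials spell the abbreviation.
        if initials == abbr:
            pairs[abbr] = " ".join(candidate)
    return pairs

sample = (
    "Named entity recognition (NER) and natural language processing (NLP) "
    "are common steps; the method (XYZ) is not expanded here."
)
pairs = find_abbreviations(sample)
print(pairs)
```

Abbreviations whose letters do not line up with word initials, such as `XYZ` in the sample, are left unresolved, which is where a more capable relation discovery method would be needed.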
R. Rajendra and V. Saransh presented a top-down and bottom-up approach for the web-based text mining process. To group similar text documents, they apply the k-means clustering technique for bottom-up partitioning. To measure similarity within the documents, the TF-IDF (Term Frequency-Inverse Document Frequency) algorithm is used to find information regarding specific subjects. K. Sumathy and M. Chidambaram gave an overview of the applications, tools, and issues that arise in mining text. They noted that documents may be structured, semi-structured, or unstructured, and that extracting useful information from them is a tiresome task. They presented a generic framework for concept-based mining, which can be viewed as two phases: text refinement and knowledge distillation. The intermediate form of entity representation used for mining depends on the specific domain.
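The combination of TF-IDF weighting and k-means clustering mentioned earlier in this paragraph can be sketched in a few lines. This is an illustrative sketch, not the cited authors' implementation: the smoothed IDF formula, the whitespace tokenizer, the fixed seed documents, and the toy corpus are all assumptions.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return (vocabulary, one TF-IDF vector per document)."""
    tokenized = [d.lower().split() for d in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    n = len(docs)
    df = Counter(t for doc in tokenized for t in set(doc))
    # Smoothed IDF (an assumption; several variants are in common use).
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        vectors.append([counts[t] / (len(doc) or 1) * idf[t] for t in vocab])
    return vocab, vectors

def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def sq_dist(u, v):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def kmeans(vectors, seeds, iterations=10):
    """Lloyd's k-means; `seeds` are indices of the initial centroid documents."""
    centroids = [vectors[i][:] for i in seeds]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: nearest centroid for every vector.
        labels = [min(range(len(centroids)), key=lambda c: sq_dist(v, centroids[c]))
                  for v in vectors]
        # Update step: each centroid becomes the mean of its members.
        for c in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

docs = [
    "stock market prices rise",
    "market prices fall sharply",
    "football team wins match",
    "team loses football match",
]
vocab, vectors = tfidf_vectors(docs)
labels = kmeans(vectors, seeds=[0, 2])
print(labels)
print(cosine(vectors[0], vectors[1]), cosine(vectors[0], vectors[2]))
```

Fixed seeds keep the sketch deterministic. In practice the initial centroids are usually chosen at random or with a seeding heuristic, and the number of clusters has to be chosen for the corpus.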
P.J. Joby and J. Korra presented innovative and efficient pattern discovery techniques. They used pattern evolving and discovering techniques to enhance the effectiveness of discovering relevant and appropriate information. They performed BM25 and support vector machine based filtering on Reuters Corpus Volume 1 and Text REtrieval Conference (TREC) data to estimate the effectiveness of the suggested technique. Z. Wen, T. Yoshida, and X. Tang performed various classification experiments using multi-word features of the text. They proposed a hand-crafted method to extract multi-word features from the data set. To classify the extracted multi-word text, they apply a support vector machine in linear and nonlinear polynomial forms, which improves the effectiveness of the extracted data.
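As a point of reference for the BM25 filtering mentioned above, the following is a minimal sketch of Okapi BM25 scoring. It is not the experimental setup of the cited work: the parameter values `k1=1.5` and `b=0.75`, the non-negative IDF variant, the whitespace tokenizer, and the toy documents are assumptions made for illustration.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with Okapi BM25."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / n  # average document length
    df = Counter(t for d in tokenized for t in set(d))
    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # A commonly used non-negative IDF variant (assumption).
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates via k1; b controls length normalization.
            denom = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores

docs = [
    "text mining finds patterns in text",
    "support vector machines classify documents",
    "pattern discovery improves text filtering",
]
scores = bm25_scores("text mining", docs)
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(scores)
print(ranking)
```

A filtering system would typically keep only the documents whose score exceeds a threshold, and that threshold would need to be tuned on held-out data.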