Text Mining Techniques, Applications, and Issues
5. Issues in Text Mining Field
Many issues arise during the text mining process and affect the efficiency and effectiveness of decision making. Complexities can arise at the intermediate stage of text mining. In the preprocessing stage, various rules are defined to standardize the text and make the text mining process efficient. Before pattern analysis can be applied to a document, the unstructured data must be converted into an intermediate form, but this stage of the mining process has its own complications. Sometimes the real theme or the data loses its significance because the text sequence has been modified. Another major issue is the dependency on multilingual text refinement. Only a few tools support multiple languages, and various algorithms and techniques are used independently to support multilingual text. As a result, numerous important documents remain outside the text mining process because the available tools do not support them. These issues create many problems in the knowledge discovery and decision making process. In fact, the real benefit is difficult to attain with existing text mining techniques and tools because they rarely support multilingual documents.
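The loss of meaning caused by a change in the text sequence can be illustrated with a small sketch. It assumes a bag-of-words representation as the intermediate form, which is one common choice and not necessarily the one used by any particular tool; the function name `bag_of_words` and the two sample sentences are hypothetical.

```python
from collections import Counter
import re


def bag_of_words(text):
    """Lowercase the text, split it into word tokens, and count them.

    Word order is discarded, which is the assumption being illustrated.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)


# Two illustrative sentences with opposite meanings.
first = "the dog bites the man"
second = "the man bites the dog"

# Both sentences map to the same intermediate form,
# so a later pattern analysis step cannot tell them apart.
print(bag_of_words(first) == bag_of_words(second))  # prints True
```

Representations that keep some ordering information, such as word n-grams, reduce this effect at the cost of a larger intermediate form.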
Integration of domain knowledge is an important area, because it allows specific operations to be performed on a specified corpus to attain the desired outcomes. In this situation, the domain knowledge of the field from which the document corpus is drawn needs to be integrated with the computing capabilities used to extract the information. Depending on the requirements of the field, experts from diverse domains need to work collaboratively to extract more effective, precise and accurate results.
The use of synonyms, polysemous words and antonyms in documents creates problems (ambiguity) for text mining tools that treat them all in the same context. It is difficult to categorize documents when the collection is large and generated from diverse fields within the same domain. Abbreviations that take on different meanings in different situations are also a big issue. Varying levels of granularity change the context of the text according to the situation and the domain knowledge. There is a need to define rules for each field that can serve as a standard in that area and be embedded in text mining tools as a plug-in. Developing and deploying plug-ins for every field separately takes a lot of effort and time, and it requires in-depth and proper knowledge of the specific domain. Natural languages themselves have many complications that create problems for text refinement methods and for the identification of entity relationships. Some words have the same spelling but different meanings, for example, fly (the verb) and fly (the noun). Text mining tools treat both as the same word, although one is a verb and the other is a noun. Applying grammatical rules according to the nature and context of the text is still an open issue in the field of text mining.
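The idea of field-specific rules embedded as plug-ins, and the fly/fly example, can be sketched as follows. This is a minimal illustration under stated assumptions, not a description of an existing tool: the dictionary `DOMAIN_ABBREVIATIONS`, its entries, the functions `expand_abbreviations` and `tag_fly`, and the determiner rule are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical per-domain rule sets ("plug-ins"); the entries are illustrative only.
DOMAIN_ABBREVIATIONS = {
    "medicine": {"MS": "multiple sclerosis", "PT": "physical therapy"},
    "computing": {"MS": "Microsoft", "PT": "penetration test"},
}


def expand_abbreviations(text, domain):
    """Replace known abbreviations with the expansion defined for the given domain."""
    rules = DOMAIN_ABBREVIATIONS.get(domain, {})

    def replace(match):
        token = match.group(0)
        # Unknown abbreviations are left unchanged.
        return rules.get(token, token)

    # Treat runs of two or more capital letters as candidate abbreviations.
    return re.sub(r"\b[A-Z]{2,}\b", replace, text)


# Toy context rule (an assumption, not a full grammar): "fly" directly after
# a determiner is tagged as a noun, otherwise as a verb.
DETERMINERS = {"a", "the", "this", "that", "one"}


def tag_fly(sentence):
    """Return a NOUN or VERB tag for each occurrence of 'fly' in the sentence."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    tags = []
    for i, token in enumerate(tokens):
        if token == "fly":
            previous = tokens[i - 1] if i > 0 else ""
            tags.append("NOUN" if previous in DETERMINERS else "VERB")
    return tags


print(expand_abbreviations("The patient has MS", "medicine"))
print(expand_abbreviations("MS released an update", "computing"))
print(tag_fly("The fly can fly"))
```

Even this toy version shows why the effort grows with the number of fields: each domain needs its own rule set written by someone who knows the domain, and a single context rule covers only a small part of the grammar.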