Business Intelligence
Semi-structured or unstructured data
Problems with semi-structured or unstructured data
Developing BI with semi-structured data poses several challenges. According to Inmon & Nesavich, these include:
- Physically accessing unstructured textual data – Unstructured data is stored in a huge variety of formats.
- Terminology – Among researchers and analysts, there is a need to develop a standardized terminology.
- Volume of data – As stated earlier, up to 85% of all data exists as semi-structured data. The challenge is compounded by the need for word-to-word and semantic analysis.
- Searchability of unstructured textual data – A simple search on a term, e.g. apple, returns only links to documents that contain that precise term. Inmon & Nesavich (2008) give an example: "a search is made on the term felony. In a simple search, the term felony is used, and everywhere there is a reference to felony, a hit to an unstructured document is made. But a simple search is crude. It does not find references to crime, arson, murder, embezzlement, vehicular homicide, and such, even though these crimes are types of felonies".
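
The searchability problem in the felony example can be illustrated with a minimal sketch. It is not a method described by Inmon & Nesavich. The documents, the `TAXONOMY` mapping, and the function names `simple_search` and `expanded_search` are all hypothetical, and plain substring matching stands in for a real text index. The sketch contrasts a simple search for the exact term with a search that first expands the term into its narrower terms.

```python
# Hypothetical mini-corpus; these documents are invented for illustration.
DOCUMENTS = {
    "doc1": "The defendant was charged with a felony.",
    "doc2": "Police are investigating an arson at the warehouse.",
    "doc3": "The accountant was convicted of embezzlement.",
    "doc4": "The city council approved the new budget.",
}

# Hypothetical taxonomy: a broad term mapped to narrower terms,
# using crime types named in the quoted example.
TAXONOMY = {
    "felony": ["arson", "murder", "embezzlement", "vehicular homicide"],
}


def simple_search(term, documents):
    """Return ids of documents whose text contains the exact term."""
    term = term.lower()
    return sorted(
        doc_id for doc_id, text in documents.items() if term in text.lower()
    )


def expanded_search(term, documents, taxonomy):
    """Search for the term and for every narrower term listed in the taxonomy."""
    terms = [term] + taxonomy.get(term.lower(), [])
    hits = set()
    for candidate in terms:
        hits.update(simple_search(candidate, documents))
    return sorted(hits)


if __name__ == "__main__":
    print(simple_search("felony", DOCUMENTS))
    print(expanded_search("felony", DOCUMENTS, TAXONOMY))
```

With this toy data, the simple search matches only the document that literally contains "felony", while the expanded search also reaches the arson and embezzlement documents. In practice the taxonomy itself has to be built and maintained, which ties back to the terminology challenge listed above.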