As you read, think about how using, protecting, and managing information and data could support an organization's competitive advantage. Conversely, failure to protect data, particularly personal information, could reduce or destroy any competitive advantage within a business. How does understanding customer information and data support current operations? How might it impact future operations?

The Business Intelligence Toolkit

Data Mining

While reporting tools can help users explore data, modern data sets can be so large that it might be impossible for humans to spot underlying trends. That's where data mining can help. Data mining is the process of using computers to identify hidden patterns and to build models from large data sets.

Some of the key areas where businesses are leveraging data mining include the following:

  • Customer segmentation - figuring out which customers are likely to be the most valuable to a firm.
  • Marketing and promotion targeting - identifying which customers will respond to which offers at which price at what time.
  • Market basket analysis - determining which products customers buy together, and how an organization can use this information to cross-sell more products or services.
  • Collaborative filtering - personalizing an individual customer's experience based on the trends and preferences identified across similar customers.
  • Customer churn - determining which customers are likely to leave, and what tactics can help the firm avoid unwanted defections.
  • Fraud detection - uncovering patterns consistent with criminal activity.
  • Financial modeling - building trading systems to capitalize on historical trends.
  • Hiring and promotion - identifying characteristics consistent with employee success in the firm's various roles.
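
To make one of these applications concrete, here is a minimal sketch of market basket analysis in Python. It is an illustration only: the baskets are invented sample data, and the function names (pair_support, confidence) are hypothetical helpers, not part of any particular product. Support is the share of all baskets that contain both items; confidence is the share of baskets containing one item that also contain the other.

```python
from collections import Counter
from itertools import combinations


def pair_support(baskets):
    """Return {(item_a, item_b): fraction of all baskets containing both items}."""
    counts = Counter()
    for basket in baskets:
        # Sort so each pair is always stored in the same (alphabetical) order.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    total = len(baskets)
    return {pair: n / total for pair, n in counts.items()}


def confidence(baskets, antecedent, consequent):
    """Of the baskets containing `antecedent`, the fraction that also contain `consequent`."""
    with_antecedent = [b for b in baskets if antecedent in b]
    if not with_antecedent:
        return 0.0
    hits = sum(1 for b in with_antecedent if consequent in b)
    return hits / len(with_antecedent)


# Invented sample data: each set is one customer's purchase.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "cereal"},
]

support = pair_support(baskets)
print(support[("bread", "butter")])
print(confidence(baskets, "butter", "bread"))
```

A pair with high support and high confidence is a candidate for cross-selling, although real systems work with far more transactions and typically prune rare items before counting pairs.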

For data mining to work, two critical conditions need to be present: (1) the organization must have clean, consistent data, and (2) the events in that data should reflect current and future trends. The recent financial crisis provides lessons on what can happen when either of these conditions isn't met.

First, let's look at problems with using bad data. A report in the New York Times has suggested that in the period leading up to the 2008 financial crisis, some banking executives deliberately deceived risk management systems in order to skew capital-on-hand requirements. This deception let firms load up on risky debt while carrying less cash for covering losses. Deceive your systems with bad data and your models are worthless. In this case, wrong estimates from bad data left firms grossly overexposed to risk. When debt defaults occurred, several banks failed, and we entered the worst financial crisis since the Great Depression.

Now consider the problem of historical consistency. Computer-driven investment models can be very effective when the market behaves as it has in the past. But models are blind when faced with the equivalent of the "hundred-year flood" (sometimes called black swans): events so extreme and unusual that they never showed up in the data used to build the model.

We saw this in the late 1990s with the collapse of the investment firm Long-Term Capital Management (LTCM). LTCM was started by Nobel Prize–winning economists, but when an unexpected Russian debt crisis caused the markets to move in ways not anticipated by its models, the firm lost 90 percent of its value in less than two months. The problem was so bad that the Fed had to step in to supervise the firm's multibillion-dollar bailout. Fast forward a decade to the banking collapse of 2008, and we again see computer-driven trading funds plummet in the face of another unexpected event - the burst of the housing bubble.

Data mining presents a host of other perils, as well. It's possible to over-engineer a model, building it with so many variables that the solution arrived at might only work on the subset of data you've used to create it. You might also be looking at a random but meaningless statistical fluke. In demonstrating how flukes occur, one quantitative investment manager uncovered a correlation that at first glance appeared statistically to be a particularly strong predictor for historical prices in the S&P 500 stock index. That predictor? Butter production in Bangladesh. Sometimes durable and useful patterns just aren't in your data.
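
The fluke problem is easy to reproduce. The sketch below is an illustration with invented random numbers, not the investment manager's actual analysis: it generates a short "target" series, then tests many unrelated random series against it and keeps the strongest correlation found. The helper names (pearson, best_fluke) and the series lengths are assumptions chosen for the demonstration.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def best_fluke(target, n_candidates, rng):
    """Try n_candidates purely random series; return the largest |correlation| with target."""
    best = 0.0
    for _ in range(n_candidates):
        candidate = [rng.gauss(0, 1) for _ in target]
        best = max(best, abs(pearson(candidate, target)))
    return best


rng = random.Random(0)
target = [rng.gauss(0, 1) for _ in range(10)]  # stand-in for ten periods of index returns

print("best of 5 random predictors:   ", best_fluke(target, 5, rng))
print("best of 5000 random predictors:", best_fluke(target, 5000, rng))
```

None of the candidate series has any real relationship to the target, yet searching through enough of them will almost always turn up one that looks like a strong predictor. The more variables a miner tests, the more suspicious an impressive correlation should be.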

One way to test to see if you're looking at a random occurrence in the numbers is to divide your data, building your model with one portion of the data, and using another portion to verify your results. This is the approach Netflix has used to test results achieved by teams in the Netflix Prize, the firm's million-dollar contest for improving the predictive accuracy of its movie recommendation engine.
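
A minimal sketch of this hold-out approach follows. It is not Netflix's procedure; the data is invented, the model is a simple straight-line fit, and the function names (split_holdout, fit_line, mean_abs_error) and the 25 percent hold-out fraction are assumptions made for illustration.

```python
import random


def split_holdout(rows, holdout_fraction=0.25, seed=0):
    """Shuffle a copy of rows and split it into (training, holdout) portions."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - holdout_fraction))
    return shuffled[:cut], shuffled[cut:]


def fit_line(rows):
    """Least-squares fit of y = slope * x + intercept to (x, y) pairs."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_abs_error(rows, slope, intercept):
    """Average absolute gap between predicted and actual y values."""
    return sum(abs(y - (slope * x + intercept)) for x, y in rows) / len(rows)


# Invented data that follows y = 2x + 1 exactly.
rows = [(x, 2 * x + 1) for x in range(20)]

train, holdout = split_holdout(rows)
slope, intercept = fit_line(train)          # build the model on one portion
error = mean_abs_error(holdout, slope, intercept)  # verify on the other
print(slope, intercept, error)
```

Because the hold-out rows played no part in building the model, a low error on them is evidence that the pattern is real. A model that scores well on its training portion but poorly on the hold-out portion has likely latched onto a fluke or been over-engineered.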

Finally, sometimes a pattern is uncovered but determining the best choice for a response is less clear. As an example, let's return to the data-mining wizards at Tesco. An analysis of product sales data showed several money-losing products, including a type of bread known as "milk loaf". Drop those products, right? Not so fast. Further analysis showed milk loaf was a "destination product" for a loyal group of high-value customers, and that these customers would shop elsewhere if milk loaf disappeared from Tesco shelves. The firm kept the bread as a loss-leader and retained those valuable milk loaf fans. Data miner, beware - first findings don't always reveal an optimal course of action.

This last example underscores the importance of recruiting a data mining and business analytics team that possesses three critical skills: information technology (for understanding how to pull together data, and for selecting analysis tools), statistics (for building models and interpreting the strength and validity of results), and business knowledge (for helping set system goals and requirements, and for offering deeper insight into what the data really says about the firm's operating environment). Miss one of these key functions and your team could make some major mistakes.

While we've focused on tools in our discussion above, many experts suggest that business intelligence is really an organizational process as much as it is a set of technologies. Having the right team is critical in moving the firm from goal setting through execution and results.


Artificial Intelligence

Data mining has its roots in a branch of computer science known as artificial intelligence (or AI). The goal of AI is to create computer programs that are able to mimic or improve upon functions of the human brain. Data mining can leverage neural networks or other advanced algorithms and statistical techniques to hunt down and expose patterns, and build models to exploit findings.

Expert systems are AI systems that leverage rules or examples to perform a task in a way that mimics applied human expertise. Expert systems are used in tasks ranging from medical diagnoses to product configuration.
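
As a rough illustration of the rule-based variety, the sketch below applies a handful of "if these facts hold, conclude this" rules to a product configuration task, repeating until no new conclusions appear (a technique usually called forward chaining). The rules, fact names, and the infer function are invented for this example and do not describe any real system.

```python
def infer(facts, rules):
    """Apply rules repeatedly until no rule adds a new fact; return all known facts."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            # Fire the rule only if every condition is already known.
            if conclusion not in known and conditions <= known:
                known.add(conclusion)
                changed = True
    return known


# Hypothetical configuration rules: (required facts, resulting fact).
RULES = [
    (frozenset({"needs_video_editing"}), "needs_dedicated_gpu"),
    (frozenset({"needs_dedicated_gpu"}), "needs_large_power_supply"),
    (frozenset({"travels_often"}), "prefers_laptop"),
]

result = infer({"needs_video_editing"}, RULES)
print(sorted(result))
```

The expertise lives in the rules rather than in the loop, so a human specialist can review, correct, or extend the system's knowledge without touching the inference code.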

Genetic algorithms are model-building techniques where computers examine many potential solutions to a problem, iteratively modifying (mutating) various mathematical models, and comparing the mutated models to search for a best alternative. Genetic algorithms have been used for everything from building financial trading models to handling complex airport scheduling to designing parts for the International Space Station.
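
The mutate-and-compare loop can be sketched in a few lines. The toy problem here (find a list of 20 bits with as many 1s as possible) and every parameter value are assumptions chosen to keep the example short. This simplified version uses mutation only; many genetic algorithms also combine pairs of candidate solutions, a step omitted here.

```python
import random


def fitness(bits):
    """Score a candidate solution: the number of 1s it contains."""
    return sum(bits)


def mutate(bits, rate, rng):
    """Return a copy of bits with each position flipped with probability `rate`."""
    return [1 - b if rng.random() < rate else b for b in bits]


def evolve(length=20, population_size=30, generations=40, rate=0.05, seed=1):
    """Evolve bit lists toward higher fitness; return (best, best-fitness history)."""
    rng = random.Random(seed)
    population = [
        [rng.randint(0, 1) for _ in range(length)] for _ in range(population_size)
    ]
    history = []
    for _ in range(generations):
        # Compare candidates and keep the better half unchanged.
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        survivors = population[: population_size // 2]
        # Refill the population with mutated copies of survivors.
        children = [
            mutate(rng.choice(survivors), rate, rng)
            for _ in range(population_size - len(survivors))
        ]
        population = survivors + children
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history


best, history = evolve()
print(history[0], "->", history[-1])
```

Because the best candidates survive each round unchanged, the top score can never get worse from one generation to the next. In a real application the bit list would encode something meaningful, such as the parameters of a trading model or a gate assignment schedule, and the fitness function would measure how well that solution performs.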

While AI is not a single technology, and not directly related to data creation, various forms of AI can show up as part of analytics products, CRM tools, transaction processing systems, and other information systems.