The Technology

...or "A Mine is a Terrible Thing to Waste"


Picture courtesy of The National Coal Museum, P.O. Box 369, West Frankfort, IL 62896 http://www.isgs.uiuc.edu:70/isgsroot/tours/coal-mine/coal.html

Data Mining? What's that?

You may not have heard of Data Mining (DM), but it may have a profound impact on the way data is used at your company in the future. The popular press rarely tells the real story about DM. It is not yet well known enough to be mentioned often outside computing publications, but it is slowly making its way into marketing journals and general business publications as well.

Definition of Data Mining (Hype vs. Reality)

In short, data mining is the use of Artificial Intelligence (AI) methods to find hidden patterns in large volumes of data. A variety of vendors supply software products (see link below) that perform this type of analysis.

To learn what data mining is, it is important to learn first what it is not, because many vendors have confused customers (sometimes intentionally) with regard to what DM is and is not. Wanting their own products to appear to fit into this niche, they have blurred the meaning of the DM concept.

The Non-Data Mining Tools That the Press Has Most Often Confused With Data Mining Include:
Online Analytical Processing (OLAP) Tools - where the goal is to enable analysts to "slice and dice" data to prove or disprove a hypothesis that the analyst has come up with. This type of tool typically works with aggregate data and lets users perform "Let's look at sales by region, sales by channel, sales by sales force" investigations "on the fly". Users can run these queries faster than with simple query tools, because the OLAP designers (data programmers) have pre-summarized the data along the aggregate dimensions (Region, Channel, Sales Force). The user interface for such a tool resembles a "cross tab", or data matrix, much like an Excel spreadsheet. OLAP is a concept, and a variety of vendors sell software packages that support analysis of this type.
Additional OLAP Resources:
The OLAP Council

Examples of Multidimensional Databases (where/how OLAP tools store data)

OLAP according to Neil Raden, an author and lecturer on data warehousing and decision support and President of Archer Decision Sciences, Inc.

Datamation article on OLAP
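
To make the OLAP idea of pre-summarized data concrete, here is a minimal Python sketch. It is only an illustration, not how any particular vendor's product works: the sales records and the function names (build_cube, rollup, crosstab) are made up for this example, and only two of the three dimensions mentioned above (Region, Channel) are used.

```python
from collections import defaultdict

# Hypothetical detail-level sales records: (region, channel, amount).
SALES = [
    ("East", "Retail", 120.0),
    ("East", "Online", 80.0),
    ("West", "Retail", 200.0),
    ("West", "Online", 50.0),
    ("East", "Retail", 30.0),
]

def build_cube(rows):
    """Pre-summarize detail rows into totals keyed by (region, channel)."""
    cube = defaultdict(float)
    for region, channel, amount in rows:
        cube[(region, channel)] += amount
    return dict(cube)

def rollup(cube, axis):
    """Collapse the cube to one dimension: axis 0 = region, axis 1 = channel."""
    totals = defaultdict(float)
    for key, amount in cube.items():
        totals[key[axis]] += amount
    return dict(totals)

def crosstab(cube):
    """Render the cube as a region-by-channel text matrix."""
    regions = sorted({region for region, _ in cube})
    channels = sorted({channel for _, channel in cube})
    lines = ["Region".ljust(8) + "".join(c.rjust(10) for c in channels)]
    for region in regions:
        cells = "".join(f"{cube.get((region, c), 0.0):10.1f}" for c in channels)
        lines.append(region.ljust(8) + cells)
    return "\n".join(lines)

cube = build_cube(SALES)      # done once, ahead of time
print(crosstab(cube))         # the "cross tab" view
print(rollup(cube, 0))        # "sales by region", answered from the summary
```

Note that the analyst still decides which question to ask (sales by region, sales by channel); the tool only answers it quickly from the summary.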

 
The key differentiator of data mining is that these tools do NOT help you look through data to find patterns that YOU think may be out there (i.e., prove a hypothesis you may have). Instead, data mining tools look through data and tell you about interesting patterns that THEY find (i.e., they come up with a hypothesis by themselves, based on a look through your data).
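
As a rough illustration of a tool proposing its own hypotheses, the Python sketch below scans a handful of shopping baskets and reports every "customers who buy A also buy B" rule that clears a support and confidence threshold. The basket data, the thresholds, and the find_rules function are assumptions invented for this example; commercial tools use far more elaborate algorithms and additional measures of how interesting a rule is.

```python
from itertools import permutations

# Hypothetical shopping baskets, one set of items per customer visit.
BASKETS = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]

def find_rules(baskets, min_support=0.4, min_confidence=0.7):
    """Return (a, b, support, confidence) for every rule 'a -> b' that passes.

    support    = share of all baskets containing both a and b
    confidence = share of baskets containing a that also contain b
    """
    n = len(baskets)
    items = sorted(set().union(*baskets))
    rules = []
    for a, b in permutations(items, 2):
        has_a = sum(1 for basket in baskets if a in basket)
        has_both = sum(1 for basket in baskets if a in basket and b in basket)
        support = has_both / n
        confidence = has_both / has_a if has_a else 0.0
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    # Strongest rules first; ties broken alphabetically for stable output.
    return sorted(rules, key=lambda r: (-r[3], -r[2], r[0], r[1]))

rules = find_rules(BASKETS)
for a, b, support, confidence in rules:
    print(f"{a} -> {b}  support={support:.2f}  confidence={confidence:.2f}")
```

Nobody told the program to look at beer and diapers; the rule surfaces because the data supports it, which is the difference from the hypothesis-driven querying described above.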

The key question that a true data mining tool attempts to answer for companies that collect mountains of data is "What do we do with the mountains once we have collected them?" The primary method that a true data mining tool uses to answer this question is an AI-based algorithm such as rule induction, neural nets, genetic algorithms, and/or decision trees (CART, CHAID).
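
To give a feel for how a decision tree method picks its first question, here is a small Python sketch of a single split chosen by Gini impurity, the measure commonly associated with CART for classification. It is a simplified sketch, not the full CART or CHAID procedure: the customer records, the attribute names, and the best_split function are all hypothetical, and a real tool would keep splitting recursively and then prune the tree.

```python
from collections import Counter

# Hypothetical customer records: (attributes, responded_to_offer).
CUSTOMERS = [
    ({"region": "East", "channel": "Online"}, True),
    ({"region": "West", "channel": "Online"}, True),
    ({"region": "East", "channel": "Retail"}, False),
    ({"region": "West", "channel": "Retail"}, False),
    ({"region": "East", "channel": "Online"}, True),
    ({"region": "West", "channel": "Retail"}, False),
]

def gini(labels):
    """Gini impurity: 0.0 for a pure group, 0.5 for an even two-way mix."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((count / n) ** 2 for count in counts.values())

def best_split(records):
    """Find the 'attribute == value' test giving the purest two groups.

    Returns (attribute, value, weighted_gini), or None if no split exists.
    """
    n = len(records)
    best = None
    for attr in sorted(records[0][0]):
        for value in sorted({row[attr] for row, _ in records}):
            left = [label for row, label in records if row[attr] == value]
            right = [label for row, label in records if row[attr] != value]
            if not left or not right:
                continue  # not a real split
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (attr, value, score)
    return best

print(best_split(CUSTOMERS))
```

On this toy data the sketch settles on the channel attribute, because splitting on it separates responders from non-responders completely, while splitting on region leaves both groups mixed.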

Each of these data mining terms is defined in an excellent glossary (including algorithm descriptions) provided by Pilot Software.

This link will help you to get an idea of some of the vendors that currently are providing data mining tools and services.

Here are some examples of data mining (association rules and profile generation) from Silicon Graphics.

In summary, your company has a valuable asset, a virtual gold mine, in the data it has most likely collected over the years, and "a mine is a terrible thing to waste".


Send feedback to: andrew.hall@anderson.ucla.edu



Last Updated by AJx on 04/05/98 08:10:50 PM