Data Mining

EMM glossary term: Data Mining

The process of analyzing large amounts of data for patterns. It can be used to predict buying habits and credit card purchases and to identify cross-selling opportunities. Data mining comprises many different processes that turn raw information into a form that is understandable and usable for further work. These processes draw on artificial intelligence, statistics, machine learning and database systems.

The term is a misnomer, because the goal is the extraction of patterns and knowledge from large amounts of data, not the extraction of the data itself. It is also a buzzword, frequently applied to any form of large-scale data or information processing (collection, extraction, warehousing, analysis, and statistics) as well as to any application of computer decision support systems, including artificial intelligence, machine learning, and business intelligence. The popular book “Data mining: Practical machine learning tools and techniques with Java” (which covers mostly machine learning material) was originally to be named just “Practical machine learning”, and the term “data mining” was only added for marketing reasons. Often the more general terms “(large scale) data analysis” or “analytics” are more appropriate, or, when referring to actual methods, “artificial intelligence” and “machine learning”.

The actual data mining task is the automatic or semi-automatic analysis of large quantities of data to extract previously unknown, interesting patterns such as groups of data records (cluster analysis), unusual records (anomaly detection), and dependencies (association rule mining).
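
Two of the tasks named above, association rule mining and anomaly detection, can be illustrated with a minimal sketch in Python. The transactions, thresholds, and function names (`pair_rules`, `z_score_outliers`) are hypothetical and chosen only for illustration. The rule search is limited to pairs of items, and the anomaly check is a simple z-score test; real data mining systems use more scalable and more robust methods.

```python
from collections import Counter
from itertools import combinations
from statistics import mean, pstdev

# Hypothetical toy data: each transaction is the set of items bought together.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "eggs"},
    {"bread", "milk", "eggs"},
]


def pair_rules(transactions, min_support=0.4, min_confidence=0.7):
    """Find rules "x -> y" between single items (association rule mining).

    support    = share of transactions containing both x and y
    confidence = share of transactions containing x that also contain y
    """
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in t)
    pair_counts = Counter(
        pair for t in transactions for pair in combinations(sorted(t), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Each frequent pair can yield a rule in either direction.
        for x, y in ((a, b), (b, a)):
            confidence = count / item_counts[x]
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return sorted(rules)


def z_score_outliers(values, threshold=2.0):
    """Flag values far from the mean (a very simple anomaly detector)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]


if __name__ == "__main__":
    for x, y, support, confidence in pair_rules(transactions):
        print(f"{x} -> {y}: support={support:.2f}, confidence={confidence:.2f}")
    print(z_score_outliers([10, 12, 11, 13, 12, 11, 95]))
```

In this sketch a rule is reported only when the item pair is frequent enough (support) and the implication is reliable enough (confidence); both thresholds are arbitrary and would be tuned to the data in practice.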