Another important subject related to data is, of course, data mining. Unlike data warehousing, which covers the whole process from storing the data to presenting it, data mining focuses on the analysis of the information itself.
Through this examination, data is extracted and transformed into usable blocks of information. The process is widely applied these days, in almost every industry that stores and works with information.
As noted above, this computer science concept is widely used in data-driven industries, particularly in business intelligence and artificial intelligence systems, where it relies on techniques such as machine learning and statistical analysis.
It is built on top of some data system, typically a database, and involves various data management methods. For more detailed information about database systems, start by reading this basic database tutorial.
Define Data Mining
This branch of computer science targets large sets of data and tries to discover patterns in them, using specific methods and algorithms, in order to provide a much better understanding of that information. These patterns help to explore the data and detect relationships between variables. The patterns are then applied to new data to check that they hold.
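As a small illustration of detecting a relationship between two variables, the sketch below computes a Pearson correlation coefficient in plain Python. The function name `pearson` and the sample figures (`ad_spend`, `units_sold`) are made up for this example; real projects would more likely rely on a statistics library.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / sqrt(var_x * var_y)

# Hypothetical data: advertising spend and units sold over five periods.
ad_spend = [10, 20, 30, 40, 50]
units_sold = [12, 25, 31, 48, 55]

r = pearson(ad_spend, units_sold)
print(f"correlation: {r:.3f}")
```

A value close to 1 or -1 suggests a strong linear relationship, while a value near 0 suggests little or none. Correlation alone does not show that one variable causes the other.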
There are three basic stages of a data mining process:
- Pre-processing – Assemble a large set of information and analyse it, cleaning the bad data.
- Data Mining – Process the data, making use of methods such as association, clustering, classification, regression and summarization.
- Validation – Verifying the discoveries and results.
These general tasks can be divided into more detailed subtasks, as we explore each subject.
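The three stages can be sketched end to end in a few lines of Python. This is only a minimal illustration under stated assumptions: the data is invented, the function names (`preprocess`, `fit_line`, `mean_absolute_error`) are hypothetical, and simple linear regression stands in for the mining step, which in practice could be any of the methods listed above.

```python
import math

def is_valid(value):
    """True for real, finite numbers (rejects None, NaN, strings, booleans)."""
    return (
        isinstance(value, (int, float))
        and not isinstance(value, bool)
        and math.isfinite(value)
    )

def preprocess(records):
    """Stage 1: drop records that contain missing or invalid values."""
    return [(float(x), float(y)) for x, y in records if is_valid(x) and is_valid(y)]

def fit_line(points):
    """Stage 2: least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("cannot fit a line when every x value is the same")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def mean_absolute_error(model, points):
    """Stage 3: average prediction error on data the model has not seen."""
    slope, intercept = model
    return sum(abs(y - (slope * x + intercept)) for x, y in points) / len(points)

# Hypothetical raw data, including two bad records (None and NaN).
raw = [(1, 3.1), (2, 4.9), (None, 7.0), (3, 7.2), (4, float("nan")),
       (5, 11.0), (6, 13.1), (7, 14.8), (8, 17.2)]

clean = preprocess(raw)                      # pre-processing
train, holdout = clean[:5], clean[5:]        # keep some data back for validation
model = fit_line(train)                      # data mining (regression)
error = mean_absolute_error(model, holdout)  # validation
print(f"slope={model[0]:.2f} intercept={model[1]:.2f} holdout MAE={error:.2f}")
```

Holding back part of the cleaned data is what makes the validation step meaningful: the error is measured on records the model never saw while it was being fitted.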
Important Reading on the Subject
To provide an introductory guide to data mining, we put together a review of the best data mining books, which we recommend if you want to become an expert on the matter.
These books each have a distinct focus, covering data mining on multiple platforms and in multiple languages. Reading them will give you solid knowledge of how data mining is done. Even so, we suggest practical exercises, such as mining a large set of data, to consolidate everything you read in these books and in other sources.
Data Mining Book Reviews
This resource is the bible of managing and exploring structured data. It covers all the essential ideas of data mining and its most important algorithms in a simple, yet detailed way. The book works through many different data management scenarios, which is very useful for understanding the ideas it presents.
This is definitely the most important book to read if you want to explore large sets of already structured information.
Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython
If you’re comfortable using Python as a programming language, then this book is perfect for you. Everything about data manipulation and exploration in Python is covered in incredible detail. It introduces scientific programming in Python and points out the libraries best suited to data mining.
Essentially, this is a book about the tools and methods you should use in Python, described in a practical and modern way, with the focus always on the data.
The Hadoop data platform is fully explained in this highly detailed book. It is about building data systems from top to bottom, covering all phases in an impressive and comprehensive manner. Focusing on the analysis of large sets of data, this resource gives you the tools and a guide to implement that analysis.
If you want to learn how to build an entire system for data mining, then this is a great reference to read.
This book introduces you to the data science world, covering every topic of big data. Its main concern is to introduce the reader to analytical thinking about data, so as to understand it better and uncover all the possibilities for its exploration and analysis. Written in an accessible way, this book is suitable for everyone who wants to get in touch with the big data world.
This guide explores several data mining techniques for information analysis, making it a MUST-read resource on the subject.
A book that explores the concept of big data and its implications for today’s world. It is very informative and has a lot of good examples of big data implementations, which complement its overview of the subject.
It is a less technical book, but still very important for putting big data in the context of our systems.
This awesome resource introduces the reader to data science, presenting it as the next level in the evolution of IT systems. The topics covered in this book are the most useful data mining techniques, each described in its own chapter. Everything is explained in detail, with a few examples, which are very helpful.
Data Smart is a MUST-read book for everyone who wants to expand their knowledge of data mining.