One of the most asked questions about data warehousing is the comparison between two distinct structures: data mart and a data warehouse.
These two structuring concepts are linked to each other, but a wrong idea of these definitions and differences can lead to a poorly designed system and a low integration level between the stored information.
The contruction, architecture and processing of a data warehouse is intrinsically connected to these data structures.
So we’ve decided to compile a resource to fully explain you these differences and improve your skills.
A data mart is one piece of a data warehouse where all the information is related to a specific business area. Therefore it is considered a subset of all the data stored in that particular database, since all data marts together create a data warehouse.
This idea of subsetting the information can be easily extrapolated to different departments in a company or distinct business areas with lots of data related to it. They are all related to the same company but divided by usability into several data marts.
So a data mart is some subset of data specific to some user types tasks, creating a view in a format that makes information easier to use and analyse by the end users of your system.
With this description, the idea of data mart should be clear to you. To support the previous definitions and to allow you viewing it, we provide an example of this kind of data structures.
As you can see, it is built on a star schema structure which organizes data in a specific way. The central fact table references several dimensions or attributes.
Since this is a data mart, all the information contained in this data structure is only relative to sales and its dependencies.
This example is very simple and for demonstration purposes only. In a real application, if you need to build a sales data mart, you have far more dependencies to store all the related data.
Therefore using a data mart based solution to store information provides several benefits to the system, enhances its functionality and user experience.
The primary advantages are:
- Data Segregation: Each box of information is developed without changing the other ones. This boosts information security and the quality of data.
- Easier Access to Information: These data structures provide a easiest way of interpret the information stored on the database
- Faster Response: Derived from the adopted structure
- Simple queries: Based on the structure and size of the data
- Subject full detailed data: Might also provide summarization of the information
- Specific to User Needs: This set of data is focused on the end user needs
- Easy to Create and Mantain
These data structures are built on information databases, taking advantages of some specific schema structures. Data marts are designed according to star schema or snowflake schema methodologies.
Therefore, on these types of schema design, data is represented by facts which are stored on fact tables. On the other hand, the definitions and attributes of that data is represented by dimensions on dimension tables. The organization between these two elements defines which schema structure is used.
Now that we know what a data mart is, let’s compare it to a data warehouse!
This comparison is divided on several key points that explain the differences, and some similarities, between them.
We also provide you this simple diagram, which compares a data mart and a data warehouse in terms of size and purpose.
The first, and most obvious difference is the information scope each one stores. On one hand, data warehouses save all kinds of data related to system. On the other hand, data marts just store specific subject information, becoming much more focused on these functionalities.
Based on the previous premise we can say that a data warehouse is usually much bigger than data marts, because it keeps a lot more data.
As you may know, a data warehouse usually integrates several sources of data in order to feed its database and the system’s needs. In opposite, a data mart has a lot less integration to do, since its data is very specific.
Creating a data warehouse is way more difficult and time consuming than building a data mart. Building all the structure a relationships between data its a long and very important step. Plus you need to think and analyse how you will integrate all of your information sources. Since data marts are smaller and subject oriented, these actions tend to be much simpler.
However a well built data warehouse can support large systems for the long run. In the other hand a good data mart is only limited to its activity area.
Like creation, the management of data warehouses is far more complex than data marts. For the same reasons stated above, it is obvious that when you have a lot more data, relationships, processes to manage, it becomes a harder task.
In overall, in terms of cost, data marts are cheaper than data warehouse. To build and mantain a data warehouse you need significantly more physical resources like servers, disk space, memory and cpu. Due to the complexity of the systems, a data mart requires less time to build and operate. So, since time is money, we can easilly reach to our conclusion.
The performance of a system always depends on how it is built, the infraestruture which supports it, the processes, the number of users, etc. However, due to some previous conclusions, is safe to say that usually a data mart is more performant than a data warehouse because of the inherited complexity.