How to consolidate data from different data sources
As a data architect for years, I tends to take this for granted, until someone came to me and asking me some methodology (or just simply HOW to do it) on consolidating data from different databases into one database.
For some unknown reason people tends to think different databases will have the same data structure. For example, if database A has a customer table, and database B also has a customer table, somehow people automatically assume they have the same columns and contain the same information. This assumption can lead to serious under-estimation on a project, and it happens often.
Although each situation is different, there are basic principles to follow:
1. Find a good data modeler to design a central data model for you. Is it a database for management reports, for transaction, how many departments are going to use it? What is the purpose of central database?
2. Set up staging area (a separate set of tables in the database) to store interim data from all sources. In a more complex system, multiple staging areas maybe needed.
3. Mapping (map each field from source to the staging table’s field) and generate files according to defined format from all sources, load them into staging area
4. Mapping from staging into central database. Mapping exercises are usually the most time consuming.
5. Identify data cleansing process. This means that if there are garbage data in the source, the data need to be “clean” before bringing over to the central database. Example: Numbers were being stored as Customer Name in the source data.
6. Get a good team of SQL developers to implement ETL (Extract, Transform and Load) scripts.
Of course the actual work and resources involved are much more complex depending on different situations. However, the basic building-block to consolidate data from different sources is outline above.

