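I have been asked to build an ETL-style application that transfers information from one data source to another. So far I have decided on a three-tier architecture, but I would like to learn more about best practices and about the life cycle described on this Wikipedia page on ETL architecture: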
http://en.wikipedia.org/wiki/Extract,_transform,_load
Four-layered approach for ETL architecture design
* Functional layer: Core functional ETL processing (extract, transform, and load).
* Operational management layer: Job-stream definition and management, parameters, scheduling, monitoring, communication and alerting.
* Audit, balance and control (ABC) layer: Job-execution statistics, balancing and controls, rejects- and error-handling, codes management.
* Utility layer: Common components supporting all other layers.
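To make the four layers above more concrete, here is a minimal sketch of how they might map onto code. All names in it (`AuditRecord`, `run_etl`, `Job`, the `id`/`name` row fields) are hypothetical and chosen only for illustration; a real application would likely split these into separate modules.

```python
import logging
import time
from dataclasses import dataclass, field

# Utility layer: common components shared by all other layers.
log = logging.getLogger("etl")


# Audit, balance and control (ABC) layer: statistics and reject handling.
@dataclass
class AuditRecord:
    job_name: str
    rows_in: int = 0
    rows_out: int = 0
    rejects: list = field(default_factory=list)
    seconds: float = 0.0

    def balanced(self):
        # Every row read must be either loaded or rejected.
        return self.rows_in == self.rows_out + len(self.rejects)


# Functional layer: the core extract/transform/load processing.
def run_etl(rows, audit):
    loaded = []
    for row in rows:
        audit.rows_in += 1
        try:
            loaded.append({"id": int(row["id"]), "name": row["name"].strip()})
        except (KeyError, ValueError, TypeError, AttributeError) as exc:
            # Bad rows are recorded for the audit layer, not silently dropped.
            audit.rejects.append((row, repr(exc)))
    audit.rows_out = len(loaded)
    return loaded


# Operational management layer: job definition, parameters, monitoring.
@dataclass
class Job:
    name: str
    params: dict = field(default_factory=dict)  # not used in this sketch

    def run(self, rows):
        audit = AuditRecord(self.name)
        start = time.perf_counter()
        loaded = run_etl(rows, audit)
        audit.seconds = time.perf_counter() - start
        if audit.rejects:
            log.warning("%s: %d rows rejected", self.name, len(audit.rejects))
        return loaded, audit
```

Calling `Job("demo").run(rows)` returns the loaded rows together with an `AuditRecord`, so the caller can check `audit.balanced()` before trusting the result.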
Real-life ETL cycle
The typical real-life ETL cycle consists of the following execution steps:
1. Cycle initiation
2. Build reference data
3. Extract (from sources)
4. Validate
5. Transform (clean, apply business rules, check for data integrity, create aggregates or disaggregates)
6. Stage (load into staging tables, if used)
7. Audit reports (for example, on compliance with business rules. Also, in case of failure, helps to diagnose/repair)
8. Publish (to target tables)
9. Archive
10. Clean up
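The ten steps above could be sketched as a sequence of small functions that share one context object. This is only an illustration under assumed names: `CycleContext`, the in-memory lists standing in for the source, staging, and target tables, and the `country`/`amount` fields are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CycleContext:
    """State shared by the steps of one ETL cycle (hypothetical structure)."""
    source: list
    target: list = field(default_factory=list)
    reference: dict = field(default_factory=dict)
    extracted: list = field(default_factory=list)
    rejected: list = field(default_factory=list)
    staged: list = field(default_factory=list)
    archived: list = field(default_factory=list)
    audit: dict = field(default_factory=dict)


def build_reference_data(ctx):
    ctx.reference = {"US": "United States", "DE": "Germany"}


def extract(ctx):
    # Copy the rows so later steps never mutate the source.
    ctx.extracted = [dict(row) for row in ctx.source]


def validate(ctx):
    valid = []
    for row in ctx.extracted:
        if row.get("country") in ctx.reference and row.get("amount") is not None:
            valid.append(row)
        else:
            ctx.rejected.append(row)
    ctx.extracted = valid


def transform(ctx):
    for row in ctx.extracted:
        row["country"] = ctx.reference[row["country"]]
        row["amount"] = round(float(row["amount"]), 2)


def stage(ctx):
    ctx.staged = list(ctx.extracted)


def audit_report(ctx):
    ctx.audit = {
        "read": len(ctx.source),
        "staged": len(ctx.staged),
        "rejected": len(ctx.rejected),
    }


def publish(ctx):
    ctx.target.extend(ctx.staged)


def archive(ctx):
    ctx.archived.append(list(ctx.staged))


def clean_up(ctx):
    ctx.extracted.clear()
    ctx.staged.clear()


STEPS = [build_reference_data, extract, validate, transform,
         stage, audit_report, publish, archive, clean_up]


def run_cycle(source):
    ctx = CycleContext(source=source)  # step 1: cycle initiation
    for step in STEPS:                 # steps 2 to 10, in order
        step(ctx)
    return ctx
```

Keeping each step as its own function makes it easy to log, time, or retry steps individually, which is where the operational and audit layers would hook in.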