2011-03-12

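I have been asked to build an ETL-style application that transfers information from one data source to another. For now I have decided to go with a three-tier architecture, but I would like to know more about best practices and about the lifecycle described on this Wikipedia page (ETL architecture):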

http://en.wikipedia.org/wiki/Extract,_transform,_load

Four-layered approach for ETL architecture design

    * Functional layer: Core functional ETL processing (extract, transform, and load).
    * Operational management layer: Job-stream definition and management, parameters, scheduling, monitoring, communication and alerting.
    * Audit, balance and control (ABC) layer: Job-execution statistics, balancing and controls, rejects- and error-handling, codes management.
    * Utility layer: Common components supporting all other layers.
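As a rough illustration of how the four layers might map onto code, here is a minimal Java sketch. The class names (`EtlUtil`, `AuditLog`, `JobRunner`) are hypothetical and come from neither Wikipedia nor any ETL library; scheduling and alerting from the operational layer are left out to keep it short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Utility layer (hypothetical): shared helpers the other layers can call.
final class EtlUtil {
    private EtlUtil() {}

    static String normalize(String s) {
        return s == null ? "" : s.trim().toLowerCase();
    }
}

// Audit, balance and control (ABC) layer: statistics and rejects for one job run.
final class AuditLog {
    int extracted;
    int loaded;
    final List<String> rejects = new ArrayList<>();

    // Balancing control: every extracted row must be either loaded or rejected.
    boolean balanced() {
        return extracted == loaded + rejects.size();
    }
}

// Operational management layer: runs one job and returns its audit record.
// The loop body is the functional layer: extract a row, transform it, load it.
final class JobRunner {
    private JobRunner() {}

    static AuditLog run(List<String> source, Function<String, String> transform, List<String> target) {
        AuditLog audit = new AuditLog();
        for (String row : source) {                // extract
            audit.extracted++;
            try {
                target.add(transform.apply(row)); // transform, then load
                audit.loaded++;
            } catch (RuntimeException e) {         // reject handling
                audit.rejects.add(row + ": " + e.getMessage());
            }
        }
        return audit;
    }
}
```

Keeping the audit record separate from the transform means a failing row is counted and rejected instead of aborting the whole job, which is what makes the balancing check possible.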

Real-life ETL cycle

The typical real-life ETL cycle consists of the following execution steps: 

    1. Cycle initiation 
    2. Build reference data 
    3. Extract (from sources) 
    4. Validate 
    5. Transform (clean, apply business rules, check for data integrity, create aggregates or disaggregates) 
    6. Stage (load into staging tables, if used) 
    7. Audit reports (for example, on compliance with business rules. Also, in case of failure, helps to diagnose/repair) 
    8. Publish (to target tables) 
    9. Archive 
    10. Clean up 
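One way to read the ten steps above is as a fixed sequence that a driver walks through in order. The Java sketch below assumes hypothetical names (`CycleStep`, `EtlCycle`) and makes two design choices of its own that the list does not prescribe: a step with no registered action is a no-op, and clean-up still runs after a failure.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// The ten steps of the real-life ETL cycle, declared in execution order.
enum CycleStep {
    CYCLE_INITIATION, BUILD_REFERENCE_DATA, EXTRACT, VALIDATE, TRANSFORM,
    STAGE, AUDIT_REPORTS, PUBLISH, ARCHIVE, CLEAN_UP
}

final class EtlCycle {
    private EtlCycle() {}

    // Runs the registered actions in enum order and returns the steps that completed.
    // Assumption: missing actions are no-ops; CLEAN_UP is attempted even after a failure.
    static List<CycleStep> run(Map<CycleStep, Runnable> actions) {
        List<CycleStep> done = new ArrayList<>();
        try {
            for (CycleStep step : CycleStep.values()) {
                if (step == CycleStep.CLEAN_UP) {
                    break; // handled below so it runs on success and failure alike
                }
                actions.getOrDefault(step, () -> {}).run();
                done.add(step);
            }
        } catch (RuntimeException e) {
            System.err.println("ETL cycle failed after " + done + ": " + e.getMessage());
        }
        actions.getOrDefault(CycleStep.CLEAN_UP, () -> {}).run();
        done.add(CycleStep.CLEAN_UP);
        return done;
    }
}
```

The returned list of completed steps is the kind of information the audit-reports step is meant to surface when diagnosing a failed run.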

Answer


I don't know what your situation or requirements are, but you are most likely overthinking the problem.

The name alone is the "architecture":

  • Extract
  • Transform
  • Load

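Dumping a database table to CSV could be considered the "ET", and loading the CSV is the "L". Most ETL problems aren't complicated. Beyond that, you should grab any of the one or two million ETL and ESB packages already available for Java, free and commercial, from libraries to whole processing systems, and simply adopt your favorite one.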
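To make the "dump to CSV, then load the CSV" point concrete, here is a minimal Java sketch. It assumes an in-memory list of rows as a stand-in for the database table (a real version would read and write through JDBC), and `CsvEtl` is a hypothetical name. The CSV handling is deliberately naive: there is no quoting, so fields containing commas, quotes, or line breaks are rejected rather than escaped.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: an in-memory list of rows stands in for the database table.
final class CsvEtl {
    private CsvEtl() {}

    // "E" + "T": dump the table rows to CSV text, trimming every field on the way out.
    static String toCsv(List<String[]> rows) {
        StringBuilder csv = new StringBuilder();
        for (String[] row : rows) {
            List<String> fields = new ArrayList<>();
            for (String field : row) {
                String f = field.trim();
                // Naive CSV without quoting: reject values that would break the format.
                if (f.contains(",") || f.contains("\"") || f.contains("\n")) {
                    throw new IllegalArgumentException("field needs quoting: " + f);
                }
                fields.add(f);
            }
            csv.append(String.join(",", fields)).append('\n');
        }
        return csv.toString();
    }

    // "L": parse the CSV back into rows; a real loader would INSERT them into the target.
    static List<String[]> fromCsv(String csv) {
        List<String[]> rows = new ArrayList<>();
        csv.lines().forEach(line -> rows.add(line.split(",", -1)));
        return rows;
    }
}
```

For anything beyond a toy, the answer's advice applies: an existing CSV or ETL library will handle quoting, encodings, and error reporting that this sketch ignores.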

Get a whiteboard, string some bubbles together with lines, and turn that into code.