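I know I can set parallelism in the cfg, but is there a way to set it per task, or at least per DAG? Ideally I would specify the degree of parallelism for each task, something like this: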
DAG1 =
    task_id: 'download_sftp'
    parallelism: 4  # I am fine with downloading multiple files at once
    task_id: 'process_dimensions'
    parallelism: 1  # I want the dimensions processed one at a time to prevent conflicts with my 'serial' keys
    task_id: 'process_facts'
    parallelism: 4  # It is fine to process multiple tables at once since there will be no conflicts
DAG2 (separate file) =
    task_id: 'bcp_query'
    parallelism: 6  # I can run separate BCP commands at once to download data quickly since each returns very little data
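Is there more I can read about pools? Do I just name a pool with a string and Airflow magically handles everything? I have not worked in this area before, so I want to make sure I understand what is happening. One thing I want to avoid is two tasks trying to update a dimension table (or something similar) at the same time and causing a conflict. I am using Postgres and psycopg2's copy_expert to load my data. So for dimension table updates I want only one to run at a time per source, but for the SFTP downloads and fact table loads I can run several at once. – trench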
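To make the pool idea concrete, here is a minimal sketch of what a named pool with a fixed number of slots does. It is a toy model in plain Python, not Airflow's implementation: `SlotPool`, `fake_task`, and `run_all` are hypothetical names, and the slot counts are just the numbers from the pseudo-config in the question. In Airflow itself, an operator takes a `pool` argument naming a pool, and that pool's slot count caps how many task instances assigned to it run at the same time.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical pool names and slot counts, copied from the question's numbers.
POOL_SLOTS = {"download_sftp": 4, "process_dimensions": 1, "process_facts": 4}


class SlotPool:
    """Toy model of a named pool: at most `slots` tasks hold a slot at once."""

    def __init__(self, name, slots):
        self.name = name
        self._sem = threading.Semaphore(slots)
        self._lock = threading.Lock()
        self._running = 0
        self.peak = 0  # highest number of tasks observed running together

    def run(self, fn, *args):
        with self._sem:  # blocks until a slot is free
            with self._lock:
                self._running += 1
                self.peak = max(self.peak, self._running)
            try:
                return fn(*args)
            finally:
                with self._lock:
                    self._running -= 1


def fake_task(item):
    time.sleep(0.01)  # stand-in for the real download or table load
    return item


def run_all(pool, items, workers=8):
    # Submit more work than the pool has slots; the pool throttles it.
    with ThreadPoolExecutor(max_workers=workers) as ex:
        return list(ex.map(lambda i: pool.run(fake_task, i), items))


pools = {name: SlotPool(name, slots) for name, slots in POOL_SLOTS.items()}
results = {name: run_all(pool, range(12)) for name, pool in pools.items()}
```

With one slot, the `process_dimensions` work is serialized even though eight workers are available, which is the behaviour wanted for the dimension loads. The four-slot pools let up to four items run together.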