Unloading multiple files from Redshift to S3

Hi, I am trying to unload multiple tables from Redshift to a particular S3 bucket and I get the following error:

psycopg2.InternalError: Specified unload destination on S3 is not empty. Consider using a different bucket/prefix, manually removing the target files in S3, or using the ALLOWOVERWRITE option.

If I add the 'allowoverwrite' option to the unload function, each table overwrites the previous one, and only the last table ends up unloaded in S3.

This is the code I have:

import psycopg2 

def unload_data(r_conn, aws_iam_role, datastoring_path, region, table_name): 
    unload = '''unload ('select * from {}') 
        to '{}' 
        credentials 'aws_iam_role={}' 
        manifest 
        gzip 
        delimiter ',' addquotes escape parallel off '''.format(table_name, datastoring_path, aws_iam_role) 

    print ("Exporting table to datastoring_path") 
    cur = r_conn.cursor() 
    cur.execute(unload) 
    r_conn.commit() 

def main(): 
    host_rs = 'dataingestion.*********.us******2.redshift.amazonaws.com' 
    port_rs = '5439' 
    database_rs = '******' 
    user_rs = '******' 
    password_rs = '********' 
    rs_tables = [ 'Employee', 'Employe_details' ] 

    iam_role = 'arn:aws:iam::************:role/RedshiftCopyUnload' 
    s3_datastoring_path = 's3://mysamplebuck/' 
    s3_region = 'us_*****_2' 
    print ("Exporting from source") 
    src_conn = psycopg2.connect(host = host_rs, 
           port = port_rs, 
           database = database_rs, 
           user = user_rs, 
           password = password_rs) 
    print ("Connected to RS") 

    for i, tabe in enumerate(rs_tables): 
      if tabe[0] == tabe[-1]: 
       print("No files to read!") 
      unload_data(src_conn, aws_iam_role = iam_role, datastoring_path = s3_datastoring_path, region = s3_region, table_name = rs_tables[i]) 
      print (rs_tables[i]) 


if __name__=="__main__": 
    main()


2017-10-10 Chandana Puppy

You say there is a problem with using the 'allowoverwrite' option, but I didn't really follow what you mean. Could you explain it better or differently?

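Thanks for your reply. If I add 'allowoverwrite' to the unload variable as follows: unload = '''unload ('select * from {}') to '{}' credentials 'aws_iam_role={}' manifest gzip delimiter ',' addquotes escape allowoverwrite'''.format(table_name, datastoring_path, aws_iam_role), every table is written to the S3 bucket but is then overwritten by the next table. In the end, I can only see the last table in the S3 bucket.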

It is complaining that you are saving the data to the same destination.

It would be like copying all the files on your computer into the same directory: files will be overwritten.

You should change your datastoring_path to be different for each table, such as:

.format(table_name, datastoring_path + '/' + table_name, aws_iam_role)
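To make that concrete, here is a minimal sketch of how the per-table destination could be built, reusing the UNLOAD options from the question. The helper names `table_prefix` and `build_unload_sql` are hypothetical, not part of the original code. Note that the question's path already ends in `/`, so the sketch strips the trailing slash before joining to avoid `//` in the key. Ending each prefix with a slash is a choice made here so that each table's files sit under their own "folder".

```python
def table_prefix(base_path, table_name):
    """Return a per-table S3 prefix, e.g. s3://bucket/Employee/ (hypothetical helper)."""
    # Strip any trailing slash first so the join never produces '//'.
    return "{}/{}/".format(base_path.rstrip("/"), table_name)


def build_unload_sql(table_name, base_path, iam_role):
    """Build the question's UNLOAD statement with a distinct destination per table."""
    return (
        "unload ('select * from {table}') "
        "to '{dest}' "
        "credentials 'aws_iam_role={role}' "
        "manifest gzip delimiter ',' addquotes escape parallel off"
    ).format(
        table=table_name,
        dest=table_prefix(base_path, table_name),
        role=iam_role,
    )


# Values copied from the question; the role ARN is masked there and stays masked here.
rs_tables = ["Employee", "Employe_details"]
statements = [
    build_unload_sql(
        t,
        "s3://mysamplebuck/",
        "arn:aws:iam::************:role/RedshiftCopyUnload",
    )
    for t in rs_tables
]

for sql in statements:
    print(sql)
```

Each generated statement can then be passed to `cur.execute(...)` as in the original `unload_data`. Because every table now has its own prefix, the first run should no longer hit the "destination is not empty" error; re-running against the same prefixes would still require `allowoverwrite` or cleaning up the old files.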


2017-10-10 21:57:02

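Thank you very much. I also wanted to add the name for each table, but I am very new to Python coding, so I could not work out how. Your answer gives the exact solution.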
