2012-07-06 37 views
0

BigQuery: Dataset "Not Found" on table load with the REST API

I'm trying to load a CSV file into BigQuery with a Python script modeled on the Python sample code here: https://developers.google.com/bigquery/docs/developers_guide

But I run into the following error when I try to load a table with the REST API:

{'status': '200', 'content-length': '1492', 'expires': 'Fri, 01 Jan 1990 00:00:00 GMT', 'server': 'HTTP Upload Server Built on Jun 14 2012 02:12:09 (1339665129)', 'etag': '"tcivyOj9QvKAbuEJ5MEMf9we85w/-mxYhUDjvvydxcebR8fXI6l_5RQ"', 'pragma': 'no-cache', 'cache-control': 'no-cache, no-store, must-revalidate', 'date': 'Fri, 06 Jul 2012 22:30:55 GMT', 'content-type': 'application/json'} 

{ 
"kind": "bigquery#job", 
"etag": "\"tcivyOj9QvKAbuEJ5MEMf9we85w/-mxYhUDjvvydxcebR8fXI6l_5RQ\"", 
"id": "firespotter.com:firespotter:job_d6b99265278b4c0da9c3033acf39d6b2", 
"selfLink": "https://www.googleapis.com/bigquery/v2/projects/firespotter.com:firespotter/jobs/job_d6b99265278b4c0da9c3033acf39d6b2", 
"jobReference": { 
    "projectId": "firespotter.com:firespotter", 
    "jobId": "job_d6b99265278b4c0da9c3033acf39d6b2" 
}, 
"configuration": { 
    "load": { 
    "schema": { 
    "fields": [ 
    { 
     "name": "date", 
     "type": "STRING" 
    }, 
    { 
     "name": "time", 
     "type": "STRING" 
    }, 
    { 
     "name": "call_uuid", 
     "type": "STRING" 
    }, 
    { 
     "name": "log_level", 
     "type": "STRING" 
    }, 
    { 
     "name": "file_line", 
     "type": "STRING" 
    }, 
    { 
     "name": "message", 
     "type": "STRING" 
    } 
    ] 
    }, 
    "destinationTable": { 
    "projectId": "385479794093", 
    "datasetId": "telephony_logs", 
    "tableId": "table_name" 
    }, 
    "createDisposition": "CREATE_IF_NEEDED", 
    "writeDisposition": "WRITE_TRUNCATE", 
    "encoding": "UTF-8" 
    } 
}, 
"status": { 
    "state": "DONE", 
    "errorResult": { 
    "reason": "notFound", 
    "message": "Not Found: Dataset 385479794093:telephony_logs" 
    }, 
    "errors": [ 
    { 
    "reason": "notFound", 
    "message": "Not Found: Dataset 385479794093:telephony_logs" 
    } 
    ] 
} 
} 
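
Note that the HTTP status above is 200 even though the job failed; the failure is only visible in the body, under status.errorResult. A minimal sketch of checking for that, assuming the response body is the JSON shown above (the helper name job_error is made up for illustration, it is not part of any Google library):

```python
import json


def job_error(content):
    """Return 'reason: message' for a failed job, or None if no error is reported."""
    job = json.loads(content)
    # A failed job still comes back with HTTP 200; the error lives in the body.
    error = job.get('status', {}).get('errorResult')
    if error is None:
        return None
    return '%s: %s' % (error.get('reason'), error.get('message'))


# Trimmed copy of the response shown above.
sample = '''{
  "status": {
    "state": "DONE",
    "errorResult": {
      "reason": "notFound",
      "message": "Not Found: Dataset 385479794093:telephony_logs"
    }
  }
}'''
print(job_error(sample))
```

In the script this could be called on the content value returned by http.request instead of printing the raw response.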

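The project ID listed in the error, "385479794093", is not the project ID I passed in; it is the "project number". The project ID should be "firespotter.com:firespotter", as the dataset listing shows: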

{ 
"kind": "bigquery#datasetList", 
"etag": "\"tcivyOj9QvKAbuEJ5MEMf9we85w/ZMa8z6LKMgWZIqLWh3ti2SsSs4g\"", 
"datasets": [ 
    { 
    "kind": "bigquery#dataset", 
    "id": "firespotter.com:firespotter:telephony_logs", 
    "datasetReference": { 
    "datasetId": "telephony_logs", 
    "projectId": "firespotter.com:firespotter" 
    } 
    } 
] 
} 
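
To make the mismatch concrete, here is a rough sketch that checks a dataset listing like the one above for a given project ID and dataset ID. It only inspects the JSON text; the function name dataset_in_listing is hypothetical, and the sample is a trimmed copy of the listing above.

```python
import json


def dataset_in_listing(content, project_id, dataset_id):
    """Return True if the listing has a dataset with exactly this project and dataset ID."""
    listing = json.loads(content)
    for dataset in listing.get('datasets', []):
        ref = dataset.get('datasetReference', {})
        if ref.get('projectId') == project_id and ref.get('datasetId') == dataset_id:
            return True
    return False


listing = '''{
  "kind": "bigquery#datasetList",
  "datasets": [
    {
      "id": "firespotter.com:firespotter:telephony_logs",
      "datasetReference": {
        "datasetId": "telephony_logs",
        "projectId": "firespotter.com:firespotter"
      }
    }
  ]
}'''
# The string project ID matches; the numeric project number from the error does not.
print(dataset_in_listing(listing, 'firespotter.com:firespotter', 'telephony_logs'))
print(dataset_in_listing(listing, '385479794093', 'telephony_logs'))
```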

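Why does the REST API insist on supplying its own incorrect project ID when I pass the correct value in three different places? Is there another place where I need to pass in or set the project ID?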

For reference, here is the relevant code snippet:

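import sys

import httplib2
from oauth2client.client import OAuth2WebServerFlow
from oauth2client.file import Storage
from oauth2client.tools import run

PROJECT = 'firespotter.com:firespotter'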
DATASET = 'telephony_logs' 


FLOW = OAuth2WebServerFlow(
    client_id='385479794093.apps.googleusercontent.com', 
    client_secret='<a_secret_here>', 
    scope='https://www.googleapis.com/auth/bigquery', 
    user_agent='firespotter-upload-script/1.0') 

def loadTable(http, projectId, datasetId, tableId, file_path, replace=False): 
    url = "https://www.googleapis.com/upload/bigquery/v2/projects/" + projectId + "/jobs" 
    # Create the body of the request, separated by a boundary of xxx 
    mime_data = ('--xxx\n' + 
      'Content-Type: application/json; charset=UTF-8\n' + '\n' + 
      '{\n' + 
      ' "projectId": "' + projectId + '",\n' + 
      ' "configuration": {\n' + 
      '  "load": {\n' + 
      '  "schema": {\n' + 
      '   "fields": [\n' + 
      '   {"name":"date", "type":"STRING"},\n' + 
      '   {"name":"time", "type":"STRING"},\n' + 
      '   {"name":"call_uuid", "type":"STRING"},\n' + 
      '   {"name":"log_level", "type":"STRING"},\n' + 
      '   {"name":"file_line", "type":"STRING"},\n' + 
      '   {"name":"message", "type":"STRING"}\n' + 
      '  ]\n' + 
      '  },\n' + 
      '  "destinationTable": {\n' + 
      '  "projectId": "' + projectId + '",\n' + 
      '  "datasetId": "' + datasetId + '",\n' + 
      '  "tableId": "' + tableId + '"\n' + 
      '  },\n' + 
      '  "createDisposition": "CREATE_IF_NEEDED",\n' + 
      '  "writeDisposition": "' + ('WRITE_TRUNCATE' if replace else 'WRITE_APPEND') + '",\n' + 
      '  "encoding": "UTF-8"\n' + 
      ' }\n' + 
      ' }\n' + 
      '}\n' + 
      '--xxx\n' + 
      'Content-Type: application/octet-stream\n' + 
      '\n') 
    # Append data from the specified file to the request body 
    with open(file_path, 'r') as f:
        f.readline()  # skip the header line
        mime_data += f.read()

    # Signify the end of the body 
    mime_data += ('--xxx--\n') 

    headers = {'Content-Type': 'multipart/related; boundary=xxx'} 
    resp, content = http.request(url, method="POST", body=mime_data, headers=headers) 
    print str(resp) + "\n" 
    print content 

# --- Main ---------------------------------------------- 
def main(argv): 

    csv_path = argv[1]  # first command-line argument: path to the CSV file

    # If the credentials don't exist or are invalid, run the native client 
    # auth flow. The Storage object will ensure that if successful the good 
    # credentials will get written back to a file. 
    storage = Storage('bigquery2_credentials.dat') # Choose a file name to store the credentials. 
    credentials = storage.get() 
    if credentials is None or credentials.invalid:
        credentials = run(FLOW, storage)

    # Create an httplib2.Http object to handle our HTTP requests and authorize it 
    # with our good credentials. 
    http = httplib2.Http() 
    http = credentials.authorize(http) 

    loadTable(http, PROJECT, DATASET, 'table_name', csv_path, replace=True) 

if __name__ == '__main__': 
    main(sys.argv) 
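
As a side note, the hand-concatenated JSON in loadTable is easy to get wrong (a stray comma or quote yields an invalid request). Below is a sketch of building the same multipart body with json.dumps. The field names and the xxx boundary are taken from the script above; the function name build_load_body and its parameters are made up for this example, and it has not been run against the live API.

```python
import json


def build_load_body(project_id, dataset_id, table_id, schema_fields, csv_data,
                    replace=False, boundary='xxx'):
    """Build a multipart/related body: JSON job config first, then the CSV rows."""
    config = {
        'projectId': project_id,
        'configuration': {
            'load': {
                'schema': {'fields': schema_fields},
                'destinationTable': {
                    'projectId': project_id,
                    'datasetId': dataset_id,
                    'tableId': table_id,
                },
                'createDisposition': 'CREATE_IF_NEEDED',
                'writeDisposition': 'WRITE_TRUNCATE' if replace else 'WRITE_APPEND',
                'encoding': 'UTF-8',
            }
        }
    }
    # Make sure the closing boundary starts on its own line.
    if not csv_data.endswith('\n'):
        csv_data += '\n'
    parts = [
        '--%s\n' % boundary,
        'Content-Type: application/json; charset=UTF-8\n\n',
        json.dumps(config) + '\n',
        '--%s\n' % boundary,
        'Content-Type: application/octet-stream\n\n',
        csv_data,
        '--%s--\n' % boundary,
    ]
    return ''.join(parts)


fields = [{'name': 'date', 'type': 'STRING'}, {'name': 'message', 'type': 'STRING'}]
body = build_load_body('firespotter.com:firespotter', 'telephony_logs',
                       'table_name', fields, '2012-07-06,hello', replace=True)
print(body)
```

The result would be passed as the body argument of http.request with the same Content-Type header as in loadTable.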

Answers

1

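Did you recently set the project ID to firespotter.com:firespotter? If the dataset was created before the project was named, there will be a mismatch between the old project ID and the new one. There is an automated system that updates project IDs, but it may not have run yet, or it may be having a problem (I'm on vacation right now, so I can't check). Hopefully it will just work if you try again after a while. If not, please let us know.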

+0

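Yes! I named the project that same day (because I thought it was required). I tried for several hours on Friday without success, but the script is now working with the "firespotter.com:firespotter" projectId. Thanks, Jordan! – 2012-07-08 13:30:07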

0

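There are a few questions here:

  • Why did my load job fail? Just to check, is that the entire request you sent? If so, it looks like there is no data to load, i.e. sourceUris is empty. If that is the case, that's the problem, and we are apparently returning the world's worst error message.

  • Why the numeric project ID? BigQuery uses the project name and its associated numeric ID interchangeably, so all you are seeing is that we tend to convert project names to numeric IDs. Just to confirm: if you visit the Google APIs Console and look up your project, do you see the same numeric ID in the URL?

  • Why is the project ID specified in multiple places? First, it looks like you specified the project ID as a top-level property of the job; that shouldn't be necessary. (I suspect it simply overrides whatever project ID you specified in the job reference.) That leaves two places: one as part of the job reference and one as part of the table reference. These actually mean two different things. The one in the job specifies which project you are inserting the job into, i.e. who pays for the job, while the one in the table specifies which project the resulting table lives in, i.e. who owns the resulting data. In general these will be the same, but the API allows them to differ. (This is useful if, for example, you build a service that needs to insert data into tables that are ultimately owned by your customers.)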
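
To illustrate the last point, here is a minimal sketch of a load request where the job's project and the table's project differ. The upload URL pattern and the destinationTable fields come from the question's script; the project names service-project and customer-project and the helper load_job_request are purely hypothetical, and the sketch only builds the request, it does not send it.

```python
def load_job_request(job_project, table_project, dataset_id, table_id):
    """Return (url, job_config) for a load job.

    job_project:   project the job is inserted into (who pays for the job).
    table_project: project that will own the resulting table.
    """
    url = ('https://www.googleapis.com/upload/bigquery/v2/projects/%s/jobs'
           % job_project)
    job_config = {
        'configuration': {
            'load': {
                'destinationTable': {
                    'projectId': table_project,
                    'datasetId': dataset_id,
                    'tableId': table_id,
                },
            }
        }
    }
    return url, job_config


# Hypothetical setup: a service runs the job, a customer owns the data.
url, job_config = load_job_request('service-project', 'customer-project',
                                   'telephony_logs', 'table_name')
print(url)
print(job_config['configuration']['load']['destinationTable']['projectId'])
```

In the common case both arguments are simply the same project ID, as in the question.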

+0

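I am appending a CSV file to the MIME data. I was able to upload the same CSV file manually through the browser tool, and in the result it returned, the dataset is listed as "firespotter.com:firespotter:telephony_logs". That seems to be the difference between the two. – 2012-07-07 23:54:08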

+0

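Also, yes, when I visit the Google APIs Console I do see 385479794093 (the project number) in the URL. If I list the datasets using either the project number or the string ID, I see the dataset as firespotter.com:firespotter:telephony_logs. – 2012-07-07 23:56:25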

+0

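I think this is a bug in BigQuery. To confirm, I created another project but did not name it. Then I entered its project number ('725870756519') as the projectId in the same script. It worked. – 2012-07-08 00:01:19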