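I am pushing data from Google Dataflow to Google BigQuery. I have TableRow objects holding the data, and one of the columns in the TableRow contains an array of strings. Does Google BigQuery support ARRAY<STRING>?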

I found here that Google BigQuery supports an Array column type, so I tried to create a table with a column of type ARRAY<STRING>. But I got the error below:

com.google.api.client.googleapis.json.GoogleJsonResponseException: 400 Bad Request 
{ 
    "code" : 400, 
    "errors" : [ { 
    "domain" : "global", 
    "message" : "Invalid value for: ARRAY<STRING> is not a valid value", 
    "reason" : "invalid" 
    } ], 
    "message" : "Invalid value for: ARRAY<STRING> is not a valid value" 
} 
com.google.cloud.dataflow.sdk.util.UserCodeException.wrapIf(UserCodeException.java:47) 
com.google.cloud.dataflow.sdk.util.DoFnRunnerBase.wrapUserCodeException(DoFnRunnerBase.java:369) 
com.google.cloud.dataflow.sdk.util.DoFnRunnerBase.finishBundle(DoFnRunnerBase.java:162) 
com.google.cloud.dataflow.sdk.runners.worker.SimpleParDoFn.finishBundle(SimpleParDoFn.java:194) 
com.google.cloud.dataflow.sdk.runners.worker.ForwardingParDoFn.finishBundle(ForwardingParDoFn.java:47) 

Here is the code I use to write the values to BigQuery:

.apply(BigQueryIO.Write.named("Write enriched data") 
       .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED) 
       .withSchema(getSchema()) 
       .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND) 
       .to("table_name")); 

And here is the code that builds the schema:

private static TableSchema getSchema() { 
    List<TableFieldSchema> fields = new ArrayList<>(); 

    fields.add(new TableFieldSchema().setName("column1").setType("STRING")); 
    fields.add(new TableFieldSchema().setName("column2").setType("STRING")); 
    fields.add(new TableFieldSchema().setName("array_column").setType("ARRAY<STRING>")); 

    return new TableSchema().setFields(fields); 
} 

How can I insert an array of strings into a BigQuery table?

Answer

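To define an ARRAY<STRING> in BigQuery, I set the field's type to 'STRING' and its mode to 'REPEATED'. "ARRAY<STRING>" itself is not accepted as a value for the type, which is what the 400 error is reporting.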

In Python, for example, it is defined as field = SchemaField(name='field_1', field_type='STRING', mode='REPEATED')

For the Java client, I can see that you have the same options: you can define the TYPE as STRING and the MODE as REPEATED.
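
As a rough sketch of what that could look like for the getSchema() method in the question: the only change to the schema is that array_column is declared with type STRING and mode REPEATED. The class name RepeatedFieldSchema and the exampleRow() helper are hypothetical and exist only for illustration. The helper assumes that a repeated field is supplied on the TableRow as a List of values. I have not run this against the asker's pipeline.

```java
import com.google.api.services.bigquery.model.TableFieldSchema;
import com.google.api.services.bigquery.model.TableRow;
import com.google.api.services.bigquery.model.TableSchema;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical wrapper class, used only to keep the sketch self-contained.
public class RepeatedFieldSchema {

    // Same schema as in the question, except that the array column is
    // declared as type STRING with mode REPEATED rather than "ARRAY<STRING>".
    static TableSchema getSchema() {
        List<TableFieldSchema> fields = new ArrayList<>();

        fields.add(new TableFieldSchema().setName("column1").setType("STRING"));
        fields.add(new TableFieldSchema().setName("column2").setType("STRING"));
        fields.add(new TableFieldSchema()
                .setName("array_column")
                .setType("STRING")
                .setMode("REPEATED"));

        return new TableSchema().setFields(fields);
    }

    // Hypothetical helper: builds one row, assuming the repeated field's
    // value is passed as a List<String>.
    static TableRow exampleRow() {
        return new TableRow()
                .set("column1", "a")
                .set("column2", "b")
                .set("array_column", Arrays.asList("x", "y", "z"));
    }
}
```

The BigQueryIO.Write call from the question should not need to change, since it already picks up the schema through withSchema(getSchema()).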