2017-10-19 68 views
U-SQL: "Array cannot be null" when using the Avro extractor

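I am using EventHub with Capture to Blob storage. When I try to extract the captured files with the extractor from the Avro samples, I get an "Array cannot be null" error.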

This is my U-SQL script:

REFERENCE ASSEMBLY [Newtonsoft.Json]; 
REFERENCE ASSEMBLY [log4net]; 
REFERENCE ASSEMBLY [Avro]; 
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats]; 


DECLARE @ABI_DATE string = "2017/10/17/"; //replace by ADF pipeline 
DECLARE @input_file string = "wasb://[email protected]/namespace/eh/{*}/" + @ABI_DATE +"{*}/{*}/{*}"; 
DECLARE @output_file string = @"/output/" + @ABI_DATE + "extract.csv"; 


@rs = 
EXTRACT 
     SequenceNumber long 
     ,EnqueuedTimeUtc string 
     ,Body byte[] 
FROM @input_file 
USING new Microsoft.Analytics.Samples.Formats.ApacheAvro.AvroExtractor(@" 
    { 
     ""type"":""record"", 
     ""name"":""EventData"", 
     ""namespace"":""Microsoft.ServiceBus.Messaging"", 
     ""fields"":[ 
      {""name"":""SequenceNumber"",""type"":""long""}, 
      {""name"":""Offset"",""type"":""string""}, 
      {""name"":""EnqueuedTimeUtc"",""type"":""string""}, 
      {""name"":""SystemProperties"",""type"":{""type"":""map"",""values"":[""long"",""double"",""string"",""bytes""]}}, 
      {""name"":""Properties"",""type"":{""type"":""map"",""values"":[""long"",""double"",""string"",""bytes""]}}, 
      {""name"":""Body"",""type"":[""null"",""bytes""]} 
     ] 
    } 
"); 

@cnt = 
SELECT 
    SequenceNumber 
    ,Encoding.UTF8.GetString(Body) AS Json //THIS LINE BREAKS !!!! 
    ,EnqueuedTimeUtc 
FROM @rs; 

OUTPUT @cnt TO @output_file USING Outputters.Text(); 

If I run the same extraction with the Body field commented out, it works as expected.

This is the error:

Inner exception from user expression: Array cannot be null. Parameter name: bytes Current row dump: SequenceNumber: 4622 EnqueuedTimeUtc: NULL Body: NULL

Error while evaluating expression Encoding.UTF8.GetString(Body)

Answer

Florian Mander gave me this explanation:

the extractor works correctly, you are just passing null values (intentionally, because it's in the schema) in a method (Encoding.GetString) that doesn't accept null as input. In your latest solution you will lose all the records that don't have a body, though. That's a non technical decision if this is fine or not.

So this is how to fix it (using a WHERE clause):

@cnt = 
SELECT 
    SequenceNumber 
    ,Encoding.UTF8.GetString(Body) AS Json 
    ,EnqueuedTimeUtc 
FROM @rs 
WHERE Body != null;
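
If records without a body need to be kept, one possible alternative is to guard the call instead of filtering the rows out. The following is a minimal sketch, assuming the same @rs rowset produced by the EXTRACT statement in the question and relying on U-SQL column expressions being C# expressions. It has not been run against the original data.

```
@cnt =
    SELECT
        SequenceNumber
        // Body is declared as ["null","bytes"] in the Avro schema, so it may be null.
        // Decode only when a payload is present; otherwise emit a null string.
        ,(Body == null ? (string) null : Encoding.UTF8.GetString(Body)) AS Json
        ,EnqueuedTimeUtc
    FROM @rs;
```

With this variant, rows that have no Body stay in the output with a null Json column rather than being dropped. Whether that is preferable to the WHERE filter depends on what downstream consumers expect.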