2017-08-24

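Avro schema is not compatible if the field order changes

Scenario: a client serializes a POJO with Avro's ReflectDatumWriter and writes the records (GenericRecord) to a file. The schema obtained through reflection looks like this (note the field order A, B, D, C):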

{ 
"namespace": "storage.management.example.schema", 

"type": "record", 
"doc": "Example schema for testing", 
"name": "Event", 
"fields": [ 
    .... 
    .... 
    { "name": "A", "type": "string" }, 
    { "name": "B", "type": "string" }, 
    { "name": "D", "type": "string" }, 
    { "name": "C", "type": "string" }, 
    .... 
    .... 
] 
} 

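An agent reads the file and uses a default schema (note the field order A, B, C, D) to deserialize a subset of each record. The client guarantees that these fields are present.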

{ 
"namespace": "storage.management.example.schema", 
"type": "record", 
"doc": "Example schema for testing", 
"name": "Event", 
"fields": [ 
    { "name": "A", "type": "string" }, 
    { "name": "B", "type": "string" }, 
    { "name": "C", "type": "string" }, 
    { "name": "D", "type": "string" } 
] 
} 

Problem: deserializing with the subset schema above results in the following exception:

Caused by: java.io.IOException: Invalid int encoding 
at org.apache.avro.io.BinaryDecoder.readInt(BinaryDecoder.java:145) 
at org.apache.avro.io.BinaryDecoder.readString(BinaryDecoder.java:259) 
at org.apache.avro.io.ResolvingDecoder.readString(ResolvingDecoder.java:201) 
at org.apache.avro.generic.GenericDatumReader.readString(GenericDatumReader.java:430) 
at org.apache.avro.generic.GenericDatumReader.readString(GenericDatumReader.java:422) 
at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:180) 
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:152) 
at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:240) 
at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:230) 

However, if the subset schema also lists the fields in the order A, B, D, C (the same as the client schema), deserialization succeeds.

Is this behavior expected? I thought Avro relied only on field names to build a record, not on field order.
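For context on that assumption: in Avro's binary encoding, a record is written as its field values in the order the writer schema declares them, with no field names in the payload. Matching by name is therefore only possible when the reader also knows the writer schema. The toy model below is a plain-Java illustration of that idea and not Avro's actual code; the class and method names are hypothetical, and the "payload" is simply a list of strings rather than real Avro bytes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model: values are stored in writer order, without names.
public class FieldOrderToy {

    // "Serialize": keep only the values, in the writer's declared field order.
    public static List<String> encode(List<String> writerFields, Map<String, String> record) {
        List<String> payload = new ArrayList<>();
        for (String name : writerFields) {
            payload.add(record.get(name));
        }
        return payload;
    }

    // Read by position, as if the data had been written in the reader's order.
    public static Map<String, String> decodePositional(List<String> readerFields,
                                                       List<String> payload) {
        Map<String, String> out = new LinkedHashMap<>();
        for (int i = 0; i < readerFields.size(); i++) {
            out.put(readerFields.get(i), payload.get(i));
        }
        return out;
    }

    // Use the writer's order to name each value, then pick the reader's fields by name.
    public static Map<String, String> decodeResolved(List<String> writerFields,
                                                     List<String> readerFields,
                                                     List<String> payload) {
        Map<String, String> byName = new HashMap<>();
        for (int i = 0; i < writerFields.size(); i++) {
            byName.put(writerFields.get(i), payload.get(i));
        }
        Map<String, String> out = new LinkedHashMap<>();
        for (String name : readerFields) {
            out.put(name, byName.get(name));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> writerFields = Arrays.asList("A", "B", "D", "C");
        List<String> readerFields = Arrays.asList("A", "B", "C", "D");

        Map<String, String> record = new HashMap<>();
        for (String name : writerFields) {
            record.put(name, name.toLowerCase()); // A -> "a", B -> "b", ...
        }
        List<String> payload = encode(writerFields, record);

        // Without the writer's order, C and D are swapped: {A=a, B=b, C=d, D=c}
        System.out.println(decodePositional(readerFields, payload));
        // With the writer's order, fields are matched by name: {A=a, B=b, C=c, D=d}
        System.out.println(decodeResolved(writerFields, readerFields, payload));
    }
}
```

In the question's case the writer schema also contains additional fields, which would shift the positions further; that is presumably why the real decoder fails with "Invalid int encoding" instead of merely swapping two values.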

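Is there a fix for this? Different clients may use different field orders, and I cannot enforce one because the schema is generated through reflection.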
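One possible direction, sketched under assumptions: Avro's schema resolution matches record fields by name when the reader is given both the writer's schema and its own. The sketch below uses GenericDatumWriter rather than the ReflectDatumWriter from the question, a hypothetical extra field "X" standing in for the elided fields, and made-up helper names (eventSchema, roundTrip). It also assumes the writer schema can be made available to the reading side, which for raw binary data has to happen out of band.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;

public class ResolveByNameSketch {

    // Hypothetical helper: an Event record whose string fields appear in the given order.
    public static Schema eventSchema(String... fieldNames) {
        StringBuilder fields = new StringBuilder();
        for (String name : fieldNames) {
            if (fields.length() > 0) {
                fields.append(',');
            }
            fields.append("{\"name\":\"").append(name).append("\",\"type\":\"string\"}");
        }
        String json = "{\"namespace\":\"storage.management.example.schema\","
                + "\"type\":\"record\",\"name\":\"Event\",\"fields\":[" + fields + "]}";
        return new Schema.Parser().parse(json);
    }

    // Writes one record with writerSchema, then reads it back through readerSchema.
    public static GenericRecord roundTrip(Schema writerSchema, Schema readerSchema)
            throws IOException {
        GenericRecord record = new GenericData.Record(writerSchema);
        for (Schema.Field field : writerSchema.getFields()) {
            record.put(field.name(), field.name().toLowerCase()); // sample values only
        }

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(writerSchema).write(record, encoder);
        encoder.flush();

        // Passing BOTH schemas lets Avro resolve fields by name and skip unknown ones.
        BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(out.toByteArray(), null);
        return new GenericDatumReader<GenericRecord>(writerSchema, readerSchema)
                .read(null, decoder);
    }

    public static void main(String[] args) throws IOException {
        Schema writerSchema = eventSchema("X", "A", "B", "D", "C"); // client order
        Schema readerSchema = eventSchema("A", "B", "C", "D");      // agent subset
        System.out.println(roundTrip(writerSchema, readerSchema));
    }
}
```

If the reader is constructed with only the subset schema, it treats that schema as the writer schema too, so the bytes are decoded in the wrong order.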


Are you using BinaryDecoder? If so, try using DataFileReader instead: 'import org.apache.avro.file.DataFileReader'
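A minimal sketch of what that suggestion could look like, assuming the client writes an Avro container file with DataFileWriter (the question does not say how the file is produced). A container file stores the writer schema in its header, and DataFileReader hands it to the datum reader, so the subset schema only needs to be supplied as the expected schema. The class name, the helper methods, and the extra field "X" are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class DataFileSubsetSketch {

    // Hypothetical helper: an Event record whose string fields appear in the given order.
    public static Schema eventSchema(String... fieldNames) {
        StringBuilder fields = new StringBuilder();
        for (String name : fieldNames) {
            if (fields.length() > 0) {
                fields.append(',');
            }
            fields.append("{\"name\":\"").append(name).append("\",\"type\":\"string\"}");
        }
        String json = "{\"namespace\":\"storage.management.example.schema\","
                + "\"type\":\"record\",\"name\":\"Event\",\"fields\":[" + fields + "]}";
        return new Schema.Parser().parse(json);
    }

    // Writes one record to a container file, then reads it back with the subset schema.
    public static GenericRecord writeThenRead(File file, Schema writerSchema,
                                              Schema readerSchema) throws IOException {
        GenericRecord record = new GenericData.Record(writerSchema);
        for (Schema.Field field : writerSchema.getFields()) {
            record.put(field.name(), field.name().toLowerCase()); // sample values only
        }

        // The writer schema is stored in the file header by create().
        try (DataFileWriter<GenericRecord> writer =
                 new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(writerSchema))) {
            writer.create(writerSchema, file);
            writer.append(record);
        }

        // readerSchema is the expected schema; the writer schema comes from the header.
        try (DataFileReader<GenericRecord> reader =
                 new DataFileReader<>(file, new GenericDatumReader<GenericRecord>(readerSchema))) {
            return reader.next();
        }
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("events", ".avro");
        file.deleteOnExit();
        Schema writerSchema = eventSchema("X", "A", "B", "D", "C"); // client order
        Schema readerSchema = eventSchema("A", "B", "C", "D");      // agent subset
        System.out.println(writeThenRead(file, writerSchema, readerSchema));
    }
}
```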

Answers