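Avro schema incompatible if field order changes
Scenario: a client serializes a POJO using Avro's ReflectDatumWriter and writes the resulting GenericRecord to a file. The schema obtained through reflection looks like this (note the field order A, B, D, C):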
{
"namespace": "storage.management.example.schema",
"type": "record",
"doc": "Example schema for testing",
"name": "Event",
"fields": [
....
....
{ "name": "A", "type": "string" },
{ "name": "B", "type": "string" },
{ "name": "D", "type": "string" },
{ "name": "C", "type": "string" },
....
....
]
}
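An agent reads the file back and uses a default schema (note the field order A, B, C, D) to deserialize a subset of each record. The client guarantees that these fields are present.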
{
"namespace": "storage.management.example.schema",
"type": "record",
"doc": "Example schema for testing",
"name": "Event",
"fields": [
{ "name": "A", "type": "string" },
{ "name": "B", "type": "string" },
{ "name": "C", "type": "string" },
{ "name": "D", "type": "string" }
]
}
Problem: deserializing with the subset schema above results in the following exception:
Caused by: java.io.IOException: Invalid int encoding
at org.apache.avro.io.BinaryDecoder.readInt(BinaryDecoder.java:145)
at org.apache.avro.io.BinaryDecoder.readString(BinaryDecoder.java:259)
at org.apache.avro.io.ResolvingDecoder.readString(ResolvingDecoder.java:201)
at org.apache.avro.generic.GenericDatumReader.readString(GenericDatumReader.java:430)
at org.apache.avro.generic.GenericDatumReader.readString(GenericDatumReader.java:422)
at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:180)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:152)
at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:240)
at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:230)
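However, deserialization succeeds if the subset schema also lists the fields in the order A, B, D, C (the same as the client schema).
Is this behavior expected? I thought Avro relied only on field names to build the record, not on field order.
Is there a fix for this? Different clients may produce different orderings, and I cannot enforce one because the schema is generated through reflection.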
Are you using BinaryDecoder? If so, try using DataFileReader instead: 'import org.apache.avro.file.DataFileReader'
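Some background on why the decoder choice matters: Avro's binary encoding carries no field names. Values are written back to back in the order of the writer's schema, so name-based matching only works when the reader knows both the writer's schema and its own. In Avro Java, GenericDatumReader has a constructor that takes a writer schema and a reader schema for this purpose, and DataFileReader picks up the writer schema from the file header. The sketch below is not Avro code. It is a simplified stand-in, using only the JDK, that mimics the idea with a single length byte per string. The class name FieldOrderDemo, its methods, and the extra field Z (standing in for the elided fields) are all hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified model of name-less, order-dependent record encoding (not Avro's real format).
public class FieldOrderDemo {

    // Write values in writer-schema order; no field names go into the payload.
    static byte[] encode(List<String> writerOrder, Map<String, String> record) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String field : writerOrder) {
            byte[] utf8 = record.get(field).getBytes(StandardCharsets.UTF_8);
            out.write(utf8.length);          // one length byte, enough for this demo's short strings
            out.write(utf8, 0, utf8.length);
        }
        return out.toByteArray();
    }

    private static String readString(ByteBuffer in) {
        int length = in.get() & 0xFF;
        byte[] utf8 = new byte[length];
        in.get(utf8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    // Naive decode: assumes the payload is laid out in the reader's field order.
    static Map<String, String> decodeByPosition(byte[] payload, List<String> readerOrder) {
        ByteBuffer in = ByteBuffer.wrap(payload);
        Map<String, String> result = new LinkedHashMap<>();
        for (String field : readerOrder) {
            result.put(field, readString(in));
        }
        return result;
    }

    // Resolving decode: walk the payload in writer order, match by name,
    // skip fields the reader does not ask for, and return values in reader order.
    static Map<String, String> decodeResolved(byte[] payload, List<String> writerOrder,
                                              List<String> readerOrder) {
        ByteBuffer in = ByteBuffer.wrap(payload);
        Map<String, String> byName = new HashMap<>();
        for (String field : writerOrder) {
            String value = readString(in);
            if (readerOrder.contains(field)) {
                byName.put(field, value);
            }
        }
        Map<String, String> result = new LinkedHashMap<>();
        for (String field : readerOrder) {
            result.put(field, byName.get(field));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> writerOrder = List.of("A", "B", "D", "C", "Z"); // Z: hypothetical extra field
        List<String> readerOrder = List.of("A", "B", "C", "D");
        Map<String, String> record = Map.of("A", "a", "B", "b", "C", "c", "D", "d", "Z", "z");

        byte[] payload = encode(writerOrder, record);
        System.out.println(decodeByPosition(payload, readerOrder));            // {A=a, B=b, C=d, D=c}
        System.out.println(decodeResolved(payload, writerOrder, readerOrder)); // {A=a, B=b, C=c, D=d}
    }
}
```

In this toy model every field is a string, so the positional decode only swaps C and D silently. With mixed types or additional fields ahead of A, the same misalignment would plausibly surface as a decoding error like the "Invalid int encoding" in the stack trace above.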