In practice, Avro applications often run into cases where, because of versioning, the reader's schema differs from the writer's, and schema resolution is needed for compatibility.
When serializing data, one normally writes with a DatumWriter and reads with a DatumReader.
On the write side, if only Avro serialization is used, without a hand-managed encoder, compatibility can be handled simply by declaring default values in the schema; see: hadoop深入研究:(十八)——Avro schema兼容
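As a sketch of the default-value approach (the schemas OLD/NEW and the class DefaultValueDemo below are illustrative, not from the original post): a record written to an Avro data file under the old schema can be read back while supplying only the new schema, because the file header embeds the writer's schema and the missing field is filled from its default.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileStream;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class DefaultValueDemo {
    // Old (writer) schema: a single "name" field.
    static final Schema OLD = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
        + "{\"name\":\"name\",\"type\":\"string\"}]}");

    // New (reader) schema: adds "age" with a default, keeping old data readable.
    static final Schema NEW = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
        + "{\"name\":\"name\",\"type\":\"string\"},"
        + "{\"name\":\"age\",\"type\":\"int\",\"default\":-1}]}");

    static GenericRecord writeOldReadNew() throws IOException {
        // Write one record into an in-memory Avro data file under the old
        // schema; the file header embeds that schema.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        DataFileWriter<GenericRecord> writer = new DataFileWriter<GenericRecord>(
            new GenericDatumWriter<GenericRecord>(OLD));
        writer.create(OLD, out);
        GenericRecord rec = new GenericData.Record(OLD);
        rec.put("name", "alice");
        writer.append(rec);
        writer.close();

        // Read it back supplying only the new schema; the writer's schema is
        // taken from the file, and "age" is filled from its default.
        DataFileStream<GenericRecord> in = new DataFileStream<GenericRecord>(
            new ByteArrayInputStream(out.toByteArray()),
            new GenericDatumReader<GenericRecord>(NEW));
        GenericRecord result = in.next();
        in.close();
        return result;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(writeOldReadNew());
    }
}
```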
When an encoder is used, decoding goes through a ResolvingDecoder, which must be given both the writer's and the reader's schema:
/**
 * Produces an opaque resolver that can be used to construct a new
 * {@link ResolvingDecoder#ResolvingDecoder(Object, Decoder)}. The
 * returned Object is immutable and hence can be simultaneously used
 * in many ResolvingDecoders. This method is reasonably expensive, the
 * users are encouraged to cache the result.
 *
 * @param writer The writer's schema. Cannot be null.
 * @param reader The reader's schema. Cannot be null.
 * @return The opaque resolver.
 * @throws IOException
 */
public static Object resolve(Schema writer, Schema reader)
    throws IOException {
  if (null == writer) {
    throw new NullPointerException("writer cannot be null!");
  }
  if (null == reader) {
    throw new NullPointerException("reader cannot be null!");
  }
  return new ResolvingGrammarGenerator().generate(writer, reader);
}
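ResolvingGrammarGenerator is internal API; application code usually obtains a ResolvingDecoder through DecoderFactory.resolvingDecoder, which performs this resolution internally. A hedged sketch (the schemas and the decode helper below are hypothetical) showing the decoder taking both schemas to reconcile a field-order difference between writer and reader:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;
import org.apache.avro.io.ResolvingDecoder;

public class ResolvingDecoderDemo {
    static String decode() throws IOException {
        // The writer laid fields out as (id, name); the reader declares (name, id).
        Schema writer = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"R\",\"fields\":["
            + "{\"name\":\"id\",\"type\":\"long\"},"
            + "{\"name\":\"name\",\"type\":\"string\"}]}");
        Schema reader = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"R\",\"fields\":["
            + "{\"name\":\"name\",\"type\":\"string\"},"
            + "{\"name\":\"id\",\"type\":\"long\"}]}");

        // Encode one record by hand, in the writer's field order.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder enc = EncoderFactory.get().binaryEncoder(out, null);
        enc.writeLong(42L);
        enc.writeString("alice");
        enc.flush();

        // The ResolvingDecoder takes BOTH schemas and reports, via
        // readFieldOrder(), the order in which fields must be pulled.
        ResolvingDecoder rd = DecoderFactory.get().resolvingDecoder(
            writer, reader,
            DecoderFactory.get().binaryDecoder(out.toByteArray(), null));
        StringBuilder sb = new StringBuilder();
        for (Schema.Field f : rd.readFieldOrder()) {
            if ("id".equals(f.name())) {
                sb.append("id=").append(rd.readLong()).append(';');
            } else {
                sb.append("name=").append(rd.readString()).append(';');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(decode());
    }
}
```

Note that the factory method caches nothing across calls; if many decoders share a schema pair, the class-level caching suggested in the Javadoc above is worth doing.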
Therefore, when constructing a GenericDatumReader or ReflectDatumReader, the writer's schema must also be passed in. To stay compatible you then have to maintain two schemas, which is somewhat inconvenient.
DatumReader<GenericRecord> reader = new GenericDatumReader<GenericRecord>(schmOld, schmNew);
ReflectDatumReader<TestClass> reader = new ReflectDatumReader<TestClass>(schmOld, schmNew);
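A minimal end-to-end sketch of the two-schema constructor (the schemas OLD/NEW and the helper below are illustrative): bytes produced with a raw BinaryEncoder carry no schema, so the reader must be told both which schema wrote them and which schema to resolve them into.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;

public class TwoSchemaReadDemo {
    // Old (writer) schema and new (reader) schema; the new field has a default.
    static final Schema OLD = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
        + "{\"name\":\"name\",\"type\":\"string\"}]}");
    static final Schema NEW = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
        + "{\"name\":\"name\",\"type\":\"string\"},"
        + "{\"name\":\"age\",\"type\":\"int\",\"default\":-1}]}");

    static GenericRecord readOldAsNew() throws IOException {
        // Encode a record with the old schema using a raw binary encoder.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder enc = EncoderFactory.get().binaryEncoder(out, null);
        GenericRecord rec = new GenericData.Record(OLD);
        rec.put("name", "bob");
        new GenericDatumWriter<GenericRecord>(OLD).write(rec, enc);
        enc.flush();

        // Writer schema first, reader schema second; "age" comes from the default.
        return new GenericDatumReader<GenericRecord>(OLD, NEW).read(null,
            DecoderFactory.get().binaryDecoder(out.toByteArray(), null));
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readOldAsNew());
    }
}
```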
As of Avro 1.7.7, there is no direct way to convert a GenericRecord into an instance of a class generated from an avsc file, so using ReflectDatumReader is considerably more convenient.
See also: SpecificRecord builders should share more functionality with GenericRecord builders