Question on Avro serializing/deserializing


Steven Nguyen

Hi All,


Following the Avro documentation, I defined a schema like the one in the sample:

{"namespace": "example.avro",
 "type": "record",
 "name": "User",
 "fields": [
     {"name": "name", "type": "string"},
     {"name": "favorite_number",  "type": ["int", "null"]},
     {"name": "favorite_color", "type": ["string", "null"]}
 ]
}

Then I create two User records as shown below and serialize them using DataFileWriter:

Schema schema = new Schema.Parser().parse(new File("user.avsc"));

GenericRecord user1 = new GenericData.Record(schema);
user1.put("name", "Alyssa");
user1.put("favorite_number", 256);
// Leave favorite color null

GenericRecord user2 = new GenericData.Record(schema);
user2.put("name", "Ben");
user2.put("favorite_number", 7);
user2.put("favorite_color", "red");

// Serialize user1 and user2 to disk
File file = new File("users.avro");
DatumWriter<GenericRecord> datumWriter = new GenericDatumWriter<GenericRecord>(schema);
DataFileWriter<GenericRecord> dataFileWriter = new DataFileWriter<GenericRecord>(datumWriter);
dataFileWriter.create(schema, file);
// Append both records, then close the file so everything is flushed to disk
dataFileWriter.append(user1);
dataFileWriter.append(user2);
dataFileWriter.close();

I noticed that the favorite_number and favorite_color fields are union types, so I expected the serialized data to look like:

"favorite_number" : { "int" : 7} and "favorite_color" : { "string" : "red" }


But when I deserialized it, I got

{"name": "Alyssa", "favorite_number": 256, "favorite_color": null}

{"name": "Ben", "favorite_number": 7, "favorite_color": "red"}
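For context, here is a minimal sketch of how a union value is laid out in Avro's binary encoding, which is the encoding data files use. Per the Avro specification, a union is written as the zero-based index of the branch (a zig-zag varint) followed by the value in that branch's own encoding. The type name never appears in the bytes, so there is no {"int": 7} wrapper to see. The class and method names below are hypothetical and use only the JDK, not the Avro library; the plain-looking output above is presumably just the record's toString() rendering.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical, JDK-only illustration; not part of the Avro API.
class UnionBinarySketch {

    // Avro int/long binary encoding: zig-zag the value, then emit 7 bits per byte,
    // setting the high bit on every byte except the last.
    static void writeLong(long n, ByteArrayOutputStream out) {
        long z = (n << 1) ^ (n >> 63);
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
    }

    // Encode a value of the union ["int", "null"]:
    // branch index 0 followed by the int, or branch index 1 with no payload for null.
    static byte[] encodeIntOrNull(Integer value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (value == null) {
            writeLong(1, out);
        } else {
            writeLong(0, out);
            writeLong(value, out);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        for (byte b : encodeIntOrNull(7)) {
            System.out.printf("%02x ", b);   // prints "00 0e "
        }
        System.out.println();
    }
}
```

In this sketch, favorite_number = 7 becomes the two bytes 00 0e (branch 0, then zig-zag 14), and a null value becomes the single byte 02 (branch 1).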

I did get the expected result when using JsonEncoder and JsonDecoder:


// Encoder to serialize ("record" is the GenericRecord to write, e.g. user2)
GenericDatumWriter<GenericRecord> writer = new GenericDatumWriter<GenericRecord>(schema);
ByteArrayOutputStream os = new ByteArrayOutputStream();
Encoder e = EncoderFactory.get().jsonEncoder(schema, os);
writer.write(record, e);
e.flush(); // the encoder buffers its output; flush before reading the bytes
byte[] serializedPayload = os.toByteArray();

// Decoder to deserialize ("input" holds the JSON bytes to decode, e.g. serializedPayload)
DatumReader<GenericData.Record> reader = new GenericDatumReader<GenericData.Record>(schema);
Decoder decoder = DecoderFactory.get().jsonDecoder(schema, new ByteArrayInputStream(input));
GenericData.Record deserializedRecord = reader.read(null, decoder);

If I use the payload below to produce a message to my topic, using the schema from Schema Registry:

{"name": "Ben", "favorite_number": 7, "favorite_color": "red"}


I get the error "Expected start-union. Got VALUE_NUMBER_INT". I think this error is the correct behavior, because the payload does not validate against the given schema.
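For comparison, the Avro JSON encoding represents a non-null union value as an object with a single key, the branch's type name, while null stays a bare null. The sketch below builds a payload in that shape with plain string concatenation. The class and helper names are hypothetical, no JSON escaping is done, and it is only an assumption that a payload in this shape is what my producer setup would accept.

```java
// Hypothetical, JDK-only illustration of the union wrapping used by Avro's JSON encoding.
class UnionJsonSketch {

    // Wrap a non-null union value as {"<branchType>": <jsonValue>}; null is written as null.
    static String wrapUnion(String branchType, String jsonValue) {
        return jsonValue == null ? "null" : "{\"" + branchType + "\": " + jsonValue + "}";
    }

    // Build a User payload for the schema above (no escaping; simple values only).
    static String userPayload(String name, Integer favoriteNumber, String favoriteColor) {
        String number = wrapUnion("int",
                favoriteNumber == null ? null : favoriteNumber.toString());
        String color = wrapUnion("string",
                favoriteColor == null ? null : "\"" + favoriteColor + "\"");
        return "{\"name\": \"" + name + "\", \"favorite_number\": " + number
                + ", \"favorite_color\": " + color + "}";
    }

    public static void main(String[] args) {
        // prints {"name": "Ben", "favorite_number": {"int": 7}, "favorite_color": {"string": "red"}}
        System.out.println(userPayload("Ben", 7, "red"));
        // prints {"name": "Alyssa", "favorite_number": {"int": 256}, "favorite_color": null}
        System.out.println(userPayload("Alyssa", 256, null));
    }
}
```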


Can anyone tell me why there is a difference between DataFileWriter and JsonEncoder?