Question on Avro serializing/deserializing

Steven Nguyen

Hi All,

 

Following the Avro documentation at https://avro.apache.org/docs/1.9.2/gettingstartedjava.html, I defined a schema like the one in the sample:

{"namespace": "example.avro",
 "type": "record",
 "name": "User",
 "fields": [
     {"name": "name", "type": "string"},
     {"name": "favorite_number", "type": ["int", "null"]},
     {"name": "favorite_color", "type": ["string", "null"]}
 ]
}

 

Then I create two User records as shown below and serialize them using DataFileWriter:

Schema schema = new Schema.Parser().parse(new File("user.avsc"));

GenericRecord user1 = new GenericData.Record(schema);
user1.put("name", "Alyssa");
user1.put("favorite_number", 256);
// Leave favorite color null

GenericRecord user2 = new GenericData.Record(schema);
user2.put("name", "Ben");
user2.put("favorite_number", 7);
user2.put("favorite_color", "red");

// Serialize user1 and user2 to disk
File file = new File("users.avro");
DatumWriter<GenericRecord> datumWriter = new GenericDatumWriter<GenericRecord>(schema);
DataFileWriter<GenericRecord> dataFileWriter = new DataFileWriter<GenericRecord>(datumWriter);
dataFileWriter.create(schema, file);
dataFileWriter.append(user1);
dataFileWriter.append(user2);
dataFileWriter.close();

 

I noticed that the favorite_number and favorite_color fields are union types, so I expected the serialized data to look like:

"favorite_number" : { "int" : 7} and "favorite_color" : { "string" : "red" }

 

But when I deserialized it, I got

{"name": "Alyssa", "favorite_number": 256, "favorite_color": null}

{"name": "Ben", "favorite_number": 7, "favorite_color": "red"}
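
For what it is worth, my reading of the Avro specification (which may be incomplete) is that the binary encoding used inside data files writes a union as the zero-based branch index followed by the value itself, with no field or type names, and that the output above is presumably just the record's toString(). The stdlib-only sketch below hand-encodes the two union values under that assumption; the class and method names are my own and are not part of the Avro API.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Avro: hand-encodes union values the way
// I understand the binary encoding section of the specification.
public class UnionBinarySketch {

    // Avro int/long: zig-zag the value, then emit 7 bits per byte, low bits first.
    public static void writeLong(ByteArrayOutputStream out, long n) {
        long z = (n << 1) ^ (n >> 63);
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
    }

    // Union holding an int: branch index, then the int itself.
    public static byte[] unionInt(int branch, int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeLong(out, branch);
        writeLong(out, value);
        return out.toByteArray();
    }

    // Union holding a string: branch index, UTF-8 length, then the bytes.
    public static byte[] unionString(int branch, String value) {
        byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeLong(out, branch);
        writeLong(out, utf8.length);
        out.write(utf8, 0, utf8.length);
        return out.toByteArray();
    }

    // Union holding null: only the branch index, since null occupies no bytes.
    public static byte[] unionNull(int branch) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeLong(out, branch);
        return out.toByteArray();
    }

    // Renders bytes as space-separated lowercase hex for inspection.
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(hex(unionInt(0, 7)));        // favorite_number = 7
        System.out.println(hex(unionString(0, "red"))); // favorite_color = "red"
        System.out.println(hex(unionNull(1)));          // favorite_color = null
    }
}
```

If that reading is right, the branch is recorded in the file only as an index, so there is nothing like {"int": 7} to see once the record is read back into a GenericRecord and printed.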

I did get the expected union-wrapped form when using JsonEncoder and JsonDecoder:

 

// Encoder to serialize (user2 is the record built above)
GenericDatumWriter<GenericRecord> writer = new GenericDatumWriter<GenericRecord>(schema);
ByteArrayOutputStream os = new ByteArrayOutputStream();
Encoder e = EncoderFactory.get().jsonEncoder(schema, os);
writer.write(user2, e);
e.flush();
byte[] serializedPayload = os.toByteArray();

// Decoder to deserialize the bytes produced above
DatumReader<GenericRecord> reader = new GenericDatumReader<GenericRecord>(schema);
Decoder decoder = DecoderFactory.get().jsonDecoder(schema, new ByteArrayInputStream(serializedPayload));
GenericRecord deserializedRecord = reader.read(null, decoder);

If I use the payload below to produce a message to my topic, using the schema from Schema Registry:

{"name": "Ben", "favorite_number": 7, "favorite_color": "red"}

 

I get the error "Expected start-union. Got VALUE_NUMBER_INT". I think this error is correct behavior, because the payload does not validate against the given schema.
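
As far as I understand the JSON encoding described in the specification, a non-null union value is written as an object with a single key naming the branch type, while null stays a bare null. Below is a minimal stdlib-only sketch of the payload shape I believe JsonDecoder expects for this schema. The class and helper names are hypothetical, and the string building is deliberately naive (no escaping).

```java
// Hypothetical helper, not part of Avro: builds the union-wrapped JSON form
// for the User schema by plain string concatenation.
public class UnionJsonSketch {

    // Wraps an already-serialized JSON literal in a single-key object named
    // after the union branch; a null literal stays a bare JSON null.
    public static String wrap(String branchType, String jsonLiteral) {
        if (jsonLiteral == null) {
            return "null";
        }
        return "{\"" + branchType + "\": " + jsonLiteral + "}";
    }

    // Builds a User payload with both union fields wrapped.
    public static String userPayload(String name, Integer number, String color) {
        String numberLiteral = (number == null) ? null : number.toString();
        String colorLiteral = (color == null) ? null : "\"" + color + "\"";
        return "{\"name\": \"" + name + "\""
                + ", \"favorite_number\": " + wrap("int", numberLiteral)
                + ", \"favorite_color\": " + wrap("string", colorLiteral)
                + "}";
    }

    public static void main(String[] args) {
        System.out.println(userPayload("Ben", 7, "red"));
        System.out.println(userPayload("Alyssa", 256, null));
    }
}
```

For Ben this prints {"name": "Ben", "favorite_number": {"int": 7}, "favorite_color": {"string": "red"}}, which is the form I would expect the decoder to accept where it currently reports a missing start-union.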

 

Can anyone tell me why there is a difference between DataFileWriter and JsonEncoder?

 

Regards,

Steven

