Modify an existing Avro schema


FIXED-TERM Sonnentag Paul (CR/PJ-AI-S1)

I’m implementing an application that reads structured data with an Avro schema, applies some dynamically configurable transformations, and writes the data out as Avro again. The problem is that some transformations require modifying the Avro schema. For example, one transform might read a value from a field, apply a function to it, and write the result to a new field. In that case I need to add the new field to the output schema. I haven’t found a really good way to do this with Avro. What I’m doing right now is reading all the fields from the old schema, creating a new schema, and copying all the fields over to it:


Nicely formatted version:


// ...

// creating a new schema with the fields of the old schema added plus the new fields
val schema = // ... the schema of the input data
var newSchema = SchemaBuilder

    // create new schema with existing fields from schemas and new fields which are created through transforms
    val fields = schema.getFields ++ getNewFields(schema, transforms)

      .foldLeft(newSchema)((newSchema, field: Schema.Field) => {

          // TODO: find way to differentiate between explicitly set null defaults and fields which have no default

// ...
// create new fields like this
new Schema.Field(


Any ideas how this could be done in a way that doesn’t feel so hacky?
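
One possible direction is sketched below. It is not a definitive answer, and it assumes Avro 1.9 or later and Scala 2.13 (for `scala.jdk.CollectionConverters`). The names `ExtendSchema`, `withExtraFields`, `requiredField`, and `nullableField` are hypothetical helpers invented for illustration; they are not part of Avro. The idea is to clone each existing field with the `Schema.Field` copy constructor, which carries the default over unchanged, so a field without a default and a field with an explicit null default stay distinguishable.

```scala
import org.apache.avro.{JsonProperties, Schema}
import scala.jdk.CollectionConverters._

object ExtendSchema {

  // Hypothetical helper: returns a copy of the record schema `base`
  // with the fields in `extra` appended.
  def withExtraFields(base: Schema, extra: Seq[Schema.Field]): Schema = {
    // A Schema.Field instance can be attached to only one record, so each
    // existing field is cloned. The copy constructor (Avro 1.9+) keeps the
    // doc, order, aliases, props and the default, including the difference
    // between "no default" and an explicit null default.
    val copied = base.getFields.asScala.toSeq.map(f => new Schema.Field(f, f.schema()))

    // Record-level aliases and custom properties are not carried over here.
    val result = Schema.createRecord(base.getName, base.getDoc, base.getNamespace, base.isError())
    result.setFields((copied ++ extra).asJava)
    result
  }

  // Hypothetical helper: a new field with no default at all.
  def requiredField(name: String, schema: Schema): Schema.Field =
    new Schema.Field(name, schema)

  // Hypothetical helper: a new nullable field whose default is an explicit
  // null. "null" is listed first in the union so the default validates.
  def nullableField(name: String, valueSchema: Schema, doc: String): Schema.Field = {
    val union = Schema.createUnion(Schema.create(Schema.Type.NULL), valueSchema)
    new Schema.Field(name, union, doc, JsonProperties.NULL_VALUE)
  }
}
```

With these helpers, a call such as `ExtendSchema.withExtraFields(schema, Seq(ExtendSchema.requiredField("name_len", Schema.create(Schema.Type.INT))))` would produce the output schema in one step. On the reading side, `field.hasDefaultValue` (Avro 1.9+) reports whether any default was declared, and `field.defaultVal()` returns `JsonProperties.NULL_VALUE` for an explicit null default and a plain `null` when there is none.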