[jira] [Created] (AVRO-2002) Canonical form strip the default value : Schema resolution may provide 2 different answers with same schema's fingerprint

JIRA jira@apache.org
DESLANDES created AVRO-2002:

             Summary: Canonical form strip the default value : Schema resolution may provide 2 different answers with same schema's fingerprint
                 Key: AVRO-2002
                 URL: https://issues.apache.org/jira/browse/AVRO-2002
             Project: Avro
          Issue Type: Bug
          Components: java
    Affects Versions: 1.8.1
            Reporter: DESLANDES

My understanding is that a schema's fingerprint uniquely identifies the Avro schema. The following example shows two different schemas that have the same fingerprint but behave differently: one can read data written with the writer schema, the other cannot. I think this is a bug, but it may only be a misinterpretation on my part.
Here are the details:
First, the canonical form of an Avro schema is derived using this rule (see http://avro.apache.org/docs/1.8.1/spec.html#Transforming+into+Parsing+Canonical+Form):
{quote}[STRIP] Keep only attributes that are relevant to parsing data, which are: type, name, fields, symbols, items, values, size. Strip all others (e.g., doc and aliases).{quote}
So the default attribute of every field is removed.
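
To make the effect of the [STRIP] rule concrete, here is a minimal sketch in plain Java. It is a simplified model and not Avro's implementation: a field is represented as a flat map of attribute names to values, and `StripSketch`, `strip` and `field` are hypothetical names introduced only for this illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Simplified model of the [STRIP] step (assumption: NOT Avro's own code).
class StripSketch {
    // Attributes the spec lists as relevant to parsing data.
    static final Set<String> KEPT = new HashSet<>(Arrays.asList(
            "type", "name", "fields", "symbols", "items", "values", "size"));

    // Returns a copy of the attributes with everything outside KEPT removed.
    static Map<String, String> strip(Map<String, String> attributes) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : attributes.entrySet()) {
            if (KEPT.contains(e.getKey())) {
                result.put(e.getKey(), e.getValue());
            }
        }
        return result;
    }

    // Hypothetical helper: builds an attribute map from alternating key/value strings.
    static Map<String, String> field(String... kv) {
        Map<String, String> m = new LinkedHashMap<>();
        for (int i = 0; i + 1 < kv.length; i += 2) {
            m.put(kv[i], kv[i + 1]);
        }
        return m;
    }

    public static void main(String[] args) {
        Map<String, String> withDefault =
                field("name", "newField", "type", "int", "default", "0");
        Map<String, String> noDefault =
                field("name", "newField", "type", "int");
        // Both fields reduce to the same attributes once "default" is stripped.
        System.out.println(strip(withDefault));
        System.out.println(strip(noDefault));
        System.out.println(strip(withDefault).equals(strip(noDefault))); // prints true
    }
}
```

Since the fingerprint is computed from the canonical form, two fields that become identical after this step cannot be told apart by their fingerprint.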

On the other hand, schema resolution applies this rule (see http://avro.apache.org/docs/1.8.1/spec.html#Schema+Resolution):
{quote}if the reader's record schema has a field with no default value, and writer's schema does not have a field with the same name, an error is signalled.{quote}
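
The quoted resolution rule can be sketched in a few lines of plain Java. This is a simplified model under stated assumptions, not Avro's resolver: it only looks at field names and at whether a default is declared, and `ResolutionSketch` and `unresolvedFields` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Simplified model of the "missing field without default" rule (NOT Avro's code).
class ResolutionSketch {
    // readerFields maps each reader field name to whether it declares a default.
    // Returns the reader fields for which the rule signals an error.
    static List<String> unresolvedFields(Map<String, Boolean> readerFields,
                                         Set<String> writerFieldNames) {
        List<String> errors = new ArrayList<>();
        for (Map.Entry<String, Boolean> f : readerFields.entrySet()) {
            boolean missingInWriter = !writerFieldNames.contains(f.getKey());
            boolean hasDefault = f.getValue();
            if (missingInWriter && !hasDefault) {
                errors.add(f.getKey());
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        Set<String> writer = new HashSet<>(Arrays.asList("field"));

        Map<String, Boolean> readerWithDefault = new LinkedHashMap<>();
        readerWithDefault.put("field", false);
        readerWithDefault.put("newField", true);

        Map<String, Boolean> readerNoDefault = new LinkedHashMap<>();
        readerNoDefault.put("field", false);
        readerNoDefault.put("newField", false);

        System.out.println(unresolvedFields(readerWithDefault, writer)); // prints []
        System.out.println(unresolvedFields(readerNoDefault, writer));   // prints [newField]
    }
}
```

The only input that changes the outcome here is whether a default is present; the default's value plays no role in this particular rule.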

To illustrate this with a simple writer schema, I created a new version that adds one field, in two variants: one declares a default value for the new field, the other does not. The first can read data written with the old version of the schema; the second cannot.
In other words, the canonical form ignores the default attribute of record fields, but the resolution algorithm uses that attribute to decide compatibility. As a result, two schemas that differ only in a default attribute have the same fingerprint, yet one is compatible with the writer schema and the other is not.
I understand why the two behave differently, but not why they share the same fingerprint.

I would suggest that the canonical form keep the presence of the default attribute (while still stripping the default value itself, which should not affect compatibility).
As an immediate workaround, I will systematically declare a default value for every added field.
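
One possible reading of the suggestion above, sketched in plain Java: the canonical text of a field records only whether a default exists, never its value. This is purely illustrative. The `hasDefault` marker, the class `PresenceAwareCanonicalSketch` and the method `canonicalField` are hypothetical and are not part of the Avro specification or library.

```java
// Hypothetical canonical text for a single field (assumption: not an Avro format).
class PresenceAwareCanonicalSketch {
    // defaultJson is the JSON text of the default, or null when the field has none.
    static String canonicalField(String name, String type, String defaultJson) {
        StringBuilder sb = new StringBuilder();
        sb.append("{\"name\":\"").append(name).append("\"");
        sb.append(",\"type\":\"").append(type).append("\"");
        if (defaultJson != null) {
            // Keep only the presence of a default; the value itself is dropped.
            sb.append(",\"hasDefault\":true");
        }
        return sb.append("}").toString();
    }

    public static void main(String[] args) {
        String zero = canonicalField("newField", "int", "0");
        String fortyTwo = canonicalField("newField", "int", "42");
        String none = canonicalField("newField", "int", null);
        System.out.println(zero.equals(fortyTwo)); // prints true: the value is ignored
        System.out.println(zero.equals(none));     // prints false: presence is kept
    }
}
```

Under this reading, the two reader schemas in the program below would get different canonical forms, and therefore different fingerprints, while schemas that differ only in the default's value would still share one.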

package Main;

import java.util.Collections;

import org.apache.avro.Schema;
import org.apache.avro.SchemaCompatibility;
import org.apache.avro.SchemaNormalization;
import org.apache.avro.SchemaValidationException;
import org.apache.avro.SchemaValidator;
import org.apache.avro.SchemaValidatorBuilder;

public class Main {

        public static void main(String[] args) {
                // The schema literals were missing from this message; they are reconstructed
                // here from the schemas printed in the output listed at the end of the class.
                // Writer: the original schema with a single field.
                Schema schemaWriter = new Schema.Parser().parse(
                        "{\"type\":\"record\",\"name\":\"ExampleAvro\",\"fields\":[{\"name\":\"field\",\"type\":\"long\"}]}");
                // Reader: adds newField WITH a default value.
                Schema schemaReader = new Schema.Parser().parse(
                        "{\"type\":\"record\",\"name\":\"ExampleAvro\",\"fields\":[{\"name\":\"field\",\"type\":\"long\"},{\"name\":\"newField\",\"type\":\"int\",\"default\":0}]}");
                // Reader: adds newField WITHOUT a default value.
                Schema schemaReaderNoDefault = new Schema.Parser().parse(
                        "{\"type\":\"record\",\"name\":\"ExampleAvro\",\"fields\":[{\"name\":\"field\",\"type\":\"long\"},{\"name\":\"newField\",\"type\":\"int\"}]}");

                // Fingerprints are computed from the parsing canonical form.
                long fpWriter = SchemaNormalization.parsingFingerprint64(schemaWriter);
                long fpReader = SchemaNormalization.parsingFingerprint64(schemaReader);
                long fpReaderNoDefault = SchemaNormalization.parsingFingerprint64(schemaReaderNoDefault);
                System.out.println("Schema writer          " + fpWriter + " " + schemaWriter);
                System.out.println("Schema reader          " + fpReader + " " + schemaReader);
                System.out.println("Schema readerNoDefault " + fpReaderNoDefault + " " + schemaReaderNoDefault);

                // check compatibility : method 1
                String res = SchemaCompatibility.checkReaderWriterCompatibility(schemaReader, schemaWriter).getType().toString();
                String resNoDefault = SchemaCompatibility.checkReaderWriterCompatibility(schemaReaderNoDefault, schemaWriter).getType().toString();
                System.out.println(fpReader + " is " + res + " with " + fpWriter);
                System.out.println(fpReaderNoDefault + " is " + resNoDefault + " with " + fpWriter);

                // check compatibility : method 2
                SchemaValidator validator = new SchemaValidatorBuilder().canReadStrategy().validateAll();
                String isCompatible = "";
                try {
                        validator.validate(schemaReaderNoDefault, Collections.singletonList(schemaWriter));
                } catch (SchemaValidationException e) {
                        isCompatible = "not ";
                }
                System.out.println(fpReaderNoDefault + " is " + isCompatible + "compatible with " + fpWriter);

                // Reset the flag so the first result does not leak into the second check.
                isCompatible = "";
                try {
                        validator.validate(schemaReader, Collections.singletonList(schemaWriter));
                } catch (SchemaValidationException e) {
                        isCompatible = "not ";
                }
                System.out.println(fpReader + " is " + isCompatible + "compatible with " + fpWriter);
        }

        //The output is :
        //Schema writer          8957007963871099370 {"type":"record","name":"ExampleAvro","fields":[{"name":"field","type":"long"}]}
        //Schema reader          489516346825099350 {"type":"record","name":"ExampleAvro","fields":[{"name":"field","type":"long"},{"name":"newField","type":"int","default":0}]}
        //Schema readerNoDefault 489516346825099350 {"type":"record","name":"ExampleAvro","fields":[{"name":"field","type":"long"},{"name":"newField","type":"int"}]}
        //489516346825099350 is COMPATIBLE with 8957007963871099370
        //489516346825099350 is INCOMPATIBLE with 8957007963871099370
        //489516346825099350 is not compatible with 8957007963871099370
        //489516346825099350 is compatible with 8957007963871099370
}