Re: Parsing canonical forms with schemas having default values.

Re: Parsing canonical forms with schemas having default values.

Doug Cutting-2
When reading data, two schemas are used: a schema with the same
fingerprint as used to write the data, typically the actual schema
used to write, and the schema you'd like to project to.  Default
values are only used from the latter schema.

Matching fingerprints indicate binary compatibility.  Schema
resolution allows evolution to a schema with a different binary
format, i.e., with additional fields that specify a default value.

Schema compatibility through resolution cannot be represented in a
single number like a fingerprint.

Doug
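
To illustrate the point above about defaults coming only from the schema you project to, here is a minimal Python sketch of record resolution. It is not Avro's implementation: the function name resolve_record, the field lists, and the sample values are hypothetical, and real resolution also handles type promotion, aliases, unions, and nested types.

```python
def resolve_record(writer_fields, reader_fields, written_values):
    """Project a record written with writer_fields onto reader_fields.

    Simplified illustration: fields are matched by name, and a reader
    field that the writer did not write takes the READER's default.
    Defaults declared in the writer's schema are never consulted.
    """
    # Pair each written value with the writer field that produced it.
    written = dict(zip((f["name"] for f in writer_fields), written_values))
    result = {}
    for field in reader_fields:
        name = field["name"]
        if name in written:
            result[name] = written[name]
        elif "default" in field:
            result[name] = field["default"]
        else:
            raise ValueError(f"no value or default for reader field {name!r}")
    return result


# Hypothetical schemas: the reader adds an "email" field with a default.
writer_fields = [{"name": "id", "type": "long"}]
reader_fields = [
    {"name": "id", "type": "long"},
    {"name": "email", "type": "string", "default": "unknown"},
]

record = resolve_record(writer_fields, reader_fields, [42])
print(record)  # {'id': 42, 'email': 'unknown'}
```

The two schemas here have different binary layouts, so their fingerprints would differ, yet the data written with the first can still be read through the second.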

On Tue, Jun 6, 2017 at 11:41 AM, Satish Duggana
<[hidden email]> wrote:

> https://avro.apache.org/docs/1.8.1/spec.html#Parsing+Canonical+Form+for+Schemas
>> Parsing Canonical Form is a transformation of a writer's schema that lets
>> us define what it means for two schemas to be "the same" for the purpose of
>> reading data written against the schema. It is called Parsing Canonical Form
>> because the transformations strip away parts of the schema, like "doc"
>> attributes, that are irrelevant to readers trying to parse incoming data. It
>> is called Canonical Form because the transformations normalize the JSON text
>> (such as the order of attributes) in a way that eliminates unimportant
>> differences between schemas. If the Parsing Canonical Forms of two different
>> schemas are textually equal, then those schemas are "the same" as far as any
>> reader is concerned, i.e., there is no serialized data that would allow a
>> reader to distinguish data generated by a writer using one of the original
>> schemas from data generated by a writer using the other original schema.
>> (We sketch a proof of this property in a companion document.)
>
>
> Currently, it keeps only the type, name, fields, symbols, items, values, and
> size attributes and strips all others, including the default attribute.
> Shouldn't the default attribute also be kept? A schema with a default value
> and the same schema without one are not canonically the same with respect to
> schema evolution.
>
> Thanks,
> Satish.
>
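
The attribute stripping described in the question can be sketched in a few lines of Python. This is a simplified illustration, not the full transformation from the spec: it only drops non-kept attributes and orders the kept ones, and it skips the other normalization steps (such as converting names to fullnames). The helper names strip_schema and canonical_text and the sample schemas are hypothetical, and SHA-256 is used here simply as one possible fingerprint function.

```python
import hashlib
import json

# Attributes retained by the canonical form, in the order they are emitted.
KEPT = ("name", "type", "fields", "symbols", "items", "values", "size")


def strip_schema(schema):
    """Recursively drop every attribute that is not in KEPT."""
    if isinstance(schema, list):  # a union, a field list, or a symbol list
        return [strip_schema(s) for s in schema]
    if isinstance(schema, dict):
        return {k: strip_schema(schema[k]) for k in KEPT if k in schema}
    return schema  # a primitive type name, a name string, or a size


def canonical_text(schema):
    """Serialize the stripped schema as compact JSON text."""
    return json.dumps(strip_schema(schema), separators=(",", ":"))


with_default = {
    "type": "record",
    "name": "User",
    "doc": "a user",
    "fields": [{"name": "id", "type": "long", "default": 0}],
}
without_default = {
    "type": "record",
    "name": "User",
    "fields": [{"name": "id", "type": "long"}],
}

text_a = canonical_text(with_default)
text_b = canonical_text(without_default)
print(text_a)
print(text_a == text_b)  # True: "doc" and "default" do not affect the text

# Equal canonical text implies equal fingerprints.
fp_a = hashlib.sha256(text_a.encode("utf-8")).hexdigest()
fp_b = hashlib.sha256(text_b.encode("utf-8")).hexdigest()
print(fp_a == fp_b)  # True
```

Both schemas produce the same bytes on the wire for the same record, which is why the default attribute plays no part in the comparison.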

Re: Parsing canonical forms with schemas having default values.

Satish Duggana-2
That makes sense. "Parsing Canonical Form" concerns only the schema used to
unmarshal the payload; it does not involve projection. Compatibility through
schema resolution cannot be represented with a fingerprint.

Thanks,
Satish.


On Thu, Jun 8, 2017 at 12:14 AM, Doug Cutting <[hidden email]> wrote:

> When reading data, two schemas are used: a schema with the same
> fingerprint as used to write the data, typically the actual schema
> used to write, and the schema you'd like to project to.  Default
> values are only used from the latter schema.
>
> Matching fingerprints indicate binary compatibility.  Schema
> resolution allows evolution to a schema with a different binary
> format, i.e., with additional fields that specify a default value.
>
> Schema compatibility through resolution cannot be represented in a
> single number like a fingerprint.
>
> Doug