Parsing Canonical Form and logical types

Brennan Vincent
Hello,

I am a bit confused about the intended meaning of Parsing Canonical
Form. The spec suggests that when reducing a schema to PCF, only a
particular subset of fields should be retained:

     [STRIP] Keep only attributes that are relevant to parsing data,
     which are: type, name, fields, symbols, items, values, size. Strip
     all others (e.g., doc and aliases).

However, certain other attributes are necessary for correctly
interpreting data when logical types are supported; for example, the
logicalType, precision, and scale attributes are necessary for correctly
interpreting decimals.
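
To make the dependence on scale concrete, here is a minimal Python sketch. It assumes the encoding the spec describes for a decimal carried in bytes (a big-endian two's-complement unscaled integer); the function name decode_decimal and the sample value are hypothetical and not part of any Avro library.

```python
from decimal import Decimal

def decode_decimal(raw: bytes, scale: int) -> Decimal:
    """Interpret decimal bytes as a big-endian two's-complement unscaled integer."""
    unscaled = int.from_bytes(raw, byteorder="big", signed=True)
    # value = unscaled * 10**(-scale)
    return Decimal(unscaled).scaleb(-scale)

# Hypothetical payload: the unscaled integer 12345 in two bytes.
raw = (12345).to_bytes(2, byteorder="big", signed=True)

written = decode_decimal(raw, scale=2)  # what a writer using scale 2 meant
misread = decode_decimal(raw, scale=3)  # what a reader assuming scale 3 sees
```

The same bytes decode to two different numbers, so the scale cannot be dropped without changing what the data means.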

Our Avro consumer currently checks whether the reader and writer
schemas' PCFs match and, if they do, skips schema resolution. This
causes bugs for us when, for example, the writer changes the scale of a
decimal: we keep interpreting the value according to the scale from the
reader schema, which gives wrong results. Perhaps we shouldn't be doing
this check, and should simply _always_ resolve schemas that differ in
any way?
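
The collision can be reproduced with a rough Python sketch of the [STRIP] rule alone. This is an illustration under stated assumptions, not a full PCF implementation: it skips the other transformations the spec lists (fullname resolution, attribute ordering, collapsing primitives, and so on), and the names strip_for_pcf, amount_schema, and the Payment record are hypothetical.

```python
# Attributes that the spec's [STRIP] rule keeps.
KEPT = {"type", "name", "fields", "symbols", "items", "values", "size"}

def strip_for_pcf(schema):
    """Recursively drop every attribute not listed in the [STRIP] rule."""
    if isinstance(schema, dict):
        return {k: strip_for_pcf(v) for k, v in schema.items() if k in KEPT}
    if isinstance(schema, list):
        return [strip_for_pcf(s) for s in schema]
    return schema  # primitive type name or other scalar

def amount_schema(scale):
    """Hypothetical record with a single decimal field of the given scale."""
    return {
        "type": "record",
        "name": "Payment",
        "fields": [{
            "name": "amount",
            "type": {"type": "bytes", "logicalType": "decimal",
                     "precision": 10, "scale": scale},
        }],
    }

writer, reader = amount_schema(2), amount_schema(3)

# logicalType, precision, and scale are stripped, so the results compare equal.
same_after_strip = strip_for_pcf(writer) == strip_for_pcf(reader)

# Comparing the unstripped schemas still sees the scale change.
same_full = writer == reader
```

In this sketch same_after_strip is true while same_full is false, so a fast path keyed only on PCF equality treats the two schemas as interchangeable even though the decimal scale differs.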

Does anyone know what the intended meaning of the spec is?

Thanks,
Brennan (Member of Technical Staff at Materialize, Inc.)