I am a bit confused about the intended meaning of Parsing Canonical
Form (PCF). The spec says that when reducing a schema to PCF, only a
particular subset of attributes should be retained:
[STRIP] Keep only attributes that are relevant to parsing data,
which are: type, name, fields, symbols, items, values, size. Strip all
others (e.g., doc and aliases).
However, certain other attributes are necessary for correctly
interpreting data when logical types are supported; for example, the
logicalType, precision, and scale attributes are necessary for correctly
interpreting decimal values.
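
To make the concern concrete, here is a minimal sketch of the [STRIP]
step alone. It is not a full PCF implementation (it skips the other
transformations such as [PRIMITIVES], [FULLNAMES] and [ORDER]), and the
names PCF_KEYS and strip are made up for illustration. The two schemas
are hypothetical decimals that differ only in scale:

```python
# Attributes that the spec's [STRIP] rule says to keep.
PCF_KEYS = {"type", "name", "fields", "symbols", "items", "values", "size"}


def strip(schema):
    """Recursively drop every attribute not listed in the [STRIP] rule."""
    if isinstance(schema, dict):
        return {k: strip(v) for k, v in schema.items() if k in PCF_KEYS}
    if isinstance(schema, list):
        return [strip(s) for s in schema]
    return schema  # primitive type name, symbol, or size


# Hypothetical writer and reader schemas that differ only in scale.
writer = {"type": "bytes", "logicalType": "decimal", "precision": 10, "scale": 2}
reader = {"type": "bytes", "logicalType": "decimal", "precision": 10, "scale": 4}

stripped_writer = strip(writer)
stripped_reader = strip(reader)

# Both collapse to the same thing once the logical type attributes are gone.
print(stripped_writer == stripped_reader)
```

After stripping, nothing is left to tell the two schemas apart, so any
comparison based on PCF treats them as identical.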
Our Avro consumer currently checks whether the reader and writer
schemas' PCFs match and, if so, skips schema resolution. This causes
bugs for us when, for example, the writer changes the scale of a
decimal: we keep interpreting values according to the scale from the
reader schema, which gives wrong results.
Perhaps we shouldn't be doing this check, and should simply _always_
resolve schemas that differ in any way?
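
For illustration, this is roughly how the wrong results come about. It
is only a sketch: decode_decimal is a hypothetical helper, not our
actual consumer code, and it assumes the decimal encoding described in
the spec (the unscaled integer as big-endian two's-complement bytes).
The scales 2 and 4 are example values:

```python
from decimal import Decimal


def decode_decimal(raw: bytes, scale: int) -> Decimal:
    """Interpret raw as a big-endian two's-complement unscaled integer."""
    unscaled = int.from_bytes(raw, byteorder="big", signed=True)
    return Decimal(unscaled).scaleb(-scale)


# The writer encodes 123.45 with scale 2, i.e. the unscaled integer 12345.
raw = (12345).to_bytes(2, byteorder="big", signed=True)

written = decode_decimal(raw, scale=2)  # scale from the writer schema
misread = decode_decimal(raw, scale=4)  # scale from the reader schema

print(written, misread)
```

The bytes on the wire are the same in both cases; only the scale used to
interpret them differs, which is exactly the information PCF discards.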
Does anyone know what the intended meaning of the spec is?
Brennan (Member of Technical Staff at Materialize, Inc.)